A simple model for DNA bridging proteins and bacterial or human genomes: bridging-induced attraction and genome compaction

We present computer simulations of the phase behaviour of an ensemble of proteins interacting with a polymer, mimicking non-specific binding to a piece of bacterial DNA or eukaryotic chromatin. The proteins can simultaneously bind to the polymer in two or more places to create protein bridges. Despite the lack of any explicit interaction between the proteins or between DNA segments, our simulations confirm previous results showing that when the protein-polymer interaction is sufficiently strong, the proteins come together to form clusters. Furthermore, a sufficiently large concentration of bridging proteins leads to the compaction of the swollen polymer into a globular phase. Here we characterise both the formation of protein clusters and the polymer collapse as a function of protein concentration, protein-polymer affinity and fibre flexibility.


Introduction
Within living cells, either in bacteria or in eukaryotes, DNA rarely exists in its naked form. Instead, it is associated with DNA-binding proteins, which are involved in processes such as transcription and gene regulation, as well as in genome packaging and organisation [1,2]. The best characterised example of packaging comes from eukaryotes, where the first level of chromosome compaction is provided by the wrapping of DNA around histone octamers in a structure called the nucleosome [3,4]; strings of nucleosomes then fold into larger structures which form the chromatin fibre [5]. In bacteria, histone-like proteins also determine genome structure; an example is the H-NS protein, which forms dimers [6,7] that bind non-specifically to AT-rich DNA, and this binding sometimes leads to gene silencing [8]. DNA-associated proteins are also involved in gene regulation; for instance, some eukaryotic proteins compact the fibre into transcriptionally-inactive heterochromatin, whilst others remodel active euchromatin, deeply affecting the local and global folding of the fibre [9,10].
In general, most DNA-binding proteins are positively charged [1], and this provides a non-specific attractive interaction with negatively-charged DNA. Additional sequence-dependent interactions allow transcription factors and other regulatory proteins to recognise and bind tightly to their DNA targets [11]. The combination of non-specific and specific binding makes it possible for proteins to search for their targets more efficiently via facilitated diffusion [11][12][13], a combination of 3-dimensional diffusion in the bulk and 1-dimensional diffusion along the DNA (when the protein is non-specifically bound).

Figure 1. Schematic of the model we consider: DNA or chromatin is modelled as a bead-and-spring polymer (here a flexible polymer is depicted, relevant for euchromatin, see text), while bridging proteins are modelled as spheres which interact with all beads in the polymer with an attractive interaction (see methods).
The number of DNA-binding proteins in a cell is extremely large. For instance, in E. coli, the bacterial genome is approximately 4.6 Mbp long, and there are ∼30 000 DNA-binding proteins, potentially enough for one protein to bind every ∼150 bp [14]. In eukaryotes the numbers are also large, with roughly one nucleosome every 200 bp [1], and tens of thousands of transcription factors [9]. Therefore, it is important to understand the collective (statistical) behaviour of a large number of DNA-binding proteins interacting with a substrate genome, whether bacterial DNA or eukaryotic chromatin. In this work we use a simple bead-and-spring polymer model which, with a suitable choice of parameters, can be used to describe either of these systems. For eukaryotic chromatin we adopt the common model [15][16][17][18] in which the nucleosomes arrange into a fibre of diameter 30 nm with the DNA compacted such that there is ∼3 kbp per 30 nm length (see footnote 3); in vitro measurements show such fibres to have persistence lengths in the range 40-200 nm [20]. For the case of the bacterial chromosome, it is more appropriate to model naked DNA; the relevant parameters are the hydration thickness of 2.5 nm, and the well characterised persistence length of 50 nm (since the persistence length is a factor of 20 longer than the thickness, such a polymer is described as semi-flexible). A schematic diagram is shown in figure 1. We model DNA-binding bridging proteins as simple isotropic spheres; like many transcription factors (which can act singly or in conjunction with others), these spheres can simultaneously interact with two or more DNA segments and so form a bridge between them. Such bridges can then form loops in the intervening DNA or chromatin fibre, and this can play an important part in gene regulation [1,9]. Examples of transcription factors that can act as such bridges to loop DNA include bacterial H-NS [6,7,21], and eukaryotic CTCF [22].
3 Fibres of 30 nm diameter have been well studied in vitro; however, the precise structure of such fibres is still a subject of debate [19]. Furthermore, recent work has questioned the existence of the 30 nm fibre in vivo for some cell types [10]. Our results, however, will remain qualitatively the same whatever the thickness of the underlying chromatin fibre of which our polymer is a coarse-grained representation.
The model considered here is the simplest version of those studied recently in [15], where, despite the lack of an explicit interaction between one protein and another or between one DNA segment and another, bound proteins come together to form clusters. Such clustering is driven by a 'bridging-induced attraction' which relies on proteins being able to bind to two or more different DNA segments, and results from a combination of entropic and kinetic effects (the relative balance between these is slightly different in the different cases considered). This clustering is a generic phenomenon: it occurs with flexible and semi-flexible polymers (representing chromatin or DNA), with both specific and non-specific binding, and with proteins of different size. The model is also similar to the 'strings-and-binders' model of [16,17], with the main differences being that their model is defined on a 3D lattice, they specifically considered chromatin (i.e. the polymer is flexible), and the protein-chromatin interaction was switched on for only a fraction of chromatin beads (so only specific binding was modelled). Those studies focussed on polymer behaviour and found that increasing protein concentration drives chromatin compaction, and that the exponents reflecting the frequencies of chromatin contacts were (at a suitable protein concentration) similar to those observed in HiC experiments [23,24] (previously also predicted by a polymer-only model known as the fractal globule [18,23]). We also note the similarity between protein-induced DNA compaction and that induced by the presence of smaller, charged ligands (see e.g. [25]).
Here, we provide complementary simulations to those described in [15][16][17], and systematically study both the clustering dynamics and the bridging-induced compaction of DNA as a function of protein concentration and protein-DNA affinity. The picture emerging from our results is as follows. At low protein concentrations, the polymer is weakly affected, and the most visible phenomenon is a bridging-induced clustering (as in [15]). At high protein concentrations, the polymer collapses into a globular state (as in [16,17]). This bridging-induced compaction is similar to the theta collapse of polymers in a poor solvent [26], and can be quantitatively followed by measuring, for example, the radius of gyration of the polymer. Flexible and semi-flexible polymers behave qualitatively similarly. We also find that the clusters typically fuse and coarsen, to eventually yield only one cluster at equilibrium. The coarsening dynamics is fast at low and high protein concentrations, but becomes extremely slow at intermediate concentrations; it is also affected by the polymer persistence length (i.e. its stiffness). We believe our simple model can provide a deeper theoretical understanding of the physics underpinning the compaction of bacterial and eukaryotic chromosomes [15][16][17].

Methods
To study the phase behaviour and kinetics of the system, we perform molecular dynamics (MD) simulations via the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) software package [27]. We coarse-grain the system by treating DNA, or chromatin, as a bead-and-spring polymer. DNA-binding proteins are modelled as spheres. LAMMPS was run in Brownian dynamics (BD) mode, where a molecular dynamics algorithm is used with a stochastic thermostat, which models the thermal fluctuations and viscosity of an implicit solvent. For computational efficiency, hydrodynamic interactions are neglected. This coarse-grained approach, where molecules are represented by only a small number of component spheres, allows our simulations to cover much larger length and time scales than studies which include greater (e.g. atomistic) detail [28][29][30][31].
Translational motion of the $i$th bead (whether a polymer bead or a protein) evolves according to the Langevin equation

$$ m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\gamma_i \frac{d\mathbf{r}_i}{dt} + \mathbf{F}_i + \sqrt{2 k_B T \gamma_i}\,\boldsymbol{\eta}_i(t), \tag{1} $$

where $k_B$ is the Boltzmann constant, $T$ is the temperature, and $\mathbf{r}_i$ is the position of the bead, which has mass $m_i$; $\mathbf{F}_i$ is the net force on the bead due to interactions with the rest of the system, $\gamma_i \equiv \gamma$ is the friction due to the solvent, and $\boldsymbol{\eta}_i(t)$ is a vector representing random uncorrelated noise, such that

$$ \langle \eta_\alpha(t) \rangle = 0 \quad \text{and} \quad \langle \eta_\alpha(t)\,\eta_\beta(t') \rangle = \delta_{\alpha\beta}\,\delta(t - t'). $$

To describe the interactions between the beads we use phenomenological interaction potentials (e.g. truncated Lennard-Jones) since these offer a significant improvement in computational efficiency over more realistic potentials. This choice means that we do not consider any additional electrostatic repulsion between atoms, but since at physiological salt concentrations the Debye screening length is of order 1 nm, we do not expect any significant effect on our results.
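As an illustration of how a Langevin equation of this type can be integrated numerically, the following minimal sketch advances a single coordinate with a simple Euler-Maruyama step. It is not the integrator used by LAMMPS; the function names, the time step and the reduced units ($m = \gamma = k_BT = 1$) are assumptions chosen for illustration only.

```python
import math
import random


def langevin_step(x, v, force, mass, gamma, kT, dt, rng):
    """One Euler-Maruyama step of m dv/dt = -gamma*v + F(x) + sqrt(2*kT*gamma)*eta(t).

    Illustrative scheme only (an assumption), not the LAMMPS integrator.
    """
    # Integrated white noise over dt has variance 2*kT*gamma*dt
    noise = math.sqrt(2.0 * kT * gamma * dt) * rng.gauss(0.0, 1.0)
    v_new = v + ((force(x) - gamma * v) * dt + noise) / mass
    x_new = x + v_new * dt
    return x_new, v_new


def mean_square_velocity(n_particles=3000, n_steps=200, dt=0.05,
                         mass=1.0, gamma=1.0, kT=1.0, seed=1):
    """Average v^2 over independent free particles after n_steps (1D)."""
    rng = random.Random(seed)

    def free_particle(_x):
        return 0.0  # no interactions: F = 0

    total = 0.0
    for _ in range(n_particles):
        x, v = 0.0, 0.0
        for _ in range(n_steps):
            x, v = langevin_step(x, v, free_particle, mass, gamma, kT, dt, rng)
        total += v * v
    return total / n_particles
```

For a free particle the friction and noise terms balance so that the mean square velocity per degree of freedom relaxes towards $k_BT/m$ (up to a small time-step error), which provides a simple consistency check on the noise amplitude in equation (1).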
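For the polymer chain model, the $i$th bead is connected to the $(i+1)$th with a finitely extensible non-linear elastic (FENE) spring described by the potential

$$ U_{\mathrm{FENE}}(r_{i,i+1}) = U_{\mathrm{WCA}}(r_{i,i+1}) - \frac{K_{\mathrm{FENE}} R_0^2}{2} \ln\left[ 1 - \left( \frac{r_{i,i+1}}{R_0} \right)^2 \right], \tag{2} $$

where $r_{i,i+1} = |\mathbf{r}_i - \mathbf{r}_{i+1}|$ is the separation of the centres of the beads. Here the first term represents a hard steric interaction which prevents adjacent beads from overlapping (the Weeks-Chandler-Andersen (WCA) potential; see equation (3) below), and the second gives the maximum extension of the bond, $R_0$. Throughout we use $R_0 = 1.6\,\sigma$ and a fixed bond energy $K_{\mathrm{FENE}}$, where $\sigma$ is the polymer bead diameter (see below for the values we use in the different models). Steric interactions between non-adjacent DNA beads, and between any two proteins, are also given by the WCA potential

$$ U_{\mathrm{WCA}}(r_{ij}) = \begin{cases} 4 k_B T \left[ \left( \dfrac{d_{ij}}{r_{ij}} \right)^{12} - \left( \dfrac{d_{ij}}{r_{ij}} \right)^{6} \right] + k_B T, & r_{ij} < 2^{1/6} d_{ij}, \\ 0, & \text{otherwise.} \end{cases} \tag{3} $$

Here $d_{ij}$ is the mean of the diameters of the two spheres, i.e. for two polymer beads $d_{ij} = \sigma$.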
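The bonded and steric potentials described above can be sketched in a few lines. This is a minimal illustration in reduced units (energies in $k_BT$, lengths in $\sigma$); the default `K_fene = 30.0` is an assumed, commonly used Kremer-Grest-type value and is not taken from the text, and the function names are hypothetical.

```python
import math


def wca(r, d=1.0, kT=1.0):
    """Purely repulsive WCA potential for two spheres of mean diameter d."""
    r_cut = 2.0 ** (1.0 / 6.0) * d
    if r >= r_cut:
        return 0.0
    s6 = (d / r) ** 6
    return 4.0 * kT * (s6 * s6 - s6) + kT


def fene_bond(r, sigma=1.0, R0=1.6, K_fene=30.0, kT=1.0):
    """WCA core plus FENE spring between bonded beads.

    R0 is in units of sigma (1.6, as in the text); K_fene is in units of
    kT/sigma^2 and its default value is an ASSUMPTION for illustration.
    """
    r_max = R0 * sigma
    if r >= r_max:
        return math.inf  # the bond cannot stretch beyond R0
    k = K_fene * kT / sigma ** 2
    return wca(r, sigma, kT) - 0.5 * k * r_max ** 2 * math.log(1.0 - (r / r_max) ** 2)


def equilibrium_bond_length(step=0.001):
    """Locate the minimum of the bond potential by a simple grid scan."""
    grid = [0.8 + step * n for n in range(int(0.7 / step))]
    return min(grid, key=fene_bond)
```

With these assumed parameters the bond potential has a single minimum slightly below one bead diameter, so bonded beads sit almost in contact while the logarithmic term prevents the chain from crossing itself by over-stretching.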
The bending rigidity of the polymer is introduced via a Kratky-Porod potential for every three adjacent DNA beads,

$$ U_{\mathrm{BEND}}(\theta_B) = K_{\mathrm{BEND}} \left[ 1 - \cos\theta_B \right], \tag{4} $$

where $\theta_B$ is the angle between three consecutive beads (measured such that $\theta_B = 0$ if the three beads are collinear), and $K_{\mathrm{BEND}}$ is the bending energy. The persistence length in units of $\sigma$ is given by $l_p = K_{\mathrm{BEND}}/k_B T$ (see below for the values we use in the different models).

The interaction between protein bridges and DNA/chromatin is defined as a shifted truncated Lennard-Jones potential. To define this, we start from the usual Lennard-Jones potential,

$$ U_{\mathrm{LJ}}(r_{ij}) = 4\epsilon' \left[ \left( \frac{d_{ij}}{r_{ij}} \right)^{12} - \left( \frac{d_{ij}}{r_{ij}} \right)^{6} \right]. \tag{5} $$

If we truncate this potential at $r_{ij} = r_{\mathrm{thr}}$ and require it to be continuous, we obtain

$$ U_{\mathrm{LJ}}^{\mathrm{cut}}(r_{ij}) = \begin{cases} U_{\mathrm{LJ}}(r_{ij}) - U_{\mathrm{LJ}}(r_{\mathrm{thr}}), & r_{ij} < r_{\mathrm{thr}}, \\ 0, & \text{otherwise.} \end{cases} \tag{6} $$

Equation (6) reduces to the purely repulsive WCA potential for $r_{\mathrm{thr}} = 2^{1/6} d_{ij}$, while if $r_{\mathrm{thr}} > 2^{1/6} d_{ij}$ an attractive tail sets in. The parameter $\epsilon'$ controls the strength of the interaction, but due to the shifting of the function this is not the same as the energy minimum of the potential; for clarity we use the unprimed symbol $\epsilon$ to denote the true value of the energy at the minimum. For simplicity, we take the size of the protein equal to that of the DNA/chromatin bead (considering different sizes would be more appropriate for most DNA-binding proteins, but this does not qualitatively affect the results we present here).
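The distinction between the interaction parameter and the true depth of the potential well can be made concrete with a short calculation. The sketch below assumes the standard shifted truncated Lennard-Jones form (subtracting the value at the cut-off) and uses hypothetical function names; energies are in arbitrary units and lengths in units of $d_{ij}$.

```python
def lj(r, eps_prime, d):
    """Unshifted 12-6 Lennard-Jones potential."""
    s6 = (d / r) ** 6
    return 4.0 * eps_prime * (s6 * s6 - s6)


def lj_shifted_truncated(r, eps_prime, d, r_thr):
    """Lennard-Jones potential cut at r_thr and shifted to be continuous there."""
    if r >= r_thr:
        return 0.0
    return lj(r, eps_prime, d) - lj(r_thr, eps_prime, d)


def well_depth(eps_prime, d, r_thr):
    """True depth of the minimum (at r = 2^(1/6) d) of the shifted potential."""
    r_min = 2.0 ** (1.0 / 6.0) * d
    return -lj_shifted_truncated(r_min, eps_prime, d, r_thr)
```

For example, `well_depth(1.0, 1.0, 1.8)` is smaller than 1, showing that with a cut-off at $1.8\,d_{ij}$ the true minimum is shallower than the nominal interaction parameter, and the depth vanishes when the cut-off is placed at $2^{1/6} d_{ij}$ (the WCA limit).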
As described above, to model naked DNA we use $\sigma = 2.5$ nm (one bead is therefore ∼7.35 bp) and set $l_p = 20\,\sigma = 50$ nm. For chromatin, we use $\sigma = 30$ nm, taking this to contain 3 kbp of packaged DNA, and use a persistence length of 90 nm ($l_p = 3\,\sigma$), which is appropriate for transcriptionally active, open euchromatin [15].
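The mapping between simulation beads and physical units quoted above amounts to simple arithmetic, sketched here. The rise of 0.34 nm per base pair for B-DNA is an assumption not stated in the text, and the helper names are hypothetical.

```python
BP_RISE_NM = 0.34  # assumption: B-DNA rise per base pair, in nm


def dna_bp_per_bead(sigma_nm=2.5):
    """Base pairs represented by one naked-DNA bead of diameter sigma_nm."""
    return sigma_nm / BP_RISE_NM


def persistence_length_in_beads(lp_nm, sigma_nm):
    """Persistence length expressed in units of the bead diameter."""
    return lp_nm / sigma_nm


def chromatin_length_bp(n_beads, bp_per_bead=3000):
    """Genomic length of a chromatin fibre with 3 kbp per 30 nm bead."""
    return n_beads * bp_per_bead
```

With these numbers a naked-DNA bead corresponds to about 7.35 bp, the persistence lengths are 20 beads (DNA) and 3 beads (chromatin), and the 5000-bead chromatin polymer used in the Results corresponds to 15 Mbp.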
The simulation time units are defined in terms of the friction $\gamma$ (see equation (1); we use $\gamma = 1$ unless otherwise stated), which is in turn related to the diffusion constant of a bead of size $\sigma$ via $D = k_B T/\gamma$. A natural time scale is then the Brownian time, $\tau_B$, which is the order of magnitude of the time needed for a bead to diffuse across its own size; for a bead of diameter $\sigma$, $\tau_B = \sigma^2/D$. One Brownian time in simulation units therefore corresponds to a different physical time for DNA and chromatin: ∼36 ns for DNA (semi-flexible polymer), estimated using a viscosity of 1 cP for the bacterial cytosol, and ∼0.6 ms for chromatin (flexible polymer), estimated using a viscosity of 10 cP for the nucleoplasm. In the results below all times given are mapped to physical units.
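The quoted Brownian times can be reproduced with a short estimate. The sketch below assumes Stokes friction for a sphere of diameter $\sigma$, $\gamma = 3\pi\eta\sigma$, and a temperature of 300 K; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def brownian_time(sigma_m, viscosity_pa_s, temperature_k=300.0):
    """Brownian time tau_B = sigma^2 / D for a sphere of diameter sigma.

    Assumes Stokes drag, gamma = 3*pi*eta*sigma, and D = k_B*T/gamma.
    """
    gamma = 3.0 * math.pi * viscosity_pa_s * sigma_m
    diffusion = K_B * temperature_k / gamma
    return sigma_m ** 2 / diffusion


tau_dna = brownian_time(2.5e-9, 1.0e-3)        # 2.5 nm bead, 1 cP
tau_chromatin = brownian_time(30.0e-9, 1.0e-2)  # 30 nm bead, 10 cP
```

Under these assumptions `tau_dna` is a few tens of nanoseconds and `tau_chromatin` is a fraction of a millisecond, consistent with the ∼36 ns and ∼0.6 ms quoted above. Since $\tau_B \propto \eta\sigma^3$, the factor of 12 in bead size and factor of 10 in viscosity together account for the large separation between the two time scales.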

Results
We begin the presentation of our results with the case of a flexible polymer. As detailed above, this is relevant for active euchromatin [15,32] (we assume a persistence length $l_p = 90$ nm, chosen towards the lower end of the 40-200 nm range determined experimentally [20]). The key parameters in our system are: protein (bridge) concentration, $c_b$, polymer concentration, $c_p$, and protein-polymer affinity, $\epsilon$, which measures the minimum of the protein-DNA interaction potential (see footnote 4). Instead of changing $c_p$ and $c_b$ independently, we fix $c_p$ and vary $x \equiv c_b/c_p$; as we shall see, varying this ratio leads to a change in behaviour from the bridging-induced attraction of [15] to the binder-induced compaction of [16,17]. Note that [16,17] used comparatively small polymers, where only every second or third monomer interacts with the proteins; under these conditions a small $x$ corresponds to only single digit numbers of proteins, hence in that work they mostly consider $x > 1$, which is larger than the values we consider here.

4 Another parameter is the ratio between the polymer size, quantified by its radius of gyration $R_g$, and the size of the (cubic) simulation box, $L$. If $R_g \ll L$, this corresponds to the dilute regime; an $R_g$ of the same order of magnitude as $L$ is semi-dilute, whereas higher $R_g$ (say for a $c_p$ corresponding to a volume fraction of 10% or more) corresponds to concentrated polymer. We work in the dilute and semi-dilute regimes, and did not find any qualitative differences between their behaviours.

Figures 2(a)-(c) show snapshots taken from a simulation where a 5000-bead polymer (corresponding to a 15 Mbp euchromatin fibre, see Methods) interacts with 1000 proteins (hence $x = 0.2$). For the relatively large value of the affinity between protein and DNA used here ($\epsilon = 2.83\,k_B T$), the bridges stick essentially irreversibly to the chromatin. Clusters of proteins form early on during the simulation (figure 2(a)) due to the bridging-induced attraction reported in [15]; later, these coarsen (figures 2(b) and (c)) to leave eventually only one aggregate in steady state (the number of clusters versus time is shown in figure 2(d)). We define two proteins as belonging to the same cluster if their centre-centre separation is below a threshold; for the case of chromatin this is chosen to be 90 nm ($3\sigma$; figures 2-4). The full dynamics corresponding to figure 2 are shown in supplementary movie 1 (stacks.iop.org/JPCM/27/064119/mmedia), and this reveals that the coarsening occurs predominantly by fusion of clusters which meet stochastically. There is little evidence of Ostwald ripening, which would lead to the growth of a cluster at the expense of another one which would shrink, presumably because the interaction between bridges is large enough that the disintegration of an aggregate, once formed, is very unlikely. The final clustered state is also robust to different initial conditions, e.g. it still appears if the proteins are initially bound at regularly spaced positions along a swollen polymer; see [15].
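The distance-threshold definition of a cluster given above can be implemented as a connected-components search. The following is a minimal sketch using a union-find structure; it ignores periodic boundary conditions and treats an isolated protein as a cluster of size one, both of which are simplifying assumptions, and the function name is hypothetical.

```python
import math


def find_clusters(positions, threshold):
    """Group proteins into clusters: two proteins are linked if their
    centre-centre separation is below threshold, and clusters are the
    connected components of the resulting graph.

    Simplifications (assumptions): no periodic images, O(n^2) pair search.
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

For instance, four proteins on a line at 0, 50, 100 and 500 nm with a 90 nm threshold give one cluster of three proteins and one isolated protein: the proteins at 0 and 100 nm are joined through their common neighbour even though they are further apart than the threshold.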
Our simulations show that the fraction of proteins involved in the clustering is very high even for the relatively low protein concentration considered in figure 2: this means that once proteins form bridges, they come together through the bridging-induced attraction (i.e. there are very few isolated bridges; see also figure 3). Due to the coarsening of the initial clusters into a large single final aggregate, the average size of the cluster is very close to the total number of bridges (figure 2(d)). We have studied here protein concentrations in the range $0.1 \le x \le 0.5$; while we expect coarsening until a single aggregate is left for all $x$ that we studied, for intermediate $x$ our simulation time ($10^6$ Brownian times) is not sufficient to observe the completion of this process, so that the state with multiple clusters could be dynamically stabilised. As noted above, the coarsening occurs through fusion of clusters, which must come together stochastically via diffusion; for small clusters, their diffusion is relatively fast, but as more proteins are added, the size of the initial clusters grows, and the kinetics of coarsening become very slow. For much larger $x$, the proteins initially bind all over the chromatin: there are more initial aggregates which are closer together, so again the coarsening is faster. Although for the affinity considered in figure 2 proteins form clusters whatever the value of $x$, the impact of protein binding on genomic structure is modest for low $x$, but is massive for large $x$.

Figure 3. Plot of the radius of gyration of the polymer, $R_g$, versus $x$. The inset shows a log-log plot for $R_g$ versus $x$, which can be compared e.g. to figure 1(d) in [17]. Also shown is the proportion of proteins found in clusters; note that even before the compaction sets in, most bridges are clustering. The length of the polymer is fixed at 5000 beads, and the other parameters are as in figure 2.
This can be appreciated, for instance, by looking at the plot of the radius of gyration of the polymer, $R_g$, versus $x$ (figure 3), which shows that increasing $x$ induces a transition from an overall open, swollen phase with large $R_g$ to a compact one with small $R_g$. Although we do not measure here the exponents associated with this transition (which would require simulations at multiple values of the polymer contour length), we expect that for small $x$ the exponent $\nu$ (where $R_g \sim N^{\nu}$, with $N$ the number of polymer beads) should be close to 0.588 (the value for a swollen polymer [26]), whereas for large $x$, $\nu$ should approach 1/3. For the affinity considered in figure 3, the equilibrium state is a coexistence of an open region of the polymer with a globular region corresponding to the cluster formed by the bridging-induced attraction. The size of the globule depends on $x$: increasing $x$ increases the globule size until it eventually takes over the whole fibre.
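The radius of gyration used above to follow the compaction can be computed directly from the bead positions. The sketch below assumes beads of equal mass and uses hypothetical function names; the second helper simply evaluates the ratio of the two scaling laws $N^{0.588}$ and $N^{1/3}$ with equal (unknown) prefactors, so it indicates orders of magnitude only.

```python
import math


def radius_of_gyration(positions):
    """R_g = sqrt(mean squared distance of beads from their centre of mass).

    Assumes all beads have equal mass.
    """
    n = len(positions)
    centre = [sum(p[k] for p in positions) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return math.sqrt(mean_sq)


def swollen_to_globule_ratio(n_beads, nu_swollen=0.588, nu_globule=1.0 / 3.0):
    """Ratio N^nu_swollen / N^nu_globule, taking equal prefactors (assumption)."""
    return n_beads ** (nu_swollen - nu_globule)
```

For two beads placed at $x = \pm 1$ the function returns $R_g = 1$, as expected. For a 5000-bead polymer the scaling estimate suggests that full collapse reduces $R_g$ by roughly an order of magnitude, although the true ratio depends on the prefactors, which are not determined here.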
The bridging-induced compaction observed in figure 3 has previously been documented in [16,17]; in some of the cases considered in [16] the collapsed state could also be reached by changing $\epsilon$ at large $x$. In this case, the switch between open and collapsed phase appears similar to that of the theta collapse of a polymer in a poor solvent, which can be triggered, for instance, by an increase in the effective attractive interaction between the monomers in the polymer [26].
The reorganisation of the polymer which occurs as a result of the bridging-induced attraction provides a useful framework for understanding several observations in chromosome biology. Recent HiC experiments have uncovered a power law governing the decay in contact probability with distance along the polymer [23,24], with an exponent which depends on cell type; in [17] it was shown that by changing the protein concentration, $c_b$, it was possible to vary the power law dictating the decay of contact probability within a simulated polymer. Also, it is known that chromatin fibres are disordered, with compact heterochromatic regions interspersed amongst open euchromatic ones [10]; the coexistence of a cluster or globule state with a more open, or swollen, region reported here provides a generic pathway to drive segregation of different chromatin states.

Figure 4 shows the effect that different protein-DNA affinities have on the fraction of proteins in clusters, for a small value of $x$. This fraction is close to zero for low affinity, and increases sharply above a critical value, $\epsilon > \epsilon_c$ (where $\epsilon_c \sim 2.2\,k_B T$ for $r_{\mathrm{thr}} = 54$ nm ($1.8\,\sigma$), and $\epsilon_c \sim 3.1\,k_B T$ for $r_{\mathrm{thr}} = 39$ nm ($1.3\,\sigma$); we expect in general $\epsilon_c$ to depend on both $c_p$ and $c_b$ separately). Consistent with the results of figure 3, we only observe a modest (35-40%) decrease in polymeric size around $\epsilon_c$ (this is because the value of $x$ is below that required for full compaction).
We now discuss the case where the polymer is semi-flexible, modelling proteins like H-NS binding non-specifically to bacterial DNA [6,7]. We find that the polymer rigidity plays an important role. First, figures 5(a)-(c) (and see also supplementary movie 2 (stacks.iop.org/JPCM/27/064119/mmedia)) show that the clusters which arise due to the bridging-induced attraction are now cylindrical in shape, rather than spherical. For the semi-flexible case we define the clustering threshold as 3.5 nm ($1.4\,\sigma$; figures 5-7). We still observe coarsening, but at a lower rate compared to the flexible polymer (see figure 5(d)); our simulations are also inconclusive as to whether a single cluster or several remain in equilibrium. The row-like clusters are qualitatively similar to the structures which have been inferred for H-NS interacting with DNA on the basis of atomic force microscopy [7]. As for the flexible case in figure 3, increasing $x$ eventually leads to bridging-induced compaction of the DNA (figure 6). However, unlike in the flexible case, $R_g$ decreases only gradually with $x$ (rather than falling sharply at a critical value as in figure 3); possibly, this is because in the semi-flexible case the bridges are more likely to cause long-distance looping, which can more efficiently compact the polymer than the local folding which is more common in the flexible fibre.
Finally, the effect of changing protein-DNA affinity is similar to the flexible case, and the critical threshold beyond which clustering sets in, for the same concentration as in figure 4, is now slightly higher at $\epsilon_c \sim 3$.

Discussion and conclusions
To summarise, here we have studied a simple statistical physics model for a DNA molecule or chromatin fibre interacting with a solution of DNA-binding proteins which can bind to the DNA at more than one point to form bridges. We varied the flexibility of the polymer, as well as the concentration and the polymer binding affinity of the proteins.
For a sufficiently attractive interaction between DNA and the bridges, we observe protein clustering. This clustering is reminiscent of the aggregation seen experimentally in mixtures of DNA and nanoparticles (representing 'synthetic proteins' [33]), and is an example of the bridging-induced attraction discussed in [15]: it arises because bridging enhances the local DNA density, which in turn recruits more bridges, setting up a positive feedback loop. The formation of clusters may be described as a phase transition, which is triggered by an increase in protein-DNA affinity (figures 4 and 7). Simulations suggest that separate clusters are unstable, as they fuse to give larger aggregates when they meet stochastically. For intermediate values of the protein concentration, such coarsening does not always proceed to leave a single cluster within the duration of our simulations. Presumably, this has a kinetic origin: if clusters form in far-away regions of the DNA, then fusion between two of them requires their meeting by chance through diffusion, which is a slow process for large clusters.
The clustering which we generically observe may provide a simple framework to understand genome organisation. For instance, binding to, and forming bridges within, a flexible polymer can yield zones of compact (and transcriptionally inactive) heterochromatin [9,10] and zones of more open euchromatin, like those found in mammalian chromosomes. Similarly, binding to semi-flexible polymers yields structures reminiscent of those seen experimentally with bacterial DNA and H-NS [6,7].
In this work we also showed that the concentration of protein bridges plays an important role in the physics of the system. For sufficiently strong protein-DNA interactions, proteins are preferentially bound to the DNA, hence the relevant physical parameter is the ratio between the protein and polymer concentration, or equivalently the linear packing fraction of proteins along the DNA, which we call $x$. If $x$ is low, bridges cluster and condense the DNA locally, and one is left with a globular region coexisting with swollen regions. When $x$ increases, the swollen region shrinks and the size of the polymer drops sharply. When $x$ is large enough, the whole polymer becomes compacted. These results provide a link between the phenomena described in [15] (clustering of bridges through the bridging-induced attraction for relatively low values of $x$), and the findings of [17] (where $x$ is typically larger than 1, and a switch between an open and a bridging-induced compacted phase was observed upon variation of either protein concentration or protein-DNA affinity). Barbieri et al [17] also showed that the compaction is linked to a qualitative change in the statistics of contact formation, which may be relevant to the interpretation of HiC experiments on inter-chromosomal contacts [17].
Both the bridging-induced attraction and compaction are likely to be generic phenomena that can be observed with DNA-binding bridges which bind either non-specifically or specifically. It would be of interest to repeat the study provided here both with protein bridges with limited valencies and mixtures of specific and non-specific interactions, and to see how this changes the physics of the bridging-induced compaction. One might expect, for example, that a more specific interaction with DNA or chromatin might lead to stabilization of multiple clusters, segregation of polymer regions with different binding properties, or the formation of local domains of increased polymer interaction-all of which have far reaching implications for genome organisation.