Modeling the Structure and Interactions of Intrinsically Disordered Peptides with Multiple Replica, Metadynamics-Based Sampling Methods and Force-Field Combinations

Intrinsically disordered proteins play a key role in many biological processes, including the formation of biomolecular condensates within cells. A detailed characterization of their configurational ensemble and structure–function paradigm is crucial for understanding their biological activity and for exploiting them as building blocks in material sciences. In this work, we incorporate bias-exchange metadynamics and parallel-tempering well-tempered metadynamics with CHARMM36m and CHARMM22* to explore the structural and thermodynamic characteristics of a short archetypal disordered sequence derived from a DEAD-box protein. The conformational landscapes emerging from our simulations are largely congruent across methods and force fields. Nevertheless, differences in fine details emerge from varying combinations of force-fields and sampling methods. For this protein, our analysis identifies features that help to explain the low propensity of this sequence to undergo self-association in vitro, which are common to all force-field/sampling method combinations. Overall, our work demonstrates the importance of using multiple force-field and sampling method combinations for accurate structural and thermodynamic information in the study of disordered proteins.


Collective variables definition and input parameters
Cα -Cα contacts, Cγ -Cγ contacts and the number of backbone H-bonds were computed by means of COORDINATION collective variable implemented in PLUMED 3 . The number of contacts N c between two groups of atoms (A and B) is defined as follows: where the index i spans the atoms in the group A and the index j spans the atoms in the group B; s ij is a switching function (which varies between 0 and 1) defined as follows: where r ij is the distance between the i atom of group A and the j atom of group B and d 0 , r 0 , n and m are parameters of the switching function. Group A and group B are both constituted by C α and C γ atoms for Cα -Cα contacts and Cγ -Cγ contacts, respectively; group A accounts for backbone hydrogen atoms and group B includes backbone oxygen atoms for the calculation of backbone H-bonds. Input parameters are summarized in Table S.1. To improve the computational efficiency, we employed a neighbour list (NLIST command in PLUMED) which was updated every 5 simulation steps (NL_STRIDE command) and adopting a cutoff value equal to 5, 4 and 3 nm for Cα -Cα contacts, Cγ -Cγ contacts and the number of backbone H-bonds, respectively (NL_CUTOFF command). Such values were chosen so that for rij values equal or higher than the cutoff, sij was constant and equal to 0.
Dihedral correlation dc was computed with respect to ψ backbone dihedral angles by means of DIHCOR collective variable implemented in PLUMED: where Nd is the number of considered dihedral angles.
The alpha helical, parallel, and antiparallel beta sheet content in the protein were obtained by means of ALPHARMSD, PARABETARMSD and ANTIBETARMSD collective variables implemented in PLUMED and proposed by Pietrucci and Laio 4 . The collective variable has the following form: where the index i spans all possible groups of six residues that can form an alpha helix, a parallel beta sheet or an antiparallel beta sheet and ri is the root mean square displacement (RMSD) between the actual structure of the group and an ideal alpha helix, parallel beta sheet or antiparallel beta sheet; d 0 , r 0 , n and m are parameters of the switching function. Input parameters are summarized in Table S Mass-weighted radius of gyration Rg, asphericity b' and the relative shape anisotropy κ 2 were calculated using the GYRATION collective variable implemented in PLUMED 1 , whose definitions were originally proposed by Vymětal and Vondrášek 3 .
The radius of gyration was calculated using: where n is the number of atoms in the protein, with the position of the centre of mass rCOM given as: The asphericity was calculated using: The relative shape anisotropy was calculated using: where I 1 , I 2 and I 3 are the three eigenvalues of the gyration tensor:

Figure S1
One-dimensional probability densities of DHH1N as a function of single collective variables at 300 K with CHARMM36m PTMetaD-WTE, CHARMM22* PTMetaD-WTE and CHARMM22* unbiased simulations.

Figure S2
Time evolution of the one-dimensional probability densities of DHH1N as a function of single collective variables at 300 K for CHARMM36m PTMetaD-WTE.

Figure S3
Time evolution of the one-dimensional probability densities of DHH1N as a function of single collective variables at 300 K for CHARMM22* PTMetaD-WTE.

Figure S4
Correlation of the collective variables for DHH1N. Data are taken from the 300 K replica of CHARMM36m PTMetaD-WTE.

Figure S5
Time evolution of the one-dimensional probability densities of DHH1N as a function of single collective variables at 300 K for CHARMM36m BEMD. The first 250 ns was not included for data analysis.

Figure S6
Time evolution of the one-dimensional probability densities of DHH1N as a function of single collective variables at 300 K for CHARMM22* BEMD.

Supplementary Note 8: input parameters for bias-exchange metadynamics simulations
Input parameter for the collective variables included in bias exchange metadynamics simulations are summarized in Table S.4 and Table S.5 for CHARMM22* and CHARMM36m force field, respectively. As described in the main text, simulations were performed adopting the ordinary metadynamics scheme; hills height is equal to 0.3 kJ mol -1 and bias potential is applied every 2500 simulation steps. After 160 ns the system explored a wide region for each collective variable and lower and upper boundaries were introduced as static harmonic bias potentials (LOWER_WALLS and UPPER_WALLS commands in PLUMED) to improve convergence. a Structure numbering is the same of Figure S9. b Size of the system during NVT production phase after NpT equilibration.

Supplementary Note 10: data analysis and error estimate
Postprocessing of PLUMED-defined collective variables was calculated using PLUMED 2.5.2 3 . 1D-FE profiles and 2D-FES were plotted with Matplotlib 8 and PyEMMA 2.5.7 9 in Jupyter Notebook 10 . Schematic visualisation of protein structures was produced using VMD 11 (http://www.ks. uiuc.edu/). Contact-map data were analysed using PyEMMA 2.5.7 9 and secondary-structure analysis was conducted using DSSP algorithm 12 in MDTraj 1.9.5 13 on Jupyter Notebook.
Error analysis for PTMetaD-WTE was conducted by means of Tiwary reweighting scheme 14 and block averages [15][16][17] . Error analysis for BEMD was conducted using block average 15-17 only because we analysed the unbiased replica. To estimate the statistical error for radius of gyration of CHARMM36m PTMetaD-WTE, we divided the trajectory into ten equal blocks with N = 10, each containing 10,000 frames ( " = 10000) [100 ns of data per block for a total of 1000 ns of data]. The mean value ̅ " for each block was first calculated using the Tiwary reweighting scheme 14 , with block number i = 1, 2, …, N till N = 10 and " the number of frames in each block.
(-)() (S11) We used a similar method to estimate the average and errors for secondary-structure analysis of CHARMM36m PTMetaD-WTE. DSSP assign a secondary-structure element to every residue for each frame. For every single frame, the total number of residues assigned to each secondary-structure element was summed up, and the mean total number of residues per element was calculated using the Tiwary reweighting scheme 14 for each of the ten block according to Eq. S9. The average number of residues belong to each secondary-structure element and its standard error were calculated using Eq. S10 and S11. Overall, we estimate the averages and standard errors for " = 250, 500, 1000, 2500, 5000, 7500, 10000, 15000 and 20000 respectively. Figure S10 below shows that " ≥ 2500 (25 ns of data per block) should give a good estimate of the standard error for the average number of residues assigned to Loops/irregular elements; Figure S11 below shows that " ≥ 10000 (100 ns of data per block) should give a good estimate of the standard error for the average number of residues assigned to a-helix. Overall, 100 ns of data per block was used throughout the work for all simulations.

Figure S11
Evolution of the standard error on the average number of residues assigned to Loops/Irregular elements for CHARMM36m PTMetaD-WTE (Figure 3b) with the size of block.

Figure S12
Evolution of standard error on the average number of residues assigned to a-helix for CHARMM36m PTMetaD-WTE (Figure 3b) with the size of block.
To estimate the statistical errors of the 1D-FE profiles and probability density distributions (e.g., Figure 1 and Figure S1), we used the same blocking approach as described above, with 100 ns of data per block and a total of 10 block in most cases. Within each block, a histogram was calculated using kernel density estimation via the HISTOGRAM command in PLUMED with proper Tiwary reweighting 17 , after which the ten histograms were blockaveraged and normalized to obtain a mean probability density distribution P(s) with associated error bars (based on the python code in https://www.plumed.org/doc-v2.6/userdoc/html/lugano-4.html). With s representing each of the PLUMED-defined collective variables, the 1D-FE profiles F(s) was then obtained by inverting the probability density distribution according to: ( ) = − % ln ( ) (S10) each of the nine replicas was conducted, after which the final error is calculated from individual errors by the following Equation, = D∑ E " , "02 "0. F (S11) Figure S13. Error analysis of a-helical residues for unbiased CHARMM22*, CHARMM22* BEMD and CHARMM22* PTMetaD-WTE. The numbers refer to the initial structures of the nine independent unbiased simulations, and the concatenated unbiased refer to concatenating the nine data sets into a single trajectory for analysis (as in the original manuscript). c) and d) have excluded the data of structure 7. Figure S14. Time evolution and probability distribution ofa-helical residues for unbiased CHARMM22*, CHARMM22* BEMD and CHARMM22* PTMetaD-WTE. The structure numbers refer to the initial structures of the nine independent unbiased simulations.