Accurate and Automated High-Coverage Identification of Chemically Cross-Linked Peptides with MaxLynx

Cross-linking combined with mass spectrometry (XL-MS) provides a wealth of information about the three-dimensional (3D) structure of proteins and their interactions. We introduce MaxLynx, a novel computational proteomics workflow for XL-MS integrated into the MaxQuant environment. It is applicable to noncleavable and MS-cleavable cross-linkers. For both, we have generalized the Andromeda peptide database search engine to efficiently identify cross-linked peptides. For noncleavable peptides, we implemented a novel dipeptide Andromeda score, which is the basis for a computationally efficient N-squared search engine. Additionally, partial scores summarize the evidence for the two constituents of the dipeptide individually. A posterior error probability (PEP) based on total and partial scores is used to control false discovery rates (FDRs). For MS-cleavable cross-linkers, a score of signature peaks is combined with the conventional Andromeda score on the cleavage products. The MaxQuant 3D peak detection was improved to ensure more accurate determination of the monoisotopic peak of isotope patterns for heavy molecules, which cross-linked peptides typically are. A wide selection of filtering parameters can replace the manual filtering of identifications, which is often necessary when using other pipelines. On benchmark data sets of synthetic peptides, MaxLynx outperforms all other tested software on data for both types of cross-linkers and on a proteome-wide data set of cross-linked Drosophila melanogaster cell lysate. The workflow also supports ion mobility-enhanced MS data. MaxLynx runs on Windows and Linux, contains an interactive viewer for displaying annotated cross-linked spectra, and is freely available at https://www.maxquant.org/.


Turn on Peak refinement.
This new option can be enabled on the Misc. tab under the Group-specific parameters tab.

Set up protein sequence related information.
Add your FASTA file but disable the inclusion of contaminants. Decrease the peptide length to a value of your choice and increase the peptide mass to accommodate cross-linked peptide searches (cross-linking software typically uses 6000 Da).

Set your FDR values.
Here you can currently disable "Second peptide" searches; this option is not functional in the current version.

Note for Bruker TIMS instruments.
Increase the max charge from 4 to 6 because cross-linked peptides tend to have higher charge states.
Note that if you previously analysed your TIMS-TOF data set with an earlier MaxQuant version, you might encounter some problems. In this case, you should create a new mqpar.xml with MaxQuant version 2.0.4.

Inspect your results.
Go to the Crosslink MS/MS table (its content comes from the crosslinkMsms.txt table in the combined/txt folder once the MaxQuant/MaxLynx analysis has finished). Select a row and make sure that the MS/MS spectra panel is selected in the visualization. Then click on "Display selected spectrum" to view your identification. The peptide sequence-based information is shown in the Peptide sequence window under the MS/MS spectra panel.
We extended the fragment annotations here for cross-link products.
-Any fragment coming from alpha or beta peptide has or , respectively.
-If a fragment contains an entire other peptide, then it has "Pep" (e.g. 4 is a 4 fragment from an alpha peptide which is linked to a beta peptide).
-If it is an MS-cleavable cross-linking search, then it is possible to have some fragments with

The search settings used for the non-cleavable DSS data set. All the settings except for OpenPepXL and MaxLynx were taken from Beveridge and co-workers 1

MaxLynx-specific parameter analysis:

a. Combination with total-score and partial score values
We performed a parameter scan and compared the MaxLynx results (MaxQuant v.2.0.3) for different combinations of the total-score (the cross-linked peptide score) set to 0, 20 and 40 and the partial-score set to 0, 10 and 20. Note that the default min-match parameter was left at 3. Here, we did not separate protein intra- and inter-cross links. We used the default MS/MS analyzer settings. We performed this analysis for the DSS, DSBU and DSSO data sets by Beveridge et al.
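The parameter scan described above can be pictured with a minimal sketch. It assumes that each CSM carries a total-score, a partial-score and a known correctness label (as in a synthetic-peptide benchmark), and that a CSM passes a filter when its score is greater than or equal to the cutoff. The records, the function name `scan_score_grid` and the inclusive comparison are illustrative assumptions, not MaxLynx code or data.

```python
from itertools import product

# Hypothetical CSM records: (total_score, partial_score, is_correct).
# Values are invented for illustration only.
csms = [
    (85.0, 32.0, True),
    (47.0, 12.5, True),
    (25.0, 8.0, True),
    (18.0, 11.0, True),
    (35.0, 4.0, False),
    (15.0, 2.0, False),
    (22.0, 14.0, False),
]

TOTAL_CUTOFFS = (0, 20, 40)    # total-score values scanned in the text
PARTIAL_CUTOFFS = (0, 10, 20)  # partial-score values scanned in the text


def scan_score_grid(records, total_cutoffs, partial_cutoffs):
    """Count correct and incorrect CSMs that survive each cutoff pair."""
    results = {}
    for t_cut, p_cut in product(total_cutoffs, partial_cutoffs):
        # Assumption: a CSM passes when both scores reach their cutoff.
        kept = [r for r in records if r[0] >= t_cut and r[1] >= p_cut]
        n_correct = sum(1 for r in kept if r[2])
        results[(t_cut, p_cut)] = (n_correct, len(kept) - n_correct)
    return results


grid = scan_score_grid(csms, TOTAL_CUTOFFS, PARTIAL_CUTOFFS)
for (t_cut, p_cut), (n_ok, n_bad) in sorted(grid.items()):
    print(f"total>={t_cut:>2} partial>={p_cut:>2}: correct={n_ok} incorrect={n_bad}")
```

Each of the nine grid cells corresponds to one setting combination in the analysis; in the real study the counts come from separate MaxLynx runs rather than from post hoc filtering.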

DSS data set by Beveridge et al
As seen on the left of Supplementary Figure S4, when total-score was set to 0, the distributions for the correct CSMs shift down as the partial-score increases. This is expected because a higher partial-score cutoff also removes some correct CSMs with very low total-scores. The highest number of incorrect CSMs was found when no score filtering was applied (total-score=0 and partial-score=0). When partial-score was set to 10, the number of incorrect CSMs decreased.
When total-score was set to 40 (right of Supplementary Figure S4), the effect of the partial-score on the number of correct CSMs almost disappeared: the distributions of correct CSMs for partial-score=0, 10 and 20 became almost the same. This change was not observed for the incorrect CSMs, whose distributions across the three partial-score values remained more or less the same.
The results for total-score=20 (middle of Supplementary Figure S4) were intermediate: slightly fewer correct CSMs were observed than with total-score=0. For partial-score=10, the number of incorrect CSMs remained the same as with total-score=0 and partial-score=10.
Supplementary Figure S5 shows the results of the same analysis for the number of unique cross links. Almost the same patterns observed in Figure 1 can also be seen here. The two plots on the left show the results for total-score=0 in combination with partial-scores of 0, 10 and 20. The middle plots show total-score=20 and the right plots show total-score=40, each in combination with partial-scores of 0, 10 and 20.

DSBU data set by Beveridge et al
As seen in Supplementary Figure S6, the number of correct CSMs for the DSBU data set tends to decrease as the partial-score increases. At a fixed total-score (total-score=0, 20 or 40), partial-score=10 keeps a similar number of correct CSMs but removes more than one-third of the incorrect CSMs compared to partial-score=0. Increasing the partial-score to 20 further reduced the number of correct CSMs, a loss that was not compensated by the decrease in incorrect CSMs. We also compared the number of unique cross links for the DSBU data set (Supplementary Figure S7). Although the effect of increasing both total-score and partial-score was not as strong as for the CSMs, the trend is similar to that in Supplementary Figure S6. We observed that increasing the partial-score from 0 to 10 reduced the number of incorrect cross links but kept more or less the same number of correct cross links.

DSSO data sets by Beveridge et al
The CSM results are at first surprising because, unlike the other results, they show no clear pattern (Supplementary Figure S8). When we took a close look at total-score=0 and partial-score=0, we observed that high-scoring decoys affect the results. The effect may also be due to the lower number of CSMs in this data set, so that any high-scoring decoy CSM has a more profound effect. Unlike a typical proteomics search, XL-MS has some decoys that are composed of a target and a decoy protein. Most of these high-scoring decoys are half-decoys (decoy linked to target). We expect that separate FDR could help here. With high-score filtering (total-score=40), this unclear result is diminished. In general, we did not observe a clear pattern for the total-score. Therefore, we suggest leaving this value at 0. However, we observed that the partial-score value consistently helps to reduce incorrect hits, with an optimum value of 10; we therefore recommend using the filter option based on the partial-score.
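The distinction between half-decoys and full decoys mentioned above can be made concrete with a short sketch. The labels and the function names are hypothetical. The estimator (TD - DD) / TT is one commonly used choice in the XL-MS literature and is included only as an assumption for illustration; the text does not state that MaxLynx computes its FDR this way (the abstract describes a PEP-based approach).

```python
from collections import Counter


def classify_csm(alpha_is_decoy: bool, beta_is_decoy: bool) -> str:
    """Label a CSM by how many of its two peptides come from decoy proteins."""
    n_decoy = int(alpha_is_decoy) + int(beta_is_decoy)
    return ("target-target", "half-decoy", "full-decoy")[n_decoy]


def estimate_fdr(labels) -> float:
    """(TD - DD) / TT estimator; an assumption, not taken from the source."""
    counts = Counter(labels)
    tt = counts["target-target"]
    td = counts["half-decoy"]
    dd = counts["full-decoy"]
    if tt == 0:
        return 0.0
    return max(td - dd, 0) / tt


# Hypothetical decoy flags for the alpha and beta peptide of eight CSMs.
flags = [(False, False)] * 5 + [(True, False), (False, True), (True, True)]
labels = [classify_csm(a, b) for a, b in flags]
print(Counter(labels))
print(f"estimated FDR: {estimate_fdr(labels):.2f}")
```

With few CSMs in total, a single high-scoring half-decoy shifts such an estimate noticeably, which is consistent with the sensitivity described for this data set.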

b. Effect of separating protein intra- and inter-cross links for FDR control
We performed several more MaxLynx runs to evaluate the effect of using separate FDR together with the combination of total-score and partial-score. Note that we kept the MS/MS analyzer settings at their defaults to see the direct effect on the synthetic data set by Beveridge et al. In addition to the synthetic data set, we did six different MaxLynx runs for the proteome-wide data set (total-score=0; partial-score=0, 10 or 20; separate FDR=on/off).
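To illustrate what separating the FDR means, here is a minimal sketch that applies a score cut either to all CSMs together or to intra- and inter-protein CSMs independently. It assumes a plain decoy-to-target ratio on score-ranked lists, which is a simplification and not the PEP-based procedure of MaxLynx. The field names, the toy scores and the 25% threshold (chosen only because the toy list is tiny) are all hypothetical.

```python
def accept_at_fdr(csms, fdr=0.01):
    """Return target CSMs in the longest score-ranked prefix with decoys/targets <= fdr."""
    ranked = sorted(csms, key=lambda c: c["score"], reverse=True)
    n_target = n_decoy = 0
    best_cut = 0
    for i, c in enumerate(ranked, start=1):
        if c["is_decoy"]:
            n_decoy += 1
        else:
            n_target += 1
        if n_target > 0 and n_decoy / n_target <= fdr:
            best_cut = i
    return [c for c in ranked[:best_cut] if not c["is_decoy"]]


def accept_separately(csms, fdr=0.01):
    """Apply the FDR cut to intra- and inter-protein CSMs independently."""
    accepted = []
    for kind in ("intra", "inter"):
        group = [c for c in csms if c["kind"] == kind]
        accepted.extend(accept_at_fdr(group, fdr))
    return accepted


def make(kind, score, is_decoy=False):
    """Build one hypothetical CSM record."""
    return {"kind": kind, "score": score, "is_decoy": is_decoy}


# Toy data: decoys occur only among the inter-protein CSMs.
toy = (
    [make("intra", s) for s in (90, 80, 70, 60, 50, 40)]
    + [make("inter", s) for s in (85, 45)]
    + [make("inter", s, is_decoy=True) for s in (75, 65, 55)]
)

print("combined:", len(accept_at_fdr(toy, fdr=0.25)))
print("separate:", len(accept_separately(toy, fdr=0.25)))
```

In this toy case the inter-protein decoys limit the combined list, while the separate treatment lets the decoy-free intra-protein group pass in full and restricts the inter-protein group more strictly.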

DSS data set by Beveridge et al
We observed a slight increase in correct CSMs at total-score=0 and partial-score=10 compared to the run without separating CSMs into inter- and intra-protein cross links, but the other settings did not show a clear difference in the number of correct CSMs (sometimes fewer, sometimes more CSMs) (Supplementary Figure S10). The database used contains one Cas9 protein and 10 contaminant proteins (such as keratin). Note that this database was not created by us; we used the database provided by Beveridge et al (CITE). At total-score=0 and partial-score=10, the correct CSMs were 566, 735 and 679 and the incorrect CSMs were 3, 5 and 9. Here there was only one CSM from an inter-protein cross link, which was between two different contaminant proteins. With the same settings but with separate FDR, the correct CSMs increased to 571, 742 and 684 and the incorrect CSMs were 3, 11 and 10 (4 of these incorrect CSMs were related to contaminants). In most of the settings, there was a slight increase in the number of incorrect CSMs. This is related to also considering these possible cross links involving contaminants.
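The counts quoted above can be turned into per-replicate fractions of incorrect CSMs with a few lines of arithmetic. The sketch assumes that the three values in each list are given in the order Replicate_1, Replicate_2, Replicate_3; the dictionary layout and the function name are illustrative.

```python
# Counts quoted in the text for total-score=0 and partial-score=10.
# Assumption: values are ordered Replicate_1, Replicate_2, Replicate_3.
combined = {"correct": [566, 735, 679], "incorrect": [3, 5, 9]}
separate = {"correct": [571, 742, 684], "incorrect": [3, 11, 10]}


def incorrect_fraction(correct, incorrect):
    """Fraction of reported CSMs that are incorrect, per replicate."""
    return [bad / (ok + bad) for ok, bad in zip(correct, incorrect)]


for name, counts in (("combined FDR", combined), ("separate FDR", separate)):
    fractions = incorrect_fraction(counts["correct"], counts["incorrect"])
    pretty = ", ".join(f"{100 * f:.2f}%" for f in fractions)
    print(f"{name}: {pretty}")

# Gain in correct CSMs per replicate when the FDR is separated.
gain = [s - c for s, c in zip(separate["correct"], combined["correct"])]
print("additional correct CSMs:", gain)
```

Expressing the counts as fractions makes it easier to compare the small gain in correct CSMs with the accompanying change in incorrect CSMs.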

Number of CSMs for the DSS data set by Beveridge et al when protein inter- and intra-cross links were split. Each plot contains the results from the three replicates of the data set, which are shown as Replicate_1, Replicate_2 and Replicate_3. The plots in the upper panel show the distributions of the number of correct CSMs, whereas the plots in the lower panel show the distributions of the number of incorrect CSMs.
The two plots on the left show the results for total-score=0 in combination with partial-scores of 0, 10 and 20. The middle plots show total-score=20 and the right plots show total-score=40, each in combination with partial-scores of 0, 10 and 20.
The effect of separating protein cross links for the FDR calculation is less obvious for the unique cross links than for the CSMs. Moreover, the distribution of unique cross links corresponds well to the changes in the CSMs (Supplementary Figure 11).

Number of unique cross links for the DSS data set by Beveridge et al when protein inter- and intra-cross links were split. Each plot contains the results from the three replicates of the data set, which are shown as Replicate_1, Replicate_2 and Replicate_3. The plots in the upper panel show the distributions of the number of correct cross links, whereas the plots in the lower panel show the distributions of the number of incorrect cross links.
The two plots on the left show the results for total-score=0 in combination with partial-scores of 0, 10 and 20. The middle plots show total-score=20 and the right plots show total-score=40, each in combination with partial-scores of 0, 10 and 20.

DSBU and DSSO data sets by Beveridge et al
For both MS-cleavable data sets, the number of correct CSMs clearly increased. The number of incorrect CSMs typically did not increase but stayed at the same or a similar value.
Separating the FDR increased the number of correct unique cross links only at total-score=0; when the total-score was increased to higher values, there were mostly no changes.

Proteome-wide data analysis
We also analyzed the effect of separate protein intra- and inter-cross-link FDR on the proteome-wide data set of cross-linked D. melanogaster cell lysate (PXD012546). Separating the FDR has a more profound effect for MS-cleavable cross-linkers on such a data set because, especially with MS-cleavable cross-linkers, we find very few intra-protein decoys while observing many inter-protein decoys (Supplementary Figure 6).

c. Effect of high-charged theoretical peaks and neutral losses
We observed that excluding peaks related to high charge states and neutral losses increased the identification rates for FTMS analyzers (Tables S7-S12). We also ran searches excluding only neutral losses, but we observed the strongest effect when this was combined with excluding high charge states. Including many more peaks in the theoretical spectrum of a cross-linked peptide overpopulates it and increases the chance of making a match by mistake. Furthermore, we also tested this on the TIMS-TOF data set, but there it typically led to an increase in the number of false identifications in addition to an increase in the overall identification numbers. This could be because TOF analyzers are not as accurate as FTMS analyzers. Therefore, for cross-linking searches we recommend excluding these peaks in the default MS/MS analyzer settings for FTMS analyzers but not for TIMS-TOF (Tables S13-S14).
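How quickly a theoretical spectrum fills up can be seen in a small sketch that enumerates b- and y-ion m/z values for a linear toy peptide. It is a deliberate simplification: it ignores the cross-linked partner peptide, and it applies water and ammonia losses to every fragment, whereas search engines usually restrict losses to fragments containing particular residues. The function name and the peptide are hypothetical; the masses are standard monoisotopic values.

```python
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549
# Monoisotopic residue masses, listed only for the residues of the toy peptide.
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694, "K": 128.09496,
}


def theoretical_peaks(sequence, max_charge=1, neutral_losses=()):
    """Return m/z values of b- and y-ions, optionally with higher charges and losses."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    peaks = []
    for i in range(1, len(masses)):
        b_neutral = sum(masses[:i])          # b-ion mass before adding protons
        y_neutral = sum(masses[i:]) + WATER  # y-ion mass before adding protons
        for neutral in (b_neutral, y_neutral):
            for loss in (0.0, *neutral_losses):
                for z in range(1, max_charge + 1):
                    peaks.append((neutral - loss + z * PROTON) / z)
    return peaks


basic = theoretical_peaks("PEPTIDEK")
extended = theoretical_peaks("PEPTIDEK", max_charge=3, neutral_losses=(WATER, AMMONIA))
print(len(basic), "peaks without high charges and losses")
print(len(extended), "peaks with charges up to 3 and two neutral losses")
```

Every additional charge state and loss type multiplies the number of candidate peaks, so with a fixed mass tolerance the chance of a random match grows, in line with the reasoning given above.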

d. Effect of increasing FDR to 5%
The tables below show the results at FDR=1% and 5% (separate protein intra- and inter-cross links). For the DSS, DSSO and DSBU data sets, we used total-score=0, partial-score=10 and min-match=3. For the data set by Beveridge and co-workers, we excluded higher charges and neutral losses. We also tested the TIMS-TOF DSSO and DSBU data sets.
For the DSS data set, there was a slight increase in the overall number of correct CSMs and a slight increase in the incorrect CSMs (see Supplementary Table S15). The number of correct unique cross links remained the same, but there was a slight increase in the number of incorrect cross links (from 6, 14 and 9 to 11, 20 and 13 for replicate 1, replicate 2 and replicate 3, respectively; see Supplementary Table S16).
For the DSSO and DSBU data sets, as well as for the TIMS-TOF DSSO and DSBU data sets, the results remained the same. Increasing the FDR from 0.01 to 0.05 only added more decoy CSMs to the identification lists.
We suggest that users keep FDR=1%. First, increasing the FDR mostly resulted only in more incorrect identifications or decoys. Second, we currently control the FDR only at the CSM level and not at the level of unique cross links (this is planned for a future project). Unique cross links are derived directly from the CSMs selected at the given FDR, and therefore the actual FDR of the unique cross links is expected to be higher than the set CSM-level FDR.
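Why the error rate rises when CSMs are collapsed into unique cross links can be shown with a deliberately exaggerated toy example. It assumes that correct cross links are typically supported by many CSMs while false hits are mostly one-offs; the residue-pair names, the counts and the helper function are hypothetical.

```python
def decoy_fraction(items, is_decoy):
    """Share of decoy entries in a collection of identifications."""
    items = list(items)
    return sum(1 for x in items if is_decoy(x)) / len(items)


# Hypothetical CSMs accepted at 1% CSM-level FDR: (residue_pair, is_decoy).
# Two correct cross links are observed repeatedly; one decoy hit occurs once.
accepted_csms = (
    [(("K10", "K25"), False)] * 60
    + [(("K31", "K77"), False)] * 39
    + [(("K5", "K900"), True)]
)

# Collapse repeated CSMs into unique cross links (one entry per residue pair).
unique_links = set(accepted_csms)

csm_level = decoy_fraction(accepted_csms, lambda c: c[1])
link_level = decoy_fraction(unique_links, lambda c: c[1])
print(f"CSM level: {csm_level:.1%}, unique cross-link level: {link_level:.1%}")
```

The redundancy among correct CSMs disappears on collapsing while the false hits remain, so the decoy share among unique cross links exceeds the CSM-level value, as stated in the recommendation above.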