
# Efficient Calculation of Exact Mass Isotopic Distributions

## Abstract

This paper presents a new method for efficiently calculating the exact masses in an isotopic distribution using a dynamic programming approach. The resulting program, isoDalton, can generate extremely high isotopic resolutions as demonstrated by a FWHM resolution of 2×10^{11}. This resolution allows very fine mass structures in isotopic distributions to be seen, even for large molecules. Since the number of exact masses grows exponentially with molecular size, only the most probable exact masses are kept, the number of which is user specified.

## Introduction

For most practical applications, computing the very fine isotopic structure is unnecessary as it is beyond the resolution of today’s mass spectrometers. However, for theoretical considerations, it is useful to have a method that can calculate the exact mass distributions of these fine structure clusters, which could prove useful as the resolution of mass spectrometers improves. One of the challenges in calculating distributions is maintaining a very high resolution around isotope peaks and yet being computationally efficient.

A number of methods [1–9] have been employed to elucidate the fine isotopic structure. The majority are polynomial based methods that rely on pruning to reduce the complexity to a more manageable size. The pruning strategies typically use a threshold to eliminate permutations whose contribution falls below some preset value. This creates errors in the isotopic distribution profile since a significant number of terms are eliminated.

Rockwood et al. [8–9] use a Fourier transform method to zoom in and achieve ultrahigh resolution around a single mass peak. As with any Fourier analysis method, care must be taken to choose an appropriate window function. Windowing causes the underlying Dirac delta functions representing fine isotopic masses to be convolved together if they are closer together than the width of the analysis window.

A new method based on dynamic programming [10] is presented that can efficiently calculate isotopic distributions. In principle, the resolution is infinite, since there is no restriction on how close in mass the states can be to each other. In practice, the resolution seen depends on machine precision and the probability of the mass states.

Low probability states that are very close to other more probable states may either be eliminated or merged, depending on the state reduction strategy employed.

This method can operate like a polynomial pruning method if low probability states are eliminated. If neighboring states are merged instead, it can operate more like the Fourier transform method. In the Fourier transform method, all peaks under the window contribute whereas the merging in the dynamic programming case is local.

The implementation of the algorithm is a program called isoDalton that has been written in MATLAB [12] and is freely available under the GNU Lesser General Public License [13], which allows use in both proprietary and free programs. The program includes all the isotopes of all the elements with the standard isotopic compositions [14]. Custom isotopic compositions can be easily added.

## Algorithm Description

Calculating ion distributions for large molecules requires expanding a polynomial of the form

$$\prod_{i} {({E}_{1}^{i}+{E}_{2}^{i}+\cdots +{E}_{{I}_{i}}^{i})}^{{N}_{i}} \qquad (1)$$

where ${E}_{j}^{i}$ represents the j^{th} isotope of the i^{th} element in the molecule and the exponent $N_i$ outside the parentheses is the number of atoms of the i^{th} element [4]. This generates a combinatorial explosion in the number of terms for large molecules. The number of coefficients for the multinomial ${({E}_{1}^{i}+{E}_{2}^{i}+\cdots +{E}_{{I}_{i}}^{i})}^{{N}_{i}}$ representing the i^{th} element with $N_i$ atoms and $I_i$ isotopes is given by [15]:

$$C_i = \binom{N_i + I_i - 1}{N_i} = \frac{(N_i + I_i - 1)!}{N_i!\,(I_i - 1)!} \qquad (2)$$

and the coefficients of the multinomial are given by:

$$\frac{N_i!}{n_1!\, n_2! \cdots n_{I_i}!}, \qquad n_1 + n_2 + \cdots + n_{I_i} = N_i \qquad (3)$$

where $n_j$ is the number of atoms of the i^{th} element present as its j^{th} isotope. The total number of terms T in the expanded polynomial of equation 1 is the product of the numbers of coefficients of the elemental multinomials and is given by:

$$T = \prod_{i} C_i \qquad (4)$$

which gives the number of possible masses in the isotopic fine structure.

For bovine insulin, C_{254}H_{377}N_{65}O_{75}S_{6}, the number of possible terms is 1.56 × 10^{12}, which clearly precludes any brute-force approach. In practice, only a fraction of the terms is needed since most of them are extremely unlikely. The least probable term is ^{13}C_{254}^{2}H_{377}^{15}N_{65}^{17}O_{75}^{36}S_{6}, which has a probability of 0.2610×10^{−2422} of occurring. In fact, the top 1,000 terms represent 99.96 % of the cumulative probability distribution.
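The term count quoted above can be checked with a short script. This is a minimal sketch, not part of isoDalton: it assumes the count for each element is the binomial coefficient "atoms plus isotopes minus one, choose isotopes minus one", that the numbers of stable isotopes are 2, 2, 2, 3 and 4 for C, H, N, O and S, and that the rarest isotopes have the abundances listed in the `rarest` table (in particular ^{36}S at 0.0002). The names `count_terms` and `rarest` are illustrative only.

```python
from math import comb, log10

# Atom counts for bovine insulin, taken from the formula in the text.
atoms = {"C": 254, "H": 377, "N": 65, "O": 75, "S": 6}

# Assumed number of stable isotopes per element.
n_isotopes = {"C": 2, "H": 2, "N": 2, "O": 3, "S": 4}


def count_terms(atoms, n_isotopes):
    """Multiply the per-element counts of distinct isotope combinations."""
    total = 1
    for element, n in atoms.items():
        k = n_isotopes[element]
        total *= comb(n + k - 1, k - 1)
    return total


total_terms = count_terms(atoms, n_isotopes)
print(f"{total_terms:.3e}")  # about 1.56e12 terms

# Assumed abundance of the rarest isotope of each element
# (13C, 2H, 15N, 17O, 36S); these values are not given in the text.
rarest = {"C": 0.0107, "H": 0.000115, "N": 0.00368, "O": 0.00038, "S": 0.0002}

# Work in log10 because the probability underflows a double.
log10_p_min = sum(n * log10(rarest[el]) for el, n in atoms.items())
print(f"log10 of least probable term: {log10_p_min:.3f}")
```

Under these assumptions the logarithm comes out near −2422.58, which corresponds to the 0.2610×10^{−2422} quoted in the text.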

An efficient method based on dynamic programming can be used to calculate the overall distribution of possible molecular weights given the isotopic distribution for each element. To apply dynamic programming, we first frame this calculation in the context of a Markov process $\{X_t\}_{t \in T}$ operating on a discrete state space S. The state transition probabilities are given by:

$$a_{ij}(n) = P(X_{n+1} = S_j \mid X_n = S_i) \qquad (5)$$

This gives the probability of arriving at state S_{j} at step n+1, given that the process was in state S_{i} at step n. The state transition probabilities are required to have the following properties:

$$a_{ij}(n) \ge 0, \qquad \sum_{j} a_{ij}(n) = 1 \qquad (6)$$

The initial state probabilities are given by:

$$\pi_i(0) = P(X_0 = S_i) \qquad (7)$$

The efficient way to calculate the probability of being in state S_{j} at step n+1 is to use a forward trellis algorithm [16]. An illustration of this computation can be seen in figure 1.

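The state probabilities $\pi_j(n+1)$ for step n+1 are calculated from the state probabilities $\pi_i(n)$ at step n and the transition probabilities $a_{ij}(n)$ from state S_{i} to state S_{j} by

$$\pi_j(n+1) = \sum_{i=1}^{N(n)} \pi_i(n)\, a_{ij}(n) \qquad (8)$$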

where 1 ≤ *j* ≤ *N*(*n*+1),1 ≤ *n* ≤ *T* − 1. N(n) implies that the number of states is a function of step n.

The trellis algorithm gains its efficiency by collapsing the possible paths that can lead to a particular state. Only the state probabilities at step n, along with the transition probabilities, are used to calculate the state probabilities for the next step. This is known as a first-order Markov model or chain [17].
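A single forward step of the trellis can be sketched as follows. This is a generic illustration with a toy transition table, not the isoDalton code; the function name `forward_step` and the numbers are assumptions chosen so that the arithmetic is easy to follow. The table has a different number of rows and columns to show that the number of states may change from one step to the next.

```python
def forward_step(pi_n, a_n):
    """Return the state probabilities at step n+1.

    pi_n[i]   : probability of being in state i at step n
    a_n[i][j] : probability of moving from state i to state j
    """
    n_next = len(a_n[0])
    return [
        sum(pi_n[i] * a_n[i][j] for i in range(len(pi_n)))
        for j in range(n_next)
    ]


# Toy example: two states at step n fan out into three states at step n+1.
pi_1 = [0.75, 0.25]
a_1 = [
    [0.5, 0.5, 0.0],  # transitions out of state 0 (row sums to 1)
    [0.0, 0.5, 0.5],  # transitions out of state 1 (row sums to 1)
]
pi_2 = forward_step(pi_1, a_1)
print(pi_2)  # [0.375, 0.5, 0.125]
```

The middle state is reached by two different paths, and the step simply sums their contributions; no path history is stored.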

In the context of calculating the isotope distribution, the states are the set of unique molecular masses that can exist at each step. At each step, all isotopes of one atom of a particular element are added, i.e. ${({E}_{1}^{i}+{E}_{2}^{i}+\cdots +{E}_{{I}_{i}}^{i})}^{1}$. This means that the state transition probabilities are non-stationary since they depend on the isotope distribution of the particular element being added. The Markov chain can be thought of as the sequence of adding atoms, one per step, each with all of its associated isotopes. The length of the chain is therefore the number of atoms in the molecule.

The number of states at each step is also non-stationary since particular combinations of isotopes lead to unique masses. The states at step n+1 are the set of unique masses computed by adding the mass of any state at step n to the mass of any isotope of the element being added. The probabilities of these states are given in equation 8. These states are then either pruned or combined to reduce computational complexity, and this process is called state reduction.
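The state update described above can be sketched in a few lines. This is an illustrative reading of the method, not the isoDalton source: states are held in a dictionary from mass to probability, the chain starts from an empty molecule (mass 0, probability 1), and masses are rounded to 10 decimals so that different paths to the same mass land on the same key. The name `add_atom` and the rounding choice are assumptions; the hydrogen values are the ones quoted in the text.

```python
from collections import defaultdict

# Hydrogen isotopes as (mass, abundance), using the values quoted in the text.
HYDROGEN = [(1.0078250321, 0.999885), (2.0141017780, 0.000115)]


def add_atom(states, isotopes, digits=10):
    """Advance the trellis by one atom.

    states maps mass -> probability at step n; the result is the map for
    step n+1. Paths that arrive at the same rounded mass share one state,
    so their probabilities are summed.
    """
    new_states = defaultdict(float)
    for mass, prob in states.items():
        for iso_mass, abundance in isotopes:
            new_states[round(mass + iso_mass, digits)] += prob * abundance
    return dict(new_states)


states = {0.0: 1.0}  # empty molecule before any atom is added
for _ in range(2):   # add two hydrogen atoms
    states = add_atom(states, HYDROGEN)

for mass, prob in sorted(states.items()):
    print(f"{mass:.10f}  {prob:.6e}")
```

Two hydrogens give three states rather than four, because the two mixed paths (light then heavy, heavy then light) collapse into one mass.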

## State Reduction

### Most Probable Exact Masses

Keeping the distribution of *all* exact masses becomes impractical for all but the smallest molecules. If one is interested in the exact masses of the most probable isotope mass combinations, as is typically the case, then the states with the lowest probabilities are eliminated. This is done by computing all states for step n+1, sorting these states by probability, and then keeping only the N_{max} most probable states, where N_{max} is user specified. Once all the atoms have been added at the last step, isoDalton returns the exact masses of the N_{max} most probable isotopic mass combinations. The probabilities reported for these exact masses are only approximations of the true values, since eliminating states prunes potential path combinations that contribute to the probability values. Increasing N_{max} will reduce this error.
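The pruning step can be sketched as below. This is a minimal illustration rather than the isoDalton implementation: `keep_most_probable` is a hypothetical name, and the toy masses and probabilities are made up for the example.

```python
import heapq


def keep_most_probable(states, n_max):
    """Keep only the n_max states with the highest probability.

    states maps mass -> probability. The probabilities of the survivors
    are left unchanged, so they no longer sum to 1 after pruning.
    """
    top = heapq.nlargest(n_max, states.items(), key=lambda item: item[1])
    return dict(top)


# Hypothetical states, for illustration only.
states = {75.03: 0.97, 76.03: 0.025, 76.04: 0.004, 77.04: 0.001}
pruned = keep_most_probable(states, 2)
discarded = 1.0 - sum(pruned.values())
print(pruned)
print(f"probability discarded: {discarded:.3f}")
```

The discarded probability gives a rough sense of how much of the distribution the surviving states no longer account for, which is why a larger N_{max} reduces the error.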

### Exact Probability Distribution

If one is interested in seeing the overall probability distribution of near integer separated values, then close mass values can be combined as follows. Let M_{old1} and M_{old2} be the masses of the two states S_{i} and S_{j} that are closest together in mass, and let P_{old1} and P_{old2} be their respective probabilities. Then a new state is created that has the probability-weighted mean mass and the summed probability:

$$M_{new} = \frac{M_{old1}P_{old1} + M_{old2}P_{old2}}{P_{old1} + P_{old2}}, \qquad P_{new} = P_{old1} + P_{old2} \qquad (9)$$

For a particular step n, the states are combined in this fashion until there are N_{max} states. Combining states in this manner results in a probability distribution of N_{max} masses that are the centers of mass of the isotopic fine structure exact masses. These are exact probabilities for these “center of mass” weights since they sum to 1, as expected of a probability distribution. However, the masses are not exact.
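The merging strategy can be sketched as follows, assuming the merged state takes the probability-weighted mean of the two masses and the sum of the two probabilities. The function name `merge_closest` and the toy values are illustrative; the sketch also assumes all probabilities are positive and `n_max` is at least 1.

```python
def merge_closest(states, n_max):
    """Merge the two closest masses repeatedly until n_max states remain.

    states is a list of (mass, probability) pairs. Each merge replaces a
    pair by its center of mass, so total probability is conserved.
    """
    merged = sorted(states)
    while len(merged) > n_max and len(merged) > 1:
        # In a sorted list the closest pair is always adjacent.
        i = min(range(len(merged) - 1),
                key=lambda k: merged[k + 1][0] - merged[k][0])
        (m1, p1), (m2, p2) = merged[i], merged[i + 1]
        p_new = p1 + p2
        m_new = (m1 * p1 + m2 * p2) / p_new
        merged[i:i + 2] = [(m_new, p_new)]  # stays sorted: m1 <= m_new <= m2
    return merged


# Hypothetical states, for illustration only.
toy = [(10.0, 0.5), (10.5, 0.25), (12.0, 0.25)]
result = merge_closest(toy, 2)
print(result)
```

Here the states at 10.0 and 10.5 are merged into one state near 10.1667 with probability 0.75, and the total probability remains 1.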

## Results and Discussion

As an example, we find the complete molecular weight distribution of the amino acid Glycine, C_{2}H_{5}N_{1}O_{2} (including the amino and carboxyl end groups). We view the Markov process as the sequence of adding the elements H, H, H, H, H, C, C, N, O, and O, where the order of elements is taken by starting with the element with the fewest number of isotopes. Starting with elements with fewest isotopes minimizes the growth in the number of states. The isotopes used in this example are the NIST values [14].

The initial state probability *π*(0) is simply the isotopic composition of hydrogen {0.999885, 0.000115} at states (masses) {1.0078250321, 2.0141017780}. To compute the new state probabilities, we add another hydrogen and compute the resulting states and probabilities of being in these states. The new states at step n+1 are all permutations of adding the molecular weights of the states at step n with all the isotopes of element E^{i}. The atomic weights for all states at each step can be seen in figure 2. This was created by adding the Glycine elements H, H, H, H, H, C, C, N, O, O.
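As a worked illustration of this first transition, adding the second hydrogen to the two initial states gives three states, since the two mixed paths arrive at the same mass. The numbers follow directly from the hydrogen masses and abundances quoted above; the labels $m_k$ and $p_k$ are introduced here only for the example.

$$
\begin{aligned}
m_1 &= 2(1.0078250321) = 2.0156500642, & p_1 &= 0.999885^2 \approx 0.999770013,\\
m_2 &= 1.0078250321 + 2.0141017780 = 3.0219268101, & p_2 &= 2(0.999885)(0.000115) \approx 2.29974\times 10^{-4},\\
m_3 &= 2(2.0141017780) = 4.0282035560, & p_3 &= 0.000115^2 = 1.3225\times 10^{-8}.
\end{aligned}
$$

The three probabilities sum to 1, as required of the state probabilities at every step.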

A more abstract view of the trellis can be obtained by finding the state with the minimum atomic weight at each step and subtracting this value from all states for that step. Figure 3 shows this abstracted view, where the distances between states can be seen more clearly. The bottom row in the figure shows which element was added at each step, and the number of states that exist at each step is given above the trellis. We start with two states, since hydrogen has two isotopes, and end with 216 states (distinct atomic weights) after adding all 10 atoms. Many of the states are clustered together near unit atomic weight increments. In practice, only the states at step n are kept since the state history is not needed and would only consume memory.
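The growth in the number of states for Glycine can be reproduced with a small script. This is a sketch under stated assumptions, not the isoDalton code: the carbon, nitrogen and oxygen masses and abundances in `ISOTOPES` are assumed values in the style of the NIST tables (only the hydrogen values appear in the text), the chain starts from an empty molecule, and masses are rounded to 10 decimals to identify equal states.

```python
from collections import defaultdict

# (mass, abundance) pairs. Hydrogen is quoted in the text; the other
# elements are assumed values used only for this illustration.
ISOTOPES = {
    "H": [(1.0078250321, 0.999885), (2.0141017780, 0.000115)],
    "C": [(12.0, 0.9893), (13.0033548378, 0.0107)],
    "N": [(14.0030740052, 0.99632), (15.0001088984, 0.00368)],
    "O": [(15.9949146221, 0.99757), (16.9991315, 0.00038),
          (17.9991604, 0.00205)],
}


def add_atom(states, isotopes, digits=10):
    """Add one atom: combine every state with every isotope of the element."""
    new_states = defaultdict(float)
    for mass, prob in states.items():
        for iso_mass, abundance in isotopes:
            new_states[round(mass + iso_mass, digits)] += prob * abundance
    return dict(new_states)


states = {0.0: 1.0}  # empty molecule
counts = []
for element in "HHHHHCCNOO":  # order used in the text
    states = add_atom(states, ISOTOPES[element])
    counts.append(len(states))

clusters = {round(mass) for mass in states}  # nominal (integer) masses
print(counts)
print(len(clusters), "clusters")
```

With these inputs the number of states per step grows from 2 to 216, and the final masses fall into 13 clusters near integer values, in line with the counts given in the text. Only the current `states` dictionary is kept between steps.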

The complete molecular weight distribution of Glycine can be seen in figure 4. The distribution is plotted with two different scales. Positive values are the probabilities of isotopic masses, with the scale on the upper left side. Since most of the probabilities are very small, a log scale is also plotted by taking the log10 of the probabilities. This results in negative numbers that are seen in the lower half of the plot, with the scale on the lower right side. There are 216 unique mass values in this plot, clustered near 13 integer values. There is a single dominant peak at mass 75.03 with probability 0.97. There are four mass values clustered at 76.03, which are shown in the inset plot. There is a very small single peak at mass 87.08 with probability 3.56×10^{−32}, which can be seen clearly on the log10 scale with value −31.4484. The closest states occur at 81.05 with a separation of 7.21 × 10^{−5}, which means a FWHM resolution of 1.12 × 10^{6} is needed to resolve these peaks.
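Two of the numbers quoted above can be checked by hand. The heaviest peak corresponds to the isotopologue in which every atom is the heaviest isotope. Assuming abundances of 0.0107 for ^{13}C, 0.00368 for ^{15}N and 0.00205 for ^{18}O (values not stated in the text), together with the quoted 0.000115 for ^{2}H, its log probability is

$$
\log_{10} P = 5\log_{10}(0.000115) + 2\log_{10}(0.0107) + \log_{10}(0.00368) + 2\log_{10}(0.00205) \approx -19.6965 - 3.9412 - 2.4342 - 5.3765 = -31.4484,
$$

which corresponds to $P \approx 3.56\times 10^{-32}$. The required resolution follows from the usual ratio of mass to mass separation:

$$
R = \frac{m}{\Delta m} = \frac{81.05}{7.21\times 10^{-5}} \approx 1.12\times 10^{6}.
$$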

Figure 4. Molecular weight distribution of Glycine, C_{2}H_{5}N_{1}O_{2}. There are 216 unique masses of which the probabilities are shown on two scales. These two scales are the typical probability scale (upper left) and the log10 probability scale (lower right) that shows **...**

For large molecules, keeping all unique states will cause the number of states to grow too large for memory and speed considerations. To keep the states small and still have a useful distribution, one can combine the states by adding clustered weights together. This is done by adding all the probabilities together in a cluster. In the Glycine example, this would reduce the number of final states from 216 to 13. As a result, one can trade off distribution precision for a reduction in computational complexity.
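Combining a cluster into a single state can be sketched as below. This is one simple way to define a cluster, assumed here for illustration: states are grouped by their nominal mass (the mass rounded to the nearest integer) and the probabilities in each group are summed. The function name `aggregate_clusters` and the toy states are hypothetical, and grouping by rounded mass only works while every cluster stays within half a mass unit of a single integer.

```python
from collections import defaultdict


def aggregate_clusters(states):
    """Sum the probabilities of all states that share a nominal mass.

    states is a list of (mass, probability) pairs; the result maps the
    integer nominal mass to the total probability of its cluster.
    """
    clusters = defaultdict(float)
    for mass, prob in states:
        clusters[round(mass)] += prob
    return dict(clusters)


# Hypothetical fine-structure states, for illustration only.
toy = [(75.032, 0.5), (76.029, 0.125), (76.035, 0.25), (76.038, 0.125)]
clusters = aggregate_clusters(toy)
print(clusters)  # {75: 0.5, 76: 0.5}
```

The fine structure inside each cluster is lost, which is the trade of distribution precision for lower computational cost described above.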

To show the program’s utility with large molecules, bovine insulin C_{254}H_{377}N_{65}O_{75}S_{6} was chosen for comparison with previous publications. Figure 5 shows the fine isotopic structure around the 5736.6 Da peak of protonated bovine insulin with a FWHM resolution of 2 × 10^{11}, which is several orders of magnitude finer than previously shown. This distribution was calculated by keeping the 100,000 most probable states, 416 of which are near the 5736.6 peak. The resolution is similar for the other isotopic clusters, which range in mass from 5730.6 to 5759.7.

## Run times

There is a tradeoff between execution speed and the FWHM resolution. Table 6 shows the tradeoff between the number of states and the run time for exact mass calculations. The time per state actually gets better with additional states due to a relatively fixed overhead of processing the state vectors in the program.

## Conclusion

The paper presents a new method based on dynamic programming that can calculate the distribution of exact masses in an isotopic distribution. The resulting MATLAB program isoDalton can be used to calculate the isotopic fine structure of large molecules with extremely high resolution. This is done by keeping only the most probable states representing exact masses. By merging very close mass states, the program can also calculate isotopic distributions with a true probability profile, at the expense of knowing exact masses.

## Acknowledgments

The work was supported by the National Institutes of Health grant 1R43HG003743-01.


Efficient Calculation of Exact Mass Isotopic Distributions. NIHPA Author Manuscripts. Aug 2007; 18(8):1511.
