IUCrJ. 2020 Feb 27;7(Pt 2):342-354. doi: 10.1107/S2052252520000895. eCollection 2020 Mar 1.

The predictive power of data-processing statistics.

Author information

Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, England.
MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, England.
Institute for Cell and Molecular Biosciences, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 1HH, England.
Science Technology and Facilities Council, Rutherford Appleton Laboratory, Didcot OX11 0FA, England.
Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot OX11 0FA, England.


This study describes a method to estimate the likelihood of success in determining a macromolecular structure by X-ray crystallography and experimental single-wavelength anomalous dispersion (SAD) or multiple-wavelength anomalous dispersion (MAD) phasing based on initial data-processing statistics and sample crystal properties. Such a predictive tool can rapidly assess the usefulness of data and guide the collection of an optimal data set. The increase in data rates from modern macromolecular crystallography beamlines, together with a demand from users for real-time feedback, has led to pressure on computational resources and a need for smarter data handling. Statistical and machine-learning methods have been applied to construct a classifier that displays 95% accuracy for training and testing data sets compiled from 440 solved structures. Applying this classifier to new data achieved 79% accuracy. These scores already provide clear guidance as to the effective use of computing resources and offer a starting point for a personalized data-collection assistant.


Keywords: X-ray crystallography; experimental phasing; machine learning; macromolecular crystallography; phasing; structure determination
