Improved Reporting of Crystal Structures:
The Impact of Publishing Policy on Data Quality

Brian McMahon*1, Peter R. Strickland1, John R. Helliwell2

1International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England; 2School of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, England and CCLRC Daresbury Laboratory, Daresbury, Warrington, WA4 4AD, England

The International Union of Crystallography (IUCr) has developed an archival and exchange standard for crystal structure data, the Crystallographic Information Framework (CIF). This includes formal and precise definitions of several thousand tags (data names) identifying specific items of experimental and derived data collected or calculated during a crystal structure determination. With such a well-specified standard it becomes possible to construct machine-readable lists of items required for a full description of a crystal structural model, and also to perform algorithmic checks on the self-consistency and quality of a submitted data set. The ready extraction of community-agreed key data indicators provides a convenient synopsis of the precision of a structure determination.
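As a rough illustration of the two kinds of check described above, the following Python sketch tests a data set against a machine-readable list of required items and then runs one self-consistency test, comparing the reported unit-cell volume with the volume computed from the cell lengths and angles. The data names are taken from the CIF core dictionary, but everything else is an assumption made for illustration: the short `REQUIRED_ITEMS` list, the simplified single-line parser, the function names and the tolerance. It is not the checkCIF implementation, and real journal requirement lists are considerably longer.

```python
import math
import re

# Hypothetical, much-shortened list of required items (illustration only).
REQUIRED_ITEMS = [
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
    "_cell_volume",
]

# Made-up example data, not taken from any real structure.
SAMPLE_CIF = """\
data_example
_cell_length_a     10.000(2)
_cell_length_b     12.000(2)
_cell_length_c     15.000(3)
_cell_angle_alpha  90
_cell_angle_beta   90
_cell_angle_gamma  90
_cell_volume       1800.0(5)
"""


def parse_simple_cif(text):
    """Collect single-line 'tag value' pairs; loops and multi-line values are ignored."""
    items = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip()
    return items


def to_float(value):
    """Drop a trailing standard uncertainty such as '(2)' and convert to float."""
    return float(re.sub(r"\(\d+\)$", "", value))


def missing_items(items, required=REQUIRED_ITEMS):
    """Return required tags that are absent or flagged unknown ('?') or inapplicable ('.')."""
    return [tag for tag in required if items.get(tag, "?") in ("?", ".")]


def computed_cell_volume(items):
    """Cell volume from the general (triclinic) formula, in cubic angstroms."""
    a, b, c = (to_float(items[t]) for t in
               ("_cell_length_a", "_cell_length_b", "_cell_length_c"))
    ca, cb, cg = (math.cos(math.radians(to_float(items[t]))) for t in
                  ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma"))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def volume_is_consistent(items, rel_tol=1e-3):
    """Compare reported and computed volumes; the tolerance is an arbitrary choice."""
    reported = to_float(items["_cell_volume"])
    return math.isclose(reported, computed_cell_volume(items), rel_tol=rel_tol)


if __name__ == "__main__":
    items = parse_simple_cif(SAMPLE_CIF)
    print("Missing items:", missing_items(items))
    print("Volume consistent:", volume_is_consistent(items))
```

Because every item is identified by a dictionary-defined tag, both the completeness check and the consistency test reduce to simple look-ups on the parsed data, which is what makes such checks straightforward to automate.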

The IUCr publishes several journals that report crystal structures, and has used these properties in the development of an electronic submission, review and publication workflow for small-unit-cell structures. The result has been a significant improvement in the detail and quality of reporting of structures published in IUCr and other research journals, and the checkCIF suite of validation tests employed in this workflow has become a de facto standard for judging the quality of a crystal structure determination (the available tests are listed at http://journals.iucr.org/services/cif/datavalidation.html).

IUCr journals are coordinating debate within the community towards similar agreed standards for assessing the quality of published biological macromolecular structures. This is a younger subject area, harnessing new technologies for extracting the best possible data from a crystal. In consequence, it has less well-defined quality criteria, but the use of data dictionaries permits specific types of checking to be introduced in response to the needs and concerns of the community.

In response to the ever-increasing output of routinely determined crystal structures, and in the spirit of open access to data, a number of crystallography facilities are also making available data sets of structures not submitted for publication. To address the concerns of the community about the reliability of unrefereed research results entering the public domain in this way, many such repositories also apply the IUCr checkCIF tests and present the results alongside the data sets in their repositories.

Keywords: crystallography, data quality, electronic publishing, peer review, exchange format