19th International CODATA Conference
Category: Plenary - Mark-Up Languages
XML Description of Protein Structural Data for Data Grid and Computing Grid
Haruki Nakamura
Institute for Protein Research, Osaka University, Japan
The Protein Data Bank (PDB) is a primary archive of three-dimensional structural information on biological macromolecules. Protein Data Bank Japan (PDBj, http://www.pdbj.org/) has been curating new PDB entries as a member of the worldwide Protein Data Bank (wwPDB) [1], along with the Research Collaboratory for Structural Bioinformatics (RCSB) and the European Bioinformatics Institute (EBI).
A new extensible mark-up language (XML) format describing the PDB data, pdbML, is being developed by the wwPDB. Its structure is defined in an XML Schema (pdbx-v1.000.xsd at http://deposit.pdb.org/pdbML/) based on the macromolecular Crystallographic Information File (mmCIF). The entire PDB content is now available in pdbML from ftp://beta.rcsb.org/pub/pdb/uniformity/data/XML. To make the most of the XML format, we at PDBj have constructed an XML-based PDB data browser (xPSSS: XML-based Protein Structure Search Service, http://www.pdbj.org/xpsss/) using a native XML database. Information on the biological and biochemical functions of proteins can also be browsed. In addition to simple searches, full XPath searches are implemented, allowing users to perform complicated searches and to control the output of their searches in detail. xPSSS is also used by a SOAP service for large-scale analyses and data grid applications.
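To illustrate the kind of XPath search described above, here is a minimal Python sketch. It is not the xPSSS interface or the pdbML schema: the element names (entries, entry, method, resolution, function) and the entry IDs are hypothetical stand-ins chosen for illustration. Python's standard xml.etree.ElementTree supports only a subset of XPath, so the numeric comparison is done in Python rather than in the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a collection of structure entries.
# These element names are assumptions, not the actual pdbML schema.
SAMPLE = """
<entries>
  <entry id="1ABC">
    <method>X-RAY DIFFRACTION</method>
    <resolution>1.8</resolution>
    <function>hydrolase</function>
  </entry>
  <entry id="2DEF">
    <method>SOLUTION NMR</method>
    <function>transferase</function>
  </entry>
  <entry id="3GHI">
    <method>X-RAY DIFFRACTION</method>
    <resolution>2.5</resolution>
    <function>hydrolase</function>
  </entry>
</entries>
"""


def search(root, xpath):
    """Return the id attribute of every element matched by the XPath expression."""
    return [element.get("id") for element in root.findall(xpath)]


root = ET.fromstring(SAMPLE)

# Select entries by the text of a child element.
hydrolases = search(root, ".//entry[function='hydrolase']")
xray = search(root, ".//entry[method='X-RAY DIFFRACTION']")

# ElementTree's XPath subset has no numeric comparison, so select the
# entries that carry a resolution and compare the value in Python.
high_res = [
    element.get("id")
    for element in root.findall(".//entry[resolution]")
    if float(element.findtext("resolution")) < 2.0
]

print(hydrolases, xray, high_res)
```

A full XPath engine, such as the one a native XML database provides, could express the resolution filter directly in the query; the split used here is only a limitation of the standard-library parser.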
For multiscale biological systems, integration of the simulation methods for models at different levels is essential, and a new platform, BioPfuga (Biosimulation Platform United on Grid Architecture), has been developed [2]. It requires that (1) application programs be divided into many separate components, and that (2) data be communicated among the program components using a standard XML description. An example application of BioPfuga to a hybrid QM(HF)/QM(DFT)/MM method will be shown.
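The two requirements above, separate program components and XML-based data exchange between them, can be sketched in Python as follows. This is only an illustration of the general idea, not BioPfuga's actual interface or XML description: the component functions, the element and attribute names, and the charge values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def qm_component(atoms):
    """Hypothetical upstream component: serializes its results as XML."""
    root = ET.Element("result", {"component": "qm"})
    for name, charge in atoms:
        # Attribute values must be strings; repr() round-trips floats exactly.
        ET.SubElement(root, "atom", {"name": name, "charge": repr(charge)})
    return ET.tostring(root, encoding="unicode")


def mm_component(xml_text):
    """Hypothetical downstream component: consumes the XML message."""
    root = ET.fromstring(xml_text)
    charges = {
        atom.get("name"): float(atom.get("charge"))
        for atom in root.iter("atom")
    }
    return sum(charges.values()), charges


# The components share only the XML message, not in-memory objects,
# so each could be replaced or run on a different machine.
message = qm_component([("O", -0.8), ("H1", 0.4), ("H2", 0.4)])
total, charges = mm_component(message)

print(message)
print(total, charges)
```

Because the only contract between the two functions is the XML text, either side could be swapped for another program that reads or writes the same description.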
References:
[1] Berman et al. (2003) Nature Struct. Biol. 10, 980.
[2] Nakamura et al. (2004) New Generat. Comput. 22, 157-166.