19th International CODATA Conference
Category: Poster

ENBI and BioCASE: The ABCD of connecting specimen databases in Europe

Javier de la Torre (j.torre@bgbm.org), Wolfgang Lipp, Markus Döring & Walter G. Berendsohn
Freie Universität Berlin, Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM), Department of Biodiversity Informatics and Laboratories, Germany


Each specimen in a natural history collection documents the occurrence of a specific organism at a certain time and place, plus a wide range of other biological information. This data will realize its full potential only once the content is integrated and made accessible, so that results from different collections can be combined. As of September 2004, more than 80 data providers worldwide are already accessible through the portal of the Global Biodiversity Information Facility (GBIF). The goal of our subproject within the European Network for Biodiversity Information (ENBI) is to connect 100 European databases by the end of 2004, broadening the network with highly valuable data available only in European institutions. The software used for networking was developed by BioCASE (A Biological Collection Access Service for Europe), a project coordinated at the BGBM. To overcome the disparity of database structures, the diversity of database and systems software, and the plethora of data formats, a wrapper tool was created that provides unified web-based access to data providers. To date, a total of 129 databases covering more than 250 collections have been connected to the network through 32 data providers in 9 countries.

Data is transmitted using the ABCD format, an XML standard that defines over 700 terms describing specimen data, developed under the auspices of the CODATA task group of the same name. ABCD acts as an ontology, a 'taxonomy of terms': when database attributes (columns of tables) are identified with terms of ABCD, the result is a unified description of databases. Collection databases may contain hundreds of attributes, resulting in thousands of possible mappings. Part of our effort is aimed at supporting the mapping process with a configuration tool that provides (1) lexical analysis of the terms in the schemas to be mapped (the 'Google approach'); (2) a structured catalog of terms in the schemas (the 'Yahoo approach'); (3) the incorporation of experts' assessments of the importance of terms for certain domains; and (4) formats and tools to identify terms of one schema with terms of other schemas or standards.
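The lexical-analysis idea in point (1) can be illustrated with a minimal sketch. It is not the BioCASE configuration tool: the term list below is a small set of hypothetical, simplified names standing in for ABCD terms, and the functions tokenize and suggest are assumptions introduced only for this example. Column names are split into words and ranked against the candidate terms by plain string similarity.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, simplified target terms; NOT the actual ABCD element names.
TERMS = [
    "UnitID",
    "FullScientificName",
    "CollectorName",
    "CountryName",
    "GatheringDate",
    "LocalityText",
]


def tokenize(name):
    """Split a camelCase, snake_case or UPPER_CASE name into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def similarity(column, term):
    """Character-level similarity of the two names after tokenization (0..1)."""
    a = " ".join(tokenize(column))
    b = " ".join(tokenize(term))
    return SequenceMatcher(None, a, b).ratio()


def suggest(column, terms, n=3):
    """Return the n best (term, score) candidates for one database column."""
    scored = [(term, similarity(column, term)) for term in terms]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Illustrative column names from an imaginary collection database.
    for column in ["unit_id", "country", "COLLECTOR_NAME"]:
        print(column, "->", suggest(column, TERMS))
```

Under these assumptions the output is only a ranked list of candidates; a person would still confirm each mapping, and points (2) to (4) above (a structured catalog, expert weighting, cross-schema identification) are not covered by this sketch.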

ENBI: European Network for Biodiversity Information

BioCASE: Biological Collection Access Service for Europe

ABCD: Access to Biological Collection Data

GBIF: Global Biodiversity Information Facility