19th International CODATA Conference
Category: Poster
ENBI and BioCASE: The ABCD of connecting specimen databases in Europe
Javier de la Torre (j.torre@bgbm.org),
Wolfgang Lipp, Markus Döring & Walter G. Berendsohn
Freie Universität Berlin, Botanic Garden and Botanical Museum Berlin-Dahlem
(BGBM), Department of Biodiversity Informatics and Laboratories, Germany
Each specimen in a natural history collection documents the occurrence of a
specific organism at a certain time and place, plus a wide range of other biological
information. This data will realize its full potential once the content is integrated
and made accessible, making it possible to combine results from different collections.
As of September 2004, more than 80 data providers worldwide are already
accessible through the portal of the Global Biodiversity Information Facility.
The goal of our subproject within the European Network for Biodiversity Information
(ENBI) is to connect 100 European databases by the end of 2004 to broaden the
network with highly valuable data only available in European institutions. The
software used for networking was developed by BioCASE (A Biological Collection
Access Service for Europe), a project coordinated at the BGBM. To overcome
the disparity of database structures, the diversity of database and systems
software, and the plethora of data formats, a wrapper tool was created that
provides unified web-based access to data providers. To date, a total of
129 databases covering more than 250 collections have been connected to the network
through 32 data providers in 9 countries.
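The wrapper idea described above can be illustrated with a minimal sketch. It is not the BioCASE software itself: the table name, column names, term names, and the `search` function are all hypothetical, chosen only to show how a per-provider mapping lets a query phrased in unified terms be answered from a local schema.

```python
import sqlite3

# Hypothetical mapping for one provider: unified term -> local column name.
# The term names are simplified placeholders, not actual ABCD element paths.
MAPPING = {
    "ScientificName": "taxon",
    "CollectorName": "leg",
    "GatheringDate": "coll_date",
}
TABLE = "specimens"  # hypothetical local table name


def demo_connection():
    """Build a small in-memory database standing in for a provider's collection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {TABLE} (taxon TEXT, leg TEXT, coll_date TEXT)")
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?, ?)",
        [
            ("Bellis perennis", "A. Collector", "1998-05-12"),
            ("Quercus robur", "B. Collector", "2001-09-30"),
        ],
    )
    return conn


def search(conn, term, value):
    """Answer a query phrased in unified terms against the local schema."""
    column = MAPPING[term]  # raises KeyError if the term is not mapped
    # Rename every mapped local column to its unified term in the result.
    select_list = ", ".join(f"{col} AS {t}" for t, col in MAPPING.items())
    sql = f"SELECT {select_list} FROM {TABLE} WHERE {column} = ?"
    cur = conn.execute(sql, (value,))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]
```

Calling `search(demo_connection(), "ScientificName", "Bellis perennis")` returns records keyed by the unified terms, so a client never needs to know the local column names. A real wrapper would additionally serialize such records as XML.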
Data is transmitted using the ABCD format, an XML standard that defines over 700 terms describing specimen data, developed under the auspices of the CODATA task group of the same name. ABCD acts as an ontology, a 'taxonomy of terms': when database attributes (columns of tables) are identified with terms of ABCD, the result is a unified description of databases. Collection databases may contain hundreds of attributes, resulting in thousands of possible mappings. Part of our effort aims to aid the mapping process by creating a configuration tool that provides (1) lexical analysis of the terms in the schemas to be mapped (the 'Google approach'); (2) a structured catalog of terms in the schemas (the 'Yahoo approach'); (3) the incorporation of experts' assessments of the importance of terms for certain domains; and (4) formats and tools to identify terms of the schema with terms of other schemas or standards.
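As an illustration of point (1), the lexical analysis could look roughly like the following sketch. It is an assumption about one possible approach, not the configuration tool itself: the list of standard terms is a handful of simplified placeholder names rather than the real ABCD vocabulary, and `tokens`, `similarity`, and `suggest` are hypothetical helpers built on Python's standard `difflib`.

```python
import re
from difflib import SequenceMatcher

# Simplified placeholder term names, not the actual ABCD vocabulary.
STANDARD_TERMS = [
    "ScientificName",
    "CollectorName",
    "GatheringDate",
    "CountryName",
    "UnitID",
]


def tokens(name):
    """Split a snake_case or CamelCase identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def similarity(column, term):
    """Score two names between 0 and 1 by comparing their token strings."""
    a = " ".join(tokens(column))
    b = " ".join(tokens(term))
    return SequenceMatcher(None, a, b).ratio()


def suggest(column, terms=STANDARD_TERMS, n=3):
    """Return the n best candidate terms for a database column, best first."""
    scored = [(term, similarity(column, term)) for term in terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]
```

For a column named `collector_name`, `suggest` ranks `CollectorName` first; a column named `country` is matched to `CountryName` with a lower score. Such suggestions would only narrow the choice for a human mapper, who still confirms each mapping.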
ENBI: European Network for Biodiversity Information
BioCASE: Biological Collection Access Service
ABCD: Access to Biological Collection Data
GBIF: Global Biodiversity Information Facility