CODATA Workshop on "The New Frontier in Defining the Data to Knowledge Paradigm"

Presentation: First thoughts on the experiment: Paul Berkman

Conveners: Mark Parsons, Dave Clark, Liao Shunbao and Paul Berkman

In 1957-58, the International Geophysical Year launched an era of international and interdisciplinary research on the Earth system. Beyond developing the first comprehensive look at the Earth system, its significant achievements included the establishment of the World Data Center system and the first artificial satellites in orbit. Today, we are being overwhelmed by the volume of digital information and by the diverse strategies for its management. The need to share diverse digital data across boundaries is becoming acute, particularly when only 15% of digital information is considered to be structured for the purposes of knowledge discovery and the remaining 85% is unstructured. Like the first satellites, which simply demonstrated the potential for planetary observing systems with diverse payloads, dynamically integrating data and discovering knowledge from disparate data centers would be a demonstration of capacity.

This workshop will begin planning an international experiment with data from at least two highly disparate data centers (one of which is a World Data Center). The experiment will be designed to: (a) dynamically, comprehensively and objectively integrate these data; (b) derive meaningful relationships from the data; and (c) generate knowledge to address a well-defined Earth system science problem. The problem will be related to the Polar Regions, in recognition of the International Polar Year that will be convened from March 2007 to March 2009. To successfully design this international and interdisciplinary data experiment, we will need input from data managers, software engineers, metadata experts, Earth scientists and other individuals involved with data preservation, access and analysis.

This workshop will be convened as a panel with active audience participation. The panel members will include individuals from the CODATA sessions on: Steps Towards a System of Systems; Best Practices; Virtual Observatories in the Geosciences; and Data Mining, Data Integration and Knowledge Discovery. This workshop is a product of discussions from the International Polar Year (http://www.ipy.org) and Electronic Geophysical Year (http://www.egy.org) programs. Paul Berkman will serve as the panel moderator.

 
We have had preliminary discussions with the Committee on Earth Observation Satellites (CEOS) / Working Group on Information Systems Services (WGISS) / WGISS Test Facility for Coordinated Enhanced Observing Period (WTF-CEOP) to apply the knowledge-discovery experiment to WTF-CEOP metadata. WTF-CEOP is a prototype distributed data integration system with in-situ, satellite and numerical-weather-prediction model output data.
 
Based on our discussions, the WTF-CEOP metadata could provide the nucleus for implementing the knowledge discovery experiment that is being introduced in coordination with the Electronic Geophysical Year and International Polar Year. Working with WTF-CEOP has societal relevance and a sound rationale with regard to the health, biodiversity and agriculture societal benefit areas that are elaborated by the Group on Earth Observations - Global Earth Observation System of Systems (http://www.earthobservations.org/progress/societal_benefits/societal_benefits.html).

To apply the experiment to WTF-CEOP metadata, we will need granule-level (rather than collection-level) descriptions. The opportunity with digital resources is to use their inherent structure and patterns to implement granule-level descriptions in an automated manner that will dynamically identify objective relationships within and between resources. This approach is referred to as "automated granularity" (see http://www.jstage.jst.go.jp/article/dsj/5/0/84/_pdf).

In the CODATA workshop, we will discuss: 
Potential Phases of the Experiment
  1. Demonstrate that metadata can be repurposed in an interoperable manner
  2. Use repurposed metadata to identify relationships between datasets
  3. Link repurposed metadata to actual datasets in relational contexts 
  4. Enhance granularity of datasets directly to interpret relationships within and between datasets
  5. Elaborate on the iterative process of adjusting the granularity for additional interpretations

Phase 1: Adding Value to WTF-CEOP Metadata

Objective: Demonstrate the interoperability and added value of metadata that has been repurposed with automated granularity.

Rationale: Metadata is ubiquitous, contains subjective descriptions of content and context, requires significant effort that is not scalable, and is designed to facilitate access (rather than discovery of relationships).

Experimental Design: The metadata would refer to datasets, reports, policy documents or other information resources that are associated with the hydrological cycle.  The metadata and associated digital objects would be selected based on a specific experimental framework associated with GEO-GEOSS societal benefit areas. 

Experimental Methods: Utilize general structural features of metadata (e.g., the colon ":" as a boundary condition / rule set) as well as common elements (e.g., ISO standards) to automate both the granularity and the framework for dynamically relating elements within and between metadata records.
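
As a rough illustration of the idea of using the colon as a boundary rule, the following minimal Python sketch splits each metadata record into element/value granules and then lists the granules shared by more than one record. The sample records, field names and function names are hypothetical and are not drawn from WTF-CEOP; the experiment itself may well use a different rule set.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only
# and do not reflect the actual WTF-CEOP metadata format.
RECORDS = {
    "rec-A": "Title: Soil moisture observations\n"
             "Parameter: soil moisture\n"
             "Region: Arctic",
    "rec-B": "Title: Model precipitation output\n"
             "Parameter: precipitation\n"
             "Region: Arctic",
}


def split_granules(text):
    """Split a record into (element, value) granules at the first colon of each line."""
    granules = []
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Keep only lines that contain a colon with text on both sides.
        if sep and name.strip() and value.strip():
            granules.append((name.strip().lower(), value.strip().lower()))
    return granules


def relate(records):
    """Return the granules that occur in more than one record, with the record ids."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for granule in split_granules(text):
            index[granule].add(rec_id)
    return {g: sorted(ids) for g, ids in index.items() if len(ids) > 1}


shared = relate(RECORDS)
for (element, value), rec_ids in shared.items():
    print(f"{element} = {value}: {', '.join(rec_ids)}")
```

Splitting at the first colon only means that values which themselves contain colons (such as timestamps) stay intact. Matching here is by exact, case-insensitive equality, which is the simplest possible notion of a relationship between records.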

The CODATA workshop will provide an opportunity to elaborate the experimental design and methods, as well as the specific societal hypotheses that could be tested with WTF-CEOP and other metadata during Phase 1. The workshop will also provide an opportunity to consider logistics and funding requirements, as well as collaborations with other data center programs, so that the experiment can be implemented effectively and in a timely manner.