CODATA 2002 Conference Proceedings
Invited Cross-Cutting Themes
1. Preserving and Archiving S&T Data
2. Legal Issues in the use of S&T Data
3. Interoperability and Data Integration
4. Information Economics for S&T Data
5. Emerging Tools and Techniques for Data Handling
6. Ethics in the use of S&T Data
7. CODATA 2015

1. Preserving and Archiving S&T Data

1. The Challenge of Archiving and Preserving Remotely Sensed Data
John L. Faundeen
US Geological Survey, EROS Data Center, USA
Few
would question the need to archive the scientific and technical
(S&T) data generated by researchers. At a minimum, the data
are needed for change analysis. Likewise, most people would value
efforts to ensure the preservation of the archived S&T data.
Future generations will use analysis techniques not even considered
today. Until recently, archiving and preserving these data were
usually accomplished within existing infrastructures and budgets.
As the volume of archived data increases, however, organizations
charged with archiving S&T data will be increasingly challenged.
The US Geological Survey has had experience in this area and has
developed strategies to deal with the mountain of land remote sensing
data currently being managed and the tidal wave of expected new
data. The Agency has dealt with archiving issues, such as selection
criteria, purging, advisory panels, and data access, and has met
with preservation challenges involving photographic and digital
media.
2. The Virtual Observatory: The Future of Data and Information Management in Astrophysics
David Schade
Canadian Astronomy Data Centre, Herzberg Institute of Astrophysics, National Research Council, Canada
The
concept of a “Virtual Observatory”, which would put the power of
numerous ground-based and space-based observatories at the fingertips
of astrophysical scientists, was once a pipe dream but is now represented
by funded projects in Canada, the United States, the United Kingdom,
and Europe. Astronomical data has been primarily digital for 15
years and the change from analogue (e.g. photographic plates) to
digital form triggered an appreciation for the scientific value
of data “archiving” and the development of astronomy data centres
around the world. These facilities do much more than passively
“archive” their content. They have scientific and technical staff
that develop the means to add value to datasets by additional processing,
they integrate datasets from different wavelength regimes with one
another, they distribute those data via the web, and they actively
promote the use of archival data. The next step is to federate the
diverse and complementary collections residing in data centres around
the world and develop seamless means for users to simultaneously
access and query multi-wavelength databases and pixels and to provide
the computational resources for cross-correlation and other processing.
In analogy to “the greatest encyclopedia that has ever existed”
that has effectively come into being because of the internet, the
Virtual Observatory will be an historic leap forward in the ability
of scientists, and all human beings, to understand the universe
we are part of.
3. Towards a New Knowledge of Global Climate Changes: Meteorological Data Archiving and Processing Aspects
Alexander M. Sterin
All-Russian Research Institute of Hydrometeorological Information (RIHMI-WDC), Russia
This presentation will
focus on a wide range of aspects related to meteorological data
utilization for getting new empirical information on climate variations.
The problems of meteorological data collection, their quality assurance
and control, and their archiving will be discussed.
The first and the main
focus will be on the problem of environmental data archiving and
preservation. The collection of Russian Research Institute for Hydrometeorological
Information - World Data Center (RIHMI-WDC) is currently located
on 9-track magnetic tapes. The total amount of these tapes is about
60 thousand volumes. The current archiving media are obsolete, so
urgent efforts on moving the collection onto modern media are beginning.
The second focus will be on the multi-level approach in constructing
the informational products based on primary meteorological observational
data. This approach presumes that on the lowest level (zero level)
there are raw observational data. On the next level (level number
one) there are the observational data that have passed the quality
check procedures. Normally, at level one the erroneous and suspicious
data are flagged. The higher levels contain derivative data
products. It appears that most customers prefer special derivative
data products that are based on the primary data and that have much
easier to use formats and modest volumes, rather than the primary
observational data that have more complicated formats and huge volumes.
The multi-level structure of the derivatives for climate studies
includes the derivatives based on observational data directly (characteristics
which require the calculations based on the observational data directly),
derivatives of the higher level that are based on the further generalization
of products - derivatives of the lower level, and so on. Examples
of such a multi-level structure of data products will be given.
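To make the level structure concrete, the following minimal Python sketch (hypothetical station records, variable names and quality thresholds; not RIHMI-WDC's actual processing chain) flags suspicious raw observations to form a level-one dataset and then derives a simple level-two product, a monthly mean temperature, from the accepted values only.

```python
from statistics import mean

# Level 0: raw station observations (hypothetical example records).
level0 = [
    {"station": "27612", "date": "2002-07-01", "temp_c": 21.4},
    {"station": "27612", "date": "2002-07-02", "temp_c": 19.8},
    {"station": "27612", "date": "2002-07-03", "temp_c": 87.0},  # implausible value
]

# Level 1: quality-checked data; suspicious values are flagged, not deleted.
def to_level1(records, lo=-90.0, hi=60.0):
    return [dict(r, flag="ok" if lo <= r["temp_c"] <= hi else "suspect")
            for r in records]

# Level 2: a derivative product computed only from accepted level-1 values.
def monthly_mean(level1, month):
    values = [r["temp_c"] for r in level1
              if r["flag"] == "ok" and r["date"].startswith(month)]
    return mean(values) if values else None

level1 = to_level1(level0)
print(monthly_mean(level1, "2002-07"))  # mean of the two accepted observations
```

Higher-level products would be built the same way, by generalizing lower-level derivatives rather than returning to the raw observations, while the level-zero and flagged level-one data are preserved so that the whole chain can be recomputed when needed.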
The third focus will
be on the cycles of data processing that are required for large,
data-based climate-related projects. Previous experience shows that
it is important to preserve and to reutilize the observational data
collections and to repeat the main calculations. The preservation
of primary observational data is very important, because it may
be necessary to recalculate the products of higher levels "from
the very beginning." It appears that normally these cycles
may need to be repeated once (or even more than once) per decade.
The last focus will
be on the software instrumentation to obtain new information and
new knowledge in climate changes. The technological aspects in processing
huge volumes of data in various formats will be described.
4. Strategies for Selection and Appraisal of Scientific Data for Preservation
Seamus Ross, University of Glasgow and Principal Director ERPANET, UK
With many governments and commercial organisations creating kilometres
of analogue documents every year, archivists have long been confronted
with the challenge of handling substantial quantities of records.
Recognising the impossibility of retaining this material and documenting
it in ways that would enable future users to discover and use it,
archivists developed the concept of appraisal. Typically archives
retain only between 5% and 10% of the records created by an organisation.
Indeed, in ensuring that sufficient material is retained to provide
an adequate record of our cultural, scientific, and commercial heritage
effective retention and disposal strategies have proven essential.
As we make the transition from a paper-based world to a digital
one, archivists continue to recognise the power of appraisal as they
attempt to manage the increasing amounts of material created digitally.
The concepts that underlie appraisal are poorly understood outside
the narrow confines of the archival world, but a wider appreciation
of them might bring benefits to other data creating and using communities.
Appraisal is both a technical process and an intellectual activity
that requires knowledge, research, and imagination on the part of
the appraiser. Appraisal, characterised at its simplest level, involves
establishing the value of continuing to retain and document data
or records: what administrative, evidential, informational, legal,
or re-usable value does a record, document, or data set have? The problem
is of course compounded in the digital environment by the technical
aspects of the material itself. Does technology change the processes,
timeframe or relevance of appraisal? Or, to paraphrase the InterPARES
Appraisal Task Force Report (January 2001), what impact does it have
on ensuring that material of 'lasting value is preserved in authentic
form'?
After charting the mechanisms and processes for appraisal, the paper
examines how the digital environment has focused attention on establishing,
during the appraisal process, whether or not it is feasible to maintain
the authenticity and integrity of digital objects over time, and
what impact this has on the process and on the point in the life of a digital
object at which it must be appraised. The paper concludes by building
on this work to examine the impact of the formal process of appraisal
in the archiving of scientific data sets, who should be involved
in and responsible for the process, what appraisal criteria might be
appropriate, and at what stage in the life cycle of a digital object
appraisal should be carried out.
2. Legal Issues in Using and Sharing Scientific and Technical Data

1. Search for Balance: Legal Protection for Data Compilations in the U.S.
Steven Tepp
US Copyright Office, Library of Congress, USA
The United States has
a long history of providing legal protection against the unauthorized
use of compilations of scientific and technical data. That protection,
once broad and vigorous, is now diffuse and uncertain. In light
of modern Supreme Court precedent, the U.S. Congress has struggled
for several years to find the appropriate balance between providing
an incentive for the creation of useful compilations of data through
legal protections which allow the compiler to reap commercial benefit
from his work and promoting the progress of science and useful arts
by allowing researchers and scientists to have unfettered access
to and use of such databases. My presentation will outline the history
and current state of the legal protection afforded to databases
in the United States and will then discuss the different legislative
models of legal protection that have been the subject of considerable
debate in the U.S. Congress in recent years.
2. Legal (dis)incentives for creating, disseminating, utilizing and sharing data for scientific and technical purposes
Kenji Naemura
Keio University, Shonan-Fujisawa Campus, Japan
While Japanese policy
makers differ on practical strategies for recovery and growth after
a decade of economic struggle, they all agree on the view that, for
restructuring the industry in a competitive environment, more vital
roles should be played by advanced S&T, as well as by improved
organizational and legal schemes. It is with this view that national
research institutions have undergone structural reforms, and that
national universities are to follow them in the near future.
Many of the enhanced
legal schemes - e.g., patents to be granted to inventions in novel
areas, copyrights of digital works, and other forms of IPRs - are
supposed to give incentives for S&T researchers to commercialize
their results. However, some schemes - e.g., private data and security
protections - may become disincentives for them to disseminate,
utilize and share the results.
Thus the sui generis
protection of databases introduced by the EU Directive of 1996 has
raised a serious concern in the scientific community. The Science
Council of Japan conducted a careful study in its subcommittee on
the possible merits and demerits of introducing a similar legal
protection framework in this country. Its result was published as
a declaration of its 136th Assembly on October 17, 2001. It emphasized
"the principle of free exchange of views and data for scientific
research and education" and, expressing its opposition against
a new type of legal right in addition to the copyright, stated that
caution should be exercised in dealing with the international trend
toward such legislation.
There are various factors
that need to be considered in evaluating the advantages and disadvantages
of legal protection of S&T data. They are related to the nature
of the research area, the data, the originating organization, the research
fund, the user and his/her purpose of use, etc. Geographical, linguistic,
cultural and economical conditions should also be considered when
studying the consequences. After all, any incentives for advancing
S&T may not be easily translated into economic figures, but
other types of contributions to the humane society must be more
highly valued.
3. Scientific and Technical Data Policy and Management in China
Sun Honglie
Chinese Academy of Sciences, Beijing, China
The 21st century is
known as an information era, in which scientific and technical data,
as an important information source, will have significant effects
on the social and economic development of the world. Scientific
and technical data contain academic, economic, social and other
values. However, the basic ways of deriving the greatest value from
scientific data are not just in their creation and storage, but
in their dissemination and wide application. In this regard, issues
of scientific and technical data policies and management have been
considered as a strategic measure in the national information system
and in the scientific and technical innovation programs in China.
So far, scientific and technical data policy and management in China
has made progress, in which:
a) A preliminary working
pattern of scientific and technical data management has been shaped, with the
main lead being taken by government professional sections and with
scientific institutes and universities serving a subsidiary role;
b) Digitization and networking are becoming more and more universal;
and
c) Professional data management organizations are being formed and
expanded.
At present, the scientific
and technical data policy and management in China are mainly focused
on: establishing and implementing the rules for "management
and sharing of national scientific and technical data"; initiating
a special project for the construction of a national scientific
and technical data sharing system; and developing measures for the
management of this data sharing system.
4. A Contractually Reconstructed Research Commons for Scientific Data in a Highly Protectionist Intellectual Property Environment
J.H. Reichman, Duke University School of Law, Durham, NC, USA
and
Paul F. Uhlir, The National Academies, Washington, DC, USA
There are a number of
well-documented economic, legal, and technological efforts to privatize
government-generated scientific data and to commercialize government-funded scientific
data in the United States, data that were heretofore freely available
in the public domain or on an "open access" basis. If
these pressures continue unabated, they will likely lead to a disruption
of long-established scientific research practices and to the loss
of new opportunities that digital networks and related technologies
make possible. These pressures could elicit one of two types of
responses. One is essentially reactive, in which the public scientific
community adjusts as best it can without organizing a response to
the increasing encroachment of a commercial ethos upon its upstream
data resources. The other would require science policy to address
the challenge by formulating a strategy that would enable the scientific
community to take charge of its basic data supply and to manage
the resulting research commons in ways that would preserve its public
good functions without impeding socially beneficial commercial opportunities.
Under the latter option, the objective would be to reinforce and
recreate, by voluntary means, a public space in which the traditional
sharing ethos of science can be preserved and insulated from the
commodifying trends. This presentation will review some approaches
that the U.S. scientific community might consider in addressing
this challenge, and that could have broader applicability to scientific
communities outside the United States.
3. Interoperability and Data Integration

1. Interoperability in Geospatial Web Services
Jeff de La Beaujardiere
NASA Goddard Space Flight Center, USA
This
talk will outline recent work on open standards for implementing
interoperable geospatial web services. Beginning in 1999,
a series of Testbeds--operated by the OpenGIS Consortium (OGC),
sponsored in part by US federal agencies, and involving the technical
participation of industry, government and academia--has developed
specifications and working implementations of geographic services
to be deployed over HTTP. Pilot Projects and Technology Insertion
Projects have tested and deployed these standards in real-world
applications.
These information-access services can provide an additional layer
of interoperability above the data search capabilities provided
by National Spatial Data Infrastructure (NSDI) Clearinghouse nodes.
The Web Map Service (WMS; published 2000) provides graphical renderings
of geodata. The Web Feature Service (WFS; 2002) provides point,
line, and polygon vector feature data encoded in the XML-based Geography
Markup Language (GML; 2001). The Web Coverage Service (WCS;
in preparation) provides gridded or ungridded coverage data.
Additional specifications for catalog, gazetteer, and fusion services
are also in progress. This talk will provide an overview of
these efforts and indicate current areas of application.
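As a concrete illustration of the style of HTTP interface these specifications define, the short Python sketch below assembles a WMS 1.1.1 GetMap request. The base URL and layer name are hypothetical placeholders rather than any actual OGC testbed service; the query parameters themselves (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT) are the standard GetMap parameters.

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint; any WMS 1.1.1 server accepts the same parameters.
WMS_BASE = "http://example.gov/wms"

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "landcover",      # hypothetical layer name
    "STYLES": "",               # default styling
    "SRS": "EPSG:4326",         # geographic latitude/longitude coordinates
    "BBOX": "-180,-90,180,90",  # minx,miny,maxx,maxy
    "WIDTH": "800",
    "HEIGHT": "400",
    "FORMAT": "image/png",
}

# The client simply issues an HTTP GET; the server returns a rendered map image.
print(f"{WMS_BASE}?{urlencode(params)}")
```

A WFS GetFeature request is built the same way over HTTP but returns GML-encoded features instead of a rendered image.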
2. Expanding Spatial Data Infrastructure Capabilities to Optimize Use and Sharing of Geographic Data: A Developing World Perspective
Santiago Borrero
Global Spatial Data Infrastructure (GSDI), Instituto Geografico Agustin Codazzi, Colombia
The availability of spatial
data infrastructure (SDI) capabilities at all levels, backed by
international standards, guidelines and policies on access to data
is needed to support human sustainable development and to derive
scientific, economic and social benefits from spatial information.
In this context, this
paper focuses on the need for and the current situation regarding
spatial data infrastructures, in particular, from the Developing
World perspective. To this end, the author (i) presents GSDI and
PC IDEA aims, scope and expected contribution; and (ii) then, based
on these initiatives and business plans, presents observations on
the possibilities for improved data availability and interoperability.
More than 50 nations are in the process of developing SDI capabilities,
and there is an increasing number of geodata-related initiatives at all levels.
Finally, the author evaluates the need for better cooperation and
coordination among spatial data initiatives and, where feasible
and convenient, integration to facilitate data access, sharing and
applicability.
3. Interoperability of Biological Data Resources
Hideaki Sugawara, National Institute of Genetics, Japan
Biological data resources are composed of databases and data mining
tools. The International Nucleotide Sequence Database (DDBJ/EMBL/GenBank)
and homology search programs are typical resources that
are indispensable to life sciences and biotechnology. In addition
to these fundamental resources, a number of resources are available
on the Internet, e.g. those listed in the annual database issue of
the journal Nucleic Acids Research.
Biological data objects
span widely: from molecules to phenotypes; from viruses to mammoths;
from the bottom of the sea to outer space.
Users' profiles are also
wide and diverse, e.g. finding anticancer drugs from any organism
anywhere by cross-cutting heterogeneous data resources distributed
across various categories and disciplines. Users often find a novel
way of utilization that the developer did not imagine. Biological
data resources have often been developed ad hoc, without any international
guidance for standardization, resulting in heterogeneous systems.
Therefore, such cross-cutting is a hard task for bioinformaticians.
It is not practical to reform large legacy systems in accordance
with a standard, even if a standard is created.
Interoperability may
be a solution to provide an integrated view of heterogeneous data
sources distributed in many disciplines and also in distant places.
We studied the Common Object Request Broker Architecture (CORBA) and
found that it is quite useful for making data sources interoperable
in a local area network. Nevertheless, it is not straightforward
to use CORBA to integrate data resources across firewalls; CORBA
is not firewall friendly.
Recently, XML (eXtensible
Markup Language) has become widely tested and used in so-called e-Business.
XML is also being extensively applied in biology. However, defining a
Document Type Definition (DTD) or an XML schema is not sufficient for
the interoperability of biological data resources, because
multiple groups define different XML documents for the same biological
object. These heterogeneous XML documents can be made interoperable
by use of SOAP (Simple Object Access Protocol), WSDL (Web Services
Description Language) and UDDI (Universal Description, Discovery
and Integration). The author will introduce the implementation and evaluation
of these technologies in WDCM (http://wdcm.nig.ac.jp), Genome Information
Broker (http://gib.genes.nig.ac.jp/) and DDBJ (http://xml.nig.ac.jp).
DDBJ: DNA Data Bank of Japan
EMBL: European Molecular Biology Laboratory
GenBank: National Center for Biotechnology Information
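To make the web-services approach concrete, the sketch below assembles a minimal SOAP 1.1 envelope and posts it over HTTP using only the Python standard library. The endpoint URL, namespace, operation name (getSequence) and accession parameter are hypothetical placeholders for illustration; they do not reproduce the actual DDBJ, GIB or WDCM service interfaces, whose operations would normally be discovered from their WSDL descriptions and, via UDDI, from a service registry.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "http://example.org/sequence-service"  # hypothetical namespace

# Build a minimal SOAP 1.1 envelope for a hypothetical getSequence operation.
envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
operation = ET.SubElement(body, f"{{{SERVICE_NS}}}getSequence")
ET.SubElement(operation, f"{{{SERVICE_NS}}}accession").text = "AB000001"  # hypothetical ID

request = urllib.request.Request(
    "http://example.org/soap",  # hypothetical endpoint
    data=ET.tostring(envelope, xml_declaration=True, encoding="utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "getSequence"},
)

# A real client would parse the returned envelope; here we simply print it.
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))
```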
4. The Open Archives Initiative: A low-barrier framework for interoperability
Carl Lagoze
Department of Computer Science, Cornell University, USA
The Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH) is the result of work
in the ePrints, digital library, and museum communities to develop
a practical and low-barrier foundation for data interoperability.
The OAI-PMH provides a method for data
repositories to expose metadata in various forms about their content.
Harvesters may then access this metadata to build value-added services.
This talk will review the history and technology behind the OAI-PMH
and describe applications that build on it.
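A minimal harvesting client illustrates how low the barrier is. The sketch below, in Python with only the standard library, issues a ListRecords request for Dublin Core metadata against a hypothetical repository base URL (the verb, metadataPrefix parameter and XML namespaces are those defined by OAI-PMH 2.0; the repository address is a placeholder).

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Hypothetical repository base URL; any OAI-PMH endpoint answers the same verbs.
BASE_URL = "http://example.edu/oai"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# ListRecords asks the repository for full metadata records in Dublin Core.
query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
with urlopen(f"{BASE_URL}?{query}") as response:
    tree = ET.parse(response)

# Print the title of each harvested record; a real harvester would also follow
# resumptionToken elements to page through large repositories.
for record in tree.iter(f"{OAI_NS}record"):
    title = record.find(f".//{DC_NS}title")
    if title is not None:
        print(title.text)
```

A service provider would store these harvested records and build search, alerting or citation-linking services on top of them.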
4. Information Economics for S&T Data

1. Legal Protection of Databases and Science in the "European Research Area": Economic Policy and IPR Practice in the Wake of the 1996 EC Directive
Paul A. David
Stanford University and All Souls College, Oxford
At the Lisbon Meeting
of the European Council in March 2000, the member states agreed
that the creation of a "European Research Area" should
be a high priority goal of EU and national government policies in
the coming decade. Among the policy commitments taking shape are
those directed toward substantially raising the level of business
R&D expenditures, not only by means of subsidies and fiscal
tools (e.g., tax incentives), but also through intellectual property
protections aimed at "improving the environment" for business
investment in R&D. The Economic Commission of the EU currently
is preparing recommendations for the implementation of IP protections
in future Framework Programmes and related mechanisms that fund
R&D projects, including policies affecting the use of legal
protections afforded to database owners under the national implementations
of the EC's Directive of March 11, 1996. This paper reviews the
economic issues of IPR in databases, and the judicial experience and
policy pressures developing in Europe in the years following the
implementation of the EC's directive. It attempts to assess the likely
implications these will carry for scientific research in the ERA.
2. International Protection of Non-Original Databases
Helga Tabuchi
Copyright Law Division, WIPO, Geneva, Switzerland
At the request of its
member States, the International Bureau of the World Intellectual
Property Organization (WIPO) commissioned external consultants to
prepare economic studies on the impact of the protection of non-original
databases. The studies were requested to be broad, covering not
only economic issues in a narrow sense, but also social, educational
and access to information issues. The consultants were furthermore
expected to focus in particular on the impacts in developing, least
developed and transition economies.
Five of the studies
were completed in early 2002 and were submitted to the Standing
Committee on Copyright and Related Rights at its seventh session
in May 2002. The positions of the consultants differ significantly.
The studies are available on WIPO's website at <http://www.wipo.int/eng/meetings/2002/sccr/index_7.htm>.
Most recently another
consultant has been commissioned to prepare an additional study
that focuses on the Latin American and Caribbean region. The study
will be submitted to the Committee in due course.
3. The Digital National Framework: Underpinning the Knowledge Economy
Keith Murray
Geographic Information Strategy, Ordnance Survey, UK
Decision making requires
knowledge, knowledge requires reliable information and reliable
information requires data from several sources to be integrated
with assurance. An underlying factor in many of these decisions
is geography within an integrated geographic information infrastructure.
In Great Britain, the
use of geographic information is already widespread across many
customer sectors (eg central government, local authorities, land
& property professionals, utilities etc) and supports many hundreds
of private sector applications. An independent study in 1999 showed
that £100 billion of the GB GDP per annum is underpinned by
Ordnance Survey information. However, little of the information that
is collected, managed and used today can be easily cross-referenced
or interchanged; often, time and labour are required that do not
directly contribute to the customer's project goals. Ordnance Survey's
direction is driven by solving customer needs such as this.
To meet this challenge
Ordnance Survey has embarked on several parallel developments to
ensure that customers can start to concentrate on gaining greater
direct benefits from GI. This will be achieved by making major investments
in the data and service delivery infrastructure the organisation
provides. Key initiatives already underway aim to establish new
levels of customer care, supported by establishing new customer-friendly
on-line service delivery channels. The evolving information
infrastructure has been designed to meet national needs but is well
placed to support wider initiatives such as the emerging European
Spatial Data Infrastructure (ESDI) or INSPIRE as it is now called.
Since 1999 Ordnance
Survey has been independently financed through revenues from the
sale of goods. It is this freedom which is allowing the organisation
to further invest surplus revenues into the development of the new
infrastructure. Ordnance Survey's role is not to engage in the applications
market, but to concentrate on providing a high quality spatial data
infrastructure. We believe that the adoption of this common georeferencing
framework will support government, business and the citizen in
making key decisions in the future, based on joined-up geographic
information and thereby sound knowledge.
4. Borders in Cyberspace: Conflicting Public Sector Information Policies and their Economic Impacts
Peter Weiss
Strategic Planning and Policy Office, National Weather Service, National Oceanic and Atmospheric Administration (NOAA), USA
Many
nations are embracing the concept of open and unrestricted access
to public sector information -- particularly scientific, environmental,
and statistical information of great public benefit. Federal information
policy in the US is based on the premise that government information
is a valuable national resource and that the economic benefits to
society are maximized when taxpayer funded information is made available
inexpensively and as widely as possible. This policy is expressed
in the Paperwork Reduction Act of 1995 and in Office of Management
and Budget Circular No. A-130, “Management of Federal Information
Resources.” This policy actively encourages the development of a
robust private sector, offering to provide publishers with the raw
content from which new information services may be created, at no
more than the cost of dissemination and without copyright or other
restrictions. In other countries, particularly in Europe, publicly
funded government agencies treat their information holdings as a
commodity to be used to generate revenue in the short-term. They
assert monopoly control on certain categories of information in
an attempt -- usually unsuccessful -- to recover the costs of its
collection or creation. Such arrangements tend to preclude other
entities from developing markets for the information or otherwise
disseminating the information in the public interest. The US government
and the world scientific and environmental research communities
are particularly concerned that such practices have decreased the
availability of critical data and information. And firms in emerging
information dependent industries seeking to utilize public sector
information find their business plans frustrated by restrictive
government data policies and other anticompetitive practices.
5. Emerging Tools and Techniques for Data Handling

1. From GeoSpatial to BioSpatial: Managing Three-dimensional Structure Data in the Sciences
Xavier R. Lopez, Oracle Corporation
Standard relational database management technology is emerging
as a critical technology for managing the large volumes of 2D and
3D vector data being collected in the geographic and life sciences.
For example, database technology is playing an important role in
managing the terabytes of vector information used in environmental
modeling, emergency management, and wireless location-based services.
In addition, three-dimensional structure information is integral
to a new generation of drug discovery platforms. Three dimensional
structure-based drug design helps researchers generate high-quality
molecules that have better pharmacological properties. This type
of rational drug design is critically dependent on the comprehensive
and efficient representation of both large (macro) molecules and
small molecules. The macromolecules of interest are the large protein
molecules of enzymes, receptors, signal transducers, hormones, and
antibodies. With the recent availability of detailed structural
information about many of these macromolecule targets, drug discovery
is increasingly focused toward detailed structure-based analysis
of the interaction of the active regions of these large molecules
with candidate small-molecule drug compounds that might inhibit,
enhance, or otherwise therapeutically alter the activity of the
protein target. This paper will explain the means to manage
three-dimensional data types from the geosciences and biosciences in
object-relational database technology in order to benefit from the
performance, scalability, security, and reliability of commercial
software and hardware platforms. This paper will highlight recent
developments in database software technologies to address the 3D
requirements of the life science community.
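As a simplified, vendor-neutral illustration of holding 3D structure data in a relational database, the sketch below uses Python's built-in sqlite3 module with a hypothetical atoms table and a bounding-box query over atomic coordinates. It stands in for, and does not reproduce, the object-relational spatial and chemical extensions discussed in the talk, which add spatial indexing, richer 3D types and far greater scalability.

```python
import sqlite3

# Minimal relational model for 3D point data (atom coordinates), kept in memory.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE atoms (
        molecule TEXT,
        element  TEXT,
        x REAL, y REAL, z REAL
    )
""")
conn.executemany(
    "INSERT INTO atoms VALUES (?, ?, ?, ?, ?)",
    [
        ("ligand_1", "C", 1.2, 0.4, -0.8),   # invented coordinates
        ("ligand_1", "N", 2.0, 1.1, -0.2),
        ("ligand_2", "O", 9.5, 7.3, 4.1),
    ],
)

# A 3D bounding-box query: which atoms fall inside a region of interest?
box = {"xmin": 0, "xmax": 3, "ymin": 0, "ymax": 2, "zmin": -1, "zmax": 0}
rows = conn.execute(
    """SELECT molecule, element, x, y, z FROM atoms
       WHERE x BETWEEN :xmin AND :xmax
         AND y BETWEEN :ymin AND :ymax
         AND z BETWEEN :zmin AND :zmax""",
    box,
).fetchall()
print(rows)  # only the atoms of ligand_1 fall inside the box
```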
2. Benefits and Limitations of Mega-Analysis Illustrated using the WAIS
John J. McArdle, Department of Psychology, University of Virginia, USA
David Johnson, Building Engineering and Science Talent, San Diego, CA, USA
The
statistical features of the techniques of meta-analysis, based on
the summary statistics from many different studies, have been highly
developed and are widely used (Cook et al, 1994). However, there
are some key limitations to meta-analysis, especially the necessity
for equivalence of measurements and inferences about individuals
from groups. These problems led us to use an approach we have termed
“mega-analysis” (McArdle & Horn, 1980-1999). In this approach
all raw data from separate studies are used as a collective. The
techniques of mega-analysis rely on a variety of methods initially
developed for statistical problems of “missing data,” “selection
bias,” “factorial invariance,” “test bias,” and “multilevel analyses.”
In the mega-analysis of multiple sets of raw data, (a) the degree
to which data from different collections can be combined is raised
as a multivariate statistical question, (b) parameters can be estimated
with more breadth, precision, and reliability than can
be achieved by any single study, and (c) meta-analysis results emerge
as a byproduct, so the assumptions may be checked to demonstrate
whether a simpler meta-analysis is adequate. Mega-analysis techniques
are illustrated here using a collection of data from the popular
"Wechsler Adult Intelligence Scale" (WAIS), including data from thousands
of people in over 100 research studies.
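The contrast between the two approaches can be shown with a small numeric sketch (invented toy scores, not WAIS data): meta-analysis combines per-study summary statistics, here with inverse-variance weights, whereas mega-analysis pools the raw observations from all studies into a single analysis. The real techniques described above additionally model missing data, selection bias and factorial invariance, which this toy omits.

```python
# Toy example: the same three "studies" analysed two ways (invented scores).
studies_raw = [
    [98, 102, 105, 99],
    [110, 108, 112],
    [95, 97, 100, 96, 94],
]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Meta-analysis: combine the study means using inverse-variance weights.
weights, weighted_means = [], []
for scores in studies_raw:
    se2 = variance(scores) / len(scores)   # squared standard error of the mean
    weights.append(1 / se2)
    weighted_means.append(mean(scores) / se2)
meta_estimate = sum(weighted_means) / sum(weights)

# Mega-analysis: pool the raw observations and analyse them as one data set.
pooled = [score for scores in studies_raw for score in scores]
mega_estimate = mean(pooled)

print(round(meta_estimate, 2), round(mega_estimate, 2))
```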
3. Publication, Retrieval and Exchange of Data: an Emerging Web-based Global Solution
Henry Kehiaian
ITODYS, University of Paris 7, Paris, France
In the era of enhanced
electronic communication and world-wide development of information
systems, electronic publishing and the Internet offer powerful tools
for the dissemination of all types of scientific information. This
is now made available in electronic form in primary, secondary, as
well as tertiary sources. However, because of the multitude of existing
physico-chemical properties and variety of modes of their presentation,
the computer-assisted retrieval of the numerical values, their analysis
and integration in databases is as difficult as before. Accordingly,
the need to have standard data formats is more important than ever.
CODATA has joined forces with IUPAC and ICSTI to develop such formats.
Three years after its establishment the IUPAC-CODATA Task Group
on Standard Physico-Chemical Data Formats (IUCOSPED) has made significant
progress in developing the presentation of numerical property data,
as well as the relevant metadata, in standardized electronic format
(SELF).
The retrieval of SELFs is possible via a web-based specialized Central
Data Information Source, called DataExplorer, conceived as a portal
to data sources.
An Oracle database has been designed and developed for DataExplorer
at FIZ Karlsruhe, Germany. URL http://www.fiz-karlsruhe.de/dataexplorer/
ID: everyone; Password: sesame. DataExplorer is now fully operational
and demonstrates the concept using 4155 Chemical components, 998
Original Data Sources, 41 Property Types, and 3805 Standard Electronic
Data Files (SELF). Inclusion of additional data will be actively
pursued in the future.
A link has been established from DataExplorer to one of the associated
Publishers, the Data Center of the Institute of Chemical Technology,
Praha, Czech Republic.
Retooling SELF in SELF-ML, an XML version of the current SELF formats,
is under way.
Besides an on-line demonstration of DataExplorer from FIZ-Karlsruhe
and Praha, the procedure will be illustrated by computer demonstration
of two publications: (1) Vapor-Liquid Equilibrium Bibliographic
Database ; (2) ELDATA, the International Electronic Journal of Physico-Chemical
Data.
This Project was awarded
$100,000 under the ICSU (International Council for Science) Grants
Program 2000 for new innovative projects of high-profile potential.
Acknowledgments
We express our sincere thanks for the financial assistance of UNESCO
and ICSU and its associated organizations, IUPAC, CODATA and ICSTI,
for helpful discussions to IDF, IUCr, and CAS representatives, and
for the contributions of all IUCOSPED Members and Observers, to
FIZ Karlsruhe administration and its highly competent programmers,
and to all the associated Publishers.
4. Creating Knowledge from Computed Data for the Design of Materials
Erich Wimmer
Materials Design s.a.r.l., France and USA
The
dramatic progress in computational chemistry and materials science
has made it possible to carry out ‘high-throughput computations’
resulting in a wealth of reliable computed data including crystallographic
structures, thermodynamic and thermomechanical properties, adsorption
energies of molecules on surfaces, and electronic, optical and magnetic
properties. An exciting perspective comes from the application of
combinatorial methodologies, which allow the generation of large
sets of new compounds. High-throughput computations can be employed
to obtain a range of materials properties, which can be stored together
with subsequent (or parallel) experimental data. Furthermore, one
can include defects such as vacancies or grain boundaries in the
combinatorial space, and one can apply external pressure or stress
up to extreme conditions. Convenient graphical user interfaces facilitate
the construction of these systems, and efficient computational methods,
implemented on networked parallel computers of continuously growing
computational power, allow the generation of an unprecedented stream
of data. This lecture will discuss experience with a technology platform,
MedeA (Materials Exploration and Design Analysis), which has been developed
by Materials Design with the capabilities described above in mind.
Using heterogeneous catalysis as an example, I will illustrate how
chemical concepts can be combined with high-throughput computations
to transform the computed data into information and knowledge and
enable the design of novel materials.
6. Ethics in the Creation and Use of Scientific and Technical Data

1. Ethics and Values Relating to Scientific & Technical Data: Lessons from Chaos Theory
Joan E. Sieber, NSF
Current literature reveals manifold conflicting, shifting and cross-cutting
values to be reconciled if we are to pursue intelligent data-management
policies. Projects currently underway to deal with these complexities
and uncertainties suggest the inevitability of a paradigm shift.
Consider, e.g., questions of what data to archive, how extensively
to document it, how to maintain its accessibility despite changing
software and hardware, who should have access, how to allocate the
costs of sharing, and so on. Traditional normative ethical theories
(e.g., utilitarianism) can suggest guiding principles, and in today's
global culture, recent ethical (e.g., Rawlsian) notions such as
consideration of the interests of unborn generations and of persons
situated very differently from oneself suddenly have immediate practical
implications. However, such traditional approaches to ethical problem
solving offer little guidance for dealing with problems that are
highly contextual, complex, ill-defined, dynamic and fraught with
uncertainty. Narrowly defined safety issues give way to notions
of the ecology of life on Earth. Minor changes can have major consequences.
The stakeholders are not only scientists and engineers from one's
own culture, but persons, professions, businesses and governments
worldwide, as they exist today and into the future. Issues of scientific
freedom and openness are in conflict with issues of intellectual
property, national security, and reciprocity between organizations
and nations. Ethical norms, codes, principles, theories, regulations
and laws vary across cultures, and often have unintended consequences
that hinder ethical problem solving. Increasingly, effective ethical
problem solving depends on integration with scientific and technological
theory and "know how" and empirical research on the presenting
ethical problem. For example, we look increasingly to psychological
theories and legal concepts for clearer notions of privacy, and
to social experiments, engineering solutions and methodological
innovation for ways to assure confidentiality of data. We often
find that one solution does not fit all related problems.
Chaos theory has taught
us principles of understanding and coping with complexity and uncertainty
that are applicable to ethical problem solving of data-related issues.
Implications of chaos theory are explored in this presentation,
both as new tools of ethical problem solving and as concepts and
principles to include in the applied ethics education of students
in science and engineering.
2. Understanding and improving comparative data on science and technology
Denise Lievesley, UNESCO Institute for Statistics
Statistics can serve
to benefit society, but, when manipulated politically or otherwise,
may be used as instruments by the powerful to maintain the status
quo or even for the purposes of oppression. Statisticians working
internationally face a range of ethical problems as they try to
'make a difference' to the lives of the poorest people in the world.
One of the most difficult is the dilemma between open accountability
and national sovereignty (in relation to what data are collected,
the methods used and who is to have access to the results).
This paper will discuss
the role of the UNESCO Institute for Statistics (UIS), explain
some of the constraints under which we work, and address the principles
which govern our activities. The UIS is involved in:
- The collection and
dissemination of cross-nationally comparable data and indicators,
guardianship of these databases and support of, and consultation
with, users
- The analysis and
interpretation of cross-national data
- Special methodological
and technical projects including the development of statistical
concepts
- The development and
maintenance of international classifications, and standardised
procedures to promote comparability of data
- Technical capacity
building and other support for users and producers of data within
countries
- Establishing and
sharing good practice in statistics, supporting activities which
improve the quality of data and preventing the re-invention of
the wheel
- Advocacy for evidence-based
policies
Of these activities
one of the key ones is to foster the collection of comparable data
across nations, the main objectives being to enable countries to
gain a greater understanding of their own situation by comparing
themselves with others, thus learning from one another and sharing
good practice; to permit the aggregation of data across countries
to provide a global picture; and to provide information for purposes
of the accountability of nations and for the assessment, development
and monitoring of supra-national policies.
Denise Lievesley will
discuss the consultation being carried out by the UIS to ensure
that the data being collected on a cross-national basis are of relevance
to national policies on science and technology. The consultation
process was launched with an expert meeting where changes in science
policy were debated and ways in which the UIS might monitor and
measure scientific and technological activities and progress across
the world were identified. A background paper was produced based
on the experiences and inputs of experts from different regions
and organizations, which addresses key policy issues in science
and technology. The UIS will use this document as a basic reference
for direct consultation with UNESCO Member States and relevant institutions.
A long term strategy for the collection of science and technology
data will be developed as a result of these consultations.
It is vital to build
on the experience of developed countries through the important statistical
activities of OECD and Eurostat but nevertheless to ensure that
the collection of cross-nationally harmonised data does not distort
the priorities of poorer countries. We are seeking a harmony of
interests in data collection and use, and the views of the participants
will be sought as to how this might be achieved.
3. Ethics - An Engineers' View
Horst Kremers, Comp. Sci., Berlin, Germany
The engineering profession
has long experience in developing principles for appropriate relations
with clients, publishing Codes of Ethics, and developing and adhering
to laws controlling the conduct of professional practice. A high
demand exists in society for reliable engineering in planning, design,
construction and maintenance. One of the primary objectives of an
engineer's actions is to provide control over a situation by providing
independent advice in conformance with moral principles in addition
to sound engineering principles. In a world where life to an increasing
extent depends on the reliable functioning of complex information
systems and where new technologies emerge without the chance
for controlled experimentation and assessment, the need to inject
ethical principles into scientific and technological decision-making
and to fully consider the consequences of professional actions is
mandatory. This presentation reviews several Code of Ethics development
efforts and reflects on the Codes relative to action principles
in science and technology. A potential role for CODATA is presented.
4. Ethics in Scientific and Technical Communication
Hemanthi Ranasinghe, University of Sri Jayewardenepura, Sri Lanka
Research can be described as operationally successful when the research
objectives are achieved and technically successful when the researcher's
understanding is enhanced, more comprehensive hypotheses are developed
and lessons are learned from the experience. However, research is not
successful scientifically until the issues, processes and findings
are made known to the scientific community. Science is not an individual
experience. It is shared knowledge based on a common understanding
of some aspect of the physical or social world. For that reason,
the social conventions of science play an important role in establishing
the reliability of scientific knowledge. If these conventions are
disrupted, the quality of science can suffer. Thus, the reporting
of scientific research has to be right on ethical grounds too.
The general category of ethics in communication covers many things.
One is Error and Negligence in Science. Some researchers may feel
that the pressures on them are an inducement to haste at the expense
of care. For example, they may believe that they have to do substandard
work to compile a long list of publications and that this practice
is acceptable. Or they may be tempted to publish virtually the same
research results in two different places or publish their results
in "least publishable units"papers that are just
detailed enough to be published but do not give the full story of
the research project described.
Sacrificing quality to such pressures can easily backfire. A lengthy
list of publications cannot outweigh a reputation for shoddy research.
Scientists with a reputation for publishing a work of dubious quality
will generally find that all of their publications are viewed with
skepticism by their colleagues. Another vital aspect of unethical
behavior in scientific communication is Misconduct in Science. This
entails making up data or results (fabrication), changing or misreporting
data or results (falsification), and using the ideas or words of
another person without giving appropriate credit (plagiarism) - all
strike at the heart of the values on which science is based. These
acts of scientific misconduct not only undermine progress but the
entire set of values on which the scientific enterprise rests. Anyone
who engages in any of these practices is putting his or her scientific
career at risk. Even infractions that may seem minor at the time
can end up being severely punished. Frank and open discussion of
the division of credit within research groups - as early in the
research process as possible and preferably at the very beginning,
especially for research leading to a published paper - can prevent
later difficulties.
Misallocation of credit or errors arising from negligence are
matters that generally remain internal to the scientific community.
Usually they are dealt with locally through the mechanisms of peer
review, administrative action, and the system of appointments and
evaluations in the research environment. But misconduct in science
is unlikely to remain internal to the scientific community. Its
consequences are too extreme: it can harm individuals outside of
science (as when falsified results become the basis of a medical
treatment), it squanders public funds, and it attracts the attention
of those who would seek to criticize science. As a result, federal
agencies, Congress, the media, and the courts can all get involved.
All parts of the research system have a responsibility to recognize
and respond to these pressures. Institutions must review their own
policies, foster awareness of research ethics, and ensure that researchers
are aware of the policies that are in place. And researchers should
constantly be aware of the extent to which ethically based decisions
will influence their success as scientists.
7. CODATA 2015

1. Scholarly Information Architecture
Paul Ginsparg
Cornell University, USA
If we were to start
from scratch today to design a quality-controlled archive and distribution
system for scientific and technical information, it could take a
very different form from what has evolved in the past decade from
pre-existing print infrastructure. Ultimately, we might expect some
form of global knowledge network for research communications. Over
the next decade, there are many technical and non-technical issues
to address along the way, everything from identifying optimal formats
and protocols for rendering, indexing, linking, querying, accessing,
mining, and transmitting the information, to identifying sociological,
legal, financial, and political obstacles to realization of ideal
systems. What near-term advances can we expect in automated classification
systems, authoring tools, and next-generation document formats to
facilitate efficient datamining and long-term archival stability?
How will the information be authenticated and quality controlled?
What differences should be expected in the realization of these
systems for different scientific research fields? What is the proper
role of governments and their funding agencies in this enterprise,
and what might be the role of suitably configured professional societies?
These and related questions will be considered in light of recent
trends.
2. The role of scientific data in a complex world
Werner Martienssen
Physikalisches Institut der Universitaet, Frankfurt am Main, Germany
Physicists
try to understand and to describe the world in terms of natural
laws. These laws cover two quite different approaches in physics.
First, the laws show up a mathematical structure, which in general
is understood in terms of first principles, of geometrical relations
and of symmetry arguments. Second, the laws contain data which are
characteristic for the specific properties of the phenomena and
objects. Insight into the mathematical structure aims at an understanding
of the world in ever more universally applicable terms. Insight
into the data shows up the magnificent diversity of the world's
materials and ist behavior Whereas the description of the world
in terms of a unified theory one day might be reduced to only one
set of equations, the amount of data necessary to describe the phenomena
of the world in their full complexity seems to be open-ended.
A unified theory has not been formulated up to now; nor can we say
that our knowledge about the data is perfect in any sense.
Much still has to be done. But when asked where we expect
to be in data physics and chemistry in ten to fifteen years, my
answer is: we will - hopefully - be able to merge the two approaches
of physics. On the basis of our understanding of materials science
and by using the methods of computational physics, we will make use
both of the natural laws and of the complete set of known
data in order to model, to study and to generate new materials,
new properties and new phenomena.
3. Life Sciences Research in 2015
David Y. Thomas, Biochemistry Department, McGill University, Montreal, Canada
Much of the spectacular
progress of life sciences research in the past 30 years has come
from the application of molecular biology employing a reductionist
approach with single genes, often studied in simple organisms. Now
from the technologies of genomics and proteomics, scientists are
deluged with increasing amounts, varieties and quality of data.
The challenge is how life sciences researchers will use the data
output of discovery science to formulate questions and experiments
for their research and turn this into knowledge. What are the important
questions? We now have the capability to answer at a profound level
major biological problems of how genes function, how development
of organisms is controlled, and how populations interact at the
cellular, organismal and population levels. What data and what tools
are needed? What skills and training will be needed for the next
generation of life sciences researchers? I will discuss some of
the initiatives that are planned or now underway to address these
problems.