18th International Conference
Abstracts: Keynote and Invited Cross-Cutting Themes
1.
The Challenge of Archiving and Preserving Remotely Sensed Data
John L. Faundeen
US Geological Survey, EROS Data Center, Sioux Falls, SD 57198-0001
Few would question the need to archive the scientific and technical (S&T) data generated by researchers. At a minimum, the data are needed for change analysis. Likewise, most people would value efforts to ensure the preservation of the archived S&T data. Future generations will use analysis techniques not even considered today. Until recently, archiving and preserving these data were usually accomplished within existing infrastructures and budgets. As the volume of archived data increases, however, organizations charged with archiving S&T data will be increasingly challenged. The US Geological Survey has had experience in this area and has developed strategies to deal with the mountain of land remote sensing data currently being managed and the tidal wave of expected new data. The Agency has dealt with archiving issues, such as selection criteria, purging, advisory panels, and data access, and has met with preservation challenges involving photographic and digital media.
2.
The Virtual Observatory: The Future of Data and Information Management in Astrophysics
David Schade
Canadian Astronomy Data Centre, Herzberg Institute of Astrophysics, National Research Council, Canada
The concept of a “Virtual Observatory”, which would put the power of numerous ground-based and space-based observatories at the fingertips of astrophysical scientists, was once a pipe dream but is now represented by funded projects in Canada, the United States, the United Kingdom, and Europe. Astronomical data has been primarily digital for 15 years, and the change from analogue (e.g. photographic plates) to digital form triggered an appreciation for the scientific value of data “archiving” and the development of astronomy data centres around the world. These facilities do much more than passively “archive” their content. They have scientific and technical staff that develop the means to add value to datasets by additional processing, they integrate datasets from different wavelength regimes with one another, they distribute those data via the web, and they actively promote the use of archival data. The next step is to federate the diverse and complementary collections residing in data centres around the world and develop seamless means for users to simultaneously access and query multi-wavelength databases and pixels and to provide the computational resources for cross-correlation and other processing. In analogy to “the greatest encyclopedia that has ever existed” that has effectively come into being because of the internet, the Virtual Observatory will be an historic leap forward in the ability of scientists, and all human beings, to understand the universe we are part of.
3.
Towards a New Knowledge of Global Climate Changes: Meteorological Data Archiving and Processing Aspects
Alexander M. Sterin
All-Russian Research Institute of Hydrometeorological Information (RIHMI-WDC), Russia
This presentation will focus on a wide range of aspects related to meteorological data utilization for getting new empirical information on climate variations. The problems of meteorological data collection, their quality assurance and control, and their archiving will be discussed.
The first and main focus will be on the problem of environmental data archiving and preservation. The collection of the All-Russian Research Institute for Hydrometeorological Information - World Data Center (RIHMI-WDC) is currently stored on 9-track magnetic tapes, about 60,000 volumes in total. These archiving media are obsolete, so urgent efforts to move the collection onto modern media are beginning.
The second focus will be on the multi-level approach to constructing informational products based on primary meteorological observational data. In this approach, the lowest level (level zero) holds the raw observational data. The next level (level one) holds the observational data that have passed quality-check procedures; normally, erroneous and suspicious data are flagged at level one. The higher levels contain derivative data products. It appears that most customers prefer specialized derivative data products, which are based on the primary data but have far easier formats and modest volumes, to the primary observational data themselves, which have more complicated formats and huge volumes. The multi-level structure of derivatives for climate studies includes derivatives computed directly from the observational data, higher-level derivatives based on further generalization of lower-level products, and so on. Examples of such a multi-level structure of data products will be given.
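To make the level structure concrete, the sketch below builds a tiny level-0 table, flags suspect values to form level 1, and derives a level-2 monthly-mean product. The station identifiers, variable names and quality thresholds are invented for illustration and are not taken from the RIHMI-WDC archive.

```python
import pandas as pd

# Level 0: raw observational data as reported by stations (invented values).
level0 = pd.DataFrame({
    "station": ["27612", "27612", "27612", "26063"],
    "date": pd.to_datetime(["2002-01-01", "2002-01-02", "2002-01-03", "2002-01-01"]),
    "t2m_c": [-8.4, -65.0, -7.9, -3.2],   # -65.0 is an implausible 2 m temperature
})

# Level 1: the same records carrying quality-control flags; erroneous or
# suspicious values are flagged rather than deleted.
level1 = level0.assign(suspect=~level0["t2m_c"].between(-60.0, 45.0))

# Level 2 (a derivative product): monthly mean temperature per station,
# computed only from observations that passed the quality check.
good = level1[~level1["suspect"]]
level2 = (good.groupby(["station", pd.Grouper(key="date", freq="MS")])["t2m_c"]
              .mean()
              .rename("monthly_mean_t2m_c"))
print(level2)
```

Higher-level products (anomalies, trend estimates and the like) would be built from the level-2 series in the same cascading fashion.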
The third focus will be on the cycles of data processing that are required for large, data-based climate-related projects. Experience shows that it is important to preserve and to reuse the observational data collections and to repeat the main calculations: the preservation of primary observational data is essential because the higher-level products may have to be recalculated "from the very beginning." It appears that these cycles normally need to be repeated once (or even more than once) per decade.
The last focus will be on the software instrumentation used to obtain new information and new knowledge about climate change. The technological aspects of processing huge volumes of data in various formats will be described.
4.
Strategies for Selection and Appraisal of Scientific Data for Preservation
Seamus Ross, University of Glasgow and Principal Director, ERPANET, UK
With many governments and commercial organisations creating kilometres of analogue documents every year, archivists have long been confronted with the challenge of handling substantial quantities of records. Recognising the impossibility of retaining this material and documenting it in ways that would enable future users to discover and use it, archivists developed the concept of appraisal. Typically, archives retain only between 5% and 10% of the records created by an organisation. Indeed, effective retention and disposal strategies have proven essential in ensuring that sufficient material is retained to provide an adequate record of our cultural, scientific, and commercial heritage. As we make the transition from a paper-based world to a digital one, archivists continue to recognise the power of appraisal as they attempt to manage the increasing amounts of material created digitally. The concepts that underlie appraisal are poorly understood outside the narrow confines of the archival world, but a wider appreciation of them might bring benefits to other data-creating and data-using communities.
Appraisal is both a technical process and an intellectual activity that requires knowledge, research, and imagination on the part of the appraiser. At its simplest, appraisal involves establishing the value of continuing to retain and document data or records: what administrative, evidential, informational, legal, or re-usable value does a record, document, or data set have? The problem is of course compounded in the digital environment by the technical aspects of the material itself. Does technology change the processes, timeframe or relevance of appraisal? Or, to paraphrase the InterPARES Appraisal Task Force Report (January 2001), what impact does it have on ensuring that material of 'lasting value is preserved in authentic form'?
After charting the mechanisms and processes for appraisal, the paper examines how the digital environment has focused attention on establishing, during the appraisal process, whether or not it is feasible to maintain the authenticity and integrity of digital objects over time, and what impact this has on the process and on the point in the life of a digital object at which it must be appraised. The paper concludes by building on this work to examine the impact of a formal process of appraisal on the archiving of scientific data sets: who should be involved in and responsible for the process, what appraisal criteria might be appropriate, and at what stage in the life cycle of a digital object appraisal should be carried out.
1.
Steven Tepp
US Copyright Office, Library of Congress, USA
The United States has a long history of providing legal protection against the unauthorized use of compilations of scientific and technical data. That protection, once broad and vigorous, is now diffuse and uncertain. In light of modern Supreme Court precedent, the U.S. Congress has struggled for several years to find the appropriate balance between providing an incentive for the creation of useful compilations of data through legal protections which allow the compiler to reap commercial benefit from his work and promoting the progress of science and useful arts by allowing researchers and scientists to have unfettered access to and use of such databases. My presentation will outline the history and current state of the legal protection afforded to databases in the United States and will then discuss the different legislative models of legal protection that have been the subject of considerable debate in the U.S. Congress in recent years.
2.
Legal (dis)incentives for creating, disseminating, utilizing and sharing data for scientific and technical purposes
Kenji Naemura
Keio University, Shonan-Fujisawa Campus, Japan
While Japanese policy makers differ on practical strategies for recovery and growth after a decade of economic struggle, they all agree that, in restructuring industry for a competitive environment, more vital roles should be played by advanced S&T, as well as by improved organizational and legal schemes. It is with this view that national research institutions have undergone structural reforms, and that national universities are to follow them in the near future.
Many of the enhanced legal schemes - e.g., patents to be granted to inventions in novel areas, copyrights of digital works, and other forms of IPRs - are supposed to give incentives for S&T researchers to commercialize their results. However, some schemes - e.g., private data and security protections - may become disincentives for them to disseminate, utilize and share the results.
Thus the sui generis protection of databases introduced by the EU Directive of 1996 has raised a serious concern in the scientific community. The Science Council of Japan conducted a careful study in its subcommittee on the possible merits and demerits of introducing a similar legal protection framework in this country. Its result was published as a declaration of its 136th Assembly on October 17, 2001. It emphasized "the principle of free exchange of views and data for scientific research and education" and, expressing its opposition against a new type of legal right in addition to the copyright, stated that caution should be exercised in dealing with the international trend toward such legislation.
There are various factors that need to be considered in evaluating the advantages and disadvantages of legal protection of S&T data. They relate to the nature of the research area, the data, the originating organization, the research funding, the user and his/her purpose of use, and so on. Geographical, linguistic, cultural and economic conditions should also be considered when studying the consequences. After all, incentives for advancing S&T may not be easily translated into economic figures, but other types of contribution to a humane society must be valued more highly.
3.
Scientific and Technical Data Policy and Management in China
Sun Honglie
Chinese Academy of Sciences, Beijing, China
The 21st century is known as an information era, in which scientific and technical data, as an important information source, will have significant effects on the social and economic development of the world. Scientific and technical data contain academic, economic, social and other values. However, the basic ways of deriving the greatest value from scientific data are not just in their creation and storage, but in their dissemination and wide application. In this regard, issues of scientific and technical data policies and management have been considered as a strategic measure in the national information system and in the scientific and technical innovation programs in China. So far, scientific and technical data policy and management in China has made progress, in which:
a) A preliminary working pattern of scientific and technical data management has been shaped - the main lead being taken by government-professional sections, with scientific institutes and universities serving a subsidiary role;
b) Digitization and networking are becoming more and more universal; and
c) Professional data management organizations are being formed and expanded.
At present, the scientific and technical data policy and management in China are mainly focused on: establishing and implementing the rules for "management and sharing of national scientific and technical data"; initiating a special project for the construction of a national scientific and technical data sharing system; and developing measures for the management of this data sharing system.
4.
A Contractually Reconstructed Research Commons for Scientific Data in a Highly Protectionist Intellectual Property Environment
J.H. Reichman, Duke University School of Law, Durham, NC, USA
and
Paul F. Uhlir, The National Academies, Washington, DC, USA
There are a number of well-documented economic, legal, and technological efforts to privatize government-generated and commercialize government-funded scientific data in the United States that were heretofore freely available from the public domain or on an "open access" basis. If these pressures continue unabated, they will likely lead to a disruption of long-established scientific research practices and to the loss of new opportunities that digital networks and related technologies make possible. These pressures could elicit one of two types of responses. One is essentially reactive, in which the public scientific community adjusts as best it can without organizing a response to the increasing encroachment of a commercial ethos upon its upstream data resources. The other would require science policy to address the challenge by formulating a strategy that would enable the scientific community to take charge of its basic data supply and to manage the resulting research commons in ways that would preserve its public good functions without impeding socially beneficial commercial opportunities. Under the latter option, the objective would be to reinforce and recreate, by voluntary means, a public space in which the traditional sharing ethos of science can be preserved and insulated from the commodifying trends. This presentation will review some approaches that the U.S. scientific community might consider in addressing this challenge, and that could have broader applicability to scientific communities outside the United States.
1.
Interoperability in Geospatial Web Services
Jeff de La Beaujardiere
NASA Goddard Space Flight Center, USA
This talk will outline recent work on open standards for implementing interoperable geospatial web services. Beginning in 1999, a series of Testbeds--operated by the OpenGIS Consortium (OGC), sponsored in part by US federal agencies, and involving the technical participation of industry, government and academia--has developed specifications and working implementations of geographic services to be deployed over HTTP. Pilot Projects and Technology Insertion Projects have tested and deployed these standards in real-world applications.
These information-access services can provide an additional layer of interoperability above the data search capabilities provided by National Spatial Data Infrastructure (NSDI) Clearinghouse nodes. The Web Map Service (WMS; published 2000) provides graphical renderings of geodata. The Web Feature Service (WFS; 2002) provides point, line and vector feature data encoded in the XML-based Geography Markup Language (GML; 2001). The Web Coverage Service (WCS; in preparation) provides gridded or ungridded coverage data. Additional specifications for catalog, gazetteer, and fusion services are also in progress. This talk will provide an overview of these efforts and indicate current areas of application.
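As a concrete illustration (not drawn from the talk itself), a WMS client requests a rendered map with a single HTTP GET. The query parameters below are those defined by the OGC Web Map Service specification; the server URL and layer name are hypothetical.

```python
import urllib.parse

WMS_BASE = "http://example.gov/wms"      # hypothetical WMS endpoint
params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "global_land_cover",       # hypothetical layer name
    "STYLES": "",                        # default styling
    "SRS": "EPSG:4326",                  # geographic latitude/longitude
    "BBOX": "-180,-90,180,90",           # minx,miny,maxx,maxy
    "WIDTH": "800",
    "HEIGHT": "400",
    "FORMAT": "image/png",
}
print(WMS_BASE + "?" + urllib.parse.urlencode(params))
# Fetching this URL (e.g. with urllib.request.urlopen) returns a PNG rendering;
# a WFS GetFeature request is composed the same way but returns GML features.
```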
2.
Santiago Borrero
Global Spatial Data Infrastructure (GSDI), Instituto Geografico Agustin Codazzi, Colombia
The availability of spatial data infrastructure (SDI) capabilities at all levels, backed by international standards, guidelines and policies on access to data, is needed to support sustainable human development and to derive scientific, economic and social benefits from spatial information.
In this context, this paper focuses on the need for and the current situation regarding spatial data infrastructures, in particular from the perspective of the developing world. To this end, the author (i) presents the aims, scope and expected contributions of GSDI and PC IDEA; and (ii) then, based on these initiatives and their business plans, presents observations on the possibilities for improved data availability and interoperability. More than 50 nations are in the process of developing SDI capabilities, and the number of geodata-related initiatives at all levels is increasing. Finally, the author evaluates the need for better cooperation and coordination among spatial data initiatives and, where feasible and convenient, for integration to facilitate data access, sharing and applicability.
3.
Interoperability of Biological Data Resources
Hideaki Sugawara, National Institute of Genetics, Japan
Biological data resources are composed of databases and data mining tools. The International Nucleotide Sequence Database (DDBJ/EMBL/GenBank) and homology search programs are typical resources that are indispensable to the life sciences and biotechnology. In addition to these fundamental resources, a large number of resources are available on the Internet, e.g. those listed in the annual database issue of the journal Nucleic Acids Research.
Biological data objects span widely: from molecules to phenotypes, from viruses to mammoths, from the bottom of the sea to outer space.
Users' profiles are also wide and diverse; a user may, for example, wish to find anticancer drugs from any organism anywhere by crosscutting heterogeneous data resources distributed across various categories and disciplines. Users often find novel ways of using a resource that its developers did not imagine. Biological data resources have often been developed ad hoc, without international guidance on standardization, resulting in heterogeneous systems. Crosscutting them is therefore a hard task for bioinformaticians. It is not practical to reform large legacy systems in accordance with a standard, even if a standard is created.
Interoperability may be a solution for providing an integrated view of heterogeneous data sources distributed across many disciplines and in distant places. We studied the Common Object Request Broker Architecture (CORBA) and found it quite useful for making data sources interoperable within a local area network. Nevertheless, it is not straightforward to use CORBA to integrate data resources across firewalls; CORBA is not firewall-friendly.
Recently, XML (eXtensible Markup Language) has become widely tested and used in so-called e-Business, and it has also been extensively applied to biology. However, defining a Document Type Definition (DTD) or an XML schema is not sufficient for the interoperability of biological data resources, because multiple groups define different XML documents for the same biological object. These heterogeneous XML documents can be made interoperable by use of SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and UDDI (Universal Description, Discovery and Integration). The author will introduce the implementation and evaluation of these technologies in WDCM (http://wdcm.nig.ac.jp), Genome Information Broker (http://gib.genes.nig.ac.jp/) and DDBJ (http://xml.nig.ac.jp).
DDBJ: DNA Data Bank of Japan
EMBL: European Molecular Biology Laboratory
GenBank: National Center for Biotechnology Information
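As a hedged sketch of the web-services approach described above, the snippet below hand-builds a SOAP envelope and posts it over HTTP. The endpoint, namespace and operation name are hypothetical placeholders, not the actual DDBJ or WDCM service definitions (those would be described by their WSDL files).

```python
import urllib.request

ENDPOINT = "http://example.org/soap/homology"   # hypothetical SOAP endpoint
envelope = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <searchSimple xmlns="urn:example:homology">  <!-- hypothetical operation -->
      <program>blastn</program>
      <database>ddbj</database>
      <query>ATGCGTACGTTAGC</query>
    </searchSimple>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "urn:example:homology#searchSimple"},
)
# A WSDL document for such a service would describe this operation and its
# message types, so that client stubs could be generated instead of building
# the XML by hand, and a UDDI registry would let clients discover the service.
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))
```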
4.
The Open Archives Initiative: A low-barrier framework for interoperability
Carl Lagoze
Department of Computer Science, Cornell University, USA
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is the result of work in the ePrints, digital library, and museum communities to develop a practical and low-barrier foundation for data interoperability. The OAI-PMH provides a method for data repositories to expose metadata in various forms about their content. Harvesters may then access this metadata to build value-added services. This talk will review the history and technology behind the OAI-PMH and describe applications that build on it.
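For illustration, a minimal OAI-PMH harvester can be written with nothing more than HTTP GET requests and XML parsing. The repository base URL below is hypothetical; the verb and arguments (ListRecords, metadataPrefix, resumptionToken) come from the OAI-PMH specification.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"   # hypothetical repository endpoint
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(metadata_prefix="oai_dc"):
    """Yield <record> elements, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = BASE_URL + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.iter(OAI_NS + "record"):
            yield record
        token = tree.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break   # no more pages
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for rec in harvest():
    identifier = rec.find(OAI_NS + "header/" + OAI_NS + "identifier")
    print(identifier.text if identifier is not None else "(no identifier)")
```

A service provider would store the harvested Dublin Core (or richer) records locally and build search or alerting services on top of them.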
1.
Legal Protection of Databases and Science in the "European Research Area":
Economic Policy and IPR Practice in the Wake of the 1996 EC Directive
Paul A. David
Stanford University and All Souls College, Oxford
At the Lisbon Meeting of the European Council in March 2000, the member states agreed that the creation of a "European Research Area" should be a high-priority goal of EU and national government policies in the coming decade. Among the policy commitments taking shape are those directed toward substantially raising the level of business R&D expenditures, not only by means of subsidies and fiscal tools (e.g., tax incentives), but also through intellectual property protections aimed at "improving the environment" for business investment in R&D. The Economic Commission of the EU is currently preparing recommendations for the implementation of IP protections in future Framework Programmes and related mechanisms that fund R&D projects, including policies affecting the use of the legal protections afforded to database owners under the national implementations of the EC's Directive of March 11, 1996. This paper reviews the economic issues of IPR in databases and the judicial experience and policy pressures that have developed in Europe in the years following the implementation of the EC's directive, and it attempts to assess the likely implications these will carry for scientific research in the ERA.
2.
International Protection of Non-Original Databases
Helga Tabuchi
Copyright Law Division, WIPO, Geneva, Switzerland
At the request of its member States, the International Bureau of the World Intellectual Property Organization (WIPO) commissioned external consultants to prepare economic studies on the impact of the protection of non-original databases. The studies were requested to be broad, covering not only economic issues in a narrow sense, but also social, educational and access to information issues. The consultants were furthermore expected to focus in particular on the impacts in developing, least developed and transition economies.
Five of the studies were completed in early 2002 and were submitted to the Standing Committee on Copyright and Related Rights at its seventh session in May 2002. The positions of the consultants differ significantly. The studies are available on WIPO's website at <http://www.wipo.int/eng/meetings/2002/sccr/index_7.htm>.
Most recently, another consultant has been commissioned to prepare an additional study that focuses on Latin America and the Caribbean region. The study will be submitted to the Committee in due course.
3.
The Digital National Framework: Underpinning the Knowledge Economy
Keith Murray
Geographic Information Strategy, Ordnance Survey, UK
Decision making requires knowledge, knowledge requires reliable information and reliable information requires data from several sources to be integrated with assurance. An underlying factor in many of these decisions is geography within an integrated geographic information infrastructure.
In Great Britain, the use of geographic information is already widespread across many customer sectors (e.g. central government, local authorities, land & property professionals, utilities) and supports many hundreds of private sector applications. An independent study in 1999 showed that £100 billion of GB GDP per annum is underpinned by Ordnance Survey information. However, little of the information that is collected, managed and used today can be easily cross-referenced or interchanged; often time and labour are required that do not directly contribute to the customer's project goals. Ordnance Survey's direction is driven by solving customer needs such as this.
To meet this challenge Ordnance Survey has embarked on several parallel developments to ensure that customers can start to concentrate on gaining greater direct benefits from GI. This will be achieved by making major investments in the data and service delivery infrastructure the organisation provides. Key initiatives already underway aim to establish new levels of customer care, supported by new, customer-friendly on-line service delivery channels. The evolving information infrastructure has been designed to meet national needs but is well placed to support wider initiatives such as the emerging European Spatial Data Infrastructure (ESDI), or INSPIRE as it is now called.
Since 1999 Ordnance Survey has been independently financed through revenues from the sale of goods. It is this freedom which is allowing the organisation to further invest surplus revenues in the development of the new infrastructure. Ordnance Survey's role is not to engage in the applications market, but to concentrate on providing a high-quality spatial data infrastructure. We believe that the adoption of this common georeferencing framework will support government, business and the citizen in making key decisions in the future, based on joined-up geographic information and thereby sound knowledge.
4.
Borders in Cyberspace: Conflicting Public Sector Information Policies and their Economic Impacts
Peter Weiss
Strategic Planning and Policy Office, National Weather Service, National Oceanic and Atmospheric Administration (NOAA), USA
Many nations are embracing the concept of open and unrestricted access to public sector information -- particularly scientific, environmental, and statistical information of great public benefit. Federal information policy in the US is based on the premise that government information is a valuable national resource and that the economic benefits to society are maximized when taxpayer funded information is made available inexpensively and as widely as possible. This policy is expressed in the Paperwork Reduction Act of 1995 and in Office of Management and Budget Circular No. A-130, “Management of Federal Information Resources.” This policy actively encourages the development of a robust private sector, offering to provide publishers with the raw content from which new information services may be created, at no more than the cost of dissemination and without copyright or other restrictions. In other countries, particularly in Europe, publicly funded government agencies treat their information holdings as a commodity to be used to generate revenue in the short-term. They assert monopoly control on certain categories of information in an attempt – usually unsuccessful -- to recover the costs of its collection or creation. Such arrangements tend to preclude other entities from developing markets for the information or otherwise disseminating the information in the public interest. The US government and the world scientific and environmental research communities are particularly concerned that such practices have decreased the availability of critical data and information. And firms in emerging information dependent industries seeking to utilize public sector information find their business plans frustrated by restrictive government data policies and other anticompetitive practices.
1.
Xavier R. Lopez, Oracle Corporation
Standard relational database management technology is emerging as a critical technology for managing the large volumes of 2D and 3D vector data being collected in the geographic and life sciences. For example, database technology is playing an important role in managing the terabytes of vector information used in environmental modeling, emergency management, and wireless location-based services. In addition, three-dimensional structure information is integral to a new generation of drug discovery platforms. Three-dimensional structure-based drug design helps researchers generate high-quality molecules that have better pharmacological properties. This type of rational drug design is critically dependent on the comprehensive and efficient representation of both large (macro) molecules and small molecules. The macromolecules of interest are the large protein molecules of enzymes, receptors, signal transducers, hormones, and antibodies. With the recent availability of detailed structural information about many of these macromolecule targets, drug discovery is increasingly focused toward detailed structure-based analysis of the interaction of the active regions of these large molecules with candidate small-molecule drug compounds that might inhibit, enhance, or otherwise therapeutically alter the activity of the protein target. This paper will explain the means to manage the three-dimensional types from the geosciences and biosciences in object-relational database technology in order to benefit from the performance, scalability, security, and reliability of commercial software and hardware platforms. This paper will highlight recent developments in database software technologies to address the 3D requirements of the life science community.
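The following is a minimal, self-contained sketch of the general idea of holding 3D structure data in a relational store and querying it spatially. It uses SQLite with plain x/y/z columns and an invented schema as a stand-in; the system described in the paper would instead rely on object-relational spatial types and spatial indexing in a commercial database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE atom (
        molecule TEXT,      -- e.g. a protein or candidate compound id
        name     TEXT,      -- atom label
        x REAL, y REAL, z REAL
    )
""")
conn.executemany(
    "INSERT INTO atom VALUES (?, ?, ?, ?, ?)",
    [
        ("target_protein", "CA", 12.1, 4.3, -7.8),   # illustrative coordinates
        ("target_protein", "NZ", 13.0, 5.1, -6.9),
        ("ligand_42",      "O1", 12.6, 4.8, -7.2),
    ],
)

# Find candidate-compound atoms inside a 3D box around the protein's active
# site -- the relational analogue of a spatial range (window) query.
box = {"xmin": 11.0, "xmax": 14.0, "ymin": 3.0, "ymax": 6.0,
       "zmin": -9.0, "zmax": -6.0}
rows = conn.execute(
    """SELECT molecule, name, x, y, z FROM atom
       WHERE molecule != 'target_protein'
         AND x BETWEEN :xmin AND :xmax
         AND y BETWEEN :ymin AND :ymax
         AND z BETWEEN :zmin AND :zmax""",
    box,
).fetchall()
print(rows)
```

In a production spatial database the box predicate would be replaced by a native 3D geometry type and an R-tree-style index, which is what makes such queries scale to terabytes of vector data.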
2.
Benefits and Limitations of Mega-Analysis Illustrated Using the WAIS
John J. McArdle, Department of Psychology, University of Virginia, USA
David Johnson, Building Engineering and Science Talent
The statistical features of the techniques of meta-analysis, based on the summary statistics from many different studies, have been highly developed and are widely used (Cook et al., 1994). However, there are some key limitations to meta-analysis, especially the necessity for equivalence of measurements and the difficulty of drawing inferences about individuals from group summaries. These problems led us to use an approach we have termed "mega-analysis" (McArdle & Horn, 1980-1999), in which all raw data from the separate studies are used as a collective. The techniques of mega-analysis rely on a variety of methods initially developed for the statistical problems of "missing data," "selection bias," "factorial invariance," "test bias," and "multilevel analyses." In the mega-analysis of multiple sets of raw data (a) the degree to which data from different collections can be combined is raised as a multivariate statistical question, (b) parameters can be estimated with more breadth, precision, and reliability than can be achieved by any single study, and (c) meta-analysis results emerge as a byproduct, so the assumptions may be checked and it can be demonstrated why a simpler meta-analysis is adequate. Mega-analysis techniques are illustrated here using a collection of data from the popular "Wechsler Adult Intelligence Scale" (WAIS), including data from thousands of people in over 100 research studies.
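A minimal sketch of the pooled-raw-data idea follows (not the authors' actual models, and not the real WAIS data): three synthetic studies are concatenated and fitted with a single mixed model grouped by study, and the result can be compared with the average of per-study estimates that a summary-based meta-analysis would use.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
studies = []
for study_id, n in [("study_a", 120), ("study_b", 80), ("study_c", 200)]:
    age = rng.uniform(20, 80, n)
    score = 110 - 0.3 * age + rng.normal(0, 8, n)   # synthetic WAIS-like score
    studies.append(pd.DataFrame({"study": study_id, "age": age, "score": score}))
pooled = pd.concat(studies, ignore_index=True)

# Mega-analysis: one mixed model over all raw data, with study-level grouping.
mega = smf.mixedlm("score ~ age", pooled, groups=pooled["study"]).fit()
print("pooled age slope:", mega.params["age"])

# Meta-analysis analogue: average the slopes estimated separately per study.
per_study_slopes = [smf.ols("score ~ age", d).fit().params["age"] for d in studies]
print("mean of per-study slopes:", np.mean(per_study_slopes))
```

With real multi-study data the pooled fit would also need the measurement-equivalence, selection and missing-data machinery mentioned in the abstract.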
3.
Publication, Retrieval and Exchange of Data: an Emerging Web-based Global Solution
Henry Kehiaian
ITODYS, University of Paris 7, Paris, France
In the era of enhanced electronic communication and world-wide development of information systems, electronic publishing and the Internet offer powerful tools for the dissemination of all types of scientific information, which is now made available in electronic form by primary, secondary, as well as tertiary sources. However, because of the multitude of existing physico-chemical properties and the variety of modes of their presentation, the computer-assisted retrieval of numerical values and their analysis and integration in databases remain as difficult as before. Accordingly, the need for standard data formats is more important than ever. CODATA has joined forces with IUPAC and ICSTI to develop such formats. Three years after its establishment, the IUPAC-CODATA Task Group on Standard Physico-Chemical Data Formats (IUCOSPED) has made significant progress in developing the presentation of numerical property data, as well as the relevant metadata, in standardized electronic format (SELF).
The retrieval of SELFs is possible via a web-based specialized Central Data Information Source, called DataExplorer, conceived as a portal to data sources. An Oracle database has been designed and developed for DataExplorer at FIZ Karlsruhe, Germany (URL http://www.fiz-karlsruhe.de/dataexplorer/; ID: everyone; Password: sesame). DataExplorer is now fully operational and demonstrates the concept using 4155 chemical components, 998 original data sources, 41 property types, and 3805 Standard Electronic Data Files (SELF). Inclusion of additional data will be actively pursued in the future.
A link has been established from DataExplorer to one of the associated publishers, the Data Center of the Institute of Chemical Technology, Praha, Czech Republic. Retooling SELF in SELF-ML, an XML version of the current SELF formats, is under way.
Besides an on-line demonstration of DataExplorer from FIZ Karlsruhe and Praha, the procedure will be illustrated by computer demonstration of two publications: (1) the Vapor-Liquid Equilibrium Bibliographic Database; (2) ELDATA, the International Electronic Journal of Physico-Chemical Data.
This project was awarded $100,000 under the ICSU (International Council for Science) Grants Program 2000 for new, innovative projects of high-profile potential.
Acknowledgments
We express our sincere thanks to UNESCO and ICSU and its associated organizations, IUPAC, CODATA and ICSTI, for financial assistance; to the IDF, IUCr, and CAS representatives for helpful discussions; to all IUCOSPED Members and Observers for their contributions; to the FIZ Karlsruhe administration and its highly competent programmers; and to all the associated publishers.
4.
Creating Knowledge from Computed Data for the Design of Materials
Erich Wimmer
Materials Design s.a.r.l., France and USA
The dramatic progress in computational chemistry and materials science has made it possible to carry out ‘high-throughput computations’, resulting in a wealth of reliable computed data including crystallographic structures, thermodynamic and thermomechanical properties, adsorption energies of molecules on surfaces, and electronic, optical and magnetic properties. An exciting perspective comes from the application of combinatorial methodologies, which allow the generation of large sets of new compounds. High-throughput computations can be employed to obtain a range of materials properties, which can be stored together with subsequent (or parallel) experimental data. Furthermore, one can include defects such as vacancies or grain boundaries in the combinatorial space, and one can apply external pressure or stress up to extreme conditions. Convenient graphical user interfaces facilitate the construction of these systems, and efficient computational methods, implemented on networked parallel computers of continuously growing computational power, allow the generation of an unprecedented stream of data. This lecture will discuss experience with a technology platform, MedeA (Materials Exploration and Design Analysis), which has been developed by Materials Design with the capabilities described above in mind. Using heterogeneous catalysis as an example, I will illustrate how chemical concepts can be combined with high-throughput computations to transform the computed data into information and knowledge and enable the design of novel materials.
1.
Ethics and Values Relating to Scientific & Technical Data:
Lessons from Chaos Theory
Joan E. Sieber, NSF
Current literature reveals manifold conflicting, shifting and cross-cutting values to be reconciled if we are to pursue intelligent data-management policies. Projects currently underway to deal with these complexities and uncertainties suggest the inevitability of a paradigm shift. Consider, e.g., questions of what data to archive, how extensively to document it, how to maintain its accessibility despite changing software and hardware, who should have access, how to allocate the costs of sharing, and so on. Traditional normative ethical theories (e.g., utilitarianism) can suggest guiding principles, and in today's global culture, recent ethical (e.g., Rawlsian) notions such as consideration of the interests of unborn generations and of persons situated very differently from oneself suddenly have immediate practical implications. However, such traditional approaches to ethical problem solving offer little guidance for dealing with problems that are highly contextual, complex, ill-defined, dynamic and fraught with uncertainty. Narrowly defined safety issues give way to notions of the ecology of life on Earth. Minor changes can have major consequences. The stakeholders are not only scientists and engineers from one's own culture, but persons, professions, businesses and governments worldwide, as they exist today and into the future. Issues of scientific freedom and openness are in conflict with issues of intellectual property, national security, and reciprocity between organizations and nations. Ethical norms, codes, principles, theories, regulations and laws vary across cultures, and often have unintended consequences that hinder ethical problem solving. Increasingly, effective ethical problem solving depends on integration with scientific and technological theory and "know how" and on empirical research on the presenting ethical problem. For example, we look increasingly to psychological theories and legal concepts for clearer notions of privacy, and to social experiments, engineering solutions and methodological innovation for ways to assure confidentiality of data. We often find that one solution does not fit all related problems.
Chaos theory has taught us principles of understanding and coping with complexity and uncertainty that are applicable to ethical problem solving of data-related issues. Implications of chaos theory are explored in this presentation, both as new tools of ethical problem solving and as concepts and principles to include in the applied ethics education of students in science and engineering.
2.
Understanding and improving comparative data on science and technology
Denise Lievesley, UNESCO Institute for Statistics
Statistics can serve to benefit society, but, when manipulated politically or otherwise, may be used as instruments by the powerful to maintain the status quo or even for the purposes of oppression. Statisticians working internationally face a range of ethical problems as they try to 'make a difference' to the lives of the poorest people in the world. One of the most difficult is the dilemma between open accountability and national sovereignty (in relation to what data are collected, the methods used and who is to have access to the results).
This paper will discuss the role of the UNESCO Institute for Statistics (UIS), explain some of the constraints under which we work, and address the principles which govern our activities. The UIS is involved in a wide range of activities.
Of these activities one of the key ones is to foster the collection of comparable data across nations, the main objectives being to enable countries to gain a greater understanding of their own situation by comparing themselves with others, thus learning from one another and sharing good practice; to permit the aggregation of data across countries to provide a global picture; and to provide information for purposes of the accountability of nations and for the assessment, development and monitoring of supra-national policies.
Denise Lievesley will discuss the consultation being carried out by the UIS to ensure that the data being collected on a cross-national basis are of relevance to national policies on science and technology. The consultation process was launched with an expert meeting where changes in science policy were debated and ways in which the UIS might monitor and measure scientific and technological activities and progress across the world were identified. A background paper was produced based on the experiences and inputs of experts from different regions and organizations, which addresses key policy issues in science and technology. The UIS will use this document as a basic reference for direct consultation with UNESCO Member States and relevant institutions. A long term strategy for the collection of science and technology data will be developed as a result of these consultations.
It is vital to build on the experience of developed countries through the important statistical activities of OECD and Eurostat but nevertheless to ensure that the collection of cross-nationally harmonised data does not distort the priorities of poorer countries. We are seeking a harmony of interests in data collection and use and the views of the participants will be sought as to how this might be achieved.
3.
Ethics - An Engineers' View
Horst Kremers, Comp. Sci., Berlin, Germany
The engineering profession has long experience in developing principles for appropriate relations with clients, publishing Codes of Ethics, and developing and adhering to laws controlling the conduct of professional practice. A high demand exists in society for reliable engineering in planning, design, construction and maintenance. One of the primary objectives of an engineer's actions is to provide control over a situation by providing independent advice in conformance with moral principles in addition to sound engineering principles. In a world where life depends to an increasing extent on the reliable functioning of complex information systems, and where new technologies emerge without the chance for controlled experimentation and assessment, the need to inject ethical principles into scientific and technological decision-making and to fully consider the consequences of professional actions is pressing. This presentation reviews several Code of Ethics development efforts and reflects on the Codes relative to action principles in science and technology. A potential role for CODATA is presented.
4.
Ethics in Scientific and Technical Communication
Hemanthi Ranasinghe, University of Sri Jayewardenepura, Sri Lanka
Research can be described as operationally successful when the research objectives are achieved, and as technically successful when the researcher's understanding is enhanced, more comprehensive hypotheses are developed and lessons are learned from the experience. However, research is not successful scientifically until the issues, processes and findings are made known to the scientific community. Science is not an individual experience. It is shared knowledge based on a common understanding of some aspect of the physical or social world. For that reason, the social conventions of science play an important role in establishing the reliability of scientific knowledge. If these conventions are disrupted, the quality of science can suffer. Thus, the reporting of scientific research has to be right on ethical grounds too.
The general category of ethics in communication covers many things. One is error and negligence in science. Some researchers may feel that the pressures on them are an inducement to haste at the expense of care. For example, they may believe that they have to do substandard work to compile a long list of publications and that this practice is acceptable. Or they may be tempted to publish virtually the same research results in two different places, or to publish their results in "least publishable units" - papers that are just detailed enough to be published but do not give the full story of the research project described.
Sacrificing quality to such pressures can easily backfire. A lengthy list of publications cannot outweigh a reputation for shoddy research. Scientists with a reputation for publishing work of dubious quality will generally find that all of their publications are viewed with skepticism by their colleagues. Another vital aspect of unethical behavior in scientific communication is misconduct in science. This entails making up data or results (fabrication), changing or misreporting data or results (falsification), and using the ideas or words of another person without giving appropriate credit (plagiarism) - all of which strike at the heart of the values on which science is based. These acts of scientific misconduct undermine not only progress but the entire set of values on which the scientific enterprise rests. Anyone who engages in any of these practices is putting his or her scientific career at risk. Even infractions that may seem minor at the time can end up being severely punished. Frank and open discussion of the division of credit within research groups - as early in the research process as possible, and preferably at the very beginning, especially for research leading to a published paper - can prevent later difficulties.
Misallocation of credit and errors arising from negligence are matters that generally remain internal to the scientific community. Usually they are dealt with locally through the mechanisms of peer review, administrative action, and the system of appointments and evaluations in the research environment. But misconduct in science is unlikely to remain internal to the scientific community. Its consequences are too extreme: it can harm individuals outside of science (as when falsified results become the basis of a medical treatment), it squanders public funds, and it attracts the attention of those who would seek to criticize science. As a result, federal agencies, Congress, the media, and the courts can all get involved. All parts of the research system have a responsibility to recognize and respond to these pressures. Institutions must review their own policies, foster awareness of research ethics, and ensure that researchers are aware of the policies that are in place. And researchers should constantly be aware of the extent to which ethically based decisions will influence their success as scientists.
1.
Scholarly Information Architecture
Paul Ginsparg
Cornell University, USA
If we were to start from scratch today to design a quality-controlled archive and distribution system for scientific and technical information, it could take a very different form from what has evolved in the past decade from pre-existing print infrastructure. Ultimately, we might expect some form of global knowledge network for research communications. Over the next decade, there are many technical and non-technical issues to address along the way, everything from identifying optimal formats and protocols for rendering, indexing, linking, querying, accessing, mining, and transmitting the information, to identifying sociological, legal, financial, and political obstacles to realization of ideal systems. What near-term advances can we expect in automated classification systems, authoring tools, and next-generation document formats to facilitate efficient datamining and long-term archival stability? How will the information be authenticated and quality controlled? What differences should be expected in the realization of these systems for different scientific research fields? What is the proper role of governments and their funding agencies in this enterprise, and what might be the role of suitably configured professional societies? These and related questions will be considered in light of recent trends.
2.
The role of scientific data in a complex world
Werner Martienssen
Physikalisches Institut der Universitaet, Frankfurt am Main, Germany
Physicists try to understand and to describe the world in terms of natural laws. These laws encompass two quite different approaches in physics. First, the laws exhibit a mathematical structure, which in general is understood in terms of first principles, geometrical relations and symmetry arguments. Second, the laws contain data which are characteristic of the specific properties of the phenomena and objects. Insight into the mathematical structure aims at an understanding of the world in ever more universally applicable terms. Insight into the data reveals the magnificent diversity of the world's materials and their behavior. Whereas the description of the world in terms of a unified theory might one day be reduced to only one set of equations, the amount of data necessary to describe the phenomena of the world in their full complexity seems to be open-ended.
A unified theory has not been formulated up to now; nor can we say that our knowledge of the data is perfect in any sense. Much still has to be done. But when asked where we expect to be in data physics and chemistry in ten to fifteen years, my answer is: we - hopefully - will be able to merge the two approaches of physics. On the basis of our understanding of materials science and by using the methods of computational physics, we will make use both of the natural laws and of the complete set of known data in order to model, to study and to generate new materials, new properties and new phenomena.
3.
Life Sciences Research in 2015
David Y. Thomas, Biochemistry Department, McGill University, Montreal, Canada
Much of the spectacular progress of life sciences research in the past 30 years has come from the application of molecular biology employing a reductionist approach with single genes, often studied in simple organisms. Now from the technologies of genomics and proteomics, scientists are deluged with increasing amounts, varieties and quality of data. The challenge is how life sciences researchers will use the data output of discovery science to formulate questions and experiments for their research and turn this into knowledge. What are the important questions? We now have the capability to answer at a profound level major biological problems of how genes function, how development of organisms is controlled, and how populations interact at the cellular, organismal and population levels. What data and what tools are needed? What skills and training will be needed for the next generation of life sciences researchers? I will discuss some of the initiatives that are planned or now underway to address these problems.
4.
Biodiversity - what are the species, where are they?
Guy Baillargeon
(The presentation will be given in French.)
Knowledge of living species is documented through elaborate classification systems that are constantly updated by taxonomists. Knowledge of the distribution of species in the biosphere is still today mainly derived from label information associated with specimens preserved in museums and natural history collections. In addition, for many groups of living organisms, large numbers of individual observations are collected by specialized interest groups (for birds, for example, by ornithology clubs). The Global Biodiversity Information Facility (GBIF) intends to facilitate access to all this information by establishing a vast, distributed network of interoperable scientific databases open to all. Still in its early stages of implementation, GBIF will soon play a crucial role in promoting the standardization, digitization and worldwide dissemination of scientific biodiversity information. Already, several GBIF member organisations have announced their intention to work together towards an inventory of all known forms of life (Catalogue of Life), and a growing number of institutions are providing direct access to the data in their collections via distributed queries. The presentation will give examples of what is already possible in terms of interoperability when one or more classification systems are coupled with an automated search and mapping engine that interconnects the distributional data of several million specimens and observations provided by dozens of institutions participating in one of the distributed biodiversity information networks that currently coexist on the Internet.
Visualizations of our Planet's Atmosphere, Land & Oceans
Fritz Hasler
NASA Goddard Laboratory for Atmospheres, USA
See how High-Definition Television (HDTV) is revolutionizing the way we communicate science. Go back to the early weather satellite images from the 1960s and see them contrasted with the latest US and international global satellite weather movies, including hurricanes and "tornadoes". See the latest visualizations of spectacular images from NASA/NOAA remote sensing missions like Terra, GOES, TRMM, SeaWiFS and Landsat 7, including new 1-minute GOES rapid-scan image sequences of the November 9th, 2001 Midwest tornadic thunderstorms. New computer software tools allow us to roam and zoom through massive global images, e.g. Landsat tours of the US and Africa, showing desert and mountain geology as well as seasonal changes in vegetation. See dust storms in Africa and smoke plumes from fires in Mexico. Fly into and through venues using 1 m IKONOS "spy satellite" data. See vortexes and currents in the global oceans that bring up nutrients, and see how the ocean blooms in response to these currents and El Niño/La Niña climate changes. The presentation will be made using the latest HDTV technology from a portable computer server.
Presented by Dr. Fritz Hasler of the NASA Goddard Space Flight Center. http://Etheater.gsfc.nasa.gov