CODATA 2002: 18th International Conference
Frontiers of Scientific and Technical Data
Montréal, Canada, 29 September - 3 October 2002

Medical and Health Data


Track I-D-2:
The US National Library of Medicine's Visible Human Project® Data Sets


Chair: Michael J. Ackerman, National Library of Medicine, National Institutes of Health, USA

In the mid-1990s, the US National Library of Medicine sponsored the acquisition and development of the Visible Human Project® database. This image database contains anatomical cross-sectional images, which allow the reconstruction of three-dimensional male and female anatomy to an accuracy of less than 1.0 mm. The male anatomy is contained in a 15 gigabyte database, the female in a 40 gigabyte database.
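The quoted database sizes are consistent with the commonly cited slice counts and image dimensions of the cryosection sets. As a rough back-of-the-envelope check (the slice counts and pixel dimensions below are published figures for the data sets, not stated in this program, so treat them as assumptions):

```python
# Rough size check for the Visible Human cryosection images.
# Assumed (commonly cited) figures, not taken from this program:
#   male:   1871 axial slices at 1.0 mm spacing
#   female: 5189 axial slices at 0.33 mm spacing
#   each slice: 2048 x 1216 pixels, 24-bit RGB (3 bytes/pixel)
SLICE_BYTES = 2048 * 1216 * 3

male_bytes = 1871 * SLICE_BYTES
female_bytes = 5189 * SLICE_BYTES

print(f"male:   {male_bytes / 1e9:.1f} GB")    # ~14 GB, close to the quoted 15 GB
print(f"female: {female_bytes / 1e9:.1f} GB")  # ~39 GB, close to the quoted 40 GB
```

The cryosections dominate; the CT and MRI series add comparatively little to the totals.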

This session will consist of four papers. The first will summarize the history of the Visible Human Project® and the development of the Visible Human data sets. We will then explore the problems encountered in the real-time navigation of such large image databases. The third paper will discuss the extraction of data from such a database, and the final paper will cover the problems of validation.

1. The Visible Human Project® Image Data Sets
From Inception to Completion and Beyond

Richard A. Banvard, National Library of Medicine, National Institutes of Health, USA

The Visible Human Project® Data Sets resulted from a recommendation of the National Library of Medicine (NLM) Board of Regents' 1987 Long Range Plan, which stated that the NLM should "thoroughly and systematically investigate the technical requirements and feasibility of instituting a biomedical images library." An expert panel convened by the Board recommended in April 1990 that "NLM should undertake a first project, building a digital image library of volumetric data representing a complete normal adult human male and female. This 'Visible Human' project would include digital images derived from computerized tomography, magnetic resonance imaging, and photographic images from cryosectioning of cadavers." Following this recommendation, NLM contracted the University of Colorado in August 1991 to collect the "Visible Human" image data set. In November 1994 the Visible Human Male data set was announced and released to the public, followed one year later by the Visible Human Female. The data sets are available via FTP, at no cost, to anyone holding a free license. Each image (CT, MRI, or cryosection) is stored as a separate file and can be downloaded individually or in any number, up to the entire data set. Several mirror sites have been established to facilitate downloads for international license holders. The images can also be purchased on tape for a fee from the National Technical Information Service (NTIS). This session will include a discussion of the genesis of the Visible Human Project®, a description of the University of Colorado's cryosectioning procedures, and descriptions of several of the more interesting and notable outcomes developed by license holders who have used the Visible Human Project® Data Sets.



2. Visible Human Explorer
Hao Le, Flashback Imaging Inc., Canada
Brian Wannamaker, Sea Scan International Inc., Canada

Imaging technology for medical applications continues to advance apace, increasing the potential for improvements in medical research, diagnostic procedures, and patient care. At the same time, the growth in imaging activity increases the sheer volume of data that must be dealt with. The imagery may be reviewed for immediate diagnostic purposes and discarded, or it may be stored or archived for further use. However, storage or archiving is effectively discarding unless effective means for recovering the data exist. Accessibility is an essential component of developing and distributing new knowledge from growing data volumes. This paper will discuss specific approaches to improving the accessibility of large image databases like that of the Visible Human Project. Real-time navigation of image databases in 2D and 3D, as well as user interfaces designed for public and academic use, will be outlined. The presentation will be illustrated with several thousand images from the Visible Human Project.
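One standard ingredient of real-time navigation over an image set far too large for memory is lazy loading with a bounded cache, so that only the slices near the user's current position stay resident. The abstract does not describe the authors' implementation; the sketch below is only a generic illustration of the idea (the slice count and the stand-in loader are assumptions):

```python
from functools import lru_cache

N_SLICES = 1871  # e.g. the male cryosection count (an assumption, see above)

@lru_cache(maxsize=64)          # keep at most 64 decoded slices resident
def load_slice(index: int) -> bytes:
    # Stand-in for reading and decoding one slice from disk;
    # a real viewer would load the image file here.
    return bytes([index % 256]) * 16

def slice_for_position(position: float) -> int:
    """Map a continuous 0..1 scroll position to the nearest slice index."""
    return min(N_SLICES - 1, max(0, round(position * (N_SLICES - 1))))

# Scrolling back and forth over nearby positions is served from the cache.
for pos in (0.50, 0.501, 0.50, 0.499, 0.50):
    load_slice(slice_for_position(pos))
print(load_slice.cache_info())
```

Because browsing is strongly local, a small cache absorbs most repeat requests while the bounded `maxsize` keeps memory use fixed regardless of database size.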



3. The NLM Insight Registration and Segmentation Toolkit
William Lorensen, GE Research, USA

In 1999, the National Library of Medicine (NLM) awarded six contracts to develop a registration and segmentation toolkit. The overall objective of the project is to produce an application programming interface (API) implemented within a public domain toolkit. The NLM Segmentation and Registration Toolkit supports image analysis research in segmentation, classification and deformable registration of medical images. This toolkit meets the following critical technical requirements identified by the National Library of Medicine:

  • Work with the Visible Human Male and Female data sets.
  • Provide a foundation for future medical image understanding research.
  • Become a self-sustaining code development effort.
  • Accommodate periodic and incremental modifications and additions.
  • Accommodate expansion to parallel implementations.
  • Accommodate large memory requirements.
  • Support a variety of visualization and/or rendering platforms.

In addition to the technical challenges presented by these requirements, the selected team and subcontractors had to work as a distributed group. The software development experience of the groups also varied: some members had created software for a large community, while others had only developed software for their local groups. The team defined a web-centric software development process, modeled after the Extreme Programming approach, that relies on rapid and parallel requirements analysis, design, coding, and testing. Communication through web-based mailing lists and bug trackers was supplemented with conventional telephone conferences.

The first public version of the software is scheduled for release in October 2002.

This talk discusses the chronology of the project, the core architecture and algorithms, and the lightweight software engineering processes used throughout the project. Finally, we present lessons learned that will be of value to future distributed software development projects.

 

4. The Visible Human Data Sets: A Prototype and a Roadmap for Navigating Medical Imaging Data
Peter Ratiu, Harvard Medical School, Brigham and Women's Hospital, USA

The Visible Human Data Sets are to date the most complete multi-modality data sets of human anatomy. The computational challenges they pose have been widely discussed, and many of them have been or are being solved by experts in various aspects of medical image analysis and medical informatics. Their approach, which has proved profitable, is to regard the Visible Human as a vast collection of bits, single- and multi-channel images, with little regard to its intrinsic content: human anatomy. This approach allowed them to solve computational problems that had appeared overwhelming at the inception of the project: powerful servers can make the individual images available, and manipulate and display them in various ways, on the desktop of end users. An example of such a solution implemented by computer scientists is the EPFL Server.

The more specific problem of how to use this unique information in medical research has been addressed less often. One reason is that the data are vast and their manipulation seemingly unwieldy for anatomists, until now more versed in using scalpels than mouse buttons. Another reason is the inherent novelty of the data: for the first time, they open the possibility of a quantitative approach to anatomy. However, this quantitative approach can be best exploited by first defining problems in anatomy, anthropology, and pathology in these terms.

I will discuss two basic aspects of the Visible Human Project as a landmark data set:

  1. The problem of establishing a universal anatomical coordinate system, with applications in basic research as well as clinical medicine (radiology, clinical imaging), and how the VHP can contribute to the solution.
  2. The need for a quantitative comparative anatomy, as this is becoming apparent in a broad array of disciplines, ranging from physical anthropology to gynecology. I will present how the VHP data can be employed as a roadmap for navigating diverse data.

The aim of this presentation is to present the problems related to medical imaging data to experts in other fields, in such a manner that it may spark a mutually profitable dialog with hitherto distant disciplines.
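A universal anatomical coordinate system (point 1 above) amounts to fixing a reference frame in which corresponding structures in different subjects receive comparable coordinates, typically via an affine map from each subject's scanner frame into the common frame. A minimal sketch of that mapping follows; the transform and the point values are invented for illustration, not VHP data:

```python
# Minimal sketch: map a point from subject (scanner) coordinates into a
# common anatomical frame with an affine transform x' = A x + t.
# The matrix, translation, and landmark values are illustrative only.

def apply_affine(A, t, p):
    """Apply a 3x3 linear part A and a translation t to a 3D point p."""
    return tuple(sum(A[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

# A pure scaling (normalizing subject size) plus a shift of the origin
# to a chosen reference landmark:
A = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5]]
t = [10.0, -4.0, 2.0]

subject_point = (100.0, 40.0, 8.0)        # mm, in the subject's frame
atlas_point = apply_affine(A, t, subject_point)
print(atlas_point)  # (60.0, 16.0, 6.0)
```

Quantitative comparative anatomy then reduces to comparing coordinates, distances, and volumes after all subjects have been brought into the same frame.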



Track III-C-5:
Emerging tools and techniques for data handling in developing countries

Chair: Julia Royall, Chief, International Programs, and Director, Malaria Research Telecommunications Network, for the National Library of Medicine, USA

This session will feature three panelists, all working with various tools and information technology to manage data to improve health in Africa.

Allen Hightower is Chief, Data Management Activity at CDC’s National Center for Infectious Disease and a pioneer in initiating NLM’s malaria research network at a remote site on Lake Victoria in Kenya.  He has developed several tools for data collection and management which will change the speed and quality of data collection in Africa.

From the KEMRI-Wellcome Trust research unit on the coast of Kenya comes Tom Oluoch, systems operator/data manager and co-creator of a virtual library for this site, which brings together researchers from Kenya Medical Research Institute and Oxford University.  His eyewitness case study is full of concrete examples of how IT and data management have brought expansion and change to this remote research unit.

Bob Mayes is Chief, Health Informatics Section, Zimbabwe CDC AIDS Program.  CDC’s program of technical assistance to Zimbabwe focuses on strengthening surveillance and laboratory measures, scaling up promising prevention and care strategies, supporting behavior change communication projects, data mining, semantic management of data for systematic review, and promoting technology transfer.

The presenters will discuss individual examples and case studies, as well as talk about how these tools can facilitate the discovery process.

1. Field Data Collection for the Malaria Research Network in Kenya
Allen Hightower, Centers for Disease Control, USA

Allen Hightower is Chief, Data Management Activity at CDC's National Center for Infectious Disease and a pioneer in initiating NLM's malaria research network at a remote site on Lake Victoria in Kenya. He has developed several tools for data collection and management which will change the speed and quality of data collection in Africa. He is currently evaluating field data collection using paperless GPS/data collection systems via Pocket PC-based personal data assistants in two projects:

(1) collecting census and GPS data for a wash-durable bednet study area and
(2) conducting a survey in a 15 village area on bednet usage for linkage with other health-related data.
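Linking GPS readings collected in the field to other health data typically starts with assigning each surveyed household to its nearest reference point, such as a village center. The sketch below (coordinates and village names are invented) uses the standard haversine great-circle distance for that assignment:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical village reference points (illustrative coordinates only).
villages = {
    "village_a": (-0.10, 34.75),
    "village_b": (-0.20, 34.60),
}

def nearest_village(lat, lon):
    """Assign a GPS fix to the nearest known village center."""
    return min(villages, key=lambda v: haversine_km(lat, lon, *villages[v]))

household = (-0.11, 34.74)  # GPS fix recorded during the census
print(nearest_village(*household))  # village_a
```

On a handheld device the same assignment can run at collection time, so linkage errors surface while the enumerator is still at the household.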


2. Eyewitness account: the role of IT and data management in expansion and change at a remote research unit in Kenya
Tom Oluoch, KEMRI-Wellcome Trust, Kenya

From the KEMRI-Wellcome Trust research unit on the coast of Kenya comes Tom Oluoch, systems operator/data manager and co-creator of a virtual library for this site, which brings together researchers from Kenya Medical Research Institute and Oxford University. His eyewitness case study is full of concrete examples of how IT and data management have brought expansion and change to this remote research unit.


3. CDC in Zimbabwe: strengthening regional surveillance and laboratory measures, supporting infrastructure development and promoting technology transfer
Robert Mayes, CDC AIDS Program, Zimbabwe

Bob Mayes is Chief, Health Informatics Section, Zimbabwe CDC AIDS Program. CDC's program of technical assistance to Zimbabwe focuses on strengthening surveillance and laboratory measures, scaling up promising prevention and care strategies, supporting behavior change communication projects, data mining, semantic management of data for systematic review, and promoting technology transfer.

 

4. Complex Data From Health Research
Themba Mohoto, Reproductive Research Unit, Chris Hani Baragwanath Hospital, Soweto, South Africa

In the continuing search for better health for all, health researchers are faced with numerous methodological problems of a complex nature in their efforts to strengthen health programs, evaluate health systems and measure the impact of interventions. This in turn has posed greater challenges for data analysts.

This paper investigates the types of data produced in health research including:

  1. Multi-stage survey data, e.g. the Demographic and Health Survey (DHS), in which data are collected at many levels, such as household, women's, and children's data, and there is a need to link the data from these various levels.
  2. Longitudinal or Repeated measures studies. Such data can arise either from cohort studies or from clinical trials. In this type of study there are repeated observations within individuals.
  3. In clinical trial databases there are also difficulties with recording adverse events or concomitant medications, as there will be a variable number of these per patient.
  4. A new area is that of cluster randomized trials which combines features of multistage sample data with features of clinical trial data.

Statisticians in this area are investigating ways of dealing with these problems.
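Linking records across the levels of a multi-stage survey (point 1 above) is essentially a hierarchical join on shared identifiers. A minimal sketch, with identifiers and fields invented in the spirit of a DHS extract:

```python
# Toy multi-stage survey linkage: child -> mother -> household.
# All identifiers and fields are invented for illustration.
households = {"h1": {"district": "north", "members": 6}}
women = {"w1": {"household": "h1", "age": 27}}
children = [
    {"child": "c1", "mother": "w1", "age_months": 14},
    {"child": "c2", "mother": "w1", "age_months": 40},
]

def link(child):
    """Attach mother- and household-level variables to a child record."""
    mother = women[child["mother"]]
    household = households[mother["household"]]
    return {**child, "mother_age": mother["age"], "district": household["district"]}

linked = [link(c) for c in children]
print(linked[0])
```

The same pattern extends to the repeated-measures and cluster-randomized designs listed above, where the join keys become individual and cluster identifiers.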

 


Track III-D-6:
Données & Santé : utilisations et enjeux
(Data and Health: Usage and Issues)

Chairs: Daniel Laurent, Université MLV, France; Jean-Pierre Caliste, UTC, France

The weight of the health sector in the world economy has become decisive. Health expenditure now represents 15% of GDP in the United States and 10% in France and in Canada. The Internet has transformed the field and opened it further to the general public in terms of information: more than 17,000 sites are entirely devoted to health, and 40% of the queries of American Internet users concern health sites.

The diversity of medical situations that rest on the use of complex data corresponds to a variety of angles of approach: institutional and policy objectives, the circulation of medical information within specialized networks and over the Internet, new medical practices and assistance for isolated sites, and the evolution of medical services for the practitioner and the patient.

Relations between state components (social security systems) and private ones (insurers, pharmaceutical companies, ...) are of growing importance, both organizationally (the level at which medical acts are defined) and in micro- and macroeconomic terms.

The use of complex data (numerical, imaging, ...) and knowledge management call on techniques for processing heterogeneous data, relying on resolutely multidisciplinary approaches that reinforce information theory.

On the basis of these observations, CODATA France has made health one of its three priority areas of activity. It proposes to organize a thematic workshop on this question, which could be divided into two specialized sessions if necessary. The following topics could be presented:

  • health information networks, or the information "highways" of health
  • data and the Internet (e-health): reliability, validity, ...
  • health networks and coordinated care networks: new challenges for managed care
  • health information systems: national or regional networks, hospital networks, health networks, the medical practice, ...
  • the challenges of telemedicine
  • the use of data by call centers
  • the patient medical record
  • data quality
  • data protection and archiving
  • data confidentiality
  • data and systems interoperability

 

1. New information systems for the public healthcare insurance organization: the Catalan Health Service (CatSalut) in Spain
Jaume Tort i Bardolet, Generalitat de Catalunya, CatSalut, Barcelona, Spain

Key words: information systems, health care organisation, insurance, risk management, data.

Ten years after its creation, the Catalan Health Service (SCS) is initiating a reorientation process aimed at consolidating its role as the public healthcare insurance organization for all citizens of Catalonia. This reorientation involves generating a series of actions oriented more towards attention to the insurance holder/citizen, while maintaining a close relation with the suppliers of healthcare services from the public network.

This transformation coincides with the intention of generating qualitative and quantitative advancement regarding the structure of information systems available up to now. Thus a new Systems Plan is being drawn up, oriented towards the SCS's function as a public healthcare insurance organization.

1. Definition of the SCS's management needs

Aims:
1.1 To manage resources efficiently
1.2 To implement processes for continuous improvement in service quality
1.3 To bring about active client management
1.4 To manage risk
1.5 To implement efficient administrative processes

These aims involve a series of needs that must be taken into account when developing new management and information systems.

  • To back the management aims of the major working areas: demand, offer and internal administration, and lines of action for each of these (services).
  • To facilitate the systematic drawing up of management reports based on parameters enabling the executive structure to make decisions concerning steps to be taken.
  • To collect all necessary information properly and in good time, by means of the most appropriate software.

In order to specify these aims, a series of management levers has been devised:

To manage resources
To provide activity follow-up
To provide cost follow-up
To manage the quality level
To establish communication with clients
To manage risk
To improve the health of the population
To rationalize processes
To improve claim procedure for damages

Moreover, this has to be specified using pre-established follow-up parameters for drawing up the management reports.

2. Evaluating the developments and structure required

The proposal for the basic structure of the new information systems is based on three concepts and their corresponding identifiers:

- The insurance holder = personal identification code (CIP)
- The service providing unit = productive unit code (UP)
- The service / activity = service code

It has been planned that the different computer applications will work on a large data warehouse that will compile all activity (contracts of insurance holders with the productive structure) and which must make possible the generation of different views for each of the functions (see Graph 1).

The system has been represented graphically as a pyramid divided transversally into three parts: the lower trapezium shows the database (information); the middle, the computer applications (the treatment of information); and the upper triangle, the management information system.

The design of the information system is structured around four basic areas: demand, offer, activity and economy-finance (see Graph 2).
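The three identifiers above (CIP, UP, service code) suggest a fact table in which every activity record is keyed by insurance holder, providing unit, and service, and from which the per-function views of the data warehouse are aggregated. A minimal sketch of that structure, with codes and figures invented for illustration:

```python
# Toy activity warehouse keyed by the three CatSalut-style identifiers:
# CIP (insurance holder), UP (productive unit), service code.
# All records, codes, and costs are invented for illustration.
activity = [
    {"cip": "CIP001", "up": "UP10", "service": "SRV-visit", "cost": 30.0},
    {"cip": "CIP001", "up": "UP11", "service": "SRV-xray",  "cost": 55.0},
    {"cip": "CIP002", "up": "UP10", "service": "SRV-visit", "cost": 30.0},
]

def view_by(key):
    """Aggregate total cost over the activity table by one identifier."""
    totals = {}
    for rec in activity:
        totals[rec[key]] = totals.get(rec[key], 0.0) + rec["cost"]
    return totals

print(view_by("up"))   # cost per productive unit
print(view_by("cip"))  # cost per insurance holder
```

Each management function (demand, offer, activity, economy-finance) then corresponds to a different grouping of the same underlying records, which is exactly what the single data warehouse is meant to enable.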

 

2. The planning and management of emergency treatment in Catalonia by means of a specific information system
Jaume Tort i Bardolet, Generalitat de Catalunya, CatSalut, Barcelona, Spain

Key words: information systems, emergency, health care organisation, planning, data.

The Overall Emergency Plan has been used in Catalonia for the past three years. This is a global scheme that includes planning, precaution and prevention, management and supervision of emergency healthcare attention. It was created, above all, for those times of the year when there is an increased demand for healthcare attention for a variety of reasons.

The Plan includes:

  • The analysis of the population requiring emergency attention: user characteristics, reasons for the examination, analysis of user expectations and motivations.
  • Preventative actions: increased homecare coverage, increased influenza vaccine coverage, follow-up of users who have repeatedly requested emergency attention.
  • Organizational actions: the drawing up by the hospitals of annual working plans for emergency attention, telephone-based back-up for mental health professionals, and coordination among healthcare mechanisms.
  • An increase in the offer of contracted hospital discharges, and reinforcements in the summer and during periods of sustained growth in demand.

The information system
The Overall Emergency Plan is based on a specific information system (an extranet) which makes it possible for a group of productive units from different healthcare areas to register, on a daily basis, emergency activity data from their centers, as well as other relevant information that allows the forecasting of increased demand and the quick and effective adoption of corrective measures. The extranet includes information regarding:

  • Specialized attention (hospitals):
    • data concerning activity: emergency cases attended and admitted, hospital admissions, discharges and transfers to other centers
    • data concerning resource availability: patients awaiting admission, waiting period, available beds
  • Continued primary attention: emergency activity of these centers
  • Primary attention: data concerning continued attention, number of house calls
  • Specific emergency services (061): number of telephone calls attended and services carried out.
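Forecasting increased demand from these daily reports can be as simple as flagging days on which activity exceeds a moving baseline. The sketch below (thresholds and daily counts invented for illustration) shows such a trigger on a series of daily emergency-case counts:

```python
# Toy alert on daily emergency activity: flag any day whose count exceeds
# the mean of the previous `window` days by more than `factor`.
# The daily counts and parameters are invented for illustration.
def alert_days(counts, window=7, factor=1.3):
    flagged = []
    for day in range(window, len(counts)):
        baseline = sum(counts[day - window:day]) / window
        if counts[day] > factor * baseline:
            flagged.append(day)
    return flagged

daily_cases = [100, 95, 102, 99, 101, 98, 100, 104, 150, 103]
print(alert_days(daily_cases))  # day 8 (the 150-case day) is flagged
```

In practice the baseline would be seasonal rather than a plain moving average, but the principle of comparing each day's registered activity against recent history is the same.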

3. A Study of a System to Support Health Information Management,
Applied to the Cardiovascular Domain
Elisabeth Scarbonchi, Daniel Laurent, Christian Recchia, Université de Marne-la-Vallée, Institut Francilien d'Ingénierie des Services (I.F.I.S.), France

Within a care network, the practitioner and the patient have access to a body of information concerning the patient. This information is spread across different departments of a single hospital, or even across several institutions. The system must take into account the nature of the information (its typology), its location, and the volumes involved, especially for digital data such as medical imaging.

High-bandwidth networks offer possibilities for connecting these different sources for optimal exploitation in cardiology departments.

For numerical and textual data, data-mining and text-mining techniques can be used to produce value-added information, whether in operational use or in a research context.

For image sources, immediate and interactive availability opens possibilities and perspectives for animation and representation in an operational context.

Setting up a networked, multi-source information system will require particular attention to the problems of data security and data ownership.

 

4. Data and Health: Ownership, Access, Protection, Transmission. The Challenges of Health Networks
Christian Bourret, Université de Marne-la-Vallée, France
Serge Chambaud, Institut National de la Propriété Industrielle (I.N.P.I.), France
Elisabeth Scarbonchi, Université de Marne-la-Vallée, France
Daniel Laurent, Université de Marne-la-Vallée, France

Key words: data ownership, confidentiality, medical record, business methods, patents, health care management, information networks, information highways.

The ownership and protection of data are among the major challenges of a post-industrial society founded on intangible goods: services and the diffusion of information. In the context of a growing information industry, medical data carry very high commercial stakes. These data are highly specific: above all, they are personal, sensitive, and confidential data, subject to particular legislation. To frame the problem, we will draw on the French experience of health networks, and then widen it through comparisons with the United States.

The first issue examined will be the ownership and use of the data produced by health networks, analyzed through the patient medical record. Do the data it contains belong to the patient? To the various physicians and organizations (hospitals, clinics, health insurance, ...) taken individually? To the network? To the entity hosting it: information notary, infomediary, or hosting provider? The answer is far from obvious. Does the full set of data, the shared global record, constitute in terms of ownership something different from the sum of its parts? Can the use of data really be strictly separated from their ownership? We will analyze the different answers currently possible to these questions.

We will then address another decisive question: the patient's access to his or her personal health data. In France, the new law of 4 March 2002 laid down broad principles but left many questions unresolved. Will access be direct, or indirect through a physician? And to what data will the patient have access: the entire record or a summary? Will the patient also have access to the practitioners' comments? We will outline lines of reflection to clarify these questions.

Everything becomes still more complicated when, as is largely the case in the United States, patients build their own medical records. In that case, how reliable are they? Can they be used by professionals, who would thereby engage their liability?

In terms of industrial and intellectual property, there is also the question of the patentability and protection of the software used to create, manage, or distribute the patient medical record. Can patient medical records themselves be protected? Do the classic criteria of patentability apply to them or not? Or do they constitute "business methods" and, in that case, how can they be protected? The answers may vary from country to country. We will address these questions through a comparison of the possibilities offered in France and in the United States.

The transmission of medical data constitutes another major challenge, that of the information highways. We will examine two essential aspects of the current evolution, notably in France: the gradual withdrawal of the State in favor of private actors, and the fundamental choice between securing a medical data transmission network (the Réseau Santé Social of Cégétel-Vivendi) or securing the data themselves (France Télécom or Cegedim).

 

5. Health Networks: A French Experiment Centered on Information Sharing
Gabriella Salzano, Université de Marne-la-Vallée, France
Christian Bourret, Université de Marne-la-Vallée, France
Jean-Pierre Caliste, Université de Technologie de Compiègne (UTC), France
Daniel Laurent, Université de Marne-la-Vallée, France

Key words: health care management, information systems, data, shared information.

Since the beginning of the 1980s, all the major industrialized countries have faced the problem of controlling the costs of their health systems, and in particular the costs of hospitalization. One solution considered has been the "ambulatory shift," favoring community-based medicine supported by new information and communication technologies (ICT). In France, an original path has been tried: health networks. They were given legal standing by the law of 4 March 2002 on patients' rights and the quality of the health system.

Health networks are resolutely intended to serve the patient. Their objectives are to open up the health system by improving the indispensable relation between community care and the hospital, as well as the relations among the different professionals in charge of the same patient. The aim is to ensure the quality and continuity of care through an innovative organization, founded on shared values, such as the construction of collegial rather than individual or hierarchical practices, and better information sharing.

Information systems are the pivot of health networks. They must first ensure the interoperability (coordination and integration) of various other subsystems, notably the hospitals' or clinics' own information systems and the practice-management software of physicians and other professionals. They must also allow access to databases, to decision-support software (reference guidelines, ...), and to telemedicine services. They must further manage services specific to the network: an emergency-orientation platform and/or call center, the patient record shared within the network, ... We will analyze the main problems to be solved, in terms of organization and applications.

Health networks respond to strong needs for change. Their deployment and performance must be evaluated. Evaluation strongly influences the design of the information system, since the system must supply the data needed to track evaluation indicators and meet quality requirements specific to the networks' objectives.

In this paper, we will discuss the challenges and methodologies of evaluating health networks, underlining the interactions with the methodologies for designing information systems, within a framework of complex project management.

Behavioral and Social Science Data


Track I-C-4:
Government as a Driver in Database Development in the Behavioral Sciences


Chair: David Johnson, Building Engineering and Science Talent, USA

The behavioral sciences have not had a tradition of data sharing, and thus have lagged somewhat behind other sciences in the development of databases. Officials in several science agencies of the US federal government have been concerned about this lack of data sharing and have taken measures to stimulate development. The purpose of this panel is to explore the ways that government agencies can arrange funding opportunities to stimulate innovation in areas that scientists within given fields have been reluctant to address. The work of three US agencies will be highlighted: the National Science Foundation, the National Institutes of Health, and the Federal Aviation Administration.

Government and science often exercise reciprocal influences on each other. The three examples that will be explored in this panel session represent three distinct models by which governments may stimulate a science to produce knowledge in a way that it would not have in the absence of the government's effort.

 

1. Sharing data collection and sharing collected data: The NICHD Study of Early Child Care and Youth Development
Sarah L. Friedman, The NICHD Study of Early Child Care and Youth Development, USA

The NICHD Study of Early Child Care and Youth Development came to life as a result of a 1988 NICHD solicitation (RFA) and is scheduled to terminate at the end of 2009. The aim of the solicitation was to bring together investigators from different universities or research institutions to collaborate with NICHD staff on the planning and execution of one longitudinal study with data to be collected across sites. The idea for such a collaborative study was unprecedented in the scientific field of developmental psychology.

Ten data collection sites were selected on a competitive basis, and the affiliated investigators, in collaboration with NICHD staff, have designed the different phases of the solicited longitudinal study and have implemented it. While the data collected at each of the sites belong to the site, NICHD required that each of the 10 sites send its data to a central location, the Data Acquisition and Analysis Center, for data editing, data reduction, and data analyses. The study investigators, in collaboration with the data center staff, guide the data acquisition and analyses. Upon completion of an agreed-upon quota of network-authored scientific papers for a given phase of the study, individual study investigators get access to the data sets of the entire sample. A few months after the data sets and supporting documentation become available to individual study investigators for their exclusive use, the same data sets are made available to interested and qualified others in the scientific community.

While the archiving of the data is done by an NICHD grantee, the Murray Center at Radcliffe College has expressed interest in archiving the data and supporting their use by interested and qualified investigators. If the grantee institutions accept the Murray Center's request, the data collected by the grantees will be available to the scientific community beyond the life of the grant.

 

2. Data Sharing at NIH and NIA
Miriam F. Kelty, National Institute on Aging, Office of Extramural Activities, USA

NIH published its policy mandating the sharing of unique biological resources in 1986. Sixteen years later, NIH published a draft data sharing policy. It states that NIH expects the timely release and sharing of final research data for use by other researchers. Further, NIH will require extramural and intramural investigators to include a data sharing plan in their research proposals or to explain why a plan to share data is not possible. The policy is available for comment until June 1. The presentation will provide background information and summarize public comments.

NIA staff have been leading advocates for data sharing and have encouraged it among grantees, particularly when research involves large data sets that are valuable research resources and impractical to replicate. NIA will provide funds to make data that are well documented and user-friendly available to other researchers. Some examples of NIA supported activities in support of data sharing are described below:

The National Archive of Computerized Data on Aging (NACDA), located within the Interuniversity Consortium for Political and Social Research (ICPSR), is funded by the National Institute on Aging. NACDA's mission is to advance research on aging by helping researchers to profit from the under-exploited potential of a broad range of datasets. NACDA acquires and preserves data relevant to gerontological research, processes them as needed to promote effective research use, disseminates them to researchers, and facilitates their use. By preserving and making available the largest library of electronic data on aging in the United States, NACDA offers opportunities for secondary analysis on major issues of scientific and policy relevance.

NACDA supports a data analysis system that allows users to extract subsets of variables or cases. The system can be used with a variety of data sets, including the Longitudinal Survey on Aging, the National Survey of Self-Care and Aging, the National Health and Nutrition Survey, the National Hospital Discharge Survey, and the National Health Interview Survey.
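The kind of subsetting such a system offers can be sketched minimally in Python. This is an illustrative sketch only; the function, field names, and sample records below are hypothetical and are not drawn from the actual NACDA system.

```python
# Minimal sketch of subsetting a survey data set by variables (columns)
# and cases (row filters). All records and field names are hypothetical.

def subset(records, variables=None, case_filter=None):
    """Return only the requested variables for cases passing the filter."""
    selected = [r for r in records if case_filter is None or case_filter(r)]
    if variables is None:
        return selected
    return [{v: r[v] for v in variables} for r in selected]

survey = [
    {"id": 1, "age": 72, "self_care": "independent", "region": "midwest"},
    {"id": 2, "age": 68, "self_care": "assisted", "region": "south"},
    {"id": 3, "age": 81, "self_care": "independent", "region": "south"},
]

# Cases aged 70 or over, keeping only two variables
result = subset(survey, variables=["id", "age"],
                case_filter=lambda r: r["age"] >= 70)
print(result)  # [{'id': 1, 'age': 72}, {'id': 3, 'age': 81}]
```

Delivering subsets rather than whole files is what makes large survey archives practical for secondary analysts with narrow questions.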

NIA supports a range of studies that have agreed to make data available to researchers. An example is the Health and Retirement Study, a nationally representative study that collects data on aging and retirement. The study is based at the University of Michigan, and the Michigan Center on Demography of Aging makes its data available to a range of researchers. Some data are available to anyone for analysis, while other data sets are restricted and require contractual agreements before they are made available for use.

The presentation will address NIA's experience with the use of available data sets and raise some issues surrounding data sharing.



3. Data Archiving for Animal Cognition Research: The NIMH Experience
Howard S. Kurtzman, Cognitive Science Program, National Institute of Mental Health, USA

In July 2001, the National Institute of Mental Health (a component of the U.S. National Institutes of Health) sponsored a workshop on "Data Archiving for Animal Cognition Research." Participants included leading scientists as well as experts in archiving, publishing, policy, and law. Due to the focus on non-human research, participants were able to devote primary attention to important issues aside from protection of confidentiality, which has dominated most previous discussions of behavioral science archiving. The further limitation of the workshop's scope to animal cognition research allowed archiving to be examined realistically in the context of one particular scientific community's goals, methods, organization, and traditions.

The workshop produced a set of conclusions, detailed in a formal report, concerning: (1) the likely impacts of archiving on research and education, (2) guidelines for incorporating archiving into research practice, (3) contents of archives, (4) technical standards, and (5) organizational and policy issues. The presentation will review these conclusions and describe activities following up on the workshop. Also discussed will be the applicability of the workshop's conclusions to other areas of behavioral science and how this workshop's approach to stimulating archive development might serve as a model for other fields.

 

4. Data Sharing and the Social and Behavioral Sciences at the National Science Foundation
Philip Rubin, Division of Behavioral and Cognitive Sciences, USA

At the heart of the National Science Foundation's (NSF) strategic plan are people, ideas, and tools. In the latter area, our goal is to provide broadly accessible, state-of-the-art information bases and shared research and education tools. We actively encourage data sharing across all of our fields of study. This presentation will provide examples from the social and behavioral sciences. As data sharing is encouraged and increased, however, there are growing concerns and issues related to privacy and confidentiality. These issues will also be discussed, as will future directions in information sharing.

At the NSF, the Directorate for Social, Behavioral, and Economic Sciences (SBE) participates in special initiatives and competitions on a number of topics, including infrastructure to improve data resources, data archives, collaboratories, and centers.

The breadth of fields in our Directorate is wide, ranging from Anthropology through Political Science and Economics. Common to many of the disciplinary areas that we support, however, is a rapid change in how the science is being done. What is emerging is a large-scale social science, driven by computational progress, the need for scientific expertise across a number of domains, growing bodies of data and other information, and theoretical and practical issues whose understanding requires a broader view than has been taken in the past.

This change will be illustrated with examples of recent or continuing projects that we are supporting. For example, physical anthropologists draw on tools from a wide range of overlapping disciplines, from molecular biology (population genetics) to field ecology to remote sensing (paleoanthropology). In all of these areas, large amounts of data are generated that are conducive to the establishment of digital libraries, databases, web-based archives, and the like. A recent SBE Infrastructure award will be described that supports a number of interrelated activities to advance research in physical anthropology, evolutionary biology, neuroscience, and other fields that require information and/or biomaterials from nonhuman primates.

An example in geography is the National Historical Geographic Information System (NHGIS) at the University of Minnesota, Twin Cities. This project upgrades and enhances U.S. Census databases from 1790 to the present, including the digitization of all census geography so that place-specific information can be readily used in geographic information systems. We expect that the NHGIS will become a resource that can be used widely for social science training, by the media, for policy research at the state and local levels, by the private sector, and in secondary education.

Last year the National Science Board approved renewal of NSF support for the Panel Study of Income Dynamics (PSID). The PSID is a longitudinal survey, initiated in 1968, of a nationally representative sample of U.S. individuals and the family units in which they reside. The major objective of the panel is to provide shared-use databases, research platforms, and educational tools on cyclical, intergenerational, and life-course measures of economic and social behavior. With thirty-plus years of data on the same families, the PSID can justly be considered a cornerstone of the infrastructure support for empirically based social science research.

Additional examples abound and will be discussed. These include CSISS, the Center for Spatially Integrated Social Science, at the University of California, Santa Barbara; the fMRI Data Center at Dartmouth College, a national cognitive neuroscience resource; data-rich linguistics projects that support both the preservation of knowledge of disappearing languages and statistically guided approaches to increasing our understanding of ongoing language use; systems for the storage and dissemination of multimodal (audio, visual, haptic, etc.) data; and systems and techniques for the meta-analysis of large-scale data sets.

Data sharing is at the heart of NSF's mission and of our vision of the social and behavioral sciences. This presentation is intended to provide an overview of that vision.

 


Track I-D-6:
Database Innovation in the Behavioral Sciences and the Debate Over What Should Be Stored
Session organizer: US National Committee for the International Union of Psychological Sciences, National Academy of Sciences, Washington, D.C., USA

Chair: Merry Bullock, American Psychological Association

Data sharing is not the norm in behavioral science, although there are pockets of change and innovation. At the same time, a debate is underway regarding which data from experiments are worth placing in databases to be available to others. As it becomes possible to store huge quantities of data, it becomes more necessary to ensure that databases grow into useful tools rather than clogged informational arteries. This panel has two objectives: to inform attendees of innovations and to discuss possible criteria for determining what should be included in databases.

Panelists will discuss several innovative databases that are proving transformational for the fields they touch. For example, a database of functional magnetic resonance images of the brain created at Dartmouth College is making it possible to test hypotheses about brain-behavior relations on data pooled across many individual studies; a database of geographic information based at the University of California, Santa Barbara is allowing those in a variety of disciplines to look at the influence of location on such things as health behaviors, social development, and wealth accumulation. A database of aptitude test scores at the University of Virginia is a test bed for statistical innovations that are making it possible to legitimately compare data and not just outcomes from disparate studies.

The Panel will describe several of these innovations in behavioral and other sciences, and will address important emerging issues. For example, the fMRI database (originally envisioned as capturing all the images from most of the major neuroscience journals) is constrained by file size: images from a single journal consume terabytes of storage space and raise important questions of accessibility. As the behavioral sciences evolve toward more common acceptance of data sharing, they must also evolve toward a more common understanding of what should be contained in a database and what sorts of data are appropriate for archiving. Examples and issues from other disciplines will help inform the discussion.

 

1. Acquisition Criteria at the Murray Research Center: A Center for the Study of Lives
Jacquelyn B. James, Murray Research Center

The Murray Research Center is a repository for social and behavioral sciences data on the in-depth study of lives over time, and issues of special concern to American women. The center acquires data sets that are amenable to secondary analysis, replication, or longitudinal follow-up. In determining whether or not to acquire a new data set for the archive, several kinds of criteria are used. The criteria can be roughly grouped into five general categories: content of the study, methodology, previous analysis and publication, historical value, and cost of acquiring and processing the data. Each of these will be described with an indication of the relative importance of each criterion, where possible.

 

2. What Functional Neuroimaging Data is 'Worth' Sharing and the Scope of Large-Scale Study Data Archiving
John Darrell Van Horn, The fMRI Data Center, Dartmouth College, USA

Functional neuroimaging studies routinely produce large sets of raw data, comprising both functional image time series and high-resolution anatomical brain volumes. Typically, these data are passed through several steps of processing, and only a limited set of the statistical output is presented in papers published in the peer-reviewed literature. Arguments for archiving only these summary results have suggested that they are of greater value than the raw data itself. However, since each step of processing leaves the information content of a data set constant or reduces it, it is difficult to see the source of any increased scientific value. The fMRI Data Center (fMRIDC) strives to archive complete raw functional neuroimaging data sets, accompanied by enough information that anyone else could reconstruct the processing steps and arrive at the same statistical brain map as the original authors. To achieve this, the fMRIDC requests that authors of published studies provide considerably more study metadata and raw data than is typically presented in their published article. As a result, several studies currently in the fMRIDC archive rival the size of the entire human genome database (~20 GB compressed). By storing complete study data sets, the fMRIDC effort will serve not only to advance thinking about fundamental concepts of brain function, by permitting researchers to examine published neuroimaging data, but also to document more thoroughly the scientific record in functional brain imaging and cognitive neuroscience.
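The point that processing can at best preserve, and typically reduces, the information content of a data set can be illustrated with a toy example. The numbers below are hypothetical and stand in for any voxel time series; the point is only that summarization is many-to-one.

```python
# Toy illustration: collapsing a (hypothetical) signal time series to a
# summary statistic discards information, because distinct raw series
# can map to the same summary.

series_a = [100, 104, 98, 102]   # hypothetical raw signal values
series_b = [101, 101, 101, 101]  # a different raw series

mean_a = sum(series_a) / len(series_a)
mean_b = sum(series_b) / len(series_b)

print(mean_a == mean_b)  # True: both summarize to 101.0
# Since the mapping from raw data to summary is many-to-one, an archive
# holding only summary statistics cannot reconstruct the raw data or
# support alternative re-analyses.
```

This many-to-one collapse is the core of the argument for archiving complete raw data sets rather than only published summary results.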

 

3. Accession and Sharing of Geographic Information
Michael F. Goodchild, University of California, Santa Barbara, USA

Geographic information is a well-defined type, with complex uses and production systems. The Alexandria Digital Library began as an effort to provide remote access to a large collection of geographic information (maps and images), but has evolved into a functional geolibrary (a digital library that can be searched using geographic location as the primary key). I use ADL to illustrate many of the issues and principles inherent in sharing geographic information, and in policies regarding its acquisition by archives, including granularity, metadata schema, support for search across distributed archives, portals and clearinghouses, and interoperability.
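Searching a collection with geographic location as the primary key can be sketched as a bounding-box query over georeferenced catalog entries. The catalog entries and function names below are hypothetical illustrations, not drawn from the actual Alexandria Digital Library.

```python
# Minimal sketch of a geolibrary search: find catalog items whose
# spatial footprint (bounding box) intersects a query region.
# All entries are hypothetical.

def boxes_intersect(a, b):
    """Each box is (min_lon, min_lat, max_lon, max_lat)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

catalog = [
    {"title": "Santa Barbara quadrangle map", "bbox": (-120.0, 34.3, -119.5, 34.6)},
    {"title": "Quebec regional imagery", "bbox": (-74.0, 45.0, -71.0, 47.0)},
]

def search(catalog, query_bbox):
    """Return titles of items whose footprint overlaps the query region."""
    return [item["title"] for item in catalog
            if boxes_intersect(item["bbox"], query_bbox)]

# Query: a region around Montreal
print(search(catalog, (-74.5, 45.2, -73.0, 46.0)))  # ['Quebec regional imagery']
```

Real geolibraries layer much more on top of this (footprint granularity, distributed search, gazetteer lookup of place names to coordinates), but spatial overlap as the primary retrieval key is the distinguishing idea.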

 

Last site update: 25 September 2002