19th International CODATA Conference
Category: Infoscience

Marine Project Data Management System

J.S.Sarupria (sarujs@darya.nio.org) and G.V.Reddy (reddy@darya.nio.org)
Data & Information Division, National Institute of Oceanography, India


Marine science is a multidisciplinary science that produces widely varied types of data and information, including hydrographic, biological, chemical, fisheries, geological and geophysical data, collected in the water column and at the bottom of the ocean. It is therefore very difficult to incorporate the data into one database.

Project Data Management is the provision of data management services to active research projects and is sometimes known as ‘End-to-End’ data management.

The aim is to construct project data management policies and to address issues such as long-term archival of the data, so that the project makes a lasting contribution to marine science. This requires establishing a coherent, credible, semi-distributed and scalable end-to-end Data Management (DM) plan with clear DM objectives, activities and timeline, together with a policy (e.g., delivery and exchange standards) that is widely agreed by the scientists and sponsors and supported by funding agencies.

The DM plan includes a range of services and products, taking full advantage of best practices, standards and innovative approaches, as well as several pro-active strategies addressing all needs and requirements.

A Project Data Centre is designed to establish guidelines, provide advice, and facilitate the exchange of knowledge and expertise. It relies on several experienced, full-time national data managers/coordinators and, through existing infrastructures, capacity building and new international synergies, aims to ensure compliance and to increase the overall value of scientific research and derived outputs.

At least 2-3% of the project money should be spent directly on data management activities to ensure compliance.

Technological requirements should be continually examined to avoid both too-heavy immediate requirements and potential longer-term dead ends, which may leave present or future data users without access to data.

Sometimes a data centre may spend 90% of its effort on rescuing 10% of the data, but that is part of the point of having a data centre, especially if otherwise valuable data would go unreported. The best approach to minimizing this effort is to work more closely with data originators.

Data have various levels of quality. Initially, the project paid little attention to this question. As the project matured, a subjective ranking system was implemented in the data set to attempt to grade the quality of the various data sets. Future efforts at collecting and entering data should explicitly consider several dimensions, including accuracy, precision, reliability, and data source.
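As an illustration only, the quality dimensions named above could be stored alongside each data set in a small record. The following Python sketch is not the project's actual ranking system: the three-level Grade scale, the QualityRecord name, the example data set identifier and the "weakest dimension" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    """Hypothetical ordinal scale; the abstract does not list its ranking levels."""
    POOR = 1
    FAIR = 2
    GOOD = 3


@dataclass(frozen=True)
class QualityRecord:
    """Quality metadata for one data set, one field per dimension."""
    dataset_id: str
    accuracy: Grade
    precision: Grade
    reliability: Grade
    data_source: str  # free-text provenance, e.g. instrument or publication

    def overall(self) -> Grade:
        # Assumed rule: a data set is graded by its weakest dimension.
        return min(self.accuracy, self.precision, self.reliability)


# Illustrative record; the identifier and grades are made up.
record = QualityRecord(
    dataset_id="example-cruise-001",
    accuracy=Grade.GOOD,
    precision=Grade.FAIR,
    reliability=Grade.GOOD,
    data_source="shipboard CTD cast",
)
print(record.overall().name)
```

Keeping the dimensions as separate fields, rather than a single score, lets users filter on the dimension that matters for their analysis.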

Data have errors (in the sense of mistakes), some of which can be caught and corrected before they enter the database, some of which can be caught after they enter the database, and some of which persist in the database. We attempted to minimize these via a fairly conventional review process, but it would be worthwhile to assess this process and consider additional measures for quality assurance. Cooperative quality control through feedback from users and community experts is extremely important, and needs to be facilitated so that data can be both promptly used and promptly reviewed.
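One common way to catch mistakes before they enter a database is an automated range check at entry time. The Python sketch below is a minimal example of that idea, not the review process used in the project; the parameter names and the plausibility limits in LIMITS are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]

# Hypothetical plausibility limits; real limits depend on region and instrument.
LIMITS: Dict[str, Tuple[float, float]] = {
    "temperature_c": (-2.5, 40.0),
    "salinity_psu": (0.0, 45.0),
}


def screen(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into accepted ones and rejected ones with the failing fields."""
    accepted: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for rec in records:
        # Collect every checked field whose value falls outside its limits.
        problems = [
            name
            for name, (low, high) in LIMITS.items()
            if name in rec and not (low <= rec[name] <= high)
        ]
        if problems:
            rejected.append((rec, problems))
        else:
            accepted.append(rec)
    return accepted, rejected


# Made-up observations; the second has a misplaced decimal point in salinity.
observations = [
    {"temperature_c": 28.4, "salinity_psu": 35.1},
    {"temperature_c": 27.9, "salinity_psu": 351.0},
]
accepted, rejected = screen(observations)
print(len(accepted), rejected[0][1])
```

Rejected records are kept together with the names of the failing fields so that they can be returned to the data originator for correction rather than silently dropped.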

An attempt was made to assess the data/information available in the public domain. The result shows that more than 60% of marine data/information from observational programs is available in the public domain. The rest is either lost or only partially available in publications. There are many reasons for losing data/information, but the main ones are: (i) the data management component is missing from project proposals/observational programs, (ii) observational programs are not well linked with data management activities, (iii) there are communication gaps between data collectors (scientists) and data managers, and (iv) funding agencies are not monitoring the data flow from data-generating agencies to the national data centers/archives. Therefore, the data acquisition network partially fails to trace/monitor data flow effectively.

The overall objective of DM is to help and support the Project Investigators (PIs), project sponsors, funding agencies and end-users to learn from the past and achieve the best science, for today and tomorrow.