19th International CODATA Conference
Category: Interoperability

Spatial data integration in a GIS environment

Elżbieta Bielecka (elzbieta.bielecka@igik.edu.pl)
Institute of Geodesy and Cartography, Poland


The availability of digital spatial data is growing rapidly, as is the need to use it in all kinds of GIS applications and to support decision-making. Advances in communication technology make it possible to collect datasets from a variety of sources and from different types of application. Many databases, datasets, and other forms of geographical information, such as satellite images, aerial photographs, and maps, are now available, and users can share existing spatial data rather than collect it from scratch. Sharing data requires, first of all, comprehensive information about the scope of the data and where it is stored, and then translation from the original source into the user’s system and adaptation to the specific GIS application. This adaptation process can be called data integration. Data integration is the most valuable function of GIS, and integrated data meets user needs more precisely.

Data integration means combining data files, datasets, and databases originating from different sources into one common database. Hence the unification of codes, the definition of object models, and consistent data definitions are of the utmost importance. Integrating spatial data also involves creating relations among various categories of descriptive and geometric data, and joining them.
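As an illustration of code unification, the following minimal sketch maps attribute codes from two sources onto one common classification. The source names, codes, and class names are hypothetical and are not taken from the database described in this paper.

```python
# Hypothetical land-use codes from two source datasets, mapped to one common scheme.
# All source names, codes and class names are illustrative assumptions.
CODE_MAPS = {
    "source_a": {"F1": "forest", "F2": "forest", "AR": "arable_land", "W": "water"},
    "source_b": {"101": "arable_land", "205": "forest", "300": "water"},
}


def unify_codes(records, source):
    """Return records recoded to the common scheme, plus the codes that have no mapping."""
    mapping = CODE_MAPS[source]
    unified, unmapped = [], set()
    for rec in records:
        code = str(rec["code"])  # sources may store codes as numbers or as text
        if code in mapping:
            unified.append({"id": rec["id"], "source": source, "land_use": mapping[code]})
        else:
            unmapped.add(code)  # left for manual semantic review
    return unified, sorted(unmapped)


a_records = [{"id": 1, "code": "F1"}, {"id": 2, "code": "XX"}]
b_records = [{"id": 7, "code": 205}]

merged_a, unknown_a = unify_codes(a_records, "source_a")
merged_b, unknown_b = unify_codes(b_records, "source_b")
common = merged_a + merged_b
```

Codes without a mapping are reported rather than silently dropped, since deciding what they mean is a semantic question that usually needs a human decision.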

Users should realize that proper data integration usually requires resolving two kinds of conflict: semantic and spatial. Resolving semantic heterogeneity in GIS still requires further study to yield a more efficient methodology. Spatial data integration requires extensive knowledge of geomatics as well as technical infrastructure. Merging different databases, datasets, and data files is a very complex, time-consuming, and expensive task, and it has to be solved in terms of both geometry and topology. A very important aspect of spatial data integration is ensuring data continuity and topological consistency. Data mismatch can stem from many factors, including incompatible projections, inconsistent map units, and different plotting scales. Differences in the relative age of datasets may mean differences in data collection methods and accuracy. The improper application of a datum to a dataset is an increasingly common and very important cause of data alignment problems. All these discrepancies, and others, should be removed during the integration process.
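Some of the mismatches listed above can be detected from layer metadata before any geometry is merged. The sketch below is one possible check, assuming each layer carries a coordinate reference system label, a map unit, and a source scale; the `LayerInfo` structure, the layer names, and the scale-ratio threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerInfo:
    """Hypothetical metadata record for one input layer."""
    name: str
    crs: str                 # treated as an opaque label, e.g. an EPSG code string
    units: str               # map units of the coordinates
    scale_denominator: int   # 50000 means a source scale of 1:50000


def find_mismatches(layers, target_crs, target_units, max_scale_ratio=2.0):
    """List (layer, kind, value) tuples for metadata that differs from the target."""
    problems = []
    for layer in layers:
        if layer.crs != target_crs:
            problems.append((layer.name, "crs", layer.crs))
        if layer.units != target_units:
            problems.append((layer.name, "units", layer.units))
    scales = [layer.scale_denominator for layer in layers]
    # A large spread of source scales suggests that generalization is needed first.
    if scales and max(scales) / min(scales) > max_scale_ratio:
        problems.append(("*", "scale", f"1:{min(scales)} to 1:{max(scales)}"))
    return problems


layers = [
    LayerInfo("soils", "EPSG:2180", "metre", 500_000),
    LayerInfo("communes", "EPSG:4326", "degree", 50_000),
]
problems = find_mismatches(layers, target_crs="EPSG:2180", target_units="metre")
```

A check of this kind only flags declared metadata. A datum that was applied improperly, as mentioned above, leaves the metadata looking correct and has to be found by comparing the data against reference features.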

Data integration also means bringing vector, raster, TIN, and other data models together in one seamless geodatabase and using them for analysis and spatial modeling. Although integration is usually time-consuming and expensive, the result is data that is well structured from an analytical perspective.

Steps towards an integrated geodatabase are as follows:

  1. Data transfer to the internal file format used in GIS software.
  2. Examining the data (entities), solving semantic conflicts.
  3. Transforming data to the fixed projection and the co-ordinate system; unifying map units.
  4. Spatial data merging (within one thematic layer):
  •    Generalization to provide similar data details
  •    Edge matching and map joining
  •    Error correction and entering missing data
  •    Forming topology
  •    Verification of data consistency and error correction
  •    Attaching attributes
  5. Vertical data matching (among different thematic layers covering the same area).
  6. Converting data to the appropriate data model.
  7. Indexing.
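Edge matching, listed under spatial data merging, can be sketched as snapping line endpoints across a sheet boundary. This is a simplified illustration, not the procedure used for the database described here: the function name `snap_endpoints`, the coordinates, and the tolerance are assumptions, and lines are plain lists of (x, y) tuples.

```python
import math


def snap_endpoints(lines_a, lines_b, tolerance):
    """Move endpoints of lines_b onto the nearest endpoint of lines_a within tolerance.

    Returns new lists; the input lines are not modified.
    """
    targets = [pt for line in lines_a for pt in (line[0], line[-1])]
    snapped = []
    for line in lines_b:
        new_line = list(line)
        for idx in (0, len(new_line) - 1):
            point = new_line[idx]
            nearest = min(targets, key=lambda t: math.dist(t, point), default=None)
            if nearest is not None and math.dist(nearest, point) <= tolerance:
                new_line[idx] = nearest  # close the gap at the sheet edge
        snapped.append(new_line)
    return snapped


# Two map sheets whose shared edge lies near x = 10 (illustrative coordinates).
west_sheet = [[(0.0, 5.0), (10.0, 5.0)]]
east_sheet = [[(10.02, 5.01), (20.0, 6.0)]]

joined_east = snap_endpoints(west_sheet, east_sheet, tolerance=0.05)
```

Only endpoints within the tolerance are moved, so the far end of the eastern line stays where it is. Choosing the tolerance is a judgment call that depends on the accuracy of the source data.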

The aforementioned steps describe the general problem of data integration. Some activities may be omitted, depending on how diverse the data are and which discrepancies exist. However, examining the data to resolve semantic and spatial conflicts is always required.

The goal of this paper is to give an overview of the problems that arise when dealing with dispersed data sources and to show some possible solutions. The database created for the delimitation of the Less-Favoured Areas in Poland serves as an example. As this database covers the entire country and draws on over 10 different data sources, almost all the problems connected with data integration occurred.