19th International CODATA Conference
Category: Interoperability
Spatial data integration in GIS environment
Elżbieta Bielecka (elzbieta.bielecka@igik.edu.pl)
Institute of Geodesy and Cartography, Poland
The availability of digital spatial data is growing rapidly, as is the need
to use it in all kinds of GIS applications and to support the decision-making
process. The development of communication technology makes it possible to
collect datasets from a variety of sources and different types of application.
Many databases, datasets, and other kinds of geographical information, such as
satellite images, aerial photographs, and maps, are now available, so users can
share spatial data rather than collect it from scratch. Sharing data requires,
first of all, comprehensive information about the scope of the data and the
place where they are stored. It also requires translating the data from the
original source into the user’s system and adapting them to specific GIS
applications. This data adaptation process can be called data integration.
Data integration is the most valuable function of GIS, and integrated data
meet user needs more precisely.
Data integration means combining data files, datasets and databases
originating from different sources into one common database. Hence the
unification of codes, object models and data definitions is of the utmost
importance. Integration of spatial data also consists in creating relations
among various categories of descriptive and geometric data, as well as
joining them.
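The unification of codes mentioned above can be illustrated with a minimal sketch. It assumes that each source classifies features with its own code list and that a crosswalk table to a common list has been agreed. The source names, codes and the `unify_codes` helper are all invented for illustration and do not come from the paper.

```python
# Hypothetical crosswalk tables: each source's own codes mapped to one
# common code list. Every code and name here is invented for illustration.
CROSSWALKS = {
    "source_a": {"11": "ARABLE", "31": "FOREST", "51": "WATER"},
    "source_b": {"R": "ARABLE", "Ls": "FOREST", "W": "WATER"},
}


def unify_codes(records, source):
    """Translate source-specific codes to the common list.

    Returns (unified, unmapped). Records whose code has no entry in the
    crosswalk are returned separately so they can be reviewed by hand.
    """
    crosswalk = CROSSWALKS[source]
    unified, unmapped = [], []
    for record in records:
        common = crosswalk.get(record["code"])
        if common is None:
            unmapped.append(record)
        else:
            unified.append({**record, "code": common, "source": source})
    return unified, unmapped


a_ok, a_bad = unify_codes([{"id": 1, "code": "11"}, {"id": 2, "code": "99"}], "source_a")
b_ok, b_bad = unify_codes([{"id": 7, "code": "Ls"}], "source_b")
merged = a_ok + b_ok  # one list of records sharing a single code list
```

Keeping unmapped records apart, rather than dropping them or guessing a code, leaves the semantic decision to a person who knows both classifications.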
Users should realize that proper data integration usually requires resolving
two kinds of conflict: semantic and spatial. Resolving semantic heterogeneity
in GIS still requires more study before a more efficient methodology can be
offered. Spatial data integration requires extensive knowledge in the field
of geomatics as well as technical infrastructure.
Merging different databases, datasets and data files is a very complex,
time-consuming and expensive task. It should be solved in terms of geometry
and topology. A very important aspect of spatial data integration is ensuring
data continuity and topology. Data mismatch can stem from many factors including incompatible
projections, inconsistent map units, and different plotting scales. Differences
in the relative age of data sets may mean differences in data collection methods
and accuracy. The improper application of a datum to a dataset is an increasingly
common and very important cause of data alignment problems. All these discrepancies
and others should be removed during the integration process.
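Several of these causes can be spotted from metadata before any geometry is touched. The following is a minimal sketch, assuming each dataset comes with a record of its projection, datum, map units and scale. The `DatasetInfo` structure, the `find_mismatches` helper and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical metadata record for one input dataset."""
    name: str
    projection: str
    datum: str
    units: str   # map units, e.g. "m" or "ft"
    scale: int   # scale denominator, e.g. 50000 for 1:50 000


def find_mismatches(datasets):
    """Return the metadata fields on which the datasets disagree."""
    problems = {}
    for field in ("projection", "datum", "units", "scale"):
        values = {getattr(d, field) for d in datasets}
        if len(values) > 1:
            problems[field] = sorted(values, key=str)
    return problems


# Invented sample values, not taken from the paper.
datasets = [
    DatasetInfo("soils", "UTM zone 34N", "WGS84", "m", 50000),
    DatasetInfo("parcels", "UTM zone 34N", "Pulkovo 1942", "m", 10000),
]
report = find_mismatches(datasets)
```

A check like this only reports what the metadata declares. A datum that was applied improperly but recorded as correct will not show up and still has to be found by comparing the data themselves.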
Data integration also means bringing vector, raster, TIN and other data
models into one seamless geodatabase and using them for analytical purposes
and spatial modeling. Although this is usually time-consuming and expensive,
the result is data that are well structured from an analytical perspective.
Steps towards an integrated geodatabase are as follows:
- Data transfer to the internal file format used in GIS software.
- Examining the data (entities), solving semantic conflicts.
- Transforming data to the fixed projection and the co-ordinate system; unifying
  map units.
- Spatial data merging (within one thematic layer):
  - Generalization to provide similar data details
  - Edge matching and map joining
  - Error correction and entering missing data
  - Forming topology
  - Verification of data consistency and error correction
  - Attaching attributes
- Vertical data matching (among different thematic layers covering the same area).
- Converting data to the appropriate data model.
- Indexing.
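As an illustration of the edge matching step listed above, the sketch below pairs line endpoints from two adjacent map sheets and snaps each pair to a common point. It assumes both sheets are already in the same co-ordinate system and map units. The `edge_match` function, the greedy nearest-neighbour pairing and the midpoint rule are simplifying assumptions, not the method used in the paper.

```python
import math


def edge_match(left_ends, right_ends, tolerance):
    """Pair endpoints across a sheet edge and snap each pair to its midpoint.

    left_ends and right_ends are lists of (x, y) endpoints lying near the
    shared edge. Each left endpoint is paired with the nearest unused right
    endpoint if it lies within `tolerance` (same units as the co-ordinates).
    Returns (snapped, unmatched_left), where snapped holds tuples of
    (left_point, right_point, common_point).
    """
    unused = list(right_ends)
    snapped, unmatched = [], []
    for p in left_ends:
        nearest = min(unused, key=lambda q: math.dist(p, q), default=None)
        if nearest is not None and math.dist(p, nearest) <= tolerance:
            unused.remove(nearest)
            mid = ((p[0] + nearest[0]) / 2, (p[1] + nearest[1]) / 2)
            snapped.append((p, nearest, mid))
        else:
            unmatched.append(p)  # left for manual error correction
    return snapped, unmatched


# Invented co-ordinates: one road crosses the edge, one line has no partner.
left = [(1000.0, 200.0), (1000.0, 500.0)]
right = [(1000.5, 200.0)]
snapped, unmatched = edge_match(left, right, tolerance=1.0)
```

Endpoints with no partner within the tolerance are returned rather than forced onto a neighbour, which matches the later step of error correction and entering missing data.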
The aforementioned steps describe the general problem of data integration.
Some activities may be omitted, depending on how diverse the data are and
which discrepancies exist. However, examining the data in order to resolve
semantic and spatial conflicts is always required.
The goal of this paper is to give an overview of the problems arising from
dealing with dispersed data sources and to show some possible solutions.
The database created for the delimitation of the Less-Favoured Areas in
Poland serves as an example. As this database covers the entire country and
over 10 different data sources were used, almost all problems connected with
data integration occurred.