Résumé : |
(Auteur) The storage and retrieval of spatial data in computer systems has matured greatly over recent years, from the earliest approaches (of simple digitised linework and text) to the representation of features and their attributes, with the semantics of their behaviour associated. This has led to massive cost savings where data, which might have been captured for a specific purpose, can be shared and reused for other purposes.
In this first generation of Geographic Information Systems (GIS), the data is stored locally, with each vendor using different nomenclature and definitions of spatial objects and having very different rules for what is accepted as ''valid''. As a result a scientist using a desktop GIS may need to expend a considerable portion of his/her research effort and funds in translating, cleaning and preparing pre-existing data to convert to the form required for the study.
For some years now, there has been a trend towards spatial data being housed within a database management system, these being considered as a corporate resource, leading to the realisation that the geographic data itself is in fact an infrastructure, in the same way as is, for example, a telephone network. This moves the ownership of the data from the desktop, firstly to the corporation, and ultimately to being a shared resource between public and private organisations - a Geographic Information Infrastructure (GII).
An inhibiting factor in these trends is the lack of standardisation alluded to above. Where every data sharing operation involves manual intervention, it is difficult, if not impossible to create a GII. Thus a strong and consistent set of standards is needed, with the most basic requirement being for consistency in the geometric concepts used. While progress is being made by groups such as the International Standards Organisation Technical Committee 211 (ISO TC211) and the Open Geospatial Consortium (OGC), there is still much to be done.
The success of these standardisation efforts has been compromised by the requirement to be vendor neutral - i.e. to avoid specifying an internal representation to be used for storage. For example, the standards will remain silent on whether coordinate values should be stored in floating point or integer format.
As a result, definitions of spatial objects are expressed in mathematical terms assuming an infinite precision real number system, with the details of how this is to be translated into the computational representation being left to the implementer. Therefore there is no agreed normative meaning of the ''equals'' predicate when applied to geometric objects, and definitions of validity are in general left to the implementers.
If the standardisation effort is to allow spatial data to be interchanged without expensive manual intervention, a well defined logic is needed to underpin the standards and support the definition of validity of that data. This would also ensure that inferences drawn from the digital model remain consistent and do not lead to logical fallacies.
The language of spatial databases is couched in the language of mathematics, with operations being given names such as ''union'' and ''intersection'' and using vector-like representations. This naturally leads to the impression that the representations form a topological and/or vector space. Unfortunately this is not the case. Generally speaking, the rigorous mathematics used in the definition of spatial objects ends outside the database representation, which is only an approximation of the theoretical formalism used to define it.
This thesis documents a number of cases that illustrate the potential breakdown of logic to be found in current technology, for example, cases where the union or intersection operations lead to inconsistent results. Various alternative approaches that have been investigated in search of solutions are discussed, and their advantages and disadvantages indicated.
This current research has been motivated by an attempt to apply the mathematical approach to the actual representation of spatial features within the computer system. In this rigorous approach, the assumptions (or ''axioms'') are clearly identified, and used to develop a chain of argument, leading to a proof of the required proposition. The advantage of this approach in the field of spatial data representation is that, if the computer hardware can be verified to obey the axioms, then the correct results of the algorithms are assured.
In order to facilitate such a chain of proof, a form of representation known as the regular polytope has been defined, based on a small set of axioms and definitions, and shown to possess a consistent and complete logic. That is to say, the computational representation itself expresses the algebraic formalism, rather than being an approximation to an idealized mathematical model.
Thus this representation is capable of providing a potential storage structure for a useful class of features, but this should not be seen as the sole object of the research. Rather the regular polytope should be seen as an exemplar for any approach to spatial data representation and storage.
The fact that this particular representation can be axiomatically defined and implemented demonstrates that such an approach is feasible, and opens the possibility that all computational representations can be similarly analysed. The regular polytope is a particularly tractable construct for this type of analysis, which is the reason for choosing it. By contrast the kind of structure embedded in many current systems is far more complex. In particular, floating point numbers add a significant level of complexity, and only the most basic topological behaviour has been proved where floating point operations are assumed.
Based on integer and domain restricted rational arithmetic, it is shown that the logic of topology, the Boolean connection algebra and the region connection calculus can be expressed directly by the database implementation. Thus a database built on this structure cannot suffer from the kinds of breakdown of logic discussed above. In addition, this raises the prospect of a definition of validity and robustness of representation that is not vendor specific.
A regular polytope representation of spatial objects is defined as the union of a finite set of (possibly overlapping) "convex regular polytopes", which are in turn defined as the intersection of a finite set of half spaces. These half spaces are defined by finite precision number representations. The term ''Regular Polytope'' here does not carry its conventional meaning as the generalisation of a regular polyhedron (one having equal sides, faces and angles etc.). In the form used here, it combines the topological term ''regular'' with the conventional geometric meaning of ''polyhedron''.
The actual definition is given in axiomatic form, structured so as to form a ''boundary free'' representation, valid in any number of dimensions. Although it is explored here principally in 3D, particular reference is made to the mixture of 2D and 3D found in many current application areas such as cadastral property boundaries. Particular attention is paid to the issue of connectivity, both within and between regular polytopes, and the resultant logic is developed in terms of well studied concepts such as the region connection calculus.
The particular representation chosen for the half space is such that adjoining regular polytopes will have no points in common, and no points will exist between them. Thus it is possible to define a complete partition of space where every point that can be represented computationally is defined to exist in one and only one region. In the traditional representations of 2D polygons and 3D polyhedrons, points play a very important role of carrying the metric information. This is in contrast to regular polytopes where points do not play a role in the definition at all. Instead the metric is specified via the half planes using 3 or 4 integers (in 2D and 3D respectively).
This theoretic basis is then applied to actual database schema design, and several alternative models proposed and analysed. As a check on the practicality of the algorithms, ''proof of concept'' classes have been developed in the Java programming language, and tested on a significant set of cadastral parcels (2D and 3D) from the Queensland cadastre.
Finally, further areas of research are identified, including extensions of the approach to wider problem domains. |