Non-terrestrial geo-literals


#1

Hugh Williams said “We have a 900+ million triple Wikidata dataset loaded into the DBpedia instance we host for the community, the dataset of which consisted of about 140 ntriple bz2 compressed split files.”

As I indicated in my previous post, I have loaded Wikidata into Virtuoso. I am using the open source version because I had to patch Virtuoso to accept non-terrestrial geo-literals.

How can one load Wikidata into Virtuoso without patching it?


#2

@pfps: What is the actual problem with “Non-terrestrial geo-literals” and what is the patch you made to enable the load to proceed? I am not aware of problems loading the Wikidata dataset we loaded into DBpedia, but then it probably doesn’t have the problem data that’s found in the much larger dataset you are seeking to load.


#3

I believe that I reported this as a bug several years ago but I can’t find the report just now.

There is also a separate report in

The “fix” that I performed was to comment out the two places that look at geometry literals in libsrc/Wi/rdfbox.c. This was enough to allow Virtuoso to load Wikidata without problems, and to use Wikidata without problems as well. I’m not doing anything with geometry literals so far, so something may still break if I try geometry-based retrievals.


#4

@pfps: I recall the recent git issue discussion about Wikidata datasets with non-Earth coordinates and was wondering if this might be related …

What exactly is the change that you made in libsrc/Wi/rdfbox.c? Do you have a diff to show exactly what was changed?

From my reading of the latest updates added to the git#295 issue, those Wikidata non-terrestrial values were indicated to be invalid URIs …


#5

I don’t have an actual diff, but what I did was comment out two groups of two lines in libsrc/Wi/rdfbox.c as follows:

  /*pfps  if (RDF_BOX_GEO_TYPE == type && DV_GEO != box_dtp && DV_LONG_INT != box_dtp)
    sqlr_new_error ("42000",  "RDFGE",  "RDF box with a geometry RDF type and a non-geometry content"); */


  /* pfps if (type == RDF_BOX_GEO && box_dtp != DV_GEO)
     sqlr_new_error ("22023", "SR559", "The RDF box of type geometry needs a spatial object as a value, not a value of type %s (%d)", dv_type_title (box_dtp), box_dtp); */

I’ll get the actual error I ran into shortly.


#6

On an unmodified Virtuoso I get

error 42000 TURTLE RDF loader, line 2778442: RDFGE: RDF box with a geometry RDF type and a non-geometry content

The literal is for the location of the Beer crater on Mars, and is
"<http://www.wikidata.org/entity/Q111> Point(351.83 -14.47)"^^geo:wktLiteral (the IRI is enclosed in <> as required).

The lexical form is in the lexical space of geo:wktLiteral so Virtuoso should not be complaining, or at least not complaining in this way.
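
To make the shape of that lexical form concrete, here is a minimal C sketch that splits a geo:wktLiteral into its optional leading CRS IRI and the WKT text that follows. The names wkt_parts and split_wkt_literal are hypothetical and are not part of Virtuoso. The check is purely syntactic and does not try to interpret the CRS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration; not a Virtuoso structure. */
typedef struct
{
  const char *crs;      /* first character of the CRS IRI, or NULL if absent */
  size_t crs_len;       /* length of the CRS IRI, excluding the <> */
  const char *wkt;      /* start of the WKT text */
} wkt_parts;

/* Split a geo:wktLiteral lexical form into CRS IRI and WKT text.
   Returns 1 on success, or 0 if the form starts with '<' but lacks
   the closing '>' or the space separator that must follow it. */
int
split_wkt_literal (const char *lex, wkt_parts *out)
{
  const char *close;
  const char *p;

  assert (lex != NULL && out != NULL);
  out->crs = NULL;
  out->crs_len = 0;
  out->wkt = lex;
  if (lex[0] != '<')
    return 1;                   /* no CRS prefix, the default CRS applies */
  close = strchr (lex, '>');
  if (close == NULL || close[1] != ' ')
    return 0;
  out->crs = lex + 1;
  out->crs_len = (size_t) (close - lex) - 1;
  p = close + 1;
  while (*p == ' ')             /* one or more spaces separate IRI and WKT */
    p++;
  out->wkt = p;
  return 1;
}
```

For the Beer crater literal, this helper reports http://www.wikidata.org/entity/Q111 as the CRS and Point(351.83 -14.47) as the WKT text. Nothing in the split depends on the CRS being a terrestrial one.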


#7

@pfps: I will have development review and comment on the viability of the change you made in libsrc/Wi/rdfbox.c, and on whether this could possibly be made configurable …


#8

@hwilliams /cc @pvk @imikhailov @kidehen

As I noted in issue#295, the user is responsible for avoiding nonsense data.

As I also noted there, the current Wikidata dumps (among other data) will be problematic at various points in the future, because the URIs they use to designate some coordinate reference systems (CRS) do not actually dereference to CRS definitions. Even so, that data should still load without error, as these are valid geo:wktLiteral values according to the GeoSPARQL spec:

8.5 Requirements for WKT Serialization (serialization=WKT)

This section establishes the requirements for representing geometry data in RDF based on WKT as defined by Simple Features [ISO 19125-1].

8.5.1 RDFS Datatypes

This section defines one RDFS Datatype: http://www.opengis.net/ont/geosparql#wktLiteral.

RDFS Datatype: geo:wktLiteral

geo:wktLiteral a rdfs:Datatype;
   rdfs:isDefinedBy <http://www.opengis.net/spec/geosparql/1.0>;
   rdfs:label "Well-known Text Literal"@en;
   rdfs:comment "A Well-known Text serialization of a geometry object."@en .

Req 10 All RDFS Literals of type geo:wktLiteral shall consist of an optional URI identifying the coordinate reference system followed by Simple Features Well Known Text (WKT) describing a geometric value. Valid geo:wktLiterals are formed by concatenating a valid, absolute URI as defined in [RFC 2396], one or more spaces (Unicode U+0020 character) as a separator, and a WKT string as defined in Simple Features [ISO 19125-1].

/req/geometry-extension/wkt-literal

For geo:wktLiterals, the beginning URI identifies the spatial reference system for the geometry. The OGC maintains a set of CRS URIs under the http://www.opengis.net/def/crs/ namespace. This leading spatial reference system URI is optional. In the absence of a leading spatial reference system URI, the following spatial reference system URI will be assumed:

<http://www.opengis.net/def/crs/OGC/1.3/CRS84>

This URI denotes WGS 84 longitude-latitude.

Req 11 The URI http://www.opengis.net/def/crs/OGC/1.3/CRS84 shall be assumed as the spatial reference system for geo:wktLiterals that do not specify an explicit spatial reference system URI.

/req/geometry-extension/wkt-literal-default-srs

Req 12 Coordinate tuples within geo:wktLiterals shall be interpreted using the axis order defined in the spatial reference system used.

/req/geometry-extension/wkt-axis-order

The example geo:wktLiteral below encodes a point geometry using the default WGS 84 geodetic longitude-latitude spatial reference system for Simple Features 1.0:

"Point(-83.38 33.95)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

A second example below encodes the same point using <http://www.opengis.net/def/crs/EPSG/0/4326>: a WGS 84 geodetic latitude-longitude spatial reference system (note that this spatial reference system defines a different axis order):

"<http://www.opengis.net/def/crs/EPSG/0/4326> Point(33.95 -83.38)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

Req 13 An empty RDFS Literal of type geo:wktLiteral shall be interpreted as an empty geometry.

/req/geometry-extension/wkt-literal-empty

Errors will be appropriate when GeoSPARQL and other SQL/MM or SPARQL/MM comparison functions are used against geo:wktLiteral values whose CRS position holds a URI that does not dereference to a CRS definition, and/or when the CRS definitions do not provide enough information to translate between the different CRS of the values being compared. But the time for such CRS evaluation is not data load.


#9

Indeed it is possible to comment out “unwanted” error messages; however, that will result in subtle troubles with indexing that data and with use of spatial predicates. There’s no easy fix for that in code; it might be better to change the datatype or to mimic Earth data by changing the CRS. Sorry, we don’t support something that is supposed to be a Martian coordinate system, ditto a Martian calendar system. So far, Mars is a problematic market; say, NASA sent 5 or 6 perfect robots there but sold none.


#10

But much better would be to actually handle these geo-literals, which after all are compliant with the spec.

Better than terminating loading when such a literal is encountered would be to provide a warning message and do something useful like not creating the triple.
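
The warn-and-skip behaviour suggested here could look roughly like the C sketch below. It is an illustration only, not Virtuoso's loader. The names store_fn, load_report, demo_store, and load_literals are all hypothetical, and demo_store simply rejects any literal with a leading CRS IRI so that the skip path is exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical store callback: returns 0 on success, non-zero on rejection. */
typedef int (*store_fn) (const char *lexical);

typedef struct
{
  size_t stored;        /* literals accepted by the store */
  size_t skipped;       /* literals rejected, logged, and skipped */
} load_report;

/* Stand-in store that rejects literals with a leading CRS IRI.
   This only simulates a rejection; it is not how Virtuoso decides. */
int
demo_store (const char *lexical)
{
  return lexical[0] == '<' ? -1 : 0;
}

/* Load every literal; on rejection, log a warning and continue
   with the next one instead of terminating the whole load. */
load_report
load_literals (const char *const *literals, size_t n, store_fn store, FILE *log)
{
  load_report r = { 0, 0 };
  size_t i;

  assert (literals != NULL && store != NULL && log != NULL);
  for (i = 0; i < n; i++)
    {
      if (store (literals[i]) == 0)
        r.stored++;
      else
        {
          fprintf (log, "warning: skipping literal %lu: %s\n",
              (unsigned long) (i + 1), literals[i]);
          r.skipped++;
        }
    }
  return r;
}
```

The point of the design is that one rejected literal costs one triple and one log line rather than the whole load.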


#11

@imikhailov -

This data should load without issue.

The geo:wktLiterals all conform to the spec. There is nothing in that spec that mandates terrestrial CRS, nor any specific list(s) nor registries of CRS URIs. See above. The only requirement is that the CRS be designated by “a valid, absolute URI as defined in [RFC 2396].”

Issues may come when someone tries to perform operations involving geo:wktLiterals with CRS URI(s) which Virtuoso does not understand – and that’s when errors should be raised, saying so. It’s fine at that point to say, “Virtuoso doesn’t support using <http://www.wikidata.org/entity/Q111> as a CRS”, or "Virtuoso doesn’t have a way to compare geodata in CRS <http://www.wikidata.org/entity/Q111> against geodata in CRS <http://www.opengis.net/def/crs/OGC/1.3/CRS84>", or "CRS <http://www.wikidata.org/entity/Q111> doesn’t have a defined relationship to CRS <http://www.opengis.net/def/crs/OGC/1.3/CRS84>" or whatever – but it is not OK to raise these objections when ingesting the data, which is perfectly well formed and may well be comprehensible at some later point in time.
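
A minimal C sketch of that split between load time and operation time follows. It assumes the CRS IRI is kept as an opaque string when the value is stored and is only examined when a spatial operation is requested. The names geo_point, geo_status, crs_supported, and geo_points_equal are hypothetical, and treating CRS84 as the only supported CRS is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CRS84 "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

/* Hypothetical stored value: the CRS IRI is not validated at load time. */
typedef struct
{
  const char *crs;      /* CRS IRI kept as an opaque string */
  double x, y;          /* coordinates in the axis order of that CRS */
} geo_point;

typedef enum
{
  GEO_OK = 0,
  GEO_ERR_UNSUPPORTED_CRS       /* the engine cannot use this CRS */
} geo_status;

/* Assumption for this sketch: only CRS84 is understood. */
int
crs_supported (const char *crs)
{
  return strcmp (crs, CRS84) == 0;
}

/* Compare two points. The CRS check happens here, at operation time,
   and *equal is left untouched when an error is returned. */
geo_status
geo_points_equal (const geo_point *a, const geo_point *b, int *equal)
{
  assert (a != NULL && b != NULL && equal != NULL);
  if (!crs_supported (a->crs) || !crs_supported (b->crs))
    return GEO_ERR_UNSUPPORTED_CRS;
  *equal = (a->x == b->x && a->y == b->y);
  return GEO_OK;
}
```

Under these assumptions, a point with CRS <http://www.wikidata.org/entity/Q111> can be created and stored like any other, and only a call such as geo_points_equal against a CRS84 point reports GEO_ERR_UNSUPPORTED_CRS.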