Rdb2rdf mapping is not persistent to server restart

fcarpani · October 17, 2018, 2:59pm

Hello.
I’m trying to authenticate users against virtuoso using two things:

A node js library
the /sparql-auth endpoint.

To recover data from users, I’ve made two sql views from the catalog: one for users and other for users-group relation

Also I made an rdb2rdf mapping to access these data from sparql endpoint.

When the mapping is loaded from conductor-linked data-rdb2rdf tab the validation fails with this error:

OVL validation has signalled VECSL: VECSL: Internal error, ssl refd before set, please report query to support on query select sample (?s), (if (count (1), "Info", "Error")) as ?severity, bif:concat ("Graph <", ?::source_g_iri, "> contains ", count (1), " triples") as ?message where { ?s ?p ?o } limit 1

If the validation is ignored, then the generation goes well and all works fine. But if the server is restarted, any query to the map is empty or delayed a very long time.

The behavior is the same if the mapping is loaded from isql using R2RML_MAKE_QM_FROM_G

What i’m doing wrong?

My mapping is the following:

mh:UserMapping	a rr:TriplesMap;
	rr:logicalTable
    [
      rr:tableSchema "DB";
      rr:tableOwner "DBA";
      rr:tableName  "memUsers"
    ];
    rr:subjectMap
    [
      rr:template "http://mh.uy/sys/user/{U_NAME}";
      rr:class mh:Sys_User;
      rr:graph mh:user
    ];
    rr:predicateObjectMap
    [
      rr:predicate userOnto:id;
      rr:objectMap [ rr:column "U_ID" ]
    ];
    rr:predicateObjectMap
    [
      rr:predicate userOnto:name;
      rr:objectMap [ rr:column "U_NAME" ]
    ];
    rr:predicateObjectMap
    [
      rr:predicate userOnto:fullName;
      rr:objectMap [ rr:column "U_FULL_NAME" ]
    ];
	rr:predicateObjectMap
	[
	 rr:predicate userOnto:uIsRole ;
	 rr:objectMap [ rr:column "U_IS_ROLE" ]
   ] ;
	rr:predicateObjectMap
	[
		rr:predicate userOnto:eMail ;
	 	rr:objectMap [ rr:column "U_E_MAIL" ]
	] ;
rr:predicateObjectMap
	[
		rr:predicate userOnto:group ;
		rr:objectMap [ rr:column "U_GROUP" ]
	] ;
rr:predicateObjectMap
	[
		rr:predicate userOnto:loginTime ;
		rr:objectMap [ rr:column "U_LOGIN_TIME" ]
	] ;
rr:predicateObjectMap
	[
		rr:predicate userOnto:accountDisabled ;
		rr:objectMap [ rr:column "U_ACCOUNT_DISABLED" ]
	] ;
rr:predicateObjectMap
	[
		rr:predicate userOnto:defPerms ;
		rr:objectMap [ rr:column "U_DEF_PERMS" ]
	] ;
rr:predicateObjectMap
	[
		rr:predicate userOnto:home ;
		rr:objectMap [ rr:column "U_HOME" ]
	] ;
	rr:predicateObjectMap
		[
			rr:predicate userOnto:inGroup ;
			rr:objectMap [ 
                rr:template "http://mh.uy/sys/user/{G_NAME}";
                rr:termType rr:IRI
             ];
		] .

Thanks
FDO.

hwilliams · October 18, 2018, 12:47pm

@fcarpani: What is the schema for the rr:tableName "memUsers" table? Is this a copy of the Virtuoso sys_users table for user account management? Also, what is the mh: prefix definition?

fcarpani · October 18, 2018, 2:15pm

Hello.
The table memUsers is the following view:

 create view DB.DBA.memUsers
AS
select G.u_name G_NAME,U.*
from db.dba.sys_users U,
      db.dba.sys_users G
where U.u_group=G.u_id;

This is in order to get the names of the main group of the user.

The prefixes are (sorry… I was sure to put the prefixes but I didn’t do it !):

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix mh: <http://mh.uy/sys/> .
@prefix userOnto: <http://mh.uy/sys/user#> .

Thanks.
FDO.

hwilliams · October 20, 2018, 1:30pm

@fcarpani: I have been able to recreate the error:

SQL> ttlp (file_to_string_output ('./fdo.ttl'), 'fdo.ttl', 'fdo.ttl');

Done. -- 3 msec.
SQL> DB.DBA.OVL_VALIDATE ('fdo.ttl', 'http://www.w3.org/ns/r2rml#OVL');
SUBJ     SEVERITY  MESSAGE
VARCHAR  VARCHAR  VARCHAR
_______________________________________________________________________________

NULL     Error    OVL validation has signalled VECSL: VECSL: Internal error, ssl refd before set, please report query to support on query  select sample (?s), (if (count (1), "Info", "Error")) as ?severity,
        bif:concat ("Graph <", ?::source_g_iri, "> contains ", count (1), " triples") as ?message
      where { ?s ?p ?o } limit 1
NULL     Info     There may be more errors but the validation is interrupted

2 Rows. -- 2 msec.
SQL>

and reported this to development to look into and fix.

You mentioned that after a restart of the server any queries to the map is empty, but I find querying the loaded Virtual Graph does return results after restart:

SQL> sparql select count(*) from <fdo.ttl> where {?s ?p ?o};
callret-0
INTEGER
_______________________________________________________________________________

54

1 Rows. -- 1 msec.
SQL>

Thus what is an example query that you are running ?

fcarpani · October 20, 2018, 1:47pm

I have some terms error. Sorry… and also sorry by my English…

The mapping is still with data after the reboot. But the result graph it isn’t.

The query:

select *
from mh:user
where { ?s ?p ?o }

has a result with data when is executed without restart the server after the load of the mapping.
But after the restart, the same query has no data.

I’ve tried to load the map using conductor, and following after the process after the error, an also tried from isql with the next statements (taken from docs):

SPARQL CLEAR GRAPH <http://mh.uy/sys/user#> ;
SPARQL CLEAR GRAPH <http://mh.uy/sys/user> ;
DB.DBA.TTLP (file_to_string_output('/tmp/initvirt/userMappingWGroup_V1.n3'),'','http://mh.uy/sys/user#');
EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://mh.uy/sys/user#'));

where the file in TTLP has the mapping.

If I execute this, the previous query has data until a restart of the server.

I hope you can see whats wrong…

Thanks.
FDO.

hwilliams · October 20, 2018, 5:21pm

@fcarpani: On my Virtuoso open source test database the data from the R2RML mapping does persist post restart of the database:

SQL> SPARQL prefix mh: <http://mh.uy/sys/> select * from mh:user where { ?s ?p ?o };
s                                                                                 p                                                                                 o
VARCHAR                                                                           LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://mh.uy/sys/user/PKI                                                         http://mh.uy/sys/user#id                                                          109
http://mh.uy/sys/user/PROXY                                                       http://mh.uy/sys/user#id                                                          105
http://mh.uy/sys/user/SIMILE                                                      http://mh.uy/sys/user#id                                                          108
http://mh.uy/sys/user/SPARQL                                                      http://mh.uy/sys/user#id                                                          107
.
.
.
http://mh.uy/sys/user/dav                                                         http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://mh.uy/sys/Sys_User
http://mh.uy/sys/user/dba                                                         http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://mh.uy/sys/Sys_User
http://mh.uy/sys/user/nobody                                                      http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://mh.uy/sys/Sys_User

129 Rows. -- 9 msec.
SQL>

What is the version of your Virtuoso database ? As I am testing the latest with a binary from the git stable/7 release made in Aug 2018:

SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3229-pthreads for Darwin as of Aug 15 2018

Thus if you are not using the latest stable/7 build I would suggest upgrading that version and retest …

fcarpani · October 20, 2018, 6:35pm

Hello.
Thats my version:

OpenLink Virtuoso  Server
Version 07.20.3230-pthreads for Linux as of Oct 17 2018 
Started on: 2018-10-20 10:46 GMT-3

I’ve compiled a few days ago.
Now I’m checking the git branch and is develop/7 !

Thanks !!!
I’m changing to stable/7 and testing
I’ll post the results
THANKS again

FDO.

fcarpani · October 20, 2018, 9:22pm

Hello again.

I’ve recompiled a stable/7 branch downloaded today from github.

Now status is:
OpenLink Virtuoso Server
Version 07.20.3229-pthreads for Linux as of Oct 20 2018
Started on: 2018-10-20 16:52 GMT-3

And follows the information.

The behaviour is exactly the same as in develop/7.

I load the mapping and works fine (in any of the two ways). It works fine until the server is restarted. Then, the query waits a lot of time and then I got can’t connect from the browser.

For the compilation, I’m generating the rpm in fedora 25 with this spec file.

The output of configure is in this link

The RDB2RDF vad was compiled with all software.

I think that my problem, perhaps, is my virtuoso.ini file.

I notice now (cleaning the old log) that I have the follwing lines:

18:11:01 built-in procedure "DB.DBA.RDF_PROXY_GET_HTTP_HOST" overruled by the RDBMS
18:11:01 built-in procedure "DB.DBA.RDF_SPONGE_IRI_SCH" overruled by the RDBMS
18:11:01 built-in procedure "DB.DBA.RDF_PROXY_ENTITY_IRI" overruled by the RDBMS
18:11:01 built-in procedure "WS.WS.SYS_DAV_RES_RES_CONTENT_INDEX_HOOK" overruled by the RDBMS
18:11:02 Roll forward started
18:11:02     7 transactions, 2543 bytes replayed (100 %)
18:11:02 Roll forward complete
18:11:03 Error executing a server init statement : 22023: Uninitialized property qmvColumnsFormKey in JSO instance <sys:qmv-f36e4cff4e3627c0f9e056fa0d0f4329> of type <http://www.openlin
ksw.com/schemas/virtrdf#QuadMapValue> --  DB.DBA.RDF_QUAD_FT_UPGRADE ()

Also, I notice a real silly thing: In conductor → linked data → namespaces I can see prefixes which cannot be recognized from a sparql query.

I don’t know.

My next attemp will be remove the RPM then delete any directory and reinstall the RPM. But I clean the /var/lib/virtuoso/db in any time.

Thanks.
FDO.

fcarpani · October 21, 2018, 11:55pm

Hello !!!
SOLVED !!!
I put here all the path to solve this:

Remove the instalation using dnf.
dnf remove virtuoso-opensource
Remove (or rename) all directories:

/var/lib/virtuoso
/var/log/virtuoso
/user/share/virtuoso

Install the rpm using dnf:
dnf install ./virtuoso-opensource-7.2.x.....rpm
Start the service:
systemctl start virtuoso
Install packages with conductor: rdf_mappers,fct, framework and rdb2rdf . I think that only need rdb2rdf to this.
Declare the view (See previous message in this thread)
Load the map from isql:
DB.DBA.TTLP (file_to_string_output('/home.../userMappingWGroup_V1.n3'),'','http://mh.uy/sys/user#');
Create the data using the mapping from isql:
EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://mh.uy/sys/user#'));

At this step, if you query the data, you got a result. If you restart the server you got an error…

So… I repeat all steps and added a 9) step…
9) In isql, inmediatly of the 8) step command do the following command:
checkpoint;

Restart the server and ALL WORKS FINE !!!

Possibly I got some wrong parameter in my virtuoso.ini… But it’s possible that I must modify my virtuoso.service: I notice that file there’s no ExecStop keyword.

Thanks.
FDO.

TallTed · October 23, 2018, 9:27pm

That manual CHECKPOINT; in step #9 does explain a lot.

(You will notice, if you review the virtuoso.log, that there was a CHECKPOINT; at the end of each VAD package installation!)

When you make changes to the database (significantly, your steps #6, #7, and #8), you build up a .trx file which persists until a CHECKPOINT; is run. This .trx file holds all changes relative to the .db file which was previously written to disk, and is read from disk at instance launch.

The CHECKPOINT; persists all your changes to the .db file, and the .trx file is thus reduced to zero, until the next change you make.

Without the CHECKPOINT;, at each relaunch of the instance, the entire .trx file must be replayed – very much as if the transactions were bring performed for the first time – and this takes time. The queries you were running after relaunch were not taking a long time on their own – they were delayed by the active process of the .trx file being replayed.

By default, the virtuoso.ini is configured with a 60 minute checkpoint interval. This should be overridden for large transactions (such as data loads) that take longer than 60 minutes to perform, but once those transactions are complete, a manual CHECKPOINT; should generally be run, and the CheckpointInterval should generally be returned to 60.

fcarpani · October 23, 2018, 9:45pm

OK. Something like that was my supposition.
Now I’ve reduced the minCheckpointInterval.
But really, i’m using the sparql-auth endpoint to create a data graph with annotations of images.
This annotations are created by students using forms in a browser and the access to virtuoso is done using a nodejs (express) server. This process send sparql queries and inserts to virtuoso. This system will have some tens of users (is not massive).

I’m thinking that is not a good idea use the sparql-auth enpoint… how can I be sure that the data are accesible to other connection? Can I send a checkpoint (and commit) vía the sparql-auth endpoint?

I’m thinking that migth be necessary a migration from this endpoint to a SQL connection (with intensive use of sparql) where I can do some transactional work in order to be sure the data are accesible.
Also i think that a drastic reduction (seconds) of he checkpoint interval may help, but, this can impact on the performance.

Any suggestion is welcome… this is my first “serious” system with virtuoso-opensource as backend.

THANKS !
FDO.

TallTed · October 24, 2018, 1:10am

A 60 minute CheckpointInterval is generally fine. A radically sorter interval (10 minutes or less) would very rarely make sense. A longer interval often makes sense, especially if most user interaction is reading, rather than writing, and/or if a manual CHECKPOINT; will be run after any significant write (whether that’s an INSERT, UPDATE, DELETE, or otherwise).

There is no reason that the sparql-auth endpoint should not be used in your described scenario. It is in fact recommended for use when SPARQL-Update (a/k/a SPARUL) queries are being run. That said, running all your SPARQL queries within SQL connections (so as SPASQL, or SPARQL-in-SQL) is also viable, because SQL connections always require user authentication.

CHECKPOINT; and COMMIT; are SQL functions, and so not available through SPARQL endpoints. They should be run through iSQL or other SQL-based connections (ODBC, JDBC, ADO.NET, etc.). This is another reason that automatic checkpoints are typical.

Rather than feeling your way along through experimentation (which is naturally prone to at least as many failures as successes), I would suggest that you start a new thread and provide us with a bigger-picture idea of what you’re trying to achieve, such that we can advise on how best to proceed.