External Data Sources: R2RML and Scheduler

p1d1d1 · July 18, 2019, 7:34am

Dear all,

In relation to [1], could the Virtuoso scheduler be a solution?

That is, would it be possible to have the scheduler run, for example, the script under “Complete Script for Creation of Linked Data Views” in [2] at a fixed interval, e.g., weekly?

[1] “External Data Sources: R2RML”
[2] Hugh Williams, “Customizing R2RML Scripts in Virtuoso”, OpenLink Virtuoso Weblog (Medium)

hwilliams · July 18, 2019, 10:22am

Yes, you can set up a scheduled event in the Virtuoso scheduler that re-runs the R2RML script at a set interval to repopulate the graph with new data.

kidehen · July 19, 2019, 2:04am

Is the structure of your data changing between these scheduled view generations? If not, you don’t need to schedule view definition and generation at all. Your scheduled activities only need to deal with the data the views apply to, e.g., adding records to tables already associated with RDF views.

p1d1d1 · July 19, 2019, 7:11am

That would be the case for transient views. My objective is to have physical views since, as stated in the documentation [1], transient views do not perform well.
Let me describe my use case:

I have a source table (in PostgreSQL) with about 2M records. Records in the table can be added, deleted, or modified daily.
The structure of the data doesn’t change, so there is no need to change the R2RML mapping.

How would you suggest I proceed?

[1] 16.17.17. RDB2RDF Triggers

kidehen · July 19, 2019, 2:53pm

You have to approach this from a SQL perspective first, i.e., determine whether querying the target data source directly produces acceptable response times. If it does, attach the table to Virtuoso and repeat your query to see how performance compares.
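One way to run that comparison, as a sketch: the ODBC DSN 'pg_dsn', the remote table public.records, the local name and the credentials below are all hypothetical placeholders, and the exact ATTACH TABLE form should be checked against the Virtual Database chapter of the Virtuoso documentation.

```sql
-- Hypothetical ODBC DSN, remote table and credentials: replace with your own
ATTACH TABLE "public"."records"
  AS "DB"."DBA"."records"
  FROM 'pg_dsn' USER 'pguser' PASSWORD 'pgpass';

-- Run the same query directly in PostgreSQL (e.g. in psql) and then here,
-- against the attached table; isql reports the elapsed time of each statement
SELECT COUNT (*) FROM "DB"."DBA"."records";
```

Using the same simple query on both sides keeps the comparison about data access rather than about query complexity.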

A transient RDF View simply rewrites SPARQL as SQL. Queries that include rdf:type relations are the most expensive queries when using RDF Views.
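To see that rewrite for yourself, here is a sketch assuming the sparql_to_sql_text() built-in, which returns the SQL text that the SPARQL compiler produces; the graph IRI is the one used later in this thread. Comparing its output for a pattern that includes rdf:type (?s a ?type) with one that does not is one way to see where the extra cost comes from.

```sql
-- Print the SQL generated for a SPARQL query over the transient view graph
-- (assumes the sparql_to_sql_text() built-in is available in your build)
SELECT sparql_to_sql_text ('
  SELECT ?s ?type
  FROM <http://localhost:8890/Test#>
  WHERE { ?s a ?type }
  LIMIT 10');
```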

What do you observe, bearing the above in mind?

Note: you could engage our professional services, which offer the following advantages:

Jointly accessible rendition of your setup, i.e., PostgreSQL and Virtuoso
Our analysis and optimization of the setup

/cc @hwilliams

p1d1d1 · July 29, 2019, 2:09pm

Hi Hugh,

I’d like to test a scheduled event that does the following:

DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping');
CLEAR GRAPH ('urn:physic');
RDF_VIEW_SYNC_TO_PHYSICAL ('http://localhost:8890/Test#', 1, 'urn:physic');
CLEAR GRAPH ('http://localhost:8890/Test#');

Unfortunately, I can’t get it to run. I guess I’m not using the right syntax.
Could you please help me debug this?
Thanks

hwilliams · July 30, 2019, 10:18am

The DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping') call is incorrect: the function returns the text of a SPARQL statement, which then has to be executed, so the call should be EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping'));. You probably do not need to run it at all unless the R2RML script has changed.

The CLEAR GRAPH (...) commands need to be prefixed with the keyword SPARQL so that the scheduler’s SQL interface passes them to the SPARQL engine for execution.

All the commands should also be wrapped in a procedure so that the scheduler executes them as a single unit, e.g.:

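CREATE PROCEDURE RefreshPhysicalTriples ()
{
  -- Re-create the transient RDF View from the R2RML mapping graph
  EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping'));
  -- Empty the physical graph before reloading it
  SPARQL CLEAR GRAPH ('urn:physic');
  -- Copy the view's triples into the physical graph (1 = load the current data now)
  RDF_VIEW_SYNC_TO_PHYSICAL ('http://localhost:8890/Test#', 1, 'urn:physic');
  -- Clear the transient view graph so that only the physical graph remains
  SPARQL CLEAR GRAPH ('http://localhost:8890/Test#');
}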

Then just add the procedure name RefreshPhysicalTriples() (or whatever you call it) to the Scheduler as the SQL command to be executed at the scheduled time.
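Before handing the procedure to the scheduler, it may help to run it once by hand in isql and check the result. A minimal sketch, reusing the procedure and graph names from this thread; the triple count you see depends entirely on your data:

```sql
-- Run the refresh once manually
RefreshPhysicalTriples ();

-- Confirm that the physical graph was repopulated
SPARQL SELECT COUNT (*) FROM <urn:physic> WHERE { ?s ?p ?o };
```

If the manual run fails, the error is reported directly in isql, which is easier to debug than a failed scheduled run.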

p1d1d1 · July 30, 2019, 10:35am

Hugh,

Many thanks for the support! Can I use the isql interface in the Conductor to create the procedure?

I need to create the transient view at the beginning because I delete it at the end, so that DESCRIBE returns only triples from the physical graph.

Thanks again

hwilliams · July 30, 2019, 11:00am

Yes, you can use the Conductor isql interface to create the procedure, then create the scheduled event via the Conductor scheduler interface. Note: you can also create scheduled events via SQL, which can be scripted, as detailed in the Scheduler documentation.
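As a sketch of the scripted route, assuming the DB.DBA.SYS_SCHEDULED_EVENT table with its SE_NAME, SE_START, SE_SQL, SE_INTERVAL (in minutes), SE_LAST_COMPLETED and SE_LAST_ERROR columns as described in the Scheduler documentation. The event name is hypothetical, and 10080 minutes (7 × 24 × 60) gives the weekly interval asked about at the start of the thread:

```sql
-- Register a weekly event (hypothetical name; SE_INTERVAL is in minutes)
INSERT INTO DB.DBA.SYS_SCHEDULED_EVENT (SE_NAME, SE_START, SE_SQL, SE_INTERVAL)
  VALUES ('RefreshPhysicalTriples weekly', now (), 'RefreshPhysicalTriples ()', 10080);

-- Later: check when the event last completed and whether it raised an error
SELECT SE_NAME, SE_LAST_COMPLETED, SE_LAST_ERROR
  FROM DB.DBA.SYS_SCHEDULED_EVENT
 WHERE SE_NAME = 'RefreshPhysicalTriples weekly';

-- Remove the event when it is no longer needed
-- DELETE FROM DB.DBA.SYS_SCHEDULED_EVENT WHERE SE_NAME = 'RefreshPhysicalTriples weekly';
```

The scheduler itself only wakes up every SchedulerInterval minutes (set in the [Parameters] section of virtuoso.ini), so actual run times are approximate.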