External Data Sources: R2RML and Scheduler

Dear all,

in relation to [1], could the use of the scheduler be a solution?

I mean, would it be possible to schedule, for example, the script under “Complete Script for Creation of Linked Data Views” in [2] to run via the scheduler, e.g., weekly?

[1] External Data Sources: R2RML
[2] https://medium.com/virtuoso-blog/customizing-r2rml-scripts-in-virtuoso-d57973d8fab

Yes, you can set up a scheduled event in the Virtuoso scheduler to re-run the R2RML script at a set interval and repopulate with new data.

Is the structure of your data changing between these scheduled view generations? If not, then you don’t need to schedule view definition and generation. As far as scheduled activities are concerned, all you need to focus on is the data to which the views apply, e.g., adding records to tables already associated with RDF views.

But that would be the case for transient views. My objective is to have physical views since, as stated in the documentation [1], transient views do not really perform well.
Let me describe my use case:

  • I have a source table (in PostgreSQL) with about 2M records. Records in the table can be added, deleted, or modified daily
  • Structure of the data doesn’t change, so no need to change the R2RML mapping

How would you suggest I proceed?

[1] http://docs.openlinksw.com/virtuoso/rdb2rdftriggers/

You have to approach this from a SQL perspective first, i.e., determine if direct querying against the target data source produces acceptable response times. If it does, then attach the table to Virtuoso and repeat your query to see how performance compares.

A transient RDF View is simply re-writing SPARQL as SQL. The queries that include rdf:type relations are the most expensive queries when using RDF Views.

What are your observations bearing in mind the above?

Note: you could engage our professional services, which offer the following advantages:

  1. Jointly accessible rendition of your setup, i.e., PostgreSQL and Virtuoso
  2. Our analysis and optimization of the setup

/cc @hwilliams

Hi Hugh,

I’d like to test a scheduled event that should do the following:

DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping');
CLEAR GRAPH ('urn:physic');
RDF_VIEW_SYNC_TO_PHYSICAL ('http://localhost:8890/Test#', 1, 'urn:physic');
CLEAR GRAPH ('http://localhost:8890/Test#');

Unfortunately, I’m not able to get it running. I guess I’m not using the right syntax.
Could you please help debugging this?
thanks

The DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping') call is incorrect as written, because on its own it only returns the generated quad map definition as text. It should be EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping')); and you probably do not need to run it unless the R2RML script has changed.

The CLEAR GRAPH (...) commands need to be prefixed with the keyword SPARQL so that the scheduler's SQL interface directs them to the SPARQL engine for execution.

All the commands should also be wrapped in a procedure for execution as a whole by the scheduler, i.e.:

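CREATE PROCEDURE RefreshPhysicalTriples ()
{
  -- Regenerate the transient view (quad map) from the R2RML mapping graph.
  EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:mapping'));
  -- Empty the physical graph before repopulating it.
  SPARQL CLEAR GRAPH ('urn:physic');
  -- Copy the triples exposed by the transient view into the physical graph.
  RDF_VIEW_SYNC_TO_PHYSICAL ('http://localhost:8890/Test#', 1, 'urn:physic');
  -- Clear the transient view's graph afterwards.
  SPARQL CLEAR GRAPH ('http://localhost:8890/Test#');
}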

Then just add the procedure name RefreshPhysicalTriples() (or whatever you call it) to the Scheduler as the SQL command to be executed at the scheduled time.

Hugh,

many thanks for the support! Can I use the isql interface in the Conductor to create the procedure?

I need to create the transient view at the beginning because I delete it at the end, so that “DESCRIBE” only returns triples from the physical graph.

Thanks again

Yes, you can use the Conductor isql interface to create the procedure, then create the scheduled event via the Conductor scheduler interface. Note: you can also create scheduled events via SQL that can be scripted as detailed in the Scheduler Documentation.
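As a rough illustration of the scripted route, a weekly event could be registered with an insert into the scheduler's system table, along the lines of the sketch below. It assumes the procedure above was created in the DB.DBA schema; the event name 'RefreshPhysicalTriplesWeekly' is made up for this example, and the interval is taken to be in minutes, so please check the column list and units against the Scheduler Documentation for your Virtuoso version before relying on it.

```sql
-- Sketch only: register a weekly run of the refresh procedure.
-- Assumptions: the procedure lives in DB.DBA, the event name is hypothetical,
-- and SE_INTERVAL is expressed in minutes (7 * 24 * 60 = 10080).
INSERT INTO DB.DBA.SYS_SCHEDULED_EVENT (SE_NAME, SE_START, SE_SQL, SE_INTERVAL)
  VALUES ('RefreshPhysicalTriplesWeekly', now (), 'DB.DBA.RefreshPhysicalTriples ()', 10080);

-- Inspect the registered events and when they last completed.
SELECT SE_NAME, SE_START, SE_INTERVAL, SE_LAST_COMPLETED
  FROM DB.DBA.SYS_SCHEDULED_EVENT;
```

The scheduler only fires events if it is enabled on the server (the SchedulerInterval setting in virtuoso.ini), so it is worth confirming that setting if the event never appears to run.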