Business Development -- Bayer

In light of current communications with Bayer, and a new enquiry from
“Steve” (steven.schaffer@bayer.com):

Hi… I am the architect on a project at Bayer to select a strategic, enterprise-wide Data Virtualization engine. I was asked to contact OpenLink SW as a potential candidate for this requirement. Please contact me to discuss an overview discussion and possible demo. Thanks

I have created this post to provide insight into Bayer’s current implementations.

Bayer Projects with commercial licenses:

5 x OpenLink licenses, serial numbers:
12460-a / 12460-b / 12460-c / 12460-d / 12460-e

  • Production Virtuoso HA cluster with 3 x instance cluster nodes
  • QA Virtuoso 1 x instance node
  • Dev Virtuoso 1 x instance node

Current Organization Structure of Bayer Group

I have also been informed that the DINOS Project is using Virtuoso. I am currently investigating this, as I do not have any Project or Team details.

@kidehen @hwilliams @Carolinei @danielhm @TallTed

@kidehen

I had drafted this to send to Steve Schaffer in response to his enquiry:

Hello Steve

Great to hear you are considering Virtuoso for your project.

As you may already know, as a Multi-Model RDBMS, Virtuoso can function as a powerful tool for enhancing data access, integration, conceptual virtualization, and management-oriented interactions across data represented as Relational Tables and/or RDF sentence collections (or Graphs).
As such, I am sure Virtuoso will be an invaluable addition when added to your Enterprise-Wide solutions.

Could you provide a detailed overview of your project so we can obtain a clearer understanding of your requirements?

Out of interest are you aware of the current installations of Virtuoso running in Bayer Cropscience/Bayer Business Services in Germany?

Regards
Sonja

Steve’s enquiry

Hi… I am the architect on a project at Bayer to select a strategic, enterprise-wide Data Virtualization engine. I was asked to contact OpenLink SW as a potential candidate for this requirement. Please contact me to discuss an overview discussion and possible demo. Thanks

Hi Steve,

Great to hear you are considering Virtuoso for your project.

As you may already know, Virtuoso is a powerful platform for conceptual data virtualization that manifests as a secure, high-performance, and scalable Semantic Web of Linked Data.

I would certainly like to arrange a session to discuss your needs in more detail, so please let me know what dates and times work for you.

BTW – Virtuoso is in use across various projects within Bayer Cropscience/Bayer Business Services in Germany :slight_smile:

Regards
Sonja

@kidehen @hwilliams

I received feedback from Steve re: availability. He also provided this for us to digest and review:

Data Federation and Proofing Presentation

Sonja,
I can do a call/meeting next week Tuesday or Wednesday anywhere between
1:00 and 4:00 in the afternoon. Let me know what works for you or just
send me an invite. Thanks for your responsiveness. I am certainly
interested in knowing about use of Virtuoso within Bayer. Any use of
Virtuoso at Monsanto ??? Also of interest.

I responded re: setting an agenda for the meeting and also asked about attendees from Bayer’s side, which is when Steven provided the Data Federation and Proofing Presentation slides.

Agenda outline:

  1. Introduction/Overview from Bayer detailing strategic plans/requirements to introduce Virtuoso enterprise-wide.

    Comment from Steve:

    Attached my “Approach” document that explains the objective and general technical requirements. I have the strategy in mind, but not documented, and it just changed yesterday, so we can discuss this on the call

  2. Introduce OpenLink and Virtuoso

    Comment from Steve

    I’m just wrapping up the second phase of the project: (five) vendor solution overviews/customized demos. I get anywhere from six to sixteen participants, depending on the level of interest and availability; we are moving into vacation time, so I can’t predict. Typically, we get architects, project managers, business people, project team members, etc.

    Keep in mind that the majority, if not all but me, are located in Germany, with a six-hour difference, so it has to be in the morning between 9 and 12 US eastern time. Please propose two date/time slots in the near future and I will check with team availability.

    Usually, I have a pre-meeting with the vendor contact to discuss all of this (background), rather than doing it with a team of ten or fifteen that have heard this multiple times. I’m flexible, so if you want to forego this initial discussion, I’m OK on that.

FYI

The change of strategy that happened yesterday may be linked to recent news relating to 12,000 job cuts at Bayer:

Monsanto and Bayer Crop Sciences are in the same business segment.

@kidehen @hwilliams
Meeting scheduled with Steven Schaffer for Wednesday 5th December, 9:30am EST (1-hour session).

Agenda - Steven will be providing an overview of the project, as detailed in the following Data Federation and Proofing Presentation, and details on the current strategy moving forward.

I have also uploaded the presentation to Google Docs slides and shared it with you both, along with @TallTed @danielhm

https://docs.google.com/presentation/d/1liLUkI6tz6dbI-NzZ8h8x50MiX4b4QiZVrvAc_d46r0

@kidehen @hwilliams I will secure a date and time with Steven for the Overview/Customised Demo in the week commencing 17th December, in a 09:30 EST time slot that will work with the team in Germany.

@hwilliams to prepare for the customised demo, what information would you need other than the two Excel spreadsheets?

The Data As An Asset Project that his manager’s manager is involved in is the project that purchased licenses for Virtuoso in August.

Additional Notes from Kingsley regarding the meeting:

Managing Data As An Asset
This presentation shows best practices in establishing and sustaining enterprise-wide data quality management.
URIBurner View

“Data as an Asset” is a theme to research across various companies e.g., BASF, Syngenta (we should be doing more in this massive org), various Banks etc…

#MDM and #DataGovernance key hashtags for tracking across Twitter and LinkedIn.

Steven needs to be connected to his colleagues (our contacts) in Germany. It will shorten matters. Also note they are senior to him.

Right now, he is taking a very long route i.e., one that the others are way beyond. One problem is that he dismisses the importance of entity relationship semantics because he is looking at things from an old perspective re Semantic Web.

The lesson here is this:

  • Enterprise Architects haven’t bought into the virtues of a Semantic Web of Linked Data.

  • Why? Because they don’t actually understand its practical utility.

  • How come? Because they are looking at the first coming of the Semantic Web meme, circa 1999-2000; they don’t see what’s currently in place as exemplified by Virtuoso.

This is our big challenge, across the board.

@snevitt: Need to know exactly what is to be demo’ed before committing, thus the excel spreadsheets mentioned and associated descriptions …

@snevitt,

Note, Tata Consultant should be a lead into figuring out our target for Tata becoming a Virtuoso OEM, Reseller, or Integration Partner.

Atos, Tata, Infosys, Cognizant, Northrop Grumman, Raytheon, BAE, etc… are all the same kinds of targets.

/cc @danielhm @Carolinei

@kidehen: I have taken the 2 spreadsheets Bayer provided, each containing one row of sample data, saved them as CSV files, loaded one into our Oracle and the other into our SQLServer test database instance in the US office, and then attached them to the demo.openlinksw.com instance:

SQL> select * from Oracle.HR.ADAM;
USUBJID           AESEQ             STUDYID           DOMAIN            SITEID            SUBJID            AETERM            AEDECOD           AEBODSYS          AESEV             AESER             AESERN            AEREL             DSREASAE         ONTRTFL           TRTEMFL           AEFN              BODFN             DECODFN           SAEFN            SDECODFN         SBODFN           AEDERMFN          AESTDT            AESTDTF          AESTDY            ANLSTDY           AEENDT           AEENDY           TRTP              TRTPCD            TRTPN             AGE               AGEGRP            AGEGRPN           RACE              RACEN             SEX               SAFETY            ITT               EFFICACY          COMPLT24          AEDICT            AESCAN            AESCONG           AESDISAB          AESDTH            AESHOSP           AESLIFE           HLGTERM           HLTERM            LLTERM            LSTDOSDT          TRTDUR            TRTSTDT
VARCHAR NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL  VARCHAR          VARCHAR NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  DECIMAL NOT NULL  DECIMAL NOT NULL  VARCHAR          VARCHAR          VARCHAR          DECIMAL NOT NULL  VARCHAR NOT NULL  VARCHAR          DECIMAL NOT NULL  DECIMAL NOT NULL  VARCHAR          VARCHAR          VARCHAR NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  DECIMAL NOT NULL  VARCHAR NOT NULL
_______________________________________________________________________________

DP-123-4567       1                 CDISCPILOT01      ADAE              999               9999              VERBATIM_0995     APPLICATION SITE ERYTHEMA  GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS  MILD              N                 0                 PROBABLE          NULL             Y                 Y                 1                 1                 1                 NULL             NULL             NULL             1                 03-Jan-01         NULL             2                 2                 NULL             NULL             Placebo           Pbo               0                 98                <99               1                 Unknown           5                 F                 Y                 Y                 Y                 Y                 MedDRA version 8.0 (partially masked by request of MSSO)  N                 N                 N                 N                 N                 N                 HLGT_9999         HLT_9999          APPLICATION SITE REDNESS  02-Jul-01         182               02-Jan-14

1 Rows. -- 1295 msec.

SQL> select * from SQLServer.Northwind.ae;
STUDYID           USUBJID           AETERM            AESTDTC           AESEQ              DOMAIN            AESPID            AEDECOD           AEBODSYS          AESEV             AESER             AEREL             AEOUT             AESCAN            AESCONG           AESDISAB          AESDTH            AESHOSP           AESLIFE           AESOD             AEENDTC          AESTDY            AEENDY           AEDTC             AEACN
VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  SMALLINT NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR NOT NULL  VARCHAR          INTEGER NOT NULL  VARCHAR          VARCHAR NOT NULL  VARCHAR
_______________________________________________________________________________

StudyX0001        DP-12-345         VERBATIM_0995     2001-01-01        1                  AE                DP01              APPLICATION SITE ERYTHEMA  GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS  MILD              N                 PROBABLE          NOT RESOLVED      N                 N                 N                 N                 N                 N                 N                 NULL             2                 NULL             2001-01-16        NULL

1 Rows. -- 2697 msec.
SQL>

I had hoped to import/attach the CSV files into a Virtuoso instance and then replicate them to Oracle & SQLServer, but the replication failed as detailed in bugz#18536.

In terms of creating the demo for Bayer, I presume we need to populate the tables in Oracle and SQLServer with meaningful synthetic data? How much data should we populate these tables with? They don’t specify, so is it at our discretion?

I haven’t created Linked Data Views yet, as I wanted to do this once the tables are populated and we have decided what form the presentation will take.

/cc @snevitt

Where is the Bayer spreadsheet?

@kidehen Attached are the original Excel Spreadsheets and the CSV files I created …

spreadsheet.zip (14.4 KB)
csv.zip (1.6 KB)

@kidehen @hwilliams

Happy New Year !!

Do you need any further information from Steven Schaffer, steven.schaffer@bayer.com?

Do you have any preferred dates you would like me to work with for the demo with them?

@snevitt: Note, I am attending the Hobbit Project final review meeting in Luxembourg 15th - 17th Jan 2019 and thus will not be available on those days …

@kidehen
Should I look at week commencing 21st January for this presentation?
Do you have a preferred day?

cc / @hwilliams

We should generate about 100 records per DB initially.

21st of Jan should be fine, if it isn’t the MLK Day holiday.
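
As a starting point for the roughly 100 records per DB mentioned above, here is a minimal Python sketch that generates synthetic rows for a subset of the columns in the `SQLServer.Northwind.ae` listing. The function names, the subject-ID pattern, and the value pools (`MODERATE`, `SEVERE`, `NONE`, `POSSIBLE`) are assumptions made for illustration, not values supplied by Bayer; only the column names and the style of the single sample row come from the spreadsheets.

```python
import csv
import io
import random

# Column subset taken from the SQLServer.Northwind.ae listing in this thread.
AE_COLUMNS = ["STUDYID", "USUBJID", "AESEQ", "DOMAIN",
              "AETERM", "AESEV", "AESER", "AEREL"]

# Placeholder value pools: illustrative assumptions, not real Bayer data.
SEVERITIES = ["MILD", "MODERATE", "SEVERE"]
RELATEDNESS = ["NONE", "POSSIBLE", "PROBABLE"]


def make_ae_rows(n_rows=100, n_subjects=20, seed=42):
    """Build n_rows synthetic adverse-event rows, reproducibly for a given seed."""
    rng = random.Random(seed)
    next_seq = {}  # per-subject running AESEQ counter
    rows = []
    for _ in range(n_rows):
        subject = f"DP-12-{rng.randrange(n_subjects):03d}"
        next_seq[subject] = next_seq.get(subject, 0) + 1
        rows.append({
            "STUDYID": "StudyX0001",
            "USUBJID": subject,
            "AESEQ": next_seq[subject],
            "DOMAIN": "AE",
            "AETERM": f"VERBATIM_{rng.randrange(10000):04d}",
            "AESEV": rng.choice(SEVERITIES),
            "AESER": rng.choice(["Y", "N"]),
            "AEREL": rng.choice(RELATEDNESS),
        })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=AE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = make_ae_rows()
csv_text = to_csv(rows)
```

The fixed seed keeps the data set identical between runs, so the same rows can be reloaded into Oracle and SQLServer if the demo tables need to be rebuilt. The remaining columns would still need sensible values before loading.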

@kidehen

It is MLK Day on 21st Jan, so I will select another day that week.

@kidehen: I have been looking at the two demo spreadsheets (ADaM & AE) Bayer provided again, and can see there are USUBJID and STUDYID columns in both, which I presume are primary and foreign keys. But it is not clear to me which would be the main table, with the other holding a foreign key to it. I am also wondering if we should ask whether they can provide 100 rows of sample data for both spreadsheets, as creating meaningful data ourselves without understanding it could result in incorrect assumptions being made.
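
If more sample rows become available, the key question could be checked empirically rather than guessed. The sketch below is one possible approach in Python, not an established procedure: it tests whether a column combination is unique within a table, and which key values in one table have no match in the other. The helper names and the sample rows are hypothetical; only the column names (`STUDYID`, `USUBJID`, `AESEQ`) appear in both spreadsheets.

```python
from collections import Counter


def is_unique_key(rows, columns):
    """Return True if no two rows share the same values in `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(n == 1 for n in counts.values())


def orphan_keys(child_rows, parent_rows, columns):
    """Return key tuples present in child_rows but missing from parent_rows."""
    parent_keys = {tuple(r[c] for c in columns) for r in parent_rows}
    child_keys = {tuple(r[c] for c in columns) for r in child_rows}
    return child_keys - parent_keys


# Hypothetical rows for illustration only; real data would come from the CSV files.
ae = [
    {"STUDYID": "S1", "USUBJID": "P1", "AESEQ": 1},
    {"STUDYID": "S1", "USUBJID": "P1", "AESEQ": 2},
    {"STUDYID": "S1", "USUBJID": "P2", "AESEQ": 1},
]
adam = [
    {"STUDYID": "S1", "USUBJID": "P1", "AESEQ": 1},
    {"STUDYID": "S1", "USUBJID": "P2", "AESEQ": 1},
]

SUBJECT_KEY = ["STUDYID", "USUBJID"]
EVENT_KEY = ["STUDYID", "USUBJID", "AESEQ"]

subject_key_unique_in_ae = is_unique_key(ae, SUBJECT_KEY)
event_key_unique_in_ae = is_unique_key(ae, EVENT_KEY)
adam_without_ae = orphan_keys(adam, ae, EVENT_KEY)
ae_without_adam = orphan_keys(ae, adam, EVENT_KEY)
```

In this made-up sample, `STUDYID` plus `USUBJID` repeats within `ae`, so it cannot be a primary key there on its own, while adding `AESEQ` makes it unique. The table whose key is unique and whose key values cover the other table's is the likelier parent, though Bayer would need to confirm the intended relationship.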

Below is what they said when the spreadsheets were provided:

Here you go…
Please feel free to send questions on these templates, or schedule a follow-up if there is any confusion or need for clarification.
I really want your team to be successful in doing the customized data, and the better demos showed more understanding of the business aspect as opposed to the technical aspect.

/cc @snevitt