External Data Access Adam Rauch, 6/05/08 Team: Geoff Snyder, Kevin Beverly, Cory Nathe, Matthew Bellew, Mark Igra, George Snelling.


1 External Data Access Adam Rauch, 6/05/08 Team: Geoff Snyder, Kevin Beverly, Cory Nathe, Matthew Bellew, Mark Igra, George Snelling

2 Primary Goal
Allow a variety of commonly used tools & languages to easily load, query, process, and analyze live data stored in the Atlas database

3 Available Methods & Their Problems
Manual export to TSV, XLS, etc.
– Example: Assay QC for Denny Lab
  Tedious, loses important column & run metadata
– Example: Ad hoc SRA analysis with VISC assays
  Tedious, error prone, loses important metadata, introduces security risks

4 Available Methods & Their Problems
Manual export to TSV, XLS, etc.
Client API
– Example: Record assay design history via Perl + API
  JSON format is inconvenient to work with
  Query API is incomplete
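To illustrate the "JSON format is inconvenient to work with" point, here is a minimal Python sketch of the conversion a client script ends up writing before an analysis tool can read the data. The payload shape (`columns` plus a list of `rows` objects) and the sample values are assumptions made for illustration; the slides do not show the actual response format of the Atlas client API.

```python
import csv
import io
import json

# Hypothetical response body. The real client API layout is not given in the
# slides, so the "columns"/"rows" keys and the values are assumptions.
SAMPLE_RESPONSE = json.dumps({
    "columns": ["run_id", "assay", "value"],
    "rows": [
        {"run_id": 1, "assay": "ELISA", "value": 0.42},
        {"run_id": 2, "assay": "ELISA", "value": None},
    ],
})


def json_rows_to_tsv(payload: str) -> str:
    """Flatten a JSON rows payload into tab-separated text with a header row."""
    data = json.loads(payload)
    columns = data["columns"]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in data["rows"]:
        # Missing or null values become empty fields.
        writer.writerow(["" if row.get(c) is None else row.get(c) for c in columns])
    return out.getvalue()
```

Every script that consumes the API needs some variant of this step, which is the kind of repetition a per-language wrapper package could absorb.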

5 Available Methods & Their Problems
Manual export to TSV, XLS, etc.
Client API
Direct access to the database from SQL tools
– No current usages; may need for cross-folder queries, performance, limitations of web UI
  Potential security issue: entire database is readable
  OntologyManager (OM), aka “The Blender”, schema is too difficult to query
  Other objects (e.g., folders) are difficult to work with

6 Available Methods & Their Problems
Manual export to TSV, XLS, etc.
Client API
Direct access to the database from SQL tools
Dataset Snapshot
– Not used currently
  Potential issues: currently datasets only, requires manual step, requires connecting to the database to retrieve data (need to be careful with security), not live

7 Problem Summary
No way to load, process & analyze live Atlas data via key analysis tools & languages (Perl, Java, R, SAS, PHP)
Direct PostgreSQL queries against Ontology Manager (OM), aka “The Blender”, are too difficult
Accessing data by direct database connection is a potential security issue; need to minimize this
Query client API is incomplete
OM is too slow for large datasets

8 Two Big Categories of Tasks
“Programmer / Analyst” Tasks
– “I want to manipulate all the data I can see on the Atlas web site with my tool or language”
– All current tasks fall into this category
“Administrative” Tasks
– Specialized tasks that require broad access to Atlas database via SQL
– Performance, query flexibility, web site limitations

9 Programmer/Analyst Tasks: Requirements
Require live, read-only access to all data user can view on Atlas from the tools she uses regularly
Need ad hoc queries: easy to develop new queries and find the data of interest
Need automated processes: e.g., nightly analysis of current data using a fixed query
Need security: follow user’s Atlas permissions
Key tools: Perl, Java, R, SAS, PHP

10 Administrative Tasks: Requirements
A few tasks may require direct, read-only access to all data in the Atlas database using db admin tools, scripts, ODBC browsers, etc.
Do not need to follow folder permissions since user can read all LabKey schemas
Filtering by folder, finding lists & objects, etc. should be reasonably easy
Need queries with a small number of joins
– Tabular OM data (lists, assays, samples) must be easier to query
– Joins are okay… just reduce the complexity
Key tools: pgAdmin III, DbVisualizer, EMS SQL Manager, Perl

11 Programmer/Analyst Tasks: Proposed Solution
Provide their tools access to data via the Atlas “front door” as a wrapper on top of the query client API
– Recommended: Custom package for each tool: Perl/Java/R/SAS/PHP
– Possible Alternative: ODBC wrapper
  Directly usable from more tools (SAS EXEC SQL, RODBC)
  Clumsy, hard to develop, doesn’t help with rest of client API (insert/update? charts? assays & other specialized data retrieval?)
Encourage use of API-based solution to limit users who require direct database access
Challenges
– Development effort to create wrappers for each language
– SAS interface may be difficult due to proprietary nature of SAS
– Add full SQL query support to API
– Fix issues with current SQL syntax
– Improve ease-of-use (build query in UI then “export to SAS/R script”)
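A per-language package of the kind recommended above might look roughly like the following Python sketch. It is not the actual Atlas or LabKey client library: the `QueryClient` class, the `select-rows` path, the `schema`/`query` parameter names, and the `rows` response key are all hypothetical. The transport is injected, so authentication (and therefore the user's Atlas permissions) stays with the server-side "front door" rather than with the script.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import quote, urlencode


class QueryClient:
    """Hypothetical thin wrapper over a query web API (names are illustrative)."""

    def __init__(self, base_url: str, fetch: Callable[[str], str]):
        self.base_url = base_url.rstrip("/")
        # `fetch` performs an authenticated HTTP GET and returns the body text.
        # Injecting it keeps this sketch runnable without a server.
        self.fetch = fetch

    def build_url(self, folder: str, schema: str, query: str) -> str:
        # Endpoint path and parameter names are assumptions, not a real API.
        params = urlencode({"schema": schema, "query": query})
        return f"{self.base_url}/{quote(folder)}/select-rows?{params}"

    def select_rows(self, folder: str, schema: str, query: str) -> List[Dict]:
        """Return the rows of one named query as a list of dictionaries."""
        body = self.fetch(self.build_url(folder, schema, query))
        return json.loads(body)["rows"]
```

A nightly job would construct the client once and call `select_rows` with a fixed folder, schema, and query name; an ad hoc user would call the same method interactively.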

12 Administrative Tasks and Performance Issues: Proposed Solution
Provide option to migrate tabular OM data from virtual to hard tables
– User option for some types of data (e.g., lists)
– Wholesale migration for others (e.g., datasets, flow)
– Continue to use OM for data stored as trees or graphs (e.g., experiment)
Challenges
– Development and test effort
– Naming hard tables in a reasonably discoverable way
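The virtual-to-hard migration can be sketched as a pivot, under the assumption that a "virtual" table stores one row per (object, property, value) triple. The slides do not describe the OM schema, so the `om_property` table, its columns, and the `materialize` helper below are hypothetical, and SQLite stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3
from typing import Sequence


def materialize(conn: sqlite3.Connection, hard_table: str,
                properties: Sequence[str]) -> None:
    """Pivot property/value rows into one real column per property.

    Assumes a hypothetical source table om_property(object_id, property, value).
    Table and column names are interpolated into SQL, so they must come from
    trusted metadata, never from user input.
    """
    col_defs = ", ".join(f'"{p}" TEXT' for p in properties)
    conn.execute(
        f'CREATE TABLE "{hard_table}" (object_id INTEGER PRIMARY KEY, {col_defs})'
    )
    # One aggregate per property: picks that property's value for each object,
    # or NULL when the object has no row for it.
    pivots = ", ".join(
        "MAX(CASE WHEN property = ? THEN value END)" for _ in properties
    )
    conn.execute(
        f'INSERT INTO "{hard_table}" '
        f"SELECT object_id, {pivots} FROM om_property GROUP BY object_id",
        list(properties),
    )
```

After such a migration, a query against the hard table is a plain single-table `SELECT`, which is the join reduction the administrative requirements ask for. The naming challenge in the slide corresponds to choosing `hard_table` so that users can find it.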

