Published by Lily Galloway. Modified over 11 years ago.
1
SRI International Bioinformatics
The Ocelot Frame Knowledge Representation System
Peter D. Karp, Ph.D.
Bioinformatics Research Group, SRI International
pkarp@ai.sri.com
2
Frame Knowledge Representation Systems
- Long history of development in the AI knowledge representation community
- Distant cousin of object-oriented databases (convergent evolution)
- Background reading on frame systems
  - P. Karp, "The design space of frame knowledge representation systems"
    - http://www.ai.sri.com/pubs/files/236.pdf
  - P. Karp, "Distinguishing Knowledge Bases and Data Bases: Who's on First and What's on Second"
    - http://www.ai.sri.com/pubs/files/1397.pdf
3
Ocelot Information
- P.D. Karp et al., "A collaborative environment for authoring large knowledge bases," J Intelligent Information Systems 13:155-94, 1999.
  - http://www.ai.sri.com/pkarp/pubs/99jiis.pdf
- Ocelot Users Guide
  - http://www.ai.sri.com/pkarp/ocelot/
4
Pathway Tools Architecture
[Figure: architecture diagram. Labeled components: Ocelot DBMS; Generic Frame Protocol; Pathway Genome Navigator; Web Mode; Desktop Mode; Protein Editor; Pathway Editor; Reaction Editor; Oracle or MySQL; Disk File; Lisp API; PerlCyc API; JavaCyc API]
5
Ocelot Data Model
- Ocelot database
  - Also known as DB, Knowledge Base, KB, PGDB
- An Ocelot database is a collection of frames and slots
6
Ocelot Frames
- Two kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- A symbolic frame name (id, key) uniquely identifies each frame
  - Examples: EG10223, TRP, Proteins
- Classes have Superclass(es), Subclass(es), Instance(s)
- Instances have one or more parent classes
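The class/instance structure described on this slide can be sketched in a few lines of Python. This is only an illustration of the data model, not Ocelot's implementation: the `Frame` and `KB` types, their methods, and the frame name `TRPSYN-PWY` are hypothetical, and the other frame names are borrowed from the slide purely as labels.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame identified by a unique symbolic name (hypothetical structure)."""
    name: str
    is_class: bool
    parents: list = field(default_factory=list)  # names of parent classes
    slots: dict = field(default_factory=dict)    # slot name -> list of values


class KB:
    """A toy knowledge base: a collection of frames keyed by frame name."""

    def __init__(self):
        self.frames = {}

    def add(self, name, is_class, parents=()):
        # Frame names must be unique within the KB.
        if name in self.frames:
            raise ValueError(f"frame name already in use: {name}")
        # Only classes may serve as parents.
        for p in parents:
            if not self.frames[p].is_class:
                raise ValueError(f"parent must be a class: {p}")
        self.frames[name] = Frame(name, is_class, list(parents))
        return self.frames[name]

    def all_superclasses(self, name):
        """Return every class reachable by following parent links upward."""
        seen, stack = [], list(self.frames[name].parents)
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.append(p)
                stack.extend(self.frames[p].parents)
        return seen

    def instances_of(self, cls):
        """Return sorted names of direct and indirect instances of cls."""
        return sorted(
            f.name for f in self.frames.values()
            if not f.is_class and cls in self.all_superclasses(f.name)
        )


kb = KB()
kb.add("Pathways", is_class=True)
kb.add("Biosynthetic-Pathways", is_class=True, parents=["Pathways"])
kb.add("Genes", is_class=True)
kb.add("trpA", is_class=False, parents=["Genes"])
kb.add("TCA-cycle", is_class=False, parents=["Pathways"])
kb.add("TRPSYN-PWY", is_class=False, parents=["Biosynthetic-Pathways"])  # made-up name
```

Here `kb.instances_of("Pathways")` includes the instance filed under the subclass `Biosynthetic-Pathways`, which is the inheritance behavior the slide implies.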
7
Slots
- Encode attributes and properties of a frame
  - Molecular weight, gene coordinates, comments
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
8
Slots
- Number of values
  - Single valued
  - Multivalued: sets or lists
- Slot values
  - Integer, real, string, symbol (frame name)
- Every slot is described by a slot frame (slotunit) in a KB that defines meta information about that slot
  - Datatype, classes it pertains to, constraints
  - Enumerations
  - Two slots are inverses if they encode opposite relationships
    - Slot Product in class Genes
    - Slot Gene in class Polypeptides
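One way to picture slot frames is as a table of per-slot metadata that the system consults on every write. The sketch below assumes a dictionary layout, a `put_value` helper, and the frame name `TrpA-polypeptide`, all of which are hypothetical. It shows a datatype check and automatic maintenance of the inverse slot, using the Product/Gene pair from the slide.

```python
# Slot metadata standing in for slot frames. The layout is illustrative only.
# "domain" records the class a slot pertains to; it is not enforced here.
SLOT_FRAMES = {
    "Product": {"domain": "Genes", "inverse": "Gene", "multivalued": True},
    "Gene": {"domain": "Polypeptides", "inverse": "Product", "multivalued": True},
    "Molecular-Weight": {"domain": "Polypeptides", "datatype": float,
                         "multivalued": False},
}

# Two instance frames, each a mapping from slot name to a list of values.
frames = {"trpA": {}, "TrpA-polypeptide": {}}


def put_value(frame, slot, value):
    """Store a slot value, checking its datatype and updating the inverse slot."""
    meta = SLOT_FRAMES[slot]
    datatype = meta.get("datatype")
    if datatype is not None and not isinstance(value, datatype):
        raise TypeError(f"{slot} expects {datatype.__name__}, got {value!r}")
    values = frames[frame].setdefault(slot, [])
    if not meta["multivalued"]:
        values.clear()  # a single-valued slot keeps only the newest value
    if value not in values:
        values.append(value)
    inverse = meta.get("inverse")
    if inverse is not None:
        # The value names another frame; record the reverse link there.
        back = frames[value].setdefault(inverse, [])
        if frame not in back:
            back.append(frame)


put_value("trpA", "Product", "TrpA-polypeptide")
```

After the call, `frames["TrpA-polypeptide"]["Gene"]` holds `["trpA"]` even though only the `Product` slot was written, so the two directions of the relationship cannot drift apart.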
9
Ocelot Data Model
Frame data model compared to relational model:
- Minimizes size of schema relative to semantic complexity
- Inheritance lets us define new classes by modifying existing classes
- Relational normalization breaks multivalued attributes into separate tables; this is not needed in the frame data model
10
Ocelot Schema
- Schema is stored within the DB
- Schema is self-documenting
- Slot frames define metadata about slots
- Schema evolution facilitated by
  - Easy addition/removal of slots, or alteration of slot datatypes
  - Flexible data formats that do not require dumping/reloading of data
  - New versions of Pathway Tools include a schema upgrade function
    - Updates schema to match that of new MetaCyc version
    - Transforms data into new schema
11
Ocelot Storage System Architecture
- Persistent storage via disk files or Oracle or MySQL
- Oracle or MySQL (RDBMS KBs)
  - Concurrent development by multiple users
  - Incrementally fault in frames as referenced by the application
  - Incrementally save modified frames only
  - Stores complete transaction history of PGDB
- Disk files
  - Updating by a single user at a time
  - Read in entirety at start of session
  - Write in entirety at every save
12
[Figure: multiple users connecting to one MySQL server]
13
Ocelot Storage Subsystem
RDBMS KBs
- RDBMS schema is independent of application schema
- DBMS is submerged within Ocelot, invisible to users
- Frames transferred from DBMS to Ocelot
  - On demand
  - By background prefetcher
  - Memory cache
  - Persistent disk cache speeds performance via Internet
14
Ocelot Frame Faulting
- When a frame is referenced by Pathway Tools
  - Look in Ocelot virtual memory
  - Look in disk cache
  - Look in RDBMS
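The lookup order on this slide is a tiered cache. A minimal sketch follows, with plain dictionaries standing in for the disk cache and the RDBMS. The `FrameStore` class and its `log` attribute are hypothetical, and writing a fetched frame back into the disk cache is an assumption rather than something the slides state.

```python
class FrameStore:
    """Three-tier frame lookup: memory, then disk cache, then RDBMS."""

    def __init__(self, disk_cache, rdbms):
        self.memory = {}              # frames already faulted into memory
        self.disk_cache = disk_cache  # dict standing in for the local disk cache
        self.rdbms = rdbms            # dict standing in for the remote database
        self.log = []                 # records which tier satisfied each lookup

    def get_frame(self, name):
        # Tier 1: already in memory.
        if name in self.memory:
            self.log.append(("memory", name))
            return self.memory[name]
        # Tier 2: local disk cache.
        if name in self.disk_cache:
            self.log.append(("disk-cache", name))
            frame = self.disk_cache[name]
        else:
            # Tier 3: fetch from the database (KeyError if the frame is unknown).
            self.log.append(("rdbms", name))
            frame = self.rdbms[name]
            # Assumption: a fetched frame is also written to the disk cache.
            self.disk_cache[name] = frame
        self.memory[name] = frame
        return frame


store = FrameStore(disk_cache={}, rdbms={"trpA": {"Product": ["TrpA-polypeptide"]}})
store.get_frame("trpA")  # first reference goes all the way to the RDBMS
store.get_frame("trpA")  # second reference is served from memory
```

Only the first reference pays the cost of a database round trip; `store.log` shows the second one being served from memory.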
15
Ocelot RDBMS Transaction History
- RDBMS KBs store complete transaction history
- Stored as sequences of GFP (Generic Frame Protocol) operations executed by the user or by Pathway Tools
- Right click -> Show -> Changes in pop-up window
- Used to compute gene last-curated date
- Can be used to open a PGDB in an earlier state
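Storing history as a sequence of operations makes it possible to rebuild an earlier state by replay. The sketch below assumes that replay starts from an empty KB and that transactions are ordered by id; the slides do not say how Ocelot actually reconstructs earlier states. The operation names are simplified stand-ins, not the real GFP operation set.

```python
# Each transaction is a list of operations; names and layout are illustrative.
history = [
    {"id": 1, "ops": [("create-frame", "P"), ("put-slot-value", "P", "MW", 3)]},
    {"id": 2, "ops": [("put-slot-value", "P", "MW", 4)]},
    {"id": 3, "ops": [("delete-frame", "P")]},
]


def apply_op(kb, op):
    """Apply one logged operation to an in-memory KB (dict of frame -> slots)."""
    kind = op[0]
    if kind == "create-frame":
        kb[op[1]] = {}
    elif kind == "put-slot-value":
        _, frame, slot, value = op
        kb[frame][slot] = value
    elif kind == "delete-frame":
        del kb[op[1]]
    else:
        raise ValueError(f"unknown operation: {kind}")


def state_as_of(history, last_id):
    """Replay transactions up to and including last_id, starting from empty."""
    kb = {}
    for txn in history:  # assumes history is sorted by transaction id
        if txn["id"] > last_id:
            break
        for op in txn["ops"]:
            apply_op(kb, op)
    return kb
```

For example, `state_as_of(history, 2)` rebuilds the KB as it stood before frame `P` was deleted in transaction 3.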
16
Ocelot RDBMS Concurrency Control
- When user A saves updates:
  - Ocelot queries all transactions that occurred since A last saved or since the start of A's session
  - Ocelot compares the operations in those transactions with the updates made by A
  - If conflicts are found, save does not occur and conflicts are reported to the user
  - If no conflicts, save proceeds
  - Other users' transactions are evaluated into A's session
    - Refresh
17
Ocelot Update Conflicts
- Examples of conflicting updates:
  - User A deletes frame F; User B modifies a value in a slot of F
  - User A changes MW of protein P from 3 to 4; User B changes MW of protein P from 3 to 5
- Examples of updates that don't conflict:
  - User A updates frame E; User B updates frame F
  - User A updates the value of P.MW; User B updates the value of P.pI
  - Users A and B both delete all values of P.MW
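The save-time check and the conflict examples suggest a simple pairwise rule over operations. The sketch below is one rule that reproduces the five examples on this slide. It is an assumption, not Ocelot's documented algorithm, and the operation tuples and function names are hypothetical.

```python
def ops_conflict(a, b):
    """Decide whether two logged operations conflict.

    Operations are tuples: ("delete-frame", frame) or
    ("put-slot-values", frame, slot, values).
    """
    if a == b:
        return False  # identical updates agree, e.g. both clear P.MW
    if a[1] != b[1]:
        return False  # different frames never interfere
    if a[0] == "delete-frame" or b[0] == "delete-frame":
        return True   # one side deleted a frame the other side changed
    return a[2] == b[2]  # same frame: conflict only if the same slot differs


def try_save(my_ops, committed_since_last_save):
    """Return (saved, conflicts); the save is refused if any pair conflicts."""
    conflicts = [(a, b) for a in my_ops for b in committed_since_last_save
                 if ops_conflict(a, b)]
    return (not conflicts, conflicts)


# User A set P.MW to 4 while user B had already committed P.MW = 5.
saved, conflicts = try_save(
    [("put-slot-values", "P", "MW", (4,))],
    [("put-slot-values", "P", "MW", (5,))],
)
```

In this run `saved` is false and `conflicts` lists the offending pair, matching the slide's statement that the save does not occur and the conflicts are reported to the user.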
18
Revert KB Operation
- Undoes all changes in current session
19
Pathway Tools / BioCyc Software/Database Bundles
- Each downloadable Pathway Tools configuration contains a combination of PGDBs
- Those PGDBs are loaded into Lisp virtual memory
- Build process:
  - Start Common Lisp
  - Load all Pathway Tools compiled Lisp code into virtual memory
  - Load all PGDBs for that configuration into virtual memory
  - Save virtual memory image as binary executable file
20
Full BioCyc or Tier 1+2+3 Configuration
- 507 PGDBs loaded into virtual memory
21
BioCyc at 10,000 Genomes
- Scalability of current approach is limited
- New approach: For full BioCyc, store PGDBs not in virtual memory but in Franz AllegroCache
- AllegroCache is a Common Lisp object-oriented database
- Implementation now in hand for Ocelot
- We have done extensive performance testing
- Performance looks good to 10,000 PGDBs