Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl Randy Gobbel, Ph.D. May 14, 2003

1 Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl Randy Gobbel, Ph.D. May 14, 2003 gobbel@ai.sri.com

2 Overview (SRI International Bioinformatics)
- Why would you need to write complex queries?
- Emacs
- Lisp
- perlcyc
- The GFP API, and Pathway Tools-specific functions
- Examples and exercises

3 When do you need complex queries?
- Many common queries are accessible from the command menu
  - By name
  - By substring
  - By class
  - Others are specialized by the type of the object being displayed
- Other queries of arbitrary complexity can be created by writing a (simple) program
  - Example: find all reactions with more than 5 citations

4 Programmatic Access to PGDBs
- The Lisp and Perl languages are used for programmatic queries and updates to PGDBs
- The Generic Frame Protocol (GFP) is the API for PGDBs

5 Emacs
- "The extensible, self-documenting editor"
- (Most of the time) typing a printing character simply inserts it
  - Just like most Windows and MacOS programs
- Control and Meta keys in combination with other keys run commands
  - Again, just like keyboard shortcuts in most programs
- Control-H: Help
  - T -> tutorial, A -> apropos, W -> "where is"
  - K -> "what does this key combination do?"
- Many commands are now available from pulldown menus

6 Emacs
- Three ways to run Pathway Tools from within Emacs
  - Use the Emacs/Lisp interface provided with Allegro Common Lisp (fi)
  - Use the free ILisp package (written in Emacs Lisp)
  - Run Pathway Tools from a shell within Emacs
  - Windows users: lowest-common-denominator
    - Cut and paste still works
- Advantages of using Emacs with Lisp
  - Syntax highlighting
  - Automatic indentation
  - One-keystroke evaluation of Lisp forms in fi and ILisp

7 Lisp
- An idea that keeps reinventing itself
  - Function, arguments
- What is a list?
  - Unit of syntax: (a b c)
  - Unit of data: (a b c)
  - Unit of execution: (get-slot-value 'arca 'citations)
- Most languages: function(arg1, arg2, …)
  - Fine for writing
- Lisp: (function arg1 arg2 arg3 …)
  - Much easier to deal with in a computer

8 Lisp Data Types
- Numbers
  - 1
  - 1.325
- Strings
  - "hello"
- Symbols
  - E.g.: ARCA (or, arcA)
  - Make a literal symbol by quoting it: 'ARCA
  - Case-sensitive symbols require vertical bars: '|Genes|
- Special symbols: T and NIL
  - Used to mean True and False
  - NIL is also the empty list: ()

9 Lisp Expressions and Evaluation
- (+ 3 4 5)
  - + is a function
  - (+ 3 4 5) is a function call with 3 arguments
- Arguments are evaluated:
  - Numbers evaluate to themselves
  - If any of the args are themselves expressions, they are also evaluated
  - (+ 1 (+ 3 4)) => 8
- The values of the args are passed to the function
- Some functions allow variable numbers of arguments
  - (+) => 0
  - (+ 1) => 1
  - (+ 2 3 1 3 4 5 6) => 24
- (+ (* 3 4) 6) => 18
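
The evaluation rules on this slide can be tried in any Common Lisp listener. The sketch below is only an illustration: `show` is a hypothetical helper introduced here (it is not part of Lisp or Pathway Tools) that prints a form next to its value.

```lisp
(defun show (form)
  "Print FORM and its value, then return the value. Hypothetical helper."
  (let ((value (eval form)))
    (format t "~S => ~S~%" form value)
    value))

;; Nested expressions are evaluated before the outer call receives them.
(show '(+ 3 4 5))
(show '(+ 1 (+ 3 4)))
(show '(+ (* 3 4) 6))

;; + accepts any number of arguments, including none.
(show '(+))
(show '(+ 1))
(show '(+ 2 3 1 3 4 5 6))
```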

10 Lisp Expressions and Evaluation
- The listener is also called "top level" and "read-eval-print loop"
- Uses a three-step process
  - Read
    - Reader converts elements outside "" and || to uppercase
  - Evaluate
  - Print
- Anything you type in is evaluated
  - 1 => 1
  - "hello" => "hello"
  - (+ 2 3) => 5
- Quoting prevents evaluation
  - '(+ 2 3) => (+ 2 3)
- Setting a symbol to a value creates a variable:
  - (setq foo '(a b c)) => (A B C)
  - foo => (A B C)
  - No declarations required!
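
A minimal sketch of quoting and of the reader's case conversion, assuming the standard upcasing reader. The variable names are made up for illustration, and `defparameter` is used instead of a bare `setq` only so that the file loads without warnings about undeclared variables.

```lisp
;; Quoting returns the form itself instead of evaluating it.
(defparameter *unevaluated* '(+ 2 3))   ; the three-element list (+ 2 3)
(defparameter *evaluated* (+ 2 3))      ; the number 5

;; The reader upcases unescaped symbol names; vertical bars preserve case.
(defparameter *plain-name* (symbol-name 'foo))
(defparameter *escaped-name* (symbol-name '|Genes|))

(format t "~S ~S ~S ~S~%"
        *unevaluated* *evaluated* *plain-name* *escaped-name*)
```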

11 The Lisp Listener
- Useful forms in listener:
  - Previous results: *, **, ***
  - But: not in programs
- (+ 1 2) => 3
- (+ 3 *) => 6
- ** => 3

12 Dealing with the Lisp debugger
- Error conditions result in a call to the Lisp debugger:
  - :continue continues; a numeric argument selects between possible options
    - Lower-numbered options generally take less drastic actions
  - :reset unwinds to the top level
    - WARNING: may exit the Pathway Tools window!
  - :zoom displays the stack
- Example:
    EC(4): (xxx)
    *debugger-hook* called.
    Error: Attempt to take the value of the unbound variable `X'.
      [condition type: UNBOUND-VARIABLE]
    Restart actions (select using :continue):
     0: Try evaluating X again.
     1: Use :X instead.
     2: Set the symbol-value of X and use its value.
     3: Use a value without setting X.
     4: Return to Top Level (an "abort" restart).
     5: Abort entirely from this process.
    [1] EC(5): :res

13 Lisp Variables
- Global variable values can be set and used during a session
- Declarations not needed
- (setq x 5) => 5
- x => 5
- (+ 3 x) => 8
- (setq y "atgc") => "atgc"

14 Equality in LISP
- Internally LISP refers to objects via pointers
- Fundamental equality operation is EQ
  - True if the two arguments point to the same object
  - Very efficient
- Other comparison operators:
  - = for numbers: (= x 4)
  - EQUAL for list structures or exact string matching: (equal x "abc")
  - STRING-EQUAL for case-insensitive string matching: (string-equal x "AbC")
  - EQL for characters: (eql x #\A)
  - EQ for symbols, or to test whether two list structures are the same object (compares pointers): (eq x 'ABC)
  - FEQUAL for frames: (fequal x 'trp)
- Simple rule: Use EQUAL for everything except frames
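
The difference between the standard equality predicates is easiest to see side by side. This is a sketch in plain Common Lisp; `compare-all` and the two list variables are hypothetical names used only here, and FEQUAL is left out because it is specific to Pathway Tools.

```lisp
(defparameter *a* (list 1 2 3))
(defparameter *b* (list 1 2 3))   ; same contents as *a*, but a different object

(defun compare-all (x y)
  "Return a list of the results of EQ, EQL, and EQUAL applied to X and Y."
  (list (eq x y) (eql x y) (equal x y)))

(print (compare-all *a* *a*))      ; one object compared with itself
(print (compare-all *a* *b*))      ; two lists with equal contents
(print (compare-all 'abc 'abc))    ; the same symbol read twice

;; Type-specific comparisons from the slide.
(print (= 4 4.0))                  ; numeric comparison across number types
(print (eql #\A #\A))              ; characters
(print (equal "abc" "ABC"))        ; EQUAL is case-sensitive for strings
(print (string-equal "abc" "AbC")) ; STRING-EQUAL ignores case
```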

15 Functions for Operating on Lists
- length
  - (length x)
  - Returns the number of elements
- first
  - (first x)
  - Returns the first element
- nth
  - (nth j x)
  - Returns the Jth element of list X (element 0 is the first element)
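
A short sketch of these list functions on a small list. The symbols in `*toy-genes*` are placeholders chosen for illustration, not frames from a real PGDB.

```lisp
(defparameter *toy-genes* '(trpa trpb trpc trpd trpe))

(print (length *toy-genes*))      ; number of elements
(print (first *toy-genes*))       ; first element
(print (nth 0 *toy-genes*))       ; NTH counts from zero, so this is also the first
(print (nth 3 *toy-genes*))       ; fourth element
(print (subseq *toy-genes* 0 2))  ; elements 0 up to, but not including, 2
```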

16 loop
- Loop allows you to iterate
  - Through a series of numbers
    - for i from 1 to 10
  - Through a list
    - for rxn in rxns
- Conditionals control whether execution continues
  - when (> (length (get-slot-values rxn 'citations)) 5)
- do lets you do something
  - do (setq total (+ i total))
- collect lets you gather up values
  - collect (get-frame-name rxn)

17 loop
- You can combine as many loop clauses as you need:
    (loop for i from 1 to 10
          for j from 10 downto 1
          do (print (+ i j))
          collect (* i j))
    => (10 18 24 28 30 30 28 24 18 10)
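
The `when` and `collect` clauses can be tried without a PGDB. In this sketch the reaction names and citation counts are made up, and `well-cited` and `total-citations` are hypothetical functions; in Pathway Tools the counts would come from something like `(length (get-slot-values rxn 'citations))`.

```lisp
(defparameter *citation-counts*
  '((rxn-a . 7) (rxn-b . 2) (rxn-c . 6) (rxn-d . 5))
  "Made-up reaction names paired with made-up citation counts.")

(defun well-cited (counts threshold)
  "Return the names in COUNTS whose count is greater than THRESHOLD."
  (loop for (name . n) in counts
        when (> n threshold)
          collect name))

(defun total-citations (counts)
  "Sum all counts; the SUM clause keeps a running total inside LOOP."
  (loop for (nil . n) in counts
        sum n))

(print (well-cited *citation-counts* 5))
(print (total-citations *citation-counts*))
```

The `sum` clause is one way to accumulate a total without writing an explicit `do` that updates a variable.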

18 Defining Functions
- Put function definitions in a file
- Reload the file when definitions change
  - EC(1): :ld my-queries.lisp
- (defun <name> (<arguments>) … code for function …)
  - Creates a new operation called <name>
- Examples:
    (defun square (x) (* x x))
    (defun message () (print "Hello"))
    (defun test-fn () 1 2 3 4)
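
A brief sketch of what two of the slide's examples return when called. The point it illustrates is that a function returns the value of the last form in its body.

```lisp
(defun square (x)
  "Return X multiplied by itself."
  (* x x))

(defun test-fn ()
  ;; Every form in the body is evaluated in order;
  ;; only the value of the last one is returned.
  1 2 3 4)

(print (square 7))
(print (test-fn))
```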

19 Accessing Lisp from Pathway Tools
- Starting Pathway Tools for Lisp work:
    > pathway-tools -lisp
    EC(1): (select-organism :org-id 'XXX)
  - Windows: pathway-tools-lisp.exe
- Lisp expressions can be typed at any time to the Pathway Tools listener
    Command: (get-slot-value 'trp 'common-name)
    => "L-tryptophan"
- Invoking the Navigator from Lisp:
    EC(2): (eco)

20 The perlcyc API
- Written by Lukas Mueller at TAIR
  - Downloadable from the TAIR Web site
  - Installs as a standard CPAN module
  - From within Pathway Tools, start the server by hand:
    - (start-external-access-daemon)
    - (start-external-access-daemon :verbose? t) for tracing output
- Function names are the same as in Lisp, with hyphens replaced by underscores and question marks by _p
  - get-class-all-instances => get_class_all_instances
  - coercible-to-frame? => coercible_to_frame_p
- Pathway Tools functions are callable as standard Perl functions
- Frame names are symbols which can be passed back to Lisp
- Control structures are standard Perl
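
The naming rule on this slide is mechanical enough to write down as a function. The sketch below is an illustration of the rule only: `lisp-name-to-perl-name` is a hypothetical helper and is not part of perlcyc or Pathway Tools.

```lisp
(defun lisp-name-to-perl-name (name)
  "Map a Lisp function name string to the perlcyc naming convention:
each hyphen becomes an underscore and each question mark becomes _p."
  (with-output-to-string (out)
    (loop for ch across name
          do (case ch
               (#\- (write-char #\_ out))
               (#\? (write-string "_p" out))
               (t   (write-char ch out))))))

(print (lisp-name-to-perl-name "get-class-all-instances"))
(print (lisp-name-to-perl-name "coercible-to-frame?"))
```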

21 javacyc
- Uses the same Unix domain socket interface as perlcyc
- Function names use Java conventions
  - get-slot-values => getSlotValues
- Includes a C library for Unix domain sockets

22 Lisp vs. Perl
- Task: find all reactions with fewer than 5 citations
- Perl:
    use perlcyc;
    my $cyc = perlcyc->new("ECOLI");
    my @found;
    foreach my $r ($cyc->all_rxns()) {
        # Slot access goes through the connection object
        my @citations = $cyc->get_slot_values($r, "citations");
        if (scalar(@citations) < 5) {
            push @found, $r;
        }
    }
- Lisp:
    (loop for r in (all-rxns)
          when (< (length (get-slot-values r 'citations)) 5)
          collect r)

23 Pathway Tools User Accessible Functions
- Internal Pathway Tools functions that users can call
- Includes:
  - Generic Frame Protocol (GFP), the Ocelot object database API
  - Additional functions specific to Pathway Tools
- For more information see
  - http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

24 Generic Frame Protocol (GFP)
- A library of Lisp functions for accessing Ocelot DBs
- GFP specification:
  - http://www.ai.sri.com/~gfp/spec/paper/paper.html
- A small number of GFP functions are sufficient for most complex queries

25 Generic Frame Protocol
- (get-class-all-instances Class)
  - Returns the instances of Class
- Key Pathway Tools classes:
  - Genetic-Elements
  - Genes
  - Proteins
  - Polypeptides (a subclass of Proteins)
  - Protein-Complexes (a subclass of Proteins)
  - Pathways
  - Reactions
  - Compounds-And-Elements
  - Enzymatic-Reactions
  - Transcription-Units
  - Promoters
  - DNA-Binding-Sites

26 Generic Frame Protocol
- Note: Frame.Slot means a specified slot of a specified frame
  - Frame and Slot must be symbols!
- (get-slot-value Frame Slot)
  - Returns first value of Frame.Slot
- (get-slot-values Frame Slot)
  - Returns all values of Frame.Slot as a list
- (slot-has-value-p Frame Slot)
  - Returns T if Frame.Slot has at least one value
- (member-slot-value-p Frame Slot Value)
  - Returns T if Value is one of the values of Frame.Slot
- (print-frame Frame)
  - Prints out the contents of Frame
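
To make the Frame.Slot behavior concrete without a running Pathway Tools, here is a toy model in plain Common Lisp. Everything in it is an assumption made for illustration: the `toy-` functions only imitate the behavior described on the slide, the hash table is not Ocelot, and the synonyms stored for TRP are made-up sample data.

```lisp
;; Toy frame store: frame name -> list of (slot value ...) entries.
;; This is NOT Ocelot or GFP, only a stand-in with similar-looking calls.
(defparameter *toy-kb* (make-hash-table :test #'eq))

(setf (gethash 'trp *toy-kb*)
      '((common-name "L-tryptophan")
        (synonyms "tryptophan" "W")))

(defun toy-get-slot-values (frame slot)
  "Return all values of FRAME.SLOT as a list (NIL if there are none)."
  (rest (assoc slot (gethash frame *toy-kb*))))

(defun toy-get-slot-value (frame slot)
  "Return the first value of FRAME.SLOT."
  (first (toy-get-slot-values frame slot)))

(defun toy-slot-has-value-p (frame slot)
  "Return T if FRAME.SLOT has at least one value."
  (not (null (toy-get-slot-values frame slot))))

(defun toy-member-slot-value-p (frame slot value)
  "Return T if VALUE is one of the values of FRAME.SLOT."
  (not (null (member value (toy-get-slot-values frame slot)
                     :test #'equal))))

(print (toy-get-slot-value 'trp 'common-name))
(print (toy-get-slot-values 'trp 'synonyms))
(print (toy-slot-has-value-p 'trp 'citations))
(print (toy-member-slot-value-p 'trp 'synonyms "W"))
```

Note that the frame and slot arguments are quoted symbols throughout, as the slide requires.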

27 More useful functions
- (coercible-to-frame-p Thing)
  - Returns T if Thing is the name of a frame, or a frame object
- (save-kb)
  - Saves the current KB
- (replace-answer-list <list of frames>)
  - Makes the specified frames browseable via the Pathway Tools GUI

28 Generic Frame Protocol – Update Operations
- (put-slot-value Frame Slot Value)
  - Replace the current value(s) of Frame.Slot with Value
- (put-slot-values Frame Slot Value-List)
  - Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
- (add-slot-value Frame Slot Value)
  - Add Value to the current value(s) of Frame.Slot, if any
- (remove-slot-value Frame Slot Value)
  - Remove Value from the current value(s) of Frame.Slot
- (replace-slot-value Frame Slot Old-Value New-Value)
  - In Frame.Slot, replace Old-Value with New-Value
- (remove-local-slot-values Frame Slot)
  - Remove all of the values of Frame.Slot
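
The update operations can be modeled the same way. The sketch below is a toy stand-in written for illustration, not the GFP implementation: the `toy-` function names, the hash-table store, and the sample values are all assumptions, and details such as the order in which a real KB stores added values are not taken from the source.

```lisp
(defparameter *toy-slots* (make-hash-table :test #'equal)
  "Toy store keyed by (frame slot) lists; a stand-in, not Ocelot.")

(defun toy-values (frame slot)
  "Return the current values of FRAME.SLOT."
  (gethash (list frame slot) *toy-slots*))

(defun toy-put-slot-values (frame slot value-list)
  "Replace the current values of FRAME.SLOT with VALUE-LIST."
  (setf (gethash (list frame slot) *toy-slots*) (copy-list value-list)))

(defun toy-put-slot-value (frame slot value)
  "Replace the current values of FRAME.SLOT with the single VALUE."
  (toy-put-slot-values frame slot (list value)))

(defun toy-add-slot-value (frame slot value)
  "Add VALUE after the current values of FRAME.SLOT, if any."
  (toy-put-slot-values frame slot
                       (append (toy-values frame slot) (list value))))

(defun toy-remove-slot-value (frame slot value)
  "Remove VALUE from the current values of FRAME.SLOT."
  (toy-put-slot-values frame slot
                       (remove value (toy-values frame slot) :test #'equal)))

(defun toy-replace-slot-value (frame slot old-value new-value)
  "In FRAME.SLOT, replace OLD-VALUE with NEW-VALUE."
  (toy-put-slot-values frame slot
                       (substitute new-value old-value (toy-values frame slot)
                                   :test #'equal)))

;; A short session on made-up data.
(toy-put-slot-values 'trp 'synonyms '("tryptophan" "Trp"))
(toy-add-slot-value 'trp 'synonyms "W")
(toy-replace-slot-value 'trp 'synonyms "Trp" "L-Trp")
(toy-remove-slot-value 'trp 'synonyms "tryptophan")
(print (toy-values 'trp 'synonyms))
```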

29 Additional Pathway Tools Functions – Semantic Inference Layer
- Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
- http://bioinformatics.ai.sri.com/ptools/ptools-fns.html

30 GKB editor
- GUI for browsing the frame hierarchy
  - Command: Special -> Taxonomy Viewer
    - View -> Browse Class Hierarchy (ctrl-B)
- Allows viewing of classes, slots, and instances
  - You can't write a query unless you know the exact class and slot names
  - Class names are usually case-sensitive symbols
    - |Genes|, |Proteins|, …

31 Myths and Facts About Lisp
- Myth: Lisp is an interpreted language
  - Fact: Lisp has very good optimizing compilers
- Myth: Lisp is slow
  - Fact: studies of program performance show that Lisp is faster than most languages
    - Faster than either Java or Perl for most tasks
    - See http://www.bagley.org/~doug/shootout
- Myth: Lisp uses huge amounts of memory
  - Fact: Baseline Lisp installation requires 8-10MB
- Myth: Lisp is complicated
  - Fact: Although lots of functionality is available within Lisp, the core of the language is far simpler than that of most other programming languages

32 LISP and GFP References
- Common LISP, the Language -- The standard reference
  - Paper edition by Guy Steele
  - Online version
    - http://www.lispworks.com/reference/HyperSpec/Front/index.htm
- Information on writing Pathway Tools queries:
  - http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
  - http://www.ai.sri.com/pkarp/loop.html
  - http://bioinformatics.ai.sri.com/ptools/debugger.html

33 Pathway Tools information
- Web site top-level page
  - http://www.biocyc.org/
- General Pathway Tools information
  - http://bioinformatics.ai.sri.com/ptools/
- How to submit a bug report
  - http://bioinformatics.ai.sri.com/ptools/bug.html
- Writing queries, introductions to Lisp, etc.
  - http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

34 Examples
    (select-organism :org-id 'ecoli) => ECOLI
    (setq genes (get-class-all-instances '|Genes|)) => (…)
    (setq monomers (get-class-all-instances '|Polypeptides|)) => (…)
    (setq genes2 genes) => (…)

35 Problems
- all-substrates
- enzymes-of-reaction
- genes-of-reaction
- genes-of-pathway
- monomers-of-protein
- genes-of-enzyme

36 Example Session
    (setq x 'trp) => TRP
    (get-slot-value x 'common-name) => "L-tryptophan"
    (setq aas (get-class-all-instances '|Amino-Acids|)) => (…)
    (loop for x in aas count x) => 20

37 Example Session
    (loop for x in genes
          for name = (get-slot-value x 'common-name)
          when (and name (search "trp" name))
          collect x)
    => (…)
    (setq rxns (get-class-all-instances '|Reactions|)) => (…)
    (loop for x in rxns
          when (member-slot-value-p x 'substrates 'trp)
          collect x)
    => (…)
    (replace-answer-list *)
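
The name-search pattern in this session can be tried on plain data. In the sketch below the gene identifiers and names are made up and `genes-matching` is a hypothetical function; in Pathway Tools the names would come from `(get-slot-value x 'common-name)`.

```lisp
(defparameter *toy-gene-names*
  '((gene-1 . "trpA") (gene-2 . "lacZ") (gene-3 . nil) (gene-4 . "trpR"))
  "Made-up gene identifiers paired with common names; NIL means no name.")

(defun genes-matching (substring genes)
  "Return the identifiers in GENES whose name contains SUBSTRING.
The match is case-sensitive, and genes without a name are skipped."
  (loop for (id . name) in genes
        when (and name (search substring name))
          collect id))

(print (genes-matching "trp" *toy-gene-names*))
(print (genes-matching "TRP" *toy-gene-names*))
```

Because `search` compares characters exactly by default, "TRP" finds nothing here; passing `:test #'char-equal` to `search` would make the match case-insensitive.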

38 Example Session
    (setq x '(trp arg)) => (TRP ARG)
    (replace-answer-list x) => (TRP ARG)
    (eco)

39 How to write a good bug report
- Use dribble-bug
  - (excl:dribble-bug "bug.txt") to start dribbling
  - (excl:dribble-bug) to stop
- How to get out of the debugger
  - :bt: short backtrace of what functions are being called
  - :zoom: more detailed trace
  - :cont: continue. Lower numbers are less drastic
- Be specific, and as detailed as you can stand
  - What button/key did you push?
  - Which screen/editor were you using at the time?
  - What object were you viewing/editing?
- Try to find a reproducible test case if at all possible!

40 How to use autopatch
- Patches load automatically on startup, or:
  - Special -> Install Patches
    - Download and install
    - Or simply install
- Goes to our Web server, gets patches, and installs them
- Restarting is usually not required
  - Functions are redefined on the fly
  - But: if the patch involved initialization, you might need to restart

41 Arglist Keywords
- Are markers in arglist
- Not themselves argument names, but flag that following arguments are different somehow
- Most common are:
  - &optional
  - &rest
  - &key
- Examples:
  - (defun plus5 (x &optional (y 5)) (+ x y))
  - (plus5 3) ==> 8   (plus5 4 4) ==> 8
  - (defun embed (x &key (y "<<<") (z ">>>")) (concatenate 'string y x z))
  - (embed "foo" :z "]]]") ==> "<<<foo]]]"
  - (defun listall (&rest rest-of-args) (sort (copy-seq rest-of-args) #'<))
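
A further sketch of the three markers, using functions invented for this illustration (`scale`, `wrap`, and `largest` are not from the slides or from Pathway Tools). Each one shows how the call changes when the extra arguments are omitted or supplied.

```lisp
(defun scale (x &optional (factor 2))
  "Multiply X by FACTOR; FACTOR defaults to 2 when omitted."
  (* x factor))

(defun wrap (text &key (left "[") (right "]"))
  "Put LEFT before TEXT and RIGHT after it; both are named arguments."
  (concatenate 'string left text right))

(defun largest (first-number &rest more-numbers)
  "Return the largest argument; MORE-NUMBERS collects any extra arguments."
  (reduce #'max more-numbers :initial-value first-number))

(print (scale 4))                 ; default factor
(print (scale 4 10))              ; explicit factor
(print (wrap "foo"))              ; both defaults
(print (wrap "foo" :right ")"))   ; keywords can be given in any combination
(print (largest 3))               ; no extra arguments
(print (largest 3 9 4))           ; two extra arguments
```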

42 LISP Symbols
- Think of them as words in a dictionary, not strings
  - Lisp looks to see if symbol already exists
  - If so, it returns a pointer to that symbol, otherwise it creates a new one
- Similar to strings, but different
- Symbols live in packages, strings do not

43 Symbols vs Strings
    (setq x 'trp) -> TRP
    (eq x 'trp) -> T
    (setq y "trp") -> "trp"
    (eq y "trp") -> NIL
    (equal y "trp") -> T
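
The dictionary analogy can be checked directly with `intern`, which looks a name up in a package and creates the symbol only if it is not already there. This is a sketch in plain Common Lisp; the variable names are made up, and `copy-seq` is used to guarantee two distinct string objects.

```lisp
(defparameter *sym-1* (intern "TRP"))     ; creates or finds the symbol TRP
(defparameter *sym-2* (intern "TRP"))     ; finds the same symbol again
(defparameter *str-1* (copy-seq "trp"))   ; a fresh string
(defparameter *str-2* (copy-seq "trp"))   ; another fresh string, same characters

(print (eq *sym-1* *sym-2*))      ; same object, so EQ is true
(print (eq *str-1* *str-2*))      ; distinct objects, so EQ is false
(print (equal *str-1* *str-2*))   ; EQUAL compares the characters
(print (symbol-package *sym-1*))  ; symbols belong to a package
```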

44 LISP Lists
- Fundamental to LISP: LISt Processing
- Zero or more elements enclosed by parentheses
- Typing a list to the listener:
  - '(this is a list) => (THIS IS A LIST)
- Creating a list with functions:
  - (list 'so 'is 'this) => (SO IS THIS)
- Examples:
  - (1 3 5 7), ((2 4 6) 10 (0 8)), '(1 this T NIL "that")
  - The empty list: nil ()

45 List Examples
    (length genes) => 4316
    (first genes) => XXX
    (subseq genes 0 50) => (…)
    (nth 3 genes) => XXX

