1 LexEVS 5.0 Advanced Topics Advanced Topics: Query Optimization LexEVS Boot Camp November, 2009.


1 1 LexEVS 5.0 Advanced Topics Advanced Topics: Query Optimization LexEVS Boot Camp November, 2009

2 2 Session Details: Advanced Topics Course Objectives
When you complete this course you will be able to understand ways to optimize searching and processing results using Query Optimization in LexEVS 5.0:
- Restrictions & Resolution
- Iterator Handling
- Combinatorial Queries

3 3 Session Details: Advanced Topics Lesson Syllabus
- Lesson 1: Restrictions and Resolutions
- Lesson 2: Iterators
- Lesson 3: Combinatorial Queries

4 4 Lesson 1: Restrictions and Resolutions
When you complete this lesson you will be able to:
- Filter a coded node set based on the meaning of concept content by using restrictions, e.g. text matches within various property text fields
- Structure and restrict the results of coded node set operations with resolving methods

5 5 Lesson 1: Restrictions and Resolutions Restriction Overview
User benefits of a coded concept set reference in the coding scheme:
- Provides potential resolution for the entire set of concepts
- Reference returned as: CodedNodeSet or CodedNodeGraph
- Acts as a container for query modifications, which are collected, listed, and sorted to provide the optimum execution order as a single query
- Restrictions on the types of concepts returned are the first of these query modifications
Importance: Gives a meaningful result to the user

6 6 Lesson 1: Restrictions and Resolutions Resolution Overview Resolution: After specifying optional restrictions, the nodes in a set or graph can be resolved as a list of ConceptReference objects which in turn contain references to one or more Concept objects. Resolving these objects gives the user an opportunity to structure what is returned in terms of overall volume or number of objects as well as how much is contained in the objects themselves. Final restrictions can also be applied during this method call. Resolving a node set or graph in a particular manner is important because it can affect performance.

7 7 Lesson 1: Restrictions and Resolutions Restrictions Examples
Create a basic service object for data retrieval:
    LexBIGService lbSvc = LexBIGServiceImpl.defaultInstance();
Create a concept reference list for this coding scheme and this concept code (C13432). The parameters are a String array holding the single code and the name of the coding scheme, NCI_Thesaurus, where this concept resides:
    ConceptReferenceList crefs = ConvenienceMethods.createConceptReferenceList(
        new String[] {"C13432"}, "NCI_Thesaurus");
Initialize a coding scheme version object with the correct version number for NCI_Thesaurus:
    CodingSchemeVersionOrTag csvt = new CodingSchemeVersionOrTag();
    csvt.setVersion("08.09e");

8 8 Lesson 1: Restrictions and Resolutions Restrictions Examples cont…
Initialize a CodedNodeSet object with all concepts in our sample coding scheme by calling getCodingSchemeConcepts("NCI_Thesaurus", csvt). The final restrictToCodes(crefs) call restricts the set to the single code in the previously initialized list of one:
    CodedNodeSet nodes = lbSvc.getCodingSchemeConcepts("NCI_Thesaurus", csvt)
        .restrictToCodes(crefs);

9 9 Lesson 1: Restrictions and Resolutions Resolution Examples
Resolution (and a little restriction too)
Build a list of references from the current (and already restricted) set, restrict the returned properties to the single property "FULL_SYN", and resolve at most one entry (the final argument, 1) regardless of the size of the result set:
    ResolvedConceptReferenceList matches = nodes.resolveToList(
        null, ConvenienceMethods.createLocalNameList("FULL_SYN"), 1);

10 10 Lesson 1: Restrictions and Resolutions Resolution Examples cont…
Now initialize a ResolvedConceptReference with the result and initialize a Concept object by calling the getReferencedEntry() method. The Concept object is the base information model object and contains properties, presentations and definitions which help define and explain the concept. We'll retrieve a presentation defining the concept by taking the first element in the presentation list and getting the text and its accompanying content:
    ResolvedConceptReference ref = (ResolvedConceptReference)
        matches.enumerateResolvedConceptReference().nextElement();
    Concept entry = ref.getReferencedEntry();
    System.out.println("Matching synonym: " + entry.getPresentation(0).getValue());

11 11 Lesson 1: Restrictions and Resolutions Restriction Example 2 Setup
Setup:
    LexBIGService lbSvc = LexBIGServiceImpl.defaultInstance();
    CodingSchemeVersionOrTag csvt = new CodingSchemeVersionOrTag();
    csvt.setVersion("08.09e");
Get a coded node reference:
    CodedNodeSet nodes = lbSvc.getCodingSchemeConcepts("NCI_Thesaurus", csvt);

12 12 Lesson 1: Restrictions and Resolutions Restriction Example 2
Begin a series of restrictions on this node set:
    nodes.restrictToMatchingDesignations("heart", SearchDesignationOption.PREFERRED_ONLY,
        "LuceneQuery", null);
Matching designations means we will be matching presentation type properties for this concept. "heart" is the text we will search on, and the PREFERRED_ONLY option gets us only the preferred designation for that concept. We are using a standard LuceneQuery type of search, and the null value ensures we are working in the default language for this scheme. We are basically restricting by text value containment and to a property type with a particular flag set on it.

13 13 Lesson 1: Restrictions and Resolutions Restriction Example 2
Let's restrict it further:
    cns.restrictToMatchingProperties(null, new PropertyType[] {PropertyType.DEFINITION},
        null, null, null, "heart", "LuceneQuery", null);
What's wrong with doing it this way? Applied to the same node set, this creates a query that returns only those concepts where the match occurs in both Presentation and Definition type properties. If you want matches from either, get two different references into the coding scheme, apply one restriction to each, and make the following call on one of them:
    cns.union(codednodeset);
This gets results from both.
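The difference between chaining restrictions and taking a union can be illustrated with ordinary Java sets. This is only a sketch: java.util.Set stands in for a coded node set, the concept codes are invented for illustration, and none of this is the LexEVS API.

```java
import java.util.Set;
import java.util.TreeSet;

public class RestrictionSemantics {
    // Chaining two restrictions on one node set behaves like an intersection:
    // a concept survives only if it satisfies both restrictions.
    public static Set<String> chained(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // A union of two separately restricted node sets keeps a concept
    // that satisfies either restriction.
    public static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical codes: matches in designations and matches in definitions.
        Set<String> designationMatches = Set.of("C1", "C2");
        Set<String> definitionMatches = Set.of("C2", "C3");
        System.out.println(chained(designationMatches, definitionMatches)); // [C2]
        System.out.println(union(designationMatches, definitionMatches));   // [C1, C2, C3]
    }
}
```

Only C2 matches in both places, so it is the only survivor of the chained form, while the union keeps all three codes.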

14 14 Lesson 1: Restrictions and Resolutions Resolution Example 2
Resolve a lightweight list as an option: if we use a resolveToList method that accepts a boolean and an integer size limit, we can create a list that contains minimal references to each concept and can be limited in size.
    cns.resolveToList(null, null, null, null, false, -1);
The false boolean argument yields a list of concepts with minimal references to the coding scheme, which allows the full concept to be resolved later.

15 15 Lesson 1: Restrictions and Resolutions Resolution Example 2
If you are confident in your sorting algorithm, resolve the list to a limit of fifty concepts:
- First we'll create a sort option
- Then set the name of the sorting extension we are calling
- Next set whether it will be sorted ascending or descending
- Then add it to a list of sort options
    SortOption so = new SortOption();
    so.setExtensionName("code");
    so.setAscending(true);
    SortOptionList sol = new SortOptionList();
    sol.addEntry(so);

16 16 Lesson 1: Restrictions and Resolutions Resolution Example 2
Finally, we'll pass the list of sort options to the resolve method and resolve it:
    cns.resolveToList(sol, null, null, null, false, 50);
Adding a maximum size of 50 further limits the size of the returned list and limits the overhead of the method call.

17 17 Lesson 1: Restrictions and Resolutions Restrict and Resolve a Node Graph Let’s shift focus to the Coded Node Graph implementation. Remember how we restricted the coded node graph to a coded node set? cng.restrictToCodes(codednodeset); Remember how this provided the user with a set of starting points for associations connecting the list of coded nodes to other nodes above and below them in the hierarchy? And how we could further restrict the node graph by the kind of associations between nodes? cng.restrictToAssociations( Constructors.createNameAndValueList("subClassOf"),null);

18 18 Lesson 1: Restrictions and Resolutions Node Graph Restrictions
Node graph restrictions are more focused on how associations are expressed in the terminology. So instead of building a set of nodes by determining the content of concept objects, node graphs focus on:
- the edges of the graph, where they exist in relation to the nodes you provide as references
- what direction they can be navigated
- how the edges are referred to (naming conventions)

19 19 Lesson 1: Restrictions and Resolutions Node Graph Restrictions
For example:
Restrict this graph to associations and any related qualifiers they may have:
    cng.restrictToAssociations(listOfAssociations, associationQualifier);
Restrict this graph to any directional names and any qualifiers associated with this association name:
    cng.restrictToDirectionalNames(listOfAssociations, associationQualifier);
Restrict this graph to a set of codes and their source codes and edges:
    cng.restrictToSourceCodes(codednodeset);
Restrict this graph to a set of codes and their target codes and edges:
    cng.restrictToTargetCodes(codednodeset);

20 20 Lesson 1: Restrictions and Resolutions Node Graph Resolution
Efficient resolution of graphs requires similar attention to when and whether concepts and associations to other concepts are resolved immediately or later.
    cng.resolveAsList(concept_reference, true, false, 1, 1, null, null, null, -1);
Here we are resolving only the next layer "down" from the "concept_reference" focus code, for both coded entries and associations. No limit is placed on the number of nodes and associations to be resolved, but we ensure we have a layer of node references from which to begin our next call into a hierarchy layer. So what we've done is set ourselves up to step through each layer of the hierarchy of fully resolved nodes. This is one technique to limit method call overhead. What would another be?
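The layer-by-layer technique on this slide can be sketched in plain Java. Everything here is an assumption made for illustration: the in-memory map stands in for the terminology, resolveOneLevel() stands in for one bounded resolve call, the concept names are invented, and the hierarchy is assumed to be acyclic. It is not the LexEVS API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LevelByLevel {
    // Hypothetical hierarchy: parent concept -> direct children.
    static final Map<String, List<String>> CHILDREN = new LinkedHashMap<>();
    static {
        CHILDREN.put("Heart", List.of("Ventricle", "Atrium"));
        CHILDREN.put("Ventricle", List.of("Left Ventricle", "Right Ventricle"));
    }

    // Stand-in for one bounded resolve call: only the layer directly below the focus.
    public static List<String> resolveOneLevel(String focus) {
        return CHILDREN.getOrDefault(focus, List.of());
    }

    // Walks the hierarchy one layer at a time, so no single call returns the whole graph.
    public static List<List<String>> walk(String root) {
        List<List<String>> levels = new ArrayList<>();
        List<String> frontier = List.of(root);
        while (!frontier.isEmpty()) {
            List<String> next = new ArrayList<>();
            for (String focus : frontier) {
                next.addAll(resolveOneLevel(focus));
            }
            if (!next.isEmpty()) {
                levels.add(next);
            }
            frontier = next; // the nodes just returned become the next focus nodes
        }
        return levels;
    }

    public static void main(String[] args) {
        // prints [[Ventricle, Atrium], [Left Ventricle, Right Ventricle]]
        System.out.println(walk("Heart"));
    }
}
```

Each pass hands back a small layer whose members serve as focus nodes for the next call, which is what keeps any one result object small.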

21 21 Lesson 1: Restrictions and Resolutions Review 1
How would you restrict a node set to a single concept?

22 22 Lesson 1: Restrictions and Resolutions Answer 1
How would you restrict a node set to a single concept?
By creating a restriction by codes with a single unique identifier as a list element passed in as a parameter. This is an important use of restrictions when you have a reference to the concept and wish to resolve its details later.

23 23 Lesson 1: Restrictions and Resolutions Review 2
What kind of user is likely to be interested in how restrictions are applied to the coded node set or graph?

24 24 Lesson 1: Restrictions and Resolutions Answer 2
What kind of user is likely to be interested in how restrictions are applied to the coded node set or graph?
The end user, who will want to get some meaningful results.

25 25 Lesson 1: Restrictions and Resolutions Review 3
What kind of user is likely to be interested in how resolutions are applied to the coded node set or graph?

26 26 Lesson 1: Restrictions and Resolutions Answer 3
What kind of user is likely to be interested in how resolutions are applied to the coded node set or graph?
The developer, who will want to ensure performance.

27 27 Lesson 1: Restrictions and Resolutions Review 4
You'd like to return a lightweight list to keep your method call overhead low. What will you do to ensure this?

28 28 Lesson 1: Restrictions and Resolutions Answer 4
You'd like to return a lightweight list to keep your method call overhead low. What will you do to ensure this?
You'll choose the resolveToList() method that accepts a boolean flag indicating you want no coded entities resolved.

29 29 Lesson 1: Restrictions and Resolutions Review 5
Describe two ways a developer can resolve a coded node graph that won't cause a huge object to be returned, yet still allow the end user to traverse an entire graph structure.

30 30 Lesson 1: Restrictions and Resolutions Answer 5
Describe two ways a developer can resolve a coded node graph that won't cause a huge object to be returned, yet still allow the end user to traverse an entire graph structure.
A developer can resolve the graph to one level of completely resolved nodes and associations and step through the hierarchy one level at a time. Or the developer can resolve the entire graph and leave the coded entities unresolved, much as you might do with a coded node set.

31 31 Lesson 2: Iterators
When you complete this lesson you will be able to:
- Write better performing code when resolving to lists or iterators
- Return a list or a single concept reference, or advance the iterator, using the appropriate method
- Understand how to resolve an iterator with lightweight objects

32 32 Lesson 2: Iterators Iterators in LexEVS
Iterators in LexEVS:
- Additional method for coded node sets
- Helpful for resolving larger sets of nodes
- Still ensure lower resource overhead
Advantage of resolving iterators from coded node sets:
- Obtain resolution without any calls to the database
- Capable of referencing a local Lucene index instead
    ResolvedConceptReferencesIterator rcri = cns.resolve(sol, null, null, null, false);
- Allows the user to employ sort options, filter options and a final restriction option for restricting to property types and property names, just like resolving to a list
- Includes the option to return resolved concept references without resolving the coded entry, using the boolean value "false"

33 33 Lesson 2: Iterators Iterators in LexEVS
Use a number of options for retrieving concept references, scrolling the iterator and returning concept reference lists.
Get the next ResolvedConceptReference:
    rcri.next();
Get a ResolvedConceptReferenceList of the specified size:
    rcri.next(size);
Get a ResolvedConceptReferenceList from the iterator based on indexed start and end points:
    rcri.get(arg0, arg1);
Scroll the iterator, returning another iterator:
    rcri.scroll(scrolled_size);
Get a ResolvedConceptReferenceList from the last scroll of the iterator:
    rcri.getNext();
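The idea behind fetching a bounded batch with next(size) can be sketched with a small stand-in class. PagedResults is hypothetical and written only for this illustration; it is not the LexEVS ResolvedConceptReferencesIterator, and the codes are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedResults {
    private final List<String> codes; // stand-in for the full, possibly large, result set
    private int position = 0;

    public PagedResults(List<String> codes) {
        this.codes = codes;
    }

    public boolean hasNext() {
        return position < codes.size();
    }

    // Returns at most `size` items and advances the cursor, so the caller
    // never holds more than one page at a time.
    public List<String> next(int size) {
        int end = Math.min(position + size, codes.size());
        List<String> page = new ArrayList<>(codes.subList(position, end));
        position = end;
        return page;
    }

    public static void main(String[] args) {
        PagedResults results = new PagedResults(List.of("C1", "C2", "C3", "C4", "C5"));
        while (results.hasNext()) {
            System.out.println(results.next(2)); // [C1, C2] then [C3, C4] then [C5]
        }
    }
}
```

The last page is simply shorter than the requested size, so the caller needs no special handling for result counts that are not a multiple of the page size.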

34 34 Lesson 2: Iterators Review 1
What iterator method returns another, potentially smaller iterator?

35 35 Lesson 2: Iterators Answer 1
What iterator method returns another, potentially smaller iterator?
The scroll method returns another iterator.

36 36 Lesson 2: Iterators Review 2
You can get a single concept reference from the iterator. What's the method for that?

37 37 Lesson 2: Iterators Answer 2
You can get a single concept reference from the iterator. What's the method for that?
next()

38 38 Lesson 2: Iterators Review 3
How is resolving an iterator potentially a better performing method than resolving to a list?

39 39 Lesson 2: Iterators Answer 3
How is resolving an iterator potentially a better performing method than resolving to a list?
The iterator resolves against the Lucene index, which is a much faster call.

40 40 Lesson 2: Iterators Review 4
Describe how you would resolve a list of concept references from an iterator and keep that list relatively lightweight.

41 41 Lesson 2: Iterators Answer 4
Describe how you would resolve a list of concept references from an iterator and keep that list relatively lightweight.
First use the resolve method that accepts a boolean flag indicating whether to resolve coded entities in concept reference lists. Then get a concept reference list of limited size from the iterator using the next(size) method.

42 42 Lesson 3: Combinatorial Queries
When you complete this lesson you will be able to:
- Provide a useful result set to the end user by using a combination of fields applied as parameters
- Expand your understanding of the SortOption type filter criteria
- Choose from a variety of text matching algorithms
- Discuss how the Lucene Query text matching algorithm, in particular, can be leveraged by LexEVS

43 43 Lesson 3: Combinatorial Queries Overview of Combinatorial Queries
Combinatorial Queries: putting it all together
One of the most powerful features of the LexEVS architecture is the ability to define multiple search and sort criteria without intermediate retrieval of data from the LexEVS service. The following example shows a simple, yet powerful, query to search a code system based on a 'sounds like' match algorithm (the list of all available match algorithms can be listed using the 'ListExtensions -m' admin script).

44 44 Lesson 3: Combinatorial Queries Combinatorial Queries Example cont…
To be exact, this is a double restriction query with an additional application of sort criteria and restricted return values.
Declare the service:
    LexBIGService lbs = LexBIGServiceImpl.defaultInstance();
Start with an unconstrained set of all codes for the vocabulary:
    CodingSchemeVersionOrTag csvt = new CodingSchemeVersionOrTag();
    csvt.setVersion("08.09e");
    CodedNodeSet cns = lbs.getCodingSchemeConcepts("NCI_Thesaurus", csvt);

45 45 Lesson 3: Combinatorial Queries Combinatorial Queries Example cont…
Constrain to concepts with designations (assigned text presentations) that contain text that sounds like 'heart ventricle':
    cns.restrictToMatchingDesignations("hart ventrickle", SearchDesignationOption.ALL,
        MatchAlgorithms.DoubleMetaphoneLuceneQuery.toString(), null);
Further restrict the results to concepts with a semantic type of 'Anatomical Structure':
    cns.restrictToMatchingProperties(Constructors.createLocalNameList("Semantic_Type"),
        "Anatomical Structure", "exactMatch", null);

46 46 Lesson 3: Combinatorial Queries Combinatorial Queries Example cont…
Indicate that the resulting list should be sorted with the best results first and then sorted by code if there is a tie:
    SortOptionList sortCriteria = Constructors.createSortOptionList(
        new String[] {"matchToQuery", "code"});
Indicate to return only the assigned UMLS_CUI and textualPresentation properties:
    LocalNameList restrictTo = ConvenienceMethods.createLocalNameList(
        new String[] {"UMLS_CUI", "textualPresentation"});
Still nothing computed yet.
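The ordering requested by the two sort criteria above (best match first, code as the tie-breaker) can be pictured with a plain Java comparator chain. The Match class, the scores and the codes are hypothetical and exist only for this sketch; LexEVS applies its own sort extensions internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByScoreThenCode {
    // Hypothetical search hit: a concept code plus a relevance score.
    public static final class Match {
        public final String code;
        public final double score;

        public Match(String code, double score) {
            this.code = code;
            this.score = score;
        }
    }

    // Highest score first; ties are broken by ascending code.
    public static List<String> sortedCodes(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        Comparator<Match> byScoreDescending =
                Comparator.comparingDouble((Match m) -> m.score).reversed();
        Comparator<Match> byCode = Comparator.comparing((Match m) -> m.code);
        sorted.sort(byScoreDescending.thenComparing(byCode));

        List<String> codes = new ArrayList<>();
        for (Match m : sorted) {
            codes.add(m.code);
        }
        return codes;
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(
                new Match("C3", 0.9), new Match("C1", 0.5), new Match("C2", 0.9));
        System.out.println(sortedCodes(matches)); // [C2, C3, C1]
    }
}
```

C2 and C3 tie on score, so the secondary criterion places C2 first; C1 comes last despite having the lowest code because the score is the primary key.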

47 47 Lesson 3: Combinatorial Queries Combinatorial Queries Example cont…
Perform the query and resolve the sorted/filtered list with a maximum of 6 items returned:
    ResolvedConceptReferenceList list = cns.resolveToList(sortCriteria, restrictTo, null, 6);
Print the results:
    ResolvedConceptReference[] rcr = list.getResolvedConceptReference();
    for (ResolvedConceptReference rc : rcr) {
        System.out.println("Resolved Concept: " + rc.getConceptCode());
    }

48 48 Lesson 3: Combinatorial Queries Code Sample Review
Declare the target concept space: The coded node set (variable 'cns') is initially declared to query the NCI Thesaurus vocabulary. At this point the concept space included by the set can be thought of as unrestricted, addressing every defined coded entry (the 'false' value on the declaration indicates to also include inactive concepts). However, it is important to note that no search is performed by the LexEVS service at this time.

49 49 Lesson 3: Combinatorial Queries Applying Filter Criteria Applying filter criteria No computation is performed (to realize query results) during invocation of the restrictToMatchingDesignations() and restrictToMatchingProperties() methods. These calls effectively narrow the target space even further, indicating that filters should be applied to the information returned by the LexEVS query service.
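The deferred behavior described on this slide, where restrictions are only recorded and nothing runs until resolution, can be sketched in plain Java. DeferredQuery is a hypothetical class written for this illustration, with strings standing in for concepts; it is not how LexEVS is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DeferredQuery {
    private final List<String> source;
    private final List<Predicate<String>> restrictions = new ArrayList<>();
    public int evaluations = 0; // how many times the data was actually scanned

    public DeferredQuery(List<String> source) {
        this.source = source;
    }

    // Only records the restriction; no data is touched here.
    public DeferredQuery restrict(Predicate<String> restriction) {
        restrictions.add(restriction);
        return this;
    }

    // All recorded restrictions are applied together in one pass at resolve time.
    public List<String> resolve(int maxToReturn) {
        evaluations++;
        return source.stream()
                .filter(item -> restrictions.stream().allMatch(r -> r.test(item)))
                .limit(maxToReturn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DeferredQuery query = new DeferredQuery(
                List.of("heart", "heart valve", "lung", "heart ventricle"))
                .restrict(s -> s.startsWith("heart"))
                .restrict(s -> s.contains("v"));
        System.out.println(query.evaluations); // 0: nothing computed yet
        System.out.println(query.resolve(6));  // [heart valve, heart ventricle]
        System.out.println(query.evaluations); // 1
    }
}
```

Because the restrictions are held as a list until resolve() is called, an implementation is free to reorder or combine them before touching any data.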

50 50 Lesson 3: Combinatorial Queries Using Lucene Query Syntax
Using the Lucene Query Syntax and other text matching functions
Text matches: The text criteria applied in methods such as restrictToMatchingDesignations() use one of a number of powerful text processing applications to provide the user with broad capability for text based searches. Text matches can be:
- Simple applications of "exactMatch", "startsWith" or "contains" algorithms
- Regular expressions
- Lucene Query syntax (used in the LuceneQuery function)
As shown in the preceding slides, these options are passed into the restrictToMatchingDesignations() method as parameters.

51 51 Lesson 3: Combinatorial Queries Lucene Query
- Lucene Queries are well documented and can be very powerful
- Uninitiated users may need some background on their use
- Users should start with the official Lucene Query Parser documentation
Keep in mind:
- Some LexEVS queries such as "startsWith" and "contains" use wild card searches under the covers
- Use of wild cards in this context can cause errors in searches involving these search types
- Wild card queries should instead use the flexibility of the Lucene Query searches in restrictToMatchingDesignations(), where they work much as described in the query syntax documentation

52 52 Lesson 3: Combinatorial Queries Special Characters- Lucene Query
Special characters in the Lucene Query search can cause unexpected results.
- If you are not using special characters as recommended for the various Lucene search mechanisms, your searches may not return the expected results or may return an error
- Example: if the value you are searching on contains, say, parentheses, you will need to place the value in quotation marks
- The escape characters described in the Lucene documentation do not work at this time
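The quoting advice above can be applied with a small helper before the text is handed to a search. This is a sketch under assumptions: LuceneTermQuoting is a hypothetical helper, the character list follows the special characters named in the classic Lucene query syntax, and dropping embedded double quotes is a simplification chosen here because the slide reports that escaping does not work.

```java
public class LuceneTermQuoting {
    // Characters with special meaning in the classic Lucene query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    public static boolean hasSpecial(String text) {
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Wraps the value in double quotes when it contains a special character.
    // Embedded double quotes are dropped rather than escaped (a simplification).
    public static String quoteIfNeeded(String text) {
        if (!hasSpecial(text)) {
            return text;
        }
        return "\"" + text.replace("\"", "") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("heart"));         // heart
        System.out.println(quoteIfNeeded("Heart (Organ)")); // "Heart (Organ)"
    }
}
```

Quoting turns the value into a phrase search, so it also changes matching behavior for multi-word values; that trade-off is worth checking against the results you expect.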

53 53 Lesson 3: Combinatorial Queries Narrowing Searches- Lucene Query
Contains method: use it to narrow down search results as a progressively longer substring, more closely matching the term of interest, is entered. The same narrowing should not be expected if a Lucene Query is used.

54 54 Lesson 3: Combinatorial Queries Lucene Query
Lucene Query narrowing searches: You should not expect to see a Lucene Query narrow down search results as you progressively enter a longer substring more closely matching your term of interest. Instead use the contains method.

55 55 Lesson 3: Combinatorial Queries Review 1
You are not sure of the spelling for a term. What kind of match algorithm would help?

56 56 Lesson 3: Combinatorial Queries Answer 1
You are not sure of the spelling for a term. What kind of match algorithm would help?
The DoubleMetaphone algorithm.

57 57 Lesson 3: Combinatorial Queries Review 2
You know the terminology well enough to understand that the text you want to search for exists in a particular property. What kind of restriction method will you use?

58 58 Lesson 3: Combinatorial Queries Answer 2
You know the terminology well enough to understand that the text you want to search for exists in a particular property. What kind of restriction method will you use?
restrictToMatchingProperties().

59 59 Lesson 3: Combinatorial Queries Review 3
You wish to sort a list of returned values, first by the match in the text query, next by the unique identifying code. How will you do this?

60 60 Lesson 3: Combinatorial Queries Answer 3
You wish to sort a list of returned values, first by the match in the text query, next by the unique identifying code. How will you do this?
Create a sort option list with "matchToQuery" as the first string and "code" as the second, and pass it to the resolve method.

61 61 Lesson 3: Combinatorial Queries Review 4
You need a very flexible text query with complete flexibility in design. What matching algorithm will you choose?

62 62 Lesson 3: Combinatorial Queries Answer 4
You need a very flexible text query with complete flexibility in design. What matching algorithm will you choose?
Regular expressions.

63 63 Lesson 3: Combinatorial Queries Review 5
What kinds of adjustments can you make to the Lucene Query?

64 64 Lesson 3: Combinatorial Queries Answer 5
What kinds of adjustments can you make to the Lucene Query?
Boolean queries, Levenshtein Distance, wild card searches.

65 65 Lesson 3: Combinatorial Queries Review 6
What kinds of characters should you avoid searching on in the LuceneQuery?

66 66 Lesson 3: Combinatorial Queries Answer 6
What kinds of characters should you avoid searching on in the LuceneQuery?
Special characters such as ~, (), and *, since these can have a special meaning or just be stripped when normalized by Lucene.

67 67 Session Details: Query Optimization Questions?

