SPARK Search Engine. Martijn Harthoorn, programmer at Furore, implementer of the search engine of SPARK.

Presentation transcript:

SPARK Search Engine

Who am I?
Martijn Harthoorn
- Programmer at Furore
- Implementer of the Search Engine of SPARK
- The work after the question mark.

The place of Search
[Diagram labels: REST Service, Storage, Index & Search, MongoDB, Spark]

Paradigm
- A FHIR client should be easy.
- A FHIR server needs to solve the complex issues.
Search
Search has some…

Search
First there was Storage.
Then there was Search.

Connectathon
- To test a client, you must have a tested server.
- To test a server, you must have a tested client.
“One fool can ask more questions than seven wise men can answer.”

Connectathon
“But what if you are wrong?”

History
Version 1.
- A generics-based implementation.
- On top of the FHIR data model.
- Programmed per search parameter.
- No meta data available yet.
- No indexing.
- Slow.

History
Version 2.
- Data model independent.
- Meta data not available; manually added.
- Lucene.NET as indexer (index in Lucene, database in Mongo).
- Fast.
- Standardised all parameter specifics into standard “modifiers”.
- All code based on search parameter types.
- Joins are client side.

History
Version 3.
- Modified to store the Lucene index in Mongo.
- Index storage unreliable.
- Never saw the light of day.

History
Version 4. CURRENT
- Index storage in a dedicated Mongo collection.
- Build an expression tree from the parameters.
- Chained parameters have full functionality (modifiers, operators).
- Joins are client side.
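The "expression tree from parameters" idea can be sketched roughly as follows. This is only an illustration in Python, not SPARK's code: the `Criterion` class and the `parse_key` and `parse_query` helpers are hypothetical names, and the sketch assumes the FHIR convention of `name:modifier=value` with a dot separating the steps of a chained parameter.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import parse_qsl


@dataclass
class Criterion:
    """One node of the tree; `chain` is the criterion applied to the referenced resource."""
    name: str
    modifier: Optional[str] = None
    value: Optional[str] = None
    chain: Optional["Criterion"] = None


def parse_key(key: str, value: str) -> Criterion:
    # Split off the first chain step: "subject.name" -> "subject" and "name".
    head, dot, rest = key.partition(".")
    # Split an optional modifier off the step: "family:exact" -> "family" and "exact".
    name, _, modifier = head.partition(":")
    if dot:
        # Chained parameter: the value belongs to the innermost criterion.
        return Criterion(name, modifier or None, chain=parse_key(rest, value))
    return Criterion(name, modifier or None, value=value)


def parse_query(query: str) -> List[Criterion]:
    """Turn the part after the question mark into one criterion per parameter."""
    return [parse_key(key, value) for key, value in parse_qsl(query)]


tree = parse_query("family:exact=Obama&subject.name=Peter")
```

Because every chain step is parsed by the same function, a chained step can carry a modifier just like a top-level parameter.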

Indexing
Why indexing?


Indexing. HOW-TO
You DO want data de-serialized to an object with all values strongly typed.
You DON’T want to spend time analyzing and interpreting JSON and/or XML.
1. Harvest the resource
2. Determine the data type
3. Groom your data
4. Store the data in the index

Indexing. 1. Harvesting
Resource: Patient
Search parameter: family
Searches for the family name and prefix of every HumanName that is registered with a Patient.
Usage:

Indexing. 1. Harvesting
Resource: Patient
Search parameter: family
Patient
  Name (List of HumanName)
    Family
    Prefix
    Given
    Suffix
Using the Visitor pattern.
Paths from meta data:
- "patient.Name.Prefix"
- "patient.Name.Family"
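A minimal sketch of harvesting along such a path, in Python. The `Patient` and `HumanName` classes below are simplified stand-ins rather than the FHIR data model, `harvest` and `harvest_path` are hypothetical names, and a plain recursive walk is used in place of a full Visitor implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List


@dataclass
class HumanName:
    Family: List[str] = field(default_factory=list)
    Prefix: List[str] = field(default_factory=list)
    Given: List[str] = field(default_factory=list)
    Suffix: List[str] = field(default_factory=list)


@dataclass
class Patient:
    Name: List[HumanName] = field(default_factory=list)


def harvest(node: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reached by following `path`, fanning out over lists."""
    if isinstance(node, list):
        for item in node:
            yield from harvest(item, path)
    elif not path:
        yield node
    elif hasattr(node, path[0]):
        yield from harvest(getattr(node, path[0]), path[1:])


def harvest_path(resource: Any, dotted_path: str) -> List[Any]:
    # The first segment ("patient") names the resource itself, so skip it.
    return list(harvest(resource, dotted_path.split(".")[1:]))


michelle = Patient(Name=[HumanName(Family=["LaVaughn", "Robinson", "Obama"],
                                   Given=["Michelle"])])
family_hits = harvest_path(michelle, "patient.Name.Family")
# family_hits == ["LaVaughn", "Robinson", "Obama"]
```

The walk never inspects JSON or XML; it only touches the already de-serialized object, and a path that leads nowhere simply yields no values.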

Indexing. 2. Determine data type
> patient (Patient)
> Name (HumanName)
> LastName (string)
Data type: string
Search parameter type: string
Selected indexing method:
- Single value: as string
- More values: as string array

Indexing. 2. Determine data type
> patient (Patient)
> Gender (Coding)
> Coding (List)
> Code (CodeableConcept)
Data type: Code
Search parameter type: Token
Selected indexing method: store each codeable concept in an array, with
- System (uri)
- Code (string)
- Display (string)
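The two indexing methods above could be selected by search parameter type along these lines. This is a sketch under assumptions: the `Coding` class is a simplified stand-in, and `index_string`, `index_token`, `INDEXERS` and `index_values` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Coding:
    system: Optional[str]
    code: str
    display: Optional[str] = None


def index_string(values: List[str]) -> Any:
    # Single value: store as string. More values: store as string array.
    return values[0] if len(values) == 1 else list(values)


def index_token(codings: List[Coding]) -> List[Dict[str, Optional[str]]]:
    # Store one System/Code/Display entry per coding.
    return [{"System": c.system, "Code": c.code, "Display": c.display}
            for c in codings]


# Search parameter type -> indexing method.
INDEXERS: Dict[str, Callable[[list], Any]] = {
    "string": index_string,
    "token": index_token,
}


def index_values(parameter_type: str, values: list) -> Any:
    return INDEXERS[parameter_type](values)
```

Keeping the choice in one table means a new search parameter type only needs one new function and one new entry.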

Indexing. 3. Groom your data
- Remove dashes, dots, slashes etc. from dates.
- If you implement a like search from the left side, you might want to split names at the dash into multiple hits.
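Both grooming steps fit in a few lines of Python. The function names are hypothetical, and the sketch assumes that a "like search from the left side" means a case-insensitive prefix match.

```python
import re
from typing import List


def groom_date(value: str) -> str:
    """Remove dashes, dots and slashes from a date string."""
    return re.sub(r"[-./]", "", value)


def groom_name(value: str) -> List[str]:
    """Split a name at dashes so that every part becomes its own hit."""
    return [part for part in value.split("-") if part]


def matches_from_left(hits: List[str], prefix: str) -> bool:
    """Left-anchored, case-insensitive match against any of the hits."""
    return any(hit.lower().startswith(prefix.lower()) for hit in hits)
```

Without the split, a search for "jon" would miss "Smith-Jones", because a left-side match only sees the start of the stored string.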

Indexing. 4. Store in the index

Field      Value
Resource   "Patient"
Local ID   patient/1
Level      0
Family     ["LaVaughn", "Robinson", "Obama"]
Given      "Michelle"
Gender     [ { System: “…”, Code: “..”, Display: “..” }, … ]
…

* Level: The patient is not a contained resource (level 0).
* Family: In Mongo you can store an array that can be searched like a normal string.
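The remark about arrays refers to MongoDB's equality match: a filter such as `{"Family": "Obama"}` also matches a document whose `Family` field is an array containing "Obama". The sketch below mimics that behaviour in plain Python, without a database, on the index entry from the slide. `field_matches` is a hypothetical helper, and the Gender field is left out because its values are elided on the slide.

```python
from typing import Any, Dict

index_entry: Dict[str, Any] = {
    "Resource": "Patient",
    "Local ID": "patient/1",
    "Level": 0,
    "Family": ["LaVaughn", "Robinson", "Obama"],
    "Given": "Michelle",
}


def field_matches(entry: Dict[str, Any], field: str, value: Any) -> bool:
    """Equality match where an array field matches if any element is equal."""
    stored = entry.get(field)
    if isinstance(stored, list):
        return value in stored
    return stored == value
```

This is why the string indexing method can store either a single string or a string array without the search side having to know which one it got.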

Future
Version 5. NEXT
- All parameters based on FHIR data types?
- Joins using Mongo Map-Reduce?

Complexity
So what is the issue?

Complexity
Include & chained parameters
- Joining over references returns multiple resource types.
- Joins are client side (not in the Mongo database).
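A client-side join for a chained parameter can be sketched as two passes over the index: first find the referenced resources, then keep the resources that point at them. The in-memory list below stands in for the Mongo index collection; the `Subject` field, the sample entries and the function names are assumptions for illustration only.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]

index: List[Entry] = [
    {"Resource": "Patient", "Local ID": "patient/1",
     "Family": ["LaVaughn", "Robinson", "Obama"]},
    {"Resource": "Patient", "Local ID": "patient/2", "Family": "Smith"},
    {"Resource": "Observation", "Local ID": "observation/1", "Subject": "patient/1"},
    {"Resource": "Observation", "Local ID": "observation/2", "Subject": "patient/2"},
    {"Resource": "Observation", "Local ID": "observation/3", "Subject": "group/1"},
]


def search(entries: List[Entry], resource: str, field: str, value: Any) -> List[Entry]:
    """First pass: plain search on one resource type (arrays match per element)."""
    def matches(stored: Any) -> bool:
        return value in stored if isinstance(stored, list) else stored == value
    return [e for e in entries
            if e["Resource"] == resource and matches(e.get(field))]


def chained_search(entries: List[Entry], resource: str, reference_field: str,
                   target_resource: str, field: str, value: Any) -> List[Entry]:
    """Second pass: keep resources whose reference points at a first-pass hit."""
    target_ids = {e["Local ID"] for e in search(entries, target_resource, field, value)}
    return [e for e in entries
            if e["Resource"] == resource and e.get(reference_field) in target_ids]


hits = chained_search(index, "Observation", "Subject", "Patient", "Family", "Obama")
```

The third observation shows the multiple-types issue: its subject is not a Patient, so the join has to be told which target resource type to follow.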

Complexity
Transactions
- FHIR has bulk POST.
- Split between indexing and storage.

Complexity
Multiple types
Some properties do not have a fixed type.
Example: observation.value can be a:
- CodeableConcept
- String
- Quantity (number + unit)
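One way to cope with such a property is to decide the index shape per value at indexing time. The sketch below is an assumption about how that could look, not SPARK's approach: the classes are simplified stand-ins, and the `type` tag and field names in the index entry are made up for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class Coding:
    system: str
    code: str


@dataclass
class CodeableConcept:
    coding: List[Coding]


@dataclass
class Quantity:
    value: float
    unit: str


ObservationValue = Union[CodeableConcept, str, Quantity]


def index_observation_value(value: ObservationValue) -> Dict[str, Any]:
    """Pick the index shape from the runtime type of the value."""
    if isinstance(value, str):
        return {"type": "string", "value": value}
    if isinstance(value, Quantity):
        return {"type": "quantity", "value": value.value, "unit": value.unit}
    if isinstance(value, CodeableConcept):
        return {"type": "token",
                "value": [{"System": c.system, "Code": c.code} for c in value.coding]}
    raise TypeError(f"unsupported observation value: {type(value).__name__}")
```

Tagging the entry with its type lets the search side choose the matching comparison (string match, numeric range, or token match) for the same parameter.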