Unstructured Machine Learning: Providing the link between Genetic Data and Published Research. Dr Tony C Smith, Reel Two, Inc., 9 Hartley Street, Hamilton


0 Unstructured Machine Learning: Providing the link between Genetic Data and Published Research
Dr Tony C Smith
Reel Two, Inc.
9 Hartley Street, Hamilton, New Zealand

1 What is Machine Learning?
- creating computer programs that get better with experience
  - learn how to make expert judgments
  - discover previously hidden, potentially useful information (data mining)
How does it work?
- user provides learning system with examples of concept to be learned
- induction algorithm infers a characteristic model of the examples
- model is used to predict whether or not future novel instances are also examples, and it does this very consistently, and very, very quickly!

2 Structured Learning
Mushroom Data
Weight   Damage   Dirt    Firmness   Quality
heavy    high     mild    hard       poor
heavy    high     mild    soft       poor
normal   high     mild    hard       good
light    medium   mild    hard       good
light    clear    clean   hard       good
normal   clear    clean   soft       poor
heavy    medium   mild    hard       poor
...
[Figure: decision tree learned from the mushroom data. Internal nodes: weight, dirt, firmness. Branches: heavy, light, normal; mild, clean; hard, soft. Leaves: poor, good.]
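The slide does not say which induction algorithm produced its tree. As a minimal sketch, assuming an ID3-style learner that picks the attribute leaving the least label entropy, the seven visible rows can be turned into a tree as follows. The names `FIELDS`, `ROWS`, `build` and `predict` are illustrative and not from the presentation. Because only seven rows are shown and several attributes tie, the lower levels of the resulting tree may differ from the one on the slide.

```python
from collections import Counter
from math import log2

FIELDS = ["weight", "damage", "dirt", "firmness"]

# The seven rows visible on the slide; the last column is the label (quality).
ROWS = [
    ("heavy", "high", "mild", "hard", "poor"),
    ("heavy", "high", "mild", "soft", "poor"),
    ("normal", "high", "mild", "hard", "good"),
    ("light", "medium", "mild", "hard", "good"),
    ("light", "clear", "clean", "hard", "good"),
    ("normal", "clear", "clean", "soft", "poor"),
    ("heavy", "medium", "mild", "hard", "poor"),
]


def entropy(rows):
    """Shannon entropy of the quality labels in rows."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def split(rows, idx):
    """Group rows by the value they hold in column idx."""
    groups = {}
    for r in rows:
        groups.setdefault(r[idx], []).append(r)
    return groups


def build(rows, candidates):
    """Return a label, or a (field, {value: subtree}) pair."""
    labels = {r[-1] for r in rows}
    if len(labels) == 1:
        return labels.pop()
    if not candidates:
        return Counter(r[-1] for r in rows).most_common(1)[0][0]

    def remainder(idx):
        # Weighted entropy left over after splitting on column idx.
        groups = split(rows, idx).values()
        return sum(len(g) / len(rows) * entropy(g) for g in groups)

    best = min(candidates, key=remainder)
    rest = [i for i in candidates if i != best]
    branches = {v: build(g, rest) for v, g in split(rows, best).items()}
    return (FIELDS[best], branches)


def predict(tree, example):
    """Walk the tree using a dict of field values until a label is reached."""
    while isinstance(tree, tuple):
        field, branches = tree
        tree = branches[example[field]]
    return tree


TREE = build(ROWS, list(range(len(FIELDS))))
```

On these rows the first split is on weight, as in the slide's tree: every heavy mushroom is poor and every light one is good, so only the normal-weight rows need a further test. `predict` raises `KeyError` for an attribute value never seen in training, which a fuller implementation would need to handle.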

3 Unstructured Learning
- data does not have fixed fields with specific values
  - examples: images, continuous signals, expression data, text
- learning proceeds by correlating the presence or absence of any and all salient attributes
Document Classification
- given examples of documents covering some topic, learn a semantic model that can recognize whether or not other documents are relevant
- prioritize them: i.e. quantify “how relevant” documents are to the topic
- not limited to keywords (nor is it misled by them)
- adapt to the user’s needs (ephemeral or long-term)

4 How Text Mining Works
- Users supply the system with training data
  - Documents that are good examples of the desired category
- The system builds ‘classifiers’
  - Statistical models based on the training data
- The system classifies novel data
  - Identifies other documents about the desired category
- Results are displayed or stored
  - Files can be viewed, routed to end users or stored in databases
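The four steps above can be sketched with a very small statistical classifier. The presentation does not name the model that Classification System uses; the sketch below assumes a multinomial naive Bayes model with add-one smoothing purely for illustration. The class `RelevanceClassifier` and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case a document and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class RelevanceClassifier:
    """Two-class naive Bayes: 'relevant' versus 'other' (an assumed model)."""

    def __init__(self, relevant_docs, other_docs):
        # Step 1 and 2: training documents in, word statistics out.
        self.counts = {
            "relevant": Counter(t for d in relevant_docs for t in tokens(d)),
            "other": Counter(t for d in other_docs for t in tokens(d)),
        }
        self.totals = {k: sum(c.values()) for k, c in self.counts.items()}
        self.vocab = set(self.counts["relevant"]) | set(self.counts["other"])
        self.prior = math.log(len(relevant_docs) / len(other_docs))

    def _log_prob(self, word, label):
        # Add-one (Laplace) smoothing so unseen words never give log(0).
        numerator = self.counts[label][word] + 1
        return math.log(numerator / (self.totals[label] + len(self.vocab)))

    def score(self, text):
        """Log-odds that text is relevant; higher means more relevant."""
        total = self.prior
        for word in tokens(text):
            if word in self.vocab:
                total += self._log_prob(word, "relevant")
                total -= self._log_prob(word, "other")
        return total

    def rank(self, docs):
        """Step 3 and 4: classify novel documents, most relevant first."""
        return sorted(docs, key=self.score, reverse=True)


# Hypothetical training data for a 'kinase' category.
RELEVANT = [
    "p38 MAP kinase activation by phosphorylation",
    "kinase cascade phosphorylates the target protein",
]
OTHER = [
    "patent filing for a new pump design",
    "quarterly sales report for the pump division",
]
NOVEL = [
    "annual report on pump sales",
    "protein kinase phosphorylation assay",
]

classifier = RelevanceClassifier(RELEVANT, OTHER)
ranked = classifier.rank(NOVEL)
```

The score is a sum over every known word in the document, so a document can rank highly without containing any single chosen keyword, which is the behaviour the previous slide describes as "not limited to keywords".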

5 Classification System
- Client-specific categories
- Familiar Windows-style interface
- Drag-and-drop documents to create custom categories
- Classified documents are ranked by relevance
- View contents of individual documents; sentences are highlighted by their relevance to the category

6 The Gene Ontology: A Good First Step
The Initial Problem: Individual curators evaluate data differently
- Protein Modification
- MAPK-KK Cascade
- Activation of p38 MAP Kinase
While scientists can agree to use the word "kinase," they must also agree to support this by stating how and why they use "kinase," and apply it consistently. Only in this way can they hope to compare gene products and find out if and how they are related.
The Initial Solution: The Gene Ontology (GO), a controlled vocabulary with defined relationships between items.
- GO consists of more than 13,000 nodes, or ‘GO Terms’, divided into three main trees: Biological Process, Cellular Component and Molecular Function
- Of these, only about 3800 GO Terms are ‘active’, that is, terms with more than just one or two publications attached.
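A controlled vocabulary with defined relationships can be pictured as a graph of terms with parent links. The sketch below assumes, for illustration only, that the three labels on this slide form a parent-child chain under Biological Process; the slide does not state how they are related, and the `PARENTS` table is hypothetical rather than taken from GO itself.

```python
# Hypothetical parent links; a term may list more than one parent.
PARENTS = {
    "Biological Process": [],
    "Protein Modification": ["Biological Process"],
    "MAPK-KK Cascade": ["Protein Modification"],
    "Activation of p38 MAP Kinase": ["MAPK-KK Cascade"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = []
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents[current])
    return seen


def is_a(term, candidate, parents=PARENTS):
    """True if candidate is the term itself or one of its ancestors."""
    return term == candidate or candidate in ancestors(term, parents)
```

Under that assumption, two curators who label the same gene product "Activation of p38 MAP Kinase" and "Protein Modification" are no longer in conflict: `is_a` shows that the first annotation is a more specific case of the second.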

7 GO KDS: Filling the gaps in GO
GO is only a partial solution
- Enormous gap between GO-annotated docs (27,000) and full MEDLINE database (12 million entries). Updates lag behind.
- Scientists must understand and agree to use the GO
- Knowledge changes and alters definitions.
Using GO “as is” takes too long and delivers too little
The Gene Ontology Knowledge Discovery System (GO KDS) bridges the gap by classifying all of MEDLINE.
- New documents are classified as they’re added
- Scientists can now annotate gene targets quickly and reliably
- GO KDS is updated along with GO and MEDLINE

8 GO KDS Interface Tour
- Current GO term(s) open
- Location of listed term in GO
- All sub-terms for the listed term: click on a term to further refine your search
- Enter a keyword to search in this GO category
- Opens abstract in separate window
- Color of stars identifies the GO branch; number of stars indicates confidence of category placement
- Original GO classifications (by domain expert)
- KDS discovers novel classifications

9 GO KDS Key Benefits
- Quickly sort documents into the categories most relevant to the user
- Replace laborious annotation by domain experts with a trainable, automated system
- Discover conceptual links between previously unrelated scientific domains
- Identify key articles for pertinent research
- Integrate public, private and proprietary documents

10 How is document classification useful?
Drug Approval
- Collecting information
- Organizing/Collating documents
- Satisfying approval criteria
Life Science Research
- Finding relevant literature
- Prioritizing articles/reports
- Discovering hidden connections
- Distributing information
Patent preparation
- Searching patent databases
- Collecting relevant documents
- Synthesizing information

11 Intelligent Text Mining: Therapeutic Courses
One Reel Two client is using Classification System to rapidly sort through large volumes of medical documentation in disparate therapeutic areas.
The Problem: Client must generate E-Learning Courses from hundreds of pages of reports, literature and product documentation supplied by client
Old Solution: Manually read through documents to find paragraphs related to ‘Diagnosis’, ‘Etiology’, ‘Epidemiology’ etc.
New Solution: Use Reel Two Classification System to build a custom taxonomy, then automatically classify and extract relevant document sections into Therapeutic Area categories

12 Intelligent Text Mining: Patent Analysis
Identifying ‘Mechanism of Action’ in life science patents
Search patent filings for the ideas or concepts behind one’s analysis
- Explore state of prior art, competitive landscape or ‘innovation gaps’
- Overcome intentionally vague language in patent filings
Example Project
Patents are classified according to a taxonomy built by the client:
- Alzheimer’s Patents
  - MoA: 5-HT Inhibitor
  - MoA: Acetylcholinesterase
  - MoA: Antioxidant
  - MoA: Antiviral…
Sample Output
The Mechanism of Action listed for this patent is "Neurotransmitter release modulator." However, Classification System identified that this chemical modulator binds to the acetylcholine receptor, which is the true mechanism of action, and classified this patent in “MoA: Acetylcholinesterase”.
In an in vitro assay, 2-chloro-5-(3-(R)-pyrrolidinylmethoxy)-3-pyridinecarbaldoxime (Ia) exhibited a Ki value for binding to neuronal nicotinic acetylcholine receptors of nM.
ACTIVITY - Analgesic; neuroprotective; nootropic; antiparkinsonian; neuroleptic; tranquilizer; antiinflammatory; antidepressant; anabolic; anorectic; anticonvulsant; uropathic; gastrointestinal; antiaddictive; gynecological.
MECHANISM OF ACTION - Neurotransmitter release modulator.

13 “Life Science Information Management will form the largest unmet need for IT companies in the 21st Century”
Caroline Kovac, General Manager, IBM Life Sciences

14 Appendix: GO KDS Interface
1. Search for a particular GO term by opening one of the main branches.

15 Appendix
2. ‘Drill down’ through the taxonomy to find a term of interest. Click on that term.

16 Appendix
3. Select the desired GO term. ‘Open’ the category by clicking on ‘new search with this term.’

17 Appendix
4. Scroll down to view abstracts.

18 Appendix
5. Discover conceptual links to other GO categories. Click on the category to add the term to your search.

19 Appendix
6. View the data intersection between GO categories. Scroll through to view abstract.

20 Appendix
7. GO terms identify concepts embodied in the abstracts, enabling quick review.

21 Appendix
8. Select an abstract of interest, and click to open the complete abstract.

22 Appendix
9. The abstract will open in a new window, allowing you to continue with your search, or to link directly to the journal.
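Step 6 of the appendix, viewing the data intersection between GO categories, amounts to a set intersection over the abstracts assigned to each category. The following is a minimal sketch under that reading; the category names, the document identifiers and the `intersect` helper are hypothetical and do not come from GO KDS.

```python
# Hypothetical index: category name -> identifiers of abstracts placed in it.
CATEGORY_DOCS = {
    "protein modification": {"doc1", "doc2", "doc3"},
    "signal transduction": {"doc2", "doc3", "doc4"},
    "apoptosis": {"doc3", "doc5"},
}


def intersect(categories, index=CATEGORY_DOCS):
    """Return the sorted abstracts that appear in every listed category."""
    sets = [index[name] for name in categories]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


two_way = intersect(["protein modification", "signal transduction"])
three_way = intersect(["protein modification", "signal transduction", "apoptosis"])
```

Each category added to the search can only shrink the result, which matches the way step 5 narrows a search by adding a linked term.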