An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems Pablo Matos, Leonardo Lombardi, Thiago Pardo,

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

Symantec 2010 Windows 7 Migration Global Results.
You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.
European Cooperative Network for Research, Diagnosis and Therapy of Parkinson´s Disease EuroPa Plenary Meeting WORK PACKAGE 3 WORK PACKAGE 3 - Patient.
By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.
Decision Support and Artificial Intelligence Jack G. Zheng July 11 th 2005 MIS Chapter 4.
WORKFORCE PLANNING June 2011 Amr Fouad Training & Research Sector Ministry of Health & Population.
ASYCUDA Overview … a summary of the objectives of ASYCUDA implementation projects and features of the software for the Customs computer system.
Labeling claims for patient- reported outcomes (A regulatory perspective) FDA/Industry Workshop Washington, DC September 16, 2005 Lisa A. Kammerman, Ph.D.
Knowledge Dietary Managers Association 1 DMA Certification Exam Blueprint and Curriculum Development.
Relational Database and Data Modeling
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
International Telecommunication Union Workshop on Standardization in E-health Geneva, May 2003 Digital Imaging in Pathology for Standardization Yukako.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
CALENDAR.
0 - 0.
1 1  1 =.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
SUBTRACTING INTEGERS 1. CHANGE THE SUBTRACTION SIGN TO ADDITION
MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
Addition Facts
Year 6 mental test 5 second questions
Query optimisation.
C82MST Statistical Methods 2 - Lecture 2 1 Overview of Lecture Variability and Averages The Normal Distribution Comparing Population Variances Experimental.
Making the System Operational
ZMQS ZMQS
A Process Based on Paragraph for Treatment Extraction in Scientific Papers of the Biomedical Domain Juliana Duque, Pablo Matos, Cristina Ciferri, Thiago.
The 5S numbers game..
Knowledge Extraction from Technical Documents Knowledge Extraction from Technical Documents *With first class-support for Feature Modeling Rehan Rauf,
LABELING TURKISH NEWS STORIES WITH CRF Prof. Dr. Eşref Adalı ISTANBUL TECHNICAL UNIVERSITY COMPUTER ENGINEERING 1.
BT Wholesale October Creating your own telephone network WHOLESALE CALLS LINE ASSOCIATED.
RFT Evaluation of clinical interventions in community pharmacy Final Report This project was funded by the Australian Government Department of.
Report Card P Only 4 files are exported in SAMS, but there are at least 7 tables could be exported in WebSAMS. Report Card P contains 4 functions: Extract,
Configuration management
Software change management
DOROTHY Design Of customeR dRiven shOes and multi-siTe factorY Product and Production Configuration Method (PPCM) ICE 2009 IMS Workshops Dorothy Parallel.
ABC Technology Project
Galit Haim, Ya'akov Gal, Sarit Kraus and Michele J. Gelfand A Cultural Sensitive Agent for Human-Computer Negotiation 1.
Federal Department of Home Affairs FDHA Federal Statistical Office FSO Meeting of the OECD Expert Group on SDMX September, OECD, Paris Centralized.
© Charles van Marrewijk, An Introduction to Geographical Economics Brakman, Garretsen, and Van Marrewijk.
© Charles van Marrewijk, An Introduction to Geographical Economics Brakman, Garretsen, and Van Marrewijk.
:: DIAsDEM :: Seminar: Web Mining WS 2003/2004 Ingo Kampe Heiko Scharff.
Squares and Square Root WALK. Solve each problem REVIEW:
Lecture 3 Validity of screening and diagnostic tests
Lets play bingo!!. Calculate: MEAN Calculate: MEDIAN
Dept of Biomedical Engineering, Medical Informatics Linköpings universitet, Linköping, Sweden A Data Pre-processing Method to Increase.
Chapter 5 Test Review Sections 5-1 through 5-4.
SIMOCODE-DP Software.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
1 Using Bayesian Network for combining classifiers Leonardo Nogueira Matos Departamento de Computação Universidade Federal de Sergipe.
1 First EMRAS II Technical Meeting IAEA Headquarters, Vienna, 19–23 January 2009.
Addition 1’s to 20.
25 seconds left…...
Subtraction: Adding UP
Week 1.
Number bonds to 10,
Chapter 10: The Traditional Approach to Design
Systems Analysis and Design in a Changing World, Fifth Edition
DTU Informatics Introduction to Medical Image Analysis Rasmus R. Paulsen DTU Informatics TexPoint fonts.
We will resume in: 25 Minutes.
1 Unit 1 Kinematics Chapter 1 Day
1 Adolescence Chapter 11: Sexuality 2 What do these women have in common?
1 PART 1 ILLUSTRATION OF DOCUMENTS  Brief introduction to the documents contained in the envelope  Detailed clarification of the documents content.
Chapter 13 The Data Warehouse
How Cells Obtain Energy from Food
Mining Officially Unrecognized Side effects of drugs by combining Web Search and Machine learning Carlo Carino, Yuanyuan Jia, Bruce Lambert, Patricia West.
Presentation transcript:

An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems Pablo Matos, Leonardo Lombardi, Thiago Pardo, Cristina Ciferri, Marina Vieira, and Ricardo Ciferri presented by Thiago Pardo USP NLP Group and UFSCar Database Group, São Carlos, BR

Context and Motivation A lot of electronic documents that report experiments treatment adopted patients with some kind of disease number of patients enrolled in the treatment symptoms and risk factors positive and negative effects There are several transactions and journals e.g., American Journal of Hematology, Blood, and Haematologica An Environment for Data Analysis - IEA-AIE /02/10 2/22

Context and Motivation Nowadays, researchers and doctors are not able to process this huge number of documents An Environment for Data Analysis - IEA-AIE /02/10 3/22

Context and Motivation These documents are in unstructured format, i.e., in plain textual form, specially in PDF There is necessary to transform these data from unstructured to structured format in order to submit it to an automatic knowledge discovery process An Environment for Data Analysis - IEA-AIE /02/10 4/22

Goal Development of an environment called IEDSS-Bio for analyzing data of biomedical domain, i.e., Sickle Cell Anemia Support the expert in making decisions: Extracting relevant information from biomedical documents Storing the information in a data warehouse (DW) Mining interesting knowledge from the DW 06/02/10 An Environment for Data Analysis - IEA-AIE2010 5/22

Contributions Theoretical: Domain Knowledge Methodology of Information Extraction Practical: Resources: collection of documents, dictionary and rules Tools: Converter, Information Extraction, Data Warehouse, Data Mining systems An Environment for Data Analysis - IEA-AIE /02/10 6/22

The Environment for Data Analysis An Environment for Data Analysis - IEA-AIE /02/10 How many patients had clinical improvement and were treated with the hydroxyurea drug? A significant amount of patients under treatment with the hydroxyurea drug tend to have marrow depression. 7/22

Converter Module An Environment for Data Analysis - IEA-AIE /02/10 8/22

Converter Module An Environment for Data Analysis - IEA-AIE /02/10 9/22

Information Extraction Module An Environment for Data Analysis - IEA-AIE /02/10 Processed Sections: Abstract, Results and Discussion (class of positive and negative effects) All Sections (class of patient) 10/22

Sentence Classification An Environment for Data Analysis - IEA-AIE2010 Output Training Positive Effect Negative Effect Others Test Several files about complication sentences Several files about benefit sentences Several files about other sentences New Text TXT Set of sentences classified into classes Classes 06/02/10 11/22

Identification of Relevant Information An Environment for Data Analysis - IEA-AIE /02/10 Dictionary Biomedical Database 12/22

Identification of Relevant Information An Environment for Data Analysis - IEA-AIE /02/10 Identification of Information Pipeline Example of Sentences Relevant Information Rules 13/22

Experiments: Sentence Classification 1. How do human beings manually perform the sentence classification? 2. Is it feasible to automate the sentence classification task? 3. What kind of classification algorithm performs better in this task? An Environment for Data Analysis - IEA-AIE /02/10 14/22

Manual Classification by humans? Annotation Agreement in 50 sentences An Environment for Data Analysis - IEA-AIE /02/10 Fleiss (1971) 1 15/22

It is feasible to automate this task? AnnotatorAll the classes 3 experts naïve subjects0.71 experts + naïve subjects0.65 An Environment for Data Analysis - IEA-AIE /02/10 AgreementScale PoorUnder 0 Slight0 a 0.2 Fair0.21 a 0.4 Moderate 0.41 a 0.60 Substantial0.61 a 0.80 Almost PerfectBetween 0.81 and 1 Landis e Koch (1977) 2 16/22

What kind of classification algorithm performs better in this task? An Environment for Data Analysis - IEA-AIE /02/10 Distribution of classes for each sample 3 17/22

An Environment for Data Analysis - IEA-AIE /02/10 Bag-of-words model AVM configuration: Minimum Frequency = 2 Attributes: 1 to 3-grams 1, for the case the n-gram occurs in the sentence (present); 0 otherwise (absent). Not considered: stopwords removal and stemming Sentence Classification Process: training and testing phase 3 18/22

An Environment for Data Analysis - IEA-AIE /02/10 Evaluation 3 Partitioning method: 10-fold cross-validation 19/22

Conclusions The environment proposed – Information Extraction and Decision Support System in Biomedical domain – aims at being a general environment for mining relevant information in the biomedical domain First experiments on sentence classification a step of the whole process very good results (95.9% accuracy) for papers about Sickle Cell Anemia (SCA) Task of sentence classification in the SCA domain is well defined and possible to be automated An Environment for Data Analysis - IEA-AIE /02/10 20/22

Future Work Investigate the identification of treatment and symptoms information in scientific papers Extract of the relevant sentence pieces for populating our databases using IE approaches, e.g., rule-based and dictionary-based Investigate the use of parallel processing to optimize the more time- consuming tasks, e.g., the application of data mining algorithms and the analytical query processing Other biomedical areas may also benefit from our text mining approach An Environment for Data Analysis - IEA-AIE /02/10 21/22

An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems USP NLP Group and UFSCar Database Group, São Carlos, BR Questions ?

References ANTHONY, L.; LASHKIA, G. V. Mover: a machine learning tool to assist in the reading and writing of technical papers. IEEE Transactions on Professional Communication, v. 46, n. 3, p , FLEISS, J. L. Measuring nominal scale agreement among many raters. Psychological Bulletin, v. 76, n. 5, p , LANDIS, J. R.; KOCH, G. G. The measurement of observer agreement for categorical data. Biometrics, v. 33, n. 1, p , PINTO, A. C. S. et al. Technical Report "Sickle Cell Anemia". São Carlos: Department of Computer Science, Federal University of São Carlos, p. 16. Available at:. An Environment for Data Analysis - IEA-AIE /02/10 23/22