MIE 2008: Assessment of Biomedical Knowledge According to Confidence Criteria (Ines Jilani)


1 MIE 2008: Biomedical Knowledge Confidence Criteria
Assessment of Biomedical Knowledge According to Confidence Criteria
Ines Jilani: ines.jilani@spim.jussieu.fr
Natalia Grabar, Pierre Meneton, Marie-Christine Jaulent
Wednesday, 28th of May 2008

2 Context
Increasing number of biomedical articles in PubMed*
Follow-up work on automatic extraction of functional knowledge about genes/proteins from scientific articles Δ indexed in PubMed
– Using lexico-syntactic patterns: language-specific automaton (grammar)
  o Syntactic elements (verb, noun, adjective…)
  o Semantic elements (meaning of words…)
* http://www.ncbi.nlm.nih.gov/sites/entrez
Δ Jilani I, Grabar N & Jaulent M.-C. Fitting the finite-state automata platform for mining gene functions from biological scientific literature. In SMBM in Jena (Germany) 2006

3 Example of lexico-syntactic pattern
o (Sox2; sensory organ development)
o (Hint; murine development)

4 Introduction
Limits of the system
– Loss of context: reliability and confidence of the claim
Solution
– Use some devices to “weight” the extracted knowledge, in order to make more confident use of it
  o Hedge, modifier, qualifier
  o Confidence markers

5 Hedges, modifiers, qualifiers…
Linguistic devices used by authors to qualify their assertions
– Different grammatical categories: verbs, adverbs, adjectives…
– “Copper deficiency is a plausible cause of Alzheimer disease (AD). This hypothesis should be tested with a lengthy trial of copper supplementation”*
“Hedge” was first used by Lakoff Δ: “words whose job it is to make things more or less fuzzy”
Hyland Φ and others carried out qualitative studies of these qualifiers
– without modelling them
– or integrating their use for weighting any kind of information in a knowledge extraction system
* Quoted from the abstract of the article with PubMed Identifier 17928161
Δ Lakoff, G. (1972). Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts. Chicago Linguistic Society, 8, pp. 183-228
Φ Hyland, K. (1995). The Author in the Text: Hedging Scientific Writing. Hong Kong Papers in Linguistics and Language Teaching.

6 Objectives
Work on confidence markers in scientific articles
– Their use
– Their significance
– Their classification
– Their automatic detection in texts for knowledge weighting purposes
The main aim was to document the extracted information so that it could be used confidently
E.g.: (Sox2; sensory organ development)
– Sox2 is required for sensory organ development
– Sox2 might be required for sensory organ development
– Sox2 is probably required for sensory organ development
– Our findings suggest that Sox2 is required for sensory organ development
– Doe et al. have demonstrated that Sox2 is required for sensory organ development

7 Materials
3 corpora obtained by querying PubMed

Corpus  Query                          Species  Source          Specificity    Number of sentences
CORP1   160 genes + Alzheimer disease  human    PubMed          355 abstracts  817
CORP2   160 genes + Alzheimer disease  human    PubMed Central  68 full texts  27,912
CORP3   160 genes + Alzheimer disease  worm     PubMed          348 abstracts  825

Lexical resource: WordNet®* is a large lexical database of general (non-specialist) English: nouns, verbs, adjectives and adverbs
– Used to enrich the extracted confidence markers by identifying their synonyms
* WordNet, An Electronic Lexical Database, Christiane Fellbaum ed., (1998), The MIT Press, Cambridge, Mass

8 Methods
Manual collection of confidence markers from CORP1, CORP2 and CORP3
Enrichment of the list of confidence markers
– Using WordNet®
Classification of confidence markers according to 2 types of classes
Addition of the Impact Factor (IF) as another confidence criterion
– Hypothesis: the IF of a journal is subjectively related to the reliability of the biological and medical information published
Modeling confidence criteria: develop a formula that consistently orders the triplets (representing annotations) by their confidence score
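
The enrichment step can be pictured with a minimal Python sketch. The slides do not say how WordNet was queried, so the `SYNONYMS` table below is a hypothetical stand-in for those lookups, and `enrich` is an illustrative name. The two synonym pairs are the ones quoted on the modeling slide.

```python
# Hypothetical stand-in for WordNet synonym lookups (assumption: a real
# system would query WordNet itself rather than a hand-written table).
SYNONYMS = {
    "anticipate": {"expect"},
    "hypothesise": {"suspect"},
}


def enrich(markers):
    """Return the markers plus variants obtained by swapping in synonyms."""
    enriched = set(markers)
    for marker in markers:
        words = marker.split()
        for i, word in enumerate(words):
            for synonym in SYNONYMS.get(word, ()):
                # Rebuild the marker with one word replaced by its synonym.
                enriched.add(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return sorted(enriched)


collected = ["we anticipate", "we hypothesise"]
print(enrich(collected))
```

Each manually collected marker is kept, and every synonym substitution adds one new candidate marker to the list.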

9 Results
A list of 250 manually collected confidence markers was generated
Enrichment using WordNet® increased the number of confidence markers listed to 478
Classification
– 4 different categories in ascending order of confidence → Type 1
– 10 distinct qualifiers modifying confidence levels within the Type 1 categories, characterizing subjectivity in texts → Type 2

10 Results: Type 1 class
1 - Interrogation or trial and error of the author: knowledge that remains unproven and requires demonstration.
e.g.: “remain to be confirmed”, “has yet to be identified”, “?”
2 - Distance taken by the author from his or her assertions or from the knowledge presented in the text: it may also correspond to a restriction of the knowledge concerned to a specific context (e.g.: the context of the article or experiment).
e.g.: “our findings suggest that”, “in this case we conclude that”, “it is possible that”
3 - Studies by other researchers, references to other works, articles or methods: we assume that if an article is cited, its information is taken to be true or, at worst, simply believed to be true.
e.g.: “previous observation”, “it is now believed that”, “it has been proposed that”
4 - Demonstration or proof given by the author: this corresponds to work carried out by the author and presented in the article concerned.
e.g.: “we reveal that”, “we show here that”, “our results indicate that”…
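
As a rough illustration of how the four Type 1 categories could be assigned automatically, the Python sketch below looks up the example markers quoted on this slide. The function name `type1_category`, the substring matching, and the rule that the least confident category wins when several markers occur are all assumptions, not something stated in the talk.

```python
# Example markers copied from the slide; the full lists were much longer.
TYPE1_MARKERS = {
    1: ["remain to be confirmed", "has yet to be identified"],
    2: ["our findings suggest that", "in this case we conclude that",
        "it is possible that"],
    3: ["previous observation", "it is now believed that",
        "it has been proposed that"],
    4: ["we reveal that", "we show here that", "our results indicate that"],
}


def type1_category(sentence):
    """Return the lowest Type 1 category whose marker occurs, else None."""
    text = sentence.lower()
    # Assumption: categories are tried from least to most confident, so a
    # sentence with several markers gets the most cautious category.
    for category in sorted(TYPE1_MARKERS):
        if any(marker in text for marker in TYPE1_MARKERS[category]):
            return category
    return None


print(type1_category(
    "Our findings suggest that Sox2 is required for sensory organ development"
))
```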

11 Results: Type 2 class*
10 qualifiers representing probabilities from negation to affirmation, i.e. from the least probable to the most probable
[Figure: scale running from “Confidence − −” to “Confidence + +”]
* Work derived from: Ian Jacobs. 1995. English Modal Verbs

12 Results: Modeling
Modeling confidence criteria for their automatic extraction
– Regular expressions are used
  “we anticipate” and “we expect” → we *( + )
– Synonyms are used
  “we hypothesise” and “we suspect” → we *( + + + + )
  “have been previously confirmed”, “is now largely confirmed” and “is widely confirmed” → * (previously+now)*(largely+widely+extensively+generally)*
We had anticipated that…
We have anticipated that…
We expect that…
We have expected that…
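
The patterns on this slide are written in the authors' automaton notation, where "+" appears to separate alternatives. A comparable sketch using Python's `re` module is given below. The two expressions are our own approximations covering only the surface forms quoted on the slide, and the names `ANTICIPATE` and `CONFIRMED` are illustrative.

```python
import re

# "we" + optional auxiliary + a form of "anticipate" or "expect".
ANTICIPATE = re.compile(
    r"\bwe\s+(?:(?:had|have)\s+)?(?:anticipate[sd]?|expect(?:s|ed)?)\b",
    re.IGNORECASE,
)

# Optional time adverb, then optional degree adverb, before "confirmed".
CONFIRMED = re.compile(
    r"\b(?:(?:have|has)\s+been|is)\s+"
    r"(?:(?:previously|now)\s+)?"
    r"(?:(?:largely|widely|extensively|generally)\s+)?"
    r"confirmed\b",
    re.IGNORECASE,
)

examples = [
    "We had anticipated that the gene is involved.",
    "We expect that the gene is involved.",
    "These results have been previously confirmed.",
    "This role is now largely confirmed.",
]
for sentence in examples:
    hit = ANTICIPATE.search(sentence) or CONFIRMED.search(sentence)
    print(hit.group(0) if hit else None)
```

Making the auxiliary and the adverbs optional lets one expression cover several surface variants of the same marker, which is the point the slide makes with its four “We … that” examples.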

13 Results: Application
Context of apolipoprotein E gene
Triplets (Gene, Function, PMID)

14 Results: Explanations
1 - ApoE allelic variability influences pupil response to cholinergic challenge and cognitive impairment.
2 - The Apolipoprotein E (ApoE) epsilon4 allele role in LOD is controversial, while it is still unknown in vascular depression.
3 - ApoE4 seems to facilitate HSV-1 latency in the brain much more so than ApoE3.

No.  Triplet                                                Type 1  Type 2  IF
1    ApoE / cognitive impairment / 16764677                 4       10      4,091
2    ApoE epsilon4 allele / vascular depression / 17337010  1       10      2,035
3    ApoE4 / HSV-1 latency / 16699018                       2       10      5,178

Triplets ordered in an ascending confidence order: 1 ; 3 ; 2
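
The slides do not give the scoring formula itself. As a minimal sketch only, the Python below assumes a simple lexicographic key (Type 1 first, then Type 2, then impact factor). The values are our reading of the flattened table on this slide, with decimal commas written as points; `confidence_key` is a hypothetical name.

```python
# (number, triplet, Type 1, Type 2, impact factor), as read from the slide.
TRIPLETS = [
    (1, "ApoE / cognitive impairment / 16764677", 4, 10, 4.091),
    (2, "ApoE epsilon4 allele / vascular depression / 17337010", 1, 10, 2.035),
    (3, "ApoE4 / HSV-1 latency / 16699018", 2, 10, 5.178),
]


def confidence_key(triplet):
    """Assumed ordering key: Type 1, then Type 2, then impact factor."""
    _, _, type1, type2, impact_factor = triplet
    return (type1, type2, impact_factor)


# Highest key first.
ranked = sorted(TRIPLETS, key=confidence_key, reverse=True)
print([number for number, *_ in ranked])
```

Under this reading of the table, listing the triplets from the highest key to the lowest gives the sequence 1, 3, 2.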

15 Discussion / Conclusion
Confidence markers were collected manually from
– Abstracts
– Full-text articles
They were extended with the WordNet® resource
They were classified into 4 categories of Type 1 and 10 categories of Type 2
This study constitutes preliminary work: the confidence markers can easily be added to the lexico-syntactic patterns already generated for annotating genes/proteins functionally
Annotations already present in databases could be additionally documented with confidence markers
– Gene Ontology Annotation files
– Swiss-Prot / UniProt
The confidence markers can be used by curators to annotate genes/proteins through a system able to detect those qualifiers

16 Perspectives
The users of the final system are potentially biologists, curators…
Take the type of study presented in an article into account in the confidence scoring
– Observational study (epidemiological)
– Controlled experiment
– Clinical trial…

