1 1 Richard Tzong-Han Tsai, Po-Ting Lai, Hong-Jie Dai, Chi-Hsin Huang, Yue-Yang Bow, Yen-Ching Chang, Wen-Harn Pan, Wen-Lian Hsu HypertenGene: Extracting key hypertension genes from biomedical literature …

2 2 Where are we from? Institute of Information Science Academia Sinica Taiwan

3 3 InCoB 2009 Institute of Information Science Academia Sinica Taiwan

4 4 HypertenGene: Extracting key hypertension genes from biomedical literature with position and automatically-generated template features Richard Tzong-Han Tsai, Po-Ting Lai, Hong-Jie Dai, Chi-Hsin Huang, Yue-Yang Bow, Yen-Ching Chang, Wen-Harn Pan, Wen-Lian Hsu

5 5 Outline Motivation Major tasks Dataset Evaluation Conclusion

6 6 What Causes Hypertension

7 7 GAD Database Disease View Search for All Record found: 930
* About 930 PubMed IDs for genes associated with hypertension are recorded in GAD
* Database updated to 2008

8 8 Articles about Hypertension Over three hundred thousand abstracts about hypertension in PubMed

9 9 Key Hypertension Genes Genes which cause hypertension genetically Example The GNB3 may be considered a genetic marker for hypertension. [PMID: 14557282]

10 10 HG Pair in a Sentence The GNB3 may be considered a genetic marker for hypertension. (S: the sentence; G: the gene mention, GNB3; H: the hypertension mention; G and H together form an HG pair)

11 11 Motivation Major tasks Dataset Evaluation Conclusion Outline

12 12 Major Task
1. Gene named entity recognition (NER) and gene normalization (GN)
2. Hypertension named entity recognition
3. Gene-hypertension relation extraction

13 13 Gene Named Entity Recognition Example The GNB3 may be considered a genetic marker for hypertension. [PMID: 14557282]

14 14 Gene Named Entity Recognition Example The GNB3 may be considered a genetic marker for hypertension. [PMID: 14557282]

15 15 Gene Normalization Example The GNB3 may be considered a genetic marker for hypertension. [PMID: 14557282]
Gene ID: 2784
  guanine nucleotide binding protein (G protein), beta polypeptide 3
  guanine nucleotide-binding protein, beta-3 subunit
  transducin beta chain 3
  G protein, beta-3 subunit
  GTP-binding regulatory protein beta-3 chain
  GNB3
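Gene normalization as shown on this slide can be read as a dictionary lookup from any known name of a gene to its Gene ID. The following is a minimal sketch under that assumption; the slides do not describe the actual normalization component, and the names `GENE_SYNONYMS`, `build_lookup` and `normalize` are hypothetical. Only the GNB3 entry comes from the slide.

```python
# Synonym list for Gene ID 2784 as shown on the slide; a real dictionary
# would hold many genes. The structure itself is an assumption.
GENE_SYNONYMS = {
    "2784": [
        "guanine nucleotide binding protein (G protein), beta polypeptide 3",
        "guanine nucleotide-binding protein, beta-3 subunit",
        "transducin beta chain 3",
        "G protein, beta-3 subunit",
        "GTP-binding regulatory protein beta-3 chain",
        "GNB3",
    ],
}


def build_lookup(synonyms):
    """Map every lower-cased name to its Gene ID."""
    lookup = {}
    for gene_id, names in synonyms.items():
        for name in names:
            lookup[name.lower()] = gene_id
    return lookup


def normalize(mention, lookup):
    """Return the Gene ID for a recognized mention, or None if unknown."""
    return lookup.get(mention.strip().lower())


LOOKUP = build_lookup(GENE_SYNONYMS)
gene_id = normalize("GNB3", LOOKUP)
```

Exact, case-insensitive matching is the simplest possible choice here; it will miss spelling variants that a real normalizer would need to handle.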

16 16 Major Task
1. Gene named entity recognition (NER) and gene normalization (GN)
2. Hypertension named entity recognition
3. Gene-hypertension relation extraction

17 17 Disease NER Example The GNB3 may be considered a genetic marker for hypertension. [PMID: 14557282] In conclusion, REN 10631A alleles are significantly associated with EHT in the Emirati population. [PMID: 16138564] EHT : Essential HyperTension

18 18 Disease NEs in Evident Sentences OBJECTIVE: We sought to determine whether polymorphisms in the transforming growth factor (TGF)-beta3 gene are associated with risk of pregnancy-induced hypertension (PIH) in case-control mother-baby dyads.... CONCLUSION: A fetal TGF-beta3 polymorphism (rs11466414) is associated with PIH in a predominantly Hispanic population. PMID: 19628198

19 19 List of Hypertension Acronym
Original Name                     Acronym
pregnancy-induced hypertension    PIH
Primary pulmonary hypertension    PPH
Family history of hypertension    FH
Pulmonary hypertension            PH
More than 30 pairs were collected by the acronym recognition component
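The slides do not say how the acronym recognition component works. As an illustration only, the sketch below uses a simple stand-in heuristic: find a parenthesized run of capitals and accept it if it matches the initials of the words just before it. The function name `find_acronym_pairs` is hypothetical.

```python
import re


def find_acronym_pairs(text):
    """Return (long form, acronym) pairs found by initial-letter matching."""
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        # Words before the parenthesis; hyphens also separate words.
        words = list(re.finditer(r"[A-Za-z0-9]+", text[:match.start()]))
        candidate = words[-len(acronym):]
        if len(candidate) != len(acronym):
            continue
        initials = "".join(w.group()[0].upper() for w in candidate)
        if initials == acronym:
            long_form = text[candidate[0].start():candidate[-1].end()]
            pairs.append((long_form, acronym))
    return pairs


pairs = find_acronym_pairs(
    "associated with risk of pregnancy-induced hypertension (PIH) in dyads"
)
```

This heuristic is deliberately narrow. It would not recover pairs such as "Family history of hypertension" / FH or "Essential HyperTension" / EHT, where the acronym is not a plain string of word initials.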

20 20 Major Task
1. Gene named entity recognition (NER) and gene normalization (GN)
2. Hypertension named entity recognition
3. Gene-hypertension relation extraction

21 21 Formulation
HG pair 1 in S1
HG pair 2 in S1
HG pair 3 in S2
Binary Classification: whether each target HG pair has a key relation or not (Key Relation / Not a Key Relation)
[Figure: HG pairs 1, 2 and 3 assigned to the two classes]
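One way to read this formulation: every gene mention and every hypertension mention that share a sentence form one candidate HG pair, and each candidate is then classified independently. A minimal sketch of the candidate generation step follows, assuming the mentions have already been recognized. The dictionary layout and the placeholder names `GENE_A`, `GENE_B`, `GENE_C` are hypothetical.

```python
from itertools import product


def hg_pairs(sentences):
    """Enumerate (sentence number, gene, hypertension mention) candidates."""
    pairs = []
    for sent_id, sent in enumerate(sentences, start=1):
        for gene, disease in product(sent["genes"], sent["hypertension"]):
            pairs.append((sent_id, gene, disease))
    return pairs


# Hypothetical input mirroring the slide: two pairs in S1, one pair in S2.
sentences = [
    {"genes": ["GENE_A", "GENE_B"], "hypertension": ["hypertension"]},
    {"genes": ["GENE_C"], "hypertension": ["EHT"]},
]
candidates = hg_pairs(sentences)
```

Each tuple in `candidates` would then receive a binary label, key relation or not.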

22 22 Motivation Major tasks Dataset Evaluation Conclusion Outline

23 23 Datasets Our data set consists of 939 sentences from 195 abstracts selected from the GAD. 1395 HG pairs can be extracted from these 939 sentences.
                      Positive HG pair    Negative HG pair
Number of HG pairs    349                 1046

24 24 Training & Testing Randomly selected 90% of the HG pairs as the training set and the remaining 10% as the test set. Repeated this 30 times and averaged the results to compare performance.
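The protocol on this slide amounts to repeated random hold-out evaluation. The sketch below shows one way it could be run. The `evaluate` callback, the seed and the function name `repeated_holdout` are assumptions for illustration, since the slides do not give implementation details.

```python
import random


def repeated_holdout(items, evaluate, runs=30, train_fraction=0.9, seed=0):
    """Average evaluate(train, test) over several random 90/10 splits."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    scores = []
    for _ in range(runs):
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return sum(scores) / len(scores)


# Stand-in evaluation that only reports the test set size; a real callback
# would train a classifier on `train` and score it on `test`.
hg_pair_ids = list(range(1395))
average_test_size = repeated_holdout(hg_pair_ids, lambda train, test: len(test))
```

With the 1395 HG pairs from the dataset slide, each run trains on 1255 pairs and tests on 140.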

25 25 Motivation Major tasks Dataset Evaluation Conclusion Outline

26 26 Scoring Method : F-score
The weighted harmonic mean of precision and recall
Dataset : HG1~HG10 (key genes are marked in the slide figure)
Prediction : HG1, HG4, HG5, HG6, HG7
precision : 1/5 = 0.2
recall : 1/3 = 0.33
F-score : (2*0.2*0.33)/(0.2+0.33) = 0.25
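The worked example on this slide can be checked with a few lines of code. The transcript does not preserve which pairs were marked as key genes; the sketch assumes they are HG1, HG2 and HG3, which is consistent with the precision and recall shown. The balanced form of the F-score (equal weight on precision and recall) is used, matching the slide's arithmetic.

```python
def precision_recall_f(key_pairs, predicted_pairs):
    """Return precision, recall and balanced F-score for a set prediction."""
    key, predicted = set(key_pairs), set(predicted_pairs)
    true_positives = len(key & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(key) if key else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


KEY_PAIRS = {"HG1", "HG2", "HG3"}  # assumption: not recoverable from the transcript
PREDICTION = ["HG1", "HG4", "HG5", "HG6", "HG7"]
precision, recall, f_score = precision_recall_f(KEY_PAIRS, PREDICTION)
```

Only HG1 is both predicted and key, giving precision 1/5, recall 1/3 and an F-score of 0.25.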

27 27 AUC of the iP/R curve
* n is the total number of correct HG pairs
* p_i is the highest interpolated precision for the correct HG pair j at r_j
* r_j is the recall at that HG pair
* Interpolated precision p_i is calculated for each recall r by taking the highest precision at r or any r' > r.
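The formula itself did not survive in the transcript. A reconstruction consistent with the definitions listed on this slide and with the AUC values worked out on the following slide is given here as an assumption, not as the authors' exact notation. It takes r_0 = 0 and writes p(r) for the precision at recall r.

```latex
A = \sum_{j=1}^{n} p_i(r_j)\,\bigl(r_j - r_{j-1}\bigr),
\qquad r_0 = 0,
\qquad p_i(r) = \max_{r' \ge r} p(r')
```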

28 28 Scoring Method : AUC
Dataset : HG1~HG10 (key genes are marked in the slide figure)
First prediction : 1st HG1, 2nd HG6, 3rd HG7, 4th HG2, 5th HG3. Precision : 0.6, Recall : 1, F-score : 0.75, AUC : 0.733333
Second prediction : 1st HG1, 2nd HG6, 3rd HG2, 4th HG3, 5th HG7. Precision : 0.6, Recall : 1, F-score : 0.75, AUC : 0.833333
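The two rankings on this slide have the same precision, recall and F-score but different AUC, because AUC rewards placing key pairs earlier. The sketch below computes the area under the interpolated precision/recall curve as a sum over retrieved key pairs, each contributing its interpolated precision times a recall step of 1/n. It assumes the key pairs are HG1, HG2 and HG3 (not recoverable from the transcript, but consistent with the reported numbers); `auc_ipr` is a hypothetical name.

```python
def auc_ipr(ranking, key_pairs):
    """Area under the interpolated precision/recall curve for a ranked list."""
    key = set(key_pairs)
    precisions = []  # precision at each rank where a key pair is retrieved
    hits = 0
    for rank, pair in enumerate(ranking, start=1):
        if pair in key:
            hits += 1
            precisions.append(hits / rank)
    area = 0.0
    for j in range(len(precisions)):
        # Interpolated precision: best precision at this or any higher recall.
        interpolated = max(precisions[j:])
        area += interpolated / len(key)  # each key pair raises recall by 1/n
    return area


KEY_PAIRS = {"HG1", "HG2", "HG3"}  # assumption
auc_first = auc_ipr(["HG1", "HG6", "HG7", "HG2", "HG3"], KEY_PAIRS)
auc_second = auc_ipr(["HG1", "HG6", "HG2", "HG3", "HG7"], KEY_PAIRS)
```

Under these assumptions the first ranking gives (1 + 0.6 + 0.6)/3 and the second gives (1 + 0.75 + 0.75)/3, matching the slide's 0.733333 and 0.833333.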

29 29 Select Features for Classification Binary Classification Features

30 30 Select Features for Classification The GNB3 may be considered a genetic marker for hypertension. Binary Classification Key HG pair or not Features

31 31 Features Basic Word Features Chunk Features Parse Tree Path Features Template Features Position Features

32 32 Basic Word Features The GNB3 may be considered a genetic marker for hypertension.
Words between: may, be, considered, a, genetic, marker, of, predisposition, for
Words between (bigram): may_be, be_considered, considered_a, a_genetic, genetic_marker, marker_of, of_predisposition, predisposition_for
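The word features on this slide are the tokens between the gene mention and the hypertension mention, plus their adjacent bigrams. A minimal sketch follows. The feature list on the slide includes "of" and "predisposition", so the sketch assumes the full sentence read "... a genetic marker of predisposition for hypertension"; that wording is inferred from the features, not quoted from the slide. Whitespace tokenization is also an assumption.

```python
def words_between(tokens, gene, disease):
    """Return the tokens between two mentions and their bigrams."""
    start, end = sorted((tokens.index(gene), tokens.index(disease)))
    between = tokens[start + 1:end]
    bigrams = [f"{a}_{b}" for a, b in zip(between, between[1:])]
    return between, bigrams


# Assumed full sentence, reconstructed from the slide's feature list.
sentence = "The GNB3 may be considered a genetic marker of predisposition for hypertension"
unigrams, bigrams = words_between(sentence.split(), "GNB3", "hypertension")
```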

33 33 Parse Tree Path Features Parse Tree Path Features : NP_S_VP_NP_PP_NP
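The parse tree on this slide is an image that the transcript does not preserve. To illustrate how a path such as NP_S_VP_NP_PP_NP can be derived, the sketch below walks from the phrase containing the gene up to the lowest common ancestor and down to the phrase containing the hypertension mention. The hand-built tree, with a single flat VP, is an assumption chosen to reproduce the path shown; a real parser would likely produce a deeper tree.

```python
# Each node is (label, children); children are words or further nodes.
TREE = ("S", [
    ("NP", ["The", "GNB3"]),
    ("VP", ["may", "be", "considered",
            ("NP", [("NP", ["a", "genetic", "marker"]),
                    ("PP", ["for", ("NP", ["hypertension"])])])]),
])


def path_to(tree, token, trail=()):
    """Child-index trail from the root to the phrase directly holding token."""
    _, children = tree
    for i, child in enumerate(children):
        if child == token:
            return trail
        if isinstance(child, tuple):
            found = path_to(child, token, trail + (i,))
            if found is not None:
                return found
    return None


def labels_along(tree, trail):
    """Phrase labels from the root down the given trail."""
    labels, node = [tree[0]], tree
    for i in trail:
        node = node[1][i]
        labels.append(node[0])
    return labels


def tree_path(tree, source, target):
    """Labels from source phrase up to the common ancestor, then down to target."""
    a, b = path_to(tree, source), path_to(tree, target)
    common = 0
    while common < min(len(a), len(b)) and a[common] == b[common]:
        common += 1
    up = labels_along(tree, a)[common:][::-1]   # source phrase up to ancestor
    down = labels_along(tree, b)[common + 1:]   # below ancestor down to target
    return "_".join(up + down)


feature = tree_path(TREE, "GNB3", "hypertension")
```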

34 34 Chunk Features The GNB3 may be considered a genetic marker for hypertension.
Inter-HG chunk types: VP_NP_PP
Inter-HG chunk head words: consider_marker_hypertension
Word   The   GNB3  may   be    considered  a     genetic  marker  for   hypertension
Chunk  B-NP  I-NP  B-VP  I-VP  I-VP        B-NP  I-NP     I-NP    B-PP  B-NP
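The inter-HG chunk type feature can be derived from BIO chunk tags by grouping tokens into chunks and reading off the types of the chunks that lie between the gene chunk and the hypertension chunk. The sketch below assumes one tag per word, with "be considered" continuing the VP and "genetic marker" continuing the NP; the transcript shows fewer tags than words, so that expansion is an assumption. Head-word extraction is left out because the slides do not say how heads are chosen.

```python
def chunks_from_bio(tokens, tags):
    """Group tokens into (chunk type, words) using B-/I- tags only."""
    chunks = []
    for token, tag in zip(tokens, tags):
        prefix, _, chunk_type = tag.partition("-")
        if prefix == "B" or not chunks or chunks[-1][0] != chunk_type:
            chunks.append((chunk_type, [token]))
        else:
            chunks[-1][1].append(token)
    return chunks


def inter_hg_chunk_types(chunks, gene, disease):
    """Join the types of the chunks strictly between the two mentions."""
    position = {w: i for i, (_, words) in enumerate(chunks) for w in words}
    low, high = sorted((position[gene], position[disease]))
    return "_".join(chunk_type for chunk_type, _ in chunks[low + 1:high])


TOKENS = "The GNB3 may be considered a genetic marker for hypertension".split()
TAGS = ["B-NP", "I-NP", "B-VP", "I-VP", "I-VP",
        "B-NP", "I-NP", "I-NP", "B-PP", "B-NP"]  # assumed per-word expansion
chunk_types = inter_hg_chunk_types(chunks_from_bio(TOKENS, TAGS), "GNB3", "hypertension")
```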

35 35 Result of Baseline Features
Config    Precision  Recall  F-score  AUC    S_AUC
Baseline  0.704      0.536   0.603    0.493  0.126
Baseline : Basic word + Chunk + Parse Tree
S_AUC : standard deviation of AUC

36 36 Template Features
Especially, a polymorphism in SLC12A was significantly associated with hypertension in women even after correction by the Bonferroni method.
The leptin gene polymorphism was associated with hypertension independent of obesity.
On analysis of covariance, the interaction between ND2 - 237 Leu / Met polymorphism and habitual drinking was significantly associated with both systolic blood pressure and diastolic blood pressure.
Template: … gene … associated with … hypertension
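A template such as the one on this slide can be used as a binary feature: does the sentence, with its gene and hypertension mentions masked, match the template? The sketch below covers only that matching step. How the templates are generated automatically is not described in the transcript, and the placeholder tokens `GENE` and `HYPERTENSION`, the `...` gap marker and both function names are assumptions.

```python
import re


def mask_entities(sentence, genes, diseases):
    """Replace recognized mentions with placeholder tokens."""
    for gene in genes:
        sentence = sentence.replace(gene, "GENE")
    for disease in diseases:
        sentence = sentence.replace(disease, "HYPERTENSION")
    return sentence


def template_feature(masked_sentence, template):
    """Return 1 if the fixed parts of the template occur in order, else 0."""
    parts = [re.escape(part.strip()) for part in template.split("...")]
    pattern = r"\b" + r"\b.*?\b".join(parts) + r"\b"
    return int(re.search(pattern, masked_sentence) is not None)


TEMPLATE = "GENE ... associated with ... HYPERTENSION"
s1 = mask_entities(
    "Especially, a polymorphism in SLC12A was significantly associated with "
    "hypertension in women even after correction by the Bonferroni method.",
    ["SLC12A"], ["hypertension"])
s2 = mask_entities(
    "The GNB3 may be considered a genetic marker for hypertension.",
    ["GNB3"], ["hypertension"])
features = [template_feature(s1, TEMPLATE), template_feature(s2, TEMPLATE)]
```

The first sentence matches the template and the second does not, so the feature separates "associated with" phrasing from other constructions.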

37 37 Result of B+T Features
Config    P      R      F-score  AUC    S_AUC  ∆AUC   t     AUC > AUC_B? (t > 1.67?)
Baseline  0.704  0.536  0.603    0.493  0.126  N/A
B+T       0.733  0.540  0.615    0.513  0.105  0.011  0.65  No
B : Baseline feature (words feature, chunk feature, parse tree)
T : Template features
t : t test

38 38 Position Features
Relative position features
Section features
Divide an abstract into four sections : Objective, Methods, Result, Conclusions
Value = 0~10
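The slide names two kinds of position feature but gives little detail. The sketch below is one possible reading, offered purely as an assumption: the relative position of a sentence within its abstract is scaled to an integer from 0 to 10, and the section is encoded as a one-hot vector over the four section names. Neither encoding is confirmed by the transcript.

```python
SECTIONS = ["Objective", "Methods", "Result", "Conclusions"]


def relative_position(sentence_index, sentence_count):
    """Scale a 0-based sentence index to an integer in 0..10 (assumed scheme)."""
    if sentence_count <= 1:
        return 0
    return round(10 * sentence_index / (sentence_count - 1))


def section_feature(section_label):
    """One-hot encoding of the section a sentence belongs to (assumed scheme)."""
    return [int(section_label == name) for name in SECTIONS]


first = relative_position(0, 11)
last = relative_position(10, 11)
conclusion = section_feature("Conclusions")
```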

39 39 Before Section Categorization

40 40 After Section Categorization

41 41 PubMed EX

42 42 Result
Config    P      R      F-score  AUC    S_AUC  ∆AUC   t      AUC > AUC_B? (t > 1.67?)
Baseline  0.704  0.536  0.603    0.493  0.126  N/A
B+T       0.733  0.540  0.615    0.513  0.105  0.011  0.65   No
B+P       0.825  0.823  0.820    0.814  0.087  0.360  11.44  Yes
B+P+T     0.815  0.879  0.841    0.818  0.084  0.378  11.75  Yes
B : Baseline feature (words feature, chunk feature, parse tree)
P : Position features
T : Template features
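The slides do not state which form of t test was used. As a plausibility check, the sketch below computes a two-sample t statistic from the AUC and S_AUC columns, assuming 30 runs per configuration (the number of repetitions given on the Training & Testing slide) and treating S_AUC as a standard deviation. The function name `t_statistic` is hypothetical.

```python
import math


def t_statistic(mean_a, sd_a, mean_b, sd_b, runs=30):
    """Two-sample t statistic for equal-sized groups, from summary statistics."""
    standard_error = math.sqrt((sd_a ** 2 + sd_b ** 2) / runs)
    return (mean_a - mean_b) / standard_error


BASELINE = (0.493, 0.126)  # (AUC, S_AUC) from the table
CONFIGS = {
    "B+T": (0.513, 0.105),
    "B+P": (0.814, 0.087),
    "B+P+T": (0.818, 0.084),
}
t_values = {name: t_statistic(auc, sd, *BASELINE) for name, (auc, sd) in CONFIGS.items()}
significant = {name: t > 1.67 for name, t in t_values.items()}
```

Under these assumptions the computed values come out close to the t column in the table (roughly 0.67, 11.48 and 11.76), and only the configurations with position features exceed the 1.67 threshold. That threshold is close to the one-tailed 5% critical value for about 58 degrees of freedom.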

43 43 Motivation Major tasks Dataset Evaluation Conclusion Outline

44 44 Conclusions-1 The first systematic study of extracting hypertension-related genes.

45 45 Conclusions-2 The first attempt to create a hypertension-gene relation corpus based on the GAD database.

46 46 Conclusions-3 Propose a supervised learning approach for extracting key hypertension-related genes.

47 47 Thanks for your attention

