Towards Building A Database of Phosphorylate Interactions Extracting Information from the Literature M. Narayanaswamy & K. E. Ravikumar AU-KBC Center,


1 Towards Building A Database of Phosphorylate Interactions Extracting Information from the Literature M. Narayanaswamy & K. E. Ravikumar AU-KBC Center, Chennai, India & K. Vijay-Shanker University of Delaware

2 Information Extraction from the Literature Wealth of information available (only) in unstructured form (scientific literature) Need to store data in structured form (databases) for bioinformatics applications Information extraction is an active field. Focus in the biological domain -- extracting information on protein pairs that interact

3 Phosphorylation Extraction <Agent = Frp-1 Theme = p53 Site = Ser 15> <Agent = JNK Theme = c-Jun Site = unk>
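The two templates above can be held in a small record type. This is a minimal sketch, not the authors' schema: the class name `PhosphorylationRecord` and the method `as_template` are hypothetical, and `None` is assumed to stand in for the slide's "unk".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhosphorylationRecord:
    """One extracted relation; the fields mirror the Agent/Theme/Site slots."""
    agent: Optional[str]  # entity doing the phosphorylating
    theme: Optional[str]  # entity being phosphorylated
    site: Optional[str]   # residue or position; None means unknown ("unk")

    def as_template(self) -> str:
        """Render the record in the angle-bracket notation used on the slide."""
        def show(value: Optional[str]) -> str:
            return value if value is not None else "unk"
        return (f"<Agent = {show(self.agent)} "
                f"Theme = {show(self.theme)} Site = {show(self.site)}>")


# The two examples from the slide.
records = [
    PhosphorylationRecord("Frp-1", "p53", "Ser 15"),
    PhosphorylationRecord("JNK", "c-Jun", None),
]
for record in records:
    print(record.as_template())
```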

4 Why Phosphorylation Central role in signal transduction. One of the more widely studied modifications. IE should generalize to other post-translational modifications and binding. Different challenges –Agent and target are not just proteins –Site of phosphorylation

5 Steps in Text Processing and Information Extraction Basic Text Processing – e.g., sentence boundary detection. Part of Speech Tagging Name/term Detection Phrase (esp., Noun and Verb Phrase) chunking Type Classification of Terms and noun phrases Template Pattern Matching
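As an illustration of the first step only, sentence boundary detection can be approximated with one regular expression. This is a naive sketch under the assumption that a sentence ends at `.`, `!` or `?` followed by whitespace and a capital letter; the function name `split_sentences` is hypothetical and the slides do not say how the actual system does this. Abbreviations followed by a capitalized word (for example "Fig. A") would be split wrongly.

```python
import re

# Boundary = whitespace that follows ., ! or ? and precedes an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list:
    """Split text into sentences with a simple punctuation heuristic."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


text = ("ATR/FRP-1 also phosphorylated p53 in Ser 15. "
        "Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10.")
for sentence in split_sentences(text):
    print(sentence)
```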

6 BioNEx (PSB 2003) Detects Names/Terms of following types: –Protein/gene –Protein/gene parts –Chemicals –Source –Others Two tasks – Detection and Classification

7 Classification -- F-Term Names are often descriptive NPs –Simian immunodeficiency virus –T cell –Mitogen-activated protein kinase –Ras guanine nucleotide exchange factor Description of function/class of entities Useful to assign types
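The idea of typing a descriptive noun phrase by its functional term can be sketched as a lookup on the words of the phrase. The table `F_TERMS` and the function `classify_by_f_term` are hypothetical, and the word-to-type assignments are assumptions made for illustration; only the type labels come from the BioNEx slide.

```python
# Hypothetical f-term lexicon: word -> type label (labels follow the BioNEx slide).
F_TERMS = {
    "kinase": "Protein/gene",
    "factor": "Protein/gene",
    "virus": "Source",
    "cell": "Source",
}


def classify_by_f_term(noun_phrase: str) -> str:
    """Return the type of the rightmost known f-term, or "Others" if none is found."""
    # Scan right to left: the head of an English noun phrase is usually near the end,
    # possibly followed by a name (e.g. "... exchange factor Sos").
    for word in reversed(noun_phrase.lower().split()):
        if word in F_TERMS:
            return F_TERMS[word]
    return "Others"


for phrase in ["Simian immunodeficiency virus", "T cell",
               "Mitogen-activated protein kinase",
               "Ras guanine nucleotide exchange factor Sos"]:
    print(phrase, "->", classify_by_f_term(phrase))
```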

8 Additional Sources for Classification Using context – h-terms such as “expression” – “…IL-2 expression…” Appositives –“Mek1, a tyrosine kinase,…” Acronyms –Mitogen-activated protein kinase (MAPK)…...MAPK … Coordination High precision and recall (PSB 2003)
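The acronym source can be illustrated with a small matcher that links a parenthesized acronym to the words just before it, so that a type assigned to the long form can be carried over to later uses of the short form. This is a sketch, not the BioNEx method: the function name `find_acronym_definitions` is hypothetical, and it assumes each acronym letter is the initial of one preceding word, with hyphenated words counted as separate words.

```python
import re


def find_acronym_definitions(text: str) -> dict:
    """Map each acronym of the form '(ABC)' to the long form that precedes it."""
    pairs = {}
    for match in re.finditer(r"\(([A-Z][A-Za-z0-9]{1,9})\)", text):
        acronym = match.group(1)
        before = text[:match.start()]
        # Words before the parenthesis; hyphens split words ("Mitogen-activated").
        words = list(re.finditer(r"[A-Za-z0-9]+", before))
        if len(words) < len(acronym):
            continue
        candidate = words[-len(acronym):]
        initials = "".join(w.group(0)[0] for w in candidate).upper()
        if initials == acronym.upper():
            pairs[acronym] = before[candidate[0].start():].strip()
    return pairs


sample = "Mitogen-activated protein kinase (MAPK) is activated; MAPK then acts on its targets."
print(find_acronym_definitions(sample))
```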

9 Steps in IE Basic Text Processing – e.g., sentence boundary detection. Part of Speech Tagging Name/term Detection Phrase (esp., Noun and Verb Phrase) chunking Type Classification of Terms and noun phrases Template Pattern Matching

10 Phrase Chunking Detect BaseNPs –Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10

11 Phrase Chunking Detect BaseNPs and Verb Groups –Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10

12 Phrase Chunking Detect BaseNPs –Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10 Verb groups (passive vs. active forms)

13 Phrase Chunking Detect BaseNPs –Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10 Verb groups Appositives –… Sic1, an inhibitor …, is phosphorylated Relative Clauses – … Ser38 which is phosphorylated …

14 Steps in IE Basic Text Processing – e.g., sentence boundary detection. Part of Speech Tagging Name/term Detection Phrase (esp., Noun and Verb Phrase) chunking Type Classification of Terms and noun phrases Template Pattern Matching

15 Why type classification? A phosphorylated B in C –ATR/FRP-1 also phosphorylated p53 in Ser 15 … –Active Chk2 phosphorylated the SQ/TQ sites in Ckk2 SCD … –cdk9/cyclinT2 could phosphorylate the retinoblastoma gene (pRb) in human cell lines
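The examples above show that the phrase after "in" is a site in one sentence and a cell source in another. A minimal sketch of how a type check resolves this follows. The names `looks_like_site` and `assign_in_phrase` are hypothetical, and the check assumes a site is written as Ser, Thr or Tyr followed by a position number, which is far narrower than what a real classifier would accept.

```python
import re

# Assumed site shape: residue abbreviation plus position, e.g. "Ser 15" or "Ser10".
_RESIDUE = re.compile(r"^(?:Ser|Thr|Tyr)[ -]?\d+$")


def looks_like_site(phrase: str) -> bool:
    """True if the phrase has the shape of a phosphorylation site."""
    return bool(_RESIDUE.match(phrase.strip()))


def assign_in_phrase(agent: str, theme: str, in_phrase: str) -> dict:
    """Fill the Site slot only when the phrase after 'in' is typed as a site."""
    record = {"agent": agent, "theme": theme, "site": None}
    if looks_like_site(in_phrase):
        record["site"] = in_phrase.strip()
    return record


print(assign_in_phrase("ATR/FRP-1", "p53", "Ser 15"))
print(assign_in_phrase("cdk9/cyclinT2", "pRb", "human cell lines"))
```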

16 Type Classification Extensive use of type information in rules Typing done by means of –Phrase internal – e.g., Ras guanine nucleotide exchange factor Sos –Contextual – e.g., homolog of TAF(II) –Syntactic information – appositive, coordination, etc.

17 Steps in IE Basic Text Processing – e.g., sentence boundary detection. Part of Speech Tagging Name/term Detection Phrase (esp., Noun and Verb Phrase) chunking Type Classification of Terms and noun phrases Template Pattern Matching

18 Patterns and Templates <Agent> <verb group: phosphorylate> <Theme> (in/at <Site>)? –Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10 Active, passive, and adjectival forms of phosphorylate/phosphorylated Different orders and optionality of arguments
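Matching the active verbal pattern over chunked input might look like the sketch below. It assumes the chunker has already produced a list of `(chunk_type, text)` pairs with the labels "NP", "VG" and "PREP". Those labels and the names `is_passive` and `match_active` are hypothetical, not the authors' representation. The sketch covers only the active form with this argument order, and it omits the type check on the final noun phrase that the type-classification slides call for.

```python
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}


def is_passive(verb_group: str) -> bool:
    """Crude passive test: a form of 'be' directly before an -ed head verb."""
    words = verb_group.lower().split()
    return len(words) >= 2 and words[-1].endswith("ed") and words[-2] in BE_FORMS


def match_active(chunks):
    """Match NP VG[phosphorylate, active] NP (in/at NP)? and return the slots."""
    for i in range(len(chunks) - 2):
        (t1, agent), (t2, verb), (t3, theme) = chunks[i:i + 3]
        if (t1, t2, t3) != ("NP", "VG", "NP"):
            continue
        words = verb.lower().split()
        if not words or not words[-1].startswith("phosphorylat") or is_passive(verb):
            continue
        site = None
        tail = chunks[i + 3:i + 5]  # optional "in/at <NP>" continuation
        if (len(tail) == 2 and tail[0][0] == "PREP"
                and tail[0][1].lower() in ("in", "at") and tail[1][0] == "NP"):
            site = tail[1][1]
        return {"agent": agent, "theme": theme, "site": site}
    return None


chunks = [("NP", "Active p90Rsk2"),
          ("VG", "was found to be able to phosphorylate"),
          ("NP", "histone H3"), ("PREP", "at"), ("NP", "Ser10")]
print(match_active(chunks))
```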

19 Patterns for Phosphorylation Non-verbal forms (less common, but still frequent) Phosphorylation of <Theme> (by <Agent>)? (in/at <Site>)? Phosphorylation of … by/via phosphorylation (at <Site>)? Altogether, a large number of patterns from examining 300 abstracts and 10 journal articles.
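A nominal pattern of the form "phosphorylation of Theme (by Agent)? (in/at Site)?" can be approximated directly on text with a regular expression. This is a sketch only: the system described here matches over chunked and typed phrases rather than raw strings, the name `match_nominal` is hypothetical, and the test phrases are constructed for illustration from entities named on earlier slides. Other argument orders (site before agent) are not handled.

```python
import re

# Theme is required; agent and site are optional; a match ends at punctuation or end of text.
NOMINAL = re.compile(
    r"phosphorylation of (?P<theme>.+?)"
    r"(?: by (?P<agent>.+?))?"
    r"(?: (?:in|at) (?P<site>.+?))?"
    r"(?=[.,;]|$)",
    re.IGNORECASE,
)


def match_nominal(sentence: str):
    """Return the agent/theme/site slots of the first nominal match, or None."""
    found = NOMINAL.search(sentence)
    if found is None:
        return None
    return {slot: found.group(slot) for slot in ("agent", "theme", "site")}


print(match_nominal("Phosphorylation of p53 by ATR at Ser 15."))
print(match_nominal("phosphorylation of c-Jun"))
```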

20 Sentence-Based Evaluation
            Precision   Recall   F-measure
Agent          91         88        89
Theme          96         87        92
Site           94         73        82
Relation       89         77        83

Agent          89         89        81
Theme          98         93        96
Site          100         96        96
Relation       96         89        92
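The slide does not say how the F-measure column was computed. Assuming it is the balanced F-measure (the harmonic mean of precision and recall), it can be sketched as below. The function name `f_measure` is hypothetical. Under that assumption the first Agent row (precision 91, recall 88) gives about 89.5, consistent with the reported 89.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# First Agent row and first Site row of the evaluation table.
print(f_measure(91, 88))
print(f_measure(94, 73))
```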

21 Utility in Building Databases IE on 1000 abstracts – 5m/3s Precision on 200 abstracts –Relation > 92% Scales up well Useful for constructing DBs.

22 Discussion High precision and recall Beyond protein-protein (e.g., site) Non-verbal Generalizes to other post-translational modifications? (acetylation, methylation, …)

23 Future Work Piecemeal information specification X phosphorylates Y + phosphorylation of Y at Z = X phosphorylates Y at Z Fusion/Information Merging
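The merging step proposed above can be sketched as a slot-wise union of two partial records that share a theme. This is one possible reading of the slide, not a described implementation: the function name `merge_records` and the dictionary layout are hypothetical, and two records are assumed mergeable only when no filled slot conflicts.

```python
SLOTS = ("agent", "theme", "site")


def merge_records(a: dict, b: dict):
    """Merge two partial records about the same theme; return None on conflict."""
    if a.get("theme") is None or a.get("theme") != b.get("theme"):
        return None  # different or unknown themes: nothing to fuse
    merged = {}
    for slot in SLOTS:
        va, vb = a.get(slot), b.get(slot)
        if va is not None and vb is not None and va != vb:
            return None  # both filled but different: keep the records separate
        merged[slot] = va if va is not None else vb
    return merged


# "X phosphorylates Y" + "phosphorylation of Y at Z" = "X phosphorylates Y at Z"
verbal = {"agent": "X", "theme": "Y", "site": None}
nominal = {"agent": None, "theme": "Y", "site": "Z"}
print(merge_records(verbal, nominal))
```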

