The BioText Project: Recent Work
Marti Hearst
SIMS, UC Berkeley
Supported by NSF DBI and a gift from Genentech
Project Team
  Project Leaders: Marti Hearst (PI), Adam Arkin (Co-PI)
  Computational Linguistics: Preslav Nakov, Emilia Stoica, Sarah Poon
  IR/Databases/Software: Ariel Schwartz, Itai Brickner, Brian Wolf
  Bioscience: Janice Hamer
  Alumni: Dr. Barbara Rosario, Dr. TingTing Zhang, Gaurav Bhalotia
BioText Project Goals
  Provide flexible, intelligent access to information for use in bioscience applications.
  Focus on:
    Textual information from journal articles
    Tight integration with other resources: ontologies and record-based databases
BioText Architecture
  Sophisticated Text Analysis
  Annotations in Database
  Improved Search Interface
Today’s Talks
  1. Intro (Marti)
  2. Design and Implementation of the Layered Query Language (Ariel & Brian)
  3. Adding Fulltext to LQL (Itai)
  4. Determining Gene Function from Text (Emilia)
  5. Using the Web as an Implicit Training Corpus (Preslav)
  6. Identifying Protein-Protein Interactions (Marti, covering Barbara’s work)
  7. Citances (Marti)
  8. Discussion: What should our user interface do?
Recent Papers
  Predicting Gene Functions from Text Using a Cross-Species Approach, Emilia Stoica and Marti Hearst, to appear in PSB.
  Multi-way Relation Classification: Application to Protein-Protein Interaction, Barbara Rosario and Marti Hearst, in HLT/EMNLP.
  Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution, Preslav Nakov and Marti Hearst, in HLT/EMNLP 2005.
Recent Papers (continued)
  Scaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing, Preslav Nakov, Ariel Schwartz, Brian Wolf, and Marti Hearst, in ACL/ISMB SIGLINK.
  Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing, Preslav Nakov and Marti Hearst, in CoNLL.
  Citances: Citation Sentences for Semantic Analysis of Bioscience Text, Preslav Nakov, Ariel Schwartz, and Marti Hearst, in the SIGIR'04 Workshop on Search and Discovery in Bioinformatics.