Please have a seat. Our program will commence shortly.

Presentation on theme: "Please have a seat. Our program will commence shortly."— Presentation transcript:

1 Please have a seat. Our program will commence shortly.

2 Biomarker Automated Retrieval Tool Ronny Chan, Kim Ngo Earth Science Data Systems Dept.

3 Bioinformatics Relationship: Science produces massive amounts of data. Data needs to be analyzed, stored, and retrieved; this is data mining. We want to apply computer science to improve this process.

4 Motivation: Problems with conventional data mining: time consuming; accuracy not defined (subjective); no objective scientific information retrieval tool. Where are the biomarkers?

5 Cancer Biomarkers An indicator of cancerous growth.

6 Proposed Solution: Create a program that allows people to quickly scan literature for the most relevant keywords/biomarkers. [Figure labels: B.A.R.T.; HER-2, HPEBP4, EP-CAM, ERBB2, BAG-1]

7 Significance: What is the need for the project? More efficient research; save time. [Figure labels: conventional, enhanced, B.A.R.T.]

8 Goals: Make biomarker/keyword searches more efficient; learn Java; learn SQL.

9 Approach: Write a program; read in articles; use part of the Vector Space Model algorithm to rank terms; output relevant terms in statistical rankings. [Figure labels: they, BRCA1, VS.]

10 Vector Space Model: An information retrieval system introduced by Gerard Salton in the 1960s. Used widely in different search engines.

11 Algorithm for B.A.R.T. [Diagram components: Keywords Input; PubMed Query Agent; Data Store; Data Retrieval and Output; Content Analyzer; Keyword Parser; Content Ranker]

12 Results [Figure labels: DCIS, CU-TP3982, ERBB2, HER-2, HPEBP4, BAG-1, EP-CAM, 99M]

13 Lessons & Difficulties: Deciding on algorithm choice (ease of implementation and effectiveness); limited knowledge and experience (Java, SQL); initial implementation is slow. Timings as listed on the slide: 5 articles = 160 sec; update, August 18, 2004: 100 articles = 8^19 years; 20 articles = 1904 sec; 100 articles = 8^38 years.

14 Future Work: Apply different term-weight functions to make results more robust; optimize the program for speed.

15 Citations
1. http://ir.iit.edu/~dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF
2. http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10
3. http://www.cs.ust.hk/~dlee/Papers/ir/ieee-sw-rank.pdf
4. http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf
5. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), 89-95 (2001).

16 Acknowledgements: Earth Science Data System, JPL; Tina Xiao; Paul Ramirez; Chris Mattmann; Roshanak Roshandel; Sean Hardman; all SoCalBSI colleagues; National Institutes of Health (NIH); National Science Foundation (NSF); Southern California Bioinformatics Summer Institute (SoCalBSI); SoCalBSI professors; Jacqueline Heras.

17 VSM Example
Q:  malignant breast cancer
D1: detection of malignant level in the cell
D2: sighting of breast stage in the breast cancer
D3: detection of malignant stage in the cancer

Term-document matrix, each cell given as tf(idf):

doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1          0     0     1        0          1

ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176
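The DF and IDF columns in the example can be recomputed from the three documents. The sketch below is in Python and is not the B.A.R.T. code, which the slides say was written in Java. It assumes whitespace tokenization and IDF = log10(N / DF) with N = 3. That formula is not stated on the slide, but it is consistent with the listed values .477 and .176. The name idf_table is hypothetical.

```python
import math
from collections import Counter

# The three example documents from the slide.
docs = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}


def idf_table(documents):
    """Return {term: (df, idf)}, assuming idf = log10(N / df)."""
    n_docs = len(documents)
    df = Counter()
    for text in documents.values():
        # set() so a term repeated inside one document counts once toward DF
        df.update(set(text.lower().split()))
    return {term: (count, math.log10(n_docs / count)) for term, count in df.items()}


table = idf_table(docs)
for term, (df, idf) in sorted(table.items()):
    print(f"{term:<10} df={df} idf={idf:.3f}")
```

Terms that occur in every document ("the", "in", "of") get an IDF of 0, so they drop out of any ranking without needing a separate stop-word list.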

18 Example Continued… Keyword tf * idf
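A minimal sketch of the tf * idf keyword weighting, applied to the three example documents. This is Python, not the project's Java implementation. It assumes raw term counts for tf, idf = log10(N / df), and a ranking that sums each term's weight across documents. The summation step and the names tfidf_weights and rank_terms are assumptions made for illustration; the slides do not say how B.A.R.T. combines weights.

```python
import math
from collections import Counter

# The three example documents from the VSM example slide.
docs = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}


def tfidf_weights(documents):
    """Return {doc: {term: tf * idf}} with raw-count tf and idf = log10(N / df)."""
    n_docs = len(documents)
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # count each term once per document
    weights = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        weights[name] = {t: tf[t] * math.log10(n_docs / df[t]) for t in tf}
    return weights


def rank_terms(weights):
    """Sum each term's weight over all documents (an assumption) and sort descending."""
    totals = {}
    for doc_weights in weights.values():
        for term, w in doc_weights.items():
            totals[term] = totals.get(term, 0.0) + w
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


weights = tfidf_weights(docs)
ranked = rank_terms(weights)
for term, score in ranked:
    print(f"{term:<10} {score:.3f}")
```

Under these assumptions "breast" ranks first, because it occurs twice in D2 and in no other document, while "the", "in", and "of" score 0.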

