Automatic Information Retrieval from Bioinformatics Websites
Kang Peng

2 Introduction
Bioinformatics databases
– SwissProt & TrEMBL (http://us.expasy.org/sprot/)
– SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
– InterPro (http://www.ebi.ac.uk/interpro/index.html)
Bioinformatics sequence analysis tools
– NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
– PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)
– MEME (http://meme.sdsc.edu/meme/website/meme.html)
Integrated sites
– NCBI (http://www.ncbi.nlm.nih.gov/)
– EBI (http://www.ebi.ac.uk/index.html)

3 The Problem
How can we retrieve information automatically, or in batch mode? For example:
– Retrieve 10,000 protein sequences from SwissProt by their accession IDs
– Predict the secondary structure of 10,000 protein sequences through the PSIPRED site
– Find known InterPro patterns (motifs) in the 10,000 protein sequences through the InterPro site
– …
Some websites provide mechanisms for batch retrieval, but not all do.
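All of the batch tasks on this slide have the same shape: loop over many IDs, send one request per ID, and collect the results. The following is a minimal Python sketch of that loop. It assumes a caller-supplied `fetch_one(id)` function (hypothetical) that wraps whatever request the target site needs and raises an exception on failure. The sketch also adds a pause between requests so that the server is not flooded.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def batch_retrieve(
    ids: Iterable[str],
    fetch_one: Callable[[str], str],
    delay_s: float = 1.0,
    retries: int = 2,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch one record per ID; return (results, failed_ids).

    fetch_one is a hypothetical callback that performs a single request
    (e.g. one SwissProt entry lookup) and raises an exception on failure.
    """
    results: Dict[str, str] = {}
    failed: List[str] = []
    for i, acc in enumerate(ids):
        if i > 0:
            time.sleep(delay_s)  # space out requests to be polite to the server
        for attempt in range(retries + 1):
            try:
                results[acc] = fetch_one(acc)
                break
            except Exception:
                if attempt == retries:
                    failed.append(acc)  # give up on this ID after the last retry
                else:
                    time.sleep(delay_s)  # back off before retrying


    return results, failed


if __name__ == "__main__":
    # Fake in-memory "server" so the sketch runs offline; swap in a real request.
    fake_db = {"ID1": ">ID1 fake record", "ID2": ">ID2 fake record"}
    res, bad = batch_retrieve(["ID1", "ID2", "ID3"], fake_db.__getitem__, delay_s=0.0)
    print(sorted(res), bad)
```

Failed IDs are returned instead of aborting the whole run. With 10,000 requests, a few transient errors are likely, and the failed IDs can simply be resubmitted later.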

4 Solution
First, check whether the website already provides what you want.
– If it does, use it.
Otherwise, study how the web browser interacts with the server.
Then write a program that simulates the web browser when communicating with the web server.
– Now we have full control, so we can do whatever we want …

5 The World Wide Web
[Diagram: the client (browser) sends an HTTP request, GET http://astro.temple.edu/~kangpeng/demo.htm, over the Internet (TCP/IP) to the web server, which returns an HTTP response containing demo.htm, a simple web page that displays "Hello, guys!"]
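The GET on this slide is just a few lines of text sent over a TCP connection. The server replies with a status line, headers, a blank line and then the body. The following is a minimal Python sketch of that exchange:
- `build_get_request` builds an HTTP/1.0 GET request.
- `parse_response` splits a raw response into status, headers and body.
- `http_get` performs the socket round trip, for plain http:// URLs only.

The User-Agent string is a hypothetical placeholder. The demo URL from the slide may no longer be reachable.

```python
import socket
from typing import Dict, Tuple
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Return the raw bytes of an HTTP/1.0 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {parts.hostname}",
        "User-Agent: batch-retriever/0.1",  # hypothetical client name
        "",  # the empty entries produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])  # "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def http_get(url: str, timeout: float = 30.0) -> Tuple[int, Dict[str, str], bytes]:
    """Send the request over a plain TCP socket (http:// only, no TLS)."""
    parts = urlsplit(url)
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout) as s:
        s.sendall(build_get_request(url))
        chunks = []
        while True:  # with HTTP/1.0 the server closes the connection when done
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```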

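6 The World Wide Web
[Diagram: the client (browser) sends an HTTP request, GET http://astro.temple.edu/~kangpeng/demo.htm, over the Internet (TCP/IP) to a web server running CGI, ASP, JSP, …; the web server works with a DB/application server and its database, then returns the HTTP response (demo.htm, a simple web page that displays "Hello, guys!")]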

7 Common Gateway Interface (CGI)
A CGI program
– Runs on the web server
– Takes input supplied by the user through the web browser
– Can query databases, run sequence analysis tools, etc.
– Can format its output as HTML pages
– Can be written in almost any language, e.g. Perl or even C++ (ASP, JSP, and PHP pages play the same server-side role)
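The CGI contract works as follows:
- For a GET request, the web server passes the form data to the program in the `QUERY_STRING` environment variable.
- A POST body arrives on standard input, with its size given in `CONTENT_LENGTH`.
- Whatever the program writes to standard output goes back to the browser: a header block, a blank line, then the page.

The following is a minimal Python sketch of such a program that handles GET only. The `seq` parameter is hypothetical. Echoing the sequence length is a stand-in for a real database query or tool run.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: echo a submitted sequence and its length as HTML."""
import html
import os
import sys
from urllib.parse import parse_qs


def handle(query_string: str) -> str:
    """Return the complete CGI output: a header, a blank line, then the HTML page."""
    params = parse_qs(query_string)  # decodes "name=value&..." pairs
    seq = params.get("seq", [""])[0].strip().upper()  # "seq" is a hypothetical field
    # Stand-in for real work (querying a DB, running a sequence analysis tool, ...)
    page = (
        "<html><body>"
        f"<p>Sequence: {html.escape(seq)}</p>"
        f"<p>Length: {len(seq)}</p>"
        "</body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + page


if __name__ == "__main__":
    # GET only in this sketch: the web server puts the encoded form data in QUERY_STRING.
    sys.stdout.write(handle(os.environ.get("QUERY_STRING", "")))
```

Escaping the echoed input with `html.escape` prevents a submitted value from injecting markup into the returned page.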

8 HTML FORM
Used to input parameters and/or data for CGI programs
The parameters/data are encoded and sent
– As "name=value" pairs, joined by "&"
– In the URL (GET method), e.g. http://us.expasy.org/cgi-bin/prosite-search-ful
– In the HTTP request body (POST method), e.g. http://us.expasy.org/tools/protparam.html
Let's examine some examples now …
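The two methods differ only in where the encoded name=value pairs travel. The following small Python sketch uses the standard library's `urllib` to build both kinds of request without sending them. The field names (`seq`, `format`) and the CGI URL are hypothetical placeholders, not the real parameters of the ExPASy pages listed above.

```python
from urllib.parse import urlencode
from urllib.request import Request

CGI_URL = "http://www.example.org/cgi-bin/analyze"  # hypothetical CGI program
fields = {"seq": "MKV LLA", "format": "html"}       # hypothetical form fields

# name=value pairs joined with '&'; spaces become '+', other specials are %-encoded
encoded = urlencode(fields)

# GET: the pairs are appended to the URL after a '?'
get_req = Request(CGI_URL + "?" + encoded)

# POST: the same pairs go in the request body instead
post_req = Request(
    CGI_URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(encoded)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`Request` chooses POST automatically when `data` is given. Passing either object to `urllib.request.urlopen` would actually send it.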

9 Automatic Information Retrieval
From the FORM section of the web page (by reading its HTML source), we need to identify
– The URL of the CGI program (the form's action attribute)
– The data/parameters for the CGI program (the form's input fields)
– The data encoding and request method (the form's enctype and method attributes): in the URL (GET) or in the HTTP request body (POST)?
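Reading the HTML source by hand works for one page. The same three items can also be pulled out programmatically. The following minimal Python sketch uses the standard library's `html.parser` to record each form's action URL (resolved against the page URL), method (GET when absent), and named fields with their default values. The class name `FormFinder` is hypothetical. To keep the sketch short, repeated names such as radio buttons keep only the last value, and textarea/select defaults are recorded as empty.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Collect each <form>'s action URL, method, and named input fields."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.forms: List[Dict] = []
        self._current: Optional[Dict] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {
                # action is often relative; resolve it against the page URL
                "action": urljoin(self.base_url, a.get("action") or ""),
                # HTML forms default to GET when no method attribute is given
                "method": (a.get("method") or "get").upper(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif self._current is not None and tag in ("input", "textarea", "select"):
            name = a.get("name")
            if name:  # unnamed controls (e.g. a plain submit button) are not sent
                self._current["fields"][name] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None
```

Usage: create `FormFinder(page_url)`, call `.feed(html_text)`, then inspect `.forms`.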

10 Automatic Information Retrieval
Figure out the interaction process between the browser and the server; it may take several steps!
– An example: retrieving the secondary structure information for a protein in PDB (http://www.rcsb.org/pdb/)
Now we can write a program that simulates the web browser to communicate with the web server.
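A multi-step interaction can be scripted as a chain of requests, where each step reads a value out of the previous page: a link, a job ID, or a results URL. The following is a minimal Python sketch. It assumes each step can be described by a regular expression whose first group captures the next URL.
- `make_fetcher` keeps cookies in a `CookieJar` across calls, in case the site tracks a session.
- `follow_steps` walks the chain.

Both function names and the pages in the usage test are made up. They are not the actual RCSB PDB pages.

```python
import re
from http.cookiejar import CookieJar
from typing import Callable, List
from urllib.parse import urljoin
from urllib.request import HTTPCookieProcessor, build_opener


def make_fetcher(timeout: float = 30.0) -> Callable[[str], str]:
    """Return a fetch(url) -> text function that keeps cookies across calls."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))

    def fetch(url: str) -> str:
        with opener.open(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")

    return fetch


def follow_steps(fetch: Callable[[str], str], start_url: str, link_patterns: List[str]) -> str:
    """Fetch start_url, then for each pattern extract the next URL and fetch it."""
    url = start_url
    page = fetch(url)
    for pattern in link_patterns:
        m = re.search(pattern, page)
        if m is None:
            raise ValueError(f"pattern {pattern!r} not found on {url}")
        url = urljoin(url, m.group(1))  # links in the page may be relative
        page = fetch(url)
    return page
```

`fetch` is passed in as a parameter rather than called directly. This lets the step logic be tested against canned pages and later reused with `make_fetcher()` against a live site.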

11 Implementation
Any language that provides a convenient programming interface to the HTTP protocol will do
– Visual Basic: Winsock control, Internet Transfer Control
– Visual C++: WinInet API (for C), WinInet MFC classes (for C++)
– Java: java.net.URLConnection, java.net.URL
– Linux/Unix shell: lynx
– Perl: ???
– …
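Python (not listed on the slide) is another option through its standard library's `urllib.request`. The following minimal sketch shows a reusable helper that submits form data by GET or POST and sets a User-Agent header, since some servers treat clients without one differently. The helper name `fetch` and the default User-Agent string are hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch(
    url: str,
    params: Optional[Mapping[str, str]] = None,
    method: str = "GET",
    timeout: float = 30.0,
    user_agent: str = "Mozilla/5.0 (compatible; batch-retriever/0.1)",
) -> str:
    """Submit params to url (GET or POST) and return the response body as text."""
    method = method.upper()
    headers = {"User-Agent": user_agent}
    data = None
    if params:
        encoded = urlencode(params)
        if method == "POST":
            data = encoded.encode("ascii")  # pairs travel in the request body
            headers["Content-Type"] = "application/x-www-form-urlencoded"
        else:
            sep = "&" if "?" in url else "?"
            url = url + sep + encoded  # pairs travel in the URL
    req = Request(url, data=data, headers=headers, method=method)
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The method and parameter names read from the FORM section map directly onto the `method` and `params` arguments.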

12 Examples
Please check my webpage at http://astro.temple.edu/~kangpeng (not available until Monday)

13 Thank You!

