Published by Jemimah Emmeline Thornton. Modified over 9 years ago.
1
Use of Computers in Molecular Biology
Meena K Sakharkar Training Manager, BioInformatics Centre National University of Singapore
2
What is BioInformatics?
Many related terms and buzzwords. A multiplicity of names: bioinformatics, biocomputing, biological computing, computational biology, computational genomics, biological data mining.
3
Overview of the challenges of Molecular Biology Computing
The huge dataset problem: automated DNA sequencers, the Human Genome Project, bulk sequencing of cDNAs (ESTs).
4
GenBank Growth Chart (figure: bases plotted against year)
As of Oct. 1999, GenBank contains over 3.8 billion bases of DNA and protein sequence, which requires about 18 gigabytes of computer disk storage space.
5
Human Genome Project What is the Human Genome Project?
A 15-year effort, formally begun in October and coordinated by the U.S. Department of Energy and the National Institutes of Health. Its goals are to:
- identify all the estimated 80,000 genes in human DNA,
- determine the sequences of the 3 billion chemical bases that make up human DNA,
- store this information in databases,
- develop tools for data analysis, and
- address the ethical, legal, and social issues (ELSI) that may arise from the project.
6
Who is head of the U.S. Human Genome Project?
The DOE Human Genome Program is directed by Ari Patrinos, and Francis Collins directs the NIH Human Genome Program. Ari Patrinos also heads the Department of Energy Office of Biological and Environmental Research.
7
Related fields
molecular evolution, origin of life, genomics and proteomics, the Human Genome Project, theoretical biology, complexity and information theory, biotechnology, lead drug discovery, computing with biomolecules
8
Our ( working) definition
Bioinformatics: the body of tools, algorithms and know-how needed to handle complex biological information (the technological aspect).
Computational biology: the application of bioinformatics tools to perform biological studies (the scientific aspect).
A very broad and diverse field.
9
Bioinformatics is clearly a multidisciplinary field including:
computer systems management, networking, database design, computer programming, molecular biology
10
Integrating bioinformatics and computational biology:
- A biologist can use existing tools but might misinterpret results: the black-box effect, the 'software kit'.
- A biologist might refrain from doing some interesting analysis if the existing software doesn't offer it as an option. The ability to program is important.
- A computer scientist or a programmer can produce interesting and/or efficient algorithms and tools, but these might lack biological relevance. A biological training/background is important.
- Beware of the 'just a tool maker' stigma.
- Best results are achieved by integrating the development of tools with their usage in interesting biological systems.
11
A list of general issues addressed by BioInformatics:
mapping; evolutionary studies; sequencing; statistical analysis of biopolymers; sequence comparisons; gene modeling; DNA sequence -> DNA structure; molecular modeling; structure comparisons; sequences/structures -> function; gene expression and genetic/metabolic networks; databasing; visualisation and interaction; comparing tools; new technologies; ethics
12
How to handle all the information?
Producing, processing, storing, sharing, querying, retrieving, visualising, annotating, curating
13
Use of Computers in Molecular Biology
Powerful tools to organise the data itself. Exponential growth: a new release is made every two months.
Data analysis, retrieval, homology search, modelling purposes (drug design), data integration, data visualisation.
14
Paradigmatic Shift: Getting new sequences is now easy.
Having a new sequence, we can start by analysing it using the computer, or we can start by doing experimental work.
"A month in the lab can often save an hour in the library." - Westheimer
... or searching the Internet, or doing computerised analyses.
From 'wet lab' to 'soft lab': in vivo, in vitro, and in silico.
15
Information is being collected, organized, and made available:
GenBank is the central sequence information database in the United States. Data is shared between GenBank, the European Molecular Biology Laboratory (EMBL) and the DNA Database of Japan (DDBJ). All sequence data submitted to any of these databases is automatically integrated into the others. Sequence data is also incorporated from the Genome Sequence Data Base (GSDB) and from patent applications.
16
Similarity Searching in the databanks
"Are there any sequences in the databanks similar to my sequence?"
Directly searching the databanks by comparing sequences uses too much computer time. The biologist uses time-saving tools: FASTA and BLAST. Interpreting their output relies on statistics and the informed judgement of the biologist.
17
Pairwise and Multiple Alignments
Multiple alignment is the basis for the study of protein families and functional domains. When pairwise alignment is expanded to multiple sequences, it becomes a computationally huge problem. To reduce the nearly infinite permutations, a simplified heuristic (approximate) algorithm known as progressive pairwise alignment is used.
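Progressive methods repeat a pairwise alignment step many times, so it may help to see that step on its own. The following is a minimal Python sketch of the classic Needleman-Wunsch global alignment score. The function name and the match, mismatch and gap values are assumptions chosen for illustration; they are not taken from the slides or from any particular alignment program.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align two residues, or gap in either one.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
```

The table has one cell per pair of positions, so the work grows with the product of the two sequence lengths. Doing the same exhaustively for many sequences at once is what makes multiple alignment so expensive.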
18
Structure-function relationships: Sequence patterns that predict function
One of the challenging areas of computational molecular biology is the prediction of the function of protein molecules from their sequence. Sequence determines 3-D structure; structure determines function. Identify conserved regions (domains or motifs). Domain databases can be used to scan any unknown protein sequence.
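Scanning a sequence for a conserved motif can be sketched with a regular expression. The example below assumes the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} (Asn, any residue except Pro, Ser or Thr, any residue except Pro). The protein string is invented, and the function name is hypothetical. Real domain databases use richer models than a single regular expression.

```python
import re

# N-{P}-[ST]-{P} written as a regex; the lookahead lets overlapping sites match.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motifs(protein, pattern=N_GLYCOSYLATION):
    """Return (start, matched_text) for every match, including overlaps."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


# Invented toy sequence, not a real protein.
print(find_motifs("MNGSANPTLNKSW"))
```

A hit only says that the pattern occurs. Whether the site is functional still needs the biologist's judgement.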
19
Searching Literature using PubMed at NCBI
20
PubMed Project by NIH and NLM.
Search tool for accessing literature citations. The PubMed search system covers the MedLine and PreMedLine databases and the molecular biology databases indexed under Entrez.
21
MedLine
MedLine (MEDLARS onLINE) is NLM's premier bibliographic database. It covers medicine, nursing, dentistry, veterinary medicine, the health care sciences and pre-clinical sciences, and indexes over 3900 current biomedical journals published in the US and other countries.
22
MedLine 9 million records. Since 1966.
23
PreMedLine Introduced in August 1996.
Basic Citation and abstracts before the full records are prepared and added to Medline.
29
MEDLINE SAMPLE RECORD
UI 98408838
AU Tao X, Dafu D
TI Relationship between synonymous codon usage and protein structure.
MH Codon*
MH Protein Folding*
MH Protein Structure, Secondary*
MH Proteins / genetics
……
AB The hypothesis that synonymous codon usage is related to protein three-dimensional structure is examined by …
PT Journal article
SO FEBS Lett 1998 Aug 28 : 434 (1-2) : 93-6
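A tagged record like the sample above is straightforward to read by program. The sketch below assumes the simplified layout shown on the slide (a two-letter tag, a space, then the value, one field per line); real MEDLINE exports use a slightly different layout. The truncated AB field is left out, and `parse_record` is a hypothetical helper name.

```python
from collections import defaultdict

RECORD = """\
UI 98408838
AU Tao X, Dafu D
TI Relationship between synonymous codon usage and protein structure.
MH Codon*
MH Protein Folding*
MH Protein Structure, Secondary*
MH Proteins / genetics
PT Journal article
SO FEBS Lett 1998 Aug 28 : 434 (1-2) : 93-6
"""


def parse_record(text):
    """Group field values by tag; repeated tags such as MH accumulate."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        fields[tag].append(value.strip())
    return dict(fields)


record = parse_record(RECORD)
# An asterisk marks a MeSH heading that is a major point of the article.
major_topics = [mh.rstrip("*") for mh in record["MH"] if mh.endswith("*")]
print(major_topics)
```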
30
Medline Subject Index
Medical Subject Headings (MeSH)
- Selected from ~17,000 thesaurus terms
- Average per article: 3-4 as major points of the article (marked * in PubMed)
- Most specific MeSH term assigned for each concept
Subheadings
- 88 qualifiers used with the ~17,000 MeSH terms
- Subheadings focus on sub-aspects of a MeSH term (e.g., Haemoglobins - chemistry)
31
MEDLINE Indexing
MeSH Terms to LIMIT Retrieval: human, animal, male, female, age groups, organism, etc.
Publication Types (another way to LIMIT): review, clinical trial, letter, journal article, etc.
32
MEDLINE Subject Headings
Advantages of MeSH Terms
- Represent a subject concept, so no term synonyms are needed
- Find relevant articles on a search topic that may not be explicitly mentioned in a title or abstract
- Focus the search and be specific, to eliminate irrelevant records
- Increase search efficiency to save time
- Get reliable results
33
Searching MEDLINE Subject Headings
Disadvantages of MeSH
- Thesaurus terms may not cover all concepts, especially jargon
- Not every concept in an abstract or article can get thesaurus terms
34
MEDLINE Subject Headings -- MeSH
MeSH THESAURUS
- Terms organized in conceptual hierarchies
- 15 broad categories
- Term hierarchy levels within … broad to narrow
Browse the online MeSH thesaurus
- Find out information on search concept terms
- Select which terms best represent search concepts
35
MEDLINE Indexing: MeSH Hierarchy -- Top Level
A Anatomy
B Organisms
C Diseases
D Chemicals and Drugs
G Biological Sciences, etc.
-----
36
MEDLINE Indexing MeSH Term Hierarchy -- Subcategories
G Biological Sciences
G1 Biological Sciences
-----
G4 Biological Phenomena, Cell Physiology, Immunity
G5 Genetics
G6 Biochemical Phenomena, Metabolism, Nutrition, etc.
37
MEDLINE Indexing MeSH Term Hierarchy
Where is “Amino Acid Sequence” in the MeSH Hierarchy?
Genetics G5
Genetics, Biochemical G5.331
Molecular Sequence Data G
Amino Acid Sequence
Peptide Library
etc.
Sequence Homology, Amino Acid
Base Sequence
Carbohydrate Sequence
38
MEDLINE Searching Search terms are combined with Boolean “OR” and “AND” .
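Boolean combination maps directly onto set operations: OR is a union of the record sets retrieved for each term, and AND is their intersection. The sketch below uses a tiny invented index (the record numbers and the function names are assumptions, not PubMed data or a PubMed API).

```python
# Hypothetical inverted index: search term -> set of record IDs.
INDEX = {
    "codon": {1, 2, 5},
    "protein folding": {2, 3, 5},
    "amino acid sequence": {3, 4},
}


def search_or(index, *terms):
    """Records indexed under at least one of the terms (set union)."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


def search_and(index, *terms):
    """Records indexed under every one of the terms (set intersection)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


print(search_or(INDEX, "codon", "amino acid sequence"))
print(search_and(INDEX, "codon", "protein folding"))
```

OR broadens retrieval and AND narrows it, which is why synonyms are joined with OR and separate concepts with AND.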
39
MEDLINE Finding Subject Terms
Method 1: "Bootstrap"
- Search a particular concept (e.g., "protein sequence") in the TITLE index.
- Next, check SUBJECT terms on records for the closest MeSH match = "amino acid sequence".
Method 2: Use the MeSH Thesaurus
- Browse the online MeSH Thesaurus to explore possible terms to use.
- Select the best terms to cover your search concepts.
- Decide whether to "explode" or not.
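To "explode" a term is to search it together with all the narrower terms beneath it in the hierarchy. The sketch below shows that idea on a toy hierarchy loosely based on the terms named in these slides; the parent-child links are an assumption for illustration and have not been checked against the real MeSH tree.

```python
# Toy hierarchy (assumed): broader term -> list of narrower terms.
HIERARCHY = {
    "Genetics": ["Genetics, Biochemical"],
    "Genetics, Biochemical": ["Molecular Sequence Data"],
    "Molecular Sequence Data": [
        "Amino Acid Sequence",
        "Base Sequence",
        "Carbohydrate Sequence",
    ],
}


def explode(term, hierarchy):
    """Return the term followed by every narrower term below it."""
    result = [term]
    for child in hierarchy.get(term, []):
        result.extend(explode(child, hierarchy))
    return result


print(explode("Molecular Sequence Data", HIERARCHY))
```

Exploding a term that has no narrower terms returns just the term itself, so it is safe to apply when you are unsure.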
40
MEDLINE Search Strategy
If no MeSH term exists (or if you found too few records), consider free-text searching.
Free-text searching:
- Needs term synonyms
- Consider term truncation (e.g., use * in PubMed)
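Truncation means matching every word that starts with a given stem. The following is a minimal sketch of that behaviour over a list of invented titles; the function names are hypothetical, and a real search system handles truncation far more elaborately.

```python
def matches_truncated(term, word):
    """True if word matches term; a trailing * stands for any ending."""
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def free_text_search(term, titles):
    """Return the titles containing at least one word that matches term."""
    return [
        title
        for title in titles
        if any(matches_truncated(term, w.strip(".,;:()")) for w in title.split())
    ]


# Invented titles for illustration.
TITLES = [
    "Codon usage in proteins",
    "Mutation rates",
    "Mutagenesis of a protein kinase",
]
print(free_text_search("protein*", TITLES))
```

With the asterisk, "protein*" retrieves both "proteins" and "protein". Without it, only the exact word matches, which is why free-text searching otherwise needs every synonym and word form spelled out.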
41
Modifying Retrieval --- NOT ENOUGH Found
- Reduce the number of concepts to combine
- Add synonyms or related terms
- Use both free-text words and MeSH terms
- Truncate free-text words as appropriate
- Explode the subject term, if it has narrower terms
- Do NOT use limits (e.g., major point, review)
- Consult a professional searcher … a librarian
42
Modifying Retrieval --- TOO Many Found
- Use MeSH terms only … use no free-text words
- Use "MeSH Power" to focus your search
- Try a more specific MeSH term
- Limit MeSH terms to the MAJOR point of the article
- Use a subheading with your MeSH term
- Reduce the number of synonyms, if free-text searching
- Add additional concepts to your search
- Use limits … English language, reviews
- Restrict to human, animal, or organism
43
Internet Tools and Searches
44
Network Utilities
45
What is the Internet? A world wide collection of networks of computers
A network of computer networks A network based on the TCP/IP protocol
46
Standalone Computer: a typical setup at home (PC, printer, speakers)
47
LAN: a small Local Area Network of two computers and one printer in your office
48
Inter-Departmental Network
49
Campus Wide Network
50
Campus Network, Wide Area Network, National Network, Inter-Country Network, Global Network: the INTERNET
51
What can you do with Internet?
INTERNET APPLICATIONS
- Electronic Mail
- Internet Talk/Chat (IRC)
- File Transfer (FTP)
- Remote Login (Telnet)
- Internet News (Usenet)
- Info retrieval (Gopher, World Wide Web)
- Audio/Video Conferencing (CU-SeeMe, Mbone)
- Internet Phone
52
FTP: File Transfer Protocol
ftp ncbi.nlm.nih.gov
login: anonymous
passwd: (your e-mail address)
If you want to ftp from a server on which you have an account, use your own login and passwd.
53
Ftp commands continued…..
cd - change directory
ls - list the directory contents
pwd - print the working directory
bin - transfer in binary mode
asc - transfer in ASCII mode
hash - show transfer progress by printing hash marks
lcd - change the local directory
54
FTP commands continued..
prompt - toggle the per-file confirmation prompt during multiple file transfers
mget - get multiple files from the server (otherwise you can just use get)
mput - put multiple files onto the server
put - single file transfer to the server
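The same session can be scripted. The sketch below uses Python's standard `ftplib` to mirror the `cd`, `bin` and `mget` steps; the function names and the `open_local` parameter are assumptions made for illustration. The host is the one named on the earlier slide and may no longer accept FTP connections, so the second function is defined here but not called.

```python
from ftplib import FTP


def fetch_files(ftp, remote_dir, names, open_local=open):
    """Download each named file in binary mode, like 'bin' then 'mget'."""
    ftp.cwd(remote_dir)                      # same as: cd remote_dir
    for name in names:
        with open_local(name, "wb") as out:  # local file with the same name
            ftp.retrbinary("RETR " + name, out.write)
    return list(names)


def fetch_from_ncbi(remote_dir, names):
    """Anonymous session against the host shown on the slide."""
    with FTP("ncbi.nlm.nih.gov") as ftp:
        ftp.login()                          # no arguments: anonymous login
        return fetch_files(ftp, remote_dir, names)
```

Passing the connection in as an argument keeps the transfer logic separate from the login step, so the same function works with your own account on another server.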
55
Telnet Work on another machine by remote login.
telnet intron.bic.nus.edu.sg
login:
passwd:
- You must have an account on the machine to telnet to it
- You must have an Internet connection
- Space is allocated to you on the machine
56
HTML- an Introduction
57
What is Hypertext?
Non-linear text. Links embedded in the text. Jumps to other locations in the document/db.
Example: ...... the quick brown fox jumps over the fence ...... Fence
58
Creating a Web Page: Terms to Know
WWW/Web: World Wide Web
HTML: HyperText Markup Language
URL: Uniform Resource Locator
I assume that you:
- know how to use Netscape or some other Web browser
- have access to a Web server (or want to produce HTML documents for personal use in local-viewing mode)
59
Creating a Web Page: What an HTML Document Is
- Collection of styles
- HTML documents are plain-text files
- Can be created using any text editor
- You can also use word-processing software if you remember to save your document as "text only with line breaks"
- HTML is not case sensitive
- TAGS are used to mark the elements of the file for your browser
60
Creating a Web Page TAGS Explained
Every HTML document should contain certain standard HTML tags. Each document consists of head and body tags. The head contains the title, and the body contains the actual text that is made up of paragraphs, lists, and other elements.
61
<html>
<head>
<TITLE>A Simple HTML Example</TITLE>
</head>
<body>
<H1>HTML is Easy To Learn</H1>
<P>Welcome to the world of HTML. This is the first paragraph. While short it is still a paragraph!</P>
<P>And this is the second paragraph.</P>
</body>
</html>
The required elements are the <html>, <head>, <title>, and <body> tags (and their corresponding end tags). Note: Because you should include these tags in each file, you might want to create a template file with them.
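A quick way to check a file against the list of required elements is to let a parser collect the tags it sees. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical. The parser lowercases tag names, which also illustrates that <TITLE> and <title> are treated alike.

```python
from html.parser import HTMLParser

REQUIRED = ("html", "head", "title", "body")


class TagCollector(HTMLParser):
    """Record the name of every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)  # tag arrives already lowercased


def missing_required_tags(document):
    """Return the required tags that never open in the document."""
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return [tag for tag in REQUIRED if tag not in collector.seen]


SAMPLE = (
    "<html><head><TITLE>A Simple HTML Example</TITLE></head>"
    "<body><H1>HTML is Easy To Learn</H1><P>Welcome.</P></body></html>"
)
print(missing_required_tags(SAMPLE))
```

This only checks that each required element is opened somewhere; it does not verify nesting or end tags.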
62
TAGS Explained
HTML: This element tells your browser that the file contains HTML-coded information. The file extension .html also indicates that this is an HTML document and must be used.
HEAD: The head element identifies the first part of your HTML-coded document that contains the title.
63
TITLE: The title element contains your document title and identifies its content in a global context.
BODY: Contains the content of your document.
HEADINGS: HTML has six levels of headings, numbered 1 through 6, with 1 being the most prominent. Headings are displayed in larger and/or bolder fonts than normal body text. The syntax of the heading element is:
<Hy>Text of heading </Hy>
where y is a number between 1 and 6 specifying the level of the heading.
PARAGRAPHS: Carriage returns in HTML files aren't significant. Word wrapping can occur at any point in your source file, and multiple spaces are collapsed into a single space by your browser. The </P> closing tag can be omitted. This is because browsers understand that when they encounter a <P> tag, it implies that there is an end to the previous paragraph.
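Two of the rules above are easy to mimic in a few lines: a heading level must lie between 1 and 6, and runs of whitespace in the source count as a single space. The helper names below are hypothetical, and the whitespace function only approximates what a browser does with ordinary paragraph text.

```python
def heading(level, text):
    """Build <Hy>text</Hy> for a level y between 1 and 6."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    return f"<H{level}>{text}</H{level}>"


def collapse_whitespace(text):
    """Replace every run of spaces, tabs and newlines with one space."""
    return " ".join(text.split())


print(heading(1, "HTML is Easy To Learn"))
print(collapse_whitespace("Welcome  to\n the   world of HTML."))
```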
64
Using the <P> and </P> as a paragraph container means that you can center a paragraph by including the ALIGN=alignment attribute in your source file.
<P ALIGN=CENTER>
This is a centered paragraph. [See the formatted version below.]
</P>
65
Lists
HTML supports unnumbered, numbered, and definition lists. You can nest lists too, but use this feature sparingly because too many nested items can get difficult to follow.
Unnumbered Lists
To make an unnumbered, bulleted list,
1. start with an opening list <UL> (for unnumbered list) tag
2. enter the <LI> (list item) tag followed by the individual item; no closing </LI> tag is needed
3. end the entire list with a closing list </UL> tag
Below is a sample three-item list:
<UL>
<LI> apples
<LI> bananas
<LI> grapefruit
</UL>
The output is:
apples
bananas
grapefruit
66
Numbered Lists
A numbered list (also called an ordered list, from which the tag name derives) is identical to an unnumbered list, except it uses <OL> instead of <UL>. The items are tagged using the same <LI> tag. The following HTML code:
<OL>
<LI> oranges
<LI> peaches
<LI> grapes
</OL>
produces this formatted output:
1. oranges
2. peaches
3. grapes
67
A definition list (coded as <DL>) usually consists of alternating a definition term (coded as <DT>) and a definition description (coded as <DD>). Web browsers generally format the definition on a new line. The following is an example of a definition list:
<DL>
<DT> NCSA
<DD> NCSA, the National Center for Supercomputing Applications, is located on the campus of the University of Illinois at Urbana-Champaign.
<DT> Cornell Theory Center
<DD> CTC is located on the campus of Cornell University in Ithaca, New York.
</DL>
The output looks like:
NCSA
    NCSA, the National Center for Supercomputing Applications, is located on the campus of the University of Illinois at Urbana-Champaign.
Cornell Theory Center
    CTC is located on the campus of Cornell University in Ithaca, New York.
68
Nested Lists
Lists can be nested. You can also have a number of paragraphs, each containing a nested list, in a single list item. Here is a sample nested list:
<UL>
<LI> A few New England states:
    <UL>
    <LI> Vermont
    <LI> New Hampshire
    <LI> Maine
    </UL>
<LI> Two Midwestern states:
    <UL>
    <LI> Michigan
    <LI> Indiana
    </UL>
</UL>
The nested list is displayed as:
A few New England states:
    Vermont
    New Hampshire
    Maine
Two Midwestern states:
    Michigan
    Indiana
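Because every nested list needs its own opening and closing tag, generating the markup from a data structure is one way to keep the tags balanced. The following is a small sketch under assumed conventions: a plain string is one item, and a (label, children) tuple is an item that carries a nested list. The function name is hypothetical.

```python
from html import escape


def render_list(items, tag="UL", indent=0):
    """Render items as <LI> lines inside <tag>; tuples nest a further list."""
    pad = "  " * indent
    lines = [f"{pad}<{tag}>"]
    for item in items:
        if isinstance(item, tuple):
            label, children = item
            lines.append(f"{pad}  <LI> {escape(label)}")
            lines.append(render_list(children, tag, indent + 2))
        else:
            lines.append(f"{pad}  <LI> {escape(item)}")
    lines.append(f"{pad}</{tag}>")
    return "\n".join(lines)


states = [
    ("A few New England states:", ["Vermont", "New Hampshire", "Maine"]),
    ("Two Midwestern states:", ["Michigan", "Indiana"]),
]
print(render_list(states))
```

Passing `tag="OL"` produces a numbered list from the same data.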
69
Forced Line Breaks/Postal Addresses
The <BR> tag forces a line break with no extra (white) space between lines. Using <P> elements for short lines of text such as postal addresses results in unwanted additional white space. For example, with <BR>:
National Center for Supercomputing Applications<BR>
605 East Springfield Avenue<BR>
Champaign, Illinois <BR>
The output is:
National Center for Supercomputing Applications
605 East Springfield Avenue
Champaign, Illinois
Horizontal Rules
The <HR> tag produces a horizontal line the width of the browser window. A horizontal rule is useful to separate sections of your document. For example, many people add a rule at the end of their text and before the <address> information. You can vary a rule's size (thickness) and width (the percentage of the window covered by the rule). Experiment with the settings until you are satisfied with the presentation. For example:
<HR SIZE=4 WIDTH="50%">
displays as:
70
Physical Styles
<B> bold text
<I> italic text
<TT> typewriter text, i.e. fixed-width font
71
Linking
Power: link text and/or an image. The browser highlights the identified text or image with color and/or underlines to indicate that it is a hypertext link. HTML's single hypertext-related tag is <A>, which stands for anchor. To include an anchor in your document:
1. start the anchor with <A (include a space after the A)
2. specify the document you're linking to by entering the parameter HREF="filename" followed by a closing right angle bracket (>)
3. enter the text that will serve as the hypertext link in the current document
4. enter the ending anchor tag: </A> (no space is needed before the end anchor tag)
Here is a sample hypertext reference in a file called US.html:
<A HREF=" HomePage</A>
This entry makes the words BIC HomePage the hyperlink to the document
72
You can make it easy for a reader to send electronic mail to a specific person or mail alias by including the mailto attribute in a hyperlink. The format is: <A For example, enter: <A Meena KS</a> to create a hyperlink that opens a mail window already addressed to Meena KS. (You, of course, will enter another mail address!)
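The anchor steps can be sketched as a small helper that assembles the tag and escapes its parts. A mail link is then just an anchor whose HREF uses the `mailto:` scheme. The function names are hypothetical and the address is a placeholder on a reserved example domain, not anyone's real address.

```python
from html import escape


def anchor(href, text):
    """Build <A HREF="...">text</A>, escaping both parts for safe HTML."""
    return f'<A HREF="{escape(href, quote=True)}">{escape(text)}</A>'


def mailto_anchor(address, text):
    """Anchor that asks the browser to open a mail window for address."""
    return anchor("mailto:" + address, text)


# Placeholder address, assumed for illustration only.
print(mailto_anchor("someone@example.org", "Send mail"))
```

Escaping matters once a link or its text contains characters such as & or quotation marks, which would otherwise break the tag.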
73
To include an inline image, enter:
<IMG SRC=ImageName>
where ImageName is the URL of the image file. The syntax for <IMG SRC> URLs is identical to that used in an anchor HREF. If the image file is a GIF file, then the filename part of ImageName must end with .gif. Filenames of X Bitmap images must end with .xbm; JPEG image files must end with .jpg or .jpeg; and Portable Network Graphic files must end with .png.
Image Size Attributes
<IMG SRC=SelfPortrait.gif HEIGHT=100 WIDTH=65>
74
Demo:
75
Thank You