Bioinformatics Computing. Department of Bioinformatics, Government Post Graduate College, Mandain, Abbottabad. Sajid Khan
BI-302 Bioinformatics Computing-I (3+1)
Prerequisite: Programming Fundamentals
Specific objectives of the course: This course aims to introduce the concepts of data representation, searching, security, and ownership, and to develop techniques for pattern matching and recognition and their applications in bioinformatics.
Course Outline: Databases: data management, networks, geographical scope, communications models, transmission technology, protocols, bandwidth, topology, hardware, contents, security, ownership, implementation, search engines, search process, search engine technology, searching and information theory, computational methods, knowledge management, data, sequence and structure visualization, data mining methods and technology, pattern recognition and discovery, pattern matching, dot matrix analysis, substitution matrices, dynamic programming, scripting.
Lab Outline: Simulation of various bioinformatics entities; application of various bioinformatics methods; the scripting languages Python, Perl, and PHP and their applications in bioinformatics.
Recommended Books (latest editions of the following):
1. "Bioinformatics Computing", Bryan Bergeron, Pearson Education.
2. "Methods in Biotechnology and Bioengineering", Vyas S.P. and Kohli D.V.
Introduction: Bioinformatics; Bioinformatics Computing; The Killer Application; Issues of Personalized Drugs; Parallel Universes; Turing Machine/Model; Information Theory; Central Dogma; Process of RNA Synthesis; RNA-Protein Codon Transcription Wheel; From Data to Knowledge.
Bioinformatics is a science that involves collecting, manipulating, analyzing, and transmitting huge quantities of data, and uses computers whenever appropriate. Bioinformatics Computing combines practical insight for assessing bioinformatics technologies, practical guidance for using them effectively, and intelligent context for understanding their rapidly evolving roles.
The Killer Application The most commonly cited "killer app" of biotech is personalized medicine: the just-in-time delivery of medications (popularly called "designer drugs") tailored to the patient's condition.
High-throughput screening: The use of affordable, computer-enabled microarray technology to determine the patient's genetic profile. The issue here is affordability, in that microarrays cost tens of thousands of dollars.
Medically relevant information gathering: Databases on gene expression, medical relevance of signs and symptoms, optimum therapy for given diseases, and references for the patient and clinician must be readily available.
Custom drug synthesis: The just-in-time synthesis of patient-specific drugs, based on the patient's medical condition and genetic profile, presents major technical as well as political, social, and legal hurdles.
Parallel Universes One of the major challenges faced by bioinformaticists is keeping up with the latest techniques and discoveries in both molecular biology and computing. Two key events in the late 1920s were Alexander Fleming's discovery of penicillin and Vannevar Bush's Product Integraph, a mechanical analog computer that could solve simple equations. In the 1930s, Alan Turing, the British mathematician, devised his Turing model, upon which all modern discrete computing is based.
Turing Machine/Model The Turing model defines the fundamental properties of a computing system: a finite program, a large database, and a deterministic, step-by-step mode of computation. The Turing Machine, which can simulate any computing system, consists of three basic elements: a control unit, a tape, and a read-write head. The read-write head moves along the tape and transmits information to and from the control unit.
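The three elements named above (control unit, tape, read-write head) can be sketched in a few lines of Python. This is a minimal illustration, not a formal definition: the rule table `INVERT_RULES`, the state names `scan` and `halt`, and the blank symbol `_` are all assumptions made up for this example.

```python
def run_turing_machine(rules, tape, state="scan", blank="_", max_steps=1000):
    """Run a one-tape Turing machine until it reaches the 'halt' state.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "R" (right) or "L" (left). This dict plays the role
    of the control unit's finite program.
    """
    cells = dict(enumerate(tape))  # the tape, indexed by head position
    head = 0                       # position of the read-write head
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)              # read
        write, move, state = rules[(state, symbol)]  # consult the control unit
        cells[head] = write                          # write
        head += 1 if move == "R" else -1             # move the head
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# Hypothetical program: flip every bit, then halt at the first blank cell.
INVERT_RULES = {
    ("scan", "0"): ("1", "R", "scan"),
    ("scan", "1"): ("0", "R", "scan"),
    ("scan", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(INVERT_RULES, "10110"))
```

Each step is deterministic: the current state and the symbol under the head fully determine what is written, where the head moves, and the next state.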
Information Theory Shannon's model of a communications system includes five components: an information source, a transmitter, the medium, a receiver, and a destination. The amount of information that can be transferred from information source to destination is a function of the strength of the signal relative to that of the noise generated by the noise source.
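One standard way to quantify the dependence on signal strength relative to noise is the Shannon-Hartley formula, C = B log2(1 + S/N). The slide itself does not mention bandwidth or this formula, so the sketch below is an assumption about which formalization is intended; the function and parameter names are illustrative.

```python
import math


def channel_capacity(bandwidth_hz, signal_power, noise_power):
    """Shannon-Hartley capacity in bits per second.

    bandwidth_hz: channel bandwidth B in hertz
    signal_power, noise_power: average powers S and N in the same units
    """
    if bandwidth_hz <= 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and noise must be positive, signal non-negative")
    return bandwidth_hz * math.log2(1 + signal_power / noise_power)


# A stronger signal relative to the noise raises the capacity.
for snr in (1, 10, 100, 1000):
    print(snr, round(channel_capacity(3000.0, snr, 1.0)))
```

Capacity grows only logarithmically with the signal-to-noise ratio, so doubling the signal power adds far less than doubling the bandwidth.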
Central Dogma The Central Dogma of Molecular Biology. DNA is transcribed to messenger RNA in the cell nucleus, which is in turn translated to protein in the cytoplasm. The Central Dogma, shown here from a structural perspective, can also be depicted from an information flow perspective.
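From the information-flow perspective, the two steps can be sketched as string operations. The example below is a simplification under stated assumptions: it takes the coding (sense) strand as input, so transcription reduces to replacing T with U, and it uses the standard genetic code with translation stopping at the first stop codon. The function names and the sample sequence are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein (one-letter codes), stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```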
Process of RNA Synthesis Messenger RNA (mRNA) Synthesis. DNA is transcribed to nuclear RNA (nRNA), which is in turn processed to mature mRNA in the nucleus. Maturation involves discarding the junk nucleotide sequences (introns) that interrupt the sequences that will eventually be involved in translation (exons).
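The maturation step can be sketched as keeping the exon intervals and discarding everything between them. The sequence and the exon coordinates below are made up for illustration; real splice sites are determined by the cell's splicing machinery, not supplied as a list.

```python
def splice(nuclear_rna, exons):
    """Join the exon segments of a nuclear RNA, discarding the introns.

    exons is a list of (start, end) positions, 0-based with end exclusive,
    given in the order they appear in the transcript.
    """
    return "".join(nuclear_rna[start:end] for start, end in exons)


# Hypothetical transcript: exon 1, one intron, exon 2.
nuclear_rna = "AUGGCC" + "GUAAGUAG" + "UUUUAA"
exon_positions = [(0, 6), (14, 20)]

print(splice(nuclear_rna, exon_positions))
```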
RNA-Protein Codon Transcription Wheel. The 64 possible codons represent the 20 common amino acids, as well as one start (ATG, which also codes for methionine) and three stop (TAG, TAA, and TGA) markers. Redundancies normally occur in the last nucleotide of the three-letter codon.
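The counts on this slide can be checked directly against the standard genetic code. The sketch below builds the 64-codon table and then counts how many two-letter prefixes determine the amino acid regardless of the third nucleotide, which is one way to see where the redundancy sits. Variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

# Prefixes (first two nucleotides) whose four codons all give the same result.
fully_redundant_prefixes = [
    a + b
    for a, b in product(BASES, repeat=2)
    if len({CODON_TABLE[a + b + third] for third in BASES}) == 1
]

print(len(CODON_TABLE), "codons,", len(amino_acids), "amino acids")
print("stop codons:", stop_codons)
print("prefixes where the third nucleotide does not matter:", fully_redundant_prefixes)
```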
From Data to Knowledge In viewing the Central Dogma as an information flow process, it's useful to distinguish between data, information, and metadata. For our purposes, the following definitions and concepts apply:
Data are numbers or other identifiers derived from observation, experiment, or calculation.
Information is data in context: a collection of data and associated explanations, interpretations, and other material concerning a particular object, event, or process.
Metadata is data about the context in which information is used, such as descriptive summaries and high-level categorization of data and information.
From Data to Knowledge
Data (from direct observation): Patient Age: 5. Physical Exam Findings: Freckling in the armpits and masses on and just below the surface of the skin.
Information (from a molecular disease database): Neurofibromatosis is a genetic disorder causing tumors to form on nerve tissue anywhere in the body. The pattern of inheritance is autosomal dominant.
Metadata (from an online publications database): The incidence of neurofibromatosis is about one out of every 2,500 people worldwide. It is associated with difficulty seeing, hearing, and in some cases (NF2), with paralysis and early death. In contrast, type 1 (NF1) is more of a cosmetic disorder.
From Data to Knowledge
Visualization of the NF2 Gene on Chromosome 22: As it would appear through the National Center for Biotechnology Information's Web-based Map Viewer program, the NF2 gene appears at position 22q12 on chromosome 22.
Information (from an online molecular disease database): The NF2 gene has been mapped to chromosome 22 and is thought to be a so-called "tumor suppressor gene."
Metadata (from an online publication database): The pattern of inheritance is autosomal dominant, caused by a spontaneous mutation in the egg or sperm before fertilization.
Convergence In automated gene sequencing, purified genomic or complementary DNA is first fragmented by restriction enzymes, and these fragments are separated by size on a gel. This is followed by isolating single fragments and using the Sanger chain-termination method to sequence each fragment individually with chain-terminating ddNTPs (dideoxynucleoside triphosphates) labeled with fluorochromes according to the base present. For example, a green fluorochrome is typically used for Adenine (A), red for Thymine (T), blue for Cytosine (C), and yellow for Guanine (G).
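The final base-calling step, turning a series of detected dye colors into a nucleotide sequence, amounts to a lookup. The sketch below uses the color scheme given above, with red standing for the nucleotide T; the input list of colors is hypothetical, and real instruments work from noisy fluorescence traces rather than clean color labels.

```python
# Dye color -> base, following the color scheme described in the text.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def call_bases(colors):
    """Convert an ordered list of detected dye colors into a DNA sequence."""
    return "".join(DYE_TO_BASE[color] for color in colors)


# Hypothetical read, ordered from the shortest fragment to the longest.
detected = ["green", "red", "yellow", "blue", "blue"]
print(call_bases(detected))
```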
Boolean Operators The AND and OR logic operators form the basis of digital computer operations. Digital Computer Physical Architecture (left) versus Abstraction (right). Process control applications emphasize input/output hardware (left), which corresponds to the peripherals abstraction (right).
Archiving: Just as the transfer of data from DNA to RNA to protein relies on an information infrastructure, data archives rely on an information technology (IT) infrastructure. This IT infrastructure includes network and database technologies as well as standard vocabularies to store and access information.
Numerical Processing: Computers are recognized foremost for their computational or numerical-processing capabilities. In bioinformatics, applications for numerical-processing techniques range from sequence analysis, microarray data analysis, and site prediction to gene finding, protein structure prediction, and phylogenetic analysis.
Rules-based systems: IF First Base = "T" AND Second Base = "A" AND (Third Base = "A" OR Third Base = "G") THEN Codon = "Stop"
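The stop-codon rule above translates almost word for word into Python's `and` and `or` operators. Note that the rule as written recognizes only TAA and TAG. The second function, which also covers TGA, is an extension added here for illustration and is not part of the slide; both function names are made up.

```python
def is_stop_by_rule(codon):
    """The rule as stated: T, then A, then A or G."""
    first, second, third = codon.upper()
    return first == "T" and second == "A" and (third == "A" or third == "G")


def is_stop_extended(codon):
    """Illustrative extension that adds the third stop codon, TGA."""
    first, second, third = codon.upper()
    return is_stop_by_rule(codon) or (first == "T" and second == "G" and third == "A")


for codon in ("TAA", "TAG", "TGA", "ATG"):
    print(codon, is_stop_by_rule(codon), is_stop_extended(codon))
```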
Artificial Neural Network This machine-learning technology relies on tightly interconnected input, hidden, and output layers to map input patterns to output patterns. One of many possible truth tables (right) illustrates the mapping of input to output patterns. Learning is signified by the thickness of lines joining nodes, and node values are indicated by color (white = 0 and black = 1). Hidden nodes can take on values between 0 and 1.
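A forward pass through such a network can be sketched with two inputs, two hidden nodes, and one output. This is a minimal sketch under stated assumptions: the weights below are chosen by hand so that the network reproduces the exclusive-OR truth table, whereas a real network would learn its weights from training data. The sigmoid activation and all names are choices made for this example.

```python
import math


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Hand-picked (weight_1, weight_2, bias) per node; not learned.
HIDDEN_WEIGHTS = [(20.0, 20.0, -10.0), (-20.0, -20.0, 30.0)]
OUTPUT_WEIGHTS = (20.0, 20.0, -30.0)


def forward(x1, x2):
    """Return (hidden node values, output value) for one input pattern."""
    hidden = [sigmoid(w1 * x1 + w2 * x2 + b) for w1, w2, b in HIDDEN_WEIGHTS]
    w1, w2, b = OUTPUT_WEIGHTS
    return hidden, sigmoid(w1 * hidden[0] + w2 * hidden[1] + b)


# Map every input pattern to its rounded output pattern.
truth_table = {(a, b): round(forward(a, b)[1]) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

The hidden values returned by `forward` lie strictly between 0 and 1, while the rounded outputs form the truth table.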
Communications As a communications system, the computer is like an asynchronous communications medium, typified by server-based e-mail. From a data-management perspective, computers are increasingly used as asynchronous communications devices. The difference between e-mail and a typical biological database is that contributions to biological databases such as GenBank are meant for others, but the identity of the recipients is unknown and largely unknowable by the sender. The communication of sequencing and protein structure information is hindered by the lack of a standard format for creating and storing gene data, even within companies. Several contenders for the standard include Gene Expression Markup Language (GEML), based on the eXtensible Markup Language (XML), and Microarray Markup Language (MAML). The latter is based on collaboration between the National Center for Biotechnology Information, Stanford University, and the European Bioinformatics Institute.
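To illustrate why an XML-based format helps with exchange, the sketch below writes a few expression measurements to XML and reads them back with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`expression_data`, `measurement`, `gene`) and the values are invented for this example; they are not the actual GEML or MAML schema.

```python
import xml.etree.ElementTree as ET


def expression_to_xml(records):
    """Serialize (gene, level) pairs to an XML string (hypothetical schema)."""
    root = ET.Element("expression_data")
    for gene, level in records:
        measurement = ET.SubElement(root, "measurement", gene=gene)
        measurement.text = str(level)
    return ET.tostring(root, encoding="unicode")


def xml_to_expression(text):
    """Parse the XML string back into (gene, level) pairs."""
    root = ET.fromstring(text)
    return [(m.get("gene"), float(m.text)) for m in root.findall("measurement")]


records = [("NF2", 1.5), ("NF1", 0.25)]  # made-up values
document = expression_to_xml(records)
print(document)
print(xml_to_expression(document))
```

Because both sender and receiver agree on the element names, the receiver can parse the file without knowing who produced it, which matches the situation described for public databases.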