Introduction to Bioinformatics
Lecturer: Dr. Yael Mandel-Gutfreund
Teaching Assistants: Shula Shazman, Sivan Bercovici
Course web site: http://webcourse.cs.technion.ac.il/236523
Course Structure and Requirements
1. Class structure:
   1. 2-hour lecture
   2. 1-hour tutorial
2. Homework:
   - Homework projects will be given every third week.
   - Homework is done in pairs.
   - All 4 homework projects must be submitted (4/4).
3. A final project will be carried out and submitted in pairs.
Grading
- Homework assignments: 30%
- Final project: 70%
Bioinformatics
- An approach to mining knowledge from biological data.
- A set of methods that ease biological research in the lab.

Human  catcgtagCTAGACTacgc
Mouse  ctagctgaCTAGACTatcg
Dog    tacctatcCTAGACTcgac
Horse  acctactcCTAGACTcgaa

The uppercase segment CTAGACT is shared by all four species.
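As a small illustration of "mining knowledge" from the four sequences above, the sketch below finds the longest segment present in every sequence, ignoring case. It is a brute-force approach chosen for clarity (the function name `longest_shared_segment` is hypothetical, not from the course); real comparative genomics relies on alignment tools rather than exact substring matching.

```python
def longest_shared_segment(sequences):
    """Return the longest substring found in every sequence (case-insensitive).

    Brute force: try substrings of the shortest sequence from longest to
    shortest and return the first one contained in all sequences. Fine for
    toy examples like this one, far too slow for real genomes.
    """
    seqs = [s.upper() for s in sequences]
    shortest = min(seqs, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


sequences = {
    "Human": "catcgtagCTAGACTacgc",
    "Mouse": "ctagctgaCTAGACTatcg",
    "Dog": "tacctatcCTAGACTcgac",
    "Horse": "acctactcCTAGACTcgaa",
}

shared = longest_shared_segment(sequences.values())
print(shared)  # prints CTAGACT
for name, seq in sequences.items():
    # 0-based position of the shared segment in each species' sequence
    print(name, seq.upper().find(shared))
```

Without being told where to look, the search recovers exactly the uppercase segment highlighted on the slide, at position 8 in every sequence.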
Biological Databases (Tutorial 1)
http://www.ncbi.nlm.nih.gov/
http://www.genome.ucsc.edu/
- DNA, RNA & protein sequences
- RNA & protein structure
- Gene expression
- Protein localization
- Mutations
- Similarity between species
- Species-specific databases
- Literature
- Experimental support
Biological Sequences: RefSeq
A comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcripts (RNA), and protein products:
- Genomic sequences.
- Known mRNAs.
- Predicted mRNAs:
  - Putative genes (homologous to a known gene).
  - Orphan genes (look like ORFs, i.e. open reading frames, but have no homologues).
- Known proteins.
- Predicted proteins (putative & orphan).
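To make "looks like an ORF" concrete, here is a minimal sketch of a naive ORF scanner, assuming the simplest definition: an ATG start codon followed by in-frame codons up to the first stop codon (TAA, TAG or TGA). It scans only the forward strand, and the 30-codon default minimum is an arbitrary illustrative threshold, not a RefSeq rule. Deciding whether such an ORF is putative or orphan would additionally require a homology search, which is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=30):
    """Return sorted (start, end) pairs of ORFs on the forward strand.

    An ORF here is ATG ... first in-frame stop codon; `end` is exclusive
    and includes the stop codon. `min_codons` counts the stop codon too.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                j = i + 3
                # walk codon by codon until an in-frame stop codon
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(dna):
                    break  # no in-frame stop before the end of the sequence
                end = j + 3
                if (end - i) // 3 >= min_codons:
                    orfs.append((i, end))
                i = end  # continue scanning after this ORF
            else:
                i += 3
    return sorted(orfs)


toy = "CCATGAAATTTTAGCC"  # made-up sequence for illustration
print(find_orfs(toy, min_codons=2))  # prints [(2, 14)]
```

In the toy sequence the only ORF is `toy[2:14]`, i.e. ATG AAA TTT TAG.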
RefSeq
Comprehensive: it covers a wide variety of sequences:
- Complete genomic molecules.
- Incomplete genomic regions.
- Transcript products.
- Protein products.
- Non-coding transcripts.
- Predicted transcript products.
- Predicted protein products.
How can each kind of sequence be identified? By its accession number.
RefSeq Accession Number: a unique identifier given to a sequence.

Description | Kind of sequence | Example
Complete genomic molecules (genomes, chromosomes, organelles, plasmids) | DNA | NC_123456
Alternative genomic assembly | DNA | AC_123456
Incomplete genomic assembly | DNA | NT_123456
Incomplete genomic regions | DNA | NG_123456
Transcript products; mature protein-coding mRNA transcripts | RNA | NM_123456
Protein products; full-length products & partial proteins | Protein | NP_123456
Non-coding transcripts, including tRNAs, rRNAs and others | RNA | NR_123456
Predicted transcript products; model mRNAs corresponding to the genomic contigs | RNA | XM_123456
Predicted protein products; model proteins corresponding to the genomic contigs | Protein | XP_123456

Complete table: http://www.ncbi.nlm.nih.gov/RefSeq/key.html
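Because the kind of sequence is encoded in the two-letter prefix, it can be read off mechanically. The sketch below encodes only the prefixes listed in the table above (the complete NCBI table has more), and `classify_accession` is a hypothetical helper name. A version suffix such as `.2` is ignored because only the part before the underscore is inspected.

```python
# Prefixes from the table above only; NCBI's complete table lists more.
REFSEQ_PREFIXES = {
    "NC": ("DNA", "Complete genomic molecule"),
    "AC": ("DNA", "Alternative genomic assembly"),
    "NT": ("DNA", "Incomplete genomic assembly"),
    "NG": ("DNA", "Incomplete genomic region"),
    "NM": ("RNA", "Transcript product (mature protein-coding mRNA)"),
    "NR": ("RNA", "Non-coding transcript"),
    "XM": ("RNA", "Predicted transcript product (model mRNA)"),
    "NP": ("Protein", "Protein product"),
    "XP": ("Protein", "Predicted protein product (model protein)"),
}


def classify_accession(accession):
    """Return (kind of sequence, description) for a RefSeq accession."""
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a prefix covered by this table: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


for acc in ["NC_123456", "NM_123456", "XP_123456.1"]:
    print(acc, classify_accession(acc))
```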
ENTREZ
Integrated: RefSeq is linked to other databases through Entrez, an NCBI interface that connects different databases:
- RefSeq
- PubMed (literature)
- GEO (gene expression)
- PDB (protein structure)
- UniProt (protein sequences)
- GenBank (genomic data)
- OMIM (genetic disorders)
Entrez: http://www.ncbi.nlm.nih.gov/Entrez/
[Figure: Entrez as an integrated database, linking literature, sequences, disease, gene expression, protein structure, similarity between species, and experimental support]
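The slides describe Entrez as the hub that connects these databases. One way to use it from a program is NCBI's E-utilities web API, which the slides do not mention; the sketch below only builds request URLs for its `esearch` (search one database) and `elink` (find linked records in another database) endpoints and does not send them. The database names `pubmed`, `nuccore`, `protein` and `omim` are E-utilities identifiers; the search term and the record ID are illustrative values only.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, a programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """URL that searches one Entrez database for `term` (JSON output)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom, db, uid):
    """URL asking which records in `db` are linked to record `uid` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


# The same query can be sent to several of the connected databases.
for db in ["pubmed", "nuccore", "protein", "omim"]:
    print(esearch_url(db, "BRCA1"))

# Illustrative record ID; substitute the UID of the Gene record you need.
print(elink_url("gene", "pubmed", "672"))
```

`elink` is the piece that mirrors the "integrated" idea on the slide: starting from a record in one database, it returns the related records in another.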
RefSeq is non-redundant: each sequence is represented only once. But what counts as redundancy in a biological database?
- Are two alleles of the same locus redundant?
- Is the same locus in two closely related organisms redundant?
- Are two gene copies redundant?
It depends on the kind of database. In RefSeq:
- Two alleles of the same locus are considered redundant.
- The same locus in two closely related organisms is not considered redundant.
- Two gene copies are not considered redundant.
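The RefSeq rules above amount to choosing what identifies a record. A minimal sketch, assuming each entry is keyed by (organism, locus): alleles share that key and collapse into one entry, while another organism or another gene copy (its own locus) gets a distinct key. The `Entry` class, locus labels and sequences are hypothetical toy data, and keeping the first allele seen is a simplification; RefSeq selects its representative sequence by curation, not by input order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    organism: str
    locus: str      # hypothetical locus label; each gene copy has its own locus
    allele: str
    sequence: str


def collapse_redundant(entries):
    """Keep one entry per (organism, locus), following the rules listed above."""
    kept = {}
    for entry in entries:
        # alleles share (organism, locus), so only the first one seen is kept
        kept.setdefault((entry.organism, entry.locus), entry)
    return list(kept.values())


entries = [
    Entry("Human", "LOCUS_A", "a1", "ACGTACGT"),
    Entry("Human", "LOCUS_A", "a2", "ACGTACCT"),        # second allele: redundant
    Entry("Chimpanzee", "LOCUS_A", "a1", "ACGTACGA"),   # other organism: kept
    Entry("Human", "LOCUS_A_copy2", "a1", "ACGTACGT"),  # gene copy: kept
]
for e in collapse_redundant(entries):
    print(e.organism, e.locus, e.allele)
```

Note that the gene copy is kept even though its toy sequence is identical to the first entry: redundancy here is decided by biological identity, not by sequence identity.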
Genome Browser
A bioinformatics navigator that gathers information from various sources and lets you view a large amount of it at the same time.
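At its core, showing "a large amount of information at the same time" means collecting, from every data source (track), the features that overlap the genomic window currently on screen. The sketch below models that step under stated assumptions: tracks are plain lists of (chromosome, start, end, label) tuples with 0-based, half-open coordinates (the BED convention), and the track names, features and coordinates are made-up toy data. A real browser such as UCSC uses indexed files rather than a linear scan.

```python
def features_in_window(tracks, chrom, start, end):
    """Collect, per track, the labels of features overlapping chrom:[start, end).

    With half-open intervals, two intervals overlap exactly when each one
    starts before the other one ends.
    """
    view = {}
    for track_name, features in tracks.items():
        view[track_name] = [
            label
            for f_chrom, f_start, f_end, label in features
            if f_chrom == chrom and f_start < end and start < f_end
        ]
    return view


# Hypothetical tracks from two different "sources".
tracks = {
    "Genes": [
        ("chr1", 100, 500, "geneA"),
        ("chr1", 900, 1200, "geneB"),
        ("chr2", 100, 300, "geneC"),
    ],
    "Conservation": [
        ("chr1", 450, 470, "conserved_1"),
        ("chr1", 2000, 2100, "conserved_2"),
    ],
}

print(features_in_window(tracks, "chr1", 400, 1000))
# prints {'Genes': ['geneA', 'geneB'], 'Conservation': ['conserved_1']}
```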