Week-6: Genomics Browsers


Tools and Algorithms in Bioinformatics
GCBA815/MCGB815/BMI815, Fall 2017
Week-6: Genomics Browsers (UCSC, IGV and ExAC)
Babu Guda, Ph.D.
Professor, Genetics, Cell Biology & Anatomy
Director, Bioinformatics and Systems Biology Core
University of Nebraska Medical Center

Terminology
Genome: typically the nuclear genome in eukaryotes, or the only genome in prokaryotes
Extra-nuclear genome: mitochondrial and chloroplast genomes
Metagenome: a mixture of genomes belonging to multiple species that are not fully characterized
Epigenome: the characteristics of the genome that affect gene expression, such as chromatin packing and methylation
Pangenome: the union of the gene sets of all the strains of a species, typically applied to prokaryotes (for example, the pangenome of E. coli)
Human microbiome (microbe metagenome): the set of all microbial genomes that inhabit the human body

Genome sizes of species in the evolutionary spectrum

Human Karyotype

Statistics on Human Genome
Haploid nuclear genome size: about 3.0 x 10^9 bp (female: 3,227 Mbp; male: 3,122 Mbp)
Chromosomes: 1-22, X, Y, all linear
Highly conserved regions:
  Coding DNA covers about 30 Mbp (1%)
  Other regulatory regions cover about 100 Mbp (3%)
Repetitive DNA covers more than 50%
Segmental duplication: more than 5%
Endogenous retroviral genomes (ERVs): 5-8% (inherited)
Other associated genomes:
  Mitochondrial genome: about 16.5 kbp, circular
  Viral genomes (transfected exogenous retroviruses)
  Microbiome (an estimated ~3,000 microbes inhabit the human body)

Statistics on Human Exome
Some exome capture kits include the flanking untranslated regions (5' UTR and 3' UTR) as well as the protein-coding regions
Exome studies usually include all the protein-coding regions, covering about 30 Mbp of DNA (~1%)
The human genome has approximately 180,000 exons
An estimated 85% of disease-causing mutations lie in exons; hence, clinical sequencing relies heavily on exome sequencing
On average there are 9 exons per gene, but the number varies by gene, ranging from 1 to 363; the titin gene (TTN) has 363 exons
Average exon length is about 122 bp
Exons with 3' UTRs are considerably longer
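
As a rough consistency check, the exome figures above can be multiplied out. This is only a back-of-the-envelope sketch using the rounded numbers from the slide; the constant and function names are illustrative and do not come from the lecture.

```python
# Rounded figures quoted on the slide (not exact values).
GENOME_SIZE_BP = 3.0e9       # haploid nuclear genome
EXON_COUNT = 180_000         # approximate number of exons
MEAN_EXON_LENGTH_BP = 122    # average exon length
EXOME_SIZE_BP = 30e6         # protein-coding DNA, about 30 Mbp


def fraction_of_genome(size_bp, genome_bp=GENOME_SIZE_BP):
    """Return size_bp as a fraction of the haploid genome."""
    return size_bp / genome_bp


# Total exon length implied by exon count x mean exon length.
exon_total_bp = EXON_COUNT * MEAN_EXON_LENGTH_BP

print(f"Exons x mean length: {exon_total_bp / 1e6:.1f} Mbp")
print(f"Exome share of genome: {fraction_of_genome(EXOME_SIZE_BP):.1%}")
```

The product comes to about 22 Mbp, the same order of magnitude as the ~30 Mbp quoted. Both inputs are rounded averages, so exact agreement should not be expected.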

Statistics on Human Genes/Proteins
About 25,000 genes code for about 100,000 proteins in humans
  Not all are expressed at the same time or in the same location
Mitochondrial genes: 37 (coding for 22 tRNAs, 13 proteins and 2 rRNAs)
Retroviral proteins
About 3.5 million genes are encoded by the roughly 3,000 microbes of the microbiome
  Oral microbiome, gut microbiome, etc.
Coding genes: 20,338 (source: Ensembl)
Pseudogenes: 14,638
Gene transcripts: 200,000
For up-to-date statistics on the human genome, see https://www.ensembl.org/Homo_sapiens/Info/Annotation

Genome browsers: UCSC, IGV, and ExAC Demonstrator: Adam Cornish Department of Genetics, Cell Biology and Anatomy University of Nebraska Medical Center

UCSC Genome Browser
An online tool developed at the University of California, Santa Cruz, that provides context and annotations for dozens of genomes.
Examples of annotations include sequence, genes, phenotypes, mRNA, expression, regulation, sequence conservation, and variation.

UCSC Genome Browser

UCSC Genome Browser
Select your species on the left.
Select the appropriate assembly above.
Enter the gene you want to investigate.
We're using a histone H3 gene: HIST1H3A
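
The same lookup can also be expressed as a direct link instead of clicking through the gateway page. The sketch below builds such a link with the browser's `db` and `position` URL parameters. The `hg38` default is an assumption, since the slides do not say which assembly was used, and `ucsc_browser_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

UCSC_TRACKS_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_browser_url(position, assembly="hg38"):
    """Build a Genome Browser link for a gene symbol or a chrN:start-end range.

    The default assembly is an assumption; the slides do not name one.
    """
    query = urlencode({"db": assembly, "position": position})
    return f"{UCSC_TRACKS_URL}?{query}"


print(ucsc_browser_url("HIST1H3A"))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=HIST1H3A
```

Opening the printed link in a web browser should land on the same view that the manual species, assembly and gene selection produces.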

UCSC Genome Browser: HIST1H3A

UCSC Genome Browser: HIST1H3A Genome location

UCSC Genome Browser: HIST1H3A Gene ID and amino acid sequence

UCSC Genome Browser: HIST1H3A Gene expression across 53 different tissue types will be found here

UCSC Genome Browser: HIST1H3A H3K27Ac marks can be found here from 7 different cell lines

UCSC Genome Browser: HIST1H3A DNase I hypersensitivity clusters

UCSC Genome Browser: HIST1H3A DNA conservation across 100 vertebrates

UCSC Genome Browser: HIST1H3A Amino acid conservation across different species

UCSC Genome Browser: HIST1H3A
Known SNPs as found in dbSNP
Green = synonymous
Red = missense or splice variants
Black = intronic
Blue = UTR

UCSC Genome Browser: HIST1H3A Known repetitive or low complexity regions

UCSC Genome Browser: Link: https://genome.ucsc.edu

Next Generation Sequencing (NGS) Overview
A. An NGS library is prepared by fragmenting a genomic DNA (gDNA) sample and ligating specialized adapters to both ends of each fragment.
B. The library is loaded onto a flow cell and the fragments are hybridized. Each fragment is amplified into a clonal cluster.

Next Generation Sequencing (NGS) Overview
C. Fluorescently labeled nucleotides are flowed in successive cycles and incorporated into the growing strands. Each cluster is imaged to detect which nucleotide was added. This is repeated for each base and generates the reads.
D. Reads are aligned to a reference genome. After alignment, differences between the sample and the reference can be identified.
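
Step D can be illustrated with a toy example. The sketch below assumes the read has already been aligned without gaps and only reports single-base mismatches. Real variant callers also handle indels, base qualities and read depth. The sequences are made up, and `find_mismatches` is a hypothetical helper.

```python
def find_mismatches(reference, read, read_start):
    """List (position, ref_base, read_base) where an aligned read differs.

    reference  : reference sequence as a string
    read       : read sequence, assumed aligned without gaps
    read_start : 0-based reference position where the read begins
    """
    differences = []
    for offset, read_base in enumerate(read):
        ref_base = reference[read_start + offset]
        if read_base != ref_base:
            differences.append((read_start + offset, ref_base, read_base))
    return differences


# Made-up sequences for illustration only.
reference = "ACGTACGTACGT"
read = "TACCTA"  # aligned at reference position 3

print(find_mismatches(reference, read, 3))
# [(6, 'G', 'C')]
```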

Integrative Genomics Viewer (IGV)
Software developed at the Broad Institute to more easily view:
Next Generation Sequencing data
Array-based data
Genome annotations
Variant data (single nucleotide polymorphisms, insertions, deletions, copy number variations, etc.)
GWAS data
and more!
http://software.broadinstitute.org/software/igv/igv2.3
Talk about how abundant sequencing data has become and how it's only going to grow larger.

Integrative Genomics Viewer

Integrative Genomics Viewer
Link to IGV: https://goo.gl/PBamc1
Link to dataset: https://goo.gl/hXfdCx
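
Once a dataset has been downloaded, IGV can be driven by a plain-text batch script as well as through the GUI. The sketch below only generates such a script as text, using the IGV batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot` and `exit`. The file name, genome ID, locus and output directory are placeholders and are not taken from the lecture dataset.

```python
def igv_batch_script(genome, data_files, locus, snapshot_dir, snapshot_name):
    """Return the text of an IGV batch script that loads files and saves one image."""
    lines = ["new", f"genome {genome}"]
    # One "load" command per data file (BAM, VCF, BED, ...).
    lines += [f"load {path}" for path in data_files]
    lines += [
        f"snapshotDirectory {snapshot_dir}",
        f"goto {locus}",
        f"snapshot {snapshot_name}",
        "exit",
    ]
    return "\n".join(lines) + "\n"


# All arguments below are hypothetical placeholders.
script = igv_batch_script(
    genome="hg19",
    data_files=["sample.bam"],
    locus="chr1:100000-101000",
    snapshot_dir="snapshots",
    snapshot_name="region.png",
)
print(script)
```

The generated text can be saved to a file and run from IGV's batch-script option. Check the IGV documentation for the version in use before relying on it.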

ExAC Browser
Link: http://exac.broadinstitute.org
The Exome Aggregation Consortium (ExAC) is a collaboration headed by the Broad Institute
Online browser
Human-specific
Currently an aggregation of exomes from 60,706 unrelated individuals in varying states of health, excluding those with severe pediatric diseases
Contains allele frequency information for all variants identified in these exomes
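
An allele frequency of the kind ExAC reports is the number of times the variant allele was observed divided by the number of chromosomes successfully genotyped at that site. The sketch below works through that arithmetic for a diploid autosomal site. The allele count of 12 is a made-up example, and the function name is illustrative.

```python
EXAC_INDIVIDUALS = 60_706  # number of exomes quoted on the slide


def allele_frequency(allele_count, allele_number):
    """Return allele_count / allele_number, the fraction of chromosomes carrying the allele."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


# At an autosomal site each individual contributes two chromosomes, so this is
# the largest possible allele number when every sample is genotyped.
max_allele_number = 2 * EXAC_INDIVIDUALS
print(max_allele_number)  # 121412

# Hypothetical variant seen on 12 chromosomes.
print(f"{allele_frequency(12, max_allele_number):.2e}")
```

In practice the allele number at a given site is often lower than this maximum, because not every exome has sufficient coverage at every position.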

ExAC Browser

ExAC Browser
Link: http://exac.broadinstitute.org

When to use which viewer?
UCSC Genome Browser: looking up annotation for genes/regions of interest when you have a small number of small files
IGV: loading large datasets (> ~100 MB) and a large number of samples
ExAC Browser: looking up contextual information for NGS data, such as allele frequencies