Structural genomics includes the genetic mapping, physical mapping and sequencing of entire genomes.

How to get a genomic library: breaking the DNA, cloning the fragments, and ordering them. Cut the isolated DNA with a restriction enzyme at a low concentration, so that many cleavage sites remain uncut and the cloned DNA fragments (1, …, 6) overlap one another.
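The partial-digest idea can be sketched in Python. This is a toy model under stated assumptions, not the protocol from the slides: each cleavage site is assumed to be cut independently with a fixed probability (`cut_prob`, standing in for the low enzyme concentration), and the names `partial_digest` and `library_fragments` are hypothetical.

```python
import random


def partial_digest(sites, genome_length, cut_prob, rng):
    """Digest one DNA molecule: each cleavage site is cut independently with
    probability cut_prob (assumed model of a low enzyme concentration).
    Returns the (start, end) coordinates of the resulting fragments."""
    cuts = [s for s in sorted(sites) if rng.random() < cut_prob]
    bounds = [0] + cuts + [genome_length]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def library_fragments(sites, genome_length, cut_prob, n_molecules, seed=0):
    """Pool the fragments of many digested copies of the same DNA. Because
    each copy is cut at a different subset of sites, the pooled fragments
    overlap one another, which is what later allows them to be ordered."""
    rng = random.Random(seed)
    pooled = set()
    for _ in range(n_molecules):
        pooled.update(partial_digest(sites, genome_length, cut_prob, rng))
    return sorted(pooled)


# Hypothetical example: five cleavage sites on a 1,000 bp molecule.
sites = [100, 250, 400, 600, 800]
for start, end in library_fragments(sites, 1000, cut_prob=0.3, n_molecules=20):
    print(start, end)
```

With `cut_prob=1.0` every site is cut and the library collapses to the six non-overlapping fragments of a complete digest, which carry no overlap information for ordering.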

BAC Fingerprinting: Gel-based Fragment Separation. 96 samples and 25 marker lanes, with a marker in every fifth lane (Marra et al., Genome Res., 7, (1997)).
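As a quick consistency check on the lane counts, the sketch below builds a lane plan in Python. It assumes (the slide does not say so explicitly) that the first lane is a marker, that every fifth lane after it is a marker, and that a closing marker follows the last sample; under that assumption 96 samples need exactly 25 marker lanes, 121 lanes in total. `gel_layout` is a hypothetical helper.

```python
def gel_layout(n_samples=96, marker_every=5):
    """Return a list of lanes: "M" for a size-marker lane, or the index of a
    BAC sample. Lane 0 and every marker_every-th lane after it hold a marker
    (assumed arrangement)."""
    lanes = []
    sample = 0
    while sample < n_samples:
        if len(lanes) % marker_every == 0:
            lanes.append("M")
        else:
            lanes.append(sample)
            sample += 1
    if len(lanes) % marker_every == 0:
        lanes.append("M")  # closing marker if the next lane is a marker position
    return lanes


lanes = gel_layout()
print(len(lanes), lanes.count("M"))  # 121 25
```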

Distance functions. Clones as math vectors: each clone is a 0/1 vector $A = (A_1, \dots, A_n)$ over the $n$ bins of the gel.
Hamming distance: $H(A,B) = \sum_{i=1}^{n} |A_i - B_i|$; mutual overlap: $\sum_{i=1}^{n} A_i B_i$.
Limited fingerprinting resolution means that bands can be shared by chance. The probability that a band of clone A coincides by chance with at least one of the $m$ bands of clone B is $p = 1 - (1 - 1/t)^m$, where $t = L/(2R)$ is the number of bins on a gel of length $L$ with resolution $R$.
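The two distance functions and the chance-sharing probability translate directly into Python. A minimal sketch, assuming fingerprints are already binned into 0/1 vectors of equal length and that `L` and `R` are given in the same units; the function names and the example values are ours, not from any mapping package.

```python
def hamming_distance(a, b):
    """H(A,B) = sum_i |A_i - B_i| over the n gel bins (0/1 band vectors)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mutual_overlap(a, b):
    """Number of bins in which both clones have a band: sum_i A_i * B_i."""
    return sum(x * y for x, y in zip(a, b))


def chance_match_prob(gel_length, resolution, m):
    """p = 1 - (1 - 1/t)^m with t = L / (2R) distinguishable bins."""
    t = gel_length / (2 * resolution)
    return 1 - (1 - 1 / t) ** m


# Hypothetical fingerprints over 8 bins.
A = [1, 0, 1, 1, 0, 0, 1, 0]
B = [1, 1, 0, 1, 0, 0, 1, 0]
print(hamming_distance(A, B), mutual_overlap(A, B))  # 2 3
print(round(chance_match_prob(4000, 2, 30), 3))  # 0.03
```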

Genome physical mapping problems are computationally challenging: “… We have been looking at the assemblies of large genomes … and for every ‘draft’ genome we look at, we find hundreds - and sometimes thousands - of mis-assemblies”. (Salzberg & Yorke (2005), Beware of mis-assembled genomes, Bioinformatics, 21:)

Bioinformatics and Human Factors
- Reading the scores
- Clustering (contig assembly)
- Ordering the clusters
- Merging contigs
- Anchoring (getting genetic and physical maps together)
- Verification of mapping results (at each stage)
Which factors may affect the quality of a physical map? Where can bioinformatics help?

“Mapping” means “positioning” based on some distance. The major mapping steps:
- Fingerprinted clones $C_k$, $k = 1, \dots$
- Distances $d_{ij}$ for $(C_i, C_j)$, based on shared bands
- Clustering (high stringency)
- Ordering (high stringency)
- Merging (lower stringency)
- Anchoring and verification

P-value of clone overlaps. Sulston score (Sulston et al., 1988): $p = 1 - (1 - 1/N)^{n(c_2)}$ is the probability of a random coincidence of two bands, where $n(c)$ is the number of bands in clone $c$ and $N$ is the total number of distinguishable bands.
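The slide gives only the per-band coincidence probability $p$. The Sulston score for an observed number of shared bands is usually taken as a binomial tail: the chance that at least that many of the $n(c_1)$ bands of one clone coincide by chance with bands of the other. The Python sketch below assumes that binomial form and independence between bands; the function names and example numbers are illustrative.

```python
from math import comb


def band_match_prob(n2, N):
    """p = 1 - (1 - 1/N)^{n(c2)}: probability that a given band of clone c1
    coincides by chance with at least one of the n(c2) bands of clone c2."""
    return 1 - (1 - 1 / N) ** n2


def sulston_score(n1, n2, shared, N):
    """Probability that at least `shared` of the n1 bands of clone c1 match
    bands of clone c2 by chance (binomial tail, bands assumed independent)."""
    p = band_match_prob(n2, N)
    return sum(
        comb(n1, j) * p**j * (1 - p) ** (n1 - j)
        for j in range(shared, n1 + 1)
    )


# Hypothetical clones with 30 bands each, N = 1000 distinguishable positions.
for shared in (2, 10, 20):
    print(shared, sulston_score(30, 30, shared, 1000))
```

A smaller score means the overlap is less likely to be a coincidence; an overlap is called significant when the score falls below a cutoff, such as the 1e-25 used for the wheat 1B contigs in the talk.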

Approximation of the exact model of random clone overlap: IoE approximation; Wendl’s exact theory (J. Comput. Biol. 2005, 12:).

Band abundances: an unexploited source for improving mapping quality (3B).

Adaptive Clustering: varying the cutoff, with increasing rather than decreasing stringency; protected clusters.
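The slide only names the idea, so the following Python sketch is one possible reading, stated as an assumption: cluster first at a lenient cutoff, keep ("protect") every cluster that passes an acceptance test, and re-cluster only the failing ones at the next, more stringent cutoff. The acceptance test (here a maximum cluster size), the scores, and all function names are hypothetical.

```python
def components(group, scores, cutoff):
    """Single-linkage clusters of `group`: clones are joined when their
    overlap score is <= cutoff (a smaller score is a more significant overlap)."""
    parent = {c: c for c in group}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (ci, cj), s in scores.items():
        if ci in parent and cj in parent and s <= cutoff:
            parent[find(ci)] = find(cj)
    clusters = {}
    for c in group:
        clusters.setdefault(find(c), []).append(c)
    return [sorted(members) for members in clusters.values()]


def adaptive_clusters(clones, scores, cutoffs, accept):
    """cutoffs must be ordered from lenient to stringent. Clusters passing
    accept() are protected; the others are re-clustered at the next cutoff."""
    protected, pending = [], [list(clones)]
    for cutoff in cutoffs:
        still_failing = []
        for group in pending:
            for cluster in components(group, scores, cutoff):
                (protected if accept(cluster) else still_failing).append(cluster)
        pending = still_failing
    return protected + pending  # clusters that fail even at the last cutoff


scores = {("a", "b"): 1e-40, ("b", "c"): 1e-30, ("c", "d"): 1e-20,
          ("d", "e"): 1e-60, ("e", "f"): 1e-15}
result = adaptive_clusters("abcdef", scores, [1e-12, 1e-26], accept=lambda c: len(c) <= 3)
print(sorted(result))  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```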

Network representation of significant clone overlaps: vertices correspond to clones and edges to significant clone overlaps.
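A minimal Python sketch of this network, assuming pairwise Sulston scores are already available and that an overlap is significant when its score is at or below a chosen cutoff. Taking connected components of the network gives single-linkage clusters (candidate contigs); this is an illustration, not the clustering procedure of any particular mapping program, and the names and scores are hypothetical.

```python
def overlap_network(scores, cutoff):
    """scores: {(ci, cj): Sulston score}. Every clone becomes a vertex; an
    edge joins two clones whose overlap score is <= cutoff."""
    graph = {}
    for (ci, cj), s in scores.items():
        graph.setdefault(ci, set())
        graph.setdefault(cj, set())
        if s <= cutoff:
            graph[ci].add(cj)
            graph[cj].add(ci)
    return graph


def clusters(graph):
    """Connected components of the network (iterative depth-first search)."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


scores = {("a", "b"): 1e-40, ("b", "c"): 1e-30, ("c", "d"): 1e-10, ("d", "e"): 1e-50}
print(clusters(overlap_network(scores, 1e-25)))  # [['a', 'b', 'c'], ['d', 'e']]
```

Making the cutoff more stringent removes edges, so clusters can only split, never merge.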

Network representation of significant clone overlaps, wheat 1B: all clones, and clones from the selected diametric path (MTP).

Identification of putative Q-clones and Q-overlaps

Identification of contig non-linearity: use the net of significant clone overlaps to find the diametric path (diam) and calculate the width of the net. Width > 1 is diagnostic for a non-linear cluster. Example: wheat 1BS, Ctg13.

Identification of contig non-linearity. Diametric path: calculate the ranks $r_j = r_j(c_i)$ for all clones $c_j$ relative to clone $c_i$ (through significant clone overlaps). The diametric path (approximately the MTP) is the shortest path through significant clone overlaps connecting clones $c_i$ and $c_j$ with maximal $r_j(c_i)$. Width of the net: the maximal rank relative to the diametric path. Width > 1 indicates a non-linear cluster.
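The diametric-path and width computation can be sketched with breadth-first search in Python. We assume here that the rank $r_j(c_i)$ is the number of significant-overlap steps from $c_i$ to $c_j$ and that the net is connected (one cluster); `from_edges`, `diametric_path`, and `net_width` are hypothetical names, and the two example nets are invented.

```python
from collections import deque


def from_edges(edges):
    """Undirected network: vertices are clones, edges are significant overlaps."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs(graph, sources):
    """Ranks (BFS distances) of all clones from the nearest source clone,
    plus BFS parents for path reconstruction."""
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                parent[w] = v
                queue.append(w)
    return dist, parent


def diametric_path(graph):
    """Shortest path between the pair of clones (c_i, c_j) with maximal rank."""
    best_rank, best = -1, None
    for ci in graph:
        dist, parent = bfs(graph, [ci])
        cj = max(dist, key=dist.get)
        if dist[cj] > best_rank:
            best_rank, best = dist[cj], (ci, cj, parent)
    ci, cj, parent = best
    path = [cj]
    while path[-1] != ci:
        path.append(parent[path[-1]])
    return path[::-1]


def net_width(graph, path):
    """Maximal rank of any clone relative to the diametric path."""
    dist, _ = bfs(graph, path)
    return max(dist.values())


linear = from_edges([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("x", "b"), ("x", "c")])
branched = from_edges([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
                       ("c", "f"), ("f", "g"), ("g", "h")])
for name, g in (("linear", linear), ("branched", branched)):
    print(name, net_width(g, diametric_path(g)))  # linear 1, then branched 2
```

In the linear net the side clone `x` sits one step from the path, so the width stays at 1; the branch d-e hanging off clone `c` pushes the width to 2.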

Identification of contig non-linearity: example with a Q-clone.

Identification of contig non-linearity: using the net of significant clone overlaps, calculate for each clone $c_i$ the ranks $r_{ij}$ for all clones $c_j$. Diametric path (MTP): for the pair of clones with maximal $r_{ij}$, identify the shortest path through significant clone overlaps. Width of the net: the maximal rank relative to the diametric path. Width > 1 is diagnostic for a non-linear cluster. (PAG)

“Linearization” by removing the clones at cluster branch points

Reducing genome mapping (linear ordering) problems to the traveling salesman problem (TSP).
Order 1: a b c d e f g h k l m n, map length $l_1$
Order 2: b a c d e f g h k l m n, map length $l_2$
………
Order N: f c m h e a g n k l b d, map length $l_N$
For $n = 60$ clones there are $N = 60!/2 \approx 4.2 \times 10^{81}$ orders.
The problem: how do we choose the best (true) order, i.e., the one that gives the map of minimal length?
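A small Python illustration of the reduction, under simple assumptions: the map length of an order is the sum of distances between consecutive clones, and an order and its reverse describe the same map, which is where the factor 1/2 in $N = n!/2$ comes from. Exhaustive search is shown only to make the objective concrete; it is impractical beyond roughly a dozen clones, which is why TSP heuristics are needed. The clone positions below are invented.

```python
from itertools import permutations
from math import factorial


def map_length(order, d):
    """Length l of the map for a given clone order: sum of distances
    between consecutive clones (an open TSP path)."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def n_orders(n):
    """Number of distinct linear orders of n >= 2 clones; an order and its
    reverse give the same map."""
    return factorial(n) // 2


def best_order_bruteforce(clones, d):
    """Exhaustive search over all orders; feasible only for very small n."""
    best = None
    for perm in permutations(clones):
        if perm[0] > perm[-1]:
            continue  # skip the reversed copy of each order
        length = map_length(perm, d)
        if best is None or length < best[0]:
            best = (length, perm)
    return best


pos = {"a": 0, "b": 1, "c": 3, "d": 6, "e": 10}  # hypothetical clone positions
d = {x: {y: abs(pos[x] - pos[y]) for y in pos} for x in pos}
print(best_order_bruteforce(list(pos), d))  # (10, ('a', 'b', 'c', 'd', 'e'))
print(f"{n_orders(60):.2e}")  # 4.16e+81
```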

Example: A Contig

Re-sampling based order verification: excluding parallel clones allows constructing a stable "skeleton" map and specifying the coordinates of all clones relative to this map.

Testing the FPC contigs by using LTC wheat 1B

Testing the FPC contigs by using LTC. Wheat 1B: some FPC contigs have a non-linear topological structure, inconsistent with the linear structure of the chromosome. Q-clones?

Testing the FPC contigs by using LTC. Ctg2: FPC contigs with non-linear topology, and even cycles. Edges represent significant overlaps (Sulston score cutoff 1e-25). Increasing the stringency up to 1e-75 does not help here in getting a non-trivial linearization!

Problematic contigs (simulated maize)

Figure: Yr15 region with Brachypodium synteny-based markers and French clone-based markers (Xuhw258, Xuhiuw264, Xuhiuw265, Xuhw259, Xuhw264-5-T7, Xuhw264-3-T7); clones #3, #28, #4, #5, #6, #7; 450 kb?