Published by Aubrey Tate. Modified over 9 years ago.
1
Irys data analysis, January 10th, 2014
2
Irys Workflow – Data Analysis
Pipeline stages: Sample, Scanning, Image processing, Assembly, Analysis
Software: Irys™ ICS, IrysView™
Data files: Single molecule maps (.bnx), Genome Map (.cmap), Anchoring (.xmap)
Anchoring inputs: RefSeq reference, short NGS contigs, Genome Map 2
Structural variation detection:
–Using a reference (e.g. hg19)
–Using a second genome map
–Using NGS contigs
Sequence assembly validation:
–Gross assembly quality (reiterate)
–Missing sites, extra sites, interval differences
–Structural differences
–Consed
Sequence contig scaffolding:
–Alignment in IrysView
–Manual editing
–AGP output
–Conversion to FASTA
–Reimport superscaffolds to reiterate
Other applications:
–Integration
–Sequence scaffolding without de novo assembly
–Mapping based variant calling
–Two color applications: epigenetics, DNA damage
3
Workshops
–De novo assembly: using IrysView (Alex); using Python/command line (Heng/Ernest)
–SV detection (Warren/Andy)
4
Core workflow: Data QC: basic molecule stats
5
Core workflow: Data QC: molecule quality report
–Always consider the mapping rate with respect to the stringency setting.
–Mapping rate helps us estimate the useful coverage depth as well as data quality.
6
Stretch normalization
Evaporation (increasing [salt]) during prolonged scanning of version 2 chips shortens the molecules in the nanochannels. This can be corrected by measuring the average stretch in each scan and applying a normalization factor.
Determining average stretch:
–Internal ruler based normalization
–Reference mapping based normalization
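The per-scan correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the Irys implementation: the function names, the data layout (scan id mapped to an average stretch and a list of molecule lengths) and the target stretch value are all assumptions made for the example.

```python
def stretch_factor(avg_stretch, target_stretch):
    """Normalization factor that rescales a scan to the target stretch."""
    if avg_stretch <= 0:
        raise ValueError("average stretch must be positive")
    return target_stretch / avg_stretch


def normalize_lengths(scans, target_stretch):
    """Rescale molecule lengths scan by scan.

    scans: dict mapping scan id -> (average stretch, list of molecule lengths).
    This layout is hypothetical and chosen only for illustration.
    """
    return {
        scan_id: [length * stretch_factor(avg, target_stretch) for length in lengths]
        for scan_id, (avg, lengths) in scans.items()
    }


# Hypothetical scans: the second scan shows shortened molecules (lower stretch).
scans = {1: (0.85, [100.0, 200.0]), 2: (0.80, [100.0])}
normalized = normalize_lengths(scans, target_stretch=0.85)
```

The average stretch fed into this calculation would come from either of the two methods listed above (internal ruler or reference mapping); the sketch only covers the correction step.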
7
Core workflow: De novo assembly: optArg
–From the molecule quality report and .err file
–p value: based on genome size, or as stringent as possible
–Stringencies vary based on step
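As a rough sketch of a p value "based on genome size", the threshold can be made inversely proportional to genome size. The constant 1e-5 per Mb, the tenfold spacing between levels, and the level names below are assumptions for illustration; none of them is stated on the slide, and the values actually used in optArg may differ.

```python
def pvalue_threshold(genome_size_mb, base=1e-5):
    """Assumed rule of thumb: threshold = base / genome size in Mb."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return base / genome_size_mb


def stringency_ladder(genome_size_mb):
    """Hypothetical relaxed/default/strict thresholds, a factor of 10 apart."""
    default = pvalue_threshold(genome_size_mb)
    return {"relaxed": default * 10, "default": default, "strict": default / 10}


# Example with the 20 Mb genome size used later in the deck.
ladder = stringency_ladder(20)
```

A smaller p value is more stringent, so larger genomes get stricter thresholds under this assumption.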
8
No reference?
With no reference, we can run a de novo assembly based on expectations and data QC observations:
–Expected genome size
–Site density (in silico)
–Label density (empirical)
–Molecule n50 (empirical)
Steps:
–Run de novo assembly (relaxed)
–Use the result of the de novo assembly to run molecule quality report
–Update error characteristics (stretch normalization) and rerun de novo assembly
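The in silico site density listed above can be estimated from any available sequence by counting recognition sites on both strands. The sketch below assumes the motif GCTCTTC (the Nt.BspQI recognition sequence); the slides do not name the enzyme, so treat the motif and the function names as illustrative.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(sequence, motif):
    """Count motif occurrences on both strands, including overlapping ones."""
    seq = sequence.upper()
    motif = motif.upper()
    # A set avoids double counting when the motif is palindromic.
    motifs = {motif, reverse_complement(motif)}
    total = 0
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            total += 1
            start = seq.find(m, start + 1)
    return total


def sites_per_100kb(sequence, motif):
    """Site density expressed as sites per 100 kb of sequence."""
    if not sequence:
        raise ValueError("sequence is empty")
    return count_sites(sequence, motif) * 100_000 / len(sequence)


# Toy sequence with one site on each strand.
toy = "AAGCTCTTCAAGAAGAGCAA"
toy_sites = count_sites(toy, "GCTCTTC")
```

Comparing this in silico density with the empirical label density from the molecules gives a first check on labeling before running the relaxed assembly.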
9
De novo assembly QC
–We started with 1.8Gb (>100kb) that mapped at 40%. We had a good quality reference, so we expect to use ~0.8Gb.
–Genome has 14 chromosomes
–Expected size is 20Mb
–Map n50 is good; we may be able to improve it further with additional depth or optimized sample prep
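The link between mapping rate and useful coverage depth can be worked through with the numbers on this slide. The sketch below assumes useful data is simply total data times mapping rate, which gives about 0.72 Gb, in line with the slide's rounded ~0.8Gb; the function name is illustrative.

```python
def useful_coverage(total_bases, mapping_rate, genome_size):
    """Return (useful bases, useful coverage depth).

    Assumes only the mapped fraction of the molecules contributes to the assembly.
    """
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    useful = total_bases * mapping_rate
    return useful, useful / genome_size


# 1.8 Gb of molecules >100 kb, 40% mapping rate, 20 Mb expected genome size.
useful_bases, depth = useful_coverage(1.8e9, 0.40, 20e6)
```

Under this assumption the 20 Mb genome is covered at roughly 36x by molecules that actually map.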
10
De novo assembly QC
11
De novo assembly QC: Higher stringency assembly
The higher stringency assembly misses some of the genome but resolves the chimera.
12
Applications: Sequence anchoring
13
12 Mb Streptomyces Genome Assembly with Various Technologies (DNA sequence scaffolding)
Technology            Total Mb   Contigs   N50 (kb)
Short-Read NGS Only   9.08       124       92
NGS + Cosmids         11.38      97        154
3rd-Gen Reads         11.63      20        918
BioNano Genomics      11.87      1         11,870
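For readers unfamiliar with the N50 column, a minimal sketch of the usual definition follows: the length of the contig (or map) at which the running total, taken from longest to shortest, first reaches half of the total length. The contig lengths in the example are hypothetical, since the slide reports only the summary values.

```python
def n50(lengths):
    """Length at which the cumulative sum (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths (arbitrary units).
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

With a single map spanning the whole assembly, as in the BioNano row, N50 equals the total length, which matches 11.87 Mb and 11,870 kb.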
14
Sequence anchoring
–Illumina + cosmids: 11.38Mb, 97 contigs, n50 length: 154kb, 11.38Mb anchored
–Illumina: 9.08Mb, 124 contigs, n50 length: 92kb, 8.9Mb anchored
–Pac Bio: 11.63Mb, 20 contigs, n50 length: 918kb, 11.63Mb anchored
Scale bar: 1 Mb
Uses:
–Validate sequence assembly
–Find errors
–Scaffold/Orient/Size gaps
–Output FASTA or AGP (soon)
15
Applications: Structural variation
16
Structural Variation: Insertion/Deletion Calls (vs hg19)
95 regions in BioNano Genome Maps correspond to N-based gaps in hg19 (not included in graph). The gaps may contain repeats and polymorphic regions, where SVs are enriched.
17
Structural Variant Examples: Insertions and Deletions
Panels in each example: Genome Map, hg19, Molecules

#h SmapEntryID QryContigID RefcontigID1 RefcontigID2 QryStartPos QryEndPos RefStartPos RefEndPos Orientation Confidence Type
#f int int float string float string

Deletion, -176,265 bp (4.9 kb region in the genome map, 181.2 kb region in hg19):
SmapEntryID/QryContigID/RefcontigID1/RefcontigID2: 128266
QryStartPos 1,483,278; QryEndPos 1,488,217; RefStartPos 75,697,428; RefEndPos 75,878,632; Orientation +; Type delete; net size 176,265

Insertion, +4.9 kb (17.5 kb region in the genome map, 12.6 kb region in hg19):
SmapEntryID/QryContigID/RefcontigID1/RefcontigID2: 457766
QryStartPos 1,093,571; QryEndPos 1,111,027; RefStartPos 13,122,638; RefEndPos 13,135,195; Orientation +; Type insert; net size 4,899
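The net sizes in these examples follow directly from the coordinates: the length of the query (genome map) interval minus the length of the reference (hg19) interval. The sketch below reproduces both values; the sign convention (positive for insertion, negative for deletion) and the function names are assumptions for illustration, not the SV caller's own logic.

```python
def net_size(qry_start, qry_end, ref_start, ref_end):
    """Query interval length minus reference interval length, in bp."""
    return (qry_end - qry_start) - (ref_end - ref_start)


def call_type(delta):
    """Illustrative labeling: extra query sequence is an insertion."""
    if delta > 0:
        return "insert"
    if delta < 0:
        return "delete"
    return "none"


# Coordinates taken from the two smap rows on this slide.
deletion = net_size(1_483_278, 1_488_217, 75_697_428, 75_878_632)
insertion = net_size(1_093_571, 1_111_027, 13_122_638, 13_135_195)
```

The same arithmetic gives the region sizes quoted on the slide: 4,939 bp against 181,204 bp for the deletion, and 17,456 bp against 12,557 bp for the insertion.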
18
Workshops
De novo assembly (using IrysView: Alex; using Python/command line: Heng/Ernest)
–OptArg: iterations, stringencies, merging, ref mapping
–Output .err file
–Alignref
–Visualization of genome maps to molecules
–Identification of chimeras
SV detection (Warren/Andy)
–Explain the SV detection application (consider IP issues)
–Discuss stringency parameters
–Show resulting table, ranges; explain types