Sequencing Data Quality (Saulo Aflitos)

Presentation transcript:

1 Sequencing Data Quality Saulo Aflitos

2 Assembly - Concepts: Read (≈100 bp), Contig (≈2 Kbp), Scaffold (≈2 Mbp), Pseudo Molecule (Super Scaffold), Paired-End, Mate-Pair, Low Complexity Region

3 Scaffolding: Scaffold (≈2 Mbp), Paired-End, Mate-Pair, Low Complexity Region, Pseudo Molecule (Super Scaffold)

4 Assembly

5 Scaffolding: Repeats?!

6 Depth of Coverage, Reality (figure after Goldberg SMD et al. 2006): reads stacked into a contig and its consensus, with depths of 1x, 2x and 3x along the contig
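Depth of coverage is the number of reads overlapping each position, and the consensus is called column by column. A minimal Python sketch, assuming reads are already placed on the contig by their start offsets (no gaps or indels) and using a simple majority vote; the vote rule and the example reads are assumptions for illustration, not any particular assembler's method:

```python
from collections import Counter


def depth_and_consensus(reads):
    """Per-position depth of coverage and majority-vote consensus.

    `reads` is a list of (start, sequence) pairs already laid out on one
    contig coordinate system (hypothetical, gap-free layout).
    """
    length = max(start + len(seq) for start, seq in reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    depth = [sum(col.values()) for col in columns]
    # Most common base in each column; 'N' where no read covers the position.
    consensus = "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )
    return depth, consensus


reads = [(0, "ACGTAC"), (2, "GTACGT"), (4, "ACGTTA")]
depth, consensus = depth_and_consensus(reads)
print(depth)      # [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
print(consensus)  # ACGTACGTTA
```

The depth profile rises and falls at the contig ends, the same 1x/2x/3x pattern the figure shows: the ends of a contig are always covered by fewer reads than its middle.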

7 Heterozygosity (figure): sequences AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA and NAAACGTACGTAAAANAAACGTACGTAAAA; an A/C site split into A and C; frequencies 95% ±5 and 50% ±10
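The 95% and 50% figures on this slide read like allele fractions at a single position. A sketch, assuming a pileup column is given as a string of base calls (the two example columns are made up), shows how a homozygous site (one base near 100%, the rest looking like sequencing errors) differs from a heterozygous A/C site (two bases near 50% each):

```python
from collections import Counter


def allele_fractions(column):
    """Fraction of each base observed at one position (a pileup column)."""
    counts = Counter(base for base in column if base != "N")  # ignore N calls
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


# Hypothetical pileup columns of 20 reads each.
homozygous = "A" * 19 + "C"          # one C, e.g. a single sequencing error
heterozygous = "A" * 10 + "C" * 10   # both alleles present
print(allele_fractions(homozygous))    # {'A': 0.95, 'C': 0.05}
print(allele_fractions(heterozygous))  # {'A': 0.5, 'C': 0.5}
```

Real data scatter around these values (hence the ± ranges), and low depth makes the two cases harder to tell apart.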

8 Consequences of Data Cleaning (figure comparing Raw and Filtered data; values shown: 50.37, 265.89, 48.65, 41.61, 57.60)

9 Sequencing: Shotgun, RNAseq

10 Sequencing: Paired End, Mate Pair

11 Sample Preparation: Genome → Shred (ultrasound, physical, RE) → Size Selection (gel, beads) → Adapter (ID, binding to surface, circularization) → Sequencing (Illumina, 454, PacBio)

12 Shredding

13 Size Selection
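Shredding breaks DNA at effectively random places, and size selection keeps only the fragments inside a target length window. A toy simulation, assuming uniformly random break points (real ultrasound or physical shearing is only roughly random, and RE digestion cuts at specific sites) and a hypothetical 150-500 bp window:

```python
import random


def shred(genome_length, n_breaks, rng):
    """Cut a genome at distinct random positions; return (start, end) fragments."""
    cuts = sorted(rng.sample(range(1, genome_length), n_breaks))
    bounds = [0] + cuts + [genome_length]
    return list(zip(bounds[:-1], bounds[1:]))


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len], like a gel cut."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


rng = random.Random(42)  # fixed seed so the run is reproducible
fragments = shred(genome_length=100_000, n_breaks=300, rng=rng)
selected = size_select(fragments, min_len=150, max_len=500)
print(len(fragments), "fragments;", len(selected), "in the 150-500 bp window")
```

Only part of the DNA survives the window, which is why library preparation needs more input material than the reads alone would suggest.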

14 Sequencing: Illumina PE, read length 100 bp, insert size 150 bp-2 Kbp

15 Sequencing: 454 and MP, insert size 2K-20 Kbp, read lengths 500 bp and 150 bp
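Read length and insert size together decide how much of each fragment is never sequenced. A small sketch, assuming insert size means the full fragment length from the outer end of one read to the outer end of the other (tools define it differently, so check which convention a given tool reports) and equal read lengths:

```python
def inner_distance(insert_size, read_length):
    """Unsequenced bases between the two reads of a pair.

    Assumes insert size = full fragment length, outer end to outer end,
    with both reads the same length. Negative means the reads overlap.
    """
    return insert_size - 2 * read_length


# Illumina PE: 100 bp reads, inserts from 150 bp to 2 Kbp.
for insert_size in (150, 500, 2_000):
    print(f"insert {insert_size} bp -> inner distance "
          f"{inner_distance(insert_size, 100)} bp")
```

With a 150 bp insert the two 100 bp reads overlap by 50 bp; with a 2 Kbp insert 1,800 bp separate them, and pairs spanning such distances are what can link contigs across repeats into scaffolds.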

16 Data

17 FastQ: each record shows the machine name, a unique read ID, and the encoded quality (0-40), i.e. the chance of the base call being wrong
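Each FastQ record spans four lines: an @ header with the read ID, the sequence, a + separator, and one encoded quality character per base. A minimal parser sketch, assuming uncompressed, unwrapped four-line records and an Illumina 1.8+ style header whose first colon-separated field is the machine name; the example header is made up:

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between or after records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# Hypothetical record (instrument name and coordinates are invented).
record = """@M00123:45:000000000-A1B2C:1:1101:15589:1333 1:N:0:1
ACGTACGTAC
+
IIIIIHHH##
""".splitlines()

read_id, seq, qual = next(read_fastq(record))
machine = read_id.split(":")[0]  # first header field in Illumina 1.8+ style
print(machine)    # M00123
print(seq, qual)  # ACGTACGTAC IIIIIHHH##
```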

18 FastQ Format

19 FastQ Statistics: a quality of 13 corresponds to a 0.05 (5%) chance that the base call is wrong
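The quality characters encode Phred scores, Q = -10 log10(P), where P is the chance the base call is wrong: Q13 gives P ≈ 0.05 (5%) and Q40 gives P = 0.0001. A sketch, assuming the offset-33 ASCII encoding (Sanger / Illumina 1.8+); older Illumina pipelines used offset 64, so check which one your files use:

```python
import math


def decode_qualities(qual, offset=33):
    """ASCII quality string -> Phred scores (offset 33 = Sanger/Illumina 1.8+)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Phred score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def phred_from_probability(p):
    """Probability of a wrong call -> Phred score."""
    return -10 * math.log10(p)


print(decode_qualities("IIIIIHHH##"))          # [40, 40, 40, 40, 40, 39, 39, 39, 2, 2]
print(round(error_probability(13), 3))         # 0.05
print(round(phred_from_probability(0.05), 1))  # 13.0
```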

20 Cleaning

21 FastQC Quality Checking Tool
– Per base sequence quality
– Per sequence quality
– Per base sequence content
– Per base GC content
– Per sequence GC content
– Per base N-content
– Sequence length distribution
– Sequence duplication
– Contamination screen (FastQ Screen)
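Several of these checks are per-position summaries over all reads. A rough sketch of three of them (per base sequence quality, per base GC content, per base N-content), assuming offset-33 qualities and reporting plain means; FastQC's own plots show more (for example quality quartiles), so this only illustrates the idea, and the example reads are made up:

```python
def per_base_stats(records, offset=33):
    """Per-position (mean quality, GC fraction, N fraction) across reads.

    `records` yields (sequence, quality_string) pairs. Reads may differ in
    length, so each position is averaged only over the reads that reach it.
    """
    columns = []  # one [quality_sum, gc_count, n_count, read_count] per position
    for seq, qual in records:
        for i, (base, q) in enumerate(zip(seq.upper(), qual)):
            if i == len(columns):  # first read this long: open a new column
                columns.append([0, 0, 0, 0])
            col = columns[i]
            col[0] += ord(q) - offset
            col[1] += base in "GC"
            col[2] += base == "N"
            col[3] += 1
    return [(q / c, gc / c, n / c) for q, gc, n, c in columns]


reads = [("ACGN", "IIII"), ("GGCA", "II#!"), ("AC", "II")]
for pos, (mean_q, gc, n) in enumerate(per_base_stats(reads), start=1):
    print(pos, f"meanQ={mean_q:.1f}", f"GC={gc:.2f}", f"N={n:.2f}")
```

A falling mean quality toward the 3' end, or a spike in N-content at some positions, is the kind of pattern that motivates the trimming step.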

22 SolexaQA Cleaning Tool

23

24

25

26 Exercise
Create a “cleaning” folder: mkdir cleaning; cd cleaning
Inside it, run: wget -O saulo.bash http://goo.gl/Tx8g6
Run it with: bash saulo.bash
This will download FastQC and SolexaQA
– FASTQC HELP: http://goo.gl/EE8M7
– FASTQC TUTORIAL: http://goo.gl/rihyA
– FASTQC MANUAL: http://goo.gl/9yihC
– SolexaQA Help: http://solexaqa.sourceforge.net/
Run FastQC: ./FastQC/fastqc &
File > Open [Files of Type = FastQ files]

27 Exercise
Verify the two .fq files (you can use less):
– bad_MiSeq_dataset.fq
– good_MiSeq_dataset.fq
Clean the bad dataset with SolexaQA’s DynamicTrim.pl script:
– perl SolexaQA_v.2.1/DynamicTrim.pl bad_MiSeq_dataset.fq -h 25
Verify the improvement (or not) by opening:
– bad_MiSeq_dataset.fq.trimmed
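As we understand SolexaQA's documentation, DynamicTrim trims each read to its longest contiguous segment whose quality scores are all above the cutoff (here -h 25, a Phred score). A minimal Python sketch of that idea, not the script's actual code: it assumes offset-33 qualities, treats "above" as strictly greater than the cutoff, keeps the leftmost segment on ties, and uses a made-up example read:

```python
def longest_good_segment(quals, cutoff):
    """Return (start, end) of the longest run where every score > cutoff.

    Ties keep the leftmost run; (0, 0) means no base passes the cutoff.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(quals + [cutoff]):  # sentinel closes the final run
        if q > cutoff:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = None
    return best_start, best_end


def dynamic_trim(seq, qual, cutoff=25, offset=33):
    """Trim a read to its longest segment whose Phred scores all exceed cutoff."""
    scores = [ord(ch) - offset for ch in qual]
    start, end = longest_good_segment(scores, cutoff)
    return seq[start:end], qual[start:end]


seq = "ACGTACGTACGT"
qual = "II#IIIIIII##"  # 'I' = Q40, '#' = Q2 with offset 33
print(dynamic_trim(seq, qual, cutoff=25))  # ('TACGTAC', 'IIIIIII')
```

Keeping the single longest good segment (rather than just cutting the 3' tail) also removes good bases before an isolated bad call, which is why trimmed reads can end up noticeably shorter.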

28 ?

