
NGS data format and General Quality Control. Data format “Flowchart”: Sequencer → raw data → Fastq → SAM/BAM.


1 NGS data format and General Quality Control

2 Data format “Flowchart”: Sequencer → raw data → Fastq → SAM/BAM

3 Fastq file: Used to record raw reads coming off the sequencers. Each record contains four lines. Parameters such as read length are usually set by the sequencer.

4 Fastq file

5 Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters. The read length is the length of the string.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
Source: http://en.wikipedia.org/wiki/FASTQ_format
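The four-line layout described on this slide can be checked mechanically. The following is a minimal sketch in Python, not part of the original slides; the names `FastqRecord` and `read_fastq` are hypothetical, and it assumes each sequence and quality string sits on a single line (no line wrapping).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # text after the leading '@' on line 1
    sequence: str    # raw sequence letters from line 2
    quality: str     # encoded quality symbols from line 4


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ text stream, four lines at a time."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError("line 4 must be as long as line 2")
        yield FastqRecord(header[1:], sequence, quality)


# Usage with an in-memory example record (made-up data).
example = io.StringIO("@read1 sample\nGATTACA\n+\nIIIIIII\n")
for record in read_fastq(example):
    print(record.identifier, len(record.sequence))
```

The read length is simply `len(record.sequence)`, as the slide notes.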

6 General quality control of raw reads, using FastQC, a tool that implements some general rules:
– Basic Statistics
– Per base sequence quality
– Per sequence quality scores
– Per base sequence content
– Per base GC content
– Per sequence GC content
– Per base N content
– Sequence Length Distribution
– Sequence Duplication Levels
– Overrepresented sequences
– Kmer Content
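To give a feel for what a few of these modules measure, here is a rough Python sketch that computes a read count, a sequence length distribution, and overall %GC from a list of read sequences. It is an illustration only and not how FastQC itself is implemented; the function name `basic_statistics` and the returned keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def basic_statistics(sequences: Iterable[str]) -> Dict[str, object]:
    """Summarise reads: count, length distribution, and overall %GC."""
    reads = [s.upper() for s in sequences]
    total_bases = sum(len(s) for s in reads)
    gc_bases = sum(s.count("G") + s.count("C") for s in reads)
    return {
        "total_sequences": len(reads),
        # Maps read length -> number of reads with that length.
        "length_distribution": dict(Counter(len(s) for s in reads)),
        "percent_gc": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


# Usage with made-up reads.
print(basic_statistics(["ACGT", "GGCC", "AT"]))
```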

7 Quality scores
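As a sketch of how the quality symbols on line 4 of a Fastq record turn into numbers: assuming the common Phred+33 (Sanger) encoding, which the slides do not state explicitly, each character maps to a score Q = ASCII code − 33, and Q corresponds to an error probability of 10^(−Q/10). The function names below are hypothetical.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (offset 33 is an assumption)."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score: float) -> float:
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


def mean_quality_per_base(qualities: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, across reads of any length."""
    sums: List[int] = []
    counts: List[int] = []
    for quality in qualities:
        for position, score in enumerate(phred_scores(quality, offset)):
            if position == len(sums):
                sums.append(0)
                counts.append(0)
            sums[position] += score
            counts[position] += 1
    return [total / count for total, count in zip(sums, counts)]


# Usage with made-up quality strings.
print(phred_scores("!I"))
print(mean_quality_per_base(["II", "!I!"]))
```

Older Illumina pipelines used an offset of 64, so the `offset` parameter should be checked against the data source.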

8 Per base “N” percentage
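The per base “N” percentage is the share of reads with an uncalled base (N) at each position. A minimal Python sketch of that calculation follows; the function name `per_base_n_percentage` is hypothetical and this is not FastQC's own code.

```python
from typing import Iterable, List


def per_base_n_percentage(sequences: Iterable[str]) -> List[float]:
    """Percentage of reads that have an 'N' at each read position."""
    n_counts: List[int] = []
    totals: List[int] = []
    for sequence in sequences:
        for position, base in enumerate(sequence):
            if position == len(totals):
                n_counts.append(0)
                totals.append(0)
            totals[position] += 1
            if base in "Nn":
                n_counts[position] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


# Usage with made-up reads.
print(per_base_n_percentage(["ACGN", "NCGT"]))
```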

9 Sample FastQC reports
Good quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc/fastqc_report.html
Bad quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc/fastqc_report.html

10 Data format “Flowchart”: Sequencer → Fastq → SAM/BAM

11 SAM stands for Sequence Alignment/Map. BAM is the binary form of SAM. Both are used for mapped/aligned reads and are generated by NGS mappers/aligners.
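Because SAM is tab-delimited text, a single alignment line can be split into its 11 mandatory fields with plain Python. The sketch below is illustrative only; `SamAlignment` and `parse_sam_line` are hypothetical names, and the example line is made up. BAM is binary and compressed, so it cannot be read this way; a tool such as samtools or a library such as pysam would be needed.

```python
from typing import List, NamedTuple, Optional


class SamAlignment(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. "7M"
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # read quality string
    tags: List[str]  # optional fields, left unparsed here

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)   # flag bit 0x4: segment unmapped

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)  # flag bit 0x10: reverse strand


def parse_sam_line(line: str) -> Optional[SamAlignment]:
    """Parse one SAM line; header lines (starting with '@') return None."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


# Usage with a made-up alignment line.
line = "read1\t16\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
alignment = parse_sam_line(line)
print(alignment.rname, alignment.pos, alignment.is_reverse)
```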

12 SAM

13 BAM

