Garbage In, Garbage Out: Quality control on sequence data


1 Garbage In, Garbage Out: Quality control on sequence data

2 Key concepts of this session
The quality of the data limits what you can confidently say about it and how you can subsequently use it. An important component of quality control is visualization: you must actually LOOK at your data.

3 So you have reads off a sequencer … where do you start?
The FASTQ format:
More on the file format and quality encoding:
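
As a reference point for the format: each FASTQ record is four lines (an `@` header, the bases, a `+` separator, and a quality string with one character per base). Most current Illumina data encode each quality as the ASCII character for (Phred score + 33). Some older data used an offset of 64, so the `offset` default below is an assumption to check against your files. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly 1 wrong call in 100 and Q30 roughly 1 in 1,000. Here is a minimal Python sketch that parses unwrapped FASTQ and decodes qualities. The function names are ours and do not come from any QC tool:

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str  # read name, without the leading "@"
    seq: str     # bases
    qual: str    # one ASCII quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text with exactly four lines per record (no wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """A Phred score Q means the base call is wrong with probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


example = "@read1\nACGT\n+\nII#5\n"
for rec in read_fastq(io.StringIO(example)):
    print(rec.header, phred_scores(rec.qual))  # read1 [40, 40, 2, 20]
```

In practice you would usually rely on an existing parser, such as Biopython's, rather than hand-rolled code. This version only handles the simple four-lines-per-record layout.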

4 Expectation

5 But the reality may be very different

6 So what? Why does QC matter?
You are going to spend a LOT of time (and $) on this dataset, and downstream analysis software assumes reasonably well-behaved data!

7 How to assess a bag of reads
Pre-mapping (FastQC):
- GC content
- read quality (Phred score)
Post-mapping:
- read coverage (which regions, how much)
- complexity (number of unique reads)
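
Two of these metrics can be computed directly from the reads: GC content, and a rough proxy for complexity. The following is a minimal sketch that assumes the reads are plain sequence strings already loaded in memory. GC content is counted over A/C/G/T only. Complexity is approximated as the fraction of exactly distinct sequences, which is simpler than FastQC's own duplication module. Post-mapping complexity is usually judged from mapped positions instead. The function names are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def library_summary(seqs: List[str]) -> Dict[str, float]:
    """Crude pre-mapping summary: mean GC plus exact-duplicate statistics."""
    counts = Counter(seqs)
    n = len(seqs)
    return {
        "reads": n,
        "mean_gc": sum(gc_content(s) for s in seqs) / n if n else 0.0,
        # share of reads that are distinct sequences (low = low complexity)
        "distinct_fraction": len(counts) / n if n else 0.0,
        # share of all reads taken by the single most common sequence
        "top_read_share": counts.most_common(1)[0][1] / n if n else 0.0,
    }


reads = ["ACGTACGT", "ACGTACGT", "GGGCCCAA", "ATATATAT"]
print(library_summary(reads))
# {'reads': 4, 'mean_gc': 0.4375, 'distinct_fraction': 0.75, 'top_read_share': 0.5}
```

A low distinct fraction, or one sequence taking a large share of reads, points toward over-amplification or low library complexity. Keep in mind that very highly expressed transcripts in RNA-seq also produce legitimate duplicates.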

8 Protocol matters – how the experiment influences your QC
Mistakes in protocol can result in abnormal distributions.
Poor read quality = poor mapping = poor coverage.

9 WHY doesn’t it look like I wanted?
- Cell clustering – over-amplification
- Low library complexity
- Problems with amplification or size selection
- Problem with adapters
See also:

10 But one person’s garbage is another’s treasure.

11 You can still obtain information
Even low coverage samples can give you information:
- Which genes are being actively transcribed
- Differentially expressed genes (depending on depth and coverage)

12 Running FastQC – Pre-Trim
- Determine which adapters are present if you are unsure of the protocol
- Assess whether the sequencing/protocol is providing the expected results
- Refine trimming options
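
A quick way to see which adapters are present is to scan reads for short adapter probes, which is roughly the idea behind FastQC's adapter content check. The following is a minimal sketch. To our knowledge, the three 12-base probes are entries in FastQC's default adapter list, but treat them as assumptions and check them against your FastQC installation and your kit's documentation. The function name is hypothetical:

```python
from typing import Dict, List

# Assumed 12-base probes; verify against FastQC's adapter list and your kit.
ADAPTER_PROBES: Dict[str, str] = {
    "Illumina Universal": "AGATCGGAAGAG",
    "Nextera": "CTGTCTCTTATA",
    "Illumina Small RNA 3'": "TGGAATTCTCGG",
}


def adapter_hits(seqs: List[str],
                 probes: Dict[str, str] = ADAPTER_PROBES) -> Dict[str, float]:
    """Fraction of reads that contain each probe as an exact substring."""
    n = len(seqs)
    return {
        name: (sum(probe in s for s in seqs) / n if n else 0.0)
        for name, probe in probes.items()
    }


reads = [
    "TTGACCAGATCGGAAGAGCACACG",  # short insert: read runs into the adapter
    "GGCATTACGGATCCATGACCTTGA",  # no adapter
    "ACGTTGGAATTCTCGGGTGCCAAG",  # contains the small RNA probe
]
for name, frac in adapter_hits(reads).items():
    print(f"{name}: {frac:.2f}")
```

In this example, one of the three reads contains the Illumina Universal probe and one contains the small RNA probe (about 0.33 each). Exact-substring matching misses adapters that are only partly present at the very end of a read, which is one reason trimmers use more tolerant matching.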

13 In this script, we will:
- Flip reads (reverse complement) – protocol dependent
- Run FastQC
To run (after adjusting parameters in green box):
$ bash fastqc_pretrim.sh
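
Flipping a read means reverse-complementing its bases and reversing its quality string, so that each score stays attached to its base. Whether your reads need flipping depends on the library protocol. The script's own method is not shown here, so this Python version is only an illustration, and its function names are hypothetical:

```python
from typing import Tuple

# Complement table for upper-case DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA. Input is upper-cased; other symbols pass through unchanged."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flip_record(seq: str, qual: str) -> Tuple[str, str]:
    """Reverse-complement the bases and reverse the qualities in step."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must be the same length")
    return reverse_complement(seq), qual[::-1]


print(flip_record("AACGTTTG", "IIIIHH#5"))  # ('CAAACGTT', '5#HHIIII')
```

Applying the flip twice to an upper-case read returns the original read, which makes a convenient sanity check on any flipping step.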

14 Open up the FastQC .html report

15 Trimming
Many different trimming programs are available.
We will use “bbduk”: quick runtime, lots of trim options.
$ vi trim.sh

16 In this script, we will:
- Trim for adapters (followed by length)
- Trim for quality
To run (after adjusting rootname/project):
$ bash trim.sh
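
The adapter, length, and quality steps can be illustrated in a few lines. This is a deliberately simplified stand-in, not bbduk's algorithm. It cuts the read at the first exact match of one adapter probe, trims low-quality bases from the 3' end, and finally drops reads that end up too short. The adapter probe, Phred+33 encoding, Q20 cutoff, and 10-base minimum length are all illustrative assumptions, and the function names are hypothetical:

```python
from typing import Optional, Tuple


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter (and everything after)."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def trim_trailing_quality(seq: str, qual: str, min_q: int,
                          offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_read(seq: str, qual: str, adapter: str = "AGATCGGAAGAG",
              min_q: int = 20, min_len: int = 10) -> Optional[Tuple[str, str]]:
    """Adapter trim, then 3' quality trim, then length filter.

    Returns None when the read is discarded. All defaults are illustrative.
    """
    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_trailing_quality(seq, qual, min_q)
    if len(seq) < min_len:
        return None
    return seq, qual


read = "ACGTACGTACGT" + "AGATCGGAAGAG" + "CC"
quals = "IIIIIIIIII##" + "I" * 14
print(trim_read(read, quals))  # ('ACGTACGTAC', 'IIIIIIIIII')
```

In bbduk, these steps are controlled by options such as `ktrim`, `k` and `mink` (adapter k-mer trimming), `qtrim` and `trimq` (quality trimming), and `minlen` (length filter). Check the BBDuk documentation for their exact behavior rather than relying on this sketch.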

17 View trim stats
$ cd /home/user/hackcon/trimmed
$ ls
$ vim sample.stats
What can we learn from this report?
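
Whatever the report format, the first questions are how many reads and bases trimming removed, and whether that matches expectations for the protocol. The following is a minimal sketch that computes those headline numbers from reads before and after trimming. It does not parse bbduk's stats file, and the function name and inputs are hypothetical:

```python
from typing import Dict, List, Optional


def trim_summary(before: List[str], after: List[Optional[str]]) -> Dict[str, float]:
    """Compare reads before/after trimming, paired by position (None = read discarded)."""
    if len(before) != len(after):
        raise ValueError("before/after must be paired read-for-read")
    kept = [s for s in after if s is not None]
    bases_in = sum(len(s) for s in before)
    bases_out = sum(len(s) for s in kept)
    return {
        "reads_in": len(before),
        "reads_kept_pct": 100 * len(kept) / len(before) if before else 0.0,
        "bases_kept_pct": 100 * bases_out / bases_in if bases_in else 0.0,
        "mean_len_after": bases_out / len(kept) if kept else 0.0,
    }


print(trim_summary(["ACGTACGTAC"] * 4, ["ACGTACGTAC", "ACGTAC", None, "ACGTACGT"]))
# {'reads_in': 4, 'reads_kept_pct': 75.0, 'bases_kept_pct': 60.0, 'mean_len_after': 8.0}
```

Losing a large share of reads or bases is a cue to revisit the trimming settings, or the library itself, before mapping.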

18 Running FastQC – Post-Trim
- Determine which adapters are present if you are unsure of the protocol
- Assess whether the sequencing/protocol is providing the expected results
- Refine trimming options

19 In this script, we will:
- Assess our trimming parameters
- Determine if we need to re-trim or move forward with mapping
To run (after adjusting rootname/project):
$ bash fastqc_postrim.sh
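
The decision to re-trim or move on to mapping can be made more concrete with a simple rule. The following is a minimal sketch that assumes Phred+33 qualities. Its thresholds are modeled on FastQC's per-base sequence quality module as we understand it: warn if any position's median is below 25, and fail below 20. FastQC also checks the lower quartile, which this sketch omits. The function names are hypothetical:

```python
from statistics import median
from typing import List


def per_position_median(quals: List[str], offset: int = 33) -> List[float]:
    """Median Phred score at each position; reads may differ in length after trimming."""
    max_len = max((len(q) for q in quals), default=0)
    medians = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - offset for q in quals if len(q) > pos]
        medians.append(median(scores))
    return medians


def qc_verdict(quals: List[str], warn_below: float = 25,
               fail_below: float = 20) -> str:
    """Return 'fail', 'warn' or 'pass' based on the worst per-position median."""
    medians = per_position_median(quals)
    if not medians:
        return "fail"  # no reads survived trimming
    worst = min(medians)
    if worst < fail_below:
        return "fail"
    if worst < warn_below:
        return "warn"
    return "pass"


print(qc_verdict(["IIII5", "IIII5", "III"]))  # warn (last position median is 20)
```

A "warn" or "fail" after trimming suggests re-trimming with different settings, or a library problem that trimming cannot fix. A "pass" supports moving on to mapping.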

20 Open up the FastQC .html report

