Using command line tools to process sequencing data

1 Using command line tools to process sequencing data
Tapio Vuorenmaa, Krista Kokki

2 This hands-on session:
Part 1 – Bedtools
Part 2 – DEMO: Galaxy with the command line
Part 3 – Bedtools exercises

3 Part 1: Bedtools
Makes a wide range of genomics tasks easy
Command line tool
Easy to use, which makes it a perfect example
We focus on using the command line and its parameters, not on the tool itself
Our example command: "intersect"

4 Bedtools - intersect
"Do the features in my two sets overlap with each other?"
Files: "genes.bed" and "markers.bed" (in the exercises)
.bed format

5 .bed format?
Flexible way to define the data lines displayed in an annotation track
One of the file types the Genome Browser uses
Three required fields: chr, chr start, chr end
Additional fields: name, score, strand etc.
Our "data" is simplified
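To make the field layout concrete, here is a minimal sketch that writes a made-up three-line BED file and prints the length of each feature. The file name example.bed, the helper bed_lengths and the feature names are invented for illustration and are not the exercise data. The sketch relies on the BED convention that the start is 0-based and the end is exclusive, so a feature's length is simply end minus start.

```shell
# Write a tiny, made-up BED file: chr, start, end, name, score, strand (tab-separated).
printf 'chr1\t100\t200\tgeneA\t0\t+\n'  > example.bed
printf 'chr1\t150\t400\tgeneB\t0\t-\n' >> example.bed
printf 'chr2\t50\t75\tgeneC\t0\t+\n'   >> example.bed

# Print the name and length of every feature.
# Length is end - start because the BED end coordinate is exclusive.
bed_lengths() {
    awk -F '\t' '{ print $4 "\t" ($3 - $2) }' "$1"
}

bed_lengths example.bed
```

Only the first three columns are required, so a file holding just chr, start and end is also valid BED.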

6 Our "data"
[Figure: the example features on Chromosome 1 and Chromosome 2]

7 Back to intersect
Question: Do my ChIP-seq peaks overlap?
The basic command:
bedtools intersect -a genes.bed -b markers.bed > result.bed
  bedtools          Program
  intersect         Command
  -a genes.bed      First file
  -b markers.bed    Second file
  > result.bed      Redirect output (optional)
Optional parameters are added to the command before the redirect.

8 Optional parameters
-wa  Display the original feature (from the first file) for each overlap.
-u   Display each feature from the first file only once if it has any overlap, instead of once per overlap.
-s   Only display overlaps found on the same strand.
-c   Count the number of overlaps for each feature in the first file.
-v   Complement: display the features in the first file that do not overlap anything.
-S   Only display overlaps found on the opposite strand.
And many more. See your bedtools cheat sheet.
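What "overlap" means can be sketched without bedtools at all. The awk helper below is a stand-in written for this transcript, not how bedtools is implemented: it ignores strand, compares every pair of features, and assumes half-open BED coordinates, where two features on the same chromosome overlap when startA < endB and startB < endA. The file names a_demo.bed and b_demo.bed, the function count_overlaps and all coordinates are made up.

```shell
# Made-up inputs (not the exercise files).
printf 'chr1\t100\t200\tgeneA\nchr1\t500\t600\tgeneB\nchr2\t10\t50\tgeneC\n' > a_demo.bed
printf 'chr1\t150\t160\tm1\nchr1\t190\t250\tm2\nchr2\t50\t60\tm3\n' > b_demo.bed

# For every feature in file A ($1), count the features in file B ($2) that overlap it.
count_overlaps() {
    awk -F '\t' -v OFS='\t' '
        # First file read (B): remember chromosome, start and end of each feature.
        NR == FNR { chr[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
        # Second file read (A): test the feature against every B feature.
        {
            hits = 0
            for (i = 1; i <= n; i++)
                if ($1 == chr[i] && $2 + 0 < e[i] && s[i] < $3 + 0) hits++
            print $0, hits
        }
    ' "$2" "$1"
}

count_overlaps a_demo.bed b_demo.bed                            # roughly what -c reports
count_overlaps a_demo.bed b_demo.bed | awk -F '\t' '$NF > 0'    # features kept by -u
count_overlaps a_demo.bed b_demo.bed | awk -F '\t' '$NF == 0'   # features kept by -v
```

In this made-up data geneC ends at 50 and m3 starts at 50, so the two touch but do not overlap. With bedtools installed, the equivalent real commands would be bedtools intersect with -c, -u or -v added after the -a and -b arguments.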

9 That's not all there is to it
There are a number of other things you can do with bedtools, such as:
Coverage
Merge
Cluster

10 Part 2: DEMO: Galaxy with the command line
Step 1. Convert the .sra file into fastq
fastq-dump NAME.sra --offset 33
  --offset 33    Quality conversion (to get the quality score from the ASCII code)
Output: .fastq file
Step 2. Quality report of your data
fastqc NAME.fastq
Output: .html file
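The quality conversion can be illustrated with one character at a time. The sketch below assumes the +33 offset used in Step 1: each quality character in the fastq file stands for the score "ASCII code minus 33". The helper name phred33 is invented for this example.

```shell
# Convert one FASTQ quality character to its quality score, assuming the +33 offset.
phred33() {
    code=$(printf '%d' "'$1")   # numeric ASCII code of the character
    echo $((code - 33))
}

phred33 'I'   # ASCII 73 -> score 40
phred33 '+'   # ASCII 43 -> score 10
```

A score of 10 corresponds to the -q 10 threshold used later in the quality filtering step.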

11 DEMO: Galaxy with the command line
Step 3. Trimming (optional)
fastx_trimmer -i NAME.fastq -o TRIMMED_NAME.fastq -f 1 -l 50
  -i NAME.fastq            Input file
  -o TRIMMED_NAME.fastq    Output file
  -f 1                     First base to keep
  -l 50                    Last base to keep

12 DEMO: Galaxy with the command line
Step 4. Quality filtering
fastq_quality_filter -i TRIMMED_NAME.fastq -o QFILT_NAME.fastq -q 10 -p 100
  -i TRIMMED_NAME.fastq    Input file
  -o QFILT_NAME.fastq      Output file
  -q 10                    Minimum quality score to keep
  -p 100                   Minimum % of bases that must have -q quality
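How -q and -p work together can be sketched in awk. This is a simplified stand-in written for this transcript, not the fastq_quality_filter implementation: it assumes four-line fastq records and the +33 quality offset, and the names demo.fastq and quality_filter_sketch are made up.

```shell
# Made-up two-read fastq file; quality characters use the +33 offset.
# 'I' is score 40, '#' is score 2.
printf '@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII#I\n' > demo.fastq

# Usage: quality_filter_sketch FILE Q P
# Keep a read only if at least P percent of its bases have quality >= Q.
quality_filter_sketch() {
    awk -v q="$2" -v p="$3" '
        # Lookup table from printable character to ASCII code.
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 1 { head = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length($0); good = 0
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - 33 >= q) good++
            if (n > 0 && good * 100 >= p * n)
                print head "\n" seq "\n" plus "\n" $0
        }
    ' "$1"
}

quality_filter_sketch demo.fastq 10 100   # read2 has one low-quality base and is dropped
```

With -p 100 a single base below the threshold removes the whole read; lowering the percentage, for example to 75, would keep read2 in this made-up file.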

13 DEMO: Galaxy with the command line
Step 5. Removing quality information (fastq to fasta; identical reads are collapsed into one)
fastx_collapser -i QFILT_NAME.fastq -o COLPS_NAME.fasta
  -i QFILT_NAME.fastq    Input file
  -o COLPS_NAME.fasta    Output file

14 DEMO: Galaxy with the command line
Step 6. Mapping
bowtie //hg19 -f COLPS_NAME.fasta --best -v 2 -m 3 -k 1 -S output.sam
  //hg19                 Genome
  -f COLPS_NAME.fasta    Query input files (f=fasta)
  --best                 Result in best-to-worst order
  -v 2                   No more than 2 mismatches
  -m 3                   Not reporting alignments for reads having more than 3 reportable alignments
  -k 1                   Report up to 1 valid alignment
  -S                     Output in sam format
Output: .sam

15 DEMO: Galaxy with the command line
Finally: Samtools
The Genome Browser doesn't use the .sam format (the output from mapping), so .sam must be converted to .bam.
samtools view -bS NAME.sam > NAME.bam
  -bS    Sam to bam (when the .sam file has a header)
Output: .bam
Now you can visualize the result in the Genome Browser.

16 So, why the command line?
Remote access: You can easily access and operate other computers, such as computing servers, using the command line.
Speed: Programs are (usually) controlled by parameters. Once you learn these commands and parameters, they are very quick to use.
Control: Pipelining and redirection let you perform powerful tasks with a single line of commands.
Automation: With scripting you can create sequences of program tasks that execute automatically without further user interaction.
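As an illustration of the automation point, the demo steps from Part 2 could be chained in one small script. This is only a sketch under stated assumptions: NAME and INDEX are placeholders to be set for your own data and bowtie index, the run and pipeline helpers are invented here, the bowtie -S flag is an assumption based on the "Output in sam format" note, and samtools view is given -o instead of a ">" redirect so that every step fits the same helper. By default the script only prints the commands (DRY_RUN=1), so it can be read and tried without any of the tools installed.

```shell
#!/bin/sh
# Sketch: run the demo steps for one sample, one after another.
NAME=${NAME:-NAME}       # placeholder sample name (expects NAME.sra)
INDEX=${INDEX:-hg19}     # placeholder path to a bowtie genome index
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = actually run them

# Print a command and, unless this is a dry run, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || exit 1   # stop at the first failing step
    fi
}

pipeline() {
    run fastq-dump "$NAME.sra" --offset 33
    run fastqc "$NAME.fastq"
    run fastx_trimmer -i "$NAME.fastq" -o "TRIMMED_$NAME.fastq" -f 1 -l 50
    run fastq_quality_filter -i "TRIMMED_$NAME.fastq" -o "QFILT_$NAME.fastq" -q 10 -p 100
    run fastx_collapser -i "QFILT_$NAME.fastq" -o "COLPS_$NAME.fasta"
    # -S (sam output) is an assumption, see the note above.
    run bowtie "$INDEX" -f "COLPS_$NAME.fasta" --best -v 2 -m 3 -k 1 -S "$NAME.sam"
    run samtools view -bS -o "$NAME.bam" "$NAME.sam"
}

pipeline
```

Saved as a file, the script could be started for a real sample with something like NAME=mysample INDEX=/path/to/index DRY_RUN=0 sh pipeline.sh, where the sample name, index path and script name are again hypothetical.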

17 Don’t be scared of the command line!

