NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools March 15 th, 2012 BioSci room B9242 Facilitator: Richard.

1 NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools
March 15th, 2012, BioSci room B9242
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB

2 Learning Objectives
- Linux revisited
- A quick dive into the Open-Bio pool (BioPython)
- A first look at NGS data:
  - the NCBI Short Read Archive
  - processing NGS data: the FASTX toolkit et al.
  - visualization: IGV

3 Files and Permissions
- Linux assigns permissions to three classes of user: owner, group, and others.
- The owner (user) is the person who created the file; they "own" the file or directory.
- The group is a set of users associated together, e.g. a project team.
- Others are all remaining users on the server.
- Each file or directory can have its permissions set to (r)ead, (w)rite, or e(x)ecute for each class.

4 Do a long listing (ls -l)
- A mode string such as dr-x-wxrw- is separated into four sections: (d)(r-x)(-wx)(rw-)
  - d: directory, or - for a regular file
  - r-x: user (owner) permissions
  - -wx: group permissions
  - rw-: permissions for others
- chmod: change file permissions. Examples:
  - chmod o+x foo.txt grants 'execute' permission to 'others' on foo.txt
  - chmod g-rw foo.txt removes 'read' and 'write' permission from the group
  - chmod ugo+rwx foo.txt grants all rights to everyone
- To change the user/group ('owner') of a file, use chown rather than chmod: chown ubuntu:ubuntu foo.txt
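To make the four sections of a mode string concrete, here is a small Python sketch using the standard `stat` and `os` modules. The helper names `describe_mode` and `chmod_others_plus_x` are hypothetical illustrations, not part of any tool; the second one is only a rough equivalent of `chmod o+x`.

```python
import os
import stat
import tempfile


def describe_mode(mode: int) -> dict:
    """Split an `ls -l` style mode string into its four sections."""
    text = stat.filemode(mode)  # e.g. 'dr-x-wxrw-'
    return {
        "type": text[0],        # 'd' = directory, '-' = regular file
        "user": text[1:4],      # owner permissions
        "group": text[4:7],     # group permissions
        "others": text[7:10],   # everyone else
    }


def chmod_others_plus_x(path: str) -> None:
    """Rough Python equivalent of `chmod o+x path`: add execute for others only."""
    current = stat.S_IMODE(os.stat(path).st_mode)  # keep the existing permission bits
    os.chmod(path, current | stat.S_IXOTH)


# The slide's example: a directory whose bits are r-x (user), -wx (group), rw- (others).
print(describe_mode(stat.S_IFDIR | 0o536))
# {'type': 'd', 'user': 'r-x', 'group': '-wx', 'others': 'rw-'}

# Apply "o+x" to a scratch file that starts as rw-r--r-- (0o644).
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o644)
chmod_others_plus_x(path)
print(stat.filemode(os.stat(path).st_mode))  # -rw-r--r-x on Linux
os.remove(path)
```

The octal digits map one-to-one onto the three permission triplets (r=4, w=2, x=1), which is why `0o536` prints as `r-x`, `-wx`, `rw-`.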

5 A few useful tips
- Hitting Tab auto-completes file or program names (or suggests possible names).
- The up arrow returns you to previous commands.
- Editing text files: "nano" is an easier alternative to "emacs", but less powerful.
- Alternatively, use an SSH client to transfer files to your Windows desktop, edit them in Windows, then transfer them back.
- BUT: make sure you use a text editor that knows the difference between Windows and Linux text files (e.g. Notepad++): Windows ends each line with CR+LF, Linux with LF only.
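If a file edited on Windows has already picked up CR+LF line endings, a minimal Python sketch like the one below can convert it back. The function names are hypothetical; the file is read and written in binary mode so that Python itself does not translate newlines.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Convert Windows (CR+LF) line endings to Linux (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file_in_place(path: str) -> None:
    """Rewrite a text file with Linux line endings."""
    with open(path, "rb") as fh:  # binary mode: no newline translation
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(dos_to_unix(data))


print(dos_to_unix(b"ACGT\r\nTTGA\r\n"))  # b'ACGT\nTTGA\n'
```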

6 Some more useful basic Linux commands
- "cd" changes your directory, e.g. 'cd /usr/local'
- "man" displays the manual for a command, e.g. 'man ls'
- "pwd" tells you the directory you are currently in (the working directory)
- "history" lists recent commands, numbered by line. By typing an exclamation point followed by the line number (e.g. !123), you can rerun that command.

7 Accessing remote servers
- "ssh": Secure Shell
  ssh -i private_keypair user@host
- "scp": Secure CoPy
  scp -i private_keypair file user@host:path
- where user is the account (default: the local user) and host is the Internet name of the computer (default: the local host)
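When these commands are scripted, it helps to build them as argument lists rather than one string. This is a minimal sketch under stated assumptions: the helper names, the `ubuntu` account, the `server.example.org` host and the paths are all hypothetical, and only the `-i` (identity file) option of `ssh` and `scp` is used.

```python
import shlex
import subprocess
from typing import List


def ssh_command(keyfile: str, user: str, host: str) -> List[str]:
    """Build `ssh -i keyfile user@host` as an argument list."""
    return ["ssh", "-i", keyfile, f"{user}@{host}"]


def scp_upload_command(keyfile: str, local_path: str, user: str,
                       host: str, remote_path: str) -> List[str]:
    """Build `scp -i keyfile local_path user@host:remote_path`."""
    return ["scp", "-i", keyfile, local_path, f"{user}@{host}:{remote_path}"]


def run(cmd: List[str]) -> None:
    """Execute the command; raises CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


# Hypothetical account, host and paths, for illustration only.
cmd = scp_upload_command("private_keypair", "reads.fastq",
                         "ubuntu", "server.example.org", "/home/ubuntu/")
print(shlex.join(cmd))
# scp -i private_keypair reads.fastq ubuntu@server.example.org:/home/ubuntu/
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.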

8 OpenBio Case Study: BioPython

9 FIRST LOOK AT NGS DATA

11 Linux, MacOSX or Unix only

12 Get the precompiled binary
wget fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar
sudo mv bin/* /usr/local/bin
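The `bunzip2` and `tar -xvf` steps can also be done in one go from Python with the standard `tarfile` module. This is a sketch, assuming the archive has already been downloaded; `extract_tar_bz2` is a hypothetical helper name. Moving `bin/*` into `/usr/local/bin` still needs `sudo`.

```python
import tarfile
from typing import List


def extract_tar_bz2(archive: str, dest: str = ".") -> List[str]:
    """Decompress and unpack a .tar.bz2 in one step (bunzip2 + tar -xvf)."""
    # Use the safer "data" extraction filter where this Python supports it
    # (it rejects absolute paths and members that would escape `dest`).
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()  # the member list that `tar -v` would print
        tar.extractall(path=dest, **extra)
    return names


# Usage, assuming the archive is in the current directory:
# for name in extract_tar_bz2("fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2"):
#     print(name)
```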

13 FASTX tool kit I
- FASTQ-to-FASTA converter: converts FASTQ files to FASTA files.
- FASTQ Information: charts quality statistics and nucleotide distribution.
- FASTQ/A Collapser: collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts).
- FASTQ/A Trimmer: shortens reads in a FASTQ or FASTA file (removing barcodes or noise).
- FASTQ/A Renamer: renames the sequence identifiers in a FASTQ/A file.
- FASTQ/A Clipper: removes sequencing adapters / linkers.
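To show what a few of these operations do to the data, here is a pure-Python sketch of FASTQ parsing, FASTQ-to-FASTA conversion, collapsing and fixed-position trimming. It is not the FASTX toolkit's implementation: it assumes unwrapped 4-line FASTQ records, and the `>rank-count` header used for collapsed sequences is this sketch's own convention.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional


class Read(NamedTuple):
    name: str  # identifier without the leading '@'
    seq: str
    qual: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from 4-line FASTQ records (wrapped sequences not supported)."""
    it = (line.rstrip("\r\n") for line in lines)
    # zip() on the same iterator groups consecutive lines four at a time;
    # an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header[1:], seq, qual)


def fastq_to_fasta(reads: Iterable[Read]) -> str:
    """FASTQ -> FASTA: keep the name and sequence, drop the qualities."""
    return "".join(f">{r.name}\n{r.seq}\n" for r in reads)


def collapse_to_fasta(reads: Iterable[Read]) -> str:
    """Merge identical sequences, keeping a read count in each header."""
    counts = Counter(r.seq for r in reads)
    # Header convention used here: >rank-count, most abundant first.
    return "".join(f">{rank}-{n}\n{seq}\n"
                   for rank, (seq, n) in enumerate(counts.most_common(), start=1))


def trim(seq: str, first: int = 1, last: Optional[int] = None) -> str:
    """Keep bases first..last (1-based, inclusive), e.g. to drop a barcode."""
    return seq[first - 1:last]


sample = "@r1\nACGTAC\n+\nIIIIII\n@r2\nACGTAC\n+\nIIIII#\n@r3\nTTGCAA\n+\nIIIIII\n"
reads = list(parse_fastq(sample.splitlines()))
print(fastq_to_fasta(reads), end="")
print(collapse_to_fasta(reads), end="")
print(trim("ACGTACGT", first=3))  # GTACGT
```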

14 FASTX tool kit II
- FASTQ/A Reverse-Complement: produces the reverse complement of each sequence in a FASTQ/FASTA file.
- FASTQ/A Barcode splitter: splits FASTQ/FASTA files containing multiple samples.
- FASTA Formatter: changes the width of sequence lines in a FASTA file.
- FASTA Nucleotide Changer: converts FASTA sequences from/to RNA/DNA.
- FASTQ Quality Filter: filters sequences based on quality.
- FASTQ Quality Trimmer: trims (cuts) sequences based on quality.
- FASTQ Masker: masks nucleotides with 'N' (or another character) based on quality.
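The sequence- and quality-based operations above can be sketched in a few lines of Python. This illustrates the ideas, not the FASTX tools' exact behaviour. It assumes Phred+33 (Sanger) quality encoding; older Illumina data used an offset of 64, so the offset is a parameter. All function names are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dna_to_rna(seq: str) -> str:
    return seq.replace("T", "U").replace("t", "u")


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Re-wrap a sequence to `width` characters per line."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def phred(qual: str, offset: int = 33) -> List[int]:
    """Decode quality characters; offset 33 assumes Phred+33 encoding."""
    return [ord(c) - offset for c in qual]


def passes_quality_filter(qual: str, min_q: int, min_percent: float,
                          offset: int = 33) -> bool:
    """Keep a read if at least min_percent % of its bases have quality >= min_q."""
    scores = phred(qual, offset)
    good = sum(q >= min_q for q in scores)
    return 100 * good >= min_percent * len(scores)


def quality_trim_3prime(seq: str, qual: str, min_q: int,
                        offset: int = 33) -> Tuple[str, str]:
    """Cut bases with quality < min_q off the 3' end of the read."""
    scores = phred(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def mask_low_quality(seq: str, qual: str, min_q: int, mask: str = "N",
                     offset: int = 33) -> str:
    """Replace every base with quality < min_q by `mask`."""
    return "".join(b if q >= min_q else mask
                   for b, q in zip(seq, phred(qual, offset)))


seq, qual = "ACGTAC", "IIII##"  # 'I' = Q40, '#' = Q2 under Phred+33
print(reverse_complement(seq))              # GTACGT
print(dna_to_rna(seq))                      # ACGUAC
print(passes_quality_filter(qual, 20, 80))  # False (only 4 of 6 bases >= Q20)
print(quality_trim_3prime(seq, qual, 20))   # ('ACGT', 'IIII')
print(mask_low_quality(seq, qual, 20))      # ACGTNN
```

Trimming removes the low-quality tail and shortens the read, while masking keeps the read length and only hides the unreliable bases; which one fits depends on whether downstream tools expect fixed-length reads.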

16 Integrative Genomics Viewer

