Trinity College Dublin, The University of Dublin GE3M25: Data Analysis Karsten Hokamp, PhD Genetics TCD, 16/11/2015.

Presentation transcript:

GE3M25 Data Handling: Module Content
Python Programming
Bioinformatics
ChIP-Seq analysis
Evaluation:
1. Weekly tasks (50%)
2. Project report (50%)

GE3M25 Grading Weight
Python Programming, Bioinformatics, ChIP-Seq analysis, Statistics
Weights: 2/3, 1/3

Class 1: NGS Basics
What is Next-Generation Sequencing?
What types of data are produced?
How to get access to NGS data?
Investigating the data

Next Generation Sequencing
Technologies that parallelize the sequencing process, producing thousands or millions of sequences → massive impact on genomics
Massively parallel signature sequencing (MPSS)
Polony sequencing
454 pyrosequencing
Illumina (Solexa) sequencing
SOLiD sequencing
Ion Torrent semiconductor sequencing
DNA nanoball sequencing
Heliscope single molecule sequencing
Single molecule real time (SMRT) sequencing
NGS took sequencing runs from 84 kilobases (kb) per run to 1 gigabase (Gb) per run!

Illumina Sequencing
The flow cell is coated with a lawn of oligos complementary to the adapter sequence

Illumina Sequencing – Flow Cell
8 lanes with 120 tiles each (GAIIx)

YouTube Videos
Illumina Solexa Sequencing (< 2 minutes):
Illumina (5 minutes):

Next Generation Sequencing – Applications
Xu F, Wang Q, Zhang F, Zhu Y, Gu Q, Wu L, Yang L, Yang X. Impact of Next-Generation Sequencing (NGS) technology on cardiovascular disease research. Cardiovasc Diagn Ther 2012;2(2):

Example data set
PubMed search: atlas of small regulatory rnas in salmonella
EMBO J Oct 17;31(20): doi: /emboj Epub 2012 Aug 24.
An atlas of Hfq-bound transcripts reveals 3' UTRs as a genomic reservoir of regulatory small RNAs.
Chao Y, Papenfort K, Reinhardt R, Sharma CM, Vogel J.
Related information → GEO DataSets
Hfq-coIP overgrowth, 14 samples
Test sample: Hfq coIP OD2+6h, SRA SRX155645

Example data set
Download from local server:
The Sequence Read Archive (SRA, formerly the Short Read Archive) stores data in a highly compressed format (file names ending in .sra)

Working on the Command Line
NGS files are generally too big to open in TextEdit, Word or Excel!
We can access small parts of the data files on the command line.

Working on the Command Line
Start: open 'Terminal' from Spotlight or the Dock

Working on the Command Line – Terminal
Title bar, prompt, cursor

Working on the Command Line – the Prompt
user, host, directory, symbol

Working on the Command Line – Orientation
command, output
pwd = print working directory

File Hierarchy
/ (root)
    Applications
    Library
    Users
        kahokamp
            Desktop
            Documents
            Library
            Movies
            Music
    bin
    tmp

Working on the Command Line – Orientation
/ = root directory and directory separator
~ = shortcut for the home directory

Working on the Command Line – File Listing
ls = list directory contents

Working on the Command Line – File Listing
ls -l → long directory listing

Working on the Command Line – File Listing
Columns of the long listing, from left to right: type, permissions, link count, owner, group, size, last modification date, name

Working on the Command Line – File Listing
r = read permission
w = write permission
x = execute permission
- = forbidden
Permissions come in triplets for owner, group and everyone else, e.g. rwxr-xr-x
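
The permission triplets can be tried out on a scratch file. This is only a minimal sketch, assuming a POSIX shell; the file name demo.sh and the temporary directory are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/demo.sh"

# 755 = rwx for the owner, r-x for the group, r-x for everyone else.
chmod 755 "$workdir/demo.sh"

# The first 10 characters of 'ls -l' are the type flag followed by the three triplets.
perms=$(ls -l "$workdir/demo.sh" | cut -c1-10)
echo "$perms"

rm -rf "$workdir"
```

The leading character of the printed string is the file type (- for a regular file, d for a directory); the nine characters after it are the owner, group and everyone-else triplets.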

Working on the Command Line – File Listing
Parameter(s), argument(s)

Working on the Command Line – Manual
man = manual page for a program

Working on the Command Line – Manual
space for next page
h for help
q to quit

Working on the Command Line – Moving Around
Some examples of using cd (change directory):
cd Downloads
cd → change into home directory
cd ~/Downloads
cd .. → change into parent directory
cd - → change into previous directory
Try combining these with 'pwd' to get your bearings!
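
The cd variants above can be rehearsed safely in a practice directory. A minimal sketch, assuming a POSIX shell; the temporary directory and its Downloads folder are stand-ins created only for this example, not the real home directory.

```shell
# Build a small practice tree (hypothetical) instead of using the real home directory.
base=$(mktemp -d)
mkdir -p "$base/Downloads"

cd "$base"          # start in the practice directory
cd Downloads        # relative path: go down one level
pwd                 # prints a path ending in /Downloads
cd ..               # go up to the parent directory
cd Downloads        # and down again
cd - > /dev/null    # back to the previous directory (cd - also prints it)
here=$(pwd)
echo "$here"
```

Running pwd after every cd, as the slide suggests, is the quickest way to see what each form does.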

Working on the Command Line – Short-cuts
Automatic completion with the Tab key:
cd <Tab>
cd D<Tab> → shows possible completions
cd Dow<Tab> → completes to 'Downloads'
Shortens work and saves you from typos!

Working on the Command Line – Examining Data
Download 'fastq-dump' to extract data

Working on the Command Line – Extracting Data
Make the tool executable by adding the x-bit

Working on the Command Line – Extracting Data
. designates the current directory
./fastq-dump → execute the tool from the current directory
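
Adding the x-bit and running a tool from the current directory can be sketched with a stand-in script. The name hello-tool is hypothetical and has nothing to do with the SRA toolkit; it only takes the place of a freshly downloaded program.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in tool (hypothetical): a two-line shell script.
printf '#!/bin/sh\necho "tool ran"\n' > hello-tool

chmod +x hello-tool       # add the x-bit so the file may be executed
result=$(./hello-tool)    # ./ tells the shell to look in the current directory
echo "$result"
```

Without the leading ./ the shell would search only the directories listed in PATH, and the current directory is usually not among them.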

Working on the Command Line – Examining Data
head pulls out the first 10 lines of a file

Working on the Command Line – Examining Data
head is a system tool
Try the following for its location and the search paths:
which head
echo $PATH
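
A small sketch of the same lookup, assuming a POSIX shell. It uses command -v, the shell's built-in counterpart of which, and prints the search path one directory per line for readability.

```shell
# Where does the shell find 'head'?
head_path=$(command -v head)
echo "$head_path"

# PATH is a colon-separated list of directories; print one per line.
echo "$PATH" | tr ':' '\n'
```

The directory that contains head should appear somewhere in the PATH listing, which is why typing just head is enough to run it.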

The FastQ Format
Similar to Fasta but includes quality data

The FastQ Format
Each record has four lines:
1. @ sign followed by the header
2. sequence string
3. + sign, optionally followed by the header
4. quality string
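
The four-line layout can be checked on a single record. This is a sketch with a made-up record: the read name read1 is hypothetical and the bases are for illustration only.

```shell
# One made-up FASTQ record written to a scratch file.
fastq=$(mktemp)
printf '%s\n' '@read1' 'CTTTTAGCGCACGGCT' '+' 'IIIIIIIIIIIIIIII' > "$fastq"

header=$(sed -n '1p' "$fastq")      # line 1: @ sign plus header
sequence=$(sed -n '2p' "$fastq")    # line 2: sequence string
quality=$(sed -n '4p' "$fastq")     # line 4: quality string

# Every base needs exactly one quality character.
echo "${#sequence} bases, ${#quality} quality characters"
rm -f "$fastq"
```

If the two lengths differ, the record is malformed.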

The FastQ Format
Header labels: SRA, @ sign, sequencer ID, lane, tile, x, y coordinates

Quality Information
Indication of the probability that a base call is correct
Base call   Quality
C           I
T           I
…
A           #
N           #

Quality Information
Conversion of probabilities into quality scores: the Phred quality score, Q = -10 × log10(P), where P is the probability that the base call is wrong

Quality Information
Conversion of probabilities into quality scores: prob, qual
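
The conversion between probability and Phred score can be tried with awk. A minimal sketch, assuming the standard Phred definition Q = -10 × log10(P) with P the error probability; the function names prob_to_qual and qual_to_prob are made up for this example.

```shell
# Phred score from an error probability: Q = -10 * log10(P).
# awk only has the natural logarithm, so log10(P) = log(P) / log(10).
prob_to_qual() {
    awk -v p="$1" 'BEGIN { printf "%.0f\n", -10 * log(p) / log(10) }'
}

# And back again: P = 10^(-Q/10).
qual_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

prob_to_qual 0.001    # 1 error in 1000 base calls
qual_to_prob 20       # quality 20
```

Each step of 10 in the quality score corresponds to a tenfold drop in the error probability.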

Quality Information
Alignment with quality score:
C T T T T A G C G C A C G G C T … A A N
Problem: a score of length 2 ≠ a base of length 1

Quality Information
Conversion of quality score:
C T T T T A G C G C A C G G C T … A A N
ASCII code is the numerical representation of a character

Quality Information
Conversion of quality score:
CTTTTAGCGCACGGCT … AAN
IIIIIIIIIIIIIIII … ###
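
Turning a quality character back into a number can be sketched as follows. The offset of 33 (Phred+33) is an assumption made here; the slides do not state it, and older Illumina files use an offset of 64. The function name char_to_qual is made up for this example.

```shell
# Quality score of one FASTQ quality character, assuming the Phred+33 offset.
char_to_qual() {
    code=$(printf '%d' "'$1")    # a leading quote makes printf return the character code
    echo $((code - 33))
}

char_to_qual I      # ASCII 73 → 40
char_to_qual '#'    # ASCII 35 → 2
```

Under this assumption a run of I characters marks high-quality calls, while # marks calls that are almost certainly unreliable.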

Working on the Command Line – Examining Data
View content page by page with 'less'

Working on the Command Line – Examining Data
space for next screen
h for help
q to quit

Working on the Command Line – Examining Data
G to go to bottom
g to go to top
-N to turn on line numbering
/ to search forward
? to search backwards
n for next hit

Working on the Command Line – Short-cuts
Cycle through previous commands using the 'up' and 'down' arrows
Use 'left' and 'right' to move the cursor and modify the command
Hit 'return' to execute the command

Working on the Command Line – Short-cuts
Access individual elements from the previous command:
!! = previous command
!!:0 = first element (less)
!!:1 = second element (SRR fastq)

Working on the Command Line – Short-cuts
Access individual elements from the previous command: number of lines in the file

Working on the Command Line – Short-cuts
Use the calculator to divide the line number by 4:

Working on the Command Line – Short-cuts
Confirm with grep:
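
The counting steps can be sketched on a tiny stand-in file. The two records and their read names are made up, and the division uses the shell's built-in arithmetic in place of a separate calculator such as bc.

```shell
# Two made-up FASTQ records stand in for the real file.
fastq=$(mktemp)
printf '%s\n' '@read1' 'CTTTTAGC' '+' 'IIIIIIII' \
              '@read2' 'GCACGGCT' '+' 'IIII####' > "$fastq"

lines=$(wc -l < "$fastq")    # number of lines in the file
reads=$((lines / 4))         # four lines per record
echo "$reads"

# Cross-check by counting header lines. Matching the full prefix '@read' is safer
# than a bare '@', because a quality string may itself begin with '@'.
headers=$(grep -c '^@read' "$fastq")
echo "$headers"
rm -f "$fastq"
```

On real data the prefix to match would be whatever the header lines share, which depends on the file.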

Working on the Command Line – Short-cuts
Repeat previous commands:
history = brings up list of commands
!# = repeats command # (e.g. !103)

Working on the Command Line – Short-cuts
More short-cuts:
ctrl-a to get to beginning of line
ctrl-e to get to end of line
ctrl-r to search back in history
ctrl-d to delete the next character
esc-d to delete the next word

Insight so far
A lot of poly-A tails and low-quality sequences at the 3' end!
→ Get a more comprehensive overview of data quality

Quality Control with FastQC
Download and install FastQC

Quality Control with FastQC
File → Open → SRR fastq

Working on the Command Line – Hard-trimming
Download UrQt:

Working on the Command Line – Hard-trimming
Run the program without arguments for help

Working on the Command Line – Monitoring
Monitor resource usage with Activity Monitor

Working on the Command Line – Short-cuts
Make terminal bigger (green dot in title bar)!
Switch between applications: -

Quality Control with FastQC
File → Open → SRR qtrim.fastq

Quality Control with FastQC
Before and after trimming:

Before trimming: loads of A's towards the end

After trimming: some bias left but not polyA

Working on the Command Line – Hard-trimming
Other options to consider:
--min_read_size
--phred 64
--t 28

Exercises
- Try other options for trimming with UrQt
- Carry out FastQC of trimmed data
- Find other online data sets
- Download with fastq-dump SRRXXXXX
- Run FastQC and UrQt

Achievements so far
Learnt about NGS
Browsed GEO archive for public data sets
Downloaded and unpacked SRA file
Worked on the UNIX command line
Learnt commands wc, less, bc
Practiced command line short-cuts
Carried out QC on sequence file
Hard-trimmed bad quality base-calls and polyA tails

Don't forget to log out!