DAY 1. GENERAL ASPECTS FOR GENETIC MAP CONSTRUCTION SANGREA SHIM.



INDEX
- Day 1
- General aspects for genetic map construction
- Genetic polymorphism and recombination frequency
- Genotyping using molecular markers
- Map construction (phenotype, AFLP, RFLP)
- Sequencing method
- Next generation sequencing
- Whole genome reference sequence
- Resequencing for genotyping
- Retrieving sequence polymorphism
- Genetic map construction (SNP, InDel)

GENETIC POLYMORPHISM & RECOMBINATION FREQUENCY

GENOTYPING USING MOLECULAR MARKER
"An Integrated High-density Linkage Map of Soybean with RFLP, SSR, STS, and AFLP Markers Using A Single F2 Population", Xia et al. 2008

MAP CONSTRUCTION
"An Integrated High-density Linkage Map of Soybean with RFLP, SSR, STS, and AFLP Markers Using A Single F2 Population", Xia et al. 2008

NEXT GENERATION SEQUENCING
- Sequencing
  - Sanger’s dideoxy chain termination
    - Uses ddNTPs (chain terminators) alongside dNTPs
    - Electrophoresis in a capillary gel
    - Dye colors are read one by one
    - Average read length 700~900 bp
  - Massively parallel sequencing platforms
    - So-called next generation sequencing (NGS) platforms
    - SOLiD (sequencing by ligation), Illumina (sequencing by synthesis), 454 (pyrosequencing)
    - Read length, in the same platform order: 50+35 (50+50), 50~300, 700 bp
    - Reads per run, in the same platform order: 1200~1300, ~3000, 1 million

NEXT GENERATION SEQUENCING
"Sequencing technologies – the next generation", Metzker, Nature Reviews Genetics 2010

WHOLE GENOME REFERENCE SEQUENCE
- Polymorphisms are discovered by comparison
- A reference is required for comparison
- So a reference genome is mandatory
  - Build contigs, which consist of unique sequences, using paired-end (PE) or small-insert mate-pair (MP) libraries
  - Build scaffolds, which also include less unique (i.e. repetitive) sequences, using large-insert MP libraries
  - Anchor the scaffolds using a genetic map
- However, a genetic map built from several types of molecular markers cannot be translated directly into sequence information

RESEQUENCING FOR GENOTYPING
- Get polymorphisms, and treat each one as a marker or locus
  - SNPs
  - Small InDels
- Align raw reads, sequenced to several-fold depth, against the reference
  - Statistics
- Many alignment programs are available
  - BLAST, BLAT, BWA, the Bowtie series, ...
- Aligners that use the Burrows-Wheeler transform (BWT) as their main algorithm are popular
  - Fast, efficient

RESEQUENCING FLOW CHART
DNA/RNA -> NGS platform -> raw read sequences -> quality trimming (SolexaQA) -> alignment (bwa, bowtie2) -> SAM -> BAM (samtools) -> sorted BAM (samtools) -> pileup (samtools, bcftools) -> VCF -> selection -> map construction (JoinMap4)

RETRIEVING SEQUENCE POLYMORPHISM
- Bowtie2 and BWA only align the bulk of reads to the reference sequence
- The result is a SAM (Sequence Alignment/Map) file or its binary form, BAM (Binary Alignment/Map)
- Several kinds of statistics or inference can be applied to retrieve polymorphisms (Picard, GATK)
- The samtools package is used here to retrieve variants
- The output file is in VCF (Variant Call Format)

GENETIC MAP CONSTRUCTION
"Selection of a core set of RILs from Forrest x Williams 82 to develop a framework map in soybean", Wu et al. 2011

HURDLES ON THE ROAD TO GENETIC MAP
- The output of variant calling is a VCF file
- The JoinMap input file is in LOC format
- Is there a converter between VCF and LOC?
- Write a converter program and build a genetic map yourself
- These are the final goals of this course
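To give a feel for what such a converter has to do, here is a minimal sketch in shell and awk (the course itself builds toward writing it in Python). It reads the GT field of each sample in a toy VCF and prints one genotype code per sample. Everything specific here is an assumption for illustration: the toy file, the sample names RIL1 to RIL3, the marker naming `chromosome_position`, and the coding a = homozygous reference, b = homozygous alternate, h = heterozygous, - = missing. A real LOC file also needs a header and a coding that matches the population type, so check the JoinMap manual before relying on this.

```shell
# Build a toy VCF with three hypothetical samples (tab-separated).
work=$(mktemp -d)
vcf="$work/toy.vcf"
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tRIL1\tRIL2\tRIL3\n'
  printf 'Gm01\t1050\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:12\t1/1:9\t0/1:7\n'
  printf 'Gm01\t2310\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:10\t./.:0\t0/0:8\n'
} > "$vcf"

# Translate each sample genotype into an assumed a/b/h/- code.
codes=$(awk -F '\t' '
  /^#/ { next }                      # skip header lines
  {
    line = $1 "_" $2                 # marker name: chromosome_position
    for (i = 10; i <= NF; i++) {     # sample columns start at column 10
      split($i, f, ":")              # GT is the first sub-field
      gt = f[1]
      if (gt == "0/0" || gt == "0|0") code = "a"
      else if (gt == "1/1" || gt == "1|1") code = "b"
      else if (gt == "0/1" || gt == "1/0" || gt == "0|1" || gt == "1|0") code = "h"
      else code = "-"                # missing or anything unexpected
      line = line " " code
    }
    print line
  }' "$vcf")

echo "$codes"
```

Which parent carries the reference allele differs from marker to marker in real data, so a full converter would also need the parental genotypes to decide between a and b.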

TODAY’S PRACTICE
- Connect to a remote computer
- Get used to the Linux system
- Get familiar with Python 2.7

THANK YOU  If you have a question, please ask me.

DAY 1. PRACTICE - BASIC LINUX COMMAND TAEYOUNG LEE

CONNECTING
- The server is located on the Seoul National University campus
- Connect to the server using the PuTTY SSH client
- Download at

CONNECTING
- Run PuTTY
- Enter the IP address ( ) in the Host Name field and click Open

CONNECTING
- ID : trainee
- PW : bogor
- You are now logged in to the server
- You will see only white characters on a black background

BASIC COMMAND IN LINUX
- ls
  - List files and directories
- cd
  - Change directory
  - Practice) Move into /data2/python

BASIC COMMAND IN LINUX
- mkdir
  - Make a directory
  - Usage) mkdir dir_name
  - Practice) Make a directory named after yourself

BASIC COMMAND IN LINUX
- vi
  - Open the text editor
  - Create a new text file
  - Usage) vi filename_to_edit
  - Usage) vi filename_to_make
  - Practice) Make a text file named after yourself in your directory, write something, and save it
    - Insert, replace, Esc
    - :q (quit), :w (save), :wq (save and quit), :q! (quit without saving)

BASIC COMMAND IN LINUX
- mv
  - Move files or directories
  - Rename files or directories
  - Usage) mv present_file_path file_path_to_move
  - Practice)
    - Move up to the parent directory
      - cmd) cd ..
    - Make a text file with vi
    - Move the text file into your directory
    - Rename the text file

BASIC COMMAND IN LINUX
- cp
  - Copy files or directories
  - Usage) cp file_path file_path_to_copy
  - cp can also rename the file while copying
  - To copy a directory, you have to use the -r option
    - cp -r dir_path dir_path_to_copy
  - Practice)
    - Make a directory inside your directory
    - Copy a file into that directory, once under a new name and once under the same name

BASIC COMMAND IN LINUX
- rm
  - Remove files or directories
  - Usage) rm file_name
  - To remove a directory, you have to use the -r option
    - rm -r dir_name
  - Practice)
    - Remove the directory and the file
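The mkdir, mv, cp and rm practices above can be run end to end as a short session. This is only a sketch: it works inside a throwaway directory made with mktemp, the file and directory names (yourname, note.txt, memo.txt, backup) are placeholders, and the text file is created with echo instead of vi so that the session needs no typing in an editor.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

mkdir yourname                                       # make a directory
echo "hello" > note.txt                              # small text file (stands in for vi)
mv note.txt yourname/                                # move the file into the directory
mv yourname/note.txt yourname/memo.txt               # rename it

mkdir yourname/backup
cp yourname/memo.txt yourname/backup/                # copy, keeping the name
cp yourname/memo.txt yourname/backup/memo_copy.txt   # copy under a new name
cp -r yourname/backup yourname/backup2               # copy a whole directory

rm yourname/backup/memo_copy.txt                     # remove a file
rm -r yourname/backup2                               # remove a directory

listing=$(ls yourname/backup)
echo "$listing"
```

Note that rm does not ask for confirmation and there is no recycle bin, so it is worth checking the path with ls before running rm -r.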

BASIC COMMAND IN LINUX
- less
  - Read-only text viewer
  - Well suited to large text files
  - Usage) less file_name
  - Search function
    - /
  - Practice)
    - Open a large text file with vi and with less
      - /data2/python/Gmax_109_gene_exons.gff3
    - Use the search function
      - /Gm12

- wget ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/tair9_Assembly_gaps.gff

BASIC COMMAND IN LINUX
- cat
  - Concatenate files
  - Print files to the screen
  - Usage) cat file_name1 file_name2 ...
  - Practice)
    - Print a file with cat
    - Print a file three times

BASIC COMMAND IN LINUX
- grep
  - Print the lines that contain a given word
  - Often used with cat
  - Usage) cat file_name | grep 'word'
  - '|' (pipe) sends the output of the command on its left to the command on its right
  - So this usage prints the file, then keeps only the lines that contain the word
  - Useful options
    - -v : "vanish", print only the lines that do NOT match
    - -c : count the matching lines
    - 'word1\|word2' = word1 or word2
    - grep 'word1' | grep 'word2' = word1 and word2
  - Practice)
    - Grep 'Gm12' in /data2/python/Gmax_109_gene_exons.gff3
    - Grep 'Gm12' or 'Gm15' in the same file
    - Grep 'gene' and 'mRNA'
    - Count the lines that contain 'Gm12'
    - Make the lines that contain exon, CDS or mRNA vanish
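The grep patterns above can be tried on a small file before moving to the real GFF3. The sketch below builds a toy five-line file (made-up records, with "toy" as a placeholder source column) and runs each kind of search. One deliberate change: the 'word1\|word2' form is an extension of GNU grep, so the code uses grep -E 'word1|word2', which means the same thing and is part of standard grep.

```shell
work=$(mktemp -d)
gff="$work/toy.gff3"
# Toy tab-separated records in GFF3 column order (values are made up).
{
  printf 'Gm12\ttoy\tgene\t100\t900\t.\t+\t.\tID=g1\n'
  printf 'Gm12\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=m1\n'
  printf 'Gm12\ttoy\texon\t100\t400\t.\t+\t.\tID=e1\n'
  printf 'Gm15\ttoy\tgene\t50\t700\t.\t-\t.\tID=g2\n'
  printf 'Gm01\ttoy\tCDS\t10\t90\t.\t+\t.\tID=c1\n'
} > "$gff"

cat "$gff" | grep 'Gm12'                      # lines that contain Gm12

# Count lines that contain Gm12.
n_gm12=$(cat "$gff" | grep -c 'Gm12')
# OR: lines that contain Gm12 or Gm15.
n_either=$(cat "$gff" | grep -E -c 'Gm12|Gm15')
# AND: chain two greps, here Gm12 and gene.
n_both=$(cat "$gff" | grep 'Gm12' | grep -c 'gene')
# Vanish: drop lines that contain exon, CDS or mRNA.
n_kept=$(cat "$gff" | grep -E -v 'exon|CDS|mRNA' | grep -c '')

echo "$n_gm12 $n_either $n_both $n_kept"
```

grep can also read the file directly (grep 'Gm12' file_name); the cat form is kept here because it matches the usage shown on the slide.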

BASIC COMMAND IN LINUX
- sort
  - Sort the lines of a file
  - Often used with cat
  - Usage) cat file_name | sort
  - Useful options
    - -k : sort by column (field)
    - -u : sort and remove duplicate lines
    - -n : numeric sort
    - -r : reverse order
    - -t : set the field delimiter (-d means dictionary order, not delimiter)
  - Practice)
    - Sort /data2/python/Gmax_109_gene_exons.gff3 by start position (by column and numerically)
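As a small illustration of the sort options, the sketch below sorts a toy GFF3-style file by start position. It assumes the standard GFF3 layout, in which the start position is column 4, and it uses made-up records; the same -k 4,4n key should apply to the real file if it follows that layout.

```shell
work=$(mktemp -d)
gff="$work/toy.gff3"
# Toy tab-separated records, deliberately out of order (values are made up).
{
  printf 'Gm12\ttoy\tgene\t100\t900\t.\t+\t.\tID=g2\n'
  printf 'Gm15\ttoy\tgene\t50\t700\t.\t-\t.\tID=g3\n'
  printf 'Gm01\ttoy\tgene\t10\t90\t.\t+\t.\tID=g1\n'
  printf 'Gm12\ttoy\tgene\t1000\t1200\t.\t+\t.\tID=g4\n'
} > "$gff"
tab=$(printf '\t')    # a literal tab character to pass to -t

# Numeric sort on column 4 only (-k 4,4 limits the key to that column).
by_start=$(cat "$gff" | sort -t "$tab" -k 4,4n | cut -f 4 | paste -sd ' ' -)
# Same key, reversed.
by_start_rev=$(cat "$gff" | sort -t "$tab" -k 4,4nr | cut -f 4 | paste -sd ' ' -)
# Unique chromosome names.
chroms=$(cat "$gff" | cut -f 1 | sort -u | paste -sd ' ' -)

echo "$by_start"
echo "$by_start_rev"
echo "$chroms"
```

Without -n the start positions would be compared as text, so 1000 would sort before 50.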

BASIC COMMAND IN LINUX
- cut
  - Cut columns out of a file
  - Often used with cat
  - Usage) cat file_name | cut -f n (n : integer)
  - Practice)
    - Retrieve the chromosome, start position and end position from /data2/python_study/Gmax_109_gene_exons.gff3
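For the practice above, a sketch on a toy file: in the standard GFF3 layout the chromosome (sequence ID), start and end are columns 1, 4 and 5, and cut splits on tabs by default, so -f 1,4,5 is enough. The records below are made up for illustration.

```shell
work=$(mktemp -d)
gff="$work/toy.gff3"
# Toy tab-separated GFF3-style records (values are made up).
{
  printf 'Gm12\ttoy\tgene\t100\t900\t.\t+\t.\tID=g1\n'
  printf 'Gm15\ttoy\tgene\t50\t700\t.\t-\t.\tID=g2\n'
} > "$gff"

# Keep columns 1 (chromosome), 4 (start) and 5 (end).
cat "$gff" | cut -f 1,4,5

first=$(cat "$gff" | cut -f 1,4,5 | head -n 1)
expected=$(printf 'Gm12\t100\t900')
```

For files that use another separator, such as commas, the delimiter is set with -d, for example cut -d ',' -f 1.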

 >  Standard input, output vs. file input, output  Input and output on screen or file  > can save standard output to file output  cat file_name | grep ‘word’ > output_file  >>  >> also can save standard output to file output  But just adding! BASIC COMMAND IN LINUX

HANDLE FILE
- FASTA file
  - /data2/python/ap2.fa
- FASTQ file
  - /data2/python/example.fastq
- GFF file
  - /data2/python/Gmax_109_gene_exons.gff3
- Python file!
  - /data2/python/1stday.py

- Make a new text file named new.txt
- The file contains:
  - Gm01,1,23
  - Gm04,4,56
  - Gm03,6,78
  - Gm04,8,10
- Copy new.txt to new.copy
- Remove new.copy
- Using cat, print the contents of new.txt
- Using grep, print the lines of new.txt that contain Gm04
- Using cut, print the first column of new.txt and save it as a file named new.txt.cut
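One possible way through this exercise is sketched below. It runs in a temporary directory and writes new.txt with printf rather than vi. The one detail the exercise does not spell out is that new.txt is comma-separated, so cut needs -d ',' because its default delimiter is a tab.

```shell
work=$(mktemp -d)
cd "$work"

# Create new.txt with the four lines from the exercise.
printf 'Gm01,1,23\nGm04,4,56\nGm03,6,78\nGm04,8,10\n' > new.txt

cp new.txt new.copy                            # copy
rm new.copy                                    # remove the copy

cat new.txt                                    # print the whole file
cat new.txt | grep 'Gm04'                      # print only lines with Gm04
cat new.txt | cut -d ',' -f 1 > new.txt.cut    # first column, saved to a file

first_column=$(cat new.txt.cut | paste -sd ' ' -)
echo "$first_column"
```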

THAT’S IT FOR TODAY  Q & A