National Center for Genome Analysis Support

1 National Center for Genome Analysis Support
Introduction to genomics software use on high performance computing systems
Le-Shin Wu, Ph.D.
Carrie Ganote
National Center for Genome Analysis Support
Genomics in July, July 22, 2014

2 Summary
High Performance Computing (HPC) cyberinfrastructure at IU
Running Applications in Command Line Interface (CLI)
Running Applications Through Graphical User Interface (GUI)
Examples

3 Who is NCGAS?
Funded by National Science Foundation
Partner with TACC, SDSC, PSC, Broad
Access to High Performance Computing Systems
Bioinformatics consulting for biologists
Install and upgrade bioinformatics software
Optimized software for better efficiency
Open for business at:
Before I jump into the real technical material, I would like to spend a few minutes on who we are and what we do. NCGAS is an NSF-funded organization that has been based at IU since 2011. We partner with TACC, SDSC, PSC, and the Broad Institute. We provide computing infrastructure and bioinformatics consulting support to biologists. We also optimize bioinformatics software for better efficiency.

4 HPC Cyberinfrastructure at IU
Mason large memory cluster (512 GB/node)
Quarry cluster (16 GB/node)
BigRed2 petaFLOPS cluster
High Performance File System DC2 (3.5 PB at 40 Gbps throughput)
Research File System (RFS) for data storage
Research Database Cluster for structured data
High Performance Storage System HPSS (15PB Type TB Disk)
High speed internal network (56 Gbps)

5 High Performance Computing System
High Performance Storage System
High Performance File System

6 Running Applications in CLI

7 System Access
Use an SSH2 client to connect to the login nodes
iTerm
PuTTY
ssh -Y
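As a minimal sketch of the ssh -Y form above: the username and hostname below are placeholders, not real NCGAS or IU login nodes, so substitute the values for your own account. The script only prints the command so that it is safe to run anywhere.

```shell
# Hypothetical values: replace with your own username and the login node's hostname.
USERNAME="your_username"
LOGIN_HOST="login.example.edu"

# -Y enables trusted X11 forwarding, so graphical programs started on the
# remote system can open their windows on your local display.
SSH_COMMAND="ssh -Y ${USERNAME}@${LOGIN_HOST}"

# Print the command instead of running it.
echo "$SSH_COMMAND"
```

On Windows, PuTTY offers the same X11 forwarding setting through its configuration dialog rather than a command-line flag.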

8 Basic Linux shell commands
cd - change directories
mkdir - make directories
mv - move or rename files and directories
pwd - print working directory
ls - list directory contents
cp - copy files
rm - remove files
cat - show file contents
less - similar to cat, with backward and forward scrolling
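The commands listed above can be tried together in a short session. This is only an illustrative sketch: the directory and file names (project, reads.txt, and so on) are made up, and everything happens inside a temporary directory so no existing files are touched. less is left out because it is interactive.

```shell
# Work in a throwaway directory so nothing existing is modified.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

mkdir project                  # make a directory
cd project                     # change into it
pwd                            # print the working directory
printf 'ACGT\n' > reads.txt    # create a small example file (hypothetical name)
cp reads.txt backup.txt        # copy a file
mv backup.txt copy.txt         # rename (move) the copy
ls                             # list directory contents
cat reads.txt                  # show file contents
rm copy.txt                    # remove a file
```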

9 Linux File System
“.” current directory
“..” parent directory
/home/jono/photos (Absolute Path)
../photos (Relative Path)
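To see that an absolute path and a relative path can name the same place, here is a small sketch. It recreates the home/jono/photos layout under a temporary directory (so it does not need a real /home/jono), and the sibling directory music is a hypothetical starting point added only for illustration.

```shell
# Recreate the example layout under a temporary base directory.
BASE="$(mktemp -d)"
mkdir -p "$BASE/home/jono/photos" "$BASE/home/jono/music"

# Relative path: start in music, go up one level (..), then down into photos.
cd "$BASE/home/jono/music"
cd ../photos
RELATIVE_RESULT="$(pwd -P)"

# Absolute path: the full path from the top works from any starting directory.
cd "$BASE/home/jono/photos"
ABSOLUTE_RESULT="$(pwd -P)"

echo "relative: $RELATIVE_RESULT"
echo "absolute: $ABSOLUTE_RESULT"
```

Both lines print the same directory; the relative form only works because the session started in a sibling of photos.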

10 Running applications in command line
Know what the program is called: Bowtie, BWA, BLAST, Trinity, …
Know what programming language will be used: Java, Perl, Python, PHP, …
Prepare the inputs: create working directory, upload the data, pre-processing

11 Running applications in command line
Set the commands
$>System_options Language_options Application_name Application_options
System_options: screen, nohup, nice, time, …
Language_options: java, perl, python, php, …
Application_name: Trinity.pl, bowtie2, bwa, blastn, …
Application_options: -num_threads, -in, -out, …
$>time java -Xmx4g -Xms4g -jar /N/soft/mason/picard-tools-1.52/MarkDuplicates.jar INPUT=$SORTED_BAM OUTPUT=$MARKDUPLICATES METRICS_FILE=$METRICS REMOVE_DUPLICATES=true
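The four-part pattern above can be made explicit by assembling the Picard example piece by piece. This is a sketch only: the three file names are hypothetical stand-ins for $SORTED_BAM, $MARKDUPLICATES, and $METRICS, and the script prints the command rather than running it, since running it would require Java and Picard installed at that path.

```shell
# Part 1: system options (here, time reports how long the run takes).
SYSTEM_OPTIONS="time"

# Part 2: language options (the Java launcher with 4 GB max and initial heap).
LANGUAGE_OPTIONS="java -Xmx4g -Xms4g -jar"

# Part 3: application name (path taken from the slide).
APPLICATION_NAME="/N/soft/mason/picard-tools-1.52/MarkDuplicates.jar"

# Hypothetical input and output file names, for illustration only.
SORTED_BAM="sample.sorted.bam"
MARKDUPLICATES="sample.markdup.bam"
METRICS="sample.metrics.txt"

# Part 4: application options.
APPLICATION_OPTIONS="INPUT=$SORTED_BAM OUTPUT=$MARKDUPLICATES METRICS_FILE=$METRICS REMOVE_DUPLICATES=true"

# Join the four parts in order and print the result.
FULL_COMMAND="$SYSTEM_OPTIONS $LANGUAGE_OPTIONS $APPLICATION_NAME $APPLICATION_OPTIONS"
echo "$FULL_COMMAND"
```

Keeping the parts in separate variables makes it easy to swap one of them, for example replacing time with nohup for a long run, without retyping the rest.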

12 Examples

13 Running Applications in GUI

14 What is Galaxy?
Galaxy is a web-based platform for analyzing data
It provides a set of tools that one can apply to the data
It stores all the activities of analyses
It allows for sharing of data sets, methods, and workflows

15 GALAXY.IU.EDU Model
Virtual box hosting Galaxy.IU.edu
The host for each tool is configured to meet IU needs
Quarry, Mason, Data Capacitor 2

16 Galaxy at IU
Focus pane - shows options, parameters, and output for the current item
History - shows steps previously taken to manipulate input data sets
Tool bar - contains the available steps to apply to data
What is it? Go over the user interface. Where? Difference between local and remote.

17 Examples

18 CLI vs. GUI
Command Line Interface (CLI): full control, fast and efficient, requires more skill
Graphical User Interface (GUI): easy and user friendly, requires less skill, black box

19 Thank You
Le-Shin Wu
Carrie Ganote
NCGAS

