National Center for Genome Analysis Support

Presentation transcript:

Introduction to genomics software use on high performance computing systems
Le-Shin Wu, Ph.D., and Carrie Ganote
National Center for Genome Analysis Support
Genomics in July, July 22, 2014

Summary
  High Performance Computing (HPC) cyberinfrastructure at IU
  Running Applications in Command Line Interface (CLI)
  Running Applications Through Graphical User Interface (GUI)
  Examples

Who is NCGAS?
  Funded by the National Science Foundation
  Partners with TACC, SDSC, PSC, and the Broad Institute
  Access to high performance computing systems
  Bioinformatics consulting for biologists
  Installs and upgrades bioinformatics software
  Optimizes software for better efficiency
  Open for business at: http://ncgas.org

Notes: Before I jump into the real technical material, I would like to spend several minutes talking about who we are and what we do. NCGAS is an NSF-funded organization that has been based at IU since 2011. We partner with TACC, SDSC, PSC, and the Broad Institute. We provide computing infrastructure and bioinformatics consulting support to biologists. We also optimize bioinformatics software for better efficiency.

HPC Cyberinfrastructure at IU
  Mason large memory cluster (512 GB/node)
  Quarry cluster (16 GB/node)
  BigRed2 petaFLOPS cluster
  High Performance File System DC2 (3.5 PB at 40 Gbps throughput)
  Research File System (RFS) for data storage
  Research Database Cluster for structured data
  High Performance Storage System HPSS (15 PB tape + 600 TB disk)
  High speed internal network (56 Gbps)

Diagram labels: High Performance Computing System, High Performance Storage System, High Performance File System

Running Applications in CLI

System Access
  Use an SSH2 client (for example iTerm or PuTTY) to connect to the login nodes:
    ssh -Y hpstrnXX@mason.indiana.edu

Basic Linux shell commands
  cd - change directories
  mkdir - make directories
  mv - move or rename files and directories
  pwd - print working directory
  ls - list directory contents
  cp - copy files
  rm - remove files
  cat - show file contents
  less - like cat, but lets you page backward and forward
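As a minimal sketch of the commands listed above, the session below runs each one inside a throwaway directory. The directory and file names (project, reads.txt, and so on) are made up for illustration and are not from the slides.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                       # make a directory
cd project                          # change into it
pwd                                 # print the working directory

printf 'ACGT\nTTGA\n' > reads.txt   # create a small example file
cp reads.txt backup.txt             # copy a file
mv backup.txt reads_copy.txt        # rename (move) the copy
ls                                  # list directory contents
cat reads.txt                       # show file contents
rm reads_copy.txt                   # remove the copy

listing=$(ls)                       # only reads.txt should remain
```

less is left out because it is interactive: run `less reads.txt` by hand and press q to quit.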

Linux File System
  "." current directory
  ".." parent directory
  /home/jono/photos (Absolute Path)
  ../photos (Relative Path)
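The difference between the two kinds of path can be sketched as follows. Because /home/jono is unlikely to exist on your machine, this example recreates the same layout under a temporary root; the docs directory is an assumption added only to have somewhere to start from.

```shell
# Recreate the slide's layout under a temporary root (an assumption for
# illustration; on a real system the path would start at /home).
root=$(mktemp -d)
mkdir -p "$root/home/jono/photos" "$root/home/jono/docs"

cd "$root/home/jono/docs"      # absolute path: spelled out from the top
cd ../photos                   # relative path: up to jono, then into photos
via_relative=$(pwd)

cd "$root/home/jono/photos"    # the same place, reached by absolute path
via_absolute=$(pwd)

cd .                           # "." is the current directory, so nothing moves
still_here=$(pwd)
```

Both routes end in the same directory; the relative form is shorter but only works from the right starting point.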

Running applications in command line
  Know what the program is called
    Bowtie, Bwa, Blast, Trinity, ...
  Know what programming language will be used
    Java, perl, python, php, ...
  Prepare the inputs
    Create working directory
    Upload the data
    Pre-processing

Running applications in command line
  Set the commands
    $>System_options Language_options Application_name Application_options
  System_options: screen, nohup, nice, time, ...
  Language_options: java, perl, python, php, ...
  Application_name: Trinity.pl, bowtie2, bwa, blastn, ...
  Application_options: -num_threads, -in, -out, ...
  Example:
    $>time java -Xmx4g -Xms4g -jar /N/soft/mason/picard-tools-1.52/MarkDuplicates.jar INPUT=$SORTED_BAM OUTPUT=$MARKDUPLICATES METRICS_FILE=$METRICS REMOVE_DUPLICATES=true
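The four-part pattern above can be tried without any bioinformatics software installed. The sketch below is an illustration under stated assumptions: toy_app.sh is a hypothetical stand-in application written on the spot, sh plays the role of the language, and nice is the system option. Only the -in and -out option names are borrowed from the slide.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in application: upper-cases the -in file into the -out file.
cat > toy_app.sh <<'EOF'
#!/bin/sh
# usage: toy_app.sh -in INPUT -out OUTPUT
while [ $# -gt 0 ]; do
  case "$1" in
    -in)  in_file=$2;  shift 2 ;;
    -out) out_file=$2; shift 2 ;;
    *)    echo "unknown option: $1" >&2; exit 1 ;;
  esac
done
tr 'a-z' 'A-Z' < "$in_file" > "$out_file"
EOF

printf 'acgt\n' > input.txt

# System option  Language  Application  Application options
nice -n 10       sh        toy_app.sh   -in input.txt -out output.txt

result=$(cat output.txt)
```

Swapping nice for nohup or screen changes how the system runs the job, not what the application does, which is why the slide keeps the two kinds of option separate.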

Examples

Running Applications in GUI

What is Galaxy?
  Galaxy is a web-based platform for analyzing data
  It provides a set of tools that one can apply to the data
  It records every step of an analysis
  It allows for sharing of data sets, methods, and workflows

GALAXY.IU.EDU Model
  A virtual box hosts Galaxy.IU.edu
  The host for each tool is configured to meet IU needs
  Diagram labels: Quarry, Mason, Data Capacitor 2

Galaxy at IU
  Tool bar - contains the available steps to apply to data
  Focus pane - shows options, parameters, and output for the current item
  History - shows steps previously taken to manipulate input data sets

Notes: What is it? Go over the user interface. Where? Difference between local and remote.

Examples https://galaxy.iu.edu

CLI vs. GUI
  Command Line Interface (CLI)
    Full control
    Fast and efficient
    Requires more skill
  Graphical User Interface (GUI)
    Easy and user friendly
    Requires less skill
    Black box

Thank You Le-Shin Wu (lewu@iu.edu) Carrie Ganote (cganote@iu.edu) NCGAS (help@ncgas.org)