1 Cluster Computing Applications for Bioinformatics (Thurs., Aug. 9, 2007): introduction to cluster computing; working with Linux operating systems; overview of bioinformatics applications

2 Introduction: Damian Christey, Professional Technologist, Departments of Mathematics and Biology, damian@math.wvu.edu

3 Cluster Computing: High Availability (HA); High Performance Computing (HPC), using specialized, highly parallel software; Beowulf clusters, built from commodity hardware and open source software

4 Biology Cluster Hardware: 12 nodes; 2 processors per node; dual-core 1 GHz Opteron; 8 GB RAM each; Gigabit Ethernet; 2 TB RAID storage
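A quick tally of the cluster's total capacity, assuming that "dual core" applies to each of the two processors and that "8 GB RAM each" means per node (the slide does not say so explicitly):

```latex
\begin{aligned}
\text{cores} &= 12~\text{nodes} \times 2~\text{processors/node} \times 2~\text{cores/processor} = 48 \\
\text{RAM}   &= 12~\text{nodes} \times 8~\text{GB/node} = 96~\text{GB}
\end{aligned}
```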

5 GNU/Linux: free, open source, Unix-based operating system. Rocks cluster management system: http://www.rocksclusters.org/ CentOS: http://centos.org/ (derived from Red Hat: http://www.redhat.com/)

6 Why Linux? Cheap; reliable and scalable; customizable; Unix philosophy; text processing

7 Accessing the Cluster: Monitoring: http://alba.as.wvu.edu/ganglia Secure Shell: ssh -X username@alba.as.wvu.edu on Mac OS or Linux (-X forwards X11 graphics). Windows users can download SSH and an X server from http://cygwin.com/ File transfer (SFTP): http://www.winscp.com/ for Windows, http://cyberduck.ch/ for Mac. qrsh: command to get a shell on a node

8 Unix Filesystem: a tree with a single root, /; folders may be physically stored on separate devices or even different machines. /home/bob : Bob’s files; /opt/Bio : bioinformatics programs; /share/bio : shared data, genome libraries
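To see that one directory tree can span several devices, you can ask `df` which mounted filesystem holds each path. This is a minimal sketch, assuming a POSIX `df -P` (one line per filesystem, mount point in the sixth field, so mount points containing spaces would break it); the paths are the ones from the slide and are simply skipped if they do not exist on the machine you run it on.

```shell
#!/bin/sh
# Print the mount point of the filesystem that holds directory $1.
# df -P prints a header line, then one line per filesystem; field 6 is
# the mount point (assumes no spaces in mount point names).
mount_point() {
  df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Directories from the slide, plus / for comparison.
for dir in /home/bob /opt/Bio /share/bio /; do
  if [ -e "$dir" ]; then
    printf '%s is stored on the filesystem mounted at %s\n' \
      "$dir" "$(mount_point "$dir")"
  else
    printf '%s does not exist on this machine\n' "$dir"
  fi
done
```

On a cluster, /home and /share would typically report different mount points (for example, network-mounted storage) even though they appear under the same single root.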

9 Unix Permissions: a 3x3 matrix of owner, group, other by read, write, execute. chgrp biouser file : change the group to which the file belongs. chmod g+w file : give the group write permission to your file
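The owner/group/other by read/write/execute matrix is visible in the nine-character permission string after the file-type character. A minimal sketch on a throwaway file, assuming GNU coreutils (`stat -c '%A'`; macOS `stat` uses different flags). The `chgrp biouser` step is shown only as a comment because the group may not exist where you run this.

```shell
#!/bin/sh
# Print the symbolic permission string (e.g. -rw-r-----) of file $1.
# Assumes GNU coreutils stat.
perms() { stat -c '%A' "$1"; }

f=$(mktemp)            # scratch file standing in for a shared data file
chmod 640 "$f"         # owner rw-, group r--, other ---
before=$(perms "$f")   # -rw-r-----
chmod g+w "$f"         # give the group write permission
after=$(perms "$f")    # -rw-rw----
printf '%s -> %s\n' "$before" "$after"

# On the cluster you would first hand the file to the shared group, e.g.
#   chgrp biouser "$f"
rm -f "$f"
```

Reading the string in groups of three: characters 2-4 are the owner's rwx, 5-7 the group's, and 8-10 everyone else's, so `g+w` changes only the sixth character.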

10 Text Processing: cat file : dump the contents of file to standard output. head, tail : output the first / last n lines of a file. grep : return lines matching a pattern in the input or a file. grep -v : invert the match (return non-matching lines). | : pipe the output of one program into the input of another. > : redirect output to a file, overwriting it. >> : append output to the end of a file
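The commands on this slide can be combined on a small FASTA file. This is an illustrative sketch; the file contents and the names `demo.fa` and `headers.txt` are hypothetical and live in a temporary directory.

```shell
#!/bin/sh
# Build a tiny hypothetical FASTA file in a scratch directory.
tmp=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$tmp/demo.fa"

cat "$tmp/demo.fa"                              # dump the whole file
grep '>' "$tmp/demo.fa" > "$tmp/headers.txt"    # > writes (overwrites) headers.txt
seqs=$(grep -v '>' "$tmp/demo.fa" | head -n 2)  # | feeds sequence lines into head
echo '>seq4' >> "$tmp/headers.txt"              # >> appends one more line
last=$(tail -n 1 "$tmp/headers.txt")            # the line just appended
count=$(wc -l < "$tmp/headers.txt")             # 3 original headers + 1 appended
printf '%s\n%s\n%s\n' "$seqs" "$last" "$count"

rm -rf "$tmp"
```

Running `grep '>' ... > headers.txt` a second time would replace the appended `>seq4` line, which is the practical difference between `>` and `>>`.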

11 Sequencing and Assembly Software: Phred - reads DNA sequencing trace files, calls bases, and assigns quality values. Phrap - assembles shotgun DNA sequence data. Consed - views, edits, and finishes sequence assemblies created with Phrap. Artemis - genome viewer and annotation tool
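Phred quality values are on a logarithmic scale: a quality value Q corresponds to an estimated probability P_e that the base call is wrong, so each increase of 10 in Q means a tenfold lower error probability.

```latex
Q = -10 \log_{10} P_e
\quad\Longleftrightarrow\quad
P_e = 10^{-Q/10},
\qquad \text{e.g. } Q = 20 \Rightarrow P_e = 0.01,\quad Q = 30 \Rightarrow P_e = 0.001
```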

12 Sequence Analysis and Screening Software: BLAST (WU, NCBI, MPI) - find regions of local similarity between sequences. ClustalW, T-Coffee, MUSCLE - multiple sequence alignment. RepeatMasker - screens for interspersed repeats and low-complexity sequences. RepeatScout, PILER - de novo repeat finders. EMBOSS - assorted analysis tools
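BLAST output is often post-processed with the text tools from slide 10. A minimal sketch, assuming NCBI BLAST's 12-column tabular output (`blastall -m 8`, or `-outfmt 6` in BLAST+); the hits below are made-up sample rows, and `filter_hits` is a hypothetical helper name, not part of BLAST.

```shell
#!/bin/sh
# Hypothetical hits in NCBI BLAST 12-column tabular format:
# query, subject, %identity, alignment length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, e-value, bit score.
sample_hits() {
  printf 'contig1\tchr2\t98.50\t400\t6\t0\t1\t400\t1000\t1399\t1e-180\t700\n'
  printf 'contig1\tchr5\t82.00\t150\t27\t0\t10\t159\t500\t649\t1e-20\t100\n'
  printf 'contig2\tchr1\t95.00\t300\t15\t0\t1\t300\t2000\t2299\t1e-140\t520\n'
}

# Keep hits with at least $1 percent identity over at least $2 aligned
# bases; print query, subject and identity, tab-separated.
filter_hits() {
  awk -F '\t' -v min_id="$1" -v min_len="$2" \
    '$3 >= min_id && $4 >= min_len { print $1 "\t" $2 "\t" $3 }'
}

sample_hits | filter_hits 90 200
# prints (tab-separated):
# contig1 chr2 98.50
# contig2 chr1 95.00
```

With real data you would replace `sample_hits` with the BLAST output file, e.g. `filter_hits 90 200 < hits.tsv`; the thresholds are arbitrary examples, not recommended cutoffs.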

13 Phylogenetics Software: Phylip, PAUP - packages for inferring phylogenies (evolutionary trees). MrBayes - Bayesian inference of phylogeny. Structure - model-based clustering method for inferring population structure
