2 Synopsis Why Linux? Your computing environment The command line interpreter (CLI/terminal/shell)Linux directory structureWorking with filesPipes and redirectionsWhere to find help?SSH + FTPIf we have time…Basics of (shell) scriptingWorking with multiple processorsUser groups and permissions
3 Why use Linux? Learning the basics of Linux allows you to: The best tools available in the field of bioinformatics are largely non-portable, open source tools that are written for POSIX systems (such as Linux and UNIX).Learning the basics of Linux allows you to:develop a skill set that sets you apart from a wet-only biologistrun bioinformatics resources on your own machineaccess ready-made bioinformatics computing environmentsdo reproducible researchperform analyses on computer clusters – important for long computational jobsaccess cloud computing and High Performance Computing (HPC) resourcesLinux is a multi-access, multi-tasking system: multiple users may be logged in and run multiple tasks on one machine at the same time, sharing resources (CPUs, memory, disk space) -> this is what we will be doing during the course
4 Your computing environment In this class you will be using a thin client.A thin client refers to either a software program or to an actual computer that relies heavily on another computer to do most of its work.Think of it as a dumb computer that allows you to access our server (much bigger computer) stored in the data center.You get access to computational resources that would not be available in a “normal” computer, such as the one you use at work or at home.The server we will be using is a HP DL 580 G74 cores x 8 threads x hyperthreading = 64 threads1 TB RAMaccess to 100 TB storage
5 Logging in to our server It’s easy! You will start with a fairly familiar-looking desktop. Double click on highlighted icon.
6 Enter your TU Graz credentials – the credentials you use to login to online.tugraz.at
7 After you’ve entered your credentials, click “Connect”.
13 Logging out of your session At the end of each class please log out of your session.I will later cover the “screen” command which will allow you to leave your programs running even if your session has ended.To log out simply click this icon at the top left of your screen.A dialog like the one on the right will appear, make sureyou have selected “Save this session for future logins”and then click the “Log Out” button.You will be returned to the initial desktop of the thin client.Try it now!
15 The command line interface (CLI) The CLI is a text-based interface that allows computer users to type commands that get executed by the operating system. You call programs, modify files and control your system just by writing a bit of text.It is an extremely powerful tool, because it centralizes managing your system and, crucially, makes automation easier, saving you a lot of work (more on this later, if we have time).We will be using the CLI to do most of the lab tasks in this class.There are different types of Unix shells or CLIs available, the one we are using is called Bash. Other good examples are tcsh, zsh or ksh.
18 Linux directory structure You won’t find any Windows, Program Files, or Users folders if you start browsing around the file system on your Linux computer. On Windows, an application might store all its files in C:\Program Files\Application. On Linux, its files would be split between multiple locations – its binaries in /usr/bin, its libraries in /usr/lib, and its configuration files in /etc/.Notice the use of slashes: in Windows you use “\” C:\Users, while in Linux you will use “/” /export/home.3 most important folders we will be using in this course/export/home/username – your user home directory, similar to the Windows C:\Users\User/export/programs – contains all the bioinformatics software used in the course
19 We will browse to /export/home/sensencw – the path to my home directory Windows directories in C:\Linux directories in / (root)
20 Linux directory structure at the root - / GUICLILinux directory structure at the root - /Graphical User Interface (GUI) versus command line view
25 Paths A Windows path: C:\Program Files\ Linux path types: Full: /export/home/amgrecu/home.pngRelative path to /export/home: amgrecu/home.pngRelative path to /export/home/amgrecu: home.pngYour home directory: /export/home/amgrecuOR ~
26 Working with directories After you log in to the terminal window, you are in yourhome directory, i.e. /export/home/amgrecu.To see in what directory you are, type pwd (print workingdirectory)
27 Working with directories cd - Change (current) directoryWithout additional arguments, this command will takeyou to your home directorycd ./ - Change (current) directory to the same directory.cd /export/home/amgrecu/Documents - Changedirectory from wherever to /export/home/amgrecu/Documents.cd Desktop - Change (current) directory to Desktop(will work if the current directory contains “Desktop”)cd ../ - Change (current) directory one level back(closer to the root)cd ../../../ - Change (current) directory three levelsback (closer to the root)
28 Working with directories Creating directoriesmkdir /export/home/amgrecu/new_dir - Make a new directory called “new_dir” in /export/home/amgrecumkdir new_dir - Make a new directory called “new_dir” in the current directoryRemoving directoriesrmdir /export/home/amgrecu/new_dir - Remove directory called “new_dir” in /export/home/amgrecu – will fail if the directory is not emptyrm –rf /export/home/amgrecu/new_dir - Remove directory called “new_dir” in /export/home/amgrecu with all its content (i.e. all files and subdirectories will be gone)rm –rf new_dir - Remove directory called “new_dir” in current directory with all its content (i.e. all files and subdirectories will be gone)rm –rf is a very dangerous command! Once you have deleted something it is GONE!There is no “Recycle Bin”! There are no backups!Check carefully what you are deleting!
29 Listing directory content Use ls –alh in the current directory. For even more options, man ls.file lastaccessednameblue - directorypermissionsOwner user and groupsize
30 May not work with all remote clients! Useful things to knowMay not work with all remote clients!Use Up/Down arrow keys – this will cycle through recently executed commands.Use the TAB key – this will often present you with a list of choices after typing a part of a command (more on this in a moment)history command: list all recently used commandshistory - list all remembered commandshistory | more - list all remembered commands page by pagehistory | grep –i example (list all remembered commands containing string “example”, case-insensitive)
31 Tab CompletionWriting every single character of a command is tedious. But your shell can help you!Press the Tab key while writing commands, directories or file names.Press once to complete until the last unambiguous character (you will hear a beep if there are no suggestions or you have ambiguous options).Press twice to show all completion options.
33 More examples: Tab Completion bow<tab> -> bowtie and you hear a beepbowtie<tab><tab> :
34 Real World Example: is simply Tab Completion cd /export/data/mol923/Bowtie-exampleis simplycd /ex<tab>da<tab>m<tab>B<tab>
35 Working with files – Types of files There are many types of files. Here are the most important:Text files (human-readable; can be viewed and modified using a text editor)Text documents (e.g., README files)Data in text format (e.g., FASTA, FASTQ, …)Scripts:Shell scripts (*.sh)Perl scripts (*.pl)Python scripts (*.py) and many othersBinary files (not human-readable; cannot be viewed using a text editor)Executables (e.g., samtools, blastn, bowtie)Data in binary format (e.g, BAM files, index files for Bowtie, formatted BLAST databases)Compressed files - archives (*.gz, *.zip, *.bz2) – often text files re-formatted to save space on disk or packaged directory treesHow do you know what type of file you are looking at?
36 The “file” commandUNIX_examples]$ file example.txt example.txt: UTF-8 Unicode English text, with very long lines UNIX_examples]$ file read read: ASCII text UNIX_examples]$ file first-script.sh first-script.sh: Bourne-Again shell script text executable UNIX_examples]$ file blastn-example.fasta blastn-example.fasta: ASCII text velvet]$ file velvetg velvetg: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux , not stripped
37 Naming – Best Practices Names should be meaningful, e.g. sample19_blastn_out.fasta, NOT blastn.fastaNames are case-sensitive - MyFile, myfile, myFile are all differentUse only letters (upper- and lower-case), numbers from 0 to 9, a dot (.), and an underscore (_) – e.g this_is_an_example.ext, NOT this &ismy example^.extAvoid other characters! They may have special meaning in LinuxWhile working in command line terminal you always explicitly specify a program which is supposed to open this file.
38 Where to find help?Every UNIX and Linux machine comes with built-in help for most of its commands and programs.Type “man” followed by the command nameTo leave:qFor Scrolling:Arrow Up or Down for one line up or downPage Down or Ctrl-F for one page downPage Up or Ctrl-B for one page upTo searchA slash, then your search term, then enter. e.g. /listn to go to the next occurrence of the search termEven “man” itself has a manpage. Type man man into the terminal.
39 System commands on files – Copy (cp) Usage: cp <source_file> <destination_file>cp sample_data.fa /home/amgrecu/sample.fa - copy file sample_data.fa from the current directory to /home/amgrecu and give the copy a name sample.fa; destination directory must existcp /home/amgrecu/my_script.sh . - copy file myscript.sh from /home/amgrecu to the current directory under the same file namecp /home/amgrecu/*.fastq /workdir/amgrecu - copy all files with file names ending with “.fastq” from /home/amgrecu to /workdir/amgrecu; destination directory must existcp –r /workdir/amgrecu/test /home/amgrecu - if test is a directory, it will be copied with all its files and subdirectories to directory /home/amgrecu/test; if /home/amgrecu/test did not exist, it will be createdman cp for all options to the cp command
40 Moving, renaming and removing files Moving and renaming files - has the same command line structure as cpUsage: mv <source_file> <destination_file>man mv for all options to the mv commandRemoving (deleting) filesUsage: rm <file>man rm for all options to the rm command
41 The chmod command is used to change file permissions. 4= read 2= write 1= executeMyself, Group, Worldchmod 700 filename
42 Working with files - compression To save disk space, we can compress large files if we do not intend to use them for a while. A lot of files downloaded from the web are compressed and usually need to be uncompressed before processing can take place. Compression works best on text files, such as FASTA or FASTQ.Common compressed formats:gzip (gz)gzip my_file - compresses file my_file, producing my_file.gzgzip –d my_file.gz - decompress my_file.gz, producing my_filebzip2bzip2 my_file - compresses to my_file.bz2bzip2 my_file.bz2 – decompress to my_file
43 Working with files - compression zipzip my_file.zip my_file1 my_file2 my_file3 - create a compressed archive called my_files.zip, containing three files: my_file1, my_file2, my_file3unzip my_files.zip - decompress the archive into the constituent files and directoriestartar -cvf my_file.tar my_file1 my_file2 my_dir - create a compressed archive called my_files.tar, containing files my_file1, my_file2 and the directory my_dir with all its contenttar -xvf my_files.tar - decompress the archive into the constituent files and directoriesHelp for compression tools:gzip -hbzip2 --helpzip --helptar --help
44 gedit Kate nano Editing text files User-friendly options: More advanced suggestions:nanoYou can call them from CLI:nano filenamegedit filenamekate filenameIn the course we will generally be using gedit
45 Working with text files more example.txt - display the content of the file README.txt in the current directory dividing the file into pages; press SPACE bar to go to the next pagehead -5 big.fasta - display first 100 lines of the file big.fasta in the current directorytail -5 big.fasta - display last 100 lines of the file big.fasta in the current directorytail -10 big.fasta | more - extract the last 1000 lines of the file big.fasta and display them page by pagehead big.fasta | tail display lines 901 through 1000 of the file big.fastacat big.fasta - print the file on screen. Only use for small files!wc big.fasta - display the number of lines, words, and characters in a fileNote the character | - it pipes the output from one command as input to another.
46 Working with text files Looking for a string in a text file:grep “Error: lane” error.log - display all lines of the file error.log in the current directory which contain the string “Error: lane”Looking for a string in a group of text files:grep “Error: lane” *.out - display all files *.out in the current directory which contain the string “Error: lane”; also display the lines containing that string
47 Pipes and Redirections Every command line program has three streams:STDIN: Standard inputSTDOUT: Standard outputSTDERR: Standard errorThe output you see in the console is either from STDOUT or STDERR. If the command is interactive, it waits for input on STDIN.| A pipe character redirects the STDOUT from the left program into the STDIN of the right program> Redirects the STDOUT from the left program into the file on the right2> Redirects the STDERR from the left program into the file on the right< Program on the left receives STDIN input from the file on the right
48 Pipes and Redirections Show files with a specific extension in a directory ls /export/data/mol923/Bowtie_example/reads | grep faCapture output of program to a file blastn -db nt -query blastn-example.fasta > hits.txtRedirect error channel into a log file cuffmerge -g genes.gtf -s genome.fa assemblies.txt 2> cuffmerge_errors.txt
49 Where to find help No.2?Not every command has a man page.However most commands have built-in help output:cp --helpblastn -helpcufflinks -hIt’s usually one of “-h”, “--help“, “-help” or similarYou can pipe the output into a pager (use q to leave; up/down arrow Ctrl-F/Ctrl-B to navigate)cp --help | less
50 Where to find help No.3? Also, there is plenty of help on the Internet Most problems can be solved by Googling. Basically, just type all related words into the search field.Ask specific questions here. General questions are usually already answered. You can find them through the search field, but usually they’re the top hits on Google anyway.“Stackoverflow for Biologists”Forum about Bioinformatics tools and –omics data processing tools
54 Connect to „cbt01.cbt.tugraz.at“ Connecting from Home to the UNIX machines at TUGrazThinLinc ClientConnect to „cbt01.cbt.tugraz.at“We have 50 licenses, if you aren‘t allowed in, please try again later.
55 Book Recommendation Only available as an ebook: The Linux Command Line A Complete IntroductionBy William E. Shotts Jr.An alternative:Learning the bash Shell, 3rd EditionUnix Shell ProgrammingBy Cameron Newham
56 Your assignment for the weekend Download Filezilla and ThinLinc client and install them on your home computer.Connect to cbt01.cbt.tugraz.at and play through a UNIX tutorial some more.
57 Max Malek firstname.lastname@example.org Emergency AssistanceMax Malek