Published by Rachel Palmer. Modified over 9 years ago.
1
Using the Unix Shell
There is No ‘Undelete’
2
The Unix Shell
“A Unix shell is a command-line interpreter or shell that provides a traditional user interface for the Unix operating system and for Unix-like systems. Users direct the operation of the computer by entering commands as text for a command line interpreter to execute or by creating text scripts of one or more such commands.” - Wikipedia
3
Things to Keep in Mind
There is no ‘undelete’
Shell commands are case-sensitive (CaPitaLizaTIoN mAttErs)
Do NOT use space, ?, *, \, / or $ in file names because these have special meanings to the shell
Filenames that begin with . are ‘hidden’
There is no ‘undelete’
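The points about capitalization and hidden files can be tried safely in a scratch directory. The following is a minimal sketch; the file names are made up for illustration, and it assumes a case-sensitive filesystem (typical on Linux) and a non-root account.

```shell
# Work in a throwaway directory so nothing important is touched.
demo_dir=$(mktemp -d)

# On a case-sensitive filesystem these are two different files.
# The third name starts with a dot, so it is 'hidden'.
touch "$demo_dir/notes.txt" "$demo_dir/Notes.txt" "$demo_dir/.hidden_settings"

echo "ls shows:"
ls "$demo_dir"

echo "ls -a shows:"
ls -a "$demo_dir"

# Keep both listings in variables for later inspection.
visible=$(ls "$demo_dir")
everything=$(ls -a "$demo_dir")
```

The hidden file only appears in the second listing, which also includes the . and .. directory entries.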
4
The Importance of Being ‘Root’
‘Root’ or ‘Superuser’ is the administrator account, which has phenomenal cosmic power.
The ‘sudo’ command allows you to “do as superuser” from an account with ‘sudo privileges’.
As root in the shell, you can literally ‘delete’ the operating system or operating system files (like choosing to delete Microsoft Windows while using Windows)… and then watch the stars go out…
– Moral of the story: If you don’t know what a file is… it’s better to ask or leave it alone.
– Installing software can require use of ‘sudo’
5
Unix Tutorial
http://www.ee.surrey.ac.uk/Teaching/Unix/
Location of the science.txt file for the tutorial:
– http://www.ee.surrey.ac.uk/Teaching/Unix/science.txt
– Unix command: wget http://www.ee.surrey.ac.uk/Teaching/Unix/science.txt
Additional help/tutorial/walkthrough: http://software-carpentry.org/4_0/shell/
6
Grep
grep science science.txt
grep science science.txt > newfile1.txt
grep -B 1 -A 2 science science.txt > newfile1.txt
– > is a ‘redirect’ symbol that sends output which would normally go to the screen to a text file instead.
– -B 1 and -A 2 are command-line ‘options’ that change the behavior of the ‘grep’ program, with numerical parameters that specify the new behavior: here, also print 1 line before and 2 lines after each matching line.
Use man grep to learn more about grep
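To see what the redirect and the context options do without downloading science.txt, the same commands can be run on a small made-up file. This is only a sketch; sample.txt, matches.txt and context.txt are hypothetical names, not part of the tutorial.

```shell
demo_dir=$(mktemp -d)
sample="$demo_dir/sample.txt"

# Six lines of made-up text; only the third contains the word 'science'.
printf '%s\n' 'line one' 'before the match' 'science is here' \
              'after one' 'after two' 'last line' > "$sample"

# Output goes to the screen.
grep science "$sample"

# Same search, but the output is redirected into a file.
grep science "$sample" > "$demo_dir/matches.txt"

# One line of context before and two lines after each match.
grep -B 1 -A 2 science "$sample" > "$demo_dir/context.txt"
cat "$demo_dir/context.txt"
```

With this input, matches.txt holds the single matching line and context.txt holds that line plus its three neighbours.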
7
Permissions
Type ls -l
*note: those are both lower-case L characters
-rw-r--r-- 1 krmerrill staff 358400 Feb 2 13:00 AJB_Merrill-d1100085_au.doc
drwxr-xr-x 47 krmerrill staff 1598 Jul 17 2011 My Pictures
First character: - means regular file, d means directory, l (lower-case L) means link
– first triplet is the user read, write, and execute permissions
– second triplet is the group permissions
– last triplet is permissions for everyone else, or ‘other’
ls -al shows the above information for all files, including hidden files
chmod = change permissions
– u = user; g = group; o = other; a = all (user, group, and other)
– r = read; w = write; x = execute
chmod u+x filename adds user execute permission on filename
chmod g-wx filename removes group write and execute permissions from filename
Permissions that are not mentioned in a chmod command of this format are not affected
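The two chmod examples can be followed on a scratch file by looking at the permission string after each change. A minimal sketch, assuming a hypothetical file hello.sh; the first chmod only sets a known starting point so the later changes are visible.

```shell
demo_dir=$(mktemp -d)
script="$demo_dir/hello.sh"
printf '#!/bin/sh\necho hello\n' > "$script"

# Known starting point: -rw-rwxr--
chmod u=rw,g=rwx,o=r "$script"
ls -l "$script"

# Add user execute permission: -rwxrwxr--
chmod u+x "$script"

# Remove group write and execute permissions: -rwxr--r--
chmod g-wx "$script"
ls -l "$script"

# The first ten characters of ls -l are the file type and the three triplets.
perms=$(ls -l "$script" | cut -c1-10)
echo "$perms"
```

The ‘other’ triplet stays r-- throughout, because neither command mentions o.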
8
Useful Shell Commands
See the Linux Command Line Reference document on the course website
Directory commands
– Change to sub-directory within the current directory: cd xyz
– Change to a directory in another part of the directory tree: cd /path/to/directory
– Create directory: mkdir newdir
– Remove empty directory: rmdir xyz
Wildcard characters: ? matches any single character, * matches zero or more characters
– Example: rm *.txt will remove all files with a name ending in .txt
– rm file?.fastq will remove file1.fastq, file2.fastq, …, filex.fastq
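Because there is no ‘undelete’, it can help to preview what a wildcard matches with ls before giving the same pattern to rm. A small sketch on made-up files in a scratch directory:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file1.fastq" "$demo_dir/file2.fastq" "$demo_dir/file10.fastq" \
      "$demo_dir/notes.txt" "$demo_dir/readme.txt"

# Preview: ? matches exactly one character, so file10.fastq is NOT listed.
ls "$demo_dir"/file?.fastq

# Now remove the same set of files.
rm "$demo_dir"/file?.fastq

# * matches zero or more characters: removes every name ending in .txt
rm "$demo_dir"/*.txt

# Only file10.fastq is left.
remaining=$(ls "$demo_dir")
echo "$remaining"
```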
9
Regular Expressions
See the RegularExpressions.pdf document on the course website for an overview of literal characters and metacharacters
Regular expressions (also called regexps or regex) are useful within grep, awk, sed and other command-line tools as well as in Java, Perl, Python, and other scripting languages.
Some text editor programs in Linux also use regular expressions. We will use nedit as an example.
Replacing a space character with a new-line character in a file of barcodes
– find ‘(OWB\d+) ’ and replace with ‘\1\n’
– note the trailing space in the first expression.
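The same find-and-replace can be tried on the command line instead of in nedit. This is a sketch under two assumptions: GNU sed is used (it accepts \n in the replacement; other sed versions may not), and \d is written as [0-9] because sed's extended regular expressions do not support \d. The barcode names are made up.

```shell
# Three barcodes on one line, each followed by a space.
barcodes='OWB1 OWB22 OWB303 '

# (OWB[0-9]+) captures one barcode; the trailing space after the group is
# replaced by a new-line, and \1 puts the captured barcode back.
result=$(printf '%s' "$barcodes" | sed -E 's/(OWB[0-9]+) /\1\n/g')

printf '%s\n' "$result"
```

Each barcode ends up on its own line.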
10
Command-line example
Testing analyses on a small random sample of a sequence dataset is a good idea – find and fix problems quickly
How to randomly sample the same reads from a set of paired-end files?
A one-line command is saved on the course website to do this.
time paste Bigfile1.fastq Bigfile2.fastq | awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | shuf | head -n 2000000 | sed 's/\t\t/\n/g' | awk -F '\t' '{print $1 > "Subfile1.fastq"; print $2 > "Subfile2.fastq"}'
Let’s look at this step by step
11
Command-line example
time
– this tells the system to display the time required to execute the command
paste Bigfile1.fastq Bigfile2.fastq |
– this joins two files of paired-end sequence reads as tab-delimited columns, line by line
– the files should have the same number of lines, with reads in the same order in both files
awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' |
– this uses the ‘awk’ program to convert the four lines of FASTQ format to tab-separated fields on a single line per sequence record
shuf |
– this utility puts the lines of a file into a random order
head -n 2000000 |
– this utility takes the first 2 million lines of the re-ordered file (one line per record, so 2 million read pairs)
sed 's/\t\t/\n/g' |
– this uses the ‘sed’ stream editor to convert the double-tab delimiters back into new-line characters to restore the 4-line FASTQ format
awk -F '\t' '{print $1 > "Subfile1.fastq"; print $2 > "Subfile2.fastq"}'
– this uses ‘awk’ to split the two tab-delimited columns back into two separate files; -F '\t' makes awk split on tabs only, so header lines that contain spaces stay intact
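The steps above can be checked on a toy dataset before running them on real files. The sketch below makes up two tiny paired files with three records each and samples two read pairs; the read names and sequences are invented, and GNU shuf and GNU sed are assumed (for \t and \n in the sed expression).

```shell
demo_dir=$(mktemp -d)

# Two tiny paired files, three 4-line FASTQ records each (made-up reads).
for i in 1 2 3; do
  printf '@read%s/1\nACGT\n+\nIIII\n' "$i"
done > "$demo_dir/Bigfile1.fastq"
for i in 1 2 3; do
  printf '@read%s/2\nTGCA\n+\nJJJJ\n' "$i"
done > "$demo_dir/Bigfile2.fastq"

# Same pipeline as on the slide, sampling 2 read pairs instead of 2 million.
paste "$demo_dir/Bigfile1.fastq" "$demo_dir/Bigfile2.fastq" |
  awk '{ printf("%s", $0); n++; if (n % 4 == 0) { printf("\n") } else { printf("\t\t") } }' |
  shuf |
  head -n 2 |
  sed 's/\t\t/\n/g' |
  awk -F '\t' -v out1="$demo_dir/Subfile1.fastq" -v out2="$demo_dir/Subfile2.fastq" \
    '{ print $1 > out1; print $2 > out2 }'

# Each output file should hold 2 records (8 lines), with matching read names.
wc -l "$demo_dir/Subfile1.fastq" "$demo_dir/Subfile2.fastq"
```

Comparing the header lines of the two output files is a quick way to confirm that the same reads were picked from both files.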
12
How do you come up with this stuff?
14
Someone else has probably had this problem
15
Search for help on SeqAnswers or StackExchange
http://biostar.stackexchange.com/
The Bioinformatics Forum on SeqAnswers: http://seqanswers.com/forums/forumdisplay.php?f=18
16
SolexaQA.pl
This Perl script assumes that header lines of sequence files are written in one of several formats
The code uses regular expressions to sort out formats:
    if( $line =~ /\S+\s\S+/ ){
        # Cassava 1.8 variant
        if( $line =~ /^@[\d\w\-\._]+:[\d\w]+:[\d\w]+:[\d\w]+:(\d+)/ ){
            $number_of_tiles = $1 + 1;
        # Sequence Read Archive variant
        }elsif( $line =~ /^@[\d\w\-\._\s]+:[\d\w]+:(\d+)/ ){
            $number_of_tiles = $1 + 1;
        }
    # All other variants
    }elsif( $line =~ /^@[\d\w\-:\._]*:+\d*:(\d*):[\.\d]+:[\.\/\#\d\w]+$/ ){
        $number_of_tiles = $1 + 1;
    }
17
Alternate Formats
This Perl script assumes that header lines of sequence files are written in one of several formats
The code uses regular expressions to sort out formats:
if( $line =~ /\S+\s\S+/ ){
# Cassava 1.8 variant
– does the header line contain a space surrounded by non-space characters?
@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
$line =~ /^@[\d\w\-\._]+:[\d\w]+:[\d\w]+:[\d\w]+:(\d+)/ )
# NCBI SRA variant
– does the header line contain a string with –, _, or . before the first colon?
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
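The two checks can be tried from the shell on the example headers. This is a sketch, not SolexaQA's own code: the Perl patterns are rewritten as POSIX extended regular expressions ([[:alnum:]_] stands in for [\d\w], [[:space:]] for \s), the function and variable names are made up, and the Casava header is assumed to contain a space before the 1:Y:18:ATCACG part.

```shell
casava='@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG'
sra='@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36'

# Does the header contain a space surrounded by non-space characters?
has_space() {
  printf '%s\n' "$1" | grep -Eq '[^[:space:]]+[[:space:]][^[:space:]]+'
}

# Pull out the field that each Perl pattern captures in $1.
casava_field=$(printf '%s\n' "$casava" |
  sed -nE 's/^@[[:alnum:]_.-]+:[[:alnum:]_]+:[[:alnum:]_]+:[[:alnum:]_]+:([0-9]+).*/\1/p')
sra_field=$(printf '%s\n' "$sra" |
  sed -nE 's/^@[[:alnum:]_.[:space:]-]+:[[:alnum:]_]+:([0-9]+).*/\1/p')

has_space "$casava" && echo "Casava header has a space; captured field: $casava_field"
has_space "$sra" && echo "SRA header has a space; captured field: $sra_field"
```

In both cases the captured number is the colon-delimited field that the Perl code uses to set $number_of_tiles.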
18
SolexaQA.pl
$line =~ /^@[\d\w\-\._\s]+:[\d\w]+:(\d+)/ )
# Two other variants
– 1. does the first field contain –, ., or _ followed by two more colon-delimited fields?
$line =~ /^@[\d\w\-:\._]*:+\d*:(\d*):[\.\d]+:[\.\/\#\d\w]+$/ )
– 2. does the first field contain –, ., :, or _ followed by four colon-delimited fields, followed by ., /, or # at the end of the line?
Example header line from GSL sequence file: @3:1:1006:20321:Y
This would be described by $line =~ /^@\d+:\d+:\d+:\d+:[YN]/
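The GSL pattern can be tested against header lines with grep. A minimal sketch, with \d written as [0-9] because grep -E does not support \d everywhere; is_gsl_header is a made-up helper name.

```shell
gsl='@3:1:1006:20321:Y'
sra='@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36'

# Succeeds when the line starts with @, four colon-separated numbers, then Y or N.
is_gsl_header() {
  printf '%s\n' "$1" | grep -Eq '^@[0-9]+:[0-9]+:[0-9]+:[0-9]+:[YN]'
}

for header in "$gsl" "$sra"; do
  if is_gsl_header "$header"; then
    echo "GSL-style header: $header"
  else
    echo "not GSL-style:    $header"
  fi
done
```

Trying a pattern on a few real header lines this way, before putting it into a script, is a quick check that it describes the format you think it does.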