Bioinformatics Programming 1 EE, NCKU Tien-Hao Chang (Darby Chang)

2 Background 2 Preparation for this class

3 We talk about 3 Terminology

4 4 Synchronization

5 Linux 5 Difference to UNIX

6 UNIX To put it very generically, Linux is an operating system kernel, and UNIX is a certification for operating systems. The UNIX standard evolved from the original Unix system developed at Bell Labs (1969). After Unix System V, Unix ceased to be developed as a single operating system and was instead developed by competing companies as separate products, such as Solaris (from Sun Microsystems), AIX (from IBM), HP-UX (from Hewlett-Packard), and IRIX (from Silicon Graphics). UNIX is a specification for baseline interoperability between these systems, even though there are major architectural differences between them.

7 Linux was born out of the desire to create a free software alternative to the commercial UNIX environments. Its history dates back to 1991, or further back to 1983, when the GNU project, whose original aim was to provide a free alternative to UNIX, was introduced. Linux distributions are generally not certified as UNIX, so Linux is described as Unix-like.

8 UNIX History
1960s      Multics project (MIT, GE, AT&T)
1970s      AT&T Bell Labs
1970s/80s  UC Berkeley
1980s      DOS imitated many Unix ideas; commercial Unix fragmentation; GNU Project
1990s      Linux
now        Unix is widespread and available from many sources, both free and commercial


10 UNIX Flavors
Sun's Solaris, Hewlett-Packard's HP-UX, and IBM's AIX® are all flavors of UNIX that have their own unique elements and foundations.
Windows has two main lines. The older flavors are referred to as "Win9x" and consist of Windows 95, 98, 98SE and Me. The newer flavors are referred to as "NT class" and consist of Windows NT, 2000, XP, Vista, and 7. Microsoft no longer supports Windows NT or any of the 9x versions.
The flavors of Linux are referred to as distributions (or "distros").

11 Linux Distributions
Linux distributions released around the same time generally use the same kernel. They differ in the
– add-on software
– GUI
– install process
– price
– documentation
– technical support
All the flavors of Windows come from Microsoft, whereas the various distributions of Linux come from different companies/vendors such as Linspire, Red Hat, SuSE, Ubuntu, Xandros, Knoppix, Slackware, Lycoris, and so on.

12 UNIX Philosophy
– Multiuser / Multitasking
– Flexibility / Freedom
– Everything is a file
– File system has places, processes have life
– Designed by programmers for programmers

13 UNIX Structure (layers, from outermost to innermost): Programs, Kernel, Hardware

14 UNIX The File System 14

15 UNIX Programs
Shell is the command line interpreter
Shell is just another program
A program or command
– interacts with the kernel
– may be any of: built-in shell command, interpreted script, compiled object code file

16 Any Questions? 16

17 Vs. Windows Which is better? Of course, this is an open question.

18 Terminology 18 Operating System

19 Vs. Windows To you, are Linux and Windows the same kind of thing? Or is Linux a platform only for specific uses?

20 Terminology 20 Terminal

21 21 What is inside the terminal?


23 Yes, Remote Desktop is a terminal

24 24 Similar to anything you use to access BBS, conceptually

25 Getting Started 25

26 You're welcome to interrupt me, anytime!

27 Getting Started: Logging In
Login and password prompt to log in
– login is the user's unique name
– password is changeable; known only to the user, not to system staff
Unix is case sensitive
– issued login and password are usually in lower case

28 Getting Started: Passwords
Do:
– make sure nobody is looking over your shoulder when you are entering your password
– change your password often
– choose a password you can remember
– use eight characters, or more on systems that allow it
– use a mixture of character types
– include punctuation and other symbols

29 Getting Started: Passwords
Don't:
– use a word (or words) in any language
– use a proper name
– use information in your wallet
– use information commonly known about you
– use control characters
– write your password anywhere
– EVER give your password to anybody
Your password is your account security:
– To change your password, use the passwd command
– Change your initial password immediately

30 Getting Started: Unix Command Line Structure
A command is a program that tells the Unix system to do something. It has the form:
command options arguments
– Whitespace separates the parts of the command line
– An argument indicates what the command is to act on
– An option modifies the command; it usually starts with -
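
As a small illustration of how whitespace separates the parts of a command line, the sketch below uses a hypothetical helper, count_args (not a standard command), that reports how many words the shell passed to it. Quoting keeps an argument that contains a space together as one word.

```shell
# count_args is a made-up helper for this illustration: it prints
# the number of words (options and arguments) it received.
count_args() {
    echo "$#"
}

# Three words follow the command name: -l, my, file.txt
unquoted=$(count_args -l my file.txt)

# Two words follow the command name: -l and the quoted "my file.txt"
quoted=$(count_args -l "my file.txt")

echo "unquoted: $unquoted, quoted: $quoted"
```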

31 Getting Started: Getting Help
Not all Unix commands follow the same standards
Options and syntax for a command are listed in the man page for the command
man: on-line manual
– $ man command
– $ man -k keyword

32 Getting Started: Directory Navigation
pwd    print working directory
cd     change working directory (go to directory)
mkdir  make a directory
rmdir  remove directory
ls     list directory contents
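
A short session using these five commands might look like the sketch below. It works inside a scratch directory created with mktemp so that nothing else is touched; the directory name project is only an example.

```shell
# Work inside a scratch directory so nothing else is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir project            # make a directory
cd project || exit 1     # go into it
here=$(pwd)              # print working directory
cd ..                    # go back up one level
listing=$(ls)            # list directory contents: just "project"
rmdir project            # remove the (now empty) directory
after=$(ls)              # nothing is left to list

echo "was in: $here"
echo "before rmdir: $listing"
```

Note that rmdir only removes empty directories.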

33 Getting Started: Permissions
Each line (when using the -l option of ls) includes the following:
– type field (first character)
– access permissions (characters 2–10):
– first 3: user/owner
– second 3: assigned unix group
– last 3: others
Permissions are designated:
– r  read permission
– w  write permission
– x  execute permission
– -  no permission

34 Getting Started: File Maintenance Commands
chmod  change the file or directory access permission
chgrp  change the group of the file
chown  change the owner of a file
rm     remove (delete) a file
cp     copy file
mv     move (or rename) file
chmod [options] file
– Using + and - with a single letter: u (user owning file), g (those in assigned group), o (others)
– $ chmod u+w file # gives the user (owner) write permission
– $ chmod g+r file # gives the group read permission
– $ chmod o-x file # removes execute permission for others

35 chmod [options] file
– using numeric representations for permissions: r=4, w=2, x=1
– $ chmod 777 file # gives user, group, and others r, w, x permissions
– $ chmod 750 file # gives the user read, write, execute; gives group members read, execute; gives others no permissions
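
To see how the numeric and symbolic forms map onto the permission field printed by ls -l, here is a minimal sketch. The helper show_perms is a made-up name for this example; it simply cuts the first ten characters out of the ls -l line.

```shell
scratch=$(mktemp -d)
file="$scratch/demo.txt"
touch "$file"

# show_perms is a made-up helper: it prints the 10-character type and
# permission field from "ls -l" (for example -rwxr-x---).
show_perms() {
    ls -l "$1" | cut -c1-10
}

chmod 750 "$file"        # 7 = rwx (user), 5 = r-x (group), 0 = --- (others)
numeric=$(show_perms "$file")

chmod o+r "$file"        # symbolic form: add read permission for others
chmod g-x "$file"        # symbolic form: remove execute permission from group
symbolic=$(show_perms "$file")

echo "after chmod 750:     $numeric"
echo "after o+r and g-x:   $symbolic"
```

Each digit is the sum of the permissions granted, so 7 = 4+2+1 and 5 = 4+1.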

36 Getting Started: Display Commands
echo  echo the text string to stdout
cat   concatenate (list)
head  display first -n lines of file
tail  display last -n lines of file
Useful in pipes
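
As a sketch of why these commands are useful in pipes, the example below picks out the third line of some text by chaining head and tail, and joins two small files with cat. The file names a.txt and b.txt are placeholders.

```shell
# Generate five lines, keep the first three, then the last of those.
third=$(printf 'one\ntwo\nthree\nfour\nfive\n' | head -n 3 | tail -n 1)
echo "third line: $third"

# cat concatenates its input files in the order given.
scratch=$(mktemp -d)
echo "first file"  > "$scratch/a.txt"
echo "second file" > "$scratch/b.txt"
joined=$(cat "$scratch/a.txt" "$scratch/b.txt")
echo "$joined"
```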

37 Any Questions? 37

38 Getting Started: System Resource Commands
df        report file system disk space usage
du        estimate file space usage
ps        show status of processes (options vary from system to system; see the man pages)
kill      terminate a process
whereis   report program locations
which     report the path of the command that would be run
hostname  report the name of the machine the user is logged into
uname     print info about system hardware and software (has additional options)
date      print or set the system date and time

39 Getting Started: More Fun with Files
ln  link to another file
– symbolic link (soft link): $ ln -s source target
A symbolic link is used to create a new path to another file or directory. Useful when the target file has versions.
– hard link: $ ln source target
A hard link creates a new directory entry pointing to the same inode as the original file. The file will not be deleted until all the hard links to it are removed.
– The two behave very differently when you delete the original file.
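
The difference on deletion can be tried out with a sketch like the following, run in a scratch directory. The file names original.txt, hard.txt and soft.txt are only examples.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo "data" > original.txt
ln original.txt hard.txt       # hard link: a second name for the same inode
ln -s original.txt soft.txt    # symbolic link: a path pointing at original.txt

rm original.txt                # delete the original name

hard_content=$(cat hard.txt)   # still readable: the inode survives
if [ -e soft.txt ]; then       # -e follows the link, which now points nowhere
    soft_state="ok"
else
    soft_state="dangling"
fi
echo "hard link: $hard_content, soft link: $soft_state"
```

The hard link keeps the data alive, while the symbolic link is left pointing at a name that no longer exists.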

40
sort  sort file contents
uniq  remove adjacent duplicate lines (usually used after sort)
file  file type
tr    translate characters (reads standard input)
– $ tr '[a-z]' '[A-Z]' < file
find  find files
– $ find . -name ay
– $ find . -newer empty
– $ find . -type d -print
gzip  compression
– often uses the .gz extension
tar   archive files
– uses the .tar extension
– uses the .tgz extension when combined with gzip
wc    word count
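
Several of these commands are usually combined in a pipeline. The sketch below, with made-up sample words, sorts some lines, drops the duplicates, counts what is left, and converts the result to uppercase.

```shell
words=$(printf 'pear\napple\npear\nfig\napple\n')

# sort groups identical lines together so that uniq can drop the repeats.
unique_sorted=$(printf '%s\n' "$words" | sort | uniq)

# wc -l counts lines; tr -d ' ' strips the padding some systems add.
unique_count=$(printf '%s\n' "$unique_sorted" | wc -l | tr -d ' ')

# tr reads standard input, so the text is piped (or redirected) into it.
upper=$(printf '%s\n' "$unique_sorted" | tr 'a-z' 'A-Z')

echo "$unique_count unique words:"
echo "$upper"
```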

41 Any Questions? 41

42 Shells 42

43 Shells
The shell sits between you and the operating system
– acts as a command interpreter
– reads input
– translates commands into actions to be taken by the system
To see what your current login shell is:
– $ echo $SHELL

44 Shells: Basic Shells
Bourne Shell (sh)
– good features for I/O control; often used for scripts
– other shells based on Bourne may be better suited for interactive users
– default prompt is $
C Shell (csh)
– uses C-like syntax for scripting
– I/O more awkward than Bourne shell
– job control
– history
– default prompt is %
– uses the ~ symbol to indicate a home directory (the user's or others')

45 Shells: Other Shells
Based on the Bourne Shell:
– Korn (ksh)
– Bourne-Again Shell (bash): job control; history; uses the ~ symbol to indicate a home directory (the user's or others')
– Z Shell (zsh)
Based on the C Shell:
– T-C shell (tcsh)

46 Shells: Built-in Shell Commands
The shells have a number of built-in commands:
– executed directly by the shell
– don't have to call another program to be run
– different for the different shells
– cd, echo, exit, for, if, pwd, …
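
One way to check whether a name is a built-in or a separate program is to ask the shell itself. The sketch below uses type and command -v, both of which are available in Bourne-style shells; the exact wording of the output varies from shell to shell.

```shell
# "type" reports how the shell would interpret a command name.
cd_kind=$(type cd)
echo "$cd_kind"      # cd is reported as a shell builtin

# For an external program, "command -v" prints the path that would be run.
ls_path=$(command -v ls)
echo "$ls_path"
```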

47 Shells: Environment Variables
Environment variables are used to provide information to the programs you use. Global environment variables are set by your login shell, and new programs and shells inherit the environment of their parent shell.
– GROUP  your login group, e.g. staff
– HOME   path to your home directory, e.g. /home/frank
– HOST   the hostname of your system, e.g. nyssa
– PATH   paths to be searched for commands, e.g. /usr/bin:/usr/ucb:/usr/local/bin
– SHELL  the login shell you're using, e.g. /usr/bin/csh
– USER   your username, e.g. frank
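
Inheritance from the parent shell can be observed with a sketch like the one below, written for a Bourne-style shell. COURSE is a made-up variable name; a child shell started with sh -c only sees it once it has been exported.

```shell
# COURSE is a made-up variable name for this illustration.
COURSE=bioinformatics

# A plain shell variable is not passed on to child processes...
before=$(sh -c 'echo "${COURSE:-unset}"')

# ...until it is exported into the environment.
export COURSE
after=$(sh -c 'echo "${COURSE:-unset}"')

echo "before export: $before, after export: $after"
echo "HOME is $HOME"
```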

48 Any Questions? 48

49 49 Now, we are more familiar with this penguin


51 Linux Vs. Windows Interface Kernel/GUI-Based Target Users Business Pirate Copy Open Source Popularity Users Habits Support Developers Drivers/Games/Virus 51

52 Linux Vs. Windows: History
Linux was originally built by Linus Torvalds at the University of Helsinki in 1991. Linux is a Unix-like, kernel-based, fully memory-protected, multitasking operating system. It runs on a wide range of hardware from PCs to Macs.
Microsoft released the first version of Windows, Windows 1.0, in 1985; Windows 3.1, the first version to be widely adopted, followed in 1992. Windows is a GUI-based operating system. It has powerful networking capabilities, is multitasking, and is extremely user friendly.

53 Linux Vs. Windows: Functionalities
Linux seems to be more reliable, flexible and generous. Ironically, even though Linux is open source, it falls short in the number of different applications available for it.
Windows seemed less mature (at first) by most measures of a good OS. However, it proved that appearance matters more than anything else. Cruel but true.

54 Of course, this guy is probably the most successful salesman ever

55 He has helped a great deal of biomedical research

56 56 As time goes by

57 57 Linux has many partners

58 Linux Vs. Windows: Things Changed
Linux has a much improved UI
– To me, the installation procedure of some distributions seems easier than that of Windows
Windows keeps strengthening its abilities as a good OS, whatever the reason
– For example, Microsoft improved IE to eliminate Netscape (it succeeded with IE3). Microsoft now wants to do the same against Firefox. Both IE7 and IE8 failed. But who knows?
Although the functionality difference is decreasing, the popularity difference is increasing.
– Habit (this is critical even in the search engine war)
– Support (the hateful Windows update)
– Is the flexibility of Linux an advantage?

59 59 Which distribution? (probably scared many beginners)

60 60 Ubuntu


62 Ubuntu
Ubuntu is based on the Debian distribution (good package management). It is named after the Southern African ethical ideology Ubuntu (humanity towards others). Ubuntu provides an up-to-date, stable operating system for the average user, with a strong focus on usability and ease of installation. Web statistics from late 2009 suggest that Ubuntu's share of Linux desktop usage is between 40 and 50%.
Ubuntu is sponsored by the UK-based company Canonical Ltd., owned by South African entrepreneur Mark Shuttleworth. By keeping Ubuntu free and open source, Canonical is able to utilize the talents of community developers in Ubuntu's constituent components. Instead of selling Ubuntu for profit, Canonical creates revenue by selling technical support and by creating several services tied to Ubuntu.


64 Mark Shuttleworth
Born on 18 September 1973
Founded Thawte in 1995, which specialised in digital certificates and Internet security, and sold it in December 1999, earning about USD 575 million.
In September 2000, Shuttleworth formed HBD Venture Capital, a business incubator and venture capital provider.
In March 2004 he formed Canonical Ltd., for the promotion and commercial support of free software projects.

65 There are some really valuable speeches; do some homework

66 To Sum Up 66 Ubuntu is as friendly as any version of Windows. Everyone can start to use it without any introduction.

67 67 However, if you choose a dual system, you will never become a master

68 Shell Scripts 68

69 Shell Scripts
Similar to DOS batch files
Quick and simple programming
Text file interpreted by shell, effectively a new command
List of shell commands to be run sequentially
Execute permissions, no special extension necessary
Magic first line
– #!
– Include full path to interpreter (shell): #!/bin/sh

70 Shell Scripts: Interacting
Special variables for processing arguments
– $#       number of arguments on command line
– $0       name that script was called as
– $1 – $9  command line arguments
– $@       all arguments (separately quoted)
– $*       all arguments
– $?       numeric result code of previous command
– $$       process ID of this running script
Interacting With User
– Talk to user (or ask questions) first, then get input from user and put it in a variable
echo prompt
read variable
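
The special variables can be tried without writing a separate script file, because inside a shell function $#, $1 and "$@" refer to the function's own arguments in the same way. The sketch below relies on that; show_args and result_code are made-up names, and read is fed from a pipe instead of the keyboard so the example runs unattended.

```shell
# show_args is a made-up helper; within it, $#, $1 and "$@" describe
# the arguments passed to the function.
show_args() {
    echo "count: $#"
    echo "first: $1"
    for arg in "$@"; do       # "$@" keeps each argument intact
        echo "arg: $arg"
    done
}
report=$(show_args alpha "beta gamma" delta)
echo "$report"

# $? holds the result code of the previous command; "false" always fails.
false || result_code=$?

# read normally takes input from the user; here it is fed from a pipe.
greeting=$(echo "Darby" | { read name; echo "Hello, $name"; })
echo "$greeting (result code was $result_code)"
```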

71 Shell Scripts: Control Structure
if [ … ]; then … fi
for variable in … ; do … done
Check the sh man page for details, and also look at examples.
#!/bin/sh
if [ $# -ge 2 ]; then
    echo "$2"
elif [ $# -eq 1 ]; then
    echo "$1"
else
    echo No input
fi

72 Any Questions? 72

73 Can you use a shell script to change filenames from lower- to uppercase? Remember that the wildcard symbol * can help you get all files.

74
#!/bin/sh
for file in *; do
    echo "processing $file"
    mv "$file" "$(echo "$file" | tr '[a-z]' '[A-Z]')"
done
How would you do it in Windows?
BTW, why Perl? It can be done in one line
– $ ls | perl -nle 'my $o=$_; tr/a-z/A-Z/; rename $o, $_'
How would you do it with C?
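
A slightly more defensive variant of the same loop is sketched below. It is only one possible approach: it runs in a scratch directory with made-up file names, skips names that are already uppercase, and refuses to overwrite an existing file. It assumes a case-sensitive file system, as is usual on Linux.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch notes.txt "my data.csv" README

for file in *; do
    upper=$(printf '%s' "$file" | tr 'a-z' 'A-Z')
    # Skip names that are already uppercase, and never overwrite a file.
    if [ "$file" != "$upper" ] && [ ! -e "$upper" ]; then
        mv -- "$file" "$upper"
    fi
done

renamed=$(ls)
echo "$renamed"
```

Quoting "$file" and "$upper" is what lets the name containing a space survive the rename.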

75 Any Questions? 75

76 Code Size Calculator
In: a file
Out: code size
Requirement
- input from command line
- do not count space characters
- do not count comments (C style)
- must be completed on Unix
- if you don't have one, contact me ASAP
- using C would be best
Bonus
- write a shell script version

77 Deadline 2010/3/9 23:59 Zip your code, a step-by-step README of how to execute the code, and anything worthy of extra credit. Email to

78 gcc 78

79 gcc
gcc is the GNU C compiler and g++ is the GNU C++ compiler, while cc and CC are the Sun C and C++ compilers, also available on Sun workstations.
Note that C++ differs from C to a certain extent. It is safest to regard them as two different languages with very similar basic structures.

80 gcc: Compiling a Simple Program
Consider the following example: let hello.c be a file that contains the following C code
#include <stdio.h>
int main() { printf("Hello\n"); return 0; }
The standard way to compile this program is with the command
– $ gcc hello.c -o hello
This command compiles hello.c into an executable program named hello, which does nothing more than print the word Hello on the screen.
– $ chmod 755 hello
– $ ./hello

81 Alternatively, the above program could be compiled using the following two commands
– $ gcc -c hello.c
– $ gcc hello.o -o hello
The end result is the same, but this two-step method first compiles hello.c into a machine code file named hello.o and then links hello.o with some system libraries to produce the final program hello. In fact the first method also does this two-stage process of compiling and linking, but the stages are done transparently, and the intermediate file hello.o is deleted in the process.
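
The two-step build can be tried end to end with a sketch like the one below. It writes a small hello.c into a scratch directory, compiles and links it in separate steps, and runs the result. The CC variable defaulting to gcc and the fallback message are assumptions of this example, added so that it still runs on a machine without a compiler.

```shell
CC=${CC:-gcc}                  # compiler to use; gcc unless CC is already set
workdir=$(mktemp -d)

# Write the example source file.
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { printf("Hello\n"); return 0; }
EOF

hello_output="(no compiler found)"
if command -v "$CC" >/dev/null 2>&1; then
    # Step 1: compile only, producing the object file hello.o
    "$CC" -c "$workdir/hello.c" -o "$workdir/hello.o"
    # Step 2: link hello.o with the system libraries into an executable
    "$CC" "$workdir/hello.o" -o "$workdir/hello"
    hello_output=$("$workdir/hello")
fi
echo "$hello_output"
```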

82 gcc: Frequently Used Options
The examples below demonstrate how to use many of the more commonly used options. Some options can be combined, although it is generally not useful to use debugging and optimization options together.
Make the resulting executable contain symbolic information for the gdb debugger
– $ gcc -g myprog.c -o myprog
Have the compiler generate many warnings about syntactically correct but questionable looking code. It is good practice to always use this option with gcc and g++
– $ gcc -Wall myprog.c -o myprog
Generate optimized code. The -O is a capital letter O and not the number 0!
– $ gcc -O myprog.c -o myprog
Compile a C program that uses math functions such as sqrt
– $ gcc myprog.c -o myprog -lm

83 gcc: Multiple Source Files
If there are multiple source files
– $ gcc file1.c file2.c -o myprog
Or
– $ gcc -c file1.c
– $ gcc -c file2.c
– $ gcc file1.o file2.o -o myprog
The second form compiles the source files separately. If only file1.c was modified
– $ gcc -c file1.c
– $ gcc file1.o file2.o -o myprog
Notice that file2.c does not need to be recompiled.
– significant time savings when there are numerous source files
This process, though somewhat complicated, is generally handled automatically by a makefile.
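
To give a feel for what a makefile automates, the sketch below does the same bookkeeping by hand in a shell loop: a source file is recompiled only when its object file is missing or older than the source. This is an illustration, not how make is implemented. The file contents, the CC default of gcc, and the backdating of file1.o with touch -t (to simulate an edit to file1.c) are all assumptions of the example, and the -nt test is widely supported by shells but is not required by POSIX.

```shell
#!/bin/sh
CC=${CC:-gcc}                  # compiler to use; gcc unless CC is already set
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small source files: file1.c calls a function defined in file2.c.
cat > file1.c <<'EOF'
#include <stdio.h>
int answer(void);
int main(void) { printf("%d\n", answer()); return 0; }
EOF
cat > file2.c <<'EOF'
int answer(void) { return 42; }
EOF

rebuilt=""
prog_output="(no compiler found)"
if command -v "$CC" >/dev/null 2>&1; then
    # Initial build: compile each source file separately.
    "$CC" -c file1.c && "$CC" -c file2.c

    # Simulate "file1.c was modified after the build" by backdating file1.o.
    touch -t 200001010000 file1.o

    # Recompile only the sources whose object file is missing or out of date.
    for src in file1.c file2.c; do
        obj=${src%.c}.o
        if [ ! -f "$obj" ] || [ "$src" -nt "$obj" ]; then
            "$CC" -c "$src" && rebuilt="$rebuilt $src"
        fi
    done

    # Link the object files into the final program and run it.
    "$CC" file1.o file2.o -o myprog
    prog_output=$(./myprog)
fi
echo "recompiled:$rebuilt"
echo "output: $prog_output"
```

Only file1.c is recompiled in the loop; file2.o is reused as it is.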

