Alistair Chalk, Elisabet Andersson Stem Cell Biology and Bioinformatic Tools, DBRM, Karolinska Institutet, 18-24 September 2008. Bioinformatics Primer.


1 Bioinformatics Primer
Goal: Introductory skills for bioinformatics analysis.
Format: Complete the exercises, ask anything.

2 Basic Skills – InterPro
InterPro (www.ebi.ac.uk/interpro/)
Exercise
– For 3 proteins important to your research area (choose 2 well defined, 1 not well defined):
– Download their protein sequences from www.ncbi.nlm.nih.gov
– Analyse them using InterPro:
  What domains do they contain?
  What are the functions of these domains?
  What families do the proteins belong to?
– How would you do this on 100 proteins, or 20,000 proteins?

3 Basic Skills – gene ontologies
Gene Ontology database
– www.geneontology.org
Exercise
– Keep this information saved, as you will use it in the following days.
– 1) Define:
  Molecular function
  Biological process
  Subcellular location
– 2) Find GO identifiers that describe functions, processes or locations that are relevant to your research.
  List the identifier, type and description.
  Should you use identifiers further up or down the hierarchy?

4 Basic Skills – gene ontologies
Exercise continued
– 3) For 3 proteins relevant to your research:
  What GO terms are assigned to the protein?
  What evidence is there for the assignments?
– 4) Describe the difference between the evidence codes.
– 5) How would you find all proteins with a specific molecular function?

5 Basic Skills – ArrayExpress/GEO
GEO/ArrayExpress
– Microarray repository tools containing published microarray data
– Note differences in ease of use and completeness!
Exercise
– Compare GEO and ArrayExpress.
– Search for human stem cell microarray studies.
– What are the GEO/ArrayExpress identifiers for some recent stem cell microarray studies?
– What data is available? Raw data? Processed data?
– Download a CEL file (or set of CEL files) from a stem cell microarray study.
– Go to ArrayExpress Atlas:
  Look up at least two genes of interest (in stem cell biology).
  What does the database tell you?

6 Basic Skills – Ensembl
Exercise
– Go to Ensembl. Describe it.
– Look up a (human) gene. How many transcript variants does it have?
– Explore!
– Use BioMart to gather the Ensembl identifiers and Entrez Gene IDs for all human and mouse genes, and export this data into Excel (you will need this later).

7 Basic Skills – UCSC
Exercise
– Go to genome.ucsc.edu.
– Look up a (human) gene. Select many different gene models: how many transcript variants are found for your gene in UCSC Known Genes, AceView and RefSeq?
– Use the Table Browser to download all human genes (RefSeq) into Excel.
– What else of interest can you download?

8 Basic Skills – R
See accompanying worksheet

9 Basic Skills – command line login
Try this on your own laptop
Windows command line
– Windows key + R, type "cmd"
Cygwin (Unix in Windows)
– open Cygwin
PuTTY (log into a Unix server)
– IP address, username, password
VMware (virtual machine within Windows)
– choose a Unix virtual machine (e.g. tinyunix)
– open a terminal
Apple Mac
– OS X: open a terminal

10 Basic Skills – command line
Basic command line operations
– Directories
  cd : change the current directory
  pwd : get current working directory
– Viewing files and directories
  ls : list the contents of a directory (Windows equivalent: dir)
  more : see contents of file on screen, stop after every page
  less : see contents of file (with better ability to move in the file)
  cat : see contents of file, don't stop at new page
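A minimal sketch of these commands in use on a Unix-like shell. The throwaway directory and the file name notes.txt are made up for illustration; they are not part of the course material.

```shell
# Work in a throwaway directory so nothing on your system is touched.
workdir=$(mktemp -d)
cd "$workdir"            # change the current directory
pwd                      # print the current working directory

# Create a small example file (hypothetical content).
printf 'line one\nline two\n' > notes.txt

ls                       # list the contents of the directory
cat notes.txt            # print the whole file without paging
```

For longer files, replace cat with more or less to page through the content; press q to leave the pager.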

11 Basic Skills – command line
Basic command line operations
– Editing files
  emacs : open file for editing in emacs
  other programs: nedit, vi
– Copying and moving files
  cp : copy file to destination (Windows equivalent: copy)
  mv : move (or rename) file to destination (Windows equivalent: move)
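Copying and moving can be tried safely in a scratch directory. The sketch below assumes a Unix-like shell; all file and directory names are hypothetical.

```shell
# Scratch directory for the experiment.
workdir=$(mktemp -d)
cd "$workdir"
printf 'example\n' > original.txt

cp original.txt copy.txt     # copy: original.txt and copy.txt both exist
mv copy.txt renamed.txt      # rename: copy.txt is now called renamed.txt

mkdir archive
mv renamed.txt archive/      # move the file into another directory
ls archive                   # list what ended up in archive/
```

Note that mv covers both renaming and moving: the only difference is whether the destination is a new name or an existing directory.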

12 Basic Skills – command line
Basic command line operations
– Login and copying
  ssh / scp : login to server, copy files
– Viewing parts of files
  head -#lines : look at first # lines
  tail -#lines : look at last # lines
– Pattern matching
  grep -e "pattern" : find lines in file with "pattern"
  grep -v "pattern" : find lines in file without "pattern"
– Counting
  wc : count words, lines in file
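The viewing, matching and counting commands can be sketched on a small made-up table. The file genes.txt and its contents are hypothetical. The example writes head -n 2 rather than head -2; the -n form is the one specified by POSIX and does the same job.

```shell
# Scratch directory with a small tab-separated example file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'gene1\tstem\ngene2\tneuron\ngene3\tstem\ngene4\tglia\n' > genes.txt

head -n 2 genes.txt          # first 2 lines
tail -n 1 genes.txt          # last line
grep -e "stem" genes.txt     # lines that contain "stem"
grep -v "stem" genes.txt     # lines that do NOT contain "stem"
wc -l genes.txt              # number of lines in the file
```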

13 Basic Skills – command line
Basic command line operations
– ">" : send results to file
  more filename > filename2 (send all of filename to filename2)
  ls > directory_contents.txt
  grep -e "pattern" filename > filename_pattern.txt
– pipe, "|" : send the results forward to another program
  grep -e "pattern" filename | head -5 > filename_pattern.txt (send the matching lines to head, then save the first 5 to a file)
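The difference between redirecting to a file and piping to another program is easiest to see side by side. This is a sketch on a made-up file (animals.txt is hypothetical), assuming a Unix-like shell.

```shell
# Scratch directory with a small example file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'cod fish\nsalmon fish\noak tree\ntuna fish\n' > animals.txt

# ">" writes the output of a command into a file (overwriting it).
# The shell creates the file before ls runs, so the listing includes it.
ls > directory_contents.txt
grep -e "fish" animals.txt > fish_lines.txt

# "|" hands the output of one command to the next command as input.
grep -e "fish" animals.txt | head -n 2     # first 2 matching lines
grep -e "fish" animals.txt | wc -l         # how many lines match
```

Pipes and redirection combine freely: a chain of piped commands can end with ">" to save the final result.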

14 command line exercises
– Create a directory, name it after yourself
– What is the current working directory?
– Copy "exercise.txt" into your directory
– Change the working directory to that directory
– Look at the file with "more"
– Read the man page for wc with "man wc"
– What are the first 5 lines?
– What are the last 3 lines?
– How many lines contain the word "fish"? (hint: you need to use a pipe)

15 command line exercises
Command line in Windows
– Test the following in the Windows command line (open with Windows key + R, then "cmd"):
  more
  | (pipe)
  grep
  wc
  sed
– Which work, which do not?
– How do you find help for a program?
– What is "sed" for?
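As a starting point for the last question: on a Unix-like shell (including Cygwin), one common use of sed is search-and-replace on every line of a file. The sketch below uses a made-up file, and the names in it are illustrative only.

```shell
# Scratch directory with a small example file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'mouse Sox2\nmouse Nanog\n' > genes.txt

# s/old/new/ replaces the first "mouse" on each line with "human".
# The input file is left unchanged; the result is redirected to a new file.
sed 's/mouse/human/' genes.txt > genes_human.txt
cat genes_human.txt
```

On Unix, "man sed" shows the full manual, in the same way as "man wc" in the previous exercises.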

16 Additional resources
Plenty of tutorials are available online for R and Unix
– Unix tutorial for beginners: http://www.ee.surrey.ac.uk/Teaching/Unix/
– R: http://cran.r-project.org/other-docs.html
– Note: some are very large (100+ pages)

