Introduction to Gene Mining Part A: BLASTn-off!


1 Introduction to Gene Mining Part A: BLASTn-off! After Part A you will demonstrate your ability to:
- Use the bioinformatics NCBI Gene and BLASTn tools to search for a human gene of interest in a plant model.
- Evaluate the significance of your search results to see how similar human and plant genes might be.

2 The Arabidopsis Information Portal is funded by a grant from the National Science Foundation (#DBI-1262414) and co-funded by a grant from the Biotechnology and Biological Sciences Research Council (BB/L027151/1). These lessons were developed during the summer of 2015 as education outreach for the www.Araport.org portal in conjunction with the J. Craig Venter Institute, Rockville, MD, 20850, USA. Contact information: General information: araport@jcvi.org. Jason Miller, Grant Co-Principal Investigator, JCVI: jmiller@jcvi.org. This lesson was prepared by Andrea Cobb, Ph.D. (adcobb@fcps.edu) with the help of Margot Goldberg (mgoldberg1@pghboe.net).

3 The images below are all examples of...?

4 What science models do you recall?
- Lipid bilayer model
- Lock and key model of enzymes
- Stickleback model of evolution
- Computer models
- Experimental model of osmosis

5 Why use models instead of the “real thing”?
- To simplify a complex system. Example: Study an enzyme reaction in a test tube rather than in the whole organism, which contains many enzymes.
- To better manipulate and measure an effect. Example: Treat Drosophila with drug X and measure the drug’s effect on Drosophila life span.
- To predict (test the model). Example: Use a computer model to find protein-coding regions in the DNA of a newly sequenced genome.
- Other ideas?

6 Thanks for volunteering for our study. Your chart says you have problems eating, facial weakness and overall poor muscle tone. It looks like your mother had the same symptoms. Your diagnosis is nemaline myopathy. I am sad to tell you that no known treatment exists, but my researchers and I are working hard to find one. You can find information on this genetic disorder on a website called Online Mendelian Inheritance in Man (http://www.OMIM.org). The OMIM database shows that you might have a mutation in your actin alpha 1 gene. We won’t experiment on you! It is much faster, kinder and less expensive to use a plant model. Thanks for your help, Doctor!

7 https://www.youtube.com/watch?v=foHiKrlY9Qc explains why scientists use a certain plant as a model. Which plant will you use to study a version of my actin alpha 1 (ACTA1) gene?

8 https://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp

9 Can plants really be used as models for studying human diseases?

10 Xiang Ming Xu and Simon Geir Møller, Current Opinion in Biotechnology, 2011, 22, 300-307.

11 Before we find out whether plants have human muscle genes, it would be important to know if plants move! http://www.bbc.co.uk/programmes/p00lx6cl https://www.youtube.com/watch?v=eDA8rmUP5ZM http://aboutlifting.com/music-helps-plants-grow-and-will-help-muscles-grow/

12 Why don’t you rest? I am going to search the OMIM database to find out more about your possible gene mutation. Use your computer to go to http://www.OMIM.org and find out more about nemaline myopathy and the ACTA1 gene that may be involved. After you answer the questions on your handout, type in any human disease that interests you and examine the results.

13 Use your computer to go to http://www.OMIM.org and learn more about nemaline myopathy and the ACTA1 gene that may be involved. After you answer the questions on your handout, search for any human disease that interests you and examine the information.

14 Use your textbook, open access textbooks, videos and databases to begin to find information about muscle genes and proteins. https://www.boundless.com/biology/

15 Usually, a general search engine will give you too many hits for the question below!

16 Even a broad scientific database may provide too many unrelated hits (here, 108 results)! Why are there SO MANY results?

17 “BIG DATA” Biologists are increasingly able to generate enormous amounts of data quickly, but analyzing that data may take weeks or even years. Data transfer protocols are not interchangeable, data storage is expensive, and queries can crash! https://en.wikipedia.org/wiki/List_of_RNAs

18 What scientific approach finds better information? Bioinformatics is an interdisciplinary approach that uses computational, mathematical, and engineering methods to analyze and make discoveries from enormous data sets.

19 To address the problem of BIG DATA, scientists can share data and analysis with other scientists. This speeds analysis and adds expertise. Scientists can share their data in research-specific portals. These research-specific portals usually have customized bioinformatics tools.

20 A few examples of how bioinformatics is used:
Use: Questions addressed
- Basic research: How is DNA organized in chromosomes? Are genes related to other genes? Given sequence data, how do we find a gene? How are genes expressed in response to the environment?
- Biomedicine: Will this drug work on this patient? Can we cure genetic diseases? Which genetic variations are associated with heart disease? Which pathogen proteins are best for vaccine development?
- Microbiology: Can microbes remove pollution? Can microbes decrease the impact of climate change? Where did a disease originate?
- Agriculture: Can drought resistant plants be identified, bred or engineered? Can insect resistant plants improve food supplies? Can more healthful food sources be developed?

21 Scientists are more likely to find useful information in bioinformatics portals that support their particular research.

22 An example of increasingly specific research-centered portals:
- National Center for Biotechnology Information: http://www.ncbi.nlm.nih.gov/gene
- Araport: https://www.araport.org/
- FLOR-ID: http://www.phytosystems.ulg.ac.be/florid/

23 For our plant model to be useful for my research, I must find a similar plant version of the ACTA1 gene involved in nemaline myopathy. Since plants and animals both move, do they use the same types of proteins to move? Do they have the same genes coding for these proteins?

24 Begin your search on the NCBI portal to find names of human muscle genes. Go to http://www.ncbi.nlm.nih.gov/, enter the information shown, and use the pull-down menu to select Gene. (Note: Araport.org and similar genome browsers will also allow you to search for genes and proteins of interest.)

25 Could plant and animal versions of this gene have a function in common?

26 Actin subunits self-assemble to form filaments, which have a role in cell structure. Check the “Inner Life of the Cell” video: https://www.youtube.com/watch?v=FzcTgrxMzZk (2:20 until 3:15). https://www.youtube.com/watch?v=VVgXDW_8O4U is a video showing polymerization of G-actin, a protein similar to alpha actin. This is how your actin should work.

27 If it is reasonable that plants might have a gene similar to human ACTA1, you will need to find the ACTA1 gene sequence. Click on FASTA to obtain the human ACTA1 gene sequence.

28 Copy, then paste the ACTA1 gene sequence into a new Word document or onto the clipboard. We will use this to look for an Arabidopsis thaliana version of this gene. Save the Word document as “human ACTA1 DNA sequence”.
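
A FASTA record is plain text: one header line that starts with ">" followed by one or more lines of sequence. As a minimal sketch in Python, the pasted text could be split into its header and sequence as shown here. The function name read_fasta and the short toy record are made up for illustration; the toy record is not the real ACTA1 sequence.

```python
def read_fasta(text):
    """Split one FASTA record into (header, sequence)."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    header = lines[0][1:]                   # text after the '>'
    sequence = "".join(lines[1:]).upper()   # join wrapped sequence lines
    return header, sequence


# Hypothetical toy record for illustration only, NOT the real ACTA1 sequence.
example = """>toy_record illustrative only
acgtac
GTTAGC
"""

header, sequence = read_fasta(example)
print(header)
print(len(sequence), "nucleotides")
```

Checking the header and the sequence length this way is a quick test that the whole sequence was copied before it is pasted into BLASTn.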

29 I want to search for a version of the human ACTA1 gene in Arabidopsis thaliana. What bioinformatics tool could I use?

30 30

31 BLAST Types
- BLASTn compares a DNA query sequence with DNA sequences.
- BLASTp compares a protein query sequence with protein sequences.
- BLASTx translates a DNA query in the 6 possible reading frames, then compares the translations to a protein sequence database.
- tBLASTx compares DNA sequences after translating both the query and the database sequences in 6 reading frames.
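
The “6 possible reading frames” can be made concrete with a short Python sketch: three frames on the given strand and three on its reverse complement. This only illustrates the idea using the standard genetic code. It is not how the NCBI tools are implemented, and the function names are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate a DNA sequence in all six reading frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCC").items():
    print(name, protein)
```

Translated searches can be useful when DNA sequences have drifted apart but the proteins they encode are still similar.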

32 32

33 If I have a known DNA sequence, how can I use BLASTn to look for an unknown similar sequence? There are several ways to access NCBI BLAST. Start at http://www.ncbi.nlm.nih.gov/ and select BLAST, or just go to the BLAST page at http://blast.ncbi.nlm.nih.gov/Blast.cgi. Then select nucleotide blast.

34 You found a human gene to compare... Click on FASTA to obtain the human ACTA1 gene sequence.

35 And you’ve already copied and pasted the ACTA1 gene sequence into a Word document or onto the clipboard. We will use this to look for an Arabidopsis thaliana version of this gene.

36 Steps to use BLASTn
1. Paste in your copied ACTA1 sequence.
2. Enter the name of the organism in which we are looking for the same gene (Arabidopsis thaliana).
3. Select the program: use “Somewhat similar sequences” for the broadest search.
4. Check “Show results in a new window”, then click on the BLAST button.

37 What information is provided in an NCBI BLASTn report? The Graphics Section shows the query sequence as the red bar (marked by the green arrow), with aligned sequences shown in colored tracks below it. Each “track” represents a sequence in the database that the BLASTn tool found to be similar to your query sequence. The colored blocks in each track are stretches of DNA that align to the query; the block color indicates the alignment score, according to the color key above. The thin lines connecting colored blocks mark stretches of the same database sequence, between the aligned blocks, that did not align well. Move the mouse over a block to see the definition and score for that sequence result (also called a “hit”). Clicking on a colored block jumps to the actual DNA alignment farther down the page.

38 What information is provided in an NCBI BLASTn report? The Descriptions Section lists the aligned sequence names and provides information about the alignment. In this search, we are using one gene sequence to find a similar gene sequence. Look at the results that end in “gene”.

39 What is gene alignment? What BLASTn values tell us whether the alignment is meaningful?

40 Go to 31:13-40:15 of https://www.youtube.com/watch?v=6Udqou3vmng for a more detailed explanation of alignment.
- Query: starting and ending nucleotides of your query.
- Subject (database used for search): starting and ending nucleotide coordinates for this sequence in its database.

41 BLASTn seeks to maximize the score for aligning short stretches of the query against sequences in the database. This is local alignment: the entire query does not have to align. Matching nucleotides are given a positive score (for example, +1), mismatches are given a negative score, and gaps are penalized. There are different algorithms, but this is the general idea.
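
The scoring idea can be sketched in Python with the classic Smith-Waterman local alignment recurrence. This is a teaching sketch, not BLASTn itself: BLASTn uses faster shortcut methods and its own scoring settings. The values used here (+1 match, -1 mismatch, -2 per gap position) and the function name are assumptions picked for illustration.

```python
def local_alignment_score(query, subject, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman).

    Scores are illustrative assumptions, not BLASTn defaults.
    """
    query, subject = query.upper(), subject.upper()
    previous = [0] * (len(subject) + 1)   # scores for the previous query row
    best = 0
    for q in query:
        current = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous[j - 1] + (match if q == s else mismatch)
            up = previous[j] + gap        # gap in the subject
            left = current[j - 1] + gap   # gap in the query
            # A local alignment may restart anywhere, so never go below 0.
            cell = max(0, diagonal, up, left)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# Only the shared middle stretch "ACGT" contributes to the score.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Because the score can restart at zero, flanking regions that do not match are simply left out of the alignment, which is why the whole query does not need to align.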

42 42

43 “Query cover” tells what percentage of your input sequence (query) is included in the alignment with the hit. Note that the query is more than 2750 nucleotides long.

44 The query coverage is low here (20%) because you are comparing 2 DNA sequences that contain exons (conserved, thus aligned) and introns (not highly conserved, thus non-aligned or poorly aligned).

45 Although only 20% of the query aligns to a sequence in the Arabidopsis database, 80% of the aligned part is identical to the query (see the “Ident” value of 80% and the color-coded portions of the result track).
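
The two percentages measure different things, which a little arithmetic in Python can show. The numbers are assumptions chosen only to reproduce the 20% and 80% on the slides: a query of exactly 2750 nucleotides (the slide says “more than 2750”), one aligned stretch covering query positions 101 to 650, and 440 identical positions in that stretch. The function names are also made up for illustration.

```python
def query_cover(aligned_ranges, query_length):
    """Percent of query positions covered by at least one aligned range.

    Ranges are (start, end) pairs, 1-based and inclusive; overlaps count once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_ranges):
        start = max(start, last_end + 1)   # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_length


def percent_identity(identical_positions, alignment_length):
    """Percent of aligned positions where query and hit have the same base."""
    return 100.0 * identical_positions / alignment_length


# Illustrative numbers only (assumed, not taken from a real BLAST report).
print(query_cover([(101, 650)], 2750))   # share of the query that aligned
print(percent_identity(440, 550))        # identity within the aligned part
```

A hit can therefore have low query cover and high identity at the same time: only a small part of the query aligns, but that part matches closely.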

46 “Alignments” provides details about nucleotide locations, matches, gaps or mismatches. Access more info about the sequence by clicking on the sequence ID.

47 The E-value indicates the number of alignments with an equivalent or better score that would be expected from this database just by chance. The lower the E-value, the more significant the score (the less likely it is due to chance alone). E-values are written in scientific notation, for example 3e-80 = 3 × 10^-80. A one-in-a-million (1/1,000,000) chance is a very small chance and would be written 1e-6. In general, an E-value of 1 × 10^-5 (1e-5) or smaller is considered significant (not aligned just by chance).
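
Scientific notation such as 3e-80 can be read directly by Python, which makes the rule of thumb easy to try out. This is a small sketch: the function name is made up, and the 1e-5 cutoff is the general guideline from the slide, not a fixed rule.

```python
def is_significant(e_value_text, threshold=1e-5):
    """True if an E-value (as text from a BLAST report) is at or below the cutoff."""
    return float(e_value_text) <= threshold


for text in ("3e-80", "1e-6", "1e-5", "0.5"):
    verdict = "significant" if is_significant(text) else "could be chance"
    print(text, "->", verdict)
```

The more negative the exponent after the "e", the smaller the E-value and the less likely the alignment is due to chance.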

48 This is from the Alignments Section and shows the details.

49 By default, results are arranged from lowest E-value to highest. Compare the E-value, Query cover and % identity for the checked “hits”. Which GENE is most similar to the human ACTA1 sequence query? Click on the accession number for more information about the gene that had the most significant alignment.
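
Comparing hits by E-value first, then by query cover and % identity, can be sketched as a sort in Python. The hit names and numbers are hypothetical placeholders, not results from a real BLASTn search, and the tie-breaking order is an assumption made for this sketch.

```python
# Hypothetical hits for illustration only (not real BLASTn output).
hits = [
    {"name": "hypothetical_hit_B", "e_value": 4e-30, "query_cover": 12, "identity": 77},
    {"name": "hypothetical_hit_C", "e_value": 0.02, "query_cover": 3, "identity": 90},
    {"name": "hypothetical_hit_A", "e_value": 1e-60, "query_cover": 18, "identity": 79},
]


def rank_hits(hits, threshold=1e-5):
    """Keep hits at or below the E-value cutoff, most significant first."""
    significant = [hit for hit in hits if hit["e_value"] <= threshold]
    # Lowest E-value first; higher cover and identity break ties.
    return sorted(
        significant,
        key=lambda hit: (hit["e_value"], -hit["query_cover"], -hit["identity"]),
    )


ranked = rank_hits(hits)
for hit in ranked:
    print(hit["name"], hit["e_value"], hit["query_cover"], hit["identity"])
```

Note that a hit with high % identity can still be dropped if its E-value is large, because a very short match can look identical just by chance.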

50 Amino acid sequence. Link for more info!

51 51

52 What information did you use to indicate that the plant version was a meaningful find?

53
1. Pick a human gene which you think is highly conserved between plants and animals.
2. Follow the procedure you just learned to see if a similar Arabidopsis version exists.
3. Record your info on the scorecard.
4. Repeat for a gene that you predict is unique to humans.

54 Gene Discovery Scorecard
- Human Gene Name: Actin alpha 1
- Human Gene ID: ACTA1
- Human Gene Function: Cytoskeletal structure
- Arabidopsis Gene Name: ACT7
- Arabidopsis Gene ID: Actin 7
- Arabidopsis Gene Function: Cytoskeletal structure
- Outcome evidence (Score, E-value, Similar Function, Prediction?): E-value was 1e-80, not random, both have similar functions.... Yes

55 What information so far indicates whether or not plants have animal muscle genes? What additional information might you need to be more certain whether ACT7 is a plant version of human ACTA1?

