Daniel Ence, Yandell Lab, University of Utah. Annotations are descriptions of features of the genome. Structural: exons, introns, UTRs, splice forms.

Presentation transcript:

1 Daniel Ence, Yandell Lab, University of Utah

2
- Annotations are descriptions of features of the genome
  - Structural: exons, introns, UTRs, splice forms, etc.
  - Coding & non-coding genes
- Annotations should include an evidence trail
  - Assists in quality control of genome annotations
  - Examples of evidence supporting a structural annotation:
    - Ab initio gene predictions
    - ESTs
    - Protein homology
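One way to picture an annotation that carries its evidence trail is a small record type. The sketch below is only an illustration: the class names, field names, and identifiers such as `est_0001` are hypothetical and do not come from MAKER or from the talk. It shows how an empty evidence trail can be flagged for quality control.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kind: str    # e.g. "ab_initio", "EST", "protein_homology"
    source: str  # hypothetical identifier of the supporting prediction or alignment


@dataclass
class Annotation:
    seqid: str
    feature: str  # structural feature type, e.g. "exon", "intron", "UTR"
    start: int    # 1-based, inclusive
    end: int
    strand: str
    evidence: List[Evidence] = field(default_factory=list)


def unsupported(annotations: List[Annotation]) -> List[Annotation]:
    """Return annotations whose evidence trail is empty, for QC review."""
    return [a for a in annotations if not a.evidence]


# Two illustrative exons: one backed by evidence, one with no support at all.
exons = [
    Annotation("chr1", "exon", 100, 250, "+",
               [Evidence("EST", "est_0001"), Evidence("ab_initio", "pred_7")]),
    Annotation("chr1", "exon", 400, 520, "+"),
]
flagged = unsupported(exons)
print([(a.start, a.end) for a in flagged])
```

Keeping the evidence next to the feature means a reviewer can ask why an exon was called without rerunning the pipeline.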

3
- Protein Domains and Families
  - InterPro
  - Pfam
- GO and other ontologies
- Pathways

4

5

6

7 >Smg5
MEVTFSSGGSSNASSECAIDGGTNR
CRGLEPNNGTCILSQEVKDLYRSLYT
ASKQLDDAKRNVQSVGQLFQHEIEEK
RSLLVQLCKQIIFKDYQSVGKKVREV
MWRRGYYEFIAFV
SUCCESS

14 >Smg5
MEVTFSSGGSSNASSECAIDGGTNR
CRGLEPNNGTCILSQEVKDLYRSLYT
ASKQLDDAKRNVQSVGQLFQHEIEEK
RSLLVQLCKQIIFKDYQSVGKKVREV
MWRRGYYEFIAFV
Incorrect annotations poison every experiment that uses them!

15 MAKER An annotation pipeline and genome-database management tool for “next-generation” genome projects

19 MAKER
- User Requirements: Can be run by a single individual with little bioinformatics experience
- System Requirements: Can run on Linux or Mac OS X based systems
- Program Output: Output is compatible with popular annotation tools like WebApollo and JBrowse
- Availability: Free for the academic community (including source code)

20
- mRNA-seq integration
- Integrating new evidence into existing databases
- Update/revise legacy annotation sets

21 Inputs: Legacy Annotation Set 1, Legacy Annotation Set 2, ..., Legacy Annotation Set n; new data; current assembly
- Identify legacy annotation most consistent with new data
- Automatically revise it in light of new data
- If no existing annotation, create new one
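The selection step on this slide can be sketched with a simple base-level overlap score. This is an assumption made for illustration, not MAKER's actual consistency metric, and the names `consistency`, `pick_best`, `set_1`, and `set_2` are hypothetical. Each legacy gene model is scored against the new evidence, the best one is kept for revision, and `None` signals that a new annotation has to be created.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def covered_bases(intervals: List[Interval]) -> Set[int]:
    """Expand intervals into the set of covered positions (fine for a small sketch)."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def consistency(annotation: List[Interval], evidence: List[Interval]) -> float:
    """Jaccard overlap between annotated bases and evidence-supported bases."""
    a, e = covered_bases(annotation), covered_bases(evidence)
    union = a | e
    return len(a & e) / len(union) if union else 0.0


def pick_best(legacy: Dict[str, List[Interval]],
              evidence: List[Interval]) -> Optional[str]:
    """Name of the legacy model most consistent with the evidence, or None."""
    scored = {name: consistency(exons, evidence) for name, exons in legacy.items()}
    best = max(scored, key=scored.get, default=None)
    if best is None or scored[best] == 0.0:
        return None  # nothing usable: a new annotation would be created instead
    return best


new_evidence = [(100, 200), (300, 400)]
legacy_sets = {
    "set_1": [(100, 200), (300, 380)],
    "set_2": [(150, 200)],
}
best = pick_best(legacy_sets, new_evidence)
print(best)
```

Expanding intervals into position sets keeps the sketch short; a real implementation would work on sorted intervals to handle whole genomes.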

23 Supports Message Passing Interface (MPI), a communication protocol for computer clusters which essentially allows multiple computers to act like a single powerful machine.
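As a rough illustration of how work might be divided among MPI processes, the sketch below hands contigs out round-robin by rank. The scheduling MAKER really uses may differ. The function name and contig names are hypothetical, and the rank and size values are plain integers here so the example runs without an MPI installation.

```python
from typing import List


def contigs_for_rank(contigs: List[str], rank: int, size: int) -> List[str]:
    """Round-robin share of the contig list for one process."""
    return contigs[rank::size]


contigs = [f"contig_{i}" for i in range(10)]

# Number of cooperating processes. Under mpi4py this would typically come from
# MPI.COMM_WORLD.Get_size(), and each process would learn its own rank from
# MPI.COMM_WORLD.Get_rank(); here all ranks are simulated in one loop.
size = 4
shares = [contigs_for_rank(contigs, rank, size) for rank in range(size)]

for rank, share in enumerate(shares):
    print(rank, share)
```

Every contig lands on exactly one rank, so the processes can annotate their shares independently and the results can be merged afterwards.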

24

25

26

27

28

29

30

31

35 MAKER-P
- Plant
- Parallelized
- Publication

36 Publication: MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations. Campbell, Law, Holt et al., Plant Phys. 2013

37
- Atmosphere
  - MPI enabled for parallel computation
  - Maximum instance size 16 CPU
- TACC Lonestar
  - Supercomputer with 22,656 CPU
  - MPI enabled for parallel computation
  - Can complete entire rice genome in ~2 hrs (1,152 cores)
  - 96 CPU per chromosome
  - Currently being integrated into the iPlant Discovery Environment
- XSEDE
  - https://www.xsede.org

38 Performance on Zea mays (maize) genome (~2 Gb)

39
- 8,640 CPUs on TACC
- ~37 hours with queue (runtime 14 hours 37 minutes)
- Throughput of > 1 Gb/hour

40 Assembly & Annotation at iPlant

41
- non-coding RNA support
- better repeat annotation
- better pseudogene annotation

42
- tRNAscan support
  - Will run from inside MAKER
  - Doesn’t install automatically
- snoScan support
  - Can supply data file for annotation
  - Will run from inside automatically
  - Doesn’t install automatically

43
- In the past:
  - Custom Repeat library
  - de novo generated RepeatModeler
- Now:
  - RepeatModeler, but better.
  - Step-by-step guide available at: /index.php/Repeat_Library_Construction--Basic
  - To be automated in the future

44
- In the past:
  - Custom Repeat library
  - de novo generated RepeatModeler
- Now:
  - RepeatModeler, but better.
  - Step-by-step guide available at: /Protocol:Pseudogene
  - To be automated in the future

45
- Expanded ncRNA support
- MAKER-EVM
- Expanded Augustus/BAM support
- Better integration with iPlant’s Discovery Environment

46
- More of a feeling than a to-do list
- lncRNAs

47 Haas et al., Genome Biology 2008

48 Cantarel et al., 2008; Holt and Yandell, 2010

49 EVM

50
- MAKER gives Augustus hints
- Augustus can take better hints from a BAM file
- Users will be able to supply a BAM file in the MAKER control file
- BAM files open up a world of possibilities!
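To make the idea of a hint concrete, the sketch below turns the aligned blocks of one spliced read into intron hints written as tab-separated, GFF-style lines with `grp` and `src` attributes, in the general style Augustus accepts. It is a hand-rolled illustration under stated assumptions, not MAKER's code and not the output of any BAM conversion tool. The function name, the `sketch` source column, and the read name `read_42` are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # aligned block on the genome, 1-based inclusive


def intron_hints(seqid: str, blocks: List[Block], strand: str, group: str) -> List[str]:
    """Emit one GFF-style intron hint for each gap between consecutive blocks."""
    hints = []
    ordered = sorted(blocks)
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        start, end = left_end + 1, right_start - 1
        if start > end:
            continue  # adjacent or overlapping blocks leave no intron between them
        attributes = f"grp={group};src=E"  # src=E marks EST/RNA-seq style evidence
        hints.append("\t".join(
            [seqid, "sketch", "intron", str(start), str(end), ".", strand, ".", attributes]
        ))
    return hints


hints = intron_hints("chr1", [(100, 200), (301, 400), (551, 600)], "+", "read_42")
for line in hints:
    print(line)
```

Each gap between aligned blocks becomes one intron hint, which is the kind of splice-junction information a spliced read alignment can contribute to gene prediction.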

51 Assembly & Annotation at iPlant

52
- Trichomonas vaginalis
- Pinus taeda
- Apis dorsata
- Cronartium quercuum
- Common Pigeon
- Cardiocondyla obscurior
- Southern right whale
- Tardigrade
- Spotted Gar
- Gibbon
- Turkey
- Nine-spined stickleback
- Golden Eagle

53 I’d like to thank and recognize all contributions from Mark Yandell at the University of Utah, as well as lab members Barry Moore, Michael Campbell, Daniel Ence, and former lab member Meiyee Law. Special thank you to Scott Cain, Robert Buels, and Amelia Ireland. I would also like to recognize collaborator Ian Korf at UC Davis.
MAKER-P and integration into iPlant infrastructure: Josh Stein (CSHL), Kevin Childs (MSU), Gaurav Moghe (MSU), David Hufnagel (MSU), Jikai Lei (MSU), Rujira Achawanantakun (MSU), Carolyn Lawrence (USDA-ARS CICGRU), Doreen Ware (CSHL), Shin-Han Shiu (MSU), Yanni Sun (MSU), Ning Jiang (MSU), Matt Vaughn (TACC), Dian Jiao (TACC), Zhenyuan Lu (CSHL), Nirav Merchant (U. Arizona)
Pinus taeda genome project: Jill Wegrzyn (UConn), John Liechty (UC Davis), Kristian Stevens (UC Davis), Carol Loopstra (Texas A&M), Hans Vasquez-Gross (UC Davis), Brian Lin (UC Davis), Matt Dougherty (UC Davis), Jacob Zieve (UC Davis), Pedro J Martinez-Garcia (UC Davis), James A Yorke (U. Maryland), Marc Crepeau (UC Davis), Daniela Puiu (Johns Hopkins), Steven L Salzberg (Johns Hopkins), Pieter J. deJong (CHORI-BACPAC Resources Center), Keithanne Mockaitis (Indiana University), Dorrie Main (Washington State), Chuck Langley (UC Davis), David Neale (UC Davis)
MAKER-devel community
Funding from the NHGRI through an R01 grant entitled "Software for the creation and quality control of genome annotations."

54

55
Mailing List: maker-devel at yandell-lab.org
Download:
me: dence at genetics.utah.edu

