2nd Texas A&M Big Data Workshop: Development of “Big Data” Scientific Workflow Management Tools for the Materials Genome Initiative: “Materials Galaxy”

2nd Texas A&M Big Data Workshop
Development of “Big Data” Scientific Workflow Management Tools for the Materials Genome Initiative: “Materials Galaxy”
Rodolfo Aramayo, Department of Biology, College of Sciences

Department of Biology: Dr. Rodolfo Aramayo, Ricardo Perez
Department of Materials Science and Engineering: Dr. Raymundo Arroyave, Dr. Ibrahim Karaman, Daniel Sauceda, Dr. Anjana Talapatra, Nayan Chaudhary, Ramaranjan Ruj, Vinay Akula
Department of Computer Science and Engineering: Dr. Ricardo Gutierrez-Osuna

The Problem…
When data is generated faster than it can be processed, we have an “information crisis”. This crisis stems not only from the lack of hardware/software infrastructure to transform data into knowledge, but also from two major informatics-related needs:
–I. Accessibility: It is not uncommon to find scientists unable to process information because they lack programming and/or informatics expertise
–II. Reproducibility: The lack of robust frameworks to ensure reproducibility has been identified as a major issue in the scientific enterprise
Reproducibility in informatics is a major challenge, because generating knowledge from data involves highly complex analysis workflows
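To make the reproducibility need concrete, here is a minimal Python sketch (an illustrative assumption, not part of this project or of Galaxy) that fingerprints one analysis step from its tool identifier, tool version, parameters and input data. Two runs with the same fingerprint were specified identically; changing the version, a parameter or an input changes the fingerprint. The `StepRecord` class, its field names and the tool name `tool_x` are hypothetical.

```python
"""Sketch: fingerprint an analysis step so repeated runs can be compared.

Assumption: a step is identified by tool id, tool version, parameters and
the SHA-256 digests of its inputs. Names are hypothetical, not Galaxy's schema.
"""
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepRecord:
    tool_id: str
    tool_version: str
    params: dict = field(default_factory=dict)
    input_digests: tuple = ()

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so parameter order does not affect the hash.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def digest_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of an input dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    raw = b"example input"
    a = StepRecord("tool_x", "1.0", {"a": 1, "b": 2}, (digest_bytes(raw),))
    b = StepRecord("tool_x", "1.0", {"b": 2, "a": 1}, (digest_bytes(raw),))
    c = StepRecord("tool_x", "1.1", {"a": 1, "b": 2}, (digest_bytes(raw),))
    print(a.fingerprint() == b.fingerprint())  # True: same specification
    print(a.fingerprint() == c.fingerprint())  # False: tool version changed
```

Sorting keys before hashing is the design choice that makes the fingerprint independent of the order in which a user entered the parameters.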

The Problem = Opportunity
This “problem” will hit Materials Sciences hard, since the field is undergoing a major transformation into being “Big Data” centric
–This is particularly true since the launch of the Materials Genome Initiative (MGI) in 2011
–The materials data infrastructure is under active development
–Materials Sciences is expected to ramp up from 0% to 100% Big Data in a few years
This is a tremendous opportunity for us to become leaders in this emerging field

Our Objective…
To establish TAMU as a leading center for Materials and Materials Informatics
How are we going to do that?
–By developing a series of computational tools designed to collect, store and analyze “Big Data” from diverse sources
–By adapting and porting informatics tools from other, more developed areas into Materials Sciences
We propose to start by adapting “Galaxy”, a complex web-based system originally developed for genomics applications, to Materials Informatics

What Is Galaxy? (Definition)
Galaxy is an open source, web-based platform for accessible, reproducible, and transparent computational biomedical research
–Accessible: Users without programming experience can easily specify parameters and run tools and workflows
–Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis
–Transparent: Users share and publish analyses via the web and create Pages: interactive, web-based documents that describe a complete analysis
Source: Galaxy Wiki:
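The three properties can be illustrated with a toy model of a Galaxy-style history, sketched here in Python. This is a hypothetical illustration, not Galaxy's actual API or history format: the `TOOLS` registry, the tool names `scale` and `threshold`, and the JSON layout are assumptions. The user only names a tool and its parameters (accessible), every run is recorded so the whole analysis can be replayed on new input (reproducible), and the record is exported as JSON for others to inspect (transparent).

```python
"""Toy Galaxy-like history: log each tool run, export it, replay it.

Hypothetical: TOOLS, the tool names and the JSON layout are illustrative
and do not reflect Galaxy's real API or file formats.
"""
import json
from typing import Callable, Dict, List

# Registry of toy "tools": each takes a list of numbers plus keyword parameters.
TOOLS: Dict[str, Callable[..., List[float]]] = {
    "scale": lambda data, factor: [x * factor for x in data],
    "threshold": lambda data, cutoff: [x for x in data if x >= cutoff],
}


class History:
    def __init__(self) -> None:
        self.steps: List[dict] = []

    def run(self, tool: str, data: List[float], **params) -> List[float]:
        """Run a tool by name and record which tool and parameters were used."""
        result = TOOLS[tool](data, **params)
        self.steps.append({"tool": tool, "params": params})
        return result

    def to_json(self) -> str:
        """Serialize the recorded steps so others can inspect or share them."""
        return json.dumps(self.steps, sort_keys=True)

    @staticmethod
    def replay(history_json: str, data: List[float]) -> List[float]:
        """Re-apply every recorded step, in order, to a (possibly new) input."""
        for step in json.loads(history_json):
            data = TOOLS[step["tool"]](data, **step["params"])
        return data


if __name__ == "__main__":
    h = History()
    out = h.run("scale", [1.0, 2.0, 3.0], factor=2.0)  # [2.0, 4.0, 6.0]
    out = h.run("threshold", out, cutoff=4.0)          # [4.0, 6.0]
    print(History.replay(h.to_json(), [1.0, 2.0, 3.0]))  # [4.0, 6.0]
```

Recording only tool names and parameters (not results) keeps the exported history small and lets the same analysis be re-run on a different dataset.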

Galaxy’s Internals

Galaxy’s Interface

Hillman-Jackson et al., “Using Galaxy to Perform Large-Scale Interactive Data Analyses”

Galaxy’s Internals

Hillman-Jackson et al., “Using Galaxy to Perform Large-Scale Interactive Data Analyses”

Galaxy’s Internals

Hillman-Jackson et al., “Using Galaxy to Perform Large-Scale Interactive Data Analyses”

Galaxy’s Interface
Hillman-Jackson et al., “Using Galaxy to Perform Large-Scale Interactive Data Analyses”

Galaxy’s Internals

Blankenberg et al., “Making whole genome multiple alignments usable for biologists”

Galaxy Runs on “Ada” (“Reveille”)