Call in: 800-593-0616 Participant Passcode: 2927756 Centra: Meeting ID: ICR_WS http://ncicb.centra.com August 11, 2010 ICR-WS Meeting.

Presentation transcript:

Call in: Participant Passcode: Centra: Meeting ID: ICR_WS http://ncicb.centra.com August 11, 2010 ICR-WS Meeting caArray and 2.5.0

Outline
- Overview of caArray
- caArray – upcoming release
  - New data parsers
- caArray – next major release
  - Improve import of very large datasets and include support for next gen sequencing experiments
- Avenues for Feedback

caArray Overview
- Manage data/annotations throughout the life of an experiment
- Collaborate & share pre-publication data with partners
- Control access at the experiment or sample level
- Install locally or use the central NCI instance

Import and Export in caArray
- Import data and annotations using MAGE-TAB
  - Associate data files with samples
  - Annotate the experiment and samples using controlled vocabularies
  - Specify protocols used to process samples and data
- Export data and annotations
  - Export into MAGE-TAB
  - Export into SOFT format for subsequent GEO submission

Data in caArray
- Native data: native files from various platforms and providers can be stored and associated with samples, e.g., Affymetrix, Agilent, Illumina, NimbleGen, GenePix, ImaGene.
- Parsed data: in addition, caArray parses a subset of file types so that analytical clients can pull the data through programmatic APIs, e.g., to retrieve signal values from an Affymetrix CHP file.

Programmatic APIs and the Grid
- Programmatic APIs allow:
  - search and retrieval of annotations and native data files
  - retrieval of parsed data that can be passed on to analysis applications
- Grid API and Java-only API
  - Grid API: retrieve public data across caArray installations on caGrid
  - Java API: retrieve public or private data

Clients of caArray
Clients consuming caArray data via APIs include:
- GenePattern
- geWorkbench
- caIntegrator2
- caB2B
- Taverna workflows

caArray – upcoming release
- Timeline: expected release this month (August)
- Scope: new data parsers
  - Agilent:
    - GEML/XML array designs (aCGH, gene expression and miRNA)
    - Raw TXT data files (aCGH, gene expression and miRNA)
  - NimbleGen (Community Code Contribution):
    - NDF array designs
    - Pair Report (raw and normalized) data files
  - Illumina:
    - BGX/TXT array designs
    - Gene expression: Sample Probe Profile TXT files with unique Probe_Id
    - Genotyping: Processed matrix TXT files with unique IlmnID values
  - Affymetrix:
    - AGCC/Command Console formats for CDF, CEL and CHP files
    - CNCHP files with copy number and LOH data
  - Copy number data in MAGE-TAB Data Matrix format
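The Illumina parsers above expect identifier columns (Probe_Id, IlmnID) to hold unique values. As a rough illustration of what such a check involves, and not a description of how caArray itself validates files, the following Python sketch scans a tab-delimited table for repeated identifiers. The signal column name and the probe identifiers in the sample are made up for the example.

```python
import csv
import io
from collections import Counter


def duplicate_ids(text, id_column="Probe_Id"):
    """Return the sorted identifier values that occur more than once
    in a tab-delimited table given as a string."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"missing column: {id_column}")
    counts = Counter(row[id_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample content; real Sample Probe Profile files have more columns.
sample = (
    "Probe_Id\tSampleA.AVG_Signal\n"
    "ILMN_0001\t120.5\n"
    "ILMN_0002\t98.1\n"
    "ILMN_0001\t119.9\n"
)
print(duplicate_ids(sample))
```

An empty result means every identifier is unique; passing `id_column="IlmnID"` applies the same check to a genotyping matrix.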

caArray Plan for caArray
- Focus on upload/import/download of large data sets
- Plan for Grid security
- Plan for caTissue integration
- Fix collaborator view of uploaded files
- Curate organisms, material types and protocol types
- Plug-in architecture prototype
- Support search for experiments by publication
- Find & download samples within an experiment
- Upgrade Java (6), JBoss (5.1) and MySQL (5.1)

Upload/import/download of large data sets
Current functionality
- The current approach of storing data files in a MySQL database does not scale to the large volumes of data expected from experiments like next gen sequencing.
- Individual imports are limited to about 1.5GB each, forcing the user to import in multiple smaller batches.
- The upload/import/download user experience has room for improvement.
Plans
- Support storage of large volumes of data, possibly on a distributed file system.
- Support large data set imports without the need for chunking. Possible approaches are to store parsed data on the file system, to use Postgres, or to break the import into multiple smaller transactions.
- Support import of next gen sequencing files like FASTQ and BAM.
- Add queue management to ease the import process.
- Support resumable downloads and transparent compression.
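One of the approaches listed above is to break an import into multiple smaller transactions. The Python sketch below shows the general idea only. It uses an in-memory SQLite database as a stand-in (caArray stores data in MySQL), and the `signal` table and its columns are hypothetical.

```python
import sqlite3
from itertools import islice


def import_in_batches(conn, rows, batch_size=1000):
    """Insert rows in batches, committing after each batch so that
    no single transaction has to hold the whole data set."""
    iterator = iter(rows)
    committed = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        with conn:  # commits on success, rolls back only this batch on error
            conn.executemany(
                "INSERT INTO signal (probe, value) VALUES (?, ?)", batch
            )
        committed += len(batch)
    return committed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signal (probe TEXT, value REAL)")  # hypothetical table
rows = ((f"probe_{i}", float(i)) for i in range(2500))
print(import_in_batches(conn, rows, batch_size=1000))
```

The trade-off is that a failure partway through leaves the earlier batches committed, so an import built this way needs some record of progress in order to resume or clean up.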

Plan for Grid security
Current functionality
- The current version of the Grid API supports access to only publicly available data. This means that if programmatic access to protected data is desired, then the Java API must be used instead.
Plans
- Perform design work for implementing Grid security, including items such as:
  - Allow a programmatic client to log in using Grid credentials and retrieve protected data.
  - Local installers will have the choice to keep old-style local accounts or migrate to Grid accounts.
  - A mechanism must be provided for users to migrate their local accounts to Grid accounts.
  - Use Grid Grouper to manage groups.

Plan for caTissue integration
Current functionality
- Various ad hoc systems (like /paper) are used during the lab workflow in order to transfer specimen information from the biospecimen system to the assay system.
- Users need a way to look up data associated with a specimen they found in the biospecimen system, or to look up the specimens associated with data they found in caArray.
Plans
- Agree upon requirements with the caTissueSuite team for how biospecimens/biomaterials will be mapped between the two systems, and what services are needed to enable integration.

Fix collaborator view of uploaded files
Current functionality
- Files that are uploaded but not yet imported can be seen only by the experiment owner, and are invisible to collaborators who have read access to the experiment.
Plans
- Users with read/write access to the experiment will be allowed to see uploaded files.
- A user with sample-selective access will not be allowed to see uploaded files, unless (s)he uploaded the files.
- Significant work on the security filters now makes this fix possible.
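The planned visibility rule can be written down as a small predicate. This Python sketch is an interpretation, not caArray code: the access labels are hypothetical, and "read/write access" is assumed to mean read access or write access to the whole experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadedFile:
    name: str
    uploader: str


def can_see_uploaded_file(user, access, uploaded, owner):
    """Decide whether `user` may see a file that is uploaded but not yet imported.

    `access` is one of the hypothetical labels "read", "write",
    "sample_selective" or "none".
    """
    # The experiment owner and the person who uploaded the file always see it.
    if user == owner or user == uploaded.uploader:
        return True
    # Assumption: experiment-level read or write access is sufficient.
    # Sample-selective access alone is not.
    return access in {"read", "write"}


f = UploadedFile(name="batch1.CEL", uploader="carol")
print(can_see_uploaded_file("bob", "read", f, owner="alice"))
print(can_see_uploaded_file("dave", "sample_selective", f, owner="alice"))
print(can_see_uploaded_file("carol", "sample_selective", f, owner="alice"))
```

Under the current behaviour described above, only the first branch with `user == owner` would return true.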

Curate organisms, material types and protocol types
Current functionality
- Duplicate terms are readily created, especially on MAGE-TAB import.
Plans
- Limit organisms to the NCBI taxonomy term source, and clean up duplicates.
- Limit material types and protocol types to the MGED ontology and clean up duplicates.
Longer-term Plans
- Suggest alternative terms during import.
- Let curators merge duplicate terms.
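Many duplicate terms differ only in case or spacing, which makes them easy to detect automatically before a curator decides what to merge. The Python sketch below illustrates one simple approach, grouping terms by a normalized key. It is an assumption about how candidates might be found, not the caArray curation logic.

```python
from collections import defaultdict


def normalize(term):
    """Case-fold the term and collapse runs of whitespace."""
    return " ".join(term.casefold().split())


def find_duplicate_groups(terms):
    """Group terms that normalize to the same key; return groups of two or more."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return [group for group in groups.values() if len(group) > 1]


terms = ["Homo sapiens", "homo sapiens", "Homo  Sapiens ", "Mus musculus"]
for group in find_duplicate_groups(terms):
    print(group)
```

Each printed group is a set of merge candidates. Misspellings and synonyms would need a controlled term source, such as the NCBI taxonomy mentioned above, rather than string normalization.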

Plug-in Architecture Prototype
- Prototype demonstrating the benefits of a plug-in architecture
- Refactoring caArray to introduce a plug-in framework based on OSGi
- Gives the ability to deploy plug-ins without the need for a full-fledged release
- Plug-ins would be allowed at defined integration points, e.g., parsers for new data types, new visualizations, additional APIs that can be exposed
- Prototype will demonstrate a heatmap visualization of gene expression data as a plug-in
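The idea of a defined integration point can be shown with a minimal registry. The Python sketch below is only an analogy: it does not use OSGi, and all names in it are hypothetical. A parser registers itself for a file extension, and the core code looks parsers up without knowing which ones exist.

```python
from typing import Callable, Dict, List

# Integration point: maps a file extension to a parser function.
PARSERS: Dict[str, Callable[[str], List[List[str]]]] = {}


def register_parser(extension):
    """Decorator that registers a parser for the given file extension."""
    def decorator(func):
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register_parser(".txt")
def parse_tab_delimited(text):
    """Example plug-in: split non-empty lines on tabs."""
    return [line.split("\t") for line in text.splitlines() if line]


def parse(filename, text):
    """Core code: dispatch to whichever plug-in handles this file type."""
    for extension, parser in PARSERS.items():
        if filename.lower().endswith(extension):
            return parser(text)
    raise LookupError(f"no parser registered for {filename}")


print(parse("signals.TXT", "probe\tvalue\np1\t1.5\n"))
```

Adding support for a new data type then means shipping one more decorated function. OSGi provides the same separation for Java, with the addition of deploying and updating bundles at run time.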

Avenues for Feedback
- Molecular Analysis Tools Knowledge Center Forum:
- GForge Community Change Request tracker
- This meeting. The next caArray session on the ICR-WS will be on: Wednesday, October 13, 2:00 PM ET