Taming the Big Data in Computational Chemistry #euroCRIS2015 Barcelona 9-11-XI-2015 Carles Bo ICIQ (BIST) -


Computational Chemistry: taking experiment to cyberspace. Nobel Prize in Chemistry 2013 (see also 1981, 1998)

– Well-established theories
– Standard computer codes
– Permanent storage
– Re-use results
– Certify results
Plot: number of citations of CompChem papers per year

Is Comp Chem a Big Data Problem?

Our Big Data Problem (1): Help researchers in their daily tasks (manage results, apps & tools)

Our Big Data Problem (2): Store and manage files of former group members

Our Big Data Problem (3): Supporting Information files. Certify results, reuse results

5★ Open Data (Tim Berners-Lee). Diagram labels: Present; ioChem-BD

Diagram (Present): Scientists; Submit jobs; HPC; Data collection (manually); Reports (pdf files, manually); Files (terabytes, >95% waste); Publishers (files); Public (information)

Diagram (Future): Scientists; Submit jobs; Workflows; Cloud HPC; HPC on demand; Data collection (automated); Reports (XML, automated); Results; Databases (XML); Publishers (information); Public (files, information)

Diagram (ioChem-BD): Scientists; Submit jobs; HPC; Data collection (automated); Reports (XML, automated); Results; Databases (XML); Publishers (files); Public (files, information)

Objectives
– Build a handy tool for:
  – Managing any type of dataset
  – Generating reports (xml, pdf, jpg)
  – Making research data publicly accessible
– Redefine daily workflows and publishing protocols
– Set a common data standard for Comp. Chemistry formats (XML - CML)
– Open to adding future functionality for data manipulation and analysis. Open to queries by third parties.
– Build a distributed knowledge database → data becomes social

Definition: ioChem-BD is a digital repository that manages and stores computational chemistry files (inputs and outputs). It fills the gap between generating results and publishing manuscripts, and raises data to 5★ quality.

N starting formats → 1 final format. All output files are converted to CML (Chemical Markup Language).
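As a rough illustration of the "many formats in, one format out" idea, the Python sketch below writes a molecule as minimal CML. It is not ioChem-BD's converter. The function name `atoms_to_cml`, the sample water geometry, and the choice of fields are assumptions made for illustration. Only the `molecule`, `atomArray`, and `atom` elements and the `elementType`, `x3`, `y3`, `z3` attributes are taken from the CML schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by the CML schema.
CML_NS = "http://www.xml-cml.org/schema"


def atoms_to_cml(mol_id, atoms):
    """Build a minimal CML <molecule> from (element, x, y, z) tuples.

    Hypothetical helper for illustration; a real converter would also
    carry energies, basis sets, program version, and other metadata.
    """
    # Serialize CML as the default namespace (no prefix on element names).
    ET.register_namespace("", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for index, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(
            atom_array,
            f"{{{CML_NS}}}atom",
            {
                "id": f"a{index}",
                "elementType": element,
                "x3": f"{x:.6f}",
                "y3": f"{y:.6f}",
                "z3": f"{z:.6f}",
            },
        )
    return ET.tostring(molecule, encoding="unicode")


# Illustrative geometry (angstrom); the values are made up for the example.
water = [
    ("O", 0.0, 0.0, 0.117),
    ("H", 0.0, 0.757, -0.469),
    ("H", 0.0, -0.757, -0.469),
]
cml_text = atoms_to_cml("water", water)
print(cml_text)
```

Once every program's output is reduced to one XML vocabulary like this, reports and searches only need to understand a single format.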

What does CML allow?

What will CML allow?
Anything researchers need to boost their research:
– New report types and graphs
– New build formats
  – R plots
  – Datasets
  – (Your code here)…

Features
– Data synthesis: HTML5 reports
– Data easily exportable and viewable
– Easy-to-use web app
– Integrated with external software: Jmol, ChemAxon, Highcharts, DOI …
– Fully and dynamically customizable as to which fields:
  – to capture
  – to display

Architecture: ioChem-BD modules (Create)
– Private use
– Single-page web
– Entry point for HPC centers
– Upload via web/shell
– Productivity oriented
– Search by chemical substructure / metadata

Create module

Create module: Manage – Post-processing
– Organize projects collections
– Enrich Data: Description, keywords, additional files
– Reports: Generate Sup. Info. files (pdf) for publishing
– Reaction Energy paths
– Consistency (level of theory)
– Thermodynamic corrections
– Kinetic Analysis (TOF, % e.e.)
– Molecular descriptors (QSAR)
– etc …
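Two of the post-processing items listed here, reaction energy paths and % e.e., come down to simple arithmetic on stored energies. The Python sketch below shows one way to do it. It is not ioChem-BD's implementation: the function names and the sample energies are hypothetical, and the e.e. estimate assumes two competing pathways under transition-state theory, where the product ratio is exp(ΔΔG‡/RT). TOF is not covered.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def relative_energies(energies, reference):
    """Shift every energy so the reference species sits at zero."""
    ref = energies[reference]
    return {name: value - ref for name, value in energies.items()}


def enantiomeric_excess(ddg_kcal, temperature=298.15):
    """Estimate % e.e. from the free-energy gap between the two
    diastereomeric transition states (minor minus major, kcal/mol)."""
    ratio = math.exp(ddg_kcal / (R_KCAL * temperature))
    return 100.0 * (ratio - 1.0) / (ratio + 1.0)


# Made-up energies in kcal/mol, for illustration only.
path = {"reactants": -250.0, "ts": -234.8, "products": -262.5}
profile = relative_energies(path, "reactants")
for name, value in profile.items():
    print(f"{name:>10s} {value:8.1f} kcal/mol")

print(f"e.e. for a 2.0 kcal/mol gap: {enantiomeric_excess(2.0):.1f} %")
```

Keeping all calculations of a path at the same level of theory (the "Consistency" item) is what makes such energy differences meaningful.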

Architecture: ioChem-BD modules (Browse)
– Public content
– Multiple web pages
– Data coming from Create
– Data browse, search
– Community generated
– Content syndication

Browse module

Browse

ioChem-BD Data conversion workflow

Performance of our new extraction library ≈4x

ioChem-BD Create module features

ioChem-BD Browse module features

Current project status
– In production (ICIQ, URV, UdG) & demo servers up (
– Supported formats: Gaussian, ADF, VASP, Turbomole, Molcas, ORCA
– Reports module (Sup. Info., reaction energy profiles)
– Download just one single-file installer
– Documentation (
– Álvarez-Moreno, M.; de Graaf, C.; López, N.; Maseras, F.; Poblet, J. M.; Bo, C. J. Chem. Inf. Model. 2015, 55, 95.
Ongoing projects:
– ERC Proof-of-Concept (N. López, ICIQ): Catalytic materials
– La Caixa/Crysforma: molecular properties database for APIs
– DOI
– Query other databases (ChemSpider, ChEBI)
TO DO:
– Syndicate distributed browsers
… and much more

Acknowledgements
