An Integrated and Comprehensive Data Mining System for Studying Environmental Impact of Nanomaterials: NEIMiner Nano Working Group Presentation 10/13/2011.

1 An Integrated and Comprehensive Data Mining System for Studying Environmental Impact of Nanomaterials: NEIMiner
Nano Working Group Presentation, 10/13/2011
Kaizhi Tang, Ph.D., David Mihalcik, Thomas Wavering, Roger Xu, Intelligent Automation Inc.
Prof. Stacey Harper, OSU
Sue Pan, SAIC
Sponsor Agency: Dr. Jeff Steevens, Army ERDC

2 Outline
   – Motivation and proposed approach
   – NEI modeling framework
   – Design of the NEIMiner information system
   – Demonstration of NEIMiner

3 Motivation and Proposed Approach of NEIMiner
NEED
To reduce the risk of nanomaterials (NMs) in military use, analysis of their environmental impact requires a comprehensive NEI modeling framework, a centralized NEI database, a powerful model discovery tool, and an integrated model composition strategy.
KEY COMPONENTS OF THE PROPOSED APPROACH
   – Flexible data integration based on the ETL (Extract, Transform, Load) strategy of a data warehouse
   – Integrated and collaborative data management using a modern content management system
   – An optimized data mining process that searches over many algorithms and parameter settings, at a heavy computational cost
   – Flexible model composition based on a unified model abstraction that reuses FRAMES
DELIVERABLES
   – Conceptual framework for NEI analysis
   – Collaborative NEI information system with model discovery and composition capability
VALUE TO THE CUSTOMER / TRANSITION CUSTOMER
   – Environmental impact estimation tool for nanomaterials
   – Easy access to a large amount of NEI data in a centralized data warehouse, together with the model generation tool
   – Potentially useful evaluation models of NEI

4 Scope of NEI Modeling
[Diagram: Collaboratory of Structural Nanobiology; NEI Data; NEI Data Mining; Models]

5 NEIMiner System Architecture
[Diagram: NEI Data; NEI Data Mining; Models]

6 Available NEI Data and Schemas
   – Nanomaterial-Biological Interactions Knowledgebase (NBI) – http://nbi.oregonstate.edu/
   – Cancer Nanotechnology Laboratory portal (caNanoLab) – NCI, https://cananolab.nci.nih.gov/caNanoLab/
   – ICON: International Council on Nanotechnology – Rice University, http://icon.rice.edu
   – Nano-Tab – tab-delimited spreadsheet format based on EBI's ISA-TAB
   – NanoParticle Ontology (NPO) – implemented in OWL
Annotations: Most complete characterization capture; Largest number of publications, limited characterization capture; Wide range of characterization and health impact data

7 Other Data and Schemas
   – OECD Database on Research into Safety of Manufactured Nanomaterials – http://webnet.oecd.org
   – National Institute for Occupational Safety and Health (NIOSH) – http://www.cdc.gov/niosh/topics/nanotech/NIL.html
   – SAFENANO, Institute of Occupational Medicine (UK) – http://www.safenano.org/AdvancedSearch.aspx
   – University of Wisconsin-Madison, Nanoscale Science and Engineering Center – http://www.nanoceo.net/nanorisks
   – National Reference Center for Bioethics Literature, Kennedy Institute of Ethics, Georgetown University – http://bioethics.georgetown.edu/
   – Nanomedicine Research Portal – http://www.nano-biology.net/
   – Center on Nanotechnology and Society (Chicago-Kent College of Law, Illinois Institute of Technology) – http://www.nano-and-society.org/

8 Data Extraction Methods
Data extraction via web services
   – Example: caNanoLab
Data extraction via web scraping
   – Examples: ICON, NBI
   – Approaches: human copy-and-paste, HTTP programming, text grepping and regular-expression matching, HTML parsers
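
The last two scraping approaches on slide 8 can be contrasted on a static page. The sketch below is a minimal example, assuming a hypothetical HTML results table (the markup, column names, and values are made up and are not taken from ICON or NBI). It uses only Python's standard `re` and `html.parser` modules; fetching the page over HTTP is omitted so the example runs offline.

```python
import re
from html.parser import HTMLParser

# Hypothetical results page; real sources such as ICON or NBI use different markup.
SAMPLE_HTML = """
<table id="results">
<tr><th>Material</th><th>Size (nm)</th><th>Zeta potential (mV)</th></tr>
<tr><td>Ag</td><td>20</td><td>-35.1</td></tr>
<tr><td>TiO2</td><td>25</td><td>12.4</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep only text inside cells.
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_with_html_parser(html):
    """HTML-parser approach: tolerant of whitespace and attribute changes."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def parse_with_regex(html):
    """Grep/regex approach: works only while the markup stays exactly this regular."""
    pattern = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>")
    keys = ("Material", "Size (nm)", "Zeta potential (mV)")
    return [dict(zip(keys, groups)) for groups in pattern.findall(html)]


if __name__ == "__main__":
    for record in parse_with_html_parser(SAMPLE_HTML):
        print(record)
```

Both functions return the same records for this page, but the regular expression stops matching as soon as a row gains attributes or internal line breaks, which is why an HTML parser is usually the more robust choice.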

9 Design Philosophy of the NEI Data Warehouse
Data warehouse (feature => relevance to NEI data)
   – Centralizes data from multiple sources for analysis => multiple nano-risk-related data sources with different formats
   – Consists of an ETL tool, a database, a reporting tool, and data modeling => tools useful for NM data integration and mining
   – Subject-oriented data organization => risk assessment for nanomaterials
   – Multi-dimensional => various nanomaterial properties
   – Star schema => extensible schema design
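
The ETL flow and star schema on slide 9 can be illustrated end to end in a few lines. This is a minimal sketch, not NEIMiner's actual schema: the two sources, their formats, the table names (`dim_material`, `dim_species`, `fact_toxicity`) and all values are hypothetical. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3

# Two hypothetical sources in different formats and units (values are illustrative only).
SOURCE_A = [  # e.g. records from a web service; size in nm
    {"material": "Ag", "size_nm": 20.0, "species": "Zebrafish", "mortality_pct": 35.0},
]
SOURCE_B = "TiO2\t0.025\tzebrafish\t5.0\n"  # e.g. scraped tab-delimited rows; size in micrometres

# Star schema: one fact table whose keys point at dimension tables (names are hypothetical).
SCHEMA = """
CREATE TABLE dim_material (
    material_id INTEGER PRIMARY KEY, name TEXT, size_nm REAL, UNIQUE (name, size_nm));
CREATE TABLE dim_species (species_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_toxicity (
    material_id INTEGER REFERENCES dim_material (material_id),
    species_id INTEGER REFERENCES dim_species (species_id),
    mortality_pct REAL);
"""


def extract():
    """Extract: read both sources into one list of dicts."""
    rows = [dict(r) for r in SOURCE_A]
    for line in SOURCE_B.strip().splitlines():
        material, size_um, species, mortality = line.split("\t")
        rows.append({"material": material, "size_um": float(size_um),
                     "species": species, "mortality_pct": float(mortality)})
    return rows


def transform(rows):
    """Transform: harmonize units (to nm) and naming (lower-case species)."""
    out = []
    for r in rows:
        size_nm = r["size_nm"] if "size_nm" in r else r["size_um"] * 1000.0
        out.append((r["material"], round(size_nm, 3), r["species"].lower(), r["mortality_pct"]))
    return out


def load(conn, records):
    """Load: insert dimension rows if new, then one fact row per record."""
    cur = conn.cursor()
    for material, size_nm, species, mortality in records:
        cur.execute("INSERT OR IGNORE INTO dim_material (name, size_nm) VALUES (?, ?)",
                    (material, size_nm))
        material_id = cur.execute(
            "SELECT material_id FROM dim_material WHERE name = ? AND size_nm = ?",
            (material, size_nm)).fetchone()[0]
        cur.execute("INSERT OR IGNORE INTO dim_species (name) VALUES (?)", (species,))
        species_id = cur.execute(
            "SELECT species_id FROM dim_species WHERE name = ?", (species,)).fetchone()[0]
        cur.execute("INSERT INTO fact_toxicity VALUES (?, ?, ?)",
                    (material_id, species_id, mortality))
    conn.commit()


def build_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, transform(extract()))
    return conn


def toxicity_report(conn):
    """Join the fact table back to its dimensions, as a reporting query would."""
    return conn.execute(
        "SELECT m.name, m.size_nm, s.name, f.mortality_pct "
        "FROM fact_toxicity AS f "
        "JOIN dim_material AS m ON m.material_id = f.material_id "
        "JOIN dim_species AS s ON s.species_id = f.species_id "
        "ORDER BY m.name").fetchall()


if __name__ == "__main__":
    print(toxicity_report(build_warehouse()))
```

Extending this design means adding a new dimension table (for example, exposure route) and a foreign key on the fact table, without restructuring existing dimensions; that is the sense in which a star schema is extensible.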

10 NEI Model Discovery
Physical properties: material type; particle size distribution; PDI (polydispersity index); shape; structure
Chemical properties: surface reactivity; surface charge; water solubility
Exposure and study scenario: duration; continuity; exposure route; number of nanoparticles; number of ligands
Biological properties: species, age, gender, weight
Environmental ecosystem response: fate and transport; bioavailability and uptake; biomagnification
Biological response: genomic response; cell death
Correlation? Prediction?
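
Slide 10 asks whether properties and responses are correlated and whether responses can be predicted. As a minimal sketch of both questions, assuming one numeric property (particle size) and one numeric response (a mortality percentage), the code below computes a Pearson correlation coefficient and fits a one-variable least-squares line. The data values are hypothetical placeholders, not NEI measurements, and the choice of Pearson correlation and linear regression is illustrative; NEIMiner's own model discovery may use other algorithms.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    size_nm = [10.0, 20.0, 40.0, 80.0]
    mortality_pct = [60.0, 45.0, 30.0, 10.0]
    print("correlation:", pearson(size_nm, mortality_pct))
    slope, intercept = fit_line(size_nm, mortality_pct)
    print("predicted mortality at 60 nm:", slope * 60.0 + intercept)
```

With many properties and responses, the same idea generalizes to a correlation matrix for screening and to multivariate models for prediction.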

11 Interesting Mining Problems and Solutions
How to handle missing data
   – Median for numerical values
   – Median-frequency categories
   – Classification or regression using existing data
How to determine attribute significance
   – Compare gain ratio for classification
   – Compare relief ratio for numerical prediction
How to select algorithms and their parameters for training
   – Meta-optimization over algorithms and parameters
How to split the data sets for high-quality models
   – Compare various splitting strategies
   – Clustering as a preprocessing step
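
The first two missing-data rules on slide 11 can be sketched with the standard library. This is a minimal sketch under two assumptions: missing values are represented as `None`, and "median-frequency categories" is read as filling a categorical gap with its most frequent observed category. The column names and values are hypothetical. The third rule (predicting the missing value with a classifier or regressor trained on complete rows) would replace the fixed fill value with a model prediction.

```python
import statistics
from collections import Counter


def impute_median(values):
    """Replace None in a numeric column with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def impute_most_frequent(values):
    """Replace None in a categorical column with its most frequent observed category.

    Assumption: this is one reading of the slide's "median-frequency categories" rule.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    # Hypothetical columns with gaps (None), for illustration only.
    size_nm = [10.0, None, 30.0, 20.0]
    shape = ["sphere", None, "rod", "sphere"]
    print(impute_median(size_nm))       # [10.0, 20.0, 30.0, 20.0]
    print(impute_most_frequent(shape))  # ['sphere', 'sphere', 'rod', 'sphere']
```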
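
For attribute significance in classification, slide 11 names gain ratio. A minimal sketch of the standard C4.5-style definition follows: information gain divided by split information, for a categorical attribute. The attribute names and labels are hypothetical, and the relief-based ranking the slide mentions for numerical prediction is not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """C4.5-style gain ratio of a categorical attribute with respect to class labels."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information penalizes attributes that fragment the data into many values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical attributes and toxicity labels, for illustration only.
    toxic = ["yes", "yes", "no", "no"]
    attributes = {
        "surface_charge": ["positive", "positive", "negative", "negative"],
        "shape": ["sphere", "rod", "sphere", "rod"],
    }
    ranking = sorted(attributes, key=lambda a: gain_ratio(attributes[a], toxic), reverse=True)
    print(ranking)  # ['surface_charge', 'shape']
```

Dividing by split information keeps an attribute with many distinct values (such as a sample ID) from looking significant merely because it splits the data into tiny, pure groups.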
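
Slide 11's "meta-optimization over algorithms and parameters" can be illustrated as an exhaustive grid search scored by cross-validation. This is a minimal sketch, not NEIMiner's optimizer: the two candidate learners (a one-feature least-squares line and k-nearest-neighbour regression), the parameter grid, the mean-absolute-error criterion, and the interleaved fold split are all assumptions chosen for illustration.

```python
import itertools


def make_linear(params, train):
    """Least-squares line on a single feature (no hyperparameters)."""
    xs, ys = [x for x, _ in train], [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def make_knn(params, train):
    """k-nearest-neighbour regressor; params["k"] is the number of neighbours."""
    k = params["k"]

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


# Hypothetical search space: algorithm name -> (builder, parameter grid).
SEARCH_SPACE = {
    "linear": (make_linear, {}),
    "knn": (make_knn, {"k": [1, 3, 5]}),
}


def cross_validated_mae(builder, params, data, folds=3):
    """Mean absolute error over an interleaved k-fold split of the data."""
    errors = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        predict = builder(params, train)
        errors.extend(abs(predict(x) - y) for x, y in test)
    return sum(errors) / len(errors)


def meta_optimize(data, folds=3):
    """Score every (algorithm, parameter setting) pair and return the best."""
    best = None
    for name, (builder, grid) in SEARCH_SPACE.items():
        keys = list(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = cross_validated_mae(builder, params, data, folds)
            if best is None or score < best[2]:
                best = (name, params, score)
    return best


if __name__ == "__main__":
    # Hypothetical, noise-free data (y = 3x + 2) so the expected winner is obvious.
    data = [(float(x), 3.0 * x + 2.0) for x in range(12)]
    print(meta_optimize(data))
```

The number of model fits is (parameter settings) × (folds), which is where the heavy computational burden noted on slide 3 comes from. The fold scheme is itself one splitting strategy; random, stratified, or cluster-based splits could be swapped into `cross_validated_mae` and compared in the same loop.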

12 Demonstration of NEIMiner