Presentation is loading. Please wait.

Presentation is loading. Please wait.

Presented by Data Analysis and High Performance Computing Yu (Cathy) Jiao, Ph.D. Robert M. Patton, Ph.D. Xiaohui Cui, Ph.D. Applied Software Engineering.

Similar presentations


Presentation on theme: "Presented by Data Analysis and High Performance Computing Yu (Cathy) Jiao, Ph.D. Robert M. Patton, Ph.D. Xiaohui Cui, Ph.D. Applied Software Engineering."— Presentation transcript:

1 Presented by Data Analysis and High Performance Computing Yu (Cathy) Jiao, Ph.D. Robert M. Patton, Ph.D. Xiaohui Cui, Ph.D. Applied Software Engineering Research Group Computational Sciences and Engineering Division

2 2 Jiao_Knowledge_SC07 Streaming data analysis challenges Data Binary Text Image Multimedia Sensors One small step for man 11010010 1970198019902000 2010 High volume data streams are constantly generated. Traditional data encoding scheme is inefficient. Need a new solution to handle incremental clustering.

3 3 Jiao_Knowledge_SC07 Distributed data stream mining with Piranha  Piranha utilizes distributed and parallel data clustering to process data streams.  Piranha applies a novel data encoding scheme, Term Frequency-Inverse Corpus Frequency (TF- ICF).  Piranha handles incremental clustering using a threshold-based solution.

4 4 Jiao_Knowledge_SC07 headnode worknode

5 5 Jiao_Knowledge_SC07

6 6 Jiao_Knowledge_SC07

7 7 Jiao_Knowledge_SC07

8 8 Jiao_Knowledge_SC07 Global Cooling Global Warming Regional Conflicts Endangered Ecosystem Disappearing Glacier Population Change Abnormal Weather New Diseases Climate Change Oil Production Drought Effect: Gas price soars to $6.00 per gallon Rainfal l Modeling the impact of policies Industrial Pollution Cloud Feedback Volcan o Activity Ocean Activity Car Emissio n Solar Activity Solar Activity Policy: Budget cut for public transportation CausesEffects

9 9 Jiao_Knowledge_SC07 Ant colony optimization Bird flocking model Breakthrough—bioinspired distributed solution AlignmentSeparationCohesion

10 10 Jiao_Knowledge_SC07 Category/topic Number of articles 1Airline safety10 2 China and spy plane and captives 4 3 Hoof and mouth disease 9 4Amphetamine10 5Iran nuclear16 6 North Korea and nuclear capability 5 7Mortgage rates8 8 Ocean and pollution 10 9 Saddam Hussein and WMD 10 Storm Irene22 11Volcano8 The document collection dataset The clustering results of K-means, ant clustering and MSF clustering algorithm on synthetic and document datasets after 300 iterations Algorithms Average cluster number Average F-measure value Synthetic dataset MSF40.9997 K-means40.9879 Ant40.9823 Real document collection MSF9.1050.7913 K-means110.5632 Ant10.1623 Multiple species flocking (MSF) document clustering Alignment Separation Cohesion

11 11 Jiao_Knowledge_SC07 Summary  Current technology cannot solve emerging national challenges.  Intelligent software agents are a significant breakthrough technology.  Results indicate high potential to help solve these national challenges.  We have a progression of successfully deployed agent systems and research to our credit.

12 12 Jiao_Knowledge_SC07 Contacts Yu (Cathy) Jiao, Ph.D. Applied Software Engineering Research Group Computational Sciences and Engineering Division (865) 574-0647 jiaoy@ornl.gov Robert M. Patton, Ph.D. Applied Software Engineering Research Group Computational Sciences and Engineering Division (865) 576-3832 pattonrm@ornl.gov Xiaohui Cui, Ph.D. Applied Software Engineering Research Group Computational Sciences and Engineering Division (865) 576-9654 cuix@ornl.gov 12 Jiao_Knowledge_SC07


Download ppt "Presented by Data Analysis and High Performance Computing Yu (Cathy) Jiao, Ph.D. Robert M. Patton, Ph.D. Xiaohui Cui, Ph.D. Applied Software Engineering."

Similar presentations


Ads by Google