CDT PROJECTS 2013-14 John Keane, Software Systems Group 1. Data Analytics / Big Data 2. Parallel & Distributed Systems 3. Decision Support.

Presentation transcript:

CDT PROJECTS
John Keane, Software Systems Group
1. Data Analytics / Big Data
2. Parallel & Distributed Systems
3. Decision Support Systems
HAPPY TO DISCUSS

Big Data Analytics (IBM funded)
With Nenadic
CHALLENGE
Investigate:
– Applications: characteristics and predictability
– Data Analytic / Machine Learning Algorithms: relatively simple so far
– Software: Map-Reduce, Hadoop
– Hardware: various platforms
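As a rough illustration of the Map-Reduce model named above, the sketch below runs a word count through explicit map, shuffle and reduce steps in a single Python process. It is not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (illustrative only)."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

In a real deployment the map and reduce calls would run on different nodes and the shuffle would move data across the network; here everything is sequential so that the data flow is easy to follow.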

Bio-medical data analytics
With Nenadic, Zeng, Stivaros (Consultant, RMCH)
Adverse drug event detection (EU funded)
– Bayesian/Fuzzy association rules algorithms
CHALLENGE
– Compare/contrast accuracy of prediction
Clinical Outcome Mining (Christie Hospital)
– Data/text-based clinical records: better diagnose and predict
CHALLENGE
– Illness staging; multi-modal data; changes over time
Decision Support for Radiology (NIHR-funded)
– Decision aid to assist better description of scans
CHALLENGE
– Usability; integration with existing tools; link to literature

Itemset Mining Algorithms
{baby nappies} -> {beer}
Colossal itemsets:
- Very high dimensional datasets
- Run-time increases exponentially as average row length increases
Minimal unique itemsets (MUI)
SUDA: Special Unique Detection
- “risky” records, those likely to be linked, e.g. 16 years old + widow
- Records of most concern have many, small MUIs
- SUDA s/w used by ONS, UK; licensed by Singaporean govt
- Algorithm used by UN/World Bank International Household Survey
CHALLENGES:
- Data structure to represent itemsets during search process
- Search space pruning
- Algorithm: bottom-up; top-down; hybrid
- Parallelism
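To make the idea of a minimal unique itemset concrete, here is a minimal bottom-up sketch. It assumes that an itemset is a set of (attribute, value) pairs taken from one record, that it is unique when no other record contains all of those pairs, and that it is minimal when no proper subset is also unique. This is a brute-force illustration, not the SUDA implementation; the function name, the max_size cut-off and the toy records are hypothetical.

```python
from itertools import combinations


def minimal_unique_itemsets(records, target_index, max_size=3):
    """Return the minimal unique itemsets of one record, smallest first.

    Bottom-up search: candidates are generated by increasing size, and any
    candidate that contains an already-found MUI is pruned, because it
    cannot be minimal.
    """
    target = records[target_index]
    items = sorted(target.items())
    others = [r for i, r in enumerate(records) if i != target_index]
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Prune supersets of a known MUI.
            if any(mui <= candidate for mui in found):
                continue
            # Unique if no other record matches every (attribute, value) pair.
            shared = any(all(r.get(a) == v for a, v in combo) for r in others)
            if not shared:
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Toy data, invented for this example only.
    records = [
        {"age": 16, "marital": "widow", "region": "north"},
        {"age": 16, "marital": "single", "region": "north"},
        {"age": 45, "marital": "widow", "region": "north"},
        {"age": 45, "marital": "married", "region": "south"},
    ]
    print(minimal_unique_itemsets(records, 0))
```

In the toy data neither age 16 nor widow is unique on its own, but the pair is, which mirrors the "16 years old + widow" example. The exhaustive enumeration also shows why run-time grows quickly with row length and why pruning and data structures are listed as challenges.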

Eco-service composition (EU funded)
with Mehandjiev, MBS
Aims to determine conditions for achieving eco-friendly, resilient and optimal service compositions on a distributed cloud infrastructure
Two service optimisation approaches deployed:
1. Global: analyses end-to-end interaction between services
2. Local: computes local optimisation by creating dynamic service chains between service provider/consumer
CHALLENGE
Energy-efficient load balancing and scheduling

HPC + Finance (EU funded, UK Government)
High Frequency Trading
– Flash crashes: dramatic sudden drop in share price -> describe/predict
– Working paper: High Frequency Trading and Mini Flash Crashes
HPCFinance
New models of risk analysis (diverse data integration)
Role of HPC in Finance and comparison of technologies
Trade-off: accuracy, speed, cost comparison: Cloud; GPGPUs; FPGA (Maxeler box)
CHALLENGES: Data engineering; Analytics; Algorithms; High performance

Preference Elicitation from Pairwise Comparison
with Mikhailov, MBS; Siraj, COMSATS IIT, Pakistan
Decision making is complex in the presence of uncertainty and insufficient knowledge.
Aim to estimate preference using pairwise comparison (PC): PC is used when unable to assign scores to available options; the judgements provided may be inconsistent.
Work has proposed consistency measures and prioritization measures where revision is not allowed.
PriEsT tool now has sensitivity analysis -> best solution.
CHALLENGES
– Evolutionary approach to multi-criteria DSS
– Work on preference elicitation model and tool
– Group decision making
– Bridge PriEsT and R (popular data mining tool) via XMCDA
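The slide does not say which prioritization or consistency measures the project uses, so the sketch below substitutes two textbook choices purely for illustration: the row geometric mean method for deriving priorities from a pairwise comparison matrix, and Saaty's consistency index. It is not a description of PriEsT. The matrix entry matrix[i][j] is assumed to state how many times option i is preferred to option j, and all names are hypothetical.

```python
import math


def geometric_mean_priorities(matrix):
    """Estimate priorities as normalised row geometric means."""
    n = len(matrix)
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def consistency_index(matrix, weights):
    """Saaty's consistency index, (lambda_max - n) / (n - 1), for n > 1.

    lambda_max is approximated by averaging (A w)_i / w_i over all rows.
    The index is 0 for perfectly consistent judgements and grows as the
    judgements contradict each other.
    """
    n = len(matrix)
    ratios = [
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ]
    lambda_max = sum(ratios) / n
    return (lambda_max - n) / (n - 1)


if __name__ == "__main__":
    # Consistent judgements: A is 2x B, B is 2x C, hence A is 4x C.
    consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
    w = geometric_mean_priorities(consistent)
    print(w, consistency_index(consistent, w))

    # Inconsistent judgements: A is 2x B and B is 2x C, yet C is 2x A.
    inconsistent = [[1, 2, 1 / 2], [1 / 2, 1, 2], [2, 1 / 2, 1]]
    w2 = geometric_mean_priorities(inconsistent)
    print(w2, consistency_index(inconsistent, w2))
```

The second matrix encodes a preference cycle, so every option receives the same priority and the consistency index is clearly positive. This is the kind of inconsistent input the slide refers to.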