Abdelzaher (UIUC)

Research Milestones

Due  Description
Q1   Estimation-theoretic QoI analysis. Formulation of analytic models for quantifying accuracy of prediction/estimation results.
Q2   Extended analysis of semantic links in information networks. Formulation of information network abstractions that are amenable to analysis as new sensors in a data fusion framework.
Q3   Data pool quality metrics and impact of data fusion. Formulation of metrics for data selection when all data cannot be used/sent.
Q4   Validation of QoI theory. Documentation and publications.

Signal data fusion
Information Network Analysis
Sensors, reports, and human sources
Trust, Social Networks
Methods: Bayesian analysis, maximum likelihood estimation, etc.
Methods: Ranking, clustering, etc.
Methods: Fact-finding, influence analysis, etc.
Machine Learning
Methods: Transfer knowledge, CCM, etc.
Fusion of hard sources
Fusion of soft sources
Fusion of text and images
Fusion from human sources
This Talk: Towards a QoI Theory for Data Fusion from Sensors + Information network links

Sensor Fusion Example: Target Classification
[Figure: a target observed by an infrared motion sensor, vibration sensors, and acoustic sensors]
Different sensors (of known reliability, false alarm rates, etc.) are used to classify targets.
Well-developed theory exists to combine possibly conflicting sensor measurements to accurately estimate target attributes:
  Bayesian analysis
  Maximum likelihood
  Kalman filters
  etc.
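As a minimal sketch of the Bayesian analysis mentioned on this slide, the following combines binary detections from sensors whose detection and false alarm rates are known. The function name, the numeric rates, and the assumption that sensors are conditionally independent given the target state are illustrative choices, not taken from the talk.

```python
from math import prod


def fuse_detections(prior, sensors, readings):
    """Posterior probability that the target is present.

    prior:    prior probability that the target is present.
    sensors:  list of (detection_rate, false_alarm_rate) pairs, assumed known.
    readings: list of booleans, one per sensor (True means the sensor fired).
    Sensors are assumed conditionally independent given the target state.
    """
    # Likelihood of the observed readings if the target is present.
    like_present = prod(d if r else 1.0 - d
                        for (d, _), r in zip(sensors, readings))
    # Likelihood of the same readings if the target is absent.
    like_absent = prod(f if r else 1.0 - f
                       for (_, f), r in zip(sensors, readings))
    numerator = prior * like_present
    return numerator / (numerator + (1.0 - prior) * like_absent)


# Hypothetical rates for an infrared, a vibration, and an acoustic sensor.
sensors = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]
# Two sensors fire and one stays silent: conflicting measurements.
posterior = fuse_detections(0.5, sensors, [True, True, False])
print(round(posterior, 3))
```

With these made-up rates the two more reliable sensors outweigh the silent one, so the posterior stays well above the prior.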

Information Network Mining Example: Fact-finding
Example 1: Consider a graph of who published where (but no prior knowledge of these individuals and conferences). Rank conferences and authors by importance in their field.
[Figure: authors Han, Abdelzaher, and Roth linked to venues Sensys, KDD, WWW, and Fusion]
Example 2: Consider a graph of who said what (sources and assertions, but no prior knowledge of their credibility). Rank sources and assertions by credibility.
[Figure: sources John, Sally, and Mike linked to Claim1, Claim2, Claim3, and Claim4]
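Example 2 can be sketched with a very simple iterative fact-finder: a claim's credibility is the sum of the credibility of the sources asserting it, and a source's credibility is the sum of the credibility of its claims. The link structure below is hypothetical (the slide's figure does not survive in the transcript), and rescaling by the maximum each round is an assumption made only to keep the scores bounded.

```python
def sums_fact_finder(links, iterations=20):
    """Rank sources and claims using only who-said-what links.

    links: dict mapping each source to the set of claims it asserts.
    Returns (source_credibility, claim_credibility), each scaled to (0, 1].
    """
    sources = list(links)
    claims = sorted({c for asserted in links.values() for c in asserted})
    src_cred = {s: 1.0 for s in sources}
    claim_cred = {c: 1.0 for c in claims}
    for _ in range(iterations):
        # A claim is as credible as the sources that assert it.
        claim_cred = {c: sum(src_cred[s] for s in sources if c in links[s])
                      for c in claims}
        top = max(claim_cred.values())
        claim_cred = {c: v / top for c, v in claim_cred.items()}
        # A source is as credible as the claims it asserts.
        src_cred = {s: sum(claim_cred[c] for c in links[s]) for s in sources}
        top = max(src_cred.values())
        src_cred = {s: v / top for s, v in src_cred.items()}
    return src_cred, claim_cred


# Hypothetical who-said-what graph using the names from the slide.
links = {
    "John": {"Claim1", "Claim2"},
    "Sally": {"Claim1", "Claim2", "Claim3"},
    "Mike": {"Claim4"},
}
src_cred, claim_cred = sums_fact_finder(links)
for name in sorted(src_cred, key=src_cred.get, reverse=True):
    print(name, round(src_cred[name], 3))
```

In this toy graph, claims corroborated by several sources rise to the top, while a claim made by a single isolated source sinks. The scores are rankings, not probabilities, which is the gap the rest of the talk addresses.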

The Challenge
How to combine information from sensors and information network links to offer a rigorous quantification of QoI (e.g., correctness probability) with minimal prior knowledge?
[Figure: sensors observing a target (infrared motion sensor, vibration sensors, acoustic sensors) + an information network (John, Sally, Mike linked to Claim1, Claim2, Claim3, Claim4); P(armed convoy) = ?]

Applications
Understand Civil Unrest
  Remote situation assessment
  Use Twitter feeds, news, cameras, …
Expedite Disaster Recovery
  Damage assessment and first response
  Use sensor feeds, eyewitness reports, …
Reduce Traffic Congestion
  Mapping traffic congestion in a city
  Use crowd-sourcing (of cell-phone GPS measurements), speed sensor readings, eyewitness reports, …

Approach: Back to the Basics
Interpret the simplest fact-finder as a classical (Bayesian) sensor fusion problem
Identify the duality between information link analysis and Bayesian sensor fusion (links = sensor readings)
Use that duality to quantify probability of correctness of fusion (i.e., information link analysis) results
Incrementally extend analysis to more complex information network models and mining algorithms

An Interdisciplinary Team
Abdelzaher (QoI, sensor fusion)
Roth (fact-finders, machine learning)
Aggarwal, Han (data mining, veracity analysis)
Fusion Task I1.1
QoI Task I1.2
QoI Mining Task I3.1

The Bayesian Interpretation
The Simplest Fact-finder: [equation missing in source]
[Figure: sources John, Sally, and Mike linked to Claim1, Claim2, Claim3, and Claim4]
The Simplest Bayesian Classifier (Naïve Bayesian): [equation missing in source]

The Equivalence Condition
We know that for a sufficiently small x_k: [equation missing in source]
Consider individually unreliable sensors: [equation missing in source]
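One plausible way to read this equivalence is sketched below. It is a reconstruction under stated assumptions, not the authors' exact derivation: the symbols C_j (a claim) and S_k (the k-th source or sensor report) are introduced here for illustration, sources are assumed independent both marginally and given the claim, and "individually unreliable" is taken to mean that each report shifts the probability of a claim only slightly.

```latex
% Naive Bayes over n independent reports S_1..S_n about claim C_j:
\[
\frac{P(C_j \mid S_1,\dots,S_n)}{P(C_j)}
  = \prod_{k=1}^{n} \frac{P(S_k \mid C_j)}{P(S_k)}
\]
% Individually unreliable sensors: each ratio is close to 1.
\[
\frac{P(S_k \mid C_j)}{P(S_k)} = 1 + x_k, \qquad |x_k| \ll 1
\]
% For sufficiently small x_k, the product is approximately a sum.
\[
\prod_{k=1}^{n} (1 + x_k) \approx 1 + \sum_{k=1}^{n} x_k
\]
% Hence the posterior is approximately linear in per-source terms.
\[
P(C_j \mid S_1,\dots,S_n) \approx P(C_j)\Bigl(1 + \sum_{k=1}^{n} x_k\Bigr)
\]
```

Under these assumptions the Bayesian posterior takes the same sum-over-sources form that a simple fact-finder uses to score a claim, which is the sense in which links can be treated as sensor readings.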

A Bayesian Fact-finder
By duality, if: [equation missing in source]
and: [equation missing in source]
Then, Bayes' theorem eventually leads to: [equation missing in source]

Fusion of Sensors and Information Networks
Putting fusion of sensors and information network link analysis on a common analytic foundation:
  Can quantify probability of correctness of results
  Can leverage existing theory to derive accuracy bounds
[Figure: measurements from Sensor1, Sensor2, and Sensor3, together with an information network (Source1, Source2, Source3 linked to Claim1, Claim2, Claim3, Claim4), feed into a fusion result]

Simulation-based Evaluation
Generate thousands of "assertions" (some true, some false, unknown to the fact-finder)
Generate tens of sources (each source has a different probability of being correct, unknown to the fact-finder)
Sources make true/false assertions consistently with their probability of correctness
A link is created between each source and each assertion it makes
Analyze the resulting network to determine:
  The set of true and false assertions
  The probability that a source is correct
No prior knowledge of individual sources and assertions is assumed
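The generation steps above could be sketched as follows. The specific numbers (1000 assertions, 30 sources, 100 assertions per source, reliabilities drawn uniformly from 0.5 to 0.95, half of the assertions true) are assumptions chosen for illustration; the slides do not state the parameters actually used.

```python
import random


def simulate_network(n_assertions=1000, n_sources=30,
                     claims_per_source=100, seed=0):
    """Generate a synthetic source-assertion network.

    Returns (truth, reliability, links):
      truth[a]        True if assertion a is true (hidden from the fact-finder)
      reliability[s]  probability that source s asserts a true assertion (hidden)
      links           set of (source, assertion) pairs, the only visible data
    """
    rng = random.Random(seed)
    truth = [rng.random() < 0.5 for _ in range(n_assertions)]
    true_ids = [a for a, t in enumerate(truth) if t]
    false_ids = [a for a, t in enumerate(truth) if not t]
    reliability = [rng.uniform(0.5, 0.95) for _ in range(n_sources)]
    links = set()
    for s, p in enumerate(reliability):
        for _ in range(claims_per_source):
            # With probability p the source picks a true assertion,
            # otherwise it picks a false one.
            pool = true_ids if rng.random() < p else false_ids
            links.add((s, rng.choice(pool)))
    return truth, reliability, links


truth, reliability, links = simulate_network()
print(len(truth), "assertions,", len(reliability), "sources,",
      len(links), "links")
```

A fact-finder under test would receive only `links`; `truth` and `reliability` are held back and used afterwards to score its output.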

Evaluation Results
Comparison to 4 fact-finders from literature
Significantly improved prediction accuracy of source correctness probability (from 20% error to 4% error)

Evaluation Results
Comparison to 4 fact-finders from literature
(Almost) no false positives for larger networks (> 30 sources)

Evaluation Results
Comparison to 4 fact-finders from literature
Below 1% false negatives for larger networks (> 30 sources)
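The three quantities reported above could be computed along the following lines. This is a sketch under assumptions: the slides do not say how the rates are normalized, so false positives are taken relative to the number of false assertions, false negatives relative to the number of true assertions, and source error as the mean absolute difference between estimated and actual correctness probabilities.

```python
def evaluate(predicted_true, truth, est_reliability, reliability):
    """Score a fact-finder's output against hidden ground truth.

    predicted_true:  list of booleans, the fact-finder's verdict per assertion
    truth:           list of booleans, the actual truth value per assertion
    est_reliability: estimated probability of correctness per source
    reliability:     actual probability of correctness per source
    Returns (false_positive_rate, false_negative_rate, mean_source_error).
    """
    false_pos = sum(1 for p, t in zip(predicted_true, truth) if p and not t)
    false_neg = sum(1 for p, t in zip(predicted_true, truth) if t and not p)
    n_false = sum(1 for t in truth if not t)
    n_true = len(truth) - n_false
    fp_rate = false_pos / n_false if n_false else 0.0
    fn_rate = false_neg / n_true if n_true else 0.0
    src_err = sum(abs(e - r) for e, r in zip(est_reliability, reliability))
    src_err /= len(reliability)
    return fp_rate, fn_rate, src_err


# Tiny made-up example: four assertions and two sources.
fp_rate, fn_rate, src_err = evaluate(
    predicted_true=[True, False, True, False],
    truth=[True, True, False, False],
    est_reliability=[0.8, 0.6],
    reliability=[0.9, 0.5],
)
print(fp_rate, fn_rate, round(src_err, 3))
```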

Coming up: The Apollo FactFinder
Abdelzaher, Adali, Han, Huang, Roth, Szymanski
Apollo: Improves fusion QoI from noisy human and sensor data
  Demo in IPSN 2011 (in April)
  Collects data from cell-phones
  Interfaced to Twitter
  Can use sensors and human text
  Analysis on several data sets: what really happened?
[Figure: Apollo Architecture]
"Apollo: Towards Factfinding in Participatory Sensing," H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, and S. Adali, demo session at IPSN 2011, the 10th International Conference on Information Processing in Sensor Networks, April 2011, Chicago, IL, USA.

Apollo Datasets
Track data from cell-phones in a controlled experiment
2 million tweets from Egypt Unrest
Tweets on Japan Earthquake, Tsunami and Nuclear Emergency

Immediate Extensions
Non-independent sources
  Sources that have a common bias, sources where one influences another, etc.
  Collaboration opportunities with SCNARC and Trust
Non-independent claims
  Claims that cannot be simultaneously true
  Claims that increase or decrease each other's probability
Mixture of reliable and unreliable sources
  More reliable sources can help calibrate correctness of less reliable sources

Road Ahead
Develop a unifying QoI-assurance theory for fact-finding/fusion from hard and soft sources
Sources
  Use different media: signals, text, images, …
  Feature different authors: physical sensors, humans
Capabilities
  Computes accurate best estimates of probabilities of correctness
  Computes accurate confidence bounds in results
  Enhances QoI/cost trade-offs in data fusion systems
  Integrates sensor and information network link analysis into a unified analytic framework for QoI assessment
  Accounts for data dependencies, constraints, context and prior knowledge
  Accounts for the effect of social factors such as trust, influence, and homophily on opinion formation, propagation, and perception (in human sensing)
Impact: Enhanced warfighter ability to assess information

Collaborations
Fusion Task I1.1: QoI/cost analysis (unified theory for estimation/prediction and information network link analysis)
QoI Task I1.2
QoI Mining Task I3.1
OICC Task C1.2
Community Modeling S2.2
Sister QoI Task C1.1
Decisions under Stress S3.1
(w/Jiawei Han) Consider new link analysis algorithms
(w/Dan Roth) Account for prior knowledge and constraints
(w/Boleslaw Szymanski and Sibel Adali) Model humans in the loop
(w/Ramesh Govindan) Improve communication resource efficiency
(w/Aylin Yener) Increase OICC

Collaborations
Collaborative, Multi-institution:
  Q2 (UIUC+IBM): Tarek Abdelzaher, Dong Wang, Hossein Ahmadi, Jeff Pasternack, Dan Roth, Omid Fatemieh, Hieu Le, and Charu Aggarwal, "On Bayesian Interpretation of Fact-finding in Information Networks," submitted to Fusion 2011
Collaborative, Inter-center:
  Q2 (I+SC): H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, S. Adali, "Apollo: Towards Factfinding in Participatory Sensing," IPSN Demo, April 2011
  Q2 (I+SC): Mani Srivastava, Tarek Abdelzaher, Boleslaw Szymanski, "Human-centric Sensing," Philosophical Transactions of the Royal Society, special issue on Wireless Sensor Networks, expected in 2011 (invited)
Invited Session on QoI at Fusion 2011 (co-chaired with Ramesh Govindan, CNARC)

Military Relevance
Enhanced warfighter decision-making ability based on better quality assessment of fusion outputs
A unified QoI assurance theory for fusion systems that utilize both sensors and information networks
Offers a quantitative understanding of the benefits of exploiting information network links in data fusion
Enhances result accuracy and provides confidence bounds in result correctness