Application of Confidence Intervals to Text-based Social Network Construction
By CDT Julie Jorgensen, '06, G4
Advisors: MAJ Ian McCulloh, D/MATH; LTC John Graham, D/BS&L

Agenda
The Real-World Problem
Text Analysis/Social Network Analysis Solution
  - Social Network Analysis
  - Simple Text Analysis
A Better Solution
  - Themed Analysis
  - Example Case: Jihadist Texts
  - Theme Scores
Network Construction Procedure
  - Jihadist Network Results
Importance and Conclusions

The Real-World Problem
Commanders need to understand “Human Terrain”
Majority of ‘HT’ information is in text form
  - The Combating Terrorism Center receives volumes of data every day.
  - Harmony Database is being rapidly declassified
Need an efficient way to plow through large amounts of text data and see the linkages.
Solution: Text Analysis Displayed in Social Network Analysis

Social Network Analysis
A mathematical method of quantifying connections between individuals or groups and drawing conclusions from those connections
Assumes rational beings are interdependent
  - Nodes: Key Actors
  - Links: Relationships between Nodes
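To make the node/link vocabulary concrete, here is a minimal sketch of a network stored as an adjacency structure. The actor names and links are invented for illustration and do not come from the presentation; degree (the number of links per node) is just one simple way of quantifying connections.

```python
from collections import defaultdict

# Hypothetical actors and relationships, invented purely for illustration.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edge_list):
    """Map each node to the set of nodes it is linked to (undirected)."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degree(adjacency):
    """Count the links attached to each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


adjacency = build_adjacency(links)
degrees = degree(adjacency)
```

In this toy network, node C has the most links, which is the kind of observation that marks a candidate key actor.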

“Human Terrain” Example: 9/11 Hijacker Network

[Figure labels: Barzani, Khamenei, Iraq, Elections]

Demonstration Data Set: Jihadist Texts
Approx. 250 translated texts
  - MEMRI
  - FBIS
  - Other Sources
15 Authors
  - More than 1 text
  - Not well known

Simple Text Analysis: The Plagiarism Check
Problem:
  - Word matching is overly simple.
  - Ignores context
  - Actors can be overly weighted by writing more
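The two weaknesses named on this slide can be seen in a small sketch of word matching. The sample sentences are invented, and the Jaccard normalization is a common remedy added here for comparison; it is not described in the presentation.

```python
def word_set(text):
    """Lowercase a text and return its distinct whitespace-separated words."""
    return set(text.lower().split())


def shared_word_count(text_a, text_b):
    """Naive word matching: count the distinct words two texts share."""
    return len(word_set(text_a) & word_set(text_b))


def jaccard(text_a, text_b):
    """Shared words divided by all distinct words in either text."""
    a, b = word_set(text_a), word_set(text_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented sample texts: two about a river, one about a financial bank.
short_text = "the river is wide"
long_text = "the river is wide and the bank is steep and the road is long"
other_text = "the bank is closed"

short_match = shared_word_count(short_text, other_text)
long_match = shared_word_count(long_text, other_text)
```

The longer text scores a higher raw match against `other_text` simply because it contains more words, and part of that match is "bank" used in an unrelated sense, so context is ignored.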

Alternative: Themed Analysis
Traditional Network Analysis Methods
  - Citation Analysis
  - Physical Network
  - Communication or Financial Network
Themed Analysis
  - Relates nodes across multiple fields
One similar theme versus many similar themes

Demonstration: Text Analysis

Theme Scores
*Theme Score is the sum of each word’s score per text
Problem
  - Commander needs information in representations he/she understands.
  - Networks can compare authors across single themes
  - But difficult to compare authors across multiple themes
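A theme score as defined above (the sum of each word's score within a text) might be computed as in the following sketch. The lexicon, its weights, and the sample sentence are hypothetical, and counting every occurrence of a repeated word is an assumption; the presentation does not say how word scores were assigned.

```python
# Hypothetical theme lexicon mapping word -> score. Values are invented.
THEME_LEXICON = {"honor": 2.0, "duty": 1.5, "sacrifice": 3.0}


def theme_score(text, lexicon):
    """Sum the lexicon score of every word occurrence in the text.

    Words missing from the lexicon contribute 0. Repeated words are
    counted each time they occur (an assumption, not from the source).
    """
    return sum(lexicon.get(word, 0.0) for word in text.lower().split())


text = "Duty and honor demand sacrifice and duty again"
score = theme_score(text, THEME_LEXICON)
```

Each author with several texts then has several scores per theme, which is what makes a per-author, per-theme interval estimate possible.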

Constructing a Network Across Multiple Themes
Scrub Texts
Construct Theme Scores
Construct Confidence Intervals
Discern Similarity between Nodes
  - Binary or Standardized Difference of Means
Create Square Matrix
Draw Network
*why not ANOVA?

Confidence Intervals
95% Confidence Interval = [formula not captured in this transcript]
  - Each Author, Each Theme
Example: [not captured in this transcript]
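The slide's formula did not survive in the transcript. As a stand-in, the sketch below uses the textbook interval for a mean, sample mean plus or minus z times s divided by the square root of n, with the normal quantile z of about 1.96. Whether the presentation used this form or a t-based interval is not known, and with only a few texts per author a t quantile would give a wider interval. The sample scores are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(scores, level=0.95):
    """Normal-approximation confidence interval for the mean of scores.

    Assumption: uses a z quantile rather than a t quantile, so the
    interval is somewhat too narrow for small samples.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    center = mean(scores)
    half_width = z * stdev(scores) / sqrt(n)
    return center - half_width, center + half_width


# Invented theme scores for one author's five texts on one theme.
scores = [4.0, 6.0, 5.0, 7.0, 3.0]
low, high = confidence_interval(scores)
```

One such interval would be computed for each author on each theme.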

Relationship Scores
Each possible pair of authors per theme
  - Overlapping Confidence Intervals
  - Disparate Confidence Intervals
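The two similarity options mentioned in the procedure (binary, or a standardized difference of means) could look like the sketch below. The interval endpoints and summary statistics are invented, and the presentation does not state how a standardized difference was converted into a relationship score, so that step is left out.

```python
from math import sqrt


def intervals_overlap(ci_a, ci_b):
    """True when two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def binary_score(ci_a, ci_b):
    """Binary relationship score: 1 for overlapping intervals, else 0."""
    return 1 if intervals_overlap(ci_a, ci_b) else 0


def standardized_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Absolute difference of means divided by its standard error."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return abs(mean_a - mean_b) / standard_error


# Invented intervals for three authors on a single theme.
overlapping = binary_score((3.6, 6.4), (5.0, 8.0))
disparate = binary_score((3.6, 6.4), (7.0, 9.0))
difference = standardized_difference(5.0, 2.0, 16, 6.0, 2.0, 16)
```

A small standardized difference indicates two authors score similarly on the theme; a large one indicates they diverge.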

Matrix Construction
Multiplication of Scores for each author and each theme
Resultant Square Matrix
Geometric Mean = [formula not captured in this transcript]
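One reading of this slide is that each author pair's per-theme relationship scores are multiplied together and the k-th root is taken for k themes, giving one square author-by-author matrix. The sketch below follows that reading; the score values are invented, and the rule for turning the matrix into drawable links (keep any weight above zero) is a hypothetical choice.

```python
from math import prod


def combine_themes(theme_matrices):
    """Entry-wise geometric mean of k square relationship matrices."""
    k = len(theme_matrices)
    n = len(theme_matrices[0])
    return [
        [prod(m[i][j] for m in theme_matrices) ** (1.0 / k) for j in range(n)]
        for i in range(n)
    ]


def edge_list(matrix, threshold=0.0):
    """List (i, j, weight) for author pairs whose weight exceeds threshold."""
    n = len(matrix)
    return [
        (i, j, matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] > threshold
    ]


# Invented relationship scores for three authors on two themes.
theme_1 = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.4], [0.0, 0.4, 1.0]]
theme_2 = [[1.0, 0.4, 0.5], [0.4, 1.0, 0.9], [0.5, 0.9, 1.0]]

combined = combine_themes([theme_1, theme_2])
edges = edge_list(combined)
```

Because the scores are multiplied, a score of zero on any single theme removes the link between two authors entirely, however similar they are on the other themes.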

Themed Network

Theme Analysis: Confidence Interval vs Average
Able to look at each theme individually.
Average Rank does not account for connection importance, weighting, or predictors
Themes are combined
Can see connections between authors across a combination of themes.

Method Comparison

Conclusions
Socially Engineered Algorithms involve extensive tradeoffs and decisions by the mathematician that can significantly impact a commander’s decision-making.
Having multiple views of the same data is a critical requirement.
Find Linkages in large amounts of data
Find Connections across multiple fields
Non-Tangible Relationships
Real World: Track / Catch criminals / radical ideologues
Representation of Human Terrain

Future Work
Publish method in the journal Computational and Mathematical Organization Theory
Integration into ORA (Organization Risk Analyzer) Statistical Software: In use by Intelligence Analysts.
Analysis of change over time

Questions?
