Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail.

Presentation transcript:

Identifying Multi-ID Users in Open Forums
Hung-Ching Chen, Mark Goldberg, Malik Magdon-Ismail

Who is Using Multiple IDs?
Consider two chatrooms, “letters” and “numbers”.
letters:
dogg: hey anna, Z rocks!
anna: hey dogg whats new
mack: what’s that dogg?
dogg: not much
dogg: Z is the best
numbers:
mack: i love 5
catt: i don’t
joop: 27 rules!
catt: i agree joop :)

Multi-ID Users
Actors: dogg/catt (one actor, two IDs), joop, mack, anna. The messages appear in this order:
1. [letters] dogg: hey anna, Z rocks!
2. [numbers] mack: i love 5
3. [letters] anna: hey dogg whats new
4. [letters] mack: what’s that dogg?
5. [numbers] catt: i don’t
6. [letters] dogg: not much
7. [numbers] joop: 27 rules!
8. [letters] dogg: Z is the best
9. [numbers] catt: i agree joop :)

Overview
A model of a public forum
Two efficient statistics-based algorithms for identification of multi-ID users
Simulation results

A Model of a Public Forum
Every actor has one response queue
All actors share a common average and variance of response delay
The server has a queue that processes messages with very short delay
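A rough way to picture the "one response queue per actor" assumption is a toy generator in which each actor's posts are serialized, whichever ID they are sent under. The sketch below is only an illustration: the function name `simulate_forum`, the Gaussian delay distribution, the default mean and deviation, and the staggered start are all assumptions, and the server queue and friendship graph from the slides are left out.

```python
import random

def simulate_forum(actors, posts_per_actor, mean_delay=10.0, sd_delay=2.0, seed=0):
    """Generate a time-ordered log of (time, id) posts.

    `actors` maps an actor name to the list of IDs that actor controls.
    Each actor works through a single response queue, so its posts are
    serialized: every post follows the previous one by a random delay.
    """
    rng = random.Random(seed)
    log = []
    for ids in actors.values():
        t = rng.uniform(0.0, mean_delay)  # staggered start: an assumption
        for _ in range(posts_per_actor):
            # Gaussian delay with a shared mean and deviation, clamped to
            # stay positive: the distribution itself is an assumption.
            t += max(0.1, rng.gauss(mean_delay, sd_delay))
            log.append((t, rng.choice(ids)))
    return sorted(log)

# Hypothetical actor names A-D; the IDs are the ones used in the slides.
actors = {"A": ["dogg", "catt"], "B": ["joop"], "C": ["mack"], "D": ["anna"]}
log = simulate_forum(actors, posts_per_actor=20)
```

In such a log, two IDs of the same actor are never closer together than one response delay, while IDs of different actors can post almost simultaneously.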

Model: Friendship Graph (nodes: anna, catt, mack, dogg, joop)

Model: Alias Graph (nodes: anna, catt, mack, dogg, joop)

Multi-ID Users
One response queue for each actor: dogg/catt, joop, mack, anna.
(Animation: the nine messages from the two chatrooms are replayed one at a time, with the IDs waiting in each actor's response queue shown after every post.)

First Algorithm
1. Collect information {‹time_i, ID_i›}
2. Compute the minimum time delay minD for every pair of IDs
3. Cluster the delays into two groups using k-means
–Pairs in the cluster with the larger center are suspected to be the same actor (red)
–Pairs in the cluster with the smaller center are suspected to be different actors (blue)
4. Connected components of the red edges are ID groups, each representing one actor

1. Collect information (figure: timeline of posts; labels: anna, catt, mack, dogg, joop, mack, catt, dogg; axis: time)

2. minD(dogg, mack) (figure: posts by mack and dogg on the time axis)

2. minD(dogg, catt) (figure: posts by catt and dogg on the time axis)
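Steps 1 and 2 can be sketched as follows, assuming minD for a pair is the smallest absolute gap between any post of one ID and any post of the other (the slides do not spell out the definition). The function name `min_delays` and the timestamps are hypothetical.

```python
from itertools import combinations

def min_delays(log):
    """Map each unordered ID pair to its smallest posting gap.

    `log` is a list of (time, id) records, as collected in step 1.
    """
    times = {}
    for t, user_id in log:
        times.setdefault(user_id, []).append(t)
    result = {}
    for a, b in combinations(sorted(times), 2):
        # Brute force over all post pairs; fine for a small illustration.
        result[(a, b)] = min(abs(ta - tb) for ta in times[a] for tb in times[b])
    return result

# Hypothetical timestamps, not taken from the slides.
log = [(0.0, "dogg"), (1.0, "mack"), (9.0, "catt"), (10.5, "mack"), (18.0, "dogg")]
delays = min_delays(log)
```

With these made-up times, the dogg/catt pair ends up with a much larger minD than dogg/mack, which is the pattern the clustering step relies on.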

3. Cluster minD values
minD(dogg, mack): different actors
minD(dogg, catt): same actor
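For one-dimensional data, k-means with k = 2 reduces to a few lines. The sketch below uses Lloyd's iteration with the minimum and maximum value as starting centers; that initialization, the function name `two_means_1d`, and the sample minD values are assumptions made for illustration, not details from the slides.

```python
def two_means_1d(values, iterations=100):
    """Split numbers into a low and a high cluster with Lloyd's algorithm (k = 2).

    Returns (low_center, high_center, labels), where labels[i] is True when
    values[i] is assigned to the cluster with the larger center.
    """
    lo, hi = min(values), max(values)  # initial centers: an assumption
    labels = [False] * len(values)
    for _ in range(iterations):
        labels = [abs(v - hi) < abs(v - lo) for v in values]
        high = [v for v, is_high in zip(values, labels) if is_high]
        low = [v for v, is_high in zip(values, labels) if not is_high]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return lo, hi, labels

# Hypothetical minD values for four ID pairs.
min_d = {("dogg", "mack"): 1.0, ("anna", "joop"): 1.5,
         ("dogg", "catt"): 9.0, ("anna", "mack"): 2.0}
pairs = list(min_d)
_, _, is_red = two_means_1d([min_d[p] for p in pairs])
red = [p for p, r in zip(pairs, is_red) if r]       # suspected same actor
blue = [p for p, r in zip(pairs, is_red) if not r]  # suspected different actors
```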

3. Resulting Alias Graph (nodes: anna, catt, mack, dogg, joop): Contradictions

4. One Red Component (nodes: anna, catt, mack, dogg, joop): 9 false positives, 10% accuracy
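Step 4 is a connected-components computation over the red edges. One common way to do it is union-find, sketched below; the function name `red_components` and the example edges are hypothetical. The example also shows how a handful of wrongly red edges is enough to merge every ID into a single group.

```python
def red_components(ids, red_edges):
    """Group IDs into connected components of the red-edge graph (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in red_edges:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=lambda g: sorted(g))

ids = ["anna", "catt", "dogg", "joop", "mack"]
# Hypothetical red edges; a chain like this merges all five IDs.
chain = [("anna", "catt"), ("catt", "dogg"), ("dogg", "joop"), ("joop", "mack")]
one_component = red_components(ids, chain)
```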

Refinement Algorithm
1-4. Follow the original algorithm: find groups connected with red edges
5. Color IDs into smaller ID groups using blue edges
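One way to read step 5: inside a red component, treat each blue edge as a constraint that its two IDs must receive different colors, then take each color class as a smaller ID group. The slides do not say which coloring procedure is used; the sketch below swaps in a simple greedy coloring, and the function name `split_by_coloring` and the edge lists are hypothetical.

```python
def split_by_coloring(component, blue_edges):
    """Greedily color one red component so that IDs joined by a blue edge
    never share a color; each color class becomes one smaller ID group."""
    members = sorted(component)
    conflicts = {m: set() for m in members}
    for a, b in blue_edges:
        if a in conflicts and b in conflicts:
            conflicts[a].add(b)
            conflicts[b].add(a)

    color = {}
    # Visit the most-constrained IDs first (a common greedy heuristic).
    for m in sorted(members, key=lambda x: (-len(conflicts[x]), x)):
        used = {color[n] for n in conflicts[m] if n in color}
        c = 0
        while c in used:
            c += 1
        color[m] = c

    groups = {}
    for m in members:
        groups.setdefault(color[m], set()).add(m)
    return sorted(groups.values(), key=lambda g: sorted(g))

component = {"anna", "catt", "dogg", "joop", "mack"}
# Hypothetical classification: the red edges catt-dogg, dogg-mack, mack-joop
# and joop-anna connect all five IDs; the six remaining pairs are blue.
blue = [("catt", "mack"), ("catt", "joop"), ("catt", "anna"),
        ("dogg", "joop"), ("dogg", "anna"), ("mack", "anna")]
groups = split_by_coloring(component, blue)
```

Greedy coloring is not guaranteed to use the fewest colors, and a different visiting order can produce a different grouping, so this is only one plausible realization of the step.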

4. One Red Component (nodes: anna, catt, mack, dogg, joop)

5. Color Blue Graph (nodes: anna, catt, mack, dogg, joop): 1 false positive, 90% accuracy
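The slides do not define accuracy, but the reported figures are consistent with counting correctly classified ID pairs: five IDs give 10 pairs, and only dogg/catt truly belong together. The sketch below works under that assumed definition; the function name `pairwise_accuracy` and the refined grouping are hypothetical.

```python
from itertools import combinations

def pairwise_accuracy(ids, predicted_groups, true_groups):
    """Return (false positives, fraction of ID pairs classified correctly)."""
    def same_pairs(groups):
        return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}

    predicted, actual = same_pairs(predicted_groups), same_pairs(true_groups)
    all_pairs = [frozenset(p) for p in combinations(sorted(ids), 2)]
    false_pos = sum(1 for p in all_pairs if p in predicted and p not in actual)
    correct = sum(1 for p in all_pairs if (p in predicted) == (p in actual))
    return false_pos, correct / len(all_pairs)

ids = ["anna", "catt", "dogg", "joop", "mack"]
truth = [{"dogg", "catt"}, {"joop"}, {"mack"}, {"anna"}]

# One red component containing every ID.
before = pairwise_accuracy(ids, [set(ids)], truth)
# A hypothetical grouping after coloring, with one wrongly merged pair.
after = pairwise_accuracy(ids, [{"anna", "joop"}, {"catt", "dogg"}, {"mack"}], truth)
```

Under this definition the single component yields 9 false positives and 10% accuracy, and a grouping with one wrongly merged pair yields 1 false positive and 90% accuracy, matching the numbers on the slides.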

The dependence of accuracy on the number of messages (quantities shown: the average number of messages per pair, accuracy without coloring (%), accuracy with coloring (%))

Future Work
Real-life testing
More sophisticated model
–Different types of users
–Short and overlapping online appearances
–Length of posts
Algorithm improvements
–Statistical evaluation of the new models
–Lexical and semantic analysis