EXTRACT: MINING SOCIAL FEATURES FROM WLAN TRACES: A GENDER-BASED CASE STUDY
By Udayan Kumar and Ahmed Helmy, University of Florida
Presented by Ahmed Alghamdi

Presentation transcript:

Outline
• Introduction
• Motivations
• Challenges and Research Questions
• Contributions
• Approach
  • Location Based Classification (LBC)
  • Group Behavior Based Filtering (GBF)
  • Hybrid Filtering (HF)
  • Name Based Classification (NBC)
• Validation of LBC
  • Temporal Consistency Validation
  • IBF vs. GBF
  • Cross Validation
• User Behavior Analysis
  • User Spatial Distribution
  • Average Duration or Temporal Analysis
  • Device Preference
• Applications
• Conclusion

Introduction
• WLAN traces help in understanding mobile user characteristics and behavior
• This understanding is essential to network modeling and design
• This paper provides techniques to classify WLAN users into social groups
  • By area
  • By users' information
• It presents a general methodology, with an example case study that groups users by gender and investigates gender gaps in WLAN usage

Introduction
• WLAN traces
  • From 2 universities (more than 50K users)
  • Over 3 years
  • U1: Feb 2006, Oct 2006, and Feb 2007
  • U2: Nov 2007, Apr 2008
• WLAN traces are logs of user associations with a wireless access point (AP)
• Traces generally contain
  • the machine's MAC address
  • association time
  • duration
  • associated AP
• WLAN traces are loaded into a database for easy retrieval with SQL
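As a rough illustration of loading traces into a database for SQL retrieval, the sketch below uses Python's built-in sqlite3 module. The table name, column names, AP names and sample rows are all hypothetical; the slides do not describe the actual schema or database used.

```python
import sqlite3

# In-memory database with a hypothetical schema mirroring the four trace fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (mac TEXT, start_time INTEGER, duration INTEGER, ap TEXT)"
)

# Made-up sample rows: (MAC address, association time, duration in seconds, AP).
rows = [
    ("aa:aa:aa:00:00:01", 1000, 600, "ap-frat-1"),
    ("aa:aa:aa:00:00:01", 5000, 1200, "ap-frat-1"),
    ("aa:aa:aa:00:00:02", 2000, 300, "ap-lib-1"),
]
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)", rows)


def usage_per_user(connection):
    """Return (mac, ap, session count, total duration) for each user/AP pair."""
    cursor = connection.execute(
        "SELECT mac, ap, COUNT(*), SUM(duration) "
        "FROM sessions GROUP BY mac, ap ORDER BY mac, ap"
    )
    return cursor.fetchall()


summary = usage_per_user(conn)
```

Per-user session counts and total durations of this kind are the inputs that the filtering steps later in the talk operate on.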

Motivations
• Mobile devices are becoming tightly coupled to their users
• Communication performance is bound to user mobility and behavior
• In ad hoc networks, any node can act as a router
• It is imperative to understand the various aspects of user behavior in order to design efficient protocols and effective network models

Challenges and Research Questions
• How can we meaningfully infer gender information from such anonymous traces?
• Does gender influence user behavior and preferences in a significant and consistent manner?
• What is the impact of these findings on future network modeling, protocol design and service design?

Contributions
• Class and gender inference methods based on location, usage and name filtering, applied to extensive WLAN traces
• The first gender-based, trace-driven analysis in mobile societies, including a study of majors and device preferences
• Identification of unique features in the studied groupings that suggest consistent behavior and point to potential future applications

Approach
• Gender classification on campus
  • Location-based method
  • Based on individual and group network behavior
• Analysis of WLAN traces
• Cross validation with ground truth using the name-based method
  • 90% accuracy
• Usage patterns of males and females are different
• Gender does affect user activity and vendor preference
• This contribution enhances the understanding of the mobile society
• It is essential for providing efficient network protocols and services in the future

Approach
• Gender-based grouping
  • Location Based Classification (LBC)
  • Name Based Classification (NBC)

Location Based Classification (LBC)
• Sorority APs: females
• Fraternity APs: males
• CS Dept. APs: CS students
• Visitor filtering
  • A visitor is a user with fewer sessions and shorter sessions than the average user at that location (group behavior)
  • Or a user who has more sessions and a longer online duration at other locations (individual behavior)

Location Based Classification (LBC)
• Individual Behavior Based Filtering (IBF)
  • Estimates the probability of a user being male or female by counting the number of sessions and measuring the duration he/she spends in fraternities versus sororities
  • PCM: the probability of a user being male, considering only session counts at fraternities and sororities
  • PDM: the probability of a user being male, considering only session durations at fraternities and sororities

Location Based Classification (LBC)
• Users visiting fraternities and/or sororities, in decreasing order of their male probability (U1, Feb 2006)
  • 1119 users
  • 425 males
  • 362 females
• Users with PCM > 0.80 and PDM > 0.80 are classified as males
• Users with PCM < 0.20 and PDM < 0.20 are classified as females
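The thresholds above can be sketched in a few lines of Python. The slides do not give the exact formulas for PCM and PDM, so this sketch assumes each is a simple ratio of fraternity activity to combined fraternity and sorority activity. The function names and the "unclassified" label are illustrative only.

```python
def male_probabilities(frat_sessions, sor_sessions, frat_duration, sor_duration):
    """Return (PCM, PDM) as ratios of fraternity activity to total activity.

    Assumption: both probabilities are plain ratios; the slides do not
    state the formula explicitly.
    """
    total_sessions = frat_sessions + sor_sessions
    total_duration = frat_duration + sor_duration
    if total_sessions == 0 or total_duration == 0:
        raise ValueError("user has no activity at fraternities or sororities")
    return frat_sessions / total_sessions, frat_duration / total_duration


def classify_ibf(p_cm, p_dm, high=0.80, low=0.20):
    """Apply the 0.80 / 0.20 thresholds from the slide to both probabilities."""
    if p_cm > high and p_dm > high:
        return "male"
    if p_cm < low and p_dm < low:
        return "female"
    return "unclassified"  # neither threshold pair is met


# Made-up user: 9 fraternity sessions, 1 sorority session,
# 900 s spent at fraternities and 100 s at sororities.
p_cm, p_dm = male_probabilities(9, 1, 900, 100)
label = classify_ibf(p_cm, p_dm)
```

Requiring both probabilities to clear the threshold means that a user with many short sessions at one location, but long sessions at the other, is left unclassified rather than assigned a gender.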

Group Behavior Based Filtering (GBF)
• Filters a user based on where his/her usage pattern lies with respect to all the users at a particular location
• Find a threshold
• All users who satisfy the threshold are classified as male or female according to the AP location
• All other users are visitors

Group Behavior Based Filtering (GBF)
• Clustering divides a set of users into several subsets such that the users in each subset are most similar based on WLAN usage metrics (duration, session count, distinct login days)
• Metrics for user evaluation
  • Number of distinct days of login
  • Session count
  • Sum of session durations
• Clustering was applied to the sorority and fraternity user traces from both universities, U1 and U2
  • Best number of clusters is 2 (regular/visitor)
  • Maximum width is 0.84
  • Minimum width is 0.65
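The "width" values quoted on this slide are not defined in the transcript. Assuming they refer to the average silhouette width, a common measure of cluster quality, the sketch below shows how such a score could be computed for a given assignment of users to clusters. The three features per user follow the metrics listed on the slide; the sample values and the function name are made up.

```python
import math


def average_silhouette_width(points, labels):
    """Average silhouette width of a clustering with at least two clusters.

    For each point, a is the mean distance to its own cluster and b is the
    smallest mean distance to any other cluster; its width is
    (b - a) / max(a, b). Points in singleton clusters contribute 0.
    """
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [
            math.dist(points[i], points[j])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        ]
        if not own:
            continue  # singleton cluster
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Hypothetical users: (distinct login days, session count, total duration in minutes).
users = [(1, 1, 10), (2, 1, 12), (20, 40, 900), (22, 38, 950)]
good_width = average_silhouette_width(users, [0, 0, 1, 1])  # visitors vs. regulars
bad_width = average_silhouette_width(users, [0, 1, 0, 1])   # mixed-up assignment
```

A width close to 1 indicates well-separated clusters, which is how values such as 0.65 to 0.84 would support a two-cluster (regular/visitor) split. In practice the features would usually be normalized first so that duration does not dominate the distance.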

Group Behavior Based Filtering (GBF)
• Average width for sororities and fraternities from universities U1 and U2
• Clustering results for university U1 sororities (Feb 2006)

Hybrid Filtering (HF)
• Classification validation
  • Compare the results from IBF and GBF
  • Both methods mainly select the same set of users, as expected, since both attempt to identify regular users
• For high confidence, choose the users selected by both filtering methods
• More than 90% of the users selected by GBF are also selected by IBF

Name Based Classification (NBC)
• Usernames can be obtained on campuses that require an authorization mechanism to access the WLAN
• Traces from university U2 provide usernames
• University U2 also hosts a directory that can be searched using these usernames
• Searching the directory yields the first names corresponding to these usernames
• Lists of the top 1000 male and female first names from the US Social Security Administration are used, and names present in both lists (neutral names) are removed
• The resulting lists are compared to the first names obtained from the university U2 directory
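The name-matching step can be sketched as follows. The short name lists stand in for the top-1000 lists and are purely illustrative, as are the function names; the directory lookup itself is not shown because the slides give no details about it.

```python
def build_name_lists(top_male, top_female):
    """Lower-case both lists and drop neutral names that appear in both."""
    male = {name.lower() for name in top_male}
    female = {name.lower() for name in top_female}
    neutral = male & female
    return male - neutral, female - neutral


def classify_by_name(first_name, male_names, female_names):
    """Return 'male', 'female' or 'unclassified' for a directory first name."""
    name = first_name.strip().lower()
    if name in male_names:
        return "male"
    if name in female_names:
        return "female"
    return "unclassified"


# Tiny illustrative stand-ins for the top-1000 lists.
male_names, female_names = build_name_lists(
    ["James", "John", "Jordan"], ["Mary", "Linda", "Jordan"]
)
results = [classify_by_name(n, male_names, female_names)
           for n in ["John", "Mary", "Jordan", "Udayan"]]
```

Names that are neutral or absent from both lists fall through to "unclassified", so only part of the user population receives a label.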

Name Based Classification (NBC)
• 11,000 out of 27,000 users were classified as male or female in the Nov 2007 trace period
• 12,500 out of 30,000 users were classified as male or female in the Apr 2008 trace period
• Users left unclassified include
  • foreign national students
  • users with less popular names

Validation of LBC
• Validation of LBC is needed to raise confidence in the results
• Three statistical methods are used to validate the filtering mechanisms
  1. Temporal consistency: finds the regular users in trace sets from adjacent months and compares the lists to see how many users are common
  2. IBF vs. GBF: compares the results from IBF and GBF to check how similar they are
  3. Cross validation: takes the classification obtained using NBC and compares it with the results of LBC

Temporal Consistency Validation
• Multiple one-month traces from one semester
• Apply IBF, GBF and HF to find the users common to adjacent months, before and after filtering
• Because the users living in fraternities and sororities do not change from one month to another within the same semester, the percentage of common users should increase after filtering
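A minimal sketch of the temporal consistency check is given below. The slides do not say which month's population is used as the denominator, so this sketch assumes the percentage is taken relative to the first of the two adjacent months. The user identifiers are made up.

```python
def common_user_percentage(month_a, month_b):
    """Percentage of month_a users who also appear in month_b.

    Assumption: the denominator is the first month's user population.
    """
    a, b = set(month_a), set(month_b)
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


# Made-up fraternity AP users for two adjacent months.
feb_all = {"u1", "u2", "u3", "u4", "u5", "u6"}
mar_all = {"u1", "u2", "u3", "u7", "u8", "u9"}

# The same months after a filter (IBF, GBF or HF) has removed visitors.
feb_filtered = {"u1", "u2", "u3"}
mar_filtered = {"u1", "u2", "u3", "u7"}

before = common_user_percentage(feb_all, mar_all)
after = common_user_percentage(feb_filtered, mar_filtered)
```

If filtering works as intended, the value after filtering should be higher than the value before, because residents persist across months while visitors do not.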

Temporal Consistency Validation
• Similarity in the user population selected after filtering fraternity users for U1

IBF vs. GBF
• A validation mechanism that compares the results of the IBF and GBF methods
• Comparing users selected by IBF and GBF for U1

Cross Validation
• NBC has a low error rate because it uses statistics from real data provided by the US Social Security Administration
• Using this property of NBC, we can find the error bound for LBC
• To calculate the error bounds, the users classified by LBC as females and males are put in sets FL and ML
• Using NBC, we classify all users from fraternities and sororities, put them in sets FN and MN, and remove unclassified users
• The error in female classification by LBC: Ef = |FL ∩ MN| / |FL|
• The error in male classification by LBC: Em = |ML ∩ FN| / |ML|
• Cross validation of LBC by NBC for U2
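The two error formulas translate directly into set operations. In the sketch below the set contents are made-up user identifiers and the function name is illustrative; only the formulas themselves come from the slide.

```python
def lbc_error(lbc_set, nbc_opposite_set):
    """Fraction of LBC-classified users that NBC assigns to the opposite gender.

    Female error Ef uses (FL, MN); male error Em uses (ML, FN).
    """
    if not lbc_set:
        raise ValueError("LBC set is empty")
    return len(lbc_set & nbc_opposite_set) / len(lbc_set)


# Made-up sets: LBC results (FL, ML) and NBC results (FN, MN).
FL = {"a", "b", "c", "d"}
ML = {"v", "w", "x", "y", "z"}
FN = {"a", "b", "c", "v"}
MN = {"d", "x"}

Ef = lbc_error(FL, MN)  # "d" is female by LBC but male by NBC
Em = lbc_error(ML, FN)  # "v" is male by LBC but female by NBC
```

With these sample sets, one of four LBC females and one of five LBC males disagree with NBC.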

User Behavior Analysis
• Group classification helps in understanding usage differences between groups
• Gender-based grouping
  • Male
  • Female
  • Unclassified
• Groups are evaluated on multiple metrics, depending on the application
• To examine whether differences exist between genders, this paper uses the following metrics
  • Spatio-temporal distribution of wireless usage
  • Vendor preference

User Spatial Distribution
• This metric can identify where users spend most of their time
• A difference in the number of users between the genders can indicate the building preferences of each gender
• The existence of locations that are consistently preferred by one of the two genders highlights a difference in WLAN usage between the genders

User Spatial Distribution
• Comparison of user distribution across the university U1 campus (in percent)
• Comparison of user distribution across the university U2 campus (in percent)

Average Duration or Temporal Analysis
• The average session duration for males and females indicates the extent of WLAN usage in different areas

Average Duration or Temporal Analysis
• Average session duration of males and females in different areas of the university U1 campus
• Average session duration of males and females in different areas of the university U2 campus

Average Duration or Temporal Analysis
• Some of these differences were found to be significant and spatio-temporally consistent, even across campuses: females' wireless activity is stronger in Social Science and Sports areas, whereas males' activity is stronger in Engineering and Music

Device Preference
• The MAC address is used to find the preferred vendors for each group
• To test whether gender is associated with a bias towards specific vendors, the chi-square test of statistical significance is used
• The chi-square test shows with 90% confidence that there is an association between gender and vendor/brand
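To make the test concrete, the sketch below computes Pearson's chi-square statistic for a gender-by-vendor contingency table and compares it with the critical value at the 90% confidence level. The counts and the three-vendor layout are invented for illustration and are not taken from the traces; the critical values are the standard table entries for a significance level of 0.10.

```python
# Chi-square critical values at significance level 0.10, keyed by degrees of freedom.
CRITICAL_90 = {1: 2.706, 2: 4.605, 3: 6.251, 4: 7.779}


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if gender and vendor were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up device counts: rows are male/female, columns are three vendors.
observed = [
    [60, 30, 10],
    [40, 40, 20],
]
statistic, dof = chi_square_statistic(observed)
biased = statistic > CRITICAL_90[dof]  # True means independence is rejected
```

Here the vendor for each device would come from the manufacturer prefix of its MAC address, which is why the MAC address alone is enough to build such a table.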

Device Preference
• Device distribution by manufacturer at university U1
• Device distribution by manufacturer at university U2

Applications
• The results from these metrics can be applied to make an application context sensitive
• Mobility models
  • Mobility models are important tools for understanding user movements and for creating models on which protocols can be tested
• Protocol design
  • Protocol and service design in mobile ad hoc networks can take the features of various groups into account when evaluating performance
• Privacy

Conclusion
• This paper proposes novel methods that use WLAN traces to classify WLAN users into social groups based on features such as gender and study major, among others
• It presents a general framework that can be applied to traces coming from multiple sources
• There is a distinct difference in WLAN usage patterns between genders, even with similar population sizes