REZA ZAFARANI AND HUAN LIU DATA MINING AND MACHINE LEARNING LABORATORY (DMML) ARIZONA STATE UNIVERSITY KDD 2013 – CHICAGO, ILLINOIS.


How hard can it be to identify an individual across sites? Privacy experts claim that advertisers know a lot about people. Can they stop showing you the same repetitive ads across sites?

Many social media sites hold partial, complementary information about individuals. Combining it yields more information about individuals and better user profiles.
Example (Facebook, Google+): Age, Location, Education for Huan Liu: N/A, USA, USC ( )
Can we connect individuals across sites? Connectivity is not available. Consistency in information availability.

Can we verify that the information provided across sites belongs to the same individual?

MOBIUS: MOdeling Behavior for Identifying Users across Sites. Human behavior generates information redundancy. Information shared across sites provides a behavioral fingerprint. MOBIUS relies on behavioral modeling and minimum information.

Identification function. Minimum information available on ALL sites: usernames. Inputs: a candidate username (john.smith) and prior usernames ({jsmith, john.s}).

Behaviors 1, 2, ..., n generate information redundancy, which is captured via feature sets 1, 2, ..., n. The resulting data goes into a learning framework, which produces the identification function.
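A minimal sketch of how behaviors could be turned into feature sets, assuming each feature is a function of the candidate username and the prior usernames. The three features and all names below (`exact_match_rate`, `mean_length_gap`, `alphabet_overlap`, `featurize`) are hypothetical illustrations, not the 414 features used in the talk, and the learning framework itself is left out.

```python
from typing import Callable, List, Sequence

# A feature maps (candidate username, prior usernames) to one number.
Feature = Callable[[str, Sequence[str]], float]


def exact_match_rate(candidate: str, priors: Sequence[str]) -> float:
    """Fraction of prior usernames identical to the candidate."""
    if not priors:
        return 0.0
    return sum(candidate == p for p in priors) / len(priors)


def mean_length_gap(candidate: str, priors: Sequence[str]) -> float:
    """Average absolute length difference between candidate and priors."""
    if not priors:
        return 0.0
    return sum(abs(len(candidate) - len(p)) for p in priors) / len(priors)


def alphabet_overlap(candidate: str, priors: Sequence[str]) -> float:
    """Jaccard overlap between the candidate's and the priors' characters."""
    cand, prior = set(candidate), set("".join(priors))
    union = cand | prior
    return len(cand & prior) / len(union) if union else 0.0


# Hypothetical feature set; a real system would register many more.
FEATURES: List[Feature] = [exact_match_rate, mean_length_gap, alphabet_overlap]


def featurize(candidate: str, priors: Sequence[str]) -> List[float]:
    """Build the feature vector that a learning framework would consume."""
    return [feature(candidate, priors) for feature in FEATURES]
```

Vectors produced by `featurize` for known matching and non-matching pairs could then be passed to any supervised classifier to learn an identification function.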

59% of individuals use the same username

Identifying individuals by their vocabulary size. Alphabet size is correlated with language: शमंत कुमार -> Shamanth Kumar
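As a rough illustration of the alphabet-size idea, the sketch below counts the distinct characters a person uses across usernames and guesses the writing script of each letter from its Unicode character name. Both helper names are hypothetical, and taking the first word of the Unicode name as the script is a simplifying assumption, not the method from the talk.

```python
import unicodedata
from typing import Iterable, Set


def alphabet_size(usernames: Iterable[str]) -> int:
    """Number of distinct characters used across all usernames."""
    return len(set("".join(usernames)))


def scripts(text: str) -> Set[str]:
    """Approximate scripts of the letters in text.

    Assumption: the first word of a letter's Unicode name
    (e.g. 'LATIN', 'DEVANAGARI') identifies its script.
    Non-letters (spaces, punctuation, combining marks) are skipped.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }
```

For the slide's example, `scripts("शमंत कुमार")` gives `{"DEVANAGARI"}` while `scripts("Shamanth Kumar")` gives `{"LATIN"}`.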

Keyboard type impacts your usernames. Layouts: the QWERTY keyboard (variants: AZERTY, QWERTZ) and the DVORAK keyboard.
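One way keyboard layout could show up in usernames is through typing convenience. The sketch below assumes a standard QWERTY touch-typing hand split and computes two hypothetical features: the share of consecutive letters typed with the same hand, and the share of letters on the home row. The slide does not list the actual keyboard features, so these are illustrative only.

```python
# Assumed QWERTY touch-typing assignment of letters to hands.
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")
HOME_ROW = set("asdfghjkl")


def _letters(username: str) -> list:
    """Lowercased letters that have a hand assignment on QWERTY."""
    return [c for c in username.lower() if c in LEFT_HAND or c in RIGHT_HAND]


def same_hand_ratio(username: str) -> float:
    """Fraction of consecutive letter pairs typed with the same hand."""
    hands = ["L" if c in LEFT_HAND else "R" for c in _letters(username)]
    pairs = list(zip(hands, hands[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def home_row_ratio(username: str) -> float:
    """Fraction of letters that sit on the QWERTY home row."""
    letters = _letters(username)
    if not letters:
        return 0.0
    return sum(c in HOME_ROW for c in letters) / len(letters)
```

Swapping in AZERTY, QWERTZ, or DVORAK would only require different `LEFT_HAND`, `RIGHT_HAND`, and `HOME_ROW` sets.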

Usernames of individuals follow a language distribution. N-gram statistical language detector for 21 European languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene, and Swedish. Training data: European Parliament Parallel Corpus, 40m words per language.
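To make the n-gram idea concrete, here is a toy character-bigram detector with add-one smoothing. It is a sketch under strong assumptions: the two one-sentence training samples stand in for the European Parliament corpus, only two languages are covered, and the function names are hypothetical. The detector referenced on the slide is not reproduced here.

```python
import math
from collections import Counter

# Toy training text (assumption); the real detector uses a large corpus.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog",
    "de": "der schnelle braune fuchs springt ueber den faulen hund",
}


def bigrams(text: str) -> list:
    """All overlapping character bigrams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


PROFILES = {lang: Counter(bigrams(text)) for lang, text in SAMPLES.items()}
# Number of distinct bigrams seen in any language, used for smoothing.
VOCAB = len(set().union(*PROFILES.values()))


def log_likelihood(username: str, lang: str) -> float:
    """Add-one smoothed log-probability of the username's bigrams."""
    profile = PROFILES[lang]
    total = sum(profile.values())
    return sum(
        math.log((profile[g] + 1) / (total + VOCAB)) for g in bigrams(username)
    )


def language_distribution(username: str) -> dict:
    """Normalized likelihoods over the known languages."""
    scores = {lang: log_likelihood(username, lang) for lang in PROFILES}
    top = max(scores.values())
    weights = {lang: math.exp(s - top) for lang, s in scores.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


def detect(username: str) -> str:
    """Most likely language for the username."""
    dist = language_distribution(username)
    return max(dist, key=dist.get)
```

The full distribution from `language_distribution`, rather than only the top language, is what would serve as a feature vector for a username.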

Kalambo. To avoid redundancy, we can use the username with maximum entropy.
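The entropy criterion can be sketched with Shannon entropy over a username's characters. This assumes character-level entropy is what is meant; the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def username_entropy(username: str) -> float:
    """Shannon entropy (in bits) of the character distribution."""
    if not username:
        return 0.0
    n = len(username)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(username).values()
    )


def max_entropy_username(usernames: Iterable[str]) -> str:
    """Pick the username whose characters are least redundant."""
    return max(usernames, key=username_entropy)
```

A username such as `aaaa` has zero entropy, while one with all distinct characters reaches the maximum for its length.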

Username modifications: adding prefixes or suffixes, abbreviating, swapping, or adding/removing characters (for example, Nametag and Gateman). Usernames come from a language model.
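The modifications listed above can be approximated with a few string comparisons. The sketch below uses Levenshtein edit distance for added or removed characters, a prefix/suffix check, and a reversal check (Nametag spelled backwards is Gateman). The choice of these three measures and the function names are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def modification_features(candidate: str, prior: str) -> dict:
    """Hypothetical features describing how candidate relates to prior."""
    c, p = candidate.lower(), prior.lower()
    return {
        "edit_distance": levenshtein(c, p),
        # One name extends the other at its start or end.
        "prefix_or_suffix": c != p and (
            c.startswith(p) or c.endswith(p)
            or p.startswith(c) or p.endswith(c)
        ),
        # One name is the other written backwards.
        "reversal": c == p[::-1],
    }
```

For example, `john.smith` against `john.s` yields an edit distance of 4 with the prefix flag set.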

Data: 200,000 instances (50% class balance), 414 features
Previous methods: 1) Zafarani and Liu 2) Perito et al., 2011
Baselines: 1) Exact username match 2) Substring match 3) Patterns in letters
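The first two baselines are simple enough to sketch directly. The version below assumes case-insensitive comparison and that a match against any prior username counts; the third baseline (patterns in letters) is not specified on the slide and is left out. Function names are hypothetical.

```python
from typing import Iterable


def exact_match(candidate: str, priors: Iterable[str]) -> bool:
    """Baseline 1: candidate equals one of the prior usernames."""
    c = candidate.lower()
    return any(c == p.lower() for p in priors)


def substring_match(candidate: str, priors: Iterable[str]) -> bool:
    """Baseline 2: candidate contains, or is contained in, a prior username."""
    c = candidate.lower()
    if not c:
        return False
    # Empty prior usernames are skipped so they never match trivially.
    return any(p and (c in p.lower() or p.lower() in c) for p in priors)
```

With the slide's running example, `john.smith` fails the exact match against `{jsmith, john.s}` but passes the substring match because `john.s` is contained in it.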

A methodology for connecting individuals across sites: a behavioral modeling approach that uses minimum information across sites and allows for integration of additional behaviors when required. Human behavior results in information redundancy. Information shared across sites acts as a behavioral fingerprint. Discover applications of connecting users across sites. Incorporate features indigenous to specific sites.