Presentation on theme: "Mosaic: Quantifying Privacy Leakage in Mobile Networks"— Presentation transcript:
1Mosaic: Quantifying Privacy Leakage in Mobile Networks Ning Xia (Northwestern University)Han Hee Song (Narus Inc.)Yong Liao (Narus Inc.)Marios Iliofotou (Narus Inc.)Antonio Nucci (Narus Inc.)Zhi-Li Zhang(University of Minnesota)Aleksandar Kuzmanovic (Northwestern University)
2Different information ScenarioDifferent informationDifferent servicesIP1ISP A…IPKCSP BDynamic IP,CSP/ISPDifferentdevices
3Problem Other research work We are here! Input packet traces IP1ISP A…IPKCSP BInput packettracesTessellationMosaicHow much private information can be obtained and expanded about end users by monitoring network traffic?
4I will know everything about everyone! MotivationI will know everything about everyone!IP1ISP A…IPKCSP BAgenciesBad guysMobile Traffic:Relevant: more personal informationChallenging: frequent IP changes
5Challenges How to track users when they hop over different IPs? Sessions:Flows(5-tuple) are grouped into sessionstimeIP3IP2IP1Traffic Markers:Identifiers in the traffic that can be used to differentiate usersWith Traffic Markers, it is possible toconnect the users’ true identities to their sessions.
6Datasets 3h-Dataset: main dataset for most experiments SourceDescription3h-DatasetCSP-AComplete payload9h-DatasetOnly HTTP headersGround Truth DatasetCSP-BPayload & RADUIS info.3h-Dataset: main dataset for most experiments9h-Dataset: for quantifying privacy leakageGround Truth Dataset: for evaluation of session attributionRADIUS: provide session owners
7Via activity fingerprinting Methodology OverviewTessellationIP1ISP A…IPKCSP BMapping fromsessions to usersTraffic attributionViatraffic markersVia activity fingerprintingMosaic constructionNetwork data analysisWeb crawlingCombine information from both network data and OSN profiles to infer the user mosaic.
8Traffic Attribution via Traffic Markers Identifiers in the traffic to differentiate usersKey/value pairs from HTTP headerUser IDs, device IDs or sessions IDsDomainKeywordsCategorySourceosn1.comc_user=<OSN1_ID>OSN User IDCookiesosn2.comoauth_token=<OSN2_ID>-##HTTP headeradmob.comX-Admob-ISUAdvertisingpandora.comuser_idUser IDgoogle.comsidSession IDHow can we select and evaluatetraffic markers from network data?
9Traffic Attribution via Traffic Markers OSN IDs as Anchors:The most popular user identifiers among all servicesLinked to user public profilesOSNSourceSession CoverageOSN1 IDHTTP URL and cookies1.3%OSN2 IDHTTP header1.0%Top 2 OSN providersfrom North AmericaOnly 2.3% sessions contain OSN IDsOSN IDs can be used as anchors, but their coverage on sessions is too small
10Traffic Attribution via Traffic Markers Block Generation: Group Sessions into BlocksOSN IDOther sessions?Session interval δDepends on the CSPδ=60 seconds in our study≥δtimeIP 1IPBlockSession group on the same IP within a short period of timeTraffic markers shared by the same blocktimeIPBlockIP 199K session blocks generated from the 12M sessions
11Traffic Attribution via Traffic Markers Culling the Traffic Markers: OSN IDs are not enoughUniqueness: Can the traffic marker differentiate between users?Persistency: How long does a traffic marker remain the same?Traffic markersmobclix.com#uidOSN1 IDpandora.com#user_id#umydas.mobi#mac-idgoogle.com#sidcraigslist.org#cl_bUniqueness = 1No two users will share the same google.com#sid valuePersistency ~= 1The value of Google.com#sid remains the same for the same user nearly all the observation durationWe pick 625 traffic markers with uniqueness = 1, persistency > 0.9
12Traffic Attribution via Traffic Markers Traffic Attribution: Connecting the DotsTessellation User Ti( )IP 1IP 2IP 3Same OSN IDSame traffic markerTraffic markers are the key in attributing sessions to the same user over different IP addresses
13Traffic Attribution via Activity Fingerprinting What if a session block has no traffic markers?Assumption (Activity Fingerprinting):Users can be identified from the DNS names of their favorite servicesDNS names:Extracted 54,000 distinct DNS namesClassified into 21 classesServiceclassesprovidersSearchbing, google, yahooChatskype, mtalk.googl.comDatingplentyoffish, dateE-commerceamazon, ebaygoogle, hotmail, yahooNewsmsnbc, ew, cnnPictureFlickr, picasa…Activity Fingerprinting:Favorite (top-k) DNS names as the user’s “fingerprint”
14Traffic Attribution via Activity Fingerprinting Fi : Top k DNS names from user as “activity fingerprint”: Uniqueness of the fingerprintNormalized DNS namesY-axis:closer to 1, more distinct the fingerprint isX-axis:normalized by the total number of DNS namesMobile users can be identifiedby the DNS names from their preferred services
17Construction of User Mosaic Mosaic of Real UserSub-classes:Residence, coordinates, city, state, and etc.Least gainMost gainMOSAIC with 12 information classes(tesserae):Information (Education, affiliation and etc.) from OSN profilesInformation (Locations, devices and etc.) from users’ network data
18Quantifying Privacy Leakage Leakage from OSN profiles vs. from Network DataOSN profiles provide static user information (education, interests)Analysis on network data provides real-time activities and locationsInformation from both sides can corroborate to each otherInformation from OSN profiles and network data can complete and corroborate each other18
19Preventing User Privacy Leakage Protecttraffic markersTraffic markers (OSN IDs and etc.) should be limited and encryptedRestrict3rd partiesThird party applications/developers should be strongly regulatedProtectuser profilesOSN public profiles should be carefully obfuscated19
20ConclusionsPrevalence in the use of OSNs leaves users’ true identities available in the networkTracking techniques used by mobile apps and services make trafﬁc attribution easierSessions can be labeled with network users’ true identities, even without any identity leaksVarious types of information can be gleaned to paint rich digital Mosaic about users20
21Mosaic: Quantifying Privacy Leakage in Mobile Network Thanks!Mosaic: Quantifying Privacy Leakage in Mobile Network21