Thank you, Prof. Dr. Gerhard Boerner!
Stephen, Thomas, Houjun, Me, Robert Jing
Large Scale Statistics in Internet Behaviors
Hongguang Bi
Greetingland, LLC
Los Angeles, CA
About
Chapter 1: Internet and WWW
  History, how it works
Chapter 2: Internet User Behaviors & Privacy
Chapter 3: Collect User Information, what and how
Chapter 4: Online Advertising
  Geo, contextual and behavior targeting, real-time bidding, yield management
Chapter 1: Internet and WWW
Cosmology: Nature defines physical laws
Internet: Humans define the laws (or specifically: protocols)
Cosmology: photons, electrons, neutrinos … (monad? Leibniz)
Internet: bit
Cosmology: particles => stars => galaxies => clusters etc.
Internet: bits => bytes or integers => words => pages & s
Cosmology: millions of galaxies detected => billions
Internet: millions to billions of users
Cosmology: goal => structures, statistics of galaxies
Internet: goal => behaviors, statistics of users
Cosmology: Real World
Internet: Information World, or Virtual World
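Open Systems Interconnection (OSI) Model: 7 layers
  HTTP (application layer)
  Encrypt (commonly placed at the presentation layer)
  TCP, UDP (transport layer)
  IP (network layer)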
Information Age
  Web and WWW: March 1989, Tim Berners-Lee
  HTTP 0.9: 1991; HTTP 1.0: 1996; HTTP 1.1: June 1999, RFC 2616
  Mailbox Protocol: 1971
  SMTP: 1982, RFC 821
  Later developments: UUCP, sendmail
http, how web works
User sends request
  URL address
  Browser (Firefox, IE, mobile etc.)
  Language, who refers you, etc.
  Cookies
Web server responds
  Message body
  Message size, modified time etc.
  Server information
  Set cookies
Cookies
  A cookie is the standard HTTP mechanism by which a server can store data in the user’s browser.
  How does it work?
    Client: sends a request without a cookie
    Server: responds with a “Set-Cookie” header containing some information
    Client: sends later requests with a “Cookie” header containing the SAME information
  A cookie is bound to a specific server, and one server can set multiple cookies.
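The cookie round trip described above can be sketched with Python's standard `http.cookies` module. This is a minimal simulation without a real network: the cookie name `uid`, its value `abc123`, and the `X-Known-User` response header are hypothetical, chosen only for illustration.

```python
from http.cookies import SimpleCookie


def server_response(request_headers):
    """Simulated server: recognise a returning client or hand out a cookie."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "uid" in cookie:
        # The client sent back the SAME information the server set earlier.
        return {"X-Known-User": cookie["uid"].value}  # hypothetical header
    fresh = SimpleCookie()
    fresh["uid"] = "abc123"  # hypothetical anonymous ID
    return {"Set-Cookie": fresh["uid"].OutputString()}


def client_store(response_headers, jar):
    """Simulated browser: keep whatever the server asked us to store."""
    if "Set-Cookie" in response_headers:
        jar.load(response_headers["Set-Cookie"])


def client_request_headers(jar):
    """Simulated browser: attach stored cookies to the next request."""
    if not jar:
        return {}
    return {"Cookie": "; ".join(f"{k}={m.value}" for k, m in jar.items())}


jar = SimpleCookie()                                  # one jar per server
first = server_response(client_request_headers(jar))  # request without cookie
client_store(first, jar)
second = server_response(client_request_headers(jar))  # request with cookie
print(first)
print(second)
```

A real browser keeps a separate jar for each server, which is what binds a cookie to the server that set it.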
Chapter 2: User Behaviors & Privacy
  1 billion internet users: a few hundred million in Europe, 100M in US, China
  IPv4 address space is full: 2^32 = 4.3 billion addresses
  Google gets 80 billion views every day, i.e. one internet user visits about 1 Google page every day (e.g. search, ad)
  The Internet brings new economics, lifestyles, and social phenomena, e.g. online shopping, social networks (Facebook), newspapers and publications, US elections
  For the first time in history, human beings might lose privacy: their social activities can be tracked, studied and, finally, manipulated by powerful players such as the US government or Google.
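The IPv4 figure on this slide can be checked with Python's standard `ipaddress` module. A small sketch; the variable names are illustrative only.

```python
import ipaddress

# The whole IPv4 space is the network 0.0.0.0/0: 32 address bits.
ipv4_space = ipaddress.ip_network("0.0.0.0/0")
total_addresses = ipv4_space.num_addresses

print(total_addresses)               # 2**32
print(round(total_addresses / 1e9, 1), "billion")
```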
Cases
  Currently: “Tracking case”, Apple & Google
    “Information is transmitted securely to the Apple iAd server via a cellular network connection or Wi-Fi Internet connection,” explained a letter Apple sent to US Rep. Edward Markey, D-Mass., on July 12 in response to his request for information. “The latitude/longitude coordinates are converted immediately by the server to a five-digit ZIP code.”
  “Suicide” case, MySpace
  On the technical side, the credit card industry has successfully built tracking tools that have tracked user behaviors for 20 years!
What kind of Private Information?
You definitely expose
  Geographic information (via IP)
  OS and browser, such as PC, Linux, iPhone
  Language
May lose, protected by laws
  Your name, identity cards (credit card, SSN, driver license etc.)
  Via online shopping sites, government/university service sites, credit report sites, dating sites etc.
  In practice, can still be stolen => virus, spyware, break-in
May lose, un-protected
  Demographic information, e.g. age, gender, income, household
  Via ISP or cellular service provider, social network sites, other free services
Chapter 4: Collect User Information
User Profile
  Uniquely identified by an anonymous ID
  The ID is tracked with a cookie and saved permanently on disk
  Every ID has a profile consisting of geographic information, demographic information, interests, shopping histories, recent behavior types (or, audiences) => any information valuable to advertisers
Existing Techniques
  Relational Database
  Moving averages
  Artificial Neural Network
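A profile keyed by an anonymous ID could be modelled roughly as follows. This is a sketch under assumptions: the class name, field names, the in-memory `profiles` dictionary, and the sample values are all hypothetical, and a production system would persist profiles rather than hold them in a Python dict.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str                                   # anonymous ID stored in the cookie
    geo: dict = field(default_factory=dict)        # e.g. {"zip": "90001"}
    demographics: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    shopping_history: list = field(default_factory=list)
    audiences: set = field(default_factory=set)    # recent behavior types


profiles = {}  # anonymous ID -> UserProfile


def get_or_create_profile(cookie_id=None):
    """Look up the profile for a cookie ID, creating one for new visitors."""
    if cookie_id is None:
        cookie_id = uuid.uuid4().hex  # new visitor: mint an anonymous ID
    if cookie_id not in profiles:
        profiles[cookie_id] = UserProfile(user_id=cookie_id)
    return profiles[cookie_id]


visitor = get_or_create_profile()                  # first visit, no cookie yet
visitor.interests.add("travel")
returning = get_or_create_profile(visitor.user_id)  # later visit, cookie sent
print(returning.interests)
```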
Relational Database
  A database consists of many “normalized” tables
  A table consists of a primary key and multiple values
  One table can have many keys to search by
  Example tables:
    ResearchGroup: group_id, name, description, head
    Member: member_id, group_id, name, type (professor, postdoc, student), status (current, left)
    Left: left_id, member_id, when, where
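The three example tables can be tried out with Python's built-in `sqlite3` module. A minimal sketch: the column types and the inserted rows are made up for illustration, and `Left`, `when` and `where` are double-quoted because they are SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResearchGroup (
    group_id INTEGER PRIMARY KEY, name TEXT, description TEXT, head TEXT);
CREATE TABLE Member (
    member_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES ResearchGroup(group_id),
    name TEXT, type TEXT, status TEXT);
CREATE TABLE "Left" (
    left_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES Member(member_id),
    "when" TEXT, "where" TEXT);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO ResearchGroup VALUES (1, 'Example Group', 'demo', 'Head A')")
conn.executemany(
    "INSERT INTO Member VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "Alice", "postdoc", "current"),
     (2, 1, "Bob", "student", "left")],
)
conn.execute("""INSERT INTO "Left" VALUES (1, 2, '2009', 'Industry')""")

# Join the normalized tables: where did former members go?
rows = conn.execute("""
    SELECT m.name, l."where"
    FROM Member AS m
    JOIN "Left" AS l ON l.member_id = m.member_id
    WHERE m.status = 'left'
""").fetchall()
print(rows)
```

Each fact is stored once (a member's name lives only in `Member`), and the join reassembles it on demand, which is the point of normalization.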
Moving Average
  A new value is an average of the last N detections, with weights that decay over time.
  A simplified time-series analysis tool
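One way to realise such a weighted average is sketched below. It assumes exponential decay by position in the window (newest detection has weight 1, each older one is multiplied by `decay`); the slide does not specify the weighting scheme, so the class name, the `decay` parameter and its default are assumptions.

```python
from collections import deque


class DecayingMovingAverage:
    """Weighted average of the last n values; older values count less."""

    def __init__(self, n, decay=0.5):
        self.window = deque(maxlen=n)  # automatically drops the oldest value
        self.decay = decay             # assumed decay factor per step of age

    def update(self, value):
        self.window.append(value)
        # Age 0 is the newest value, age 1 the one before it, and so on.
        weights = [self.decay ** age for age in range(len(self.window))]
        newest_first = reversed(self.window)
        weighted_sum = sum(w * v for w, v in zip(weights, newest_first))
        return weighted_sum / sum(weights)


avg = DecayingMovingAverage(n=3, decay=0.5)
history = [avg.update(v) for v in (4, 8, 2, 2)]
print(history)
```

Decaying by real elapsed time instead of by position would need a timestamp per detection; that variant is not shown here.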
Artificial Neural Network
  Machine learning
  Training:
    3,5 => 15
    4,6 => 24
    9,8 => 72
    ….
    6,7 => 41
  Neurons work in parallel => very fast
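The training pairs on this slide look like products, and the last line reads like an approximate prediction for an unseen pair; both readings are assumptions. Under them, the idea of learning from examples can be sketched with the simplest possible case: a single linear neuron trained by gradient descent on log-scaled inputs. This is not a multi-layer network and not necessarily what the author used; the learning rate and step count are arbitrary choices.

```python
import math

# Training examples taken from the slide: (inputs) => target.
training = [((3, 5), 15), ((4, 6), 24), ((9, 8), 72)]

# Work in log space, where a product becomes a sum a linear neuron can learn.
X = [(math.log(a), math.log(b)) for (a, b), _ in training]
Y = [math.log(target) for _, target in training]

w = [0.0, 0.0]   # one weight per input, no bias term
lr = 0.05        # assumed learning rate

for _ in range(5000):
    grad = [0.0, 0.0]
    for (x1, x2), y in zip(X, Y):
        err = w[0] * x1 + w[1] * x2 - y   # prediction error on this example
        grad[0] += err * x1
        grad[1] += err * x2
    w[0] -= lr * grad[0]
    w[1] -= lr * grad[1]


def predict(a, b):
    """Apply the trained neuron to a new pair and undo the log scaling."""
    return math.exp(w[0] * math.log(a) + w[1] * math.log(b))


print(w)
print(predict(6, 7))
```

Because this toy problem is exactly learnable in log space, the prediction for 6,7 should land very close to 42; a small general-purpose network given only a few examples would typically be less exact.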
Chapter 5: Online Advertising
The Good side of tracking
The system we are developing
The Good side of user tracking
Current Challenges
  The server processes 10,000 requests per second
  For each request:
    update the user profile with 100 attributes
    pick one from 100 possible advertiser candidates
  10^8 decisions per second
  100 million impressions per day
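The 10^8 figure follows from multiplying the three numbers on the slide, assuming every attribute is weighed against every candidate on every request. The slide does not say how the daily impression count relates to the per-second rate, so the last two lines below are plain unit conversions rather than a claim about the system.

```python
requests_per_second = 10_000
attributes_per_profile = 100
advertiser_candidates = 100

# Assumption: each request evaluates every attribute against every candidate.
decisions_per_second = (
    requests_per_second * attributes_per_profile * advertiser_candidates
)

seconds_per_day = 24 * 60 * 60
requests_per_day_at_full_rate = requests_per_second * seconds_per_day
average_rate_for_100m_per_day = 100_000_000 / seconds_per_day

print(decisions_per_second)            # 10**8
print(requests_per_day_at_full_rate)   # if 10,000/s were sustained all day
print(round(average_rate_for_100m_per_day))
```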
In the Future
  Statistics => dynamic, finding rules, clustering analysis, time-series analysis
  Instant change of behaviors, e.g. shopping intention
  How are behaviors affected by environment: social effect, “friend-recommendation” effect
THANKS!