Exploring MySpace: Measurement and Analysis of the Online Social Network Site. Bill Gauvin, 21-Jan-2009
Online Social Network Sites: A Definition. Web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system [Boyd & Ellison].
The Rise of Online Social Networks (OSN). 1997: SixDegrees allowed users to create profiles and to list and surf friend lists. A number of community tools then supported profiles and friend lists: AsianAvenue, BlackPlanet, MiGente, LiveJournal. Business and professional social networks emerged, including Ryze and LinkedIn.
The Rise of Online Social Networks (OSN). 2004: Facebook, designed for college networking (Harvard), expanded to other colleges, high schools, and then everyone. 2003: MySpace attracts teens and bands, among others, and grows into the largest OSN. A global phenomenon: ???
Online Social Networking Goes Mobile. Displays the locations of friends, together with their presence status (available, away, etc.), visually on maps or in lists.
MySpace: launched in Santa Monica, CA, in 2003; grew rapidly and attracted Friendster’s users, bands, etc.; teenagers began joining en masse in 2004; three distinct populations began to form: musicians/artists, teenagers, and the post-college urban social crowd; purchased by News Corporation for $580M in 2005; arguably the largest online social network site.
Each user has a profile that contains age, gender, location, last login time, and other information. Each user has a unique id associated with the profile. Some profiles claim neutral gender, e.g., bands. A user can set his/her profile to be private (default is public) or customize the layout.
MySpace Profile. A user can search for friends and add them to his/her friend list. A user can post messages to a friend’s blog space. Only friends have access to a private profile’s friend list and blog space. Other functions: IM/Call, Block/Rank User, Add to Group/Favorites, etc.
Measurement: SnailCrawler. Generate random ids uniformly between 1 and max (15,000,000,000); many ids are not occupied (invalid). Retrieve profile information from MySpace (HTTP): name, id, gender, age, location, public/private/custom. Other information, when available for public profiles: company, religion, marriage, children, smoke/drink, orientation, zodiac, education, ethnicity, occupation, hometown, body type, mood, last login, etc.
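The sampling step can be sketched as follows. This is a minimal sketch, not SnailCrawler's actual code: `sample_ids`, `crawl`, and the `fetch_profile` callback are hypothetical names, and the callback stands in for the HTTP retrieval, returning `None` for an unoccupied id.

```python
import random

MAX_ID = 15_000_000_000  # upper bound quoted on the slide


def sample_ids(count, max_id=MAX_ID, seed=None):
    """Draw `count` distinct ids uniformly at random from 1..max_id."""
    if count > max_id:
        raise ValueError("cannot draw more distinct ids than exist")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < count:
        seen.add(rng.randint(1, max_id))  # randint is inclusive on both ends
    return sorted(seen)


def crawl(ids, fetch_profile):
    """Split sampled ids into occupied profiles and invalid ids.

    `fetch_profile` is a hypothetical callback standing in for the HTTP
    request: it returns a dict for an occupied id and None otherwise.
    """
    profiles, invalid = {}, []
    for uid in ids:
        profile = fetch_profile(uid)
        if profile is None:
            invalid.append(uid)
        else:
            profiles[uid] = profile
    return profiles, invalid
```

Because the ids are drawn uniformly, the fraction of valid ids in the sample is an estimate of how densely the id space is occupied.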
Measurement: SnailCrawler. For public profiles: scan the friend list and record the id of each friend; scan the blog space and record, for each entry, the publisher id, time, and word/object/image/reference counts; run a secondary scan to retrieve the profile information of publishers.
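The per-entry counts could be collected along these lines. This is a sketch under assumptions: the slides do not say how SnailCrawler parses an entry, so `EntryCounter` and `count_entry` are hypothetical names, "references" are taken to mean `<a href=...>` links, and "objects" are taken to mean `<object>` or `<embed>` tags.

```python
from html.parser import HTMLParser


class EntryCounter(HTMLParser):
    """Count words, links, images, and embedded objects in one blog entry."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.hrefs = 0
        self.images = 0
        self.objects = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.hrefs += 1
        elif tag == "img":
            self.images += 1
        elif tag in ("object", "embed"):
            # Assumption: an <embed> nested inside an <object> counts twice.
            self.objects += 1

    def handle_data(self, data):
        # Whitespace-separated tokens of visible text count as words.
        self.words += len(data.split())


def count_entry(html):
    """Return the four counts for the HTML of a single blog entry."""
    counter = EntryCounter()
    counter.feed(html)
    counter.close()
    return {
        "words": counter.words,
        "hrefs": counter.hrefs,
        "images": counter.images,
        "objects": counter.objects,
    }
```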
Recorded information of a profile: an example Tantric Daydream USA Male 23 years old all around California MySpace False 6221, , , , , , , Tantric Daydream's Companies Single Someday Straight Virgo Some college White / Caucasian USAF Saint Louis 5' 11" / Athletic pretty 3/12/
Measurement: issues. Profile format changes over time. Some ids went away? Some data are lost when the program crashes. MySpace rate-limits IP addresses for invalid ids.
MySpace Analysis. Profile Analysis: – Distribution of number of friends and publishers – Correlation between friends and publishers. Publishing Analysis: – Age/Gender – Times Published – Day Published – Day/Month/Year Published – Distribution of number of blog entries – Distribution of inter-blog time (need this). Content Analysis: – Number of words published – Number of HREFs, objects, images used – Distribution of message length (number of words) (need this).
Friends Distribution. [Figure: friends distribution for female, male, and neutral profiles.] The friends distributions of female and male profiles are similar; the friends distribution of neutral profiles is different. For female/male profiles, there appear to be two distinct scaling regimes. How to model the two scaling regimes? Can they be modeled as a superposition of two power-law distributions? Can the neutral curve be fitted by a power-law distribution?
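One standard way to start on the power-law question is to plot the empirical CCDF and estimate a tail exponent by maximum likelihood. The sketch below is not the analysis used in the slides: `ccdf` and `tail_exponent` are hypothetical helper names, the cutoff `xmin` must be chosen by the analyst, and treating friend counts as continuous values is an approximation.

```python
import math
from collections import Counter


def ccdf(values):
    """Return (x, fraction of values >= x) for each distinct x, ascending."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    points = []
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points


def tail_exponent(values, xmin):
    """Maximum-likelihood exponent of a continuous power law p(x) ~ x^(-alpha)
    fitted to the values at or above xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no values at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal xmin")
    return 1.0 + len(tail) / log_sum
```

Estimating the exponent at two different cutoffs, one in each apparent regime, gives a first indication of whether the two slopes really differ.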
Publisher Distribution. Similar to the friends distribution; the male/female turning point is smoother; there is a sharp turning point for neutral profiles at the high end.
Distribution of the Number of Blog Entries per Profile. Can this be modeled by a power-law distribution? Further analysis needed.
Publisher Age and Gender. Ages of 16 and under are protected by law and are aggregated at 0 in the figure. Teenagers and people in their twenties post the most blogs. False ages at years old? Among teenagers 16-19, females publish more than males; after 20, there is no significant difference, and males often publish more than females.
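Blog publish time. [Figure: blogs published over the year, with Feb, Sept, and Dec marked on the time axis and Valentine’s Day and Christmas annotated.] Females publish more than males, and males more than neutral profiles. Spikes on holidays, e.g., Valentine’s Day and Christmas.
Blog publish time. [Figure: blogs published by month (Jan to Dec) and by day of week (Sun to Sat).] Females publish more than males. More blogs are posted from May to Oct. Slightly more blogs are posted on weekdays.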
Blog publish time. Big jump at 1 pm. People tend to publish from the afternoon well into the night, with a peak around 10 pm and a bottom around 5 am.
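The hour-of-day profile described above can be computed from the recorded entry times with a simple histogram. This is an illustrative sketch: `entries_by_hour` and `peak_hour` are hypothetical names, and the entry times are assumed to be available as `datetime` objects in the publisher's local time.

```python
from datetime import datetime
from typing import Iterable, List


def entries_by_hour(times: Iterable[datetime]) -> List[int]:
    """Return a 24-element list; element h counts entries published in hour h."""
    counts = [0] * 24
    for stamp in times:
        counts[stamp.hour] += 1
    return counts


def peak_hour(times: Iterable[datetime]) -> int:
    """Return the hour with the most entries (earliest hour wins a tie)."""
    counts = entries_by_hour(times)
    return max(range(24), key=lambda hour: counts[hour])
```

The same pattern, keyed on `stamp.weekday()` or `stamp.month` instead of `stamp.hour`, gives the day-of-week and month views.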
Publisher vs Friend. [Figure: number of publishers vs. number of friends, in linear scale and in log-log scale.] Only friends can publish in a user’s blog, but some profiles have more publishers than friends, i.e., they lie above the 45-degree line. This is because some friend profiles were removed, either by their owners or by MySpace after inactivity, and hence are not counted. The number of publishers remains relatively flat as the number of friends increases.
Publisher-Friend ratio. Spikes caused by an integer round-up problem? Finer-granularity data needed.
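One possible reading of the round-up question is that ratios of small integers can only take a few distinct values, so they pile up at simple fractions. The sketch below illustrates that effect by enumerating every (publishers, friends) pair up to a bound. It uses no MySpace data: `ratio_counts` is a hypothetical name, and weighting all pairs equally with publishers not exceeding friends is a simplifying assumption.

```python
from collections import Counter
from fractions import Fraction


def ratio_counts(max_friends):
    """Count how many integer pairs (publishers, friends) give each exact ratio.

    Every pair with 1 <= friends <= max_friends and
    0 <= publishers <= friends is weighted equally (an assumption).
    """
    counts = Counter()
    for friends in range(1, max_friends + 1):
        for publishers in range(friends + 1):
            counts[Fraction(publishers, friends)] += 1
    return counts
```

With `max_friends=10`, the ratio 1/2 is produced by five pairs and the ratio 1 by ten, while 3/7 is produced by only one, so a histogram of the ratio would show spikes at the simple fractions even with no underlying structure.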
Blog Contents Analysis. # pure-word blogs >> # HREF blogs > # image blogs > # object blogs. Females write more words and post more images and references; males post more objects. [Figure panels: words, HREFs, images, objects.]