Presentation on theme: "Cong Ding, Yang Chen*, and Xiaoming Fu University of Göttingen"— Presentation transcript:
1 Crowd Crawling: Towards Collaborative Data Collection for Large-scale Online Social Networks. Cong Ding, Yang Chen*, and Xiaoming Fu. University of Göttingen; *Duke University
2 Significance of social network data crawling: understanding user behaviors; improving SNS architectures; handling privacy/security issues; and so on.
3 Current data collection methods (1): ISP-based measurement [Schneider IMC’09]. Only ISPs can do this.
4 Current data collection methods (2): Cooperation with SNS companies [Yang IMC’11]. Most research groups do not have this opportunity.
5 Current data collection methods (3): Crawling by a single group, which then shares the data with others [Gjoka INFOCOM’10]. This approach suffers from request rate limiting.
6 Shortcomings of crawling by a single group: it wastes computing and network resources; it introduces overhead for service providers (and may lead to stricter rate limiting); it leaves the research community without ground truth.
7 A new thought: Why not collect data collaboratively?
10 Implementation: A proof-of-concept prototype (without the data fidelity part) that crawls Weibo, using 472 PlanetLab servers as crawlers.
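The slides do not say how user IDs are divided among the 472 crawlers. As a minimal sketch only, assuming a simple hash-based assignment (the function names `assign_crawler` and `partition` are hypothetical and not taken from the prototype), the work could be split like this:

```python
import hashlib

NUM_CRAWLERS = 472  # number of PlanetLab servers reported on the slide


def assign_crawler(uid: str, num_crawlers: int = NUM_CRAWLERS) -> int:
    """Map a user ID to a crawler index with a stable hash.

    Assumption: any deterministic, roughly uniform mapping would do;
    SHA-256 is used here only because it is stable across processes.
    """
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def partition(uids, num_crawlers: int = NUM_CRAWLERS):
    """Group user IDs by the crawler responsible for them."""
    buckets = {}
    for uid in uids:
        buckets.setdefault(assign_crawler(uid, num_crawlers), []).append(uid)
    return buckets


if __name__ == "__main__":
    sample = [str(n) for n in range(10_000)]  # made-up UIDs for illustration
    buckets = partition(sample)
    print(len(buckets), "crawlers received work")
```

Because each crawler only requests its own share of the IDs, the per-server request rate stays low, which is the point of spreading the crawl over many machines.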
11 Evaluation: In 24 hours, we crawled the data of 2.22M users from Weibo, including user profiles, all posts, and all social connections. Comparison: Fu et al. (PLOS ONE 2013) collected 30K users’ data in 6 days; Guo et al. (PAM 2013) collected 1M users’ data in 1 month.
             Crowd Crawling    Fu et al.    Guo et al.
#UIDs/day    2.22M             5K           33K
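The per-day figures in the comparison follow from dividing each reported total by its crawl duration. A small sketch of that arithmetic, assuming "1 month" means 30 days and, for the per-crawler average, that all 472 servers were active for the full 24 hours (neither is stated on the slides):

```python
def uids_per_day(total_users: float, days: float) -> float:
    """Average number of user IDs crawled per day."""
    return total_users / days


rates = {
    "Crowd Crawling": uids_per_day(2_220_000, 1),  # 2.22M users in 24 hours
    "Fu et al.": uids_per_day(30_000, 6),          # 30K users in 6 days
    "Guo et al.": uids_per_day(1_000_000, 30),     # assumption: 1 month = 30 days
}

# How many times faster Crowd Crawling is than each baseline.
speedup = {
    name: rates["Crowd Crawling"] / rate
    for name, rate in rates.items()
    if name != "Crowd Crawling"
}

# Average load per crawler, assuming all 472 servers ran the whole day.
per_crawler = rates["Crowd Crawling"] / 472

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:,.0f} UIDs/day")
    for name, factor in speedup.items():
        print(f"Speedup over {name}: {factor:.1f}x")
    print(f"Per crawler: {per_crawler:,.0f} UIDs/day")
```

Under these assumptions, each crawler averages a daily volume close to what Fu et al. obtained per day in total, so the gain comes from parallelism rather than from a faster individual crawler.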
14 Conclusion and Discussion: Data sharing may violate some providers’ terms of service: Twitter does not allow data sharing (even for research), whereas Weibo allows data sharing among researchers. Unlimited data sharing might raise ethical issues; the data should be anonymized. We will publish the data crawled in the evaluation.
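The slides do not describe how the data would be anonymized. One common first step, shown here purely as an illustrative sketch and not as the authors' method, is to replace user IDs with keyed pseudonyms (HMAC-SHA256) before sharing. The function names and the secret key below are hypothetical:

```python
import hashlib
import hmac


def pseudonymize(uid: str, secret_key: bytes) -> str:
    """Replace a user ID with a keyed pseudonym.

    The same UID always maps to the same pseudonym under one key, so the
    social graph stays connected, but the mapping cannot be recomputed
    without the key.
    """
    return hmac.new(secret_key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_edges(edges, secret_key: bytes):
    """Pseudonymize both endpoints of every (follower, followee) pair."""
    return [
        (pseudonymize(src, secret_key), pseudonymize(dst, secret_key))
        for src, dst in edges
    ]


if __name__ == "__main__":
    key = b"keep-this-secret"  # hypothetical key; never publish it with the data
    edges = [("1001", "1002"), ("1002", "1003")]  # made-up UIDs
    for src, dst in anonymize_edges(edges, key):
        print(src[:12], "->", dst[:12])
```

Pseudonymizing IDs alone is generally not sufficient: profile fields, post text, and even the graph structure can still identify users, so a real release would need further measures.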