USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA Haoran Yu, Guangzhong Sun, Min Lv.

1 USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA Haoran Yu, Guangzhong Sun, Min Lv

2 Outline: Introduction; Data; Methods (find the longest inactive time span to discover sleeping time from dense data; statistics method in discovering sleeping time pattern from sparse data: united time series by month, subsequence statistics); Experimental Results (general result of sleeping time; further analysis: detect time zone of Sina Weibo users, detection on movement between different time zones); Related Applications; Conclusion

5 Introduction(1) Goals: inferring the sleeping time of Sina Weibo users; enabling general location (time zone) prediction. Motivation: the increasing availability of user-generated content (user-generated posts, time series of activities). Users' activities reflect their sleeping time. Posts: "Goodnight", "Sleep", or "Guess How Much I Love You", etc. Time series: active time and inactive time. Examples: "Mid-night"; "Sleep"; 20:05 Beijing Time; 23:51 Beijing Time; 13:42 Beijing Time (00:42 Central Time).

6 Introduction(1) Goals: inferring the sleeping time of Sina Weibo users; enabling general location (time zone) prediction. Motivation: the increasing availability of user-generated content (user-generated posts, time series of activities). Users' activities reflect their sleeping time. Posts: "Goodnight", "Sleep", or "Guess How Much I Love You". Time series: active time and inactive time. Sleeping time is tightly related to location, since people sleep at night (gender, age, job, etc. may have an influence as well).

7 Counting Sheep with Path Data Science – Aug 23, 2012 By Path Data Team http://blog.path.com/post/30041197400/counting-sheep-with-path-data-science

9 Art-related works; Economy & finance. Different time to go to bed; different time to wake up; different degree of activity.

10 User living in western China; user living in western Europe.

12 Introduction(1) Goals: inferring the sleeping time of Sina Weibo users; enabling general location (time zone) prediction. Motivation: the increasing availability of user-generated content (user-generated posts, time series of activities). Users' activities reflect their sleeping time. Posts: "Goodnight", "Sleep", or "Guess How Much I Love You". Time series: active time and inactive time. Sleeping time is tightly related to location, since people sleep at night (gender, age, job, etc. may have an influence as well). Significance: detecting users with a fake location (distinguishing real persons from robots); a change of location => advertisement.

13 Introduction(2) Difficulty & Challenges: users are not always active => data are sparse; aspects other than location => influence the sleeping time. Contribution and insights: A new open problem related to the relationship between sleeping time and micro-blogging activities is presented. Preliminary simple solutions are presented in this paper, based on experimental results on a real data set. A data set of users' time series of activities on Weibo is collected and open to the public.

14 Data Source: Sina Weibo. Sina Weibo (http://weibo.com/) is a micro-blogging platform akin to a hybrid of Twitter and Facebook; it is used by well over 30% of the 513 million Internet users in China (as of the end of 2011). Its market penetration is similar to the one Twitter has established outside China. User type: verified users with daytime jobs, > 500 fans and > 100 posts. Total number of users: 844. Total number of posts: 1,579,623. Form of data: (UID, Timestamp) records, each corresponding to one action of a user (for now, only posts).

15 Methods: Find the longest inactive time span to discover sleeping time from dense data. IDEAL CONDITIONS: people sleep for a long time once a day and stay active for the rest of the time, so two or more long inactive spans cannot be found in one day. For a day's time points $t_1, t_2, \dots, t_n$, the spans $D = \{d_1, d_2, \dots, d_{n-1}\}$ determined by every two adjacent points can be defined, and the longest one is taken as the sleeping span of that day. Therefore, a final span $\bar{d'}$ representing the average sleeping time of a user can be figured out by averaging the lower bounds and the upper bounds of all result spans.
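The dense-data procedure on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes each day's time points are given as minutes counted from noon, so that a night's sleep does not straddle the boundary of the 24-hour window (the slides do not say how the window is chosen).

```python
from statistics import mean

def longest_gap(times):
    """Return (start, end) of the longest span between adjacent time points."""
    ts = sorted(times)
    if len(ts) < 2:
        raise ValueError("need at least two time points")
    # Each pair of neighbours (t_i, t_{i+1}) determines one span d_i.
    return max(zip(ts, ts[1:]), key=lambda pair: pair[1] - pair[0])

def average_sleep_span(days):
    """Average the lower and upper bounds of every day's longest gap."""
    spans = [longest_gap(day) for day in days]
    lower = mean(start for start, _ in spans)
    upper = mean(end for _, end in spans)
    return lower, upper

# Illustrative input: minutes after noon (assumption), two days of posts.
days = [
    [0, 300, 690, 1200, 1400],   # longest gap: 690 -> 1200
    [60, 400, 710, 1180, 1300],  # longest gap: 710 -> 1180
]
print(average_sleep_span(days))  # lower bound 700, upper bound 1190
```

With the noon-based offsets assumed here, the averaged span runs from 700 to 1190 minutes after noon, that is, from 23:40 to 07:50.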

16 Methods: Statistics method in discovering sleeping time pattern from sparse data. Fewer than 1% of Weibo users satisfy the IDEAL CONDITIONS. Statistics: over a 1440-minute period (60 × 24), the daily data of the time series are united by section (week/month/quarter…): Day 1, Day 2, …, Day n.

17 Find the longest inactive time span to discover sleeping time from dense data - United time series by month. To handle sparse data, the daily data of the time series are united, for example by month, into new sequences $T_m = \{t_{m1}, t_{m2}, \dots, t_{mn}\}$. In the time series data of each month $m$, the lengths of the spans $D_m = \{d_{m1}, d_{m2}, \dots, d_{mn}\}$ determined by every two adjacent points can be defined, and long inactive spans can then easily be obtained from the time series data of the corresponding month. Figure: time points $t_{m1}, \dots, t_{mn}$ with the spans $d_{m1}, \dots, d_{mn}$ between them ($d_{mn}$ is drawn at both ends of the axis).

18 Find the longest inactive time span to discover sleeping time from dense data - Subsequence statistics. The sparsity of the time series data differs from user to user, so the chosen length of the section for doing statistics cannot match all circumstances. Subsequence statistics: figure out all inactive spans of time; move the start point and end point of the subsequence; make every subsequence an ideal condition. Accuracy +, efficiency -: it costs too much time.

19 Further analysis: Users with fake or unclear location labels. Sina Weibo users label their location in the profile without verification. Some of them label the location as 'unknown' or give fake locations. 12 users are judged to have obviously fake locations in their profiles. A fake location within the real time zone cannot be detected (Beijing and Shanghai are not viewed as different locations here). The list is as follows: 9775521, 1025882825, 1092620107, 1191220232, 1194431627, 1215914717, 1222524197, 1228151340, 1686658697, 1686719832, 1697717391, 1713138055

20 Verified user: Xiaosong Gao, a famous musician of mainland China. Labels his location as 'The United States, Oversea' (GMT-8~GMT-5).

21 The result shows that he goes to sleep at about 0 a.m. (GMT+8) and wakes up at about 8 a.m. (GMT+8). The corresponding time zone is GMT+7. Therefore, the location Mr. Gao gives in his profile is judged to be fake, since it differs from the detected time zone.
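The slides do not state how a sleeping span is mapped to a time zone. One possible mapping, shown here only as a sketch under our own assumption, is to suppose that the middle of a typical night's sleep falls at 03:00 local time and to pick the offset that moves the observed midpoint there. The function names and the `reference_midpoint` parameter are hypothetical.

```python
def sleep_midpoint(start_hour, end_hour):
    """Midpoint of a sleep span given in hours of day; the span may cross midnight."""
    duration = (end_hour - start_hour) % 24
    return (start_hour + duration / 2) % 24

def estimate_utc_offset(start_hour, end_hour, data_offset=8, reference_midpoint=3.0):
    """Estimate the user's UTC offset from a sleep span expressed in the data's zone.

    data_offset: UTC offset of the recorded timestamps (GMT+8 for Weibo).
    reference_midpoint: ASSUMED local hour of the middle of sleep.
    The result is wrapped into the range [-12, 12).
    """
    mid = sleep_midpoint(start_hour, end_hour)
    return (data_offset + reference_midpoint - mid + 12) % 24 - 12

# Sleep from about 0 a.m. to about 8 a.m. (GMT+8), as on this slide.
print(estimate_utc_offset(0, 8))  # 7.0, i.e. GMT+7
```

Under this assumed reference the slide's example lands on GMT+7; a different reference hour would shift every estimate by the same amount, so the value is best read as approximate.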

22 Further analysis: Detection of movement between different time zones. There are users who study abroad or have business overseas. They travel far, but they do not frequently change the location label on their profile page. 33 users are likely to have made an obvious move (between locations in different time zones). Movement within a time zone cannot be detected. The list is as follows: 1025722364, 1025882825, 1043898503, 1059500987, 1061252391, 1094551220, 1120630263, 1152571540, 1159282133, 1191220232, 1193715355, 1214305597, 1216517652, 1217907573, 1218288163, 1221281087, 1222524197, 1222838993, 1229162931, 1682771764, 1689243304, 1689647122, 1693141730, 1693775877, 1695248214, 1697033804, 1697370827, 1702062500, 1703580823, 1706608200, 1711754322, 1711962552, 1712058193

23 Verified user: Xiaoqian Liu, a reporter of CCTV Brazil. Labels his location as Brazil, Oversea (GMT-4~GMT-2).

24 The result indicates that, before Sept. 2011, he went to bed at about 0 a.m. (GMT+8) and woke up at about 6 a.m. (GMT+8). The corresponding time zone is GMT+8~GMT+9. After Sept. 2011, the time he starts sleeping changed to about 9 a.m. (GMT+8) and the sleep period ends at about 4 p.m. (GMT+8). The corresponding time zone is GMT-3.5~GMT-2.5. Thus, the obvious movement is detected and the time of this change can be estimated.
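Detecting such a move can be sketched as comparing the sleep midpoints of consecutive sections and flagging a large jump. This is only an illustration under assumptions: the function names, the 3-hour threshold, and the month labels are hypothetical, and the midpoints below are simply derived from the sleeping spans quoted on this slide.

```python
def circular_diff(a, b, period=24.0):
    """Signed smallest difference b - a on a circle, in [-period/2, period/2)."""
    return (b - a + period / 2) % period - period / 2

def detect_shifts(section_midpoints, threshold_hours=3.0):
    """Flag consecutive sections whose sleep midpoints differ by at least the threshold.

    section_midpoints: chronological list of (label, midpoint_hour) pairs.
    Returns a list of (previous_label, label, shift_in_hours).
    """
    shifts = []
    pairs = zip(section_midpoints, section_midpoints[1:])
    for (prev_label, prev_mid), (label, mid) in pairs:
        shift = circular_diff(prev_mid, mid)
        if abs(shift) >= threshold_hours:
            shifts.append((prev_label, label, shift))
    return shifts

# Midpoints in GMT+8: 03:00 for sleep from 0 to 6 a.m., 12:30 for 9 a.m. to 4 p.m.
# The month labels are illustrative only.
midpoints = [("2011-07", 3.0), ("2011-08", 3.0), ("2011-10", 12.5), ("2011-11", 12.5)]
print(detect_shifts(midpoints))  # [('2011-08', '2011-10', 9.5)]
```

The circular difference keeps a jump across midnight from being counted as a shift of almost a full day.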

25 Related Applications. The study can be extended to other data sets. Web visiting logs on the web server: users or robots? IP address of users or IP address of a proxy/VPN? Data of user login activities: don't lock my account when I use a proxy, please… It may be useful for recommendation: users who share the same sleeping time are more likely to meet on a social network, so why not recommend their feeds first? Watch your health!

26 Conclusion. A new open problem related to the relationship between sleeping time and micro-blogging activities is presented. This study associates the pattern of sleep with the time zone users live in. For this problem, preliminary simple solutions are presented in this paper. Based on experimental results on a real data set, the proposed solutions are shown to be both efficient and effective.

27 Thanks. Questions welcome. A data set is open to the public. The URL is: http://www.haoranyu.com/research/weibosleep

