“Content is NOT king”



3 “Content is NOT king”

4 How to search content?
[Figure: number of TV channels over time. Broadcast (1950): 3; analog cable (1980): 40; digital cable (1995): 100; Internet (today): infinite.]

5 Infinite choice = overwhelming confusion. Filters are required to connect users with content that appeals to their interests.

6
- Trends in video services
- Users generate new videos
- Users help each other find videos
- Need to understand users and content
- Video characteristics in YouTube
- User behavior and potential for recommendations

7 “Bite-size bits for high-speed munching” [Wired magazine, Mar. 2007]
- Plethora of YouTube clones
- User-generated content (UGC) is very different. How different?

8
- Massive production scale: YouTube takes 15 days to produce as much video as the 120 years' worth of movies in IMDb.
- Extreme publishers: 1,000 uploads over a few years vs. 100 movies over 50 years.
- Short video length: 30 sec to 5 min vs. 100-min movies in LoveFilm.
The rest: consumption patterns.

9
- Despite Web 2.0 features, user participation remains low.
- Only 0.16% to 0.22% of viewers rate videos or comment.
- 47% of videos have pointers from external sites, but requests from such sites account for less than 3% of total views.

10
Goals
- Potential for recommendation systems?
- Popularity evolution
- Content duplication
Data
- Crawled YouTube and other UGC systems
- Metadata: video ID, length, views
- 1.6M Entertainment videos, 250K Science videos

11 Static popularity characteristics and the underlying mechanism

12 [Figure: fraction of aggregate views vs. normalized video ranking]
- The 10% most popular videos account for 80% of total views.
- Other online VoD systems show a smaller skew.
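
The skew figure above (top 10% of videos, 80% of views) is a simple ratio over ranked view counts. A minimal sketch of that calculation follows; the function name `top_share` and the sample counts are illustrative assumptions, not taken from the slides or the dataset.

```python
def top_share(view_counts, top_fraction=0.10):
    """Return the fraction of all views that go to the most-viewed videos.

    view_counts: one non-negative view count per video (hypothetical input).
    top_fraction: share of videos treated as "popular" (0.10 = top 10%).
    """
    if not view_counts:
        raise ValueError("view_counts must not be empty")
    ranked = sorted(view_counts, reverse=True)
    # Always keep at least one video in the "popular" set.
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Made-up example: 10 videos, one of which dominates.
sample_counts = [80, 5, 5, 2, 2, 2, 1, 1, 1, 1]
share = top_share(sample_counts)
```

With these made-up counts the single most-viewed video (the top 10%) holds 80 of 100 views.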

13 Rich-get-richer principle: if a video has K views, then users will watch it at a rate proportional to K.
- Power law: y = x^a
- Examples: word frequency, citations of papers, scale of earthquakes, web hits
[Figure: city population (log) vs. frequency (log)]
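
The rich-get-richer feedback loop can be illustrated with a small simulation in which each new request picks a video with probability proportional to its current view count. This is a sketch only: the fixed video catalogue, the one-view starting count, and the function name are assumptions made here, and the toy model is meant to show the feedback loop rather than to reproduce a power law exactly.

```python
import random


def simulate_rich_get_richer(num_videos, num_requests, seed=0):
    """Simulate requests where a video with K views is chosen with weight K."""
    rng = random.Random(seed)
    # Assumption: every video starts with one view so that it can be picked.
    views = [1] * num_videos
    for _ in range(num_requests):
        # Pick one video with probability proportional to its current views.
        chosen = rng.choices(range(num_videos), weights=views, k=1)[0]
        views[chosen] += 1
    # Return view counts ordered from most to least viewed.
    return sorted(views, reverse=True)


ranked_views = simulate_rich_get_richer(num_videos=50, num_requests=2000, seed=1)
```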

14 Straight-line waist, truncated at both ends.

15 Why do popular videos deviate from the power law?
- Fetch-at-most-once [SOSP 2003]
- Users fetch an immutable object only once, whereas they visit popular web sites many times.
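
The effect of fetch-at-most-once on the most popular videos can be sketched with a toy simulation. Everything here is an assumption for illustration: user interest follows 1/rank weights, and a repeat pick of an already-watched video is simply dropped instead of being redirected to another video. It is not the model from the cited SOSP paper.

```python
import random


def simulate_views(num_users, num_videos, requests_per_user,
                   fetch_at_most_once, seed=0):
    """Count views per video when users pick videos with Zipf-like interest."""
    rng = random.Random(seed)
    # Assumed interest model: the video at rank r is chosen with weight 1/r.
    weights = [1.0 / rank for rank in range(1, num_videos + 1)]
    views = [0] * num_videos
    for _ in range(num_users):
        picks = rng.choices(range(num_videos), weights=weights,
                            k=requests_per_user)
        if fetch_at_most_once:
            # A repeat pick of an already-watched video adds no new view.
            picks = set(picks)
        for video in picks:
            views[video] += 1
    return views


capped = simulate_views(100, 500, 20, fetch_at_most_once=True, seed=1)
uncapped = simulate_views(100, 500, 20, fetch_at_most_once=False, seed=1)
```

Under fetch-at-most-once no video can collect more views than there are users, which flattens the head of the distribution relative to the uncapped run.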

16
- Natural shape is curved
- Sampling bias or pre-filters: publishers tend to upload interesting videos
- Information filtering or post-filters: search results or suggestions favor popular items

17 Videos exposed longer to the filtering effect appear more truncated.
[Figure: x-axis is video rank]

18 Matlab curve fitting for Science videos
[Figure: Science videos fitted with Zipf, log-normal, exponential, and Zipf + exponential cutoff]

19 Matlab curve fitting for Science videos
[Figure: Science videos fitted with Zipf, log-normal, exponential, and Zipf + exponential cutoff]
Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and the truncation is due to bottlenecks.
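
The slides report Matlab curve fitting but do not say how the fit was done. As one possible illustration, a Zipf curve with an exponential cutoff, views(r) = C * r^(-a) * exp(-b * r), becomes linear in log space and can be fitted by ordinary least squares. The sketch below uses only the Python standard library; the function names, the log-space least-squares choice, and the synthetic data are all assumptions, not the authors' procedure.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_zipf_exp_cutoff(views):
    """Fit views(r) = C * r**(-a) * exp(-b * r) by least squares in log space.

    views: strictly positive view counts, most-viewed first (rank 1 first).
    Returns (C, a, b).
    """
    # log(views) = log(C) - a * log(r) - b * r, which is linear in (log C, a, b).
    rows = [(1.0, math.log(r), float(r)) for r in range(1, len(views) + 1)]
    targets = [math.log(v) for v in views]
    # Normal equations (X^T X) beta = X^T y with beta = (log C, -a, -b).
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(3)]
    log_c, neg_a, neg_b = solve_linear_system(xtx, xty)
    return math.exp(log_c), -neg_a, -neg_b


# Synthetic data generated from known parameters, to check the fit recovers them.
synthetic = [1000.0 * r ** -0.8 * math.exp(-0.01 * r) for r in range(1, 201)]
c_hat, a_hat, b_hat = fit_zipf_exp_cutoff(synthetic)
```

Setting b to zero in this model gives a pure Zipf curve, so the fitted b indicates how strongly the tail is truncated.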

20 “Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail]
[Figure: views vs. rankings for Entertainment videos] 40% additional views!
How? Personalized recommendation, enriched metadata, abundant videos.

21 Relationship between popularity and age

22
- So far, we focused on static popularity; now we focus on popularity dynamics.
- How are requests on any given day distributed across video age?
- Data: 6-day daily trace of Science videos
- Step 1: Group videos requested at least once by age
- Step 2: Count request volume per age group
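
The two steps above can be sketched as a small aggregation over one day of the trace. The record layout `(video_id, age_days, requests)`, the age-bucket boundaries, and the function names are hypothetical choices made for this sketch; the slides do not specify the trace format.

```python
from collections import defaultdict

# Hypothetical age buckets (upper bound in days, label); not from the slides.
AGE_BUCKETS = [(1, "<= 1 day"), (7, "<= 1 week"), (30, "<= 1 month")]
OLDEST_LABEL = "> 1 month"


def age_group(age_days):
    """Map a video's age in days to the label of its age bucket."""
    for upper, label in AGE_BUCKETS:
        if age_days <= upper:
            return label
    return OLDEST_LABEL


def requests_by_age(daily_trace):
    """Return {age group: (num videos, request volume, share of requests)}.

    daily_trace: iterable of (video_id, age_days, requests) for one day.
    """
    videos = defaultdict(set)
    volume = defaultdict(int)
    for video_id, age_days, requests in daily_trace:
        if requests < 1:
            continue  # Step 1: keep only videos requested at least once
        group = age_group(age_days)
        videos[group].add(video_id)
        volume[group] += requests  # Step 2: count request volume per group
    total = sum(volume.values())
    return {g: (len(videos[g]), volume[g], volume[g] / total) for g in volume}


# Made-up one-day trace: video "b" had no requests and is skipped.
trace = [("a", 0, 5), ("b", 3, 0), ("c", 45, 15)]
summary = requests_by_age(trace)
```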

23 User preference is relatively insensitive to age: 80% of requests are for videos older than a month.
The probability of a video being watched is 43%, 18%, 17%, and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively.

24 Level of duplication and birth of duplicates

25
- Alias: an identical or similar copy of the same content
- Aliases dilute the popularity of a single event: views are distributed across multiple copies
- This creates difficulty for recommendation and ranking systems
- Test with 51 volunteers, who found aliases using keyword search
- Identified 1,224 aliases for 184 original videos

26
- Popularity is diluted by up to a few orders of magnitude.
- Aliases often got more requests than the original (e.g., one alias got more than 1,000 times as many requests).
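
Dilution can be quantified per piece of content by comparing the original's views with the views of its aliases. The sketch below is an illustration only: the function name, the returned fields, and the sample numbers are assumptions, not measurements from the study.

```python
def alias_dilution(original_views, alias_views):
    """Summarize how the views of one piece of content split across copies.

    original_views: view count of the original video.
    alias_views: list of view counts, one per alias (may be empty).
    """
    total = original_views + sum(alias_views)
    return {
        # Views the content would have if all copies were merged.
        "total_views": total,
        # Fraction of those views that the original actually received.
        "original_share": original_views / total if total else 0.0,
        # How many times more views the biggest alias got than the original.
        "max_alias_to_original": (max(alias_views) / original_views
                                  if alias_views and original_views else None),
    }


# Made-up example: the largest alias has five times the original's views.
stats = alias_dilution(100, [300, 500, 100])
```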

27
- A significant number of aliases appear within one week.
- Within the first day after the original video is posted, there are sometimes more than 80 aliases.

28
- UGC is a new form of video social interaction
- User interaction remains low
- Lots of potential for social recommendations

29 Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html

