Common Crawl : enabling machine-scale analysis of web data Lisa Green Kurt Bollacker Jordan Mendelson IIPC 2014-05-19.


1 Common Crawl : enabling machine-scale analysis of web data Lisa Green Kurt Bollacker Jordan Mendelson IIPC

2

3 Photo license: Public Domain Origin:

4 Photo license: CC BY-SA

5 Photo license: CC-BY-NC https://www.flickr.com/photos/malloreigh/

6 Photo license: CC-BY-SA Origin:

7 Enable machine-scale access and analysis of web data for everyone. “Those who do not understand Unix are condemned to reinvent it, poorly.” - Henry Spencer

8 Web Data Commons: “Extracting Structured Data from the Common Crawl”

9 WikiEntities (Han Xiaogang) In What Context Is a Term Referenced?

10 WikiEntities Example: Discography Who are the most popular artists?

11 How Easily Can Google Analytics Track Our Browsing? (S. Merity, C. Hornbaker)

12 Data Publica: Finding French Open Data

13 Commercial Applications: Improved Spell Checking

14 may be too domain specific Photo license: CC-BY-NC-ND https://www.flickr.com/photos/blueforce4116/

15 Photo license: CC BY-SA

16 Photo license: CC-BY Origin:

17

18

19

20

21

22 Image license: CC BY-SA https://www.flickr.com/photos/xdxd_vs_xdxd/

23 Photo license: CC-BY-SA https://www.flickr.com/photos/hackny/

24

25

26 Thank You

