Build the NY Times Subject Headings and Topics in the Cloud. Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community. July 4, 2011.

1 Build the NY Times Subject Headings and Topics in the Cloud. Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community. July 4, 2011

2 Preface For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, they began to publish this vocabulary as linked open data. The New York Times also uses approximately 30,000 tags to power their Times Topics Pages. It is their intention to publish all of these tags as linked open data. Today AOL Government publishes both of those together as linked open data in Spotfire so our readers can more readily browse, search, and download these invaluable data sets!

3 data.nytimes.com http://data.nytimes.com/ The People data set is a 14 MB RDF file! These listings can be screen-scraped into Excel! See next slide
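The slides get the listings into Excel by screen scraping. As an alternative (not the author's method), a minimal sketch of reading labels straight out of an RDF/XML file and writing them as CSV, which Excel or Spotfire can import, might look like the following. It assumes each record is a top-level element carrying an `rdf:about` URI and a `skos:prefLabel` child; the two-record sample and its `example-id` URIs are hypothetical stand-ins, not real NYT data.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical two-record sample standing in for the real 14 MB file.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://data.nytimes.com/example-id-1">
    <skos:prefLabel xml:lang="en">Example, Person A</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://data.nytimes.com/example-id-2">
    <skos:prefLabel xml:lang="en">Example, Person B</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
"""


def extract_labels(rdf_xml):
    """Return (uri, label) pairs for every top-level record that has both."""
    root = ET.fromstring(rdf_xml)
    rows = []
    for node in root:
        uri = node.get(f"{{{RDF_NS}}}about")
        label = node.find(f"{{{SKOS_NS}}}prefLabel")
        if uri and label is not None and label.text:
            rows.append((uri, label.text.strip()))
    return rows


def to_csv(rows):
    """Serialize the pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["uri", "label"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_labels(SAMPLE_RDF)))
```

For a file as large as 14 MB, `ET.iterparse` would keep memory use lower than loading the whole document at once; the structure assumed here should be checked against the actual download before relying on it.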

4 Build Your Own NYT Linked Data Application March 30, 2010, 1:21 PM By EVAN SANDHAUS – That’s It?: So there you have it: all it takes to build a simple linked data application with New York Times Linked Open Data. But remember: this post just focuses on the highlights. We encourage you to take a closer look at the code and dig into some of the more advanced features we didn’t discuss. We hope that you share our excitement about the possibilities of linked data, and we look forward to seeing what you create! http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/

5 Alumni in the News http://data.nytimes.com/schools/schools.html http://topics.nytimes.com/top/reference/timestopics/people/l/frank_lorenzo/index.html http://select.nytimes.com//2005/10/15/business/15nocera.html Opens and Closes Snippet

6 “Who Went Where” Code http://data.nytimes.com/code/schools.html 833 lines of code!

7 Subject Headings http://data.nytimes.com/home/a.html See next slide

8 Subject Headings http://data.nytimes.com/86075200336035840002 See next slide

9 Using Our Linked Data http://data.nytimes.com/home/about.html

10 Times Topics http://topics.nytimes.com/topics/reference/timestopics/index.html The New York Times uses approximately 30,000 tags to power our Times Topics Pages. It is our intention to publish all of these tags as linked open data. See next page

11 Times Topics http://topics.nytimes.com/topics/reference/timestopics/all/a/index.html See next page

12 Times Topics http://topics.nytimes.com/top/news/business/companies/a-m-castle-and-company/index.html

13 Spotfire
Describe the chart and how it is made:
– The Spotfire chart was made by screen scraping the NY Times Subject Headings and Topics into an Excel spreadsheet and importing it into Spotfire. The author decided to place the two listings side by side, as Tufte suggests, to facilitate comparisons. The author also decided to re-create the summary table of Subject Heading categories to see how much change had occurred between January 13, 2010, and July 4, 2011 (very little).
How it succeeds or falls short:
– This single Spotfire chart makes the two lists at the NY Times sortable (click on column headers), searchable (use Filters and facets), and downloadable (click on the down arrow in the table header in the Spotfire Web Player).
Add any tips for improving:
– The NY Times Topics need URLs (25,389), and the author will find a way to automate that task and will soon finish adding the URLs for NY Times reporters by hand.
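One way the URL task mentioned above might be automated is sketched below. It is only an assumption, inferred from the single company URL shown on slide 12, that a topic page address is a category path followed by a hyphenated lowercase form of the topic name; the display name "A. M. Castle and Company" and the helper names `slugify` and `topic_url` are hypothetical, and the real site's rules may differ.

```python
import re

# Prefix copied from the slide 12 URL; everything after it is an assumed pattern.
BASE = "http://topics.nytimes.com/top"


def slugify(name):
    """Lowercase a topic name and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def topic_url(category_path, name):
    """Build a candidate topic URL from a category path and a topic name."""
    return f"{BASE}/{category_path.strip('/')}/{slugify(name)}/index.html"


if __name__ == "__main__":
    # Hypothetical spreadsheet rows: (category path, topic name).
    rows = [("news/business/companies", "A. M. Castle and Company")]
    for category, name in rows:
        print(topic_url(category, name))
```

For the sample row this reproduces the slide 12 address. Generated URLs are only candidates: each one would still need to be checked against the live site, and names with accented characters would lose those characters under this simple rule.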

14 Spotfire PC Desktop Spotfire

