
1 Foodie Marc Greenberg – mgreenberg@cs.usfca.edu A study in collecting and parsing recipes…

2 I'm Hungry!...
Enter ingredients at your disposal
Foodie lists recipe options
Rate recipes
It learns what you like, and your eating habits… (that's another presentation)

3 But We Need To Populate The Device
Food and recipe database needed
Collect and parse recipes instead of manual entry
Recipe collection from different sources
–Predictable vs. non-predictable URLs
–Regular vs. irregular recipe format

4 Collecting Recipes
Two types of crawlers (written in Python)
–URL Substitution: Epicurious.com, http://www.epicurious.com/recipes/recipe_views/printer_friendly/11311
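A minimal sketch of the URL-substitution idea, assuming the trailing number in the Epicurious printer-friendly URL is a recipe ID and that neighbouring IDs are worth trying. The slide shows only one URL, so the ID range and the names `URL_PATTERN` and `candidate_urls` are illustrative, not the author's crawler.

```python
# Printer-friendly URL from the slide, with the trailing number turned into a slot.
URL_PATTERN = "http://www.epicurious.com/recipes/recipe_views/printer_friendly/{recipe_id}"

def candidate_urls(start_id, count):
    """Yield `count` URLs by substituting consecutive IDs into the pattern.

    Assumption: recipe IDs are integers and nearby IDs exist. A real crawler
    would fetch each URL and skip the ones that do not return a recipe.
    """
    for recipe_id in range(start_id, start_id + count):
        yield URL_PATTERN.format(recipe_id=recipe_id)

urls = list(candidate_urls(11311, 3))
for url in urls:
    print(url)
```

Because the URLs are predictable, no link parsing is needed; the crawler only has to walk an ID range.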

5 Collecting Recipes
Two types of crawlers (written in Python)
–URL Substitution: Epicurious.com, http://www.epicurious.com/recipes/recipe_views/printer_friendly/11311
–Link Crawler: RecipeSource.com (serving, title, minute, hour, .6) http://www.recipesource.com/fgv/rice/03/rec0362.html
FoodNetwork.com (recipe, serving, yield, time, print, minute, .8) http://www.foodnetwork.com/food/recipes/recipe/0,,FOOD_9936_17273,00.html
Need to identify good and bad pages
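One way to separate good pages from bad ones can be sketched as a keyword score. This assumes the word lists on the slide are per-site keywords and that the trailing number (.6, .8) is the minimum fraction of keywords a page must contain. The slide does not say this, so treat that reading, and the names `keyword_score` and `looks_like_recipe`, as assumptions.

```python
def keyword_score(page_text, keywords):
    """Return the fraction of keywords found in the page (case-insensitive substring match)."""
    text = page_text.lower()
    hits = sum(1 for keyword in keywords if keyword in text)
    return hits / len(keywords)

def looks_like_recipe(page_text, keywords, threshold):
    """Assumed rule: a page is 'good' when its keyword score reaches the threshold."""
    return keyword_score(page_text, keywords) >= threshold

# Keywords and threshold as listed on the slide for RecipeSource.com.
RECIPESOURCE_KEYWORDS = ["serving", "title", "minute", "hour"]
RECIPESOURCE_THRESHOLD = 0.6

# Made-up page texts, for illustration only.
recipe_page = "Title: Fried Rice. Servings: 4. Cook for 10 minutes."
index_page = "Index of rice recipes"

print(looks_like_recipe(recipe_page, RECIPESOURCE_KEYWORDS, RECIPESOURCE_THRESHOLD))
print(looks_like_recipe(index_page, RECIPESOURCE_KEYWORDS, RECIPESOURCE_THRESHOLD))
```

Substring matching lets "serving" match "Servings" without stemming, at the cost of occasional false hits inside longer words.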

6 Finding the Ingredients
Induction wrappers
Layout
Character and grammar structure

7 Parsing
Recipe metadata
–Title, summary, serving size, prep time, etc.
Ingredient list
–Amount, unit, food item
Directions
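The amount / unit / food item split for an ingredient line could look roughly like the sketch below. It assumes amounts are whole numbers, fractions, or mixed numbers at the start of the line, and it uses a small hard-coded unit list. Both are simplifications chosen for illustration, and `parse_ingredient` is a hypothetical helper, not the parser from the talk.

```python
import re
from fractions import Fraction

# Illustrative unit list; a real parser would need many more units and abbreviations.
UNITS = {"cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
         "ounce", "ounces", "pound", "pounds"}

# Optional leading amount: mixed number ("1 1/2"), fraction ("1/2"), or integer ("2").
LINE = re.compile(r"^\s*(?P<amount>\d+\s+\d+/\d+|\d+/\d+|\d+)?\s*(?P<rest>.*)$")

def parse_ingredient(line):
    """Split one ingredient line into amount, unit, and food item."""
    match = LINE.match(line)
    amount_text = match.group("amount")
    rest = match.group("rest").strip()
    # "1 1/2" becomes Fraction(1) + Fraction(1, 2).
    amount = sum(Fraction(part) for part in amount_text.split()) if amount_text else None
    unit = None
    words = rest.split(None, 1)
    if words and words[0].lower().rstrip(".") in UNITS:
        unit = words[0].lower().rstrip(".")
        rest = words[1] if len(words) > 1 else ""
    return {"amount": amount, "unit": unit, "item": rest}

for line in ["1 1/2 cups flour", "2 eggs", "salt to taste"]:
    print(parse_ingredient(line))
```

Lines without a leading amount or a known unit fall through with `None` in those fields, which keeps irregular lines such as "salt to taste" from being dropped.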

8 Existing Software
MasterCook™, leading software product
Manual import features
Slow full text search
Starting database has just over 8000 recipes

9 Questions?

