Learning Patterns on the World Wide Web
Andrew Hogue
Advisor: David Karger
October 17, 2003
Agenda
- What is a pattern?
- How do we make one?
- How do we use it?
- Why do you want one?
- Demo
What is a pattern?
- Objects in the world have certain semantic properties
- A pattern is a way of recognizing the semantic properties of an object we’ve seen before
- A pattern is a structure with semantic slots to be filled in
Example – Books
Define an object’s semantics (ontology):
- Class: Book
- Property: Author
- Property: Title
- Property: Price
- Property: Publisher
- Property: ISBN...
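One way to picture the ontology above is as a record type with one slot per property. The sketch below is only an illustration: the `Book` dataclass, its lowercase field names, and the `unfilled_slots` helper are assumptions made for this example, not part of the talk, and the slide's property list is open-ended, so only the five named properties appear.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Book:
    """Class: Book, with one optional slot per property named on the slide."""
    author: Optional[str] = None
    title: Optional[str] = None
    price: Optional[str] = None
    publisher: Optional[str] = None
    isbn: Optional[str] = None


def unfilled_slots(obj) -> List[str]:
    """Return the names of the slots that have not been filled in yet."""
    return [f.name for f in fields(obj) if getattr(obj, f.name) is None]


# A partially recognized book: only two slots are known so far.
book = Book(title="An Example Title", author="A. N. Author")
print(unfilled_slots(book))  # ['price', 'publisher', 'isbn']
```

Keeping every slot optional reflects the idea that a pattern may recognize only some of an object's properties on a given page.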
Creating a Pattern
- Choose positive examples
- Find best mapping between examples
- Merge mapped elements and assign semantic labels
- Eliminate unmapped elements
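The four steps can be sketched on a deliberately simplified representation. The slides do not say how the best mapping is computed, so this example swaps in Python's `difflib.SequenceMatcher` over flat tag sequences as a stand-in. The `(tag, text)` element format, the `learn_pattern` function, and the `labels` argument (semantic labels chosen by the user for positions in the first example) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def learn_pattern(example_a, example_b, labels):
    """Build a pattern from two positive examples.

    Each example is a list of (tag, text) pairs. `labels` maps a position
    in example_a to a semantic label such as "Title".
    Returns a list of (tag, kind, value) triples.
    """
    tags_a = [tag for tag, _ in example_a]
    tags_b = [tag for tag, _ in example_b]

    # Find a mapping between the examples by aligning their tag sequences
    # (a stand-in for whatever "best mapping" the real system computes).
    matcher = SequenceMatcher(None, tags_a, tags_b, autojunk=False)

    pattern = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            tag, text_a = example_a[i]
            text_b = example_b[j][1]
            # Merge the mapped pair into one pattern element.
            if i in labels:
                pattern.append((tag, "slot", labels[i]))    # labeled slot
            elif text_a == text_b:
                pattern.append((tag, "literal", text_a))    # constant text
            else:
                pattern.append((tag, "any", None))          # unlabeled, varying
    # Unmapped elements never appear in a matching block, so they are dropped.
    return pattern


# Two hypothetical positive examples; the <img> appears in only one of them.
example_a = [("h2", "First Title"), ("span", "by"), ("a", "First Author"),
             ("img", ""), ("b", "$10.00")]
example_b = [("h2", "Second Title"), ("span", "by"), ("a", "Second Author"),
             ("b", "$12.50")]
labels = {0: "Title", 2: "Author", 4: "Price"}

for element in learn_pattern(example_a, example_b, labels):
    print(element)
```

With these inputs the unmapped `img` element is eliminated, the shared "by" text is kept as a literal, and the three labeled positions become slots.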
Matching Patterns
- Given a pattern with slots and a page to search
- Look for items on page with same structure
- Map pattern slots to page text
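A minimal sketch of the matching step, under strong simplifying assumptions: the page is a flat list of `(tag, text)` pairs, the pattern is a list of `(tag, kind, value)` triples, and "same structure" means a contiguous run of elements with the same tags. The `match_pattern` function and both data formats are hypothetical and are not taken from the talk.

```python
def match_pattern(pattern, page):
    """Return one dict of slot values for each place the pattern fits the page."""
    if not pattern:
        return []
    results = []
    n = len(pattern)
    for start in range(len(page) - n + 1):
        window = page[start:start + n]
        bindings = {}
        for (p_tag, kind, value), (tag, text) in zip(pattern, window):
            if p_tag != tag:
                break                      # structure differs
            if kind == "literal" and value != text:
                break                      # constant text differs
            if kind == "slot":
                bindings[value] = text     # map the slot to page text
        else:
            # No break occurred: every pattern element matched.
            results.append(bindings)
    return results


pattern = [("h2", "slot", "Title"), ("span", "literal", "by"),
           ("a", "slot", "Author"), ("b", "slot", "Price")]

page = [("h1", "Results"),
        ("h2", "Some Book"), ("span", "by"), ("a", "Some Writer"), ("b", "$5.00"),
        ("h2", "Other Book"), ("span", "by"), ("a", "Other Writer"), ("b", "$7.25"),
        ("p", "footer")]

for item in match_pattern(pattern, page):
    print(item)
# {'Title': 'Some Book', 'Author': 'Some Writer', 'Price': '$5.00'}
# {'Title': 'Other Book', 'Author': 'Other Writer', 'Price': '$7.25'}
```

A sliding window over a flat sequence is the simplest possible structural comparison; a real page is a tree, so an actual matcher would likely compare subtrees rather than runs of tags.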
Applications
- Extract search engine results
- Extract news headlines
- Watch sites for updates
- Reformat sites for easier reading
- Monitor bank account balances
Demo
More Information