The Case for Browser Provenance Daniel W. Margo and Margo Seltzer Harvard School of Engineering and Applied Sciences
Overview Problem: Browser Data Management Solution: Provenance for Web Browsers Use Cases Details and Challenges Implementation
The Modern Browser: A Super-Application Originally a distributed document reader. But now most documents are distributed. And the definition of “document” has changed: –Webmail –YouTube –Google Apps It is difficult for users to manage all this data. –e.g., recall a specific web page.
Browser Data Management (I) A “little big data” problem… –My history: ~25k objects in ~2 months. –Tractable for computers, but not for users. Traditional solution: Bookmarks. –Requires users to tag their data in advance… –…and to manage the bookmarks. Advanced solutions: –History Search (Google Chrome’s “New Tab” page) –Autocompletion (form history, saved passwords)
Browser Data Management (II) Firefox 3’s “Smart Location Bar”. Most solutions are powered by history and usage statistics. “History and usage statistics” = provenance.
Traditional Browser History
Web Graphs (Firefox 3 Places)
Use Case: Contextual History Search Most history search is textual. Edges imply contextual relationships. –E.g., “rosebud” → “Citizen Kane”. 2-phase contextual search (Shah et al.): –Perform a textual history search. –Then, push the weight of results to neighbors. Similar to modern web search… –And good for the same reasons.
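The two-phase search on this slide can be sketched roughly as follows. This is a minimal illustration, not the algorithm from Shah et al.: the function name `contextual_search`, the `damping` parameter, the title-count scoring, and the sample URLs are all assumptions made for the example.

```python
from collections import defaultdict


def contextual_search(pages, edges, query, damping=0.5):
    """Rank history entries by text match, then spread weight along graph edges.

    pages: dict mapping URL -> page title (hypothetical history store)
    edges: list of (source URL, destination URL) pairs from the history graph
    damping: assumed fraction of a hit's weight shared among its neighbors
    """
    q = query.lower()
    # Phase 1: plain textual search, scored by occurrences of the query in the title.
    text_score = {url: float(title.lower().count(q)) for url, title in pages.items()}

    # Treat edges as undirected so context flows both ways.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Phase 2: push a share of each textual hit's weight to its neighbors.
    final = dict(text_score)
    for url, score in text_score.items():
        if score > 0 and neighbors[url]:
            share = damping * score / len(neighbors[url])
            for n in neighbors[url]:
                final[n] = final.get(n, 0.0) + share

    # Highest score first; ties broken by URL for a stable order.
    ranked = sorted((u for u in final if final[u] > 0), key=lambda u: (-final[u], u))
    return [(u, final[u]) for u in ranked]


pages = {
    "search?q=rosebud": "rosebud - Search",
    "wiki/Citizen_Kane": "Citizen Kane",
    "news": "Daily News",
}
edges = [("search?q=rosebud", "wiki/Citizen_Kane")]
results = contextual_search(pages, edges, "rosebud")
```

Here the “Citizen Kane” page never mentions “rosebud”, yet it appears in the results because it is a neighbor of a page that does.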
Use Case: Personalizing Web Search Context is created by the user. –So a gardener relates “rosebud” → “flower”. –Frustrating if Google returns “Citizen Kane”. Browser could clarify context to search engine! –Naïve: Just insert “flower” into “rosebud” searches. –If engine had a better interface, we could do better. Personalization with privacy. –Browser knows more about user than cookies can. –No need to give third parties raw personal data.
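The naïve approach on this slide might look something like the sketch below. The helper names `context_term` and `personalize`, the stopword list, and the idea of picking the most frequent word from titles reached via the query are all assumptions chosen for illustration; the slides do not specify how the context term is selected.

```python
from collections import Counter

# Small illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to"}


def context_term(query, visited_titles):
    """Return the most common non-query word in titles the user reached from this query."""
    q_words = set(query.lower().split())
    counts = Counter(
        word
        for title in visited_titles
        for word in title.lower().split()
        if word not in q_words and word not in STOPWORDS
    )
    return counts.most_common(1)[0][0] if counts else None


def personalize(query, visited_titles):
    """Append the local context term to the query; only this string leaves the browser."""
    term = context_term(query, visited_titles)
    return f"{query} {term}" if term else query


titles = ["Rosebud flower care", "Pruning a rosebud flower", "Flower shop"]
augmented = personalize("rosebud", titles)
```

The history itself stays local: the search engine sees only the augmented query string, not the titles it was derived from.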
Use Case: Time-Contextual History Search Current histories can’t recreate prior state. –e.g., “were these two pages open simultaneously?” Time relationships… –Are natural: “rosebud, and I think I was also looking at gardening tools around that time.” –Narrow the search space a great deal. Related Work: –Gyllstrom and Soules’ “SeeTrieve” –Dumais et al.’s “Stuff I’ve Seen”
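If the browser recorded when each page was opened and closed, the “open simultaneously” question reduces to an interval-overlap test. The sketch below assumes such records exist; the `Visit` type, its fields, and the function names are hypothetical and not part of any existing browser history schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    url: str
    opened: float  # time the page was opened (e.g. seconds since epoch)
    closed: float  # time the page was closed; assumed to be recorded


def open_simultaneously(a, b):
    """True if the two visits' open intervals overlap."""
    return a.opened < b.closed and b.opened < a.closed


def open_around(visits, anchor_url):
    """URLs of pages that were open at the same time as any visit to anchor_url."""
    anchors = [v for v in visits if v.url == anchor_url]
    return sorted(
        {
            v.url
            for v in visits
            if v.url != anchor_url and any(open_simultaneously(v, a) for a in anchors)
        }
    )


visits = [
    Visit("rosebud", 0, 100),
    Visit("gardening-tools", 50, 150),
    Visit("news", 200, 300),
]
nearby = open_around(visits, "rosebud")
```

Filtering by overlap first leaves only a handful of candidates, which is how time relationships narrow the search space.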
Use Case: Download Lineage Need to know where data comes from. –For source attribution, finding updates, etc. URL is not always sufficient. –“This image came from…ImageShack!” This is exactly what provenance is for! –Just query ancestors!
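“Just query ancestors” can be illustrated with a breadth-first walk over a provenance graph. This is a sketch under assumptions: the `parents` mapping, the `ancestors` function, and the example URLs are invented for illustration and do not reflect a particular browser's storage format.

```python
from collections import deque


def ancestors(parents, node):
    """Return all ancestors of node, nearest first.

    parents: dict mapping each object to the list of objects it came from
             (hypothetical provenance store).
    """
    found = []
    visited = set()
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in visited:
            continue  # guard against cycles and shared ancestors
        visited.add(current)
        found.append(current)
        queue.extend(parents.get(current, []))
    return found


parents = {
    "~/Downloads/cat.jpg": ["http://img.example/abc123.jpg"],
    "http://img.example/abc123.jpg": ["http://forum.example/thread/42"],
    "http://forum.example/thread/42": ["http://search.example/?q=cats"],
}
lineage = ancestors(parents, "~/Downloads/cat.jpg")
```

The image URL alone points only at the hosting site; walking further up the graph recovers the forum thread and the search that led to it.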
Conclusion Browsers record many statistics. These statistics are provenance records. Provenance techniques can improve: –History search, via context. –Web search, via personalization. –Data management, via lineage. Some details in the paper. Excruciating details in future work.