Making Mashups with Marmite Jeff Wong Jason I. Hong Carnegie Mellon University.

Slides:



Advertisements
Similar presentations
1 CS 501 Spring 2002 CS 501: Software Engineering Lecture 11 Designing for Usability I.
Advertisements

RETRIEVING DATA FROM FCC LICENSE DATABASE Steps for obtaining query results, and importing it into MS Excel Spreadsheet.
End User Mashup Programming Environments Oleg Beletski HUT, Telecommunications Software and Multimedia Laboratory
Development and Evaluation of Emerging Design Patterns for Ubiquitous Computing Eric Chung Carnegie Mellon Jason Hong Carnegie Mellon Madhu Prabaker University.
DENIM: Finding a Tighter Fit with Web Design Practice James Lin, Mark W. Newman, Jason I. Hong, James A. Landay April 6, 2000 CHI 2000, The Hague
1 of 5 This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT. © 2007 Microsoft Corporation.
Russell Taylor Lecturer in Computing & Business Studies.
Copyright 2003 The McGraw-Hill Companies, Inc CHAPTER Application Software computing ESSENTIALS    
SM3121 Software Technology Mark Green School of Creative Media.
Tutorial 11: Connecting to External Data
Overview of Search Engines
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 16 Slide 1 User interface design.
Creating a SharePoint App with Microsoft Access Services
WebQuilt and Mobile Devices: A Web Usability Testing and Analysis Tool for the Mobile Internet Tara Matthews Seattle University April 5, 2001 Faculty Mentor:
XP New Perspectives on Microsoft Access 2002 Tutorial 71 Microsoft Access 2002 Tutorial 7 – Integrating Access With the Web and With Other Programs.
Xpantrac connection with IDEAL Sloane Neidig, Samantha Johnson, David Cabrera, Erika Hoffman CS /6/2014.
Lecturer: Ghadah Aldehim
Platforms for Learning in Computer Science July 28, 2005.
INTRODUCTION TO WEB DATABASE PROGRAMMING
Advanced File Processing
Object Matching With Faces CS460 Project Presentation By Sam Buyarski.
Basic tasks of generic software Chapter 3. Contents This presentation covers the following: – The basic tasks of standard/generic software including:
WorkPlace Pro Utilities.
CPSC 203 Introduction to Computers T59 & T64 By Jie (Jeff) Gao.
About Dynamic Sites (Front End / Back End Implementations) by Janssen & Associates Affordable Website Solutions for Individuals and Small Businesses.
Applications Software. Applications software is designed to perform specific tasks. There are three main types of application software: Applications packages.
Information Need Question Understanding Selecting Sources Information Retrieval and Extraction Answer Determina tion Answer Presentation This work is supported.
A Web Crawler Design for Data Mining
The Effectiveness of Web Components Presented By: Geoffrey Zimmerman Computer Science Capstone Fall 2004/Spring 2005 Mentor: Dr. C. David Shaffer.
Web Categorization Crawler Mohammed Agabaria Adam Shobash Supervisor: Victor Kulikov Winter 2009/10 Design & Architecture Dec
Web Mashups Presented By: Saket Goel Uni: sg2679.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
Unit 2, cont. September 12 More HTML. Attributes Some tags are modifiable with attributes This changes the way a tag behaves Modifying a tag requires.
Personal Information Management Vitor R. Carvalho : Personalized Information Retrieval Carnegie Mellon University February 8 th 2005.
Software. Generic Software  e.g. word processing, spreadsheet and database. – This simply implies that any of the dozens of spreadsheet packages, for.
Markup and Validation Agents in Vijjana – A Pragmatic model for Self- Organizing, Collaborative, Domain- Centric Knowledge Networks S. Devalapalli, R.
Advanced File Processing. 2 Objectives Use the pipe operator to redirect the output of one command to another command Use the grep command to search for.
Chapter Five Advanced File Processing Guide To UNIX Using Linux Fourth Edition Chapter 5 Unix (34 slides)1 CTEC 110.
Enhancing the Web With End-User Programming Tak Yeon Lee, Ben Bederson.
Prof. Jason Hong, Carnegie Mellon University Rapid End-User Programming and Visualization for the Web IDA Session CS Study Panel 24 April 2008.
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
The Internet 8th Edition Tutorial 4 Searching the Web.
1 Peter Fox GIS for Science ERTH 4750 (98271) Week 10, Friday, April 6, 2012 Lab:
The Birth & Growth of Web 2.0 COM 415-Fall II Ashley Velasco (Prince)
C OMPUTING E SSENTIALS Timothy J. O’Leary Linda I. O’Leary Presentations by: Fred Bounds.
Deep Web Exploration Dr. Ngu, Steven Bauer, Paris Nelson REU-IR This research is funded by the NSF REU program AbstractOur Submission Technique Results.
Prototyping. REVIEW : Why a prototype? Helps with: –Screen layouts and information display –Work flow, task design –Technical issues –Difficult, controversial,
BOĞAZİÇİ UNIVERSITY DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS MATLAB AS A DATA MINING ENVIRONMENT.
Build a database V: Create forms for a new Access database Overview: A window into your data So far in this series of courses, you’ve built tables, relationships,
ACIS Introduction to Data Analytics & Business Intelligence Database s Benefits & Components.
D. Heynderickx DH Consultancy, Leuven, Belgium 22 April 2010EuroPlanet, London, UK.
What is Web Information retrieval from web Search Engine Web Crawler Web crawler policies Conclusion How does a web crawler work Synchronization Algorithms.
Word 2007® Business and Personal Communication How can Microsoft Word 2007 help you work with others?
Data analytics and mash-up Real time analytics of employment data Team Shadowfax 1/25/2016 CMPE Class Project 0.
CPSC 203 Introduction to Computers T97 By Jie (Jeff) Gao.
By Ishaq Shinwari.  Designing a web site needs careful thinking and planning  The most important thing is to KNOW YOUR AUDIENCE  You Must Design website.
User Interfaces and Information Retrieval Dina Reitmeyer WIRED (i385d)
Connecting to External Data. Financial data can be obtained from a number of different data sources.
Google Analytics Graham Triggs Head of Repository Systems, Symplectic.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Lesson 11: Web Services and API's
GO! with Microsoft Office 2016
GO! with Microsoft Access 2016
UNIT 15 Webpage Creator.
Guide To UNIX Using Linux Third Edition
Lesson 11: Web Services and API's
Spreadsheets, Modelling & Databases
RECHORDS Assignment #6: Med-Fi Prototype
Tutorial 7 – Integrating Access With the Web and With Other Programs
Tiffany Ong, Rushali Patel, Colin Dolese, Joseph Lim
Presentation transcript:

Making Mashups with Marmite Jeff Wong Jason I. Hong Carnegie Mellon University

The Big Picture Problem Lots of content out there on the web –But not always in a form amenable to your needs –Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center Two observations: –In many cases, all of the data and services people need already exist, but not connected together –Unlikely that a web site can predict all possible needs

A Solution: Mashups Rapidly growing community of users creating “mashups” combining content from multiple web sites –Ex. Housingmaps.com

A Solution: Mashups Rapidly growing community of users creating “mashups” combining content from multiple web sites –Ex. Housingmaps.com –Ex. MySpace child predators –Ex. Friendster locations –Ex. Most popular videos on YouTube, Yahoo Video, …

A Solution: Mashups Rapidly growing community of users creating “mashups” combining content from multiple web sites –Ex. Housingmaps.com –Ex. MySpace child predators –Ex. Friendster locations –Ex. Most popular videos on YouTube, Yahoo Video, … ProgrammableWeb.com statistics –~1500 mashups created since April 2005 –356 open web-based APIs available

But Creating Mashups is Hard Requires lots of skill to create a mashup –Ex. Housingmaps creator has PhD in computer science –Ex. MySpace child predator list took months Requires programming expertise in many areas –Web crawling –Text parsing –Pattern matching –Databases –HTML

Marmite End-User Programming for Mashups Main idea: make it easy to create web mashups Use a dataflow approach connecting small operators –Inspired by Unix pipes and Apple’s Automator Example: –Get all events from Upcoming.org –Filter out events that are too old –Put them all onto a map Runs inside of a standard web browser

Set of Operators

Data Flow View

Data View

Using Marmite (Envisioned) Extract content from one or more web pages –names, addresses, dates, phone #, URLs Process it in a data flow manner –filtering out values or adding metadata –integrating with other data sources (similar to a database join operation) Direct the output to a variety of sinks –databases, map services, text files, visualizations, web pages, or source code that can be further edited

Marmite Motivation and Examples Features and Design Rationale User Evaluation

Features and Design Rationale Conducted a series of quick evaluations to understand design space and potential problems –Automator –Lo-fi prototypes

Automator

Informal Automator Evaluation Had three novices try three simple web-based tasks –Warm-up task –Traverse a set of web pages –Download a set of images Some findings: –Some difficulties knowing how to start and what to do next –Little feedback about state of system between operations –Difficult to iterate due to network speed issues

Lo-Fi Prototypes 6 paper prototypes with 20 participants

Design Solutions Problem: how to start and what to do next Solution: Suggest next actions –Weak data typing to find types (addresses, numbers, etc) –Filter operators to only show relevant ones –Suggest operators that might be applicable

Design Solutions Problem: little feedback about state of system between operations Solution: link data flow and data view together –Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets) –Use hybrid data flow / data view, showing an operation and its effects together –Data view usually “spreadsheet”, other views possible too (for example, maps)

Design Solutions Problem: difficult to iterate due to network speeds Solution: cache data, let people “replay” data –Reload, pause, play

Other Design Findings Screen real estate issues –Collapsible operators, leaving a readable label

Extracting Generic Content Can’t have pre-defined extractor operators for every possible web site –Need a more general way of extracting data from pages Developed a generic wizard UI for selecting links –Content from that set could be extracted via other operators –Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages Finds “groups” of related web content based on how HTML is structured

Marmite

Operators Operators have input types –Operator uses this to guess which columns it wants Operators have output types

Implementation JavaScript (for underlying code) and Extensible Binding Language (XBL for UI) Operators currently in JavaScript –Ideally could be scriptable in any programming language –Currently ~15 operators

Marmite Motivation and Examples Features and Design Rationale User Evaluation

Evaluation Informal user study with 6 people –2 novices –2 people with spreadsheet experience (formulas) –2 people with programming experience Tasks (in increasing difficulty) –Warmup task showing how to retrieve a set of addresses and how to geocode an address –Search for and filter out events further than a week away –Compile a list of events from two event services and plot them on a map –Recreate the housingmaps site

Results Three people able to complete all tasks in ~1 hour –First two users confused about suggested actions (automatically popped up, made manual for other 4 users) –Novice made some progress, not able to finish all tasks Able to re-create housingmaps in ~15 minutes

Marmite

More Results Biggest barrier was understanding the data flow –Did not understand input and output concept –Applied operators as one-off, did not realize that it was a static representation of flow –Did not understand data flow and data view were linked

Future Directions Short-term –Better screen-scraping operators –More operators –Better connection with web services (WSDL and REST) –Better help for starting a data flow Long-term –Intelligence analysis –Better visualizations –Location-based services

Conclusions Marmite, a tool for creating web-based mashups –Extract content from one or more web pages –Process it in a data flow manner –Direct the output to a variety of sinks Hybrid data flow / data view User evaluation shows some promising results Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End- User Programming, CHI 2007

Marmite

Types of Operators Sources –Add data into Marmite by querying databases, extracting information from web pages, and so on. Processors –modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well Sinks –redirect the flow the data out of Marmite. Examples include showing data on a map, saving it to a file, or to a web page.