Fast, Accurate Creation of Data Validation Formats by End-User Developers Christopher Scaffidi Brad Myers, Mary Shaw Carnegie Mellon University.

1 Fast, Accurate Creation of Data Validation Formats by End-User Developers Christopher Scaffidi Brad Myers, Mary Shaw Carnegie Mellon University

2 Contextual inquiry: What challenges do end users face?
Observed 3 administrative assistants, 4 managers, and 3 webmasters/graphic designers (1-3 hrs each)
Background → Toped → Evaluation → New Opportunities

3 One person’s task: validate web forms, but he didn’t know JavaScript or regexps
Is the input valid? “EDSH 225”
Is the input questionable? “GATE 225”
Or is it obviously invalid? “412-555-5444”

4 Hurricane Katrina “Person Locator” site: Many inputs unvalidated

5 Spreadsheets contain lots of typos: inconsistent formatting & invalid strings
Above: part of an actual spreadsheet on our university’s web site
Plenty of invalid strings in users’ spreadsheets during contextual inquiry
For thousands of other examples: EUSES Spreadsheet Corpus

6 Needed: a usable mechanism for implementing validation

7 Coming Up…
Background
–Formative pilot study
–Related work
Toped
Evaluations
–Usability
–Expressiveness
New opportunities

8 Formative pilot study
Motivation: Exploring the “gulf of execution” for data
–User has to figure out how to map intentions to the features provided by a computer system
–Poor “closeness of mapping” impedes system use
→ Before designing the system, probe the concepts and terminology familiar to users
Asked 4 administrative assistants to verbally describe two kinds of data
–American mailing addresses
–University project numbers

9 Formative pilot study
Participants identified and named the parts of data
–E.g.: street address, city, state, zip code
–They hierarchically refined parts until sub-parts became small enough that they lacked names
At that point, they described parts with constraints
–Constraints were sometimes “soft”: not always true
–They used adverbs of frequency to indicate softness, e.g.: “usually” or “sometimes”
Implications
–Users describe data in terms of constrained parts
–Valid data sometimes violate certain constraints

10 Alternate approaches: limited support for expressing constraints on structured strings
Grammars based on sequences of characters
–Context-free grammars (CFGs)
  Grammex
  Apple data detectors (CFGs + regexps)
–Regular expressions (regexps)
  SWYN regexp editor
Lapis patterns: constrained structured strings
–Intentionally designed to support outlier finding
  @PhoneNumber is Number equal to /\d\d\d/ then "-" then Number equal to /\d\d\d\d/ ignoring nothing

11 Toped: A form fill-in UI to mediate between users and grammars
1. Name
2. Describe
3. Test
4. Save

12 The system generates an augmented CFG from the format description
A part that almost always has 1-8 lowercase letters:
#WORD : #CHLIST : COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
#CHLIST : #CH | #CH #CHLIST
#CH : a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
More compact than a pure CFG
More expressive than a pure CFG
–Some constraints are impossible to represent as CFG
–Some constraints need to be soft
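The #WORD rule on this slide can be read as a hard structural check (the string must match #CHLIST) plus one soft constraint of strength 90 on its length. A minimal Python sketch of that reading follows. The name `check_word` and its return shape are illustrative assumptions, not Toped's actual implementation, and the sketch hard-codes this single rule rather than parsing a general augmented CFG.

```python
import string

# Strength of the soft constraint, taken from the {90} annotation on #WORD.
WORD_LENGTH_STRENGTH = 90


def check_word(s):
    """Return (parses, violated) for the #WORD rule.

    parses   -- True if s matches #CHLIST (one or more lowercase letters a-z)
    violated -- strengths of the soft constraints that s violates
    """
    # Hard requirement from #CHLIST / #CH: at least one character, all in a-z.
    if not s or any(ch not in string.ascii_lowercase for ch in s):
        return False, []
    violated = []
    # Soft constraint: COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
    if not (1 <= len(s) <= 8):
        violated.append(WORD_LENGTH_STRENGTH)
    return True, violated
```

Under this reading, a ten-letter lowercase word still parses but is flagged as violating the length constraint, while a string containing an uppercase letter does not parse at all.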

13 Testing strings against grammars
Downgrade a parse if it violates constraints
–Penalty = 1 - (strength of constraint)/100
–Multiply penalties
–Propagate penalties up parse tree
–Choose best parse (i.e.: parse with least penalties)
Show error messages
–Track violated constraints, concatenate into message
–If parse fails completely, show portions of format description that were used to generate unsatisfied CFG productions
–End-user development tools may offer user option of overriding some errors, depending on penalties
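The penalty arithmetic on this slide can be sketched in a few lines of Python. This is one interpretation, not the authors' code: it treats each penalty as a multiplicative factor, so a parse with no violations scores 1 and the "best" parse is the one with the highest product. The function names and the `(parse_name, violated_strengths)` candidate shape are hypothetical.

```python
from math import prod


def penalty(strength):
    """Penalty factor for one violated constraint: 1 - strength/100."""
    return 1 - strength / 100


def parse_score(violated_strengths):
    """Multiply the penalty factors of all violated constraints.

    A parse with no violations scores 1 (the product of an empty sequence).
    """
    return prod(penalty(s) for s in violated_strengths)


def best_parse(candidates):
    """Pick the candidate parse with the highest score.

    candidates -- list of (parse_name, violated_strengths) pairs (hypothetical shape)
    """
    return max(candidates, key=lambda c: parse_score(c[1]))
```

Under this reading, violating a single strength-90 constraint scales a parse's score to roughly 0.1, and violating two constraints multiplies both factors together.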

14 Showing error messages after testing strings against the generated CFGs

15 Usability: Does Toped help users to implement string validation?
Between-subjects lab experiment
–Direct comparison system: Lapis
–(We also compare results to those of the SWYN study; see paper)
Recruited 17 participants (9 Toped, 8 Lapis)
–Approx. half were administrative assistants, approx. half were master’s students (mostly information systems), distributed roughly equally across tools
–1 participant misinterpreted instructions (=> 8 & 8)

16 Usability: Does Toped help users to implement string validation?
Study structure
–Background questionnaire
–Tutorial (30 min)
–3 tasks (20 min)
–User satisfaction questionnaire
Detail of a task:
–Validate 1 kind of data: phone numbers, mailing addresses, company names
–User goal: For each kind, find typos in 25 strings
  Randomly drawn from EUSES spreadsheet corpus
  And we also retained 25 strings for further accuracy tests

17 Usability: Users were nearly 2 times as fast and found 3 times as many typos
Measure                     Toped   Lapis   Relative improvement   Significant? (Mann-Whitney)
Tasks completed             2.79    1.75    60%                    p<0.01
Typos identified
  On 75 visible strings     16.50   5.75    187%                   p<0.01
  On all 150 strings        31.25   9.50    229%                   p<0.01
F1 accuracy measure
  On 75 visible strings     0.74    0.51    45%                    No
  On all 150 strings        0.68    0.46    48%                    No
User satisfaction           3.78    3.06    24%                    p=0.02
Toped also compares favorably to the SWYN regexp editor; see paper
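The "Relative improvement" figures appear to be computed as (Toped - Lapis) / Lapis. That is an inference from the numbers, not something the slide states, and it matches the table to within rounding of the displayed inputs. F1 is the standard harmonic mean of precision and recall. The slide gives no precision or recall values, so `f1` below is shown only as a definition. Both function names are illustrative.

```python
def relative_improvement(toped, lapis):
    """Percentage by which the Toped value exceeds the Lapis value."""
    return (toped - lapis) / lapis * 100


def f1(precision, recall):
    """Standard F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Typos identified on the 75 visible strings, values from the table.
print(round(relative_improvement(16.50, 5.75)))  # prints 187
```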

18 Expressiveness: Does Toped provide adequate primitives for validating real data?
Logged data typed by 4 users into browser (3 weeks)
–For each text string, we recorded:
  A label for the text field (e.g.: “Phone”)
  A regexp summarizing the string (e.g.: \d\d\d-\d\d\d-\d\d\d\d)
Examined data, wrote scripts to cluster strings
–94% of the 5897 strings were in 19 clusters
–Each cluster had 1-2 formats
Used Toped to create formats
–Omitted 5 clusters that were for “general text”, usernames or passwords (so we could post format descriptions online)
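The logging and clustering step might look roughly like the Python sketch below. The slide shows only a digit example, so the rule used here (each ASCII digit becomes `\d`, every other character is kept literally) is an assumption, as are the names `summarize` and `cluster` and the `(label, text)` log shape. The authors' actual scripts are not described in the slides.

```python
from collections import Counter

DIGITS = "0123456789"


def summarize(s):
    """Replace each ASCII digit with \\d and keep other characters as-is.

    The result is a grouping key in the style of the slide's example; because
    other characters are not escaped, it is not guaranteed to be a valid regexp.
    """
    return "".join(r"\d" if ch in DIGITS else ch for ch in s)


def cluster(entries):
    """Count logged entries by (field label, summary) pair.

    entries -- iterable of (label, text) pairs (hypothetical log shape)
    """
    return Counter((label, summarize(text)) for label, text in entries)
```

For example, `summarize("412-555-5444")` yields `\d\d\d-\d\d\d-\d\d\d\d`, the same shape as the slide's "Phone" example, so two phone numbers typed in that layout fall into the same cluster.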

19 Expressiveness: Does Toped provide adequate primitives for validating real data?
Overall, successful
–We were able to create formats for each kind of data
–The formats identified many probable typos
Ideas for improvements
–Ways to reuse constraints from format to format
–Primitives for kinds of parts: Numeric, word-like, …

20 Toped+: an improved editor
Data Description Editor

21 Contributions and New Opportunities
Toped: UI to mediate between users & grammars
–Enables users to work faster & more effectively
–Adequately expressive for validating many kinds of data
–Provided a start for a new line of similar editor tools
New Opportunities (aka “Future Work”)
–Extending Toped+ to automatically reformat data [IUI’09]
–Providing a repository for sharing formats (in progress)
–Developing new ways to make use of the ability to identify strings that violate soft constraints

22 Thank You…
To Margaret Burnett, Brad Myers, Valentina Grigoreanu, Mary Beth Rosson, Mary Shaw and others in the EUSES Consortium for feedback over the years
To NSF for funding
To ISEUD 2009 for this opportunity to present

23 Toped+: key improvements vs Toped in terms of Cognitive Dimensions
Better closeness of mapping
–Constraints “belong” to parts in all formats
Higher juxtaposability
–Easy to view & compare multiple formats
Lower error-proneness
–Helps prevent senseless combinations of constraints
Lower viscosity
–Drag-and-drop / copy-and-paste speeds up edits
Improved progressive evaluation
–User can test each part individually

