‘Class Exercise’ III: Application Project Evaluation. Deborah McGuinness and Peter Fox. CSCI/ITEC-6962-01, Week 10, November 16, 2009.


Contents
–Review of reading, questions, comments
–Evaluation
–Summary
–Next week

Semantic Web Methodology and Technology Development Process
Establish and improve a well-defined methodology and vision for Semantic Technology based application development; leverage controlled vocabularies, etc.
Elements of the process (from the diagram): Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype (Open World: Evolve, Iterate, Redesign, Redeploy); Use Tools; Science/Expert Review & Iteration; Develop model/ontology; Evaluation.

References
–Twidale, Randall, and Bentley (1994), and references therein
–Scriven (1991, 1996)
–Weston, McAlpine, and Bordonaro (1995)
–Worthen, Sanders, and Fitzpatrick (1997)

Inventory
What categories can you measure?
–Users
–Files
–Databases
–Catalogs
–Existing UI capabilities (or lack thereof)
–Services
–Ontologies
The use case development stage is a very good time to capture these elements; do not guess, get them from quantitative sources or from the users/actors.
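
As an illustration (ours, not from the slides), an inventory like this can be captured as structured data so that the numbers come from real sources rather than guesses; a minimal Python sketch, with all values hypothetical:

    # Hypothetical inventory snapshot; populate each count from an actual
    # source (user registry, file listing, catalog query), never by guessing.
    inventory = {
        "users": 505,             # e.g. from the portal's user registry
        "files": 120000,          # e.g. from a storage listing
        "databases": 2,
        "catalogs": 3,
        "ui_capabilities": ["keyword search"],  # note gaps explicitly
        "services": ["plotting"],
        "ontologies": [],
    }
    for category, value in inventory.items():
        print(f"{category}: {value}")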

Metrics
Things you can measure (numerical).
Things that are categorical:
–Could not do before
–Faster, more complete, fewer mistakes, etc.
–Wider range of users
Measure or estimate the baseline before you start.
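
A minimal sketch (with illustrative numbers) of recording a baseline before you start and computing the change afterwards:

    # Illustrative baseline vs. current values; capture the baseline before
    # deploying anything new, or the comparison is impossible later.
    baseline = {"inputs_per_query": 8, "queries_per_retrieval": 3.0}
    current = {"inputs_per_query": 3, "queries_per_retrieval": 1.2}

    for metric, before in baseline.items():
        after = current[metric]
        change = 100.0 * (after - before) / before
        print(f"{metric}: {before} -> {after} ({change:+.0f}%)")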

Result/outcome
Refer to the use case document.
The outcome (and its value) is assessed through a combination of data gathering processes, including surveys, interviews, focus groups, document analysis, and observations, which will yield both qualitative and quantitative results.
Did you meet the goal?
Just listen; do not defend. If you start to, then: QTIP – quit taking it personally.

Example: what we wanted to know about VSTO
Evaluation questions are used to determine the degree to which VSTO enhanced search, access, and use of data for scientific and educational needs, and effectively utilized and implemented a template for user-centric utilization of the semantic web methodology.
VO – appears to be local and integrated, and in the end-users’ language (this is one of the metrics).

Evaluation (Twidale et al.)
An assessment of the overall effectiveness of a piece of software, ideally yielding a numeric measure by which informed cost-benefit analysis of purchasing decisions can be made.
An assessment of the degree to which the software fulfils its specification in terms of functionality, speed, size, or whatever measures were pre-specified.

Evaluation
An assessment of whether the software fulfils the purpose for which it was intended.
An assessment of whether the ideas embodied in the software have been proved superior to an alternative, where that alternative is frequently the traditional solution to the problem addressed.
An assessment of whether the money allocated to a research project has been productively used, yielding useful, generalizable results.

Evaluation
An assessment of whether the software proves acceptable to the intended end-users.
An assessment of whether end-users continue to use it in their normal work.
An assessment of where the software fails to perform as desired, or as is now seen to be desirable.
An assessment of the relative importance of the inadequacies of the software.

(Orthogonal) Dimensions of evaluations
Structured vs. less structured
Quantitative vs. qualitative
Summative vs. formative
Controlled experiments vs. ethnographic observations
Formal and rigorous vs. informal and opportunistic

Formative and Summative
Evaluation is carried out for two reasons:
–grading translations = summative evaluation
–giving feedback = formative evaluation

Formative and Summative (figure; not captured in this transcript)

What if questions (qualitative)
What if you...
–could not only use your own data and tools but also a remote colleague’s data and tools?
–understood their assumptions, constraints, etc., and could evaluate applicability?
–knew whose research currently (or in the future) would benefit from your results?
–knew whose results were consistent (or inconsistent) with yours?

Evaluation questions and associated data collection methods
Evaluation questions: To what extent do VSTO’s activities...
–enhance end-user access to and use of data to advance science and education needs?
–enable higher levels of semantic capability and interoperability, such as explanation, reasoning on rules, and semantic query?
–contribute to the development and support of community resources, virtual observatories and data systems, and the provision of results from diverse observing systems using semantically-enabled technologies?
Each question is addressed via all four data collection methods: interviews/focus groups, surveys, document analysis, and observation.

Evaluation questions and associated data collection methods
Evaluation questions: To what extent does X’s...
–Template contribute to the reports on modern data frameworks, user interfaces, and science progress achieved?
–Incorporate user experiences in the redesign and development cycles of the VSTO?
Each question is again addressed via all four methods: interviews/focus groups, surveys, document analysis, and observation.

Evaluation questions and associated data collection methods
–How do VSTO activities affect IHE faculty and staff from participating institutions (e.g., changes to virtual observatories) and data sources, results from diverse observing systems using semantically-enabled technologies, and institutional collaboration activities? (all four methods)
–What factors impede or facilitate progress toward VSTO goals? (two of the four methods)
–What progress has been made toward sustaining and ‘scaling up’ VSTO activities? (three of the four methods)
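
One way (our sketch, not part of the original plan) to keep a question-by-method matrix like the ones above machine-readable is a simple mapping from each question to its collection methods; the assignments for the partially covered questions are arbitrary here, since the transcript does not preserve which columns were marked:

    # Hypothetical encoding of an evaluation-plan matrix.
    METHODS = ("interviews/focus groups", "surveys",
               "document analysis", "observation")

    plan = {
        "enhance end-user access to and use of data?": set(METHODS),
        "factors that impede or facilitate progress?": set(METHODS[:2]),
        "progress toward sustaining and scaling up?": set(METHODS[:3]),
    }

    def questions_for(method):
        """All questions a given data collection method must cover."""
        return [q for q, methods in plan.items() if method in methods]

    print(questions_for("surveys"))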

Implementing an evaluation
Based on our experience with use case development and refinement, community engagement, and ontology vetting, a workshop format (from 6 up to 25 participants, depending on desired outcomes and scope) is a very effective mechanism for making rapid progress. The workshops can be part of a larger meeting, stand-alone, or partly virtual (via remote telecommunication). We have found (for example, in our data integration work) that domain experts in particular are extremely willing to participate in these workshops.

Implementing
Let’s take an example:
–VSTO
–Representative, but does not exercise all semantic web capabilities

VSTO qualitative results
Decreased input requirements: The previous system required the user to provide 8 pieces of input data to generate a query; our system requires 3. Additionally, the three choices are constrained by value restrictions propagated by the reasoning engine. Thus, we have made the workflow more efficient and reduced errors (note the supportive user comments in the following slides).

VSTO qualitative results
Syntactic query support: The interface generates only syntactically correct queries. The previous interface allowed users to edit the query directly, thus providing multiple opportunities for syntactic errors in the query formation stage. As one user put it: “I used to do one query, get the data and then alter the URL in a way I thought would get me similar data but I rarely succeeded, now I can quickly re-generate the query for new data and always get what I intended”.

VSTO qualitative results
Semantic query support: By using background ontologies and a reasoner, our application can expose only those query options that will not generate incoherent queries. Additionally, the interface exposes only options, for example date ranges, for which data actually exists. This semantic support did not exist in the previous system; in fact, we limited functionality in the old interface to minimize the chances of misleading or semantically incorrect query construction.
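
A minimal sketch of the “only expose coherent options” idea: given the selections made so far, offer only values for which data actually exists. The toy catalog below stands in for the ontology-backed holdings metadata:

    # Toy holdings catalog: (instrument, parameter, year with data).
    HOLDINGS = [
        ("fabry-perot", "neutral temperature", 2006),
        ("fabry-perot", "neutral temperature", 2007),
        ("radar", "electron density", 2007),
    ]
    FIELDS = {"instrument": 0, "parameter": 1, "year": 2}

    def valid_options(field, **selected):
        """Values for `field` consistent with the user's selections so far."""
        rows = [r for r in HOLDINGS
                if all(r[FIELDS[k]] == v for k, v in selected.items())]
        return sorted({r[FIELDS[field]] for r in rows})

    # Only years that actually have Fabry-Perot data are offered.
    print(valid_options("year", instrument="fabry-perot"))  # [2006, 2007]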

VSTO qualitative results
Semantic query support: Users also have increased functionality; for example, they can now initiate a query by selecting a class of parameter(s). As the query progresses, the sub-classes and/or specific instances of that parameter class become available as the datasets are identified later in the query process.

VSTO qualitative results
Semantic query support: We removed the parameter-initiated search from the previous system because only parameter instances could be chosen (8 different instances representing neutral temperature, 18 representations of time, etc.) and it was too easy for the wrong one to be chosen, quickly leading to a dead-end query and a frustrated user. One user with more than 5 years of CEDAR system experience noted: “Ah, at last, I’ve always wanted to be able to search this way and the way you’ve done it makes so much sense”.

VSTO qualitative results
Semantic integration: Users now depend on the ontologies, rather than themselves, to know the nuances of the terminologies used in varying data collections. Perhaps more importantly, they can also access information about how data was collected, including the operating modes of the instruments used. “The fact that plots come along with the data query is really nice, and that when I selected the data it comes with the correct time parameter” (new graduate student, ~1 year of use).

VSTO qualitative results
Semantic integration: The nature of the encoding of time for different instruments means that not only are there 18 different parameter representations, but those parameters are sometimes recorded in the prologue entries of the data records, sometimes in the header of the data entry (i.e. as metadata), and sometimes as entries in the data tables themselves. Users had to remember (and maintain codes) to account for numerous combinations. The semantic mediation now provides the level of sensible data integration required.
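
To illustrate the kind of mediation involved (our sketch; VSTO does this with ontologies and a reasoner rather than a hand-written lookup), a normalizing layer can map each collection’s time encoding to a common representation. The record field names below are invented:

    from datetime import datetime, timedelta

    def to_utc(record):
        """Normalize two (of many) possible time encodings to a datetime."""
        if "unix_seconds" in record:
            return datetime.utcfromtimestamp(record["unix_seconds"])
        if "year" in record and "day_of_year" in record:
            return (datetime(record["year"], 1, 1)
                    + timedelta(days=record["day_of_year"] - 1,
                                seconds=record.get("ut_seconds", 0)))
        raise ValueError("unrecognized time encoding")

    # Both records denote the same instant, 2007-02-01 01:00:00 UTC.
    print(to_utc({"year": 2007, "day_of_year": 32, "ut_seconds": 3600}))
    print(to_utc({"unix_seconds": 1170291600}))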

VSTO qualitative results
Broader range of potential users: VSTO is usable by people who do not have PhD-level expertise in all of the domain science areas, thus supporting efforts including interdisciplinary research. The user population consists of students (undergraduate, graduate) and non-students (instrument PIs, scientists, data managers, professional research associates).

VSTO quantitative results
Broader range of potential users: For CEDAR, students: 168, non-students: 337; for MLSO, students: 50, non-students: 250. In addition, 36% and 25% of the users are non-US based (CEDAR – a 57% increase over the last year – and MLSO, respectively). The relative percentage of students has increased by ~10% for both groups.
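
The percentages quoted are simple relative changes; since the slides give only the percentages, the before/after counts below are invented to show the arithmetic:

    def pct_change(before, after):
        """Relative change, in percent, from `before` to `after`."""
        return 100.0 * (after - before) / before

    # A rise from, say, 115 to 181 non-US users would match the quoted ~57%.
    print(f"{pct_change(115, 181):.0f}%")  # 57%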

Adoption (circa 2007)
Currently there are on average between distinct users authenticated via the portal and issuing data requests per day, resulting in data access volumes of 100KB to 210MB per request. In the last year, 100 new users have registered, more than four times the number from the previous year. The users registered last year when the new portal was released, and after the primary community workshop at which the new VSTO system was presented. At that meeting, community agreement was given to transfer operations to the new system and move away from the existing one.

Facilitating new projects
At the community workshop a priority area was identified involving the accuracy and consistency of temperature measurements determined from instruments like the Fabry-Perot Interferometer. As a result, we saw a 44% increase in data requests in that area. We increased the granularity in the related portion of the ontology to facilitate this study.

Facilitating new projects
We focused on improving a user’s ability to find related or supportive data with which to evaluate the neutral temperatures under investigation. We are seeing an increase (10%) in other neutral temperature data accesses, which we believe is a result of this related need.

Informal evaluation
We conducted an informal user study asking three questions:
–What do you like about the new searching interface?
–Are you finding the data you need?
–What is the single biggest difference?
Users were already changing the way they search for and access data. Anecdotal evidence indicated that users are starting to think at the science level of queries, rather than at the former syntactic level.

Informal evaluation
For example, instead of telling a student to enter a particular instrument and date/time range and see what they get, they are able to explore physical quantities of interest at relevant epochs where those quantities go to extreme values, such as auroral brightness at a time of high solar activity (which leads to spectacular auroral phenomena). This suggested to us some new use cases to support even greater semantic mediation.

Further measuring
One measure we hoped to achieve was usage by all levels of domain scientist, from the PI to the early-career graduate student. Anecdotal evidence shows this is happening, and self-classification also confirms the distribution. A scientist doing model/observational comparisons noted: “took me two passes, now I get it right away”, “nice to have a quarter of the options”, and “I am getting closer to 1 query to 1 data retrieval, that’s nice”.

Focus group
A one-hour workshop was held at the annual community meeting on the day after the main plenary presentation for VSTO. The workshop was very well attended, with 35 diverse participants (25 were expected), including a number of senior researchers, junior researchers, post-doctoral fellows, and students – 3 of whom had just started in the field. After some self-introductions, eight questions were posed and responses recorded, some by count (yes/no) and some by comment. Overall, responses ranged from 5 to 35 per question.

VSTO quantitative results
How do you like to search for data? Browse, type a query, visual? Responses: 10; Browse=7, Type=0, Visual=3.
What other concepts are you interested in using for search, e.g. time of high solar activity, campaign, feature, phenomenon, others? Responses: 5; all of these, and no others were suggested.
Does the interface and its services deliver the functionality, speed, and flexibility you require? Responses: 30; Yes=30, No=0.

VSTO quantitative results
Are you finding the data you need? Responses: 35; Yes=34, No=1.
How often do you use the interface in your normal work? Responses: 19; Daily=13, Monthly=4, Longer=2.
Are there places where the interface/services fail to perform as desired? Responses: 5; Yes=1, No=4.
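
Tallies like these come straight out of raw response logs; a minimal sketch, with the usage-frequency answers above reproduced as data:

    from collections import Counter

    # Raw answers to "How often do you use the interface in your normal work?"
    responses = ["Daily"] * 13 + ["Monthly"] * 4 + ["Longer"] * 2

    tally = Counter(responses)
    summary = ", ".join(f"{answer}={n}" for answer, n in tally.items())
    print(f"Responses: {sum(tally.values())}; {summary}")
    # Responses: 19; Daily=13, Monthly=4, Longer=2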

Qualitative questions
What do you like about the new searching interface? Responses: 9.
What is the single biggest difference? Responses: 8.
The general answers were as follows:
–Fewer clicks to data (lots)
–Auto identification and retrieval of independent variables (lots)
–Faster (lots)
–Seems to converge faster (few)

Unsolicited/unstructured comments
–It makes sense now!
–[I] Like the plotting.
–Finding instruments I never knew about.
–Descriptions are very handy.
–What else can you add? How about a python interface [to the services]?

Surprise! New use cases
–The need for a programming/script-level interface, i.e. building on the services interfaces, in Python, Perl, C, Ruby, Tcl, and 3 others (see the sketch below).
–Addition of models alongside observational data, i.e. finding data from observations/models that are comparable and/or compatible.
–More services (particularly plotting options, e.g. coordinate transformation, that are hard to add without detailed knowledge of the data).
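
What the requested script-level interface might look like; the endpoint and parameter names here are entirely hypothetical, sketched only to show the shape of a thin client over the services interfaces:

    import urllib.parse
    import urllib.request

    BASE_URL = "https://vsto.example.org/api/data"  # hypothetical endpoint

    def fetch(instrument, parameter, start, stop):
        """Issue one data request against the (hypothetical) services API."""
        query = urllib.parse.urlencode({
            "instrument": instrument, "parameter": parameter,
            "start": start, "stop": stop,
        })
        with urllib.request.urlopen(f"{BASE_URL}?{query}") as resp:
            return resp.read()

    # Example call (works only against a real endpoint):
    # data = fetch("fabry-perot", "neutral temperature",
    #              "2007-02-01", "2007-02-02")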

Other examples
ETC667 Evaluation Plan Template

Keep in mind
–You need an evaluation plan that can lead to improvements in what you have built.
–You need an evaluation to value what you have built.
–You need an evaluation as part of your publication (and thesis).

Iterating
Evolve, iterate, re-design, re-deploy:
–Small fixes
–The full team must be briefed on the evaluation results and implications
–Decide what to do about the new use cases, or what to do if the goal is not met
–Determine what knowledge engineering is required and who will do it (often participants in the evaluation may become domain experts in your methodology)
–Determine what new knowledge representation is required
–Assess the need for an architectural re-design

Summary
–Project evaluation has many attributes: structured and less structured.
–You really need to be open to all forms.
–A good way to start is to get members of your team to do peer evaluation.
–This is a professional exercise; treat it that way at all times.
–Other possible techniques for moving forward on evolving the design, what to focus upon, priorities, etc.: SWOT, Porter’s 5 forces.

Next week
This week’s assignments:
–Reading: no reading
Next class (week 11 – November 23):
–Team Use Case Implementation
Office hours this week: Tuesday 11–12, Winslow
Questions?