1 2004 ARDA Challenge Workshop An Investigation of Evaluation Metrics for Analytic Question Answering Overview Antonio Sanfilippo PNNL/NWRRC AQUAINT Phase 2 Fall Workshop Tampa, FL

2 Northwestern Regional Research Center Hosted by Pacific Northwest National Laboratory Located in Richland, WA

3 Problem The adoption of new QA technologies in the Intelligence Community (IC) is hindered by the gap between the development and usage environments –There is no systematic way of ensuring that QA systems conform to the working practices of analysts –Systems may perform well in terms of accuracy yet still fail to address the needs of analysts

4 Solution Develop evaluation metrics that reflect the interaction of users and QA systems to determine how and to what extent these systems meet user requirements –Determine the utility of features and functionalities –Establish and corroborate user requirements –Perform a user-centric comparison of different systems

5 Experimental Focus The development of the evaluation metrics is based on empirical studies of analysts using –3 Question Answering systems Cycorp LCC SUNY@Albany –the Google search engine as the baseline system

6 Stakeholders
Government Champions – John Prange (ARDA), Kelcy Allwein (DIA), Mike Blair (NAVY)
Team Leaders – Emile Morse & Jean Scholtz (NIST)
Team Participants – Tomek Strzalkowski, Sharon Small, Sean Ryan, Hilda Hardy (SUNY@Albany); Sanda Harabagiu, Andy Hickl, John Williams (LCC); Stefano Bertolo (Cycorp); Paul Kantor (Rutgers University); Diane Kelly (University of North Carolina); Peter LaMonica, Chuck Messenger (AFRL); Joe Konczal (NIST); Katherine Johnson, Frank Greitzer (PNNL)
Analysts – 7 from NAVY, 1 from ARMY
Graduate Students – Robert Rittman, Aleksandra Sarcevic, Ying Sun (Rutgers University)
PNNL Oversight – Rich Quadrel (NWRRC Director); Troy Juntunen (System Installation and Connectivity); Ben Barnett, Trina Pitcher, John Calhoun, Eileen Boiling (Admin); Antonio Sanfilippo (Project Manager)

7 Roadmap
Feb 23 – Project planning meeting (NIST)
March-April – Preparation (contracts, purchases, data collection, initial scenario development)
April 15-16 – Kickoff meeting (NIST)
April-May – Finalize scenarios, metric hypotheses, and evaluation methods & materials; work with NWRRC to set up facilities for data collection at PNNL
June 7-25 – Install systems at PNNL; carry out user studies with analysts; collect data
July – First version of data analysis; internal progress report and agenda for the remaining work
August – Final version of data analysis and final exam
September – Final report

8 Technical Approach Construct evaluation metric hypotheses about the utility of QA systems and test them in experimental user studies –Collect data relevant to the evaluation hypotheses for 8 analysts working on 8 task assignment scenarios with 4 QA systems –Analyze the collected data to verify the utility of the evaluation metric hypotheses
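To make the scale of the design concrete, the following is a minimal sketch of one way to schedule the 8 analysts across the 8 scenarios and 4 systems. The rotation scheme is an assumption (the slides do not state the actual assignment protocol); system names and scenario IDs are taken from the surrounding slides.

```python
# Illustrative sketch only: a Latin-square-style rotation assigning each
# analyst every scenario once and each system twice. The actual study
# protocol is not specified in the slides.

SYSTEMS = ["Cycorp", "LCC", "SUNY@Albany", "Google (baseline)"]
SCENARIOS = list("ABCDEFGH")  # scenario IDs A-H from the next slide
NUM_ANALYSTS = 8

def build_schedule():
    """Rotate systems across analysts so each analyst sees every
    scenario once and every system twice."""
    schedule = {}
    for analyst in range(NUM_ANALYSTS):
        sessions = []
        for i, scenario in enumerate(SCENARIOS):
            system = SYSTEMS[(analyst + i) % len(SYSTEMS)]
            sessions.append((scenario, system))
        schedule[f"analyst_{analyst + 1}"] = sessions
    return schedule

if __name__ == "__main__":
    for analyst, sessions in build_schedule().items():
        print(analyst, sessions)
```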

9 Evaluation Hypotheses Question answering systems should (each hypothesis assessed with one or more data sources: Questionnaires, NASA TLX, SmiFro & Status, Cross-evaluation, System Logs, Glass Box, Query Trails):
H1 Support information gathering with lower cognitive workload
H2 Assist in exploring more paths/hypotheses
H3 Enable production of higher quality reports
H4 Provide useful suggestions to the analyst
H5 Provide more good surprises than bad
H6 Enable more focus on analysis than data collection
H7 Enable analysts to collect more data in less time
H8 Reduce the time spent reading
H9 Identify gaps in the knowledge base
H10 Help the analyst recognize gaps in their thinking
H11 Provide context for information
H12 Provide context, continuity and coherence of dialogue
H13 Let analysts relocate previously seen materials
H14 Be easy to use
H15 Increase an analyst's confidence in exploration and report

Scenario Topics
A Indian Chemical Weapons Production and Delivery Systems
B Libyan Chemical Weapons Program
C Iranian Chemical Weapons Development and Impact
D North Korean Chemical and Biological Weapons Research
E Pakistani Chemical Agent Production
F Current Status of Russia's Chemical Weapons Program
G South African Chemical Agents Program Status
H Assessment of Egypt's Biological Weapons
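A hedged sketch of one way the hypothesis-by-instrument matrix above could be encoded for later scoring; the specific hypothesis-to-source assignments shown are illustrative placeholders, not the slide's actual mapping.

```python
# Minimal sketch: encode which instruments provide evidence for which
# hypotheses. The example entries below are hypothetical, not the
# slide's actual checkmarks.

DATA_SOURCES = [
    "Questionnaires", "NASA TLX", "SmiFro & Status",
    "Cross-evaluation", "System Logs", "Glass Box", "Query Trails",
]

# hypothesis ID -> instruments used to test it (example entries only)
HYPOTHESIS_SOURCES = {
    "H1": {"Questionnaires", "NASA TLX", "Glass Box"},
    "H3": {"Questionnaires", "Cross-evaluation"},
    "H8": {"Glass Box", "Query Trails"},
}

def sources_testing(hypothesis_id):
    """Instruments that provide evidence for a given hypothesis."""
    return HYPOTHESIS_SOURCES.get(hypothesis_id, set())

def hypotheses_using(source):
    """Hypotheses for which a given instrument contributes evidence."""
    return {h for h, srcs in HYPOTHESIS_SOURCES.items() if source in srcs}

if __name__ == "__main__":
    print(sources_testing("H1"))
    print(hypotheses_using("Glass Box"))
```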

10 Methodology

11 Accomplishments Results to date from the analysis of the data collected during the user studies at PNNL indicate that –Most of the evaluation hypotheses initially set by the team proved to be useful for the user-centered assessment of QA systems –The methodology developed by the team during the course of the user studies is effective for applying these evaluation metrics –On average, the Cycorp, Albany and LCC Question Answering systems were deemed more useful by users than the baseline system (Google)
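The per-system comparison can be illustrated with a small sketch; the ratings below are made-up placeholders, not the study's data, and simply show the kind of mean-versus-baseline summary the finding implies.

```python
# Hedged sketch of a per-system usefulness comparison against the
# Google baseline. The ratings are placeholder values on an assumed
# 1-5 scale, one per analyst, not the actual study results.
from statistics import mean

ratings = {
    "Cycorp":            [5, 4, 4, 5, 3, 4, 5, 4],
    "LCC":               [4, 5, 4, 4, 4, 5, 3, 4],
    "SUNY@Albany":       [4, 4, 5, 3, 4, 4, 4, 5],
    "Google (baseline)": [3, 3, 4, 3, 2, 3, 3, 3],
}

baseline = mean(ratings["Google (baseline)"])
for system, scores in ratings.items():
    avg = mean(scores)
    print(f"{system:20s} mean={avg:.2f}  vs baseline {avg - baseline:+.2f}")
```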

12 Results & Benefits The workshop delivered a set of tested user-centric evaluation criteria and a methodology for applying these criteria to gain knowledge about how QA systems meet the needs of analysts The availability of user-centric evaluation metrics enables a systematic methodology for tailoring the utility of QA systems to the specific needs of the Intelligence Community –Target the features and functionalities that are most impactful –Facilitate technology insertion

13 Assessment The work has been carried out on schedule and with great precision, attention to detail and high technical standards Results indicate that the Workshop will be impactful in establishing a user-centered evaluation framework for interactive information systems. Results will be presented in the next talk by Emile Morse A version of the methodology developed will be demonstrated in today's exercise

14 Parting Shots Views from the June Challenge problem in Richland

15-25 Photo slides: views from the June Challenge problem in Richland

26 Thank You!

