Recent Work at ISI Jose Luis Ambite Yigal Arens Eduard Hovy Andrew Philpot USC/ISI.

1 Recent Work at ISI Jose Luis Ambite Yigal Arens Eduard Hovy Andrew Philpot USC/ISI

2 Overview
1. EDC system
–NHANES health questionnaire data
–(Semi-)automatic domain model construction
–NL-based question understanding
2. Proposals
–Urban Transportation SGER awarded
–Submitted proposal to ITR
3. Outreach
–Connections to USC campus
–Conference planning: dg.o 2002

3 NHANES Data Collection
We acquired and wrapped the NHANES database
–From the National Center for Health Statistics
–Survey of thousands of records (one per person); each record contains up to 12,000 questions about health, family, medical history, etc.
–Database wrapped and accessible via the EDC system
Challenge: can we learn the domain model automatically?
–Try to extract terms from the DB, cluster them, and then link them into the Ontology
–Then test the Domain Model using SIMS

4 Automated Domain Modeling Research
Step 1: performed manual pre-test
–extracted approx. 60 column headings (database questions)
–clustered them manually
–compared accuracy: only about 50% overlap
Step 2: developed clustering toolkit
–assembled CLINK, SLINK, Median, k-Means, etc. into a toolkit
–developed speedup techniques
Step 3: ran a series of 10 experiments
–various word manipulations (word weighting by inverse frequency, word stemming, longer passage extracts, etc.)
–mapped out an extensive parameter space; ran a targeted sweep
Results still not great
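
As a rough illustration of the kind of experiment described on this slide, the sketch below weights the words of column headings by inverse frequency and groups the headings with naive single-link agglomerative clustering. It is not the ISI toolkit: the function names, the similarity threshold, and the sample headings are all assumptions made for illustration, and the simple quadratic merge loop stands in for the more efficient SLINK algorithm.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a heading and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_vectors(headings):
    """Map each heading to a sparse {word: weight} vector.

    Weight = term count * smoothed inverse document frequency, so words
    that occur in many headings count for less.
    """
    docs = [Counter(tokenize(h)) for h in headings]
    doc_freq = Counter(word for doc in docs for word in doc)
    n = len(docs)
    return [
        {w: tf * (math.log((1 + n) / (1 + doc_freq[w])) + 1.0)
         for w, tf in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_link_clusters(headings, threshold=0.3):
    """Merge the two most similar clusters until none reach the threshold.

    Single link: the similarity of two clusters is the similarity of
    their closest pair of members. The threshold is an assumed value.
    """
    vectors = idf_vectors(headings)
    clusters = [[i] for i in range(len(headings))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = max(cosine(vectors[i], vectors[j])
                          for i in clusters[a] for j in clusters[b])
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [[headings[i] for i in sorted(c)] for c in clusters]


if __name__ == "__main__":
    # Hypothetical headings, not taken from NHANES.
    sample = [
        "age at first pregnancy",
        "age at last pregnancy",
        "systolic blood pressure",
        "diastolic blood pressure",
        "household income",
    ]
    for cluster in single_link_clusters(sample):
        print(cluster)
```

Swapping the `max` in the cluster similarity for `min` would give complete-link (CLINK-style) behaviour, which is one of the parameters an experiment series like the one above might vary.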

5 NL Question Understanding
Challenge: can we interpret a user’s question when it is posed in English, without menus or the ontology?
Approach:
1. create new Finite State Machine
2. create question grammar and lexicon (linked to Ontology)
3. create conversion routines that assemble SQL queries out of user input
4. test and evaluate using EDC system and SIMS
Current status:
–new FSM completed
–grammar and conversion routines under construction
–will demo English (+ other?) query input at conference

6 Proposals
SGER proposal funded
–Topic: Urban transportation study: new methods for freight tracking in LA by comparing across databases
–Grant awarded to USC, shared by ISI and USC’s Dept of Policy and Planning
–Jose Luis Ambite will spend approx. 25% time on this study
White paper to DoT
–Topic: Searching for patterns in freight traffic
–Submitted by USC campus people and Jose Luis Ambite
ITR proposal submitted
–Topic: Semi-automated topic hierarchy creation
–Partners: Eduard Hovy communicated with EPA group
–If funded, will use EPA’s CARAT ontology as starting point and evaluation standard

7 Outreach
USC Campus Group
–Urban policy planners, digital democracy sociologists, industrial and systems engineers, etc.
–Held several meetings, chaired by Yigal Arens and Genevieve Giuliano, to explore collaborations and to see if we can extend DGRC to start a separate organization
–Drafted a statement of goals to hand to Provost and USC-based small funding offices
New issue of DG Online!
Conference: dg.o 2002
–Hotel arranged
–Website up (but still need fancy graphics)
–Call for presentations disseminated
–Some portions of program and invitees determined
