
Co-Directors: Yigal Arens USC / Information Sciences Institute Judith Klavans Columbia University.


1 Co-Directors: Yigal Arens USC / Information Sciences Institute Judith Klavans Columbia University

2 2 The purpose of DGRC To Make Digital Government Happen Advance information systems research Bring the benefits of cutting edge IS research to government systems Help educate government and the community Learn needs from government partners to drive next stage system development Build pilot systems as part of new infrastructure

3 3 The problem and the solution Problem: FedStats has thousands of databases in over seventy Government agencies: data is duplicated and near-duplicated, and even Government officials and specialists cannot find it. Solution: Create a system to provide easy, standardized access: need multi-database access engine, need powerful user interface, need terminology standardization mechanism.

4 4 The Vision: Ask the Government... How have property values in the area changed over the past decade? How many people had breast cancer in the area over the past 30 years? Is there an orchestra? An art gallery? How far are the nightclubs? We’re thinking of moving to Denver...What are the schools like there? Census Labor Stats

5 5 Research challenges Scale to incorporate many databases … build data models automatically Process large and disparate data efficiently … develop fast processing techniques … create aggregation and substitution operators Integrate data models across sources and agencies …take a large ontology and link the models into it automatically … develop ways to automatically harvest glossary data for building ontologies Develop new ways to interact with data … use language processing tools for question-answering Display complex information from distributed sources …develop and evaluate new presentation techniques

6 6 The Energy Data Consortium EDC members: Information Sciences Institute, USC; Columbia University. Government partners: Energy Information Admin. (EIA); Bureau of Labor Statistics (BLS); Census Bureau. Research challenge: Make accessible in a standardized way the contents of thousands of data sets, represented in many different ways (webpages, pdf, MS Access, text…)

7 7 The Vision: Ask the Government... Are alternative energy sources any cheaper to use? Which state has the highest oil production? How long has the nuclear plant been in service? We’re thinking of moving to Cambridge…How much does gas cost there? Census Labor Stats

8 8 Data Integration Labor EPA EIA Census Heterogeneous Data Sources User Interface Information Access Definition Ontology query

9 9 From Phase I to Phase II Phase One Terminology/ontology Information integration and in-memory data analysis New Interfaces for Complex Human-computer interaction Phase Two Question-Answering Usability Testing and Evaluation Privacy Portal

10 10 Data Integration Labor EPA EIA Census Heterogeneous Data Sources User Interface Information Access Definition Ontology Trade Main Memory Query Processing Question-Answer Access User Evaluation Task-based Evaluation query

11 11 Data Integration Labor EPA EIA Census Heterogeneous Data Sources User Interface Information Access Definition Ontology Trade Main Memory Query Processing Question-Answer Access User Evaluation Task-based Evaluation query

12 12 Data Integration ??? EPA EIA Census Heterogeneous Data & Meta-data Sources User Interface Information Access Data Definitions (Ontology) interface query Labor definitions Metadata mediates

13 13 Recent example EIA problem: Data cleared for publication is grouped together across states. Also need data gathered by state separately. Need general ability to ungroup and reaggregate data. http://www.eia.doe.gov/emeu/states/main_ca.html
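The ungroup-and-reaggregate need on this slide can be illustrated with a minimal Python sketch. It assumes state-level values are retained alongside the published grouping; the figures and the "Pacific" grouping are made up for illustration and are not EIA data. A published total cannot be split back into states without that underlying detail.

```python
from collections import defaultdict

def reaggregate(state_values, grouping):
    """Sum state-level values under a chosen grouping.

    `grouping` maps a state code to a group label; states absent
    from the mapping are reported under their own code.
    """
    totals = defaultdict(float)
    for state, value in state_values.items():
        totals[grouping.get(state, state)] += value
    return dict(totals)

# Hypothetical state-level figures (not real EIA data).
state_values = {"CA": 10.0, "OR": 4.0, "WA": 6.0, "NV": 2.0}

# One view: three states grouped for publication, one left separate.
published = reaggregate(
    state_values, {"CA": "Pacific", "OR": "Pacific", "WA": "Pacific"}
)

# Another view over the same data: every state on its own.
by_state = reaggregate(state_values, {})
```

Keeping the finest-grained values and treating each grouping as a mapping lets the same data be regrouped on demand.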

14 14 Main Memory Achievements on large data manipulation – optimization for efficiency and speed New input for visualization with dials that user can manipulate Applications with electoral boundaries

15 15 GetGloss The Identification of Glossaries in High Fan-out Websites Large sites with many links Glossaries hidden all over No coherent view within and across sites No way to determine who is defining what and how

16 16 Glossary Finding Function Function to compute a best guess score Ranked list Higher is better Evaluation to determine how likely it is that a high score will be associated with a (large) glossary.
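A best-guess scoring function of the kind described on this slide might look like the following sketch. The slide does not say which features the actual function uses, so the two used here (lines shaped like "Term: definition" and a glossary-like page title) and the weight of 5 are assumptions, as are the page names.

```python
import re

# Assumed pattern: a short capitalized term, a colon, then a definition.
DEFINITION_LINE = re.compile(r"^[A-Z][A-Za-z0-9 /()-]{1,60}:\s+\S.{10,}$")
# Assumed title cue for glossary pages.
TITLE_CUE = re.compile(r"glossary|definitions|terms", re.IGNORECASE)

def glossary_score(title, lines):
    """Return a best-guess score; higher means more glossary-like."""
    definitions = sum(1 for line in lines if DEFINITION_LINE.match(line.strip()))
    title_bonus = 5 if TITLE_CUE.search(title) else 0
    return definitions + title_bonus

def rank_pages(pages):
    """Return (score, page_id) pairs, highest score first."""
    scored = [(glossary_score(title, lines), page_id)
              for page_id, (title, lines) in pages.items()]
    return sorted(scored, reverse=True)

# Hypothetical pages keyed by an illustrative identifier.
pages = {
    "a": ("Energy Glossary", [
        "Barrel: A unit of volume equal to 42 U.S. gallons.",
        "Btu: A unit of heat energy used to compare fuels.",
    ]),
    "b": ("Home", ["Welcome to the site."]),
}
ranking = rank_pages(pages)
```

Evaluating such a function would mean checking how often the top-ranked pages turn out to be real, large glossaries.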

17 17 ParseGloss Once a glossary is found, then how can individual definitions be analyzed? Once analyzed into components, how then can this be loaded into the ontology? GetGloss ParseGloss Ontology
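One way to picture analyzing a definition into components is the sketch below. It is not the ParseGloss method, which the slide does not describe. It assumes entries of the form "Term: A <category> that/which/used/equal ..." and splits them into a term, the broader category the definition opens with, and the remainder. The `GlossEntry` type and the is-a triple format are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class GlossEntry:
    term: str
    genus: str        # broader category the definition opens with
    differentia: str  # what distinguishes the term within that category

# Assumed entry shape; real glossaries vary far more than this.
ENTRY = re.compile(
    r"^(?P<term>[^:]+):\s+(?:An?|The)\s+(?P<genus>.+?)\s+"
    r"(?P<rest>(?:that|which|used|equal)\b.*)$"
)

def parse_entry(line):
    """Split one glossary line into components, or return None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return GlossEntry(match["term"].strip(), match["genus"], match["rest"])

def to_ontology_link(entry):
    """Express the entry as a (term, relation, parent) triple."""
    return (entry.term, "is-a", entry.genus)

entry = parse_entry("Barrel: A unit of volume equal to 42 U.S. gallons.")
link = to_ontology_link(entry)
```

Lines that do not fit the assumed shape return `None` here, so they could be set aside for manual review rather than loaded incorrectly.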

18 18 Evaluation New Effort Peter Sommer, Director of Education Center for New Media Teaching and Learning Focus on purposeful use of emerging technologies for researchers, students, teachers, analysts… Funded by NSF and BLS

19 19 Privacy Portal Increasing multiple access to data bases creates a security problem Original DGRC proposal included component on privacy Newly funded NSF SGER proposal Columbia – Computer Science and School of Business (Stolfo and Johnson)

20 20 Privacy and Government Websites What are user fears? What are their preferences? What are their perceptions of privacy issues? What are the implications for design of systems and interfaces?

21 21 Social Science Research Explorations of “dial manipulation” application for health databases for dynamic querying Useful for interactive mapping for redistricting Use statistics on neighborhoods, e.g. CPS (long and wide) Census summary data is another source – tables compiled for various levels Joint with ISERP Social Science Research Center

22 22 Proposals SGER proposal funded Topic: Urban transportation study—new methods for freight tracking in LA by comparing across databases Grant awarded to USC, shared by ISI and USC’s Dept of Policy and Planning White paper to DoT Topic: Searching for patterns in freight traffic Submitted by USC campus people and Jose Luis Ambite ITR proposal submitted Topic: Semi-automated topic hierarchy creation Partners: Eduard Hovy communicated with EPA group If funded will use EPA’s CARAT ontology as starting point and evaluation standard

23 23 Digital Government is Here! An increasing quantity and variety of information is available in digital form Government agencies already collect much digital information Government is a holder and provider of often unique data and services Access to information/services by industry and citizen-users must be facilitated, while limiting cost and risk

24 24 Well – Not Quite... Expectations are very high due to the pervasiveness of Web/Internet information technology Government IT/IS is behind best practices Legacy, stovepipe systems designed for trusted staff Failed very large modernization efforts A disconnect exists between the research community and government IS

25 25 The purpose of DGRC To Make Digital Government Happen Advance information systems research Bring the benefits of cutting edge IS research to government systems Help educate government and the community Learn needs from government partners to drive next stage system development Build pilot systems as part of new infrastructure

26 26 Thank you! Any questions?

