Columbia University Dept of Computer Science Center for Research on Info Access University of So. Calif Information Sciences Institute (ISI)


1 Columbia University Dept of Computer Science Center for Research on Info Access University of So. Calif Information Sciences Institute (ISI)

2 Goal: To Provide Single-Stop Access to Multiple Distributed Autonomous Data Sets

3 Federal Agency Partners
–Energy Information Administration (EIA)
–Bureau of Labor Statistics (BLS)
–Census
Funded by National Science Foundation

4 The Purpose of DGRC: To Make Digital Government Happen!
–Advance information systems research
–Bring the benefits of cutting-edge information science research to government systems
–Help inform government and the community
–Work with government partners to drive next-stage system development
–Build pilot systems as part of new infrastructure

5 The Problem and the Solution
Problem: FedStats has thousands of databases in over seventy government agencies:
–data is duplicated and near-duplicated,
–even government officials and specialists cannot find it.
Solution: Create a system to provide easy, standardized access:
–need multi-database access engine,
–need terminology standardization mechanism,
–need powerful user interface.

6 [Architecture diagram. Recoverable labels: Heterogeneous Data Sources (Labor, EPA, EIA, Census); Trade; Data Integration; Main Memory Query Processing; Definitions of Terms; Information Access; Multilingual Access; User Interface; User Evaluation; Task-based Evaluation; query]

7 Information Integration
Problem: Data is not in exactly the form the user needs (monthly, not annually; actual values, not averaged).
Solution: Attempt to provide unified view of data of various granularities:
–time period
–geographical region
–product
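The granularity mismatch on this slide (monthly versus annual, actual versus averaged values) can be illustrated with a small roll-up. This is a minimal sketch only, not the project's integration engine: the function name `to_annual`, the `(year, month)` key layout, and the sample figures are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def to_annual(monthly, how="sum"):
    """Roll monthly observations up to one value per year.

    monthly: dict mapping (year, month) -> numeric value (assumed layout)
    how:     "sum" for annual totals, "mean" for annual averages
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")

    # Group the monthly values by year.
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)

    reducer = sum if how == "sum" else mean
    return {year: reducer(values) for year, values in sorted(by_year.items())}


# Hypothetical monthly figures, for illustration only (not real agency data).
monthly = {
    (1999, 1): 10.0,
    (1999, 2): 12.0,
    (2000, 1): 20.0,
    (2000, 2): 22.0,
}

annual_totals = to_annual(monthly, "sum")
annual_means = to_annual(monthly, "mean")
print(annual_totals)
print(annual_means)
```

The same grouping idea would apply to the other two dimensions on the slide (geographical region, product), with the grouping key swapped accordingly.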

8 Data & Metadata - Hard to Use
–Proliferation of terms across many domains.
–Different definitions across agencies for similar concepts.
–Lengthy, dense, and technical definitions.
–Buried information in notes, appendices, or cross-references.
–Metadata not linked to data or other metadata.

9 Solution: Automate, Standardize, Organize…
–Automate: Machine-read metadata terms: definitions, documentation, etc.
–Standardize: Merge terms into a standardized terminology of data terms, automatically.
–Organize: Organize terms into an “ontology” that links definitions into a conceptual view of a domain of knowledge, e.g., the available data.
[Diagram label: Ontology]
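The "Standardize" step can be pictured as merging per-agency glossaries under a shared key. The sketch below uses plain string normalization, which is a stand-in and not the language-sensitive analysis the slides describe; the function names `normalize` and `merge_glossaries`, the glossary layout, and the sample definitions are hypothetical.

```python
import re
from collections import defaultdict


def normalize(term):
    """Lowercase a term, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", term.lower())
    return " ".join(cleaned.split())


def merge_glossaries(glossaries):
    """Merge {agency: {term: definition}} into {normalized term: {agency: definition}}.

    Definitions from different agencies are kept side by side, so that
    differing definitions of the same concept remain visible.
    """
    merged = defaultdict(dict)
    for agency, entries in glossaries.items():
        for term, definition in entries.items():
            merged[normalize(term)][agency] = definition
    return dict(merged)


# Placeholder glossary entries, for illustration only (not real agency text).
glossaries = {
    "EIA": {"Crude Oil": "placeholder definition A"},
    "BLS": {"crude-oil": "placeholder definition B", "Wages": "placeholder definition C"},
}

merged = merge_glossaries(glossaries)
for term, by_agency in sorted(merged.items()):
    print(term, "->", sorted(by_agency))
```

Keeping the agency name alongside each definition is a deliberate choice here: it preserves the cross-agency differences that the previous slide lists as a problem, instead of silently picking one definition.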

10 Glossary Analysis
–Create framework into which text will be analyzed
–Extract ontological information applying language-sensitive analysis tools
–Gather glossaries, thesauri, definitions from government agencies

11

12 www.cs.columbia.edu/digigov www.dgrc.org

