
1 The GovStat Project
ils.unc.edu/govstat
Integration of Data and Interfaces to Enhance Human Understanding of Government Statistics: Toward the National Statistical Knowledge Network
Carol A. Hert, Syracuse University
NSF Grants EIA 0131824 and EIA 0129978
Principal Investigators: Gary Marchionini, Stephanie Haas, Ben Shneiderman, Catherine Plaisant, and Carol Hert
Gov Stat

2 Project Partners
Bureau of Labor Statistics
Census Bureau
National Center for Health Statistics
Social Security Administration
National Agricultural Statistics Service
Energy Information Administration

3 Project Goals
To create an integrated model of user access to and use of US government statistical information (the Statistical Knowledge Network)
To design and test prototype interface tools to support finding and using statistics
To support integration (technical and intellectual) of statistical data

4 Statistical Knowledge Network Architecture
[Diagram. Labels: Agencies; SKN Registry; SKN Consortium; Ontology; Rules & Constraints; Private Work Space (Objects, Actions), repeated for several users]
Actions: Contribute, Find, Display, Annotate, Understand, Manipulate, Collaborate, …
Objects: Reports metadata, Tables metadata, People metadata, Glossary, Annotations, …

5 Statistical Knowledge Network Architecture
Enable statistical agencies to:
–Reach wider audiences
–Standardize strategies for transmission, retrieval & use
–Reduce costs
–Facilitate cooperation among agencies & organizations
Goal: Increase find-ability, understand-ability & use of government statistics

6 Metadata as a Linchpin of Integration of Diverse Statistical Information
Metadata during statistical information seeking
User studies of statistical information use
Building a schema to support these activities
A hierarchy of integration (and the metadata to support it)
With a few closing words on technology transfer!

7 Metadata for Statistical Information Seeking
The user challenges:
–Who has the relevant data? (decentralized statistical system)
–Finding data that map to the set of topical, time period, geographic and other requirements
Interface tool relying on metadata (currently harvested automatically from webpages)
–Supports exploration prior to access
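
The slide does not say how the metadata is harvested from webpages. As a minimal sketch of one way it could be done, the following collects a page's title and its named `<meta>` tags with Python's standard `html.parser`. The class name `MetaHarvester`, the function `harvest`, and the sample page are all hypothetical and are not taken from the GovStat tools.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs of a page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Keep only named meta tags that actually carry content.
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()


def harvest(html_text):
    """Return the harvested metadata of one page as a plain dict."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical page, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Example statistics page</title>
<meta name="keywords" content="energy, prices">
</head><body></body></html>"""

print(harvest(SAMPLE_PAGE))
```

Records of this kind, gathered across an agency's pages, are the sort of input a browsing tool could use to show what is available before the user opens any page.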

8 .gov Relation Browser with all EIA pages

9 User Studies of Metadata and Statistical Information Use
1. Metadata requirements for understanding tables (Hert & Hernández, 1999).
2. Metadata requirements in a variety of integration tasks (Denn, Haas, & Hert, 2003).
3. Statistical comparisons, particularly the types of comparisons made and the rules experts employ during those comparison processes (Hert, 2004).

10 Some insights from the studies
Some types of metadata needed:
–Definitions
–Survey methodology
–Rationales and information on differences (what is the difference between concept 1 and concept 2)
–Currency of information (what’s the latest data I can get, when will more data be available, etc.)
–Table structure
–Interface design
Supporting use requires significant amounts of metadata, including some not easily generated (automatically or otherwise)

11 Some insights from the studies
Comparing is a key activity in integrating statistics
Business rules for operating on the metadata necessary to support user tasks
Metadata supports help tools; help tools will be necessary to support metadata usage

12 Metadata Schema Philosophy
To provide sub-document level access and integration across documents and agencies.
To provide a minimal set of metadata elements necessary while allowing for extensibility.
To achieve these goals in a manner that enables efficient transfer to agencies.

13 Our Schema in Action: An Example
Scenario: The fact that the percentage of older people in the population of the US is increasing raises a question about the overall economic status of this group. In particular, we are interested in people who are retired or no longer in the work force and over a certain age (65 or older). We want to know the following things to understand the economic status of this particular group of people:
–Income level (in terms of median income) compared to the general (whole) population
–Sources of income
–Employment status

14 Examples from the Markup
Table markup:
–For each table, the schema encodes the table title, each row or column heading, and the data values in the table. Each data value element references the row and column heading elements associated with it. Footnotes are encoded at the highest level to which they apply: the table level, the row/column level, or the individual data value level.
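
The structure described on this slide can be sketched as a small data model. This is an illustration only, not the project's schema: the class and field names (`Heading`, `DataValue`, `Table`, `notes_for`) are hypothetical, and the sample identifiers, labels, and the value 19 are copied from the example slides later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    id: str                     # e.g. "r002" for a row, "c003" for a column
    label: str
    footnotes: list = field(default_factory=list)   # row/column-level notes


@dataclass
class DataValue:
    row: str                    # id of the row heading this value refers to
    col: str                    # id of the column heading this value refers to
    text: str
    footnotes: list = field(default_factory=list)   # value-level notes


@dataclass
class Table:
    title: str
    headings: dict = field(default_factory=dict)    # heading id -> Heading
    values: list = field(default_factory=list)
    footnotes: list = field(default_factory=list)   # table-level notes

    def notes_for(self, value):
        """Gather every note that applies to a value, from the table level down."""
        row = self.headings[value.row]
        col = self.headings[value.col]
        return self.footnotes + row.footnotes + col.footnotes + value.footnotes


table = Table(
    title="Percentage with income from specified source, by age, "
          "marital status, and sex of nonmarried persons"
)
for heading in [
    Heading("r002", "Source of Income - Earnings - Wages and salaries"),
    Heading(
        "r005",
        "Source of Income - Retirement benefits - Social Security",
        ["Social Security includes retired-worker benefits, dependents' or "
         "survivors' benefits, disability benefits, transitionally insured "
         "benefits, or special age-72 benefits"],
    ),
    Heading("c003", "Aged 65 or older Total All units"),
]:
    table.headings[heading.id] = heading

value = DataValue(row="r002", col="c003", text="19")
table.values.append(value)

print(table.headings[value.row].label, "|", table.headings[value.col].label, "|", value.text)
print(table.notes_for(value))
```

Because the Social Security note is stored once on heading r005, any value in that row would pick it up through `notes_for`, while the wages-and-salaries value shown here has no applicable notes.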

15 Examples from the Markup
Table 1.1 Percentage with income from specified source, by age, marital status, and sex of nonmarried persons
Source of Income - Earnings r001
Source of Income - Earnings - Wages and salaries r002
Source of Income - Earnings - Self-employment r003
Source of Income - Retirement benefits r004
Source of Income - Retirement benefits - Social Security r005
(Footnote: Social Security includes retired-worker benefits, dependents' or survivors' benefits, disability benefits, transitionally insured benefits, or special age-72 benefits)
...
In order to preserve category information, individual row and column headings include the category labelling. Including the category labelling within the row/column headings improves access to data embedded within tables by making the category information searchable.
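
To illustrate how category labelling could be carried into each heading, here is a minimal sketch that flattens a nested set of row categories into full labels joined with " - ", as in the slide. The nested-tuple representation, the function names, and the sequential id numbering are assumptions made for the example; the slides do not say how the project generates labels or ids.

```python
def flatten(tree, prefix=()):
    """Yield one full label per node, depth first, joining ancestors with ' - '."""
    for name, children in tree:
        path = prefix + (name,)
        yield " - ".join(path)
        yield from flatten(children, path)


def label_rows(tree, stub):
    """Assign sequential row ids (an assumption) to the flattened labels."""
    labels = flatten(tree, (stub,))
    return {f"r{n:03d}": label for n, label in enumerate(labels, start=1)}


# Row categories as they would be nested visually in Table 1.1.
ROW_TREE = [
    ("Earnings", [
        ("Wages and salaries", []),
        ("Self-employment", []),
    ]),
    ("Retirement benefits", [
        ("Social Security", []),
    ]),
]

rows = label_rows(ROW_TREE, "Source of Income")
for row_id, label in rows.items():
    print(row_id, label)

# A keyword search now finds sub-rows through their category as well.
earnings_rows = [rid for rid, label in rows.items() if "earnings" in label.lower()]
print(earnings_rows)
```

Searching for "earnings" returns r001, r002, and r003, even though the visible labels of r002 and r003 in the printed table would only read "Wages and salaries" and "Self-employment".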

16 Examples from the Markup (cont.)
Table 3. Comparison of Summary Measures of Money Income and Earnings by Selected Characteristics: 2001 and 2002
Source: US Census Bureau, Current Population Survey, 2002 and 2003 Annual Social and Economic Supplements
Households and people as of March of the following year
Age of Householder - 65 years and over r015
2002 - Median money income - value dollars c005
23,152
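
As a rough sketch of what the references buy, the following resolves the data value above into a self-describing record by following its row and column ids. The dictionary layout and the `describe` function are hypothetical, and treating "dollars" as a unit separate from the column label is an assumption about how the flattened transcript should be read.

```python
TABLE_3 = {
    "title": "Comparison of Summary Measures of Money Income and Earnings "
             "by Selected Characteristics: 2001 and 2002",
    "source": "US Census Bureau, Current Population Survey, 2002 and 2003 "
              "Annual Social and Economic Supplements",
    "headings": {
        "r015": "Age of Householder - 65 years and over",
        "c005": "2002 - Median money income - value",
    },
    # Assumption: "dollars" is the unit attached to column c005.
    "units": {"c005": "dollars"},
    "values": [{"row": "r015", "col": "c005", "text": "23,152"}],
}


def describe(table, value):
    """Return a flat record pairing a data value with the metadata it references."""
    return {
        "table": table["title"],
        "source": table["source"],
        "row": table["headings"][value["row"]],
        "column": table["headings"][value["col"]],
        "unit": table["units"].get(value["col"]),
        "value": value["text"],
    }


record = describe(TABLE_3, TABLE_3["values"][0])
for key, text in record.items():
    print(f"{key}: {text}")
```

The record can be indexed or displayed on its own, since the title, source, and both headings accompany the number.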

17 Examples from the Markup (cont.)
From Table 3:
Age of Householder - 65 years and over r015
2002 - Median money income - value dollars c005
23,152
From Table 1.1:
Aged 65 or older Total All units c003
Source of Income - Earnings - Wages and salaries r002
19
Note that since both sets of headings contain keywords for age 65 or older, we can begin to think about ways to integrate these data.
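
One crude way to surface such heading overlaps is to compare keyword sets. The sketch below is an assumption-laden illustration, not the project's method: the stopword list and function names are invented for the example, and matching is on exact tokens only.

```python
import re

# Hypothetical stopword list; a real one would be tuned to table headings.
STOPWORDS = {"of", "or", "and", "the", "all", "total"}


def keywords(label):
    """Lower-case a heading and keep its alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", label.lower()) if w not in STOPWORDS}


def shared_keywords(label_a, label_b):
    """Return the keywords two headings have in common."""
    return keywords(label_a) & keywords(label_b)


table_3_row = "Age of Householder - 65 years and over"
table_1_1_col = "Aged 65 or older Total All units"

print(shared_keywords(table_3_row, table_1_1_col))
```

Exact matching links the two headings only through "65"; it misses the "Age"/"Aged" and "over"/"older" pairs, which suggests that normalization or a shared vocabulary would be needed before such matches could drive integration.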

18 What the Example Demonstrates
Access: preserving data from table titles, row/column headings, and footnotes allows metadata essential for understanding to travel with the data values, and aids in search and retrieval
Integration: once this essential metadata is tagged, tag similarities make it easier to investigate options for displaying data from different tables in an integrated manner.

19 A Hierarchy of Integration
[Diagram: a scale from a low level of integration (limited amount of metadata) to a high level of integration (increasing amounts of metadata), with the following items]
Searchable table titles
Searchable row and column headings
Linking of data values to row and column headings
Linking of row and column headings to underlying survey variables
Linking of analysis units, universe statements, concept definitions, across documents and agencies
Linking of contextual information (such as footnotes) to tables, row/column headings, or data values
Annotation on the diagram: Our schema can provide the items beneath this dotted line.

20 Using the Hierarchy of Integration
[Hierarchy of integration diagram, as on slide 19]
Organization can determine where to “sit” on this hierarchy in terms of effort and level of integration desired
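
The trade-off between metadata effort and integration can be sketched as a simple capability check. The capability names below are taken from the diagram, but the metadata each one is said to require is an assumption made for this example, and no low-to-high ordering is implied because the transcript does not preserve the diagram's layout.

```python
# Capability -> kinds of metadata assumed (hypothetically) to be required for it.
REQUIREMENTS = {
    "Searchable table titles": {"table titles"},
    "Searchable row and column headings": {"row and column headings"},
    "Linking of data values to row and column headings": {
        "row and column headings", "value-to-heading references"},
    "Linking of contextual information to tables, headings, or data values": {
        "footnotes", "value-to-heading references"},
    "Linking of row and column headings to underlying survey variables": {
        "row and column headings", "survey variables"},
    "Linking of analysis units, universe statements, concept definitions": {
        "analysis units", "universe statements", "concept definitions"},
}


def attainable(available):
    """List the capabilities whose assumed metadata requirements are all met."""
    return [name for name, needed in REQUIREMENTS.items() if needed <= set(available)]


# An organization that has only marked up titles and headings so far.
for capability in attainable({"table titles", "row and column headings"}):
    print(capability)
```

Adding a kind of metadata to the available set and rerunning the check shows which further capabilities that extra effort would unlock.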

21 Using the Hierarchy of Integration
[Hierarchy of integration diagram, as on the previous slide]

22 What have we learned about technology transfer
Must demonstrate utility of research with working prototypes
–Relationship Browser (and other interface tools)
–Metadata workstation in development
Agencies need simplicity, or need to understand the value of complexity, in order to readjust resources
–Hierarchy of integration used as a conceptual tool
–Provide training

23 Further information
Cahert@syr.edu
Project website (including demos of Relationship Browser, an interactive glossary tool, etc.) at http://ils.unc.edu/govstat

