The Web-based Data Collection in the Italian Population and Housing Census Leonardo Tininini and Antonino Virgillito ISTAT Meeting on the Management of.


1 The Web-based Data Collection in the Italian Population and Housing Census Leonardo Tininini and Antonino Virgillito ISTAT Meeting on the Management of Statistical Information Systems (MSIS 2012) Washington DC - May 21-23, 2012

2 Tininini and Virgillito - The Web-based Data Collection in the Italian Population and Housing Census - MSIS 2012
The Census Web-based Information System
SGR: the Census management system
– assignment of households to enumerators
– monitoring of collection activities, particularly of questionnaires collected in the various possible ways (online, municipal collection centers, post offices, enumerators)
– visualization of some key indicators (a kind of data warehouse on the collection process)
– Census to Local Population Registries comparison and re-alignment
– …
RETE: the online documentation for operators
QPOP: the online questionnaire
– the main topic of this presentation

4 QPOP: the main requirements
To be used by both citizens (self-compilation) and operators (online data entry)
Tight integration with the SGR Census Management System, in particular with its workflow
Easy to use, fast and scalable
Assisting users in following the correct compilation rules (without bothering them)
Multi-language (Italian, German and Slovenian)
Immediate coding of open questions (textual in the paper version)
Re-using already available applications was therefore almost impossible

5 The application design
GUI: JSP pages implementing the graphical user interface. They can be forms for sending data to the server (processed by an action) and/or the results of an action execution.
Actions: Java classes whose execution is triggered by an HTTP call, activated by a form submission on the GUI. They receive data from the HTTP request and execute some server-side processing by calling Services.
Services: Java classes that implement database transactions, realized through sequences of calls to DAOs.
Data Access Objects (DAOs): Java classes that implement so-called CRUD (Create-Read-Update-Delete) database operations related to one or more domain objects.
Entities: Java classes representing records of one database table.
[Diagram: layers GUI, Actions, Services, DAOs, Entities; frameworks Struts2, Spring, Hibernate]
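The layering described on this slide can be illustrated with a minimal sketch. The real application is written in Java on Struts2, Spring and Hibernate; the Python below only mirrors the division of responsibilities. All class names, the `Answer` fields and the in-memory storage are assumptions made for illustration, not part of QPOP.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Answer:
    """Entity: one record of a hypothetical answers table."""
    questionnaire_id: int
    question_code: str
    value: str


class AnswerDao:
    """DAO: CRUD-style operations on Answer (in-memory stand-in for a database)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, str], Answer] = {}

    def save(self, answer: Answer) -> None:
        self._rows[(answer.questionnaire_id, answer.question_code)] = answer

    def find(self, questionnaire_id: int, question_code: str) -> Optional[Answer]:
        return self._rows.get((questionnaire_id, question_code))


class AnswerService:
    """Service: one logical transaction realized as a sequence of DAO calls."""

    def __init__(self, dao: AnswerDao) -> None:
        self._dao = dao

    def store_answers(self, questionnaire_id: int, answers: Dict[str, str]) -> int:
        for code, value in answers.items():
            self._dao.save(Answer(questionnaire_id, code, value))
        return len(answers)


class SaveAnswersAction:
    """Action: receives the request parameters and delegates to a Service."""

    def __init__(self, service: AnswerService) -> None:
        self._service = service

    def execute(self, request_params: Dict[str, str]) -> str:
        questionnaire_id = int(request_params["questionnaire_id"])
        answers = {k: v for k, v in request_params.items() if k != "questionnaire_id"}
        saved = self._service.store_answers(questionnaire_id, answers)
        return f"saved {saved} answers"
```

A form submission would then reach `SaveAnswersAction.execute`, which never touches storage directly; only the DAO does.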

6 A metadata-driven application
The leading principle: write more metadata, write less (more generalized) programming code
Metadata to specify the type (single choice, multi-response, textual input, data, etc.) of a question
– Questions sharing the same type are handled by the same pieces of Java code (templates)
– The whole processing chain from HTML forms down to DB records (and vice versa) is automatically handled
Metadata to specify (multi-language) texts in all GUI fragments
BUT ALSO
Metadata to specify question routing
– Based on the concept of Questionnaire Graph
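As a rough illustration of the metadata-driven principle, the sketch below dispatches every question to a handler chosen only by its declared type, so adding a question means adding metadata rather than code. The metadata layout, question codes, option values and handler names are all hypothetical; the actual templates in QPOP are Java code.

```python
from typing import Callable, Dict, List

# Hypothetical question metadata: type, allowed options, multi-language texts.
QUESTIONS: Dict[str, dict] = {
    "q1": {"type": "single_choice", "options": ["1", "2", "3"],
           "text": {"it": "Sesso", "de": "Geschlecht"}},
    "q2": {"type": "multi_response", "options": ["a", "b", "c"]},
    "q3": {"type": "text"},
}


def parse_single_choice(meta: dict, raw: List[str]) -> str:
    if len(raw) != 1 or raw[0] not in meta["options"]:
        raise ValueError("exactly one valid option expected")
    return raw[0]


def parse_multi_response(meta: dict, raw: List[str]) -> str:
    if any(value not in meta["options"] for value in raw):
        raise ValueError("unknown option")
    return ",".join(sorted(raw))


def parse_text(meta: dict, raw: List[str]) -> str:
    return raw[0].strip() if raw else ""


# One handler ("template") per question type, shared by all questions of that type.
TEMPLATES: Dict[str, Callable[[dict, List[str]], str]] = {
    "single_choice": parse_single_choice,
    "multi_response": parse_multi_response,
    "text": parse_text,
}


def form_to_record(form: Dict[str, List[str]]) -> Dict[str, str]:
    """Convert submitted form values into a storable record, driven by metadata."""
    record = {}
    for code, raw in form.items():
        meta = QUESTIONS[code]
        record[code] = TEMPLATES[meta["type"]](meta, raw)
    return record


def label(code: str, lang: str) -> str:
    """Look up the question text for the requested language."""
    return QUESTIONS[code]["text"][lang]
```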

7 The Questionnaire Graph (QG)
The basic idea: formally modeling the structure of the questionnaire and the correct set and sequence of questions to be filled in by respondents
A Questionnaire Graph (QG) in QPOP is a Directed Acyclic Graph (DAG), such that:
– Nodes N_i are in 1-1 correspondence with the questionnaire fragments (mainly questions, but not only)
– Node types correspond to templates (which in turn determine appearance and behavior)
– Edge labels represent conditions on questions (e.g. “Has the respondent checked option 2 of question X?”)
– A (directed) labeled edge from node (question) N_i to node (question) N_j means that the user has to respond to question N_j after having responded to N_i, if the condition expressed by the edge label is true
The QG is used by the application (on both client and server side) to enable and disable questions on the web page and to validate the user’s input before saving the answers in the microdata tables, i.e. to enforce consistency
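The QG definition above can be sketched as a small data structure. This is one possible reading, not the QPOP implementation: the class, its methods, the single-root assumption and the rule that an unanswered node keeps its successors disabled are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

Answers = Dict[str, str]
Condition = Callable[[Answers], bool]


class QuestionnaireGraph:
    """A DAG of questions whose edges carry conditions on the answers."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.edges: Dict[str, List[Tuple[str, Condition]]] = {}

    def add_edge(self, src: str, dst: str, condition: Condition) -> None:
        self.edges.setdefault(src, []).append((dst, condition))

    def enabled_nodes(self, answers: Answers) -> Set[str]:
        """Nodes reachable from the root through answered nodes and true conditions."""
        enabled = {self.root}
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node not in answers:
                continue  # unanswered: its successors stay disabled
            for dst, condition in self.edges.get(node, []):
                if dst not in enabled and condition(answers):
                    enabled.add(dst)
                    stack.append(dst)
        return enabled

    def validate(self, answers: Answers) -> List[str]:
        """Report answers to disabled questions and enabled questions left blank."""
        enabled = self.enabled_nodes(answers)
        errors = [f"{q}: answered but disabled" for q in sorted(answers) if q not in enabled]
        errors += [f"{q}: enabled but unanswered" for q in sorted(enabled) if q not in answers]
        return errors


# Hypothetical routing: Q1 = "1" leads to Q2, Q1 = "2" leads to Q3, both lead to Q4.
qg = QuestionnaireGraph(root="Q1")
qg.add_edge("Q1", "Q2", lambda a: a["Q1"] == "1")
qg.add_edge("Q1", "Q3", lambda a: a["Q1"] == "2")
qg.add_edge("Q2", "Q4", lambda a: True)
qg.add_edge("Q3", "Q4", lambda a: True)
```

The same `enabled_nodes` result could drive the enabling and disabling of page elements on the client, while `validate` corresponds to the server-side check before saving.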

8 From the questionnaire to the QG

9 En(Dis-)abling questions by updating QG node states (1)

10 En(Dis-)abling questions by updating QG node states (2)

11 En(Dis-)abling questions by updating QG node states (3)

12 The search engine for assisted coding
Search string: MATHEAMTICS DEGREE
Reference dictionary:
...
72001001 Degree in Astronomy
72001002 Degree in Chemistry
72001003 Degree in Mathematics
72001004 Degree in Physics
...
Code to assign: 72001003 ?

13 Reference dictionary pre-processing
1. Character normalization: accented letters are replaced with the corresponding unaccented version, uppercase letters with lowercase ones; other characters, like punctuation marks, are removed
2. Stopword removal: “useless” words are removed from the character-normalized version of the items produced in the previous step. Both “general” stopwords (conjunctions, articles, etc.) and “context-specific” ones (e.g. the word “degree”, when considering a list of academic degrees) are removed
3. Search terms extraction and weighting: the single terms (words) constituting the normalized items produced by the previous two steps are extracted and stored in the search engine DB tables. A weight is also assigned to each term, depending on its relative frequency inside the dictionary
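The three pre-processing steps can be sketched as follows. The stopword lists, the replacement of punctuation with spaces, and the inverse-frequency weighting formula are assumptions; the slide only says that the weight depends on the term's relative frequency in the dictionary. The last dictionary entry is a made-up item added so that one term occurs twice.

```python
import math
import re
import unicodedata
from collections import Counter
from typing import Dict, List, Tuple

# Assumed stopword lists: "general" plus "context-specific" for academic degrees.
STOPWORDS = {"in", "of", "and", "the"} | {"degree"}


def normalize(text: str) -> str:
    """Step 1: strip accents, lowercase, turn other characters into spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())


def extract_terms(item: str) -> List[str]:
    """Step 2: split the normalized item into words and drop stopwords."""
    return [word for word in normalize(item).split() if word not in STOPWORDS]


def build_index(dictionary: Dict[str, str]) -> Tuple[Dict[str, List[str]], Dict[str, float]]:
    """Step 3: extract the terms of each item and weight rarer terms higher."""
    item_terms = {code: extract_terms(text) for code, text in dictionary.items()}
    doc_freq = Counter(term for terms in item_terms.values() for term in set(terms))
    n_items = len(dictionary)
    # Assumed formula: terms found in few items get a larger weight.
    weights = {term: math.log(1 + n_items / df) for term, df in doc_freq.items()}
    return item_terms, weights


DICTIONARY = {
    "72001001": "Degree in Astronomy",
    "72001002": "Degree in Chemistry",
    "72001003": "Degree in Mathematics",
    "72001004": "Degree in Physics",
    "X0000001": "Degree in Applied Mathematics",  # hypothetical entry, not from the slides
}
item_terms, weights = build_index(DICTIONARY)
```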

14 Search string processing
4. Similarity search: each (normalized) term to be searched is compared with those in the database; the terms that produce a similarity above a given (relatively high) threshold are passed to the following step
5. Extraction of the dictionary items: the dictionary items containing one or more terms obtained in the previous step are extracted from the DB. At the same time, for each item extracted, some values are either read or computed, which will be used in the following step
6. Dictionary item sorting: by using the values extracted/computed in the previous step, the score of each item in the result set is computed and the list is sorted accordingly in descending order. This sorted list is proposed to the respondent
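Steps 4 to 6 can be sketched on the misspelled search string from the earlier slide. The slides do not name the similarity measure or the scoring formula, so this example substitutes `difflib.SequenceMatcher` from the Python standard library and a simple "weight times similarity" score. The threshold value, the term weights and the in-memory index are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical pre-processed index: item code -> terms, and term -> weight.
ITEM_TERMS: Dict[str, List[str]] = {
    "72001001": ["astronomy"],
    "72001002": ["chemistry"],
    "72001003": ["mathematics"],
    "72001004": ["physics"],
}
TERM_WEIGHTS: Dict[str, float] = {
    "astronomy": 1.0, "chemistry": 1.0, "mathematics": 1.0, "physics": 1.0,
}
STOPWORDS = {"degree", "in"}
THRESHOLD = 0.8  # assumed "relatively high" similarity threshold


def similar_terms(query_term: str) -> Dict[str, float]:
    """Step 4: keep the indexed terms whose similarity reaches the threshold."""
    matches = {}
    for term in TERM_WEIGHTS:
        similarity = SequenceMatcher(None, query_term, term).ratio()
        if similarity >= THRESHOLD:
            matches[term] = similarity
    return matches


def search(query: str) -> List[Tuple[str, float]]:
    """Steps 5 and 6: extract the matching items, score them, sort descending."""
    query_terms = [w for w in query.lower().split() if w not in STOPWORDS]
    matched: Dict[str, float] = {}
    for query_term in query_terms:
        for term, similarity in similar_terms(query_term).items():
            matched[term] = max(similarity, matched.get(term, 0.0))
    scored = []
    for code, terms in ITEM_TERMS.items():
        score = sum(TERM_WEIGHTS[t] * matched[t] for t in terms if t in matched)
        if score > 0:
            scored.append((code, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = search("MATHEAMTICS DEGREE")
```

With this index, the transposed letters in "MATHEAMTICS" still match "mathematics", so the first proposed code is 72001003.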

15 QPOP in (a few) figures

16 Future (current) work
Questionnaires for the Industry and Services Census comprising:
– Businesses (2 “fairly similar” questionnaires implemented as one with “special” routing conditions)
– Non-profit institutions
More general question templates
More general checks and routing conditions (support for existential and universal quantifications, as well as counting)
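To make the last item concrete, the sketch below shows what existential, universal and counting conditions over a repeated section could look like. The function names and the example records (a business with several local units) are hypothetical; the slides do not describe how such conditions are expressed in QPOP.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]
Predicate = Callable[[Record], bool]


def exists(records: List[Record], predicate: Predicate) -> bool:
    """Existential condition: at least one record satisfies the predicate."""
    return any(predicate(r) for r in records)


def for_all(records: List[Record], predicate: Predicate) -> bool:
    """Universal condition: every record satisfies the predicate."""
    return all(predicate(r) for r in records)


def count(records: List[Record], predicate: Predicate) -> int:
    """Counting condition: number of records satisfying the predicate."""
    return sum(1 for r in records if predicate(r))


# Hypothetical repeated section: the local units of one business.
units = [{"employees": 12}, {"employees": 0}, {"employees": 3}]


def has_employees(unit: Record) -> bool:
    return unit["employees"] > 0
```

A routing condition could then read "enable the question if `count(units, has_employees) >= 2`", instead of referring to a single answer.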

