Internet dissemination. Part I. Storing and retrieving data. Disseminating statistics: Internet and Publications. Madrid, 3-5 March 2008


1 Internet dissemination. Part I. Storing and retrieving data. Disseminating statistics: Internet and Publications. Madrid, 3-5 March 2008. European Statistical Training Programme
– Storing data. Structure for dissemination
– Different data storage formats
– Data retrieval and presentation

2 Different data storage formats

3 Different designs for cities
Chaotic urbanization (old towns): Madrid (Spain), city centre; Toledo (Spain), city centre

4 Different designs for cities
Organized development (new districts and towns): Madrid, a modern district; Manhattan, NY

5 Different designs for cities
... most often, old and new urban districts must coexist (Madrid)
New cities can be designed in a ‘structured’ way (Tres Cantos, Spain, 1970; Brasilia, Brazil, 1956), but...
And what about storing statistical data for dissemination? Is there a best solution?

6 Introduction
– Questions to answer
– When the dissemination stage begins
– Different storage formats
– SDMX-ML Standard
– Issues to address
– The role of metadata
– Document structure normalisation
– Example of an application with unstructured data
– Example of a tool for structuring data

7 Introduction
– The role of metadata
– Experiences of document structure normalisation applied to statistical dissemination: from ordinary files to multidimensional databases
– Example of tools for structuring information to be disseminated: PX-Make

8 Questions to answer
How to structure statistical information in order to disseminate it better, and therefore:
– metadata,
– different structuring and storage formats,
– and some information technologies for dissemination
and try to answer some questions:
– Must we always think big?
– Should we use the latest and most powerful dissemination technology?
– Must we try to use one single technology?
– Are DW systems the best, or the only, dissemination technology?

9 Questions to answer
Anticipating the content of the presentation… The INE’s answer to those questions is based on our own experience and on our own restrictions (resources, time, etc.), and the answer to all of them is NO, because:
– Some systems may demand a large prior investment of time and resources, and then not be sufficiently dynamic
– Each type of statistical information may require a different dissemination technology
– It is possible, under a single “brand and aspect” (INEbase in our case), to group data from very different statistical operations, applying different dissemination technologies while still achieving very similar interfaces for final users

10 When the dissemination stage begins
The day that we finish tabulating a survey or statistic, it seems as though the work is done, but… there is still time and work needed before the statistics are disseminated. This is our situation:
– 1 day for the press release (on paper, fax and the Internet)
– 1 day to post all the content from the tables database and the time series database on the Internet
– 10 days to edit a diskette or CD-ROM, including replication
– 1 month for the book

11 When the dissemination stage begins
In order to comply with these deadlines and shorten the time taken by editorial operations, it is necessary to begin the dissemination project a long time in advance of the tabulation process. Therefore:
– it is useful for us to know as much as possible about data, metadata, formats, methods, dissemination techniques and standards
– it is to this that we dedicate part of this presentation

12 Document structure normalisation
Unstructured formats, not enriched with metadata, not particularly focused on computer processing or statistical dissemination:
– easy, quick and cheap to produce
– poor informative content
– very limited computer processing
Structured formats, programs and standard methods:
– less easy or cheap
– quick to produce (…it can be obtained)
– rich dissemination media, secure and stable
– with a guarantee of being able to address new requirements
– easy to automate dissemination processes

13 Document structure normalisation
Example of a non-structured document or file: a press release (.doc or .pdf)
Possible ‘processing’ of this document or file:
– Reading
– Printing
– Or ‘photocopying’

14 Document structure normalisation
Basically, what we are going to see in this presentation is that, as the degree of structuring chosen increases:
1. The visual and presentation performance of the format we are using will increase
2. The complexity and the human and economic cost of implementing that solution will increase
We will also see that, on a website (ours in this case), different formats can share the same storage system with no problems and be used for different purposes or types of statistical products

15 Document structure normalisation
Unstructured formats:
– Not enriched with metadata (there will be a different session dealing exclusively with metadata)
– Not particularly focused on statistical dissemination: Adobe Acrobat PDF, text and spreadsheets, static HTML pages
– We can certainly “get by” with them

16 Document structure normalisation. Unstructured formats
PDF, XLS … (no comment will be made about them)
Use of static HTML pages online and in offline publications
Advantages:
– Documents with a statistical table aspect can be “shown” using HTML table tags
– There are many functions available for formatting text, although not so many for organising tables
– Both static and dynamic HTML pages can be created (dynamic pages are usually generated with the help of “CGI” type programs, which send logical queries to databases and file servers, or with other online data access technologies such as ASP or PSP)
– All Office tools offer the possibility of editing static HTML pages

17 Document structure normalisation. Unstructured formats
Use of static HTML pages online and in offline publications
Disadvantage:
– HTML enables us to “show” metadata,
– but it will not manage it for us in our best interests (for instance as conventions regarding the meaning of different parts of the information, which would be useful for computer presentation processes); it treats metadata as just another piece of text
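The disadvantage above can be illustrated with a minimal Python sketch. It assumes a hypothetical page fragment built with `table`, `tr` and `td` tags (the fragment and the class name `CellCollector` are illustrative, not taken from the INE site). Reading the cells back yields only strings: nothing in the markup says which cell is a classification label and which is a figure.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every table cell, in document order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data.strip()


# Hypothetical static page fragment (labels translated for readability).
PAGE = (
    "<table>"
    "<tr><td>Both sexes</td><td>Active</td></tr>"
    "<tr><td>Aged 16 to 19</td><td>9.0</td></tr>"
    "</table>"
)

collector = CellCollector()
collector.feed(PAGE)
cells = collector.cells  # labels and the figure arrive as undifferentiated text
```

Any program that wants to pivot or search such a table has to guess the role of each cell, which is the gap that metadata-based formats are meant to close.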

18 Document structure normalisation. Unstructured formats
Example of HTML source code:
Población de 16 y más años por sexo, grupos de edad (4) y relación con la actividad económica.
Población > 16 años, Activos, Ocupados, Parados, Parados que buscan primer empleo, Inactivos, Población contada aparte
Ambos sexos
Total
...
De 16 a 19 años 9.0

19 Document structure normalisation
Structured formats (based on metadata):
– Standards promoted by official statistical institutions: EDIFACT/GESMES, SDMX...
– Actual ‘de facto’ standards for disseminating statistical data (“pseudo OLAP” readers: PC-Axis, SuperTABLE, EVA, Navidata, Beyond 20/20 ®)
– Conventional databases, with the capacity to store data and metadata and to dynamically generate the required information
– Multidimensional systems, or OLAP as they are actually called, for storage and dissemination of data: the Data Warehouse approach

20 Structured formats. The key question: the role of metadata
Is it necessary to spend time and resources structuring statistical data files or creating costly databases for dissemination?
1.- Automating data presentation tasks and achieving productivity gains will only be possible when the structure of the files generated is widely recognised, stable, repetitive… All of which will aid editing tasks, irrespective of the medium (paper or electronic and online publications)
2.- Presentation logic will respond to a metadata model, and metadata may be used to reinforce search functions
3.- Clear metadata documentation simplifies communication between services, producers and the dissemination unit, and enables concurrent working between organizations
4.- It will subsequently facilitate the communication of data between organizations, or to individuals, Web Services, content syndication environments...

21 Structured formats. The key question: the role of metadata
One way or another, using metadata will bring us closer to a matrix model. However, how easy is it to structure tables in multidimensional matrix form, reflecting possible variable crosses, based on the metadata used to describe them?… Not always easy; sometimes “cubist art” (using cubes) is required for complex tables such as this one:

22 Structured formats. The key question: the role of metadata
This “cubist art” demands that, besides concerning ourselves with metadata, we focus on clearly identifying the matrices that result from the tabulation process and are valid for dissemination systems. Sometimes it is necessary to manipulate a tabulated matrix:
– Dividing it into several matrices
– Combining a classification variable with a counting one
– Concatenating classification variables
– Combining the previous actions
It should be recognised that this may entail an added workload
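Two of the manipulations listed above can be sketched in Python. This is only an illustration under stated assumptions: the matrix is represented as a dictionary keyed by tuples of classification values, and the function names `concatenate` and `split_by`, like the figures, are hypothetical.

```python
def concatenate(cube, i, j, sep=" / "):
    """Merge the classification variables at positions i and j (i < j) into one."""
    merged = {}
    for key, value in cube.items():
        new_label = f"{key[i]}{sep}{key[j]}"
        rest = [k for pos, k in enumerate(key) if pos not in (i, j)]
        merged[(new_label, *rest)] = value
    return merged


def split_by(cube, i):
    """Divide a matrix into several, one per value of the variable at position i."""
    parts = {}
    for key, value in cube.items():
        parts.setdefault(key[i], {})[key[:i] + key[i + 1:]] = value
    return parts


# Illustrative cube: (sex, age group, activity) -> made-up count.
cube = {
    ("Men", "16-19", "Active"): 1,
    ("Men", "16-19", "Inactive"): 2,
    ("Women", "16-19", "Active"): 3,
    ("Women", "16-19", "Inactive"): 4,
}

by_sex_and_age = concatenate(cube, 0, 1)  # two variables become one
one_matrix_per_sex = split_by(cube, 0)    # one smaller matrix per sex
```

Both operations keep every cell value; only the way the classification variables index the cells changes, which is why the metadata describing the variables has to be updated alongside the data.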

23 Structured formats. The key question: the role of metadata
The decision:
– Have we already opted to simply produce structured files or databases with statistical tables ordered as matrices, resulting from systematically crossing variables, and accompanied by all the metadata necessary for their interpretation?...
If the answer is ‘YES’, it is necessary to talk about the available formats and procedures

24 Different storage formats. Standards promoted by official statistics institutions: EDIFACT/GESMES, EDAMIS
EDIFACT (Electronic Data Interchange for Administration, Commerce and Transport): electronic document structures promoted by the United Nations for exchanging documents electronically in the field of trade and public administrations
GESMES = GEneric Statistical MESsage
– Adaptation for statistical purposes of the EDIFACT EDI syntax
– Designed by a workgroup composed of statistics institutes, customs bodies and central banks
– Financed as part of the European Union IDA project (Interchange of Data between Administrations)
– Published in 1993
– Adapted to multidimensional “data set” structures including their own metadata
– Complete, detailed, and somewhat complex
– In use between EUROSTAT and all the national statistical institutes (INEs), on a communication system based on the “Stadium - Statel - Testa services” extranet

25 Different storage formats. Standards promoted by official statistics institutions: EDIFACT/GESMES
Basic EDIFACT syntax: segments
– An EDIFACT exchange comprises a sequence of segments
– Each segment has a unique 3-character identifier
– There are rules of order for segments
– “Entity-relation” modelling techniques were used to design the message syntax

26 Different storage formats. Standards promoted by official statistics institutions: EDIFACT/GESMES
GESMES implementations:
– ECOSER (economic time series)
– BOPSTA (balance of payments)
– PRODCOM (production data)
– CLASET (statistical classifications)
– RDRMES (raw data collection)
– GESMES / CB (central banks short term economic indicators)

27 Different storage formats. Standards promoted by official statistics institutions: EDIFACT/GESMES
Problems…
– The more popular table presentation programs often cannot export and import data in GESMES (PC-Axis can)
– Through intensive use of the internet, new technologies have emerged, particularly the XML / SDMX standard

28 Different storage formats. Standards promoted by official statistics institutions: EDIFACT/GESMES
An example of a message:
UNH GESMES:D:95A:E6'
BGM+74:::PC-AXIS Win 2.0'
DTM+137: :101'
NAD+MS+ine'
CTA+CC+:INE Difusion Fax: '
NAD+MR+eurostat'
ASI+01001'
SCD+4+sexo++++:1'
SCD+4+grupos de edad (4)++++:2'
SCD+4+relacion con la actividad economica++++:3'
SCD+3+Poblacion de 16 y mas años++++:4'
DSI+epa4t97'
GIR+5+SDB:AB+01:AC+Ejemplo.- Resultados nacionales:AD'
ARR++9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9.0:9..0.0:9.0:9.0:9.0: 9.0:9.0:9.0:++9.0:9.0:9.0:9.0:9.0'
IDE '
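The segment structure of such a message can be explored with a short Python sketch. It relies on the default EDIFACT separators (`'` ends a segment, `+` separates data elements, `:` separates components) and, as a simplifying assumption, ignores the release character `?` and any service string advice. The function name `parse_segments` and the shortened sample are illustrative only.

```python
def parse_segments(message):
    """Split an EDIFACT-style message into (tag, elements) pairs.

    Each element is returned as a list of its components.
    Simplification: escaped separators (release character '?') are not handled.
    """
    segments = []
    for raw in message.split("'"):
        raw = raw.strip()
        if not raw:
            continue
        elements = [element.split(":") for element in raw.split("+")]
        tag = elements[0][0]  # the 3-character segment identifier
        segments.append((tag, elements[1:]))
    return segments


# A few segments taken from the message above.
SAMPLE = "NAD+MS+ine'NAD+MR+eurostat'SCD+4+sexo++++:1'DSI+epa4t97'"

segments = parse_segments(SAMPLE)
tags = [tag for tag, _ in segments]
```

Reading the tags in order shows the "rules of order" at work: the parties (NAD) come before the structure description (SCD) and the data set identifier (DSI).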

29 Different storage formats. SDMX
SDMX (Statistical Data and Metadata eXchange, http://www.sdmx.org/) is an initiative promoted by the BIS (Bank for International Settlements), OECD, IMF, World Bank, European Central Bank, UN and EUROSTAT in order to promote the use of standards in exchanging statistical information between institutions
There are already pilot projects or experiences in place, such as:
– Eurostat SODI (SDMX Open Data Interchange)
– NAWWE (“The primary objective of the NAWWE project is to use a web based mechanism for collecting national accounts data based on already internationally agreed national accounts standards”)
– Mexico

30 Different storage formats. De facto standards and “pseudo OLAP” visualisers
– Not institutionally standardised, although they have come to be “de facto” standards
– Specially designed for holding and presenting data and metadata
– Reader programs mimic the functions of OLAP multidimensional data presentation (showing dimensions and hierarchies, pivoting, drilling down, nesting, showing graphics and statistical maps)
– Full metadata handling capability
Several programs, several regional markets:
– PC-Axis (Sweden): Nordic countries, UNECE, other EU countries (Spain too), South Africa, Guatemala…
– CUB X / EVA: Eurostat program
– Beyond 20/20: USA, Canada, UK, France...
– SuperTABLE: Australia

31 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
– Statistical table management application with a spreadsheet interface
– Windows environment
– Simple to handle and use by non-IT experts
– Simple to generate: written in ASCII with tags, structured, self-documented, and easy to translate to XML (adaptation to SDMX ver. 2 format underway)
– Easy to associate with Office applications

32 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
– The program has been developed by Statistics Sweden
– The GUI shows typically statistical elements: universes or contents, variables, variable and value selections that can be modified, nestings, etc.
– File generation can be fully automated:
  – By means of robots or tabulation program macros (SAS)
  – From relational or multidimensional databases containing the information (such as our Tempus 2 system)
  – Or the files can simply be displayed online using “cgi / web gateways” type programs

33 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
The table file (*.px) contains both metadata and data:
Lines of metadata …
AXIS-VERSION="2000";
CREATION-DATE=" :44";
SUBJECT-AREA="Demography";
SUBJECT-CODE="l1";
MATRIX="L10026E";
TITLE="Population of main capital cities.";
CONTENTS="Population of the largest urban agglomeration. Year 2005";
DESCRIPTION="Population of the largest urban agglomeration. Year 2005 ";
DECIMALS=0;
SHOWDECIMALS=0;
STUB="Country/Agglomeration";
HEADING="population (thousands)";
UNITS="population (thousands)";
LAST-UPDATED="26/03/07";
CONTACT="INE www.ine.es/infoine Internet:www.ine.es Tel: " "";
VALUES("Country/Agglomeration")="Afghanistan. Kabul","Albania. Tirana", "Algeria. Algiers","American Samoa. Pago Pago", …

34 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
SOURCE="Statistical Yearbook of Spain ";
COPYRIGHT=YES;
NOTE="Information Source: United Nations Demographic Yearbook. ";
VALUENOTE("Country/Agglomeration","Australia. Sydney ")=" Including Christmas Island, Cocos (Keeling) Island and Norfolk " "Island.";
VALUENOTE("Country/Agglomeration","Channel Islands. ST. Helier")="Including the islands of Guernsey and Jersey. ";
VALUENOTE("Country/Agglomeration","China. Shanghai")="For statistical purposes, the data for China do not include Hong Kong #" "and Macao Special Administrative regions (SAR) of China. #" " ";
VALUENOTE("Country/Agglomeration","Comoros (The). Moroni")="Including the island of Mayotte. ";
VALUENOTE("Country/Agglomeration","Finland. Helsinki")="Including Aland Islands. ";
VALUENOTE("Country/Agglomeration","Mauritius. Port Louis")="Including Agalega, Rodrigues and Saint Brandon. ";
VALUENOTE("Country/Agglomeration","Norway. Oslo")="Including Svalbard and Jan Mayen islands. ";
VALUENOTE("Country/Agglomeration","Saint Helena. Half Tree Hollow")="Including Ascension and Tristan da Cunha. ";
Lines of data …
DATA= …
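Because the metadata lines follow a regular KEYWORD=value; pattern, they are easy to read by program. The Python sketch below is a simplified illustration, not the PC-Axis reader itself: it assumes that statements end at a semicolon outside double quotes, keeps values as raw text (quotes included), and ignores any trailing text without a closing semicolon. The function names are hypothetical.

```python
def split_statements(text):
    """Split on ';' while leaving semicolons inside double quotes untouched."""
    statements, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    return [s for s in statements if s]


def parse_px_metadata(text):
    """Return a dict mapping each keyword (with any qualifier) to its raw value."""
    metadata = {}
    for statement in split_statements(text):
        keyword, _, value = statement.partition("=")
        metadata[keyword.strip()] = value.strip()
    return metadata


# A few statements in the style of the file shown above.
SAMPLE = '''MATRIX="L10026E";
DECIMALS=0;
STUB="Country/Agglomeration";
VALUES("Country/Agglomeration")="Afghanistan. Kabul","Albania. Tirana";
'''

metadata = parse_px_metadata(SAMPLE)
```

With the keywords in a dictionary, a presentation program can look up the stub, heading and value lists directly instead of guessing them from the layout of a page.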

35 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
The correspondence between tables and the program interface is intuitive: publication / folder or database, subject areas, tables, variables, values, data...

36 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
View a table

37 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
Other functions:
– DDE and OLE communication with other Office programs: Excel, Word...
– Multiple export formats: Excel, Text, HTML, dBase, GESMES, shortly SDMX...

38 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
– Easy to combine with browsing structures based on static and dynamic HTML pages (as done by the INE in INEbase and monthly INEbase)
– Statistical graphs and maps (with PX-Map)
– In several languages
– Calculation functions, on rows, columns and between tables of equal dimensions
– Customisable views and printing

39 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
Site in English maintained by Statistics Sweden, with links to all the programs in the PC-Axis suite, to countries where PC-Axis solutions are used, to the forum, download area, etc.

40 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
How the INE uses PC-Axis:
– As a reference for designing the dissemination database, and particularly for its metadata storage structures (INEbase and Tempus 2)
– As a format for the files from all statistical operations not yet uploaded to Tempus 2, or which are not anticipated to be uploaded (a program, ‘Jaxi’, displays them online)
– As another export format offered by INEbase
– And for building “offline” programs: monthly INEbase, EPA...

41 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis

42 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis
One or more CGI programs provide “pseudo OLAP” browsing, search and presentation functions. Data is held, as PC-Axis files, on the internet file server, in a directory structure which follows the logical subject tree of the organisation (that of the ISO).

43 Different storage formats. De facto standards and “pseudo OLAP” visualisers: PC-Axis

44 Different storage formats. De facto standards and “pseudo OLAP” visualisers: CUB-X / EVA
EVA (ex CUB.X) is a program very similar to PC-Axis, which also enables handling of multidimensional tables
http://epp.eurostat.cec.eu.int/extraction/evajava/evajava/help/en/homepage.htm states that:
“EVA stands for Eurostat's Visual Application, the Eurostat's Common Browser for its statistical databases. EVA is a specialised multidimensional statistical table browser”

45 Different storage formats. De facto standards and “pseudo OLAP” visualisers: CUB-X / EVA
Eurostat’s “New Cronos” database and its HTML presentation

46 Different storage formats. De facto standards and “pseudo OLAP” visualisers: CUB-X / EVA
Shell HTML

47 Different storage formats. De facto standards and “pseudo OLAP” visualisers: CUB-X / EVA
Retrieving data in HTML table format

48 Different storage formats. De facto standards and “pseudo OLAP” visualisers: CUB-X / EVA
The visualiser for Windows:
– The dimension “blocks” show the groups of values of the variable, and allow the chosen values to be rotated
– “Spreadsheet” type interface, with drag and drop functions to modify the header row and the header column
– Values and codes
– Multiple export formats: Excel, dBase, GESMES...

49 Different storage formats. De facto standards and “pseudo OLAP” visualisers: CUB-X / EVA
An example of an EVA file:
INFO Fri Dec 19 10:16:08 trueValues=5246 on 5544
LASTUP Fri, 19 Dec :11:
TYPE RV DELIMS
DIMLST (soft,theme,domain,collect,table,indic,country,time)
DIMUSE (R,R,N,N,N,V,V,V)
POSLST (newcronos) (theme1) (eur2) (01-cn) (01-cn-a) (cnpib90a) (01,22,30,11,34,32,14,28,16,24,18,36,38,40,41,26,42,46) (1999a00,1998a00,1997a00,1996a00,1995a00,1994a00,1993a00,1992a00,1991a00,1990a00,1989a00,1988a00,1987a00,1986a00)
FORMAT FORMATR NOTAV :
VALLST (0) ( , , , , , , , , , ,

50 Different storage formats. De facto standards and “pseudo OLAP” visualisers: Beyond 20/20
– Distributed by the Canadian company of the same name, which also produces the (independent) file and metadata preparation system and an “internet file server” version of the program
– Data and metadata are stored in binary files which are not directly legible; it is not possible to create Beyond 20/20 files outside its specific “builder” programs
– Capabilities: spreadsheet interface, drag and drop, exchange and nesting of variables, statistical graphs and maps
– Two main types of file: tables and “extracts”. A distinguishing feature of Beyond 20/20 is its capability for handling microdata and tables with the same program. “Extracts” are indexed microdata files which are specially handled so that online tabulation is quick
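The difference between a stored table and an “extract” can be shown with a small Python sketch. It is only a conceptual illustration with made-up records and a hypothetical function name `tabulate`; it does not reproduce the Beyond 20/20 file format or its indexing, which are proprietary.

```python
from collections import Counter


def tabulate(records, row_var, col_var):
    """Cross two variables of a microdata file on demand, counting records per cell."""
    return Counter((record[row_var], record[col_var]) for record in records)


# Made-up microdata: one dictionary per respondent.
records = [
    {"sex": "Men", "activity": "Active"},
    {"sex": "Men", "activity": "Inactive"},
    {"sex": "Women", "activity": "Active"},
    {"sex": "Women", "activity": "Active"},
]

table = tabulate(records, "sex", "activity")
```

A stored table fixes the crossing of variables in advance, whereas microdata lets the user choose the crossing at query time, at the cost of computing the counts online.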

51 Different storage formats. De facto standards and “pseudo OLAP” visualisers: Beyond 20/20
Chapters, tables and “extracts”

52 Different storage formats. De facto standards and “pseudo OLAP” visualisers: Beyond 20/20
Viewing tables

53 Different storage formats. De facto standards and “pseudo OLAP” visualisers: Beyond 20/20
Selecting variables from an “extract” or microdata file

54 Different storage formats. Databases
Structured storage: conventional databases with the capability to store data and metadata and dynamically generate the information requested
It is also possible to create databases for online statistical dissemination, with robust metadata support:
– Adhering in their design to a pre-existing structured format (the Swedish model, Spain (Oracle), ...)
– Or with a model of their own (Holland, the StatLine system)

55 Different storage formats. Databases
StatLine, from the Netherlands Central Bureau of Statistics, is a fantastic reference for how to combine a database with a look-up system… It seems a simple medium, but it is the outcome of several years’ work...

56 Different storage formats. Databases
StatLine: powerful presentation of metadata, “pseudo OLAP” look-up functions. (Data supported by a relational-model database on the server, Java applet internet technology; ample bandwidth is recommended…)
Viewing metadata, cubes, dimensions

57 Different storage formats. Databases
StatLine: powerful presentation of metadata, “pseudo OLAP” look-up functions. (Data supported by a relational-model database on the server, Java applet internet technology; ample bandwidth is recommended…)
Viewing data; functions for drilling down, nesting, pivoting, dragging and dropping

58 Different storage formats. Databases
The INEbase Tempus II subsystem (time series databank)
Relational database systems are also widely used as dissemination tools. The INE uses them:
1.- As a more compact store than PC-Axis files, distributing the different metadata components and data in different tables, and enabling:
– construction of queries on demand
– exporting to PC-Axis, Excel format…
A growing part of the information is uploaded to the INEbase Tempus II subsystem, a relational database system (Oracle) which reconciles:
– single storage of the information
– and presentation in two possible forms: tables and chronological series
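The idea of storing the information once and presenting it either as a series or as a table can be sketched with Python's built-in `sqlite3` module. The schema, the function names and the figures below are assumptions made for illustration; they are not the Tempus II model, which runs on Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series (id INTEGER PRIMARY KEY, title TEXT, unit TEXT);
    CREATE TABLE observation (
        series_id INTEGER REFERENCES series(id),
        period TEXT,
        value REAL
    );
    """
)
# Metadata and data live in separate tables; the values are made up.
conn.executemany(
    "INSERT INTO series VALUES (?, ?, ?)",
    [(1, "Employed, men", "thousands"), (2, "Employed, women", "thousands")],
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [(1, "2007", 10.0), (1, "2008", 11.0), (2, "2007", 8.0), (2, "2008", 9.0)],
)


def as_series(conn, series_id):
    """Chronological view: every period of one series."""
    return conn.execute(
        "SELECT period, value FROM observation WHERE series_id = ? ORDER BY period",
        (series_id,),
    ).fetchall()


def as_table(conn, period):
    """Table view: every series for one period."""
    return conn.execute(
        "SELECT s.title, o.value FROM series s "
        "JOIN observation o ON o.series_id = s.id "
        "WHERE o.period = ? ORDER BY s.id",
        (period,),
    ).fetchall()


series_view = as_series(conn, 1)
table_view = as_table(conn, "2008")
```

Both views are queries over the same rows, so a correction made once in the observation table shows up in the table and in the series alike.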

59 Diagram (stages: Analyze, Define, Create)
1.- Relational model
2.- GES_Tempus: tool for managing processes, using the new TP2 format
3.- Gathering data from Tempus, PC-Axis and other sources
4.- Tempus 2 (model + data)
5.- Displaying tables (collection of series) (March 2004)
7.- Extracting data from T2, e.g. for the IMF
8.- Program for accessing series in Tempus: access to series (first version)

60 The INEbase Tempus II subsystem (time series databank). Different storage formats: Databases

61 The INEbase Tempus II subsystem. We developed a tool (Ges-Tempus) for managing all operations in Tempus 2. Different storage formats: Databases

63 As a dissemination system for statistical data closer to the concept of “lists” than of “tables”. An example: the List of place names. Filterable lists, not cross-tabulations of variables. Different storage formats: Databases

64 As a dissemination system for statistical data closer to the concept of “lists” than of “tables”. Another example: the Industrial Product Survey. Filterable lists, not cross-tabulations of variables. Different storage formats: Databases
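
A “list” style of dissemination can be pictured as filtering rows rather than crossing variables. The sketch below is a minimal illustration under stated assumptions: the `filter_list` helper, the field names and the sample records are all hypothetical and are not taken from the INE applications.

```python
def filter_list(records, **criteria):
    """Return the records whose fields contain every requested text fragment.

    Matching is case-insensitive and substring-based, as in a simple
    search box placed above a list of results.
    """
    def matches(record):
        return all(
            str(wanted).lower() in str(record.get(field, "")).lower()
            for field, wanted in criteria.items()
        )

    return [record for record in records if matches(record)]


# Invented sample rows, for illustration only.
places = [
    {"name": "Villa Alta", "province": "Province X", "type": "village"},
    {"name": "Villa Baja", "province": "Province Y", "type": "village"},
    {"name": "Puerto Nuevo", "province": "Province X", "type": "town"},
]

villas_in_x = filter_list(places, name="villa", province="province x")
towns = filter_list(places, type="town")
```

Each result is still a plain list of rows; no aggregation or cross-tabulation takes place, which is what separates this style from a table of crossed variables.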

65 What role might BI/DW systems play in a statistical dissemination strategy? When, in the business or economic domain being studied … – the number of variables or dimensions to be analysed is high – the granularity, or level of subject or territorial detail, is also high – it is difficult to predict many of the possible subject and territorial crosses, or the hierarchical presentation levels appropriate for different types of users … we will need to model “n-dimensional cubes” with cell counts significantly greater than 10^5 … We can continue to use traditional relational modelling systems; however, it is time to speak to an expert in multidimensional analysis! Different storage formats: OLAP systems for data dissemination. Multidimensional databases
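
To see how quickly a cube passes the 10^5-cell mark, multiply the number of categories in each dimension. The dimension names and sizes below are invented for illustration only, and `cube_cells` is a hypothetical helper.

```python
from math import prod

THRESHOLD = 10 ** 5  # the order of magnitude mentioned on the slide


def cube_cells(dimensions):
    """Number of cells in a full cube: the product of the dimension sizes."""
    return prod(dimensions.values())


# Hypothetical dimensions and category counts.
small = {"province": 50, "sex": 2, "age_group": 20, "activity": 10}
large = {**small, "education_level": 8}

small_cells = cube_cells(small)  # 50 * 2 * 20 * 10 = 20,000
large_cells = cube_cells(large)  # 20,000 * 8 = 160,000
```

Adding a single eight-category dimension takes this example from 20,000 cells to 160,000, which is why high dimensionality combined with fine territorial detail pushes towards multidimensional tools.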

66 This means no longer working with multiple “data sets” or sets of relational tables, and instead storing a smaller number of “cubes” that hold a large amount of data at a high level of subject, territorial or temporal granularity. Ideal for displaying the results of censuses and other operations that enable small-area statistics: – censuses – large surveys, and large company or establishment directories, with a high level of detail. Intranet or Internet. Use of the most advanced OLAP techniques. One example is the INE's experience with the 2001 Censuses. Different storage formats: OLAP systems for data dissemination. Multidimensional databases

67 It is not yet the most common dissemination technology; however, it is foreseeable that, around the XML format (and the SDMX project), the following will be built in addition to data exchange standards: – automated internet survey systems for companies – data dissemination systems. These are ideal for combining with the data-structuring capabilities inherent to XML, using “classic” or “multidimensional (OLAP)” databases. An interesting experience currently under way: the pilot project on foreign debt, based on the SDMX standard. Different storage formats: Operational international normalisation experiences: SDMX-XML

68 Interesting example of the OECD using SDMX. Different storage formats: Operational international normalisation experiences: SDMX-XML

69 Example of an application with unstructured data. We have said “Structure to disseminate”. However... what if the data are completely unstructured, as is the case with old, paper-based publications? The INE does not rule out using the internet to disseminate these valuable historical collections. The INEbase HISTORIA project is currently in the final stages of cataloguing. It combines mass scanning and OCR processing, a relational database management system (RDBMS), and a file server to provide guided access and search for viewing and downloading pages from those publications in PDF and Excel formats. This will be covered in another presentation.

70 Example of an application with unstructured data

71 Example of a tool for structuring data: PX-Make. Depending on the IT infrastructure dedicated to storing and disseminating data and metadata, we can use different tools to structure the information to be disseminated, from greater to lesser complexity … – a metadata creation environment in a multidimensional database system (the INE uses one in the 2001 Census DW) – or one associated with a relational database (the INEbase Tempus II environment …) – or something as simple as using PX-Make (or PX-Edit) to produce PC-Axis files...

72 Example of a tool for structuring data: PX-Make. Interface designed for working with PX files. Exchanges data with Excel, Access..., and with existing PX files. EASY: a day’s training is enough. Used by service promoters. It is part of the “PC-Axis suite” software.

73 Example of a tool for structuring data: PX-Make

74 Example of a tool for structuring data: PX-Make

75 Data retrieval and presentation. Are our ‘official’ statistical data naked or boring data?

76 Are ‘official’ statistical data boring, or ‘naked data’? Let’s look at some ways of giving our users friendlier access to our information. Statistical data can even be amusing! Especially if the information is structured... of course.

77 Are ‘official’ statistical data boring, or ‘naked data’? “Some say that statistics or data aren’t very sexy, that they have the image of being quite difficult, of being boring or even of being biased and not worth to be studied”. “Listening to representatives of Gapminder or Swivel one could think official statistics is just naked data, difficult to access and not considering new technologies. Is this true?” “Official statistics are a key “public good” that foster the progress of societies”. OECD World Forum, Istanbul Declaration, June 2007. We recommend watching the interesting video ‘Unveiling the beauty of statistics’, presented by Hans Rosling at the OECD World Forum in Istanbul in June 2007.

78 Are ‘official’ statistical data boring, or ‘naked data’? One of the questions we asked in our survey was: “Do you think blogs and forums are interesting in statistical dissemination?” Blogs can be ideal tools for statisticians to learn about new initiatives for improving statistical data dissemination. E.g. BLOGS

79 Private initiatives. Using ‘our’ official data, they attempt to make it user-friendly. Gapminder (www.gapminder.org; http://en.wikipedia.org/wiki/Hans_Rosling): “Gapminder developed the Trendalyzer software that converts international statistics into moving, interactive and enjoyable graphics.” Many Eyes (www.many-eyes.com): “Our goal is to "democratize" visualization and to enable a new social kind of data analysis”. Example: Internet Penetration and Usage in Europe, by Country, Sept. Swivel (www.swivel.org): “Where Curious People Explore Data”. Example: Average age at death by Age at retirement. Netvibes: “Built-in Netvibes modules include an RSS/Atom feed reader”

80 Are ‘official’ statistical data boring, or ‘naked data’? Graphs: “A picture is worth a thousand words”. Different ‘friendly’ visualization styles for similar data

81 Are ‘official’ statistical data boring, or ‘naked data’? Certi Enabling users to make calculations, even using everyday language to explain the objective of the program

82 Are ‘official’ statistical data boring, or ‘naked data’? ‘Gossiping’ (why not?) about some in-demand data: the most frequent names (Zurich), or surnames (Spain). ‘Friendly’ style. ‘Naked’ style.

83 Are ‘official’ statistical data boring, or ‘naked data’? Giving users tools to help them use our sites: search engines (including suggested links) and sitemap

84 Are ‘official’ statistical data boring, or ‘naked data’? Giving users the possibility to look up values, configure the results screen, export to different formats... Selection of variables (INE Spain)... or the PX-Web model (Finland)

85 Special sections for ‘other’ types of users. With children in mind (Brazil)... or journalists (Spain)

86 Historical information is very popular and much in demand. Evolution of municipalities in Spain

87 Even providing colloquial texts for non-specialist users. The same information is available to specialised users in other sections.

88 The icing on the cake: interesting maps (Switzerland)... Population clock (Census, USA)

89 Thank you very much for your attention. Any questions, please? Storing data. Structure for dissemination. Different data storage formats. Data retrieval and presentation. European Statistical Training Programme

