Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central.

Slides:

Advertisements

Similar presentations

1 WATER AUTHORITY Dr. Or Goldfarb CENTRAL BUREAU of STATISTICS Zaur Ibragimov Water Accounts in Israel Vienna January 2009.

Advertisements

Aki Hecht Seminar in Databases (236826) January 2009

Distance Functions for Sequence Data and Time Series

Fundamentals, Design, and Implementation, 9/e COS 346 Day 2.

Chapter 1: Data Collection

Security in Databases. 2 Outline review of databases reliability & integrity protection of sensitive data protection against inference multi-level security.

Mapping Techniques and Visualization of Statistical Indicators Haitham Zeidan Palestinian Central Bureau of Statistics IAOS 2014 Conference.

1 Service Discovery using Diane Service Descriptions Ulrich Küster and Birgitta König-Ries University Jena Germany

Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.

School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING PROJECT VISTA: Integrating Heterogeneous Utility Data A very brief overview.

Palestinian Central Bureau of Statistics (PCBS) Palestine Poverty Maps 2009 March

LOGICAL DATABASE DESIGN

National Statistical Office, Thailand 2-6 December 2013, Hanoi, Viet Nam Census Evaluation.

Domain Modelling the upper levels of the eframework Yvonne Howard Hilary Dexter David Millard Learning Societies LabDistributed Learning, University of.

1 Census Evaluation in Kenya By M.G. Obudho & J. K. Bore Kenya National Bureau of Statistics.

Tomer Sagi and Avigdor Gal Technion - Israel Institute of Technology Non-binary Evaluation for Schema Matching ER 2012 October 2012, Florence.

OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR

2 1 Chapter 2 Data Model Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.

Web Explanations for Semantic Heterogeneity Discovery Pavel Shvaiko 2 nd European Semantic Web Conference (ESWC), 1 June 2005, Crete, Greece work in collaboration.

Overview of the Database Development Process

Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.

ECO Statistical Network Statistical Center of Iran.

Machine Learning Approach for Ontology Mapping using Multiple Concept Similarity Measures IEEE/ACIS International Conference on Computer and Information.

25th VLDB, Edinburgh, Scotland, September 7-10, 1999 Extending Practical Pre-Aggregation for On-Line Analytical Processing T. B. Pedersen 1,2, C. S. Jensen.

Funded by: European Commission – 6th Framework Project Reference: IST WP 2: Learning Web-service Domain Ontologies Miha Grčar Jožef Stefan.

Software School of Hunan University Database Systems Design Part III Section 5 Design Methodology.

Methodology Conceptual Databases Design

4 April 2007METIS Work Session1 Metadata Standards and Their Support of Data Management Needs Daniel W. Gillman Bureau of Labor Statistics Paul Johanis.

1 Chapter 15 Methodology Conceptual Databases Design Transparencies Last Updated: April 2011 By M. Arief

Database Processing: Fundamentals, Design and Implementation, 9/e by David M. KroenkeChapter 2/1 Copyright © 2004 Please……. No Food Or Drink in the class.

PART IV: REPRESENTING, EXPLAINING, AND PROCESSING ALIGNMENTS & PART V: CONCLUSIONS Ontology Matching Jerome Euzenat and Pavel Shvaiko.

Population Census carried out in Armenia in 2011 as an example of the Generic Statistical Business Process Model Anahit Safyan Member of the State Council.

Metadata Models in Survey Computing Some Results of MetaNet – WG 2 METIS 2004, Geneva W. Grossmann University of Vienna.

RCDL Conference, Petrozavodsk, Russia Context-Based Retrieval in Digital Libraries: Approach and Technological Framework Kurt Sandkuhl, Alexander Smirnov,

Minor Thesis A scalable schema matching framework for relational databases Student: Ahmed Saimon Adam ID: Award: MSc (Computer & Information.

Study on How to Improve the Quality of Official Statistics and Provide Accurately Categorized Data SAFE Shanghai Branch Deputy Director-General Lv Jinzhong.

DATABASE MGMT SYSTEM (BCS 1423) Chapter 5: Methodology – Conceptual Database Design.

BES Equitable and Sustainable Well-being in Italy

European Conference on Quality in Official Statistics Session 26: Quality Issues in Census « Rome, 10 July 2008 « Quality Assurance and Control Programme.

P15 Lai Xiaoni (U077151L) Qiao Li (U077194E) Saw Woei Yuh (U077146X) Wang Yong (U077138Y)

United Nations Economic Commission for Europe Statistical Division Mapping Data Production Processes to the GSBPM Steven Vale UNECE

6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.

3 & 4 1 Chapters 3 and 4 Drawing ERDs October 16, 2006 Week 3.

INFORMATION SYSTEM FOR SUPPORT OF REGIONAL DEVELOPMENT (INFOREG) IN THE SLOVAK REPUBLIC INFOSTAT, Bratislava, Slovakia Prepared by Lenka Priehradnikova,

VISUALIZATION AND OTHER TOOLS FOR BETTER UNDERSTANDING & DISSEMINATION UNECE Work Session on Statistical Dissemination and Communication May 2008,

Methodology – Physical Database Design for Relational Databases.

Quality requirements EMOS RESEARCH TO COLLECT EVIDENCE BEFORE ANYTHING Benchmark against good existing model, e.g. European Masters in Translation, European.

1 Resolving Schematic Discrepancy in the Integration of Entity-Relationship Schemas Qi He Tok Wang Ling Dept. of Computer Science School of Computing National.

State of Palestine Palestinian Central Bureau of Statistics (PCBS) UNSD DFID Project on National Development Indicators October, 2014.

Computational Tools for Population Biology Tanya Berger-Wolf, Computer Science, UIC; Daniel Rubenstein, Ecology and Evolutionary Biology, Princeton; Jared.

David Chiu and Gagan Agrawal Department of Computer Science and Engineering The Ohio State University 1 Supporting Workflows through Data-driven Service.

S T A T I S T I K A U S T R I A Quality Assessment of register-based Statistics A Quality Framework Manuela LENK Directorate.

Fundamentals, Design, and Implementation, 9/e Appendix B The Semantic Object Model.

Agency on Statistics of the Republic of Kazakhstan A strategy for the dissemination of statistical information: the Kazakhstan experience.

The FDES revision process: progress so far, state of the art, the way forward United Nations Statistics Division.

1 Enhancing data quality by using harmonised structural metadata within the European Statistical System A. Götzfried Head of Unit B6 Eurostat.

10 August Inter-Regional Workshop on the Production of Gender Statistics New Delhi, India, 6-10 August 2007 Strengthening National Gender Statistics.

Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,

Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.

SZRZ6014 Research Methodology Prepared by: Aminat Adebola Adeyemo Study of high-dimensional data for data integration.

Applying Deep Neural Network to Enhance EMPI Searching

UNECE Seminar on New Frontiers for Statistical Data Collection, Geneva

Do-Gil Lee1*, Ilhwan Kim1 and Seok Kee Lee2

Translation of ER-diagram into Relational Schema

CSc4730/6730 Scientific Visualization

[jws13] Evaluation of instance matching tools: The experience of OAEI

Actively Learning Ontology Matching via User Interaction

Tantan Liu, Fan Wang, Gagan Agrawal The Ohio State University

Palestinian Central Bureau of Statistics

Presentation transcript:

Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central Bureau of Statistics European Conference on Quality in Official Statistics (Q2014) 1

2 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Contents  Objectives  Methodology:  Data preparation  Data Analysis  Database Schema  Data Mapping Algorithm  Importing and Mapping Indicators  Data Visualization  Experimental Results  Conclusion and Future Work

3 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Objectives The objective of this study is to introduce a new interoperable visual analytics framework for:  Collecting, mapping, grouping, and integrating heterogeneous data and statistical indicators into a common schema.  Mapping indicators by building mapping algorithm using hamming distance, edit distance and ontology.  Enhancing presentation of official statistics based on visual analytical approach that combines both data analysis and interactive visualization.

4 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Objectives

5 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria To achieve the objectives of this research and build the Visual Analytics Framework, we collected statistical data indicators (MDGS) from different surveys and sources.  Data Preparation: Collecting Data/Indicators, we select data from different subjects (sectors).  Data Analysis: Defining indicators, units and subgroups, Linking indicators, units and subgroups to form I-U-S combinations, Categorizing I-U-S combinations under various classifications.  Entity-Relationship (ER) Diagram and Database Schema  Implementation of Data Mapping Algorithm  Indicators Mapping  Visualization  Experimental Results and Algorithm Accuracy Methodology

6 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria  We focus in our study on Millennium Development Goals (MDGs) indicators  Statistical data and indicators were included in different sectors and sub sectors like economy, education, environment, health, information and communication, nutrition, population and women.  Many interviews with the managers of Palestinian Central Bureau of Statistics (PCBS) departments and divisions were conducted to know the structure of the indicators, subgroups and units and to help us in building the final schema. Data Preparation

7 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria  After defining and preparing indicators, we defined the unit for each indicator and associated each indicator with the correct unit, then we defined the subgroups for each indicator. A subgroup is a subset within a sample or population identified by some common dimension such as sex, age or location. Data Analysis Multidimensional ‘Cube’ of data

8 Entity-Relationship (ER) Diagram European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria

9 Database Schema European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Database Schema Snapshot

10 Data Mapping Algorithm European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria After building our schema, we build mapping algorithm in C# using hamming distance and edit (Levenshtein) distance, and by adding ontology, edit distance can be considered a generalization of the Hamming distance, which is used for strings of the same length and only considers substitution edits

11 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria String Comparator Metrics Hamming Distance: The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.  For instance, the Hamming distance between "toned" and "roses" is 3, between “ ” and “ ” is 2, and between “ ” and “ ” is 3.  For binary strings "a" and "b", the Hamming distance is equal to the number of ones (population count) in a XOR b

12 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria String Comparator Metrics Edit (Levenshtein) Distance : Edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.  Edit distance can be calculated using dynamic programming (Navarro & Raffinot 2002). Dynamic programming is a method of solving a large problem by regarding the problem as the sum of the solution to its recursively solved sub problems

13 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria String Comparator Metrics Edit (Levenshtein) Distance (Cont.): To compute the edit distance ed(x,y) between strings x and y, a matrix M 1...m+1,1...n+1 is constructed where M i,j is the minimum number of edit operations needed to match x 1...i to y 1...j. Each matrix element M i,j is calculated as per Equation 1, where if a = b and 1 otherwise. The matrix element M 1,1 is the edit distance between two empty strings. Equation 1: Edit distance ed(x,y) between strings x and y.

14 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria String Comparator Metrics Edit (Levenshtein) Distance (Cont.): An example to calculate the edit distance between the strings “Kitten" and “Sitting". The edit distance between these strings given as Mm+1,n+1 is 3. K i t t e n S i t t i n g 1 st step: kitten  sitten (substitution) 2 nd step: sitten  sittin (substitution) 3 rd step: sittin  sitting (insertion)

15 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria String Comparator Metrics Edit (Levenshtein) Distance (Cont.): Table (1) is an example of the matrix produced to calculate the edit distance between the strings "DFGDGBDEGGAB" and "DGGGDGBDEFGAB". The edit distance between these strings given as Mm+1,n+1 is 3. DGGGDGBDEFGAB D F G D G B D E G G A B

16 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria An ontology typically provides a vocabulary that describes a domain of interest and a specification of the meaning of terms used in the vocabulary  Ontology matching is a solution to the semantic heterogeneity problem.  It finds correspondences between semantically related entities of ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering, or data translation. Ontology

17 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Indicators Mapping Example of Importing and Mapping "Growth rate of GDP /person employed" Indicator.

18 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Indicators Mapping Example of Importing and Mapping "Urbanization Level" Indicator without Ontology.

19 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Indicators Mapping with Ontology Example of Importing and Mapping "Urbanization Level" Indicator with Ontology.

20 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Sectors, Classes, and Units Mapping Importing % Unit without using Ontology.

21 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Sectors, Classes, and Units Mapping Importing % Unit using Ontology.

22 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Data Visualization

23 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Experimental Results To test and evaluate the accuracy of the mapping algorithm in practice, we performed some experiments on many indicators, the indicators chosen from different countries since each country indicators different from others in the name of the indicators, units and subgroups, this will help us to test the algorithm accuracy.

24 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Mapping Results without Ontology Algorithm Indicators, Units and Subgroups Mapping Accuracy without Ontology.

25 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Mapping Results with Ontology Algorithm Indicators, Units and Subgroups Mapping Accuracy with Ontology.

26 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Accuracy of the algorithm according to our results without ontology and with ontology Algorithm Indicators, Units and Subgroups Mapping Accuracy with Ontology and without Ontology.

27 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria  This research aimed to introduce a new interoperable visual analytics framework for Collecting, mapping, processing, and disseminating statistical data based on common schema, heterogeneous data from different data sources integrated using the created algorithm  we suggested new mapping algorithm based on hamming distance, edit distance and ontology, using our algorithm we enhanced integration and mapping of statistical data indicators from different sources.  We tested the accuracy of the algorithm, experimental results shown high accuracy of mapping for the algorithm by adding the ontology to the algorithm. the accuracy of the algorithm when importing and mapping indicators without ontology was 67% and with ontology the accuracy was 89%, for units mapping the accuracy was 82% without ontology and 95% with ontology, and for subgroups the accuracy was 78% without ontology and 100% with ontology. Conclusion and Future Work

28 European Conference on Quality in Official Statistics (Q2014)- 2-5 June, 2014, Vienna, Austria Future work includes:  To focus more on data mapping using ontology.  Extending our mapping algorithm to handle more sophisticated mappings between ontologies (i.e., non 1-1 mappings),  Also to focus more on collecting data from different sources since we focused as a case study on importing data from different excel sources (files),  Future work includes also improving collaboration with visual analytics framework.  Additional methods are required to support the users in finding good views on the data and in determining appropriate visualization techniques.  We have to consider the 3D visualization of uncertain graph structures with uncertain attributes, which we think is a formidable challenge. Conclusion and Future Work

29 European Conference on Quality in Official Statistics (Q2014) Thanks Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central Bureau of Statistics