Mapping Techniques and Visualization of Statistical Indicators Haitham Zeidan Palestinian Central Bureau of Statistics IAOS 2014 Conference.

Presentation transcript:

1 Mapping Techniques and Visualization of Statistical Indicators
Haitham Zeidan, Palestinian Central Bureau of Statistics
IAOS 2014 Conference on Official Statistics

2 Contents
- Objectives
- Methodology:
- Data preparation
- Data analysis
- Database schema
- Data mapping algorithm
- Importing and mapping indicators
- Data visualization
- Experimental results
- System design and evaluation
- Conclusion and future work

3 Objectives
The objective of this study is to introduce a mapping algorithm and a visualization application for:
- Collecting, mapping, grouping and integrating heterogeneous data and statistical indicators into a common schema.
- Statistical data are becoming more complex and heterogeneous, so the need for sophisticated visualization technology is growing. We introduced a new visualization application that improves the presentation of official statistics. It follows a visual analytics approach that combines data analysis with interactive visualization.
- We implemented a new mapping algorithm based on Hamming distance, edit distance and ontology. Using this algorithm, we improved the integration, harmonization and mapping of statistical data indicators from different sources.

4 Objectives

5 To achieve the objectives of this research and build the mapping algorithm and visualization system, we collected statistical data indicators from different surveys and sources. The work proceeded in these steps:
- Data preparation: collecting data and indicators. We selected data from different subjects (sectors).
- Data analysis: defining indicators, units and subgroups, linking them to form indicator-unit-subgroup (I-U-S) combinations, and categorizing the I-U-S combinations under various classifications.
- Entity-Relationship (ER) diagram and database schema
- Implementation of the data mapping algorithm
- Indicators mapping
- Experimental results and algorithm accuracy
- Visualization system design and case study

6 Data Preparation
- Our study focused on Millennium Development Goals (MDGs) indicators.
- The statistical data and indicators covered different sectors and sub-sectors, such as economy, education, environment, health, information and communication, nutrition, population, and women.
- We conducted many interviews with the managers of Palestinian Central Bureau of Statistics (PCBS) departments and divisions. The interviews established the structure of the indicators, subgroups and units and helped us build the final schema.

7 Data Analysis
- After defining and preparing the indicators, we defined a unit for each indicator and associated each indicator with its correct unit. We then defined the subgroups for each indicator. A subgroup is a subset of a sample or population that shares a common dimension, such as sex, age or location.
[Figure: Multidimensional "cube" of data]
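The slides show the multidimensional cube only as a figure. Here is a minimal Python sketch of one way to hold it, assuming each cell is keyed by an I-U-S combination plus an area and a time period. Those two extra dimensions, the names `Cell`, `ius`, `slice_cube` and `classification`, and all values are hypothetical. This is not the PCBS schema, which the ER diagram on the next slide defines.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One cube coordinate: an I-U-S combination plus area and period (assumed dimensions)."""
    indicator: str
    unit: str
    subgroup: str
    area: str
    period: str


def ius(cell: Cell) -> tuple[str, str, str]:
    """Return the indicator-unit-subgroup combination of a cell."""
    return (cell.indicator, cell.unit, cell.subgroup)


def slice_cube(cube: dict[Cell, float], **fixed: str) -> dict[Cell, float]:
    """Keep the cells whose named dimensions equal the given values, e.g. period="2010"."""
    return {cell: value for cell, value in cube.items()
            if all(getattr(cell, dim) == val for dim, val in fixed.items())}


# Hypothetical classification of I-U-S combinations under sectors.
classification: dict[str, set[tuple[str, str, str]]] = {
    "Population": {("Urbanization Level", "Percent", "Total")},
    "Economy": {("Growth rate of GDP /person employed", "Percent", "Total")},
}

# Dummy values for illustration only; they are not statistics.
cube = {
    Cell("Urbanization Level", "Percent", "Total", "Area A", "2010"): 1.0,
    Cell("Urbanization Level", "Percent", "Total", "Area A", "2011"): 2.0,
    Cell("Growth rate of GDP /person employed", "Percent", "Total", "Area A", "2010"): 3.0,
}

print(len(slice_cube(cube, indicator="Urbanization Level")))  # 2
print(len(slice_cube(cube, period="2010")))                   # 2
```

A flat key with one field per dimension keeps slicing to a single filter. A relational schema would normally split these dimensions into separate tables instead.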

8 Entity-Relationship (ER) Diagram

9 Data Mapping Algorithm
After building our schema, we implemented the mapping algorithm in C#. It combines Hamming distance, edit (Levenshtein) distance and an ontology. Edit distance can be seen as a generalization of Hamming distance. Hamming distance applies only to strings of equal length and counts only substitutions, while edit distance also allows insertions and deletions.

10 String Comparator Metrics
Hamming distance: The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. Equivalently, it is the minimum number of substitutions needed to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
- For instance, the Hamming distance between "toned" and "roses" is 3, between "…" and "…" is 2, and between "…" and "…" is 3.
- For binary strings a and b, the Hamming distance equals the number of ones (population count) in a XOR b.
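The two definitions above translate directly into code. This Python sketch illustrates them and is not the authors' C# implementation. The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def hamming_bits(a: int, b: int) -> int:
    """Hamming distance of two bit strings: the population count of a XOR b."""
    return bin(a ^ b).count("1")


print(hamming("toned", "roses"))        # 3 (t/r, n/s, d/s differ)
print(hamming_bits(0b1101, 0b0110))     # 3 (1101 XOR 0110 = 1011)
```

The explicit length check reflects the restriction that separates Hamming distance from edit distance.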

11 String Comparator Metrics
Edit (Levenshtein) distance: Edit distance measures how dissimilar two strings (e.g., words) are. It counts the minimum number of operations (insertions, deletions and substitutions) needed to transform one string into the other.
- Edit distance can be calculated using dynamic programming (Navarro & Raffinot 2002). Dynamic programming solves a large problem by expressing its solution in terms of the solutions to smaller, recursively solved subproblems, each of which is solved only once.
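This Python sketch writes the recursive formulation of edit distance directly and memoizes each subproblem with `functools.lru_cache`, so each subproblem is solved only once. It is an illustration of the technique, not the authors' code. The usage line also shows how edit distance differs from Hamming distance on a shifted string.

```python
from functools import lru_cache


def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance via memoized (top-down) dynamic programming."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # Distance between the prefixes x[:i] and y[:j].
        if i == 0:
            return j  # insert the remaining j characters of y
        if j == 0:
            return i  # delete the remaining i characters of x
        cost = 0 if x[i - 1] == y[j - 1] else 1
        return min(d(i - 1, j) + 1,         # deletion
                   d(i, j - 1) + 1,         # insertion
                   d(i - 1, j - 1) + cost)  # substitution or match

    return d(len(x), len(y))


# Same length, every position differs (Hamming distance 6),
# but one deletion plus one insertion suffices.
print(edit_distance("abcdef", "bcdefa"))  # 2
```

Recursion depth grows with len(x) + len(y). This suits short labels such as indicator names, but an iterative table is safer for long strings.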

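12 String Comparator Metrics
Edit (Levenshtein) distance (cont.): To compute the edit distance ed(x,y) between strings $x = x_1 \ldots x_m$ and $y = y_1 \ldots y_n$, we construct a matrix $M_{1\ldots m+1,\,1\ldots n+1}$. Its element $M_{i+1,j+1}$ is the minimum number of edit operations needed to match $x_1 \ldots x_i$ to $y_1 \ldots y_j$. The element $M_{1,1}$ is the edit distance between two empty strings, and $ed(x,y) = M_{m+1,n+1}$. Each matrix element $M_{i,j}$ is calculated as per Equation 1, where $\delta(a,b) = 0$ if $a = b$ and 1 otherwise.
Equation 1: Edit distance ed(x,y) between strings x and y:
$$M_{i,1} = i - 1, \qquad M_{1,j} = j - 1,$$
$$M_{i,j} = \min\{\, M_{i-1,j} + 1,\; M_{i,j-1} + 1,\; M_{i-1,j-1} + \delta(x_{i-1}, y_{j-1}) \,\} \quad \text{for } 2 \le i \le m+1,\; 2 \le j \le n+1.$$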
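Equation 1 can also be filled bottom-up, row by row. In this Python sketch the indices start at 0, so `M[i][j]` here corresponds to the slide's $M_{i+1,j+1}$. The helper names are hypothetical. `print_matrix` lays the table out with y along the top and x down the side.

```python
def edit_matrix(x: str, y: str) -> list[list[int]]:
    """Fill the (m+1) x (n+1) edit-distance matrix of Equation 1 (0-based indices)."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i  # delete all i characters of x[:i]
    for j in range(n + 1):
        M[0][j] = j  # insert all j characters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if x[i - 1] == y[j - 1] else 1
            M[i][j] = min(M[i - 1][j] + 1,          # deletion
                          M[i][j - 1] + 1,          # insertion
                          M[i - 1][j - 1] + delta)  # substitution or match
    return M


def print_matrix(x: str, y: str, M: list[list[int]]) -> None:
    """Print M with y across the top and x down the left side."""
    print("     " + "".join(f"{c:>3}" for c in y))
    for i, row in enumerate(M):
        label = x[i - 1] if i > 0 else " "
        print(f"{label:>2}" + "".join(f"{v:>3}" for v in row))


print(edit_matrix("kitten", "sitting")[-1][-1])  # 3

x, y = "DFGDGBDEGGAB", "DGGGDGBDEFGAB"
M = edit_matrix(x, y)
print_matrix(x, y, M)
print(M[-1][-1])  # 3
```

The bottom-right cell is ed(x, y). Filling the table costs O(mn) time and memory.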

13 String Comparator Metrics
Edit (Levenshtein) distance (cont.): As an example, consider the edit distance between the strings "kitten" and "sitting". It is given by $M_{m+1,n+1}$ and equals 3:
- 1st step: kitten → sitten (substitution of "s" for "k")
- 2nd step: sitten → sittin (substitution of "i" for "e")
- 3rd step: sittin → sitting (insertion of "g")
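The three steps above can be recovered from the filled matrix. The sketch walks back from the bottom-right cell, choosing a move whose cost is consistent with Equation 1. It is a Python illustration with a hypothetical function name. When several optimal edit sequences exist, preferring the diagonal move picks one of them, not necessarily the one a person would write down.

```python
def edit_script(x: str, y: str) -> tuple[int, list[str]]:
    """Return ed(x, y) and one optimal sequence of edit operations turning x into y."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i
    for j in range(n + 1):
        M[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if x[i - 1] == y[j - 1] else 1
            M[i][j] = min(M[i - 1][j] + 1, M[i][j - 1] + 1, M[i - 1][j - 1] + delta)

    # Walk back from M[m][n] (the slide's M_{m+1,n+1}), preferring the diagonal move.
    ops: list[str] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + (x[i - 1] != y[j - 1]):
            if x[i - 1] != y[j - 1]:
                ops.append(f"substitute {x[i - 1]}->{y[j - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and M[i][j] == M[i][j - 1] + 1:
            ops.append(f"insert {y[j - 1]}")
            j -= 1
        else:
            ops.append(f"delete {x[i - 1]}")
            i -= 1
    ops.reverse()
    return M[m][n], ops


print(edit_script("kitten", "sitting"))
# (3, ['substitute k->s', 'substitute e->i', 'insert g'])
```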

14 String Comparator Metrics
Edit (Levenshtein) distance (cont.): Table (1) shows the matrix produced when calculating the edit distance between the strings "DFGDGBDEGGAB" and "DGGGDGBDEFGAB". The edit distance between these strings, given by $M_{m+1,n+1}$, is 3.
[Table (1): edit-distance matrix; columns: D G G G D G B D E F G A B; rows: D F G D G B D E G G A B]

15 Ontology
An ontology typically provides a vocabulary that describes a domain of interest, together with a specification of the meaning of the terms used in that vocabulary.
- Ontology matching is one solution to the semantic heterogeneity problem.
- It finds correspondences between semantically related entities of ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering or data translation.

16 Indicators Mapping
Example of Importing and Mapping "Growth rate of GDP /person employed" Indicator.

17 Indicators Mapping
Example of Importing and Mapping "Urbanization Level" Indicator without Ontology.

18 Indicators Mapping with Ontology
Example of Importing and Mapping "Urbanization Level" Indicator with Ontology.

19 Sectors, Classes, and Units Mapping
Importing % Unit without using Ontology.

20 Sectors, Classes, and Units Mapping
Importing % Unit using Ontology.
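The slides show these imports as screenshots but do not give the matching rule. This Python sketch is one plausible way to combine edit distance with an ontology fallback. A term is mapped to the most similar target if its normalized similarity reaches a threshold. Otherwise it is mapped through a shared ontology concept. The normalization, the similarity formula, the threshold of 0.8, the synonym table and all names are assumptions. The slides also do not say how Hamming distance enters the authors' algorithm, so this sketch uses edit distance only.

```python
def normalize(term: str) -> str:
    """Case-fold and collapse whitespace (assumed preprocessing)."""
    return " ".join(term.casefold().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of the DP matrix."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - ed / length of the longer string, in [0, 1]."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def map_term(term, targets, ontology=None, threshold=0.8):
    """Map term to one of targets (non-empty list); return (target or None, method)."""
    best = max(targets, key=lambda t: similarity(term, t))
    if similarity(term, best) >= threshold:
        return best, "string"
    if ontology:
        concept = ontology.get(normalize(term))
        for t in targets:
            if concept is not None and ontology.get(normalize(t)) == concept:
                return t, "ontology"
    return None, "unmapped"


# Hypothetical synonym table: normalized label -> concept id.
UNITS_ONTOLOGY = {"%": "percent", "percent": "percent", "percentage": "percent"}

print(map_term("%", ["Percent", "Number"]))                  # (None, 'unmapped')
print(map_term("%", ["Percent", "Number"], UNITS_ONTOLOGY))  # ('Percent', 'ontology')
print(map_term("Growth rate of GDP per person employed",
               ["Growth rate of GDP /person employed", "Urbanization Level"]))
# ('Growth rate of GDP /person employed', 'string')
```

The "%" unit shows why string metrics alone can fail. "%" and "Percent" share no characters, so their similarity is 0, and only the ontology links them.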

21 Experimental Results
To test and evaluate the accuracy of the mapping algorithm in practice, we ran experiments on many indicators chosen from different countries. Each country's indicators differ from those of other countries in their names, units and subgroups, which makes them a good test of the algorithm's accuracy.

22 Accuracy of the algorithm according to our results, without and with ontology
[Figure: Indicators, Units and Subgroups Mapping Accuracy with Ontology and without Ontology]
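Slide 22 reports accuracy for indicators, units and subgroups, with and without ontology, but the slides do not define the measure. This Python sketch assumes accuracy is the share of source items mapped to their manually verified target, with unmapped items counting as errors, and computes it per category. The function name and the toy input are hypothetical and are not the study's data.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """results: iterable of (category, predicted, expected); predicted None means unmapped."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for category, predicted, expected in results:
        totals[category] += 1
        if predicted is not None and predicted == expected:
            correct[category] += 1
    return {c: correct[c] / totals[c] for c in totals}


# Toy input for illustration only; NOT the study's data.
toy = [
    ("unit", "Percent", "Percent"),
    ("unit", None, "Percent"),  # unmapped, counted as an error
    ("indicator", "Urbanization Level", "Urbanization Level"),
    ("subgroup", "Female", "Female"),
]
print(accuracy_by_category(toy))  # {'unit': 0.5, 'indicator': 1.0, 'subgroup': 1.0}
```

To compare runs with and without ontology, call the function once on each run's results.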

23 System Design
- The visualizations in our system were implemented on the Microsoft platform and the .NET Framework. We used the Microsoft Visual Studio development tools, including .NET and ASP.NET, together with the Highcharts JavaScript library.
- Our application provides searching, comparing, re-visualizing, sorting and filtering as interaction techniques for its visualizations. The system organizes, stores and presents data and indicators in a uniform way. It presents information as tables, graphs and maps and facilitates data sharing. The application supports several visualization tasks.
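The slide names Highcharts as the charting library but shows no code. This Python sketch is a language-neutral illustration of the filtering and sorting step: the authors' system is ASP.NET, and Python is used here only for brevity. It produces a Highcharts options object with the standard `chart.type`, `title.text`, `xAxis.categories` and `series` options. On the client, such an object is passed to `Highcharts.chart(container, options)`. The record layout, the function name `to_highcharts` and the values are hypothetical.

```python
import json


def to_highcharts(records, indicator, subgroups, chart_type="column"):
    """Build Highcharts options for one indicator, with one series per subgroup.

    records: list of dicts with keys "indicator", "subgroup", "period", "value"
    (a hypothetical layout, not the system's actual schema).
    """
    # Filtering: keep only the requested indicator and subgroups.
    rows = [r for r in records
            if r["indicator"] == indicator and r["subgroup"] in subgroups]
    # Sorting: x-axis categories (time periods) in ascending order.
    periods = sorted({r["period"] for r in rows})
    series = []
    for subgroup in subgroups:
        by_period = {r["period"]: r["value"] for r in rows if r["subgroup"] == subgroup}
        # Missing periods become None, serialized as JSON null (a gap in the chart).
        series.append({"name": subgroup, "data": [by_period.get(p) for p in periods]})
    return {
        "chart": {"type": chart_type},
        "title": {"text": indicator},
        "xAxis": {"categories": periods},
        "series": series,
    }


# Dummy records for illustration only; the values are not statistics.
records = [
    {"indicator": "Urbanization Level", "subgroup": "Total", "period": "2011", "value": 2.0},
    {"indicator": "Urbanization Level", "subgroup": "Total", "period": "2010", "value": 1.0},
    {"indicator": "Growth rate of GDP /person employed", "subgroup": "Total",
     "period": "2010", "value": 3.0},
]
options = to_highcharts(records, "Urbanization Level", ["Total"])
print(json.dumps(options))
```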

24 Data Visualization

25 Interview with expert users to specify user requirements
- We conducted interviews to analyze user requirements. Each type of user has their own concerns about and expectations of the system, so we interviewed expert users to identify their needs.
- These interviews took place before the implementation phase, to gather visualization requirements from the end users. We collected feedback from 13 participants. The participants were selected to represent the intended user community, including statisticians, researchers and decision makers. Their input let us specify the user requirements for visualization, design appropriate functionality, and develop a visualization system for statistical data.

26 Interview with end users to evaluate the visualization and interaction techniques in the system
- We conducted a structured interview with predefined questions. 12 participants used the system's visualization techniques to perform a number of preselected tasks related to their work context, thinking aloud while carrying out the tasks.
- We selected all questions carefully so that we could draw conclusions about the effectiveness and efficiency of the system and the level of user satisfaction.

27 Conclusion and Future Work
- This research introduced a new mapping algorithm and visualization system for matching statistical indicators to a common schema. We used the algorithm to integrate heterogeneous data from different sources before applying visual methods.
- The proposed mapping algorithm is based on Hamming distance, edit distance and ontology. With it, we improved the integration and mapping of statistical data indicators from different sources.
- Different users and experts, including statisticians, researchers and decision makers, successfully evaluated the system. The results showed high levels of effectiveness, efficiency and user satisfaction.

28 Conclusion and Future Work
Future work includes:
- Focusing more on data mapping using ontology.
- Extending our mapping algorithm to handle more sophisticated mappings between ontologies (i.e., mappings that are not 1:1).
- Collecting data from more types of sources, since our case study focused on importing data from Excel files.
- Improving collaboration in the visualization system.
- Developing additional methods that help users find good views of the data and choose appropriate visualization techniques.
- Addressing 3D visualization of uncertain graph structures with uncertain attributes, which we think is a formidable challenge.

29 Thanks
Mapping Techniques and Visualization of Statistical Indicators
Haitham Zeidan, Palestinian Central Bureau of Statistics