Using the Concept of Info-Cubes to Facilitate Data Analysis in Demographic Surveillance Systems
Yazoumé Yé, Uwe Wahser
Centre de Recherche en Santé de Nouna, Burkina Faso
INDEPTH General Meeting 2000 Yazoumé Yé, Uwe Wahser
2/21 Content
Introduction and Background (Slides 3-6)
The Relational Model in DSS (Slide )
Introducing Info-Cubes (Slide )
Conclusion (Slide 20)
3/21 Introduction
Background: Data analysis in Demographic Surveillance Systems (DSS)
Dilemma: Analysis of the original data is difficult, while preprocessed output from technical staff is not flexible enough
Proposition: Provide Info-Cubes as easy-to-access data sources
4/21 Data Processing in DSS
[Diagram: process steps Data Collection, Data Entry, Data Management and Analysis; labels Online Transaction Processing (OLTP), Online Analytical Processing (OLAP) and Analysis of Fixed Format Output]
5/21 OLTP vs. OLAP
OLTP
– provides detailed audit
– supports operations
– needs detailed data
– finds one dataset quickly
– relational model
– Q: “WHO lives in Atown?”
OLAP
– provides big picture
– supports analysis
– needs aggregate data
– evaluates all datasets quickly
– multidimensional model
– Q: “HOW MANY live in Atown?”
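The two questions on this slide can be contrasted in a minimal sketch using Python's built-in sqlite3 module. The table `resident`, its columns and the sample names are hypothetical and invented for illustration only; they are not taken from the DSS database.

```python
import sqlite3

# Hypothetical, simplified table; a real DSS schema is spread over many tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resident (name TEXT, town TEXT)")
conn.executemany(
    "INSERT INTO resident VALUES (?, ?)",
    [("Awa", "Atown"), ("Issa", "Atown"), ("Fatou", "Betown")],
)

# OLTP-style question: WHO lives in Atown? (detailed records)
who = [row[0] for row in conn.execute(
    "SELECT name FROM resident WHERE town = ? ORDER BY name", ("Atown",))]

# OLAP-style question: HOW MANY live in Atown? (one aggregate value)
how_many = conn.execute(
    "SELECT COUNT(*) FROM resident WHERE town = ?", ("Atown",)).fetchone()[0]

print(who)       # ['Awa', 'Issa']
print(how_many)  # 2
```

The first query returns individual records, the second a single aggregate; an Info-Cube would store such aggregates in advance.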
6/21 Peeping into DWH Architecture
[Diagram: Operational Database(s), Data Warehouse (DWH), Data Marts, Info Cubes; labels OLTP on Relational DB and OLAP on Multidimensional DB]
7/21 Relational Model in DSS
[Entity diagram: Location, Individual, Group, Residence, Relationship, Membership, Outmigration, Inmigration, Death, Birth, Observation, Status Observation, Preg. Outcome]
8/21 Advantages of the RM
Optimized for creation, reading, updating and deletion of data sets
Ensures the retrieval of data
Eliminates redundant data to
– ensure data consistency
– minimize data volume
Insensitive to change
The RM supports OnLine Transaction Processing (OLTP)
9/21 Expected Output for Analysis
10/21 Querying the Database
[Entity diagram: Location, Individual, Group, Resident, Relationship, Membership, Outmigration, Inmigration, Death, Birth, Observation, Status Observation, Preg. Outcome]
11/21 Relational Data Storage
12/21 Problems of the RM
Information has to be collected from several tables with complex queries
– Design of queries is time-consuming
– Execution of queries is time-consuming
Complex model is difficult to understand
– Design of queries needs skilled staff
The RM is not optimized for OnLine Analytical Processing (OLAP)
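To illustrate why even a simple count needs several tables, here is a minimal sketch with Python's sqlite3 module. The three tables, their columns and the sample rows are hypothetical simplifications loosely named after entities in the relational model; the actual DSS schema is assumed to be more complex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified schema and sample data.
conn.executescript("""
CREATE TABLE location   (loc_id INTEGER PRIMARY KEY, town TEXT);
CREATE TABLE individual (ind_id INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE residence  (ind_id INTEGER, loc_id INTEGER, end_date TEXT);
INSERT INTO location   VALUES (1, 'Atown'), (2, 'Betown');
INSERT INTO individual VALUES (10, 'F'), (11, 'M'), (12, 'F');
INSERT INTO residence  VALUES (10, 1, NULL), (11, 1, NULL),
                              (12, 1, '1999-06-30'), (12, 2, NULL);
""")

# Count current residents (residence episode still open) per town and sex.
# Even this small question needs two joins, a filter and a grouping.
rows = conn.execute("""
    SELECT l.town, i.sex, COUNT(*) AS n
    FROM residence r
    JOIN individual i ON i.ind_id = r.ind_id
    JOIN location   l ON l.loc_id = r.loc_id
    WHERE r.end_date IS NULL
    GROUP BY l.town, i.sex
    ORDER BY l.town, i.sex
""").fetchall()

print(rows)  # [('Atown', 'F', 1), ('Atown', 'M', 1), ('Betown', 'F', 1)]
```

Each further dimension (for example ethnic group) would add more joins, which is the design and execution cost the slide refers to.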
13/21 From Fixed Format Tables...
Analysis Variable: Number of People
Dimension 1: Ethnic Group
Dimension 2: Sex
Dimension 3: Town
14/21 ... to Info-Cubes
[Cube diagram: Dimension 1: Ethnic Group (Muzungu, Tababu, Ferengi); Dimension 2: Sex (M, F); Dimension 3: Town (Atown, Betown, Cetown); example cell value: 45]
15/21 ... to MDD Tables
Columns: EthGrp (Dimension 1: Ethnic Group), Sex (Dimension 2: Sex), Town (Dimension 3: Town), V_Count (Analysis Variable)
EthGrp   Sex  Town    V_Count
Muzungu  M    Cetown  45
Muzungu  F    Cetown  54
Tababu   M    Cetown  123
Tababu   F    Cetown  132
Ferengi  M    Cetown  234
Ferengi  F    Cetown  243
Muzungu  M    Betown  234
Muzungu  F    Betown  243
Tababu   M    Betown  45
...
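One possible way to derive such an MDD table from individual-level records is a simple count per combination of categories. The sketch below uses Python's `collections.Counter`; the input records are invented for illustration and the approach is an assumption, not the procedure used by the authors.

```python
from collections import Counter

# Hypothetical individual-level records: (ethnic group, sex, town).
individuals = [
    ("Muzungu", "M", "Cetown"),
    ("Muzungu", "M", "Cetown"),
    ("Tababu", "F", "Cetown"),
    ("Tababu", "M", "Betown"),
]

# One entry per existing combination of categories;
# the count plays the role of the analysis variable V_Count.
cube = Counter(individuals)

mdd_table = [
    {"EthGrp": eth, "Sex": sex, "Town": town, "V_Count": n}
    for (eth, sex, town), n in sorted(cube.items())
]

for row in mdd_table:
    print(row)
```

Only combinations that actually occur produce a row, so the table has three rows here rather than one per possible cell.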
16/21 Some Terms
Granularity: degree of detail of the aggregated data
Drill Down: zoom into detail during OLAP
Roll Up: zoom out
Slicing: restricting analysis to one category
Dicing: restricting analysis to a selection of categories
17/21 Example of Slicing
[Cube diagram: the full cube (Dimension 1: Ethnic Group; Dimension 2: Sex; Dimension 3: Town) is sliced to "only Tababu", leaving the categories M, F and Atown, Betown, Cetown]
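Slicing, dicing and rolling up can be sketched in a few lines of plain Python. The rows below are the ones listed on the MDD table slide (the slide truncates the table, so the data is incomplete); the variable names are illustrative assumptions.

```python
from collections import defaultdict

# Rows listed on the MDD table slide: (EthGrp, Sex, Town, V_Count).
cube = [
    ("Muzungu", "M", "Cetown", 45), ("Muzungu", "F", "Cetown", 54),
    ("Tababu", "M", "Cetown", 123), ("Tababu", "F", "Cetown", 132),
    ("Ferengi", "M", "Cetown", 234), ("Ferengi", "F", "Cetown", 243),
    ("Muzungu", "M", "Betown", 234), ("Muzungu", "F", "Betown", 243),
    ("Tababu", "M", "Betown", 45),
]

# Slicing: restrict one dimension to a single category.
tababu_slice = [r for r in cube if r[0] == "Tababu"]

# Dicing: restrict the analysis to a selection of categories.
dice = [r for r in cube if r[0] in {"Muzungu", "Tababu"} and r[2] == "Cetown"]

# Roll up: drop the Sex dimension by summing over it (coarser granularity).
rolled = defaultdict(int)
for eth, _sex, town, count in cube:
    rolled[(eth, town)] += count

print(sum(r[3] for r in tababu_slice))  # 300
print(len(dice))                        # 4
print(rolled[("Muzungu", "Cetown")])    # 99
```

Drilling down is the reverse of the roll-up: returning from `rolled` to the more detailed `cube`.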
18/21 Some Remarks
Adding dimensions increases granularity
Adding categories increases granularity
High granularity produces big cubes
Number of data sets in a cube = number of existing combinations of categories across all dimensions
Challenge: define useful Info-Cubes that are not too big (performance) but contain sufficient dimensions (flexibility)
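The size rule above can be made concrete with a short calculation. The category lists reuse the names from the example cube; the extra "AgeGroup" dimension with 5 categories and the sample records are invented for illustration.

```python
from math import prod

# Category lists per dimension (names from the example cube).
dimensions = {
    "EthGrp": ["Muzungu", "Tababu", "Ferengi"],
    "Sex": ["M", "F"],
    "Town": ["Atown", "Betown", "Cetown"],
}

# Upper bound: every combination of categories exists.
max_cells = prod(len(cats) for cats in dimensions.values())

# Adding a dimension multiplies the bound by its number of categories.
# "AgeGroup" with 5 categories is a hypothetical example.
max_cells_with_age = max_cells * 5

# Actual size: only combinations that occur in the data are stored.
records = [("Muzungu", "M", "Cetown"), ("Muzungu", "M", "Cetown"),
           ("Tababu", "F", "Betown")]
actual_cells = len(set(records))

print(max_cells, max_cells_with_age, actual_cells)  # 18 90 2
```

The multiplicative growth is why each added dimension or category has to be weighed against query performance.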
19/21 Advantages of Info-Cubes
Queries on Info-Cubes
– are simple
– are fast, when the granularity is not too high
– are more flexible than fixed output tables
Info-Cubes
– normally don’t contain confidential data
– can be sized according to the needs of the targeted researcher
Info-Cubes are suitable for OLAP and for dissemination of DSS data
20/21 Conclusion: Possible Benefits
Facilitate dissemination of DSS data
Give more responsibility to the researcher
Reduce workload of technical staff
Ensure consistent analysis results
With suitable browser tools: enable online cubes on the WWW
21/21 Links and Literature
Commercial Demo Cubes on the WWW:
Good Overview on DWH:
– Inmon WH, Welch JD, Glassey KL. Managing the Data Warehouse. New York: John Wiley & Sons, 1997
Some Links on DWH, MDD:
– www2.andrews.edu/~dheise/dw/Avondale/ACDWTOC.html
– members.aol.com/fmcguff/dwmodel/index.htm
– muenchen.de/~system42/public/Line42/Literatur/OLAP-Modeling.html