Data Warehousing and Knowledge Management Slide 1 Data Warehousing: the New Knowledge Management Architecture for Humanities Research? Janet Delve University.

Presentation transcript:

Data Warehousing and Knowledge Management Slide 1 Data Warehousing: the New Knowledge Management Architecture for Humanities Research? Janet Delve University of Portsmouth, UK UKAIS 2004

Data Warehousing and Knowledge Management Slide 2 Introduction Data warehouses are everywhere: Amazon, Wal*Mart, Opodo. DWs are used a lot in industry and scientific research, but not in humanities research. The written paper covers linguistics and history. This talk covers history in detail and gestures towards linguistics.

Data Warehousing and Knowledge Management Slide 3 Overview: Introduction; Data modelling and traditional databases; Source-oriented data modelling; Data Mining; Philosophy of data warehousing; Background of DWs; Basic components of a data warehouse (DW); Advantages of DWs; Findings – Humanities and DWs; Humanities and DWs – some issues; Examples of possible Humanities DWs; Ideas for the future?

Data Warehousing and Knowledge Management Slide 4 Data Modelling Relational data modelling – material split into many tables in order to gain enhanced performance – no duplication, updating or insertion anomalies etc. Source-oriented data modelling – emphasis on modelling data as closely as possible to original source which is included in its entirety for posterity. DW data modelling nearer to source-oriented approach in spirit.

Data Warehousing and Knowledge Management Slide 5 Traditional databases ERD p117 Harvey and Press

Data Warehousing and Knowledge Management Slide 6 Traditional databases Harvey and Press p.129

Data Warehousing and Knowledge Management Slide 7 Historical Data This can be difficult to model because: it is irregular in structure; it is complex; it is erratic in terms of when it occurs. Using a relational database can mean data from a single source being split into many tables.

Data Warehousing and Knowledge Management Slide 8 Source-oriented data modelling ‘a semantic network tempered by hierarchical considerations’ [Thaller 1991, 155]. Its flexible nature gives  a ‘rubber band data structures’ facility [Denley 1994, 37]. The fluid nature of creating a database with  marks it out as an ‘organic’ DBMS.

Data Warehousing and Knowledge Management Slide 9 Data Mining The whole field is often referred to as data mining, which is also a major component within the field. Data mining (DM) is normally used on large quantities (terabytes) of data, to find meaningful patterns. Neural nets, statistical modelling, decision trees are just some AI methods used. SQL can be used too. Parallel data processing is used with DM. In order to mine data, it must be kept in a suitable system - a data warehouse is ideal.

Data Warehousing and Knowledge Management Slide 10 Philosophy of data warehousing ‘Data warehousing is an architecture, not a technology. There is the architecture, and there is the underlying technology, and they are two very different things. Unquestionably there is a relationship between data warehousing and database technology, but they are most certainly not the same. Data warehousing requires the support of many different kinds of technology.’ Inmon 2002

Data Warehousing and Knowledge Management Slide 11 Background of DWs Business-oriented – serve the analytical needs of a company. The ordinary DBMS is still needed for the day-to-day queries, and also to feed the DW. W.H. Inmon, father of DW. Cabinet effect – 1991. R. Kimball, expert on dimensional modelling. Need for a single, integrated source of clean data, particularly for multinational companies. Supporting technology from e.g. Oracle, Prism Solutions, IBM.

Data Warehousing and Knowledge Management Slide 12 Data Marts Data marts contain DW data but are restricted to one department or one business process. The industry is divided about data marts. Inmon recommends building the DW first, then siphoning off the data to data marts. Kimball believes you should build several data marts first, then integrate them into a DW.

Data Warehousing and Knowledge Management Slide 13 Basic components of a Data Warehouse (DW) A DW is subject-oriented, integrated, non-volatile & time-variant. The major subjects for an insurance company are customer, policy, premium and claim. Previously data modelled around applications - car, health, life and accident. Integration is the most important facet of a DW. Previous inconsistencies are ironed out and all data unambiguously entered into DW. Many sources of data can be placed in DW.
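
The integration step described above can be illustrated with a minimal sketch. It assumes three hypothetical source applications (car_app, life_app, health_app) that encode the same attribute differently; the code table, application names and records are invented for illustration and do not come from the talk.

```python
# Hypothetical encodings used by different source applications (assumption).
GENDER_CODES = {
    "m": "male", "f": "female",
    "1": "male", "0": "female",
    "male": "male", "female": "female",
}

def integrate(records):
    """Map every source record onto one agreed encoding before loading."""
    cleaned = []
    for source, person, gender in records:
        key = str(gender).strip().lower()
        if key not in GENDER_CODES:
            # Refuse to load ambiguous data rather than guess.
            raise ValueError(f"unmapped gender code {gender!r} from {source}")
        cleaned.append(
            {"source": source, "person": person, "gender": GENDER_CODES[key]}
        )
    return cleaned

# Made-up records from three applications, each with its own convention.
raw = [
    ("car_app", "A. Smith", "M"),
    ("life_app", "B. Jones", 0),
    ("health_app", "C. Brown", "female"),
]
warehouse_rows = integrate(raw)
```

Raising an error on an unknown code, instead of passing it through, is one way to keep the "unambiguously entered" property; a real loading process might instead route such rows to a review queue.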

Data Warehousing and Knowledge Management Slide 14 Basic components of a Data Warehouse (DW) Non-volatile data in a DW means that it is not changed in the way data is in an operational database – data is loaded en masse and isn’t updated. This obviates the need for normalisation. Time-variant – the DW time horizon is 5-10 years, versus 2-3 months for an operational database. The DW holds snapshots, the operational database current data; the DW always has an element of time, the operational database may or may not. Inmon 2002
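
As a rough illustration of non-volatile, time-variant storage, the sketch below keeps dated snapshots in an append-only store and answers "as of" questions. The class name SnapshotStore, its methods and the sample premiums are all hypothetical; this is not how any particular DW product works.

```python
from datetime import date

class SnapshotStore:
    """Append-only store: batches are loaded en masse and never updated."""

    def __init__(self):
        self._rows = []  # each row is (snapshot_date, key, value)

    def load(self, snapshot_date, batch):
        # Every row carries its snapshot date, so time is always present.
        for key, value in batch.items():
            self._rows.append((snapshot_date, key, value))

    def as_of(self, key, when):
        """Value from the latest snapshot taken on or before `when`."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

store = SnapshotStore()
store.load(date(2001, 1, 1), {"policy-1": 100.0})  # made-up premium
store.load(date(2002, 1, 1), {"policy-1": 120.0})  # later snapshot, old row kept
premium_mid_2001 = store.as_of("policy-1", date(2001, 6, 30))
```

Because earlier rows are never overwritten, the 2001 value remains available after the 2002 load, which is what distinguishes this from an operational table holding only current data.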

Data Warehousing and Knowledge Management Slide 15 Kimball p7 Basic components of a Data Warehouse (DW)

Data Warehousing and Knowledge Management Slide 16 Typical Architecture of a Data Warehouse

Data Warehousing and Knowledge Management Slide 17 Meta Data Meta data is extremely important in a DW. It is used: to log the extraction and loading of data into the warehouse; in query management to locate the most appropriate data source and also to help end users to build queries; to show how the data has been mapped when carrying out data cleansing and transformations; to manage all the data in the DW – recording where data came from, when, etc.
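
A minimal sketch of the logging role of meta data follows. It assumes the warehouse is a plain Python list and that each load appends one log entry; the function names, the source label "hearth_tax_return" and the sample names are hypothetical.

```python
from datetime import datetime, timezone

load_log = []  # the meta data: one entry per load

def load_with_metadata(warehouse, source_name, rows, transform=None):
    """Load rows into `warehouse` (a list) and record where they came from."""
    mapped = [transform(r) for r in rows] if transform else list(rows)
    warehouse.extend(mapped)
    load_log.append({
        "source": source_name,                                # where
        "loaded_at": datetime.now(timezone.utc).isoformat(),  # when
        "rows_loaded": len(mapped),
        "transform": transform.__name__ if transform else None,  # how mapped
    })
    return len(mapped)

def tidy_name(raw):
    """Example cleansing step: collapse whitespace and normalise case."""
    return " ".join(raw.split()).title()

warehouse = []
load_with_metadata(
    warehouse, "hearth_tax_return", ["  john   smith ", "ANN LEE"],
    transform=tidy_name,
)
```

Each log entry records the source, the load time and the transformation applied, which is the kind of tracking the slide attributes to meta data.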

Data Warehousing and Knowledge Management Slide 18 Basic components of a Data Warehouse (DW) Fact Tables ‘A fact table is the primary table in a dimensional model where the numerical performance measurements of the business are stored… The measurement data resulting from a business process is stored in a single data mart. Since measurement data is overwhelmingly the largest part of any data mart, we avoid duplicating it in multiple places around the enterprise’ Kimball 2002

Data Warehousing and Knowledge Management Slide 19 Basic components of a Data Warehouse (DW) Dimension tables These contain the textual descriptors of the business. Their depth and breadth define the usefulness of the DW. They contain data that doesn’t change frequently. Can have attributes. Not usually normalized (snowflake and starflake). Coding disparaged (long term view).
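
To make the fact/dimension split concrete, here is a small star schema sketched with Python's built-in sqlite3 module. It borrows the insurance subjects mentioned earlier in the talk (customer, policy, claim), but the table names, column names and all rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: textual descriptors, deliberately not normalised further.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_policy   (policy_id INTEGER PRIMARY KEY, policy_type TEXT);

-- Fact table: numerical measurements plus keys pointing at the dimensions.
CREATE TABLE fact_claim (
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    policy_id    INTEGER REFERENCES dim_policy(policy_id),
    claim_year   INTEGER,
    claim_amount REAL
);

INSERT INTO dim_customer VALUES (1, 'A. Smith', 'South'), (2, 'B. Jones', 'North');
INSERT INTO dim_policy   VALUES (10, 'car'), (20, 'health');
INSERT INTO fact_claim   VALUES (1, 10, 2003, 500.0),
                                (1, 20, 2003, 120.0),
                                (2, 10, 2004, 300.0);
""")

# Typical analytical query: aggregate the facts, grouped by a dimension attribute.
totals = conn.execute("""
    SELECT p.policy_type, SUM(f.claim_amount)
    FROM fact_claim AS f
    JOIN dim_policy AS p ON p.policy_id = f.policy_id
    GROUP BY p.policy_type
    ORDER BY p.policy_type
""").fetchall()
```

The dimension tables store readable text ('car', 'South') rather than opaque codes, in line with the slide's remark that coding is disparaged.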

Data Warehousing and Knowledge Management Slide 20 Star schema Kimball p51 Basic components of a Data Warehouse (DW)

Data Warehousing and Knowledge Management Slide 21 Kimball p43 Basic components of a Data Warehouse (DW)

Data Warehousing and Knowledge Management Slide 22 Basic components of a Data Warehouse (DW) Kimball p39

Data Warehousing and Knowledge Management Slide 23 Data Warehousing Tools and Technologies Building a data warehouse is a complex task because there is no vendor that provides an ‘end-to-end’ set of tools. A data warehouse therefore has to be built using multiple products from different vendors. Ensuring that these products work well together and are fully integrated is a major challenge.

Data Warehousing and Knowledge Management Slide 24 Advantages of DWs Flexibility in modelling data. Time dimension – country-specific calendars and synchronization across multiple time zones. Easy to add external data and summarised data. Built for analysis. Built for huge volumes of data (terabytes of data; a terabyte is a trillion bytes). Can cope with ‘idiosyncrasies of geographic location dimensions’ within GISs.

Data Warehousing and Knowledge Management Slide 25 Possible advantages of DWs Indexing facilities of DW. Publishing the ‘right data’ – data collected from a variety of sources and edited for quality and consistency. DW seeks to collate all data so a variety of different subsets can be analysed whenever required. Easy to extend DW and add material from a new source. Data cleansing techniques. Tracking facility afforded by meta data

Data Warehousing and Knowledge Management Slide 26 Disadvantages of DWs Some humanities data fits into the ‘numerical fact’ topology, some doesn’t. Technology not easy and is based on having existing databases to extract from. Regular snapshots not the same but they could equate to data sets taken at different periods of time (e.g. census, 1861 census). A lot to learn.

Data Warehousing and Knowledge Management Slide 27 Findings – Humanities and DWs NAGARA (National Association of Government Archives and Records Administrators) Article on DWs by Mary Klauda of the Minnesota Historical Society 1999 (archivist) Eastern Connecticut schools DW 2002 Bo Wandschneider – University of Guelph, Canada – DW and the use of census data. ICPSR (Inter-university Consortium for Political and Social Research)

Data Warehousing and Knowledge Management Slide 28 Findings – Humanities and DWs University of California DW – memo to Humanities department Social Science DW – Human Resources DW project of Human Sciences Research Council, South Africa GEOBASE, Israel. DW of Israel’s regional statistics, supported by National Planning Authority in the Ministry of Interior Affairs.

Data Warehousing and Knowledge Management Slide 29 Humanities and DWs – some issues Scale – can cope with really large country/state-wide problems. Can analyse e.g. British censuses (10^8). Can put several databases together to produce a time run – e.g. hearth taxes, window taxes, poll taxes, land taxes, poor rates all in one DW. Oracle site licenses.

Data Warehousing and Knowledge Management Slide 30 Examples of possible History DWs

Data Warehousing and Knowledge Management Slide 31 Examples of possible History DWs MANOR: Manor Id, Holding Id, Property Id, Original Owner Id, Date, Manor Value, Tax (Hides), Cottar Population, Bordar Population, Villein Population, Sokeman Population, Priest Population, Number of Burgesses, Number of Slaves, etc. HOLDING DETAILS: Holding Id, King, Tenant in Chief, Manor Lord, VILL, etc. ORIGINAL OWNER: Original Owner Id, etc. PROPERTY INFORMATION: Property Id, Property description, Property value, etc.
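
The manor design above could be sketched as a star schema roughly as follows, using Python's sqlite3 module. Only a subset of the slide's columns is shown, the snake_case column names are an adaptation, and every row (tenants, vills, values) is a made-up placeholder rather than real historical data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables (subset of the columns listed on the slide).
CREATE TABLE holding_details (
    holding_id INTEGER PRIMARY KEY, tenant_in_chief TEXT, vill TEXT
);
CREATE TABLE property_information (
    property_id INTEGER PRIMARY KEY, property_description TEXT
);

-- MANOR acts as the fact table: keys to the dimensions plus numeric measures.
CREATE TABLE manor (
    manor_id           INTEGER PRIMARY KEY,
    holding_id         INTEGER REFERENCES holding_details(holding_id),
    property_id        INTEGER REFERENCES property_information(property_id),
    manor_value        REAL,
    tax_hides          REAL,
    villein_population INTEGER,
    bordar_population  INTEGER
);

-- Placeholder rows, invented for illustration only.
INSERT INTO holding_details VALUES (1, 'Tenant A', 'Vill X'), (2, 'Tenant B', 'Vill Y');
INSERT INTO property_information VALUES (1, 'mill'), (2, 'meadow');
INSERT INTO manor VALUES (1, 1, 1, 10.0, 5.0, 12, 4),
                         (2, 1, 2,  6.0, 2.5,  7, 3),
                         (3, 2, 1,  8.0, 4.0,  9, 2);
""")

# Total hides and villeins per tenant in chief.
per_tenant = db.execute("""
    SELECT h.tenant_in_chief, SUM(m.tax_hides), SUM(m.villein_population)
    FROM manor AS m
    JOIN holding_details AS h ON h.holding_id = m.holding_id
    GROUP BY h.tenant_in_chief
    ORDER BY h.tenant_in_chief
""").fetchall()
```

The population counts and hide assessments are the kind of numerical facts that fit a fact table directly, while holders and properties serve as the descriptive dimensions.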

Data Warehousing and Knowledge Management Slide 32 Examples of possible History DWs Data from a variety of sources over time– hearth tax, poor rates, trade directories, census, street directories, wills and inventories, GIS maps for a city e.g. Winchester. Voting data – poll book data and rate book data up to 1870 for whole country (note some data missing). Port data – all data from portbooks for all British ports together with yearly trade figures. Street directories for whole country for last 100 years. Taxation overview – different types / areas / periods.

Data Warehousing and Knowledge Management Slide 33 Examples of possible History DWs 19th C British census data doesn’t fit into the typical DW model as it doesn’t have the numerical facts to go into a fact table. However, there’s a recent development in DWs – ‘factless’ fact tables. There is real scope to be able to model historical data using these.
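
A factless fact table records only that an event occurred, as a combination of dimension keys with no numeric measure; analysis then relies on counting rows. The sketch below, again using sqlite3, assumes a hypothetical enumeration event (a person recorded in a household at a census). The table names, census years and rows are illustrative assumptions, not data from the talk.

```python
import sqlite3

census = sqlite3.connect(":memory:")
census.executescript("""
CREATE TABLE dim_person    (person_id INTEGER PRIMARY KEY, name TEXT, occupation TEXT);
CREATE TABLE dim_household (household_id INTEGER PRIMARY KEY, address TEXT);

-- Factless fact table: keys only, no measurement column.
CREATE TABLE fact_enumeration (
    person_id    INTEGER REFERENCES dim_person(person_id),
    household_id INTEGER REFERENCES dim_household(household_id),
    census_year  INTEGER
);

-- Invented rows for illustration.
INSERT INTO dim_person VALUES (1, 'Person A', 'weaver'),
                              (2, 'Person B', 'servant'),
                              (3, 'Person C', 'clerk');
INSERT INTO dim_household VALUES (1, 'High Street'), (2, 'Mill Lane');
INSERT INTO fact_enumeration VALUES (1, 1, 1861), (2, 1, 1861),
                                    (1, 1, 1871), (2, 2, 1871), (3, 2, 1871);
""")

# With no numeric fact to sum, the analysis counts events instead.
counts = census.execute("""
    SELECT census_year, COUNT(*)
    FROM fact_enumeration
    GROUP BY census_year
    ORDER BY census_year
""").fetchall()

# People traced across both censuses.
traced = census.execute("""
    SELECT person_id
    FROM fact_enumeration
    GROUP BY person_id
    HAVING COUNT(DISTINCT census_year) = 2
    ORDER BY person_id
""").fetchall()
```

The descriptive richness of census returns (names, occupations, addresses) sits in the dimensions, so the absence of a numeric measure does not prevent questions such as household size per year or persistence between censuses.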

Data Warehousing and Knowledge Management Slide 34 Examples of possible History DWs Kimball p247

Data Warehousing and Knowledge Management Slide 35 Examples of possible Humanities DWs Language DW – could contain databases of different languages for comparison, or many databases of same languages over larger area. DW of worldwide scholarly community / whole culture GIS or archaeological DW by continent etc. rather than country. DW of biographies. DW of library catalogues or archives for enhanced public access.

Data Warehousing and Knowledge Management Slide 36 Ideas for the future? Instead of ‘me and my database’ – emphasis on smallish, individual, national projects – maybe ‘our integrated warehouse’ – emphasis on large scale, collaborative, international projects?