FatMax 2007. Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Presentation transcript:

1 Data, Information & Knowledge 2

2 Introduction
The previous presentation covered what data is.*
In this presentation we cover where data comes from and the factors we need to take into account when gathering data for processing.
* Strictly this should be data “are”, but nobody talks like this!

3 Data Sources
Data can be collected either:
DIRECTLY: gathered from an original source, or
INDIRECTLY: gathered from another source, or as a by-product of another operation.
In the world of business these would be described as primary and secondary sources of data.

4 Direct (Original) Data Sources
Sale of an item in a supermarket, recorded at an EFTPOS terminal
Data from sensors, e.g. a weather station
Data collected in a survey, e.g. a questionnaire or an interview

5 Indirect Data Sources 1
Data collected for one purpose and used for another
A credit card company collects data about your spending in order to bill you each month. However, a secondary use of this data is to build up a “profile” of your spending habits. This profile can then be used to send you direct marketing about goods and services that may appeal to you.
[Diagram: Credit Card Transaction. Direct Use of Data: Customer Billing. Indirect Use of Data: Direct Marketing.]
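The credit card example can be sketched in a few lines: the same transaction records feed both the direct use (the bill) and the indirect use (a spending profile). The transaction data, category names and variable names below are made up purely for illustration.

```python
from collections import Counter

# Hypothetical transactions: (merchant category, amount)
transactions = [
    ("groceries", 42.50),
    ("fuel", 30.00),
    ("groceries", 18.25),
    ("travel", 120.00),
]

# Direct use: the monthly bill is simply the total of all amounts.
monthly_bill = sum(amount for _, amount in transactions)

# Indirect use: a spending profile built from the very same records,
# here just a count of purchases per category.
profile = Counter(category for category, _ in transactions)
top_category, _ = profile.most_common(1)[0]

print(f"Bill: {monthly_bill:.2f}")
print(f"Most frequent category: {top_category}")
```

Nothing extra had to be collected for the profile, which is what makes this an indirect use of the data.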

6 Indirect Data Sources 2
Purchased data/data passed on
There are a number of ways data can be acquired from third parties and then used for a different purpose.
A good example is the electoral roll. Its main purpose is to record who is eligible to vote. However, marketing companies make extensive use of the roll to target customers.

7 Coding Data
Before being stored in a computer, information can be coded as data, e.g.
M or F
Mo, Tu, We, Th, Fr, Sa, Su
I, II, IIIM, IIIN, IV, V
S, M, L, XL, XXL
In the picture shown we can see the date code for the tyre. This represents the eighth week of 2006.
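As a rough illustration of how much information a short code can carry, the sketch below decodes a tyre date code. It assumes the common four-digit week-then-year layout (WWYY) and a year in the 2000s. The function name and the sample code "0806" are illustrative assumptions, since the picture from the slide is not reproduced here.

```python
def decode_tyre_date_code(code: str) -> tuple[int, int]:
    """Split a four-digit WWYY date code into (week, year).

    Assumes the four-digit layout and a year in the 2000s.
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected four digits, got {code!r}")
    week = int(code[:2])
    year = 2000 + int(code[2:])
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range: {week}")
    return week, year


week, year = decode_tyre_date_code("0806")
print(f"Week {week} of {year}")  # Week 8 of 2006
```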

8 Benefits of Coding
Less storage space is required: M and F require less storage space than male and female.
Faster data input: M and F are also quicker to enter than male and female.
Validation is easier: with a limited number of codes it is easier to match them against rules to check they are entered correctly.
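A minimal sketch of two of these benefits, using the clothing-size codes from the previous slide. The full descriptions in the table and the function names are assumptions made for the example.

```python
# Hypothetical code table: stored code on the left, its meaning on the right.
SIZE_CODES = {
    "S": "Small",
    "M": "Medium",
    "L": "Large",
    "XL": "Extra large",
    "XXL": "Extra extra large",
}


def is_valid_size(code: str) -> bool:
    """Lookup check: an entry is valid only if it is one of the known codes."""
    return code in SIZE_CODES


def characters_saved(code: str) -> int:
    """Characters saved by storing the code instead of the full description."""
    return len(SIZE_CODES[code]) - len(code)


for entry in ["M", "XL", "XM"]:
    print(entry, "accepted" if is_valid_size(entry) else "rejected")

print(characters_saved("XL"))  # 9
```

Because only five codes are allowed, the mistyped entry "XM" is rejected by a simple membership test, which would be much harder to do with free-text descriptions.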

9 Drawbacks of Coding
Precision of data can be lost (coarsened): in the example, all shades of blue are coded as “blue”.
The user needs to know the codes used. How many of these top-level domains do you know? au, ch, de, ie, pk, fr, il, lk, es
[Diagram: “Data in” compared with “Stored data” (Pink, Blue, Black, Blue)]
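Coarsening can be shown with a small sketch in which several distinct shades collapse onto one stored code. The shade names and the mapping are invented for illustration and are not taken from the original diagram.

```python
# Hypothetical mapping from the shade a user enters to the code that is stored.
SHADE_TO_CODE = {
    "navy": "blue",
    "sky blue": "blue",
    "turquoise": "blue",
    "pink": "pink",
    "black": "black",
}


def encode_colour(shade: str) -> str:
    """Return the stored code for an entered shade (case-insensitive)."""
    return SHADE_TO_CODE[shade.lower()]


entered = ["Navy", "Sky blue", "Pink", "Black"]
stored = [encode_colour(shade) for shade in entered]
print(stored)  # ['blue', 'blue', 'pink', 'black']

# Four distinct inputs became three distinct codes: the difference between
# "Navy" and "Sky blue" cannot be recovered from the stored data.
distinct_in = len({shade.lower() for shade in entered})
distinct_out = len(set(stored))
print(distinct_in, distinct_out)  # 4 3
```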

10 Coding Value Judgements
Coding value judgements can be a particular problem as they are subject to personal opinion.
What do you think of this presentation? Good? Average? Poor?
One person’s good may be another person’s poor!
Value judgements are very difficult to encode without some coarsening (loss of detail).
How would you improve the analysis? What are the time/cost implications?

11 Quality of the Data Source 1
GIGO (Garbage In, Garbage Out)
If the data input is poor, the resulting information output will be poor, i.e. corrupt, inaccurate etc.
Can you think of any “real life” examples?

12 Quality of the Data Source 2
Examples of GIGO can include:
Unreliable questionnaires/surveys, e.g. inappropriate samples, badly worded questions etc.
Incorrectly calibrated instruments, e.g. an incorrectly calibrated balance will give incorrect measures of mass
Human error, e.g. transcription errors when entering data
Incomplete data sets, e.g. failing to account for “shrinkage” when measuring supermarket stock
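The calibration example can be sketched numerically. The code below assumes a balance that reads a constant 5 g too high. The masses and the size of the offset are made-up figures, chosen only to show how the input error passes straight through to the output.

```python
from statistics import mean

# Hypothetical true masses in grams, and an assumed constant calibration error.
true_masses_g = [100.0, 250.0, 500.0]
OFFSET_G = 5.0

# What the badly calibrated balance actually reports.
readings_g = [mass + OFFSET_G for mass in true_masses_g]

true_mean = mean(true_masses_g)
reported_mean = mean(readings_g)

# The processing (taking a mean) is correct, yet the result is still wrong
# by exactly the calibration error: garbage in, garbage out.
print(f"Error in reported mean: {reported_mean - true_mean:.1f} g")
```

No amount of careful processing afterwards removes the error, because nothing in the readings themselves reveals that the balance was off.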

13 Summary/Revision Topics
Data can arise from direct and indirect sources.
Information can be coded as data. This has a number of benefits but can lead to coarsening.
The source/accuracy of data has a major impact on the quality of information produced, i.e. GIGO.

14 Revision Tasks
Use your textbook/Internet sources to make your own notes on:
Sources of Data
Encoding Data
Quality of Data Sources
Try questions 18-24 on this worksheet: http://www.teach-ict.com/as_a2/topics/data_info_know/data_worksheet.doc
Diagram/example on slide 9 courtesy of teach-ict.com.

