Statistics and Data


1 Statistics and Data

2 4 major parts of STATISTICS [Diagram: Population and Sample (Data), connected by sampling, statistics, probability and inference. Chapter labels on the diagram: Ch 2, Ch 4, Ch 6; Ch 5, Ch 9, Ch 21; Ch 7, Ch 8; Ch 3, Ch 23; Ch 10-16.] We will cover ch. 1-15, 16, 21 and 23 of SVV.

3 Sample & Population population: the whole group of objects we want to learn about. sample: the part of the population that we actually observe.

4 data: (a set of) numbers and symbols that record facts about reality. A sample is also represented as data. The word DATA is the plural form of DATUM, although by convention it is often used as if it were singular. To avoid the grammatical confusion, the word DATASET is often used as a singular form of DATA; others use DATASET to mean a group of data. Data obtained by observing (a part of) a population are a sample.

5 observations (records, cases) variables (attributes) A data table is data shown in tabular form. In a data table, each column is a variable and each row corresponds to an observation. Database technicians use the terms attribute and record instead of variable and observation.
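
The row/column convention can be illustrated with a small sketch in Python. The variable names and values below are made up for illustration and do not come from the slides.

```python
# A tiny data table: each dict is one observation (record, case),
# and each key is one variable (attribute). All values are hypothetical.
table = [
    {"id": 1, "sex": "male", "weight_kg": 60.0},
    {"id": 2, "sex": "female", "weight_kg": 52.5},
    {"id": 3, "sex": "female", "weight_kg": 58.0},
]

# The variables are the columns of the table.
variables = list(table[0].keys())

# Reading one column gives all observed values of a single variable.
weights = [row["weight_kg"] for row in table]

n_observations = len(table)   # number of rows
n_variables = len(variables)  # number of columns

print(variables, n_observations, n_variables)
```

Here `id` plays the role of an identifier, `sex` is categorical, and `weight_kg` is quantitative.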

6 A database (DB) is a collection of closely related data, managed by Database Management System (DBMS) software. A relational DB uses a tabular representation of data.

7 Data and Statistics Statistics is a way to read data and to interpret the story the data tell. To understand the story, it is important to delve into the CONTEXT of the data, as well as the dataset itself.

8 Context of data Everything the data tell us, whether it is represented explicitly or only implicitly. - Meaning of the variables and values of the data - Who collected the data, and when - How & Why …. - All history & background of the data

9 Types of variables - identifier - quantitative variable - categorical variable (qualitative variable) Categorical or quantitative: determined by the scale of measurement

10 Categorical variable - nominal scale: sex, e.g. male (1), female (2) - ordinal (ordered) scale: letter grade, e.g. A, B, C, D, F Quantitative variable - interval scale: year of birth, e.g. 300 BC, AD 1900 - ratio scale: price, weight, ….

11 Interval scale or Ratio scale Is the quotient of two values meaningful? If so, it is a ratio scale. Test with cases such as -30 kg, -50 Won, 1000 BC, -10 °C, and ask whether each of the following statements makes sense. - 60 kg is 2 times heavier than 30 kg. - AD 2000 is 2 times older than AD 1000. - 20 °C is 2 times warmer than 10 °C. - 100 Won is 2 times more than 50 Won.
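
The temperature statement can be checked with a short calculation. The sketch below assumes we compare temperatures on the kelvin scale, which has a true zero; the function name is chosen only for this illustration.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to kelvin (ratio scale with a true zero)."""
    return c + 273.15

# Naive quotient on the interval scale: it looks like "2 times warmer".
naive_ratio = 20 / 10

# Quotient on the ratio scale: 20 °C is only about 3.5% warmer than 10 °C.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Weight is already on a ratio scale, so its quotient is meaningful as is.
weight_ratio = 60 / 30

print(naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```

The quotient of the Celsius values changes when the zero point is moved, which is why it carries no meaning on an interval scale, while the quotient of the weights does not depend on such a choice.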

12 Categorical or Quantitative eg) number of phone calls
Customer  Daily call  Yearly call
Clinton   2           954
Ford      1           450
David     0           320
Gates     2           795
Daily call: rather likely to be categorical. Yearly call: rather likely to be quantitative.

13 Cross-Sectional Data, Longitudinal Data and Time Series When several variables are all measured at the same time point, the data are called cross-sectional data. For example, sales revenue, number of customers, and expenses for the last month of business. Data in which variables are measured at several time points are called longitudinal data. For example, the number of earthquake victims. Longitudinal data measured at many regularly spaced time points are called a time series. For example, monthly recorded discount rates of US Treasury bonds.

14 How to get data
              No Treatment                                        Treatment
Non-designed  Sampling by Voluntary Response, Convenience, etc.  Observational Study
Designed      Sampling                                            Experiment
Using data collected from a computer DB is basically considered a non-designed study. For example, using customer data from a DB. Data mining, dealt with in chapter 24, analyzes data collected from a DB. See Chapter 3 and Chapter 23.

15 Sampling & Experiment If patients are selected by design and their blood pressures are measured without any treatment, the data are collected by sampling. If treatments are given before the blood pressures are measured, e.g. dosing different types of medicine in order to compare them, the data are collected by experiment.

16 Designed Study & Non-designed Study A designed study aims to generalize the results obtained from the data to an assumed population. For example, the result of a well-conducted political poll for an election can be interpreted as the opinion of all voters. To generalize, the sample (or the experimental units) must be selected according to a well-designed plan. A non-designed study does not aim to generalize the results to a population; it aims only to see tentative results. For example, a poll run on an internet website cannot be generalized to all citizens.

17 Sampling schemes - cluster sampling - systematic sampling - simple random sampling (SRS) - stratified sampling

18 Assume we want to measure the math ability of the 5th-year students of a school. There are 300 students, divided into 10 classes. Examples of sampling schemes that select 60 students are as follows. Simple random sampling (SRS): after numbering the students from 1 to 300, select 60 students using a random number generator (see www.random.org). Stratified sampling: select 6 students by SRS from each class. Cluster sampling: select 2 classes, and measure all students in those classes. Systematic sampling: select the 60 students whose numbers are multiples of 5.
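
The four schemes can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes equal class sizes of 30 and that students 1-30 are in the first class, 31-60 in the second, and so on. Neither assumption is stated on the slide, and all function names are hypothetical.

```python
import random

N_STUDENTS = 300
N_CLASSES = 10
CLASS_SIZE = N_STUDENTS // N_CLASSES  # 30 per class (assumed equal sizes)

# Assumption: students are numbered class by class (1-30, 31-60, ...).
students = list(range(1, N_STUDENTS + 1))
classes = [students[i * CLASS_SIZE:(i + 1) * CLASS_SIZE] for i in range(N_CLASSES)]

def simple_random_sample(rng, n=60):
    """SRS: every set of n students is equally likely to be chosen."""
    return rng.sample(students, n)

def stratified_sample(rng, per_class=6):
    """Stratified: an SRS of per_class students within each class."""
    return [s for cls in classes for s in rng.sample(cls, per_class)]

def cluster_sample(rng, n_classes=2):
    """Cluster: choose whole classes at random and take everyone in them."""
    return [s for cls in rng.sample(classes, n_classes) for s in cls]

def systematic_sample(step=5):
    """Systematic: every student whose number is a multiple of step."""
    return [s for s in students if s % step == 0]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
samples = {
    "srs": simple_random_sample(rng),
    "stratified": stratified_sample(rng),
    "cluster": cluster_sample(rng),
    "systematic": systematic_sample(),
}
for name, sample in samples.items():
    print(name, len(sample), sorted(sample)[:5])
```

Each scheme returns 60 student numbers. Systematic sampling is often run with a randomly chosen starting number; the version here follows the slide and uses the multiples of 5 directly.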

19 Which sampling scheme will give the most precise result if the school has streamed the students by math ability in advance? Which sampling scheme is the most convenient for measuring the math ability of the 5th-year students of the school?
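
One way to explore the first question is a small simulation. The sketch below assumes a hypothetical streamed school in which class k has an average score of 40 + 5k with a within-class standard deviation of 3. These numbers are invented for illustration, and the result depends on them.

```python
import random
import statistics

N_CLASSES, CLASS_SIZE = 10, 30
rng = random.Random(1)  # fixed seed only to make the sketch reproducible

# Hypothetical streamed school: class k has mean score 40 + 5*k, sd 3.
classes = [[rng.gauss(40 + 5 * k, 3) for _ in range(CLASS_SIZE)]
           for k in range(N_CLASSES)]
population = [x for cls in classes for x in cls]

def srs_mean():
    """Sample mean from a simple random sample of 60 students."""
    return statistics.mean(rng.sample(population, 60))

def stratified_mean():
    """Sample mean from 6 randomly chosen students in each class."""
    return statistics.mean(x for cls in classes for x in rng.sample(cls, 6))

def cluster_mean():
    """Sample mean from all students of 2 randomly chosen classes."""
    return statistics.mean(x for cls in rng.sample(classes, 2) for x in cls)

def spread(estimator, reps=1000):
    """Standard deviation of the estimate over repeated sampling."""
    return statistics.stdev(estimator() for _ in range(reps))

sd_srs = spread(srs_mean)
sd_strat = spread(stratified_mean)
sd_cluster = spread(cluster_mean)

print("SRS:", sd_srs, "stratified:", sd_strat, "cluster:", sd_cluster)
```

A smaller spread means a more precise estimate of the average score. Under these assumptions the stratified estimate varies least and the cluster estimate varies most, because with streaming nearly all of the variation in scores lies between classes rather than within them.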

20 Voluntary Response Sampling and Convenience Sampling In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted. Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong. In convenience sampling we simply include the individuals who are convenient. Unfortunately, this group may not be representative of the population.

21 Thank you !!

