© 2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.





Slides by John Loucks, St. Edward’s University

Chapter 1: Data and Statistics
- Statistics
- Applications in Business and Economics
- Data
- Data Sources
- Descriptive Statistics
- Statistical Inference
- Computers and Statistical Analysis
- Data Mining
- Ethical Guidelines for Statistical Practice

Statistics
- The term statistics can refer to numerical facts such as averages, medians, percents, and index numbers that help us understand a variety of business and economic situations.
- Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.

Applications in Business and Economics
- Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
- Economics: Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
- Finance: Financial advisors use price-earnings ratios and dividend yields to guide their investment advice.

Applications in Business and Economics
- Production: A variety of statistical quality control charts are used to monitor the output of a production process.
- Marketing: Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.

Data and Data Sets
- Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
- All the data collected in a particular study are referred to as the data set for the study.

Elements, Variables, and Observations
- Elements are the entities on which data are collected.
- A variable is a characteristic of interest for the elements.
- The set of measurements obtained for a particular element is called an observation.
- A data set with n elements contains n observations.
- The total number of data values in a complete data set is the number of elements multiplied by the number of variables.

Data, Data Sets, Elements, Variables, and Observations

| Company | Stock Exchange | Annual Sales ($M) | Earn/Share ($) |
|---|---|---|---|
| Dataram | NQ | 73.10 | 0.86 |
| EnergySouth | N | 74.00 | 1.67 |
| Keystone | N | 365.70 | 0.86 |
| LandCare | NQ | 111.40 | 0.33 |
| Psychemedics | N | 17.60 | 0.13 |

The company names identify the elements; Stock Exchange, Annual Sales, and Earn/Share are the variables; each row is an observation; and the whole table is the data set.
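As a small sketch of how these terms map onto a data structure, the table above can be stored as a Python dictionary keyed by company name. The identifiers `VARIABLES`, `DATA_SET`, and the variable labels (`stock_exchange`, `annual_sales_musd`, `earnings_per_share_usd`) are our own illustrative names, not part of the source.

```python
# The five companies on the slide are the elements; each tuple holds one
# element's observation on the three variables named in VARIABLES.
VARIABLES = ("stock_exchange", "annual_sales_musd", "earnings_per_share_usd")

DATA_SET = {
    "Dataram": ("NQ", 73.10, 0.86),
    "EnergySouth": ("N", 74.00, 1.67),
    "Keystone": ("N", 365.70, 0.86),
    "LandCare": ("NQ", 111.40, 0.33),
    "Psychemedics": ("N", 17.60, 0.13),
}

n_elements = len(DATA_SET)
n_variables = len(VARIABLES)
n_observations = len(DATA_SET)  # a data set with n elements has n observations

# Total data values = number of elements x number of variables.
total_values = n_elements * n_variables

# Cross-check by counting every value actually stored.
counted_values = sum(len(observation) for observation in DATA_SET.values())

print(n_elements, n_variables, n_observations, total_values, counted_values)  # 5 3 5 15 15
```

The company name is used as the dictionary key rather than as a fourth variable, matching the slide's treatment of names as element labels.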

Scales of Measurement
- The scale indicates the data summarization and statistical analyses that are most appropriate.
- The scale determines the amount of information contained in the data.
- Scales of measurement include: Nominal, Ordinal, Interval, Ratio

Scales of Measurement: Nominal
- Data are labels or names used to identify an attribute of the element.
- A nonnumeric label or numeric code may be used.

Scales of Measurement: Nominal
Example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g., 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

Scales of Measurement: Ordinal
- The data have the properties of nominal data, and the order or rank of the data is meaningful.
- A nonnumeric label or numeric code may be used.

Scales of Measurement: Ordinal
Example: Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g., 1 denotes Freshman, 2 denotes Sophomore, and so on).
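The two examples differ in what the numeric codes allow. The sketch below, a hypothetical illustration rather than anything from the source, stores the school codes as plain labels and the class-standing codes as an ordered `IntEnum`; the list of students is invented purely to exercise the code.

```python
from collections import Counter
from enum import IntEnum

# Nominal scale: the numeric codes from the example are only labels.
SCHOOL = {1: "Business", 2: "Humanities", 3: "Education"}


# Ordinal scale: IntEnum keeps the codes ordered, so comparisons are meaningful.
class ClassStanding(IntEnum):
    FRESHMAN = 1
    SOPHOMORE = 2
    JUNIOR = 3
    SENIOR = 4


# Hypothetical students, for illustration only.
school_codes = [1, 3, 1, 2, 1]
standings = [ClassStanding.SENIOR, ClassStanding.FRESHMAN, ClassStanding.JUNIOR]

# Nominal data support counting how many elements fall in each category.
# (Averaging the codes, 1.6 here, would have no meaning.)
school_counts = Counter(SCHOOL[code] for code in school_codes)

# Ordinal data additionally support ranking by the order of the categories.
ranked = sorted(standings)

print(school_counts)
print([s.name for s in ranked])  # ['FRESHMAN', 'JUNIOR', 'SENIOR']
```

Sorting the school codes would run without error, but the resulting order would carry no information, which is exactly the distinction between the nominal and ordinal scales.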

Scales of Measurement: Interval
- Interval data are always numeric.
- The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.

Scales of Measurement: Interval
Example: Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

Scales of Measurement: Ratio
- The data have all the properties of interval data, and the ratio of two values is meaningful.
- Variables such as distance, height, weight, and time use the ratio scale.
- This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Scales of Measurement: Ratio
Example: Melissa’s college record shows 36 credit hours earned, while Kevin’s record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.
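The interval and ratio examples can be checked with a few lines of arithmetic. In this sketch the `Scale` enum and the guard functions `difference` and `ratio` are hypothetical helpers (not from any statistics library); they refuse an operation when the scale does not support it, following the definitions above and treating SAT scores as interval data as the slides do.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


def difference(a: float, b: float, scale: Scale) -> float:
    """Differences need a fixed unit of measure: interval or ratio scale."""
    if scale not in (Scale.INTERVAL, Scale.RATIO):
        raise ValueError(f"differences are not meaningful on the {scale.value} scale")
    return a - b


def ratio(a: float, b: float, scale: Scale) -> float:
    """Ratios also need a true zero point: ratio scale only."""
    if scale is not Scale.RATIO:
        raise ValueError(f"ratios are not meaningful on the {scale.value} scale")
    return a / b


# Interval example: SAT scores.
sat_gap = difference(1205, 1090, Scale.INTERVAL)
print(f"Melissa scored {sat_gap} points more than Kevin")  # Melissa scored 115 points more than Kevin

# Ratio example: credit hours earned.
credit_ratio = ratio(72, 36, Scale.RATIO)
print(f"Kevin has {credit_ratio:g} times as many credit hours as Melissa")  # Kevin has 2 times as many credit hours as Melissa
```

Calling `ratio(1205, 1090, Scale.INTERVAL)` raises `ValueError`, reflecting that "Melissa scored 1.1 times as much as Kevin" is not a meaningful statement on an interval scale.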

Categorical and Quantitative Data
- Data can be further classified as being categorical or quantitative.
- The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.
- In general, there are more alternatives for statistical analysis when the data are quantitative.

Categorical Data
- Labels or names used to identify an attribute of each element
- Often referred to as qualitative data
- Use either the nominal or ordinal scale of measurement
- Can be either numeric or nonnumeric
- Appropriate statistical analyses are rather limited

Quantitative Data
- Quantitative data indicate how many or how much:
  - discrete, if measuring how many
  - continuous, if measuring how much
- Quantitative data are always numeric.
- Ordinary arithmetic operations are meaningful for quantitative data.

Scales of Measurement
Data
- Categorical
  - Numeric: Nominal, Ordinal
  - Non-numeric: Nominal, Ordinal
- Quantitative
  - Numeric: Interval, Ratio
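The classification above can be expressed as a small lookup. The helpers `data_type` and `check_value` below are hypothetical names used only for this sketch; they encode two rules from the slides: nominal and ordinal scales give categorical data (numeric or nonnumeric), while interval and ratio scales give quantitative data, which must be numeric.

```python
from numbers import Real

CATEGORICAL_SCALES = {"nominal", "ordinal"}
QUANTITATIVE_SCALES = {"interval", "ratio"}


def data_type(scale: str) -> str:
    """Map a scale of measurement to categorical or quantitative data."""
    s = scale.lower()
    if s in CATEGORICAL_SCALES:
        return "categorical"
    if s in QUANTITATIVE_SCALES:
        return "quantitative"
    raise ValueError(f"unknown scale of measurement: {scale!r}")


def check_value(scale: str, value: object) -> None:
    """Categorical values may be numeric or not; quantitative values must be numeric."""
    if data_type(scale) == "quantitative" and not isinstance(value, Real):
        raise TypeError(f"{scale} data must be numeric, got {value!r}")


check_value("nominal", "Business")  # nonnumeric label: allowed
check_value("nominal", 1)           # numeric code: allowed
check_value("ratio", 72)            # numeric value: allowed
print(data_type("Ordinal"), data_type("interval"))  # categorical quantitative
```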

Cross-Sectional Data
- Cross-sectional data are collected at the same or approximately the same point in time.
- Example: data detailing the number of building permits issued in February 2010 in each of the counties of Ohio

Time Series Data
- Time series data are collected over several time periods.
- Example: data detailing the number of building permits issued in Lucas County, Ohio, in each of the last 36 months
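One way to see the difference between the two building-permit examples is to look only at how the (county, month) keys vary. The `layout` function below is a hypothetical helper written for this sketch; the permit counts are omitted (no data are invented), only three of Ohio's counties are listed, and the 36 months are given generic labels because the slide does not say which months they are.

```python
Record = tuple[str, str]  # (unit, period), e.g. (county, month)


def layout(records: list[Record]) -> str:
    """Classify data by how the (unit, period) keys vary."""
    units = {unit for unit, _ in records}
    periods = {period for _, period in records}
    if len(periods) == 1 and len(units) > 1:
        return "cross-sectional"  # many units, one point in time
    if len(units) == 1 and len(periods) > 1:
        return "time series"  # one unit, several time periods
    return "other"


# Permit counts omitted; only the keys matter for the classification.
cross = [("Lucas", "2010-02"), ("Franklin", "2010-02"), ("Cuyahoga", "2010-02")]
series = [("Lucas", f"month {i}") for i in range(1, 37)]

print(layout(cross))   # cross-sectional
print(layout(series))  # time series
```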

Time Series Data
[Figure: U.S. Average Price Per Gallon for Conventional Regular Gasoline. Source: Energy Information Administration, U.S. Department of Energy, May 2009.]

Data Sources: Existing Sources
- Internal company records: almost any department
- Business database services: Dow Jones & Co.
- Government agencies: U.S. Department of Labor
- Industry associations: Travel Industry Association of America
- Special-interest organizations: Graduate Management Admission Council
- Internet: more and more firms make data available online

Data Sources: Data Available From Internal Company Records

| Record | Some of the Data Available |
|---|---|
| Employee records | name, address, social security number |
| Production records | part number, quantity produced, direct labor cost, material cost |
| Inventory records | part number, quantity in stock, reorder level, economic order quantity |
| Sales records | product number, sales volume, sales volume by region |
| Credit records | customer name, credit limit, accounts receivable balance |
| Customer profile | age, gender, income, household size |

Data Sources: Data Available From Selected Government Agencies

| Government Agency | Some of the Data Available |
|---|---|
| Census Bureau (www.census.gov) | Population data, number of households, household income |
| Federal Reserve Board (www.federalreserve.gov) | Data on money supply, exchange rates, discount rates |
| Office of Mgmt. & Budget (www.whitehouse.gov/omb) | Data on revenue, expenditures, and debt of the federal government |
| Department of Commerce (www.doc.gov) | Data on business activity, value of shipments, profit by industry |
| Bureau of Labor Statistics (www.bls.gov) | Consumer spending, unemployment rate, hourly earnings, safety record |

Data Sources: Statistical Studies (Experimental)
- In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.
- The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.

Data Sources: Statistical Studies (Observational)
- In observational (nonexperimental) studies, no attempt is made to control or influence the variables of interest.
- A survey is a good example.
- Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.

Data Acquisition Considerations
- Time requirement: Searching for information can be time consuming, and information may no longer be useful by the time it is available.
- Cost of acquisition: Organizations often charge for information even when it is not their primary business activity.
- Data errors: Using any data that happen to be available or that were acquired with little care can lead to misleading information.

Descriptive Statistics
- Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
- Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.

Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups

Tabular Summary: Frequency and Percent Frequency
Example: Hudson Auto

| Parts Cost ($) | Frequency | Percent Frequency |
|---|---|---|
| 50-59 | 2 | 4 |
| 60-69 | 13 | 26 |
| 70-79 | 16 | 32 |
| 80-89 | 7 | 14 |
| 90-99 | 7 | 14 |
| 100-109 | 5 | 10 |
| Total | 50 | 100 |

Each percent frequency is the class frequency divided by 50, times 100; for example, (2/50)(100) = 4.
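The percent frequency column can be reproduced from the frequency column alone. This minimal sketch takes the class frequencies straight from the Hudson Auto table; the dictionary name `frequency` is our own.

```python
# Frequency distribution for the 50 Hudson Auto tune-ups (classes in dollars).
frequency = {
    "50-59": 2,
    "60-69": 13,
    "70-79": 16,
    "80-89": 7,
    "90-99": 7,
    "100-109": 5,
}

n = sum(frequency.values())  # 50 tune-ups in total

# Percent frequency of a class = (class frequency / n) * 100.
# Multiplying before dividing keeps these particular results exact.
percent_frequency = {cls: 100 * f / n for cls, f in frequency.items()}

for cls, f in frequency.items():
    print(f"{cls:>8}  {f:>3}  {percent_frequency[cls]:>4.0f}%")
print(f"{'Total':>8}  {n:>3}  {sum(percent_frequency.values()):>4.0f}%")
```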

Graphical Summary: Histogram
Example: Hudson Auto
[Figure: Histogram of Tune-up Parts Cost. Horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency.]
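Since the histogram image is not reproduced here, a text-mode stand-in conveys the same shape. The sketch below uses a hypothetical helper, `text_histogram`, that draws one bar per class with length equal to the class frequency from the Hudson Auto table; in practice a plotting package would be used instead.

```python
def text_histogram(frequency: dict[str, int], symbol: str = "*") -> list[str]:
    """Return one line per class, with a bar whose length equals the class frequency."""
    width = max(len(cls) for cls in frequency)
    return [f"{cls:>{width}} | {symbol * f}" for cls, f in frequency.items()]


# Hudson Auto tune-up parts cost classes ($) and their frequencies.
hudson = {"50-59": 2, "60-69": 13, "70-79": 16, "80-89": 7, "90-99": 7, "100-109": 5}

for line in text_histogram(hudson):
    print(line)
```

Because the classes have equal widths, bar length alone is enough to compare them; the tallest bar falls in the 70-79 class.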

Numerical Descriptive Statistics
- The most common numerical descriptive statistic is the average (or mean).
- The average provides a measure of the central tendency, or central location, of the data for a variable.
- Hudson’s average cost of parts, based on the 50 tune-ups studied, is $79 (found by summing the 50 cost values and then dividing by 50).
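The recipe for the average (sum the values, then divide by how many there are) is short enough to write out directly. The 50 Hudson invoice values are not reproduced in this transcript, so the sketch below runs on five hypothetical costs instead; `average` is our own helper, and the standard library's `statistics.mean` is shown for comparison.

```python
from statistics import mean


def average(values: list[float]) -> float:
    """Sum the values, then divide by how many values there are."""
    if not values:
        raise ValueError("the average of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical parts costs ($) for five tune-ups; NOT the Hudson invoice data.
costs = [60, 75, 80, 95, 100]

print(average(costs))  # 82.0
print(mean(costs))     # same value from the standard library
```

Applied to the 50 actual invoice values, the same computation would give Hudson's $79.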

Statistical Inference
- Population: the set of all elements of interest in a particular study
- Sample: a subset of the population
- Statistical inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
- Census: collecting data for the entire population
- Sample survey: collecting data for a sample

Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.
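The four steps can be mimicked with a simulation. Everything about the population here is an assumption made for illustration: 10,000 hypothetical tune-ups with parts costs drawn uniformly between $50 and $110. The slides also do not say how the 50 tune-ups were chosen, so this sketch assumes a simple random sample without replacement via `random.sample`. In a real study the population average is unknown; it is computed below only so the estimate has something to be compared with.

```python
import random
from statistics import mean

random.seed(1)  # reproducible illustration

# Step 1 (HYPOTHETICAL population): parts costs ($) for 10,000 tune-ups.
population = [round(random.uniform(50, 110)) for _ in range(10_000)]

# Step 2: examine a simple random sample of 50 tune-ups (without replacement).
sample = random.sample(population, k=50)

# Step 3: the sample data provide a sample average parts cost.
sample_mean = mean(sample)

# Step 4: the sample average serves as the estimate of the population average.
population_mean = mean(population)  # known here only because we simulated it
print(f"estimate ${sample_mean:.2f} vs. simulated population average ${population_mean:.2f}")
```

Rerunning with different seeds gives different estimates; that sample-to-sample variation is what later chapters on statistical inference quantify.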

40 Computers and Statistical Analysis
Statisticians often use computer software to perform the statistical computations required with large amounts of data. To facilitate computer usage, many of the data sets in this book are available on the website that accompanies the text. The data files may be downloaded in either Minitab or Excel format. The Excel add-in StatTools can also be downloaded from the website. Chapter-ending appendices cover the step-by-step procedures for using Minitab, Excel, and StatTools.
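The text’s own tools are Minitab, Excel, and StatTools. As an alternative, a data file saved from Excel as CSV can also be summarized in Python with the standard `csv` module. This sketch assumes a CSV export; the column name `PartsCost` and the inline file contents are hypothetical, not the textbook’s actual file layout.

```python
import csv
import io

def column_mean(csv_text, column):
    """Average one numeric column of CSV text (e.g. a data file saved from Excel as .csv)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return sum(values) / len(values)

# Hypothetical file contents; the header names are assumptions.
data = "TuneUp,PartsCost\n1,70\n2,80\n3,87\n"
print(column_mean(data, "PartsCost"))  # 79.0
```

For a file on disk, one option is to read it with `open(path, newline="")` and pass the file’s text to `column_mean`.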

41 Data Warehousing
Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point-of-sale terminals, and touch-screen monitors. Wal-Mart captures data on 20 to 30 million transactions per day. Visa processes 6,800 payment transactions per second. Capturing, storing, and maintaining the data, referred to as data warehousing, is a significant undertaking.
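The Wal-Mart figure is given per day and the Visa figure per second. To put both on the same footing, the per-second rate can be scaled to a day. This sketch assumes the 6,800-per-second rate holds constant around the clock, which is a simplification (real volume varies by hour); the slide itself gives no daily Visa figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def transactions_per_day(per_second):
    """Scale a per-second rate to a full day, assuming the rate is constant all day."""
    return per_second * SECONDS_PER_DAY

visa_daily = transactions_per_day(6_800)
print(f"{visa_daily:,}")  # 587,520,000
```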

42 Data Mining
Analysis of the data in the warehouse might aid in decisions that will lead to new strategies and higher profits for the organization. Using a combination of procedures from statistics, mathematics, and computer science, analysts “mine the data” to convert it into useful information. The most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes, … prompted by only general, even vague, queries by the user.

43 Data Mining Applications
The major applications of data mining have been made by companies with a strong consumer focus, such as retail, financial, and communication firms. Data mining is used to identify related products that customers who have already purchased a specific product are also likely to purchase (pop-ups are then used to draw attention to those related products). As another example, data mining is used to identify customers who should receive special discount offers based on their past purchasing volumes.
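One simple way to find “related products” is to count co-purchases. The slide does not name a specific technique, so this sketch swaps in the confidence measure from market-basket analysis: the estimated probability that a basket contains product b, given that it contains product a. The function names, the 0.5 threshold, and the baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_purchase_confidence(baskets):
    """For every ordered pair (a, b), estimate P(b in basket | a in basket)
    = (# baskets containing both a and b) / (# baskets containing a)."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore repeats of the same item within one basket
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): n / item_counts[a] for (a, b), n in pair_counts.items()}

def related_products(baskets, product, min_confidence=0.5):
    """Products bought alongside `product` at least min_confidence of the time, strongest first."""
    conf = co_purchase_confidence(baskets)
    hits = [(b, c) for (a, b), c in conf.items() if a == product and c >= min_confidence]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical transactions, invented for illustration.
baskets = [
    ["printer", "ink", "paper"],
    ["printer", "ink"],
    ["laptop", "mouse"],
    ["printer", "paper"],
    ["laptop", "mouse", "ink"],
]
print(related_products(baskets, "laptop"))  # [('mouse', 1.0), ('ink', 0.5)]
```

A production system would work on millions of baskets and would usually also require a minimum number of co-purchases, so that rare pairs do not produce spuriously high confidence.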

44 Data Mining Requirements
Statistical methods such as multiple regression, logistic regression, and correlation are heavily used. Computer science technologies involving artificial intelligence and machine learning are also needed. A significant investment of time and money is required as well.
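Of the methods listed, correlation is the easiest to show in a few lines. This sketch computes the sample Pearson correlation coefficient from its definition; the function name `pearson_r` and the usage data are hypothetical. (Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.)

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                     # variation in x
    syy = sum((y - my) ** 2 for y in ys)                     # variation in y
    return sxy / math.sqrt(sxx * syy)  # undefined (division by zero) if either list is constant

# Hypothetical data: past purchase volume vs. spending on a promotion (illustrative only).
print(pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.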

45 Data Mining Model Reliability
Finding a statistical model that works well for a particular sample of data does not necessarily mean that it can be reliably applied to other data. With the enormous amount of data available, the data set can be partitioned into a training set (for model development) and a test set (for validating the model). There is, however, a danger of overfitting the model to the point that misleading associations and conclusions appear to exist. Careful interpretation of results and extensive testing are important.
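The training/test idea can be sketched in Python with simulated data. Two models are fitted on the training set only: a least-squares straight line, and a deliberately overfit “memorizer” that returns the y of the nearest training x (a 1-nearest-neighbour lookup). The memorizer fits the training set perfectly by construction, which says nothing about how it performs on new data. Its error on the held-out test set is the honest check. The data, the 70/30 split, and all function names are assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly partition rows into (training set, test set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(train):
    """Least-squares line y = b0 + b1*x fitted to the training pairs."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    b1 = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x

def fit_memorizer(train):
    """Overfit model: predict the y of the nearest training x."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def mse(model, rows):
    """Mean squared prediction error of `model` on (x, y) rows."""
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)

# Hypothetical data: y = 2x + random noise for x = 0..99 (simulated, not from the text).
gen = random.Random(1)
rows = [(x, 2 * x + gen.gauss(0, 10)) for x in range(100)]
train, test = train_test_split(rows)

for name, fit in [("straight line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name:13s} train MSE = {mse(model, train):8.2f}   test MSE = {mse(model, test):8.2f}")
```

Comparing the two test-MSE columns, rather than the training columns, is the validation step the slide describes.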

46 Ethical Guidelines for Statistical Practice
In a statistical study, unethical behavior can take a variety of forms, including:
- Improper sampling
- Inappropriate analysis of the data
- Development of misleading graphs
- Use of inappropriate summary statistics
- Biased interpretation of the statistical results

You should strive to be fair, thorough, objective, and neutral as you collect, analyze, and present data. As a consumer of statistics, you should also be aware of the possibility of unethical behavior by others.
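As one illustration of an inappropriate summary statistic, consider skewed data. There the mean can sit far from any typical value, while the median does not. The incomes below are invented for illustration, and the choice of mean vs. median is just one example of the concern listed above.

```python
import statistics

# Hypothetical annual incomes (in $1,000s) for ten employees; one executive salary skews the data.
incomes = [38, 41, 42, 44, 45, 47, 49, 52, 55, 900]

print("mean:  ", statistics.mean(incomes))    # 131.3
print("median:", statistics.median(incomes))  # 46.0
```

Reporting only the mean of 131.3 would suggest a typical income nearly three times what nine of the ten employees actually earn.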

47 Ethical Guidelines for Statistical Practice
The American Statistical Association developed the report “Ethical Guidelines for Statistical Practice”. The report contains 67 guidelines organized into eight topic areas:
- Professionalism
- Responsibilities to Funders, Clients, Employers
- Responsibilities in Publications and Testimony
- Responsibilities to Research Subjects
- Responsibilities to Research Team Colleagues
- Responsibilities to Other Statisticians/Practitioners
- Responsibilities Regarding Allegations of Misconduct
- Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients

48 End of Chapter 1

