1 What is Data? l An attribute is a property or characteristic of an object l Examples: eye color of a person, temperature, etc. l Attribute is also known.

Slides:



Advertisements
Similar presentations
Section 1-2 Data Classification
Advertisements

Sections 1.3 Types of Data.
Data Engineering.
TYPES OF DATA. Qualitative vs. Quantitative Data A qualitative variable is one in which the “true” or naturally occurring levels or categories taken by.
1.2: The Nature of Data Objective: To understand the different types of data CHS Statistics.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
BUS 297D: Data Mining Professor David Mease Lecture 1 Agenda:
Elementary Statistics Picturing the World
Section 1-2 Variables and types of Data. Objective 3: Identify types of Data In this section we will detail the types of data and nature of variables.
Unit 1 Section 1.2.
Data Mining Lecture 2: data.
STA 2023 Chapter 1 Notes. Terminology  Data: consists of information coming from observations, counts, measurements, or responses.  Statistics: the.
INTRODUCTION TO STATISTICS Yrd. Doç. Dr. Elif TUNA.
R Software We Will Use: Can be downloaded from
1 Statistics 202: Statistical Aspects of Data Mining Professor David Mease Tuesday, Thursday 9:00-10:15 AM Terman 156 Lecture 2 = Start chapter 2 Agenda:
Variables and Types of Data.   Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or.
Sections 1-3 Types of Data. PARAMETERS AND STATISTICS Parameter: a numerical measurement describing some characteristic of a population. Statistic: a.
Introduction to Statistics What is Statistics? : Statistics is the sciences of conducting studies to collect, organize, summarize, analyze, and draw conclusions.
Copyright © 1998, Triola, Elementary Statistics Addison Wesley Longman 1 Elementary Statistics M A R I O F. T R I O L A Copyright © 1998, Triola, Elementary.
Data Mining & Knowledge Discovery Lecture: 2 Dr. Mohammad Abu Yousuf IIT, JU.
Probability & Statistics – Bell Ringer  Make a list of all the possible places where you encounter probability or statistics in your everyday life. 1.
1 Concepts of Variables Greg C Elvers, Ph.D.. 2 Levels of Measurement When we observe and record a variable, it has characteristics that influence the.
Statistics 300: Introduction to Probability and Statistics Section 1-2.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining Basics: Data Remark: Discusses “basics concerning data sets (first half of Chapter.
1  Specific number numerical measurement determined by a set of data Example: Twenty-three percent of people polled believed that there are too many polls.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
Chap 1-1 Chapter 1 Introduction and Data Collection Business Statistics.
Chapter 1 Introduction to Statistics 1-1 Overview 1-2 Types of Data 1-3 Critical Thinking 1-4 Design of Experiments.
MATH Elementary Statistics. Salary – Company A.
Vocabulary of Statistics Part Two. Variable classifications Qualitative variables: can be placed into distinct categories, according to some characteristic.
What is Data? Attributes
Unit 1 Section : Variables and Types of Data  Variables can be classified in two ways:  Qualitative Variable – variables that can be placed.
CIS527: Data Warehousing, Filtering, and Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
1 BUS 297D: Data Mining Professor David Mease Lecture 2 Agenda: 1) Assign Homework #1 (Due Thursday 9/10) 2) Finish lecture over Chapter 2 3) Start lecture.
1 Data Mining Lecture 2: Data. 2 What is Data? l Collection of data objects and their attributes l Attribute is a property or characteristic of an object.
Overview and Types of Data
INTRODUCTION TO STATISTICS CHAPTER 1: IMPORTANT TERMS & CONCEPTS.
Data Classification Lesson 1.2.
Warm-Up A sample is a part of the population. True or False 2.Is the following a Population or a Sample? A survey of 24 of a company’s 200 employees.
1 PAUF 610 TA 1 st Discussion. 2 3 Population & Sample Population includes all members of a specified group. (total collection of objects/people studied)
1.2 Data Classification Qualitative Data consist of attributes, labels, or non-numerical entries. – Examples are bigger, color, names, etc. Quantitative.
Chapter 1: Section 2-4 Variables and types of Data.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 1-1 Chapter 1 Introduction and Data Collection Basic Business Statistics 10 th Edition.
1 Basic Geographic Concepts Real World  Digital Environment Data in a GIS represent a simplified view of physical entities or phenomena 1. Spatial location.
Biostatistics Introduction Article for Review.
Do Now  47 TCNJ students were asked to complete a survey on campus clubs and activities. 87% of the students surveyed participate in campus clubs and.
By: Michael Mack, Ana Meneses and Zhane’ Fleming.
Chapter 1 Introduction to Statistics 1-1 Overview 1-2 Types of Data 1-3 Critical Thinking 1-4 Design of Experiments.
Data Preliminaries CSC 600: Data Mining Class 1.
Variables and Types of Data
Unit 1 Section 1.2.
Pharmaceutical Statistics
Elementary Statistics
What is Data? Attributes Objects An attribute is a property or
Elementary Statistics
Statistics 202: Statistical Aspects of Data Mining
Lecture Notes for Chapter 2 Introduction to Data Mining
Introduction to Statistics
Lecture Notes for Chapter 2 Introduction to Data Mining
statistics Specific number
The Nature of Probability and Statistics
Vocabulary of Statistics
statistics Specific number
The Nature of Probability and Statistics
The Nature of Probability and Statistics
Group 9 – Data Mining: Data
Data Preliminaries CSC 576: Data Mining.
Statistics Definitions
Data Pre-processing Lecture Notes for Chapter 2
Presentation transcript:

1 What is Data? l An attribute is a property or characteristic of an object l Examples: eye color of a person, temperature, etc. l Attribute is also known as variable, field, characteristic, or feature l A collection of attributes describe an object l Object is also known as record, point, case, sample, entity, instance, or observation Attributes Objects

2 Experimental vs. Observational Data l Experimental data describes data which was collected by someone who exercised strict control over all attributes. l Observational data describes data which was collected with no such controls. Most all data used in data mining is observational data so be careful. l Examples: - Diet Coke vs. Weight -Carbon Dioxide in Atmosphere vs. Earth’s Temperature

3 Types of Attributes: Qualitative vs. Quantitative l Qualitative (or Categorical) attributes represent distinct categories rather than numbers. Mathematical operations such as addition and subtraction do not make sense. Examples: eye color, letter grade, IP address, zip code l Quantitative (or Numeric) attributes are numbers and can be treated as such. Examples: weight, failures per hour, number of TVs, temperature

4 Types of Attributes (P. 25): l All Qualitative (or Categorical) attributes are either Nominal or Ordinal. Nominal = categories with no order Ordinal = categories with a meaningful order l All Quantitative (or Numeric) attributes are either Interval or Ratio. Interval = no “true” zero, division makes no sense Ratio = true zero exists, division makes sense division -> (increase %)

5 Types of Attributes: l Some examples: – Nominal  Examples: ID numbers, eye color, zip codes – Ordinal  Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall, medium, short} – Interval  Examples: calendar dates, temperatures in Celsius or Fahrenheit, GRE score – Ratio  Examples: temperature in Kelvin, length, time, counts

6 Properties of Attribute Values l The type of an attribute depends on which of the following properties it possesses: – Distinctness: =  – Order: – Addition: + - – Multiplication: * / – Nominal attribute: distinctness – Ordinal attribute: distinctness & order – Interval attribute: distinctness, order & addition – Ratio attribute: all 4 properties

7 Discrete vs. Continuous Discrete Attribute – Has only a finite or countably infinite set of values – Examples: zip codes, counts, or the set of words in a collection of documents – Note: binary attributes are a special case of discrete attributes which have only 2 values l Continuous Attribute – Has real numbers as attribute values – Can compute as accurately as instruments allow – Examples: temperature, height, or weight – Practically, real values can only be measured and represented using a finite number of digits – Continuous attributes are typically represented as floating-point variables

8 Discrete vs. Continuous (P. 28) l Qualitative (categorical) attributes are always discrete l Quantitative (numeric) attributes can be either discrete or continuous

9 Sampling l Sampling involves using only a random subset of the data for analysis l Statisticians are interested in sampling because they often can not get all the data from a population of interest l Data miners are interested in sampling because sometimes using all the data they have is too slow and unnecessary

10 Sampling l The key principle for effective sampling is the following: – using a sample will work almost as well as using the entire data sets, if the sample is representative – a sample is representative if it has approximately the same property (of interest) as the original set of data

11 Sampling l Sampling can be tricky or ineffective when the data has a more complex structure than simply independent observations. l For example, here is a “sample” of words from a song. Most of the information is lost. oops I did it again I played with your heart got lost in the game oh baby baby oops!...you think I’m in love that I’m sent from above I’m not that innocent

12 Sampling l Sampling can be tricky or ineffective when the data has a more complex structure than simply independent observations. l For example, here is a “sample” of words from a song. Most of the information is lost. oops I did it again I played with your heart got lost in the game oh baby baby oops!...you think I’m in love that I’m sent from above I’m not that innocent