Descriptive Exploratory Data Analysis III Jagdish S. Gangolly State University of New York at Albany.

1 Descriptive Exploratory Data Analysis III
Jagdish S. Gangolly, State University of New York at Albany

2 Trellis Graphics I
Syntax: dependent variable ~ explanatory variable | conditioning variable, followed by the data set
Output:
–Open a graphics device: > trellis.device(motif)
–Close the current device with > dev.off(), or all devices with > graphics.off()

3 Trellis Graphics II
Example: histogram(~height | voice.part, data=singer)
–No dependent variable for a histogram
–height is the explanatory variable
–voice.part is the conditioning variable
–Data set is singer

4 Trellis Graphics III
Layout: the layout, skip and aspect parameters (p.147).
Ordering of graphs: left to right, bottom to top. If as.table=T, left to right, top to bottom (p.149).

5 Descriptive Data Exploration
–summary: mean, median, quartiles (p.171)
–stem: stem-and-leaf display (p.171)
–quantile (p.172)
–stdev (p.173)
–tapply: splits data (p.174)
–by (p.175)
mean works on vectors; other structures need to be converted to vectors before computing means (example on p.176-7).
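The summaries listed on this slide can be illustrated outside S-PLUS. The following is a minimal Python sketch, not the S-PLUS functions themselves: `five_number_summary` and `group_apply` are hypothetical helper names, the heights are made-up values rather than the singer data set, and the quartile convention (Python's "inclusive" method) is an assumption that may differ from the one S-PLUS uses.

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Return min, quartiles, median, mean and max, roughly what summary reports."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions give slightly different cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "mean": statistics.mean(ordered),
        "q3": q3,
        "max": ordered[-1],
    }


def group_apply(values, groups, func):
    """Split values by group label and apply func to each group (the idea behind tapply)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: func(members) for group, members in buckets.items()}


# Illustrative data only (not the singer data set).
heights = [60, 62, 64, 66, 68, 70, 72, 74]
parts = ["soprano", "soprano", "alto", "alto", "tenor", "tenor", "bass", "bass"]

summary = five_number_summary(heights)
spread = statistics.stdev(heights)  # sample standard deviation (n - 1 denominator)
group_means = group_apply(heights, parts, statistics.mean)

print(summary)
print(group_means)
```

Grouping before averaging mirrors the note on the slide: the mean is computed on a plain vector of numbers, so grouped data are first split into one vector per group.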

6 Data Preprocessing for Datamining I
Why: real-world data are often
–Incomplete: attribute values not available, equipment malfunctions, attributes not considered important
–Noisy (errors): instrument problems, human/computer errors, transmission errors
–Inconsistent: inconsistencies due to data definitions

7 Data Preprocessing for Datamining II
Data Cleaning
–Missing values: ignore the tuple, fill in values manually, use a global constant ("unknown"), use the attribute mean, use the attribute mean of the tuple's group, or use the most probable value
–Noisy data:
  Binning: partition into equal-sized bins, then smooth by bin means or bin boundaries
  Clustering
  Inspection: computer and human
  Regression
–Inconsistencies
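Two of the cleaning steps above, filling missing values with the attribute mean and smoothing by binning, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, missing values are assumed to be encoded as `None`, bins are taken to hold an equal number of sorted values, and the numbers are made up.

```python
import statistics


def fill_with_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def equal_depth_bins(values, bin_size):
    """Sort the values and cut them into consecutive bins of bin_size items each."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin by that bin's mean."""
    return [[statistics.mean(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the nearer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are the boundaries
        # Ties go to the lower boundary (an arbitrary choice for this sketch).
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


filled = fill_with_mean([10, None, 20, 30])

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative values
bins = equal_depth_bins(prices, 3)
by_means = smooth_by_means(bins)
by_bounds = smooth_by_boundaries(bins)

print(filled)
print(bins)
print(by_means)
print(by_bounds)
```

Smoothing by means pulls every value toward the centre of its bin, while smoothing by boundaries keeps the extremes of each bin and snaps the interior values to them.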

8 Data Preprocessing for Datamining III
Data Integration: combining data from different sources into a coherent whole
–Schema integration: combining data models (entity identification problems)
–Redundancy (derived values, calculated fields, use of different key attributes): use correlations to detect redundancies
–Resolution of data value conflicts (values coded in different measures)
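The use of correlations to detect redundancy can be made concrete with the Pearson correlation coefficient. The sketch below is an assumption about how such a check might look, not a method taken from the slides: the attribute names and values are invented, and the 0.95 cut-off is a hypothetical threshold chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumes neither attribute is constant, otherwise the denominator is zero.
    return covariance / (spread_x * spread_y)


# Invented attributes: price_with_tax is derived from price, units is unrelated.
price = [10.0, 20.0, 30.0, 40.0]
price_with_tax = [p * 1.08 for p in price]
units = [3.0, 1.0, 4.0, 2.0]

r_derived = pearson(price, price_with_tax)
r_units = pearson(price, units)

THRESHOLD = 0.95  # hypothetical cut-off for flagging a redundant attribute
redundant = abs(r_derived) > THRESHOLD

print(r_derived, r_units, redundant)
```

A coefficient near 1 or -1 suggests that one attribute can be computed from the other, as with a calculated field; a coefficient near 0 gives no evidence of linear redundancy.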

9 Data Preprocessing for Datamining III
Transformation
–Smoothing
–Aggregation
–Generalisation
–Normalisation
–Attribute (or feature) construction
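Normalisation is the transformation in this list that is easiest to show in code. The slide does not say which variant is intended, so the sketch below shows two common ones, min-max scaling and z-score standardisation, as assumptions; the function names and the income values are illustrative.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    low, high = min(values), max(values)
    span = high - low  # assumes the values are not all identical
    return [new_min + (v - low) * (new_max - new_min) / span for v in values]


def z_score(values):
    """Centre values on their mean and scale by the population standard deviation."""
    mean = statistics.mean(values)
    deviation = statistics.pstdev(values)  # assumes a non-zero spread
    return [(v - mean) / deviation for v in values]


incomes = [20000, 30000, 40000, 60000]  # illustrative values

scaled = min_max(incomes)
standardised = z_score(incomes)

print(scaled)
print(standardised)
```

Min-max scaling preserves the relative spacing of the values inside a fixed range, whereas z-scores express each value as a number of standard deviations from the mean.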

10 Data Preprocessing for Datamining IV
Data Reduction & compression
–Data cube aggregation (p.117)
–Dimension reduction: minimise loss of information.
  Attribute selection
  Decision tree induction
  Principal components analysis
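Data cube aggregation can be pictured as summing a measure over the dimensions that are no longer needed. The following is a minimal sketch under assumptions: the cube is represented as a dictionary keyed by (year, quarter, branch), `roll_up` is a hypothetical helper, and the sales figures are invented.

```python
from collections import defaultdict


def roll_up(cells, keep):
    """Sum cell values over every dimension whose position is not listed in keep."""
    totals = defaultdict(float)
    for key, value in cells.items():
        reduced_key = tuple(key[i] for i in keep)
        totals[reduced_key] += value
    return dict(totals)


# Invented cube: (year, quarter, branch) -> sales
sales = {
    (2001, "Q1", "A"): 100.0,
    (2001, "Q2", "A"): 150.0,
    (2001, "Q1", "B"): 80.0,
    (2001, "Q2", "B"): 70.0,
    (2002, "Q1", "A"): 120.0,
}

# Drop the quarter dimension: keep positions 0 (year) and 2 (branch).
annual = roll_up(sales, keep=(0, 2))

print(annual)
```

The aggregated cube has fewer cells than the original, which is the reduction; the quarterly detail is no longer recoverable from it.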

11 Data Preprocessing for Datamining IV
–Numerosity reduction
  Regression/log-linear regression
  Histograms
  Clustering
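Regression reduces numerosity by storing the parameters of a fitted model in place of the individual data points. The sketch below assumes a simple least-squares line through one explanatory variable; `fit_line` is a hypothetical helper and the data are constructed to lie exactly on a line so the result is easy to check.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)  # assumes x is not constant
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Constructed data lying exactly on y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(x, y)

# Only two numbers are kept; approximate values are regenerated on demand.
reconstructed = [slope * a + intercept for a in x]

print(slope, intercept)
print(reconstructed)
```

With real data the reconstruction is only approximate, so the quality of the fit determines how much information the reduction loses.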

