Statistical tools: The genutil Package

Slides:



Advertisements
Similar presentations
Chapter 4: Basic Estimation Techniques
Advertisements

Computer, Minitab, and other printouts. r 2 = 67.8% 67.8% of the variation in percent body fat can be explained by the variation in waist size.
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
One Dimensional Arrays
Chapter 7 Introduction to Procedures. So far, all programs written in such way that all subtasks are integrated in one single large program. There is.
CSCI 6962: Server-side Design and Programming Input Validation and Error Handling.
Welcome to PHYS 225a Lab Introduction, class rules, error analysis Julia Velkovska.
Climate Specific Tools: The cdutil Package. cdutil - overview The cdutil Package contains a collection of sub- packages useful to deal with Climate Data.
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Mathematics and Graphing in Chemistry Lab 1. Outline Mathematics in Chemistry Units Rounding Digits of Precision (Addition and Subtraction) Significant.
Arrays. What is an Array? An array is a way to structure multiple pieces of data of the same type and have them readily available for multiple operations.
© McGraw-Hill Higher Education. All Rights Reserved. Chapter 2F Statistical Tools in Evaluation.
Propagation of Error Ch En 475 Unit Operations. Quantifying variables (i.e. answering a question with a number) 1. Directly measure the variable. - referred.
Lecture 3 Review of Linear Algebra Simple least-squares.
By Hrishikesh Gadre Session II Department of Mechanical Engineering Louisiana State University Engineering Equation Solver Tutorials.
Statistics: Data Analysis and Presentation Fr Clinic II.
Jan Shapes of distributions… “Statistics” for one quantitative variable… Mean and median Percentiles Standard deviations Transforming data… Rescale:
Statistics Psych 231: Research Methods in Psychology.
Statistics: Data Presentation & Analysis Fr Clinic I.
Data Analysis II.
Business Statistics BU305 Chapter 3 Descriptive Stats: Numerical Methods.
Chapter 13 Statistics © 2008 Pearson Addison-Wesley. All rights reserved.
Graphing Linear Relations and Functions Objectives: Understand, draw, and determine if a relation is a function. Graph & write linear equations, determine.
COMPUTER SCIENCE FEBRUARY 2011 Lists in Python. Introduction to Lists Lists (aka arrays): an ordered set of elements  A compound data type, like strings.
Chapter 3 – Descriptive Statistics
Regression. Correlation and regression are closely related in use and in math. Correlation summarizes the relations b/t 2 variables. Regression is used.
Chapter 12 Multiple Linear Regression Doing it with more variables! More is better. Chapter 12A.
Managerial Economics Demand Estimation. Scatter Diagram Regression Analysis.
+ Chapter 12: Inference for Regression Inference for Linear Regression.
Chapter 17 Partial Correlation and Multiple Regression and Correlation.
PS 225 Lecture 20 Linear Regression Equation and Prediction.
Propagation of Error Ch En 475 Unit Operations. Quantifying variables (i.e. answering a question with a number) 1. Directly measure the variable. - referred.
Python Functions.
Python Mini-Course University of Oklahoma Department of Psychology Lesson 21 NumPy 6/11/09 Python Mini-Course: Lesson 21 1.
Hossain Shahriar Announcement and reminder! Tentative date for final exam need to be fixed! We have agreed so far that it can.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Variable Selection 1 Chapter 8 Variable Selection Terry Dielman Applied Regression Analysis:
Data Analysis, Presentation, and Statistics
Statistics Presentation Ch En 475 Unit Operations.
Using SPSS Note: The use of another statistical package such as Minitab is similar to using SPSS.
Multiple Regression Analysis Regression analysis with two or more independent variables. Leads to an improvement.
1 09/2003 Processing Library Update CF Checker – Script made available as a web based form on the BADC site -
Chapter 3: Transformations of Graphs and Data Lesson 6: Scale Changes of Data Mrs. Parziale.
Data Structures I (CPCS-204) Week # 2: Algorithm Analysis tools Dr. Omar Batarfi Dr. Yahya Dahab Dr. Imtiaz Khan.
PH2150 Scientific Computing Skills
Chapter 4: Basic Estimation Techniques
Chapter 4 Basic Estimation Techniques
CHAPTER 3 Describing Relationships
Review 1. Describing variables.
Regression Analysis Module 3.
BUSINESS MATHEMATICS & STATISTICS.
Computer Programming Fundamentals
Basic Estimation Techniques
CMSC201 Computer Science I for Majors Lecture 21 – Dictionaries
Introduction, class rules, error analysis Julia Velkovska
Regression.
Functions.
Array List Pepper.
Basic Estimation Techniques
Multiple Regression A curvilinear relationship between one variable and the values of two or more other independent variables. Y = intercept + (slope1.
PH2150 Scientific Computing Skills
Imperative Data Structures
Summarizing Data by Statistics
Simple Linear Regression
Simple Linear Regression
ADVANCED DATA ANALYSIS IN SPSS AND AMOS
mdul-cntns-sub-cmmnts2.f90
Multivariate Analysis Regression
Descriptive Statistics
2.2 Linear relations and functions.
Procedures with optional parameters which do not require matching arguments Example: consider the exponent function which may take one argument, m, in.
Presentation transcript:

Statistical tools: The genutil Package

General Utilities : genutil genutil contains statistical (and other) tools such as: statistics grower picker udunits

genutil.statistics The statistics module provides the user with some basic statistics function: (auto)correlation (auto)covariance geometricmean laggedcorrelation laggedcovariance linearregression percentiles meanabsdiff median rank rms std variance See CDAT documentation and doc strings for more info: >>> help(genutil.statistics.geometricmean)

genutil.statistics.correlation (1) genutil.statistics.correlation() returns the correlation between 2 slabs. By default on the first dimension, centered and biased by default. Slabs must be of the same shape and size. Usage: result = correlation(slab1, slab2, weights=weightoptions, axis=axisoptions, centered=centeredoptions, biased=biasedoptions) Options: weightoptions default = None. If you want to compute the weighted correlation, provide the weights here. NOTE: the weights array must be the same shape and size as the slabs.

genutil.statistics.correlation (2) Options (continued): axisoptions ‘x’ | ‘y’ | ‘z’ | ‘t’ | ‘(dimension_name)’ | 0 | 1 ... | n default value = 0. You can pass the name of the dimension or index (integer value 0...n) over which you want to compute the statistic. centeredoptions None | 0 | 1 default value = 1 computes and removes the mean first. Set to 0 or None for uncentered. biasedoptions None | 0 | 1 default value = 1 returns biased statistic. If want to compute an unbiased statistic pass anything but 1.

genutil.statistics.correlation example >>> import cdms, genutil >>> f=cdms.open(‘file1.nc’) >>> var1=f(‘u_wind’) ; var2=f(‘v_wind’) >>> print var2.shape (12, 21, 144, 288) >>> # Get the overall correlation >>> print genutil.statistics.correlation(var1, \ var2, axis=“tzyx”) correlation array(-0.237745400348) >>> # Now get the gridded array of correlations over ... # time and level >>> tl_corr=genutil.statistics.correlation(var1, \ var2, axis=“tz”) >>> print tl_corr.shape (144, 288)

genutil.statistics.std (1) genutil.statistics.std() returns the standard deviation from a slab. By default on first dimension, centered, and biased. Usage: result = std(slab, weights=weightoptions, axis = axisoptions, centered=centeredoptions, biased = biasedoptions) Options: weightoptions default = None. If you want to compute the weighted correlation, provide the weights here. NOTE: the weights array must be the same shape and size as the slab.

genutil.statistics.std (2) Options (continued): axisoptions ‘x’ | ‘y’ | ‘z’ | ‘t’ | ‘(dimension_name)’ | 0 | 1 ... | n default value = 0. You can pass the name of the dimension or index (integer value 0...n) over which you want to compute the statistic. centeredoptions None | 0 | 1 default value = 1 computes and removes the mean first. Set to 0 or None for uncentered. biasedoptions None | 0 | 1 default value = 1 returns biased statistic. If want to compute an unbiased statistic pass anything but 1.

genutil.statistics.std example >>> import cdms, genutil >>> f=cdms.open(‘file1.nc’) >>> var1=f(‘u_wind’) >>> var1.shape # just a linear variable (9,) # We want to set some weights to add importance to # the higher range of values >>> wghts=[.2, .3, .5, .6, 1.0, 1.0, 1.2, 1.4, 1.0] >>> std=genutil.statistics.std(var1, weights=wghts) >>> print std 0.73401665568359065

Explaining “biased” options A number of the statistical functions allow the user to specify: This comes from 2 different definition of the standard deviation: exact definition, with a division by "n-1" (biased=0) an approximate one with a div by "n” (biased=1). As "n" (number of elements) gets big (quickly) there's no difference and the second approach is faster. But for small number of elements you get a different answer.

genutil.statistics.linearregression (1) genutil.statistics.linearregression() computes the linear regression between to one-dimensional arrays, where the independent variable (x) can be an axis or values of another data array. Usage: result = linearregression(y, axis=axisoptions, x=xvalues, error=erroroptions, probability=probabilityoptions, nointercept=nointerceptoptions, noslope=noslopeoptions) Options: axisoptions - ‘x’ | ‘y’ | ‘z’ | ‘t’ | ‘(dimension_name)’ | 0 | 1 ... | n default value = 0. You can pass the name of the dimension or index (integer value 0...n) over which you want to treat the array as the dependent variable. xvalues - default = None. You can pass an array of values that are to be used as the independent axis x.

genutil.statistics.linearregression - example >>> import genutil.statistics as gs >>> f1=cdms.open('~/my_cdat_files/data/lsp_200001.nc') >>> lsp1=f1('lsp', squeeze=1) >>> f2=co('~/my_cdat_files/data/lsp_200007.nc') >>> lsp2=f2('lsp', squeeze=2) >>> lsp1.shape >>> gs.linearregression(lsp1, axis="y") [slope array([ 2.30989548e-08, 2.56871230e-08, ...]) , intercept array([ 7.28723094e-06, 7.50319075e-06, ...]) >>> gs.linearregression(lsp1[0], x=lsp2[0]) array(-0.1875) array(1.07485055923e-05) ]

The “grower” function grower is an unusual function that grows 2 variables to their combined largest shape by replicating data values in any dimension required: grower(x, y, singleton*=0) >>> a.shape (288,) >>> b.shape (1, 20, 144, 288) >>> c,d=grower(a,b) >>> c.shape (288, 1, 20, 144) >>> d.shape *singletonoption is 0 or 1 - Default = 0 If singletonoption is set to 1 then an error is raised if one of the dims is not a singleton dimension.

The “picker” function (1) “picker” allows to select non contiguous values of an axis, for example: >>> mypick=genutil.picker(level=(100,850,200)) >>> picked=var(mypick) >>> print picked.getLevel()[:] [ 100., 850., 200.,] An additional “match” keyword can be provided to the picker. If “match” is set to 1 then all requested values must be present, if set to 0 then non-existent values will be returned with “missing_value” everywhere, if set to -1, then non-existent requested values will be skipped.

The “picker” function (2) This picker example shows how you can do strange things to your variable very quickly and easily. Suppose you wanted to select a discrete number of latitudes (10°N, 43°N, 86°N and 90°N): >>> import cdms, genutil >>> var=cdms.open(‘myfile.nc’)(‘myvariable’) >>> mypick=genutil.picker(latitude=(10, 43, 86, 90)) >>> newvar=var(mypick) >>> print newvar.getLatitude()[:] [ 10., 43., 86., 90.,]

genutil - “udunits” The “udunits” module is a python port of the C/Fortran “udunits” conversion package: >>> from genutil import udunits >>> # print udunits.__doc__ # OR help(udunits) >>> myunit=udunits(5.6, "m s**-1") >>> myunit.to("knot") udunits(10.8855291577,"knot") To search units for keyword “meter”: >>> for unit in myunit.known_units(): ... if unit.find("meter")>-1: ... print unit

genutil – the rest minmax filters statusbar color Other sub-components of genutil: minmax filters statusbar color