# Stephen Mansour, PhD University of Scranton and The Carlisle Group Dyalog ’14 Conference, Eastbourne, UK.



- Many statistical software packages are available: Minitab, R, Excel, SPSS.
- Excel has about 87 statistical functions. Six of them involve the t distribution alone: T.DIST, T.INV, T.DIST.RT, T.INV.2T, T.DIST.2T, T.TEST.
- R has four related functions for each of 20 distributions, for a total of 80 distribution functions.

Defined Operators!

- How can we exploit operators to reduce the explosive number of statistical functions?
- Let’s look at an example...

- Typical attendance is about 100 delegates, with a standard deviation of 20.
- Assume next year’s conference centre can support up to 130 delegates.
- What are the chances that next year’s attendance will exceed capacity?

In Excel: `=1-NORM.DIST(130,100,20,TRUE)`

Now let’s use R-Connect in APL: `+#.∆r.x 'pnorm( ⍵, ⍵, ⍵, ⍵ )' 130 100 20 0`

Wouldn’t it be nice to enter:

- `100 20 normal probability > 130`
- `100 20 (normal probability >) 130`
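As a cross-check on the attendance question, here is a minimal Python sketch of the same upper-tail calculation. It assumes attendance is normally distributed with mean 100 and standard deviation 20, and uses only the standard library's `statistics.NormalDist`; the variable names are illustrative and not part of the talk.

```python
from statistics import NormalDist

# Assumed model: attendance ~ Normal(mean=100, sd=20)
attendance = NormalDist(mu=100, sigma=20)

# P(attendance > 130) is one minus the cumulative probability at 130
p_exceed = 1 - attendance.cdf(130)

print(round(p_exceed, 4))  # 0.0668
```

So the chance of exceeding capacity is a little under 7%.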

- `normal probability < 1.64`
- `100 20 normal probability between 110 130`
- `5 0.5 binomial probability = 2`
- `7 tDist criticalValue < 0.05`
- `5 chiSquare randomVariable 13`
- `mean confidenceInterval X`
- `(SEX='F') proportion hypothesis ≥ 0.5`
- `GROUPA mean hypothesis = GROUPB`
- `variance theoretical binomial 5 0.2`

- Summary Functions
  - Descriptive Statistics
- Probability Distributions
  - Theoretical Models
- Relations

- Examples
  - Measures of central tendency: mean, median, mode
  - Measures of spread: variance, standard deviation, range, IQR
  - Measures of position: min, max, quartiles, percentiles
  - Measures of shape: skewness, kurtosis

- Probability distributions are functions defined in a natural way when they are called without an operator:
  - Discrete: probability mass function
  - Continuous: density function
- The left argument is the parameter list.
- The right argument can be any value taken on by the distribution.
- Probability distributions are scalar with respect to the right argument.
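The calling convention above can be sketched in Python. This is only an illustration under stated assumptions: the function names `binomial` and `normal` mirror the talk's distribution names, but the bodies and the list handling (standing in for APL's scalar extension) are hypothetical, not the author's implementation.

```python
from math import comb, exp, pi, sqrt

def _each(f, x):
    # Stand-in for "scalar with respect to the right argument":
    # apply f item by item when x is a list or tuple
    return [f(v) for v in x] if isinstance(x, (list, tuple)) else f(x)

def binomial(params, x):
    """Discrete case: probability mass function, params = (n, p)."""
    n, p = params
    return _each(lambda k: comb(n, k) * p**k * (1 - p)**(n - k), x)

def normal(params, x):
    """Continuous case: density function, params = (mu, sigma)."""
    mu, sigma = params
    return _each(
        lambda v: exp(-((v - mu) / sigma) ** 2 / 2) / (sigma * sqrt(2 * pi)), x
    )

print(binomial((5, 0.5), 2))          # 0.3125
print(binomial((5, 0.5), [0, 1, 2]))  # [0.03125, 0.15625, 0.3125]
```

The first argument plays the role of the APL left argument (the parameter list), the second that of the right argument.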

| Discrete Distribution | Parameter List |
| --- | --- |
| uniform | a - lower bound (default 1), b - upper bound |
| binomial | n - sample size, p - probability of success |
| poisson | λ - average number of arrivals per time period |
| negativeBinomial | n - number of successes, p - probability of success |
| hyperGeometric | m - number of successes, n - sample size, N - population size |
| multinomial | V - list of values (default 1 through n), P - list of probabilities totaling 1 |

| Continuous Distribution | Parameter List |
| --- | --- |
| normal | μ - theoretical mean (default 0); σ - standard deviation (default 1) |
| exponential | λ - mean time to fail |
| rectangular (continuous uniform) | a - lower bound (default 0), b - upper bound (default 1) |
| triangular | a - lower bound, m - most common value, b - upper bound |
| chiSquare | df - degrees of freedom |
| tDist (Student) | df - degrees of freedom |
| fDist | df1 - degrees of freedom for numerator, df2 - degrees of freedom for denominator |

- Relational functions are dyadic functions whose range is {0,1}.
- 1 = relation is satisfied, 0 otherwise.
- Examples: `≠`, `∊`, `between←{¯1=×/×⍺∘.-⍵}`
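For readers less familiar with APL, the `between` definition can be read as: subtract each bound from the value, take the signs, multiply them, and test whether the product is -1. A rough Python transliteration follows; the helper name `sign` and the two-element `bounds` argument are assumptions made for the sketch.

```python
def sign(v):
    # -1, 0 or 1, like APL's monadic ×
    return (v > 0) - (v < 0)

def between(x, bounds):
    """Return 1 if x lies strictly between the two bounds, else 0."""
    lo, hi = bounds
    # The signs of (x - lo) and (x - hi) differ only when x is inside the interval
    return int(sign(x - lo) * sign(x - hi) == -1)

print(between(120, (110, 130)))  # 1
print(between(110, (110, 130)))  # 0
print(between(140, (110, 130)))  # 0
```

Because a zero difference gives a zero product, the endpoints themselves are excluded, and the test works regardless of the order in which the bounds are given.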

- By limiting the domain of an operator to one of the previously defined functional classifications, we can create an operator to perform statistical analysis.
- For a dyadic operator, each operand can be limited to a particular (but not necessarily the same) functional classification.

| Operator | Left Operand | Right Operand |
| --- | --- | --- |
| probability | Distribution | Relation |
| criticalValue | Distribution | Relation |
| confidenceInterval | Summary | N/A |
| hypothesis | Summary | Relation |
| goodnessOfFit | Distribution | N/A |
| randomVariable | Distribution | N/A |
| theoretical | Summary | Distribution |
| running | Summary | N/A |
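The idea of an operator that takes a distribution and a relation as operands can be approximated in Python with higher-order functions. The sketch below is only an analogy under stated assumptions: it covers the normal distribution alone (via the standard library's `statistics.NormalDist`), passes relations as strings, and uses hypothetical names (`probability`, `critical_value`, `normal`) that echo the table but are not the author's code.

```python
from statistics import NormalDist

def normal(params):
    # Distribution operand: build a model from the parameter list (mu, sigma)
    mu, sigma = params
    return NormalDist(mu, sigma)

def probability(distribution, relation):
    """Operator analogue: (distribution, relation) -> derived function."""
    def derived(params, value):
        cdf = distribution(params).cdf
        if relation == "<":
            return cdf(value)
        if relation == ">":
            return 1 - cdf(value)
        if relation == "between":
            lo, hi = value
            return cdf(hi) - cdf(lo)
        raise ValueError(f"unsupported relation: {relation}")
    return derived

def critical_value(distribution, relation):
    """Inverse of probability: find the value that cuts off a tail area p."""
    def derived(params, p):
        inv_cdf = distribution(params).inv_cdf
        if relation == "<":
            return inv_cdf(p)
        if relation == ">":
            return inv_cdf(1 - p)
        raise ValueError(f"unsupported relation: {relation}")
    return derived

# Roughly: 100 20 normal probability > 130
print(round(probability(normal, ">")((100, 20), 130), 4))  # 0.0668
```

One derived function per (distribution, relation) pair replaces a separate named function for each combination, which is the reduction the operator approach is after.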

- Most functions and operators can easily be written in APL.
- Internals are not important to the user.
- The R interface can be used if necessary for statistical distributions.
- Correct nomenclature and ease of use are critical.

| Desc | Excel | R | APL Operator |
| --- | --- | --- | --- |
| Density | T.DIST(X,DF,0) | dt(X, df=DF) | DF tDist X |
| Cumul Prob | T.DIST(X,DF,1) | pt(X, df=DF) | DF tDist probability ≤ X |
| 2-Tail Prob | T.DIST.2T(X,DF) | 2*pt(-abs(X), df=DF) | DF tDist probability (~between)(-X)X |
| Upper Tail | T.DIST.RT(X,DF) | pt(X, df=DF, lower.tail=FALSE) | DF tDist probability > X |
| Crit. Value | T.INV(P,DF) | qt(P, df=DF) | DF tDist criticalValue < P |
| 2-tail c.v. | T.INV.2T(P,DF) | qt(P/2, df=DF, lower.tail=FALSE) | DF tDist criticalValue ≠ P |
| Hyp test | T.TEST(X1,X2) | t.test(X1, X2, paired=FALSE, mu=0) | X1 mean hypothesis = X2 |

A sample can be represented by raw data, a frequency distribution, or sample statistics. The following items are interchangeable as arguments to the limited domain operators above:

- Raw data: Vector
- Frequency Distribution: Matrix
- Summary Statistics: PropertySpace

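Matrix: Frequency Distribution

- `D` displays `2 0 3 4 3 1 0 2 0 4`
- `⎕←FT←frequency D` displays a two-column matrix of values and counts with rows `0 3`, `1 1`, `2 2`, `3 2`, `4 2`
- `mean D` gives `1.9`
- `variance D` gives `2.5444`

Namespace: Sample Statistics

- `PS←⎕NS ''`
- `PS.count←10`
- `PS.mean←1.9`
- `PS.variance←2.544`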
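To illustrate how the three representations can be interchangeable, here is a small Python sketch that accepts any of them and returns the same count, mean and variance. It makes several assumptions: a list stands in for the vector, a list of (value, count) pairs for the frequency matrix, and `types.SimpleNamespace` for the APL namespace; `summarize` is a hypothetical name, and the variance is the sample variance (divisor n - 1), which matches the 2.5444 shown for `D`.

```python
from collections import Counter
from types import SimpleNamespace

def summarize(sample):
    """Return (count, mean, variance) for any of the three representations."""
    if isinstance(sample, SimpleNamespace):
        # Summary statistics are already available
        return sample.count, sample.mean, sample.variance
    if sample and isinstance(sample[0], (list, tuple)):
        pairs = list(sample)              # frequency table: (value, count) rows
    else:
        pairs = [(x, 1) for x in sample]  # raw data: each value counted once
    n = sum(c for _, c in pairs)
    mean = sum(v * c for v, c in pairs) / n
    variance = sum(c * (v - mean) ** 2 for v, c in pairs) / (n - 1)
    return n, mean, variance

D = [2, 0, 3, 4, 3, 1, 0, 2, 0, 4]
FT = sorted(Counter(D).items())  # frequency table derived from D
PS = SimpleNamespace(count=10, mean=1.9, variance=2.5444)

for sample in (D, FT, PS):
    n, m, v = summarize(sample)
    print(n, round(m, 4), round(v, 4))  # 10 1.9 2.5444 each time
```

A function written against `summarize` never needs to know which form the caller supplied.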

- `)LOAD TamingStatistics`
  - All-APL version
- `)LOAD TamingStatisticsR`
  - Third party: must install R (free)

- There are many statistical packages out there; some, like R, can be used with APL.
- Operator syntax is unique to APL.
- R can be called directly from APL using RCONNECT, but APL operator syntax is easier to understand.
