
Statistics Toolbox
For Use with MATLAB

Computation
Visualization
Programming

User's Guide
Version 3
How to Contact The MathWorks:
www.mathworks.com        Web
comp.soft-sys.matlab     Newsgroup
support@mathworks.com    Technical support
suggest@mathworks.com    Product enhancement suggestions
bugs@mathworks.com       Bug reports
doc@mathworks.com        Documentation error reports
service@mathworks.com    Order status, license renewals, passcodes
info@mathworks.com       Sales, pricing, and general information
508-647-7000             Phone
508-647-7001             Fax
The MathWorks, Inc.      Mail
3 Apple Hill Drive
Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.
Statistics Toolbox User's Guide
COPYRIGHT 1993 - 2001 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by or for the federal government of the United States. By accepting delivery of the Program, the government hereby agrees that this software qualifies as "commercial" computer software within the meaning of FAR Part 12.212, DFARS Part 227.7202-1, DFARS Part 227.7202-3, DFARS Part 252.227-7013, and DFARS Part 252.227-7014. The terms and conditions of The MathWorks, Inc. Software License Agreement shall pertain to the government's use and disclosure of the Program and Documentation, and shall supersede any conflicting contractual terms or conditions. If this license fails to meet the government's minimum needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to MathWorks.
MATLAB, Simulink, Stateflow, Handle Graphics, and Real-Time Workshop are registered trademarks, and Target Language Compiler is a trademark of The MathWorks, Inc.
Other product or brand names are trademarks or registered trademarks of their respective holders.
Printing History:
September 1993   First printing    Version 1
March 1996       Second printing   Version 2
January 1997     Third printing    For MATLAB 5
May 1997                           Revised for MATLAB 5.1 (online version)
January 1998                       Revised for MATLAB 5.2 (online version)
January 1999                       Revised for Version 2.1.2 (Release 11) (online only)
November 2000    Fourth printing   Revised for Version 3 (Release 12)
May 2001         Fifth printing    Minor revision
Contents
Preface
  Overview
  What Is the Statistics Toolbox?
  How to Use This Guide
  Related Products List
  Mathematical Notation
  Typographical Conventions
1 Tutorial
  Introduction
    Primary Topic Areas
  Probability Distributions
    Overview of the Functions
    Overview of the Distributions
  Descriptive Statistics
    Measures of Central Tendency (Location)
    Measures of Dispersion
    Functions for Data with Missing Values (NaNs)
    Function for Grouped Data
    Percentiles and Graphical Descriptions
    The Bootstrap
  Cluster Analysis
    Terminology and Basic Procedure
    Finding the Similarities Between Objects
    Defining the Links Between Objects
    Evaluating Cluster Formation
    Creating Clusters
  Linear Models
    One-Way Analysis of Variance (ANOVA)
    Two-Way Analysis of Variance (ANOVA)
    N-Way Analysis of Variance
    Multiple Linear Regression
    Quadratic Response Surface Models
    Stepwise Regression
    Generalized Linear Models
    Robust and Nonparametric Methods
  Nonlinear Regression Models
    Example: Nonlinear Modeling
  Hypothesis Tests
    Hypothesis Test Terminology
    Hypothesis Test Assumptions
    Example: Hypothesis Testing
    Available Hypothesis Tests
  Multivariate Statistics
    Principal Components Analysis
    Multivariate Analysis of Variance (MANOVA)
  Statistical Plots
    Box Plots
    Distribution Plots
    Scatter Plots
  Statistical Process Control (SPC)
    Control Charts
    Capability Studies
  Design of Experiments (DOE)
    Full Factorial Designs
    Fractional Factorial Designs
    D-Optimal Designs
  Demos
    The disttool Demo
    The polytool Demo
    The aoctool Demo
    The randtool Demo
    The rsmdemo Demo
    The glmdemo Demo
    The robustdemo Demo
  Selected Bibliography
2 Reference
  Function Category List
  anova1
  anova2
  anovan
  aoctool
  barttest
  betacdf
  betafit
  betainv
  betalike
  betapdf
  betarnd
  betastat
  binocdf
  binofit
  binoinv
  binopdf
  binornd
  binostat
  bootstrp
  boxplot
  capable
  capaplot
  caseread
  casewrite
  cdf
  cdfplot
  chi2cdf
  chi2inv
  chi2pdf
  chi2rnd
  chi2stat
  classify
  cluster
  clusterdata
  combnk
  cophenet
  cordexch
  corrcoef
  cov
  crosstab
  daugment
  dcovary
  dendrogram
  disttool
  dummyvar
  errorbar
  ewmaplot
  expcdf
  expfit
  expinv
  exppdf
  exprnd
  expstat
  fcdf
  ff2n
  finv
  fpdf
  fracfact
  friedman
  frnd
  fstat
  fsurfht
  fullfact
  gamcdf
  gamfit
  gaminv
  gamlike
  gampdf
  gamrnd
  gamstat
  geocdf
  geoinv
  geomean
  geopdf
  geornd
  geostat
  gline
  glmdemo
  glmfit
  glmval
  gname
  gplotmatrix
  grpstats
  gscatter
  harmmean
  hist
  histfit
  hougen
  hygecdf
  hygeinv
  hygepdf
  hygernd
  hygestat
  icdf
  inconsistent
  iqr
  jbtest
  kruskalwallis
  kstest
  kstest2
  kurtosis
  leverage
  lillietest
  linkage
  logncdf
  logninv
  lognpdf
  lognrnd
  lognstat
  lsline
  mad
  mahal
  manova1
  manovacluster
  mean
  median
  mle
  moment
  multcompare
  mvnrnd
  mvtrnd
  nanmax
  nanmean
  nanmedian
  nanmin
  nanstd
  nansum
  nbincdf
  nbininv
  nbinpdf
  nbinrnd
  nbinstat
  ncfcdf
  ncfinv
  ncfpdf
  ncfrnd
  ncfstat
  nctcdf
  nctinv
  nctpdf
  nctrnd
  nctstat
  ncx2cdf
  ncx2inv
  ncx2pdf
  ncx2rnd
  ncx2stat
  nlinfit
  nlintool
  nlparci
  nlpredci
  normcdf
  normfit
  norminv
  normpdf
  normplot
  normrnd
  normspec
  normstat
  pareto
  pcacov
  pcares
  pdf
  pdist
  perms
  poisscdf
  poissfit
  poissinv
  poisspdf
  poissrnd
  poisstat
  polyconf
  polyfit
  polytool
  polyval
  prctile
  princomp
  qqplot
  random
  randtool
  range
  ranksum
  raylcdf
  raylinv
  raylpdf
  raylrnd
  raylstat
  rcoplot
  refcurve
  refline
  regress
  regstats
  ridge
  robustdemo
  robustfit
  rowexch
  rsmdemo
  rstool
  schart
  signrank
  signtest
  skewness
  squareform
  std
  stepwise
  surfht
  tabulate
  tblread
  tblwrite
  tcdf
  tdfread
  tinv
  tpdf
  trimmean
  trnd
  tstat
  ttest
  ttest2
  unidcdf
  unidinv
  unidpdf
  unidrnd
  unidstat
  unifcdf
  unifinv
  unifit
  unifpdf
  unifrnd
  unifstat
  var
  weibcdf
  weibfit
  weibinv
  weiblike
  weibpdf
  weibplot
  weibrnd
  weibstat
  x2fx
  xbarplot
  zscore
  ztest

Preface
Overview
What Is the Statistics Toolbox?
How to Use This Guide
Related Products List
Mathematical Notation
Typographical Conventions
Overview
This chapter introduces the Statistics Toolbox and explains how to use the
documentation. It contains the following sections:
What Is the Statistics Toolbox?
How to Use This Guide
Related Products List
Mathematical Notation
Typographical Conventions
What Is the Statistics Toolbox?
The Statistics Toolbox is a collection of tools built on the MATLAB numeric
computing environment. The toolbox supports a wide range of common
statistical tasks, from random number generation, to curve fitting, to design of
experiments and statistical process control. The toolbox provides two
categories of tools:
Building-block probability and statistics functions
Graphical, interactive tools
The first category of tools is made up of functions that you can call from the
command line or from your own applications. Many of these functions are
MATLAB M-files, series of MATLAB statements that implement specialized
statistics algorithms. You can view the MATLAB code for these functions using
the statement
type function_name
You can change the way any toolbox function works by copying and renaming
the M-file, then modifying your copy. You can also extend the toolbox by adding
your own M-files.
Secondly, the toolbox provides a number of interactive tools that let you access
many of the functions through a graphical user interface (GUI). Together, the
GUI-based tools provide an environment for polynomial fitting and prediction,
as well as probability function exploration.
How to Use This Guide
If you are a new user, begin with Chapter 1, "Tutorial." This chapter
introduces the MATLAB statistics environment through the toolbox functions.
It describes the functions with regard to particular areas of interest, such as
probability distributions, linear and nonlinear models, principal components
analysis, design of experiments, statistical process control, and descriptive
statistics.
All toolbox users should use Chapter 2, "Reference," for information about
specific tools. For functions, reference descriptions include a synopsis of the
function's syntax, as well as a complete explanation of options and operation.
Many reference descriptions also include examples, a description of the
function's algorithm, and references to additional reading material.
Use this guide in conjunction with the software to learn about the powerful
features that MATLAB provides. Each chapter provides numerous examples
that apply the toolbox to representative statistical tasks.
The random number generation functions for various probability distributions
are all based on the primitive functions randn and rand. There are many
examples that start by generating data using random numbers. To duplicate
the results in these examples, first execute the commands below.
seed = 931316785;
rand('seed',seed);
randn('seed',seed);
You might want to save these commands in an M-file script called init.m.
Then, instead of three separate commands, you need only type init.
Related Products List
The MathWorks provides several products that are especially relevant to the
kinds of tasks you can perform with the Statistics Toolbox.
For more information about any of these products, see either:
The online documentation for that product if it is installed or if you are
reading the documentation from the CD
The MathWorks Web site, at http://www.mathworks.com; see the "products"
section
Note The toolboxes listed below all include functions that extend MATLAB's
capabilities. The blocksets all include blocks that extend Simulink's
capabilities.
Product                        Description
Data Acquisition Toolbox       MATLAB functions for direct access to live,
                               measured data from MATLAB
Database Toolbox               Tool for connecting to, and interacting with,
                               most ODBC/JDBC databases from within MATLAB
Financial Time Series Toolbox  Tool for analyzing time series data in the
                               financial markets
Financial Toolbox              MATLAB functions for quantitative financial
                               modeling and analytic prototyping
GARCH Toolbox                  MATLAB functions for univariate Generalized
                               Autoregressive Conditional Heteroskedasticity
                               (GARCH) volatility modeling
Image Processing Toolbox       Complete suite of digital image processing and
                               analysis tools for MATLAB
Mapping Toolbox                Tool for analyzing and displaying
                               geographically based information from within
                               MATLAB
Neural Network Toolbox         Comprehensive environment for neural
                               network research, design, and simulation
                               within MATLAB
Optimization Toolbox           Tool for general and large-scale optimization of
                               nonlinear problems, as well as for linear
                               programming, quadratic programming,
                               nonlinear least squares, and solving nonlinear
                               equations
Signal Processing Toolbox      Tool for algorithm development, signal and
                               linear system analysis, and time-series data
                               modeling
System Identification Toolbox  Tool for building accurate, simplified models of
                               complex systems from noisy time-series data
Mathematical Notation
This manual and the Statistics Toolbox functions use the following
mathematical notation conventions.
$\beta$                    Parameters in a linear model.
E(x)                       Expected value of x: $E(x) = \int t\,f(t)\,dt$
f(x|a,b)                   Probability density function. x is the independent
                           variable; a and b are fixed parameters.
F(x|a,b)                   Cumulative distribution function.
I([a, b]) or $I_{[a,b]}$   Indicator function. In this example the function
                           takes the value 1 on the closed interval from a to b
                           and is 0 elsewhere.
p and q                    p is the probability of some event.
                           q is the probability of ~p, so q = 1 - p.
Typographical Conventions
This manual uses some or all of these conventions.
Item                          Convention Used                     Example
Example code                  Monospace font                      To assign the value 5 to A, enter
                                                                  A = 5
Function names/syntax         Monospace font                      The cos function finds the cosine of
                                                                  each array element. Syntax line
                                                                  example is
                                                                  MLGetVar ML_var_name
Keys                          Boldface with an initial capital    Press the Return key.
                              letter
Literal strings (in syntax    Monospace bold for literals         f = freqspace(n,'whole')
descriptions in reference
chapters)
Mathematical expressions      Italics for variables; standard     This vector represents the
                              text font for functions,            polynomial
                              operators, and constants            p = x^2 + 2x + 3
MATLAB output                 Monospace font                      MATLAB responds with
                                                                  A =
                                                                       5
Menu titles, menu items,      Boldface with an initial capital    Choose the File menu.
dialog boxes, and controls    letter
New terms                     Italics                             An array is an ordered collection
                                                                  of information.
Omitted input arguments       (...) ellipsis denotes all of the   [c,ia,ib] = union(...)
                              input/output arguments from
                              preceding syntaxes.
String variables (from a      Monospace italics                   sysc = d2c(sysd,'method')
finite list)

1
Tutorial
Introduction
Probability Distributions
Descriptive Statistics
Cluster Analysis
Linear Models
Nonlinear Regression Models
Hypothesis Tests
Multivariate Statistics
Statistical Plots
Statistical Process Control (SPC)
Design of Experiments (DOE)
Demos
Selected Bibliography
Introduction
The Statistics Toolbox, for use with MATLAB, supplies basic statistics
capability on the level of a first course in engineering or scientific statistics.
The statistics functions it provides are building blocks suitable for use inside
other analytical tools.
Primary Topic Areas
The Statistics Toolbox has more than 200 M-files, supporting work in the
topical areas below:
Probability distributions
Descriptive statistics
Cluster analysis
Linear models
Nonlinear models
Hypothesis tests
Multivariate statistics
Statistical plots
Statistical process control
Design of experiments
Probability Distributions
The Statistics Toolbox supports 20 probability distributions. For each
distribution there are five associated functions. They are:
Probability density function (pdf)
Cumulative distribution function (cdf)
Inverse of the cumulative distribution function
Random number generator
Mean and variance as a function of the parameters
For data-driven distributions (beta, binomial, exponential, gamma, normal,
Poisson, uniform, and Weibull), the Statistics Toolbox has functions for
computing parameter estimates and confidence intervals.
Descriptive Statistics
The Statistics Toolbox provides functions for describing the features of a data
sample. These descriptive statistics include measures of location and spread,
percentile estimates, and functions for dealing with data having missing
values.
Cluster Analysis
The Statistics Toolbox provides functions that allow you to divide a set of
objects into subgroups, each having members that are as much alike as
possible. This process is called cluster analysis.
Linear Models
In the area of linear models, the Statistics Toolbox supports one-way, two-way,
and higher-way analysis of variance (ANOVA), analysis of covariance
(ANOCOVA), multiple linear regression, stepwise regression, response surface
prediction, ridge regression, and one-way multivariate analysis of variance
(MANOVA). It supports nonparametric versions of one- and two-way ANOVA.
It also supports multiple comparisons of the estimates produced by ANOVA
and ANOCOVA functions.
Nonlinear Models
For nonlinear models, the Statistics Toolbox provides functions for parameter
estimation, interactive prediction and visualization of multidimensional
nonlinear fits, and confidence intervals for parameters and predicted values.
Hypothesis Tests
The Statistics Toolbox also provides functions that do the most common tests
of hypothesis: t-tests, Z-tests, nonparametric tests, and distribution tests.
Multivariate Statistics
The Statistics Toolbox supports methods in multivariate statistics, including
principal components analysis, linear discriminant analysis, and one-way
multivariate analysis of variance.
Statistical Plots
The Statistics Toolbox adds box plots, normal probability plots, Weibull
probability plots, control charts, and quantile-quantile plots to the arsenal of
graphs in MATLAB. There is also extended support for polynomial curve fitting
and prediction. There are functions to create scatter plots or matrices of scatter
plots for grouped data, and to identify points interactively on such plots. There
is a function to interactively explore a fitted regression model.
Statistical Process Control (SPC)
For SPC, the Statistics Toolbox provides functions for plotting common control
charts and performing process capability studies.
Design of Experiments (DOE)
The Statistics Toolbox supports full and fractional factorial designs and
D-optimal designs. There are functions for generating designs, augmenting
designs, and optimally assigning units with fixed covariates.
Probability Distributions
Probability distributions arise from experiments where the outcome is subject
to chance. The nature of the experiment dictates which probability
distributions may be appropriate for modeling the resulting random outcomes.
There are two types of probability distributions: continuous and discrete.
Suppose you are studying a machine that produces videotape. One measure of
the quality of the tape is the number of visual defects per hundred feet of tape.
The result of this experiment is an integer, since you cannot observe 1.5
defects. To model this experiment you should use a discrete probability
distribution.
A measure affecting the cost and quality of videotape is its thickness. Thick
tape is more expensive to produce, while variation in the thickness of the tape
on the reel increases the likelihood of breakage. Suppose you measure the
thickness of the tape every 1000 feet. The resulting numbers can take a
continuum of possible values, which suggests using a continuous probability
distribution to model the results.
Using a probability model does not allow you to predict the result of any
individual experiment, but you can determine the probability that a given
outcome will fall inside a specific range of values.
Continuous (data)    Continuous (statistics)    Discrete
Beta                 Chi-square                 Binomial
Exponential          Noncentral Chi-square      Discrete Uniform
Gamma                F                          Geometric
Lognormal            Noncentral F               Hypergeometric
Normal               t                          Negative Binomial
Rayleigh             Noncentral t               Poisson
Uniform
Weibull
The following two sections provide more information about the available
distributions:
Overview of the Functions
Overview of the Distributions
Overview of the Functions
The Statistics Toolbox provides five functions for each distribution, which are
discussed in the following sections:
Probability Density Function (pdf)
Cumulative Distribution Function (cdf)
Inverse Cumulative Distribution Function
Random Number Generator
Mean and Variance as a Function of Parameters
Probability Density Function (pdf)
The probability density function (pdf) has a different meaning depending on
whether the distribution is discrete or continuous.
For discrete distributions, the pdf is the probability of observing a particular
outcome. In our videotape example, the probability that there is exactly one
defect in a given hundred feet of tape is the value of the pdf at 1.
Unlike discrete distributions, the pdf of a continuous distribution at a value is
not the probability of observing that value. For continuous distributions the
probability of observing any particular value is zero. To get probabilities you
must integrate the pdf over an interval of interest. For example, the probability
of the thickness of a videotape being between one and two millimeters is the
integral of the appropriate pdf from one to two.
A pdf has two theoretical properties:
The pdf is zero or positive for every possible outcome.
The integral of a pdf over its entire range of values is one.
A pdf is not a single function. Rather, a pdf is a family of functions characterized
by one or more parameters. Once you choose (or estimate) the parameters of a
pdf, you have uniquely specified the function.
The pdf function call has the same general format for every distribution in the
Statistics Toolbox. The following commands illustrate how to call the pdf for
the normal distribution.
x = [-3:0.1:3];
f = normpdf(x,0,1);
The variable f contains the density of the normal pdf with parameters $\mu = 0$ and
$\sigma = 1$ at the values in x. The first input argument of every pdf is the set of values
for which you want to evaluate the density. Other arguments contain as many
parameters as are necessary to define the distribution uniquely. The normal
distribution requires two parameters: a location parameter (the mean, $\mu$) and
a scale parameter (the standard deviation, $\sigma$).
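To make the point about integrating a continuous pdf concrete, the following sketch approximates the probability that a standard normal value lies between 1 and 2. The grid spacing and the variable names p_interval and p_cdf are illustrative choices, not part of the toolbox.

```matlab
% Probability that a standard normal value lies between 1 and 2,
% approximated by integrating the pdf over that interval.
x = 1:0.001:2;                 % fine grid over the interval of interest
f = normpdf(x,0,1);            % density at each grid point
p_interval = trapz(x,f);       % trapezoidal approximation of the integral

% The same probability from the cdf, for comparison.
p_cdf = normcdf(2,0,1) - normcdf(1,0,1);
```

The two values should agree closely; the pdf value at any single point, by contrast, is a density and not a probability.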
Cumulative Distribution Function (cdf)
If f is a probability density function for random variable X, the associated
cumulative distribution function (cdf) F is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
The cdf of a value x, F(x), is the probability of observing any outcome less than
or equal to x.
A cdf has two theoretical properties:
The cdf ranges from 0 to 1.
If y > x, then the cdf of y is greater than or equal to the cdf of x.
The cdf function call has the same general format for every distribution in the
Statistics Toolbox. The following commands illustrate how to call the cdf for the
normal distribution.
x = [-3:0.1:3];
p = normcdf(x,0,1);
The variable p contains the probabilities associated with the normal cdf with
parameters $\mu = 0$ and $\sigma = 1$ at the values in x. The first input argument of every
cdf is the set of values for which you want to evaluate the probability. Other
arguments contain as many parameters as are necessary to define the
distribution uniquely.
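The two theoretical properties of a cdf can be checked numerically. This is a small sketch for the standard normal case; the names in_range and nondecreasing are illustrative.

```matlab
% Check the two cdf properties on a grid of values.
x = -3:0.1:3;
p = normcdf(x,0,1);

in_range = all(p >= 0 & p <= 1);      % property 1: values lie in [0,1]
nondecreasing = all(diff(p) >= 0);    % property 2: cdf never decreases
```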
Inverse Cumulative Distribution Function
The inverse cumulative distribution function returns critical values for
hypothesis testing given significance probabilities. To understand the
relationship between a continuous cdf and its inverse function, try the
following:
x = [-3:0.1:3];
xnew = norminv(normcdf(x,0,1),0,1);
How does xnew compare with x? Conversely, try this:
p = [0.1:0.1:0.9];
pnew = normcdf(norminv(p,0,1),0,1);
How does pnew compare with p?
Calculating the cdf of values in the domain of a continuous distribution returns
probabilities between zero and one. Applying the inverse cdf to these
probabilities yields the original values.
For discrete distributions, the relationship between a cdf and its inverse
function is more complicated. It is likely that there is no x value such that the
cdf of x yields p. In these cases the inverse function returns the first value x
such that the cdf of x equals or exceeds p. Try this:
x = [0:10];
y = binoinv(binocdf(x,10,0.5),10,0.5);
How does x compare with y?
The commands below illustrate the problem with reconstructing the
probability p from the value x for discrete distributions.
p = [0.1:0.2:0.9];
pnew = binocdf(binoinv(p,10,0.5),10,0.5)
pnew =
0.1719 0.3770 0.6230 0.8281 0.9453
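The rule for discrete distributions (return the first x whose cdf equals or exceeds p) can be reproduced by hand. The sketch below does this for the binomial case with n = 10 and success probability 0.5; the variable names are illustrative.

```matlab
% Find the smallest x with binocdf(x,n,prob) >= p and compare with binoinv.
n = 10;
prob = 0.5;                    % success probability of the distribution
p = 0.5;                       % target cumulative probability
x = 0:n;
F = binocdf(x,n,prob);         % cdf at every possible outcome
x_first = min(x(F >= p));      % first outcome whose cdf reaches p
x_inv = binoinv(p,n,prob);     % toolbox inverse, for comparison
```

Because the cdf jumps from 0.3770 at x = 4 to 0.6230 at x = 5, no outcome has a cdf of exactly 0.5, which is why the round trip through binoinv and binocdf does not recover p.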
The inverse function is useful in hypothesis testing and production of
confidence intervals. Here is the way to get a 99% confidence interval for a
normally distributed sample.
p = [0.005 0.995];
x = norminv(p,0,1)
x =
-2.5758 2.5758
The variable x contains the values associated with the normal inverse function
with parameters $\mu = 0$ and $\sigma = 1$ at the probabilities in p. The difference
p(2)-p(1) is 0.99. Thus, the values in x define an interval that contains 99%
of the standard normal probability.
The inverse function call has the same general format for every distribution in
the Statistics Toolbox. The first input argument of every inverse function is the
set of probabilities for which you want to evaluate the critical values. Other
arguments contain as many parameters as are necessary to define the
distribution uniquely.
Random Number Generator
The methods for generating random numbers from any distribution all start
with uniform random numbers. Once you have a uniform random number
generator, you can produce random numbers from other distributions either
directly or by using inversion or rejection methods, described below. See
"Syntax for Random Number Functions" for details on using
generator functions.
Direct. Direct methods flow from the definition of the distribution.
As an example, consider generating binomial random numbers. You can think
of binomial random numbers as the number of heads in n tosses of a coin with
probability p of heads on any toss. If you generate n uniform random numbers
and count the number that are less than p, the result is binomial with
parameters n and p.
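A minimal sketch of the direct method for binomial numbers follows. It uses only rand; the values of n, p, and the number of replicates m are arbitrary illustrative choices.

```matlab
% Direct method: each column of U holds n uniform numbers (n coin tosses).
rand('seed',931316785);        % seed as suggested earlier in this guide
n = 10;                        % tosses per binomial number
p = 0.3;                       % probability of heads on each toss
m = 5000;                      % how many binomial numbers to generate

U = rand(n,m);
r = sum(U < p);                % count of "heads" in each column
```

Each uniform number falls below p with probability p, so every element of r is a binomial count with parameters n and p, and mean(r) should be close to n*p.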
Inversion. The inversion method works due to a fundamental theorem that
relates the uniform distribution to other continuous distributions.
If F is a continuous distribution with inverse $F^{-1}$, and U is a uniform random
number, then $F^{-1}(U)$ has distribution F.
So, you can generate a random number from a distribution by applying the
inverse function for that distribution to a uniform random number.
Unfortunately, this approach is usually not the most efficient.
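As an illustration of inversion, consider an exponential distribution with mean mu, whose cdf is F(x) = 1 - exp(-x/mu) and whose inverse can be written in closed form. The choice of distribution and the value of mu are assumptions made for this sketch only.

```matlab
% Inversion method for the exponential distribution with mean mu.
% F(x) = 1 - exp(-x/mu), so F^{-1}(u) = -mu*log(1 - u).
rand('seed',931316785);
mu = 2;
u = rand(10000,1);             % uniform random numbers on (0,1)
x = -mu*log(1 - u);            % apply the inverse cdf to each one
```

The sample mean of x should be close to mu. For distributions without a closed-form inverse, each evaluation of the inverse is more costly, which is one reason inversion is often not the fastest approach.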
Rejection. The functional form of some distributions makes it difficult or time
consuming to generate random numbers using direct or inversion methods.
Rejection methods can sometimes provide an elegant solution in these cases.
Suppose you want to generate random numbers from a distribution with pdf f.
To use rejection methods you must first find another density, g, and a
constant, c, so that the inequality below holds.
$$f(x) \le c\,g(x) \quad \forall x$$
You then generate the random numbers you want using the following steps:
1 Generate a random number x from distribution G with density g.
2 Form the ratio $r = \frac{c\,g(x)}{f(x)}$.
3 Generate a uniform random number u.
4 If the product of u and r is less than one, return x.
5 Otherwise, repeat steps 1 through 4.
For efficiency you need a cheap method for generating random numbers
from G, and the scalar c should be small. The expected number of iterations
is c.
Syntax for Random Number Functions. You can generate random numbers from
each distribution. The random number function for a distribution provides a
single random number or a matrix of random numbers, depending on the
arguments you specify in the function call.
For example, here is the way to generate random numbers from the beta
distribution. Four statements obtain random numbers: the first returns a
single number, the second returns a 2-by-2 matrix of random numbers, and the
third and fourth return 2-by-3 matrices of random numbers.
a = 1;
b = 2;
c = [.1 .5; 1 2];
d = [.25 .75; 5 10];
m = [2 3];
nrow = 2;
ncol = 3;
r1 = betarnd(a,b)
r1 =
0.4469
r2 = betarnd(c,d)
r2 =
0.8931 0.4832
0.1316 0.2403
r3 = betarnd(a,b,m)
r3 =
0.4196 0.6078 0.1392
0.0410 0.0723 0.0782
r4 = betarnd(a,b,nrow,ncol)
r4 =
0.0520 0.3975 0.1284
0.3891 0.1848 0.5186
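The rejection steps described earlier in this section can be sketched in a few lines. As an assumed example, the target is the beta density with a = b = 2, f(x) = 6x(1-x), the proposal g is the uniform density on (0,1), and c = 1.5 is the maximum of f, so f(x) <= c*g(x) everywhere. The variable names are illustrative.

```matlab
% Rejection method for f(x) = 6*x*(1-x) with uniform proposal g(x) = 1.
rand('seed',931316785);
c = 1.5;                       % bound: f(x) <= c*g(x) on (0,1)
N = 5000;                      % number of random numbers wanted
samples = zeros(N,1);
tries = 0;                     % total number of proposals generated

for k = 1:N
    accepted = false;
    while ~accepted
        x = rand;              % step 1: draw x from g
        r = c/(6*x*(1-x));     % step 2: ratio c*g(x)/f(x)
        u = rand;              % step 3: uniform random number
        accepted = (u*r < 1);  % step 4: accept x when u*r < 1
        tries = tries + 1;     % step 5: otherwise loop again
    end
    samples(k) = x;
end
```

The ratio tries/N estimates the expected number of iterations per accepted value, which should be near c = 1.5, and mean(samples) should be near 0.5, the mean of this beta distribution.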
Mean and Variance as a Function of Parameters
The mean and variance of a probability distribution are generally simple
functions of the parameters of the distribution. The Statistics Toolbox
functions ending in "stat" all produce the mean and variance of the desired
distribution for the given parameters.
The example below shows a contour plot of the mean of the Weibull distribution
as a function of the parameters.
x = (0.5:0.1:5);
y = (1:0.04:2);
[X,Y] = meshgrid(x,y);
Z = weibstat(X,Y);
[c,h] = contour(x,y,Z,[0.4 0.6 1.0 1.8]);
clabel(c);
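As a smaller check of the "stat" functions, the binomial mean and variance have the well-known closed forms n*p and n*p*(1-p). The sketch below compares them with the output of binostat; the parameter values are arbitrary.

```matlab
% Mean and variance of the binomial distribution from binostat,
% compared with the closed-form expressions n*p and n*p*(1-p).
n = 10;
p = 0.5;
[m,v] = binostat(n,p);
m_formula = n*p;
v_formula = n*p*(1-p);
```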
Overview of the Distributions
The following sections describe the available probability distributions:
Beta Distribution
Binomial Distribution
Chi-Square Distribution
Noncentral Chi-Square Distribution
Discrete Uniform Distribution
Exponential Distribution
F Distribution
Noncentral F Distribution
Gamma Distribution
Geometric Distribution
Hypergeometric Distribution
Lognormal Distribution
Negative Binomial Distribution
Normal Distribution
Poisson Distribution
Rayleigh Distribution
Student's t Distribution
Noncentral t Distribution
Uniform (Continuous) Distribution
Weibull Distribution
Beta Distribution
The following sections provide an overview of the beta distribution.
Background on the Beta Distribution. The beta distribution describes a family of
curves that are unique in that they are nonzero only on the interval (0, 1). A
more general version of the function assigns parameters to the end-points of
the interval.
The beta cdf is the same as the incomplete beta function.
The beta distribution has a functional relationship with the t distribution. If Y
is an observation from Student's t distribution with $\nu$ degrees of freedom, then
the following transformation generates X, which is beta distributed.
$$X = \frac{1}{2} + \frac{1}{2}\,\frac{Y}{\sqrt{\nu + Y^2}}$$
If $Y \sim t(\nu)$, then $X \sim \beta\left(\frac{\nu}{2}, \frac{\nu}{2}\right)$.
The Statistics Toolbox uses this relationship to compute values of the t cdf and
inverse function as well as generating t distributed random numbers.
Definition of the Beta Distribution. The beta pdf is
$$y = f(x|a,b) = \frac{1}{B(a,b)}\,x^{a-1}(1-x)^{b-1} I_{(0,1)}(x)$$
where $B(\cdot)$ is the Beta function. The indicator function $I_{(0,1)}(x)$ ensures that
only values of x in the range (0, 1) have nonzero probability.
Parameter Estimation for the Beta Distribution. Suppose you are collecting data that
has hard lower and upper bounds of zero and one respectively. Parameter
estimation is the process of determining the parameters of the beta
distribution that fit this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The
likelihood has the same form as the beta pdf. But for the pdf, the parameters
are known constants and the variable is x. The likelihood function reverses the
roles of the variables. Here, the sample values (the x's) are already observed.
So they are the fixed constants. The variables are the unknown parameters.
Maximum likelihood estimation (MLE) involves calculating the values of the
parameters that give the highest likelihood given the particular set of data.
The function betafit returns the MLEs and confidence intervals for the
parameters of the beta distribution. Here is an example using random numbers
from the beta distribution with a = 5 and b = 0.2.
r = betarnd(5,0.2,100,1);
[phat, pci] = betafit(r)
phat =
4.5330 0.2301
pci =
2.8051 0.1771
6.2610 0.2832
The MLE for parameter a is 4.5330, compared to the true value of 5. The 95%
confidence interval for a goes from 2.8051 to 6.2610, which includes the true
value.
Similarly the MLE for parameter b is 0.2301, compared to the true value of 0.2.
The 95% confidence interval for b goes from 0.1771 to 0.2832, which also
includes the true value. Of course, in this made-up example we know the "true
value." In experimentation we do not.
Example and Plot of the Beta Distribution. The shape of the beta distribution is quite
variable depending on the values of the parameters, as illustrated by the plot
below.
[Figure: beta pdfs on the interval (0, 1) for a = b = 0.75, a = b = 1, and a = b = 4]
The constant pdf (the flat line, a = b = 1) shows that the standard uniform
distribution is a special case of the beta distribution.
Binomia l Distribution
The following sect ions pr ovide an over view of t he binomial dist r ibut ion.
Ba ckground of the Binomia l Distribution. The binomial dist r ibut ion models t he t ot al
number of successes in r epeat ed t r ials fr om an infinit e populat ion under t he
following condit ions:
Only t wo out comes ar e possible on each of n t r ials.
The pr obabilit y of success for each t r ial is const ant .
All t r ials ar e independent of each ot her .
J ames Ber noulli der ived t he binomial dist r ibut ion in 1713 (Ars Conjectandi).
Ear lier , Blaise Pascal had consider ed t he special case wher e p = 1/2.
Definition of the Binomial Distribution. The binomial pdf is
$$y = f(x \mid n,p) = \binom{n}{x} p^x q^{(n-x)} I_{(0,1,\ldots,n)}(x)$$
where $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$ and $q = 1 - p$.
The binomial distribution is discrete. The pdf is nonzero for the integers
x = 0, 1, ..., n (provided 0 < p < 1).
Parameter Estimation for the Binomial Distribution. Suppose you are collecting data
from a widget manufacturing process, and you record the number of widgets
within specification in each batch of 100. You might be interested in the
probability that an individual widget is within specification. Parameter
estimation is the process of determining the parameter, p, of the binomial
distribution that fits this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The
likelihood has the same form as the binomial pdf above. But for the pdf, the
parameters (n and p) are known constants and the variable is x. The likelihood
function reverses the roles of the variables. Here, the sample values (the x's)
are already observed, so they are the fixed constants. The variables are the
unknown parameters. MLE involves calculating the value of p that gives the
highest likelihood given the particular set of data.
The function binofit returns the MLEs and confidence intervals for the
parameters of the binomial distribution. Here is an example using random
numbers from the binomial distribution with n = 100 and p = 0.9.
r = binornd(100,0.9)
r =
88
[phat, pci] = binofit(r,100)
phat =
0.8800
pci =
0.7998
0.9364
The MLE for parameter p is 0.8800, compared to the true value of 0.9. The 95%
confidence interval for p goes from 0.7998 to 0.9364, which includes the true
value. Of course, in this made-up example we know the true value of p. In
experimentation we do not.
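The likelihood idea can be checked numerically. The sketch below is only an illustration, not the method binofit itself uses: it evaluates the binomial likelihood for the observed count of 88 out of 100 on a grid of candidate p values (the grid spacing is an arbitrary choice) and compares the maximizer with the closed-form estimate x/n.

```matlab
% Binomial likelihood for x = 88 successes in n = 100 trials,
% viewed as a function of p rather than of x.
x = 88; n = 100;
p = 0.01:0.01:0.99;          % grid of candidate values for p (assumed spacing)
L = binopdf(x,n,p);          % likelihood of each candidate
[Lmax,idx] = max(L);
pgrid = p(idx)               % grid value with the highest likelihood
pclosed = x/n                % closed-form MLE of p
```

Both values agree with the phat returned by binofit in the example above.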
Example and Plot of the Binomial Distribution. The following commands generate a
plot of the binomial pdf for n = 10 and p = 1/2.
x = 0:10;
y = binopdf(x,10,0.5);
plot(x,y,'+')
Chi-Square Distribution
The following sections provide an overview of the $\chi^2$ distribution.
Background of the Chi-Square Distribution. The $\chi^2$ distribution is a special case of the
gamma distribution where b = 2 (and a = $\nu$/2, with $\nu$ the degrees of freedom) in the
equation for the gamma distribution below.
$$y = f(x \mid a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
The $\chi^2$ distribution gets special attention because of its importance in normal
sampling theory. If a set of n observations is normally distributed with
variance $\sigma^2$, and $s^2$ is the sample variance, then
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$$
The Statistics Toolbox uses the above relationship to calculate confidence
intervals for the estimate of the normal parameter $\sigma^2$ in the function normfit.
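The special-case relationship with the gamma distribution can be verified directly. This is a minimal sketch, assuming the toolbox functions chi2pdf and gampdf; the degrees of freedom and the evaluation grid are arbitrary choices made for illustration.

```matlab
% A chi-square distribution with nu degrees of freedom is a gamma
% distribution with a = nu/2 and b = 2.
nu = 4;                          % degrees of freedom (illustrative value)
x = 0.5:0.5:15;
ychi = chi2pdf(x,nu);
ygam = gampdf(x,nu/2,2);
maxdiff = max(abs(ychi - ygam))  % expected to be at rounding-error level
```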
Definition of the Chi-Square Distribution. The $\chi^2$ pdf is
$$y = f(x \mid \nu) = \frac{x^{(\nu-2)/2}\, e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)}$$
where $\Gamma(\cdot)$ is the Gamma function, and $\nu$ is the degrees of freedom.
Example and Plot of the Chi-Square Distribution. The $\chi^2$ distribution is skewed to the
right especially for few degrees of freedom ($\nu$). The plot shows the $\chi^2$
distribution with four degrees of freedom.
x = 0:0.2:15;
y = chi2pdf(x,4);
plot(x,y)
Noncentral Chi-Square Distribution
The following sections provide an overview of the noncentral $\chi^2$ distribution.
Background of the Noncentral Chi-Square Distribution. The $\chi^2$ distribution is actually
a simple special case of the noncentral chi-square distribution. One way to
generate random numbers with a $\chi^2$ distribution (with $\nu$ degrees of freedom) is
to sum the squares of $\nu$ standard normal random numbers (mean equal to zero).
What if we allow the normally distributed quantities to have a mean other than
zero? The sum of squares of these numbers yields the noncentral chi-square
distribution. The noncentral chi-square distribution requires two parameters:
the degrees of freedom and the noncentrality parameter. The noncentrality
parameter is the sum of the squared means of the normally distributed
quantities.
The noncentral chi-square has scientific application in thermodynamics and
signal processing. The literature in these areas may refer to it as the Ricean or
generalized Rayleigh distribution.
Definition of the Noncentral Chi-Square Distribution. There are many equivalent
formulas for the noncentral chi-square distribution function. One formulation
uses a modified Bessel function of the first kind. Another uses the generalized
Laguerre polynomials. The Statistics Toolbox computes the cumulative
distribution function values using a weighted sum of $\chi^2$ probabilities with the
weights equal to the probabilities of a Poisson distribution. The Poisson
parameter is one-half of the noncentrality parameter of the noncentral
chi-square.
$$F(x \mid \nu,\delta) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta\right)^{j}}{j!}\, e^{-\delta/2} \right) \Pr\left[ \chi^2_{\nu+2j} \le x \right]$$
where $\delta$ is the noncentrality parameter.
Example of the Noncentral Chi-Square Distribution. The following commands generate
a plot of the noncentral chi-square pdf with $\nu$ = 4 and $\delta$ = 2 (solid line),
together with the $\chi^2$ pdf with the same degrees of freedom (dashed line).
x = (0:0.1:10)';
p1 = ncx2pdf(x,4,2);
p = chi2pdf(x,4);
plot(x,p,'--',x,p1,'-')
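The Poisson-weighted sum in the definition can be reproduced in a few lines. The sketch below is an illustration of the formula, not the toolbox's internal code: the series is truncated at j = 100 and the values of x, nu, and delta are arbitrary choices.

```matlab
% Noncentral chi-square cdf as a Poisson-weighted sum of central
% chi-square cdfs, truncated at j = 100 (a choice made for this sketch).
x = 5; nu = 4; delta = 2;
j = 0:100;
w = poisspdf(j,delta/2);               % Poisson weights with mean delta/2
Fsum = sum(w .* chi2cdf(x,nu + 2*j))   % truncated series
Ftbx = ncx2cdf(x,nu,delta)             % toolbox value for comparison
```

With a noncentrality parameter this small the Poisson weights decay very quickly, so the truncated series should agree with ncx2cdf to many digits.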
Discrete Uniform Distribution
The following sections provide an overview of the discrete uniform distribution.
Background of the Discrete Uniform Distribution. The discrete uniform distribution is
a simple distribution that puts equal weight on the integers from one to N.
Definition of the Discrete Uniform Distribution. The discrete uniform pdf is
$$y = f(x \mid N) = \frac{1}{N}\, I_{(1,\ldots,N)}(x)$$
Example and Plot of the Discrete Uniform Distribution. As for all discrete distributions,
the cdf is a step function. The plot shows the discrete uniform cdf for N = 10.
x = 0:10;
y = unidcdf(x,10);
stairs(x,y)
set(gca,'Xlim',[0 11])
To pick a random sample of 10 from a list of 553 items (sampling with
replacement, so repeated item numbers are possible):
numbers = unidrnd(553,1,10)
numbers =
293 372 5 213 37 231 380 326 515 468
Exponential Distribution
The following sections provide an overview of the exponential distribution.
Background of the Exponential Distribution. Like the chi-square distribution, the
exponential distribution is a special case of the gamma distribution (obtained
by setting a = 1)
$$y = f(x \mid a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
where $\Gamma(\cdot)$ is the Gamma function.
The exponential distribution is special because of its utility in modeling events
that occur randomly over time. The main application area is in studies of
lifetimes.
Definition of the Exponential Distribution. The exponential pdf is
$$y = f(x \mid \mu) = \frac{1}{\mu}\, e^{-x/\mu}$$
Parameter Estimation for the Exponential Distribution. Suppose you are stress testing
light bulbs and collecting data on their lifetimes. You assume that these
lifetimes follow an exponential distribution. You want to know how long you
can expect the average light bulb to last. Parameter estimation is the process
of determining the parameters of the exponential distribution that fit this data
best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The
likelihood has the same form as the exponential pdf above. But for the pdf, the
parameters are known constants and the variable is x. The likelihood function
reverses the roles of the variables. Here, the sample values (the x's) are already
observed, so they are the fixed constants. The variables are the unknown
parameters. MLE involves calculating the values of the parameters that give
the highest likelihood given the particular set of data.
The function expfit returns the MLEs and confidence intervals for the
parameters of the exponential distribution. Here is an example using random
numbers from the exponential distribution with $\mu$ = 700.
lifetimes = exprnd(700,100,1);
[muhat, muci] = expfit(lifetimes)
muhat =
672.8207
muci =
547.4338
810.9437
The MLE for parameter $\mu$ is approximately 673, compared to the true value of 700. The 95%
confidence interval for $\mu$ goes from 547 to 811, which includes the true value.
In our life tests we do not know the true value of $\mu$ so it is nice to have a
confidence interval on the parameter to give a range of likely values.
Example and Plot of the Exponential Distribution. For exponentially distributed
lifetimes, the probability that an item will survive an extra unit of time is
independent of the current age of the item. The example shows a specific case
of this special property.
l = 10:10:60;
lpd = l+0.1;
deltap = (expcdf(lpd,50)-expcdf(l,50))./(1-expcdf(l,50))
deltap =
0.0020 0.0020 0.0020 0.0020 0.0020 0.0020
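The constant value has a simple closed form. For an exponential lifetime with mean mu, the probability of failing within the next dt time units, given survival to age l, is 1 - exp(-dt/mu), whatever the age l. The sketch below repeats the computation above and compares it with that expression; the variable names dt, closedform, and maxdiff are introduced here for illustration.

```matlab
% Memoryless property: the conditional failure probability over the
% next dt time units does not depend on the current age l.
mu = 50; dt = 0.1;
l = 10:10:60;
deltap = (expcdf(l+dt,mu) - expcdf(l,mu)) ./ (1 - expcdf(l,mu));
closedform = 1 - exp(-dt/mu);             % same value for every age
maxdiff = max(abs(deltap - closedform))   % expected near rounding error
```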
The plot below shows the exponential pdf with its parameter (and mean), $\mu$, set
to 2.
x = 0:0.1:10;
y = exppdf(x,2);
plot(x,y)
F Distribution
The following sections provide an overview of the F distribution.
Background of the F distribution. The F distribution has a natural relationship with
the chi-square distribution. If $\chi_1$ and $\chi_2$ are independent and both chi-square
with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then the statistic F below is
F distributed.
$$F(\nu_1,\nu_2) = \frac{\chi_1/\nu_1}{\chi_2/\nu_2}$$
The two parameters, $\nu_1$ and $\nu_2$, are the numerator and denominator degrees of
freedom. That is, $\nu_1$ and $\nu_2$ are the number of independent pieces of information
used to calculate $\chi_1$ and $\chi_2$ respectively.
Definition of the F distribution. The pdf for the F distribution is
$$y = f(x \mid \nu_1,\nu_2) = \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{x^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)x\right]^{\frac{\nu_1+\nu_2}{2}}}$$
where $\Gamma(\cdot)$ is the Gamma function.
Example and Plot of the F distribution. The most common application of the F
distribution is in standard tests of hypotheses in analysis of variance and
regression.
The plot shows that the F distribution exists on the positive real numbers and
is skewed to the right.
x = 0:0.01:10;
y = fpdf(x,5,3);
plot(x,y)
Noncentral F Distribution
The following sections provide an overview of the noncentral F distribution.
Background of the Noncentral F Distribution. As with the $\chi^2$ distribution, the
F distribution is a special case of the noncentral F distribution. The
F distribution is the result of taking the ratio of two $\chi^2$ random variables each
divided by its degrees of freedom.
If the numerator of the ratio is a noncentral chi-square random variable
divided by its degrees of freedom, the resulting distribution is the noncentral
F distribution.
The main application of the noncentral F distribution is to calculate the power
of a hypothesis test relative to a particular alternative.
Definition of the Noncentral F Distribution. Similar to the noncentral $\chi^2$ distribution,
the toolbox calculates noncentral F distribution probabilities as a weighted
sum of incomplete beta functions using Poisson probabilities as the weights.
$$F(x \mid \nu_1,\nu_2,\delta) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta\right)^{j}}{j!}\, e^{-\delta/2} \right) I\left( \frac{\nu_1 x}{\nu_2 + \nu_1 x} \,\Big|\, \frac{\nu_1}{2}+j,\ \frac{\nu_2}{2} \right)$$
I(x|a,b) is the incomplete beta function with parameters a and b, and $\delta$ is the
noncentrality parameter.
Example and Plot of the Noncentral F Distribution. The following commands generate
a plot of the noncentral F pdf with $\nu_1$ = 5, $\nu_2$ = 20, and $\delta$ = 10 (solid line),
together with the F pdf with the same degrees of freedom (dashed line).
x = (0.01:0.1:10.01)';
p1 = ncfpdf(x,5,20,10);
p = fpdf(x,5,20);
plot(x,p,'--',x,p1,'-')
Gamma Distribution
The following sections provide an overview of the gamma distribution.
Background of the Gamma Distribution. The gamma distribution is a family of
curves based on two parameters. The chi-square and exponential distributions,
which are children of the gamma distribution, are one-parameter distributions
that fix one of the two gamma parameters.
The gamma distribution has the following relationship with the incomplete
Gamma function, where $\Gamma(x \mid a,b)$ denotes the gamma cdf.
$$\Gamma(x \mid a,b) = \mathrm{gammainc}\left(\frac{x}{b},\, a\right)$$
For b = 1 the functions are identical.
When a is large, the gamma distribution closely approximates a normal
distribution with the advantage that the gamma distribution has density only
for positive real numbers.
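The relationship with the incomplete Gamma function is easy to confirm numerically. This is a minimal sketch, assuming the toolbox function gamcdf and the MATLAB function gammainc; the parameter values and grid are arbitrary.

```matlab
% Gamma cdf with parameters a and b versus the incomplete Gamma
% function evaluated at x/b.
x = 0.5:0.5:10; a = 3; b = 2;     % illustrative values
Fgam = gamcdf(x,a,b);
Finc = gammainc(x./b,a);
maxdiff = max(abs(Fgam - Finc))   % expected to be at rounding-error level
```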
Definition of the Gamma Distribution. The gamma pdf is
$$y = f(x \mid a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
where $\Gamma(\cdot)$ is the Gamma function.
Parameter Estimation for the Gamma Distribution. Suppose you are stress testing
computer memory chips and collecting data on their lifetimes. You assume that
these lifetimes follow a gamma distribution. You want to know how long you
can expect the average computer memory chip to last. Parameter estimation is
the process of determining the parameters of the gamma distribution that fit
this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The
likelihood has the same form as the gamma pdf above. But for the pdf, the
parameters are known constants and the variable is x. The likelihood function
reverses the roles of the variables. Here, the sample values (the x's) are already
observed, so they are the fixed constants. The variables are the unknown
parameters. MLE involves calculating the values of the parameters that give
the highest likelihood given the particular set of data.
The function gamfit returns the MLEs and confidence intervals for the
parameters of the gamma distribution. Here is an example using random
numbers from the gamma distribution with a = 10 and b = 5.
lifetimes = gamrnd(10,5,100,1);
[phat, pci] = gamfit(lifetimes)
phat =
10.9821 4.7258
pci =
7.4001 3.1543
14.5640 6.2974
Note phat(1) = $\hat{a}$ and phat(2) = $\hat{b}$. The MLE for parameter a is 10.98,
compared to the true value of 10. The 95% confidence interval for a goes from
7.4 to 14.6, which includes the true value.
Similarly the MLE for parameter b is 4.7, compared to the true value of 5. The
95% confidence interval for b goes from 3.2 to 6.3, which also includes the true
value.
In our life tests we do not know the true value of a and b so it is nice to have a
confidence interval on the parameters to give a range of likely values.
Example and Plot of the Gamma Distribution. In the example the gamma pdf (a = 100, b = 10) is
plotted with the solid line. The normal pdf has a dash-dot line type.
x = gaminv((0.005:0.01:0.995),100,10);
y = gampdf(x,100,10);
y1 = normpdf(x,1000,100);
plot(x,y,'-',x,y1,'-.')
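The normal parameters used in the comparison are the mean and standard deviation of the gamma distribution itself, a*b and sqrt(a)*b. The sketch below computes them and measures the largest gap between the two densities on the plotted grid; the names gmean, gstd, and maxgap are introduced here for illustration.

```matlab
% Moment matching: the normal curve shares the mean and standard
% deviation of the gamma distribution with a = 100 and b = 10.
a = 100; b = 10;
gmean = a*b            % gamma mean
gstd = sqrt(a)*b       % gamma standard deviation
x = gaminv((0.005:0.01:0.995),a,b);
maxgap = max(abs(gampdf(x,a,b) - normpdf(x,gmean,gstd)))
```

The gap is small relative to the peak density of about 0.004, which is why the two curves in the plot nearly coincide.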
Geometric Distribution
The following sections provide an overview of the geometric distribution.
Background of the Geometric Distribution. The geometric distribution is discrete,
existing only on the nonnegative integers. It is useful for modeling the runs of
consecutive successes (or failures) in repeated independent trials of a system.
The geometric distribution models the number of successes before one failure
in an independent succession of tests where each test results in success or
failure.
Definition of the Geometric Distribution. The geometric pdf is
$$y = f(x \mid p) = p\, q^{x}\, I_{(0,1,\ldots)}(x)$$
where q = 1 - p.
Example and Plot of the Geometric Distribution. Suppose the probability of a
five-year-old battery failing in cold weather is 0.03. What is the probability of
starting 25 consecutive days during a long cold snap?
1 - geocdf(25,0.03)
ans =
0.4530
Strictly, 1 - geocdf(25,0.03) is the probability of more than 25 consecutive
starts, $q^{26}$. The probability of at least 25 consecutive starts is
1 - geocdf(24,0.03) = $q^{25}$, which is approximately 0.4670.
The plot shows the cdf for this scenario.
x = 0:25;
y = geocdf(x,0.03);
stairs(x,y)
Hypergeometric Distribution
The following sections provide an overview of the hypergeometric distribution.
Background of the Hypergeometric Distribution. The hypergeometric distribution
models the total number of successes in a fixed size sample drawn without
replacement from a finite population.
The distribution is discrete, existing only for nonnegative integers no greater than the
number of samples or the number of possible successes, whichever is smaller.
The hypergeometric distribution differs from the binomial only in that the
population is finite and the sampling from the population is without
replacement.
The hypergeometric distribution has three parameters that have direct
physical interpretations. M is the size of the population. K is the number of
items with the desired characteristic in the population. n is the number of
samples drawn. Sampling without replacement means that once a particular
sample is chosen, it is removed from the relevant population for all subsequent
selections.
Definition of the Hypergeometric Distribution. The hypergeometric pdf is
$$y = f(x \mid M,K,n) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}$$
Example and Plot of the Hypergeometric Distribution. The plot shows the cdf of an
experiment taking 20 samples from a group of 1000 where there are 50 items
of the desired type.
x = 0:10;
y = hygecdf(x,1000,50,20);
stairs(x,y)
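The hypergeometric pdf can be evaluated straight from its definition with binomial coefficients. The sketch below uses small, made-up values of M, K, and n so that nchoosek stays exact, and compares the result with the toolbox function hygepdf.

```matlab
% Hypergeometric pdf from binomial coefficients.
M = 20; K = 5; n = 6;     % population, items of desired type, sample size (assumed)
x = 2;                    % number of desired items in the sample
ytbx = hygepdf(x,M,K,n)
ydef = nchoosek(K,x) * nchoosek(M-K,n-x) / nchoosek(M,n)
```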
Lognormal Distribution
The following sections provide an overview of the lognormal distribution.
Background of the Lognormal Distribution. The normal and lognormal distributions
are closely related. If X is distributed lognormal with parameters $\mu$ and $\sigma^2$, then
ln X is distributed normal with parameters $\mu$ and $\sigma^2$.
The lognormal distribution is applicable when the quantity of interest must be
positive, since ln X exists only when the random variable X is positive.
Economists often model the distribution of income using a lognormal
distribution.
Definition of the Lognormal Distribution. The lognormal pdf is
$$y = f(x \mid \mu,\sigma) = \frac{1}{x \sigma \sqrt{2\pi}}\, e^{\frac{-(\ln x - \mu)^2}{2\sigma^2}}$$
Example and Plot of the Lognormal Distribution. Suppose the income of a family of
four in the United States follows a lognormal distribution with $\mu$ = log(20,000)
and $\sigma^2$ = 1.0. Plot the income density.
x = (10:1000:125010)';
y = lognpdf(x,log(20000),1.0);
plot(x,y)
set(gca,'xtick',[0 30000 60000 90000 120000])
set(gca,'xticklabel',str2mat('0','$30,000','$60,000',...
'$90,000','$120,000'))
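Because ln X is normal, the lognormal density follows from the normal density by a change of variables: f(x) equals the normal pdf evaluated at ln x, divided by x. The sketch below checks this on the income grid from the example; the names ylogn, ynorm, and maxrel are introduced here for illustration.

```matlab
% Lognormal pdf from the normal pdf by a change of variables.
mu = log(20000); sigma = 1.0;
x = (10:1000:125010)';
ylogn = lognpdf(x,mu,sigma);
ynorm = normpdf(log(x),mu,sigma) ./ x;        % normal density of ln(x), scaled by 1/x
maxrel = max(abs(ylogn - ynorm) ./ ylogn)     % largest relative difference
```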
Negative Binomial Distribution
The following sections provide an overview of the negative binomial
distribution.
Background of the Negative Binomial Distribution. The geometric distribution is a
special case of the negative binomial distribution (also called the Pascal
distribution). The geometric distribution models the number of successes
before one failure in an independent succession of tests where each test results
in success or failure.
In the negative binomial distribution the number of failures is a parameter of
the distribution. The parameters are the probability of success, p, and the
number of failures, r. In the pdf below, x counts the trials whose outcome has
probability q = 1 - p that occur before the r-th trial whose outcome has
probability p.
Definition of the Negative Binomial Distribution. The negative binomial pdf is
$$y = f(x \mid r,p) = \binom{r+x-1}{x} p^{r} q^{x}\, I_{(0,1,\ldots)}(x)$$
where $q = 1 - p$.
Example and Plot of the Negative Binomial Distribution. The following commands
generate a plot of the negative binomial pdf.
x = (0:10);
y = nbinpdf(x,3,0.5);
plot(x,y,'+')
set(gca,'XLim',[-0.5,10.5])
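Setting r = 1 in the negative binomial pdf leaves p times q to the power x, which is the geometric pdf. The sketch below confirms the special case numerically, assuming the toolbox functions nbinpdf and geopdf; the value of p is arbitrary.

```matlab
% The geometric distribution is the negative binomial with r = 1.
p = 0.5; x = 0:10;
ynb = nbinpdf(x,1,p);
ygeo = geopdf(x,p);
maxdiff = max(abs(ynb - ygeo))   % expected to be at rounding-error level
```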
Normal Distribution
The following sections provide an overview of the normal distribution.
Background of the Normal Distribution. The normal distribution is a two parameter
family of curves. The first parameter, $\mu$, is the mean. The second, $\sigma$, is the
standard deviation. The standard normal distribution (written $\Phi(x)$) sets $\mu$ to 0
and $\sigma$ to 1.
$\Phi(x)$ is functionally related to the error function, erf.
$$\mathrm{erf}(x) = 2\Phi(x\sqrt{2}) - 1$$
The first use of the normal distribution was as a continuous approximation to
the binomial.
The usual justification for using the normal distribution for modeling is the
Central Limit Theorem, which states (roughly) that the sum of independent
samples from any distribution with finite mean and variance, suitably
standardized, converges to the normal distribution as the sample size goes to
infinity.
Definition of the Normal Distribution. The normal pdf is
$$y = f(x \mid \mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$
Parameter Estimation for the Normal Distribution. One of the first applications of the
normal distribution in data analysis was modeling the height of school
children. Suppose we want to estimate the mean, $\mu$, and the variance, $\sigma^2$, of the
heights of all the 4th graders in the United States.
We have already introduced MLEs. Another desirable criterion in a statistical
estimator is unbiasedness. A statistic is unbiased if the expected value of the
statistic is equal to the parameter being estimated. MLEs are not always
unbiased. For any data sample, there may be more than one unbiased
estimator of the parameters of the parent distribution of the sample. For
instance, every sample value is an unbiased estimate of the parameter $\mu$ of a
normal distribution. The Minimum Variance Unbiased Estimator (MVUE) is
the statistic that has the minimum variance of all unbiased estimators of a
parameter.
The MVUEs of parameters $\mu$ and $\sigma^2$ for the normal distribution are the sample
average and variance. The sample average is also the MLE for $\mu$. There are two
common textbook formulas for the variance.
They are
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{1}$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{2}$$
where
$$\bar{x} = \sum_{i=1}^{n}\frac{x_i}{n}$$
Equation 1 is the maximum likelihood estimator for $\sigma^2$, and equation 2 is the
MVUE.
The function normfit returns the MVUE of $\mu$, the square root of the MVUE of
$\sigma^2$, and confidence intervals for $\mu$ and $\sigma$. Here is a playful example modeling
the heights (inches) of a randomly chosen 4th grade class.
height = normrnd(50,2,30,1); % Simulate heights.
[mu,s,muci,sci] = normfit(height)
mu =
50.2025
s =
1.7946
muci =
49.5210
50.8841
sci =
1.4292
2.4125
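The two variance formulas differ only in the divisor. The sketch below applies both to a small made-up sample (not data from this guide) and notes the equivalent calls to the MATLAB function var.

```matlab
% Two textbook variance estimates for the same sample.
x = [49 51 50 52 48];               % made-up heights, for illustration only
n = length(x);
xbar = sum(x)/n;                    % sample average
s2mle  = sum((x - xbar).^2)/n       % equation 1 (MLE), same as var(x,1)
s2mvue = sum((x - xbar).^2)/(n-1)   % equation 2 (MVUE), same as var(x)
```

For this sample the sum of squared deviations is 10, so the two estimates are 2 and 2.5. The difference shrinks as n grows.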
Example and Plot of the Normal Distribution. The plot shows the bell curve of the
standard normal pdf, with $\mu$ = 0 and $\sigma$ = 1.
Poisson Distribution
The following sections provide an overview of the Poisson distribution.
Background of the Poisson Distribution. The Poisson distribution is appropriate for
applications that involve counting the number of times a random event occurs
in a given amount of time, distance, area, etc. Sample applications that involve
Poisson distributions include the number of Geiger counter clicks per second,
the number of people walking into a store in an hour, and the number of flaws
per 1000 feet of video tape.
The Poisson distribution is a one parameter discrete distribution that takes
nonnegative integer values. The parameter, $\lambda$, is both the mean and the
variance of the distribution. Thus, as the size of the numbers in a particular
sample of Poisson random numbers gets larger, so does the variability of the
numbers.
As Poisson (1837) showed, the Poisson distribution is the limiting case of a
binomial distribution where N approaches infinity and p goes to zero while
Np = $\lambda$.
The Poisson and exponential distributions are related. If the number of counts
follows the Poisson distribution, then the interval between individual counts
follows the exponential distribution.
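The limiting relationship with the binomial distribution can be seen numerically. The sketch below holds Np fixed at lambda = 5 and increases N; the particular values of N are arbitrary choices, and err is a name introduced here for the largest pointwise gap between the two pdfs.

```matlab
% Binomial pdf with N*p = lambda approaches the Poisson pdf as N grows.
lambda = 5; x = 0:15;
ypois = poisspdf(x,lambda);
N = [20 200 2000];
err = zeros(size(N));
for k = 1:length(N)
    ybino = binopdf(x,N(k),lambda/N(k));   % p = lambda/N
    err(k) = max(abs(ybino - ypois));      % largest pointwise gap
end
err
```

The gap shrinks roughly in proportion to 1/N.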
Definition of the Poisson Distribution. The Poisson pdf is
$$y = f(x \mid \lambda) = \frac{\lambda^{x}}{x!}\, e^{-\lambda}\, I_{(0,1,\ldots)}(x)$$
Parameter Estimation for the Poisson Distribution. The MLE and the MVUE of the
Poisson parameter, $\lambda$, is the sample mean. The sum of independent Poisson
random variables is also Poisson distributed with the parameter equal to the
sum of the individual parameters. The Statistics Toolbox makes use of this fact
to calculate confidence intervals on $\lambda$. As $\lambda$ gets large the Poisson distribution
can be approximated by a normal distribution with $\mu = \lambda$ and $\sigma^2 = \lambda$. The
Statistics Toolbox uses this approximation for calculating confidence intervals
for values of $\lambda$ greater than 100.
Example and Plot of the Poisson Distribution. The plot shows the probability for each
nonnegative integer when $\lambda$ = 5.
x = 0:15;
y = poisspdf(x,5);
plot(x,y,'+')
Rayleigh Distribution
The following sections provide an overview of the Rayleigh distribution.
Background of the Rayleigh Distribution. The Rayleigh distribution is a special case
of the Weibull distribution. If A and B are the parameters of the Weibull
distribution, then the Rayleigh distribution with parameter b is equivalent to
the Weibull distribution with parameters $A = 1/(2b^2)$ and $B = 2$.
If the component velocities of a particle in the x and y directions are two
independent normal random variables with zero means and equal variances,
then the distance the particle travels per unit time is distributed Rayleigh.
Definition of the Rayleigh Distribution. The Rayleigh pdf is
$$y = f(x \mid b) = \frac{x}{b^2}\, e^{\left(\frac{-x^2}{2b^2}\right)}$$
Parameter Estimation for the Rayleigh Distribution. The raylfit function returns the
MLE of the Rayleigh parameter. This estimate is
$$\hat{b} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} x_i^2}$$
Example and Plot of the Rayleigh Distribution. The following commands generate a
plot of the Rayleigh pdf.
x = [0:0.01:2];
p = raylpdf(x,0.5);
plot(x,p)
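The equivalence with the Weibull distribution can be checked with the toolbox pdfs. This sketch assumes the Weibull parameterization used in this guide, in which weibpdf(x,a,b) evaluates a*b*x^(b-1)*exp(-a*x^b); other software may parameterize the Weibull differently.

```matlab
% Rayleigh with parameter b equals Weibull with A = 1/(2*b^2) and B = 2.
b = 0.5;
x = 0.01:0.01:2;
yrayl = raylpdf(x,b);
yweib = weibpdf(x,1/(2*b^2),2);
maxdiff = max(abs(yrayl - yweib))   % expected to be at rounding-error level
```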
Student's t Distribution
The following sections provide an overview of Student's t distribution.
Background of Student's t Distribution. The t distribution is a family of curves
depending on a single parameter $\nu$ (the degrees of freedom). As $\nu$ goes to
infinity, the t distribution converges to the standard normal distribution.
W. S. Gosset (1908) discovered the distribution through his work at the
Guinness brewery. At that time, Guinness did not allow its staff to publish, so
Gosset used the pseudonym Student.
If $\bar{x}$ and s are the mean and standard deviation of an independent random
sample of size n from a normal distribution with mean $\mu$ and $\sigma^2 = n$, then
$$t(\nu) = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \nu = n - 1$$
Definition of Student's t Distribution. Student's t pdf is
$$y = f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\, \frac{1}{\sqrt{\nu\pi}}\, \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}$$
where $\Gamma(\cdot)$ is the Gamma function.
Example and Plot of Student's t Distribution. The plot compares the t distribution
with $\nu$ = 5 (solid line) to the shorter tailed, standard normal distribution
(dashed line).
x = -5:0.1:5;
y = tpdf(x,5);
z = normpdf(x,0,1);
plot(x,y,'-',x,z,'-.')
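The convergence to the standard normal distribution as the degrees of freedom grow can be observed directly. The sketch below measures the largest gap between the t pdf and the standard normal pdf for three arbitrary choices of degrees of freedom; gap is a name introduced here.

```matlab
% The t pdf approaches the standard normal pdf as nu increases.
x = -5:0.1:5;
z = normpdf(x,0,1);
nu = [5 50 500];                           % illustrative degrees of freedom
gap = zeros(size(nu));
for k = 1:length(nu)
    gap(k) = max(abs(tpdf(x,nu(k)) - z));  % largest pointwise gap
end
gap
```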
Noncentral t Distribution
The following sections provide an overview of the noncentral t distribution.
Background of the Noncentral t Distribution. The noncentral t distribution is a
generalization of the familiar Student's t distribution.
If $\bar{x}$ and s are the mean and standard deviation of an independent random
sample of size n from a normal distribution with mean $\mu$ and $\sigma^2 = n$, then
$$t(\nu) = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \nu = n - 1$$
Suppose that the mean of the normal distribution is not $\mu$. Then the ratio has
the noncentral t distribution. The noncentrality parameter is the difference
between the true mean of the distribution and $\mu$.
The noncentral t distribution allows us to determine the probability that we
would detect a difference between $\bar{x}$ and $\mu$ in a t test. This probability is the
power of the test. As $\bar{x} - \mu$ increases, the power of a test also increases.
Definition of the Noncentral t Distribution. The most general representation of the
noncentral t distribution is quite complicated. Johnson and Kotz (1970) give a
formula for the probability that a noncentral t variate falls in the range [-t, t].
$$\Pr\big((-t) < x < t \mid (\nu,\delta)\big) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta^2\right)^{j}}{j!}\, e^{-\delta^2/2} \right) I\left( \frac{t^2}{\nu + t^2} \,\Big|\, \frac{1}{2}+j,\ \frac{\nu}{2} \right)$$
I(x|a,b) is the incomplete beta function with parameters a and b, $\delta$ is the
noncentrality parameter, and $\nu$ is the degrees of freedom.
Example and Plot of the Noncentral t Distribution. The following commands generate
a plot of the noncentral t cdf with $\nu$ = 10 and $\delta$ = 1 (solid line), together with
the t cdf with the same degrees of freedom (dashed line).
x = (-5:0.1:5)';
p1 = nctcdf(x,10,1);
p = tcdf(x,10);
plot(x,p,'--',x,p1,'-')
Uniform (Continuous) Distribution
The following sections provide an overview of the uniform distribution.
Background of the Uniform Distribution. The uniform distribution (also called
rectangular) has a constant pdf between its two parameters a (the minimum)
and b (the maximum). The standard uniform distribution (a = 0 and b = 1) is a
special case of the beta distribution, obtained by setting both of its parameters
to 1.
The uniform distribution is appropriate for representing the distribution of
round-off errors in values tabulated to a particular number of decimal places.
Definition of the Uniform Distribution. The uniform cdf is
$$p = F(x \mid a,b) = \frac{x-a}{b-a}\, I_{[a,b]}(x)$$
This expression applies for x between a and b. The cdf is 0 for x below a and 1 for
x above b.
Parameter Estimation for the Uniform Distribution. The sample minimum and
maximum are the MLEs of a and b respectively.
Example and Plot of the Uniform Distribution. The example illustrates the inversion
method for generating normal random numbers using rand and norminv. Note
that the MATLAB function, randn, does not use inversion since it is not
efficient for this case.
u = rand(1000,1);
x = norminv(u,0,1);
hist(x)
Weibull Distribution
The following sections provide an overview of the Weibull distribution.
Background of the Weibull Distribution. Waloddi Weibull (1939) offered the
distribution that bears his name as an appropriate analytical tool for modeling
the breaking strength of materials. Current usage also includes reliability and
lifetime modeling. The Weibull distribution is more flexible than the
exponential for these purposes.
To see why, consider the hazard rate function (instantaneous failure rate). If
f(t) and F(t) are the pdf and cdf of a distribution, then the hazard rate is
$$h(t) = \frac{f(t)}{1 - F(t)}$$
Substituting the pdf and cdf of the exponential distribution for f(t) and F(t)
above yields the constant $1/\mu$. The example below shows that the hazard rate for the
Weibull distribution can vary.
Definition of the Weibull Distribution. The Weibull pdf is
$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^{b}}\, I_{(0,\infty)}(x)$$
Parameter Estimation for the Weibull Distribution. Suppose we want to model the
tensile strength of a thin filament using the Weibull distribution. The function
weibfit gives MLEs and confidence intervals for the Weibull parameters.
strength = weibrnd(0.5,2,100,1); % Simulated strengths.
[p,ci] = weibfit(strength)
p =
0.4746 1.9582
ci =
0.3851 1.6598
0.5641 2.2565
The default 95% confidence interval for each parameter contains the true
value.
Example and Plot of the Weibull Distribution. The exponential distribution has a
constant hazard function, which is not generally the case for the Weibull
distribution.
The plot shows the hazard functions for exponential (dashed line) and Weibull
(solid line) distributions having the same mean life. The Weibull hazard rate
here increases with age (a reasonable assumption).
t = 0:0.1:3;
h1 = exppdf(t,0.6267) ./ (1-expcdf(t,0.6267));
h2 = weibpdf(t,2,2) ./ (1-weibcdf(t,2,2));
plot(t,h1,'--',t,h2,'-')
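The two hazard curves in this example can also be worked out in closed form. The step below assumes the exponential distribution is parameterized by its mean µ, so that f(t) = (1/µ)e^(-t/µ), and uses the Weibull pdf f(t) = a b t^(b-1) e^(-a t^b), whose cdf is F(t) = 1 - e^(-a t^b).

```latex
\begin{aligned}
\text{Exponential:}\quad h(t) &= \frac{\tfrac{1}{\mu}\, e^{-t/\mu}}{1 - \left(1 - e^{-t/\mu}\right)} = \frac{1}{\mu} \\
\text{Weibull:}\quad h(t) &= \frac{a b\, t^{b-1} e^{-a t^{b}}}{1 - \left(1 - e^{-a t^{b}}\right)} = a b\, t^{b-1}
\end{aligned}
```

With a = 2 and b = 2 the Weibull hazard is 4t, which grows linearly with age, while the exponential hazard with mean 0.6267 stays constant at about 1.6.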
Descriptive Statistics
Data samples can have thousands (even millions) of values. Descriptive
statistics are a way to summarize this data into a few numbers that contain
most of the relevant information. The following sections explore the features
provided by the Statistics Toolbox for working with descriptive statistics:
• Measures of Central Tendency (Location)
• Measures of Dispersion
• Functions for Data with Missing Values (NaNs)
• Function for Grouped Data
• Percentiles and Graphical Descriptions
• The Bootstrap
Measures of Central Tendency (Location)
The purpose of measures of central tendency is to locate the data values on the
number line. Another term for these statistics is measures of location.
The table gives the function names and descriptions.
Measures of Location
geomean     Geometric mean
harmmean    Harmonic mean
mean        Arithmetic average (in MATLAB)
median      50th percentile (in MATLAB)
trimmean    Trimmed mean
The average is a simple and popular estimate of location. If the data sample
comes from a normal distribution, then the sample average is also optimal
(MVUE of µ).
Unfortunately, outliers, data entry errors, or glitches exist in almost all real
data. The sample average is sensitive to these problems. One bad data value
can move the average away from the center of the rest of the data by an
arbitrarily large distance.
The median and trimmed mean are two measures that are resistant (robust) to
outliers. The median is the 50th percentile of the sample, which will only
change slightly if you add a large perturbation to any value. The idea behind
the trimmed mean is to ignore a small percentage of the highest and lowest
values of a sample when determining the center of the sample.
The geometric mean and harmonic mean, like the average, are not robust to
outliers. They are useful when the sample is distributed lognormal or heavily
skewed.
The example below shows the behavior of the measures of location for a sample
with one outlier.
x = [ones(1,6) 100]
x =
1 1 1 1 1 1 100
locate = [geomean(x) harmmean(x) mean(x) median(x)...
trimmean(x,25)]
locate =
1.9307 1.1647 15.1429 1.0000 1.0000
You can see that the mean is far from any data value because of the influence
of the outlier. The median and trimmed mean ignore the outlying value and
describe the location of the rest of the data values.
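The geometric and harmonic means in the example above can be checked with base MATLAB alone. This is a sketch using the standard textbook definitions, not a listing of how geomean and harmmean are implemented.

```matlab
% Sample with one outlier, as in the example above.
x = [ones(1,6) 100];

% Geometric mean: exponential of the average log value.
g = exp(mean(log(x)));

% Harmonic mean: number of values divided by the sum of reciprocals.
h = numel(x) / sum(1 ./ x);
```

Both values agree with the first two entries of locate (1.9307 and 1.1647). The log and reciprocal transforms damp the outlier, so these means move far less than the arithmetic average does.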
Measures of Dispersion
The purpose of measures of dispersion is to find out how spread out the data
values are on the number line. Another term for these statistics is measures of
spread.
The table gives the function names and descriptions.
Measures of Dispersion
iqr      Interquartile Range
mad      Mean Absolute Deviation
range    Range
std      Standard deviation (in MATLAB)
var      Variance (in MATLAB)
The range (the difference between the maximum and minimum values) is the
simplest measure of spread. But if there is an outlier in the data, it will be the
minimum or maximum value. Thus, the range is not robust to outliers.
The standard deviation and the variance are popular measures of spread that
are optimal for normally distributed samples. The sample variance is the
MVUE of the normal parameter σ². The standard deviation is the square root
of the variance and has the desirable property of being in the same units as the
data. That is, if the data is in meters, the standard deviation is in meters as
well. The variance is in meters², which is more difficult to interpret.
Neither the standard deviation nor the variance is robust to outliers. A data
value that is separate from the body of the data can increase the value of the
statistics by an arbitrarily large amount.
The Mean Absolute Deviation (MAD) is also sensitive to outliers. But the MAD
does not move quite as much as the standard deviation or variance in response
to bad data.
The Interquartile Range (IQR) is the difference between the 75th and 25th
percentile of the data. Since only the middle 50% of the data affects this
measure, it is robust to outliers.
The example below shows the behavior of the measures of dispersion for a
sample with one outlier.
x = [ones(1,6) 100]
x =
1 1 1 1 1 1 100
stats = [iqr(x) mad(x) range(x) std(x)]
stats =
0 24.2449 99.0000 37.4185
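Three of these values can be reproduced from their definitions with base MATLAB. This is a sketch of the arithmetic only; the variable names are illustrative.

```matlab
% Sample with one outlier, as in the example above.
x = [ones(1,6) 100];

r    = max(x) - min(x);                % range
madx = mean(abs(x - mean(x)));         % mean absolute deviation
sx   = sqrt(sum((x - mean(x)).^2) / (numel(x) - 1));   % sample standard deviation
```

The results match the mad, range, and std entries of stats: 24.2449, 99, and 37.4185. The squared deviations in sx give the outlier more weight than the absolute deviations in madx, which is why the standard deviation reacts more strongly.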
Functions for Data with Missing Values (NaNs)
Most real-world data sets have one or more missing elements. It is convenient
to code missing entries in a matrix as NaN (Not a Number).
Here is a simple example.
m = magic(3);
m([1 5]) = [NaN NaN]
m =
NaN 1 6
3 NaN 7
4 9 2
Any arithmetic operation that involves the missing values in this matrix yields
NaN, as below.
sum(m)
ans =
NaN NaN 15
Removing cells with NaN would destroy the matrix structure. Removing whole
rows that contain NaN would discard real data. Instead, the Statistics Toolbox
has a variety of functions that are similar to other MATLAB functions, but that
treat NaN values as missing and therefore ignore them in the calculations.
nansum(m)
ans =
7 10 13
NaN Functions
nanmax       Maximum ignoring NaNs
nanmean      Mean ignoring NaNs
nanmedian    Median ignoring NaNs
nanmin       Minimum ignoring NaNs
nanstd       Standard deviation ignoring NaNs
nansum       Sum ignoring NaNs
In addition, other Statistics Toolbox functions operate only on the numeric
values, ignoring NaNs. These include iqr, kurtosis, mad, prctile, range,
skewness, and trimmean.
Function for Grouped Data
As we saw in the previous section, the descriptive statistics functions can
compute statistics on each column in a matrix. Sometimes, however, you may
have your data arranged differently so that measurements appear in one
column or variable, and a grouping code appears in a second column or
variable. Although MATLAB's syntax makes it simple to apply functions to a
subset of an array, in this case it is simpler to use the grpstats function.
The grpstats function can compute the mean, standard error of the mean, and
count (number of observations) for each group defined by one or more grouping
variables. If you supply a significance level, it also creates a graph of the group
means with confidence intervals.
As an example, load the larger car data set. We can look at the average value
of MPG (miles per gallon) for cars grouped by org (location of the origin of the
car).
load carbig
grpstats(MPG,org,0.05)
ans =
20.084
27.891
30.451
[Figure: Means and Confidence Intervals for Each Group. Horizontal axis: Group (USA, Europe, Japan); vertical axis: Mean.]
We can also get the complete set of statistics for MPG grouped by three variables:
org, cyl4 (the engine has four cylinders or not), and when (when the car was
made).
[m,s,c,n] = grpstats(MPG,{org cyl4 when});
[n num2cell([m s c])]
ans =
'USA'    'Other' 'Early' [14.896] [0.33306] [77]
'USA'    'Other' 'Mid'   [17.479] [0.30225] [75]
'USA'    'Other' 'Late'  [21.536] [0.97961] [25]
'USA'    'Four'  'Early' [23.333] [0.87328] [12]
'USA'    'Four'  'Mid'   [27.027] [0.75456] [22]
'USA'    'Four'  'Late'  [29.734] [0.71126] [38]
'Europe' 'Other' 'Mid'   [  17.5] [ 0.9478] [ 4]
'Europe' 'Other' 'Late'  [30.833] [ 3.1761] [ 3]
'Europe' 'Four'  'Early' [24.714] [0.73076] [21]
'Europe' 'Four'  'Mid'   [26.912] [ 1.0116] [26]
'Europe' 'Four'  'Late'  [  35.7] [ 1.4265] [16]
'Japan'  'Other' 'Early' [    19] [0.57735] [ 3]
'Japan'  'Other' 'Mid'   [20.833] [0.92796] [ 3]
'Japan'  'Other' 'Late'  [  26.5] [ 2.0972] [ 4]
'Japan'  'Four'  'Early' [26.083] [ 1.1772] [12]
'Japan'  'Four'  'Mid'   [  29.5] [0.86547] [25]
'Japan'  'Four'  'Late'  [  35.3] [0.68346] [32]
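The three per-group quantities that grpstats reports (mean, standard error of the mean, and count) can be sketched in base MATLAB with accumarray. The sketch assumes the groups are coded as positive integers; the six MPG values and the names mpg, grp, mu, se, and n are made up for illustration and are not taken from carbig.

```matlab
% Hypothetical measurements and integer group codes (1, 2, 3).
mpg = [18; 15; 26; 24; 31; 27];
grp = [1; 1; 2; 2; 3; 3];

n  = accumarray(grp, 1);                          % count per group
mu = accumarray(grp, mpg, [], @mean);             % mean per group
se = accumarray(grp, mpg, [], @std) ./ sqrt(n);   % standard error of the mean
```

Each output has one row per group code, in the same spirit as the m, s, and c outputs above. Grouping by several variables at once would first require combining them into a single integer code.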
Percentiles and Graphical Descriptions
Trying to describe a data sample with two numbers, a measure of location and
a measure of spread, is frugal but may be misleading.
Another option is to compute a reasonable number of the sample percentiles.
This provides information about the shape of the data as well as its location
and spread.
The example shows the result of looking at every quartile of a sample
containing a mixture of two distributions.
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];
p = 100*(0:0.25:1);
y = prctile(x,p);
z = [p;y]
z =
0 25.0000 50.0000 75.0000 100.0000
1.5172 4.6842 5.6706 6.1804 7.6035
Compare the first two quantiles to the rest.
The box plot is a graph for descriptive statistics. The graph below is a box plot
of the data above.
boxplot(x)
[Figure: box plot of x. Horizontal axis: Column Number; vertical axis: Values.]
The long lower tail and plus signs show the lack of symmetry in the sample
values. For more information on box plots, see "Statistical Plots" on page 1-128.
The histogram is a complementary graph.
hist(x)
[Figure: histogram of x.]
The Bootstrap
In recent years the statistical literature has examined the properties of
resampling as a means to acquire information about the uncertainty of
statistical estimators.
The bootstrap is a procedure that involves choosing random samples with
replacement from a data set and analyzing each sample the same way.
Sampling with replacement means that every sample is returned to the data set
after sampling. So a particular data point from the original data set could
appear multiple times in a given bootstrap sample. The number of elements in
each bootstrap sample equals the number of elements in the original data set.
The range of sample estimates we obtain allows us to establish the uncertainty
of the quantity we are estimating.
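The resampling step described above can be sketched in base MATLAB without the toolbox. The data vector, the number of resamples B, and the choice of the mean as the statistic are illustrative assumptions.

```matlab
% Bootstrap the sample mean of a small made-up data set.
rng(0);                                     % fix the seed so the run is repeatable
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 4.9]';     % hypothetical data
n = numel(x);
B = 1000;                                   % number of bootstrap samples
est = zeros(B,1);
for k = 1:B
    idx = randi(n, n, 1);     % n indices drawn with replacement
    est(k) = mean(x(idx));    % analyze each bootstrap sample the same way
end
spread = std(est);            % variation of the estimate across resamples
```

Because idx is drawn with replacement, some observations appear several times in a given bootstrap sample and others not at all, while each sample still has n elements.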
Here is an example taken from Efron and Tibshirani (1993) comparing Law
School Admission Test (LSAT) scores and subsequent law school grade point
average (GPA) for a sample of 15 law schools.
load lawdata
plot(lsat,gpa,'+')
lsline
The least squares fit line indicates that higher LSAT scores go with higher law
school GPAs. But how sure are we of this conclusion? The plot gives us some
intuition but nothing quantitative.
We can calculate the correlation coefficient of the variables using the corrcoef
function.
rhohat = corrcoef(lsat,gpa)
rhohat =
1.0000 0.7764
0.7764 1.0000
Now we have a number, 0.7764, describing the positive connection between
LSAT and GPA, but though 0.7764 may seem large, we still do not know if it is
statistically significant.
Using the bootstrp function we can resample the lsat and gpa vectors as
many times as we like and consider the variation in the resulting correlation
coefficients.
Here is an example.
rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa);
This command resamples the lsat and gpa vectors 1000 times and computes
the corrcoef function on each sample. Here is a histogram of the result.
hist(rhos1000(:,2),30)
Nearly all the estimates lie on the interval [0.4 1.0].
This is strong quantitative evidence that LSAT and subsequent GPA are
positively correlated. Moreover, it does not require us to make any strong
assumptions about the probability distribution of the correlation coefficient.
Cluster Analysis
Cluster analysis, also called segmentation analysis or taxonomy analysis, is a
way to partition a set of objects into groups, or clusters, in such a way that the
profiles of objects in the same cluster are very similar and the profiles of objects
in different clusters are quite distinct.
Cluster analysis can be performed on many different types of data sets. For
example, a data set might contain a number of observations of subjects in a
study where each observation contains a set of variables.
Many different fields of study, such as engineering, zoology, medicine,
linguistics, anthropology, psychology, and marketing, have contributed to the
development of clustering techniques and the application of such techniques.
For example, cluster analysis can be used to find two similar groups for the
experiment and control groups in a study. In this way, if statistical differences
are found in the groups, they can be attributed to the experiment and not to
any initial difference between the groups.
The following sections explore the clustering features in the Statistics Toolbox:
• Terminology and Basic Procedure
• Finding the Similarities Between Objects
• Defining the Links Between Objects
• Evaluating Cluster Formation
• Creating Clusters
Terminology and Basic Procedure
To perform cluster analysis on a data set using the Statistics Toolbox functions,
follow this procedure:
1 Find the similarity or dissimilarity between every pair of objects in the
data set. In this step, you calculate the distance between objects using the
pdist function. The pdist function supports many different ways to
compute this measurement. See "Finding the Similarities Between Objects"
on page 1-54 for more information.
2 Group the objects into a binary, hierarchical cluster tree. In this step,
you link together pairs of objects that are in close proximity using the
linkage function. The linkage function uses the distance information
generated in step 1 to determine the proximity of objects to each other. As
objects are paired into binary clusters, the newly formed clusters are
grouped into larger clusters until a hierarchical tree is formed. See "Defining
the Links Between Objects" on page 1-56 for more information.
3 Determine where to divide the hierarchical tree into clusters. In this
step, you divide the objects in the hierarchical tree into clusters using the
cluster function. The cluster function can create clusters by detecting
natural groupings in the hierarchical tree or by cutting off the hierarchical
tree at an arbitrary point. See "Creating Clusters" on page 1-64 for more
information.
The following sections provide more information about each of these steps.
Note The Statistics Toolbox includes a convenience function, clusterdata,
which performs all these steps for you. You do not need to execute the pdist,
linkage, or cluster functions separately. However, the clusterdata function
does not give you access to the options each of the individual routines offers.
For example, if you use the pdist function you can choose the distance
calculation method, whereas if you use the clusterdata function you cannot.
Finding the Similarities Between Objects
You use the pdist function to calculate the distance between every pair of
objects in a data set. For a data set made up of m objects, there are
$m(m-1)/2$ pairs in the data set. The result of this computation is commonly
known as a similarity matrix (or dissimilarity matrix).
There are many ways to calculate this distance information. By default, the
pdist function calculates the Euclidean distance between objects; however,
you can specify one of several other options. See pdist for more information.
Note You can optionally normalize the values in the data set before
calculating the distance information. In a real-world data set, variables can be
measured against different scales. For example, one variable can measure
Intelligence Quotient (IQ) test scores and another variable can measure head
circumference. These discrepancies can distort the proximity calculations.
Using the zscore function, you can convert all the values in the data set to use
the same proportional scale. See zscore for more information.
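The kind of rescaling the note describes can be sketched in base MATLAB as column-wise standardization: subtract each column's mean and divide by its standard deviation. The IQ scores and head circumferences below are made-up values, and Xs is an illustrative name.

```matlab
% Hypothetical data: column 1 is an IQ score, column 2 a head circumference in cm.
X = [100 55; 115 57; 90 54; 130 58; 105 56];
n = size(X,1);

% Standardize each column to mean 0 and standard deviation 1.
Xs = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);
```

After standardization both columns contribute on a comparable scale, so the IQ column no longer dominates a Euclidean distance simply because its raw values are larger.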
For example, consider a data set, X, made up of five objects where each object
is a set of x,y coordinates.
• Object 1: 1, 2
• Object 2: 2.5, 4.5
• Object 3: 2, 2
• Object 4: 4, 1.5
• Object 5: 4, 2.5
You can define this data set as a matrix
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5]
and pass it to pdist. The pdist function calculates the distance between
object 1 and object 2, object 1 and object 3, and so on until the distances
between all the pairs have been calculated. The following figure plots these
objects in a graph. The distance between object 2 and object 3 is shown to
illustrate one interpretation of distance.
[Figure: the five objects plotted by their x,y coordinates, with the distance between object 2 and object 3 marked.]
Returning Distance Information
The pdist function returns this distance information in a vector, Y, where each
element contains the distance between a pair of objects.
Y = pdist(X)
Y =
Columns 1 through 7
2.9155 1.0000 3.0414 3.0414 2.5495 3.3541 2.5000
Columns 8 through 10
2.0616 2.0616 1.0000
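The order of the elements in Y follows the pairs (1,2), (1,3), ..., (1,5), (2,3), and so on. As a sketch of that layout for the default Euclidean case, the loop below rebuilds the same vector in base MATLAB; the name Yhand is illustrative.

```matlab
% Pairwise Euclidean distances for the five sample objects, in pdist order.
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];
m = size(X,1);
Yhand = zeros(1, m*(m-1)/2);      % 5 objects give 5*4/2 = 10 pairs
k = 0;
for i = 1:m-1
    for j = i+1:m
        k = k + 1;
        Yhand(k) = sqrt(sum((X(i,:) - X(j,:)).^2));
    end
end
```

The first element is the distance between objects 1 and 2 (2.9155) and the last is the distance between objects 4 and 5 (1.0000), matching the output above.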
To make it easier to see the relationship between the distance information
generated by pdist and the objects in the original data set, you can reformat
the distance vector into a matrix using the squareform function. In this matrix,
element i,j corresponds to the distance between object i and object j in the
original data set. In the following example, element 1,1 represents the distance
between object 1 and itself (which is zero). Element 1,2 represents the distance
between object 1 and object 2, and so on.
squareform(Y)
ans =
0 2.9155 1.0000 3.0414 3.0414
2.9155 0 2.5495 3.3541 2.5000
1.0000 2.5495 0 2.0616 2.0616
3.0414 3.3541 2.0616 0 1.0000
3.0414 2.5000 2.0616 1.0000 0
Defining the Links Between Objects
Once the proximity between objects in the data set has been computed, you can
determine which objects in the data set should be grouped together into
clusters, using the linkage function. The linkage function takes the distance
information generated by pdist and links pairs of objects that are close
together into binary clusters (clusters made up of two objects). The linkage
function then links these newly formed clusters to other objects to create bigger
clusters until all the objects in the original data set are linked together in a
hierarchical tree.
For example, given the distance vector Y generated by pdist from the sample
data set of x and y coordinates, the linkage function generates a hierarchical
cluster tree, returning the linkage information in a matrix, Z.
Z = linkage(Y)
Z =
1.0000 3.0000 1.0000
4.0000 5.0000 1.0000
6.0000 7.0000 2.0616
8.0000 2.0000 2.5000
In this output, each row identifies a link. The first two columns identify the
objects that have been linked, that is, object 1, object 2, and so on. The third
column contains the distance between these objects. For the sample data set of
x and y coordinates, the linkage function begins by grouping together objects 1
and 3, which have the closest proximity (distance value = 1.0000). The linkage
function continues by grouping objects 4 and 5, which also have a distance
value of 1.0000.
The third row indicates that the linkage function grouped together objects 6
and 7. If our original sample data set contained only five objects, what are
objects 6 and 7? Object 6 is the newly formed binary cluster created by the
grouping of objects 1 and 3. When the linkage function groups two objects
together into a new cluster, it must assign the cluster a unique index value,
starting with the value m+1, where m is the number of objects in the original
data set. (Values 1 through m are already used by the original data set.)
Object 7 is the index for the cluster formed by objects 4 and 5.
As the final cluster, the linkage function grouped object 8, the newly formed
cluster made up of objects 6 and 7, with object 2 from the original data set. The
following figure graphically illustrates the way linkage groups the objects into
a hierarchy of clusters.
[Figure: the five objects plotted by their x,y coordinates, with outlines marking clusters 6, 7, and 8.]
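The distances in the third column of Z for the two larger clusters can be checked by hand. The sketch below assumes the default linkage method is single linkage (the shortest distance between any member of one cluster and any member of the other), which is consistent with the values 2.0616 and 2.5000 in Z. The names D, c6, c7, c8, d67, and d82 are illustrative.

```matlab
% Full distance matrix for the five sample objects.
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];
D = zeros(5);
for i = 1:5
    for j = 1:5
        D(i,j) = sqrt(sum((X(i,:) - X(j,:)).^2));
    end
end

c6 = [1 3];                    % cluster 6: objects 1 and 3
c7 = [4 5];                    % cluster 7: objects 4 and 5
d67 = min(min(D(c6, c7)));     % shortest distance between clusters 6 and 7

c8 = [c6 c7];                  % cluster 8: clusters 6 and 7 combined
d82 = min(D(c8, 2));           % shortest distance from cluster 8 to object 2
```

Here d67 reproduces the 2.0616 in row three of Z and d82 reproduces the 2.5000 in row four.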
The hierarchical, binary cluster tree created by the linkage function is most
easily understood when viewed graphically. The Statistics Toolbox includes the
dendrogram function that plots this hierarchical tree information as a graph,
as in the following example.
dendrogram(Z)
[Figure: dendrogram of the sample data set.]
In the figure, the numbers along the horizontal axis represent the indices of the
objects in the original data set. The links between objects are represented as
upside down U-shaped lines. The height of the U indicates the distance
between the objects. For example, the link representing the cluster containing
objects 1 and 3 has a height of 1. For more information about creating a
dendrogram diagram, see the dendrogram function reference page.
Evaluating Cluster Formation
After linking the objects in a data set into a hierarchical cluster tree, you may
want to verify that the tree represents significant similarity groupings. In
addition, you may want more information about the links between the objects.
The Statistics Toolbox provides functions to perform both these tasks, as
described in the following sections:
• Verifying the Cluster Tree
• Getting More Information About Cluster Links
Verifying the Cluster Tree
One way to measure the validity of the cluster information generated by the
linkage function is to compare it with the original proximity data generated by
the pdist function. If the clustering is valid, the linking of objects in the cluster
tree should have a strong correlation with the distances between objects in the
distance vector. The cophenet function compares these two sets of values and
computes their correlation, returning a value called the cophenetic correlation
coefficient. The closer the value of the cophenetic correlation coefficient is to 1,
the better the clustering solution.
You can use the cophenetic correlation coefficient to compare the results of
clustering the same data set using different distance calculation methods or
clustering algorithms.
For example, you can use the cophenet function to evaluate the clusters
created for the sample data set
c = cophenet(Z,Y)
c =
0.8573
where Z is the matrix output by the linkage function and Y is the distance
vector output by the pdist function.
Execute pdist again on the same data set, this time specifying the City Block
metric. After running the linkage function on this new pdist output, use the
cophenet function to evaluate the clustering using a different distance metric.
c = cophenet(Z,Y)
c =
0.9289
The cophenetic correlation coefficient shows a stronger correlation when the
City Block metric is used.
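The pdist and linkage calls behind the City Block comparison are described above but not listed. One way to run them might look like the sketch below. The metric name 'cityblock' is an assumption about how the option is spelled (the spelling may differ between toolbox versions), and the names Ycb, Zcb, and ccb are illustrative.

```matlab
% Recompute distances, linkage, and cophenetic correlation with the City Block metric.
X   = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];
Ycb = pdist(X,'cityblock');     % sum of absolute coordinate differences
Zcb = linkage(Ycb);             % cluster tree built from the new distances
ccb = cophenet(Zcb,Ycb);        % cophenetic correlation for this tree
```

Using separate variable names keeps the Euclidean results in Y and Z available for comparison. The exact coefficient may vary with the toolbox version.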
Getting More Information About Cluster Links
One way to determine the natural cluster divisions in a data set is to compare
the length of each link in a cluster tree with the lengths of neighboring links
below it in the tree.
If a link is approximately the same length as neighboring links, it indicates
that there are similarities between the objects joined at this level of the
hierarchy. These links are said to exhibit a high level of consistency.
If the length of a link differs from neighboring links, it indicates that there are
dissimilarities between the objects at this level in the cluster tree. This link is
said to be inconsistent with the links around it. In cluster analysis,
inconsistent links can indicate the border of a natural division in a data set.
The cluster function uses a measure of inconsistency to determine where to
divide a data set into clusters. (See "Creating Clusters" on page 1-64 for more
information.)
The next section provides an example.
Example: Inconsistent Links. To illustrate, the following example creates a data set
of random numbers with three deliberate natural groupings. In the
dendrogram, note how the objects tend to collect into three groups. These three
groups are then connected by three longer links. These longer links are
inconsistent when compared with the links below them in the hierarchy.
rand('seed',3)
X = [rand(10,2)+1;rand(10,2)+2;rand(10,2)+3];
Y = pdist(X);
Z = linkage(Y);
dendrogram(Z);
[Figure: dendrogram of the 30 objects. The short links within each of the three groups are labeled "These links show consistency." The longer links joining the groups are labeled "These links show inconsistency when compared to links below them."]
The relative consistency of each link in a hierarchical cluster tree can be
quantified and expressed as the inconsistency coefficient. This value compares
the length of a link in a cluster hierarchy with the average length of
neighboring links. If the object is consistent with those around it, it will have a
low inconsistency coefficient. If the object is inconsistent with those around it,
it will have a higher inconsistency coefficient.
To generate a listing of the inconsistency coefficient for each link in the cluster
tree, use the inconsistent function. The inconsistent function compares
each link in the cluster hierarchy with adjacent links two levels below it in the
cluster hierarchy. This is called the depth of the comparison. Using the
inconsistent function, you can specify other depths. The objects at the bottom
of the cluster tree, called leaf nodes, that have no further objects below them,
have an inconsistency coefficient of zero.
For example, returning to the sample data set of x and y coordinates, we can
use the inconsistent function to calculate the inconsistency values for the
links created by the linkage function, described in "Defining the Links
Between Objects" on page 1-56.
I = inconsistent(Z)
I =
1.0000 0 1.0000 0
1.0000 0 1.0000 0
1.3539 0.8668 3.0000 0.8165
2.2808 0.3100 2.0000 0.7071
The inconsistent function returns data about the links in an (m-1)-by-4
matrix where each column provides data about the links.
Column   Description
1        Mean of the lengths of all the links included in the calculation
2        Standard deviation of all the links included in the calculation
3        Number of links included in the calculation
4        Inconsistency coefficient
In the sample output, the first row represents the link between objects 1 and 3.
(This cluster is assigned the index 6 by the linkage function.) Because this is a
leaf node, the inconsistency coefficient is zero. The second row represents the
link between objects 4 and 5, also a leaf node. (This cluster is assigned the
index 7 by the linkage function.)
The third row evaluates the link that connects these two leaf nodes, objects 6
and 7. (This cluster is called object 8 in the linkage output.) Column three
indicates that three links are considered in the calculation: the link itself and
the two links directly below it in the hierarchy. Column one represents the
mean of the lengths of these links. The inconsistent function uses the length
information output by the linkage function to calculate the mean. Column two
represents the standard deviation between the links. The last column contains
the inconsistency value for these links, 0.8165.
The following figure illustrates the links and lengths included in this
calculation.
[Figure: dendrogram of the sample data set with the lengths of Link 1, Link 2, and Link 3 marked.]
Row four in the output matrix describes the link between object 8 and object 2.
Column three indicates that two links are included in this calculation: the link
itself and the link directly below it in the hierarchy. The inconsistency
coefficient for this link is 0.7071.
The following figure illustrates the links and lengths included in this
calculation.
[Figure: dendrogram of the sample data set with the lengths of Link 1 and Link 2 marked.]
Creating Clusters
After you create the hierarchical tree of binary clusters, you can divide the
hierarchy into larger clusters using the cluster function. The cluster
function lets you create clusters in two ways, as discussed in the following
sections:
• Finding the Natural Divisions in the Data Set
• Specifying Arbitrary Clusters
Finding the Natural Divisions in the Data Set
In the hierarchical cluster tree, the data set may naturally align itself into
clusters. This can be particularly evident in a dendrogram diagram where
groups of objects are densely packed in certain areas and not in others. The
inconsistency coefficient of the links in the cluster tree can identify these points
where the similarities between objects change. (See "Evaluating Cluster
Formation" on page 1-59 for more information about the inconsistency
coefficient.) You can use this value to determine where the cluster function
draws cluster boundaries.
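The inconsistency coefficient that a cutoff is compared against can be checked by hand for the last link of the sample tree. The sketch below assumes the coefficient is the length of the link minus the mean length of the links included in the calculation, divided by their standard deviation. The two lengths are taken from the third column of Z; the names len, mu, sd, and coef are illustrative.

```matlab
% Link between object 8 and object 2, plus the link directly below it.
len = [2.5000 2.0616];

mu   = mean(len);              % mean of the included link lengths
sd   = std(len);               % standard deviation of the included link lengths
coef = (len(1) - mu) / sd;     % inconsistency coefficient of the top link
```

These values reproduce row four of the inconsistent output for the sample data set: 2.2808, 0.3100, and 0.7071.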
For example, if you use the cluster function to group the sample data set into
clusters, specifying an inconsistency coefficient threshold of 0.9 as the value of
the cutoff argument, the cluster function groups all the objects in the sample
data set into one cluster. In this case, none of the links in the cluster hierarchy
had an inconsistency coefficient greater than 0.9.
T = cluster(Z,0.9)
T =
1
1
1
1
1
The cluster function outputs a vector, T, that is the same size as the original
data set. Each element in this vector contains the number of the cluster into
which the corresponding object from the original data set was placed.
If you lower the inconsistency coefficient threshold to 0.8, the cluster function
divides the sample data set into three separate clusters.
T = cluster(Z,0.8)
T =
1
3
1
2
2
This output indicates that objects 1 and 3 were placed in cluster 1, objects 4
and 5 were placed in cluster 2, and object 2 was placed in cluster 3.
Specifying Arbitrary Clusters
Instead of letting the cluster function create clusters determined by the
natural divisions in the data set, you can specify the number of clusters you
want created. In this case, the value of the cutoff argument specifies the point
in the cluster hierarchy at which to create the clusters.
For example, you can specify that you want the cluster function to divide the
sample data set into two clusters. In this case, the cluster function creates one
cluster containing objects 1, 3, 4, and 5 and another cluster containing object 2.
T = cluster(Z,2)
T =
1
2
1
1
1
To help you visualize how the cluster function determines how to create these
clusters, the following figure shows the dendrogram of the hierarchical cluster
tree. When you specify a value of 2, the cluster function draws an imaginary
horizontal line across the dendrogram that intersects two vertical lines. All the
objects below the line belong to one of these two clusters.
If you specify a cutoff value of 3, the cluster function cuts off the hierarchy
at a lower point, intersecting three vertical lines.
T = cluster(Z,3)
T =
1
3
1
2
2
[Figure: dendrogram of the hierarchical cluster tree with a horizontal line
marking cutoff = 2.]
This time, objects 1 and 3 are grouped in a cluster, objects 4 and 5 are grouped
in a cluster, and object 2 is placed into a cluster, as seen in the following figure.
[Figure: dendrogram of the hierarchical cluster tree with a horizontal line
marking cutoff = 3.]
Linear Models
Linear models represent the relationship between a continuous response
variable and one or more predictor variables (either continuous or categorical)
in the form

$$y = X\beta + \varepsilon$$

where:
y is an n-by-1 vector of observations of the response variable.
X is the n-by-p design matrix determined by the predictors.
$\beta$ is a p-by-1 vector of parameters.
$\varepsilon$ is an n-by-1 vector of random disturbances, independent of each
other and usually having a normal distribution.
MATLAB uses this general form of the linear model to solve a variety of specific
regression and analysis of variance (ANOVA) problems. For example, for
polynomial and multiple regression problems, the columns of X are predictor
variable values or powers of such values. For one-way, two-way, and
higher-way ANOVA models, the columns of X are dummy (or indicator)
variables that encode the predictor categories. For analysis of covariance
(ANOCOVA) models, X contains values of a continuous predictor and codes for
a categorical predictor.
The following sections describe a number of functions for fitting various types
of linear models:
One-Way Analysis of Variance (ANOVA)
Two-Way Analysis of Variance (ANOVA)
N-Way Analysis of Variance
Multiple Linear Regression
Quadratic Response Surface Models
Stepwise Regression
Generalized Linear Models
Robust and Nonparametric Methods
See the sections below for a tour of some of the related graphical tools:
"The polytool Demo" on page 1-156
"The aoctool Demo" on page 1-161
"The rsmdemo Demo" on page 1-170
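To make the role of the design matrix X more concrete, here is a minimal sketch of two of the cases described above: a polynomial regression matrix and a set of dummy (indicator) columns for a categorical predictor. The data values and the names x, g, Xpoly, and Xdummy are made up for illustration, and only base MATLAB functions are used.

```matlab
% Polynomial regression: columns are powers of a single predictor
x = (1:5)';
Xpoly = [ones(size(x)) x x.^2];        % columns: constant, x, x^2

% One-way ANOVA: one indicator column per category of the predictor
g = [1 1 2 2 3 3]';                    % group label of each observation
levels = 1:3;
Xdummy = double(repmat(g,1,numel(levels)) == repmat(levels,numel(g),1));
```

Each row of Xdummy contains a single 1 in the column of the group that observation belongs to. The toolbox functions build matrices of this kind internally, and may use a different but equivalent coding.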
One-Way Analysis of Variance (ANOVA)
The purpose of one-way ANOVA is to find out whether data from several
groups have a common mean. That is, to determine whether the groups are
actually different in the measured characteristic.
One-way ANOVA is a simple special case of the linear model. The one-way
ANOVA form of the model is

$$y_{ij} = \alpha_{.j} + \varepsilon_{ij}$$

where:
$y_{ij}$ is a matrix of observations in which each column represents a different
group.
$\alpha_{.j}$ is a matrix whose columns are the group means. (The "dot j" notation
means that $\alpha$ applies to all rows of the jth column. That is, the value
$\alpha_{ij}$ is the same for all i.)
$\varepsilon_{ij}$ is a matrix of random disturbances.
The model posits that the columns of y are a constant plus a random
disturbance. You want to know if the constants are all the same.
The following sections explore one-way ANOVA in greater detail:
Example: One-Way ANOVA
Multiple Comparisons
Example: One-Way ANOVA
The data below comes from a study by Hogg and Ledolter (1987) of bacteria
counts in shipments of milk. The columns of the matrix hogg represent
different shipments. The rows are bacteria counts from cartons of milk chosen
randomly from each shipment. Do some shipments have higher counts than
others?
load hogg
hogg
hogg =
24 14 11 7 19
15 7 9 7 24
21 12 7 4 19
27 17 13 7 15
33 14 12 12 10
23 16 18 18 20
[p,tbl,stats] = anova1(hogg);
p
p =
1.1971e-04
The standard ANOVA table has columns for the sums of squares, degrees of
freedom, mean squares (SS/df), F statistic, and p-value.
You can use the F statistic to do a hypothesis test to find out if the bacteria
counts are the same. anova1 returns the p-value from this hypothesis test.
In this case the p-value is about 0.0001, a very small value. This is a strong
indication that the bacteria counts from the different shipments are not the same.
An F statistic as extreme as the observed F would occur by chance only once in
10,000 times if the counts were truly equal.
The p-value returned by anova1 depends on assumptions about the random
disturbances $\varepsilon_{ij}$ in the model equation. For the p-value to be correct, these
disturbances need to be independent, normally distributed, and have constant
variance. See "Robust and Nonparametric Methods" on page 1-95 for a
nonparametric function that does not require a normal assumption.
You can get some graphical assurance that the means are different by looking
at the box plots in the second figure window displayed by anova1.
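To show where the F statistic and p-value come from, the following sketch recomputes them by hand for the hogg data listed above. It is only an illustration of the standard one-way ANOVA arithmetic, not the toolbox implementation. It uses the base MATLAB function betainc to evaluate the upper tail of the F distribution, and all variable names are hypothetical.

```matlab
% Bacteria counts copied from the hogg matrix shown above
hogg = [24 14 11  7 19
        15  7  9  7 24
        21 12  7  4 19
        27 17 13  7 15
        33 14 12 12 10
        23 16 18 18 20];
[n,k] = size(hogg);                    % n observations in each of k groups

colmeans = mean(hogg);                 % group (column) means
grand    = mean(hogg(:));              % overall mean

ssb = n*sum((colmeans - grand).^2);    % between-groups sum of squares
res = hogg - repmat(colmeans,n,1);     % deviations from the group means
ssw = sum(res(:).^2);                  % within-groups (error) sum of squares

dfb = k - 1;                           % degrees of freedom between groups
dfw = k*(n - 1);                       % degrees of freedom within groups
F   = (ssb/dfb)/(ssw/dfw);             % ratio of mean squares

% Upper tail probability of the F(dfb,dfw) distribution
p = betainc(dfw/(dfw + dfb*F), dfw/2, dfb/2);
```

The resulting p agrees with the value 1.1971e-04 returned by anova1 above.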
Multiple Comparisons
Sometimes you need to determine not just if there are any differences among
the means, but specifically which pairs of means are significantly different. It
is tempting to perform a series of t tests, one for each pair of means, but this
procedure has a pitfall.
In a t test, we compute a t statistic and compare it to a critical value. The
critical value is chosen so that when the means are really the same (any
apparent difference is due to random chance), the probability that the t
statistic will exceed the critical value is small, say 5%. When the means are
different, the probability that the statistic will exceed the critical value is
larger.
In this example there are five means, so there are 10 pairs of means to compare.
It stands to reason that if all the means are the same, and if we have a 5%
chance of incorrectly concluding that there is a difference in one pair, then the
probability of making at least one incorrect conclusion among all 10 pairs is
much larger than 5%.
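As a rough worked illustration of this point, suppose the 10 pairwise tests were independent. This is an assumption made only for the arithmetic; the tests actually share data, so the figure is a guide rather than an exact value. The number of pairs and the chance of at least one incorrect conclusion would then be

$$\binom{5}{2} = 10, \qquad 1 - (1 - 0.05)^{10} \approx 0.40$$

that is, roughly a 40% chance of declaring at least one difference when none exists.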
Fortunately, there are procedures known as multiple comparison procedures
that are designed to compensate for multiple tests.
[Figure: box plots of the five columns of hogg. Horizontal axis: Column
Number; vertical axis: Values.]
Example: Multiple Comparisons. You can perform a multiple comparison test using
the multcompare function and supplying it with the stats output from anova1.
[c,m] = multcompare(stats)
c =
1.0000 2.0000 2.4953 10.5000 18.5047
1.0000 3.0000 4.1619 12.1667 20.1714
1.0000 4.0000 6.6619 14.6667 22.6714
1.0000 5.0000 -2.0047 6.0000 14.0047
2.0000 3.0000 -6.3381 1.6667 9.6714
2.0000 4.0000 -3.8381 4.1667 12.1714
2.0000 5.0000 -12.5047 -4.5000 3.5047
3.0000 4.0000 -5.5047 2.5000 10.5047
3.0000 5.0000 -14.1714 -6.1667 1.8381
4.0000 5.0000 -16.6714 -8.6667 -0.6619
m =
23.8333 1.9273
13.3333 1.9273
11.6667 1.9273
9.1667 1.9273
17.8333 1.9273
The first output from multcompare has one row for each pair of groups, with an
estimate of the difference in group means and a confidence interval for that
difference. For example, the second row has the values
1.0000 3.0000 4.1619 12.1667 20.1714
indicating that the mean of group 1 minus the mean of group 3 is estimated to
be 12.1667, and a 95% confidence interval for this difference is
[4.1619, 20.1714]. This interval does not contain 0, so we can conclude that the
means of groups 1 and 3 are different.
The second output contains the mean and its standard error for each group.
It is easier to visualize the difference between group means by looking at the
graph that multcompare produces.
The graph shows that group 1 is significantly different from groups 2, 3, and 4.
By using the mouse to select group 4, you can determine that it is also
significantly different from group 5. Other pairs are not significantly different.
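The same conclusions can be read directly from the first output of multcompare. The sketch below copies the matrix c shown above and picks out the pairs whose confidence interval excludes zero. It uses only base MATLAB indexing, and the names sig and pairs are hypothetical.

```matlab
% First output of multcompare, copied from the listing above.
% Columns: group A, group B, lower limit, estimated difference, upper limit
c = [1 2   2.4953  10.5000  18.5047
     1 3   4.1619  12.1667  20.1714
     1 4   6.6619  14.6667  22.6714
     1 5  -2.0047   6.0000  14.0047
     2 3  -6.3381   1.6667   9.6714
     2 4  -3.8381   4.1667  12.1714
     2 5 -12.5047  -4.5000   3.5047
     3 4  -5.5047   2.5000  10.5047
     3 5 -14.1714  -6.1667   1.8381
     4 5 -16.6714  -8.6667  -0.6619];

% A pair differs significantly when its interval lies entirely on one side of 0
sig   = c(:,3) > 0 | c(:,5) < 0;
pairs = c(sig,1:2)
```

The rows of pairs are the group pairs (1,2), (1,3), (1,4), and (4,5), which matches the reading of the graph described above.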
Two-Way Analysis of Variance (ANOVA)
The purpose of two-way ANOVA is to find out whether data from several
groups have a common mean. One-way ANOVA and two-way ANOVA differ in
that the groups in two-way ANOVA have two categories of defining
characteristics instead of one.
Suppose an automobile company has two factories, and each factory makes the
same three models of car. It is reasonable to ask if the gas mileage in the cars
varies from factory to factory as well as from model to model. We use two
predictors, factory and model, to explain differences in mileage.
There could be an overall difference in mileage due to a difference in the
production methods between factories. There is probably a difference in the
mileage of the different models (irrespective of the factory) due to differences
in design specifications. These effects are called additive.
Finally, a factory might make high mileage cars in one model (perhaps because
of a superior production line), but not be different from the other factory for
other models. This effect is called an interaction. It is impossible to detect an
interaction unless there are duplicate observations for some combination of
factory and car model.
Two-way ANOVA is a special case of the linear model. The two-way ANOVA
form of the model is

$$y_{ijk} = \mu + \alpha_{.j} + \beta_{i.} + \gamma_{ij} + \varepsilon_{ijk}$$

where, with respect to the automobile example above:
$y_{ijk}$ is a matrix of gas mileage observations (with row index i, column index j,
and repetition index k).
$\mu$ is a constant matrix of the overall mean gas mileage.
$\alpha_{.j}$ is a matrix whose columns are the deviations of each car's gas mileage
(from the mean gas mileage $\mu$) that are attributable to the car's model. All
values in a given column of $\alpha_{.j}$ are identical, and the values in each row of
$\alpha_{.j}$ sum to 0.
$\beta_{i.}$ is a matrix whose rows are the deviations of each car's gas mileage (from
the mean gas mileage $\mu$) that are attributable to the car's factory. All values
in a given row of $\beta_{i.}$ are identical, and the values in each column of $\beta_{i.}$ sum
to 0.
$\gamma_{ij}$ is a matrix of interactions. The values in each row of $\gamma_{ij}$ sum to 0, and the
values in each column of $\gamma_{ij}$ sum to 0.
$\varepsilon_{ijk}$ is a matrix of random disturbances.
The next section provides an example of a two-way analysis.
Example: Two-Way ANOVA
The purpose of the example is to determine the effect of car model and factory
on the mileage rating of cars.
load mileage
mileage
mileage =
33.3000 34.5000 37.4000
33.4000 34.8000 36.8000
32.9000 33.8000 37.6000
32.6000 33.4000 36.6000
32.5000 33.7000 37.0000
33.0000 33.9000 36.7000
cars = 3;
[p,tbl,stats] = anova2(mileage,cars);
p
p =
0.0000 0.0039 0.8411
There are three models of cars (columns) and two factories (rows). The reason
there are six rows in mileage instead of two is that each factory provides three
cars of each model for the study. The data from the first factory is in the first
three rows, and the data from the second factory is in the last three rows.
The standard ANOVA table has columns for the sums of squares,
degrees of freedom, mean squares (SS/df), F statistics, and p-values.
You can use the F statistics to do hypothesis tests to find out if the mileage is
the same across models, factories, and model-factory pairs (after adjusting for
the additive effects). anova2 returns the p-values from these tests.
The p-value for the model effect is zero to four decimal places. This is a strong
indication that the mileage varies from one model to another. An F statistic as
extreme as the observed F would occur by chance less than once in 10,000 times
if the gas mileage were truly equal from model to model. If you used the
multcompare function to perform a multiple comparison test, you would find
that each pair of the three models is significantly different.
The p-value for the factory effect is 0.0039, which is also highly significant.
This indicates that one factory is out-performing the other in the gas mileage
of the cars it produces. The observed p-value indicates that an F statistic as
extreme as the observed F would occur by chance about four out of 1000 times
if the gas mileage were truly equal from factory to factory.
There does not appear to be any interaction between factories and models. The
p-value, 0.8411, means that the observed result is quite likely (84 out of 100
times) given that there is no interaction.
The p-values returned by anova2 depend on assumptions about the random
disturbances $\varepsilon_{ijk}$ in the model equation. For the p-values to be correct, these
disturbances need to be independent, normally distributed, and have constant
variance. See "Robust and Nonparametric Methods" on page 1-95 for
nonparametric methods that do not require a normal distribution.
In addition, anova2 requires that data be balanced, which in this case means
there must be the same number of cars for each combination of model and
factory. The next section discusses a function that supports unbalanced data
with any number of predictors.
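To make the balanced layout that anova2 expects more concrete, the sketch below computes the mean mileage for each factory and model combination from the mileage matrix listed in the example above. It is an illustration in base MATLAB only; the names reps, nfact, and cellmeans are hypothetical.

```matlab
% Mileage data copied from the example above:
% columns are car models, rows 1-3 are factory 1, rows 4-6 are factory 2
mileage = [33.3 34.5 37.4
           33.4 34.8 36.8
           32.9 33.8 37.6
           32.6 33.4 36.6
           32.5 33.7 37.0
           33.0 33.9 36.7];
reps  = 3;                             % cars per factory/model combination
nfact = size(mileage,1)/reps;          % number of factories (row groups)

% Average each block of reps consecutive rows to get one mean per cell
cellmeans = zeros(nfact,size(mileage,2));
for i = 1:nfact
    rows = (i-1)*reps + (1:reps);
    cellmeans(i,:) = mean(mileage(rows,:));
end
cellmeans
```

Each entry of cellmeans averages exactly three cars, which is what makes this design balanced.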
N-Way Analysis of Variance
You can use N-way ANOVA to determine if the means in a set of data differ
when grouped by multiple factors. If they do differ, you can determine which
factors or combinations of factors are associated with the difference.
N-way ANOVA is a generalization of two-way ANOVA. For three factors, the
model can be written

$$y_{ijkl} = \mu + \alpha_{.j.} + \beta_{i..} + \gamma_{..k} + (\alpha\beta)_{ij.} + (\alpha\gamma)_{i.k} + (\beta\gamma)_{.jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$$

In this notation parameters with two subscripts, such as $(\alpha\beta)_{ij.}$, represent the
interaction effect of two factors. The parameter $(\alpha\beta\gamma)_{ijk}$ represents the
three-way interaction. An ANOVA model can have the full set of parameters or
any subset, but conventionally it does not include complex interaction terms
unless it also includes all simpler terms for those factors. For example, one
would generally not include the three-way interaction without also including
all two-way interactions.
The anovan function performs N-way ANOVA. Unlike the anova1 and anova2
functions, anovan does not expect data in a tabular form. Instead, it expects a
vector of response measurements and a separate vector (or text array)
containing the values corresponding to each factor. This input data format is
more convenient than matrices when there are more than two factors or when
the number of measurements per factor combination is not constant.
The following examples explore anovan in greater detail:
Example: N-Way ANOVA with Small Data Set
Example: N-Way ANOVA with Large Data Set
Example: N-Way ANOVA with Small Data Set
Consider the following two-way example using anova2.
m = [23 15 20;27 17 63;43 3 55;41 9 90]
m =
23 15 20
27 17 63
43 3 55
41 9 90
anova2(m,2)
ans =
0.0197 0.2234 0.2663
The factor information is implied by the shape of the matrix m and the number
of measurements at each factor combination (2). Although anova2 does not
actually require arrays of factor values, for illustrative purposes we could
create them as follows.
cfactor = repmat(1:3,4,1)
cfactor =
1 2 3
1 2 3
1 2 3
1 2 3
rfactor = [ones(2,3); 2*ones(2,3)]
rfactor =
1 1 1
1 1 1
2 2 2
2 2 2
The cfactor matrix shows that each column of m represents a different level of
the column factor. The rfactor matrix shows that the top two rows of m
represent one level of the row factor, and the bottom two rows of m represent a
second level of the row factor. In other words, each value m(i,j) represents an
observation at column factor level cfactor(i,j) and row factor level
rfactor(i,j).
To solve the above problem with anovan, we need to reshape the matrices m,
cfactor, and rfactor to be vectors.
m = m(:);
cfactor = cfactor(:);
rfactor = rfactor(:);
[m cfactor rfactor]
ans =
23 1 1
27 1 1
43 1 2
41 1 2
15 2 1
17 2 1
3 2 2
9 2 2
20 3 1
63 3 1
55 3 2
90 3 2
anovan(m,{cfactor rfactor},2)
ans =
0.0197
0.2234
0.2663
Example: N-Way ANOVA with Large Data Set
In the previous example we used anova2 to study a small data set measuring
car mileage. Now we study a larger set of car data with mileage and other
information on 406 cars made between 1970 and 1982. First we load the data
set and look at the variable names.
load carbig
whos
Name Size Bytes Class
Acceleration 406x1 3248 double array
Cylinders 406x1 3248 double array
Displacement 406x1 3248 double array
Horsepower 406x1 3248 double array
MPG 406x1 3248 double array
Model 406x36 29232 char array
Model_Year 406x1 3248 double array
Origin 406x7 5684 char array
Weight 406x1 3248 double array
cyl4 406x5 4060 char array
org 406x7 5684 char array
when 406x5 4060 char array
We will focus our attention on four variables. MPG is the number of miles per
gallon for each of 406 cars (though some have missing values coded as NaN). The
other three variables are factors: cyl4 (four-cylinder car or not), org (car
originated in Europe, Japan, or the USA), and when (car was built early in the
period, in the middle of the period, or late in the period).
First we fit the full model, requesting up to three-way interactions and Type 3
sums of squares.
varnames = {'Origin';'4Cyl';'MfgDate'};
anovan(MPG,{org cyl4 when},3,3,varnames)
ans =
0.0000
NaN
0
0.7032
0.0001
0.2072
0.6990
Note that many terms are marked by a # symbol as not having full rank, and
one of them has zero degrees of freedom and is missing a p-value. This can
happen when there are missing factor combinations and the model has
higher-order terms. In this case, the cross-tabulation below shows that there
are no cars made in Europe during the early part of the period with other than
four cylinders, as indicated by the 0 in table(2,1,1).
[table,factorvals] = crosstab(org,when,cyl4)
table(:,:,1) =
82 75 25
0 4 3
3 3 4
table(:,:,2) =
12 22 38
23 26 17
12 25 32
factorvals =
'USA' 'Early' 'Other'
'Europe' 'Mid' 'Four'
'Japan' 'Late' []
Consequently it is impossible to estimate the three-way interaction effects, and
including the three-way interaction term in the model makes the fit singular.
Using even the limited information available in the ANOVA table, we can see
that the three-way interaction has a p-value of 0.699, so it is not significant. We
decide to request only two-way interactions this time.
[p,tbl,stats,termvec] = anovan(MPG,{org cyl4 when},2,3,varnames);
termvec'
ans =
1 2 4 3 5 6
Now all terms are estimable. The p-values for interaction term 4
(Origin*4Cyl) and interaction term 6 (4Cyl*MfgDate) are much larger than a
typical cutoff value of 0.05, indicating these terms are not significant. We could
choose to omit these terms and pool their effects into the error term. The output
termvec variable returns a vector of codes, each of which is a bit pattern
representing a term. We can omit terms from the model by deleting their
entries from termvec and running anovan again, this time supplying the
resulting vector as the model argument.
termvec([4 6]) = []
termvec =
1
2
4
5
anovan(MPG,{org cyl4 when},termvec,3,varnames)
Now we have a more parsimonious model indicating that the mileage of these
cars seems to be related to all three factors, and that the effect of the
manufacturing date depends on where the car was made.
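The term codes in termvec can be decoded with a few lines of base MATLAB. The sketch below assumes that bit k of a code is set when factor k takes part in the term. That reading is an assumption, although it is consistent with the example above, where term 4 (code 3) is Origin*4Cyl and term 6 (code 6) is 4Cyl*MfgDate. The name labels is hypothetical.

```matlab
% Factor names and term codes copied from the example above
varnames = {'Origin','4Cyl','MfgDate'};
termvec  = [1 2 4 3 5 6];

% Assumed coding: bit k of a code is 1 when factor k is part of the term
labels = cell(size(termvec));
for t = 1:numel(termvec)
    inTerm    = logical(bitget(termvec(t), 1:numel(varnames)));
    labels{t} = strjoin(varnames(inTerm), '*');
    fprintf('code %d: %s\n', termvec(t), labels{t});
end
```

Under this reading, the codes 1, 2, 4, and 5 that remain after deleting entries 4 and 6 are the three main effects plus the Origin*MfgDate interaction, which agrees with the conclusion drawn above.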
Multiple Linear Regression
The purpose of multiple linear regression is to establish a quantitative
relationship between a group of predictor variables (the columns of X) and a
response, y. This relationship is useful for:
Understanding which predictors have the greatest effect.
Knowing the direction of the effect (i.e., increasing x increases/decreases y).
Using the model to predict future values of the response when only the
predictors are currently known.
The following sections explain multiple linear regression in greater detail:
Mathematical Foundations of Multiple Linear Regression
Example: Multiple Linear Regression
Mathematical Foundations of Multiple Linear Regression
The linear model takes its common form

$$y = X\beta + \varepsilon$$

where:
y is an n-by-1 vector of observations.
X is an n-by-p matrix of regressors.
$\beta$ is a p-by-1 vector of parameters.
$\varepsilon$ is an n-by-1 vector of random disturbances.
The solution to the problem is a vector, b, which estimates the unknown vector
of parameters, $\beta$. The least squares solution is

$$b = \hat{\beta} = (X^T X)^{-1} X^T y$$

This equation is useful for developing later statistical formulas, but has poor
numeric properties. regress uses QR decomposition of X followed by the
backslash operator to compute b. The QR decomposition is not necessary for
computing b, but the matrix R is useful for computing confidence intervals.
You can plug b back into the model formula to get the predicted y values at the
data points.

$$\hat{y} = Xb = Hy$$

$$H = X (X^T X)^{-1} X^T$$

Statisticians use a hat (circumflex) over a letter to denote an estimate of a
parameter or a prediction from a model. The projection matrix H is called the
hat matrix, because it puts the "hat" on y.
The residuals are the difference between the observed and predicted y values.

$$r = y - \hat{y} = (I - H)y$$

The residuals are useful for detecting failures in the model assumptions, since
they correspond to the errors, $\varepsilon$, in the model equation. By assumption, these
errors each have independent normal distributions with mean zero and a
constant variance.
The residuals, however, are correlated and have variances that depend on the
locations of the data points. It is a common practice to scale ("Studentize") the
residuals so they all have the same variance.
In the equation below, the scaled residual, $t_i$, has a Student's t distribution
with (n-p-1) degrees of freedom

$$t_i = \frac{r_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_i}}$$

where

$$\hat{\sigma}^2_{(i)} = \frac{\|r\|^2}{n-p-1} - \frac{r_i^2}{(n-p-1)(1-h_i)}$$

and:
$t_i$ is the scaled residual for the ith data point.
$r_i$ is the raw residual for the ith data point.
n is the sample size.
p is the number of parameters in the model.
$h_i$ is the ith diagonal element of H.
The left-hand side of the second equation is the estimate of the variance of the
errors excluding the ith data point from the calculation.
A hypothesis test for outliers involves comparing $t_i$ with the critical values of
the t distribution. If $t_i$ is large, this casts doubt on the assumption that this
residual has the same variance as the others.
A confidence interval for the mean of each error is

$$c_i = r_i \pm t_{\left(1-\frac{\alpha}{2},\,\nu\right)}\,\hat{\sigma}_{(i)}\sqrt{1 - h_i}$$

where $\nu$ denotes the degrees of freedom of the t distribution (n-p-1, as above).
Confidence intervals that do not include zero are equivalent to rejecting the
hypothesis (at a significance probability of $\alpha$) that the residual mean is zero.
Such confidence intervals are good evidence that the observation is an outlier
for the given model.
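The formulas in this section can be checked numerically with a few lines of base MATLAB. The following is a minimal sketch on made-up data (the values of X and y are purely illustrative). It follows the equations above rather than reproducing the internals of regress, and the variable names are hypothetical.

```matlab
% Made-up data: a column of ones plus two predictors
X = [ones(6,1) (1:6)' [2 1 4 3 6 5]'];
y = [1.1 1.9 3.2 3.9 5.1 6.2]';
[n,p] = size(X);

[Q,R] = qr(X,0);              % economy-size QR decomposition of X
b     = R\(Q'*y);             % least squares estimate of beta
H     = Q*Q';                 % hat matrix, equal to X*inv(X'*X)*X'
yhat  = H*y;                  % predicted values at the data points
r     = y - yhat;             % raw residuals
h     = diag(H);              % diagonal elements h_i of H

% Variance estimate that excludes the ith point, then Studentized residuals
s2i = (r'*r)/(n-p-1) - r.^2./((n-p-1)*(1-h));
t   = r./sqrt(s2i.*(1-h));
```

Forming H explicitly is practical only for small n; it is done here just to mirror the notation of the text.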
Example: Multiple Linear Regression
The example comes from Chatterjee and Hadi (1986) in a paper on regression
diagnostics. The data set (originally from Moore (1975)) has five predictor
variables and one response.
load moore
X = [ones(size(moore,1),1) moore(:,1:5)];
Matrix X has a column of ones, and then one column of values for each of the
five predictor variables. The column of ones is necessary for estimating the
y-intercept of the linear model.
y = moore(:,6);
[b,bint,r,rint,stats] = regress(y,X);
The y-intercept is b(1), which corresponds to the column index of the column
of ones.
stats
stats =
0.8107 11.9886 0.0001
The elements of the vector stats are the regression $R^2$ statistic, the F statistic
(for the hypothesis test that all the regression coefficients are zero), and the
p-value associated with this F statistic.
$R^2$ is 0.8107, indicating the model accounts for over 80% of the variability in the
observations. The F statistic of about 12 and its p-value of 0.0001 indicate that
it is highly unlikely that all of the regression coefficients are zero.
rcoplot(r,rint)
[Figure: residual case order plot produced by rcoplot. Horizontal axis: Case
Number; vertical axis: Residuals.]
The plot shows the residuals plotted in case order (by row). The 95% confidence
intervals about these residuals are plotted as error bars. The first observation
is an outlier since its error bar does not cross the zero reference line.
In problems with just a single predictor, it is simpler to use the polytool
function (see "The polytool Demo" on page 1-156). This function can form an
X matrix with predictor values, their squares, their cubes, and so on.
Quadratic Response Surface Models
Response Surface Methodology (RSM) is a tool for understanding the
quantitative relationship between multiple input variables and one output
variable.
Consider one output, z, as a polynomial function of two inputs, x and y. The
function z = f(x,y) describes a two-dimensional surface in the space (x,y,z). Of
course, you can have as many input variables as you want and the resulting
surface becomes a hypersurface. You can have multiple output variables with
a separate hypersurface for each one.
For three inputs ($x_1$, $x_2$, $x_3$), the equation of a quadratic response surface is

$$
\begin{aligned}
y = {} & b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 && \text{(linear terms)} \\
& + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 && \text{(interaction terms)} \\
& + b_{11} x_1^2 + b_{22} x_2^2 + b_{33} x_3^2 && \text{(quadratic terms)}
\end{aligned}
$$

It is difficult to visualize a k-dimensional surface in k+1 dimensional space
for k>2. The function rstool is a graphical user interface (GUI) designed to
make this visualization more intuitive, as is discussed in the next section.
Exploring Graphs of Multidimensional Polynomials
The function rstool is useful for fitting response surface models. The purpose
of rstool is larger than just fitting and prediction for polynomial models. This
GUI provides an environment for exploration of the graph of a
multidimensional polynomial.
You can learn about rstool by trying the commands below. The chemistry
behind the data in reaction.mat deals with reaction kinetics as a function of
the partial pressure of three chemical reactants: hydrogen, n-pentane, and
isopentane.
load reaction
rstool(reactants,rate,'quadratic',0.01,xn,yn)
You will see a vector of three plots. The dependent variable of all three plots
is the reaction rate. The first plot has hydrogen as the independent variable.
The second and third plots have n-pentane and isopentane respectively.
Each plot shows the fitted relationship of the reaction rate to the independent
variable at a fixed value of the other two independent variables. The fixed
value of each independent variable is in an editable text box below each axis.
You can change the fixed value of any independent variable by either typing a
new value in the box or by dragging any of the three vertical lines to a new
position.
When you change the value of an independent variable, all the plots update to
show the current picture at the new point in the space of the independent
variables.
Note that while this example only uses three inputs (reactants) and one output
(rate), rstool can accommodate an arbitrary number of inputs and outputs.
Interpretability may be limited by the size of the monitor for large numbers of
inputs or outputs.
The GUI also has two pop-up menus. The Export menu facilitates saving
various important variables in the GUI to the base workspace. Below the
Export menu there is another menu that allows you to change the order of the
polynomial model from within the GUI. If you used the commands above, this
menu will have the string Full Quadratic. Other choices are:
Linear has the constant and first order terms only.
Pure Quadratic includes constant, linear, and squared terms.
Interactions includes constant, linear, and cross product terms.
The rstool GUI is used by the rsmdemo function to visualize the results of a
designed experiment for studying a chemical reaction. See "The rsmdemo
Demo" on page 1-170.
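To connect the quadratic response surface equation with the model choices in the menu, the sketch below builds the full quadratic design matrix for three inputs by hand. The input values are made up and are not the reaction data. Only base MATLAB is used, and the matrix name D is hypothetical.

```matlab
% Made-up settings: rows are runs, columns are the inputs x1, x2, x3
x  = [1 2 3; 2 1 4; 3 5 1; 4 2 2];
x1 = x(:,1);  x2 = x(:,2);  x3 = x(:,3);

% One column per term of the quadratic response surface equation
D = [ones(size(x1)) x1 x2 x3 ...       constant and linear terms
     x1.*x2 x1.*x3 x2.*x3 ...          interaction terms
     x1.^2 x2.^2 x3.^2];               % quadratic terms
```

In this layout, the Linear model corresponds to columns 1 to 4, Interactions to columns 1 to 7, and Pure Quadratic to columns 1 to 4 together with columns 8 to 10.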
Stepwise Regression
Stepwise regression is a technique for choosing the variables to include in a
multiple regression model. Forward stepwise regression starts with no model
terms. At each step it adds the most statistically significant term (the one with
the highest F statistic or lowest p-value) until there are none left. Backward
stepwise regression starts with all the terms in the model and removes the
least significant terms until all the remaining terms are statistically
significant. It is also possible to start with a subset of all the terms and then
add significant terms or remove insignificant terms.
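As a rough sketch of a single forward step, the code below computes, for each candidate term, the F statistic for adding it to the current model and then picks the largest. The data are made up so that the response depends mainly on the second column. The sketch omits the significance threshold that a complete procedure would apply before accepting a term, and it is not the algorithm used by the toolbox's stepwise tool. All names are hypothetical.

```matlab
% Made-up data: three candidate predictors, y driven mainly by column 2
X = [1 2 3; 2 7 1; 3 1 4; 4 8 1; 5 2 5; 6 8 9; 7 1 2; 8 8 6];
y = 3*X(:,2) + [0.1 -0.1 0.2 -0.2 0.1 -0.1 0.2 -0.2]';
n = size(X,1);

in   = false(1,size(X,2));             % no terms in the model yet
A    = [ones(n,1) X(:,in)];            % current design matrix (constant only)
rss0 = sum((y - A*(A\y)).^2);          % residual sum of squares, current model

% F statistic for adding each candidate term on its own
F = zeros(1,size(X,2));
for j = find(~in)
    B    = [A X(:,j)];
    rss1 = sum((y - B*(B\y)).^2);
    F(j) = (rss0 - rss1)/(rss1/(n - size(B,2)));
end

[Fmax,best] = max(F);                  % most significant candidate
in(best) = true;                       % add it to the model
```

Repeating the step with the updated in vector, and stopping when no candidate is significant, gives the forward procedure described above.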
An impor t ant assumpt ion behind t he met hod is t hat some input var iables in a
mult iple r egr ession do not have an impor t ant explanat or y effect on t he
r esponse. If t his assumpt ion is t r ue, t hen it is a convenient simplificat ion t o
keep only t he st at ist ically significant t er ms in t he model.
One common pr oblem in mult iple r egr ession analysis is mult icollinear it y of t he
input var iables. The input var iables may be as cor r elat ed wit h each ot her as
t hey ar e wit h t he r esponse. If t his is t he case, t he pr esence of one input var iable
in t he model may mask t he effect of anot her input . St epwise r egr ession used as
a canned pr ocedur e is a danger ous t ool because t he r esult ing model may
include differ ent var iables depending on t he choice of st ar t ing model and
inclusion st r at egy.
The following example explor es an int er act ive t ool for st epwise r egr ession.
Example: Stepwise Regression
The Statistics Toolbox provides an interactive graphical user interface (GUI) to
make comparison of competing models more understandable. You can explore
the GUI using the Hald (1960) data set. Here are the commands to get started.
load hald
stepwise(ingredients,heat)
The Hald data come from a study of the heat of reaction of various cement
mixtures. There are four components in each mixture, and the amount of heat
produced depends on the amount of each ingredient in the mixture.
The interface consists of three interactively linked figure windows. Two of
these are discussed in the following sections:
• "Stepwise Regression Plot"
• "Stepwise Regression Diagnostics Table"
All three windows have hot regions. When your mouse is above one of these
regions, the pointer changes from an arrow to a circle. Clicking on this point
initiates some activity in the interface.
Stepwise Regression Plot
This plot shows the regression coefficient and confidence interval for every
term (in or out of the model). The green lines represent terms in the model
while red lines indicate terms that are not currently in the model.
Statistically significant terms are solid lines. Dotted lines show that the fitted
coefficient is not significantly different from zero.
Clicking on a line in this plot toggles its state. That is, a term currently in the
model (green line) is removed (turns red), and a term currently not in the model
(red line) is added (turns green).
The coefficient for a term out of the model is the coefficient resulting from
adding that term to the current model.
Scale Inputs. Pressing this button centers and normalizes the columns of the
input matrix to have a standard deviation of one.
Export. This pop-up menu allows you to export variables from the stepwise
function to the base workspace.
Close. The Close button removes all the figure windows.
Stepwise Regression Diagnostics Table
This table is a quantitative view of the information in the Stepwise Regression
Plot. The table shows the Hald model with the second and third terms removed.
Coefficients and Confidence Intervals. The table at the top of the figure shows the
regression coefficient and confidence interval for every term (in or out of the
model). The green rows in the table (on your monitor) represent terms in the
model while red rows indicate terms not currently in the model.
Clicking on a row in this table toggles the state of the corresponding term. That
is, a term currently in the model (green row) is removed (turns red), and a term
currently not in the model (red row) is added to the model (turns green).
The coefficient for a term out of the model is the coefficient resulting from
adding that term to the current model.
Additional Diagnostic Statistics. There are also several diagnostic statistics at the
bottom of the table:
• RMSE: the root mean squared error of the current model.
• R-square: the amount of response variability explained by the model.
• F: the overall F statistic for the regression.
• P: the associated significance probability.
Close Button. Shuts down all windows.
Confidence Intervals
Column #   Parameter   Lower     Upper
1          1.44        1.02      1.86
2          0.4161      -0.1602   0.9924
3          -0.41       -1.029    0.2086
4          -0.614      -0.7615   -0.4664
RMSE       R-square    F         P
2.734      0.9725      176.6     1.581e-08
Help Button. Activates online help.
Stepwise History. This plot shows the RMSE and a confidence interval for every
model generated in the course of the interactive use of the other windows.
Recreating a Previous Model. Clicking on one of these lines recreates the current
model at that point in the analysis using a new set of windows. You can thus
compare the two candidate models directly.
Generalized Linear Models
So far, the functions in this section have dealt with models that have a linear
relationship between the response and one or more predictors. Sometimes you
may have a nonlinear relationship instead. To fit nonlinear models you can use
the functions described in "Nonlinear Regression Models" on page 1-100.
There are some nonlinear models, known as generalized linear models, that
you can fit using simpler linear methods. To understand generalized linear
models, first let's review the linear models we have seen so far. Each of these
models has the following three characteristics:
• The response has a normal distribution with mean μ.
• A coefficient vector b defines a linear combination X*b of the predictors X.
• The model equates the two as μ = X*b.
In generalized linear models, these characteristics are generalized as follows:
• The response has a distribution that may be normal, binomial, Poisson,
  gamma, or inverse Gaussian, with parameters including a mean μ.
• A coefficient vector b defines a linear combination X*b of the predictors X.
• A link function f(·) defines the link between the two as f(μ) = X*b.
The following example explores this in greater detail.
Example: Generalized Linear Models
For example, consider the following data derived from the carbig data set. We
have cars of various weights, and we record the total number of cars of each
weight and the number qualifying as poor-mileage cars because their miles per
gallon value is below some target. (Suppose we don't know the miles per gallon
for each car, only the number passing the test.) It might be reasonable to
assume that the value of the variable poor follows a binomial distribution with
parameter N=total and with a p parameter that depends on the car weight. A
plot shows that the proportion of poor-mileage cars follows a nonlinear
S-shape.
w = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
[w poor total]
ans =
2100 1 48
2300 2 42
2500 0 31
2700 3 34
2900 8 31
3100 8 21
3300 14 23
3500 17 23
3700 19 21
3900 15 16
4100 17 17
4300 21 21
plot(w,poor./total,'x')
[Figure: scatter plot of the proportion of poor-mileage cars (0 to 1) against car weight (2000 to 4500)]
This shape is typical of graphs of proportions, as they have natural boundaries
at 0.0 and 1.0.
A linear regression model would not produce a satisfactory fit to this graph. Not
only would the fitted line not follow the data points, it would produce invalid
proportions less than 0 for light cars, and higher than 1 for heavy cars.
There is a class of regression models for dealing with proportion data. The
logistic model is one such model. It defines the relationship between proportion
p and weight w to be
$$\log\left(\frac{p}{1-p}\right) = b_1 + b_2 w$$
Is this a good model for our data? It would be helpful to graph the data on this
scale, to see if the relationship appears linear. However, some of our
proportions are 0 and 1, so we cannot explicitly evaluate the left-hand side of
the equation. A useful trick is to compute adjusted proportions by adding small
increments to the poor and total values, say a half observation to poor and
a full observation to total. This keeps the proportions within range. A graph
now shows a more nearly linear relationship.
padj = (poor+.5) ./ (total+1);
plot(w,log(padj./(1-padj)),'x')
[Figure: scatter plot of log(padj/(1-padj)) (-5 to 4) against car weight (2000 to 4500)]
We can use the glmfit function to fit this logistic model.
b = glmfit(w,[poor total],'binomial')
b =
-13.3801
0.0042
To use these coefficients to compute a fitted proportion, we have to invert the
logistic relationship. Some simple algebra shows that the logistic equation can
also be written as
$$p = \frac{1}{1 + \exp(-b_1 - b_2 w)}$$
Fortunately, the function glmval can decode this link function to compute the
fitted values. Using this function we can graph fitted proportions for a range of
car weights, and superimpose this curve on the original scatter plot.
x = 2100:100:4500;
y = glmval(b,x,'logit');
plot(w,poor./total,'x',x,y,'r-')
[Figure: fitted logistic curve superimposed on the scatter plot of the proportion of poor-mileage cars (0 to 1) against car weight (2000 to 4500)]
Generalized linear models can fit a variety of distributions with a variety of
relationships between the distribution parameters and the predictors. A full
description is beyond the scope of this document. For more information see
Dobson (1990), or McCullagh and Nelder (1990). Also see the reference
material for glmfit.
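The inversion of the logistic link can also be carried out by hand. The sketch below uses only base MATLAB and the coefficients as printed in the glmfit output. Because those printed values are rounded to four decimals, the fitted proportions are only approximate and will differ slightly from what glmval returns with the full-precision coefficients. The names eta and phat are introduced here for illustration.

```matlab
% Coefficients as printed by glmfit (rounded, so results are approximate).
b = [-13.3801; 0.0042];

x = (2100:100:4500)';            % car weights at which to evaluate the fit
eta = b(1) + b(2)*x;             % linear predictor, b1 + b2*w
phat = 1 ./ (1 + exp(-eta));     % inverse of the logit link: fitted proportions

plot(x, phat, 'r-')              % S-shaped curve bounded by 0 and 1
```

Every fitted value lies strictly between 0 and 1 and increases with weight, which is the behavior a straight-line fit to the proportions could not provide.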
Robust and Nonparametric Methods
As mentioned in the previous sections, regression and analysis of variance
procedures depend on certain assumptions, such as a normal distribution for
the error term. Sometimes such an assumption is not warranted. For example,
if the distribution of the errors is asymmetric or prone to extreme outliers, that
is a violation of the assumption of normal errors.
The Statistics Toolbox has a robust regression function that is useful when
there may be outliers. Robust methods are designed to be relatively insensitive
to large changes in a small part of the data.
The Statistics Toolbox also has nonparametric versions of the one-way and
two-way analysis of variance functions. Unlike classical tests, nonparametric
tests make only mild assumptions about the data, and are appropriate when
the distribution of the data is not normal. On the other hand, they are less
powerful than classical methods for normally distributed data.
The following sections describe the robust regression and nonparametric
functions in greater detail:
• "Robust Regression"
• "Kruskal-Wallis Test"
• "Friedman's Test"
Both of the nonparametric functions described here can return a stats
structure that you can use as input to the multcompare function to perform
multiple comparisons.
Robust Regression
In "Example: Multiple Linear Regression" on page 1-85 we found an outlier
when we used ordinary least squares regression to model a response as a
function of five predictors. How did that outlier affect the results?
Let's estimate the coefficients using the robustfit function.
load moore
x = moore(:,1:5);
y = moore(:,6);
[br,statsr] = robustfit(x,y);
br
br =
-1.7742
0.0000
0.0009
0.0002
0.0062
0.0001
Compare these estimates to those we obtained from the regress function.
b
b =
-2.1561
-0.0000
0.0013
0.0001
0.0079
0.0001
To understand why the two differ, it is helpful to look at the weight variable
from the robust fit. It measures how much weight was given to each point
during the fit. In this case, the first point had a very low weight so it was
effectively ignored.
statsr.w'
ans =
Columns 1 through 7
0.0577 0.9977 0.9776 0.9455 0.9687 0.8734 0.9177
Columns 8 through 14
0.9990 0.9653 0.9679 0.9768 0.9882 0.9998 0.9979
Columns 15 through 20
0.8185 0.9757 0.9875 0.9991 0.9021 0.6953
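With twenty weights on screen it can be easier to let MATLAB pick out the downweighted observations. The following sketch copies the weights printed above into a vector (in a live session you would use statsr.w' instead) and lists the observations that received the least weight. The cutoff of 0.5 is an arbitrary value chosen for illustration, not a toolbox default.

```matlab
% Weights as printed above; in a live session use w = statsr.w'.
w = [0.0577 0.9977 0.9776 0.9455 0.9687 0.8734 0.9177 ...
     0.9990 0.9653 0.9679 0.9768 0.9882 0.9998 0.9979 ...
     0.8185 0.9757 0.9875 0.9991 0.9021 0.6953];

[wsorted, idx] = sort(w);     % ascending order: least-trusted observations first
lowest = idx(1:3)             % observation numbers with the three smallest weights
flagged = find(w < 0.5)       % observations below the (arbitrary) 0.5 cutoff
```

Only the first observation falls below the cutoff, which matches the outlier identified in the ordinary least squares analysis.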
For another example illustrating robust fitting, see "The robustdemo Demo" on
page 1-172.
Kruskal-Wallis Test
In "One-Way Analysis of Variance (ANOVA)" on page 1-69 we used one-way
analysis of variance to determine if the bacteria counts of milk varied from
shipment to shipment. Our one-way analysis rested on the assumption that the
measurements were independent, and that each had a normal distribution
with a common variance and with a mean that was constant in each column.
We concluded that the column means were not all the same. Let's repeat that
analysis using a nonparametric procedure.
The Kruskal-Wallis test is a nonparametric version of one-way analysis of
variance. The assumption behind this test is that the measurements come from
a continuous distribution, but not necessarily a normal distribution. The test
is based on an analysis of variance using the ranks of the data values, not the
data values themselves. Output includes a table similar to an anova table, and
a box plot.
We can run this test as follows.
p = kruskalwallis(hogg)
p =
0.0020
The low p-value means the Kruskal-Wallis test results agree with the one-way
analysis of variance results.
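To make the idea of "analysis of variance on the ranks" concrete, the sketch below computes the Kruskal-Wallis statistic by hand for a small, hypothetical data matrix with no ties. It uses the standard textbook form of the statistic and its chi-square approximation, and it is not a description of how kruskalwallis is implemented. In particular, it omits the correction for ties. Only base MATLAB functions are used; gammainc with the 'upper' option supplies the chi-square tail probability.

```matlab
% Hypothetical data: three groups (columns) of three observations, no ties.
x = [1 4 7; 2 5 8; 3 6 9];
[n, k] = size(x);
N = n*k;

% Rank all observations together (valid only when there are no ties).
[sorted, order] = sort(x(:));
r = zeros(N,1);
r(order) = (1:N)';
R = sum(reshape(r, n, k), 1);        % rank sum for each group

% Kruskal-Wallis statistic and chi-square approximation with k-1 d.f.
H = 12/(N*(N+1)) * sum(R.^2 / n) - 3*(N+1);
p = gammainc(H/2, (k-1)/2, 'upper')  % upper tail of the chi-square distribution
```

For these data the rank sums are 6, 15, and 24, so H = 7.2 and the approximate p-value is about 0.027.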
Friedman's Test
In "Two-Way Analysis of Variance (ANOVA)" on page 1-73 we used two-way
analysis of variance to study the effect of car model and factory on car mileage.
We tested whether either of these factors had a significant effect on mileage,
and whether there was an interaction between these factors. We concluded
that there was no interaction, but that each individual factor had a significant
effect. Now we will see if a nonparametric analysis will lead to the same
conclusion.
Friedman's test is a nonparametric test for data having a two-way layout (data
grouped by two categorical factors). Unlike two-way analysis of variance,
Friedman's test does not treat the two factors symmetrically and it does not
test for an interaction between them. Instead, it is a test for whether the
columns are different after adjusting for possible row differences. The test is
based on an analysis of variance using the ranks of the data across categories
of the row factor. Output includes a table similar to an anova table.
We can run Friedman's test as follows.
p = friedman(mileage, 3)
p =
7.4659e-004
Recall the classical analysis of variance gave a p-value to test column effects,
row effects, and interaction effects. This p-value is for column effects. Using
either this p-value or the p-value from ANOVA (p < 0.0001), we conclude that
there are significant column effects.
In order to test for row effects, we need to rearrange the data to swap the roles
of the rows and columns. For a data matrix x with no replications, we could
simply transpose the data and type
p = friedman(x')
With replicated data it is slightly more complicated. A simple way is to
transform the matrix into a three-dimensional array with the first dimension
representing the replicates, swapping the other two dimensions, and restoring
the two-dimensional shape.
x = reshape(mileage, [3 2 3]);
x = permute(x, [1 3 2]);
x = reshape(x, [9 2])
x =
33.3000 32.6000
33.4000 32.5000
32.9000 33.0000
34.5000 33.4000
34.8000 33.7000
33.8000 33.9000
37.4000 36.6000
36.8000 37.0000
37.6000 36.7000
friedman(x, 3)
ans =
0.0082
Again, the conclusion is similar to the conclusion from the classical analysis of
variance. Both this p-value and the one from ANOVA (p = 0.0039) lead us to
conclude there are significant row effects.
You cannot use Friedman's test to test for interactions between the row and
column factors.
Nonlinear Regression Models
Response Surface Methodology (RSM) is an empirical modeling approach using
polynomials as local approximations to the true input/output relationship. This
empirical approach is often adequate for process improvement in an industrial
setting.
In scientific applications there is usually relevant theory that allows us to
make a mechanistic model. Often such models are nonlinear in the unknown
parameters. Nonlinear models are more difficult to fit, requiring iterative
methods that start with an initial guess of the unknown parameters. Each
iteration alters the current guess until the algorithm converges.
The Statistics Toolbox has functions for fitting nonlinear models of the form
$$y = f(X, \beta) + \varepsilon$$
where:
• y is an n-by-1 vector of observations.
• f is any function of X and β.
• X is an n-by-p matrix of input variables.
• β is a p-by-1 vector of unknown parameters to be estimated.
• ε is an n-by-1 vector of random disturbances.
This is explored further in the following example.
Example: Nonlinear Modeling
The Hougen-Watson model (Bates and Watts 1988) for reaction kinetics is one
specific example of this type. The form of the model is
$$\text{rate} = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$$
where β1, β2, ..., β5 are the unknown parameters, and x1, x2, and x3 are the
three input variables. The three inputs are hydrogen, n-pentane, and
isopentane. It is easy to see that the parameters do not enter the model
linearly.
The file reaction.mat contains simulated data from this reaction.
load reaction
who
Your variables are:
beta rate xn
model reactants yn
The variables are as follows:
• rate is a 13-by-1 vector of observed reaction rates.
• reactants is a 13-by-3 matrix of reactants.
• beta is a 5-by-1 vector of initial parameter estimates.
• model is a string containing the nonlinear function name.
• xn is a string matrix of the names of the reactants.
• yn is a string containing the name of the response.
The data and model are explored further in the following sections:
• "Fitting the Hougen-Watson Model"
• "Confidence Intervals on the Parameter Estimates"
• "Confidence Intervals on the Predicted Responses"
• "An Interactive GUI for Nonlinear Fitting and Prediction"
Fitting the Hougen-Watson Model
The Statistics Toolbox provides the function nlinfit for finding parameter
estimates in nonlinear modeling. nlinfit returns the least squares parameter
estimates. That is, it finds the parameters that minimize the sum of the
squared differences between the observed responses and their fitted values. It
uses the Gauss-Newton algorithm with Levenberg-Marquardt modifications
for global convergence.
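To give a feel for what an iterative least squares fit does, here is a minimal sketch of the plain Gauss-Newton iteration in base MATLAB. It is not the nlinfit implementation: it has no Levenberg-Marquardt modification, it uses a hand-coded Jacobian, and it fits a simple hypothetical model y = a*exp(b*x) to noise-free synthetic data rather than the Hougen-Watson model. The starting guess and the stopping tolerance are also assumptions made for the example.

```matlab
% Hypothetical noise-free data from y = a*exp(b*x) with a = 2, b = -0.5.
x = (0:4)';
y = 2*exp(-0.5*x);

beta = [1.5; -0.4];                       % initial guess (assumed)
for iter = 1:50
    yhat = beta(1)*exp(beta(2)*x);        % predictions at the current guess
    r = y - yhat;                         % residuals
    % Jacobian of the predictions with respect to [a; b]
    J = [exp(beta(2)*x), beta(1)*x.*exp(beta(2)*x)];
    step = J \ r;                         % Gauss-Newton step: linear least squares
    beta = beta + step;                   % update the current guess
    if norm(step) < 1e-10
        break                             % converged: the step is negligible
    end
end
beta
```

Each pass linearizes the model around the current guess and solves an ordinary least squares problem for the correction, which is why the Jacobian plays the role that the matrix X plays in linear regression. With a poor starting guess the undamped iteration can diverge, which is the situation the Levenberg-Marquardt modifications are meant to handle.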
nlinfit requires the input data, the responses, and an initial guess of the
unknown parameters. You must also supply the name of a function that takes
the input data and the current parameter estimate and returns the predicted
responses. In MATLAB terminology, nlinfit is called a "function function."
Here is the hougen function.
function yhat = hougen(beta,x)
%HOUGEN Hougen-Watson model for reaction kinetics.
% YHAT = HOUGEN(BETA,X) gives the predicted values of the
% reaction rate, YHAT, as a function of the vector of
% parameters, BETA, and the matrix of data, X.
% BETA must have five elements and X must have three
% columns.
%
% The model form is:
% y = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3)
b1 = beta(1);
b2 = beta(2);
b3 = beta(3);
b4 = beta(4);
b5 = beta(5);
x1 = x(:,1);
x2 = x(:,2);
x3 = x(:,3);
yhat = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3);
To fit the reaction data, call the function nlinfit.
load reaction
betahat = nlinfit(reactants,rate,'hougen',beta)
betahat =
1.2526
0.0628
0.0400
0.1124
1.1914
nlinfit has two optional outputs. They are the residuals and Jacobian matrix
at the solution. The residuals are the differences between the observed and
fitted responses. The Jacobian matrix is the direct analog of the matrix X in the
standard linear regression model.
These outputs are useful for obtaining confidence intervals on the parameter
estimates and predicted responses.
Confidence Intervals on the Parameter Estimates
Using nlparci, form 95% confidence intervals on the parameter estimates,
betahat, from the reaction kinetics example.
[betahat,resid,J] = nlinfit(reactants,rate,'hougen',beta);
betaci = nlparci(betahat,resid,J)
betaci =
-0.7467 3.2519
-0.0377 0.1632
-0.0312 0.1113
-0.0609 0.2857
-0.7381 3.1208
Confidence Intervals on the Predicted Responses
Using nlpredci, form 95% confidence intervals on the predicted responses
from the reaction kinetics example.
[yhat,delta] = nlpredci('hougen',reactants,betahat,resid,J);
opd = [rate yhat delta]
opd =
8.5500 8.2937 0.9178
3.7900 3.8584 0.7244
4.8200 4.7950 0.8267
0.0200 -0.0725 0.4775
2.7500 2.5687 0.4987
14.3900 14.2227 0.9666
2.5400 2.4393 0.9247
4.3500 3.9360 0.7327
13.0000 12.9440 0.7210
8.5000 8.2670 0.9459
0.0500 -0.1437 0.9537
11.3200 11.3484 0.9228
3.1300 3.3145 0.8418
Matrix opd has the observed rates in column 1 and the predictions in column 2.
The 95% confidence interval is column 2 ± column 3. These are simultaneous
confidence intervals for the estimated function at each input value. They are
not intervals for new response observations at those inputs, even though most
of the confidence intervals do contain the original observations.
An Interactive GUI for Nonlinear Fitting and Prediction
The function nlintool for nonlinear models is a direct analog of rstool for
polynomial models. nlintool calls nlinfit and requires the same inputs.
The purpose of nlintool is larger than just fitting and prediction for nonlinear
models. This GUI provides an environment for exploration of the graph of a
multidimensional nonlinear function.
If you have already loaded reaction.mat, you can start nlintool.
nlintool(reactants,rate,'hougen',beta,0.01,xn,yn)
You will see a "vector" of three plots. The dependent variable of all three plots
is the reaction rate. The first plot has hydrogen as the independent variable.
The second and third plots have n-pentane and isopentane respectively.
Each plot shows the fitted relationship of the reaction rate to the independent
variable at a fixed value of the other two independent variables. The fixed
value of each independent variable is in an editable text box below each axis.
You can change the fixed value of any independent variable by either typing a
new value in the box or by dragging any of the three vertical lines to a new
position.
When you change the value of an independent variable, all the plots update to
show the current picture at the new point in the space of the independent
variables.
Note that while this example only uses three reactants, nlintool can
accommodate an arbitrary number of independent variables. Interpretability
may be limited by the size of the monitor for large numbers of inputs.
Hypothesis Tests
A hypothesis test is a procedure for determining if an assertion about a
characteristic of a population is reasonable.
For example, suppose that someone says that the average price of a gallon of
regular unleaded gas in Massachusetts is $1.15. How would you decide
whether this statement is true? You could try to find out what every gas station
in the state was charging and how many gallons they were selling at that price.
That approach might be definitive, but it could end up costing more than the
information is worth.
A simpler approach is to find out the price of gas at a small number of randomly
chosen stations around the state and compare the average price to $1.15.
Of course, the average price you get will probably not be exactly $1.15 due to
variability in price from one station to the next. Suppose your average price
was $1.18. Is this three cent difference a result of chance variability, or is the
original assertion incorrect? A hypothesis test can provide an answer.
The following sections provide an overview of hypothesis testing with the
Statistics Toolbox:
• "Hypothesis Test Terminology"
• "Hypothesis Test Assumptions"
• "Example: Hypothesis Testing"
• "Available Hypothesis Tests"
Hypothesis Test Terminology
To get started, there are some terms to define and assumptions to make:
• The null hypothesis is the original assertion. In this case the null hypothesis
  is that the average price of a gallon of gas is $1.15. The notation is
  H0: μ = 1.15.
• There are three possibilities for the alternative hypothesis. You might only be
  interested in the result if gas prices were actually higher. In this case, the
  alternative hypothesis is H1: μ > 1.15. The other possibilities are H1: μ < 1.15
  and H1: μ ≠ 1.15.
• The significance level is related to the degree of certainty you require in order
  to reject the null hypothesis in favor of the alternative. By taking a small
  sample you cannot be certain about your conclusion. So you decide in
  advance to reject the null hypothesis if the probability of observing your
  sampled result is less than the significance level. For a typical significance
  level of 5%, the notation is α = 0.05. For this significance level, the
  probability of incorrectly rejecting the null hypothesis when it is actually
  true is 5%. If you need more protection from this error, then choose a lower
  value of α.
• The p-value is the probability of observing a sample result at least as extreme
  as the given one under the assumption that the null hypothesis is true. If the
  p-value is less than α, then you reject the null hypothesis. For example, if
  α = 0.05 and the p-value is 0.03, then you reject the null hypothesis.
  The converse is not true. If the p-value is greater than α, you have
  insufficient evidence to reject the null hypothesis.
• The outputs for many hypothesis test functions also include confidence
  intervals. Loosely speaking, a confidence interval is a range of values that
  has a chosen probability of containing the true hypothesized quantity.
  Suppose, in our example, 1.15 is inside a 95% confidence interval for the
  mean, μ. That is equivalent to being unable to reject the null hypothesis at a
  significance level of 0.05. Conversely, if the 100(1-α)% confidence interval does
  not contain 1.15, then you reject the null hypothesis at the α level of
  significance.
Hypothesis Test Assumptions
The difference between hypothesis test procedures often arises from
differences in the assumptions that the researcher is willing to make about the
data sample. For example, the Z-test assumes that the data represents
independent samples from the same normal distribution and that you know the
standard deviation, σ. The t-test has the same assumptions except that you
estimate the standard deviation using the data instead of specifying it as a
known quantity.
Both tests have an associated signal-to-noise ratio
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad T = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \text{where } \bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$$
The signal is the difference between the average and the hypothesized mean.
The noise is the standard deviation of the average, computed from the posited
standard deviation σ or the estimated standard deviation s.
If the null hypothesis is true, then Z has a standard normal distribution,
N(0,1). T has a Student's t distribution with the degrees of freedom, ν, equal to
one less than the number of data values.
Given the observed result for Z or T, and knowing the distribution of Z and T
assuming the null hypothesis is true, it is possible to compute the probability
(p-value) of observing this result. A very small p-value casts doubt on the truth
of the null hypothesis. For example, suppose that the p-value was 0.001,
meaning that the probability of observing the given Z or T was one in a
thousand. That should make you skeptical enough about the null hypothesis
that you reject it rather than believe that your result was just a lucky 999 to 1
shot.
There are also nonparametric tests that do not even require the assumption
that the data come from a normal distribution. In addition, there are functions
for testing whether the normal assumption is reasonable.
Example: Hypothesis Testing
This example uses the gasoline price data in gas.mat. There are two samples
of 20 observed gas prices for the months of January and February, 1993.
load gas
prices = [price1 price2];
As a first step, you may want to test whether the samples from each month
follow a normal distribution. As each sample is relatively small, you might
choose to perform a Lilliefors test (rather than a Jarque-Bera test):
lillietest(price1)
ans =
0
lillietest(price2)
ans =
0
The result of the hypothesis test is a Boolean value that is 0 when you do not
reject the null hypothesis, and 1 when you do reject that hypothesis. In each
case, there is no need to reject the null hypothesis that the samples have a
normal distribution.
Suppose it is historically true that the standard deviation of gas prices at gas
stations around Massachusetts is four cents a gallon. The Z-test is a procedure
for testing the null hypothesis that the average price of a gallon of gas in
January (price1) is $1.15.
[h,pvalue,ci] = ztest(price1/100,1.15,0.04)
h =
0
pvalue =
0.8668
ci =
1.1340 1.1690
The Boolean output is h = 0, so you do not reject the null hypothesis.
The result suggests that $1.15 is reasonable. The 95% confidence interval
[1.1340 1.1690] neatly brackets $1.15.
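The numbers reported by ztest can be reproduced approximately by hand, which shows how the p-value and the confidence interval come from the same ingredients. The sketch below uses only base MATLAB. The sample mean is not printed in the output above, so it is taken here as the midpoint of the reported confidence interval, which is an assumption. The variable names are illustrative.

```matlab
xbar  = 1.1515;     % assumed sample mean: midpoint of the reported interval
mu0   = 1.15;       % hypothesized mean
sigma = 0.04;       % known standard deviation
n     = 20;         % sample size
alpha = 0.05;       % significance level

se = sigma/sqrt(n);                      % standard deviation of the sample mean
z  = (xbar - mu0)/se;                    % signal-to-noise ratio
pvalue = erfc(abs(z)/sqrt(2))            % two-sided p-value, 2*(1 - Phi(|z|))
zcrit  = sqrt(2)*erfcinv(alpha);         % two-sided critical value, about 1.96
ci = [xbar - zcrit*se, xbar + zcrit*se]  % 95% confidence interval for the mean
h  = double(pvalue < alpha)              % 1 would mean reject the null hypothesis
```

Up to rounding, this gives the same p-value and interval as the ztest output. Because the hypothesized mean 1.15 lies inside the interval, the p-value exceeds 0.05 and h is 0.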
What about February? Try a t-test with price2. Now you are not assuming
that you know the standard deviation in price.
[h,pvalue,ci] = ttest(price2/100,1.15)
h =
1
pvalue =
4.9517e-04
ci =
1.1675 1.2025
With the Boolean result h = 1, you can reject the null hypothesis at the default
significance level, 0.05.
It looks like $1.15 is not a reasonable estimate of the gasoline price in
February. The low end of the 95% confidence interval is greater than 1.15.
The function ttest2 allows you to compare the means of the two data samples.
[h,sig,ci] = ttest2(price1,price2)
h =
1
sig =
0.0083
ci =
-5.7845 -0.9155
The confidence interval (ci above) indicates that gasoline prices were between
one and six cents lower in January than February.
If the two samples were not normally distributed but had similar shape, it
would have been more appropriate to use the nonparametric rank sum test in
place of the t-test. We can still use the rank sum test with normally distributed
data, but it is less powerful than the t-test.
[p,h,stats] = ranksum(price1, price2)
p =
0.0092
h =
1
stats =
zval: -2.6064
ranksum: 314
As might be expected, the rank sum test leads to the same conclusion but it is
less sensitive to the difference between samples (higher p-value).
The box plot below gives the same conclusion graphically. Note that the
notches have little, if any, overlap. Refer to "Statistical Plots" on page 1-128 for
more information about box plots.
boxplot(prices,1)
set(gca,'XtickLabel',str2mat('January','February'))
xlabel('Month')
ylabel('Prices ($0.01)')
[Figure: notched box plots of gas prices for January and February; x-axis: Month, y-axis: Prices ($0.01), from 110 to 125]
Available Hypothesis Tests
The Statistics Toolbox has functions for performing the following tests.
Function     What it Tests
jbtest       Normal distribution for one sample
kstest       Any specified distribution for one sample
kstest2      Equal distributions for two samples
lillietest   Normal distribution for one sample
ranksum      Median of two unpaired samples
signrank     Median of two paired samples
signtest     Median of two paired samples
ttest        Mean of one normal sample
ttest2       Mean of two normal samples
ztest        Mean of normal sample with known standard deviation
Multivariate Statistics
Multivariate statistics is an omnibus term for a number of different statistical
methods. The defining characteristic of these methods is that they all aim to
understand a data set by considering a group of variables together rather than
focusing on only one variable at a time.
The Statistics Toolbox has functions for principal components analysis
(princomp), multivariate analysis of variance (manova1), and linear
discriminant analysis (classify). The following sections illustrate the first two
functions:
Principal Components Analysis
Multivariate Analysis of Variance (MANOVA)
Principal Components Analysis
One of the difficulties inherent in multivariate statistics is the problem of
visualizing multidimensionality. In MATLAB, the plot command displays a
graph of the relationship between two variables. The plot3 and surf
commands display different three-dimensional views. When there are more
than three variables, it stretches the imagination to visualize their
relationships.
Fortunately, in data sets with many variables, groups of variables often move
together. One reason for this is that more than one variable may be measuring
the same driving principle governing the behavior of the system. In many
systems there are only a few such driving forces. But an abundance of
instrumentation allows us to measure dozens of system variables. When this
happens, we can take advantage of this redundancy of information. We can
simplify our problem by replacing a group of variables with a single new
variable.
Principal components analysis is a quantitatively rigorous method for
achieving this simplification. The method generates a new set of variables,
called principal components. Each principal component is a linear combination
of the original variables. All the principal components are orthogonal to each
other so there is no redundant information. The principal components as a
whole form an orthogonal basis for the space of the data.
There are an infinite number of ways to construct an orthogonal basis for
several columns of data. What is so special about the principal component
basis?
The first principal component is a single axis in space. When you project each
observation on that axis, the resulting values form a new variable. And the
variance of this variable is the maximum among all possible choices of the first
axis.
The second principal component is another axis in space, perpendicular to the
first. Projecting the observations on this axis generates another new variable.
The variance of this variable is the maximum among all possible choices of this
second axis.
The full set of principal components is as large as the original set of variables.
But it is commonplace for the sum of the variances of the first few principal
components to exceed 80% of the total variance of the original data. By
examining plots of these few new variables, researchers often develop a deeper
understanding of the driving forces that generated the original data.
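The properties described above (an orthogonal basis, with variances in decreasing order) can be checked numerically. The following is a minimal sketch that uses only base MATLAB functions and a small deterministic data set made up for illustration. The names X, coeff, scores, and latent are hypothetical, and the sketch computes the components from a singular value decomposition, which is one common approach and not necessarily how princomp is implemented.

```matlab
% Sketch: principal components of a small synthetic data set via the SVD.
t = (1:20)';
X = [t, 2*t + sin(t), cos(3*t)];           % 20 observations, 3 variables (made up)
n = size(X, 1);
Xc = X - repmat(mean(X), n, 1);            % center each column
[U, S, V] = svd(Xc, 0);                    % economy-size SVD of the centered data
coeff  = V;                                % columns are the principal component axes
scores = Xc * coeff;                       % observations projected on those axes
latent = diag(S).^2 / (n - 1);             % variance of each column of scores

orthoError = norm(coeff'*coeff - eye(3));  % near 0: the axes are orthonormal
varError   = norm(var(scores)' - latent);  % near 0: score variances match latent
totalError = abs(sum(latent) - sum(var(X)));  % near 0: total variance is preserved
```

Because the singular values are returned in decreasing order, the first column of scores has the largest variance, the second column the next largest, and so on.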
The following section provides an example.
Example: Principal Components Analysis
Let us look at a sample application that uses nine different indices of the
quality of life in 329 U.S. cities. These are climate, housing, health, crime,
transportation, education, arts, recreation, and economics. For each index,
higher is better; so, for example, a higher index for crime means a lower crime
rate.
We start by loading the data in cities.mat.
load cities
whos
Name Size Bytes Class
categories 9x14 252 char array
names 329x43 28294 char array
ratings 329x9 23688 double array
The whos command generates a table of information about all the variables in
the workspace.
The cities data set contains three variables:
categories, a string matrix containing the names of the indices.
names, a string matrix containing the 329 city names.
ratings, the data matrix with 329 rows and 9 columns.
Let's look at the value of the categories variable.
categories
categories =
climate
housing
health
crime
transportation
education
arts
recreation
economics
Now, let's look at the first five rows of the names variable.
first5 = names(1:5,:)
first5 =
Abilene, TX
Akron, OH
Albany, GA
Albany-Troy, NY
Albuquerque, NM
To get a quick impression of the ratings data, make a box plot.
boxplot(ratings,0,'+',0)
set(gca,'YTicklabel',categories)
These commands generate the plot below. Note that there is substantially more
variability in the ratings of the arts and housing than in the ratings of crime
and climate.
Ordinarily you might also graph pairs of the original variables, but there are
36 two-variable plots. Perhaps principal components analysis can reduce the
number of variables we need to consider.
Sometimes it makes sense to compute principal components for raw data. This
is appropriate when all the variables are in the same units. Standardizing the
data is reasonable when the variables are in different units or when the
variances of the different columns differ substantially (as in this case).
You can standardize the data by dividing each column by its standard
deviation.
stdr = std(ratings);
sr = ratings./repmat(stdr,329,1);
Now we are ready to find the principal components.
[pcs,newdata,variances,t2] = princomp(sr);
The following sections explain the four outputs from princomp:
The Principal Components (First Output)
The Component Scores (Second Output)
The Component Variances (Third Output)
Hotelling's T^2 (Fourth Output)
[Figure: horizontal box plots of the nine ratings columns (climate through economics); x-axis: Values (x 10^4), y-axis: Column Number.]
The Principal Components (First Output)
The first output of the princomp function, pcs, contains the nine principal
components. These are the linear combinations of the original variables that
generate the new variables.
Let's look at the first three principal component vectors.
p3 = pcs(:,1:3)
p3 =
0.2064 0.2178 -0.6900
0.3565 0.2506 -0.2082
0.4602 -0.2995 -0.0073
0.2813 0.3553 0.1851
0.3512 -0.1796 0.1464
0.2753 -0.4834 0.2297
0.4631 -0.1948 -0.0265
0.3279 0.3845 -0.0509
0.1354 0.4713 0.6073
The largest weights in the first column (first principal component) are the third
and seventh elements, corresponding to the variables health and arts. All the
elements of the first principal component are the same sign, making it a
weighted average of all the variables.
To show the orthogonality of the principal components, note that
premultiplying them by their transpose yields the identity matrix.
I = p3'*p3
I =
1.0000 -0.0000 -0.0000
-0.0000 1.0000 -0.0000
-0.0000 -0.0000 1.0000
The Component Scores (Second Output)
The second output, newdata, is the data in the new coordinate system defined
by the principal components. This output is the same size as the input data
matrix.
A plot of the first two columns of newdata shows the ratings data projected onto
the first two principal components.
plot(newdata(:,1),newdata(:,2),'+')
xlabel('1st Principal Component');
ylabel('2nd Principal Component');
Note the outlying points in the lower right corner.
The function gname is useful for graphically identifying a few points in a plot
like this. You can call gname with a string matrix containing as many case
labels as points in the plot. The string matrix names works for labeling points
with the city names.
gname(names)
Move your cursor over the plot and click once near each outlying point.
As you click on each point, MATLAB labels it with the proper row from the
names string matrix. When you are finished labeling points, press the Return
key.
[Figure: scatter plot of the ratings data; x-axis: 1st Principal Component, y-axis: 2nd Principal Component.]
Here is the resulting plot.
The labeled cities are the biggest population centers in the United States.
Perhaps we should consider them as a completely separate group. If we call
gname without arguments, it labels each point with its row number.
[Figure: the same scatter plot with outlying points labeled by city name: New York, NY; Los Angeles, Long Beach, CA; San Francisco, CA; Boston, MA; Washington, DC-MD-VA; Chicago, IL. x-axis: 1st Principal Component, y-axis: 2nd Principal Component.]
[Figure: the same scatter plot with outlying points labeled by row number: 213, 179, 270, 43, 314, 65, 237, 234. x-axis: 1st Principal Component, y-axis: 2nd Principal Component.]
We can create an index variable containing the row numbers of all the
metropolitan areas we chose.
metro = [43 65 179 213 234 270 314];
names(metro,:)
ans =
Boston, MA
Chicago, IL
Los Angeles, Long Beach, CA
New York, NY
Philadelphia, PA-NJ
San Francisco, CA
Washington, DC-MD-VA
To remove these rows from the ratings matrix, type the following.
rsubset = ratings;
nsubset = names;
nsubset(metro,:) = [];
rsubset(metro,:) = [];
size(rsubset)
ans =
322 9
To practice, repeat the analysis using the variable rsubset as the new data
matrix and nsubset as the string matrix of labels.
The Component Variances (Third Output)
The third output, variances, is a vector containing the variance explained by
the corresponding column of newdata.
variances
variances =
3.4083
1.2140
1.1415
0.9209
0.7533
0.6306
0.4930
0.3180
0.1204
You can easily calculate the percent of the total variability explained by each
principal component.
percent_explained = 100*variances/sum(variances)
percent_explained =
37.8699
13.4886
12.6831
10.2324
8.3698
7.0062
5.4783
3.5338
1.3378
A scree plot is a Pareto plot of the percent variability explained by each
principal component.
pareto(percent_explained)
xlabel('Principal Component')
ylabel('Variance Explained (%)')
We can see that the first three principal components explain roughly two thirds
of the total variability in the standardized ratings.
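The "roughly two thirds" figure can be confirmed with a running total. The sketch below copies the variances printed above; the names cumulative, firstThree, and k80 are illustrative, and the 80% threshold is only an example cutoff, not a rule from the toolbox.

```matlab
% Variances copied from the princomp output shown above.
variances = [3.4083 1.2140 1.1415 0.9209 0.7533 0.6306 0.4930 0.3180 0.1204]';
percent_explained = 100*variances/sum(variances);
cumulative = cumsum(percent_explained);   % running total, in percent
firstThree = cumulative(3);               % share explained by the first three components
% Smallest number of components whose cumulative share reaches 80%.
k80 = find(cumulative >= 80, 1);
```

With these values, firstThree is about 64 percent, and five components are needed to pass the 80% mark.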
Hotelling's T^2 (Fourth Output)
The last output of the princomp function, t2, is Hotelling's T^2, a statistical
measure of the multivariate distance of each observation from the center of the
data set. This is an analytical way to find the most extreme points in the data.
[st2, index] = sort(t2); % Sort in ascending order.
st2 = flipud(st2); % Values in descending order.
index = flipud(index); % Indices in descending order.
extreme = index(1)
extreme =
213
names(extreme,:)
ans =
New York, NY
It is not surprising that the ratings for New York are the furthest from the
average U.S. town.
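To make the notion of "multivariate distance" concrete, the sketch below computes the textbook form of Hotelling's T^2 for a small synthetic data set: the sum of squared component scores, each divided by the variance of its component. It assumes, without confirmation from the source, that princomp uses this same definition. All variable names are illustrative, and only base MATLAB functions are used.

```matlab
% Sketch: Hotelling's T^2 from component scores and variances (synthetic data).
t = (1:20)';
X = [t, 2*t + sin(t), cos(3*t)];
n = size(X, 1);
Xc = X - repmat(mean(X), n, 1);                    % centered data
[U, S, V] = svd(Xc, 0);
scores = Xc * V;                                   % component scores
latent = diag(S).^2 / (n - 1);                     % component variances
t2 = sum(scores.^2 ./ repmat(latent', n, 1), 2);   % one T^2 value per observation

% The same quantity written as a squared Mahalanobis distance from the centroid.
d2 = sum((Xc / cov(X)) .* Xc, 2);

[t2max, extreme] = max(t2);                        % most extreme observation
```

Taking the maximum directly is an alternative to the sort-and-flip approach shown above when only the single most extreme point is of interest.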
[Figure: Pareto (scree) plot; x-axis: Principal Component, left y-axis: Variance Explained (%), right y-axis: 0% to 100%.]
Multivariate Analysis of Variance (MANOVA)
We reviewed the analysis of variance technique in "One-Way Analysis of
Variance (ANOVA)" on page 1-69. With this technique we can take a set of
grouped data and determine whether the mean of a variable differs
significantly between groups. Often we have multiple variables, and we are
interested in determining whether the entire set of means is different from one
group to the next. There is a multivariate version of analysis of variance that
can address that problem, as illustrated in the following example.
Example: Multivariate Analysis of Variance
The carsmall data set has measurements on a variety of car models from the
years 1970, 1976, and 1982. Suppose we are interested in whether the
characteristics of the cars have changed over time.
First we load the data.
load carsmall
whos
Name Size Bytes Class
Acceleration 100x1 800 double array
Cylinders 100x1 800 double array
Displacement 100x1 800 double array
Horsepower 100x1 800 double array
MPG 100x1 800 double array
Model 100x36 7200 char array
Model_Year 100x1 800 double array
Origin 100x7 1400 char array
Weight 100x1 800 double array
Five of these variables (Acceleration, Displacement, Horsepower, MPG, and
Weight) are continuous measurements on individual car models. The variable
Model_Year indicates the year in which the car was made. We can create a
grouped plot matrix of four of these variables using the gplotmatrix function.
x = [MPG Horsepower Displacement Weight];
gplotmatrix(x,[],Model_Year,[],'+xo')
(When the second argument of gplotmatrix is empty, the function graphs the
columns of the x argument against each other, and places histograms along the
diagonals. The empty fourth argument produces a graph with the default
colors. The fifth argument controls the symbols used to distinguish between
groups.)
It appears the cars do differ from year to year. The upper right plot, for
example, is a graph of MPG versus Weight. The 1982 cars appear to have higher
mileage than the older cars, and they appear to weigh less on average. But as
a group, are the three years significantly different from one another? The
manova1 function can answer that question.
[d,p,stats] = manova1(x,Model_Year)
d =
2
p =
1.0e-006 *
0
0.1141
stats =
W: [4x4 double]
B: [4x4 double]
T: [4x4 double]
dfW: 90
dfB: 2
dfT: 92
lambda: [2x1 double]
chisq: [2x1 double]
chisqdf: [2x1 double]
eigenval: [4x1 double]
eigenvec: [4x4 double]
canon: [100x4 double]
mdist: [100x1 double]
gmdist: [3x3 double]
The manova1 function produces three outputs:
The first output, d, is an estimate of the dimension of the group means. If the
means were all the same, the dimension would be 0, indicating that the
means are at the same point. If the means differed but fell along a line, the
dimension would be 1. In the example the dimension is 2, indicating that the
group means fall in a plane but not along a line. This is the largest possible
dimension for the means of three groups.
The second output, p, is a vector of p-values for a sequence of tests. The first
p-value tests whether the dimension is 0, the next whether the dimension
is 1, and so on. In this case both p-values are small. That's why the estimated
dimension is 2.
The third output, stats, is a structure containing several fields, described in
the following section.
The Fields of the stats Structure. The W, B, and T fields are matrix analogs to the
within, between, and total sums of squares in ordinary one-way analysis of
variance. The next three fields are the degrees of freedom for these matrices.
Fields lambda, chisq, and chisqdf are the ingredients of the test for the
dimensionality of the group means. (The p-values for these tests are the second
output argument of manova1.)
The next three fields are used to do a canonical analysis. Recall that in
principal components analysis ("Principal Components Analysis" on
page 1-112) we look for the combination of the original variables that has the
largest possible variation. In multivariate analysis of variance, we instead look
for the linear combination of the original variables that has the largest
separation between groups. It is the single variable that would give the most
significant result in a univariate one-way analysis of variance. Having found
that combination, we next look for the combination with the second highest
separation, and so on.
The eigenvec field is a matrix that defines the coefficients of the linear
combinations of the original variables. The eigenval field is a vector
measuring the ratio of the between-group variance to the within-group
variance for the corresponding linear combination. The canon field is a matrix
of the canonical variable values. Each column is a linear combination of the
mean-centered original variables, using coefficients from the eigenvec matrix.
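The relationship among these fields can be sketched on a tiny made-up data set. The code below builds within-group and between-group matrices, takes the eigenvectors of W\B, and forms canonical variables from the mean-centered data. It is an illustration under stated assumptions: the data and group labels are hypothetical, the local names simply mirror the stats fields, the eigenvectors here are left in whatever scale eig returns (manova1 may normalize them differently), and the eigenvalue is shown as a ratio of sums of squares.

```matlab
% Sketch: canonical analysis on a small synthetic data set (hypothetical values).
x = [1 2; 2 1; 3 3; 6 5; 7 7; 8 6; 4 9; 5 10; 6 11];
g = [1 1 1 2 2 2 3 3 3]';                 % group label for each row
n = size(x, 1);
xc = x - repmat(mean(x), n, 1);           % mean-centered data
T = xc' * xc;                             % total sums of squares and products
W = zeros(size(x, 2));
for k = 1:3
    xk = x(g == k, :);
    xk = xk - repmat(mean(xk), size(xk, 1), 1);
    W = W + xk' * xk;                     % accumulate the within-group part
end
B = T - W;                                % between-group part

[vec, val] = eig(W \ B);                  % directions that separate the groups
[eigenval, order] = sort(real(diag(val)), 'descend');
eigenvec = real(vec(:, order));           % real() guards against rounding noise
canon = xc * eigenvec;                    % canonical variable values

% For the first direction, between/within sums of squares equals eigenval(1).
v = eigenvec(:, 1);
ratio1 = (v' * B * v) / (v' * W * v);
```

No other linear combination of the two columns gives a larger between-to-within ratio than ratio1, which is the sense in which the first canonical variable has the largest separation between groups.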
A grouped scatter plot of the first two canonical variables shows more
separation between groups than a grouped scatter plot of any pair of original
variables. In this example it shows three clouds of points, overlapping but with
distinct centers. One point in the bottom right sits apart from the others. By
using the gname function, we can see that this is the 20th point.
c1 = stats.canon(:,1);
c2 = stats.canon(:,2);
gscatter(c2,c1,Model_Year,[],'oxs')
gname
[Figure: grouped scatter plot of the first two canonical variables; x-axis: c2, y-axis: c1; groups 70, 76, and 82; point 20 labeled.]
Roughly speaking, the first canonical variable, c1, separates the 1982 cars
(which have high values of c1) from the older cars. The second canonical
variable, c2, reveals some separation between the 1970 and 1976 cars.
The final two fields of the stats structure are Mahalanobis distances. The
mdist field measures the distance from each point to its group mean. Points
with large values may be outliers. In this data set, the largest outlier is the one
we saw in the scatter plot, the Buick Estate station wagon. (Note that we could
have supplied the model name to the gname function above if we wanted to label
the point with its model name rather than its row number.)
max(stats.mdist)
ans =
31.5273
find(stats.mdist == ans)
ans =
20
Model(20,:)
ans =
buick_estate_wagon_(sw)
The gmdist field measures the distances between each pair of group means.
The following commands examine the group means and their distances:
grpstats(x, Model_Year)
ans =
1.0e+003 *
0.0177 0.1489 0.2869 3.4413
0.0216 0.1011 0.1978 3.0787
0.0317 0.0815 0.1289 2.4535
stats.gmdist
ans =
0 3.8277 11.1106
3.8277 0 6.1374
11.1106 6.1374 0
As might be expected, the multivariate distance between the extreme years
1970 and 1982 (11.1) is larger than the distances between more closely spaced
years (3.8 and 6.1). This is consistent with the scatter plots, where the points
seem to follow a progression as the year changes from 1970 through 1976 to
1982. If we had more groups, we might have found it instructive to use the
manovacluster function to draw a diagram that presents clusters of the
groups, formed using the distances between their means.
Statistical Plots
The Statistics Toolbox adds specialized plots to the extensive graphics
capabilities of MATLAB.
Box plots are graphs for describing data samples. They are also useful for
graphic comparisons of the means of many samples (see "One-Way Analysis
of Variance (ANOVA)" on page 1-69).
Distribution plots are graphs for visualizing the distribution of one or more
samples. They include normal and Weibull probability plots,
quantile-quantile plots, and empirical cumulative distribution plots.
Scatter plots are graphs for visualizing the relationship between a pair of
variables or several such pairs. Grouped versions of these plots use different
plotting symbols to indicate group membership. The gname function can label
points on these plots with a text label or an observation number.
The plot types are described further in the following sections:
Box Plots
Distribution Plots
Scatter Plots
Box Plots
The graph shows an example of a notched box plot.
[Figure: notched box plot of a single sample; x-axis: Column Number, y-axis: Values.]
This plot has several graphic elements:
The lower and upper lines of the box are the 25th and 75th percentiles of
the sample. The distance between the top and bottom of the box is the
interquartile range.
The line in the middle of the box is the sample median. If the median is not
centered in the box, that is an indication of skewness.
The whiskers are lines extending above and below the box. They show the
extent of the rest of the sample (unless there are outliers). Assuming no
outliers, the maximum of the sample is the top of the upper whisker. The
minimum of the sample is the bottom of the lower whisker. By default, an
outlier is a value that is more than 1.5 times the interquartile range away
from the top or bottom of the box.
The plus sign at the top of the plot is an indication of an outlier in the data.
This point may be the result of a data entry error, a poor measurement, or a
change in the system that generated the data.
The notches in the box are a graphic confidence interval about the median of
a sample. Box plots do not have notches by default.
A side-by-side comparison of two notched box plots is the graphical equivalent
of a t-test. See "Hypothesis Tests" on page 1-105.
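The graphic elements listed above correspond to a handful of numbers that can be computed directly. The following sketch does so for a small made-up sample using only base MATLAB functions. The percentile convention used here (linear interpolation at positions (i - 0.5)/n) is an assumption and may differ slightly from what boxplot uses; all names are illustrative.

```matlab
% Sketch: the numbers behind a box plot for one small sample (made-up data).
x = sort([112 113 114 115 115 116 117 118 119 131]');
n = length(x);
pos = ((1:n)' - 0.5) / n;            % plotting position of each sorted value
q1  = interp1(pos, x, 0.25);         % 25th percentile: bottom of the box
med = median(x);                     % line in the middle of the box
q3  = interp1(pos, x, 0.75);         % 75th percentile: top of the box
iqrange = q3 - q1;                   % interquartile range: height of the box

lowFence  = q1 - 1.5*iqrange;        % values beyond the fences are outliers
highFence = q3 + 1.5*iqrange;
isOutlier = (x < lowFence) | (x > highFence);
outliers  = x(isOutlier);            % drawn as plus signs

whiskerLow  = min(x(~isOutlier));    % whiskers reach the most extreme
whiskerHigh = max(x(~isOutlier));    % values that are not outliers
```

For this sample the box runs from 114 to 118, the value 131 lies above the upper fence of 124 and is flagged as an outlier, and the upper whisker stops at 119.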
Distribution Plots
There are several types of plots for examining the distribution of one or more
samples, as described in the following sections:
Normal Probability Plots
Quantile-Quantile Plots
Weibull Probability Plots
Empirical Cumulative Distribution Function (CDF)
Normal Probability Plots
A normal probability plot is a useful graph for assessing whether data comes
from a normal distribution. Many statistical procedures make the assumption
that the underlying distribution of the data is normal, so this plot can provide
some assurance that the assumption of normality is not being violated, or
provide an early warning of a problem with your assumptions.
This example shows a typical normal probability plot.
x = normrnd(10,1,25,1);
normplot(x)
The plot has three graphical elements. The plus signs show the empirical
probability versus the data value for each point in the sample. The solid line
connects the 25th and 75th percentiles of the data and represents a robust
linear fit (i.e., insensitive to the extremes of the sample). The dashed line
extends the solid line to the ends of the sample.
The scale of the y-axis is not uniform. The y-axis values are probabilities and,
as such, go from zero to one. The distance between the tick marks on the y-axis
matches the distance between the quantiles of a normal distribution. The
quantiles are close together near the median (probability = 0.5) and stretch out
symmetrically moving away from the median. Compare the vertical distance
from the bottom of the plot to the probability 0.25 with the distance from 0.25
to 0.50. Similarly, compare the distance from the top of the plot to the
probability 0.75 with the distance from 0.75 to 0.50.
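The nonuniform spacing can be reproduced numerically by converting each tick probability to a standard normal quantile. In the sketch below, norminvBase is a hypothetical helper built from the base MATLAB function erfcinv, so that the example does not depend on toolbox functions. The empirical probabilities (i - 0.5)/n are an assumed plotting convention, and the sample values are made up.

```matlab
% Sketch: where the y-axis ticks of a normal probability plot fall.
norminvBase = @(p) -sqrt(2) * erfcinv(2*p);   % standard normal quantile function
ticks  = [0.01 0.25 0.50 0.75 0.99];
height = norminvBase(ticks);        % vertical position of each tick mark
gapLow = height(2) - height(1);     % distance from 0.01 up to 0.25
gapMid = height(3) - height(2);     % distance from 0.25 up to 0.50

% Coordinates of the plus signs for a small made-up sample.
x = sort([9.1 10.4 9.8 11.2 10.0 9.5 10.7]');
n = length(x);
pEmp = ((1:n)' - 0.5) / n;          % empirical probability of each sorted value
yPos = norminvBase(pEmp);           % height at which each plus sign is drawn
```

Here gapLow is more than twice gapMid, which matches the visual impression that the scale stretches out away from the median. If the sample is normal, the points (x, yPos) fall close to a straight line.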
[Figure: Normal Probability Plot; x-axis: Data, y-axis: Probability.]
If all the data points fall near the line, the assumption of normality is
reasonable. But, if the data is nonnormal, the plus signs may follow a curve, as
in the example using exponential data below.
x = exprnd(10,100,1);
normplot(x)
This plot is clear evidence that the underlying distribution is not normal.
Quantile-Quantile Plots
A quantile-quantile plot is useful for determining whether two samples come
from the same distribution (whether normally distributed or not).
The example shows a quantile-quantile plot of two samples from a Poisson
distribution.
x = poissrnd(10,50,1);
y = poissrnd(5,100,1);
qqplot(x,y);
[Figure: Normal Probability Plot of the exponential sample; x-axis: Data, y-axis: Probability.]
Even though the parameters and sample sizes are different, the straight line
relationship suggests that the two samples come from the same family of
distributions.
Like the normal probability plot, the quantile-quantile plot has three graphical
elements. The pluses are the quantiles of each sample. By default the number
of pluses is the number of data values in the smaller sample. The solid line joins
the 25th and 75th percentiles of the samples. The dashed line extends the solid
line to the extent of the sample.
The example below shows what happens when the underlying distributions are
not the same.
x = normrnd(5,1,100,1);
y = weibrnd(2,0.5,100,1);
qqplot(x,y);
[Figure: quantile-quantile plot of the two Poisson samples; x-axis: X Quantiles, y-axis: Y Quantiles.]
These samples clearly are not from the same distribution.
It is incorrect to interpret a linear plot as a guarantee that the two samples
come from the same distribution. But, for assessing the validity of a statistical
procedure that depends on the two samples coming from the same distribution
(e.g., ANOVA), a linear quantile-quantile plot should be sufficient.
Weibull Probability Plots
A Weibull probability plot is a useful graph for assessing whether data comes
from a Weibull distribution. Many reliability analyses make the assumption
that the underlying distribution of the lifetimes is Weibull, so this plot can
provide some assurance that this assumption is not being violated, or provide
an early warning of a problem with your assumptions.
The scale of the y-axis is not uniform. The y-axis values are probabilities and,
as such, go from zero to one. The distance between the tick marks on the y-axis
matches the distance between the quantiles of a Weibull distribution.
If the data points (pluses) fall near the line, the assumption that the data comes
from a Weibull distribution is reasonable.
[Figure: quantile-quantile plot of the normal and Weibull samples; x-axis: X Quantiles, y-axis: Y Quantiles.]
This example shows a typical Weibull probability plot.
y = weibrnd(2,0.5,100,1);
weibplot(y)
Empirical Cumulative Distribution Function (CDF)
If you are not willing to assume that your data follows a specific probability
distribution, you can use the cdfplot function to graph an empirical estimate
of the cumulative distribution function (cdf). This function computes the
proportion of data points less than or equal to each x value, and plots the
proportion as a function of x. The y-axis scale is linear, not a probability
scale for a specific distribution.
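The empirical cdf itself is simple to compute by hand. The sketch below evaluates it for a small made-up sample with base MATLAB functions only; ecdfAt is a hypothetical helper name, and the sketch is an illustration of the definition rather than a description of how cdfplot is implemented.

```matlab
% Sketch: empirical cdf of a small made-up sample.
y = [0.3 1.2 0.7 0.3 2.5 0.9]';
ecdfAt = @(x0) sum(y <= x0) / numel(y);   % proportion of points <= x0
xs = unique(y);                           % sorted distinct data values
F  = arrayfun(ecdfAt, xs);                % height of the step at each value
F0 = ecdfAt(1.0);                         % evaluate between data points
```

The result is a step function that jumps at each distinct data value (by 2/6 at the repeated value 0.3 and by 1/6 elsewhere) and reaches 1 at the largest observation.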
This example shows the empirical cumulative distribution function for a
Weibull sample.
y = weibrnd(2,0.5,100,1);
cdfplot(y)
[Figure: Weibull Probability Plot; x-axis: Data (logarithmic scale), y-axis: Probability.]
The plot shows a probability function that rises steeply near x=0 and levels off
for larger values. Over 80% of the observations are less than 1, with the
remaining values spread over the range [1 5].
Scatter Plots
A scatter plot is a simple plot of one variable against another. The MATLAB
plot and scatter functions can produce scatter plots. The MATLAB
plotmatrix function can produce a matrix of such plots showing the
relationship between several pairs of variables.
The Statistics Toolbox adds functions that produce grouped versions of these
plots. These are useful for determining whether the values of two variables or
the relationship between those variables is the same in each group.
Suppose we want to examine the weight and mileage of cars from three
different model years.
load carsmall
gscatter(Weight,MPG,Model_Year,'','xos')
[Figure: Empirical CDF; x-axis: x, y-axis: F(x).]
This shows that not only is there a strong relationship between the weight of a
car and its mileage, but also that newer cars tend to be lighter and have better
gas mileage than older cars.
(The default arguments for gscatter produce a scatter plot with the different
groups shown with the same symbol but different colors. The last two
arguments above request that all groups be shown in default colors and with
different symbols.)
The carsmall data set contains other variables that describe different aspects
of cars. We can examine several of them in a single display by creating a
grouped plot matrix.
xvars = [Weight Displacement Horsepower];
yvars = [MPG Acceleration];
gplotmatrix(xvars,yvars,Model_Year,'','xos')
[Figure: grouped scatter plot; x-axis: Weight, y-axis: MPG; groups 70, 76, and 82.]
The upper right subplot displays MPG against Horsepower, and shows that over
the years the horsepower of the cars has decreased but the gas mileage has
improved.
The gplotmatrix function can also graph all pairs from a single list of
variables, along with histograms for each variable. See "Multivariate Analysis
of Variance (MANOVA)" on page 1-122.
Statistical Process Control (SPC)
SPC is an omnibus term for a number of methods for assessing and monitoring
the quality of manufactured goods. These methods are simple, which makes
them easy to implement even in a production environment. The following
sections discuss some of the SPC features of the Statistics Toolbox:
Control Charts
Capability Studies
Control Charts
These graphs were popularized by Walter Shewhart in his work in the 1920s
at Western Electric. A control chart is a plot of measurements over time with
statistical limits applied. Actually, "control chart" is a slight misnomer. The
chart itself is a monitoring tool. The control activity may occur if the
chart indicates that the process is changing in an undesirable systematic
direction.
The Statistics Toolbox supports three common control charts, described in the
following sections:
Xbar Charts
S Charts
EWMA Charts
Xbar Charts
An Xbar chart is a plot of the average of a sample of a process taken at regular
intervals. Suppose we are manufacturing pistons to a tolerance of
±0.5 thousandths of an inch. We measure the runout (deviation from circularity
in thousandths of an inch) at four points on each piston.
load parts
conf = 0.99;
spec = [-0.5 0.5];
xbarplot(runout,conf,spec)
The lines at the bottom and the top of the plot show the process specifications.
The central line is the average runout over all the pistons. The two lines
flanking the center line are the 99% statistical control limits. By chance only
one measurement in 100 should fall outside these lines. We can see that even
in this small run of 36 parts, there are several points outside the boundaries
(labeled by their observation numbers). This is an indication that the process
mean is not in statistical control. This might not be of much concern in practice,
since all the parts are well within specification.
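The control limits described above can be reproduced by hand. The following is a minimal sketch, not the toolbox's implementation: it uses made-up measurements rather than the parts data set, it estimates the process standard deviation by pooling the within-sample variances (an assumption; xbarplot may use a different estimator), and all variable names are illustrative.

```matlab
% Made-up data: 36 samples of 4 measurements each (NOT the parts data set).
x = 0.1 + 0.1*sin((1:36)' * (1:4));

n      = size(x,2);               % measurements per sample
xbar   = mean(x,2);               % sample averages plotted on the chart
sigma  = sqrt(mean(var(x,0,2)));  % pooled within-sample std (assumed estimator)
conf   = 0.99;                    % confidence level for the control limits
z      = sqrt(2)*erfinv(conf);    % two-sided standard normal critical value

center = mean(xbar);              % central line
LCL    = center - z*sigma/sqrt(n);
UCL    = center + z*sigma/sqrt(n);

% Indices of samples whose average falls outside the control limits
out = find(xbar < LCL | xbar > UCL);
```

Only base MATLAB functions are used, so the sketch does not depend on the toolbox being installed.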
S Charts
The S chart is a plot of the standard deviation of a process taken at regular
intervals. The standard deviation is a measure of the variability of a process.
So, the plot indicates whether there is any systematic change in the process
variability. Continuing with the piston manufacturing example, we can look at
the standard deviation of each set of four measurements of runout.
schart(runout)
[Figure: Xbar Chart, produced by xbarplot. x-axis: Samples; y-axis: Measurements; reference lines USL, LSL, UCL, and LCL; labeled points 1, 2, 21, 25, 26, 30.]
The average runout is about 0.1 thousandths of an inch. There is no indication
of nonrandom variability.
EWMA Charts
The exponentially-weighted moving average (EWMA) chart is another chart
for monitoring the process average. It operates on slightly different
assumptions than the Xbar chart. The mathematical model behind the Xbar
chart posits that the process mean is actually constant over time and any
variation in individual measurements is due entirely to chance.
The EWMA model is a little looser. Here we assume that the mean may be
varying in time. Here is an EWMA chart of our runout example. Compare this
with the plot in "Xbar Charts" on page 1-138.
ewmaplot(runout,0.5,0.01,spec)
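The EWMA statistic itself is a simple recursion, z(t) = lambda*xbar(t) + (1 - lambda)*z(t-1). The sketch below applies it with lambda = 0.5, the value passed to ewmaplot above, to made-up sample averages. Starting the recursion at the overall mean is an assumption, and the control limits are not computed here.

```matlab
% Made-up sample averages (NOT the runout data).
xbar   = [0.10 0.12 0.08 0.15 0.20 0.18 0.25 0.22]';
lambda = 0.5;                 % weight given to the newest sample average

z     = zeros(size(xbar));
zprev = mean(xbar);           % assumed starting value for the recursion
for t = 1:numel(xbar)
    % New EWMA value: blend of the current average and the previous EWMA
    z(t)  = lambda*xbar(t) + (1 - lambda)*zprev;
    zprev = z(t);
end
```

A smaller lambda gives more weight to the history of the process, which smooths the chart and makes slow drifts in the mean easier to see.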
[Figure: S Chart, produced by schart. x-axis: Sample Number; y-axis: Standard Deviation; reference lines UCL and LCL.]
Capability Studies
Before going into full-scale production, many manufacturers run a pilot study
to determine whether their process can actually build parts to the
specifications demanded by the engineering drawing.
Using the data from these capability studies with a statistical model allows us
to get a preliminary estimate of the percentage of parts that will fall outside
the specifications.
[p,Cp,Cpk] = capable(mean(runout),spec)
p =
1.3940e-09
Cp =
2.3950
Cpk =
1.9812
[Figure: Exponentially Weighted Moving Average (EWMA) Chart, produced by ewmaplot. x-axis: Sample Number; y-axis: EWMA; reference lines USL, LSL, UCL, and LCL; labeled points 21, 25, 26.]
The result above shows that the probability (p = 1.3940e-09) of observing an
unacceptable runout is extremely low. Cp and Cpk are two popular capability
indices.
$C_p$ is the ratio of the range of the specifications to six times the estimate of the
process standard deviation.
$$C_p = \frac{USL - LSL}{6\sigma}$$
For a process that has its average value on target, a $C_p$ of 1 translates to a little
more than one defect per thousand. Recently many industries have set a
quality goal of one part per million. This would correspond to a $C_p$ of 1.6. The
higher the value of $C_p$, the more capable the process.
$C_{pk}$ is the ratio of the difference between the process mean and the closer
specification limit to three times the estimate of the process standard
deviation.
$$C_{pk} = \min\left(\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right)$$
where the process mean is $\mu$. For processes that do not maintain their average
on target, $C_{pk}$ is a more descriptive index of process capability.
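As a check on the two definitions, the indices can be computed directly from a mean and a standard deviation. This sketch assumes a normal model and uses illustrative values mu = 0.0864 and sigma = 0.0696, chosen here to be roughly consistent with the output shown earlier. They are assumptions, not values reported by capable.

```matlab
spec  = [-0.5 0.5];     % lower and upper specification limits
mu    = 0.0864;         % assumed process mean (illustrative)
sigma = 0.0696;         % assumed process standard deviation (illustrative)

% Capability indices as defined in the text
Cp  = (spec(2) - spec(1)) / (6*sigma);
Cpk = min((spec(2) - mu)/(3*sigma), (mu - spec(1))/(3*sigma));

% Probability of falling outside the specifications under a normal model,
% using the standard normal cdf Phi(z) = 0.5*erfc(-z/sqrt(2)).
Phi = @(z) 0.5*erfc(-z/sqrt(2));
p   = Phi((spec(1) - mu)/sigma) + Phi((mu - spec(2))/sigma);
```

Because the assumed mean is off target, Cpk comes out smaller than Cp, which is the situation the text describes.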
Design of Experiments (DOE)
There is a world of difference between data and information. To extract
information from data you have to make assumptions about the system that
generated the data. Using these assumptions and physical theory you may be
able to develop a mathematical model of the system.
Generally, even rigorously formulated models have some unknown constants.
The goal of experimentation is to acquire data that allow us to estimate these
constants.
But why do we need to experiment at all? We could instrument the system we
want to study and just let it run. Sooner or later we would have all the data we
could use.
In fact, this is a fairly common approach. However, there are three characteristics of
historical data that pose problems for statistical modeling:
• Suppose we observe a change in the operating variables of a system followed
  by a change in the outputs of the system. That does not necessarily mean
  that the change in the system caused the change in the outputs.
• A common assumption in statistical modeling is that the observations are
  independent of each other. This is not the way a system in normal operation
  works.
• Controlling a system in operation often means changing system variables in
  tandem. But if two variables change together, it is impossible to separate
  their effects mathematically.
Designed experiments directly address these problems. The overwhelming
advantage of a designed experiment is that you actively manipulate the system
you are studying. With DOE you may generate fewer data points than by using
passive instrumentation, but the quality of the information you get will be
higher.
The Statistics Toolbox provides several functions for generating experimental
designs appropriate to various situations. These are discussed in the following
sections:
• Full Factorial Designs
• Fractional Factorial Designs
• D-Optimal Designs
Full Factorial Designs
Suppose you want to determine whether the variability of a machining process
is due to the difference in the lathes that cut the parts or the operators who run
the lathes.
If the same operator always runs a given lathe then you cannot tell whether
the machine or the operator is the cause of the variation in the output. By
allowing every operator to run every lathe you can separate their effects.
This is a factorial approach. fullfact is the function that generates the design.
Suppose we have four operators and three machines. What is the factorial
design?
d = fullfact([4 3])
d =
1 1
2 1
3 1
4 1
1 2
2 2
3 2
4 2
1 3
2 3
3 3
4 3
Each row of d represents one operator/machine combination. Note that there
are 4*3 = 12 rows.
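The same enumeration can be sketched with base MATLAB, which makes the structure of the design explicit. This is an illustration for two factors only, not how fullfact is implemented.

```matlab
levels = [4 3];                        % four operators, three machines

% ndgrid enumerates every combination of the two factor levels
[op, machine] = ndgrid(1:levels(1), 1:levels(2));

% One row per operator/machine combination, operator varying fastest
d = [op(:) machine(:)];
```

The row order matches the listing above because ndgrid varies its first output along the rows.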
One special subclass of factorial designs is when all the variables take only two
values. Suppose you want to quickly determine the sensitivity of a process to
high and low values of three variables.
d2 = ff2n(3)
d2 =
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
There are $2^3 = 8$ combinations to check.
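A two-level design is just the binary representation of the run numbers. The following sketch reproduces the listing above with base MATLAB; it is an illustration, not the ff2n source.

```matlab
n  = 3;                          % number of two-level variables

% Binary digits of 0 .. 2^n - 1, converted from characters to numbers
d2 = dec2bin(0:2^n-1) - '0';
```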
Fractional Factorial Designs
One difficulty with factorial designs is that the number of combinations
increases exponentially with the number of variables you want to manipulate.
For example, the sensitivity study discussed above might be impractical if
there were seven variables to study instead of just three. A full factorial design
would require $2^7 = 128$ runs!
If we assume that the variables do not act synergistically in the system, we can
assess the sensitivity with far fewer runs. The theoretical minimum number is
eight. A design known as the Plackett-Burman design uses a Hadamard matrix
to define this minimal number of runs. To see the design (X) matrix for the
Plackett-Burman design, we use the hadamard function.
X = hadamard(8)
X =
1 1 1 1 1 1 1 1
1 -1 1 -1 1 -1 1 -1
1 1 -1 -1 1 1 -1 -1
1 -1 -1 1 1 -1 -1 1
1 1 1 1 -1 -1 -1 -1
1 -1 1 -1 -1 1 -1 1
1 1 -1 -1 -1 -1 1 1
1 -1 -1 1 -1 1 1 -1
The last seven columns are the actual variable settings (-1 for low, 1 for high).
The first column (all ones) allows us to measure the mean effect in the linear
equation, $y = X\beta + \varepsilon$.
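Because the columns of a Hadamard matrix are mutually orthogonal, the least squares estimates of the mean and the seven main effects separate cleanly. The sketch below shows this with made-up, noise-free effects; the numbers in beta are purely illustrative.

```matlab
X = hadamard(8);                 % design matrix: column of ones plus 7 settings
G = X'*X;                        % orthogonal columns give 8 times the identity

beta = [10 2 0 0 -1 0 0 0.5]';   % made-up mean and seven main effects
y    = X*beta;                   % noise-free responses for the eight runs

% Least squares estimate; inv(X'*X) is eye(8)/8, so no matrix inverse is needed
beta_hat = (X'*y)/8;
```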
The Plackett-Burman design enables us to study the main (linear) effects of
each variable with a small number of runs. It does this by using a fraction, in
this case 8/128, of the runs required for a full factorial design. A drawback of
this design is that if the effect of one variable does vary with the value of
another variable, then the estimated effects will be biased (that is, they will
tend to be off by a systematic amount).
At a cost of a somewhat larger design, we can find a fractional factorial that is
much smaller than a full factorial, but that does allow estimation of main
effects independent of interactions between pairs of variables. We can do this
by specifying generators that control the confounding between variables.
As an example, suppose we create a design with the first four variables varying
independently as in a full factorial, but with the other three variables formed
by multiplying different triplets of the first four. With this design the effects of
the last three variables are confounded with three-way interactions among the
first four variables. The estimated effect of any single variable, however, is not
confounded with (is independent of) interaction effects between any pair of
variables. Interaction effects are confounded with each other. Box, Hunter, and
Hunter (1978) present the properties of these designs and provide the
generators needed to produce them.
The fracfact function can produce this fractional factorial design using the
generator strings that Box, Hunter, and Hunter provide.
X = fracfact('a b c d abc bcd acd')
X =
-1 -1 -1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 1
-1 -1 1 -1 1 1 1
-1 -1 1 1 1 -1 -1
-1 1 -1 -1 1 1 -1
-1 1 -1 1 1 -1 1
-1 1 1 -1 -1 -1 1
-1 1 1 1 -1 1 -1
1 -1 -1 -1 1 -1 1
1 -1 -1 1 1 1 -1
1 -1 1 -1 -1 1 -1
1 -1 1 1 -1 -1 1
1 1 -1 -1 -1 1 1
1 1 -1 1 -1 -1 -1
1 1 1 -1 1 -1 -1
1 1 1 1 1 1 1
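The generators can be applied by hand to see where the last three columns come from. This is a minimal sketch in base MATLAB, assuming the same run order as the listing above; it is not the fracfact implementation.

```matlab
% Full factorial in the first four variables, levels -1 and 1 (16 runs)
base = 2*(dec2bin(0:15) - '0') - 1;
a = base(:,1); b = base(:,2); c = base(:,3); d = base(:,4);

% Last three columns are products of triplets: generators abc, bcd, acd
X = [a b c d a.*b.*c b.*c.*d a.*c.*d];

G = X'*X;                 % all seven columns are mutually orthogonal

% A two-way interaction column, for example a*b, is orthogonal to every
% column of X, so it does not bias any main-effect estimate.
ab       = a.*b;
confound = X'*ab;
```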
D-Optimal Designs
All the designs above were in use by early in the 20th century. In the 1970s
statisticians started to use the computer in experimental design by recasting
the design of experiments (DOE) in terms of optimization. A D-optimal design
is one that maximizes the determinant of Fisher's information matrix, $X^T X$.
This matrix is proportional to the inverse of the covariance matrix of the
parameters. So maximizing $\det(X^T X)$ is equivalent to minimizing the
determinant of the covariance of the parameters.
A D-optimal design minimizes the volume of the confidence ellipsoid of the
regression estimates of the linear model parameters, $\beta$.
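To make the criterion concrete, the sketch below compares the determinant of the information matrix for two made-up four-run designs under a linear model with two inputs. Both designs are illustrative choices, not output from the toolbox.

```matlab
A = [-1 -1; -1 1; 1 -1; 1 1];            % runs at the corners of the square
B = [-1 -1; -0.5 0.5; 0.5 -0.5; 1 1];    % two runs pulled toward the center

XA = [ones(4,1) A];       % model matrix: constant plus two linear terms
XB = [ones(4,1) B];

dA = det(XA'*XA);         % D-criterion for design A
dB = det(XB'*XB);         % D-criterion for design B
```

Design A has the larger determinant, so by the D-criterion it is the better of the two for this model.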
There are several functions in the Statistics Toolbox that generate D-optimal
designs. These are cordexch, daugment, dcovary, and rowexch. The following
sections explore D-optimal design in greater detail:
• Generating D-Optimal Designs
• Augmenting D-Optimal Designs
• Designing Experiments with Uncontrolled Inputs
Generating D-Optimal Designs
cordexch and rowexch are two competing optimization algorithms for
computing a D-optimal design given a model specification.
Both cordexch and rowexch are iterative algorithms. They operate by
improving a starting design by making incremental changes to its elements. In
the coordinate exchange algorithm, the increments are the individual elements
of the design matrix. In row exchange, the elements are the rows of the design
matrix. Atkinson and Donev (1992) is a reference.
To generate a D-optimal design you must specify the number of inputs, the
number of runs, and the order of the model you want to fit.
Both cordexch and rowexch take the following strings to specify the model:
• 'linear' or 'l': the default model with constant and first order terms
• 'interaction' or 'i': includes constant, linear, and cross product terms
• 'quadratic' or 'q': interactions plus squared terms
• 'purequadratic' or 'p': includes constant, linear, and squared terms
Alternatively, you can use a matrix of integers to specify the terms. Details are
in the help for the utility function x2fx.
For a simple example using the coordinate-exchange algorithm, consider the
problem of quadratic modeling with two inputs. The model form is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \varepsilon$$
Suppose we want the D-optimal design for fitting this model with nine runs.
settings = cordexch(2,9,'q')
settings =
-1 1
1 1
0 1
1 -1
-1 -1
0 -1
1 0
0 0
-1 0
We can plot the columns of settings against each other to get a better picture
of the design.
h = plot(settings(:,1),settings(:,2),'.');
set(gca,'Xtick',[-1 0 1])
set(gca,'Ytick',[-1 0 1])
set(h,'Markersize',20)
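The nine settings above can be expanded into the model matrix for the quadratic model to confirm that all six coefficients are estimable. This sketch builds the columns by hand; the column order (constant, linear, cross product, squared) is an assumption made for illustration.

```matlab
% Nine-run design as printed above
settings = [-1 1; 1 1; 0 1; 1 -1; -1 -1; 0 -1; 1 0; 0 0; -1 0];
x1 = settings(:,1);
x2 = settings(:,2);

% Quadratic model matrix: constant, linear, cross-product, squared terms
X = [ones(9,1) x1 x2 x1.*x2 x1.^2 x2.^2];

r = rank(X);              % full column rank means every term can be estimated
D = det(X'*X);            % the quantity that a D-optimal design maximizes
```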
For a simple example using the row-exchange algorithm, consider the
interaction model with two inputs. The model form is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$
Suppose we want the D-optimal design for fitting this model with four runs.
[settings, X] = rowexch(2,4,'i')
settings =
-1 1
-1 -1
1 -1
1 1
X =
1 -1 1 -1
1 -1 -1 1
1 1 -1 -1
1 1 1 1
The settings matrix shows how to vary the inputs from run to run. The X matrix
is the design matrix for fitting the above regression model. The first column of X
is for fitting the constant term. The last column is the element-wise product of
the second and third columns.
The associated plot is simple but elegant.
h = plot(settings(:,1),settings(:,2),'.');
set(gca,'Xtick',[-1 0 1])
set(gca,'Ytick',[-1 0 1])
set(h,'Markersize',20)
Augmenting D-Optimal Designs
In practice, experimentation is an iterative process. We often want to add runs
to a completed experiment to learn more about our system. The function
daugment allows you to choose these extra runs optimally.
Suppose we have executed the eight-run design below for fitting a linear model
to four input variables.
settings = cordexch(4,8)
settings =
1 -1 1 1
-1 -1 1 -1
-1 1 1 1
1 1 1 -1
-1 1 -1 1
1 -1 -1 1
-1 -1 -1 -1
1 1 -1 -1
This design is adequate to fit the linear model for four inputs, but cannot fit the
six cross-product (interaction) terms. Suppose we are willing to do eight more
runs to fit these extra terms. Here's how.
[augmented, X] = daugment(settings,8,'i');
augmented
augmented =
1 -1 1 1
-1 -1 1 -1
-1 1 1 1
1 1 1 -1
-1 1 -1 1
1 -1 -1 1
-1 -1 -1 -1
1 1 -1 -1
-1 -1 -1 1
1 1 1 1
-1 -1 1 1
-1 1 1 -1
1 -1 1 -1
1 -1 -1 -1
-1 1 -1 -1
1 1 -1 1
info = X'*X
info =
16 0 0 0 0 0 0 0 0 0 0
0 16 0 0 0 0 0 0 0 0 0
0 0 16 0 0 0 0 0 0 0 0
0 0 0 16 0 0 0 0 0 0 0
0 0 0 0 16 0 0 0 0 0 0
0 0 0 0 0 16 0 0 0 0 0
0 0 0 0 0 0 16 0 0 0 0
0 0 0 0 0 0 0 16 0 0 0
0 0 0 0 0 0 0 0 16 0 0
0 0 0 0 0 0 0 0 0 16 0
0 0 0 0 0 0 0 0 0 0 16
The augmented design is orthogonal, since X'*X is a multiple of the identity
matrix. In fact, this design is the same as a $2^4$ factorial design.
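The orthogonality claim can be checked without the toolbox by building the interaction model matrix for a $2^4$ factorial directly. This is a sketch under the assumption that the model columns are the constant, the four inputs, and the six pairwise products.

```matlab
base  = 2*(dec2bin(0:15) - '0') - 1;     % 2^4 factorial, levels -1 and 1
pairs = nchoosek(1:4,2);                 % the six pairs of inputs

% Constant, four linear terms, six cross-product terms: 11 columns
X = [ones(16,1) base base(:,pairs(:,1)).*base(:,pairs(:,2))];

info = X'*X;                             % 16 times the 11-by-11 identity
```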
Designing Experiments with Uncontrolled Inputs
Sometimes it is impossible to control every experimental input. But you may
know the values of some inputs in advance. An example is the time each run
takes place. If a process is experiencing linear drift, you may want to include
the time of each test run as a variable in the model.
The function dcovary allows you to choose the settings for each run in order to
maximize your information despite a linear drift in the process.
Suppose we want to execute an eight-run experiment with three factors that is
optimal with respect to a linear drift in the response over time. First we create
our drift input variable. Note that drift is normalized to have mean zero. Its
minimum is -1 and its maximum is 1.
drift = (linspace(-1,1,8))'
drift =
-1.0000
-0.7143
-0.4286
-0.1429
0.1429
0.4286
0.7143
1.0000
settings = dcovary(3,drift,'linear')
settings =
1.0000 1.0000 -1.0000 -1.0000
-1.0000 -1.0000 -1.0000 -0.7143
-1.0000 1.0000 1.0000 -0.4286
1.0000 -1.0000 1.0000 -0.1429
-1.0000 1.0000 -1.0000 0.1429
1.0000 1.0000 1.0000 0.4286
-1.0000 -1.0000 1.0000 0.7143
1.0000 -1.0000 -1.0000 1.0000
Demos
The Statistics Toolbox has demonstration programs that create an interactive
environment for exploring the probability distributions, random number
generation, curve fitting, and design of experiments functions. Most of them
provide a graphical user interface that can be used with your real data, not just
with the sample data provided.
The available demos are listed below.
Demo          Purpose
aoctool       Interactive graphic prediction of anocova fits
disttool      Graphic interaction with probability distributions
glmdemo       Generalized linear models slide show
nlintool      Interactive fitting of nonlinear models
polytool      Interactive graphic prediction of polynomial fits
randtool      Interactive control of random number generation
robustdemo    Interactive comparison of robust and least squares fits
rsmdemo       Design of experiments and regression modeling
rstool        Exploring graphs of multidimensional polynomials
stepwise      Interactive stepwise regression
Most of these functions are described below. The nlintool, rstool, and
stepwise demos are discussed in earlier sections:
• nlintool: "An Interactive GUI for Nonlinear Fitting and Prediction" on
  page 1-104
• rstool: "Exploring Graphs of Multidimensional Polynomials" on page 1-86
• stepwise: "Example: Stepwise Regression" on page 1-88
The disttool Demo
disttool is a graphic environment for developing an intuitive understanding
of probability distributions.
The disttool demo has the following features:
• A graph of the cdf (pdf) for the given parameters of a distribution.
• A pop-up menu for changing the distribution function.
• A pop-up menu for changing the function type (cdf <-> pdf).
• Sliders to change the parameter settings.
• Data entry boxes to choose specific parameter values.
• Data entry boxes to change the limits of the parameter sliders.
• Draggable horizontal and vertical reference lines to do interactive evaluation
  of the function at varying values.
• A data entry box to evaluate the function at a specific x-value.
• For cdf plots, a data entry box on the probability axis (y-axis) to find critical
  values corresponding to a specific probability.
• A Close button to end the demonstration.
[Figure: disttool window with callouts: function type pop-up, cdf function, draggable vertical reference line, parameter value, distributions pop-up, cdf value, x value, parameter control, draggable horizontal reference line, upper and lower parameter bounds.]
The polytool Demo
The polytool demo is an interactive graphic environment for polynomial curve
fitting and prediction.
The polytool demo has the following features:
• A graph of the data, the fitted polynomial, and global confidence bounds on
  a new predicted value.
• y-axis text to display the predicted y-value and its uncertainty at the current
  x-value.
• A data entry box to change the degree of the polynomial fit.
• A data entry box to evaluate the polynomial at a specific x-value.
• A draggable vertical reference line to do interactive evaluation of the
  polynomial at varying x-values.
• Bounds and Method menus to control the confidence bounds and choose
  between least squares or robust fitting.
• A Close button to end the demonstration.
• An Export list box to store fit results into variables.
You can use polytool to do curve fitting and prediction for any set of x-y data,
but, for the sake of demonstration, the Statistics Toolbox provides a data set
(polydata.mat) to teach some basic concepts.
To start the demonstration, you must first load the data set.
load polydata
who
Your variables are:
x x1 y y1
The variables x and y are observations made with error from a cubic
polynomial. The variables x1 and y1 are data points from the true function
without error.
If you do not specify the degree of the polynomial, polytool does a linear fit to
the data.
polytool(x,y)
The linear fit is not very good. The bulk of the data with x-values between zero
and two has a steeper slope than the fitted line. The two points to the right are
dragging down the estimate of the slope.
In the Degree box at the top, type 3 for a cubic model. Then, drag the vertical
reference line to the x-value of 2 (or type 2 in the X Values text box).
[Figure: polytool window with callouts: predicted value, polynomial degree, 95% confidence interval, draggable reference line, lower confidence bound, fitted line, upper confidence bound, x-value, data point.]
This graph shows a much better fit to the data. The confidence bounds are
closer together indicating that there is less uncertainty in prediction. The data
at both ends of the plot tracks the fitted curve.
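The effect of the polynomial degree on prediction uncertainty can also be seen from the command line with the base MATLAB functions polyfit and polyval. The sketch below uses made-up data, not polydata.mat, and the error estimate delta returned by polyval is not the same quantity as the 95% bounds that polytool draws.

```matlab
% Made-up data: a cubic trend plus a small deterministic disturbance.
x = linspace(0,3,20)';
y = 1 + 2*x - 3*x.^2 + x.^3 + 0.2*sin(7*x);

[p1,S1] = polyfit(x,y,1);      % linear fit and its error-estimation structure
[p3,S3] = polyfit(x,y,3);      % cubic fit and its error-estimation structure

% Predicted value and prediction error estimate at x = 2 for each fit
[yhat1,delta1] = polyval(p1,2,S1);
[yhat3,delta3] = polyval(p3,2,S3);
```

The cubic fit leaves a smaller residual norm and a smaller error estimate at x = 2, mirroring the narrower bounds described above.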
The following sections explore additional aspects of the tool:
• Confidence Bounds
• Overfitting
Confidence Bounds
By default, the confidence bounds are nonsimultaneous bounds for a new
observation. What does this mean? Let p(x) be the true but unknown function
we want to estimate. The graph contains the following three curves:
• f(x), our fitted function
• l(x), the lower confidence bounds
• u(x), the upper confidence bounds
Suppose we plan to take a new observation at the value $x_{n+1}$. Call it
$y_{n+1}(x_{n+1})$. This new observation has its own error $\varepsilon_{n+1}$, so it satisfies the
equation
$$y_{n+1}(x_{n+1}) = p(x_{n+1}) + \varepsilon_{n+1}$$
What are the likely values for this new observation? The confidence bounds
provide the answer. The interval [$l_{n+1}$, $u_{n+1}$] is a 95% confidence bound for
$y_{n+1}(x_{n+1})$.
These are the default bounds, but the Bounds menu on the polytool figure
window provides options for changing the meaning of these bounds. This menu
has options that let you specify whether the bounds are to apply to the
estimated function or to a new observation, and whether the bounds should be
simultaneous or not. Using these options you can produce any of the following
types of confidence bounds.
Simultaneous?      For Quantity    Yields Confidence Bounds for
Nonsimultaneous    Observation     $y_{n+1}(x_{n+1})$
Nonsimultaneous    Curve           $p(x_{n+1})$
Simultaneous       Observation     $y_{n+1}(x)$, globally for any x
Simultaneous       Curve           $p(x)$, simultaneously for all x
Overfitting
If the cubic polynomial is a good fit, it is tempting to try a higher order
polynomial to see if even more precise predictions are possible.
Since the true function is cubic, this amounts to overfitting the data. Use the
data entry box for degree and type 5 for a quintic model.
As measured by the confidence bounds, the fit is precise near the data points.
But, in the region between the data groups, the uncertainty of prediction rises
dramatically.
This bulge in the confidence bounds happens because the data really does not
contain enough information to estimate the higher order polynomial terms
precisely, so even interpolation using polynomials can be risky in some cases.
The aoctool Demo
The aoctool demo is an interactive graphical environment for fitting and
prediction with analysis of covariance (anocova) models. It is similar to the
polytool demo.
Analysis of covariance is a technique for analyzing grouped data having a
response (y, the variable to be predicted) and a predictor (x, the variable used
to do the prediction). Using analysis of covariance, you can model y as a linear
function of x, with the coefficients of the line possibly varying from group to
group. The aoctool function fits the following models for the ith group:
1  same mean         $y = \alpha + \varepsilon$
2  separate means    $y = (\alpha + \alpha_i) + \varepsilon$
3  same line         $y = \alpha + \beta x + \varepsilon$
4  parallel lines    $y = (\alpha + \alpha_i) + \beta x + \varepsilon$
5  separate lines    $y = (\alpha + \alpha_i) + (\beta + \beta_i)x + \varepsilon$
In the fourth model, for example, the intercept varies from one group to the
next, but the slope is the same for each group. In the first model, there is a
common intercept and no slope. In order to make the group coefficients well
determined, we impose the constraints $\sum \alpha_i = \sum \beta_i = 0$.
The aoctool demo displays the results of the fit in three figure windows. One
window displays estimates of the coefficients ($\alpha$, $\alpha_i$, $\beta$, $\beta_i$). A second displays an
analysis of variance table that you can use to test whether a more complex
model is significantly better than a simpler one. The third, main graphics
window has the following features:
• A graph of the data with superimposed fitted lines and optional confidence
  bounds.
• y-axis text to display the predicted y-value and its uncertainty at the current
  x-value for the current group, if a group is currently selected.
• A data entry box to evaluate the fit at a specific x-value.
• A list box to evaluate the fit for a specific group or to display fitted lines for
  all groups.
• A draggable vertical reference line to do interactive evaluation of the fit at
  varying x-values.
• A Close button to end the demonstration.
• An Export list box to store fit results into variables.
The following section provides an illustrative example.
Example: aoctool with Sample Data
The Statistics Toolbox has a small data set named carsmall with information
about cars. It is a good sample data set to use with aoctool. You can also use
aoctool with your own data.
To start the demonstration, load the data set.
load carsmall
who
Your variables are:
Acceleration Horsepower Model_Year
Cylinders MPG Origin
Displacement Model Weight
Suppose we want to study the relationship between the weight of a car and its
mileage, and whether this relationship has changed over the years.
Next, start up the tool.
[h,atab,ctab,stats] = aoctool(Weight,MPG,Model_Year);
Note: 6 observations with missing values have been removed.
The graphical output consists of the following main window, plus a table of
coefficient estimates and an analysis of variance table.
The group of each data point is coded by its color and symbol, and the fit for
each group has the same color as the data points.
The initial fit models the y variable, MPG, as a linear function of the x variable,
Weight. Each group has a separate line. The coefficients of the three lines
appear in the figure titled ANOCOVA Coefficients. You can see that the slopes
are roughly -0.0078, with a small deviation for each group:
Model year 70:  $y = (45.9798 - 8.5805) + (-0.0078 + 0.002)x + \varepsilon$
Model year 76:  $y = (45.9798 - 3.8902) + (-0.0078 + 0.0011)x + \varepsilon$
Model year 82:  $y = (45.9798 + 12.4707) + (-0.0078 - 0.0031)x + \varepsilon$
Notice that the three fitted lines have slopes that are roughly similar. Could
they really be the same? The Model_Year*Weight interaction expresses the
difference in slopes, and the ANOVA table shows a test for the significance of
this term. With an F statistic of 5.23 and a p-value of 0.0072, the slopes are
significantly different.
To examine the fits when the slopes are constrained to be the same, return to
the ANOCOVA Prediction Plot window and use the Model pop-up to select a
Parallel Lines model. The window updates to show the graph below.
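The coefficient estimates quoted for the three model years can be combined into one intercept and one slope per group. The sketch below does this arithmetic, assuming the signs implied by the sum-to-zero constraints on the group offsets; the evaluation weight of 3000 is an arbitrary illustrative choice.

```matlab
alpha   = 45.9798;                     % common intercept
alpha_i = [-8.5805 -3.8902 12.4707];   % intercept offsets for years 70, 76, 82
beta    = -0.0078;                     % common slope
beta_i  = [0.002 0.0011 -0.0031];      % slope offsets for years 70, 76, 82

intercepts = alpha + alpha_i;          % one intercept per model year
slopes     = beta + beta_i;            % one slope per model year

w   = 3000;                            % illustrative car weight
mpg = intercepts + slopes*w;           % predicted MPG for each model year
```

The offsets in each set sum to zero, which is the constraint that makes the group coefficients well determined.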
Though this fit looks reasonable, we know it is significantly worse than the
Separate Lines model. Use the Model pop-up again to return to the original
model.
The following sections focus on two other interesting aspects of aoctool:
• Confidence Bounds
• Multiple Comparisons
Confidence Bounds. Now we have estimates of the relationship between MPG and
Weight for each Model_Year, but how accurate are they? We can superimpose
confidence bounds on the fits by examining them one group at a time. In the
Model_Year menu at the lower right of the figure, change the setting from
All Groups to 82. The data and fits for the other groups are dimmed, and
confidence bounds appear around the 82 fit.
The dashed lines form an envelope around the fitted line for model year 82.
Under the assumption that the true relationship is linear, these bounds
provide a 95% confidence region for the true line. Note that the fits for the other
model years are well outside these confidence bounds for Weight values
between 2000 and 3000.
Sometimes it is more valuable to be able to predict the response value for a new
observation, not just estimate the average response value. Like the polytool
function, the aoctool function has a Bounds menu to change the definition of
the confidence bounds. Use that menu to change from Line to Observation.
The resulting wider intervals reflect the uncertainty in the parameter
estimates as well as the randomness of a new observation.
Also like the polytool function, the aoctool function has crosshairs you can
use to manipulate the Weight and watch the estimate and confidence bounds
along the y-axis update. These values appear only when a single group is
selected, not when All Groups is selected.
Multiple Comparisons. We can perform a multiple comparison test by using the
stats output from aoctool as input to the multcompare function. The
multcompare function can test either slopes, intercepts, or population marginal
means (the heights of the fitted lines evaluated at the mean X value). In this
example, we have already determined that the slopes are not all the same, but
could it be that two are the same and only the other one is different? We can
test that hypothesis.
multcompare(stats,0.05,'on','','s')
ans =
1.0000 2.0000 -0.0012 0.0008 0.0029
1.0000 3.0000 0.0013 0.0051 0.0088
2.0000 3.0000 0.0005 0.0042 0.0079
This matrix shows that the estimated difference between the slopes of
groups 1 and 2 (1970 and 1976) is 0.0008, and a confidence interval for the
difference is [-0.0012, 0.0029]. There is no significant difference between the
two. There are significant differences, however, between the slope for 1982
and each of the other two. The graph shows the same information.
Note that the stats structure was created in the initial call to the aoctool
function, so it is based on the initial model fit (typically a separate-lines model).
If you change the model interactively and want to base your multiple
comparisons on the new model, you need to run aoctool again to get another
stats structure, this time specifying your new model as the initial model.
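Rerunning aoctool with a different initial model can also be sketched at the command line. The sketch below assumes the carsmall sample data (Weight, MPG, Model_Year), and it assumes that aoctool accepts optional arguments for the significance level, axis names, display option, and initial model in the order shown; the multcompare arguments follow the positional form of the call above. Check the argument order and the model name against your toolbox release before relying on it.

```matlab
% Fit a parallel-lines model without opening the interactive tool,
% then compare intercepts. Data set and argument values are illustrative.
load carsmall                                  % Weight, MPG, Model_Year
[h,atab,ctab,stats] = aoctool(Weight,MPG,Model_Year,0.05, ...
    'Weight','MPG','Year','off','parallel lines');

% 'i' requests intercepts; the call shown earlier used 's' for slopes.
c = multcompare(stats,0.05,'off','','i');
disp(c)   % one row per pair of groups: group, group, lower, estimate, upper
```

Because the stats structure now comes from the parallel-lines fit, the comparisons are based on that model rather than on the separate-lines default.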
The randtool Demo
randtool is a graphic environment for generating random samples from
various probability distributions and displaying the sample histogram.
The randtool demo has the following features:
• A histogram of the sample.
• A pop-up menu for changing the distribution function.
• Sliders to change the parameter settings.
• A data entry box to choose the sample size.
• Data entry boxes to choose specific parameter values.
• Data entry boxes to change the limits of the parameter sliders.
• An Output button to output the current sample to the variable ans.
• A Resample button to allow repetitive sampling with constant sample size
and fixed parameters.
• A Close button to end the demonstration.
[Figure: the randtool window, with callouts for the distributions pop-up menu, the histogram, the sample size box, the parameter control, the parameter value, the upper and lower parameter bounds, the button that draws again from the same distribution, and the button that outputs to the variable ans.]
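What randtool does interactively can be approximated from the command line. This is a minimal sketch, assuming a normal distribution with mean 0 and standard deviation 1 and a sample size of 100; those values are chosen only for illustration.

```matlab
% Draw a sample of n values from a normal distribution and plot its
% histogram, as randtool does for the selected distribution.
n = 100;
x = normrnd(0,1,n,1);
hist(x)

% Equivalent of the Resample button: draw again with the same sample
% size and the same parameters, then redraw the histogram.
x = normrnd(0,1,n,1);
hist(x)
```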
The rsmdemo Demo
The rsmdemo utility is an interactive graphic environment that demonstrates
the design of experiments and surface fitting through the simulation of a
chemical reaction. The goal of the demo is to find the levels of the reactants
needed to maximize the reaction rate.
There are two parts to the demo:
Part 1: Compare data gathered through trial and error with data from a
designed experiment.
Part 2: Compare response surface (polynomial) modeling with nonlinear
modeling.
Part 1
Begin the demo by using the sliders in the Reaction Simulator window to
control the partial pressures of three reactants: Hydrogen, n-Pentane, and
Isopentane. Each time you click the Run button, the levels for the reactants
and results of the run are entered in the Trial and Error Data window.
Based on the results of previous runs, you can change the levels of the
reactants to increase the reaction rate. (The results are determined using an
underlying model that takes into account the noise in the process, so even if you
keep all of the levels the same, the results will vary from run to run.) You are
allotted a budget of 13 runs. When you have completed the runs, you can use
the Plot menu on the Trial and Error Data window to plot the relationships
between the reactants and the reaction rate, or click the Analyze button. When
you click Analyze, rsmdemo calls the rstool function, which you can then use
to try to optimize the results.
Next, perform another set of 13 runs, this time from a designed experiment. In
the Experimental Design Data window, click the Do Experiment button.
rsmdemo calls the cordexch function to generate a D-optimal design, and then,
for each run, computes the reaction rate.
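The design step can be sketched directly with cordexch. The call below assumes three factors, 13 runs, and a full quadratic model, which matches the description above; the exact arguments that rsmdemo passes internally are not shown in this guide, so treat these as assumptions.

```matlab
% Generate a D-optimal design for three reactants in 13 runs,
% assuming a full quadratic response surface model.
nfactors = 3;
nruns = 13;
settings = cordexch(nfactors,nruns,'quadratic');

% Each row is one run; each column holds the coded level of one factor.
disp(settings)
```

The coded levels would still need to be mapped to actual partial pressures before running the experiment.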
Now use the Plot menu on the Experimental Design Data window to plot the
relationships between the levels of the reactants and the reaction rate, or click
the Response Surface button to call rstool to find the optimal levels of the
reactants.
Compare the analysis results for the two sets of data. It is likely (though not
certain) that you'll find some or all of these differences:
• You can fit a full quadratic model with the data from the designed
experiment, but the trial and error data may be insufficient for fitting a
quadratic model or interactions model.
• Using the data from the designed experiment, you are more likely to be able
to find levels for the reactants that result in the maximum reaction rate.
Even if you find the best settings using the trial and error data, the
confidence bounds are likely to be wider than those from the designed
experiment.
Part 2
Now analyze the experimental design data with a polynomial model and a
nonlinear model, and compare the results. The true model for the process,
which is used to generate the data, is actually a nonlinear model. However,
within the range of the data, a quadratic model approximates the true model
quite well.
To see the polynomial model, click the Response Surface button on the
Experimental Design Data window. rsmdemo calls rstool, which fits a full
quadratic model to the data. Drag the reference lines to change the levels of the
reactants, and find the optimal reaction rate. Observe the width of the
confidence intervals.
Now click the Nonlinear Model button on the Experimental Design Data
window. rsmdemo calls nlintool, which fits a Hougen-Watson model to the
data. As with the quadratic model, you can drag the reference lines to change
the reactant levels. Observe the reaction rate and the confidence intervals.
Compare the analysis results for the two models. Even though the true model
is nonlinear, you may find that the polynomial model provides a good fit.
Because polynomial models are much easier to fit and work with than
nonlinear models, a polynomial model is often preferable even when modeling
a nonlinear process. Keep in mind, however, that such models are unlikely to
be reliable for extrapolating outside the range of the data.
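The same comparison can be sketched without the GUI. The example below assumes the reaction.mat sample data (variables reactants, rate, and beta) and the hougen model function shipped with the toolbox; it is not the data that rsmdemo simulates, and the function-handle form of the nlinfit call is an assumption about your release.

```matlab
% Compare a full quadratic polynomial fit with a Hougen-Watson
% nonlinear fit on the reaction kinetics sample data.
load reaction                      % reactants, rate, beta (starting values)

% Polynomial (response surface) model: full quadratic in three factors.
D = x2fx(reactants,'quadratic');   % design matrix, including constant term
b = regress(rate,D);
resPoly = rate - D*b;

% Nonlinear model: Hougen-Watson, starting from the supplied beta.
betahat = nlinfit(reactants,rate,@hougen,beta);
resNlin = rate - hougen(betahat,reactants);

% Residual norms give a rough comparison of fit within the data range.
disp([norm(resPoly) norm(resNlin)])
```

The quadratic model uses ten coefficients against five for the Hougen-Watson model, so a smaller residual norm for the polynomial does not by itself indicate a better model.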
The glmdemo Demo
The glmdemo function presents a simple slide show describing generalized
linear models. It presents examples of what functions and distributions are
available with generalized linear models. It presents an example where
traditional linear least squares fitting is not appropriate, and shows how to use
the glmfit function to fit a logistic regression model and the glmval function
to compute predictions from that model.
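A logistic regression fit of the kind the slide show describes might look like the sketch below. The counts are illustrative values made up for this example (number of items tested and number failing at each level of a predictor w); they are not taken from glmdemo.

```matlab
% Illustrative binomial data: at each value of w, 'total' items were
% tested and 'poor' of them failed.
w     = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
poor  = [1 2 0 3 8 8 14 17 19 15 17 21]';

% Fit a logistic regression model (binomial distribution, default
% logit link) and compute fitted proportions from it.
b = glmfit(w,[poor total],'binomial');
yfit = glmval(b,w,'logit');

plot(w,poor./total,'x',w,yfit,'-')
```

Unlike a straight-line least squares fit to the observed proportions, the fitted values stay between 0 and 1.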
The robustdemo Demo
The robustdemo function presents a simple comparison of least squares and
robust fits for a response and a single predictor. You can use robustdemo with
your own data or with the sample data provided.
To begin using robustdemo with the built-in sample data, simply type the
function name.
robustdemo
The resulting figure presents a scatter plot with two fitted lines. One line is the
fit from an ordinary least squares regression. The other is from a robust
regression. Along the bottom of the figure are the equations for the fitted line
and the estimated error standard deviation for each fit.
The effect of any point on the least squares fit depends on the residual and
leverage for that point. The residual is simply the vertical distance from the
point to the line. The leverage is a measure of how far the point is from the
center of the X data.
The effect of any point on the robust fit also depends on the weight assigned to
the point. Points far from the line get lower weight.
You can use the right mouse button to click on any point and see its least
squares leverage and robust weight.
In this example, the rightmost point has a leverage value of 0.35. It is also far
from the line, so it exerts a large influence on the least squares fit. It has a
small weight, though, so it is effectively excluded from the robust fit.
Using the left mouse button, you can experiment to see how changes in the data
affect the two fits. Select any point, and drag it to a new location while holding
the left button down. When you release the point, both fits update.
Bringing the rightmost point closer to the line makes the two fitted lines nearly
identical. Now, the point has nearly full weight in the robust fit.
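The quantities that robustdemo displays can also be computed directly. The sketch below uses made-up data (ten points near a line, with the last one moved far off it) and assumes the robustfit function with its default weighting; it is not the demo's built-in data set.

```matlab
% Illustrative data: points near the line y = 10 - 2x, with the
% rightmost point moved far from the line.
x = (1:10)';
y = 10 - 2*x + 0.2*cos(3*x);
y(10) = 0;

% Ordinary least squares fit and leverage (diagonal of the hat matrix).
X = [ones(size(x)) x];
bls = regress(y,X);
h = diag(X*((X'*X)\X'));

% Robust fit; stats.w holds the final weight given to each point.
[brob,stats] = robustfit(x,y);

disp([bls brob])              % coefficients: least squares vs. robust
disp([h(10) stats.w(10)])     % leverage and robust weight of last point
```

The last point has the largest leverage because it lies farthest from the center of the x data, and the robust fit gives it a lower weight than any of the points near the line.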
Selected Bibliography
Atkinson, A.C., and A.N. Donev, Optimum Experimental Designs, Oxford
Science Publications 1992.
Bates, D. and D. Watts. Nonlinear Regression Analysis and Its Applications,
John Wiley and Sons. 1988. pp. 271-272.
Bernoulli, J., Ars Conjectandi, Basiliea: Thurnisius [11.19], 1713.
Box, G.E.P., W.G. Hunter, and J.S. Hunter. Statistics for Experimenters. Wiley,
New York. 1978.
Chatterjee, S. and A.S. Hadi. Influential Observations, High Leverage Points,
and Outliers in Linear Regression. Statistical Science, 1986. pp. 379-416.
Dobson, A. J., An Introduction to Generalized Linear Models, 1990, CRC Press.
Efron, B., and R.J. Tibshirani. An Introduction to the Bootstrap, Chapman and
Hall, New York. 1993.
Evans, M., N. Hastings, and B. Peacock. Statistical Distributions, Second
Edition. John Wiley and Sons, 1993.
Hald, A., Statistical Theory with Engineering Applications, John Wiley and
Sons, 1960. p. 647.
Hogg, R.V., and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
Johnson, N., and S. Kotz. Distributions in Statistics: Continuous Univariate
Distributions. John Wiley and Sons, 1970.
McCullagh, P., and J. A. Nelder, Generalized Linear Models, 2nd edition, 1990,
Chapman and Hall.
Moore, J., Total Biochemical Oxygen Demand of Dairy Manures. Ph.D. thesis.
University of Minnesota, Department of Agricultural Engineering, 1975.
Poisson, S.D., Recherches sur la Probabilité des Jugements en Matière
Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des
Probabilités. Paris: Bachelier, Imprimeur-Libraire pour les Mathématiques,
1837.
Student, On the Probable Error of the Mean. Biometrika, 6:1908. pp. 1-25.
Weibull, W., A Statistical Theory of the Strength of Materials. Ingeniörs
Vetenskaps Akademiens Handlingar, Royal Swedish Institute for Engineering
Research. Stockholm, Sweden, No. 153. 1939.

2
Reference
This chapter contains detailed descriptions of all the Statistics Toolbox
functions. It is divided into two sections:
• Function Category List: a list of functions, grouped by subject area
• Function descriptions in alphabetical order
Function Category List
The Statistics Toolbox provides several categories of functions.
The Statistics Toolbox's Main Categories of Functions
Probability Distributions     Parameter Estimation
                              Cumulative Distribution Functions (cdf)
                              Probability Density Functions (pdf)
                              Inverse Cumulative Distribution Functions
                              Random Number Generators
                              Moments of Distribution Functions
Descriptive Statistics        Descriptive statistics for data samples
Statistical Plotting          Statistical plots
Statistical Process Control   Statistical Process Control
Cluster Analysis              Grouping items with similar characteristics
                              into clusters
Linear Models                 Fitting linear models to data
Nonlinear Regression          Fitting nonlinear regression models
Design of Experiments         Design of Experiments
Principal Components          Principal Components Analysis
Analysis
Hypothesis Tests              Statistical tests of hypotheses
File I/O                      Reading data from and writing data to
                              operating-system files
Demonstrations                Demonstrations
Data                          Data for examples
The following tables list the functions in each of these specific areas. The first
seven tables contain probability distribution functions. The remaining tables
describe the other categories of functions.
Parameter Estimation
betafit       Parameter estimation for the beta distribution
betalike      Beta log-likelihood function
binofit       Parameter estimation for the binomial distribution
expfit        Parameter estimation for the exponential distribution
gamfit        Parameter estimation for the gamma distribution
gamlike       Gamma log-likelihood function
mle           Maximum likelihood estimation
normlike      Normal log-likelihood function
normfit       Parameter estimation for the normal distribution
poissfit      Parameter estimation for the Poisson distribution
unifit        Parameter estimation for the uniform distribution
Cumulative Distribution Functions (cdf)
betacdf       Beta cdf
binocdf       Binomial cdf
cdf           Parameterized cdf routine
chi2cdf       Chi-square cdf
expcdf        Exponential cdf
fcdf          F cdf
gamcdf        Gamma cdf
geocdf        Geometric cdf
hygecdf       Hypergeometric cdf
logncdf       Lognormal cdf
nbincdf       Negative binomial cdf
ncfcdf        Noncentral F cdf
nctcdf        Noncentral t cdf
ncx2cdf       Noncentral Chi-square cdf
normcdf       Normal (Gaussian) cdf
poisscdf      Poisson cdf
raylcdf       Rayleigh cdf
tcdf          Student's t cdf
unidcdf       Discrete uniform cdf
unifcdf       Continuous uniform cdf
weibcdf       Weibull cdf
Probability Density Functions (pdf)
betapdf       Beta pdf
binopdf       Binomial pdf
chi2pdf       Chi-square pdf
exppdf        Exponential pdf
fpdf          F pdf
gampdf        Gamma pdf
geopdf        Geometric pdf
hygepdf       Hypergeometric pdf
normpdf       Normal (Gaussian) pdf
lognpdf       Lognormal pdf
nbinpdf       Negative binomial pdf
ncfpdf        Noncentral F pdf
nctpdf        Noncentral t pdf
ncx2pdf       Noncentral Chi-square pdf
pdf           Parameterized pdf routine
poisspdf      Poisson pdf
raylpdf       Rayleigh pdf
tpdf          Student's t pdf
unidpdf       Discrete uniform pdf
unifpdf       Continuous uniform pdf
weibpdf       Weibull pdf
Inverse Cumulative Distribution Functions
betainv       Beta critical values
binoinv       Binomial critical values
chi2inv       Chi-square critical values
expinv        Exponential critical values
finv          F critical values
gaminv        Gamma critical values
geoinv        Geometric critical values
hygeinv       Hypergeometric critical values
logninv       Lognormal critical values
nbininv       Negative binomial critical values
ncfinv        Noncentral F critical values
nctinv        Noncentral t critical values
ncx2inv       Noncentral Chi-square critical values
icdf          Parameterized inverse distribution routine
norminv       Normal (Gaussian) critical values
poissinv      Poisson critical values
raylinv       Rayleigh critical values
tinv          Student's t critical values
unidinv       Discrete uniform critical values
unifinv       Continuous uniform critical values
weibinv       Weibull critical values
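The cdf, pdf, and inverse cdf functions for a distribution share a naming pattern and work together. The sketch below uses the normal distribution with mean 0 and standard deviation 1 as an illustrative case; the same pattern is assumed to apply to the other distributions listed in these tables.

```matlab
mu = 0;
sigma = 1;

% cdf: probability that a normal variate is at most 1.96 (about 0.975).
p = normcdf(1.96,mu,sigma);

% Inverse cdf: the critical value for that probability, which
% recovers 1.96.
x = norminv(p,mu,sigma);

% pdf: density at the mean, equal to 1/sqrt(2*pi) for sigma = 1.
f = normpdf(0,mu,sigma);

% The parameterized routine takes the distribution name as a string.
p2 = cdf('Normal',1.96,mu,sigma);
```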
Random Number Generators
betarnd       Beta random numbers
binornd       Binomial random numbers
chi2rnd       Chi-square random numbers
exprnd        Exponential random numbers
frnd          F random numbers
gamrnd        Gamma random numbers
geornd        Geometric random numbers
hygernd       Hypergeometric random numbers
lognrnd       Lognormal random numbers
nbinrnd       Negative binomial random numbers
ncfrnd        Noncentral F random numbers
nctrnd        Noncentral t random numbers
ncx2rnd       Noncentral Chi-square random numbers
normrnd       Normal (Gaussian) random numbers
poissrnd      Poisson random numbers
raylrnd       Rayleigh random numbers
random        Parameterized random number routine
trnd          Student's t random numbers
unidrnd       Discrete uniform random numbers
unifrnd       Continuous uniform random numbers
weibrnd       Weibull random numbers
Moments of Distribution Functions
betastat      Beta mean and variance
binostat      Binomial mean and variance
chi2stat      Chi-square mean and variance
expstat       Exponential mean and variance
fstat         F mean and variance
gamstat       Gamma mean and variance
geostat       Geometric mean and variance
hygestat      Hypergeometric mean and variance
lognstat      Lognormal mean and variance
nbinstat      Negative binomial mean and variance
ncfstat       Noncentral F mean and variance
nctstat       Noncentral t mean and variance
ncx2stat      Noncentral Chi-square mean and variance
normstat      Normal (Gaussian) mean and variance
poisstat      Poisson mean and variance
raylstat      Rayleigh mean and variance
tstat         Student's t mean and variance
unidstat      Discrete uniform mean and variance
unifstat      Continuous uniform mean and variance
weibstat      Weibull mean and variance
Descriptive Statistics
corrcoef      Correlation coefficients (in MATLAB)
cov           Covariance matrix (in MATLAB)
geomean       Geometric mean
harmmean      Harmonic mean
iqr           Interquartile range
kurtosis      Sample kurtosis
mad           Mean absolute deviation
mean          Arithmetic average (in MATLAB)
median        50th percentile (in MATLAB)
moment        Central moments of all orders
nanmax        Maximum ignoring missing data
nanmean       Average ignoring missing data
nanmedian     Median ignoring missing data
nanmin        Minimum ignoring missing data
nanstd        Standard deviation ignoring missing data
nansum        Sum ignoring missing data
prctile       Empirical percentiles of a sample
range         Sample range
skewness      Sample skewness
std           Standard deviation (in MATLAB)
trimmean      Trimmed mean
var           Variance
Statistical Plotting
boxplot       Box plots
errorbar      Error bar plot
fsurfht       Interactive contour plot of a function
gline         Interactive line drawing
gname         Interactive point labeling
lsline        Add least-squares fit line to plotted data
normplot      Normal probability plots
pareto        Pareto charts
qqplot        Quantile-Quantile plots
rcoplot       Regression case order plot
refcurve      Reference polynomial
refline       Reference line
surfht        Interactive interpolating contour plot
weibplot      Weibull plotting
Statistical Process Control
capable       Quality capability indices
capaplot      Plot of process capability
ewmaplot      Exponentially weighted moving average plot
histfit       Histogram and normal density curve
normspec      Plot normal density between limits
schart        Time plot of standard deviation
xbarplot      Time plot of means
Cluster Analysis
cluster       Create clusters from linkage output
clusterdata   Create clusters from a dataset
cophenet      Calculate the cophenetic correlation coefficient
dendrogram    Plot a hierarchical tree in a dendrogram graph
inconsistent  Calculate the inconsistency values of objects in a cluster
              hierarchy tree
linkage       Link objects in a dataset into a hierarchical tree of
              binary clusters
pdist         Calculate the pairwise distance between objects in a
              dataset
squareform    Reformat output of pdist function from vector to square
              matrix
zscore        Normalize a dataset before calculating the distance
Linear Models
anova1        One-way Analysis of Variance (ANOVA)
anova2        Two-way Analysis of Variance
lscov         Regression given a covariance matrix (in MATLAB)
polyconf      Polynomial prediction with confidence intervals
polyfit       Polynomial fitting (in MATLAB)
polyval       Polynomial prediction (in MATLAB)
regress       Multiple linear regression
ridge         Ridge regression
rstool        Response surface tool
stepwise      Stepwise regression GUI
Nonlinear Regression
nlinfit       Nonlinear least-squares fitting
nlintool      Prediction graph for nonlinear fits
nlparci       Confidence intervals on parameters
nlpredci      Confidence intervals for prediction
nnls          Nonnegative least squares (in MATLAB)
Design of Experiments
cordexch      D-optimal design using coordinate exchange
daugment      D-optimal augmentation of designs
dcovary       D-optimal design with fixed covariates
ff2n          Two-level full factorial designs
fullfact      Mixed level full factorial designs
hadamard      Hadamard designs (in MATLAB)
rowexch       D-optimal design using row exchange
Principal Components Analysis
barttest      Bartlett's test
pcacov        PCA from covariance matrix
pcares        Residuals from PCA
princomp      PCA from raw data matrix
Hypothesis Tests
ranksum       Wilcoxon rank sum test
signrank      Wilcoxon signed rank test
signtest      Sign test for paired samples
ttest         One sample t-test
ttest2        Two sample t-test
ztest         Z-test
File I/O
caseread      Read casenames from a file
casewrite     Write casenames from a string matrix to a file
tblread       Retrieve tabular data from the file system
tblwrite      Write data in tabular form to the file system
Demonstrations
disttool      Interactive exploration of distribution functions
randtool      Interactive random number generation
polytool      Interactive fitting of polynomial models
rsmdemo       Interactive process experimentation and analysis
Data
census.mat    U.S. Population 1790 to 1980
cities.mat    Names of U.S. metropolitan areas
discrim.mat   Classification data
gas.mat       Gasoline prices
hald.mat      Hald data
hogg.mat      Bacteria counts from milk shipments
lawdata.mat   GPA versus LSAT for 15 law schools
mileage.mat   Mileage data for three car models from two factories
moore.mat     Five factor one response regression data
parts.mat     Dimensional runout on 36 circular parts
popcorn.mat   Data for popcorn example (anova2, friedman)
polydata.mat  Data for polytool demo
reaction.mat  Reaction kinetics data
sat.dat       ASCII data for tblread example
anova1
Purpose One-way Analysis of Variance (ANOVA).
Syntax p = anova1(X)
p = anova1(X,group)
p = anova1(X,group,'displayopt')
[p,table] = anova1(...)
[p,table,stats] = anova1(...)
Description p = anova1(X) performs a balanced one-way ANOVA for comparing the
means of two or more columns of data in the m-by-n matrix X, where each
column represents an independent sample containing m mutually independent
observations. The function returns the p-value for the null hypothesis that all
samples in X are drawn from the same population (or from different
populations with the same mean).
If the p-value is near zero, this casts doubt on the null hypothesis and suggests
that at least one sample mean is significantly different from the other sample
means. The choice of a critical p-value to determine whether the result is
judged statistically significant is left to the researcher. It is common to
declare a result significant if the p-value is less than 0.05 or 0.01.
The anova1 function displays two figures. The first figure is the standard
ANOVA table, which divides the variability of the data in X into two parts:
• Variability due to the differences among the column means (variability
between groups)
• Variability due to the differences between the data in each column and the
column mean (variability within groups)
The ANOVA table has six columns:
• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS) for each source, which is the ratio
SS/df.
• The fifth shows the F statistic, which is the ratio of the MSs.
• The sixth shows the p-value, which is derived from the cdf of F. As F
increases, the p-value decreases.
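The entries of the ANOVA table can be reproduced by hand, which makes the relationship between the columns concrete. The sketch below uses a small made-up matrix (four observations in each of three groups) and then checks the result against anova1 with the display turned off.

```matlab
% Hypothetical balanced data: each column is one group.
X = [ 8  9 13
      7 10 12
      9 11 14
      8 10 13];
[m,n] = size(X);

grand   = mean(X(:));           % grand mean of all observations
colmean = mean(X);              % one mean per column (group)

% Sums of squares between groups and within groups.
SSb = m*sum((colmean - grand).^2);
SSw = sum(sum((X - repmat(colmean,m,1)).^2));

% Degrees of freedom, mean squares, F statistic, and p-value.
dfb = n - 1;
dfw = n*(m - 1);
MSb = SSb/dfb;
MSw = SSw/dfw;
F   = MSb/MSw;
p   = 1 - fcdf(F,dfb,dfw);      % upper tail of the F distribution

% Compare with the toolbox function (display suppressed).
pcheck = anova1(X,[],'off');
```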
The second figure displays box plots of each column of X. Large differences in
the center lines of the box plots correspond to large values of F and
correspondingly small p-values.
p = anova1(X,group) uses the values in group (a character array or cell
array) as labels for the box plot of the samples in X, when X is a matrix. Each
row of group contains the label for the data in the corresponding column of X,
so group must have length equal to the number of columns in X.
When X is a vector, anova1 performs a one-way ANOVA on the samples
contained in X, as indexed by input group (a vector, character array, or cell
array). Each element in group identifies the group (i.e., sample) to which the
corresponding element in vector X belongs, so group must have the same length
as X. The labels contained in group are also used to annotate the box plot. The
vector-input form of anova1 does not require equal numbers of observations in
each sample, so it is appropriate for unbalanced data.
It is not necessary to label samples sequentially (1, 2, 3, ...). For example, if X
contains measurements taken at three different temperatures, -27, 65, and
110, you could use these numbers as the sample labels in group. If a row of
group contains an empty cell or empty string, that row and the corresponding
observation in X are disregarded. NaNs in either input are similarly ignored.
p = anova1(X,group,'displayopt') enables the ANOVA table and box plot
displays when 'displayopt' is 'on' (default) and suppresses the displays
when 'displayopt' is 'off'.
[p,table] = anova1(...) returns the ANOVA table (including column and
row labels) in cell array table. (You can copy a text version of the ANOVA table
to the clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = anova1(...) returns a stats structure that you can use
to perform a follow-up multiple comparison test. The anova1 test evaluates the
hypothesis that the samples all have the same mean against the alternative
that the means are not all the same. Sometimes it is preferable to perform a
test to determine which pairs of means are significantly different, and which
are not. You can use the multcompare function to perform such tests by
supplying the stats structure as input.
Assumptions
The ANOVA test makes the following assumptions about the data in X:
• All sample populations are normally distributed.
• All sample populations have equal variance.
• All observations are mutually independent.
The ANOVA test is known to be robust to modest violations of the first two
assumptions.
Examples
Example 1
The five columns of X are the constants one through five plus a random normal
disturbance with mean zero and standard deviation one.
X = meshgrid(1:5)
X =
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
X = X + normrnd(0,1,5,5)
X =
2.1650 3.6961 1.5538 3.6400 4.9551
1.6268 2.0591 2.2988 3.8644 4.2011
1.0751 3.7971 4.2460 2.6507 4.2348
1.3516 2.2641 2.3610 2.7296 5.8617
0.3035 2.8717 3.5774 4.9846 4.9438
p = anova1(X)
p =
5.9952e-005
The very small p-value of 6e-5 indicates that differences between the column
means are highly significant. The probability of this outcome under the null
hypothesis (i.e., the probability that samples actually drawn from the same
population would have means differing by the amounts seen in X) is less than
6 in 100,000. The test therefore strongly supports the alternate hypothesis,
that one or more of the samples are drawn from populations with different
means.
Example 2
The following example comes from a study of the material strength of
structural beams in Hogg (1987). The vector strength measures the deflection
of a beam in thousandths of an inch under 3,000 pounds of force. Stronger
beams deflect less. The civil engineer performing the study wanted to
determine whether the strength of steel beams was equal to the strength of two
more expensive alloys. Steel is coded 'st' in the vector alloy. The other
materials are coded 'al1' and 'al2'.
[Figure: box plots of the five columns of X from Example 1. x-axis: Column Number; y-axis: Values.]
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
Though alloy is sorted in this example, you do not need to sort the grouping
variable.
p = anova1(strength,alloy)
p =
1.5264e-004
The p-value indicates that the three alloys are significantly different. The box
plot confirms this graphically and shows that the steel beams deflect more than
the more expensive alloys.
[Figure: box plots of beam deflection for the groups st, al1, and al2. y-axis: Values.]
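A follow-up multiple comparison for Example 2 might be sketched as follows. The data are the strength and alloy values from the example; the positional multcompare arguments (significance level, display option) follow the form used in the tutorial and should be checked against your release. The variable name tbl is used here in place of table only to avoid clashing with other names.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
            79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
         'al1','al1','al1','al1','al1','al1',...
         'al2','al2','al2','al2','al2','al2'};

% Run the ANOVA without displays and keep the stats structure.
[p,tbl,stats] = anova1(strength,alloy,'off');

% Compare all pairs of group means at the 0.05 level, without the graph.
% Each row of c lists two group numbers, then the lower bound, estimate,
% and upper bound for the difference of their means.
c = multcompare(stats,0.05,'off');
disp(c)
```

A pair of groups differs significantly when the interval in its row does not contain zero.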
References Hogg, R. V., and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
See Also anova2, anovan, boxplot, ttest
anova2
Purpose Two-way Analysis of Variance (ANOVA).
Syntax p = anova2(X,reps)
p = anova2(X,reps,'displayopt')
[p,table] = anova2(...)
[p,table,stats] = anova2(...)
Description anova2(X,reps) performs a balanced two-way ANOVA for comparing the
means of two or more columns and two or more rows of the observations in X.
The data in different columns represent changes in factor A. The data in
different rows represent changes in factor B. If there is more than one
observation for each combination of factors, input reps indicates the number of
replicates in each cell, which must be constant. (For unbalanced designs, use
anovan.)
The matrix below shows the format for a set-up where column factor A has two
levels, row factor B has three levels, and there are two replications (reps=2).
The subscripts indicate row, column, and replicate, respectively.

$$
\begin{array}{c|cc}
      & A = 1   & A = 2   \\
\hline
B = 1 & x_{111} & x_{121} \\
      & x_{112} & x_{122} \\
B = 2 & x_{211} & x_{221} \\
      & x_{212} & x_{222} \\
B = 3 & x_{311} & x_{321} \\
      & x_{312} & x_{322}
\end{array}
$$

When reps is 1 (default), anova2 returns two p-values in vector p:
1 The p-value for the null hypothesis, H0A, that all samples from factor A
(i.e., all column-samples in X) are drawn from the same population
2 The p-value for the null hypothesis, H0B, that all samples from factor B
(i.e., all row-samples in X) are drawn from the same population
When reps is greater than 1, anova2 returns a third p-value in vector p:
3 The p-value for the null hypothesis, H0AB, that the effects due to factors
A and B are additive (i.e., that there is no interaction between factors
A and B)
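The layout of X with replicates can be sketched with a small made-up data set: two levels of column factor A, three levels of row factor B, and two replicates per cell. The numbers are hypothetical and serve only to show how the rows are grouped.

```matlab
% Rows are grouped by level of factor B, with reps consecutive rows
% per level; columns are the levels of factor A.
X = [ 10 12      % B = 1, replicate 1
      11 13      % B = 1, replicate 2
      14 15      % B = 2, replicate 1
      13 17      % B = 2, replicate 2
      18 21      % B = 3, replicate 1
      19 20 ];   % B = 3, replicate 2
reps = 2;

% With reps greater than 1, p holds three p-values: columns (factor A),
% rows (factor B), and the A-by-B interaction.
p = anova2(X,reps,'off');
disp(p)
```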
If any p-value is near zero, this casts doubt on the associated null hypothesis.
A sufficiently small p-value for H0A suggests that at least one column-sample
mean is significantly different from the other column-sample means; i.e., there
is a main effect due to factor A. A sufficiently small p-value for H0B suggests
that at least one row-sample mean is significantly different from the other
row-sample means; i.e., there is a main effect due to factor B. A sufficiently
small p-value for H0AB suggests that there is an interaction between factors A
and B. The choice of a limit for the p-value to determine whether a result is
statistically significant is left to the researcher. It is common to declare a
result significant if the p-value is less than 0.05 or 0.01.
anova2 also displays a figure showing the standard ANOVA table, which
divides the variability of the data in X into three or four parts depending on the
value of reps:
• The variability due to the differences among the column means
• The variability due to the differences among the row means
• The variability due to the interaction between rows and columns (if reps is
greater than its default value of one)
• The remaining variability not explained by any systematic source
The ANOVA table has five columns:
• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows the F statistics, which is the ratio of the mean squares.
p = anova2(X,reps,'displayopt') enables the ANOVA table display when
'displayopt' is 'on' (default) and suppresses the display when 'displayopt'
is 'off'.
[p,table] = anova2(...) returns the ANOVA table (including column and
row labels) in cell array table. (You can copy a text version of the ANOVA table
to the clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = anova2(...) returns a stats structure that you can use
to perform a follow-up multiple comparison test.
The anova2 test evaluates the hypothesis that the row, column, and interaction
effects are all the same, against the alternative that they are not all the same.
Sometimes it is preferable to perform a test to determine which pairs of effects
are significantly different, and which are not. You can use the multcompare
function to perform such tests by supplying the stats structure as input.
Examples The data below come from a study of popcorn brands and popper type (Hogg
1987). The columns of the matrix popcorn are brands (Gourmet, National, and
Generic). The rows are popper type (Oil and Air). The study popped a batch of
each brand three times with each popper. The values are the yield in cups of
popped popcorn.
load popcorn
popcorn
popcorn =
5.5000 4.5000 3.5000
5.5000 4.5000 4.0000
6.0000 4.0000 3.0000
6.5000 5.0000 4.0000
7.0000 5.5000 5.0000
7.0000 5.0000 4.5000
p = anova2(popcorn,3)
p =
0.0000 0.0001 0.7462
The vector p shows the p-values for the three brands of popcorn, 0.0000, the
two popper types, 0.0001, and the interaction between brand and popper
type, 0.7462. These values indicate that both popcorn brand and popper type
affect the yield of popcorn, but there is no evidence of a synergistic (interaction)
effect of the two.
The conclusion is that you can get the greatest yield using the Gourmet brand
and an Air popper (the three values popcorn(4:6,1)).
Reference Hogg, R. V. and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
See Also anova1, anovan
anovan
Purpose N-way Analysis of Variance (ANOVA).
Syntax p = anovan(X,group)
p = anovan(X,group,'model')
p = anovan(X,group,'model',sstype)
p = anovan(X,group,'model',sstype,gnames)
p = anovan(X,group,'model',sstype,gnames,'displayopt')
[p,table] = anovan(...)
[p,table,stats] = anovan(...)
[p,table,stats,terms] = anovan(...)
Description p = anovan(X,group) performs a balanced or unbalanced multi-way ANOVA
for comparing the means of the observations in vector X with respect to N
different factors. The factors and factor levels of the observations in X are
assigned by the cell array group. Each of the N cells in group contains a list of
factor levels identifying the observations in X with respect to one of the N
factors. The list within each cell can be a vector, character array, or cell array
of strings, and must have the same number of elements as X.
As an example, consider the X and group inputs below.
X = [x1 x2 x3 x4 x5 x6 x7 x8];
group = {[1 2 1 2 1 2 1 2];...
['hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'];...
{'may' 'may' 'may' 'may' 'june' 'june' 'june' 'june'}};
In this case, anovan(X,group) is a three-way ANOVA with two levels of each
factor. Every observation in X is identified by a combination of factor levels in
group. If the factors are A, B, and C, then observation x1 is associated with:
• Level 1 of factor A
• Level 'hi' of factor B
• Level 'may' of factor C
Similarly, observation x6 is associated with:
• Level 2 of factor A
• Level 'hi' of factor B
• Level 'june' of factor C
Output vector p contains p-values for the null hypotheses on the N main
effects. Element p(1) contains the p-value for the null hypothesis, H0A, that
samples at all levels of factor A are drawn from the same population,
element p(2) contains the p-value for the null hypothesis, H0B, that samples
at all levels of factor B are drawn from the same population, and so on.
If any p-value is near zero, this casts doubt on the associated null hypothesis.
For example, a sufficiently small p-value for H0A suggests that at least one
A-sample mean is significantly different from the other A-sample means;
i.e., there is a main effect due to factor A. The choice of a limit for the p-value
to determine whether a result is statistically significant is left to the
researcher. It is common to declare a result significant if the p-value is less
than 0.05 or 0.01.
anovan also displays a figure showing the standard ANOVA table, which by
default divides the variability of the data in X into:
• The variability due to differences between the levels of each factor accounted
  for in the model (one row for each factor)
• The remaining variability not explained by any systematic source
The ANOVA table has six columns:
• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows the F statistics, which is the ratio of the mean squares.
• The sixth shows the p-values for the F statistics.
p = anovan(X,group,'model') performs the ANOVA using the model
specified by 'model', where 'model' can be 'linear', 'interaction', 'full',
or an integer or vector. The default 'linear' model computes only the p-values
for the null hypotheses on the N main effects. The 'interaction' model
computes the p-values for null hypotheses on the N main effects and the
$\binom{N}{2}$ two-factor interactions. The 'full' model computes the p-values
for null hypotheses on the N main effects and interactions at all levels.
For an integer value of 'model', k (k ≤ N), anovan computes all interaction
levels through the kth level. The values k=1 and k=2 are equivalent to the
'linear' and 'interaction' specifications, respectively, while the value k=N
is equivalent to the 'full' specification.
For more precise control over the main and interaction terms that anovan
computes, 'model' can specify a vector containing one element for each main
or interaction term to include in the ANOVA model. Each vector element
encodes the corresponding ANOVA term as the decimal equivalent of an N-bit
number, where N is the number of factors. The table below illustrates the
coding for a 3-factor ANOVA.
For example, if 'model' is the vector [2 4 6], then output vector p contains
the p-values for the null hypotheses on the main effects B and C and the
interaction effect BC, in that order. A simple way to generate the 'model'
vector is to modify the terms output, which codes the terms in the current
model using the format described above. If anovan returned [2 4 6] for terms,
for example, and there was no significant result for interaction BC, you could
recompute the ANOVA on just the main effects B and C by specifying [2 4] for
'model'.
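The term coding can be checked with a few lines of base MATLAB. The sketch below is illustrative only: it assumes three factors named A, B, and C (the letters are placeholders, not toolbox output) and decodes each element of a 'model' vector by reading its bits, with the lowest-order bit standing for factor A.

```matlab
% Decode anovan-style term codes into factor letters (illustrative sketch).
N = 3;                      % number of factors
factors = 'ABC';            % assumed one-letter factor names
codes = [2 4 6];            % the 'model' vector used in the text
labels = cell(size(codes));
for k = 1:numel(codes)
    bits = bitget(codes(k), 1:N);        % bits(1) is the lowest-order bit (factor A)
    labels{k} = factors(logical(bits));  % letters of the factors in this term
end
disp(labels)                % the terms B, C, and BC, in that order
```

Dropping the last element of codes gives [2 4], the main-effects-only model mentioned above.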
p = anovan(X,group,'model',sstype) computes the ANOVA using the type
of sum-of-squares specified by sstype, which can be 1, 2, or 3 to designate
Type 1, Type 2, or Type 3 sum-of-squares, respectively. The default is 3. The
value of sstype only influences computations on unbalanced data.
3-bit Code   Decimal Value   Corresponding ANOVA Terms
[0 0 1]      1               Main term A
[0 1 0]      2               Main term B
[1 0 0]      4               Main term C
[0 1 1]      3               Interaction term AB
[1 1 0]      6               Interaction term BC
[1 0 1]      5               Interaction term AC
[1 1 1]      7               Interaction term ABC
The sum of squares for any term is determined by comparing two models. The
Type 1 sum of squares for a term is the reduction in residual sum of squares
obtained by adding that term to a fit that already includes the terms listed
before it. The Type 2 sum of squares is the reduction in residual sum of squares
obtained by adding that term to a model consisting of all other terms that do
not contain the term in question. The Type 3 sum of squares is the reduction in
residual sum of squares obtained by adding that term to a model containing all
other terms, but with their effects constrained to obey the usual sigma
restrictions that make models estimable.
Suppose we are fitting a model with two factors and their interaction, and that
the terms appear in the order A, B, AB. Let R() represent the residual sum of
squares for a model, so for example R(A,B,AB) is the residual sum of squares
fitting the whole model, R(A) is the residual sum of squares fitting just the
main effect of A, and R(1) is the residual sum of squares fitting just the mean.
The three types of sums of squares are as follows:
The models for Type 3 sum of squares have sigma restrictions imposed. This
means, for example, that in fitting R(B,AB), the array of AB effects is
constrained to sum to 0 over A for each value of B, and over B for each value
of A.
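The model-comparison definitions can be tried out directly with least squares. The sketch below is not the anovan implementation: it uses a small made-up unbalanced data set, codes each two-level factor with a single indicator column, and covers only the main-effects models R(1), R(A), R(B), and R(A,B), so the interaction term and the Type 3 sigma restrictions are not shown.

```matlab
% Type 1 and Type 2 sums of squares from residual sums of squares (sketch).
y = [12 14 11 19 21 20 22 15 16]';   % hypothetical responses
a = [1 1 1 1 1 1 2 2 2]';            % levels of factor A (unbalanced cells)
b = [1 1 1 2 2 2 2 1 1]';            % levels of factor B
one = ones(numel(y), 1);             % column for the overall mean
dA = double(a == 2);                 % indicator for level 2 of A
dB = double(b == 2);                 % indicator for level 2 of B
rss = @(X) sum((y - X*(X\y)).^2);    % residual sum of squares of a least-squares fit
R1  = rss(one);                      % R(1): mean only
RA  = rss([one dA]);                 % R(A)
RB  = rss([one dB]);                 % R(B)
RAB = rss([one dA dB]);              % R(A,B): both main effects
ss1_A = R1 - RA;                     % Type 1 SS for A (A is listed first)
ss2_A = RB - RAB;                    % Type 2 SS for A
ss_B  = RA - RAB;                    % Type 1 and Type 2 SS for B (order A, B)
fprintf('Type 1 SS(A) = %.4f, Type 2 SS(A) = %.4f, SS(B) = %.4f\n', ss1_A, ss2_A, ss_B);
```

With balanced data the two values for A would agree, which is why sstype only matters for unbalanced designs.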
p = anovan(X,group,'model',sstype,gnames) uses the string values in
character array gnames to label the N experimental factors in the ANOVA
table. The array can be a string matrix with one row per factor, or a cell
array of strings with one element per factor. When gnames is not
specified, the default labels 'X1', 'X2', 'X3', ..., 'XN' are used.
p = anovan(X,group,'model',sstype,gnames,'displayopt') enables the
ANOVA table display when 'displayopt' is 'on' (default) and suppresses the
display when 'displayopt' is 'off'.
Term   Type 1 SS          Type 2 SS          Type 3 SS
A      R(1)-R(A)          R(B)-R(A,B)        R(B,AB)-R(A,B,AB)
B      R(A)-R(A,B)        R(A)-R(A,B)        R(A,AB)-R(A,B,AB)
AB     R(A,B)-R(A,B,AB)   R(A,B)-R(A,B,AB)   R(A,B)-R(A,B,AB)
[p,table] = anovan(...) returns the ANOVA table (including factor labels)
in cell array table. (You can copy a text version of the ANOVA table to the
clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = anovan(...) returns a stats structure that you can use
to perform a follow-up multiple comparison test.
The anovan test evaluates the hypothesis that the different levels of a factor (or
more generally, a term) have the same effect, against the alternative that they
do not all have the same effect. Sometimes it is preferable to perform a test to
determine which pairs of levels are significantly different, and which are not.
You can use the multcompare function to perform such tests by supplying the
stats structure as input.
[p,table,stats,terms] = anovan(...) returns the main and interaction
terms used in the ANOVA computations. The terms are encoded in output
vector terms using the same format described above for input 'model'. When
'model' itself is specified in this vector format, the vector returned in terms is
identical.
Examples In the previous section we used anova2 to analyze the effects of two factors on
a response in a balanced design. For a design that is not balanced, we can use
anovan instead.
The dataset carbig contains a number of measurements on 406 cars. Let's
study how the mileage depends on where and when the cars were made.
load carbig
anovan(MPG,{org when},2,3,{'Origin';'Mfg date'})
ans =
0
0
0.30587
The p-value for the interaction term is not small, indicating little evidence that
the effect of the car's year of manufacture (when) depends on where the car was
made (org). The linear effects of those two factors, though, are significant.
Reference Hogg, R. V. and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
See Also anova1, anova2, multcompare
aoctool
Purpose Interactive plot for fitting and predicting analysis of covariance models.
Syntax aoctool(x,y,g)
aoctool(x,y,g,alpha)
aoctool(x,y,g,alpha,xname,yname,gname)
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt')
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt','model')
h = aoctool(...)
[h,atab,ctab] = aoctool(...)
[h,atab,ctab,stats] = aoctool(...)
Description aoctool(x,y,g) fits a separate line to the column vectors, x and y, for each
group defined by the values in the array g. These types of models are known as
one-way analysis of covariance (ANOCOVA) models. The output consists of
three figures:
• An interactive graph of the data and prediction curves
• An ANOVA table
• A table of parameter estimates
You can use the figures to change models and to test different parts of the
model. More information about interactive use of the aoctool function appears
in "The aoctool Demo" on page 1-161.
aoctool(x,y,g,alpha) determines the confidence levels of the prediction
intervals. The confidence level is 100*(1-alpha)%. The default value of alpha
is 0.05.
aoctool(x,y,g,alpha,xname,yname,gname) specifies the name to use for the
x, y, and g variables in the graph and tables. If you enter simple variable names
for the x, y, and g arguments, the aoctool function uses those names. If you
enter an expression for one of these arguments, you can specify a name to use
in place of that expression by supplying these arguments. For example, if you
enter m(:,2) as the x argument, you might choose to enter 'Col 2' as the
xname argument.
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt') enables the
graph and table displays when 'displayopt' is 'on' (default) and suppresses
those displays when 'displayopt' is 'off'.
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt','model')
specifies the initial model to fit. The value of 'model' can be any of the
following:
'same mean'        fit a single mean, ignoring grouping
'separate means'   fit a separate mean to each group
'same line'        fit a single line, ignoring grouping
'parallel lines'   fit a separate line to each group, but constrain the lines
                   to be parallel
'separate lines'   fit a separate line to each group, with no constraints
h = aoctool(...) returns a vector of handles to the line objects in the plot.
[h,atab,ctab] = aoctool(...) returns cell arrays containing the entries in
the ANOVA table (atab) and the table of coefficient estimates (ctab). (You can copy
a text version of either table to the clipboard by using the Copy Text item on
the Edit menu.)
[h,atab,ctab,stats] = aoctool(...) returns a stats structure that you
can use to perform a follow-up multiple comparison test. The ANOVA table
output includes tests of the hypotheses that the slopes or intercepts are all the
same, against a general alternative that they are not all the same. Sometimes
it is preferable to perform a test to determine which pairs of values are
significantly different, and which are not. You can use the multcompare
function to perform such tests by supplying the stats structure as input. You
can test either the slopes, the intercepts, or population marginal means (the
heights of the curves at the mean x value).
Example This example illustrates how to fit different models non-interactively. First, we
load the smaller car dataset and fit a separate-slopes model, then examine the
coefficient estimates.
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
'','','','off','separate lines');
c(:,1:2)
ans =
'Term' 'Estimate'
'Intercept' [45.97983716833132]
' 70' [-8.58050531454973]
' 76' [-3.89017396094922]
' 82' [12.47067927549897]
'Slope' [-0.00780212907455]
' 70' [ 0.00195840368824]
' 76' [ 0.00113831038418]
' 82' [-0.00309671407243]
Roughly speaking, the lines relating MPG to Weight have an intercept close to
45.98 and a slope close to -0.0078. Each group's coefficients are offset from
these values somewhat. For instance, the intercept for the cars made in 1970
is 45.98-8.58 = 37.40.
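The same arithmetic gives every group's line at once. The sketch below simply copies the estimates printed above into vectors (the variable names are placeholders) and adds each group's offset to the common intercept and slope.

```matlab
% Per-group intercepts and slopes from the separate-lines estimates above.
years = [70 76 82];
base_intercept = 45.97983716833132;
intercept_offsets = [-8.58050531454973 -3.89017396094922 12.47067927549897];
base_slope = -0.00780212907455;
slope_offsets = [0.00195840368824 0.00113831038418 -0.00309671407243];
intercepts = base_intercept + intercept_offsets;   % one intercept per model year
slopes = base_slope + slope_offsets;               % one slope per model year
for k = 1:numel(years)
    fprintf('Model year %d: MPG = %.4f + (%.6f)*Weight\n', years(k), intercepts(k), slopes(k));
end
```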
Next, we try a fit using parallel lines. (If we had examined the ANOVA table,
we would have found that the parallel-lines fit is significantly worse than the
separate-lines fit.)
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
'','','','off','parallel lines');
c(:,1:2)
ans =
'Term' 'Estimate'
'Intercept' [43.38984085130596]
' 70' [-3.27948192983761]
' 76' [-1.35036234809006]
' 82' [ 4.62984427792768]
'Slope' [-0.00664751826198]
Here we again have separate intercepts for each group, but this time the slopes
are constrained to be the same.
See Also anova1, multcompare, polytool
barttest
Purpose Bartlett's test for dimensionality.
Syntax ndim = barttest(x,alpha)
[ndim,prob,chisquare] = barttest(x,alpha)
Description ndim = barttest(x,alpha) returns the number of dimensions necessary to
explain the nonrandom variation in the data matrix x, using the significance
probability alpha. The dimension is determined by a series of hypothesis tests.
The test for ndim=1 tests the hypothesis that the variances of the data values
along each principal component are equal, the test for ndim=2 tests the
hypothesis that the variances along the second through last components are
equal, and so on.
[ndim,prob,chisquare] = barttest(x,alpha) returns the number of
dimensions, the significance values for the hypothesis tests, and the
chi-square (χ²) values associated with the tests.
Example x = mvnrnd([0 0],[1 0.99; 0.99 1],20);
x(:,3:4) = mvnrnd([0 0],[1 0.99; 0.99 1],20);
x(:,5:6) = mvnrnd([0 0],[1 0.99; 0.99 1],20);
[ndim, prob] = barttest(x,0.05)
ndim =
3
prob =
0
0
0
0.5081
0.6618
See Also princomp, pcacov, pcares
betacdf
Purpose Beta cumulative distribution function (cdf).
Syntax p = betacdf(X,A,B)
Description p = betacdf(X,A,B) computes the beta cdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive, and the values in X must lie on the interval [0 1].
The beta cdf for a given value x and a given pair of parameters a and b is the
integral of t^(a-1) (1-t)^(b-1) from 0 to x, divided by B(a,b),
where B( ) is the Beta function. The result, p, is the probability that a single
observation from a beta distribution with parameters a and b will fall in the
interval [0 x].
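As a cross-check that does not need the toolbox, the same integral is available in base MATLAB as the regularized incomplete beta function, betainc. The sketch below assumes a = b = 2, for which the integral also has the simple closed form 3x^2 - 2x^3, and compares the two.

```matlab
% Beta cdf for a = b = 2, computed two ways with base MATLAB functions.
x = 0.1:0.2:0.9;
a = 2;
b = 2;
p_inc = betainc(x, a, b);        % integral of t^(a-1)(1-t)^(b-1) on [0,x], divided by B(a,b)
p_closed = 3*x.^2 - 2*x.^3;      % closed form of the same integral when a = b = 2
disp([p_inc; p_closed])          % the two rows agree
```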
Examples x = 0.1:0.2:0.9;
a = 2;
b = 2;
p = betacdf(x,a,b)
p =
0.0280 0.2160 0.5000 0.7840 0.9720
a = [1 2 3];
p = betacdf(0.5,a,a)
p =
0.5000 0.5000 0.5000
See Also betafit, betainv, betalike, betapdf, betarnd, betastat, cdf
Formula for the beta cdf:
$$p = F(x \mid a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$
betafit
Purpose Parameter estimates and confidence intervals for beta distributed data.
Syntax phat = betafit(x)
[phat,pci] = betafit(x,alpha)
Description phat = betafit(x) computes the maximum likelihood estimates of the beta
distribution parameters a and b from the data in vector x, where the beta cdf
F(x|a,b) is the integral of t^(a-1) (1-t)^(b-1) from 0 to x, divided by B(a,b),
and B( ) is the Beta function. The elements of x must lie in the interval (0 1).
[phat,pci] = betafit(x,alpha) returns confidence intervals on the a and b
parameters in the 2-by-2 matrix pci. The first column of the matrix contains
the lower and upper confidence bounds for parameter a, and the second column
contains the confidence bounds for parameter b. The optional input argument
alpha is a value in the range [0 1] specifying the width of the confidence
intervals. By default, alpha is 0.05, which corresponds to 95% confidence
intervals.
Example This example generates 100 beta distributed observations. The true a and b
parameters are 4 and 3, respectively. Compare these to the values returned
in p. Note that the columns of ci both bracket the true parameters.
r = betarnd(4,3,100,1);
[p,ci] = betafit(r,0.01)
p =
3.9010 2.6193
ci =
2.5244 1.7488
5.2776 3.4898
Formula for the beta cdf:
$$F(x \mid a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$
Reference Hahn, Gerald J., & Shapiro, Samuel S. Statistical Models in Engineering.
John Wiley & Sons, New York. 1994. p. 95.
See Also betalike, mle
betainv
Purpose Inverse of the beta cumulative distribution function.
Syntax X = betainv(P,A,B)
Description X = betainv(P,A,B) computes the inverse of the beta cdf with parameters
specified by A and B for the corresponding probabilities in P. Vector or matrix
inputs for P, A, and B must all have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other inputs. The
parameters in A and B must all be positive, and the values in P must lie on the
interval [0 1].
The inverse beta cdf for a given probability p and a given pair of parameters
a and b is the value x for which F(x|a,b) = p, where F(x|a,b) is the beta cdf
(the integral of t^(a-1) (1-t)^(b-1) from 0 to x, divided by B(a,b))
and B( ) is the Beta function. Each element of output X is the value whose
cumulative probability under the beta cdf defined by the corresponding
parameters in A and B is specified by the corresponding value in P.
Algorithm The betainv function uses Newton's method with modifications to constrain
steps to the allowable range for x, i.e., [0 1].
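The following is a minimal sketch of a constrained Newton iteration of this kind, written with base MATLAB functions only. It is not the betainv source: the starting point (the distribution mean), the step-halving rule, and the stopping tolerance are all assumptions made for illustration.

```matlab
% Newton iteration for the inverse beta cdf (illustrative sketch only).
a = 10;
b = 5;
p = [0.01 0.5 0.99];
x = (a/(a+b)) * ones(size(p));                   % assumed starting point: the mean
for iter = 1:200
    F = betainc(x, a, b);                        % beta cdf at the current x
    f = x.^(a-1) .* (1-x).^(b-1) ./ beta(a, b);  % beta pdf at the current x
    step = (F - p) ./ f;                         % Newton step for F(x) - p = 0
    xnew = x - step;
    outside = (xnew <= 0) | (xnew >= 1);
    while any(outside)                           % halve any step that leaves (0,1)
        step(outside) = step(outside) / 2;
        xnew = x - step;
        outside = (xnew <= 0) | (xnew >= 1);
    end
    x = xnew;
    if max(abs(step)) < 1e-12                    % assumed convergence tolerance
        break
    end
end
disp(x)
```

Evaluating betainc(x,a,b) at the result should reproduce p, which is a convenient way to confirm that the iteration has converged.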
Examples p = [0.01 0.5 0.99];
x = betainv(p,10,5)
x =
0.3726 0.6742 0.8981
According to this result, for a beta cdf with a=10 and b=5, a value less than or
equal to 0.3726 occurs with probability 0.01. Similarly, values less than or
equal to 0.6742 and 0.8981 occur with respective probabilities 0.5 and 0.99.
See Also betafit, icdf
Formulas for the inverse beta cdf and the beta cdf:
$$x = F^{-1}(p \mid a,b) = \{ x : F(x \mid a,b) = p \}$$
$$p = F(x \mid a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$
betalike
Purpose Negative beta log-likelihood function.
Syntax logL = betalike(params,data)
[logL,avar] = betalike(params,data)
Description logL = betalike(params,data) returns the negative of the beta
log-likelihood function for the beta parameters a and b specified in vector
params and the observations specified in column vector data. The length of
logL is the length of data.
[logL,avar] = betalike(params,data) also returns avar, which is the
asymptotic variance-covariance matrix of the parameter estimates if the
values in params are the maximum likelihood estimates. avar is the inverse of
Fisher's information matrix. The diagonal elements of avar are the asymptotic
variances of their respective parameters.
betalike is a utility function for maximum likelihood estimation of the beta
distribution. The likelihood assumes that all the elements in the data sample
are mutually independent. Since betalike returns the negative beta
log-likelihood function, minimizing betalike using fminsearch is the same as
maximizing the likelihood.
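Under the independence assumption, the negative log-likelihood is just the negated sum of the log densities of the observations. The sketch below computes that sum directly with base MATLAB functions; the four data values are made up for illustration, and the code is not the betalike source.

```matlab
% Negative beta log-likelihood computed from its definition (sketch).
data = [0.2 0.5 0.7 0.9]';       % hypothetical observations in (0,1)
a = 4;
b = 3;
% Sum of log densities, using betaln for log(B(a,b)) to avoid underflow.
nll = -sum((a-1)*log(data) + (b-1)*log(1-data) - betaln(a, b));
% The same quantity via the density itself.
dens = data.^(a-1) .* (1-data).^(b-1) ./ beta(a, b);
nll_check = -sum(log(dens));
fprintf('negative log-likelihood = %.6f (check: %.6f)\n', nll, nll_check);
```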
Example This example continues the betafit example where we calculated estimates of
the beta parameters for some randomly generated beta distributed data.
r = betarnd(4,3,100,1);
[logl,avar] = betalike([3.9010 2.6193],r)
logl =
-33.0514
avar =
0.2856 0.1528
0.1528 0.1142
See Also betafit, fminsearch, gamlike, mle, weiblike
betapdf
Purpose Beta probability density function (pdf).
Syntax Y = betapdf(X,A,B)
Description Y = betapdf(X,A,B) computes the beta pdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive, and the values in X must lie on the interval [0 1].
The beta probability density function for a given value x and a given pair of
parameters a and b is x^(a-1) (1-x)^(b-1) / B(a,b) for x in (0 1),
where B( ) is the Beta function. The result, y, is the probability density at x
for a single observation from a beta distribution with parameters a and b.
The indicator function I_(0,1)(x) ensures that only values of x in the range (0 1)
have nonzero probability. The uniform distribution on (0 1) is a degenerate case
of the beta pdf where a = 1 and b = 1.
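The density can be evaluated from this definition with the base MATLAB beta function. The sketch below does so at x = 0.5 for four parameter values (with b equal to a), including the uniform case a = b = 1; it illustrates the formula and is not the betapdf source.

```matlab
% Beta pdf from its definition, using only the base MATLAB beta function.
x = 0.5;
a = [0.5 1; 2 4];
b = a;                                          % symmetric case b = a
y = x.^(a-1) .* (1-x).^(b-1) ./ beta(a, b);     % elementwise over the parameter matrix
disp(y)                                         % y(1,2) is 1: the uniform case a = b = 1
```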
A likelihood function is the pdf viewed as a function of the parameters.
Maximum likelihood estimators (MLEs) are the values of the parameters that
maximize the likelihood function for a fixed value of x.
Examples a = [0.5 1; 2 4]
a =
0.5000 1.0000
2.0000 4.0000
y = betapdf(0.5,a,a)
y =
0.6366 1.0000
1.5000 2.1875
See Also betacdf, betafit, betainv, betalike, betarnd, betastat, pdf
Formula for the beta pdf:
$$y = f(x \mid a,b) = \frac{1}{B(a,b)} x^{a-1} (1-x)^{b-1} I_{(0,1)}(x)$$
betarnd
Purpose Random numbers from the beta distribution.
Syntax R = betarnd(A,B)
R = betarnd(A,B,m)
R = betarnd(A,B,m,n)
Description R = betarnd(A,B) generates random numbers from the beta distribution with
parameters specified by A and B. Vector or matrix inputs for A and B must have
the same size, which is also the size of R. A scalar input for A or B is expanded
to a constant matrix with the same dimensions as the other input.
R = betarnd(A,B,m) generates a matrix of size m containing random numbers
from the beta distribution with parameters A and B, where m is a 1-by-2 vector
containing the row and column dimensions of R.
R = betarnd(A,B,m,n) generates an m-by-n matrix containing random
numbers from the beta distribution with parameters A and B.
Examples a = [1 1;2 2];
b = [1 2;1 2];
r = betarnd(a,b)
r =
0.6987 0.6139
0.9102 0.8067
r = betarnd(10,10,[1 5])
r =
0.5974 0.4777 0.5538 0.5465 0.6327
r = betarnd(4,2,2,3)
r =
0.3943 0.6101 0.5768
0.5990 0.2760 0.5474
See Also betacdf, betafit, betainv, betalike, betapdf, betastat, rand, randtool
betastat
Purpose Mean and variance for the beta distribution.
Syntax [M,V] = betastat(A,B)
Description [M,V] = betastat(A,B) returns the mean and variance for the beta
distribution with parameters specified by A and B. Vector or matrix inputs for
A and B must have the same size, which is also the size of M and V. A scalar
input for A or B is expanded to a constant matrix with the same dimensions as
the other input.
The mean of the beta distribution with parameters a and b is a/(a+b), and
the variance is ab/((a+b+1)(a+b)^2).
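These two expressions are easy to evaluate directly. The sketch below applies them elementwise for a = b = 1, ..., 6 using only base MATLAB arithmetic; it illustrates the formulas and is not the betastat source.

```matlab
% Mean and variance of the beta distribution from the closed-form expressions.
a = 1:6;
b = a;                                        % equal parameters, so every mean is 1/2
m = a ./ (a + b);                             % mean: a/(a+b)
v = (a .* b) ./ ((a + b + 1) .* (a + b).^2);  % variance: ab/((a+b+1)(a+b)^2)
disp([m; v])
```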
Examples If parameters a and b are equal, the mean is 1/2.
a = 1:6;
[m,v] = betastat(a,a)
m =
0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
v =
0.0833 0.0500 0.0357 0.0278 0.0227 0.0192
See Also betacdf, betafit, betainv, betalike, betapdf, betarnd
Formulas for the mean and variance of the beta distribution:
$$\frac{a}{a+b} \qquad\qquad \frac{ab}{(a+b+1)(a+b)^2}$$
binocdf
Purpose Binomial cumulative distribution function (cdf).
Syntax Y = binocdf(X,N,P)
Description binocdf(X,N,P) computes a binomial cdf at each of the values in X using the
corresponding parameters in N and P. Vector or matrix inputs for X, N, and P
must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The values in N must all be
positive integers, the values in X must lie on the interval [0 N], and the
values in P must lie on the interval [0 1].
The binomial cdf for a given value x and a given pair of parameters n and p is
the sum, for i from 0 to x, of the binomial probabilities (n choose i) p^i q^(n-i),
where q = 1-p.
The result, y, is the probability of observing up to x successes in n independent
trials, where the probability of success in any given trial is p. The indicator
function I_(0,1,...,n)(i) ensures that x only adopts values of 0, 1, ..., n.
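The sum can be evaluated with base MATLAB functions alone. The sketch below is illustrative rather than the binocdf source: it computes each binomial probability on the log scale with gammaln (an implementation choice made here to avoid overflow in the binomial coefficients) for a team that plays 162 games with a 0.5 chance of winning each one, and then takes the upper tail beyond 100 wins.

```matlab
% Binomial cdf as a sum of binomial probabilities (sketch, base MATLAB only).
n = 162;
p = 0.5;
x = 100;
i = 0:x;
% log of nchoosek(n,i) * p^i * (1-p)^(n-i), computed with gammaln
logpmf = gammaln(n+1) - gammaln(i+1) - gammaln(n-i+1) + i*log(p) + (n-i)*log(1-p);
cdf_x = sum(exp(logpmf));     % probability of at most x successes
tail = 1 - cdf_x;             % probability of more than x successes
fprintf('P(X <= %d) = %.6f, P(X > %d) = %.6f\n', x, cdf_x, x, tail);
```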
Examples If a baseball team plays 162 games in a season and has a 50-50 chance of
winning any game, then the probability of that team winning more than 100
games in a season is:
1 - binocdf(100,162,0.5)
The result is 0.001 (i.e., 1-0.999). If a team wins 100 or more games in a season,
this result suggests that it is likely that the team's true probability of winning
any game is greater than 0.5.
See Also binofit, binoinv, binopdf, binornd, binostat, cdf
Formula for the binomial cdf, where q = 1-p:
$$y = F(x \mid n,p) = \sum_{i=0}^{x} \binom{n}{i} p^{i} q^{(n-i)} I_{(0,1,\ldots,n)}(i)$$
binofit
Purpose Parameter estimates and confidence intervals for binomial data.
Syntax phat = binofit(x,n)
[phat,pci] = binofit(x,n)
[phat,pci] = binofit(x,n,alpha)
Description phat = binofit(x,n) returns a maximum likelihood estimate of the
probability of success in a given binomial trial based on the number of
successes, x, observed in n independent trials. A scalar value for x or n is
expanded to the same size as the other input.
[phat,pci] = binofit(x,n) returns the probability estimate, phat, and the
95% confidence intervals, pci.
[phat,pci] = binofit(x,n,alpha) returns the 100(1-alpha)% confidence
intervals. For example, alpha = 0.01 yields 99% confidence intervals.
Example First we generate a binomial sample of 100 elements, where the probability of
success in a given trial is 0.6. Then, we estimate this probability from the
outcomes in the sample.
r = binornd(100,0.6);
[phat,pci] = binofit(r,100)
phat =
0.5800
pci =
0.4771 0.6780
The 95% confidence interval, pci, contains the true value, 0.6.
Reference Johnson, N. L., S. Kotz, and A. W. Kemp, Univariate Discrete Distributions,
Second Edition, Wiley 1992. pp. 124-130.
See Also binocdf, binoinv, binopdf, binornd, binostat, mle
binoinv
Purpose Inverse of the binomial cumulative distribution function (cdf).
Syntax X = binoinv(Y,N,P)
Description X = binoinv(Y,N,P) returns the smallest integer X such that the binomial cdf
evaluated at X is equal to or exceeds Y. You can think of Y as the probability of
observing X successes in N independent trials where P is the probability of
success in each trial. Each X is a positive integer less than or equal to N.
Vector or matrix inputs for Y, N, and P must all have the same size. A scalar
input is expanded to a constant matrix with the same dimensions as the other
inputs. The parameters in N must be positive integers, and the values in both
P and Y must lie on the interval [0 1].
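The "smallest integer whose cdf reaches Y" rule can be sketched with a cumulative sum and a search, using base MATLAB functions only. The code below is an illustration under that reading of the definition, not the binoinv source; it uses a 162-game season with a 0.5 chance of winning each game.

```matlab
% Inverse binomial cdf by direct search over the cumulative probabilities (sketch).
n = 162;
p = 0.5;
y = [0.05 0.95];
k = 0:n;
% binomial probabilities on the log scale, then the cdf by cumulative sum
logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p);
cdf = cumsum(exp(logpmf));
x = zeros(size(y));
for j = 1:numel(y)
    x(j) = k(find(cdf >= y(j), 1));   % smallest k whose cdf is at least y(j)
end
disp(x)
```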
Examples If a baseball team has a 50-50 chance of winning any game, what is a
reasonable range of games this team might win over a season of 162 games? We
assume that a surprising result is one that occurs by chance once in a decade.
binoinv([0.05 0.95],162,0.5)
ans =
71 91
This result means that in 90% of baseball seasons, a .500 team should win
between 71 and 91 games.
See Also binocdf, binofit, binopdf, binornd, binostat, icdf
binopdf
Purpose Binomial probability density function (pdf).
Syntax Y = binopdf(X,N,P)
Description Y = binopdf(X,N,P) computes the binomial pdf at each of the values in X
using the corresponding parameters in N and P. Vector or matrix inputs for X,
N, and P must all have the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs.
The parameters in N must be positive integers, and the values in P must lie on
the interval [0 1].
The binomial probability density function for a given value x and a given pair of
parameters n and p is (n choose x) p^x q^(n-x),
where q = 1-p. The result, y, is the probability of observing x successes in n
independent trials, where the probability of success in any given trial is p. The
indicator function I_(0,1,...,n)(x) ensures that x only adopts values of 0, 1, ..., n.
Examples A Qualit y Assur ance inspect or t est s 200 cir cuit boar ds a day. If 2% of t he
boar ds have defect s, what is t he pr obabilit y t hat t he inspect or will find no
defect ive boar ds on any given day?
binopdf(0,200,0.02)
ans =
0.0176
What is t he most likely number of defect ive boar ds t he inspect or will find?
y = binopdf([0:200],200,0.02);
[x,i] = max(y);
i
i =
5
See Also binocdf, binofit, binoinv, binornd, binostat, pdf
y f x n p , ( )
n
x ,
_
p
x
q
1 x ( )
I
0 1 n , , , ( )
x ( ) = =
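As a cross-check on the binopdf definition, the formula can be evaluated by hand with nchoosek and compared with the toolbox result. This is an illustrative sketch; the value x = 3 and the variable names are assumptions made for the example.

```matlab
% Evaluate the binomial pdf from its definition and compare with binopdf.
n = 200;                                     % boards tested per day
p = 0.02;                                    % probability a board is defective
q = 1 - p;
x = 3;                                       % number of defective boards
y_formula = nchoosek(n,x) * p^x * q^(n-x);   % definition of the pdf
y_toolbox = binopdf(x,n,p);
abs(y_formula - y_toolbox)
```

The difference should be at the level of floating-point rounding.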
binornd
Purpose Random numbers from the binomial distribution.
Syntax R = binornd(N,P)
R = binornd(N,P,mm)
R = binornd(N,P,mm,nn)
Description R = binornd(N,P) generates random numbers from the binomial distribution
with parameters specified by N and P. Vector or matrix inputs for N and P must
have the same size, which is also the size of R. A scalar input for N or P is
expanded to a constant matrix with the same dimensions as the other input.
R = binornd(N,P,mm) generates a matrix of size mm containing random
numbers from the binomial distribution with parameters N and P, where mm is
a 1-by-2 vector containing the row and column dimensions of R.
R = binornd(N,P,mm,nn) generates an mm-by-nn matrix containing random
numbers from the binomial distribution with parameters N and P.
Algorithm The binornd function uses the direct method, which follows from the definition
of the binomial distribution as a sum of Bernoulli random variables.
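The direct method can be sketched with base MATLAB alone. This is an illustration of the idea, not the source of binornd; the values of n and p are arbitrary.

```matlab
% Direct method: a binomial(n,p) draw is the number of successes
% among n independent Bernoulli(p) trials.
n = 10;                    % number of trials
p = 0.3;                   % probability of success per trial
trials = rand(n,1) < p;    % n Bernoulli outcomes (logical 0/1)
r = sum(trials)            % one binomial random number
```

The cost of this approach grows with n, because every draw requires n uniform random numbers.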
Examples n = 10:10:60;
r1 = binornd(n,1./n)
r1 =
2 1 0 1 1 2
r2 = binornd(n,1./n,[1 6])
r2 =
0 1 2 1 3 1
r3 = binornd(n,1./n,1,6)
r3 =
0 1 1 1 0 3
See Also binocdf, binofit, binoinv, binopdf, binostat, rand, randtool
binostat
Purpose Mean and variance for the binomial distribution.
Syntax [M,V] = binostat(N,P)
Description [M,V] = binostat(N,P) returns the mean and variance for the binomial
distribution with parameters specified by N and P. Vector or matrix inputs for
N and P must have the same size, which is also the size of M and V. A scalar
input for N or P is expanded to a constant matrix with the same dimensions as
the other input.
The mean of the binomial distribution with parameters n and p is np. The
variance is npq, where q = 1-p.
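The closed forms np and npq can be confirmed by summing over the pdf. The sketch below uses illustrative values n = 20 and p = 0.3.

```matlab
% Mean and variance computed from the pdf, compared with np and npq.
n = 20;
p = 0.3;
x = 0:n;                          % all possible outcomes
f = binopdf(x,n,p);               % probability of each outcome
m = sum(x.*f);                    % mean from the definition of expectation
v = sum((x - m).^2.*f);           % variance from its definition
[m v; n*p n*p*(1-p)]              % the two rows should agree
```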
Examples n = logspace(1,5,5)
n =
10 100 1000 10000 100000
[m,v] = binostat(n,1./n)
m =
1 1 1 1 1
v =
0.9000 0.9900 0.9990 0.9999 1.0000
[m,v] = binostat(n,1/2)
m =
5 50 500 5000 50000
v =
1.0e+04 *
0.0003 0.0025 0.0250 0.2500 2.5000
See Also binocdf, binofit, binoinv, binopdf, binornd
bootstrp
Purpose Bootstrap statistics through resampling of data.
Syntax bootstat = bootstrp(nboot,'bootfun',d1,d2,...)
[bootstat,bootsam] = bootstrp(...)
Description bootstat = bootstrp(nboot,'bootfun',d1,d2,...) draws nboot bootstrap
samples from each of the input data sets, d1, d2, etc., and passes the bootstrap
samples to function bootfun for analysis. nboot must be a positive integer, and
each input data set must contain the same number of rows, n. Each bootstrap
sample contains n rows chosen randomly (with replacement) from the
corresponding input data set (d1, d2, etc.).
Each row of the output, bootstat, contains the results of applying bootfun to
one set of bootstrap samples. If bootfun returns multiple outputs, only the first
is stored in bootstat. If the first output from bootfun is a matrix, the matrix
is reshaped to a row vector for storage in bootstat.
[bootstat,bootsam] = bootstrp(...) returns a matrix of bootstrap
indices, bootsam. Each of the nboot columns in bootsam contains indices of the
values that were drawn from the original data sets to constitute the
corresponding bootstrap sample. For example, if d1, d2, etc., each contain 16
values, and nboot = 4, then bootsam is a 16-by-4 matrix. The first column
contains the indices of the 16 values drawn from d1, d2, etc., for the first of the
four bootstrap samples, the second column contains the indices for the second
of the four bootstrap samples, and so on. (The bootstrap indices are the same
for all input data sets.)
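The resampling scheme described above can be sketched by hand. This is an illustration only, not the bootstrp source: the data values are made up, and the statistic (the sample mean) is chosen for simplicity.

```matlab
% Hand-rolled bootstrap of the sample mean.
data = [2.1; 3.4; 1.9; 5.0; 4.2; 3.3];      % made-up data, n = 6 rows
n = size(data,1);
nboot = 4;                                  % number of bootstrap samples
bootsam = ceil(n*rand(n,nboot));            % n-by-nboot indices, drawn with replacement
bootstat = zeros(nboot,1);
for b = 1:nboot
    bootstat(b) = mean(data(bootsam(:,b))); % statistic for bootstrap sample b
end
bootstat
```

Each column of bootsam plays the same role as a column of the bootsam output of bootstrp: it lists which rows of the data make up one bootstrap sample.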
Example Correlate the LSAT scores and law-school GPA for 15 students. These 15 data
points are resampled to create 1000 different data sets, and the correlation
between the two variables is computed for each data set.
load lawdata
[bootstat,bootsam] = bootstrp(1000,'corrcoef',lsat,gpa);
bootstat(1:5,:)
ans =
1.0000 0.3021 0.3021 1.0000
1.0000 0.6869 0.6869 1.0000
1.0000 0.8346 0.8346 1.0000
1.0000 0.8711 0.8711 1.0000
1.0000 0.8043 0.8043 1.0000
bootsam(:,1:5)
ans =
4 7 5 12 8
1 11 10 8 4
11 9 12 4 2
11 14 15 5 15
15 13 6 6 2
6 8 4 3 8
8 2 15 8 6
13 10 11 14 5
1 7 12 14 14
1 11 10 1 8
8 14 2 14 7
11 12 10 8 15
1 4 14 8 1
6 1 5 5 12
2 12 7 15 12
hist(bootstat(:,2))
[Figure: histogram of bootstat(:,2), the bootstrap correlation coefficients; horizontal axis from 0.2 to 1, vertical axis from 0 to 250.]
The histogram shows the variation of the correlation coefficient across all the
bootstrap samples. The sample minimum is positive, indicating that the
relationship between LSAT score and GPA is not accidental.
boxplot
Purpose Box plots of a data sample.
Syntax boxplot(X)
boxplot(X,notch)
boxplot(X,notch,'sym')
boxplot(X,notch,'sym',vert)
boxplot(X,notch,'sym',vert,whis)
Description boxplot(X) produces a box and whisker plot for each column of X. The box has
lines at the lower quartile, median, and upper quartile values. The whiskers
are lines extending from each end of the box to show the extent of the rest of
the data. Outliers are data with values beyond the ends of the whiskers. If
there is no data outside the whisker, a dot is placed at the bottom whisker.
boxplot(X,notch) with notch = 1 produces a notched-box plot. Notches graph
a robust estimate of the uncertainty about the medians for box-to-box
comparison. The default, notch = 0, produces a rectangular box plot.
boxplot(X,notch,'sym'), where sym is a plotting symbol, affords control of the
symbol for outliers. The default is '+'. See MATLAB's LineSpec property for
information about the available marker symbols.
boxplot(X,notch,'sym',vert) with vert = 0 creates horizontal boxes rather
than the default vertical boxes (vert = 1).
boxplot(X,notch,'sym',vert,whis) enables you to specify the length of the
whiskers. whis defines the length of the whiskers as a function of the
interquartile range (default = 1.5*IQR). If whis = 0, then boxplot displays all
data values outside the box using the plotting symbol, 'sym'.
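To make the role of whis concrete, the sketch below computes the quartiles and the limits beyond which points would be flagged as outliers. It assumes that outliers are the points lying more than whis*IQR beyond the box, which is a common reading of the description above rather than a statement about the boxplot source; the data are made up.

```matlab
% Quartiles, interquartile range, and outlier limits for one column of data.
x = [2 4 4 5 6 7 8 9 12 30]';         % made-up sample with one large value
q = prctile(x,[25 50 75]);            % lower quartile, median, upper quartile
iqrange = q(3) - q(1);                % interquartile range
whis = 1.5;                           % default whisker length factor
lowerLimit = q(1) - whis*iqrange;     % points below this are outliers
upperLimit = q(3) + whis*iqrange;     % points above this are outliers
outliers = x(x < lowerLimit | x > upperLimit)
```

For this sample only the value 30 lies outside the limits.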
Examples x1 = normrnd(5,1,100,1);
x2 = normrnd(6,1,100,1);
x = [x1 x2];
boxplot(x,1)
[Figure: notched box plots of the two columns of x; horizontal axis "Column Number" (1, 2), vertical axis "Values" (3 to 8).]
The difference between the means of the two columns of x is 1. We can detect
this difference graphically by observing that the notches in the boxplot do not
overlap.
See Also anova1, kruskalwallis
capable
Purpose Process capability indices.
Syntax p = capable(data,specs)
[p,Cp,Cpk] = capable(data,specs)
Description p = capable(data,specs) computes the probability that a sample, data, from
some process falls outside the bounds specified in specs, a 2-element vector of
the form [lower upper].
The assumptions are that the measured values in the vector data are normally
distributed with constant mean and variance and that the measurements are
statistically independent.
[p,Cp,Cpk] = capable(data,specs) additionally returns the capability
indices Cp and Cpk.
C_p is the ratio of the range of the specifications to six times the estimate of the
process standard deviation:
$$C_p = \frac{USL - LSL}{6\sigma}$$
For a process that has its average value on target, a C_p of 1 translates to a little
more than one defect per thousand. Recently, many industries have set a
quality goal of one part per million. This would correspond to C_p = 1.6. The
higher the value of C_p, the more capable the process.
C_pk is the ratio of the difference between the process mean and the closer
specification limit to three times the estimate of the process standard
deviation:
$$C_{pk} = \min\left(\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right)$$
where the process mean is μ. For processes that do not maintain their average
on target, C_pk is a more descriptive index of process capability.
Example Imagine a machined part with specifications requiring a dimension to be
within three thousandths of an inch of nominal. Suppose that the machining
process cuts too thick by one thousandth of an inch on average and also has a
standard deviation of one thousandth of an inch. What are the capability
indices of this process?
data = normrnd(1,1,30,1);
[p,Cp,Cpk] = capable(data,[-3 3]);
indices = [p Cp Cpk]
indices =
0.0172 1.1144 0.7053
We expect 17 parts out of a thousand to be out-of-specification. Cpk is less than
Cp because the process is not centered.
Reference Montgomery, D., Introduction to Statistical Quality Control, John Wiley &
Sons, 1991, pp. 369-374.
See Also capaplot, histfit
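The two capability indices defined in the capable entry can be computed by hand from the sample mean and standard deviation. The sketch below uses made-up measurements and base MATLAB only; it illustrates the formulas and is not the capable source.

```matlab
% Capability indices from their definitions.
data = [0.8; 1.9; 0.2; 1.4; 2.3; 0.6; 1.1; 1.7];   % made-up measurements
specs = [-3 3];                 % [LSL USL]
mu = mean(data);                % estimate of the process mean
sigma = std(data);              % estimate of the process standard deviation
Cp = (specs(2) - specs(1)) / (6*sigma);
Cpk = min((specs(2) - mu)/(3*sigma), (mu - specs(1))/(3*sigma));
[Cp Cpk]
```

Because the sample mean is off center, Cpk comes out smaller than Cp, which matches the remark in the example.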
capaplot
Purpose Process capability plot.
Syntax p = capaplot(data,specs)
[p,h] = capaplot(data,specs)
Description p = capaplot(data,specs) estimates the mean and variance of the
observations in input vector data, and plots the pdf of the resulting
T distribution. The observations in data are assumed to be normally
distributed. The output, p, is the probability that a new observation from the
estimated distribution will fall within the range specified by the two-element
vector specs. The portion of the distribution between the lower and upper
bounds specified in specs is shaded in the plot.
[p,h] = capaplot(data,specs) additionally returns handles to the plot
elements in h.
Example Imagine a machined part with specifications requiring a dimension to be
within 3 thousandths of an inch of nominal. Suppose that the machining
process cuts too thick by one thousandth of an inch on average and also has a
standard deviation of one thousandth of an inch.
data = normrnd(1,1,30,1);
p = capaplot(data,[-3 3])
p =
0.9784
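The probability of a new observation being within specs is 97.84%.
[Figure: pdf of the estimated distribution with the region between the specification limits shaded; horizontal axis from -3 to 4, vertical axis from 0 to 0.4; title "Probability Between Limits is 0.9784".]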
See Also capable, histfit
caseread
Purpose Read casenames from a file.
Syntax names = caseread('filename')
names = caseread
Description names = caseread('filename') reads the contents of filename and returns a
string matrix of names. filename is the name of a file in the current directory,
or the complete pathname of any file elsewhere. caseread treats each line as a
separate case.
names = caseread displays the Select File to Open dialog box for interactive
selection of the input file.
Example Read the file months.dat created using the function casewrite, which is
described next.
type months.dat
January
February
March
April
May
names = caseread('months.dat')
names =
January
February
March
April
May
See Also tblread, gname, casewrite, tdfread
casewrite
Purpose Write casenames from a string matrix to a file.
Syntax casewrite(strmat,'filename')
casewrite(strmat)
Description casewrite(strmat,'filename') writes the contents of string matrix strmat
to filename. Each row of strmat represents one casename. filename is the
name of a file in the current directory, or the complete pathname of any file
elsewhere. casewrite writes each name to a separate line in filename.
casewrite(strmat) displays the Select File to Write dialog box for interactive
specification of the output file.
Example strmat = str2mat('January','February','March','April','May')
strmat =
January
February
March
April
May
casewrite(strmat,'months.dat')
type months.dat
January
February
March
April
May
See Also gname, caseread, tblwrite, tdfread
cdf
Purpose Computes a chosen cumulative distribution function (cdf).
Syntax P = cdf('name',X,A1,A2,A3)
Description P = cdf('name',X,A1,A2,A3) returns a matrix of probabilities, where name is
a string containing the name of the distribution, X is a matrix of values, and A1,
A2, and A3 are matrices of distribution parameters. Depending on the
distribution, some of these parameters may not be necessary.
Vector or matrix inputs for X, A1, A2, and A3 must have the same size, which is
also the size of P. A scalar input for X, A1, A2, or A3 is expanded to a constant
matrix with the same dimensions as the other inputs.
cdf is a utility routine allowing you to access all the cdfs in the Statistics
Toolbox by using the name of the distribution as a parameter. See "Overview
of the Distributions" on page 1-12 for the list of available distributions.
Examples p = cdf('Normal',-2:2,0,1)
p =
0.0228 0.1587 0.5000 0.8413 0.9772
p = cdf('Poisson',0:5,1:6)
p =
0.3679 0.4060 0.4232 0.4335 0.4405 0.4457
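Because cdf dispatches on the distribution name, its results should match those of the distribution-specific functions listed under See Also. A short sketch of that correspondence for the two examples above:

```matlab
% The generic interface and the distribution-specific functions agree.
x = -2:2;
p1 = cdf('Normal',x,0,1);       % distribution chosen by name
p2 = normcdf(x,0,1);            % normal cdf called directly
q1 = cdf('Poisson',0:5,1:6);
q2 = poisscdf(0:5,1:6);         % Poisson cdf called directly
[max(abs(p1 - p2)) max(abs(q1 - q2))]
```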
See Also betacdf, binocdf, chi2cdf, expcdf, fcdf, gamcdf, geocdf, hygecdf, icdf,
logncdf, mle, nbincdf, ncfcdf, nctcdf, ncx2cdf, normcdf, pdf, poisscdf,
random, raylcdf, tcdf, unidcdf, unifcdf, weibcdf
cdfplot
Purpose Plot of empirical cumulative distribution function.
Syntax cdfplot(X)
h = cdfplot(X)
[h,stats] = cdfplot(X)
Description cdfplot(X) displays a plot of the empirical cumulative distribution function
(cdf) for the data in the vector X. The empirical cdf F(x) is defined as the
proportion of X values less than or equal to x.
This plot, like those produced by hist and normplot, is useful for examining
the distribution of a sample of data. You can overlay a theoretical cdf on the
same plot to compare the empirical distribution of the sample to the theoretical
distribution.
The kstest, kstest2, and lillietest functions compute test statistics that
are derived from the empirical cdf. You may find the empirical cdf plot
produced by cdfplot useful in helping you to understand the output from those
functions.
h = cdfplot(X) returns a handle to the cdf curve.
[h,stats] = cdfplot(X) also returns a stats structure with the following
fields.
Field Contents
stats.min Minimum value
stats.max Maximum value
stats.mean Sample mean
stats.median Sample median (50th percentile)
stats.std Sample standard deviation
Examples Generate a normal sample and an empirical cdf plot of the data.
x = normrnd(0,1,50,1);
cdfplot(x)
[Figure: empirical cdf of the sample; title "Empirical CDF", horizontal axis x from -3 to 3, vertical axis F(x) from 0 to 1.]
See Also hist, kstest, kstest2, lillietest, normplot
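The empirical cdf is simple to evaluate by hand at any point: count the sample values that do not exceed it and divide by the sample size. A minimal sketch with made-up data and an arbitrary evaluation point t:

```matlab
% Empirical cdf evaluated at a single point.
x = [0.3; -1.2; 0.8; 0.3; 2.1];     % made-up sample
t = 0.5;                            % point at which to evaluate F
F_t = sum(x <= t) / length(x)       % proportion of sample values <= t
```

Three of the five values are less than or equal to 0.5, so F_t is 0.6.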
chi2cdf
Purpose Chi-square (χ²) cumulative distribution function (cdf).
Syntax P = chi2cdf(X,V)
Description P = chi2cdf(X,V) computes the χ² cdf at each of the values in X using the
corresponding parameters in V. Vector or matrix inputs for X and V must have
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The degrees of freedom parameters in V must be
positive integers, and the values in X must lie on the interval [0 Inf).
The χ² cdf for a given value x and degrees-of-freedom ν is
$$p = F(x \mid \nu) = \int_0^x \frac{t^{(\nu-2)/2}\, e^{-t/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,dt$$
where Γ( · ) is the Gamma function. The result, p, is the probability that a
single observation from a χ² distribution with ν degrees of freedom will fall in
the interval [0 x].
The χ² density function with ν degrees-of-freedom is the same as the gamma
density function with parameters ν/2 and 2.
Examples probability = chi2cdf(5,1:5)
probability =
0.9747 0.9179 0.8282 0.7127 0.5841
probability = chi2cdf(1:5,1:5)
probability =
0.6827 0.6321 0.6084 0.5940 0.5841
See Also cdf, chi2inv, chi2pdf, chi2rnd, chi2stat
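The relationship between the χ² and gamma distributions stated in the chi2cdf entry can be checked numerically with gamcdf. This is an illustrative sketch; the evaluation point is arbitrary.

```matlab
% chi-square cdf with nu degrees of freedom versus gamma cdf
% with parameters nu/2 and 2.
x = 5;
nu = 1:5;
p_chi2 = chi2cdf(x,nu);
p_gam = gamcdf(x,nu/2,2);
max(abs(p_chi2 - p_gam))
```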
chi2inv
Purpose Inverse of the chi-square (χ²) cumulative distribution function (cdf).
Syntax X = chi2inv(P,V)
Description X = chi2inv(P,V) computes the inverse of the χ² cdf with parameters
specified by V for the corresponding probabilities in P. Vector or matrix inputs
for P and V must have the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs.
The degrees of freedom parameters in V must be positive integers, and the
values in P must lie in the interval [0 1].
The inverse χ² cdf for a given probability p and ν degrees of freedom is
$$x = F^{-1}(p \mid \nu) = \{x : F(x \mid \nu) = p\}$$
where
$$p = F(x \mid \nu) = \int_0^x \frac{t^{(\nu-2)/2}\, e^{-t/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,dt$$
and Γ( · ) is the Gamma function. Each element of output X is the value whose
cumulative probability under the χ² cdf defined by the corresponding degrees
of freedom parameter in V is specified by the corresponding value in P.
Examples Find a value that exceeds 95% of the samples from a χ² distribution with
10 degrees of freedom.
x = chi2inv(0.95,10)
x =
18.3070
You would observe values greater than 18.3 only 5% of the time by chance.
See Also chi2cdf, chi2pdf, chi2rnd, chi2stat, icdf
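Since chi2inv is defined as the inverse of chi2cdf, applying one after the other should return the starting probabilities. A short round-trip sketch, with probabilities chosen for illustration:

```matlab
% Round trip: probabilities -> quantiles -> probabilities.
p = [0.05 0.5 0.95];
nu = 10;                    % degrees of freedom
x = chi2inv(p,nu);          % quantiles
p_back = chi2cdf(x,nu);     % should reproduce p
max(abs(p_back - p))
```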
chi2pdf
Purpose Chi-square (χ²) probability density function (pdf).
Syntax Y = chi2pdf(X,V)
Description Y = chi2pdf(X,V) computes the χ² pdf at each of the values in X using the
corresponding parameters in V. Vector or matrix inputs for X and V must have
the same size, which is also the size of output Y. A scalar input is expanded to
a constant matrix with the same dimensions as the other input.
The degrees of freedom parameters in V must be positive integers, and the
values in X must lie on the interval [0 Inf).
The χ² pdf for a given value x and ν degrees of freedom is
$$y = f(x \mid \nu) = \frac{x^{(\nu-2)/2}\, e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$$
where Γ( · ) is the Gamma function. The result, y, is the probability density at x
for a single observation from a χ² distribution with ν degrees of freedom.
If x is standard normal, then x² is distributed χ² with one degree of freedom. If
x_1, x_2, ..., x_n are n independent standard normal observations, then the sum of
the squares of the x's is distributed χ² with n degrees of freedom (and is
equivalent to the gamma density function with parameters n/2 and 2).
Examples nu = 1:6;
x = nu;
y = chi2pdf(x,nu)
y =
0.2420 0.1839 0.1542 0.1353 0.1220 0.1120
The mean of the χ² distribution is the value of the degrees of freedom
parameter, nu. The above example shows that the probability density of the
mean falls as nu increases.
See Also chi2cdf, chi2inv, chi2rnd, chi2stat, pdf
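The χ² density can be evaluated directly from its formula using the base MATLAB gamma function and compared with chi2pdf. The values of x and nu below are arbitrary choices for illustration.

```matlab
% chi-square pdf from its definition versus the toolbox function.
x = 3;
nu = 4;                     % degrees of freedom
y_formula = x^((nu-2)/2) * exp(-x/2) / (2^(nu/2) * gamma(nu/2));
y_toolbox = chi2pdf(x,nu);
abs(y_formula - y_toolbox)
```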
chi2rnd
Purpose Random numbers from the chi-square (χ²) distribution.
Syntax R = chi2rnd(V)
R = chi2rnd(V,m)
R = chi2rnd(V,m,n)
Description R = chi2rnd(V) generates random numbers from the χ² distribution with
degrees of freedom parameters specified by V. R is the same size as V.
R = chi2rnd(V,m) generates a matrix of size m containing random numbers
from the χ² distribution with degrees of freedom parameter V, where m is a
1-by-2 vector containing the row and column dimensions of R.
R = chi2rnd(V,m,n) generates an m-by-n matrix containing random numbers
from the χ² distribution with degrees of freedom parameter V.
Examples Note that the first and third commands are the same, but are different from the
second command.
r = chi2rnd(1:6)
r =
0.0037 3.0377 7.8142 0.9021 3.2019 9.0729
r = chi2rnd(6,[1 6])
r =
6.5249 2.6226 12.2497 3.0388 6.3133 5.0388
r = chi2rnd(1:6,1,6)
r =
0.7638 6.0955 0.8273 3.2506 1.5469 10.9197
See Also chi2cdf, chi2inv, chi2pdf, chi2stat
chi2stat
Purpose Mean and variance for the chi-square (χ²) distribution.
Syntax [M,V] = chi2stat(NU)
Description [M,V] = chi2stat(NU) returns the mean and variance for the χ² distribution
with degrees of freedom parameters specified by NU.
The mean of the χ² distribution is ν, the degrees of freedom parameter, and the
variance is 2ν.
Example nu = 1:10;
nu = nu'*nu;
[m,v] = chi2stat(nu)
m =
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
v =
2 4 6 8 10 12 14 16 18 20
4 8 12 16 20 24 28 32 36 40
6 12 18 24 30 36 42 48 54 60
8 16 24 32 40 48 56 64 72 80
10 20 30 40 50 60 70 80 90 100
12 24 36 48 60 72 84 96 108 120
14 28 42 56 70 84 98 112 126 140
16 32 48 64 80 96 112 128 144 160
18 36 54 72 90 108 126 144 162 180
20 40 60 80 100 120 140 160 180 200
See Also chi2cdf, chi2inv, chi2pdf, chi2rnd
classify
Purpose Linear discriminant analysis.
Syntax class = classify(sample,training,group)
Description class = classify(sample,training,group) assigns each row of the data in
sample to one of the groups into which the training set, training, is already
divided. sample and training must have the same number of columns.
The vector group contains integers, from one to the number of groups, that
identify the group to which each row of the training set belongs. group and
training must have the same number of rows.
The function returns class, a vector with the same number of rows as sample.
Each element of class identifies the group to which the corresponding row
of sample has been assigned. The classify function determines the group into
which each row in sample is classified by computing the Mahalanobis distance
between each row in sample and each group in training.
Example load discrim
sample = ratings(idx,:);
training = ratings(1:200,:);
g = group(1:200);
class = classify(sample,training,g);
first5 = class(1:5)
first5 =
2
2
2
2
2
See Also mahal
cluster
Purpose Construct clusters from linkage output.
Syntax T = cluster(Z,cutoff)
T = cluster(Z,cutoff,depth,flag)
Description T = cluster(Z,cutoff) constructs clusters from the hierarchical cluster
tree, Z, generated by the linkage function. Z is a matrix of size (m-1)-by-3,
where m is the number of observations in the original data.
cutoff is a threshold value that determines how the cluster function creates
clusters. The value of cutoff determines how cluster interprets it.
Value Meaning
0 < cutoff < 2 cutoff is interpreted as the threshold for the
  inconsistency coefficient. The inconsistency coefficient
  quantifies the degree of difference between objects in
  the hierarchical cluster tree. If the inconsistency
  coefficient of a link is greater than the threshold, the
  cluster function uses the link as a boundary for a
  cluster grouping. For more information about the
  inconsistency coefficient, see the inconsistent
  function.
cutoff >= 2 cutoff is interpreted as the maximum number of
  clusters to retain in the hierarchical tree.
T = cluster(Z,cutoff,depth,flag) constructs clusters from cluster tree Z.
The depth argument specifies the number of levels in the hierarchical cluster
tree to include in the inconsistency coefficient computation. (The inconsistency
coefficient compares a link between two objects in the cluster tree with
neighboring links up to a specified depth. See the inconsistent function for
more information.) When the depth argument is specified, cutoff is always
interpreted as the inconsistency coefficient threshold.
The flag argument overrides the default meaning of the cutoff argument. If
flag is 'inconsistent', then cutoff is interpreted as a threshold for the
inconsistency coefficient. If flag is 'clusters', then cutoff is the maximum
number of clusters.
The output, T, is a vector of size m that identifies, by number, the cluster in
which each object was grouped. To find out which objects from the original
data set are contained in cluster i, use find(T==i).
Example The example uses the pdist function to calculate the distance between items
in a matrix of random numbers and then uses the linkage function to compute
the hierarchical cluster tree based on the matrix. The output of the linkage
function is passed to the cluster function. The cutoff value 3 indicates that
you want to group the items into three clusters. The example uses the find
function to list all the items grouped into cluster 3.
rand('seed', 0);
X = [rand(10,3); rand(10,3)+1; rand(10,3)+2];
Y = pdist(X);
Z = linkage(Y);
T = cluster(Z,3);
find(T==3)
ans =
11
12
13
14
15
16
17
18
19
20
See Also clusterdata, cophenet, dendrogram, inconsistent, linkage, pdist,
squareform
clusterdata
Purpose Construct clusters from data.
Syntax T = clusterdata(X,cutoff)
Description T = clusterdata(X,cutoff) constructs clusters from the data matrix X. X is a
matrix of size m by n, interpreted as m observations of n variables.
cutoff is a threshold value that determines how the cluster function creates
clusters. The value of cutoff determines how clusterdata interprets it.
Value Meaning
0 < cutoff < 1 cutoff is interpreted as the threshold for the
  inconsistency coefficient. The inconsistency coefficient
  quantifies the degree of difference between objects in
  the hierarchical cluster tree. If the inconsistency
  coefficient of a link is greater than the threshold, the
  cluster function uses the link as a boundary for a
  cluster grouping. For more information about the
  inconsistency coefficient, see the inconsistent
  function.
cutoff >= 1 cutoff is interpreted as the maximum number of
  clusters to retain in the hierarchical tree.
The output, T, is a vector of size m that identifies, by number, the cluster in
which each object was grouped.
T = clusterdata(X,cutoff) is the same as
Y = pdist(X,'euclid');
Z = linkage(Y,'single');
T = cluster(Z,cutoff);
Follow this sequence to use nondefault parameters for pdist and linkage.
Example The example first creates a sample data set of random numbers. The example
then uses the clusterdata function to compute the distances between items in
the data set and create a hierarchical cluster tree from the data set. Finally, the
clusterdata function groups the items in the data set into three clusters. The
example uses the find function to list all the items in cluster 2.
rand('seed',12);
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];
T = clusterdata(X,3);
find(T==2)
ans =
21
22
23
24
25
26
27
28
29
30
See Also cluster, cophenet, dendrogram, inconsistent, linkage, pdist, squareform
combnk
Purpose Enumeration of all combinations of n objects k at a time.
Syntax C = combnk(v,k)
Description C = combnk(v,k) returns all combinations of the n elements in v taken k at a
time.
C = combnk(v,k) produces a matrix C with k columns and n!/(k!(n-k)!) rows,
where each row contains k of the elements in the vector v.
It is not practical to use this function if v has more than about 15 elements.
Example Combinations of characters from a string.
C = combnk('tendril',4);
last5 = C(31:35,:)
last5 =
tedr
tenl
teni
tenr
tend
Combinations of elements from a numeric vector.
c = combnk(1:4,2)
c =
3 4
2 4
2 3
1 4
1 3
1 2
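The row count n!/(k!(n-k)!) given in the description can be confirmed for a small case. The vector and the value of k below are arbitrary choices for illustration.

```matlab
% Number of rows returned by combnk versus the binomial coefficient.
v = 1:5;
k = 3;
C = combnk(v,k);
n = length(v);
expected = factorial(n)/(factorial(k)*factorial(n-k));   % n!/(k!(n-k)!)
[size(C,1) expected]
```

The same formula explains why large inputs are impractical: the number of rows grows very quickly with n.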
cophenet
Purpose Cophenetic correlation coefficient.
Syntax c = cophenet(Z,Y)
Description c = cophenet(Z,Y) computes the cophenetic correlation coefficient, which
compares the distance information in Z, generated by linkage, and the
distance information in Y, generated by pdist. Z is a matrix of size (m-1)-by-3,
with distance information in the third column. Y is a vector of size
m(m-1)/2.
For example, given a group of objects {1, 2, ..., m} with distances Y, the function
linkage produces a hierarchical cluster tree. The cophenet function measures
the distortion of this classification, indicating how readily the data fits into the
structure suggested by the classification.
The output value, c, is the cophenetic correlation coefficient. The magnitude of
this value should be very close to 1 for a high-quality solution. This measure
can be used to compare alternative cluster solutions obtained using different
algorithms.
The cophenetic correlation between Z(:,3) and Y is defined as
$$c = \frac{\sum_{i<j}(Y_{ij}-y)(Z_{ij}-z)}{\sqrt{\sum_{i<j}(Y_{ij}-y)^2 \sum_{i<j}(Z_{ij}-z)^2}}$$
where:
Y_ij is the distance between objects i and j in Y.
Z_ij is the distance between objects i and j in Z(:,3).
y and z are the averages of Y and Z(:,3), respectively.
Example rand('seed',12);
X = [rand(10,3);rand(10,3)+1;rand(10,3)+2];
Y = pdist(X);
Z = linkage(Y,'centroid');
c = cophenet(Z,Y)
c =
0.6985
See Also cluster, dendrogram, inconsistent, linkage, pdist, squareform
cordexch
Purpose D-optimal design of experiments - coordinate exchange algorithm.
Syntax settings = cordexch(nfactors,nruns)
[settings,X] = cordexch(nfactors,nruns)
[settings,X] = cordexch(nfactors,nruns,'model')
Description settings = cordexch(nfactors,nruns) generates the factor settings matrix,
settings, for a D-optimal design using a linear additive model with a constant
term. settings has nruns rows and nfactors columns.
[settings,X] = cordexch(nfactors,nruns) also generates the associated
design matrix X.
[settings,X] = cordexch(nfactors,nruns,'model') produces a design for
fitting a specified regression model. The input, 'model', can be one of these
strings:
'interaction' includes constant, linear, and cross-product terms.
'quadratic' includes interactions and squared terms.
'purequadratic' includes constant, linear, and squared terms.
Example The D-optimal design for two factors in nine runs using a quadratic model is the
3² factorial, as shown below:
settings = cordexch(2,9,'quadratic')
settings =
-1 1
1 1
0 1
1 -1
-1 -1
0 -1
1 0
0 0
-1 0
See Also rowexch, daugment, dcovary, hadamard, fullfact, ff2n
corrcoef
2-79
2cor r coef
Purpose Correlation coefficients.
Syntax R = corrcoef(X)
Description R = corrcoef(X) returns a matrix of correlation coefficients calculated from
an input matrix whose rows are observations and whose columns are variables.
Element i,j of the matrix R is related to the corresponding element of the
covariance matrix C = cov(X) by
$$R(i,j) = \frac{C(i,j)}{\sqrt{C(i,i)\,C(j,j)}}$$
The corrcoef function is part of the standard MATLAB language.
See Also cov, mean, std, var
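The relation between R and C can be checked numerically. The following is a minimal sketch using only base MATLAB functions; the data matrix is made up for illustration.

```matlab
% Hypothetical data: 5 observations (rows) of 3 variables (columns).
X = [1 2 4; 2 1 3; 3 5 9; 4 3 8; 5 7 6];

C = cov(X);              % covariance matrix
s = sqrt(diag(C));       % standard deviations of the columns
R = C ./ (s*s');         % R(i,j) = C(i,j)/sqrt(C(i,i)*C(j,j))

% Compare with the built-in function.
Rref = corrcoef(X);
maxdiff = max(abs(R(:) - Rref(:)));
```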
cov
Purpose Covariance matrix.
Syntax C = cov(X)
C = cov(x,y)
Description C = cov(X) computes the covariance matrix. For a single vector, cov(x)
returns a scalar containing the variance. For matrices, where each row is an
observation and each column a variable, cov(X) is the covariance matrix.
The variance function, var(X), is the same as diag(cov(X)).
The standard deviation function, std(X), is equivalent to sqrt(diag(cov(X))).
cov(x,y), where x and y are column vectors of equal length, gives the same
result as cov([x y]).
The cov function is part of the standard MATLAB language.
Algorithm The algorithm for cov is
[n,p] = size(X);
X = X - ones(n,1) * mean(X);
Y = X' * X/(n-1);
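The algorithm above can be run on a small matrix and compared with the built-in functions. This is a sketch with made-up data; Xc and C are illustrative names chosen so that the input X is not overwritten.

```matlab
% Hypothetical data: 4 observations of 2 variables.
X = [2 4; 4 9; 6 11; 8 16];

[n,p] = size(X);
Xc = X - ones(n,1)*mean(X);   % subtract each column mean
C = Xc'*Xc/(n-1);             % covariance matrix

% Differences from the built-in results (all should be near zero).
dcov = max(max(abs(C - cov(X))));
dvar = max(abs(diag(C)' - var(X)));
dstd = max(abs(sqrt(diag(C))' - std(X)));
```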
See Also corrcoef, mean, std, var
xcov, xcorr (Signal Processing Toolbox)
crosstab
Purpose Cross-tabulation of several vectors.
Syntax table = crosstab(col1,col2)
table = crosstab(col1,col2,col3,...)
[table,chi2,p] = crosstab(col1,col2)
[table,chi2,p,label] = crosstab(col1,col2)
Description table = crosstab(col1,col2) takes two vectors of positive integers and
returns a matrix, table, of cross-tabulations. The ijth element of table
contains the count of all instances where col1 = i and col2 = j.
Alternatively, col1 and col2 can be vectors containing noninteger values,
character arrays, or cell arrays of strings. crosstab implicitly assigns a
positive integer group number to each distinct value in col1 and col2, and
creates a cross-tabulation using those numbers.
table = crosstab(col1,col2,col3,...) returns table as an n-dimensional
array, where n is the number of arguments you supply. The value of
table(i,j,k,...) is the count of all instances where col1 = i, col2 = j,
col3 = k, and so on.
[table,chi2,p] = crosstab(col1,col2) also returns the chi-square statistic,
chi2, for testing the independence of the rows and columns of table. The
scalar p is the significance level of the test. Values of p near zero cast doubt on
the assumption of independence of the rows and columns of table.
[table,chi2,p,label] = crosstab(col1,col2) also returns a cell array
label that has one column for each input argument. The value in label(i,j)
is the value of colj that defines group i in the jth dimension.
Example Example 1
We generate 2 columns of 50 discrete uniform random numbers. The first
column has numbers from 1 to 3. The second has only the numbers 1 and 2. The
two columns are independent so we would be surprised if p were near zero.
r1 = unidrnd(3,50,1);
r2 = unidrnd(2,50,1);
[table,chi2,p] = crosstab(r1,r2)
table =
10 5
8 8
6 13
chi2 =
4.1723
p =
0.1242
The result, 0.1242, is not a surprise. A very small value of p would make us
suspect the "randomness" of the random number generator.
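The chi2 and p values in Example 1 can be reproduced from the table alone with the usual Pearson chi-square formula. The sketch below assumes that formula (it is not taken from the crosstab source) and evaluates the chi-square tail probability with the base MATLAB function gammainc; tbl, expected, and df are illustrative names.

```matlab
% Cross-tabulation from Example 1.
tbl = [10 5; 8 8; 6 13];

n = sum(tbl(:));
% Expected counts under independence: (row total * column total) / n.
expected = sum(tbl,2)*sum(tbl,1)/n;

% Pearson chi-square statistic and its degrees of freedom.
chi2 = sum(sum((tbl - expected).^2 ./ expected));
df = (size(tbl,1)-1)*(size(tbl,2)-1);

% Upper tail probability of the chi-square distribution with df
% degrees of freedom, via the regularized incomplete gamma function.
p = 1 - gammainc(chi2/2, df/2);
```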
Example 2
We have data collected on several cars over a period of time. How many
four-cylinder cars were made in the USA during the late part of this period?
[t,c,p,l] = crosstab(cyl4,when,org);
l
l =
'Other' 'Early' 'USA'
'Four' 'Mid' 'Europe'
[] 'Late' 'Japan'
t(2,3,1)
ans =
38
See Also tabulate
daugment
Purpose D-optimal augmentation of an experimental design.
Syntax settings = daugment(startdes,nruns)
[settings,X] = daugment(startdes,nruns,'model')
Description settings = daugment(startdes,nruns) augments an initial experimental
design, startdes, with nruns new tests.
[settings,X] = daugment(startdes,nruns,'model') also supplies the
design matrix, X. The input, 'model', controls the order of the regression
model. By default, daugment assumes a linear additive model. Alternatively,
'model' can be any of these:
• 'interaction' includes constant, linear, and cross-product terms.
• 'quadratic' includes interactions plus squared terms.
• 'purequadratic' includes constant, linear, and squared terms.
daugment uses the coordinate exchange algorithm.
Example We add 5 runs to a 2^2 factorial design to allow us to fit a quadratic model.
startdes = [-1 -1; 1 -1; -1 1; 1 1];
settings = daugment(startdes,5,'quadratic')
settings =
-1 -1
1 -1
-1 1
1 1
1 0
-1 0
0 1
0 0
0 -1
The result is a 3^2 factorial design.
See Also cordexch, dcovary, rowexch
dcovary
Purpose D-optimal design with specified fixed covariates.
Syntax settings = dcovary(factors,covariates)
[settings,X] = dcovary(factors,covariates,'model')
Description settings = dcovary(factors,covariates) creates a D-optimal
design subject to the constraint of fixed covariates for each run. factors is
the number of experimental variables you want to control.
[settings,X] = dcovary(factors,covariates,'model') also creates the
associated design matrix, X. The input, 'model', controls the order of the
regression model. By default, dcovary assumes a linear additive model.
Alternatively, 'model' can be any of these:
• 'interaction' includes constant, linear, and cross-product terms.
• 'quadratic' includes interactions plus squared terms.
• 'purequadratic' includes constant, linear, and squared terms.
Example Suppose we want to block an eight-run experiment into 4 blocks of size 2 to fit
a linear model on two factors.
covariates = dummyvar([1 1 2 2 3 3 4 4]);
settings = dcovary(2,covariates(:,1:3),'linear')
settings =
1 1 1 0 0
-1 -1 1 0 0
-1 1 0 1 0
1 -1 0 1 0
1 1 0 0 1
-1 -1 0 0 1
-1 1 0 0 0
1 -1 0 0 0
The first two columns of the output matrix contain the settings for the two
factors. The last three columns are dummy variable codings for the four blocks.
See Also daugment, cordexch
dendrogram
Purpose Plot dendrogram graphs.
Syntax H = dendrogram(Z)
H = dendrogram(Z,p)
[H,T] = dendrogram(...)
Description H = dendrogram(Z) generates a dendrogram plot of the hierarchical, binary
cluster tree, Z. Z is an (m-1)-by-3 matrix, generated by the linkage function,
where m is the number of objects in the original dataset.
A dendrogram consists of many upside-down, U-shaped lines connecting
objects in a hierarchical tree. Except for the Ward linkage (see linkage), the
height of each U represents the distance between the two objects being
connected. The output, H, is a vector of line handles.
H = dendrogram(Z,p) generates a dendrogram with only the top p nodes. By
default, dendrogram uses 30 as the value of p. When there are more than 30
initial nodes, a dendrogram may look crowded. To display every node, set p = 0.
[H,T] = dendrogram(...) generates a dendrogram and returns T, a vector of
size m that contains the cluster number for each object in the original dataset.
T provides access to the nodes of a cluster hierarchy that are not displayed in
the dendrogram because they fall below the cutoff value p. For example, to find
out which objects are contained in leaf node k of the dendrogram, use
find(T==k). Leaf nodes are the nodes at the bottom of the dendrogram that
have no other nodes below them.
When there are fewer than p objects in the original data, all objects are
displayed in the dendrogram. In this case, T is the identity map, i.e.,
T = (1:m)', where each node contains only itself.
Example rand('seed',12);
X = rand(100,2);
Y = pdist(X,'cityblock');
Z = linkage(Y,'average');
[H, T] = dendrogram(Z);
find(T==20)
ans =
20
49
62
65
73
96
This output indicates that leaf node 20 in the dendrogram contains the original
data points 20, 49, 62, 65, 73, and 96.
See Also cluster, clusterdata, cophenet, inconsistent, linkage, pdist, squareform
[Figure: dendrogram with 30 leaf nodes; the vertical axis runs from 0 to 0.7.]
disttool
Purpose Interactive graph of cdf (or pdf) for many probability distributions.
Syntax disttool
Description The disttool command displays a graphical user interface for exploring the
effects of changing parameters on the plot of a cdf or pdf. Clicking and dragging
a vertical line on the plot allows you to interactively evaluate the function over
its entire domain.
Evaluate the plotted function by typing a value in the x-axis edit box or
dragging the vertical reference line on the plot. For cdfs, you can evaluate the
inverse function by typing a value in the y-axis edit box or dragging the
horizontal reference line on the plot. The shape of the pointer changes from an
arrow to a crosshair when it is over the vertical or horizontal line to indicate
that the reference line is draggable.
To change the distribution function, choose an option from the menu of
functions at the top left of the figure. To change from cdfs to pdfs, choose an
option from the menu at the top right of the figure.
To change the parameter settings, move the sliders or type a value in the edit
box under the name of the parameter. To change the limits of a parameter, type
a value in the edit box at the top or bottom of the parameter slider.
To close the tool, press the Close button.
See Also randtool
dummyvar
Purpose Matrix of 0-1 "dummy" variables.
Syntax D = dummyvar(group)
Description D = dummyvar(group) generates a matrix, D, of 0-1 columns. D has one column
for each unique value in each column of the matrix group. Each column of
group contains positive integers that indicate the group membership of an
individual row.
Example Suppose we are studying the effects of two machines and three operators on a
process. The first column of group would have the values 1 or 2 depending on
which machine was used. The second column of group would have the values
1, 2, or 3 depending on which operator ran the machine.
group = [1 1;1 2;1 3;2 1;2 2;2 3];
D = dummyvar(group)
D =
1 0 1 0 0
1 0 0 1 0
1 0 0 0 1
0 1 1 0 0
0 1 0 1 0
0 1 0 0 1
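To make the coding explicit, the same 0-1 matrix can be built by hand with base MATLAB. This is only a sketch of the idea, assuming each column of group holds the integers 1 through its maximum; it is not a description of how dummyvar is implemented.

```matlab
group = [1 1;1 2;1 3;2 1;2 2;2 3];
n = size(group,1);

D = [];
for j = 1:size(group,2)
    k = max(group(:,j));                 % number of levels in column j
    % Indicator block: entry (i,c) is 1 when group(i,j) equals level c.
    block = (group(:,j)*ones(1,k)) == (ones(n,1)*(1:k));
    D = [D double(block)];
end
```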
See Also pinv, regress
errorbar
Purpose Plot error bars along a curve.
Syntax errorbar(X,Y,L,U,symbol)
errorbar(X,Y,L)
errorbar(Y,L)
Description errorbar(X,Y,L,U,symbol) plots X versus Y with error bars specified by L
and U. X, Y, L, and U must be the same length. If X, Y, L, and U are matrices, then
each column produces a separate line. The error bars are each drawn a distance
of U(i) above and L(i) below the points in (X,Y). symbol is a string that
controls the line type, plotting symbol, and color of the error bars.
errorbar(X,Y,L) plots X versus Y with symmetric error bars about Y.
errorbar(Y,L) plots Y with error bars [Y-L Y+L].
The errorbar function is a part of the standard MATLAB language.
Example lambda = (0.1:0.2:0.5);
r = poissrnd(lambda(ones(50,1),:));
[p,pci] = poissfit(r,0.001);
L = p - pci(1,:)
U = pci(2,:) - p
errorbar(1:3,p,L,U,'+')
L =
0.1200 0.1600 0.2600
U =
0.2000 0.2200 0.3400
[Figure: error bar plot of the three estimates; x-axis from 0.5 to 3.5, y-axis from 0 to 0.8.]
ewmaplot
Purpose Exponentially Weighted Moving Average (EWMA) chart for Statistical Process
Control (SPC).
Syntax ewmaplot(data)
ewmaplot(data,lambda)
ewmaplot(data,lambda,alpha)
ewmaplot(data,lambda,alpha,specs)
h = ewmaplot(...)
Description ewmaplot(data) produces an EWMA chart of the grouped responses in data.
The rows of data contain replicate observations taken at a given time. The rows
should be in time order.
ewmaplot(data,lambda) produces an EWMA chart of the grouped responses in
data, and specifies how much the current prediction is influenced by past
observations. Higher values of lambda give less weight to past observations
and more weight to the current observation.
By default, lambda = 0.4; lambda must be between 0 and 1.
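The role of lambda is easiest to see in the textbook EWMA recursion z(t) = lambda*xbar(t) + (1-lambda)*z(t-1), in which lambda multiplies the newest subgroup mean. The sketch below assumes that recursion and a starting value equal to the overall mean; both are assumptions for illustration, and the subgroup means are made up.

```matlab
% Hypothetical subgroup means, in time order.
xbar = [10.0 10.2 9.9 10.4 10.3 10.6];
lambda = 0.4;

z = zeros(size(xbar));
prev = mean(xbar);        % assumed starting value: the overall mean
for t = 1:length(xbar)
    % New EWMA value: weight lambda on the current mean,
    % weight (1-lambda) on the previous EWMA value.
    z(t) = lambda*xbar(t) + (1-lambda)*prev;
    prev = z(t);
end
```

With lambda = 1 the recursion returns the subgroup means unchanged, and as lambda approaches 0 the chart moves more and more slowly away from the starting value.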
ewmaplot(data,lambda,alpha) produces an EWMA chart of the grouped
responses in data, and specifies the significance level of the upper and lower
plotted confidence limits. alpha is 0.0027 by default. This value produces
three-sigma limits:
norminv(1-0.0027/2)
ans =
3
To get k-sigma limits, use the expression 2*(1-normcdf(k)). For example, the
correct alpha value for 2-sigma limits is 0.0455, as shown below.
k = 2;
2*(1-normcdf(k))
ans =
0.0455
ewmaplot(data,lambda,alpha,specs) produces an EWMA chart of the
grouped responses in data, and specifies a two-element vector, specs, for the
lower and upper specification limits of the response.
h = ewmaplot(...) returns a vector of handles to the plotted lines.
Example Consider a process with a slowly drifting mean. An EWMA chart is preferable
to an x-bar chart for monitoring this kind of process. The simulation below
demonstrates an EWMA chart for a slow linear drift.
t = (1:28)';
r = normrnd(10+0.02*t(:,ones(4,1)),0.5);
ewmaplot(r,0.4,0.01,[9.75 10.75])
The EWMA value for group 28 is higher than would be expected purely by
chance. If we had been monitoring this process continuously, we would have
detected the drift when group 28 was collected, and we would have had an
opportunity to investigate its cause.
Reference Montgomery, D., Introduction to Statistical Quality Control, John Wiley &
Sons 1991. p. 299.
See Also xbarplot, schart
[Figure: Exponentially Weighted Moving Average (EWMA) Chart. x-axis: Sample Number (0 to 30); y-axis: EWMA (9.6 to 10.8); reference lines UCL, CL, LCL, USL, and LSL; point 28 is labeled.]
expcdf
Purpose Exponential cumulative distribution function (cdf).
Syntax P = expcdf(X,MU)
Description P = expcdf(X,MU) computes the exponential cdf at each of the values in X
using the corresponding parameters in MU. Vector or matrix inputs for X and MU
must have the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other input. The parameters in MU must be positive.
The exponential cdf is
$$p = F(x \mid \mu) = \int_0^x \frac{1}{\mu} e^{-t/\mu}\,dt = 1 - e^{-x/\mu}$$
The result, p, is the probability that a single observation from an exponential
distribution will fall in the interval [0 x].
Examples The median of the exponential distribution is μ*log(2). Demonstrate this fact.
mu = 10:10:60;
p = expcdf(log(2)*mu,mu)
p =
0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
What is the probability that an exponential random variable will be less than
or equal to the mean, μ?
mu = 1:6;
x = mu;
p = expcdf(x,mu)
p =
0.6321 0.6321 0.6321 0.6321 0.6321 0.6321
See Also cdf, expfit, expinv, exppdf, exprnd, expstat
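Both expcdf examples follow directly from the closed form 1 - exp(-x/μ). As a quick check that needs only base MATLAB, the sketch below evaluates that expression at the claimed median and at the mean; the variable names are illustrative.

```matlab
mu = 10:10:60;

% At x = mu*log(2) the cdf should equal 0.5 for every mu.
xmed = log(2)*mu;
pmed = 1 - exp(-xmed./mu);

% At x = mu the cdf equals 1 - exp(-1), independent of mu.
pmean = 1 - exp(-mu./mu);
```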
expfit
Purpose Parameter estimates and confidence intervals for exponential data.
Syntax muhat = expfit(x)
[muhat,muci] = expfit(x)
[muhat,muci] = expfit(x,alpha)
Description muhat = expfit(x) returns the estimate of the parameter, μ, of the
exponential distribution given data x.
[muhat,muci] = expfit(x) also returns the 95% confidence interval in muci.
[muhat,muci] = expfit(x,alpha) gives 100(1-alpha)% confidence
intervals. For example, alpha = 0.01 yields 99% confidence intervals.
Example We generate 100 independent samples of exponential data with μ = 3. muhat is
an estimate of true_mu and muci is a 99% confidence interval around muhat.
Notice that muci contains true_mu.
true_mu = 3;
r = exprnd(true_mu,100,1);
[muhat,muci] = expfit(r,0.01)
muhat =
2.8835
muci =
2.1949
3.6803
See Also expcdf, expinv, exppdf, exprnd, expstat, betafit, binofit, gamfit, normfit,
poissfit, unifit, weibfit
expinv
Purpose Inverse of the exponential cumulative distribution function (cdf).
Syntax X = expinv(P,MU)
Description X = expinv(P,MU) computes the inverse of the exponential cdf with
parameters specified by MU for the corresponding probabilities in P. Vector or
matrix inputs for P and MU must have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other input. The
parameters in MU must be positive and the values in P must lie on the interval
[0 1].
The inverse of the exponential cdf is
$$x = F^{-1}(p \mid \mu) = -\mu \ln(1 - p)$$
The result, x, is the value such that an observation from an exponential
distribution with parameter μ will fall in the range [0 x] with probability p.
Examples Let the lifetime of light bulbs be exponentially distributed with μ = 700 hours.
What is the median lifetime of a bulb?
expinv(0.50,700)
ans =
485.2030
So, suppose you buy a box of "700 hour" light bulbs. If 700 hours is the mean
life of the bulbs, then half of them will burn out in less than 500 hours.
See Also expcdf, expfit, exppdf, exprnd, expstat, icdf
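The light bulb result can be reproduced from the closed-form inverse -μ ln(1 - p) without the toolbox. This minimal sketch also feeds the answer back through the cdf to confirm the round trip; x and pback are illustrative names.

```matlab
mu = 700;     % mean lifetime in hours
p = 0.5;      % probability for the median

x = -mu*log(1 - p);          % inverse exponential cdf
pback = 1 - exp(-x/mu);      % cdf evaluated at x, should return p
```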
exppdf
Purpose Exponential probability density function (pdf).
Syntax Y = exppdf(X,MU)
Description exppdf(X,MU) computes the exponential pdf at each of the values in X using the
corresponding parameters in MU. Vector or matrix inputs for X and MU must be
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The parameters in MU must be positive.
The exponential pdf is
$$y = f(x \mid \mu) = \frac{1}{\mu} e^{-x/\mu}$$
The exponential pdf is the gamma pdf with its first parameter equal to 1.
The exponential distribution is appropriate for modeling waiting times when
the probability of waiting an additional period of time is independent of how
long you've already waited. For example, the probability that a light bulb will
burn out in its next minute of use is relatively independent of how many
minutes it has already burned.
Examples y = exppdf(5,1:5)
y =
0.0067 0.0410 0.0630 0.0716 0.0736
y = exppdf(1:5,1:5)
y =
0.3679 0.1839 0.1226 0.0920 0.0736
See Also expcdf, expfit, expinv, exprnd, expstat, pdf
exprnd
Purpose Random numbers from the exponential distribution.
Syntax R = exprnd(MU)
R = exprnd(MU,m)
R = exprnd(MU,m,n)
Description R = exprnd(MU) generates exponential random numbers with mean MU. The
size of R is the size of MU.
R = exprnd(MU,m) generates exponential random numbers with mean MU,
where m is a 1-by-2 vector that contains the row and column dimensions of R.
R = exprnd(MU,m,n) generates exponential random numbers with mean MU,
where scalars m and n are the row and column dimensions of R.
Examples n1 = exprnd(5:10)
n1 =
7.5943 18.3400 2.7113 3.0936 0.6078 9.5841
n2 = exprnd(5:10,[1 6])
n2 =
3.2752 1.1110 23.5530 23.4303 5.7190 3.9876
n3 = exprnd(5,2,3)
n3 =
24.3339 13.5271 1.8788
4.7932 4.3675 2.6468
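One common way to draw exponential random numbers is the inverse transform method: if u is uniform on (0,1), then -μ ln(u) is exponential with mean μ. The sketch below assumes that method purely for illustration; it is not claimed to be how exprnd is implemented, and because the draws are random no particular values are expected.

```matlab
mu = 5;     % mean of the exponential distribution
m = 2;      % number of rows
n = 3;      % number of columns

u = rand(m,n);          % uniform random numbers on (0,1)
r = -mu.*log(u);        % exponential random numbers with mean mu
```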
See Also expcdf, expfit, expinv, exppdf, expstat
expstat
Purpose Mean and variance for the exponential distribution.
Syntax [M,V] = expstat(MU)
Description [M,V] = expstat(MU) returns the mean and variance for the exponential
distribution with parameters MU. The mean of the exponential distribution is μ,
and the variance is μ^2.
Examples [m,v] = expstat([1 10 100 1000])
m =
1 10 100 1000
v =
1 100 10000 1000000
See Also expcdf, expfit, expinv, exppdf, exprnd
fcdf
Purpose F cumulative distribution function (cdf).
Syntax P = fcdf(X,V1,V2)
Description P = fcdf(X,V1,V2) computes the F cdf at each of the values in X using the
corresponding parameters in V1 and V2. Vector or matrix inputs for X, V1, and
V2 must all be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in V1 and V2
must be positive integers.
The F cdf is
$$p = F(x \mid \nu_1,\nu_2) = \int_0^x \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{t^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)t\right]^{\frac{\nu_1+\nu_2}{2}}}\,dt$$
The result, p, is the probability that a single observation from an F distribution
with parameters ν1 and ν2 will fall in the interval [0 x].
Examples This example illustrates an important and useful mathematical identity for the
F distribution.
nu1 = 1:5;
nu2 = 6:10;
x = 2:6;
F1 = fcdf(x,nu1,nu2)
F1 =
0.7930 0.8854 0.9481 0.9788 0.9919
F2 = 1 - fcdf(1./x,nu2,nu1)
F2 =
0.7930 0.8854 0.9481 0.9788 0.9919
See Also cdf, finv, fpdf, frnd, fstat
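The identity in the fcdf example can also be checked without the toolbox, because the F cdf equals a regularized incomplete beta function evaluated at ν1·x/(ν1·x + ν2). The sketch below relies on that standard relation and on the base MATLAB function betainc; it is an illustration, not a statement about how fcdf is implemented.

```matlab
nu1 = 1:5;
nu2 = 6:10;
x = 2:6;

% F cdf at x with (nu1,nu2) degrees of freedom.
z1 = nu1.*x ./ (nu1.*x + nu2);
F1 = betainc(z1, nu1/2, nu2/2);

% Same probabilities from the reciprocal with the degrees of freedom swapped.
y = 1./x;
z2 = nu2.*y ./ (nu2.*y + nu1);
F2 = 1 - betainc(z2, nu2/2, nu1/2);

maxdiff = max(abs(F1 - F2));
```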
ff2n
Purpose Two-level full-factorial designs.
Syntax X = ff2n(n)
Description X = ff2n(n) creates a two-level full-factorial design, X, where n is the desired
number of columns of X. The number of rows in X is 2^n.
Example X = ff2n(3)
X =
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
X is the binary representation of the numbers from 0 to 2^n-1.
See Also fracfact, fullfact
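Because X is the binary representation of 0 through 2^n-1, an equivalent matrix can be produced with the base MATLAB function dec2bin. This is a small sketch of that observation, not a description of the ff2n implementation.

```matlab
n = 3;

% dec2bin returns a character array of '0' and '1' with n columns;
% subtracting '0' converts it to a numeric 0-1 matrix.
X = dec2bin(0:2^n-1, n) - '0';
```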
finv
Purpose Inverse of the F cumulative distribution function (cdf).
Syntax X = finv(P,V1,V2)
Description X = finv(P,V1,V2) computes the inverse of the F cdf with numerator degrees
of freedom V1 and denominator degrees of freedom V2 for the corresponding
probabilities in P. Vector or matrix inputs for P, V1, and V2 must all be the same
size. A scalar input is expanded to a constant matrix with the same dimensions
as the other inputs.
The parameters in V1 and V2 must all be positive integers, and the values in P
must lie on the interval [0 1].
The F inverse function is defined in terms of the F cdf as
$$x = F^{-1}(p \mid \nu_1,\nu_2) = \{x : F(x \mid \nu_1,\nu_2) = p\}$$
where
$$p = F(x \mid \nu_1,\nu_2) = \int_0^x \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{t^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)t\right]^{\frac{\nu_1+\nu_2}{2}}}\,dt$$
Examples Find a value that should exceed 95% of the samples from an F distribution with
5 degrees of freedom in the numerator and 10 degrees of freedom in the
denominator.
x = finv(0.95,5,10)
x =
3.3258
You would observe values greater than 3.3258 only 5% of the time by chance.
See Also fcdf, fpdf, frnd, fstat, icdf
fpdf
Purpose F probability density function (pdf).
Syntax Y = fpdf(X,V1,V2)
Description Y = fpdf(X,V1,V2) computes the F pdf at each of the values in X using the
corresponding parameters in V1 and V2. Vector or matrix inputs for X, V1,
and V2 must all be the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs. The parameters in V1
and V2 must all be positive integers, and the values in X must lie on the interval
[0 ∞).
The probability density function for the F distribution is
$$y = f(x \mid \nu_1,\nu_2) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{x^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)x\right]^{\frac{\nu_1+\nu_2}{2}}}$$
Examples y = fpdf(1:6,2,2)
y =
0.2500 0.1111 0.0625 0.0400 0.0278 0.0204
z = fpdf(3,5:10,5:10)
z =
0.0689 0.0659 0.0620 0.0577 0.0532 0.0487
See Also fcdf, finv, frnd, fstat, pdf
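The F density can be evaluated directly from its formula with the base MATLAB function gamma. For ν1 = ν2 = 2 the formula reduces to 1/(1+x)^2, which gives a simple check against the first fpdf example. The sketch below is illustrative only; for large degrees of freedom a log-gamma formulation would be numerically safer.

```matlab
x = 1:6;
nu1 = 2;
nu2 = 2;

% Normalizing constant of the F density.
c = gamma((nu1+nu2)/2) / (gamma(nu1/2)*gamma(nu2/2));

% F density evaluated term by term from the formula.
y = c * (nu1/nu2)^(nu1/2) .* x.^((nu1-2)/2) ./ ...
    (1 + (nu1/nu2).*x).^((nu1+nu2)/2);

% Closed form for the special case nu1 = nu2 = 2.
yref = 1./(1 + x).^2;
```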
fracfact
Purpose Generate fractional factorial design from generators.
Syntax x = fracfact('gen')
[x,conf] = fracfact('gen')
Description x = fracfact('gen') generates a fractional factorial design as specified by
the generator string gen, and returns a matrix x of design points. The input
string gen is a generator string consisting of words separated by spaces. Each
word describes how a column of the output design should be formed from
columns of a full factorial. Typically gen will include single-letter words for the
first few factors, plus additional multiple-letter words describing how the
remaining factors are confounded with the first few.
The output matrix x is a fraction of a two-level full-factorial design. Suppose
there are m words in gen, and that each word is formed from a subset of the
first n letters of the alphabet. The output matrix x has 2^n rows and m columns.
Let F represent the two-level full-factorial design as produced by ff2n(n). The
values in column j of x are computed by multiplying together the columns of F
corresponding to letters that appear in the jth word of the generator string.
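The column-multiplication rule can be sketched by hand for the generator string 'a b c abc'. The sketch assumes the full factorial is recoded from 0/1 to -1/+1 before the columns are multiplied, and it builds F with dec2bin rather than ff2n so that it runs in base MATLAB; both choices are assumptions made for illustration.

```matlab
n = 3;

% Two-level full factorial in -1/+1 coding (8 rows, 3 columns).
F = 2*(dec2bin(0:2^n-1, n) - '0') - 1;
a = F(:,1);
b = F(:,2);
c = F(:,3);

% Columns for the words 'a', 'b', 'c', and 'abc'.
x = [a b c a.*b.*c];

% The a*b interaction column is identical to the product of the
% third and fourth columns, so those two effects are confounded.
abSameAsX3X4 = isequal(a.*b, x(:,3).*x(:,4));
```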
[x,conf] = fracfact('gen') also returns a cell array, conf, that describes
the confounding pattern among the main effects and all two-factor
interactions.
Examples Example 1
We want to run an experiment to study the effects of four factors on a response,
but we can only afford eight runs. (A run is a single repetition of the experiment
at a specified combination of factor values.) Our goal is to determine which
factors affect the response. There may be interactions between some pairs of
factors.
A total of sixteen runs would be required to test all factor combinations.
However, if we are willing to assume there are no three-factor interactions, we
can estimate the main factor effects in just eight runs.
[x,conf] = fracfact('a b c abc')
x =
-1 -1 -1 -1
-1 -1 1 1
-1 1 -1 1
-1 1 1 -1
1 -1 -1 1
1 -1 1 -1
1 1 -1 -1
1 1 1 1
conf =
'Term' 'Generator' 'Confounding'
'X1' 'a' 'X1'
'X2' 'b' 'X2'
'X3' 'c' 'X3'
'X4' 'abc' 'X4'
'X1*X2' 'ab' 'X1*X2 + X3*X4'
'X1*X3' 'ac' 'X1*X3 + X2*X4'
'X1*X4' 'bc' 'X1*X4 + X2*X3'
'X2*X3' 'bc' 'X1*X4 + X2*X3'
'X2*X4' 'ac' 'X1*X3 + X2*X4'
'X3*X4' 'ab' 'X1*X2 + X3*X4'
The first three columns of the x matrix form a full-factorial design. The final
column is formed by multiplying the other three. The confounding pattern
shows that the main effects for all four factors are estimable, but the two-factor
interactions are not. For example, the X1*X2 and X3*X4 interactions are
confounded, so it is not possible to estimate their effects separately.
After conducting the experiment, we may find out that the 'ab' effect is
significant. In order to determine whether this effect comes from X1*X2 or
X3*X4 we would have to run the remaining eight runs. We can obtain those
runs by reversing the sign of the final generator.
fracfact('a b c -abc')
ans =
-1 -1 -1 1
-1 -1 1 -1
-1 1 -1 -1
-1 1 1 1
1 -1 -1 -1
1 -1 1 1
1 1 -1 1
1 1 1 -1
Example 2
Suppose now we need to study the effects of eight factors. A full factorial would
require 256 runs. By clever choice of generators, we can find a sixteen-run
design that can estimate those eight effects with no confounding from
two-factor interactions.
[x,c] = fracfact('a b c d abc acd abd bcd');
c(1:10,:)
ans =
'Term' 'Generator' 'Confounding'
'X1' 'a' 'X1'
'X2' 'b' 'X2'
'X3' 'c' 'X3'
'X4' 'd' 'X4'
'X5' 'abc' 'X5'
'X6' 'acd' 'X6'
'X7' 'abd' 'X7'
'X8' 'bcd' 'X8'
'X1*X2' 'ab' 'X1*X2 + X3*X5 + X4*X7 + X6*X8'
This confounding pattern shows that the main effects are not confounded with
two-factor interactions. The final row shown reveals that a group of four
two-factor interactions is confounded. Other choices of generators would not
have the same desirable property.
[x,c] = fracfact('a b c d ab cd ad bc');
c(1:10,:)
ans =
'Term' 'Generator' 'Confounding'
'X1' 'a' 'X1 + X2*X5 + X4*X7'
'X2' 'b' 'X2 + X1*X5 + X3*X8'
'X3' 'c' 'X3 + X2*X8 + X4*X6'
'X4' 'd' 'X4 + X1*X7 + X3*X6'
'X5' 'ab' 'X5 + X1*X2'
'X6' 'cd' 'X6 + X3*X4'
'X7' 'ad' 'X7 + X1*X4'
'X8' 'bc' 'X8 + X2*X3'
'X1*X2' 'ab' 'X5 + X1*X2'
Here all the main effects are confounded with one or more two-factor
interactions.
References Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978), Statistics for
Experimenters, Wiley, New York.
See Also ff2n, fullfact, hadamard
friedman
Purpose Friedman's nonparametric two-way Analysis of Variance (ANOVA).
Syntax p = friedman(X,reps)
p = friedman(X,reps,'displayopt')
[p,table] = friedman(...)
[p,table,stats] = friedman(...)
Description p = friedman(X,reps) performs the nonparametric Friedman's test to
compare the means of the columns of X. Friedman's test is similar to classical
two-way ANOVA, but it tests only for column effects after adjusting for possible
row effects. It does not test for row effects or interaction effects. Friedman's test
is appropriate when columns represent treatments that are under study, and
rows represent nuisance effects (blocks) that need to be taken into account but
are not of any interest.
The different columns represent changes in factor A. The different rows
represent changes in the blocking factor B. If there is more than one
observation for each combination of factors, input reps indicates the number of
replicates in each "cell," which must be constant.
The matrix below illustrates the format for a set-up where column factor A has
three levels, row factor B has two levels, and there are two replicates (reps=2).
The subscripts indicate row, column, and replicate, respectively.
$$\begin{bmatrix} x_{111} & x_{121} & x_{131} \\ x_{112} & x_{122} & x_{132} \\ x_{211} & x_{221} & x_{231} \\ x_{212} & x_{222} & x_{232} \end{bmatrix}$$
Friedman's test assumes a model of the form
$$x_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}$$
where μ is an overall location parameter, α_i represents the column effect,
β_j represents the row effect, and ε_ijk represents the error. This test ranks the
data within each level of B, and tests for a difference across levels of A. The p
that friedman returns is the p-value for the null hypothesis that α_i = 0. If the
p-value is near zero, this casts doubt on the null hypothesis. A sufficiently
small p-value suggests that at least one column-sample mean is significantly
different from the other column-sample means; i.e., there is a main effect due
to factor A. The choice of a limit for the p-value to determine whether a result
is statistically significant is left to the researcher. It is common to declare a
result significant if the p-value is less than 0.05 or 0.01.
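The ranking step can be illustrated with the textbook Friedman statistic for the simplest case: one observation per cell (reps = 1) and no ties within a row. The sketch below assumes that case and that formula, so it is not a description of what friedman computes when reps is greater than 1; the data are made up, and the p-value uses the chi-square approximation evaluated with the base MATLAB function gammainc.

```matlab
% Hypothetical data: 4 blocks (rows) by 3 treatments (columns), no ties.
X = [1.0 2.0 3.0; 1.5 2.5 3.5; 0.5 1.5 2.5; 1.0 3.0 2.0];
[n,k] = size(X);

% Rank the data within each row: sorting the sort order gives the ranks.
[sorted,idx] = sort(X,2);
[sorted2,r] = sort(idx,2);

% Column rank sums and the Friedman chi-square statistic.
R = sum(r,1);
chi2 = 12/(n*k*(k+1)) * sum(R.^2) - 3*n*(k+1);

% Approximate p-value from the chi-square distribution with k-1
% degrees of freedom.
p = 1 - gammainc(chi2/2, (k-1)/2);
```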
friedman also displays a figure showing an ANOVA table, which divides the
variability of the ranks into two or three parts:
• The variability due to the differences among the column means
• The variability due to the interaction between rows and columns (if reps is
greater than its default value of 1)
• The remaining variability not explained by any systematic source
The ANOVA table has six columns:
• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows Friedman's chi-square statistic.
• The sixth shows the p-value for the chi-square statistic.
p = friedman(X,reps,'displayopt') enables the ANOVA table display
when 'displayopt' is 'on' (default) and suppresses the display when
'displayopt' is 'off'.
[p,table] = friedman(...) returns the ANOVA table (including column and
row labels) in cell array table. (You can copy a text version of the ANOVA table
to the clipboard by selecting Copy Text from the Edit menu.)
[p,table,stats] = friedman(...) returns a stats structure that you can
use to perform a follow-up multiple comparison test. The friedman test
evaluates the hypothesis that the column effects are all the same against the
alternative that they are not all the same. Sometimes it is preferable to
perform a test to determine which pairs of column effects are significantly
different, and which are not. You can use the multcompare function to perform
such tests by supplying the stats structure as input.
friedman
2-108
Examples Let's repeat the example from the anova2 function, this time applying
Friedman's test. Recall that the data below come from a study of popcorn
brands and popper type (Hogg 1987). The columns of the matrix popcorn are
brands (Gourmet, National, and Generic). The rows are popper type (Oil and
Air). The study popped a batch of each brand three times with each popper. The
values are the yield in cups of popped popcorn.
load popcorn
popcorn
popcorn =
5.5000 4.5000 3.5000
5.5000 4.5000 4.0000
6.0000 4.0000 3.0000
6.5000 5.0000 4.0000
7.0000 5.5000 5.0000
7.0000 5.0000 4.5000
p = friedman(popcorn,3)
p =
0.0010
The small p-value of 0.001 indicates the popcorn brand affects the yield of
popcorn. This is consistent with the results from anova2.
We could also test popper type by permuting the popcorn array as described in
"Friedman's Test" on page 1-97 and repeating the test.
References Hogg, R. V. and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
Hollander, M. and D. A. Wolfe. Nonparametric Statistical Methods. Wiley,
1973.
See Also anova2, multcompare
frnd
Purpose Random numbers from the F distribution.
Syntax R = frnd(V1,V2)
R = frnd(V1,V2,m)
R = frnd(V1,V2,m,n)
Description R = frnd(V1,V2) generates random numbers from the F distribution with
numerator degrees of freedom V1 and denominator degrees of freedom V2.
Vector or matrix inputs for V1 and V2 must have the same size, which is also
the size of R. A scalar input for V1 or V2 is expanded to a constant matrix with
the same dimensions as the other input.
R = frnd(V1,V2,m) generates random numbers from the F distribution with
parameters V1 and V2, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = frnd(V1,V2,m,n) generates random numbers from the F distribution
with parameters V1 and V2, where scalars m and n are the row and column
dimensions of R.
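The following is a minimal sketch of one standard way to construct F-distributed values: as the ratio of two independent chi-square variables, each divided by its degrees of freedom. It uses only base functions and integer degrees of freedom. This page does not say how frnd works internally, so treat this as an illustration of the distribution, not of frnd's algorithm. The variable names (v1, v2, n, chi1, chi2) are assumptions made for this example.

```matlab
% Sketch: F(v1,v2) values as a ratio of scaled chi-square variables.
% Assumes integer degrees of freedom; not necessarily how frnd works.
v1 = 4;                              % numerator degrees of freedom
v2 = 6;                              % denominator degrees of freedom
n  = 1000;                           % number of values to generate
chi1 = sum(randn(v1,n).^2, 1);       % chi-square with v1 degrees of freedom
chi2 = sum(randn(v2,n).^2, 1);       % chi-square with v2 degrees of freedom
r = (chi1 ./ v1) ./ (chi2 ./ v2);    % 1-by-n vector of F(v1,v2) values
```

Every element of r is positive, as expected for an F random variable.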
Examples n1 = frnd(1:6,1:6)
n1 =
0.0022 0.3121 3.0528 0.3189 0.2715 0.9539
n2 = frnd(2,2,[2 3])
n2 =
0.3186 0.9727 3.0268
0.2052 148.5816 0.2191
n3 = frnd([1 2 3;4 5 6],1,2,3)
n3 =
0.6233 0.2322 31.5458
2.5848 0.2121 4.4955
See Also fcdf, finv, fpdf, fstat
fstat
Purpose Mean and variance for the F distribution.
Syntax [M,V] = fstat(V1,V2)
Description [M,V] = fstat(V1,V2) returns the mean and variance for the F distribution
with parameters specified by V1 and V2. Vector or matrix inputs for V1 and V2
must have the same size, which is also the size of M and V. A scalar input for V1
or V2 is expanded to a constant matrix with the same dimensions as the other
input.
The mean of the F distribution for values of $\nu_2$ greater than 2 is
$$\frac{\nu_2}{\nu_2 - 2}$$
The variance of the F distribution for values of $\nu_2$ greater than 4 is
$$\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$$
The mean of the F distribution is undefined if $\nu_2$ is less than 3. The variance is
undefined for $\nu_2$ less than 5.
Examples fstat returns NaN when the mean and variance are undefined.
[m,v] = fstat(1:5,1:5)
m =
NaN NaN 3.0000 2.0000 1.6667
v =
NaN NaN NaN NaN 8.8889
See Also fcdf, finv, fpdf, frnd
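As a cross-check on the closed-form expressions for the F mean and variance, the sketch below evaluates them directly with base functions for the same inputs as the fstat example (V1 = V2 = 1:5). The variable names are assumptions for this illustration, and the NaN handling mirrors the behavior described above without claiming to reproduce fstat's implementation.

```matlab
% Sketch: F mean and variance from the closed-form expressions.
v1 = 1:5;                        % numerator degrees of freedom
v2 = 1:5;                        % denominator degrees of freedom
m = NaN(size(v2));               % mean stays NaN unless v2 > 2
v = NaN(size(v2));               % variance stays NaN unless v2 > 4
k = v2 > 2;
m(k) = v2(k) ./ (v2(k) - 2);
k = v2 > 4;
v(k) = 2*v2(k).^2 .* (v1(k) + v2(k) - 2) ./ ...
       (v1(k) .* (v2(k) - 2).^2 .* (v2(k) - 4));
```

With these inputs the last elements are 5/3 and 400/45, which round to the 1.6667 and 8.8889 shown in the example.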
fsurfht
Purpose Interactive contour plot of a function.
Syntax fsurfht('fun',xlims,ylims)
fsurfht('fun',xlims,ylims,p1,p2,p3,p4,p5)
Description fsurfht('fun',xlims,ylims) is an interactive contour plot of the function
specified by the text variable fun. The x-axis limits are specified by xlims in
the form [xmin xmax], and the y-axis limits are specified by ylims in the form
[ymin ymax].
fsurfht('fun',xlims,ylims,p1,p2,p3,p4,p5) allows for five optional
parameters that you can supply to the function fun.
The intersection of the vertical and horizontal reference lines on the plot
defines the current x-value and y-value. You can drag these reference lines and
watch the calculated z-values (at the top of the plot) update simultaneously.
Alternatively, you can type the x-value and y-value into editable text fields on
the x-axis and y-axis.
Example Plot the Gaussian likelihood function for the gas.mat data.
load gas
Create a function containing the following commands, and name it
gauslike.m.
function z = gauslike(mu,sigma,p1)
n = length(p1);
z = ones(size(mu));
for i = 1:n
z = z .* (normpdf(p1(i),mu,sigma));
end
The gauslike function calls normpdf, treating the data sample as fixed and the
parameters μ and σ as variables. Assume that the gas prices are normally
distributed, and plot the likelihood surface of the sample.
fsurfht('gauslike',[112 118],[3 5],price1)
The sample mean is the x-value at the maximum, but the sample standard
deviation is not the y-value at the maximum.
mumax = mean(price1)
mumax =
115.1500
sigmamax = std(price1)*sqrt(19/20)
sigmamax =
3.7719
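The factor sqrt(19/20) above converts the sample standard deviation, which divides by n-1, into the maximum likelihood estimate of σ, which divides by n (here n = 20). The sketch below checks that identity on a small made-up sample; the data values in x are hypothetical and are not the gas.mat prices.

```matlab
% Sketch: the normal MLE of sigma equals std(x) rescaled by sqrt((n-1)/n).
x = [112 115 118 113 117 116];               % hypothetical sample
n = numel(x);
sigma_mle      = sqrt(mean((x - mean(x)).^2));   % divides by n
sigma_from_std = std(x) * sqrt((n-1)/n);         % std divides by n-1
```

The two values agree to rounding error, which is why the y-value at the likelihood maximum is smaller than std(price1).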
fullfact
Purpose Full-factorial experimental design.
Syntax design = fullfact(levels)
Description design = fullfact(levels) gives the factor settings for a full factorial design.
Each element in the vector levels specifies the number of unique values in the
corresponding column of design.
For example, if the first element of levels is 3, then the first column of design
contains only integers from 1 to 3.
Example If levels = [2 4], fullfact generates an eight-run design with two levels in
the first column and four in the second column.
d = fullfact([2 4])
d =
1 1
2 1
1 2
2 2
1 3
2 3
1 4
2 4
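To make the run ordering concrete, here is a minimal base-function sketch that builds the same eight-run design, with the first factor cycling fastest. It is an illustration of the layout shown above, not fullfact's own code, and the helper variable names (nruns, block, idx) are assumptions.

```matlab
% Sketch: build a full factorial design with the first column varying fastest.
levels = [2 4];
nruns  = prod(levels);                       % total number of runs
design = zeros(nruns, numel(levels));
block  = 1;                                  % runs between level changes
for j = 1:numel(levels)
    idx = floor((0:nruns-1)' / block);       % advances once every 'block' runs
    design(:,j) = mod(idx, levels(j)) + 1;   % wrap into 1..levels(j)
    block = block * levels(j);
end
```

For levels = [2 4] this reproduces the matrix d displayed in the example.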
See Also ff2n, dcovary, daugment, cordexch
gamcdf
Purpose Gamma cumulative distribution function (cdf).
Syntax P = gamcdf(X,A,B)
Description gamcdf(X,A,B) computes the gamma cdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all be the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other inputs. The parameters in A and B must be
positive.
The gamma cdf is
$$p = F(x|a,b) = \frac{1}{b^a \Gamma(a)} \int_0^x t^{a-1} e^{-t/b}\,dt$$
The result, p, is the probability that a single observation from a gamma
distribution with parameters a and b will fall in the interval [0 x].
gammainc computes the gamma cdf with b fixed at 1.
Examples a = 1:6;
b = 5:10;
prob = gamcdf(a.*b,a,b)
prob =
0.6321 0.5940 0.5768 0.5665 0.5595 0.5543
The mean of the gamma distribution is the product of the parameters, ab. In
this example, the mean approaches the median as it increases (i.e., the
distribution becomes more symmetric).
See Also cdf, gamfit, gaminv, gamlike, gampdf, gamrnd, gamstat
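The relationship to gammainc can be sketched with base functions: substituting u = t/b in the cdf integral gives F(x|a,b) = gammainc(x/b, a). The snippet below evaluates the cdf at the mean ab for the same parameters as the example. It is a hedged illustration of the formula, not gamcdf's implementation.

```matlab
% Sketch: gamma cdf via the regularized incomplete gamma function.
a = 1:6;                         % shape parameters
b = 5:10;                        % scale parameters
x = a .* b;                      % evaluate each cdf at its mean, a*b
prob = gammainc(x ./ b, a);      % F(x|a,b) = gammainc(x/b, a)
```

For a = 1 the distribution is exponential, so the first element is 1 - exp(-1), about 0.6321, matching the first value in the example output.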
gamfit
Purpose Parameter estimates and confidence intervals for gamma distributed data.
Syntax phat = gamfit(x)
[phat,pci] = gamfit(x)
[phat,pci] = gamfit(x,alpha)
Description phat = gamfit(x) returns the maximum likelihood estimates (MLEs) for the
parameters of the gamma distribution given the data in vector x.
[phat,pci] = gamfit(x) returns MLEs and 95% confidence
intervals. The first row of pci is the lower bound of the confidence intervals;
the last row is the upper bound.
[phat,pci] = gamfit(x,alpha) returns 100(1-alpha)% confidence
intervals. For example, alpha = 0.01 yields 99% confidence intervals.
Example Note that the 95% confidence intervals in the example below bracket the true
parameter values of 2 and 4.
a = 2; b = 4;
r = gamrnd(a,b,100,1);
[p,ci] = gamfit(r)
p =
2.1990 3.7426
ci =
1.6840 2.8298
2.7141 4.6554
Reference Hahn, G. J. and S. S. Shapiro. Statistical Models in Engineering. John Wiley &
Sons, New York. 1994. p. 88.
See Also gamcdf, gaminv, gamlike, gampdf, gamrnd, gamstat, betafit, binofit, expfit,
normfit, poissfit, unifit, weibfit
gaminv
Purpose Inverse of the gamma cumulative distribution function (cdf).
Syntax X = gaminv(P,A,B)
Description X = gaminv(P,A,B) computes the inverse of the gamma cdf with parameters A
and B for the corresponding probabilities in P. Vector or matrix inputs for P, A,
and B must all be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive, and the values in P must lie on the interval [0 1].
The gamma inverse function in terms of the gamma cdf is
$$x = F^{-1}(p|a,b) = \{x : F(x|a,b) = p\}$$
where
$$p = F(x|a,b) = \frac{1}{b^a \Gamma(a)} \int_0^x t^{a-1} e^{-t/b}\,dt$$
Algorithm There is no known analytical solution to the integral equation above. gaminv
uses an iterative approach (Newton's method) to converge on the solution.
Examples This example shows the relationship between the gamma cdf and its inverse
function.
a = 1:5;
b = 6:10;
x = gaminv(gamcdf(1:5,a,b),a,b)
x =
1.0000 2.0000 3.0000 4.0000 5.0000
See Also gamcdf, gamfit, gamlike, gampdf, gamrnd, gamstat, icdf
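A Newton iteration for this inverse can be sketched as follows: the cdf F plays the role of the function and the pdf f is its derivative, so each step is x - (F(x) - p)/f(x). This is a bare, unsafeguarded illustration for one scalar case (a = 2, b = 3, p = 0.5). The starting point, tolerance, and iteration cap are assumptions, and gaminv's actual safeguards are not described in this page.

```matlab
% Sketch: Newton's method for the gamma inverse cdf (scalar case).
a = 2;  b = 3;  p = 0.5;
F = @(x) gammainc(x ./ b, a);                                % gamma cdf
f = @(x) x.^(a-1) .* exp(-x./b) ./ (b.^a .* gamma(a));       % gamma pdf
x = a * b;                          % start at the mean (assumed choice)
for k = 1:50
    step = (F(x) - p) / f(x);       % Newton correction
    x = x - step;
    if abs(step) < 1e-12            % stop once the update is negligible
        break
    end
end
```

After convergence, F(x) reproduces p to within rounding, which is the round-trip property the gaminv example demonstrates.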
gamlike
Purpose Negative gamma log-likelihood function.
Syntax logL = gamlike(params,data)
[logL,avar] = gamlike(params,data)
Description logL = gamlike(params,data) returns the negative of the gamma
log-likelihood function for the parameters, params, given data. The length of
output vector logL is the length of vector data.
[logL,avar] = gamlike(params,data) also returns avar, which is the
asymptotic variance-covariance matrix of the parameter estimates when the
values in params are the maximum likelihood estimates. avar is the inverse of
Fisher's information matrix. The diagonal elements of avar are the asymptotic
variances of their respective parameters.
gamlike is a utility function for maximum likelihood estimation of the gamma
distribution. Since gamlike returns the negative gamma log-likelihood
function, minimizing gamlike using fminsearch is the same as maximizing the
likelihood.
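The idea of minimizing a negative log-likelihood with fminsearch can be sketched without the toolbox by writing the gamma log-density out by hand. The data vector below is made up for illustration, and the search runs over the logarithms of the parameters so that both stay positive. That reparameterization is an assumption of this sketch, not something this page says gamfit or gamlike does.

```matlab
% Sketch: gamma maximum likelihood by minimizing the negative log-likelihood.
data = [2.1 7.9 4.4 12.3 5.0 9.6 3.2 6.8 15.1 8.4];   % hypothetical sample
% Negative log-likelihood for p = [a b] (shape a, scale b)
negloglik = @(p) -sum((p(1)-1).*log(data) - data./p(2) ...
                      - p(1).*log(p(2)) - gammaln(p(1)));
start = [1 mean(data)];                        % exponential fit as a start
q = fminsearch(@(q) negloglik(exp(q)), log(start));    % search on log scale
phat = exp(q);                                 % estimates of [a b]
```

At the maximum likelihood estimates the product a*b equals the sample mean, which gives a quick sanity check on phat.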
Example This example continues the example for gamfit.
a = 2; b = 3;
r = gamrnd(a,b,100,1);
[logL,info] = gamlike([2.1990 2.8069],r)
logL =
267.5585
info =
0.0690 -0.0790
-0.0790 0.1220
See Also betalike, gamcdf, gamfit, gaminv, gampdf, gamrnd, gamstat, mle, weiblike
gampdf
Purpose Gamma probability density function (pdf).
Syntax Y = gampdf(X,A,B)
Description gampdf(X,A,B) computes the gamma pdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all be the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other inputs. The parameters in A and B must all
be positive, and the values in X must lie on the interval [0 ∞).
The gamma pdf is
$$y = f(x|a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
The gamma probability density function is useful in reliability models of
lifetimes. The gamma distribution is more flexible than the exponential
distribution in that the probability of a product surviving an additional period
may depend on its current age. The exponential and χ² functions are special
cases of the gamma function.
Examples The exponential distribution is a special case of the gamma distribution.
mu = 1:5;
y = gampdf(1,1,mu)
y =
0.3679 0.3033 0.2388 0.1947 0.1637
y1 = exppdf(1,mu)
y1 =
0.3679 0.3033 0.2388 0.1947 0.1637
See Also gamcdf, gamfit, gaminv, gamlike, gamrnd, gamstat, pdf
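The exponential special case follows directly from the pdf formula: with a = 1 the factor x^(a-1) is 1 and Γ(1) = 1, leaving (1/b) e^(-x/b). The sketch below evaluates both sides with base functions for the same inputs as the example. The function handle name gam_pdf is an assumption for this illustration.

```matlab
% Sketch: the gamma pdf with a = 1 reduces to the exponential pdf.
gam_pdf = @(x,a,b) x.^(a-1) .* exp(-x./b) ./ (b.^a .* gamma(a));
mu = 1:5;
y  = gam_pdf(1, 1, mu);          % gamma pdf at x = 1 with a = 1, b = mu
y1 = exp(-1 ./ mu) ./ mu;        % exponential pdf with mean mu at x = 1
```

The two vectors agree to rounding error, and the first element is exp(-1), about 0.3679.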
gamrnd
Purpose Random numbers from the gamma distribution.
Syntax R = gamrnd(A,B)
R = gamrnd(A,B,m)
R = gamrnd(A,B,m,n)
Description R = gamrnd(A,B) generates gamma random numbers with parameters A
and B. Vector or matrix inputs for A and B must have the same size, which is
also the size of R. A scalar input for A or B is expanded to a constant matrix with
the same dimensions as the other input.
R = gamrnd(A,B,m) generates gamma random numbers with parameters A
and B, where m is a 1-by-2 vector that contains the row and column dimensions
of R.
R = gamrnd(A,B,m,n) generates gamma random numbers with parameters A
and B, where scalars m and n are the row and column dimensions of R.
Examples n1 = gamrnd(1:5,6:10)
n1 =
9.1132 12.8431 24.8025 38.5960 106.4164
n2 = gamrnd(5,10,[1 5])
n2 =
30.9486 33.5667 33.6837 55.2014 46.8265
n3 = gamrnd(2:6,3,1,5)
n3 =
12.8715 11.3068 3.0982 15.6012 21.6739
See Also gamcdf, gamfit, gaminv, gamlike, gampdf, gamstat
gamstat
Purpose Mean and variance for the gamma distribution.
Syntax [M,V] = gamstat(A,B)
Description [M,V] = gamstat(A,B) returns the mean and variance for the gamma
distribution with parameters specified by A and B. Vector or matrix inputs for
A and B must have the same size, which is also the size of M and V. A scalar input
for A or B is expanded to a constant matrix with the same dimensions as the
other input.
The mean of the gamma distribution with parameters a and b is $ab$. The
variance is $ab^2$.
Examples [m,v] = gamstat(1:5,1:5)
m =
1 4 9 16 25
v =
1 8 27 64 125
[m,v] = gamstat(1:5,1./(1:5))
m =
1 1 1 1 1
v =
1.0000 0.5000 0.3333 0.2500 0.2000
See Also gamcdf, gamfit, gaminv, gamlike, gampdf, gamrnd
geocdf
Purpose Geometric cumulative distribution function (cdf).
Syntax Y = geocdf(X,P)
Description geocdf(X,P) computes the geometric cdf at each of the values in X using the
corresponding probabilities in P. Vector or matrix inputs for X and P must be
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The parameters in P must lie on the interval
[0 1].
The geometric cdf is
$$y = F(x|p) = \sum_{i=0}^{\lfloor x \rfloor} p q^i$$
where $q = 1 - p$.
The result, y, is the probability of observing up to x trials before a success, when
the probability of success in any given trial is p.
Examples Suppose you toss a fair coin repeatedly. If the coin lands face up (heads), that
is a success. What is the probability of observing three or fewer tails before
getting a heads?
p = geocdf(3,0.5)
p =
0.9375
See Also cdf, geoinv, geopdf, geornd, geostat
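The sum in the geometric cdf is a finite geometric series, so it also has the closed form 1 - q^(floor(x)+1). The base-function sketch below evaluates both forms for the coin-toss example. The variable names y_sum and y_closed are assumptions for this illustration.

```matlab
% Sketch: geometric cdf as a sum and in closed form.
p = 0.5;                                 % probability of success per trial
q = 1 - p;
x = 3;                                   % three or fewer failures
y_sum    = sum(p * q.^(0:floor(x)));     % direct evaluation of the sum
y_closed = 1 - q^(floor(x) + 1);         % closed form of the same series
```

Both expressions give 0.9375, the value returned by geocdf(3,0.5) in the example.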
geoinv
Purpose Inverse of the geometric cumulative distribution function (cdf).
Syntax X = geoinv(Y,P)
Description X = geoinv(Y,P) returns the smallest nonnegative integer X such that the
geometric cdf evaluated at X is equal to or exceeds Y. You can think of Y as the
probability of observing X successes in a row in independent trials where P is
the probability of success in each trial.
Vector or matrix inputs for P and Y must have the same size, which is also the
size of X. A scalar input for P or Y is expanded to a constant matrix with the
same dimensions as the other input. The values in P and Y must lie on the
interval [0 1].
Examples The probability of correctly guessing the result of 10 coin tosses in a row is less
than 0.001 (unless the coin is not fair).
psychic = geoinv(0.999,0.5)
psychic =
9
The example below shows the inverse method for generating random numbers
from the geometric distribution.
rndgeo = geoinv(rand(2,5),0.5)
rndgeo =
0 1 3 1 0
0 1 0 2 0
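Both examples can be reproduced in outline with base functions. Because the cdf is 1 - q^(x+1) with q = 1 - p, the smallest integer x whose cdf reaches y is ceil(log(1-y)/log(1-p)) - 1, floored at zero. This is a sketch derived from the cdf definition, assuming 0 < p < 1 and y < 1. It is not taken from geoinv's source.

```matlab
% Sketch: geometric inverse cdf from the closed-form cdf 1 - q^(x+1).
p = 0.5;
y = 0.999;
x = max(ceil(log(1 - y) / log(1 - p)) - 1, 0);      % smallest x with cdf >= y
% Inverse method: feed uniform random numbers through the same expression.
u = rand(2,5);
r = max(ceil(log(1 - u) ./ log(1 - p)) - 1, 0);     % geometric random numbers
```

For y = 0.999 and p = 0.5 this gives x = 9, in agreement with the psychic example.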
See Also geocdf, geopdf, geornd, geostat, icdf
geomean
Purpose Geometric mean of a sample.
Syntax m = geomean(X)
Description geomean calculates the geometric mean of a sample. For vectors, geomean(x) is
the geometric mean of the elements in x. For matrices, geomean(X) is a row
vector containing the geometric means of each column.
The geometric mean is
$$m = \left[ \prod_{i=1}^{n} x_i \right]^{1/n}$$
Examples The sample average is greater than or equal to the geometric mean.
x = exprnd(1,10,6);
geometric = geomean(x)
geometric =
0.7466 0.6061 0.6038 0.2569 0.7539 0.3478
average = mean(x)
average =
1.3509 1.1583 0.9741 0.5319 1.0088 0.8122
See Also mean, median, harmmean, trimmean
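The product formula can be evaluated directly, or through logarithms as exp(mean(log(x))), which is algebraically the same for positive data and avoids overflow in long products. The sketch below uses a small made-up positive sample. Whether geomean itself uses the log form is not stated in this page.

```matlab
% Sketch: geometric mean two ways, and comparison with the arithmetic mean.
x = [1 2 4 8];                       % hypothetical positive sample
n = numel(x);
g_prod = prod(x)^(1/n);              % direct product formula
g_log  = exp(mean(log(x)));          % equivalent log-domain form
a_mean = mean(x);                    % arithmetic mean for comparison
```

Here both geometric means equal 64^(1/4), about 2.8284, which is below the arithmetic mean of 3.75.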
geopdf
Purpose Geometric probability density function (pdf).
Syntax Y = geopdf(X,P)
Description geopdf(X,P) computes the geometric pdf at each of the values in X using the
corresponding probabilities in P. Vector or matrix inputs for X and P must be
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The parameters in P must lie on the interval
[0 1].
The geometric pdf is
$$y = f(x|p) = p q^x I_{(0,1,\ldots)}(x)$$
where $q = 1 - p$.
Examples Suppose you toss a fair coin repeatedly. If the coin lands face up (heads), that
is a success. What is the probability of observing exactly three tails before
getting a heads?
p = geopdf(3,0.5)
p =
0.0625
See Also geocdf, geoinv, geornd, geostat, pdf
geornd
Purpose Random numbers from the geometric distribution.
Syntax R = geornd(P)
R = geornd(P,m)
R = geornd(P,m,n)
Description The geometric distribution is useful when you want to model the number of
successive failures preceding a success, where the probability of success in any
given trial is the constant P.
R = geornd(P) generates geometric random numbers with probability
parameter P. The size of R is the size of P.
R = geornd(P,m) generates geometric random numbers with probability
parameter P, where m is a 1-by-2 vector that contains the row and column
dimensions of R.
R = geornd(P,m,n) generates geometric random numbers with probability
parameter P, where scalars m and n are the row and column dimensions of R.
The parameters in P must lie on the interval [0 1].
Examples r1 = geornd(1 ./ 2.^(1:6))
r1 =
2 10 2 5 2 60
r2 = geornd(0.01,[1 5])
r2 =
65 18 334 291 63
r3 = geornd(0.5,1,6)
r3 =
0 7 1 3 1 0
See Also geocdf, geoinv, geopdf, geostat
geostat
Purpose Mean and variance for the geometric distribution.
Syntax [M,V] = geostat(P)
Description [M,V] = geostat(P) returns the mean and variance for the geometric
distribution with parameters specified by P.
The mean of the geometric distribution with parameter p is $q/p$, where $q = 1 - p$.
The variance is $q/p^2$.
Examples [m,v] = geostat(1./(1:6))
m =
0 1.0000 2.0000 3.0000 4.0000 5.0000
v =
0 2.0000 6.0000 12.0000 20.0000 30.0000
See Also geocdf, geoinv, geopdf, geornd
gline
Purpose Interactively draw a line in a figure.
Syntax gline(fig)
h = gline(fig)
gline
Description gline(fig) allows you to draw a line segment in the figure fig by clicking the
pointer at the two end-points. A rubber band line tracks the pointer movement.
h = gline(fig) returns the handle to the line in h.
gline with no input arguments draws in the current figure.
See Also refline, gname
glmdemo
Purpose Demo of generalized linear models.
Syntax glmdemo
Description glmdemo begins a slide show demonstration of generalized linear models. The
slides indicate when generalized linear models are useful, how to fit
generalized linear models using the glmfit function, and how to make
predictions using the glmval function.
See Also glmfit, glmval
glmfit
Purpose Generalized linear model fitting.
Syntax b = glmfit(X,Y,'distr')
b = glmfit(X,Y,'distr','link','estdisp',offset,pwts,'const')
[b,dev,stats] = glmfit(...)
Description b = glmfit(X,Y,'distr') fits the generalized linear model for response Y,
predictor variable matrix X, and distribution 'distr'. The following
distributions are available: 'binomial', 'gamma', 'inverse gaussian',
'lognormal', 'normal' (the default), and 'poisson'. In most cases Y is a
vector of response measurements, but for the binomial distribution Y is a
two-column array having the measured number of counts in the first column
and the number of trials (the binomial N parameter) in the second column. X is
a matrix having the same number of rows as Y and containing the values of the
predictor variables for each observation. The output b is a vector of coefficient
estimates. This syntax uses the canonical link (see below) to relate the
distribution parameter to the predictors.
b = glmfit(X,Y,'distr','link','estdisp',offset,pwts,'const')
provides additional control over the fit. The 'link' argument specifies the
relationship between the distribution parameter (μ) and the fitted linear
combination of predictor variables (xb). In most cases 'link' is one of the
following:
link            Meaning                  Default (Canonical) Link
'identity'      μ = xb                   'normal'
'log'           log(μ) = xb              'poisson'
'logit'         log(μ / (1-μ)) = xb      'binomial'
'probit'        norminv(μ) = xb
'comploglog'    log(-log(1-μ)) = xb
'logloglink'    log(-log(μ)) = xb
'reciprocal'    1/μ = xb                 'gamma'
p (a number)    μ^p = xb                 'inverse gaussian' (with p = -2)
Alternatively, you can write functions to define your own custom link. You
specify the link argument as a three-element cell array containing functions
that define the link function, its derivative, and its inverse. For example,
suppose you want to define a reciprocal square root link using inline functions.
You could define the variable mylinks to use as your 'link' argument by
writing:
FL = inline('x.^-.5')
FD = inline('-.5*x.^-1.5')
FI = inline('x.^-2')
mylinks = {FL FD FI}
Alternatively, you could define functions named FL, FD, and FI in their own
M-files, and then specify mylinks in the form
mylinks = {@FL @FD @FI}
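Before passing a custom link triple to glmfit, it can help to confirm that the three functions are mutually consistent. The sketch below checks the reciprocal square root link numerically: the inverse should undo the link, and the derivative should match a finite-difference slope. It uses anonymous functions instead of inline objects. That substitution, the test points in mu, and the step size h are all assumptions of this illustration.

```matlab
% Sketch: consistency check for a custom link, its derivative, and its inverse.
FL = @(mu)  mu.^(-0.5);               % link function
FD = @(mu)  -0.5 * mu.^(-1.5);        % derivative of the link
FI = @(eta) eta.^(-2);                % inverse of the link
mu = [0.5 1 2 4];                     % hypothetical test points
h  = 1e-6;                            % finite-difference step
roundtrip_err = max(abs(FI(FL(mu)) - mu));
numeric_deriv = (FL(mu + h) - FL(mu - h)) / (2*h);
deriv_err     = max(abs(numeric_deriv - FD(mu)));
```

Both error measures should be close to zero. A large value usually means the functions were defined inconsistently or listed in the wrong order.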
The 'estdisp' argument can be 'on' to estimate a dispersion parameter for
the binomial or Poisson distribution, or 'off' (the default) to use the
theoretical value of 1.0 for those distributions. The glmfit function always
estimates dispersion parameters for other distributions.
The offset and pwts parameters can be vectors of the same length as Y, or can
be omitted (or specified as an empty vector). The offset vector is a special
predictor variable whose coefficient is known to be 1.0. As an example, suppose
that you are modeling the number of defects on various surfaces, and you want
to construct a model in which the expected number of defects is proportional to
the surface area. You might use the number of defects as your response, along
with the Poisson distribution, the log link function, and the log surface area as
an offset.
The pwts argument is a vector of prior weights. As an example, if the response
value Y(i) is the average of f(i) measurements, you could use f as a vector of
prior weights.
The 'const' argument can be 'on' (the default) to estimate a constant term,
or 'off' to omit the constant term. If you want the constant term, use this
argument rather than specifying a column of ones in the X matrix.
[b,dev,stats] = glmfit(...) returns the additional outputs dev and stats.
dev is the deviance at the solution vector. The deviance is a generalization of
the residual sum of squares. It is possible to perform an analysis of deviance to
compare several models, each a subset of the other, and to test whether the
model with more terms is significantly better than the model with fewer terms.
stats is a structure with the following fields:
stats.dfe = degrees of freedom for error
stats.s = theoretical or estimated dispersion parameter
stats.sfit = estimated dispersion parameter
stats.estdisp = 1 if dispersion is estimated, 0 if fixed
stats.beta = vector of coefficient estimates (same as b)
stats.se = vector of standard errors of the coefficient estimates b
stats.coeffcorr = correlation matrix for b
stats.t = t statistics for b
stats.p = p-values for b
stats.resid = vector of residuals
stats.residp = vector of Pearson residuals
stats.residd = vector of deviance residuals
stats.resida = vector of Anscombe residuals
If you estimate a dispersion parameter for the binomial or Poisson distribution,
then stats.s is set equal to stats.sfit. Also, the elements of stats.se differ
by the factor stats.s from their theoretical values.
Example We have data on cars weighing between 2100 and 4300 pounds. For each car
weight we have the total number of cars of that weight, and the number that
can be considered to get poor mileage according to some test. For example, 8
out of 21 cars weighing 3100 pounds get poor mileage according to a
measurement of the miles they can travel on a gallon of gasoline.
w = (2100:200:4300)';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
We can compare several fits to these data. First, let's try fitting logit and probit
models:
[bl,dl,sl] = glmfit(w,[poor total],'binomial');
[bp,dp,sp] = glmfit(w,[poor total],'binomial','probit');
dl
dl =
6.4842
dp
dp =
7.5693
The deviance for the logit model is smaller than for the probit model. Although
this is not a formal test, it leads us to prefer the logit model.
We can do a formal test comparing two logit models. We already fit one model
using w as a linear predictor. Let's fit another logit model using both linear and
squared terms in w. If there is no true effect for the squared term, the difference
in their deviances should be small compared with a chi-square distribution
having one degree of freedom.
[b2,d2,s2] = glmfit([w w.^2],[poor total],'binomial')
dl-d2
ans =
0.7027
chi2cdf(dl-d2,1)
ans =
0.5981
A difference of 0.7027 is not at all unusual for a chi-square distribution with
one degree of freedom, so the quadratic model does not give a significantly
better fit than the simpler linear model.
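The chi-square comparison can be reproduced with base functions, because the chi-square cdf with df degrees of freedom equals gammainc(x/2, df/2). The sketch below takes the deviance difference reported above as given and converts it to a cdf value and a p-value. The variable names are assumptions for this illustration.

```matlab
% Sketch: chi-square cdf and p-value for a deviance difference.
dev_diff = 0.7027;                       % deviance difference from the example
df       = 1;                            % one extra term in the larger model
cdf_val  = gammainc(dev_diff/2, df/2);   % chi-square cdf at dev_diff
p_value  = 1 - cdf_val;                  % probability of a larger difference
```

The cdf value is about 0.598, in line with the chi2cdf result above, so the p-value of roughly 0.40 gives no evidence that the squared term is needed.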
The following are the coefficient estimates, their standard errors, t-statistics,
and p-values for the linear model:
[bl sl.se sl.t sl.p]
ans =
-13.3801 1.3940 -9.5986 0.0000
0.0042 0.0004 9.4474 0.0000
This shows that we cannot simplify the model any further. Both the intercept
and slope coefficients are significantly different from 0, as indicated by
p-values that are 0.0000 to four decimal places.
See Also glmval, glmdemo, nlinfit, regress, regstats
References Dobson, A. J. An Introduction to Generalized Linear Models. 1990, CRC Press.
McCullagh, P. and J. A. Nelder. Generalized Linear Models. 2nd edition, 1990,
Chapman and Hall.
glmval
Purpose Compute predictions for generalized linear model.
Syntax yfit = glmval(b,X,'link')
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev)
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev,N,offset,'const')
Description yfit = glmval(b,X,'link') computes the predicted distribution parameters
for observations with predictor values X using the coefficient vector b and link
function 'link'. Typically, b is a vector of coefficient estimates computed by
the glmfit function. The value of 'link' must be the same as that used in
glmfit. The result yfit is the value of the inverse of the link function at the
linear combination X*b.
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev) returns confidence
bounds for the predicted values when you supply the stats structure returned
from glmfit, and optionally specify a confidence level as the clev argument.
(The default confidence level is 0.95 for 95% confidence.) The interval
[yfit-dlo, yfit+dhi] is a confidence bound for the true parameter value at
the specified X values.
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev,N,offset,'const')
specifies three additional arguments that may be needed if you used certain
arguments to glmfit. If you fit a binomial distribution using glmfit, specify N
as the value of the binomial N parameter for the predictions. If you included an
offset variable, specify offset as the new value of this variable. Use the same
'const' value ('on' or 'off') that you used with glmfit.
Example Let's model the number of cars with poor gasoline mileage using the binomial
distribution. First we use the binomial distribution with the default logit link
to model the probability of having poor mileage as a function of the weight and
squared weight of the cars. Then we compute a vector wnew of new car weights
at which we want to make predictions. Next we compute the expected number
of cars, out of a total of 30 cars of each weight, that would have poor mileage.
Finally we graph the predicted values and 95% confidence bounds as a function
of weight.
w = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
[b2,d2,s2] = glmfit([w w.^2],[poor total],'binomial')
wnew = (3000:100:4000)';
[yfit,dlo,dhi] = glmval(b2,[wnew wnew.^2],'logit',s2,0.95,30)
errorbar(wnew,yfit,dlo,dhi);
[Figure: error bar plot of the predicted number of cars with poor mileage (out of 30), with 95% confidence bounds, versus weight]
See Also glmfit, glmdemo
gname
Purpose Label plotted points with their case names or case number.
Syntax gname('cases')
gname
h = gname('cases',line_handle)
Description gname('cases') displays a figure window, displays cross-hairs, and waits for
a mouse button or keyboard key to be pressed. Position the cross-hair with the
mouse and click once near each point to label that point. Input 'cases' is a
string matrix with each row the case name of a data point. You can also click
and drag a selection rectangle to label all points within the rectangle. When
you are done, press the Enter or Escape key.
gname with no arguments labels each case with its case number.
h = gname('cases',line_handle) returns a vector of handles to the text
objects on the plot. Use the scalar line_handle to identify the correct line if
there is more than one line object on the plot.
You can use gname to label plots created by the plot, scatter, gscatter,
plotmatrix, and gplotmatrix functions.
Example Let's use the city ratings data sets to find out which cities are the best and
worst for education and the arts. We create a graph, call the gname function,
and click on the points at the extreme left and at the top.
load cities
education = ratings(:,6);
arts = ratings(:,7);
plot(education,arts,'+')
gname(names)
[Figure: plot of arts ratings versus education ratings, with the points for Pascagoula, MS and New York, NY labeled]
See Also gplotmatrix, gscatter, gtext, plot, plotmatrix, scatter
gplotmatrix
2-139
2gplot mat r ix
Purpose Plot matrix of scatter plots by group.
Syntax gplotmatrix(x,y,g)
gplotmatrix(x,y,g,'clr','sym',siz)
gplotmatrix(x,y,g,'clr','sym',siz,'doleg')
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt')
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt','xnam','ynam')
[h,ax,bigax] = gplotmatrix(...)
Description gplotmatrix(x,y,g) creates a matrix of scatter plots. Each individual set of
axes in the resulting figure contains a scatter plot of a column of x against a
column of y. All plots are grouped by the grouping variable g.
x and y are matrices with the same number of rows. If x has p columns and y
has q columns, the figure contains a p-by-q matrix of scatter plots. If you omit
y or specify it as the empty matrix, [], gplotmatrix creates a square matrix of
scatter plots of columns of x against each other.
g is a grouping variable that can be a vector, string array, or cell array of
strings. g must have the same number of rows as x and y. Points with the same
value of g are placed in the same group, and appear on the graph with the same
marker and color. Alternatively, g can be a cell array containing several
grouping variables (such as {G1 G2 G3}); in that case, observations are in the
same group if they have common values of all grouping variables.
gplotmatrix(x,y,g,'clr','sym',siz) specifies the color, marker type, and
size for each group. 'clr' is a string array of colors recognized by the plot
function. The default is 'clr' = 'bgrcmyk'. 'sym' is a string array of symbols
recognized by the plot command, with the default value '.'. siz is a vector of
sizes, with the default determined by the 'defaultlinemarkersize' property.
If you do not specify enough values for all groups, gplotmatrix cycles through
the specified values as needed.
gplotmatrix(x,y,g,'clr','sym',siz,'doleg') controls whether a legend is
displayed on the graph ('doleg' = 'on', the default) or not ('doleg' = 'off').
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt') controls what
appears along the diagonal of a plot matrix of x versus x. Allowable values are
'none' to leave the diagonals blank, 'hist' (the default) to plot histograms, or
'variable' to write the variable names.
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt','xnam','ynam')
specifies the names of the columns in the x and y arrays. These names are used
to label the x- and y-axes. 'xnam' and 'ynam' must be character arrays with
one row for each column of x and y, respectively.
[h,ax,bigax] = gplotmatrix(...) returns three arrays of handles. h is an
array of handles to the lines on the graphs. ax is a matrix of handles to the axes
of the individual plots. bigax is a handle to big (invisible) axes framing the
entire plot matrix. These are left as the current axes, so a subsequent title,
xlabel, or ylabel command will produce labels that are centered with respect
to the entire plot matrix.
Example Load the cities data. The ratings array has ratings of the cities in nine
categories (category names are in the array categories). group is a code whose
value is 2 for the largest cities. We can make scatter plots of the first three
categories against the other four, grouped by the city size code.
load discrim
gplotmatrix(ratings(:,1:3),ratings(:,4:7),group)
The output figure (not shown) has an array of graphs with each city group
represented by a different color. The graphs are a little easier to read if we
specify colors and plotting symbols, label the axes with the rating categories,
and move the legend off the graphs.
gplotmatrix(ratings(:,1:3),ratings(:,4:7),group,...
'br','.o',[],'on','',categories(1:3,:),...
categories(4:7,:))
See Also grpstats, gscatter, plotmatrix
[Figure: plot matrix with climate, housing, and health on the x-axes and crime, transportation, education, and arts on the y-axes, grouped by city size code (legend entries 1 and 2).]
grpstats
Purpose Summary statistics by group.
Syntax means = grpstats(X,group)
[means,sem,counts,name] = grpstats(X,group)
grpstats(X,group,alpha)
Description means = grpstats(X,group) returns the means of each column of X by group,
where X is a matrix of observations. group is an array that defines the grouping
such that two elements of X are in the same group if their corresponding group
values are the same. The grouping variable group can be a vector, string array,
or cell array of strings. It can also be a cell array containing several grouping
variables (such as {G1 G2 G3}); in that case observations are in the same group
if they have common values of all grouping variables.
[means,sem,counts,name] = grpstats(X,group) supplies the
standard error of the mean in sem, the number of elements in each group in
counts, and the name of each group in name. name is useful to identify and label
the groups when the input group values are not simple group numbers.
grpstats(X,group,alpha) plots 100(1-alpha)% confidence intervals around
each mean.
Example We assign 100 observations to one of four groups. For each observation we
measure five quantities with true means from 1 to 5. grpstats allows us to
compute the means for each group.
group = unidrnd(4,100,1);
true_mean = 1:5;
true_mean = true_mean(ones(100,1),:);
x = normrnd(true_mean,1);
means = grpstats(x,group)
means =
0.7947 2.0908 2.8969 3.6749 4.6555
0.9377 1.7600 3.0285 3.9484 4.8169
1.0549 2.0255 2.8793 4.0799 5.3740
0.7107 1.9264 2.8232 3.8815 4.9689
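The group means and standard errors described above can be sketched with base MATLAB alone. This is a minimal illustration on a small made-up data set, not the toolbox implementation; the variable names and the toy values are assumptions chosen so the results are easy to check by hand.

```matlab
% Hypothetical toy data: 6 observations, 2 columns, 2 groups.
x     = [1 10; 3 30; 5 50; 2 20; 4 40; 6 60];
group = [1; 1; 1; 2; 2; 2];

ids    = unique(group);               % sorted group labels
ngrp   = numel(ids);
means  = zeros(ngrp, size(x, 2));
sem    = zeros(ngrp, size(x, 2));
counts = zeros(ngrp, 1);
for k = 1:ngrp
    rows       = (group == ids(k));   % observations that belong to group k
    counts(k)  = sum(rows);
    means(k,:) = mean(x(rows,:), 1);  % column means within the group
    % Standard error of the mean: sample standard deviation / sqrt(n)
    sem(k,:)   = std(x(rows,:), 0, 1) / sqrt(counts(k));
end
```

Here group 1 has column means 3 and 30, and group 2 has column means 4 and 40.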
See Also tabulate, crosstab
gscatter
Purpose Scatter plot by group.
Syntax gscatter(x,y,g)
gscatter(x,y,g,'clr','sym',siz)
gscatter(x,y,g,'clr','sym',siz,'doleg')
gscatter(x,y,g,'clr','sym',siz,'doleg','xnam','ynam')
h = gscatter(...)
Description gscatter(x,y,g) creates a scatter plot of x and y, grouped by g, where x and y
are vectors with the same size and g can be a vector, string array, or cell array
of strings. Points with the same value of g are placed in the same group, and
appear on the graph with the same marker and color. Alternatively, g can be a
cell array containing several grouping variables (such as {G1 G2 G3}); in that
case, observations are in the same group if they have common values of all
grouping variables.
gscatter(x,y,g,'clr','sym',siz) specifies the color, marker type, and size
for each group. 'clr' is a string array of colors recognized by the plot function.
The default is 'clr' = 'bgrcmyk'. 'sym' is a string array of symbols recognized
by the plot command, with the default value '.'. siz is a vector of sizes, with
the default determined by the 'defaultlinemarkersize' property. If you do
not specify enough values for all groups, gscatter cycles through the specified
values as needed.
gscatter(x,y,g,'clr','sym',siz,'doleg') controls whether a legend is
displayed on the graph ('doleg' = 'on', the default) or not ('doleg' = 'off').
gscatter(x,y,g,'clr','sym',siz,'doleg','xnam','ynam') specifies the
name to use for the x-axis and y-axis labels. If the x and y inputs are simple
variable names and xnam and ynam are omitted, gscatter labels the axes with
the variable names.
h = gscatter(...) returns an array of handles to the lines on the graph.
Example Load the cities data and look at the relationship between the ratings for
climate (first column) and housing (second column) grouped by city size. We'll
also specify the colors and plotting symbols.
load discrim
gscatter(ratings(:,1),ratings(:,2),group,'br','xo')
See Also gplotmatrix, grpstats, scatter
[Figure: scatter plot of housing ratings against climate ratings, grouped by city size code (legend entries 1 and 2).]
harmmean
Purpose Harmonic mean of a sample of data.
Syntax m = harmmean(X)
Description m = harmmean(X) calculates the harmonic mean of a sample. For vectors,
harmmean(x) is the harmonic mean of the elements in x. For matrices,
harmmean(X) is a row vector containing the harmonic means of each column.
The harmonic mean is
$$m = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Examples The sample average is greater than or equal to the harmonic mean.
x = exprnd(1,10,6);
harmonic = harmmean(x)
harmonic =
0.3382 0.3200 0.3710 0.0540 0.4936 0.0907
average = mean(x)
average =
1.3509 1.1583 0.9741 0.5319 1.0088 0.8122
See Also mean, median, geomean, trimmean
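The definition of the harmonic mean can be checked directly with base MATLAB. The sketch below applies the formula column by column to a small made-up matrix (the data values are assumptions for illustration) and compares the result with the arithmetic mean.

```matlab
% Hypothetical positive sample, one column per variable.
x = [1 2 4; 1 2 4; 1 2 4; 4 2 4];
n = size(x, 1);

% Harmonic mean of each column: n divided by the sum of reciprocals.
harmonic = n ./ sum(1 ./ x, 1);

% Arithmetic mean of each column, for comparison.
average = mean(x, 1);
```

For a constant column the two means coincide; otherwise the harmonic mean is the smaller of the two.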
hist
Purpose Plot histograms.
Syntax hist(y)
hist(y,nb)
hist(y,x)
[n,x] = hist(y,...)
Description hist(y) draws a 10-bin histogram for the data in vector y. The bins are equally
spaced between the minimum and maximum values in y.
hist(y,nb) draws a histogram with nb bins.
hist(y,x) draws a histogram using the bins in the vector x.
[n,x] = hist(y,...) does not draw a graph, but returns vectors n and x
containing the frequency counts and the bin locations such that bar(x,n) plots
the histogram. This is useful in situations where more control is needed over
the appearance of a graph, for example, to combine a histogram into a more
elaborate plot statement.
The hist function is a part of the standard MATLAB language.
Examples Generate bell-curve histograms from Gaussian data.
x = -2.9:0.1:2.9;
y = normrnd(0,1,1000,1);
hist(y,x)
[Figure: histogram of the 1000 normal random values in y, with bin centers from -2.9 to 2.9.]
histfit
Purpose Histogram with superimposed normal density.
Syntax histfit(data)
histfit(data,nbins)
h = histfit(data,nbins)
Description histfit(data,nbins) plots a histogram of the values in the vector data using
nbins bars in the histogram. When nbins is omitted, its value is set to the
square root of the number of elements in data.
h = histfit(data,nbins) returns a vector of handles to the plotted lines,
where h(1) is the handle to the histogram and h(2) is the handle to the density
curve.
Example r = normrnd(10,1,100,1);
histfit(r)
See Also hist, normfit
[Figure: histogram of r with the fitted normal density curve superimposed.]
hougen
Purpose Hougen-Watson model for reaction kinetics.
Syntax yhat = hougen(beta,x)
Description yhat = hougen(beta,x) returns the predicted values of the reaction rate,
yhat, as a function of the vector of parameters, beta, and the matrix of data, X.
beta must have 5 elements and X must have three columns.
hougen is a utility function for rsmdemo.
The model form is:
$$\hat{y} = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$$
Reference Bates, D., and D. Watts. Nonlinear Regression Analysis and Its Applications.
Wiley 1988. pp. 271-272.
See Also rsmdemo
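The model form can be written as a one-line anonymous function. This is only a sketch of the formula as stated in this section, not the shipped hougen code; the name hougen_sketch and the parameter values are assumptions picked so the result is easy to verify by hand.

```matlab
% Hypothetical stand-in for the model form: columns of x are x1, x2, x3.
hougen_sketch = @(beta, x) (beta(1)*x(:,2) - x(:,3)/beta(5)) ./ ...
    (1 + beta(2)*x(:,1) + beta(3)*x(:,2) + beta(4)*x(:,3));

% With beta = [1 0 0 0 1] the model reduces to x2 - x3.
yhat_simple = hougen_sketch([1 0 0 0 1], [1 5 2; 3 4 4]);

% General case: numerator 2*1 - 2/2 = 1, denominator 1 + 1 + 1 + 2 = 5.
yhat_general = hougen_sketch([2 1 1 1 2], [1 1 2]);
```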
hygecdf
Purpose Hypergeometric cumulative distribution function (cdf).
Syntax P = hygecdf(X,M,K,N)
Description hygecdf(X,M,K,N) computes the hypergeometric cdf at each of the values in X
using the corresponding parameters in M, K, and N. Vector or matrix inputs for
X, M, K, and N must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs.
The hypergeometric cdf is
$$p = F(x \mid M, K, N) = \sum_{i=0}^{x} \frac{\binom{K}{i}\binom{M-K}{N-i}}{\binom{M}{N}}$$
The result, p, is the probability of drawing up to x of a possible K items in N
drawings without replacement from a group of M objects.
Examples Suppose you have a lot of 100 floppy disks and you know that 20 of them are
defective. What is the probability of drawing zero to two defective floppies if
you select 10 at random?
p = hygecdf(2,100,20,10)
p =
0.6812
See Also cdf, hygeinv, hygepdf, hygernd, hygestat
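The cdf formula can be evaluated term by term with the base MATLAB function nchoosek. The sketch below reworks the floppy disk example this way; it is a check of the formula under that example's parameters, not the toolbox implementation, and the variable names are assumptions.

```matlab
M = 100;  % objects in the lot
K = 20;   % defective objects
N = 10;   % objects drawn without replacement

xs = 0:5;                         % numbers of defectives to evaluate
pdfvals = zeros(size(xs));
for j = 1:numel(xs)
    % One term of the sum: C(K,i) * C(M-K,N-i) / C(M,N)
    pdfvals(j) = nchoosek(K, xs(j)) * nchoosek(M-K, N-xs(j)) / nchoosek(M, N);
end

cdfvals = cumsum(pdfvals);        % running sum gives the cdf at 0, 1, ..., 5
p = cdfvals(3);                   % probability of zero to two defectives
```

The value of p agrees with the hygecdf(2,100,20,10) result shown in the example to the four digits printed there.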
hygeinv
Purpose Inverse of the hypergeometric cumulative distribution function (cdf).
Syntax X = hygeinv(P,M,K,N)
Description hygeinv(P,M,K,N) returns the smallest integer X such that the
hypergeometric cdf evaluated at X equals or exceeds P. You can think of P as the
probability of observing X defective items in N drawings without replacement
from a group of M items where K are defective.
Examples Suppose you are the Quality Assurance manager for a floppy disk
manufacturer. The production line turns out floppy disks in batches of 1,000.
You want to sample 50 disks from each batch to see if they have defects. You
want to accept 99% of the batches if there are no more than 10 defective disks
in the batch. What is the maximum number of defective disks you should allow
in your sample of 50?
x = hygeinv(0.99,1000,10,50)
x =
3
What is the median number of defective floppy disks in samples of 50 disks
from batches with 10 defective disks?
x = hygeinv(0.50,1000,10,50)
x =
0
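The "smallest integer whose cdf equals or exceeds P" rule can be sketched with a cumulative sum and a search. The code below uses only base MATLAB and evaluates the binomial coefficients through gammaln to avoid overflow; the helper names logC and hpdf are assumptions for illustration, and this is not the toolbox implementation.

```matlab
% Log of the binomial coefficient C(n,k), computed with gammaln.
logC = @(n, k) gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);

% Hypergeometric pdf at x for M objects, K defective, N drawn.
hpdf = @(x, M, K, N) exp(logC(K, x) + logC(M-K, N-x) - logC(M, N));

M = 1000; K = 10; N = 50;
xs = 0:min(K, N);                       % possible numbers of defectives
cdfvals = cumsum(hpdf(xs, M, K, N));    % cdf at each value in xs

x99 = xs(find(cdfvals >= 0.99, 1));     % smallest x with cdf >= 0.99
x50 = xs(find(cdfvals >= 0.50, 1));     % smallest x with cdf >= 0.50 (median)
```

With these batch parameters x99 is 3 and x50 is 0, matching the two hygeinv calls in the example.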
See Also hygecdf, hygepdf, hygernd, hygestat, icdf
hygepdf
Purpose Hypergeometric probability density function (pdf).
Syntax Y = hygepdf(X,M,K,N)
Description Y = hygepdf(X,M,K,N) computes the hypergeometric pdf at each of the values
in X using the corresponding parameters in M, K, and N. Vector or matrix inputs
for X, M, K, and N must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs.
The parameters in M, K, and N must all be positive integers, with N ≤ M. The
values in X must be less than or equal to all the parameter values.
The hypergeometric pdf is
$$y = f(x \mid M, K, N) = \frac{\binom{K}{x}\binom{M-K}{N-x}}{\binom{M}{N}}$$
The result, y, is the probability of drawing exactly x of a possible K items in N
drawings without replacement from a group of M objects.
Examples Suppose you have a lot of 100 floppy disks and you know that 20 of them are
defective. What is the probability of drawing 0 through 5 defective floppy disks
if you select 10 at random?
p = hygepdf(0:5,100,20,10)
p =
0.0951 0.2679 0.3182 0.2092 0.0841 0.0215
See Also hygecdf, hygeinv, hygernd, hygestat, pdf
hygernd
Purpose Random numbers from the hypergeometric distribution.
Syntax R = hygernd(M,K,N)
R = hygernd(M,K,N,mm)
R = hygernd(M,K,N,mm,nn)
Description R = hygernd(M,K,N) generates hypergeometric random numbers with
parameters M, K, and N. Vector or matrix inputs for M, K, and N must have the
same size, which is also the size of R. A scalar input for M, K, or N is expanded to
a constant matrix with the same dimensions as the other inputs.
R = hygernd(M,K,N,mm) generates hypergeometric random numbers with
parameters M, K, and N, where mm is a 1-by-2 vector that contains the row and
column dimensions of R.
R = hygernd(M,K,N,mm,nn) generates hypergeometric random numbers with
parameters M, K, and N, where scalars mm and nn are the row and column
dimensions of R.
Examples numbers = hygernd(1000,40,50)
numbers =
1
See Also hygecdf, hygeinv, hygepdf, hygestat
hygestat
Purpose Mean and variance for the hypergeometric distribution.
Syntax [MN,V] = hygestat(M,K,N)
Description [MN,V] = hygestat(M,K,N) returns the mean and variance for the
hypergeometric distribution with parameters specified by M, K, and N. Vector or
matrix inputs for M, K, and N must have the same size, which is also the size of
MN and V. A scalar input for M, K, or N is expanded to a constant matrix with the
same dimensions as the other inputs.
The mean of the hypergeometric distribution with parameters M, K, and N is
NK/M, and the variance is
$$N \, \frac{K}{M} \, \frac{M-K}{M} \, \frac{M-N}{M-1}$$
Examples The hypergeometric distribution approaches the binomial distribution, where
p = K / M as M goes to infinity.
[m,v] = hygestat(10.^(1:4),10.^(0:3),9)
m =
0.9000 0.9000 0.9000 0.9000
v =
0.0900 0.7445 0.8035 0.8094
[m,v] = binostat(9,0.1)
m =
0.9000
v =
0.8100
See Also hygecdf, hygeinv, hygepdf, hygernd
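The mean and variance formulas can be evaluated directly with elementwise arithmetic. The sketch below reuses the parameters from the example; the variable names are assumptions, and the binomial variance N*p*(1-p) is included only to show the limit that the example describes.

```matlab
M = 10.^(1:4);   % population sizes 10, 100, 1000, 10000
K = 10.^(0:3);   % items with the desired characteristic, so K./M = 0.1
N = 9;           % number of draws

mn = N .* K ./ M;                                         % mean N*K/M
v  = N .* (K./M) .* ((M-K)./M) .* ((M-N)./(M-1));         % variance formula

% Binomial variance with p = 0.1, the limit as M grows.
vbino = N * 0.1 * (1 - 0.1);
```

The factor (M-N)/(M-1) approaches 1 as M grows, which is why v approaches vbino.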
icdf
Purpose Inverse of a specified cumulative distribution function (icdf).
Syntax X = icdf('name',P,A1,A2,A3)
Description X = icdf('name',P,A1,A2,A3) returns a matrix of critical values, X, where
'name' is a string containing the name of the distribution. P is a matrix of
probabilities, and A1, A2, and A3 are matrices of distribution parameters.
Depending on the distribution some of the parameters may not be necessary.
Vector or matrix inputs for P, A1, A2, and A3 must all have the same size. A
scalar input is expanded to a constant matrix with the same dimensions as the
other inputs.
icdf is a utility routine allowing you to access all the inverse cdfs in the
Statistics Toolbox using the name of the distribution as a parameter. See
"Overview of the Distributions" on page 1-12 for the list of available
distributions.
Examples x = icdf('Normal',0.1:0.2:0.9,0,1)
x =
-1.2816 -0.5244 0 0.5244 1.2816
x = icdf('Poisson',0.1:0.2:0.9,1:5)
x =
1 1 3 5 8
See Also betainv, binoinv, cdf, chi2inv, expinv, finv, gaminv, geoinv, hygeinv,
logninv, nbininv, ncfinv, nctinv, ncx2inv, norminv, pdf, poissinv, random,
raylinv, tinv, unidinv, unifinv, weibinv
inconsistent
Purpose Calculate the inconsistency coefficient of a cluster tree.
Syntax Y = inconsistent(Z)
Y = inconsistent(Z,d)
Description Y = inconsistent(Z) computes the inconsistency coefficient for each link of
the hierarchical cluster tree Z, where Z is an (m-1)-by-3 matrix generated by the
linkage function. The inconsistency coefficient characterizes each link in a
cluster tree by comparing its length with the average length of other links at
the same level of the hierarchy. The higher the value of this coefficient, the less
similar the objects connected by the link.
Y = inconsistent(Z,d) computes the inconsistency coefficient for each link
in the hierarchical cluster tree Z to depth d, where d is an integer denoting the
number of levels of the cluster tree that are included in the calculation. By
default, d=2.
The output, Y, is an (m-1)-by-4 matrix formatted as follows.
Column  Description
1       Mean of the lengths of all the links included in the calculation.
2       Standard deviation of all the links included in the calculation.
3       Number of links included in the calculation.
4       Inconsistency coefficient.
For each link, k, the inconsistency coefficient is calculated as:
$$Y(k,4) = \frac{Z(k,3) - Y(k,1)}{Y(k,2)}$$
For leaf nodes, nodes that have no further nodes under them, the inconsistency
coefficient is set to 0.
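The four columns of Y can be sketched for a single link from the lengths of the links included in its calculation. The lengths below are hypothetical values for illustration (the second was chosen so the mean comes out to 0.1163); this is not a reimplementation of inconsistent, which also has to walk the tree to decide which links are included.

```matlab
% Hypothetical link lengths included in the calculation for link k.
% The last entry is the length of link k itself, Z(k,3).
lengths = [0.0423 0.1903];

Yk1 = mean(lengths);                  % column 1: mean link length
Yk2 = std(lengths);                   % column 2: standard deviation
Yk3 = numel(lengths);                 % column 3: number of links
Yk4 = (lengths(end) - Yk1) / Yk2;     % column 4: inconsistency coefficient
```

When exactly two links are included and link k is the longer one, the coefficient is always 1/sqrt(2), about 0.7071, regardless of the lengths.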
Example rand('seed',12);
X = rand(10,2);
Y = pdist(X);
Z = linkage(Y,'centroid');
W = inconsistent(Z,3)
W =
0.0423 0 1.0000 0
0.1406 0 1.0000 0
0.1163 0.1047 2.0000 0.7071
0.2101 0 1.0000 0
0.2054 0.0886 3.0000 0.6792
0.1742 0.1762 3.0000 0.6568
0.2336 0.1317 4.0000 0.6408
0.3081 0.2109 5.0000 0.7989
0.4610 0.3728 4.0000 0.8004
See Also cluster, cophenet, clusterdata, dendrogram, linkage, pdist, squareform
iqr
Purpose Interquartile range (IQR) of a sample.
Syntax y = iqr(X)
Description y = iqr(X) computes the difference between the 75th and the 25th percentiles
of the sample in X. The IQR is a robust estimate of the spread of the data, since
changes in the upper and lower 25% of the data do not affect it.
If there are outliers in the data, then the IQR is more representative than the
standard deviation as an estimate of the spread of the body of the data. The
IQR is less efficient than the standard deviation as an estimate of the spread
when the data is all from the normal distribution.
Multiply the IQR by 0.7413 to estimate σ (the second parameter of the normal
distribution).
Examples This Monte Carlo simulation shows the relative efficiency of the IQR to the
sample standard deviation for normal data.
x = normrnd(0,1,100,100);
s = std(x);
s_IQR = 0.7413 * iqr(x);
efficiency = (norm(s - 1)./norm(s_IQR - 1)).^2
efficiency =
0.3297
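The scaling constant 0.7413 follows from the quartiles of the normal distribution: the IQR of a normal population is 2σ times the 75th percentile of the standard normal, so σ is the IQR divided by that quantity. A short base MATLAB check, using erfinv in place of the toolbox norminv function (the variable names are assumptions):

```matlab
% 75th percentile of the standard normal distribution, via erfinv.
z75 = sqrt(2) * erfinv(0.5);

% For normal data, IQR = 2 * z75 * sigma, so sigma = c * IQR with:
c = 1 / (2 * z75);
```

The value of c rounds to 0.7413.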
See Also std, mad, range
jbtest
Purpose Jarque-Bera test for goodness-of-fit to a normal distribution.
Syntax H = jbtest(X)
H = jbtest(X,alpha)
[H,P,JBSTAT,CV] = jbtest(X,alpha)
Description H = jbtest(X) performs the Jarque-Bera test on the input data vector X and
returns H, the result of the hypothesis test. The result is H=1 if we can reject the
hypothesis that X has a normal distribution, or H=0 if we cannot reject that
hypothesis. We reject the hypothesis if the test is significant at the 5% level.
The Jarque-Bera test evaluates the hypothesis that X has a normal distribution
with unspecified mean and variance, against the alternative that X does not
have a normal distribution. The test is based on the sample skewness and
kurtosis of X. For a true normal distribution, the sample skewness should be
near 0 and the sample kurtosis should be near 3. The Jarque-Bera test
determines whether the sample skewness and kurtosis are unusually different
from their expected values, as measured by a chi-square statistic.
The Jarque-Bera test is an asymptotic test, and should not be used with small
samples. You may want to use lillietest in place of jbtest for small samples.
H = jbtest(X,alpha) performs the Jarque-Bera test at the 100*alpha% level
rather than the 5% level, where alpha must be between 0 and 1.
[H,P,JBSTAT,CV] = jbtest(X,alpha) returns three additional outputs. P is
the p-value of the test, JBSTAT is the value of the test statistic, and CV is the
critical value for determining whether to reject the null hypothesis.
Example We can use jbtest to determine if car weights follow a normal distribution.
load carsmall
[h,p,j] = jbtest(Weight)
h =
1
p =
0.026718
j =
7.2448
With a p-value of 2.67%, we reject the hypothesis that the distribution is
normal. With a log transformation, the distribution becomes closer to normal
but is still significantly different at the 5% level.
[h,p,j] = jbtest(log(Weight))
h =
1
p =
0.043474
j =
6.2712
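The relationship between the test statistic and the p-value in these examples can be reproduced with base MATLAB. The sketch below assumes the textbook form of the Jarque-Bera statistic, n/6 * (s^2 + (k-3)^2/4) with s the sample skewness and k the sample kurtosis, referred to a chi-square distribution with 2 degrees of freedom; this section does not spell the formula out, so treat it as an assumption. The toy sample and variable names are also hypothetical.

```matlab
% Upper tail of the chi-square distribution with 2 degrees of freedom.
chi2tail2 = @(s) exp(-s/2);

% p-values implied by the two statistics reported in the example.
p1 = chi2tail2(7.2448);
p2 = chi2tail2(6.2712);

% Statistic for a small hypothetical sample, from central moments.
x  = [-2 -1 0 1 2];
n  = numel(x);
d  = x - mean(x);
m2 = mean(d.^2);  m3 = mean(d.^3);  m4 = mean(d.^4);
s  = m3 / m2^1.5;                     % sample skewness
k  = m4 / m2^2;                       % sample kurtosis
jb = n/6 * (s^2 + (k - 3)^2 / 4);     % assumed Jarque-Bera form
```

Under this assumption p1 and p2 agree with the p-values 0.026718 and 0.043474 printed in the example.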
See lillietest for a different test of the same hypothesis.
Reference Judge, G. G., R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee.
Introduction to the Theory and Practice of Econometrics. New York: Wiley.
See Also hist, kstest2, lillietest
kruskalwallis
Purpose Kruskal-Wallis nonparametric one-way Analysis of Variance (ANOVA).
Syntax p = kruskalwallis(X)
p = kruskalwallis(X,group)
p = kruskalwallis(X,group,'displayopt')
[p,table] = kruskalwallis(...)
[p,table,stats] = kruskalwallis(...)
Description p = kruskalwallis(X) performs a Kruskal-Wallis test for comparing the
means of columns of the m-by-n matrix X, where each column represents an
independent sample containing m mutually independent observations. The
Kruskal-Wallis test is a nonparametric version of the classical one-way
ANOVA. The function returns the p-value for the null hypothesis that all
samples in X are drawn from the same population (or from different
populations with the same mean).
If the p-value is near zero, this casts doubt on the null hypothesis and suggests
that at least one sample mean is significantly different from the other sample
means. The choice of a critical p-value to determine whether the result is
judged "statistically significant" is left to the researcher. It is common to
declare a result significant if the p-value is less than 0.05 or 0.01.
The kruskalwallis function displays two figures. The first figure is a standard
ANOVA table, calculated using the ranks of the data rather than their numeric
values. Ranks are found by ordering the data from smallest to largest across all
groups, and taking the numeric index of this ordering. The rank for a tied
observation is equal to the average rank of all observations tied with it. For
example, the following table shows the ranks for a small sample.
X value  1.4  2.7  1.6  1.6  3.3  0.9  1.1
Rank     3    6    4.5  4.5  7    1    2
The entries in the ANOVA table are the usual sums of squares, degrees of
freedom, and other quantities calculated on the ranks. The usual F statistic is
replaced by a chi-square statistic. The p-value measures the significance of the
chi-square statistic.
The second figure displays box plots of each column of X (not the ranks of X).
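The ranking rule for tied observations can be sketched in base MATLAB with a sort followed by averaging over each run of equal values. The code reproduces the ranks of the small sample in the rank table; it illustrates the rule only and is not the code kruskalwallis uses internally.

```matlab
x = [1.4 2.7 1.6 1.6 3.3 0.9 1.1];    % sample from the rank table

[sorted, order] = sort(x);            % order(k) is the index of the k-th smallest
ranks = zeros(size(x));
k = 1;
while k <= numel(x)
    % Extend 'last' to the end of the run of values tied with sorted(k).
    last = k;
    while last < numel(x) && sorted(last+1) == sorted(k)
        last = last + 1;
    end
    % Every tied observation gets the average of positions k..last.
    ranks(order(k:last)) = (k + last) / 2;
    k = last + 1;
end
```

The two observations equal to 1.6 occupy positions 4 and 5 in the ordering, so each receives rank 4.5.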
p = kruskalwallis(X,group) uses the values in group (a character array or
cell array) as labels for the box plot of the samples in X, when X is a matrix.
Each row of group contains the label for the data in the corresponding column
of X, so group must have length equal to the number of columns in X.
When X is a vector, kruskalwallis performs a Kruskal-Wallis test on the
samples contained in X, as indexed by input group (a vector, character array,
or cell array). Each element in group identifies the group (i.e., sample) to which
the corresponding element in vector X belongs, so group must have the same
length as X. The labels contained in group are also used to annotate the box
plot.
It is not necessary to label samples sequentially (1, 2, 3, ...). For example, if X
contains measurements taken at three different temperatures, -27, 65, and
110, you could use these numbers as the sample labels in group. If a row of
group contains an empty cell or empty string, that row and the corresponding
observation in X are disregarded. NaNs in either input are similarly ignored.
p = kruskalwallis(X,group,'displayopt') enables the table and box plot
displays when 'displayopt' is 'on' (default) and suppresses the displays
when 'displayopt' is 'off'.
[p,table] = kruskalwallis(...) returns the ANOVA table (including
column and row labels) in cell array table. (You can copy a text version of the
ANOVA table to the clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = kruskalwallis(...) returns a stats structure that you
can use to perform a follow-up multiple comparison test. The kruskalwallis
test evaluates the hypothesis that all samples have the same mean, against the
alternative that the means are not all the same. Sometimes it is preferable to
perform a test to determine which pairs of means are significantly different,
and which are not. You can use the multcompare function to perform such tests
by supplying the stats structure as input.
Assumptions
The Kruskal-Wallis test makes the following assumptions about the data in X:
• All sample populations have the same continuous distribution, apart from a
possibly different location.
• All observations are mutually independent.
The classical one-way ANOVA test replaces the first assumption with the
stronger assumption that the populations have normal distributions.
Example Let's revisit the same material strength study that we used with the anova1
function, to see if the nonparametric Kruskal-Wallis procedure leads to the
same conclusion. Recall we are studying the strength of beams made from
three alloys:
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
This time we try both classical and Kruskal-Wallis anova, omitting displays:
anova1(strength,alloy,'off')
ans =
1.5264e-004
kruskalwallis(strength,alloy,'off')
ans =
0.0018
Both tests find that the three alloys are significantly different, though the
result is less significant according to the Kruskal-Wallis test. It is typical that
when a dataset has a reasonable fit to the normal distribution, the classical
ANOVA test will be more sensitive to differences between groups.
To understand when a nonparametric test may be more appropriate, let's see
how the tests behave when the distribution is not normal. We can simulate this
by replacing one of the values by an extreme value (an outlier).
strength(20)=120;
anova1(strength,alloy,'off')
ans =
0.2501
kruskalwallis(strength,alloy,'off')
ans =
0.0060
Now the classical ANOVA test does not find a significant difference, but the
nonparametric procedure does. This illustrates one of the properties of
nonparametric procedures: they are often not severely affected by changes in
a small portion of the data.
Reference Hollander, M., and D. A. Wolfe, Nonparametric Statistical Methods, Wiley,
1973.
See Also anova1, boxplot, multcompare
kstest
Purpose Kolmogorov-Smirnov test of the distribution of one sample.
Syntax H = kstest(X)
H = kstest(X,cdf)
H = kstest(X,cdf,alpha,tail)
[H,P,KSSTAT,CV] = kstest(X,cdf,alpha,tail)
Description H = kstest(X) performs a Kolmogorov-Smirnov test to compare the values in
the data vector X with a standard normal distribution (that is, a normal
distribution having mean 0 and variance 1). The null hypothesis for the
Kolmogorov-Smirnov test is that X has a standard normal distribution. The
alternative hypothesis is that X does not have that distribution. The result H is 1
if we can reject the hypothesis that X has a standard normal distribution, or 0
if we cannot reject that hypothesis. We reject the hypothesis if the test is
significant at the 5% level.
For each potential value x, the Kolmogorov-Smirnov test compares the
proportion of values less than x with the expected proportion predicted by the
standard normal distribution. The kstest function uses the maximum
difference over all x values as its test statistic. Mathematically, this can be
written as
$$\max\left(\left|F(x) - G(x)\right|\right)$$
where $F(x)$ is the proportion of X values less than or equal to x and $G(x)$ is the
standard normal cumulative distribution function evaluated at x.
H = kstest(X,cdf) compares the distribution of X to the hypothesized
distribution defined by the two-column matrix cdf. Column one contains a set
of possible x values, and column two contains the corresponding hypothesized
cumulative distribution function values $G(x)$. If possible, you should define
cdf so that column one contains the values in X. If there are values in X not
found in column one of cdf, kstest will approximate $G(X)$ by interpolation. All
values in X must lie in the interval between the smallest and largest values in
the first column of cdf. If the second argument is empty (cdf = []), kstest uses
the standard normal distribution as if there were no second argument.
The Kolmogorov-Smirnov test requires that cdf be predetermined. It is not
accurate if cdf is estimated from the data. To test X against a normal
distribution without specifying the parameters, use lillietest instead.
H = kstest(X,cdf,alpha,tail) specifies the significance level alpha and a
code tail for the type of alternative hypothesis. If tail = 0 (the default),
kstest performs a two-sided test with the general alternative $F \ne G$. If
tail = -1, the alternative is that $F < G$. If tail = 1, the alternative is $F > G$.
The form of the test statistic depends on the value of tail as follows.
tail = 0: $\max\left(\left|F(x) - G(x)\right|\right)$
tail = -1: $\max\left(G(x) - F(x)\right)$
tail = 1: $\max\left(F(x) - G(x)\right)$
[H,P,KSSTAT,CV] = kstest(X,cdf,alpha,tail) also returns the observed
p-value P, the observed Kolmogorov-Smirnov statistic KSSTAT, and the cutoff
value CV for determining if KSSTAT is significant. If the return value of CV is NaN,
then kstest determined the significance by calculating a p-value according to an
asymptotic formula rather than by comparing KSSTAT to a critical value.
Examples Example 1
Let's generate some evenly spaced numbers and perform a
Kolmogorov-Smirnov test to see how well they fit to a normal distribution:
x = -2:1:4
x =
-2 -1 0 1 2 3 4
[h,p,k,c] = kstest(x,[],0.05,0)
h =
0
p =
0.13632
k =
0.41277
c =
0.48342
We cannot reject the null hypothesis that the values come from a standard
normal distribution. Although intuitively it seems that these evenly-spaced
integers could not follow a normal distribution, this example illustrates the
difficulty in testing normality in small samples.
kstest
2-166
To understand the test, it is helpful to generate an empirical cumulative
distribution plot and overlay the theoretical normal distribution.
xx = -3:.1:5;
cdfplot(x)
hold on
plot(xx,normcdf(xx),'r--')
[Figure: Empirical CDF of x with the standard normal cdf overlaid as a dashed line; horizontal axis x, vertical axis F(x).]
The Kolmogorov-Smirnov test statistic is the maximum difference between
these curves. It appears that this maximum of 0.41277 occurs as we approach
x = 1.0 from below. We can see that the empirical curve has the value 3/7 here,
and we can easily verify that the difference between the curves is 0.41277.
normcdf(1) - 3/7
ans =
0.41277
We can also perform a one-sided test. By setting tail = -1 we indicate that our
alternative is $F < G$, so the test statistic counts only points where this
inequality is true.
[h,p,k] = kstest(x, [], .05, -1)
h =
0
p =
0.068181
k =
0.41277
The test statistic is the same as before because in fact $F < G$ at x = 1.0.
However, the p-value is smaller for the one-sided test. If we carry out the other
one-sided test, we see that the test statistic changes, and is the difference
between the two curves near x = -1.0.
[h,p,k] = kstest(x,[],0.05,1)
h =
0
p =
0.77533
k =
0.12706
2/7 - normcdf(-1)
ans =
0.12706
Example 2
Now let's generate random numbers from a Weibull distribution, and test
against that Weibull distribution and an exponential distribution.
x = weibrnd(1, 2, 100, 1);
kstest(x, [x weibcdf(x, 1, 2)])
ans =
0
kstest(x, [x expcdf(x, 1)])
ans =
1
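The two-sided statistic from Example 1 can be reproduced with core MATLAB functions. The following is a minimal sketch under stated assumptions, not the toolbox implementation: it assumes the hypothesized distribution is standard normal, evaluates that cdf through erfc, and uses illustrative variable names (G, Fat, Fbelow, ks).

```matlab
% Sketch: two-sided Kolmogorov-Smirnov statistic against the standard normal.
x = sort(-2:1:4);                  % the sample from Example 1
n = numel(x);
G = 0.5 * erfc(-x / sqrt(2));      % standard normal cdf evaluated at x
Fat    = (1:n) / n;                % empirical cdf at each x(i)
Fbelow = (0:n-1) / n;              % empirical cdf just below each x(i)
% The largest gap occurs either at a data point or just below one.
ks = max([abs(Fat - G), abs(G - Fbelow)]);
```

For this sample the largest gap is found just below x = 1, which matches the value normcdf(1) - 3/7 discussed in Example 1.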
See Also kstest2, lillietest
kstest2
Purpose Kolmogorov-Smirnov test to compare the distribution of two samples.
Syntax H = kstest2(X1,X2)
H = kstest2(X1,X2,alpha,tail)
[H,P,KSSTAT] = kstest2(X1,X2,alpha,tail)
Description H = kstest2(X1,X2) performs a two-sample Kolmogorov-Smirnov test to
compare the distributions of values in the two data vectors X1 and X2. The null
hypothesis for this test is that X1 and X2 have the same continuous
distribution. The alternative hypothesis is that they have different continuous
distributions. The result H is 1 if we can reject the hypothesis that the
distributions are the same, or 0 if we cannot reject that hypothesis. We reject
the hypothesis if the test is significant at the 5% level.
For each potential value x, the Kolmogorov-Smirnov test compares the
proportion of X1 values less than x with the proportion of X2 values less than x. The
kstest2 function uses the maximum difference over all x values as its test
statistic. Mathematically, this can be written as
$$\max_x \left| F1(x) - F2(x) \right|$$
where F1(x) is the proportion of X1 values less than or equal to x and F2(x) is
the proportion of X2 values less than or equal to x.
H = kstest2(X1,X2,alpha,tail) specifies the significance level alpha and a
code tail for the type of alternative hypothesis. If tail = 0 (the default),
kstest2 performs a two-sided test with the general alternative $F1 \neq F2$. If
tail = -1, the alternative is that $F1 < F2$. If tail = 1, the alternative is
$F1 > F2$. The form of the test statistic depends on the value of tail as follows:
tail = 0: $\max_x |F1(x) - F2(x)|$
tail = -1: $\max_x (F2(x) - F1(x))$
tail = 1: $\max_x (F1(x) - F2(x))$
[H,P,KSSTAT] = kstest2(X1,X2,alpha,tail) also returns the observed
p-value P and the observed Kolmogorov-Smirnov statistic KSSTAT.
Examples Let's compare the distributions of a small evenly-spaced sample and a larger
normal sample:
x = -1:1:5
y = randn(20,1);
[h,p,k] = kstest2(x,y)
h =
1
p =
0.0403
k =
0.5714
The difference between their distributions is significant at the 5% level
(p = 4%). To visualize the difference, we can overlay plots of the two empirical
cumulative distribution functions. The Kolmogorov-Smirnov statistic is the
maximum difference between these functions. After changing the color and line
style of one of the two curves, we can see that the maximum difference appears
to be near x = 1.9. We can also verify that the difference equals the k value that
kstest2 reports:
cdfplot(x)
hold on
cdfplot(y)
h = findobj(gca,'type','line');
set(h(1),'linestyle',':','color','r')
1 - 3/7
ans =
0.5714
[Figure: Empirical CDF plots of x and y overlaid; horizontal axis x, vertical axis F(x).]
See Also kstest, lillietest
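The two-sample statistic can also be computed directly from its definition. The sketch below is illustrative only and is not how kstest2 is implemented: the second sample x2 is hypothetical, fixed data (rather than random numbers) so the result is reproducible, and the variable names are assumptions.

```matlab
% Sketch: two-sample Kolmogorov-Smirnov statistic from two empirical cdfs.
x1 = [-1 0 1 2 3 4 5];                    % evenly spaced sample
x2 = [-0.5 0.1 0.3 0.8 1.2];              % hypothetical second sample
pts = sort([x1 x2]);                      % both cdfs can only jump at these points
F1 = arrayfun(@(t) mean(x1 <= t), pts);   % proportion of x1 values <= t
F2 = arrayfun(@(t) mean(x2 <= t), pts);   % proportion of x2 values <= t
ks2 = max(abs(F1 - F2));                  % two-sided statistic (tail = 0)
```

Evaluating both empirical cdfs on the pooled sample is sufficient because each function is constant between consecutive pooled points.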
kurtosis
Purpose Sample kurtosis.
Syntax k = kurtosis(X)
k = kurtosis(X,flag)
Description k = kurtosis(X) returns the sample kurtosis of X. For vectors, kurtosis(x) is
the kurtosis of the elements in the vector x. For matrices kurtosis(X) returns
the sample kurtosis for each column of X.
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the
normal distribution is 3. Distributions that are more outlier-prone than the
normal distribution have kurtosis greater than 3; distributions that are less
outlier-prone have kurtosis less than 3.
The kurtosis of a distribution is defined as
$$k = \frac{E(x - \mu)^4}{\sigma^4}$$
where $\mu$ is the mean of x, $\sigma$ is the standard deviation of x, and E(t) represents
the expected value of the quantity t.
Note Some definitions of kurtosis subtract 3 from the computed value, so
that the normal distribution has kurtosis of 0. The kurtosis function does not
use this convention.
k = kurtosis(X,flag) specifies whether to correct for bias (flag = 0) or not
(flag = 1, the default). When X represents a sample from a population, the
kurtosis of X is biased, that is, it will tend to differ from the population kurtosis
by a systematic amount that depends on the size of the sample. You can set
flag = 0 to correct for this systematic bias.
Example X = randn([5 4])
X =
1.1650 1.6961 -1.4462 -0.3600
0.6268 0.0591 -0.7012 -0.1356
0.0751 1.7971 1.2460 -1.3493
0.3516 0.2641 -0.6390 -1.2704
-0.6965 0.8717 0.5774 0.9846
k = kurtosis(X)
k =
2.1658 1.2967 1.6378 1.9589
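The defining ratio of moments can be evaluated by hand for a small vector. This is a minimal sketch of the uncorrected (biased) form, assuming the expectations are replaced by sample averages with 1/n normalization; the data and variable names are illustrative.

```matlab
% Sketch: sample kurtosis as the fourth central moment over the squared variance.
x  = [2 4 4 4 5 5 7 9];
mu = mean(x);                 % sample mean
m2 = mean((x - mu).^2);       % second central moment (1/n normalization)
m4 = mean((x - mu).^4);       % fourth central moment
k  = m4 / m2^2;               % no subtraction of 3, matching the convention above
```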
See Also mean, moment, skewness, std, var
leverage
Purpose Leverage values for a regression.
Syntax h = leverage(data)
h = leverage(data,'model')
Description h = leverage(data) finds the leverage of each row (point) in the matrix data
for a linear additive regression model.
h = leverage(data,'model') finds the leverage on a regression, using a
specified model type, where 'model' can be one of these strings:
'interaction'    includes constant, linear, and cross product terms
'quadratic'      includes interactions and squared terms
'purequadratic'  includes constant, linear, and squared terms
Leverage is a measure of the influence of a given observation on a regression
due to its location in the space of the inputs.
Example One rule of thumb is to compare the leverage to 2p/n where n is the number of
observations and p is the number of parameters in the model. For the Hald
dataset this value is 0.7692.
load hald
h = max(leverage(ingredients,'linear'))
h =
0.7004
Since 0.7004 < 0.7692, there are no high leverage points using this rule.
Algorithm [Q,R] = qr(x2fx(data,'model'));
leverage = (sum(Q'.*Q'))'
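The algorithm above can be tried without x2fx for the plain linear model. The sketch below is an assumption-laden illustration: it builds the design matrix by hand (a constant column plus the raw columns), uses the economy-size QR factorization, and runs on a small hypothetical data matrix.

```matlab
% Sketch: leverage values from the QR factorization of a linear design matrix.
data = [1 2; 2 1; 3 4; 4 3; 5 5; 6 7; 7 6; 20 1];   % hypothetical inputs
n = size(data,1);
A = [ones(n,1) data];          % constant and linear terms
[Q,R] = qr(A,0);               % economy-size factorization, Q is n-by-p
h = sum(Q.^2,2);               % leverage = diagonal of Q*Q'
p = size(A,2);                 % number of parameters in the model
flagged = find(h > 2*p/n);     % rows exceeding the 2p/n rule of thumb
```

The leverage values always sum to p, which gives a quick sanity check on the computation.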
Reference Goodall, C. R. (1993). Computation using the QR decomposition. Handbook in
Statistics, Volume 9. Statistical Computing (C. R. Rao, ed.). Amsterdam, NL:
Elsevier/North-Holland.
See Also regstats
lillietest
Purpose Lilliefors test for goodness of fit to a normal distribution.
Syntax H = lillietest(X)
H = lillietest(X,alpha)
[H,P,LSTAT,CV] = lillietest(X,alpha)
Description H = lillietest(X) performs the Lilliefors test on the input data vector X and
returns H, the result of the hypothesis test. The result H is 1 if we can reject the
hypothesis that X has a normal distribution, or 0 if we cannot reject that
hypothesis. We reject the hypothesis if the test is significant at the 5% level.
The Lilliefors test evaluates the hypothesis that X has a normal distribution
with unspecified mean and variance, against the alternative that X does not
have a normal distribution. This test compares the empirical distribution of X
with a normal distribution having the same mean and variance as X. It is
similar to the Kolmogorov-Smirnov test, but it adjusts for the fact that the
parameters of the normal distribution are estimated from X rather than
specified in advance.
H = lillietest(X,alpha) performs the Lilliefors test at the 100*alpha%
level rather than the 5% level. alpha must be between 0.01 and 0.2.
[H,P,LSTAT,CV] = lillietest(X,alpha) returns three additional outputs. P
is the p-value of the test, obtained by linear interpolation in a set of tables
created by Lilliefors. LSTAT is the value of the test statistic. CV is the critical
value for determining whether to reject the null hypothesis. If the value of
LSTAT is outside the range of the Lilliefors table, P is returned as NaN but H
indicates whether to reject the hypothesis.
Example Do car weights follow a normal distribution? Not exactly, because weights are
always positive, and a normal distribution allows both positive and negative
values. However, perhaps the normal distribution is a reasonable
approximation.
load carsmall
[h p l c] = lillietest(Weight);
[h p l c]
ans =
1.0000 0.0232 0.1032 0.0886
The Lilliefors test statistic of 0.10317 is larger than the cutoff value of 0.0886
for a 5% level test, so we reject the hypothesis of normality. In fact, the p-value
of this test is approximately 0.02.
To visualize the distribution, we can make a histogram. This graph shows that
the distribution is skewed to the right: from the peak near 2250, the
frequencies drop off abruptly to the left but more gradually to the right.
hist(Weight)
[Figure: histogram of Weight; horizontal axis from 1500 to 5000, frequencies from 0 to 18.]
Sometimes it is possible to transform a variable to make its distribution more
nearly normal. A log transformation, in particular, tends to compensate for
skewness to the right.
[h p l c] = lillietest(log(Weight));
[h p l c]
ans =
0 0.13481 0.077924 0.0886
Now the p-value is approximately 0.13, so we do not reject the hypothesis.
Reference Conover, W. J. (1980). Practical Nonparametric Statistics. New York, Wiley.
See Also hist, jbtest, kstest2
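The comparison that the Lilliefors test makes can be sketched with core functions: standardize the data with its own mean and standard deviation, then measure the largest gap to the standard normal cdf. This is only an illustration of the statistic under those assumptions. It does not reproduce the table lookup that lillietest uses for P and CV, and the data vector is hypothetical.

```matlab
% Sketch: Lilliefors-type statistic with mean and variance estimated from the data.
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 4.9];       % hypothetical sample
z = sort((x - mean(x)) / std(x));            % standardize using sample estimates
n = numel(z);
G = 0.5 * erfc(-z / sqrt(2));                % standard normal cdf at z
% Largest gap between the empirical cdf and G, at or just below each point.
lstat = max([(1:n)/n - G, G - (0:n-1)/n]);
```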
linkage
Purpose Create hierarchical cluster tree.
Syntax Z = linkage(Y)
Z = linkage(Y,'method')
Description Z = linkage(Y) creates a hierarchical cluster tree, using the Single Linkage
algorithm. The input matrix, Y, is the distance vector output by the pdist
function, a vector of length ((m-1)·m/2)-by-1, where m is the number of
objects in the original dataset.
Z = linkage(Y,'method') computes a hierarchical cluster tree using the
algorithm specified by 'method', where 'method' can be any of the following
character strings that identify ways to create the cluster hierarchy. Their
definitions are explained in "Mathematical Definitions" below.
String      Meaning
'single'    Shortest distance (default)
'complete'  Largest distance
'average'   Average distance
'centroid'  Centroid distance
'ward'      Incremental sum of squares
The output, Z, is an (m-1)-by-3 matrix containing cluster tree information. The
leaf nodes in the cluster hierarchy are the objects in the original dataset,
numbered from 1 to m. They are the singleton clusters from which all higher
clusters are built. Each newly formed cluster, corresponding to row i in Z, is
assigned the index m+i, where m is the total number of initial leaves.
Columns 1 and 2, Z(i,1:2), contain the indices of the objects that were linked
in pairs to form a new cluster. This new cluster is assigned the index value m+i.
There are m-1 higher clusters that correspond to the interior nodes of the
hierarchical cluster tree.
Column 3, Z(i,3), contains the corresponding linkage distances between the
objects paired in the clusters at each row i.
For example, consider a case with 30 initial nodes. If the tenth cluster formed
by the linkage function combines object 5 and object 7 and their distance is
1.5, then row 10 of Z will contain the values (5, 7, 1.5). This newly formed
cluster will have the index 10+30=40. If cluster 40 shows up in a later row, that
means this newly formed cluster is being combined again into some bigger
cluster.
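The m+i indexing rule can be made concrete by expanding each row of Z into the list of original objects it contains. The sketch below is illustrative and assumes a small Z matrix of the form described above for m = 7 objects; the cell array members is a hypothetical helper variable, not a toolbox output.

```matlab
% Sketch: recover the members of every cluster from a linkage matrix Z.
Z = [ 2  5 0.2000;  3  4 0.5000;  8  6 0.5099; ...
      1  7 0.7000; 11  9 1.2806; 12 10 1.3454];
m = size(Z,1) + 1;                 % number of original objects (leaves)
members = cell(2*m-1,1);
for k = 1:m
    members{k} = k;                % leaves are singleton clusters 1..m
end
for i = 1:m-1
    % Row i joins clusters Z(i,1) and Z(i,2) into the new cluster m+i.
    members{m+i} = sort([members{Z(i,1)} members{Z(i,2)}]);
end
```

Here cluster 8 is formed in row 1 from objects 2 and 5, and it reappears in row 3, where it is joined with object 6 to form cluster 10.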
Mathematical Definitions
The 'method' argument is a character string that specifies the algorithm used
to generate the hierarchical cluster tree information. These linkage algorithms
are based on various measurements of proximity between two groups of objects.
If $n_r$ is the number of objects in cluster r and $n_s$ is the number of objects in
cluster s, and $x_{ri}$ is the ith object in cluster r, the definitions of these various
measurements are as follows:
Single linkage, also called nearest neighbor, uses the smallest distance
between objects in the two groups.
$$d(r,s) = \min\left(\mathrm{dist}(x_{ri}, x_{sj})\right), \quad i \in (1, \ldots, n_r),\ j \in (1, \ldots, n_s)$$
Complete linkage, also called furthest neighbor, uses the largest distance
between objects in the two groups.
$$d(r,s) = \max\left(\mathrm{dist}(x_{ri}, x_{sj})\right), \quad i \in (1, \ldots, n_r),\ j \in (1, \ldots, n_s)$$
Average linkage uses the average distance between all pairs of objects in
cluster r and cluster s.
$$d(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj})$$
Centroid linkage uses the distance between the centroids of the two groups.
$$d(r,s) = d(\bar{x}_r, \bar{x}_s)$$
where
$$\bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{ri}$$
and $\bar{x}_s$ is defined similarly.
Ward linkage uses the incremental sum of squares; that is, the increase in
the total within-group sum of squares as a result of joining groups r and s. It
is given by
$$d(r,s) = \frac{n_r n_s d_{rs}^2}{n_r + n_s}$$
where $d_{rs}$ is the distance between cluster r and cluster s defined in the
Centroid linkage. The within-group sum of squares of a cluster is defined as
the sum of the squares of the distance between all objects in the cluster and
the centroid of the cluster.
Example X = [3 1.7; 1 1; 2 3; 2 2.5; 1.2 1; 1.1 1.5; 3 1];
Y = pdist(X);
Z = linkage(Y)
Z =
2.0000 5.0000 0.2000
3.0000 4.0000 0.5000
8.0000 6.0000 0.5099
1.0000 7.0000 0.7000
11.0000 9.0000 1.2806
12.0000 10.0000 1.3454
See Also cluster, clusterdata, cophenet, dendrogram, inconsistent, pdist,
squareform
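The five proximity measures can be compared side by side on two small groups. This is a minimal sketch under stated assumptions: the clusters r and s are hypothetical, dist is taken to be Euclidean distance, and the pairwise distances are computed with explicit loops rather than with pdist.

```matlab
% Sketch: single, complete, average, centroid, and Ward measures for two clusters.
r = [1 1; 1.2 1];                  % hypothetical cluster r (rows are objects)
s = [1.1 1.5; 2 2.5];              % hypothetical cluster s
nr = size(r,1);
ns = size(s,1);
D = zeros(nr,ns);                  % D(i,j) = Euclidean distance between r(i,:) and s(j,:)
for i = 1:nr
    for j = 1:ns
        D(i,j) = norm(r(i,:) - s(j,:));
    end
end
dSingle   = min(D(:));                        % nearest neighbor
dComplete = max(D(:));                        % furthest neighbor
dAverage  = sum(D(:)) / (nr*ns);              % mean over all pairs
dCentroid = norm(mean(r,1) - mean(s,1));      % distance between centroids
dWard     = nr*ns*dCentroid^2 / (nr + ns);    % incremental sum of squares
```

The Ward value equals the increase in the total within-group sum of squares that results from merging r and s, which can be confirmed by computing the sums of squares directly.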
logncdf
Purpose Lognormal cumulative distribution function.
Syntax P = logncdf(X,MU,SIGMA)
Description P = logncdf(X,MU,SIGMA) computes the lognormal cdf at each of the values in
X using the corresponding means in MU and standard deviations in SIGMA.
Vector or matrix inputs for X, MU, and SIGMA must have the same size, which is
also the size of P. A scalar input for X, MU, or SIGMA is expanded to a constant
matrix with the same dimensions as the other inputs.
The lognormal cdf is
$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \frac{e^{-(\ln(t) - \mu)^2 / (2\sigma^2)}}{t} \, dt$$
Example x = (0:0.2:10);
y = logncdf(x,0,1);
plot(x,y); grid;
xlabel('x'); ylabel('p');
[Figure: lognormal cdf for MU = 0 and SIGMA = 1; horizontal axis x, vertical axis p.]
Reference Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. p. 102-105.
See Also cdf, logninv, lognpdf, lognrnd, lognstat
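Because ln(x) is normal when x is lognormal, the integral above has a closed form in terms of the complementary error function. The sketch below is an illustration with core functions, not the toolbox code: it evaluates that closed form and checks one value against a trapezoidal approximation of the integral. The grid t and all variable names are assumptions.

```matlab
% Sketch: lognormal cdf in closed form, checked against numerical integration.
mu = 0;
sigma = 1;
x = [0.5 1 2 5];
p = 0.5 * erfc(-(log(x) - mu) ./ (sigma * sqrt(2)));   % closed form of F(x|mu,sigma)
% Trapezoidal approximation of the integral from (almost) 0 to 2.
t = linspace(1e-6, 2, 200001);
f = exp(-(log(t) - mu).^2 / (2*sigma^2)) ./ (t * sigma * sqrt(2*pi));
pNum = trapz(t, f);                                    % should be close to p(3)
```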
logninv
Purpose Inverse of the lognormal cumulative distribution function (cdf).
Syntax X = logninv(P,MU,SIGMA)
Description X = logninv(P,MU,SIGMA) computes the inverse lognormal cdf with mean MU
and standard deviation SIGMA, at the corresponding probabilities in P. Vector
or matrix inputs for P, MU, and SIGMA must have the same size, which is also the
size of X. A scalar input for P, MU, or SIGMA is expanded to a constant matrix with
the same dimensions as the other inputs.
We define the lognormal inverse function in terms of the lognormal cdf as
$$x = F^{-1}(p \mid \mu, \sigma) = \{ x : F(x \mid \mu, \sigma) = p \}$$
where
$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \frac{e^{-(\ln(t) - \mu)^2 / (2\sigma^2)}}{t} \, dt$$
Example p = (0.005:0.01:0.995);
crit = logninv(p,1,0.5);
plot(p,crit)
xlabel('Probability');ylabel('Critical Value'); grid
[Figure: inverse lognormal cdf for MU = 1 and SIGMA = 0.5; horizontal axis Probability, vertical axis Critical Value.]
Reference Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. p. 102-105.
See Also icdf, logncdf, lognpdf, lognrnd, lognstat
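The inverse can be written in closed form by inverting the normal cdf of ln(x). The following sketch uses the core function erfcinv and is offered as an illustration rather than as the toolbox algorithm; the round trip through the closed-form cdf is included as a consistency check, and the variable names are assumptions.

```matlab
% Sketch: inverse lognormal cdf via the inverse complementary error function.
mu = 1;
sigma = 0.5;
p = [0.1 0.5 0.9];
z = -sqrt(2) * erfcinv(2*p);       % standard normal quantiles for p
x = exp(mu + sigma * z);           % lognormal quantiles
% Round trip: the cdf evaluated at x should return p.
pBack = 0.5 * erfc(-(log(x) - mu) ./ (sigma * sqrt(2)));
```

The median (p = 0.5) is exp(MU), since the normal quantile at 0.5 is zero.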
lognpdf
Purpose Lognormal probability density function (pdf).
Syntax Y = lognpdf(X,MU,SIGMA)
Description Y = lognpdf(X,MU,SIGMA) computes the lognormal pdf at each of the values
in X using the corresponding means in MU and standard deviations in SIGMA.
Vector or matrix inputs for X, MU, and SIGMA must have the same size, which is
also the size of Y. A scalar input for X, MU, or SIGMA is expanded to a constant
matrix with the same dimensions as the other inputs.
The lognormal pdf is
$$y = f(x \mid \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \, e^{-(\ln(x) - \mu)^2 / (2\sigma^2)}$$
Example x = (0:0.02:10);
y = lognpdf(x,0,1);
plot(x,y); grid;
xlabel('x'); ylabel('p')
[Figure: lognormal pdf for MU = 0 and SIGMA = 1; horizontal axis x, vertical axis p.]
Reference Mood, A. M., F.A. Graybill, and D.C. Boes, Introduction to the Theory of
Statistics, Third Edition, McGraw-Hill 1974 p. 540-541.
See Also logncdf, logninv, lognrnd, lognstat, pdf
lognrnd
Purpose Random matrices from the lognormal distribution.
Syntax R = lognrnd(MU,SIGMA)
R = lognrnd(MU,SIGMA,m)
R = lognrnd(MU,SIGMA,m,n)
Description R = lognrnd(MU,SIGMA) generates lognormal random numbers with
parameters MU and SIGMA. Vector or matrix inputs for MU and SIGMA must have
the same size, which is also the size of R. A scalar input for MU or SIGMA is
expanded to a constant matrix with the same dimensions as the other input.
R = lognrnd(MU,SIGMA,m) generates lognormal random numbers with
parameters MU and SIGMA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = lognrnd(MU,SIGMA,m,n) generates lognormal random numbers with
parameters MU and SIGMA, where scalars m and n are the row and column
dimensions of R.
Example r = lognrnd(0,1,4,3)
r =
3.2058 0.4983 1.3022
1.8717 5.4529 2.3909
1.0780 1.0608 0.2355
1.4213 6.0320 0.4960
Reference Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. p. 102-105.
See Also random, logncdf, logninv, lognpdf, lognstat
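Lognormal random numbers can be obtained by exponentiating normal random numbers. The sketch below illustrates that relationship with the core function randn. Whether lognrnd is implemented this way is an assumption, and the values it produces will differ from the example output above.

```matlab
% Sketch: lognormal random numbers as the exponential of normal random numbers.
mu = 0;
sigma = 1;
m = 4;                                  % number of rows
n = 3;                                  % number of columns
r = exp(mu + sigma * randn(m,n));       % log(r) is normal with mean mu, std sigma
```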
lognstat
Purpose Mean and variance for the lognormal distribution.
Syntax [M,V] = lognstat(MU,SIGMA)
Description [M,V] = lognstat(MU,SIGMA) returns the mean and variance of the
lognormal distribution with parameters MU and SIGMA. Vector or matrix inputs
for MU and SIGMA must have the same size, which is also the size of M and V. A
scalar input for MU or SIGMA is expanded to a constant matrix with the same
dimensions as the other input.
The mean of the lognormal distribution with parameters $\mu$ and $\sigma$ is
$$e^{\mu + \sigma^2/2}$$
and the variance is
$$e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$$
Example [m,v]= lognstat(0,1)
m =
1.6487
v =
4.6708
Reference Mood, A. M., F.A. Graybill, and D.C. Boes, Introduction to the Theory of
Statistics, Third Edition, McGraw-Hill 1974 p. 540-541.
See Also logncdf, logninv, lognpdf, lognrnd
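The two formulas can be evaluated directly. This is a minimal sketch for the scalar case MU = 0, SIGMA = 1 using only exp; the variable names are illustrative.

```matlab
% Sketch: mean and variance of the lognormal distribution from the formulas above.
mu = 0;
sigma = 1;
m = exp(mu + sigma^2/2);                            % mean
v = exp(2*mu + 2*sigma^2) - exp(2*mu + sigma^2);    % variance
```

For these parameters the mean is exp(1/2) and the variance is exp(2) - exp(1).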
lsline
Purpose Least squares fit line(s).
Syntax lsline
h = lsline
Description lsline superimposes the least squares line on each line object in the current
axes (except LineStyles '-','--','.-').
h = lsline returns the handles to the line objects.
Example y = [2 3.4 5.6 8 11 12.3 13.8 16 18.8 19.9]';
plot(y,'+');
lsline;
[Figure: the data plotted with '+' markers and the least squares line superimposed.]
See Also polyfit, polyval
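The line that lsline draws can be computed explicitly with polyfit and polyval, the functions listed under See Also. The sketch below assumes, as plot(y,'+') does, that the x values are the indices 1 to 10; it is an illustration, not the lsline source.

```matlab
% Sketch: least squares line for the example data using polyfit and polyval.
y = [2 3.4 5.6 8 11 12.3 13.8 16 18.8 19.9]';
x = (1:numel(y))';            % plot(y,'+') uses the element index as x
c = polyfit(x, y, 1);         % c(1) is the slope, c(2) is the intercept
yfit = polyval(c, x);         % fitted values on the least squares line
% plot(x, y, '+', x, yfit, '-') would display the data with the fitted line.
```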
mad
Purpose Mean absolute deviation (MAD) of a sample of data.
Syntax y = mad(X)
Description y = mad(X) computes the average of the absolute differences between a set of
data and the sample mean of that data. For vectors, mad(x) returns the mean
absolute deviation of the elements of x. For matrices, mad(X) returns the MAD
of each column of X.
The MAD is less efficient than the standard deviation as an estimate of the
spread when the data is all from the normal distribution.
Multiply the MAD by 1.3 to estimate $\sigma$ (the second parameter of the normal
distribution).
Examples This example shows a Monte Carlo simulation of the relative efficiency of the
MAD to the sample standard deviation for normal data.
x = normrnd(0,1,100,100);
s = std(x);
s_MAD = 1.3 * mad(x);
efficiency = (norm(s - 1)./norm(s_MAD - 1)).^2
efficiency =
0.5972
See Also std, range
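The definition translates directly into one line of core MATLAB. This sketch uses a small hypothetical vector so the arithmetic can be followed by hand; it is not the mad source code.

```matlab
% Sketch: mean absolute deviation from the sample mean.
x = [1 2 3 4 10];
y = mean(abs(x - mean(x)));   % average absolute difference from the mean
```

The mean of x is 4, the absolute deviations are 3, 2, 1, 0, and 6, and their average is 2.4.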
mahal
Purpose Mahalanobis distance.
Syntax d = mahal(Y,X)
Description mahal(Y,X) computes the Mahalanobis distance of each point (row) of the
matrix Y from the sample in the matrix X.
The number of columns of Y must equal the number of columns in X, but the
number of rows may differ. The number of rows in X must exceed the number
of columns.
The Mahalanobis distance is a multivariate measure of the separation of a data
set from a point in space. It is the criterion minimized in linear discriminant
analysis.
Example The Mahalanobis distance of a matrix r when applied to itself is a way to find
outliers.
r = mvnrnd([0 0],[1 0.9;0.9 1],100);
r = [r;10 10];
d = mahal(r,r);
last6 = d(96:101)
last6 =
1.1036
2.2353
2.0219
0.3876
1.5571
52.7381
The last element is clearly an outlier.
See Also classify
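The quantity can be sketched from the sample mean and covariance of X. The code below computes the squared form (y - mean) * inv(S) * (y - mean)'. Treating this as what mahal returns is an assumption based on later MATLAB documentation, and the data matrices here are hypothetical.

```matlab
% Sketch: squared Mahalanobis distance of each row of Y from the sample X.
X = [1 2; 2 1; 3 4; 4 3; 5 5];     % reference sample (rows are observations)
Y = [3 3; 10 10];                  % points to measure
mu = mean(X);                      % sample mean (row vector)
S = cov(X);                        % sample covariance matrix
d2 = zeros(size(Y,1),1);
for i = 1:size(Y,1)
    dev = Y(i,:) - mu;             % deviation from the sample mean
    d2(i) = dev / S * dev';        % dev * inv(S) * dev' without forming inv(S)
end
```

The first point coincides with the sample mean, so its distance is zero; the second point lies far from the sample and receives a much larger value.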
manova1
Purpose One-way Multivariate Analysis of Variance (MANOVA).
Syntax d = manova1(X,group)
d = manova1(X,group,alpha)
[d,p] = manova1(...)
[d,p,stats] = manova1(...)
Description d = manova1(X,group) performs a one-way Multivariate Analysis of Variance
(MANOVA) for comparing the multivariate means of the columns of X, grouped
by group. X is an m-by-n matrix of data values, and each row is a vector of
measurements on n variables for a single observation. group is a grouping
variable defined as a vector, string array, or cell array of strings. Two
observations are in the same group if they have the same value in the group
array. The observations in each group represent a sample from a population.
The function returns d, an estimate of the dimension of the space containing
the group means. manova1 tests the null hypothesis that the means of each
group are the same n-dimensional multivariate vector, and that any difference
observed in the sample X is due to random chance. If d = 0, there is no evidence
to reject that hypothesis. If d = 1, then you can reject the null hypothesis at the
5% level, but you cannot reject the hypothesis that the multivariate means lie
on the same line. Similarly, if d = 2 the multivariate means may lie on the same
plane in n-dimensional space, but not on the same line.
d = manova1(X,group,alpha) gives control of the significance level, alpha.
The return value d will be the smallest dimension having p > alpha, where p is
a p-value for testing whether the means lie in a space of that dimension.
[d,p] = manova1(...) also returns p, a vector of p-values for testing
whether the means lie in a space of dimension 0, 1, and so on. The largest
possible dimension is either the dimension of the space, or one less than the
number of groups. There is one element of p for each dimension up to, but not
including, the largest.
If the ith p-value is near zero, this casts doubt on the hypothesis that the group
means lie on a space of i-1 dimensions. The choice of a critical p-value to
determine whether the result is judged statistically significant is left to the
researcher and is specified by the value of the input argument alpha. It is
common to declare a result significant if the p-value is less than 0.05 or 0.01.
[d,p,stats] = manova1(...) also returns stats, a structure containing
additional MANOVA results. The structure contains the following fields.
Field     Contents
W         Within-groups sum of squares and cross-products matrix
B         Between-groups sum of squares and cross-products matrix
T         Total sum of squares and cross-products matrix
dfW       Degrees of freedom for W
dfB       Degrees of freedom for B
dfT       Degrees of freedom for T
lambda    Vector of values of Wilks' lambda test statistic for testing
          whether the means have dimension 0, 1, etc.
chisq     Transformation of lambda to an approximate chi-square
          distribution
chisqdf   Degrees of freedom for chisq
eigenval  Eigenvalues of $W^{-1}B$
eigenvec  Eigenvectors of $W^{-1}B$; these are the coefficients for the
          canonical variables C, and they are scaled so the within-group
          variance of the canonical variables is 1
canon     Canonical variables C, equal to XC*eigenvec, where XC is X with
          columns centered by subtracting their means
mdist     A vector of Mahalanobis distances from each point to the mean
          of its group
gmdist    A matrix of Mahalanobis distances between each pair of group
          means
The canonical variables C are linear combinations of the original variables,
chosen to maximize the separation between groups. Specifically, C(:,1) is the
linear combination of the X columns that has the maximum separation between
groups. This means that among all possible linear combinations, it is the one
with the most significant F statistic in a one-way analysis of variance.
C(:,2) has the maximum separation subject to it being orthogonal to C(:,1),
and so on.
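The W, B, and T matrices can be built by hand for a small grouped data set. The sketch below is illustrative only: the data and grouping vector are hypothetical, Wilks' lambda is computed with the usual definition det(W)/det(T) for the test of dimension 0, and none of this is taken from the manova1 source.

```matlab
% Sketch: within, between, and total sums of squares and cross-products matrices.
X = [1 2; 2 1; 3 4; 6 5; 7 8; 8 6];        % hypothetical observations (rows)
group = [1 1 1 2 2 2]';                    % hypothetical grouping variable
Xc = X - repmat(mean(X), size(X,1), 1);    % center columns on the grand mean
T = Xc' * Xc;                              % total SSCP matrix
W = zeros(size(X,2));
g = unique(group);
for k = 1:numel(g)
    Xk = X(group == g(k), :);
    Dk = Xk - repmat(mean(Xk), size(Xk,1), 1);   % center on the group mean
    W = W + Dk' * Dk;                            % accumulate within-group SSCP
end
B = T - W;                                 % between-groups SSCP matrix
lambda = det(W) / det(T);                  % Wilks' lambda for dimension 0
ev = eig(W \ B);                           % eigenvalues of inv(W)*B
```

With two groups, B has rank one, so only one eigenvalue of inv(W)*B is nonzero and at most one canonical variable separates the groups.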
You may find it useful to use the outputs from manova1 along with other
functions to supplement your analysis. For example, you may want to start
with a grouped scatter plot matrix of the original variables using gplotmatrix.
You can use gscatter to visualize the group separation using the first two
canonical variables. You can use manovacluster to graph a dendrogram
showing the clusters among the group means.
Assumptions
The MANOVA test makes the following assumptions about the data in X:
The populations for each group are normally distributed.
The variance-covariance matrix is the same for each population.
All observations are mutually independent.
Example We can use manova1 to determine whether there are differences in the averages
of four car characteristics, among groups defined by the country where the cars
were made.
load carbig
[d,p] = manova1([MPG Acceleration Weight Displacement],Origin)
d =
3
p =
0
0.0000
0.0075
0.1934
There are four dimensions in the input matrix, so the group means must lie in
a four-dimensional space. manova1 shows that we cannot reject the hypothesis
that the means lie in a three-dimensional subspace.
References Krzanowski, W. J. Principles of Multivariate Analysis. Oxford University
Press, 1988.
See Also anova1, gscatter, gplotmatrix, manovacluster
manovacluster
Purpose Plot dendrogram showing group mean clusters after MANOVA.
Syntax manovacluster(stats)
manovacluster(stats,'method')
H = manovacluster(stats)
Description manovacluster(stats) generates a dendrogram plot of the group means after
a multivariate analysis of variance (MANOVA). stats is the output stats
structure from manova1. The clusters are computed by applying the single
linkage method to the matrix of Mahalanobis distances between group means.
See dendrogram for more information on the graphical output from this
function. The dendrogram is most useful when the number of groups is large.
manovacluster(stats,'method') uses the specified method in place of single
linkage. 'method' can be any of the following character strings that identify
ways to create the cluster hierarchy. See linkage for further explanation.
String      Meaning
'single'    Shortest distance (default)
'complete'  Largest distance
'average'   Average distance
'centroid'  Centroid distance
'ward'      Incremental sum of squares
H = manovacluster(stats,'method') returns a vector of handles to the lines
in the figure.
Example Let's analyze the larger car dataset to determine which countries produce cars
with the most similar characteristics.
load carbig
X = [MPG Acceleration Weight Displacement];
[d,p,stats] = manova1(X,Origin);
manovacluster(stats)
[Figure: dendrogram of the group means; leaves labeled Japan, Germany, Italy, France, Sweden, England, USA; vertical axis from 0 to 3.]
See Also cluster, dendrogram, linkage, manova1
mean
Purpose Average or mean value of vectors and matrices.
Syntax m = mean(X)
Description m = mean(X) calculates the sample average
$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$
For vectors, mean(x) is the mean value of the elements in vector x. For
matrices, mean(X) is a row vector containing the mean value of each column.
The mean function is part of the standard MATLAB language.
Example These commands generate five samples of 100 normal random numbers with
mean zero and standard deviation one. The sample averages in xbar are
much less variable (0.00 ± 0.10).
x = normrnd(0,1,100,5);
xbar = mean(x)
xbar =
0.0727 0.0264 0.0351 0.0424 0.0752
See Also median, std, cov, corrcoef, var
median
Purpose Median value of vectors and matrices.
Syntax m = median(X)
Description m = median(X) calculates the median value, which is the 50th percentile of a
sample. The median is a robust estimate of the center of a sample of data, since
outliers have little effect on it.
For vectors, median(x) is the median value of the elements in vector x. For
matrices, median(X) is a row vector containing the median value of each
column. Since median is implemented using sort, it can be costly for large
matrices.
The median function is part of the standard MATLAB language.
Examples xodd = 1:5;
modd = median(xodd)
modd =
3
xeven = 1:4;
meven = median(xeven)
meven =
2.5000
This example shows robustness of the median to outliers.
xoutlier = [xeven 10000];
moutlier = median(xoutlier)
moutlier =
3
See Also mean, std, cov, corrcoef
mle
Purpose Maximum likelihood estimation.
Syntax phat = mle('dist',data)
[phat,pci] = mle('dist',data)
[phat,pci] = mle('dist',data,alpha)
[phat,pci] = mle('dist',data,alpha,p1)
Description phat = mle('dist',data) returns the maximum likelihood estimates (MLEs)
for the distribution specified in 'dist' using the sample in the vector data.
See "Overview of the Distributions" on page 1-12 for the list of available
distributions.
[phat,pci] = mle('dist',data) returns the MLEs and 95% confidence
intervals.
[phat,pci] = mle('dist',data,alpha) returns the MLEs and
100(1-alpha)% confidence intervals given the data and the specified alpha.
[phat,pci] = mle('dist',data,alpha,p1) is used for the binomial
distribution only, where p1 is the number of trials.
Example rv = binornd(20,0.75)
rv =
16
[p,pci] = mle('binomial',rv,0.05,20)
p =
0.8000
pci =
0.5634
0.9427
See Also betafit, binofit, expfit, gamfit, normfit, poissfit, weibfit
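For the binomial case, the point estimate has a closed form: the observed number of successes divided by the number of trials. The following sketch uses only base MATLAB and the values from the example above (16 successes in 20 trials); the grid search is an illustrative check, not a description of how mle computes its result, and the variable names are assumptions.

```matlab
% Illustrative check of the binomial point estimate (not the mle implementation).
successes = 16;                 % observed count, rv in the example above
trials    = 20;                 % number of trials, p1 in the example above
phat = successes / trials;      % closed-form maximum likelihood estimate

% Confirm on a grid that phat maximizes the binomial log-likelihood.
% The binomial coefficient is constant in p, so it is omitted.
pgrid  = 0.01:0.01:0.99;
loglik = successes*log(pgrid) + (trials - successes)*log(1 - pgrid);
[~, idx] = max(loglik);
pbest = pgrid(idx);             % grid point with the largest log-likelihood
```

Both phat and pbest come out at 0.8, matching the value of p shown in the example.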
moment
Purpose Central moment of all orders.
Syntax m = moment(X,order)
Description m = moment(X,order) returns the central moment of X specified by the
positive integer order. For vectors, moment(x,order) returns the central
moment of the specified order for the elements of x. For matrices,
moment(X,order) returns the central moment of the specified order for each
column.
Note that the first central moment is zero, and the second central moment is
the variance computed using a divisor of n rather than n-1, where n is the
length of the vector x or the number of rows in the matrix X.
The central moment of order k of a distribution is defined as
$$m_k = E(x - \mu)^k$$
where \mu = E(x) is the expected value of x.
Example X = randn([6 5])
X =
1.1650 0.0591 1.2460 -1.2704 -0.0562
0.6268 1.7971 -0.6390 0.9846 0.5135
0.0751 0.2641 0.5774 -0.0449 0.3967
0.3516 0.8717 -0.3600 -0.7989 0.7562
-0.6965 -1.4462 -0.1356 -0.7652 0.4005
1.6961 -0.7012 -1.3493 0.8617 -1.3414
m = moment(X,3)
m =
-0.0282 0.0571 0.1253 0.1460 -0.4486
See Also kurtosis, mean, skewness, std, var
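The definition can be checked directly in base MATLAB. This is a minimal sketch on made-up data (the vector x is illustrative and does not come from this guide); it replaces the expectation with an average over the sample, using the divisor n described above.

```matlab
x = [2 4 4 4 5 5 7 9];        % illustrative data, not from the guide
n = numel(x);
dev = x - mean(x);            % deviations from the sample mean

m1 = sum(dev)    / n;         % first central moment: zero
m2 = sum(dev.^2) / n;         % second central moment: variance with divisor n
m3 = sum(dev.^3) / n;         % third central moment

% var(x,1) also normalizes by n, so it should agree with m2.
v_n = var(x, 1);
```

For this vector the sample mean is 5, so m2 is 4 and m3 is 5.25.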
multcompare
Purpose Multiple comparison test of means or other estimates.
Syntax c = multcompare(stats)
c = multcompare(stats,alpha)
c = multcompare(stats,alpha,'displayopt')
c = multcompare(stats,alpha,'displayopt','ctype')
c = multcompare(stats,alpha,'displayopt','ctype','estimate')
c = multcompare(stats,alpha,'displayopt','ctype','estimate',dim)
[c,m] = multcompare(...)
[c,m,h] = multcompare(...)
Description c = multcompare(stats) performs a multiple comparison test using the
information in the stats structure, and returns a matrix c of pairwise
comparison results. It also displays an interactive figure presenting a
graphical representation of the test.
In a one-way analysis of variance, you compare the means of several groups to
test the hypothesis that they are all the same, against the general alternative
that they are not all the same. Sometimes this alternative may be too general.
You may need information about which pairs of means are significantly
different, and which are not. A test that can provide such information is called
a multiple comparison procedure.
When you perform a simple t-test of one group mean against another, you
specify a significance level that determines the cutoff value of the t statistic.
For example, you can specify the value alpha = 0.05 to ensure that when there
is no real difference, you will incorrectly find a significant difference no more
than 5% of the time. When there are many group means, there are also many
pairs to compare. If you applied an ordinary t-test in this situation, the alpha
value would apply to each comparison, so the chance of incorrectly finding a
significant difference would increase with the number of comparisons. Multiple
comparison procedures are designed to provide an upper bound on the
probability that any comparison will be incorrectly found significant.
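The growth in the error rate can be illustrated with a short calculation. The sketch below assumes, purely for simplicity, that the pairwise tests are independent; real pairwise comparisons share groups and are not independent, so the numbers are indicative only. The variable names are illustrative.

```matlab
alpha = 0.05;                       % level applied to each comparison
k     = 2:7;                        % number of groups (illustrative values)
ncomp = k .* (k - 1) / 2;           % number of pairwise comparisons

% Chance of at least one false positive when each comparison is tested at
% level alpha and the tests are treated as independent (a simplification).
familywise = 1 - (1 - alpha).^ncomp;
```

With two groups there is one comparison and the rate is 0.05; with three groups there are three comparisons and the rate is already about 0.14.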
The output c contains the results of the test in the form of a five-column matrix.
Each row of the matrix represents one test, and there is one row for each pair
of groups. The entries in the row indicate the means being compared, the
estimated difference in means, and a confidence interval for the difference.
For example, suppose one row contains the following entries.
2.0000 5.0000 1.9442 8.2206 14.4971
These numbers indicate that the mean of group 2 minus the mean of group 5 is
estimated to be 8.2206, and a 95% confidence interval for the true difference is
[1.9442, 14.4971].
In this example the confidence interval does not contain 0.0, so the difference
is significant at the 0.05 level. If the confidence interval did contain 0.0, the
difference would not be significant at the 0.05 level.
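The rule above can be applied to every row of c at once. In this sketch the first row is the one quoted above; the second row is hypothetical and is included only to show a non-significant case. The column layout (lower limit in column 3, estimate in column 4, upper limit in column 5) follows the description in this section.

```matlab
c = [2 5  1.9442 8.2206 14.4971;    % row quoted in the text above
     1 3 -0.5000 2.0000  4.5000];   % hypothetical row for illustration

lower = c(:,3);                     % lower confidence limits
upper = c(:,5);                     % upper confidence limits

% A difference is significant when its interval excludes zero.
significant = (lower > 0) | (upper < 0);
```

Here significant is true for the first row and false for the second.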
The multcompare function also displays a graph with each group mean
represented by a symbol and an interval around the symbol. Two means are
significantly different if their intervals are disjoint, and are not significantly
different if their intervals overlap. You can use the mouse to select any group,
and the graph will highlight any other groups that are significantly different
from it.
c = multcompare(stats,alpha) determines the confidence levels of the
intervals in the c matrix and in the figure. The confidence level is
100*(1-alpha)%. The default value of alpha is 0.05.
c = multcompare(stats,alpha,'displayopt') enables the graph display
when 'displayopt' is 'on' (default) and suppresses the display when
'displayopt' is 'off'.
c = multcompare(stats,alpha,'displayopt','ctype') specifies the critical
value to use for the multiple comparison, which can be any of the following.
ctype           Meaning
'hsd'           Use Tukey's honestly significant difference criterion.
                This is the default, and it is based on the Studentized
                range distribution. It is optimal for balanced one-way
                ANOVA and similar procedures with equal sample sizes.
                It has been proven to be conservative for one-way
                ANOVA with different sample sizes. According to the
                unproven Tukey-Kramer conjecture, it is also accurate
                for problems where the quantities being compared are
                correlated, as in analysis of covariance with unbalanced
                covariate values.
'lsd'           Use Fisher's least significant difference procedure. This
                procedure is a simple t-test. It is reasonable if the
                preliminary test (say, the one-way ANOVA F statistic)
                shows a significant difference. If it is used
                unconditionally, it provides no protection against
                multiple comparisons.
'bonferroni'    Use critical values from the t distribution, after a
                Bonferroni adjustment to compensate for multiple
                comparisons. This procedure is conservative, but usually
                less so than the Scheffé procedure.
'dunn-sidak'    Use critical values from the t distribution, after an
                adjustment for multiple comparisons that was proposed
                by Dunn and proved accurate by Šidák. This procedure is
                similar to, but less conservative than, the Bonferroni
                procedure.
'scheffe'       Use critical values from Scheffé's S procedure, derived
                from the F distribution. This procedure provides a
                simultaneous confidence level for comparisons of all
                linear combinations of the means, and it is conservative
                for comparisons of simple differences of pairs.
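The relationship between the Bonferroni and Dunn-Šidák adjustments can be seen from the per-comparison significance levels they imply. The sketch below uses the standard textbook forms of the two adjustments; it does not reproduce how multcompare derives its critical values, and the number of groups is an illustrative assumption.

```matlab
alpha   = 0.05;                           % overall significance level
ngroups = 3;                              % illustrative number of groups
ncomp   = ngroups*(ngroups - 1)/2;        % number of pairwise comparisons

alphaBonf  = alpha / ncomp;               % Bonferroni per-comparison level
alphaSidak = 1 - (1 - alpha)^(1/ncomp);   % Dunn-Sidak per-comparison level
```

The Dunn-Šidák level is slightly larger than the Bonferroni level (about 0.0170 against 0.0167 here), which is why it is the less conservative of the two.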
c = multcompare(stats,alpha,'displayopt','ctype','estimate')
specifies the estimate to be compared. The allowable values of estimate depend
on the function that was the source of the stats structure, according to the
following table.
Source             Allowable Values of Estimate
'anova1'           Ignored. Always compare the group means.
'anova2'           Either 'column' (the default) or 'row' to compare
                   column or row means.
'anovan'           Ignored. Always compare the population marginal
                   means as specified by the dim argument.
'aoctool'          Either 'slope', 'intercept', or 'pmm' to compare
                   slopes, intercepts, or population marginal means. If
                   the analysis of covariance model did not include
                   separate slopes, then 'slope' is not allowed. If it did
                   not include separate intercepts, then no comparisons
                   are possible.
'friedman'         Ignored. Always compare average column ranks.
'kruskalwallis'    Ignored. Always compare average group ranks.
c = multcompare(stats,alpha,'displayopt','ctype','estimate',dim)
specifies the population marginal means to be compared. This argument is
used only if the input stats structure was created by the anovan function. For
n-way ANOVA with n factors, you can specify dim as a scalar or a vector of
integers between 1 and n. The default value is 1.
For example, if dim = 1, the estimates that are compared are the means for
each value of the first grouping variable, adjusted by removing effects of the
other grouping variables as if the design were balanced. If dim = [1 3],
population marginal means are computed for each combination of the first and
third grouping variables, removing effects of the second grouping variable. If
you fit a singular model, some cell means may not be estimable and any
population marginal means that depend on those cell means will have the
value NaN.
Population marginal means are described by Milliken and Johnson (1992) and
by Searle, Speed, and Milliken (1980). The idea behind population marginal
means is to remove any effect of an unbalanced design by fixing the values of
the factors specified by dim, and averaging out the effects of other factors as if
each factor combination occurred the same number of times. The definition of
population marginal means does not depend on the number of observations at
each factor combination. For designed experiments where the number of
observations at each factor combination has no meaning, population marginal
means can be easier to interpret than simple means ignoring other factors. For
surveys and other studies where the number of observations at each
combination does have meaning, population marginal means may be harder to
interpret.
[c,m] = multcompare(...) returns an additional matrix m. The first column
of m contains the estimated values of the means (or whatever statistics are
being compared) for each group, and the second column contains their standard
errors.
[c,m,h] = multcompare(...) returns a handle h to the comparison graph.
Note that the title of this graph contains instructions for interacting with the
graph, and the x-axis label contains information about which means are
significantly different from the selected mean. If you plan to use this graph for
presentation, you may want to omit the title and the x-axis label. You can
remove them using interactive features of the graph window, or you can use the
following commands.
title('')
xlabel('')
Example Let's revisit the anova1 example testing the material strength in structural
beams. From the anova1 output we found significant evidence that the three
types of beams are not equivalent in strength. Now we can determine where
those differences lie. First we create the data arrays and we perform one-way
ANOVA.
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
[p,a,s] = anova1(strength,alloy);
Among the outputs is a structure that we can use as input to multcompare.
multcompare(s)
ans =
1.0000 2.0000 3.6064 7.0000 10.3936
1.0000 3.0000 1.6064 5.0000 8.3936
2.0000 3.0000 -5.6280 -2.0000 1.6280
The third row of the output matrix shows that the difference in strength
between the two alloys is not significant. A 95% confidence interval for the
difference is [-5.6, 1.6], so we cannot reject the hypothesis that the true
difference is zero.
The first two rows show that both comparisons involving the first group (steel)
have confidence intervals that do not include zero. In other words, those
differences are significant. The graph shows the same information.
[Figure: multcompare comparison graph with one interval each for the groups st, al1, and al2, on a horizontal axis running from 74 to 86. Title: "Click on the group you want to test". The x-axis label reports that 2 groups are significantly different from st.]
See Also anova1, anova2, anovan, aoctool, friedman, kruskalwallis
References Hochberg, Y., and A. C. Tamhane, Multiple Comparison Procedures, 1987,
Wiley.
Milliken, G. A., and D. E. Johnson, Analysis of Messy Data, Volume 1: Designed
Experiments, 1992, Chapman & Hall.
Searle, S. R., F. M. Speed, and G. A. Milliken, "Population marginal means in
the linear model: an alternative to least squares means," American
Statistician, 1980, pp. 216-221.
mvnrnd
Purpose Random matrices from the multivariate normal distribution.
Syntax r = mvnrnd(mu,SIGMA,cases)
Description r = mvnrnd(mu,SIGMA,cases) returns a matrix of random numbers chosen
from the multivariate normal distribution with mean vector mu and covariance
matrix SIGMA. cases specifies the number of rows in r.
SIGMA is a symmetric positive definite matrix with size equal to the length
of mu.
Example mu = [2 3];
sigma = [1 1.5; 1.5 3];
r = mvnrnd(mu,sigma,100);
plot(r(:,1),r(:,2),'+')
[Figure: scatter plot of r(:,2) against r(:,1) for the 100 generated points.]
See Also normrnd
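One common way to generate such samples uses a Cholesky factor of the covariance matrix. The sketch below is written in base MATLAB with the values from the example above; it illustrates the technique and is not claimed to be how mvnrnd is implemented.

```matlab
mu    = [2 3];                     % mean vector from the example above
sigma = [1 1.5; 1.5 3];            % covariance matrix from the example above
cases = 100;

T = chol(sigma);                   % upper triangular factor, T'*T equals sigma
z = randn(cases, numel(mu));       % independent standard normal draws
r = z*T + repmat(mu, cases, 1);    % rows have mean mu and covariance sigma
```

Because cov(z*T) = T'*cov(z)*T and cov(z) is the identity in expectation, the rows of r have covariance T'*T, which is sigma.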
mvtrnd
Purpose Random matrices from the multivariate t distribution.
Syntax r = mvtrnd(C,df,cases)
Description r = mvtrnd(C,df,cases) returns a matrix of random numbers chosen from
the multivariate t distribution, where C is a correlation matrix. df is the
degrees of freedom and is either a scalar or a vector with cases elements. If
p is the number of columns in C, then the output r has cases rows and p
columns.
Let t represent a row of r. Then the distribution of t is that of a vector having
a multivariate normal distribution with mean 0, variance 1, and covariance
matrix C, divided by the square root of an independent chi-square random
value that has df degrees of freedom and has itself been divided by df. The
rows of r are independent.
C must be a square, symmetric and positive definite matrix. If its diagonal
elements are not all 1 (that is, if C is a covariance matrix rather than a
correlation matrix), mvtrnd computes the equivalent correlation matrix before
generating the random numbers.
Example sigma = [1 0.8;0.8 1];
r = mvtrnd(sigma,3,100);
plot(r(:,1),r(:,2),'+')
[Figure: scatter plot of r(:,2) against r(:,1) for the 100 generated points.]
See Also mvnrnd, trnd
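The textbook construction of a multivariate t vector divides a correlated normal vector by the square root of an independent chi-square value scaled by its degrees of freedom. The following base-MATLAB sketch follows that construction for the values in the example above. It assumes an integer df so that the chi-square draw can be formed as a sum of squared standard normals, and it is not claimed to match the internals of mvtrnd.

```matlab
C     = [1 0.8; 0.8 1];                      % correlation matrix from the example
df    = 3;                                   % degrees of freedom (integer assumed)
cases = 100;
p     = size(C, 1);

z    = randn(cases, p) * chol(C);            % rows are normal with covariance C
chi2 = sum(randn(cases, df).^2, 2);          % chi-square(df) draw for each row
r    = z ./ repmat(sqrt(chi2/df), 1, p);     % scale each row by its own draw
```

Each row shares a single chi-square draw across its p components, which is what makes the components dependent even beyond their correlation.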
nanmax
Purpose Maximum ignoring NaNs.
Syntax m = nanmax(a)
[m,ndx] = nanmax(a)
m = nanmax(a,b)
Description m = nanmax(a) returns the maximum with NaNs treated as missing. For
vectors, nanmax(a) is the largest non-NaN element in a. For matrices,
nanmax(A) is a row vector containing the maximum non-NaN element from each
column.
[m,ndx] = nanmax(a) also returns the indices of the maximum values in
vector ndx.
m = nanmax(a,b) returns the larger of a or b, which must match in size.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
[nmax,maxidx] = nanmax(m)
nmax =
4 5 6
maxidx =
3 2 1
See Also nanmin, nanmean, nanmedian, nanstd, nansum
nanmean
Purpose Mean ignoring NaNs.
Syntax y = nanmean(X)
Description y = nanmean(X) is the average computed by treating NaNs as missing values.
For vectors, nanmean(x) is the mean of the non-NaN elements of x. For matrices,
nanmean(X) is a row vector containing the mean of the non-NaN elements in
each column.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
nmean = nanmean(m)
nmean =
3.5000 3.0000 4.0000
See Also nanmin, nanmax, nanmedian, nanstd, nansum
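The idea of treating NaNs as missing can be reproduced with isnan in base MATLAB. This sketch recomputes the column sums and means for the matrix in the example above; it shows the principle and is not claimed to be how nanmean or nansum are implemented.

```matlab
m = magic(3);
m([1 6 8]) = NaN;                   % same matrix as in the example above

valid  = ~isnan(m);                 % logical mask of usable entries
filled = m;
filled(~valid) = 0;                 % zeros do not contribute to the sums

colsum  = sum(filled, 1);           % column sums over non-NaN entries
colmean = colsum ./ sum(valid, 1);  % divide by the count of non-NaN entries
```

The result for colmean is 3.5, 3, and 4, in agreement with the output of nanmean(m) above.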
nanmedian
Purpose Median ignoring NaNs.
Syntax y = nanmedian(X)
Description y = nanmedian(X) is the median computed by treating NaNs as missing values.
For vectors, nanmedian(x) is the median of the non-NaN elements of x. For
matrices, nanmedian(X) is a row vector containing the median of the non-NaN
elements in each column of X.
Example m = magic(4);
m([1 6 9 11]) = [NaN NaN NaN NaN]
m =
NaN 2 NaN 13
5 NaN 10 8
9 7 NaN 12
4 14 15 1
nmedian = nanmedian(m)
nmedian =
5.0000 7.0000 12.5000 10.0000
See Also nanmin, nanmax, nanmean, nanstd, nansum
nanmin
Purpose Minimum ignoring NaNs.
Syntax m = nanmin(a)
[m,ndx] = nanmin(a)
m = nanmin(a,b)
Description m = nanmin(a) is the minimum computed by treating NaNs as missing values.
For vectors, nanmin(a) is the smallest non-NaN element in a. For matrices,
nanmin(A) is a row vector containing the minimum non-NaN element from each
column.
[m,ndx] = nanmin(a) also returns the indices of the minimum values in
vector ndx.
m = nanmin(a,b) returns the smaller of a or b, which must match in size.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
[nmin,minidx] = nanmin(m)
nmin =
3 1 2
minidx =
2 1 3
See Also nanmax, nanmean, nanmedian, nanstd, nansum
nanstd
Purpose Standard deviation ignoring NaNs.
Syntax y = nanstd(X)
Description y = nanstd(X) is the standard deviation computed by treating NaNs as
missing values.
For vectors, nanstd(x) is the standard deviation of the non-NaN elements of x.
For matrices, nanstd(X) is a row vector containing the standard deviations of
the non-NaN elements in each column of X.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
nstd = nanstd(m)
nstd =
0.7071 2.8284 2.8284
See Also nanmax, nanmin, nanmean, nanmedian, nansum
nansum
Purpose Sum ignoring NaNs.
Syntax y = nansum(X)
Description y = nansum(X) is the sum computed by treating NaNs as missing values.
For vectors, nansum(x) is the sum of the non-NaN elements of x. For matrices,
nansum(X) is a row vector containing the sum of the non-NaN elements in each
column of X.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
nsum = nansum(m)
nsum =
7 6 8
See Also nanmax, nanmin, nanmean, nanmedian, nanstd
nbincdf
Purpose Negative binomial cumulative distribution function.
Syntax Y = nbincdf(X,R,P)
Description Y = nbincdf(X,R,P) computes the negative binomial cdf at each of the values
in X using the corresponding parameters in R and P. Vector or matrix inputs for
X, R, and P must have the same size, which is also the size of Y. A scalar input
for X, R, or P is expanded to a constant matrix with the same dimensions as the
other inputs.
The negative binomial cdf is
$$y = F(x \mid r,p) = \sum_{i=0}^{x} \binom{r+i-1}{i} p^{r} q^{i} I_{(0,1,\ldots)}(i)$$
where q = 1-p.
The motivation for the negative binomial is the case of successive trials, each
having a constant probability P of success. What you want to find out is how
many extra trials you must do to observe a given number R of successes.
Example x = (0:15);
p = nbincdf(x,3,0.5);
stairs(x,p)
[Figure: stairs plot of the negative binomial cdf for x from 0 to 15, rising from near 0 toward 1.]
See Also cdf, nbininv, nbinpdf, nbinrnd, nbinstat
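The cdf can be evaluated directly from its defining sum. The sketch below uses only base MATLAB and the parameters from the example above (r = 3, p = 0.5); it is a direct transcription of the formula rather than a description of how nbincdf computes its result, and the variable names are illustrative.

```matlab
r = 3;  p = 0.5;  q = 1 - p;        % parameters from the example above
x = 0:15;

pmf = zeros(size(x));
for k = 1:numel(x)
    fails  = x(k);                  % number of extra (failed) trials
    pmf(k) = nchoosek(r + fails - 1, fails) * p^r * q^fails;
end
F = cumsum(pmf);                    % cdf is the running sum of the pmf
```

The first two values of F are 0.125 and 0.3125, and the sequence increases toward 1.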
nbininv
Purpose Inverse of the negative binomial cumulative distribution function (cdf).
Syntax X = nbininv(Y,R,P)
Description X = nbininv(Y,R,P) returns the inverse of the negative binomial cdf with
parameters R and P at the corresponding probabilities in Y. Since the negative
binomial distribution is discrete, nbininv returns the least integer X such that
the negative binomial cdf evaluated at X equals or exceeds Y. Vector or matrix
inputs for Y, R, and P must have the same size, which is also the size of X. A
scalar input for Y, R, or P is expanded to a constant matrix with the same
dimensions as the other inputs.
The negative binomial cdf models consecutive trials, each having a constant
probability P of success. The parameter R is the number of successes required
before stopping.
Example How many times would you need to flip a fair coin to have a 99% probability of
having observed 10 heads?
flips = nbininv(0.99,10,0.5) + 10
flips =
33
Note that you have to flip at least 10 times to get 10 heads. That is why the
second term on the right side of the equals sign is a 10.
See Also icdf, nbincdf, nbinpdf, nbinrnd, nbinstat
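The "least integer" rule can be illustrated by accumulating the negative binomial probabilities until the target is reached. This base-MATLAB sketch uses the coin-flip values from the example above; the loop and the recurrence for successive probabilities are illustrative and are not claimed to be the method used by nbininv.

```matlab
target = 0.99;  r = 10;  p = 0.5;  q = 1 - p;   % values from the example above

x    = 0;
term = p^r;                % probability of zero extra trials
F    = term;               % cdf at x = 0
while F < target
    x    = x + 1;
    term = term * q * (r + x - 1) / x;   % ratio of successive probabilities
    F    = F + term;
end
flips = x + r;             % extra trials plus the r required successes
```

The loop stops at x = 23, so flips is 33, in agreement with the example.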
nbinpdf
Purpose Negative binomial probability density function.
Syntax Y = nbinpdf(X,R,P)
Description Y = nbinpdf(X,R,P) returns the negative binomial pdf at each of the values
in X using the corresponding parameters in R and P. Vector or matrix inputs for
X, R, and P must have the same size, which is also the size of Y. A scalar input
for X, R, or P is expanded to a constant matrix with the same dimensions as the
other inputs. Note that the density function is zero unless the values in X are
integers.
The negative binomial pdf is
$$y = f(x \mid r,p) = \binom{r+x-1}{x} p^{r} q^{x} I_{(0,1,\ldots)}(x)$$
where q = 1-p.
The negative binomial pdf models consecutive trials, each having a constant
probability P of success. The parameter R is the number of successes required
before stopping.
Example x = (0:10);
y = nbinpdf(x,3,0.5);
plot(x,y,'+')
set(gca,'Xlim',[-0.5,10.5])
[Figure: plot of the negative binomial pdf for x from 0 to 10, with values between 0 and 0.2.]
See Also nbincdf, nbininv, nbinrnd, nbinstat, pdf
nbinrnd
Purpose Random matrices from a negative binomial distribution.
Syntax RND = nbinrnd(R,P)
RND = nbinrnd(R,P,m)
RND = nbinrnd(R,P,m,n)
Description RND = nbinrnd(R,P) is a matrix of random numbers chosen from a negative
binomial distribution with parameters R and P. Vector or matrix inputs for R
and P must have the same size, which is also the size of RND. A scalar input for
R or P is expanded to a constant matrix with the same dimensions as the other
input.
RND = nbinrnd(R,P,m) generates random numbers with parameters R and P,
where m is a 1-by-2 vector that contains the row and column dimensions of RND.
RND = nbinrnd(R,P,m,n) generates random numbers with parameters R
and P, where scalars m and n are the row and column dimensions of RND.
The negative binomial distribution models consecutive trials, each having a
constant probability P of success. The parameter R is the number of successes
required before stopping.
Example Suppose you want to simulate a process that has a defect probability of 0.01.
How many units might Quality Assurance inspect before finding three
defective items?
r = nbinrnd(3,0.01,1,6) + 3
r =
496 142 420 396 851 178
See Also nbincdf, nbininv, nbinpdf, nbinstat
nbinstat
Purpose Mean and variance of the negative binomial distribution.
Syntax [M,V] = nbinstat(R,P)
Description [M,V] = nbinstat(R,P) returns the mean and variance of the negative
binomial distribution with parameters R and P. Vector or matrix inputs for R
and P must have the same size, which is also the size of M and V. A scalar input
for R or P is expanded to a constant matrix with the same dimensions as the
other input.
The mean of the negative binomial distribution with parameters r and p is rq/p,
where q = 1-p. The variance is rq/p^2.
Example p = 0.1:0.2:0.9;
r = 1:5;
[R,P] = meshgrid(r,p);
[M,V] = nbinstat(R,P)
M =
9.0000 18.0000 27.0000 36.0000 45.0000
2.3333 4.6667 7.0000 9.3333 11.6667
1.0000 2.0000 3.0000 4.0000 5.0000
0.4286 0.8571 1.2857 1.7143 2.1429
0.1111 0.2222 0.3333 0.4444 0.5556
V =
90.0000 180.0000 270.0000 360.0000 450.0000
7.7778 15.5556 23.3333 31.1111 38.8889
2.0000 4.0000 6.0000 8.0000 10.0000
0.6122 1.2245 1.8367 2.4490 3.0612
0.1235 0.2469 0.3704 0.4938 0.6173
See Also nbincdf, nbininv, nbinpdf, nbinrnd
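The two closed forms can be evaluated directly in base MATLAB on the same grid of parameters as the example above. This is a sketch of the formulas only; the upper-case variable names mirror the example.

```matlab
p = 0.1:0.2:0.9;
r = 1:5;
[R,P] = meshgrid(r,p);     % same parameter grid as the example above
Q = 1 - P;

M = R .* Q ./ P;           % mean: rq/p
V = R .* Q ./ P.^2;        % variance: rq/p^2
```

For instance, r = 1 and p = 0.1 give a mean of 9 and a variance of 90, the top-left entries of M and V in the example.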
ncfcdf
Purpose Noncentral F cumulative distribution function (cdf).
Syntax P = ncfcdf(X,NU1,NU2,DELTA)
Description P = ncfcdf(X,NU1,NU2,DELTA) computes the noncentral F cdf at each of the
values in X using the corresponding numerator degrees of freedom in NU1,
denominator degrees of freedom in NU2, and positive noncentrality parameters
in DELTA. Vector or matrix inputs for X, NU1, NU2, and DELTA must have the same
size, which is also the size of P. A scalar input for X, NU1, NU2, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
The noncentral F cdf is
$$F(x \mid \nu_1,\nu_2,\delta) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta\right)^{j}}{j!} e^{-\delta/2} \right) I\left( \frac{\nu_1 x}{\nu_2 + \nu_1 x} \,\Big|\, \frac{\nu_1}{2} + j, \frac{\nu_2}{2} \right)$$
where I(x|a,b) is the incomplete beta function with parameters a and b.
Example Compare the noncentral F cdf with \delta = 10 to the F cdf with the same number of
numerator and denominator degrees of freedom (5 and 20 respectively).
x = (0.01:0.1:10.01)';
p1 = ncfcdf(x,5,20,10);
p = fcdf(x,5,20);
plot(x,p,'--',x,p1,'-')
[Figure: the F cdf (dashed) and the noncentral F cdf (solid) plotted against x, with values between 0 and 1.]
References Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 189-200.
See Also cdf, ncfpdf, ncfinv, ncfrnd, ncfstat
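The series for the noncentral F cdf is a Poisson-weighted sum of incomplete beta functions, and it can be evaluated in base MATLAB with betainc. The sketch below truncates the infinite sum at 200 terms, which is an assumption that is adequate for moderate noncentrality values; the evaluation point x is illustrative, and this is not claimed to be how ncfcdf is implemented.

```matlab
x = 2.5;  nu1 = 5;  nu2 = 20;  delta = 10;    % illustrative evaluation point
u = nu1*x / (nu2 + nu1*x);                    % argument of the incomplete beta

P = 0;
for j = 0:200                                 % truncated series (assumption)
    % Poisson weight with mean delta/2, computed on the log scale
    w = exp(-delta/2 + j*log(delta/2) - gammaln(j + 1));
    P = P + w * betainc(u, nu1/2 + j, nu2/2);
end

Pcentral = betainc(u, nu1/2, nu2/2);          % central F cdf (delta = 0)
```

Each term of the sum is no larger than the central value, so P falls below Pcentral: the noncentral curve lies to the right of the central one.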
ncfinv
Purpose Inverse of the noncentral F cumulative distribution function (cdf).
Syntax X = ncfinv(P,NU1,NU2,DELTA)
Description X = ncfinv(P,NU1,NU2,DELTA) returns the inverse of the noncentral F cdf
with numerator degrees of freedom NU1, denominator degrees of freedom NU2,
and positive noncentrality parameter DELTA for the corresponding probabilities
in P. Vector or matrix inputs for P, NU1, NU2, and DELTA must have the same
size, which is also the size of X. A scalar input for P, NU1, NU2, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example One hypothesis test for comparing two sample variances is to take their ratio
and compare it to an F distribution. If the numerator and denominator degrees
of freedom are 5 and 20 respectively, then you reject the hypothesis that the
first variance is equal to the second variance if their ratio is greater than the
critical value computed below.
critical = finv(0.95,5,20)
critical =
2.7109
Suppose the truth is that the first variance is twice as big as the second
variance. How likely is it that you would detect this difference?
prob = 1 - ncfcdf(critical,5,20,2)
prob =
0.1297
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 102-105.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 189-200.
See Also icdf, ncfcdf, ncfpdf, ncfrnd, ncfstat
ncfpdf
Purpose Noncentral F probability density function.
Syntax Y = ncfpdf(X,NU1,NU2,DELTA)
Description Y = ncfpdf(X,NU1,NU2,DELTA) computes the noncentral F pdf at each of the
values in X using the corresponding numerator degrees of freedom in NU1,
denominator degrees of freedom in NU2, and positive noncentrality parameters
in DELTA. Vector or matrix inputs for X, NU1, NU2, and DELTA must have the same
size, which is also the size of Y. A scalar input for X, NU1, NU2, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
The F distribution is a special case of the noncentral F where \delta = 0. As \delta
increases, the distribution flattens like the plot in the example.
Example Compare the noncentral F pdf with \delta = 10 to the F pdf with the same number
of numerator and denominator degrees of freedom (5 and 20 respectively).
x = (0.01:0.1:10.01)';
p1 = ncfpdf(x,5,20,10);
p = fpdf(x,5,20);
plot(x,p,'--',x,p1,'-')
[Figure: the F pdf (dashed) and the noncentral F pdf (solid) plotted against x, with values between 0 and 0.8.]
References Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 189-200.
See Also ncfcdf, ncfinv, ncfrnd, ncfstat, pdf
ncfrnd
Purpose Random matrices from the noncentral F distribution.
Syntax R = ncfrnd(NU1,NU2,DELTA)
R = ncfrnd(NU1,NU2,DELTA,m)
R = ncfrnd(NU1,NU2,DELTA,m,n)
Description R = ncfrnd(NU1,NU2,DELTA) returns a matrix of random numbers chosen from
the noncentral F distribution with parameters NU1, NU2 and DELTA. Vector or
matrix inputs for NU1, NU2, and DELTA must have the same size, which is also
the size of R. A scalar input for NU1, NU2, or DELTA is expanded to a constant
matrix with the same dimensions as the other inputs.
R = ncfrnd(NU1,NU2,DELTA,m) returns a matrix of random numbers with
parameters NU1, NU2, and DELTA, where m is a 1-by-2 vector that contains the
row and column dimensions of R.
R = ncfrnd(NU1,NU2,DELTA,m,n) generates random numbers with
parameters NU1, NU2, and DELTA, where scalars m and n are the row and column
dimensions of R.
Example Compute six random numbers from a noncentral F distribution with 10
numerator degrees of freedom, 100 denominator degrees of freedom and a
noncentrality parameter, \delta, of 4.0. Compare this to the F distribution with the
same degrees of freedom.
r = ncfrnd(10,100,4,1,6)
r =
2.5995 0.8824 0.8220 1.4485 1.4415 1.4864
r1 = frnd(10,100,1,6)
r1 =
0.9826 0.5911 1.0967 0.9681 2.0096 0.6598
References Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 189-200.
See Also ncfcdf, ncfinv, ncfpdf, ncfstat
ncfstat
Purpose Mean and variance of the noncentral F distribution.
Syntax [M,V] = ncfstat(NU1,NU2,DELTA)
Description [M,V] = ncfstat(NU1,NU2,DELTA) returns the mean and variance of the
noncentral F pdf with NU1 and NU2 degrees of freedom and noncentrality
parameter DELTA. Vector or matrix inputs for NU1, NU2, and DELTA must have
the same size, which is also the size of M and V. A scalar input for NU1, NU2, or
DELTA is expanded to a constant matrix with the same dimensions as the other
input.
The mean of the noncentral F distribution with parameters \nu_1, \nu_2, and \delta is
$$\frac{\nu_2(\delta + \nu_1)}{\nu_1(\nu_2 - 2)}$$
where \nu_2 > 2.
The variance is
$$2\left(\frac{\nu_2}{\nu_1}\right)^{2} \left[ \frac{(\delta + \nu_1)^{2} + (2\delta + \nu_1)(\nu_2 - 2)}{(\nu_2 - 2)^{2}(\nu_2 - 4)} \right]$$
where \nu_2 > 4.
Example [m,v]= ncfstat(10,100,4)
m =
1.4286
v =
0.4252
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 73-74.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 189-200.
See Also ncfcdf, ncfinv, ncfpdf, ncfrnd
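The closed forms for the mean and variance of the noncentral F distribution can be evaluated directly in base MATLAB. The sketch below uses the inputs from the example (10 and 100 degrees of freedom, noncentrality 4) and the standard expressions, which are valid for nu2 > 2 and nu2 > 4 respectively; it is a check of the formulas, not the ncfstat source.

```matlab
nu1 = 10;  nu2 = 100;  delta = 4;     % inputs from the example

% Mean, valid for nu2 > 2
m = nu2*(delta + nu1) / (nu1*(nu2 - 2));

% Variance, valid for nu2 > 4
v = 2*(nu2/nu1)^2 * ((delta + nu1)^2 + (2*delta + nu1)*(nu2 - 2)) ...
    / ((nu2 - 2)^2 * (nu2 - 4));
```

These expressions give m of approximately 1.4286 and v of approximately 0.4252.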
nctcdf
Purpose Noncentral T cumulative distribution function.
Syntax P = nctcdf(X,NU,DELTA)
Description P = nctcdf(X,NU,DELTA) computes the noncentral T cdf at each of the values
in X using the corresponding degrees of freedom in NU and noncentrality
parameters in DELTA. Vector or matrix inputs for X, NU, and DELTA must have
the same size, which is also the size of P. A scalar input for X, NU, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example Compare the noncentral T cdf with DELTA = 1 to the T cdf with the same
number of degrees of freedom (10).
x = (-5:0.1:5)';
p1 = nctcdf(x,10,1);
p = tcdf(x,10);
plot(x,p,'--',x,p1,'-')
[Figure: T cdf (dashed) and noncentral T cdf (solid) for x from -5 to 5; vertical axis from 0 to 1.]
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 147–148.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 201–219.
See Also cdf, nctinv, nctpdf, nctrnd, nctstat
nctinv
Purpose Inverse of the noncentral T cumulative distribution.
Syntax X = nctinv(P,NU,DELTA)
Description X = nctinv(P,NU,DELTA) returns the inverse of the noncentral T cdf with NU
degrees of freedom and noncentrality parameter DELTA for the corresponding
probabilities in P. Vector or matrix inputs for P, NU, and DELTA must have the
same size, which is also the size of X. A scalar input for P, NU, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example x = nctinv([0.1 0.2],10,1)
x =
-0.2914 0.1618
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 147–148.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 201–219.
See Also icdf, nctcdf, nctpdf, nctrnd, nctstat
nctpdf
Purpose Noncentral T probability density function (pdf).
Syntax Y = nctpdf(X,V,DELTA)
Description Y = nctpdf(X,V,DELTA) computes the noncentral T pdf at each of the values
in X using the corresponding degrees of freedom in V and noncentrality
parameters in DELTA. Vector or matrix inputs for X, V, and DELTA must have the
same size, which is also the size of Y. A scalar input for X, V, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example Compare the noncentral T pdf with DELTA = 1 to the T pdf with the same
number of degrees of freedom (10).
x = (-5:0.1:5)';
p1 = nctpdf(x,10,1);
p = tpdf(x,10);
plot(x,p,'--',x,p1,'-')
[Figure: T pdf (dashed) and noncentral T pdf (solid) for x from -5 to 5; vertical axis from 0 to 0.4.]
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 147–148.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 201–219.
See Also nctcdf, nctinv, nctrnd, nctstat, pdf
nctrnd
Purpose Random matrices from noncentral T distribution.
Syntax R = nctrnd(V,DELTA)
R = nctrnd(V,DELTA,m)
R = nctrnd(V,DELTA,m,n)
Description R = nctrnd(V,DELTA) returns a matrix of random numbers chosen from the
noncentral T distribution with parameters V and DELTA. Vector or matrix
inputs for V and DELTA must have the same size, which is also the size of R. A
scalar input for V or DELTA is expanded to a constant matrix with the same
dimensions as the other input.
R = nctrnd(V,DELTA,m) returns a matrix of random numbers with
parameters V and DELTA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = nctrnd(V,DELTA,m,n) generates random numbers with parameters V and
DELTA, where scalars m and n are the row and column dimensions of R.
Example nctrnd(10,1,5,1)
ans =
1.6576
1.0617
1.4491
0.2930
3.6297
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 147–148.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 201–219.
See Also nctcdf, nctinv, nctpdf, nctstat
nctstat
Purpose Mean and variance for the noncentral t distribution.
Syntax [M,V] = nctstat(NU,DELTA)
Description [M,V] = nctstat(NU,DELTA) returns the mean and variance of the
noncentral t pdf with NU degrees of freedom and noncentrality parameter
DELTA. Vector or matrix inputs for NU and DELTA must have the same size, which
is also the size of M and V. A scalar input for NU or DELTA is expanded to a
constant matrix with the same dimensions as the other input.
The mean of the noncentral t distribution with parameters $\nu$ and $\delta$ is

$$\frac{\delta\,(\nu/2)^{1/2}\,\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)}$$

where $\nu > 1$.
The variance is

$$\frac{\nu}{\nu - 2}\,(1 + \delta^2) - \frac{\nu}{2}\,\delta^2 \left[\frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)}\right]^2$$

where $\nu > 2$.
Example [m,v] = nctstat(10,1)
m =
1.0837
v =
1.3255
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 147–148.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 201–219.
See Also nctcdf, nctinv, nctpdf, nctrnd
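The mean and variance of the noncentral t distribution depend on a ratio of gamma functions, which base MATLAB can evaluate with gamma. The following is a minimal sketch under that assumption; the names nu, delta, and g are illustrative, and the code does not call nctstat.

```matlab
% Illustrative parameter values (assumed for this sketch).
nu = 10;       % degrees of freedom (must exceed 2 for the variance)
delta = 1;     % noncentrality parameter

% Ratio of gamma functions shared by both moments.
g = gamma((nu - 1)/2) / gamma(nu/2);

% Mean: delta * sqrt(nu/2) * Gamma((nu-1)/2) / Gamma(nu/2), valid for nu > 1.
m = delta*sqrt(nu/2)*g;

% Variance: nu/(nu-2)*(1 + delta^2) - (nu/2)*delta^2*g^2, valid for nu > 2.
v = nu/(nu - 2)*(1 + delta^2) - (nu/2)*delta^2*g^2;

fprintf('m = %.4f, v = %.4f\n', m, v)
```

For large nu, gamma overflows; a more careful version might use exp(gammaln((nu-1)/2) - gammaln(nu/2)) for the ratio instead.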
ncx2cdf
Purpose Noncentral chi-square cumulative distribution function (cdf).
Syntax P = ncx2cdf(X,V,DELTA)
Description P = ncx2cdf(X,V,DELTA) computes the noncentral chi-square cdf at each of
the values in X using the corresponding degrees of freedom in V and positive
noncentrality parameters in DELTA. Vector or matrix inputs for X, V, and DELTA
must have the same size, which is also the size of P. A scalar input for X, V, or
DELTA is expanded to a constant matrix with the same dimensions as the other
inputs.
Some texts refer to this distribution as the generalized Rayleigh,
Rayleigh-Rice, or Rice distribution.
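The noncentral chi-square cdf is

$$F(x \mid \nu, \delta) = \sum_{j=0}^{\infty} \left( \frac{(\delta/2)^j}{j!}\, e^{-\delta/2} \right) \Pr\left[ \chi^2_{\nu + 2j} \le x \right]$$

Example x = (0:0.1:10)';
p1 = ncx2cdf(x,4,2);
p = chi2cdf(x,4);
plot(x,p,'--',x,p1,'-')
[Figure: chi-square cdf (dashed) and noncentral chi-square cdf (solid) for x from 0 to 10; vertical axis from 0 to 1.]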
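The series that defines the noncentral chi-square cdf is a Poisson-weighted sum of central chi-square probabilities, so it can be truncated and summed numerically. The sketch below is not a description of how ncx2cdf is implemented. It assumes base MATLAB's gammainc for the central chi-square cdf, via Pr[chi2_k <= x] = gammainc(x/2, k/2), and the evaluation point x and truncation point J are illustrative choices.

```matlab
% Illustrative inputs (assumed for this sketch).
x = 5;         % point at which to evaluate the cdf
nu = 4;        % degrees of freedom
delta = 2;     % noncentrality parameter
J = 50;        % number of series terms kept after j = 0 (truncation assumption)

j = 0:J;

% Poisson weights with mean delta/2: (delta/2)^j / j! * exp(-delta/2).
w = exp(-delta/2) * (delta/2).^j ./ factorial(j);

% Central chi-square cdf with nu + 2j degrees of freedom, evaluated at x.
central = gammainc(x/2, (nu + 2*j)/2);

% Truncated series for the noncentral cdf.
F = sum(w .* central);

fprintf('F(%g | %g, %g) is approximately %.4f\n', x, nu, delta, F)
```

The weights decay factorially, so a few dozen terms suffice for moderate delta; a larger noncentrality parameter would need a larger J.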
References Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 130–148.
See Also cdf, ncx2inv, ncx2pdf, ncx2rnd, ncx2stat
ncx2inv
Purpose Inverse of the noncentral chi-square cdf.
Syntax X = ncx2inv(P,V,DELTA)
Description X = ncx2inv(P,V,DELTA) returns the inverse of the noncentral chi-square cdf
with parameters V and DELTA at the corresponding probabilities in P. Vector or
matrix inputs for P, V, and DELTA must have the same size, which is also the size
of X. A scalar input for P, V, or DELTA is expanded to a constant matrix with the
same dimensions as the other inputs.
Algorithm ncx2inv uses Newton's method to converge to the solution.
Example ncx2inv([0.01 0.05 0.1],4,2)
ans =
0.4858 1.1498 1.7066
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 50–52.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 130–148.
See Also icdf, ncx2cdf, ncx2pdf, ncx2rnd, ncx2stat
ncx2pdf
Purpose Noncentral chi-square probability density function (pdf).
Syntax Y = ncx2pdf(X,V,DELTA)
Description Y = ncx2pdf(X,V,DELTA) computes the noncentral chi-square pdf at each of
the values in X using the corresponding degrees of freedom in V and positive
noncentrality parameters in DELTA. Vector or matrix inputs for X, V, and DELTA
must have the same size, which is also the size of Y. A scalar input for X, V, or
DELTA is expanded to a constant matrix with the same dimensions as the other
inputs.
Some texts refer to this distribution as the generalized Rayleigh,
Rayleigh-Rice, or Rice distribution.
Example As the noncentrality parameter increases, the distribution flattens as shown
in the plot.
x = (0:0.1:10)';
p1 = ncx2pdf(x,4,2);
p = chi2pdf(x,4);
plot(x,p,'--',x,p1,'-')
[Figure: chi-square pdf (dashed) and noncentral chi-square pdf (solid) for x from 0 to 10; vertical axis from 0 to 0.2.]
References Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 130–148.
See Also ncx2cdf, ncx2inv, ncx2rnd, ncx2stat, pdf
ncx2rnd
Purpose Random matrices from the noncentral chi-square distribution.
Syntax R = ncx2rnd(V,DELTA)
R = ncx2rnd(V,DELTA,m)
R = ncx2rnd(V,DELTA,m,n)
Description R = ncx2rnd(V,DELTA) returns a matrix of random numbers chosen from the
noncentral chi-square distribution with parameters V and DELTA. Vector or
matrix inputs for V and DELTA must have the same size, which is also the size
of R. A scalar input for V or DELTA is expanded to a constant matrix with the
same dimensions as the other input.
R = ncx2rnd(V,DELTA,m) returns a matrix of random numbers with
parameters V and DELTA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = ncx2rnd(V,DELTA,m,n) generates random numbers with parameters V and
DELTA, where scalars m and n are the row and column dimensions of R.
Example ncx2rnd(4,2,6,3)
ans =
6.8552 5.9650 11.2961
5.2631 4.2640 5.9495
9.1939 6.7162 3.8315
10.3100 4.4828 7.1653
2.1142 1.9826 4.6400
3.8852 5.3999 0.9282
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 50–52.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 130–148.
See Also ncx2cdf, ncx2inv, ncx2pdf, ncx2stat
ncx2stat
Purpose Mean and variance for the noncentral chi-square distribution.
Syntax [M,V] = ncx2stat(NU,DELTA)
Description [M,V] = ncx2stat(NU,DELTA) returns the mean and variance of the noncentral
chi-square pdf with NU degrees of freedom and noncentrality parameter DELTA.
Vector or matrix inputs for NU and DELTA must have the same size, which is also
the size of M and V. A scalar input for NU or DELTA is expanded to a constant
matrix with the same dimensions as the other input.
The mean of the noncentral chi-square distribution with parameters $\nu$ and $\delta$ is
$\nu + \delta$, and the variance is $2(\nu + 2\delta)$.
Example [m,v] = ncx2stat(4,2)
m =
6
v =
16
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 50–52.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970. pp. 130–148.
See Also ncx2cdf, ncx2inv, ncx2pdf, ncx2rnd
nlinfit
Purpose Nonlinear least-squares data fitting by the Gauss-Newton method.
Syntax [beta,r,J] = nlinfit(X,y,FUN,beta0)
Description beta = nlinfit(X,y,FUN,beta0) returns the coefficients of the nonlinear
function described in FUN. FUN can be a function handle specified using @, an
inline function, or a quoted text string containing the name of a function.
The function FUN has the form $\hat{y} = f(\beta, X)$. It returns the predicted values of y
given initial parameter estimates $\beta$ and the independent variable X.
The matrix X has one column per independent variable. The response, y, is a
column vector with the same number of rows as X.
[beta,r,J] = nlinfit(X,y,FUN,beta0) returns the fitted coefficients, beta,
the residuals, r, and the Jacobian, J, for use with nlintool to produce error
estimates on predictions.
Example load reaction
betafit = nlinfit(reactants,rate,@hougen,beta)
betafit =
1.2526
0.0628
0.0400
0.1124
1.1914
See Also nlintool
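To illustrate the idea behind Gauss-Newton fitting, the following sketch runs a bare iteration on synthetic data in base MATLAB. It is not the nlinfit implementation: the model y = b1*exp(b2*x), the noise-free data, the starting guess, and the stopping tolerance are all assumptions chosen for the illustration, and no step damping is applied.

```matlab
% Synthetic, noise-free data from a hypothetical model y = b1*exp(b2*x).
x = (0:0.5:4)';
btrue = [2; -0.5];
y = btrue(1)*exp(btrue(2)*x);

beta = [1.5; -0.3];                  % initial parameter guess (assumed)

for iter = 1:50
    f = beta(1)*exp(beta(2)*x);      % predicted values at the current beta
    r = y - f;                       % residuals
    % Jacobian of the model with respect to [b1 b2], one row per observation.
    J = [exp(beta(2)*x), beta(1)*x.*exp(beta(2)*x)];
    step = J\r;                      % least-squares solution of J*step = r
    beta = beta + step;              % Gauss-Newton update
    if norm(step) < 1e-10            % stop when the update is negligible
        break
    end
end

disp(beta')
```

Each pass linearizes the model around the current estimate and solves a linear least-squares problem for the update. The residuals r and Jacobian J from the final pass play the same role as the r and J outputs described above.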
nlintool
Purpose Fits a nonlinear equation to data and displays an interactive graph.
Syntax nlintool(x,y,FUN,beta0)
nlintool(x,y,FUN,beta0,alpha)
nlintool(x,y,FUN,beta0,alpha,'xname','yname')
Description nlintool(x,y,FUN,beta0) is a prediction plot that provides a nonlinear curve
fit to (x,y) data. It plots a 95% global confidence interval for predictions as two
red curves. beta0 is a vector containing initial guesses for the parameters.
nlintool(x,y,FUN,beta0,alpha) plots a 100(1-alpha)% confidence
interval for predictions.
nlintool displays a vector of plots, one for each column of the matrix of
inputs, x. The response variable, y, is a column vector that matches the number
of rows in x.
The default value for alpha is 0.05, which produces 95% confidence intervals.
nlintool(x,y,FUN,beta0,alpha,'xname','yname') labels the plot using the
string matrix 'xname' for the x variables and the string 'yname' for the y
variable.
You can drag the dotted white reference line and watch the predicted values
update simultaneously. Alternatively, you can get a specific prediction by
typing the value for x into an editable text field. Use the pop-up menu labeled
Export to move specified variables to the base workspace. You can change the
type of confidence bounds using the Bounds menu.
Example See "Nonlinear Regression Models" on page 1-100.
See Also nlinfit, rstool
nlparci
Purpose Confidence intervals on estimates of parameters in nonlinear models.
Syntax ci = nlparci(beta,r,J)
Description nlparci(beta,r,J) returns the 95% confidence interval ci on the nonlinear
least squares parameter estimates beta, given the residuals r and the
Jacobian matrix J at the solution. The confidence interval calculation is valid
for systems where the number of rows of J exceeds the length of beta.
nlparci uses the outputs of nlinfit for its inputs.
Example Continuing the example from nlinfit:
load reaction
[beta,resids,J] = nlinfit(reactants,rate,'hougen',beta);
ci = nlparci(beta,resids,J)
ci =
-1.0798 3.3445
-0.0524 0.1689
-0.0437 0.1145
-0.0891 0.2941
-1.1719 3.7321
See Also nlinfit, nlintool, nlpredci
nlpredci
Purpose Confidence intervals on predictions of nonlinear models.
Syntax ypred = nlpredci(FUN,inputs,beta,r,J)
[ypred,delta] = nlpredci(FUN,inputs,beta,r,J)
ypred = nlpredci(FUN,inputs,beta,r,J,alpha,'simopt','predopt')
Description ypred = nlpredci(FUN,inputs,beta,r,J) returns the predicted responses,
ypred, given the fitted parameters beta, residuals r, and the Jacobian
matrix J. inputs is a matrix of values of the independent variables in the
nonlinear function.
[ypred,delta] = nlpredci(FUN,inputs,beta,r,J) also returns the
half-width, delta, of confidence intervals for the nonlinear least squares
predictions. The confidence interval calculation is valid for systems where the
length of r exceeds the length of beta and J is of full column rank. The interval
[ypred-delta,ypred+delta] is a 95% non-simultaneous confidence interval
for the true value of the function at the specified input values.
ypred = nlpredci(FUN,inputs,beta,r,J,alpha,'simopt','predopt')
controls the type of confidence intervals. The confidence level is
100(1-alpha)%. 'simopt' can be 'on' for simultaneous intervals or 'off' (the
default) for non-simultaneous intervals. 'predopt' can be 'curve' (the
default) for confidence intervals for the function value at the inputs, or
'observation' for confidence intervals for a new response value.
nlpredci uses the outputs of nlinfit for its inputs.
Example Continuing the example from nlinfit, we can determine the predicted
function value at [100 300 80] and the half-width of a confidence interval for
it.
load reaction
[beta,resids,J] = nlinfit(reactants,rate,@hougen,beta);
[ypred,delta] = nlpredci(@hougen,[100 300 80],beta,resids,J)
ypred =
13
delta =
1.4277
See Also nlinfit, nlintool, nlparci
normcdf
Purpose Normal cumulative distribution function (cdf).
Syntax P = normcdf(X,MU,SIGMA)
Description normcdf(X,MU,SIGMA) computes the normal cdf at each of the values in X using
the corresponding parameters in MU and SIGMA. Vector or matrix inputs for X,
MU, and SIGMA must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs. The parameters
in SIGMA must be positive.
The normal cdf is

$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{\frac{-(t-\mu)^2}{2\sigma^2}}\,dt$$

The result, p, is the probability that a single observation from a normal
distribution with parameters $\mu$ and $\sigma$ will fall in the interval $(-\infty, x]$.
The standard normal distribution has $\mu = 0$ and $\sigma = 1$.
Examples What is the probability that an observation from a standard normal
distribution will fall on the interval [-1 1]?
p = normcdf([-1 1]);
p(2) - p(1)
ans =
0.6827
More generally, about 68% of the observations from a normal distribution fall
within one standard deviation, $\sigma$, of the mean, $\mu$.
See Also cdf, normfit, norminv, normpdf, normplot, normrnd, normspec, normstat
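The normal cdf integral has no elementary closed form, but it can be written in terms of the complementary error function as F(x | mu, sigma) = erfc(-(x - mu)/(sigma*sqrt(2)))/2. The sketch below uses that identity with base MATLAB's erfc to reproduce the one-standard-deviation probability. It is an illustration of the formula, not a statement about how normcdf is implemented, and the variable names are illustrative.

```matlab
% Illustrative inputs: standard normal, evaluated at -1 and 1.
x = [-1 1];
mu = 0;
sigma = 1;

% Normal cdf via the complementary error function.
p = 0.5*erfc(-(x - mu)/(sigma*sqrt(2)));

% Probability that an observation falls in [-1, 1].
prob = p(2) - p(1);

fprintf('P(-1 <= X <= 1) = %.4f\n', prob)
```

Using erfc rather than 1 + erf keeps accuracy in the lower tail, where the cdf is very small.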
normfit
Purpose Parameter estimates and confidence intervals for normal data.
Syntax [muhat,sigmahat,muci,sigmaci] = normfit(X)
[muhat,sigmahat,muci,sigmaci] = normfit(X,alpha)
Description [muhat,sigmahat,muci,sigmaci] = normfit(X) returns estimates muhat
and sigmahat of the normal distribution parameters $\mu$ and $\sigma$, given the matrix
of data X. muci and sigmaci are 95% confidence intervals and have two rows
and as many columns as matrix X. The top row is the lower bound of the
confidence interval and the bottom row is the upper bound.
[muhat,sigmahat,muci,sigmaci] = normfit(X,alpha) gives estimates and
100(1-alpha)% confidence intervals. For example, alpha = 0.01 gives 99%
confidence intervals.
Example In this example the data is a two-column random normal matrix. Both columns
have $\mu = 10$ and $\sigma = 2$. Note that the confidence intervals below contain the
true values.
r = normrnd(10,2,100,2);
[mu,sigma,muci,sigmaci] = normfit(r)
mu =
10.1455 10.0527
sigma =
1.9072 2.1256
muci =
9.7652 9.6288
10.5258 10.4766
sigmaci =
1.6745 1.8663
2.2155 2.4693
See Also normcdf, norminv, normpdf, normplot, normrnd, normspec, normstat, betafit,
binofit, expfit, gamfit, poissfit, unifit, weibfit
norminv
Purpose Inverse of the normal cumulative distribution function (cdf).
Syntax X = norminv(P,MU,SIGMA)
Description X = norminv(P,MU,SIGMA) computes the inverse of the normal cdf with
parameters MU and SIGMA at the corresponding probabilities in P. Vector or
matrix inputs for P, MU, and SIGMA must all have the same size. A scalar input
is expanded to a constant matrix with the same dimensions as the other inputs.
The parameters in SIGMA must be positive, and the values in P must lie on the
interval [0 1].
We define the normal inverse function in terms of the normal cdf as

$$x = F^{-1}(p \mid \mu, \sigma) = \{x : F(x \mid \mu, \sigma) = p\}$$

where

$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{\frac{-(t-\mu)^2}{2\sigma^2}}\,dt$$

The result, x, is the solution of the integral equation above where you supply
the desired probability, p.
Examples Find an interval that contains 95% of the values from a standard normal
distribution.
x = norminv([0.025 0.975],0,1)
x =
-1.9600 1.9600
Note that the interval x is not the only such interval, but it is the shortest.
xl = norminv([0.01 0.96],0,1)
xl =
-2.3263 1.7507
The interval xl also contains 95% of the probability, but it is longer than x.
See Also icdf, normcdf, normfit, normpdf, normplot, normrnd, normspec, normstat
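Inverting the normal cdf amounts to inverting the complementary error function: if p = erfc(-(x - mu)/(sigma*sqrt(2)))/2, then x = mu - sigma*sqrt(2)*erfcinv(2*p). The sketch below applies this with base MATLAB's erfcinv and then checks the result by mapping it back through the cdf. It is one way to evaluate the definition, not necessarily what norminv does internally; the variable names are illustrative.

```matlab
% Illustrative inputs: central 95% of the standard normal distribution.
p = [0.025 0.975];
mu = 0;
sigma = 1;

% Inverse normal cdf via the inverse complementary error function.
x = mu - sigma*sqrt(2)*erfcinv(2*p);

% Round trip: evaluating the cdf at x should recover p.
pcheck = 0.5*erfc(-(x - mu)/(sigma*sqrt(2)));

fprintf('x = [%.4f %.4f]\n', x(1), x(2))
```

The interval is symmetric about mu because the two probabilities are symmetric about 0.5.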
normpdf
Purpose Normal probability density function (pdf).
Syntax Y = normpdf(X,MU,SIGMA)
Description normpdf(X,MU,SIGMA) computes the normal pdf at each of the values in X using
the corresponding parameters in MU and SIGMA. Vector or matrix inputs for X,
MU, and SIGMA must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs. The parameters
in SIGMA must be positive.
The normal pdf is

$$y = f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$

The likelihood function is the pdf viewed as a function of the parameters.
Maximum likelihood estimators (MLEs) are the values of the parameters that
maximize the likelihood function for a fixed value of x.
The standard normal distribution has $\mu = 0$ and $\sigma = 1$.
If x is standard normal, then $x\sigma + \mu$ is also normal with mean $\mu$ and standard
deviation $\sigma$. Conversely, if y is normal with mean $\mu$ and standard deviation $\sigma$,
then $x = (y - \mu)/\sigma$ is standard normal.
Examples mu = [0:0.1:2];
[y i] = max(normpdf(1.5,mu,1));
MLE = mu(i)
MLE =
1.5000
See Also normcdf, normfit, norminv, normplot, normrnd, normspec, normstat, pdf
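The normal pdf formula is simple enough to evaluate directly, which also makes the likelihood interpretation concrete. The sketch below, in base MATLAB only, evaluates the density on a small grid and then repeats the grid search for the MLE of the mean given a single observation at 1.5. The names mugrid and like are illustrative, and the code does not call normpdf.

```matlab
% Standard normal density evaluated at a few points.
x = -2:2;
mu = 0;
sigma = 1;
y = exp(-(x - mu).^2/(2*sigma^2)) / (sigma*sqrt(2*pi));

% Likelihood of candidate means for one observation at 1.5 (sigma fixed at 1).
obs = 1.5;
mugrid = 0:0.1:2;
like = exp(-(obs - mugrid).^2/2) / sqrt(2*pi);

% The grid value with the largest likelihood is the (grid-restricted) MLE.
[likemax, i] = max(like);
MLE = mugrid(i);

disp(y)
disp(MLE)
```

With a single observation and known sigma, the likelihood peaks where the candidate mean equals the observation, which is why the grid search returns 1.5.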
normplot
Purpose Normal probability plot for graphical normality testing.
Syntax normplot(X)
h = normplot(X)
Description normplot(X) displays a normal probability plot of the data in X. For matrix X,
normplot displays a line for each column of X.
The plot has the sample data displayed with the plot symbol '+'.
Superimposed on the plot is a line joining the first and third quartiles of each
column of X (a robust linear fit of the sample order statistics). This line is
extrapolated out to the ends of the sample to help evaluate the linearity of the
data.
If the data does come from a normal distribution, the plot will appear linear.
Other probability density functions will introduce curvature in the plot.
h = normplot(X) returns a handle to the plotted lines.
Examples Generate a normal sample and a normal probability plot of the data.
x = normrnd(0,1,50,1);
h = normplot(x);
[Figure: Normal Probability Plot; horizontal axis: Data, vertical axis: Probability.]
The plot is linear, indicating that you can model the sample by a normal
distribution.
See Also cdfplot, hist, normcdf, normfit, norminv, normpdf, normrnd, normspec,
normstat
normrnd
Purpose Random numbers from the normal distribution.
Syntax R = normrnd(MU,SIGMA)
R = normrnd(MU,SIGMA,m)
R = normrnd(MU,SIGMA,m,n)
Description R = normrnd(MU,SIGMA) generates normal random numbers with mean MU
and standard deviation SIGMA. Vector or matrix inputs for MU and SIGMA must
have the same size, which is also the size of R. A scalar input for MU or SIGMA is
expanded to a constant matrix with the same dimensions as the other input.
R = normrnd(MU,SIGMA,m) generates normal random numbers with
parameters MU and SIGMA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = normrnd(MU,SIGMA,m,n) generates normal random numbers with
parameters MU and SIGMA, where scalars m and n are the row and column
dimensions of R.
Examples n1 = normrnd(1:6,1./(1:6))
n1 =
2.1650 2.3134 3.0250 4.0879 4.8607 6.2827
n2 = normrnd(0,1,[1 5])
n2 =
0.0591 1.7971 0.2641 0.8717 -1.4462
n3 = normrnd([1 2 3;4 5 6],0.1,2,3)
n3 =
0.9299 1.9361 2.9640
4.1246 5.0577 5.9864
See Also normcdf, normfit, norminv, normpdf, normplot, normspec, normstat
normspec
Purpose Plot normal density between specification limits.
Syntax p = normspec(specs,mu,sigma)
[p,h] = normspec(specs,mu,sigma)
Description p = normspec(specs,mu,sigma) plots the normal density between a lower
and upper limit defined by the two elements of the vector specs, where mu and
sigma are the parameters of the plotted normal distribution.
[p,h] = normspec(specs,mu,sigma) returns the probability p of a sample
falling between the lower and upper limits. h is a handle to the line objects.
If specs(1) is -Inf, there is no lower limit, and similarly if specs(2) = Inf,
there is no upper limit.
Example Suppose a cereal manufacturer produces 10 ounce boxes of corn flakes.
Variability in the process of filling each box with flakes causes a 1.25 ounce
standard deviation in the true weight of the cereal in each box. The average box
of cereal has 11.5 ounces of flakes. What percentage of boxes will have less than
10 ounces?
normspec([10 Inf],11.5,1.25)
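[Figure: normal density plotted against Critical Value (horizontal axis from 6 to 16, vertical axis Density from 0 to 0.4), titled "Probability Between Limits is 0.8849".]
The probability of a box weighing 10 ounces or more is 0.8849, so about 11.5% of
boxes will have less than 10 ounces.
See Also capaplot, disttool, histfit, normcdf, normfit, norminv, normpdf, normplot,
normrnd, normstat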
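The probability between specification limits is a difference of two normal cdf values, so the number in the cereal example can be checked without plotting. The sketch below uses base MATLAB's erfc for the normal cdf and is only an illustration of the calculation. It does not call normspec, and the variable names are illustrative.

```matlab
% Specification limits and process parameters from the cereal example.
specs = [10 Inf];
mu = 11.5;
sigma = 1.25;

% Standardize the limits and evaluate the normal cdf at each one.
z = (specs - mu)/sigma;
cdfvals = 0.5*erfc(-z/sqrt(2));     % erfc(-Inf) is 2, so the upper limit maps to 1

% Probability of falling between the limits, and its complement.
p = cdfvals(2) - cdfvals(1);
below = 1 - p;

fprintf('between limits: %.4f, below 10 ounces: %.4f\n', p, below)
```

The complement is the quantity the example asks about: the fraction of boxes that weigh less than 10 ounces.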
normstat
Purpose Mean and variance for the normal distribution.
Syntax [M,V] = normstat(MU,SIGMA)
Description [M,V] = normstat(MU,SIGMA) returns the mean and variance for the normal
distribution with parameters MU and SIGMA. Vector or matrix inputs for MU and
SIGMA must have the same size, which is also the size of M and V. A scalar input
for MU or SIGMA is expanded to a constant matrix with the same dimensions as
the other input.
The mean of the normal distribution with parameters $\mu$ and $\sigma$ is $\mu$, and the
variance is $\sigma^2$.
Examples n = 1:5;
[m,v] = normstat(n'*n,n'*n)
m =
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
v =
1 4 9 16 25
4 16 36 64 100
9 36 81 144 225
16 64 144 256 400
25 100 225 400 625
See Also normcdf, normfit, norminv, normpdf, normplot, normrnd, normspec
pareto
Purpose Pareto charts for Statistical Process Control.
Syntax pareto(y)
pareto(y,names)
h = pareto(...)
Description pareto(y,names) displays a Pareto chart where the values in the vector y are
drawn as bars in descending order. Each bar is labeled with the associated
value in the string matrix names. pareto(y) labels each bar with the index of
the corresponding element in y.
The line above the bars shows the cumulative percentage.
pareto(y,names) labels each bar with the row of the string matrix names that
corresponds to the plotted element of y.
h = pareto(...) returns a combination of patch and line handles.
Example Create a Pareto chart from data measuring the number of manufactured parts
rejected for various types of defects.
% char pads the names to a common length so they can form a string matrix.
defects = char('pits','cracks','holes','dents');
quantity = [5 3 19 25];
pareto(quantity,defects)
[Figure: Pareto chart with bars in the order dents, holes, pits, cracks; vertical axis from 0 to 60.]
See Also bar, capaplot, ewmaplot, hist, histfit, schart, xbarplot
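The quantities a Pareto chart displays, bars in descending order plus a cumulative percentage line, can be computed by hand. The sketch below does this in base MATLAB for the defect counts from the example. It illustrates the calculation only and does not call pareto; the variable names are illustrative.

```matlab
% Defect counts from the example, in the order pits, cracks, holes, dents.
quantity = [5 3 19 25];

% Sort the counts in descending order and keep the permutation,
% which tells us which label belongs to each bar.
[sorted, order] = sort(quantity, 'descend');

% Cumulative percentage of the total, as drawn by the line above the bars.
cumpct = 100*cumsum(sorted)/sum(sorted);

disp(sorted)
disp(order)
disp(cumpct)
```

Here order maps each bar back to its original position, so the fourth label (dents) comes first and the second (cracks) comes last.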
pcacov
Purpose Principal Components Analysis (PCA) using the covariance matrix.
Syntax pc = pcacov(X)
[pc,latent,explained] = pcacov(X)
Description [pc,latent,explained] = pcacov(X) takes the covariance matrix X and
returns the principal components in pc, the eigenvalues of the covariance
matrix X in latent, and the percentage of the total variance in the
observations explained by each eigenvector in explained.
Example load hald
covx = cov(ingredients);
[pc,variances,explained] = pcacov(covx)
pc =
0.0678 -0.6460 0.5673 -0.5062
0.6785 -0.0200 -0.5440 -0.4933
-0.0290 0.7553 0.4036 -0.5156
-0.7309 -0.1085 -0.4684 -0.4844
variances =
517.7969
67.4964
12.4054
0.2372
explained =
86.5974
11.2882
2.0747
0.0397
References Jackson, J. E., A User's Guide to Principal Components, John Wiley and Sons,
Inc. 1991. pp. 1–25.
See Also barttest, pcares, princomp
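The three outputs described for pcacov can be obtained from an eigendecomposition of the covariance matrix. The sketch below uses base MATLAB's eig on a small hypothetical covariance matrix C. It is an assumption-laden illustration rather than the pcacov implementation, and the signs of the component vectors are arbitrary, so they may differ from what pcacov returns.

```matlab
% Hypothetical 3-by-3 covariance matrix (symmetric, positive definite).
C = [4 2 0; 2 3 1; 0 1 2];

% Eigenvectors and eigenvalues of the covariance matrix.
[V, D] = eig(C);

% Order the eigenvalues from largest to smallest and reorder the vectors to match.
[latent, idx] = sort(diag(D), 'descend');
pc = V(:, idx);

% Percentage of the total variance explained by each component.
explained = 100*latent/sum(latent);

disp(pc)
disp(latent')
disp(explained')
```

The eigenvalues sum to the trace of C, so the explained percentages always sum to 100.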
pcares
Purpose Residuals from a Principal Components Analysis.
Syntax residuals = pcares(X,ndim)
Description pcares(X,ndim) returns the residuals obtained by retaining ndim principal
components of X. Note that ndim is a scalar and must be less than the number
of columns in X. Use the data matrix, not the covariance matrix, with this
function.
Example This example shows the drop in the residuals from the first row of the Hald
data as the number of component dimensions increases from one to three.
load hald
r1 = pcares(ingredients,1);
r2 = pcares(ingredients,2);
r3 = pcares(ingredients,3);
r11 = r1(1,:)
r11 =
2.0350 2.8304 -6.8378 3.0879
r21 = r2(1,:)
r21 =
-2.4037 2.6930 -1.6482 2.3425
r31 = r3(1,:)
r31 =
0.2008 0.1957 0.2045 0.1921
Reference Jackson, J. E., A User's Guide to Principal Components, John Wiley and Sons,
Inc. 1991. pp. 1–25.
See Also barttest, pcacov, princomp
pdf
Purpose Probability density function (pdf) for a specified distribution.
Syntax Y = pdf('name',X,A1,A2,A3)
Description pdf('name',X,A1,A2,A3) returns a matrix of densities, where 'name' is a
string containing the name of the distribution. X is a matrix of values, and A1,
A2, and A3 are matrices of distribution parameters. Depending on the
distribution, some of the parameters may not be necessary.
Vector or matrix inputs for X, A1, A2, and A3 must all have the same size. A
scalar input is expanded to a constant matrix with the same dimensions as the
other inputs.
pdf is a utility routine allowing access to all the pdfs in the Statistics Toolbox
using the name of the distribution as a parameter. See "Overview of the
Distributions" on page 1-12 for the list of available distributions.
Examples p = pdf('Normal',-2:2,0,1)
p =
0.0540 0.2420 0.3989 0.2420 0.0540
p = pdf('Poisson',0:4,1:5)
p =
0.3679 0.2707 0.2240 0.1954 0.1755
See Also betapdf, binopdf, cdf, chi2pdf, exppdf, fpdf, gampdf, geopdf, hygepdf,
lognpdf, nbinpdf, ncfpdf, nctpdf, ncx2pdf, normpdf, poisspdf, raylpdf,
tpdf, unidpdf, unifpdf, weibpdf
pdist
Purpose Pairwise distance between observations.
Syntax Y = pdist(X)
Y = pdist(X,'metric')
Y = pdist(X,'minkowski',p)
Description Y = pdist(X) computes the Euclidean distance between pairs of objects in
m-by-n matrix X, which is treated as m vectors of size n. For a dataset made up
of m objects, there are $(m-1) \cdot m/2$ pairs.
The output, Y, is a vector of length $(m-1) \cdot m/2$, containing the distance
information. The distances are arranged in the order (1,2), (1,3), ..., (1,m),
(2,3), ..., (2,m), ..., (m-1,m). Y is also commonly known as a similarity matrix
or dissimilarity matrix.
To save space and computation time, Y is formatted as a vector. However, you
can convert this vector into a square matrix using the squareform function so
that element i,j in the matrix corresponds to the distance between objects i and
j in the original dataset.
Y = pdist(X,'metric') computes the distance between objects in the data
matrix, X, using the method specified by 'metric', where 'metric' can be any
of the following character strings that identify ways to compute the distance.
String         Meaning
'Euclid'       Euclidean distance (default)
'SEuclid'      Standardized Euclidean distance
'Mahal'        Mahalanobis distance
'CityBlock'    City Block metric
'Minkowski'    Minkowski metric
Y = pdist(X,'minkowski',p) computes the distance between objects in the
data matrix, X, using the Minkowski metric. p is the exponent used in the
Minkowski computation which, by default, is 2.
Mathematical Definitions of Methods
Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors
$x_1, x_2, \ldots, x_m$, the various distances between the vectors $x_r$ and $x_s$ are
defined as follows:
Euclidean distance
\[ d_{rs}^2 = (x_r - x_s)(x_r - x_s)' \]
Standardized Euclidean distance
\[ d_{rs}^2 = (x_r - x_s) D^{-1} (x_r - x_s)' \]
where D is the diagonal matrix with diagonal elements given by $v_j^2$, which
denotes the variance of the variable $X_j$ over the m objects.
Mahalanobis distance
\[ d_{rs}^2 = (x_r - x_s) V^{-1} (x_r - x_s)' \]
where V is the sample covariance matrix.
City Block metric
\[ d_{rs} = \sum_{j=1}^{n} |x_{rj} - x_{sj}| \]
Minkowski metric
\[ d_{rs} = \left\{ \sum_{j=1}^{n} |x_{rj} - x_{sj}|^p \right\}^{1/p} \]
Notice that for the special case of p = 1, the Minkowski metric gives the City
Block metric, and for the special case of p = 2, the Minkowski metric gives
the Euclidean distance.
Examples X = [1 2; 1 3; 2 2; 3 1]
X =
1 2
1 3
2 2
3 1
Y = pdist(X,'mahal')
Y =
2.3452 2.0000 2.3452 1.2247 2.4495 1.2247
Y = pdist(X)
Y =
1.0000 1.0000 2.2361 1.4142 2.8284 1.4142
squareform(Y)
ans =
0 1.0000 1.0000 2.2361
1.0000 0 1.4142 2.8284
1.0000 1.4142 0 1.4142
2.2361 2.8284 1.4142 0
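The ordering of the output vector can be checked without the toolbox. The following is a minimal sketch, assuming only base MATLAB, that computes the Euclidean distances for the X used above with explicit loops. The variable names are illustrative, and this is not presented as the way pdist is implemented.

```matlab
% Illustrative sketch: pairwise Euclidean distances in the order
% (1,2), (1,3), ..., (1,m), (2,3), ..., (m-1,m).
X = [1 2; 1 3; 2 2; 3 1];
m = size(X,1);
Y = zeros(1, m*(m-1)/2);      % one entry per pair of rows
k = 0;
for r = 1:m-1
    for s = r+1:m
        k = k + 1;
        d = X(r,:) - X(s,:);  % difference of two row vectors
        Y(k) = sqrt(d*d');    % square root of (x_r - x_s)(x_r - x_s)'
    end
end
disp(Y)
```

The six values agree with the Y = pdist(X) output shown above.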
See Also cluster, clusterdata, cophenet, dendrogram, inconsistent, linkage,
squareform
perms
Purpose All permutations.
Syntax P = perms(v)
Description P = perms(v), where v is a row vector of length n, creates a matrix whose rows
consist of all possible permutations of the n elements of v. The matrix P
contains n! rows and n columns.
perms is only practical when n is less than 8 or 9, because the number of
rows, n!, grows very quickly (9! = 362,880).
Example perms([2 4 6])
ans =
6 4 2
4 6 2
6 2 4
2 6 4
4 2 6
2 4 6
poisscdf
Purpose Poisson cumulative distribution function (cdf).
Syntax P = poisscdf(X,LAMBDA)
Description poisscdf(X,LAMBDA) computes the Poisson cdf at each of the values in X using
the corresponding parameters in LAMBDA. Vector or matrix inputs for X and
LAMBDA must be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other input. The parameters in LAMBDA must
be positive.
The Poisson cdf is
\[ p = F(x \mid \lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^i}{i!} \]
Examples For example, consider a Quality Assurance department that performs random
tests of individual hard disks. Their policy is to shut down the manufacturing
process if an inspector finds more than four bad sectors on a disk. What is the
probability of shutting down the process if the mean number of bad sectors (λ)
is two?
probability = 1 - poisscdf(4,2)
probability =
0.0527
About 5% of the time, a normally functioning manufacturing process will
produce more than four flaws on a hard disk.
Suppose the average number of flaws (λ) increases to four. What is the
probability of finding fewer than five flaws on a hard drive?
probability = poisscdf(4,4)
probability =
0.6288
This means that this faulty manufacturing process continues to operate after
this first inspection almost 63% of the time.
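As a cross-check of the first example, the cdf sum can be evaluated directly. This is a minimal sketch that assumes only base MATLAB (factorial, exp, sum); the variable names are illustrative and it is not meant to describe how poisscdf is implemented.

```matlab
% Illustrative sketch: Poisson cdf from its definition,
% F(x|lambda) = exp(-lambda) * sum_{i=0}^{floor(x)} lambda^i / i!
lambda = 2;                    % mean number of bad sectors
x = 4;                         % shutdown threshold
i = 0:floor(x);
F = exp(-lambda) * sum(lambda.^i ./ factorial(i));
pShutdown = 1 - F;             % probability of more than four bad sectors
fprintf('%.4f\n', pShutdown)
```

The result rounds to the 0.0527 reported in the example.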
See Also cdf, poissfit, poissinv, poisspdf, poissrnd, poisstat
poissfit
Purpose Parameter estimates and confidence intervals for Poisson data.
Syntax lambdahat = poissfit(X)
[lambdahat,lambdaci] = poissfit(X)
[lambdahat,lambdaci] = poissfit(X,alpha)
Description poissfit(X) returns the maximum likelihood estimate (MLE) of the
parameter of the Poisson distribution, λ, given the data X.
[lambdahat,lambdaci] = poissfit(X) also gives 95% confidence intervals in
lambdaci.
[lambdahat,lambdaci] = poissfit(X,alpha) gives 100(1-alpha)%
confidence intervals. For example, alpha = 0.001 yields 99.9% confidence
intervals.
The sample average is the MLE of λ:
\[ \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Example r = poissrnd(5,10,2);
[l,lci] = poissfit(r)
l =
7.4000 6.3000
lci =
5.8000 4.8000
9.1000 7.9000
See Also betafit, binofit, expfit, gamfit, poisscdf, poissinv, poisspdf,
poissrnd, poisstat, unifit, weibfit
poissinv
Purpose Inverse of the Poisson cumulative distribution function (cdf).
Syntax X = poissinv(P,LAMBDA)
Description poissinv(P,LAMBDA) returns the smallest value X such that the Poisson cdf
evaluated at X equals or exceeds P.
Examples If the average number of defects (λ) is two, what is the 95th percentile of the
number of defects?
poissinv(0.95,2)
ans =
5
What is t he median number of defect s?
median_defects = poissinv(0.50,2)
median_defects =
2
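The definition "smallest X such that the cdf equals or exceeds P" can be illustrated with a simple search. This is a minimal sketch assuming only base MATLAB; the loop and variable names are illustrative and are not claimed to match the internals of poissinv.

```matlab
% Illustrative sketch: smallest x whose Poisson cdf is at least P.
lambda = 2;
P = [0.95 0.50];               % 95th percentile and median
X = zeros(size(P));
for k = 1:numel(P)
    x = 0;
    term = exp(-lambda);       % Poisson probability of x = 0
    F = term;                  % running cdf value
    while F < P(k)
        x = x + 1;
        term = term*lambda/x;  % probability of x from probability of x-1
        F = F + term;
    end
    X(k) = x;
end
disp(X)
```

The two values match the answers 5 and 2 in the examples above.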
See Also icdf, poisscdf, poissfit, poisspdf, poissrnd, poisstat
poisspdf
Purpose Poisson probability density function (pdf).
Syntax Y = poisspdf(X,LAMBDA)
Description poisspdf(X,LAMBDA) computes the Poisson pdf at each of the values in X using
the corresponding parameters in LAMBDA. Vector or matrix inputs for X and
LAMBDA must be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other input. The parameters in LAMBDA must
all be positive.
The Poisson pdf is
\[ y = f(x \mid \lambda) = \frac{\lambda^x}{x!} e^{-\lambda} I_{(0,1,\ldots)}(x) \]
where x can be any nonnegative integer. The density function is zero unless x
is an integer.
Examples A computer hard disk manufacturer has observed that flaws occur randomly in
the manufacturing process at the average rate of two flaws in a 4 Gb hard disk
and has found this rate to be acceptable. What is the probability that a disk will
be manufactured with no defects?
In this problem, λ = 2 and x = 0.
p = poisspdf(0,2)
p =
0.1353
See Also pdf, poisscdf, poissfit, poissinv, poissrnd, poisstat
poissrnd
Purpose Random numbers from the Poisson distribution.
Syntax R = poissrnd(LAMBDA)
R = poissrnd(LAMBDA,m)
R = poissrnd(LAMBDA,m,n)
Description R = poissrnd(LAMBDA) generates Poisson random numbers with mean
LAMBDA. The size of R is the size of LAMBDA.
R = poissrnd(LAMBDA,m) generates Poisson random numbers with mean
LAMBDA, where m is a 1-by-2 vector that contains the row and column
dimensions of R.
R = poissrnd(LAMBDA,m,n) generates Poisson random numbers with mean
LAMBDA, where scalars m and n are the row and column dimensions of R.
Examples Generate a random sample of 10 pseudo-observations from a Poisson
distribution with λ = 2.
lambda = 2;
random_sample1 = poissrnd(lambda,1,10)
random_sample1 =
1 0 1 2 1 3 4 2 0 0
random_sample2 = poissrnd(lambda,[1 10])
random_sample2 =
1 1 1 5 0 3 2 2 3 4
random_sample3 = poissrnd(lambda(ones(1,10)))
random_sample3 =
3 2 1 1 0 0 4 0 2 0
See Also poisscdf, poissfit, poissinv, poisspdf, poisstat
poisstat
Purpose Mean and variance for the Poisson distribution.
Syntax M = poisstat(LAMBDA)
[M,V] = poisstat(LAMBDA)
Description M = poisstat(LAMBDA) returns the mean of the Poisson distribution with
parameter LAMBDA. The size of M is the size of LAMBDA.
[M,V] = poisstat(LAMBDA) also returns the variance V of the Poisson
distribution.
For the Poisson distribution with parameter λ, both the mean and variance are
equal to λ.
Examples Find the mean and variance for Poisson distributions with λ = 1, 2, 3, and 4.
[m,v] = poisstat([1 2; 3 4])
m =
1 2
3 4
v =
1 2
3 4
See Also poisscdf, poissfit, poissinv, poisspdf, poissrnd
polyconf
Purpose Polynomial evaluation and confidence interval estimation.
Syntax [Y,DELTA] = polyconf(p,X,S)
[Y,DELTA] = polyconf(p,X,S,alpha)
Description [Y,DELTA] = polyconf(p,X,S) uses the optional output S generated by
polyfit to give 95% confidence intervals Y ± DELTA. This assumes the errors in
the data input to polyfit are independent normal with constant variance.
[Y,DELTA] = polyconf(p,X,S,alpha) gives 100(1-alpha)% confidence
intervals. For example, alpha = 0.1 yields 90% intervals.
If p is a vector whose elements are the coefficients of a polynomial in
descending powers, such as those output from polyfit, then polyconf(p,X) is
the value of the polynomial evaluated at X. If X is a matrix or vector, the
polynomial is evaluated at each of the elements.
Examples This example gives predictions and 90% confidence intervals for computing
time for LU factorizations of square matrices with 100 to 200 columns.
n = [100 100:20:200];
for i = n
A = rand(i,i);
tic
B = lu(A);
t(ceil((i-80)/20)) = toc;
end
[p,S] = polyfit(n(2:7),t,3);
[time,delta_t] = polyconf(p,n(2:7),S,0.1)
time =
0.0829 0.1476 0.2277 0.3375 0.4912 0.7032
delta_t =
0.0064 0.0057 0.0055 0.0055 0.0057 0.0064
polyfit
Purpose Polynomial curve fitting.
Syntax [p,S] = polyfit(x,y,n)
Description p = polyfit(x,y,n) finds the coefficients of a polynomial p(x) of degree n
that fits the data, p(x(i)) to y(i), in a least-squares sense. The result p is a
row vector of length n+1 containing the polynomial coefficients in descending
powers.
\[ p(x) = p_1 x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1} \]
[p,S] = polyfit(x,y,n) returns polynomial coefficients p and matrix S for
use with polyval to produce error estimates on predictions. If the errors in the
data, y, are independent normal with constant variance, polyval will produce
error bounds which contain at least 50% of the predictions.
You may omit S if you are not going to pass it to polyval or polyconf for
calculating error estimates.
The polyfit function is part of the standard MATLAB language.
Example [p,S] = polyfit(1:10,[1:10] + normrnd(0,1,1,10),1)
p =
1.0300 0.4561
S =
-19.6214 -2.8031
0 -1.4639
8.0000 0
2.3180 0
See Also polyval, polytool, polyconf
polytool
Purpose Interactive plot for prediction of fitted polynomials.
Syntax polytool(x,y)
polytool(x,y,n)
polytool(x,y,n,alpha)
Description polytool(x,y) fits a line to the column vectors x and y and displays an
interactive plot of the result. This plot is a graphical user interface for exploring
the effects of changing the polynomial degree of the fit. The plot shows the
fitted curve and 95% global confidence intervals on a new predicted value for
the curve. Text with the current predicted value of y and its uncertainty appears to
the left of the y-axis.
polytool(x,y,n) initially fits a polynomial of order n.
polytool(x,y,n,alpha) plots 100(1-alpha)% confidence intervals on the
predicted values.
polytool fits by least squares using the regression model
\[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_n x_i^n + \varepsilon_i \]
\[ \varepsilon_i \sim N(0, \sigma^2) \quad \forall i, \qquad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \forall i \neq j \]
Evaluate the function by typing a value in the x-axis edit box or by dragging
the vertical reference line on the plot. The shape of the pointer changes from
an arrow to a cross hair when you are over the vertical line to indicate that the
line can be dragged. The predicted value of y will update as you drag the
reference line.
The argument n controls the degree of the polynomial fit. To change the degree
of the polynomial, choose from the pop-up menu at the top of the figure. To
change the type of confidence intervals, use the Bounds menu. To change from
least squares to a robust fitting method, use the Method menu.
When you are done, press the Close button.
polyval
Purpose Polynomial evaluation.
Syntax Y = polyval(p,X)
[Y,DELTA] = polyval(p,X,S)
Description Y = polyval(p,X) returns the predicted value of a polynomial given its
coefficients, p, at the values in X.
[Y,DELTA] = polyval(p,X,S) uses the optional output S generated by
polyfit to generate error estimates, Y ± DELTA. If the errors in the data input
to polyfit are independent normal with constant variance, Y ± DELTA contains
at least 50% of the predictions.
If p is a vector whose elements are the coefficients of a polynomial in
descending powers, then polyval(p,X) is the value of the polynomial
evaluated at X. If X is a matrix or vector, the polynomial is evaluated at each of
the elements.
The polyval function is part of the standard MATLAB language.
Examples Simulate the function y = x, adding normal random errors with a standard
deviation of 0.1. Then use polyfit to estimate the polynomial coefficients. Note
that predicted Y values are within DELTA of the integer X in every case.
[p,S] = polyfit(1:10,(1:10) + normrnd(0,0.1,1,10),1);
X = magic(3);
[Y,D] = polyval(p,X,S)
Y =
8.0696 1.0486 6.0636
3.0546 5.0606 7.0666
4.0576 9.0726 2.0516
D =
0.0889 0.0951 0.0861
0.0889 0.0861 0.0870
0.0870 0.0916 0.0916
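For a deterministic variant of the example above, the following minimal sketch fits noise-free data, so the expected coefficients are known in advance. It uses only polyfit and polyval from base MATLAB; the data and the query matrix Xq are made up for illustration.

```matlab
% Illustrative sketch: fit an exact line, then evaluate it elementwise.
x = 1:5;
y = 2*x + 1;              % noise-free data on the line y = 2x + 1
p = polyfit(x, y, 1);     % coefficients in descending powers, close to [2 1]
Xq = [0 10; 2.5 4];       % hypothetical query points
Yq = polyval(p, Xq);      % the polynomial is evaluated at each element of Xq
disp(p)
disp(Yq)
```

Because the data contain no noise, p recovers the slope and intercept up to rounding, and Yq has the same size as Xq.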
See Also polyfit, polytool, polyconf
prctile
Purpose Percentiles of a sample.
Syntax Y = prctile(X,p)
Description Y = prctile(X,p) calculates a value that is greater than p percent of the
values in X. The values of p must lie in the interval [0 100].
For vectors, prctile(X,p) is the pth percentile of the elements in X. For
instance, if p = 50 then Y is the median of X.
For matrix X and scalar p, prctile(X,p) is a row vector containing the pth
percentile of each column. If p is a vector, the ith row of Y contains the
p(i)th percentile of each column of X.
Examples x = (1:5)'*(1:5)
x =
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
y = prctile(x,[25 50 75])
y =
1.7500 3.5000 5.2500 7.0000 8.7500
3.0000 6.0000 9.0000 12.0000 15.0000
4.2500 8.5000 12.7500 17.0000 21.2500
princomp
Purpose Principal Components Analysis (PCA).
Syntax PC = princomp(X)
[PC,SCORE,latent,tsquare] = princomp(X)
Description [PC,SCORE,latent,tsquare] = princomp(X) takes a data matrix X and
returns the principal components in PC, the so-called Z-scores in SCORE, the
eigenvalues of the covariance matrix of X in latent, and Hotelling's T² statistic
for each data point in tsquare.
The Z-scores are the data formed by transforming the original data into the
space of the principal components. The values of the vector latent are the
variances of the columns of SCORE. Hotelling's T² is a measure of the
multivariate distance of each observation from the center of the data set.
Example Compute principal components for the ingredients data in the Hald dataset,
and the variance accounted for by each component.
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent
pc =
0.0678 -0.6460 0.5673 -0.5062
0.6785 -0.0200 -0.5440 -0.4933
-0.0290 0.7553 0.4036 -0.5156
-0.7309 -0.1085 -0.4684 -0.4844
latent =
517.7969
67.4964
12.4054
0.2372
Reference Jackson, J. E., A User's Guide to Principal Components, John Wiley and Sons,
Inc. 1991. pp. 1–25.
See Also barttest, pcacov, pcares
qqplot
Purpose Quantile-quantile plot of two samples.
Syntax qqplot(X)
qqplot(X,Y)
qqplot(X,Y,pvec)
h = qqplot(...)
Description qqplot(X) displays a quantile-quantile plot of the sample quantiles of X versus
theoretical quantiles from a normal distribution. If the distribution of X is
normal, the plot will be close to linear.
qqplot(X,Y) displays a quantile-quantile plot of two samples. If the samples
come from the same distribution, the plot will be linear.
For matrix X and Y, qqplot displays a separate line for each pair of columns.
The plotted quantiles are the quantiles of the smaller dataset.
The plot has the sample data displayed with the plot symbol '+'.
Superimposed on the plot is a line joining the first and third quartiles of each
distribution (this is a robust linear fit of the order statistics of the two samples).
This line is extrapolated out to the ends of the sample to help evaluate the
linearity of the data.
Use qqplot(X,Y,pvec) to specify the quantiles in the vector pvec.
h = qqplot(X,Y,pvec) returns handles to the lines in h.
Examples Generate two normal samples with different means and standard deviations.
Then make a quantile-quantile plot of the two samples.
x = normrnd(0,1,100,1);
y = normrnd(0.5,2,50,1);
qqplot(x,y);
[Figure: quantile-quantile plot of the two samples; x-axis: X Quantiles, y-axis: Y Quantiles]
See Also normplot
random
Purpose Random numbers from a specified distribution.
Syntax y = random('name',A1,A2,A3,m,n)
Description y = random('name',A1,A2,A3,m,n) returns a matrix of random numbers,
where 'name' is a string containing the name of the distribution, and A1, A2,
and A3 are matrices of distribution parameters. Depending on the distribution,
some of the parameters may not be necessary.
Vector or matrix inputs must all have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other inputs.
The last two parameters, m and n, are the size of the matrix y. If the
distribution parameters are matrices, then these parameters are optional, but
they must match the size of the other matrix arguments (see second example).
random is a utility routine allowing you to access all the random number
generators in the Statistics Toolbox using the name of the distribution as a
parameter. See "Overview of the Distributions" on page 1-12 for the list of
available distributions.
Examples rn = random('Normal',0,1,2,4)
rn =
1.1650 0.0751 -0.6965 0.0591
0.6268 0.3516 1.6961 1.7971
rp = random('Poisson',1:6,1,6)
rp =
0 0 1 2 5 7
See Also betarnd, binornd, cdf, chi2rnd, exprnd, frnd, gamrnd, geornd, hygernd, icdf,
lognrnd, nbinrnd, ncfrnd, nctrnd, ncx2rnd, normrnd, pdf, poissrnd, raylrnd,
trnd, unidrnd, unifrnd, weibrnd
randtool
Purpose Interactive random number generation using histograms for display.
Syntax randtool
r = randtool('output')
Description The randtool command sets up a graphical user interface for exploring the
effects of changing parameters and sample size on the histogram of random
samples from the supported probability distributions.
The M-file calls itself recursively using the action and flag parameters. For
general use, call randtool without parameters.
To output the current set of random numbers, press the Output button. The
results are stored in the variable ans. Alternatively, use the following
command.
r = randtool('output') places the sample of random numbers in the
vector r.
To sample repetitively from the same distribution, press the Resample button.
To change the distribution function, choose from the pop-up menu of functions
at the top of the figure.
To change the parameter settings, move the sliders or type a value in the edit
box under the name of the parameter. To change the limits of a parameter, type
a value in the edit box at the top or bottom of the parameter slider.
To change the sample size, type a number in the Sample Size edit box.
When you are done, press the Close button.
For an extensive discussion, see "The randtool Demo" on page 1-169.
See Also disttool
range
Purpose Sample range.
Syntax y = range(X)
Description range(X) returns the difference between the maximum and the minimum of a
sample. For vectors, range(x) is the range of the elements. For matrices,
range(X) is a row vector containing the range of each column of X.
The range is an easily calculated estimate of the spread of a sample. Outliers
have an undue influence on this statistic, which makes it an unreliable
estimator.
Example The range of a large sample of standard normal random numbers is
approximately six. This is the motivation for the process capability indices $C_p$
and $C_{pk}$ in statistical quality control applications.
rv = normrnd(0,1,1000,5);
near6 = range(rv)
near6 =
6.1451 6.4986 6.2909 5.8894 7.0002
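The definition of the range as maximum minus minimum can be written directly. This is a minimal sketch assuming only the base MATLAB functions max and min; the matrix is made up for illustration.

```matlab
% Illustrative sketch: column-wise range as max minus min.
X = [1 5; 4 2; 3 9];      % hypothetical 3-by-2 sample
r = max(X) - min(X);      % one range per column
disp(r)
```

For a matrix input, max and min operate on each column, which matches the column-wise behavior described for range(X).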
See Also std, iqr, mad
ranksum
Purpose Wilcoxon rank sum test that two populations are identical.
Syntax p = ranksum(x,y,alpha)
[p,h] = ranksum(x,y,alpha)
[p,h,stats] = ranksum(x,y,alpha)
Description p = ranksum(x,y,alpha) returns the significance probability that the
populations generating two independent samples, x and y, are identical. x and
y are both vectors, but can have different lengths. alpha is the desired level of
significance and must be a scalar between zero and one.
[p,h] = ranksum(x,y,alpha) also returns the result of the hypothesis test, h.
h is zero if the populations of x and y are not significantly different. h is one if
the two populations are significantly different.
p is the probability of observing a result equally or more extreme than the one
obtained from the data (x and y) if the null hypothesis is true. If p is near zero,
this casts doubt on this hypothesis.
[p,h,stats] = ranksum(x,y,alpha) also returns a structure containing the
field stats.ranksum whose value is equal to the rank sum statistic. For large
samples, it also contains stats.zval, which is the value of the normal (Z) statistic
used to compute p.
Example This example tests whether two samples generated with poissrnd come
from identical populations.
x = poissrnd(5,10,1);
y = poissrnd(2,20,1);
[p,h] = ranksum(x,y,0.05)
p =
0.0027
h =
1
See Also signrank, signtest, ttest2
raylcdf
Purpose Rayleigh cumulative distribution function (cdf).
Syntax P = raylcdf(X,B)
Description P = raylcdf(X,B) computes the Rayleigh cdf at each of the values in X using
the corresponding parameters in B. Vector or matrix inputs for X and B must
have the same size, which is also the size of P. A scalar input for X or B is
expanded to a constant matrix with the same dimensions as the other input.
The Rayleigh cdf is
\[ y = F(x \mid b) = \int_0^x \frac{t}{b^2} \, e^{-t^2/(2b^2)} \, dt \]
Example x = 0:0.1:3;
p = raylcdf(x,1);
plot(x,p)
[Figure: Rayleigh cdf for b = 1, plotted for x from 0 to 3]
Reference Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, Wiley 1993. pp. 134–136.
See Also cdf, raylinv, raylpdf, raylrnd, raylstat
raylinv
Purpose Inverse of the Rayleigh cumulative distribution function.
Syntax X = raylinv(P,B)
Description X = raylinv(P,B) returns the inverse of the Rayleigh cumulative distribution
function with parameter B at the corresponding probabilities in P. Vector or
matrix inputs for P and B must have the same size, which is also the size of X.
A scalar input for P or B is expanded to a constant matrix with the same
dimensions as the other input.
Example x = raylinv(0.9,1)
x =
2.1460
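The Rayleigh cdf integrates in closed form to F(x|b) = 1 - exp(-x²/(2b²)), so the inverse can be written explicitly as x = b·sqrt(-2·ln(1-p)). The following minimal sketch, assuming only base MATLAB, evaluates that expression for the example above. It illustrates the formula and is not claimed to be how raylinv is implemented.

```matlab
% Illustrative sketch: closed-form inverse of the Rayleigh cdf.
p = 0.9;
b = 1;
x = b*sqrt(-2*log(1 - p));       % invert F(x|b) = 1 - exp(-x^2/(2*b^2))
Fcheck = 1 - exp(-x^2/(2*b^2));  % substituting back recovers p
fprintf('%.4f\n', x)
```

The value rounds to the 2.1460 shown in the example.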
See Also icdf, raylcdf, raylpdf, raylrnd, raylstat
raylpdf
Purpose Rayleigh probability density function.
Syntax Y = raylpdf(X,B)
Description Y = raylpdf(X,B) computes the Rayleigh pdf at each of the values in X using
the corresponding parameters in B. Vector or matrix inputs for X and B must
have the same size, which is also the size of Y. A scalar input for X or B is
expanded to a constant matrix with the same dimensions as the other input.
The Rayleigh pdf is
\[ y = f(x \mid b) = \frac{x}{b^2} \, e^{-x^2/(2b^2)} \]
Example x = 0:0.1:3;
p = raylpdf(x,1);
plot(x,p)
[Figure: Rayleigh pdf for b = 1, plotted for x from 0 to 3]
See Also pdf, raylcdf, raylinv, raylrnd, raylstat
raylrnd
Purpose Random matrices from the Rayleigh distribution.
Syntax R = raylrnd(B)
R = raylrnd(B,m)
R = raylrnd(B,m,n)
Description R = raylrnd(B) returns a matrix of random numbers chosen from the
Rayleigh distribution with parameter B. The size of R is the size of B.
R = raylrnd(B,m) returns a matrix of random numbers chosen from the
Rayleigh distribution with parameter B, where m is a 1-by-2 vector that
contains the row and column dimensions of R.
R = raylrnd(B,m,n) returns a matrix of random numbers chosen from the
Rayleigh distribution with parameter B, where scalars m and n are the row and
column dimensions of R.
Example r = raylrnd(1:5)
r =
1.7986 0.8795 3.3473 8.9159 3.5182
See Also random, raylcdf, raylinv, raylpdf, raylstat
raylstat
Purpose Mean and variance for the Rayleigh distribution.
Syntax M = raylstat(B)
[M,V] = raylstat(B)
Description [M,V] = raylstat(B) returns the mean and variance of the Rayleigh
distribution with parameter B.
The mean of the Rayleigh distribution with parameter b is $b\sqrt{\pi/2}$ and the
variance is
\[ \frac{4 - \pi}{2} \, b^2 \]
Example [mn,v] = raylstat(1)
mn =
1.2533
v =
0.4292
See Also raylcdf, raylinv, raylpdf, raylrnd
rcoplot
Purpose Residual case order plot.
Syntax rcoplot(r,rint)
Description rcoplot(r,rint) displays an error bar plot of the confidence intervals on the
residuals from a regression. The residuals appear in the plot in case order.
Inputs r and rint are outputs from the regress function.
Example X = [ones(10,1) (1:10)'];
y = X*[10;1] + normrnd(0,0.1,10,1);
[b,bint,r,rint] = regress(y,X,0.05);
rcoplot(r,rint);
[Figure: residual case order plot; x-axis: Case Number, y-axis: Residuals]
The figure shows a plot of the residuals with error bars showing 95% confidence
intervals on the residuals. All the error bars pass through the zero line,
indicating that there are no outliers in the data.
See Also regress
refcurve
Purpose Add a polynomial curve to the current plot.
Syntax h = refcurve(p)
Description refcurve adds a graph of the polynomial p to the current axes. The function for
a polynomial of degree n is:
\[ y = p_1 x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1} \]
Note that $p_1$ goes with the highest order term.
h = refcurve(p) returns the handle to the curve.
Example Plot data for the height of a rocket against time, and add a reference curve
showing the theoretical height (assuming no air friction). The initial velocity of
the rocket is 100 m/sec.
h = [85 162 230 289 339 381 413 437 452 458 456 440 400 356];
plot(h,'+')
refcurve([-4.9 100 0])
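To see how the coefficient vector passed to refcurve maps to a curve, the same polynomial can be evaluated with polyval from base MATLAB. This is a minimal sketch; it assumes, as the plot(h,'+') call implies, that the observations are taken at t = 1, 2, ..., 14 seconds, and the variable names are illustrative.

```matlab
% Illustrative sketch: theoretical rocket height -4.9*t^2 + 100*t + 0.
p = [-4.9 100 0];          % coefficients in descending powers of t
t = 1:14;                  % assumed observation times in seconds
hTheory = polyval(p, t);   % height of the reference curve at each time
disp(hTheory([1 10]))
```

The first coefficient multiplies the highest power, so the theoretical height is 95.1 m at t = 1 and 510 m at t = 10.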
[Figure: rocket height data ('+') with the theoretical reference curve]
See Also polyfit, polyval, refline
refline
Purpose Add a reference line to the current axes.
Syntax refline(slope,intercept)
refline(slope)
h = refline(slope,intercept)
refline
Description refline(slope,intercept) adds a reference line with the given slope and
intercept to the current axes.
refline(slope), where slope is a two-element vector, adds the line
y = slope(2) + slope(1)*x
to the figure.
h = refline(slope,intercept) returns the handle to the line.
refline with no input arguments superimposes the least squares line on each
line object in the current figure (except LineStyles '-','--','.-'). This
behavior is equivalent to lsline.
Example y = [3.2 2.6 3.1 3.4 2.4 2.9 3.0 3.3 3.2 2.1 2.6]';
plot(y,'+')
refline(0,3)
[Figure: the data points ('+') with a horizontal reference line at y = 3]
See Also lsline, polyfit, polyval, refcurve
regress
Purpose Multiple linear regression.
Syntax b = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X,alpha)
Description b = regress(y,X) returns the least squares fit of y on X by solving the linear
model
\[ y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \]
for β, where:
• y is an n-by-1 vector of observations
• X is an n-by-p matrix of regressors
• β is a p-by-1 vector of parameters
• ε is an n-by-1 vector of random disturbances
[b,bint,r,rint,stats] = regress(y,X) returns an estimate of β in b and a 95%
confidence interval for β in the p-by-2 matrix bint. The residuals are returned
in r and a 95% confidence interval for each residual is returned in the n-by-2
matrix rint. The vector stats contains the R² statistic along with the F and p
values for the regression.
[b,bint,r,rint,stats] = regress(y,X,alpha) gives 100(1-alpha)%
confidence intervals for bint and rint. For example, alpha = 0.2 gives 80%
confidence intervals.
Examples Suppose the true model is
\[ y = 10 + x + \varepsilon, \qquad \varepsilon \sim N(0, 0.01 I) \]
where I is the identity matrix.
X = [ones(10,1) (1:10)']
X =
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
y = X*[10;1] + normrnd(0,0.1,10,1)
y =
11.1165
12.0627
13.0075
14.0352
14.9303
16.1696
17.0059
18.1797
19.0264
20.0872
[b,bint] = regress(y,X,0.05)
b =
10.0456
1.0030
bint =
9.9165 10.1747
0.9822 1.0238
Compare b to [10 1]'. Note that bint includes the true model values.
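The point estimate in this example can be reproduced in spirit with the base MATLAB backslash operator, which returns a least squares solution for an overdetermined system. The following minimal sketch uses noise-free responses so that the expected answer is exact. It assumes only base MATLAB and does not reproduce the confidence intervals that regress returns.

```matlab
% Illustrative sketch: least squares coefficients via backslash.
X = [ones(5,1) (1:5)'];   % constant column plus one regressor
y = X*[10;1];             % noise-free responses from the true model
b = X\y;                  % least squares solution of y = X*b
r = y - X*b;              % residuals, essentially zero here
disp(b')
```

With noise added to y, b would differ from [10 1]' by a small amount, as in the toolbox example above.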
Reference Chatterjee, S. and A. S. Hadi. Influential Observations, High Leverage Points,
and Outliers in Linear Regression. Statistical Science, 1986. pp. 379–416.
regstats
Purpose Regression diagnostics graphical user interface.
Syntax regstats(responses,DATA)
regstats(responses,DATA,'model')
Description regstats(responses,DATA) generates regression diagnostics for a linear
additive model with a constant term. The dependent variable is the vector
responses. Values of the independent variables are in the matrix DATA.
The function creates a figure with a group of check boxes that save diagnostic
statistics to the base workspace using variable names you can specify.
regstats(responses,DATA,'model') controls the order of the regression
model, where 'model' can be one of these strings:
• 'interaction' includes constant, linear, and cross product terms
• 'quadratic' includes interactions and squared terms
• 'purequadratic' includes constant, linear, and squared terms
The literature suggests many diagnostic statistics for evaluating multiple
linear regression. regstats provides these diagnostics:
• Q from QR decomposition
• R from QR decomposition
• Regression coefficients
• Covariance of regression coefficients
• Fitted values of the response data
• Residuals
• Mean squared error
• Leverage
• Hat matrix
• Delete-1 variance
• Delete-1 coefficients
• Standardized residuals
• Studentized residuals
• Change in regression coefficients
• Change in fitted values
• Scaled change in fitted values
• Change in covariance
• Cook's distance
For more detail, press the Help button in the regstats window. This provides
formulae and interpretations for each of these regression diagnostics.
Algorithm The usual regression model is y = Xβ + ε, where:
• y is an n-by-1 vector of responses
• X is an n-by-p matrix of predictors
• β is a p-by-1 vector of parameters
• ε is an n-by-1 vector of random disturbances
Let X = Q*R where Q and R come from a QR decomposition of X. Q is orthogonal
and R is triangular. Both of these matrices are useful for calculating many
regression diagnostics (Goodall 1993).
The standard textbook equation for the least squares estimator of β is
\[ \hat{\beta} = b = (X'X)^{-1} X'y \]
However, this definition has poor numeric properties. Particularly dubious is
the computation of $(X'X)^{-1}$, which is both expensive and imprecise.
Numerically stable MATLAB code for β is
b = R\(Q'*y);
Reference Goodall, C. R. (1993). Computation using the QR decomposition. Handbook in
Statistics, Volume 9. Statistical Computing (C. R. Rao, ed.). Amsterdam, NL:
Elsevier/North-Holland.
See Also leverage, stepwise, regress
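As a supplement to the regstats algorithm note, the QR-based estimator can be compared with the textbook normal-equations form on a small problem. This is a minimal sketch assuming only base MATLAB (the economy-size form qr(X,0)); the response values are made up for illustration.

```matlab
% Illustrative sketch: least squares coefficients via QR versus
% the normal equations.
X = [ones(5,1) (1:5)'];
y = [11.1; 11.9; 13.2; 13.8; 15.0];   % made-up responses
[Q,R] = qr(X,0);                      % economy-size QR: X = Q*R
bQR = R\(Q'*y);                       % estimator computed from Q and R
bNE = (X'*X)\(X'*y);                  % normal equations, for comparison only
disp([bQR bNE])
```

On a well-conditioned problem like this one the two answers agree to rounding error. The difference between the approaches matters when X is poorly conditioned.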
ridge
2-290
2r idge
Purpose Par amet er est imat es for r idge r egr ession.
Syntax b = ridge(y,X,k)
Description b = ridge(y,X,k) r et ur ns t he r idge r egr ession coefficient s b for t he linear
model y = X + , wher e:
X is an n-by-p mat r ix
y is t he n-by-1 vect or of obser vat ions
k is a scalar const ant (t he r idge par amet er )
The r idge est imat or of is .
When k = 0, b is t he least squar es est imat or . For incr easing k, t he bias of b
incr eases, but t he var iance of b falls. For poor ly condit ioned X, t he dr op in t he
var iance mor e t han compensat es for t he bias.
Example This example shows how t he coefficient s change as t he value of k incr eases,
using dat a fr om t he hald dat aset .
load hald;
b = zeros(4,100);
kvec = 0.01:0.01:1;
count = 0;
for k = 0.01:0.01:1
count = count + 1;
b(:,count) = ridge(heat,ingredients,k);
end
plot(kvec',b'),xlabel('k'),ylabel('b','FontName','Symbol')
See Also regress, stepwise
[Figure: ridge coefficient estimates (vertical axis, -10 to 10) plotted against k (horizontal axis, 0 to 1).]
robustdemo
Purpose Demo of robust regression.
Syntax robustdemo
robustdemo(X,Y)
Description robustdemo demonstrates robust regression and ordinary least squares regression
on a sample dataset. The function creates a figure window containing a scatter
plot of sample data vectors X and Y, along with two fitted lines calculated using
least squares and the robust bisquare method. The bottom of the figure shows
the equations of the lines and the estimated error standard deviations for each
fit. If you use the left mouse button to select a point and move it to a new
location, both fits will update. If you hold down the right mouse button over any
point, the point will be labeled with the leverage of that point on the least
squares fit, and the weight of that point in the robust fit.
robustdemo(X,Y) performs the same demonstration using the X and Y values that
you specify.
Example See "The robustdemo Demo" on page 1-172.
See Also robustfit, leverage
robustfit
Purpose Robust regression.
Syntax b = robustfit(X,Y)
[b,stats] = robustfit(X,Y)
[b,stats] = robustfit(X,Y,'wfun',tune,'const')
Description b = robustfit(X,Y) uses robust regression to fit Y as a function of the
columns of X, and returns the vector b of coefficient estimates. The robustfit
function uses an iteratively reweighted least squares algorithm, with the
weights at each iteration calculated by applying the bisquare function to the
residuals from the previous iteration. This algorithm gives lower weight to
points that do not fit well. The results are less sensitive to outliers in the data
than those of ordinary least squares regression.
[b,stats] = robustfit(X,Y) also returns a stats structure with the
following fields:
stats.ols_s sigma estimate (rmse) from least squares fit
stats.robust_s robust estimate of sigma
stats.mad_s estimate of sigma computed using the median absolute
deviation of the residuals from their median; used for scaling residuals
during the iterative fitting
stats.s final estimate of sigma, the larger of robust_s and a weighted
average of ols_s and robust_s
stats.se standard error of coefficient estimates
stats.t ratio of b to stats.se
stats.p p-values for stats.t
stats.coeffcorr estimated correlation of coefficient estimates
stats.w vector of weights for robust fit
stats.h vector of leverage values for least squares fit
stats.dfe degrees of freedom for error
stats.R R factor in QR decomposition of X matrix
The robustfit function estimates the variance-covariance matrix of the
coefficient estimates as V = inv(X'*X)*stats.s^2. The standard errors and
correlations are derived from V.
[b,stats] = robustfit(X,Y,'wfun',tune,'const') specifies a weight
function, a tuning constant, and the presence or absence of a constant term.
The weight function 'wfun' can be any of the names listed in the following
table.
Weight function   Meaning                               Tuning constant
'andrews'         w = (abs(r)<pi) .* sin(r) ./ r        1.339
'bisquare'        w = (abs(r)<1) .* (1 - r.^2).^2       4.685
'cauchy'          w = 1 ./ (1 + r.^2)                   2.385
'fair'            w = 1 ./ (1 + abs(r))                 1.400
'huber'           w = 1 ./ max(1, abs(r))               1.345
'logistic'        w = tanh(r) ./ r                      1.205
'talwar'          w = 1 * (abs(r)<1)                    2.795
'welsch'          w = exp(-(r.^2))                      2.985
The value r in the weight function expression is equal to
resid/(tune*s*sqrt(1-h))
where resid is the vector of residuals from the previous iteration, tune is the
tuning constant, h is the vector of leverage values from a least squares fit, and
s is an estimate of the standard deviation of the error term.
s = MAD/0.6745
The quantity MAD is the median absolute deviation of the residuals from their
median. The constant 0.6745 makes the estimate unbiased for the normal
distribution. If there are p columns in the X matrix (including the constant
term, if any), the smallest p-1 absolute deviations are excluded when
computing their median.
In addition to the function names listed above, 'wfun' can be 'ols' to perform
unweighted ordinary least squares.
The argument tune overrides the default tuning constant from the table. A
smaller tuning constant tends to downweight large residuals more severely,
and a larger tuning constant downweights large residuals less severely. The
default tuning constants, shown in the table, yield coefficient estimates that
are approximately 95% as efficient as least squares estimates, when the
response has a normal distribution with no outliers. The value of 'const' can
be 'on' (the default) to add a constant term or 'off' to omit it. If you want a
constant term, you should set 'const' to 'on' rather than adding a column of
ones to your X matrix.
As an alternative to specifying one of the named weight functions shown above,
you can write your own weight function that takes a vector of scaled residuals
as input and produces a vector of weights as output. You can specify 'wfun'
using @ (for example, @myfun) or as an inline function.
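To make the scaling and weighting step concrete, the sketch below computes bisquare weights for one iteration. The residuals and leverages are hypothetical values, and for brevity the sketch does not exclude the smallest p-1 absolute deviations when computing MAD, so it is an illustration of the formulas rather than of robustfit's exact internals.

```matlab
% Hypothetical residuals and leverages from a previous iteration.
resid = [0.2; -0.5; 0.1; 8.0; -0.3];   % the fourth point is an outlier
h     = [0.2;  0.3; 0.2; 0.4;  0.3];
tune  = 4.685;                          % default bisquare tuning constant

% Scale estimate s = MAD/0.6745 (simplified: no deviations are excluded).
s = median(abs(resid - median(resid))) / 0.6745;

% Scaled residuals, as in the expression resid/(tune*s*sqrt(1-h)).
r = resid ./ (tune*s*sqrt(1-h));

% Bisquare weights from the table: zero once abs(r) reaches 1.
w = (abs(r)<1) .* (1 - r.^2).^2;
```

Here the outlier receives weight zero while the remaining points keep weights close to one, which is the behavior the description attributes to the bisquare function.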
Example Let's see how a single erroneous point affects least squares and robust fits.
First we generate a simple dataset following the equation y = 10-2*x plus some
random noise. Then we change one y value to simulate an outlier that could be
an erroneous measurement.
x = (1:10)';
y = 10 - 2*x + randn(10,1);
y(10) = 0;
We use both ordinary least squares and robust fitting to estimate the equations
of a straight line fit.
bls = regress(y,[ones(10,1) x])
bls =
8.6305
-1.4721
brob = robustfit(x,y)
brob =
10.5089
-1.9844
A scatter plot with both fitted lines shows that the robust fit (solid line) fits
most of the data points well but ignores the outlier. The least squares fit (dotted
line) is pulled toward the outlier.
scatter(x,y)
hold on
plot(x,bls(1)+bls(2)*x,'g:')
plot(x,brob(1)+brob(2)*x,'r-')
See Also regress, robustdemo
References DuMouchel, W.H., and F.L. O'Brien (1989), "Integrating a robust option into a
multiple regression computing environment," Computer Science and Statistics:
Proceedings of the 21st Symposium on the Interface, Alexandria, VA: American
Statistical Association.
Holland, P.W., and R.E. Welsch (1977), "Robust regression using iteratively
reweighted least-squares," Communications in Statistics: Theory and Methods,
A6, 813-827.
Huber, P.J. (1981), Robust Statistics, New York: Wiley.
Street, J.O., R.J. Carroll, and D. Ruppert (1988), "A note on computing robust
regression estimates via iteratively reweighted least squares," The American
Statistician, 42, 152-154.
[Figure: scatter plot of x (1 to 10) against y with the least squares fit (dotted line) and the robust fit (solid line).]
rowexch
Purpose D-optimal design of experiments (row exchange algorithm).
Syntax settings = rowexch(nfactors,nruns)
[settings,X] = rowexch(nfactors,nruns)
[settings,X] = rowexch(nfactors,nruns,'model')
Description settings = rowexch(nfactors,nruns) generates the factor settings matrix,
settings, for a D-optimal design using a linear additive model with a constant
term. settings has nruns rows and nfactors columns.
[settings,X] = rowexch(nfactors,nruns) also generates the associated
design matrix X.
[settings,X] = rowexch(nfactors,nruns,'model') produces a design for
fitting a specified regression model. The input, 'model', can be one of these
strings:
'interaction' includes constant, linear, and cross product terms.
'quadratic' includes interactions plus squared terms.
'purequadratic' includes constant, linear, and squared terms.
Example This example illustrates that the D-optimal design for three factors in eight
runs, using an interactions model, is a two-level full-factorial design.
s = rowexch(3,8,'interaction')
s =
-1 -1 1
1 -1 -1
1 -1 1
-1 -1 -1
-1 1 1
1 1 1
-1 1 -1
1 1 -1
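The settings above can be checked by hand. This sketch rebuilds an interaction-model design matrix from the printed settings and evaluates det(X'*X), the quantity a D-optimal design seeks to maximize. The column ordering of X is an assumption made for illustration and need not match the X that rowexch returns.

```matlab
% Factor settings copied from the example output above.
s = [-1 -1  1;  1 -1 -1;  1 -1  1; -1 -1 -1; ...
     -1  1  1;  1  1  1; -1  1 -1;  1  1 -1];

% Interaction model: constant, linear, and cross product terms.
X = [ones(8,1) s s(:,1).*s(:,2) s(:,1).*s(:,3) s(:,2).*s(:,3)];

% For a two-level full factorial the columns are orthogonal, so
% X'*X = 8*eye(7) and the determinant is 8^7.
d = det(X'*X);

% Every one of the 2^3 factor combinations appears exactly once.
ncombos = size(unique(s,'rows'),1);
```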
See Also cordexch, daugment, dcovary, fullfact, ff2n, hadamard
rsmdemo
Purpose Demo of design of experiments and surface fitting.
Syntax rsmdemo
Description rsmdemo creates a GUI that simulates a chemical reaction. To start, you have
a budget of 13 test reactions. Try to find out how changes in each reactant affect
the reaction rate. Determine the reactant settings that maximize the reaction
rate. Estimate the run-to-run variability of the reaction. Now run a designed
experiment using the model pop-up. Compare your previous results with the
output from response surface modeling or nonlinear modeling of the reaction.
The GUI has the following elements:
A Run button to perform one reactor run at the current settings
An Export button to export the x and y data to the base workspace
Three sliders with associated data entry boxes to control the partial
pressures of the chemical reactants: Hydrogen, n-Pentane, and Isopentane
A text box to report the reaction rate
A text box to keep track of the number of test reactions you have left
Example See "The rsmdemo Demo" on page 1-170.
See Also rstool, nlintool, cordexch
rstool
Purpose Interactive fitting and visualization of a response surface.
Syntax rstool(x,y)
rstool(x,y,'model')
rstool(x,y,'model',alpha,'xname','yname')
Description rstool(x,y) displays an interactive prediction plot with 95% global confidence
intervals. This plot results from a multiple regression of (x,y) data using a
linear additive model.
rstool(x,y,'model') allows control over the initial regression model, where
'model' can be one of the following strings:
'interaction' includes constant, linear, and cross product terms
'quadratic' includes interactions and squared terms
'purequadratic' includes constant, linear, and squared terms
rstool(x,y,'model',alpha) plots 100(1-alpha)% global confidence intervals
for predictions as two red curves. For example, alpha = 0.01 gives 99%
confidence intervals.
rstool displays a vector of plots, one for each column of the matrix of
inputs x. The response variable, y, is a column vector that matches the number
of rows in x.
rstool(x,y,'model',alpha,'xname','yname') labels the graph using the
string matrix 'xname' for the labels to the x-axes and the string, 'yname', to
label the y-axis common to all the plots.
Drag the dotted white reference line and watch the predicted values update
simultaneously. Alternatively, you can get a specific prediction by typing the
value of x into an editable text field. Use the pop-up menu labeled Model to
interactively change the model. Use the pop-up menu labeled Export to move
specified variables to the base workspace.
Example See "Quadratic Response Surface Models" on page 1-86.
See Also nlintool
schart
Purpose Chart of standard deviation for Statistical Process Control.
Syntax schart(DATA)
schart(DATA,conf)
schart(DATA,conf,specs)
[outliers,h] = schart(DATA,conf,specs)
Description schart(DATA) displays an S chart of the grouped responses in DATA. The rows
of DATA contain replicate observations taken at a given time. The rows must be
in time order. The graph contains the sample standard deviation s for each
group, a center line at the average s value, and upper and lower control limits.
The limits are placed at a three-sigma distance on either side of the center line,
where sigma is an estimate of the standard deviation of s. If the process is in
control, fewer than 3 out of 1000 observations would be expected to fall outside
the control limits by random chance. So, if you observe points outside the
limits, you can take this as evidence that the process is not in control.
schart(DATA,conf) allows control of the confidence level of the upper and
lower plotted control limits. The default conf = 0.9973 produces three-sigma
limits.
norminv(1 - (1-.9973)/2)
ans =
3
To get k-sigma limits, use the expression 1-2*(1-normcdf(k)). For example,
the correct conf value for 2-sigma limits is 0.9545, as shown below.
k = 2;
1-2*(1-normcdf(k))
ans =
0.9545
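The two expressions above are inverses of one another. As a small sketch, the conversion can be run in both directions for several values of k; the vector of k values is chosen only for illustration.

```matlab
% Sigma multiples to convert (illustrative values).
k = [1 2 3];

% Confidence level corresponding to k-sigma limits.
conf = 1 - 2*(1 - normcdf(k));

% Recover the sigma multiple from the confidence level.
kback = norminv(1 - (1 - conf)/2);

% Largest round-trip discrepancy; should be near machine precision.
roundtrip_err = max(abs(kback - k));
```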
schart(DATA,conf,specs) plots the specification limits in the two-element
vector specs.
[outliers,h] = schart(DATA,conf,specs) returns outliers, a vector of
indices to the rows where the standard deviation of DATA is out of control, and
h, a vector of handles to the plotted lines.
Example This example plots an S chart of measurements on newly machined parts,
taken at one-hour intervals for 36 hours. Each row of the runout matrix
contains the measurements for 4 parts chosen at random. The values indicate,
in thousandths of an inch, the amount the part radius differs from the target
radius.
load parts
schart(runout)
All points are within the control limits, so the variability within subgroups is
consistent with what would be expected by random chance. There is no
evidence that the process is out of control.
Reference Montgomery, D., Introduction to Statistical Quality Control, John Wiley and
Sons 1991. p. 235.
See Also capaplot, ewmaplot, histfit, xbarplot
[Figure: S Chart. Horizontal axis: Sample Number (0 to 40). Vertical axis: Standard Deviation (0 to 0.35). Lines labeled UCL, CL, and LCL.]
signrank
Purpose Wilcoxon signed rank test of equality of medians.
Syntax p = signrank(x,y,alpha)
[p,h] = signrank(x,y,alpha)
[p,h,stats] = signrank(x,y,alpha)
Description p = signrank(x,y,alpha) returns the significance probability (p-value) for a
test of the null hypothesis that the medians of two matched samples, x and y,
are equal. x and y must be vectors of equal length. y may also be a scalar; in
this case, signrank tests whether the median of x differs from the constant y.
alpha is the desired level of significance, and must be a scalar between zero
and one.
[p,h] = signrank(x,y,alpha) also returns the result of the hypothesis
test, h. h is zero if the difference in medians of x and y is not significantly
different from zero. h is one if the two medians are significantly different.
p is the probability of observing a result equally or more extreme than the one
obtained from the data (x and y) if the null hypothesis is true. p is calculated
using the rank values for the differences between corresponding elements in x
and y. If p is near zero, this casts doubt on the null hypothesis.
[p,h,stats] = signrank(x,y,alpha) also returns a structure stats
containing the field stats.signedrank whose value is the signed rank
statistic. For large samples, it also contains stats.zval, the value of the
normal (Z) statistic used to compute p.
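The ranking step behind the test can be sketched by hand. The data below are hypothetical and have no tied or zero differences; which of the two rank sums signrank reports in stats.signedrank is not stated here, so both are computed.

```matlab
% Hypothetical matched samples with distinct nonzero differences.
x = [5.1 4.8 6.0 5.5 5.9];
y = [4.9 5.2 5.0 5.4 5.1];

% Paired differences and the ranks of their absolute values.
d = x - y;
[sorted_abs, idx] = sort(abs(d));
ranks = zeros(size(d));
ranks(idx) = 1:length(d);

% Sum of ranks for positive and for negative differences.
Wpos = sum(ranks(d > 0));
Wneg = sum(ranks(d < 0));
```

The two sums always add up to n(n+1)/2, so a strongly unbalanced pair of sums is what signals a difference in medians.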
Example This example tests the hypothesis of equality of medians for two samples
generated with normrnd. The samples have the same theoretical mean (and
therefore the same median, since the normal distribution is symmetric) but
different standard deviations.
x = normrnd(0,1,20,1);
y = normrnd(0,2,20,1);
[p,h] = signrank(x,y,0.05)
p =
0.2959
h =
0
See Also ranksum, signtest, ttest
signtest
Purpose Sign test for paired samples.
Syntax p = signtest(x,y,alpha)
[p,h] = signtest(x,y,alpha)
[p,h,stats] = signtest(x,y,alpha)
Description p = signtest(x,y,alpha) returns the significance probability (p-value) for a
test of the null hypothesis that the medians of two matched samples, x and y,
are equal. x and y must be vectors of equal length. y may also be a scalar; in
this case, signtest tests whether the median of x differs from the constant y.
alpha is the desired level of significance and must be a scalar between zero
and one.
[p,h] = signtest(x,y,alpha) also returns the result of the hypothesis test,
h. h is 0 if the difference in medians of x and y is not significantly different from
zero. h is 1 if the two medians are significantly different.
p is the probability of observing a result equally or more extreme than the one
obtained from the data (x and y) if the null hypothesis is true. p is calculated
using the signs (plus or minus) of the differences between corresponding
elements in x and y. If p is near zero, this casts doubt on the null hypothesis.
[p,h,stats] = signtest(x,y,alpha) also returns a structure stats
containing the field stats.sign whose value is the sign statistic. For large
samples, it also contains stats.zval, the value of the normal (Z) statistic used
to compute p.
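A sign-based p-value can be sketched with the binomial distribution. This is one common exact two-sided formulation, given as an assumption for illustration; it is not confirmed to be the computation signtest performs, and the data are hypothetical.

```matlab
% Hypothetical matched samples with no zero differences.
x = [5.1 4.8 6.0 5.5 5.9];
y = [4.9 5.2 5.0 5.4 5.1];

% Count positive and negative differences.
d = x - y;
npos = sum(d > 0);
nneg = sum(d < 0);
n = npos + nneg;

% Under the null hypothesis each sign is equally likely, so the
% smaller count follows a binomial(n, 0.5) distribution.
p = min(1, 2*binocdf(min(npos,nneg), n, 0.5));
```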
Example This example tests the hypothesis of equality of medians for two samples
generated with normrnd. The samples have the same theoretical median but
different standard deviations. (For the normal distribution, the mean and
median are the same.)
x = normrnd(0,1,20,1);
y = normrnd(0,2,20,1);
[p,h] = signtest(x,y,0.05)
p =
0.2632
h =
0
See Also ranksum, signrank, ttest
skewness
Purpose Sample skewness.
Syntax y = skewness(X)
y = skewness(X,flag)
Description y = skewness(X) returns the sample skewness of X. For vectors, skewness(x)
is the skewness of the elements of x. For matrices, skewness(X) is a row vector
containing the sample skewness of each column.
Skewness is a measure of the asymmetry of the data around the sample mean.
If skewness is negative, the data are spread out more to the left of the mean
than to the right. If skewness is positive, the data are spread out more to the
right. The skewness of the normal distribution (or any perfectly symmetric
distribution) is zero.
The skewness of a distribution is defined as
$$y = \frac{E(x-\mu)^3}{\sigma^3}$$
where $\mu$ is the mean of x, $\sigma$ is the standard deviation of x, and E(t) represents
the expected value of the quantity t.
y = skewness(X,flag) specifies whether to correct for bias (flag = 0) or not
(flag = 1, the default). When X represents a sample from a population, the
skewness of X is biased; that is, it will tend to differ from the population
skewness by a systematic amount that depends on the size of the sample. You
can set flag = 0 to correct for this systematic bias.
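The definition can be applied to a sample by replacing expectations with sample averages. The sketch below does this for a hypothetical vector, normalizing by n throughout (the uncorrected form); it is an illustration of the formula, not the skewness source code.

```matlab
% Hypothetical sample with a long right tail.
x = [1 2 3 4 10];

% Sample mean and standard deviation normalized by n.
m = mean(x);
s = sqrt(mean((x - m).^2));

% Third central moment divided by the cube of the standard deviation.
y = mean((x - m).^3) / s^3;
```

The result is positive, consistent with the data being spread out more to the right of the mean.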
Example X = randn([5 4])
X =
1.1650 1.6961 -1.4462 -0.3600
0.6268 0.0591 -0.7012 -0.1356
0.0751 1.7971 1.2460 -1.3493
0.3516 0.2641 -0.6390 -1.2704
-0.6965 0.8717 0.5774 0.9846
y = skewness(X)
y =
-0.2933 0.0482 0.2735 0.4641
See Also kurtosis, mean, moment, std, var
squareform
Purpose Reformat the output of pdist into a square matrix.
Syntax S = squareform(Y)
Description S = squareform(Y) reformats the distance information returned by pdist
from a vector into a square matrix. In this format, S(i,j) denotes the distance
between the ith and jth observations in the original data.
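The reformatting can be sketched without calling squareform. The distances below are hypothetical, and the ordering of pairs in Y, namely (1,2), (1,3), (2,3), is an assumption about the pdist output made for illustration.

```matlab
% Hypothetical pairwise distances for m = 3 observations,
% assumed to be ordered as d12, d13, d23.
Y = [1 2 3];
m = 3;

% Fill the strict lower triangle column by column, then mirror it.
S = zeros(m);
S(logical(tril(ones(m),-1))) = Y;
S = S + S';
```

The result is symmetric with zeros on the diagonal, so S(i,j) and S(j,i) both give the distance between observations i and j.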
See Also pdist
std
Purpose Standard deviation of a sample.
Syntax y = std(X)
Description y = std(X) computes the sample standard deviation of the data in X. For
vectors, std(x) is the standard deviation of the elements in x. For matrices,
std(X) is a row vector containing the standard deviation of each column of X.
std normalizes by n-1 where n is the sequence length. For normally distributed
data, the square of the standard deviation is the minimum variance unbiased
estimator of $\sigma^2$ (the second parameter).
The standard deviation is
$$s = \left( \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^{\frac{1}{2}}$$
where the sample average is $\bar{x} = \frac{1}{n} \sum x_i$.
The std function is part of the standard MATLAB language.
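As a quick check of the formula, the following sketch evaluates it term by term for a two-element vector and compares the result with std.

```matlab
% Two-element sample; its standard deviation is sqrt(2).
x = [-1 1];
n = length(x);

% Sample average, then the n-1 normalized standard deviation.
xbar = sum(x)/n;
s = sqrt(sum((x - xbar).^2)/(n-1));

% Difference from the built-in function.
err = abs(s - std(x));
```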
Examples In each column, the expected value of y is approximately one.
x = normrnd(0,1,100,6);
y = std(x)
y =
0.9536 1.0628 1.0860 0.9927 0.9605 1.0254
y = std(-1:2:1)
y =
1.4142
See Also cov, var
stepwise
Purpose Interactive environment for stepwise regression.
Syntax stepwise(X,y)
stepwise(X,y,inmodel)
stepwise(X,y,inmodel,alpha)
Description stepwise(X,y) fits a regression model of y on the columns of X. It displays
three figure windows for interactively controlling the stepwise addition and
removal of model terms.
stepwise(X,y,inmodel) allows control of the terms in the original regression
model. The values of the vector inmodel are the indices of the columns of the
matrix X to include in the initial model.
stepwise(X,y,inmodel,alpha) allows control of the length of the confidence
intervals on the fitted coefficients. alpha is the significance for testing each
term in the model. By default, alpha = 1 - (1 - 0.025)^(1/p), where p is the number
of columns in X. This translates to plotted 95% simultaneous confidence
intervals (Bonferroni) for all the coefficients.
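The default alpha is easy to evaluate for a given number of columns. The sketch below uses p = 4 as an illustrative value and confirms that applying the per-term level to all p terms recovers the overall 0.975 level in the formula.

```matlab
% Number of columns in X (illustrative value).
p = 4;

% Default per-term significance level from the formula above.
alpha = 1 - (1 - 0.025)^(1/p);

% Applying the per-term level to all p terms recovers 1 - 0.025.
overall = (1 - alpha)^p;
```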
The least squares coefficient is plotted with a green filled circle. A coefficient is
not significantly different from zero if its confidence interval crosses the white
zero line. Significant model terms are plotted using solid lines. Terms not
significantly different from zero are plotted with dotted lines.
Click on the confidence interval lines to toggle the state of the model
coefficients. If the confidence interval line is green, the term is in the model. If
the confidence interval line is red, the term is not in the model.
Use the Export menu to move variables to the base workspace.
Example See "Stepwise Regression" on page 1-88.
Reference Draper, N. and H. Smith, Applied Regression Analysis, Second Edition, John
Wiley and Sons, Inc. 1981 pp. 307-312.
See Also regstats, regress, rstool
surfht
Purpose Interactive contour plot.
Syntax surfht(Z)
surfht(x,y,Z)
Description surfht(Z) is an interactive contour plot of the matrix Z, treating the values in
Z as height above the plane. The x-values are the column indices of Z while the
y-values are the row indices of Z.
surfht(x,y,Z), where x and y are vectors, specifies the x-axis and y-axis values
on the contour plot. The length of x must match the number of columns in Z, and
the length of y must match the number of rows in Z.
There are vertical and horizontal reference lines on the plot whose intersection
defines the current x-value and y-value. You can drag these dotted white
reference lines and watch the interpolated z-value (at the top of the plot)
update simultaneously. Alternatively, you can get a specific interpolated
z-value by typing the x-value and y-value into editable text fields on the x-axis
and y-axis respectively.
tabulate
Purpose Frequency table.
Syntax table = tabulate(x)
tabulate(x)
Description table = tabulate(x) takes a vector of positive integers, x, and returns a
matrix, table.
The first column of table contains the values of x. The second contains the
number of instances of each value. The last column contains the percentage of
each value.
tabulate with no output arguments displays a formatted table in the
command window.
Example tabulate([1 2 4 4 3 4])
Value Count Percent
1 1 16.67%
2 1 16.67%
3 1 16.67%
4 3 50.00%
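The same counts can be built by hand, which makes the meaning of the three columns explicit. This is a minimal sketch for positive integer data, not the tabulate source code; the variable name tbl is chosen for illustration.

```matlab
% Same data as in the example above.
x = [1 2 4 4 3 4];

% Candidate values 1..max(x) and a counter for each.
vals = (1:max(x))';
counts = zeros(size(vals));
for i = 1:length(x)
    counts(x(i)) = counts(x(i)) + 1;
end

% Percentage of observations taking each value.
pct = 100*counts/length(x);

% Columns: value, count, percent.
tbl = [vals counts pct];
```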
See Also pareto
tblread
Purpose Read tabular data from the file system.
Syntax [data,varnames,casenames] = tblread
[data,varnames,casenames] = tblread('filename')
[data,varnames,casenames] = tblread('filename','delimiter')
Description [data,varnames,casenames] = tblread displays the File Open dialog box for
interactive selection of the tabular data file. The file format has variable names
in the first row, case names in the first column, and data starting in the (2,2)
position.
[data,varnames,casenames] = tblread('filename') allows command line
specification of the name of a file in the current directory, or the complete
pathname of any file.
[data,varnames,casenames] = tblread('filename','delimiter') allows
specification of the field 'delimiter' in the file. Accepted values are 'tab',
'space', or 'comma'.
tblread returns the data read in three values.
Return Value   Description
data           Numeric matrix with a value for each variable-case pair.
varnames       String matrix containing the variable names in the first row.
casenames      String matrix containing the names of each case in the first column.
Example [data,varnames,casenames] = tblread('sat.dat')
data =
470 530
520 480
varnames =
Male
Female
casenames =
Verbal
Quantitative
See Also caseread, tblwrite, tdfread
tblwrite
Purpose Write tabular data to the file system.
Syntax tblwrite(data,'varnames','casenames')
tblwrite(data,'varnames','casenames','filename')
Description tblwrite(data,'varnames','casenames') displays the File Open dialog box
for interactive specification of the tabular data output file. The file format has
variable names in the first row, case names in the first column, and data
starting in the (2,2) position.
'varnames' is a string matrix containing the variable names. 'casenames' is
a string matrix containing the names of each case in the first column. data is
a numeric matrix with a value for each variable-case pair.
tblwrite(data,'varnames','casenames','filename') allows command line
specification of a file in the current directory, or the complete pathname of any
file in the string 'filename'.
Example Continuing the example from tblread:
tblwrite(data,varnames,casenames,'sattest.dat')
type sattest.dat
Male Female
Verbal 470 530
Quantitative 520 480
See Also casewrite, tblread
tcdf
Purpose Student's t cumulative distribution function (cdf).
Syntax P = tcdf(X,V)
Description P = tcdf(X,V) computes Student's t cdf at each of the values in X using the
corresponding degrees of freedom in V. Vector or matrix inputs for X and V must
be the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other inputs. The parameters in V must be positive integers.
The t cdf is
$$p = F(x|\nu) = \int_{-\infty}^{x} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1+\frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} \, dt$$
The result, p, is the probability that a single observation from the t distribution
with $\nu$ degrees of freedom will fall in the interval $(-\infty, x]$.
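The integral can be approximated numerically as a sanity check on the formula. This sketch integrates the integrand with trapz over a truncated range; the lower limit of -200 and the grid size are arbitrary choices made for illustration, and the values x = -1 and nu = 9 are used only as a test point.

```matlab
% Evaluation point and degrees of freedom (illustrative values).
nu = 9;
x = -1;

% Truncate the lower limit; the tail below -200 is negligible here.
t = linspace(-200, x, 400001);

% Integrand of the t cdf, written term by term as in the formula.
f = gamma((nu+1)/2) / (gamma(nu/2)*sqrt(nu*pi)) ...
    .* (1 + t.^2/nu).^(-(nu+1)/2);

% Trapezoidal approximation to the cdf at x.
p = trapz(t, f);
```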
Examples Suppose 10 samples of Guinness beer have a mean alcohol content of 5.5% by
volume and the standard deviation of these samples is 0.5%. What is the
probability that the true alcohol content of Guinness beer is less than 5%?
t = (5.0 - 5.5) / 0.5;
probability = tcdf(t,10 - 1)
probability =
0.1717
See Also cdf, tinv, tpdf, trnd, tstat
tdfread
Purpose Read file containing tab-delimited numeric and text values.
Syntax tdfread
tdfread('filename')
tdfread('filename','delimiter')
Description tdfread displays the File Open dialog box for interactive selection of the data
file. The file should consist of columns of values, separated by tabs, and with
column names in the first line of the file. Each column is read from the file and
assigned to a variable with the specified name. If all values for a column are
numeric, the variable is converted to numbers; otherwise the variable is a
string matrix. After all values are imported, tdfread displays information
about the imported values using the format of the whos command.
tdfread('filename') allows command line specification of the name of a file
in the current directory, or the complete pathname of any file.
tdfread('filename','delimiter') indicates that the character specified by
'delimiter' separates columns in the file. Accepted values are:
' ' or 'space'
'\t' or 'tab'
',' or 'comma'
';' or 'semi'
'|' or 'bar'
The default delimiter is 'tab'.
Example type sat2.dat
Test,Gender,Score
Verbal,Male,470
Verbal,Female,530
Quantitative,Male,520
Quantitative,Female,480
tdfread('sat2.dat',',')
Name Size Bytes Class
Gender 4x6 48 char array
Score 4x1 32 double array
Test 4x12 96 char array
Grand total is 76 elements using 176 bytes
See Also tblread
tinv
Purpose Inverse of the Student's t cumulative distribution function (cdf).
Syntax X = tinv(P,V)
Description X = tinv(P,V) computes the inverse of Student's t cdf with parameter V for
the corresponding probabilities in P. Vector or matrix inputs for P and V must
be the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other inputs. The degrees of freedom in V must be positive
integers, and the values in P must lie on the interval [0 1].
The t inverse function in terms of the t cdf is
$$x = F^{-1}(p|\nu) = \{ x : F(x|\nu) = p \}$$
where
$$p = F(x|\nu) = \int_{-\infty}^{x} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1+\frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} \, dt$$
The result, x, is the solution of the cdf integral with parameter $\nu$, where you
supply the desired probability p.
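Because tinv is defined as the inverse of tcdf, passing its output back through tcdf should return the original probability. The sketch below checks this round trip for one to six degrees of freedom; the tolerance a caller should expect is not specified by this guide.

```matlab
% Degrees of freedom and target probability.
nu = 1:6;
p = 0.99;

% Quantiles from the inverse cdf.
x = tinv(p, nu);

% Map the quantiles back through the cdf.
pback = tcdf(x, nu);

% Largest round-trip discrepancy.
roundtrip_err = max(abs(pback - p));
```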
Examples What is the 99th percentile of the t distribution for one to six degrees of
freedom?
percentile = tinv(0.99,1:6)
percentile =
31.8205 6.9646 4.5407 3.7469 3.3649 3.1427
See Also icdf, tcdf, tpdf, trnd, tstat
tpdf
Purpose Student's t probability density function (pdf).
Syntax Y = tpdf(X,V)
Description Y = tpdf(X,V) computes Student's t pdf at each of the values in X using the
corresponding parameters in V. Vector or matrix inputs for X and V must have
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other inputs. The degrees of freedom in V must be positive
integers.
Student's t pdf is
$$y = f(x|\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1+\frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}$$
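The density formula can be evaluated directly with the gamma function. This sketch computes the value at x = 0 for one to six degrees of freedom; it is a term-by-term transcription of the formula rather than the tpdf source code.

```matlab
% Degrees of freedom and evaluation point.
nu = 1:6;
x = 0;

% Student's t pdf written term by term as in the formula above.
y = gamma((nu+1)/2) ./ (gamma(nu/2) .* sqrt(nu*pi)) ...
    .* (1 + x.^2 ./ nu).^(-(nu+1)/2);
```

For nu = 1 the value is 1/pi, and the values increase with the degrees of freedom.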
Examples The mode of the t distribution is at x = 0. This example shows that the value of
the function at the mode is an increasing function of the degrees of freedom.
tpdf(0,1:6)
ans =
0.3183 0.3536 0.3676 0.3750 0.3796 0.3827
The t distribution converges to the standard normal distribution as the degrees
of freedom approach infinity. How good is the approximation for v = 30?
difference = tpdf(-2.5:2.5,30) - normpdf(-2.5:2.5)
difference =
0.0035 -0.0006 -0.0042 -0.0042 -0.0006 0.0035
See Also pdf, tcdf, tinv, trnd, tstat
trimmean
Purpose Mean of a sample of data excluding extreme values.
Syntax m = trimmean(X,percent)
Description m = trimmean(X,percent) calculates the mean of a sample X excluding the
highest and lowest percent/2 of the observations. The trimmed mean is a
robust estimate of the location of a sample. If there are outliers in the data, the
trimmed mean is a more representative estimate of the center of the body of the
data. If the data is all from the same probability distribution, then the trimmed
mean is less efficient than the sample average as an estimator of the location
of the data.
Examples This example shows a Mont e Car lo simulat ion of t he efficiency of t he 10%
t r immed mean r elat ive t o t he sample aver age for nor mal dat a.
x = normrnd(0,1,100,100);
m = mean(x);
trim = trimmean(x,10);
sm = std(m);
strim = std(trim);
efficiency = (sm/strim).^2
efficiency =
0.9702
See Also mean, median, geomean, harmmean
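The trimming step itself can be sketched for a single vector as follows. The rounding rule is an assumption made for this illustration (the count dropped from each end is rounded down), so results may differ from trimmean when n*percent/200 is not an integer.

```matlab
% Sketch of a trimmed mean for a vector. Assumption: the number of points
% dropped from each end is rounded down; the toolbox may round differently.
x = [1 2 3 4 100];            % one obvious outlier
percent = 40;                 % trim 20% from each end
n = numel(x);
k = floor(n*percent/100/2);   % observations to drop at each end
xs = sort(x);
m = mean(xs(k+1:n-k));        % mean of the remaining middle values
```

Here one value is dropped from each end, so the outlier 100 no longer influences the estimate.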
trnd
Purpose Random numbers from Student's t distribution.
Syntax R = trnd(V)
R = trnd(V,m)
R = trnd(V,m,n)
Description R = trnd(V) generates random numbers from Student's t distribution with V
degrees of freedom. The size of R is the size of V.
R = trnd(V,m) generates random numbers from Student's t distribution with
V degrees of freedom, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = trnd(V,m,n) generates random numbers from Student's t distribution
with V degrees of freedom, where scalars m and n are the row and column
dimensions of R.
Examples noisy = trnd(ones(1,6))
noisy =
19.7250 0.3488 0.2843 0.4034 0.4816 -2.4190
numbers = trnd(1:6,[1 6])
numbers =
-1.9500 -0.9611 -0.9038 0.0754 0.9820 1.0115
numbers = trnd(3,2,6)
numbers =
-0.3177 -0.0812 -0.6627 0.1905 -1.5585 -0.0433
0.2536 0.5502 0.8646 0.8060 -0.5216 0.0891
See Also tcdf, tinv, tpdf, tstat
tstat
Purpose Mean and variance for the Student's t distribution.
Syntax [M,V] = tstat(NU)
Description [M,V] = tstat(NU) returns the mean and variance for Student's t distribution
with parameters specified by NU. M and V are the same size as NU.
The mean of the Student's t distribution with parameter $\nu$ is zero for values of
$\nu$ greater than 1. If $\nu$ is one, the mean does not exist. The variance for values of
$\nu$ greater than 2 is $\nu/(\nu-2)$.
Examples Find the mean and variance for 1 to 30 degrees of freedom.
[m,v] = tstat(reshape(1:30,6,5))
m =
NaN 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
v =
NaN 1.4000 1.1818 1.1176 1.0870
NaN 1.3333 1.1667 1.1111 1.0833
3.0000 1.2857 1.1538 1.1053 1.0800
2.0000 1.2500 1.1429 1.1000 1.0769
1.6667 1.2222 1.1333 1.0952 1.0741
1.5000 1.2000 1.1250 1.0909 1.0714
Note that the variance does not exist for one and two degrees of freedom.
See Also tcdf, tinv, tpdf, trnd
ttest
Purpose Hypothesis testing for a single sample mean.
Syntax h = ttest(x,m)
h = ttest(x,m,alpha)
[h,sig,ci] = ttest(x,m,alpha,tail)
Description h = ttest(x,m) performs a t-test at significance level 0.05 to determine
whether a sample from a normal distribution (in x) could have mean m when
the standard deviation is unknown.
h = ttest(x,m,alpha) gives control of the significance level, alpha. For
example, if alpha = 0.01 and the result, h, is 1, you can reject the null
hypothesis at the significance level 0.01. If h is 0, you cannot reject the null
hypothesis at the alpha level of significance.
[h,sig,ci] = ttest(x,m,alpha,tail) allows specification of one- or
two-tailed tests. tail is a flag that specifies one of three alternative
hypotheses:
tail = 0 specifies the alternative that the mean is not equal to m (default)
tail = 1 specifies the alternative that the mean is greater than m
tail = -1 specifies the alternative that the mean is less than m
Output sig is the p-value associated with the T-statistic
$$T = \frac{\bar{x} - m}{s/\sqrt{n}}$$
where s is the sample standard deviation and n is the number of observations
in the sample. sig is the probability that the observed value of T could be as
large or larger by chance under the null hypothesis that the mean of x is equal
to m.
ci is a 100(1-alpha)% confidence interval for the true mean.
Example This example generates 100 normal random numbers with theoretical mean
zero and standard deviation one. The observed mean and standard deviation
are different from their theoretical values, of course. We test the hypothesis
that there is no true difference.
Normal random number generator test.
x = normrnd(0,1,1,100);
[h,sig,ci] = ttest(x,0)
h =
0
sig =
0.4474
ci =
-0.1165 0.2620
The result h = 0 means that we cannot reject the null hypothesis. The
p-value, sig, is 0.4474, which means that by chance we would have
observed values of T more extreme than the one in this example in about 45 of 100
similar experiments. A 95% confidence interval on the mean is
[-0.1165 0.2620], which includes the theoretical (and hypothesized) mean of
zero.
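The T statistic and a two-sided p-value can be reproduced by hand, which may help when checking results. This is a sketch under stated assumptions: the data are made up, the reference distribution is taken to be Student's t with n-1 degrees of freedom, and tcdf is the toolbox cdf for that distribution.

```matlab
% Sketch: two-sided one-sample t-test computed from the T statistic.
% The data are invented for illustration; tcdf comes from the toolbox.
x = [1 2 3 4 5];
m = 3;                                   % hypothesized mean
n = numel(x);
T = (mean(x) - m) / (std(x)/sqrt(n));    % T statistic
sig = 2*(1 - tcdf(abs(T),n-1));          % two-sided p-value, n-1 degrees of freedom
h = sig < 0.05;                          % 1 means reject at the 0.05 level
```

Because the sample mean equals the hypothesized mean here, T is zero and the null hypothesis is not rejected.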
ttest2
Purpose Hypothesis testing for the difference in means of two samples.
Syntax [h,significance,ci] = ttest2(x,y)
[h,significance,ci] = ttest2(x,y,alpha)
[h,significance,ci] = ttest2(x,y,alpha,tail)
Description h = ttest2(x,y) performs a t-test to determine whether two samples from a
normal distribution (in x and y) could have the same mean when the standard
deviations are unknown but assumed equal.
The result, h, is 1 if you can reject the null hypothesis at the 0.05 significance
level and 0 otherwise.
The significance is the p-value associated with the T-statistic
$$T = \frac{\bar{x} - \bar{y}}{s\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
where s is the pooled sample standard deviation and n and m are the numbers
of observations in the x and y samples. significance is the probability that the
observed value of T could be as large or larger by chance under the null
hypothesis that the mean of x is equal to the mean of y.
ci is a 95% confidence interval for the true difference in means.
[h,significance,ci] = ttest2(x,y,alpha) gives control of the significance
level alpha. For example, if alpha = 0.01 and the result, h, is 1, you can reject
the null hypothesis at the significance level 0.01. ci in this case is a
100(1-alpha)% confidence interval for the true difference in means.
ttest2(x,y,alpha,tail) allows specification of one- or two-tailed tests,
where tail is a flag that specifies one of three alternative hypotheses:
tail = 0 specifies the alternative $\mu_x \neq \mu_y$ (default)
tail = 1 specifies the alternative $\mu_x > \mu_y$
tail = -1 specifies the alternative $\mu_x < \mu_y$
Examples This example generates 100 normal random numbers with theoretical mean 0
and standard deviation 1. We then generate 100 more normal random numbers
with theoretical mean 1/2 and standard deviation 1. The observed means and
standard deviations are different from their theoretical values, of course. We
test the hypothesis that there is no true difference between the two means.
Notice that the true difference is only one half of the standard deviation of the
individual observations, so we are trying to detect a signal that is only one half
the size of the inherent noise in the process.
x = normrnd(0,1,100,1);
y = normrnd(0.5,1,100,1);
[h,significance,ci] = ttest2(x,y)
h =
1
significance =
0.0017
ci =
-0.7352 -0.1720
The result h = 1 means that we can reject the null hypothesis. The
significance is 0.0017, which means that by chance we would have observed
values of T more extreme than the one in this example in only 17 of 10,000
similar experiments. A 95% confidence interval on the difference in means is
[-0.7352 -0.1720], which includes the theoretical difference of -0.5 and
excludes the hypothesized difference of zero.
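The pooled standard deviation and the two-sample T statistic can be sketched with base MATLAB functions. The data are made up, and the pooling formula shown is the standard equal-variance estimate, assumed here rather than taken from the toolbox source.

```matlab
% Sketch: pooled standard deviation and two-sample T statistic.
% Data are invented; the pooling formula is the usual equal-variance estimate.
x = [1 2 3];
y = [2 3 4];
n = numel(x);
m = numel(y);
s = sqrt(((n-1)*var(x) + (m-1)*var(y)) / (n+m-2));   % pooled estimate
T = (mean(x) - mean(y)) / (s*sqrt(1/n + 1/m));       % T statistic
```

Both samples have unit variance, so the pooled estimate s is 1 and T reduces to the difference in means divided by sqrt(1/n + 1/m).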
unidcdf
Purpose Discrete uniform cumulative distribution function (cdf).
Syntax P = unidcdf(X,N)
Description P = unidcdf(X,N) computes the discrete uniform cdf at each of the values in X
using the corresponding parameters in N. Vector or matrix inputs for X and N
must have the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other input. The maximum observable values in
N must be positive integers.
The discrete uniform cdf is
$$p = F(x \mid N) = \frac{\mathrm{floor}(x)}{N} I_{(1,\ldots,N)}(x)$$
The result, p, is the probability that a single observation from the discrete
uniform distribution with maximum N will be a positive integer less than or
equal to x. The values x do not need to be integers.
Examples What is the probability of drawing a number 20 or less from a hat with the
numbers from 1 to 50 inside?
probability = unidcdf(20,50)
probability =
0.4000
See Also cdf, unidinv, unidpdf, unidrnd, unidstat
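The discrete uniform cdf is simple enough to evaluate directly. The sketch below clamps floor(x) to the range 0 to N so that the result stays between 0 and 1; that clamping is an assumption about behavior outside 1..N, not something stated above.

```matlab
% Sketch: discrete uniform cdf from floor(x)/N.
% Clamping to [0,N] for x outside 1..N is an assumption of this example.
x = 20;
N = 50;
p = min(max(floor(x),0),N) ./ N;
```

A noninteger value such as x = 20.7 gives the same result, since only floor(x) enters the formula.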
unidinv
Purpose Inverse of the discrete uniform cumulative distribution function.
Syntax X = unidinv(P,N)
Description X = unidinv(P,N) returns the smallest positive integer X such that the
discrete uniform cdf evaluated at X is equal to or exceeds P. You can think of P
as the probability of drawing a number as large as X out of a hat with the
numbers 1 through N inside.
Vector or matrix inputs for N and P must have the same size, which is also the
size of X. A scalar input for N or P is expanded to a constant matrix with the
same dimensions as the other input. The values in P must lie on the interval
[0 1] and the values in N must be positive integers.
Examples x = unidinv(0.7,20)
x =
14
y = unidinv(0.7 + eps,20)
y =
15
A small change in the first parameter produces a large jump in output. The cdf
and its inverse are both step functions. The example shows what happens at a
step.
See Also icdf, unidcdf, unidpdf, unidrnd, unidstat
unidpdf
Purpose Discrete uniform probability density function (pdf).
Syntax Y = unidpdf(X,N)
Description Y = unidpdf(X,N) computes the discrete uniform pdf at each of the values in X
using the corresponding parameters in N. Vector or matrix inputs for X and N
must have the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other input. The parameters in N must be positive
integers.
The discrete uniform pdf is
$$y = f(x \mid N) = \frac{1}{N} I_{(1,\ldots,N)}(x)$$
You can think of y as the probability of observing any one number between 1
and N.
Examples For fixed n, the uniform discrete pdf is a constant.
y = unidpdf(1:6,10)
y =
0.1000 0.1000 0.1000 0.1000 0.1000 0.1000
Now fix x, and vary n.
likelihood = unidpdf(5,4:9)
likelihood =
0 0.2000 0.1667 0.1429 0.1250 0.1111
See Also pdf, unidcdf, unidinv, unidrnd, unidstat
unidrnd
Purpose Random numbers from the discrete uniform distribution.
Syntax R = unidrnd(N)
R = unidrnd(N,mm)
R = unidrnd(N,mm,nn)
Description The discrete uniform distribution arises from experiments equivalent to
drawing a number from one to N out of a hat.
R = unidrnd(N) generates discrete uniform random numbers with
maximum N. The parameters in N must be positive integers. The size of R is the
size of N.
R = unidrnd(N,mm) generates discrete uniform random numbers with
maximum N, where mm is a 1-by-2 vector that contains the row and column
dimensions of R.
R = unidrnd(N,mm,nn) generates discrete uniform random numbers with
maximum N, where scalars mm and nn are the row and column dimensions of R.
Examples In the Massachusetts lottery, a player chooses a four-digit number. Generate
random numbers for Monday through Saturday.
numbers = unidrnd(10000,1,6) - 1
numbers =
2189 470 6788 6792 9346
See Also unidcdf, unidinv, unidpdf, unidstat
unidstat
Purpose Mean and variance for the discrete uniform distribution.
Syntax [M,V] = unidstat(N)
Description [M,V] = unidstat(N) returns the mean and variance for the discrete uniform
distribution with parameter N.
The mean of the discrete uniform distribution with parameter N is $(N+1)/2$.
The variance is $(N^{2}-1)/12$.
Examples [m,v] = unidstat(1:6)
m =
1.0000 1.5000 2.0000 2.5000 3.0000 3.5000
v =
0 0.2500 0.6667 1.2500 2.0000 2.9167
See Also unidcdf, unidinv, unidpdf, unidrnd
unifcdf
Purpose Continuous uniform cumulative distribution function (cdf).
Syntax P = unifcdf(X,A,B)
Description P = unifcdf(X,A,B) computes the uniform cdf at each of the values in X using
the corresponding parameters in A and B (the minimum and maximum values,
respectively). Vector or matrix inputs for X, A, and B must all have the same
size. A scalar input is expanded to a constant matrix with the same dimensions
as the other inputs.
The uniform cdf is
$$p = F(x \mid a,b) = \frac{x-a}{b-a} I_{[a,b]}(x)$$
The standard uniform distribution has A = 0 and B = 1.
Examples What is the probability that an observation from a standard uniform
distribution will be less than 0.75?
probability = unifcdf(0.75)
probability =
0.7500
What is the probability that an observation from a uniform distribution with
a = -1 and b = 1 will be less than 0.75?
probability = unifcdf(0.75,-1,1)
probability =
0.8750
See Also cdf, unifinv, unifit, unifpdf, unifrnd, unifstat
unifinv
Purpose Inverse continuous uniform cumulative distribution function (cdf).
Syntax X = unifinv(P,A,B)
Description X = unifinv(P,A,B) computes the inverse of the uniform cdf with parameters
A and B (the minimum and maximum values, respectively) at the corresponding
probabilities in P. Vector or matrix inputs for P, A, and B must all have the same
size. A scalar input is expanded to a constant matrix with the same dimensions
as the other inputs.
The inverse of the uniform cdf is
$$x = F^{-1}(p \mid a,b) = a + p(b-a) I_{[0,1]}(p)$$
The standard uniform distribution has A = 0 and B = 1.
Examples What is the median of the standard uniform distribution?
median_value = unifinv(0.5)
median_value =
0.5000
What is the 99th percentile of the uniform distribution between -1 and 1?
percentile = unifinv(0.99,-1,1)
percentile =
0.9800
See Also icdf, unifcdf, unifit, unifpdf, unifrnd, unifstat
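The inverse cdf is a linear map from [0,1] onto [a,b], so the 99th percentile example can be checked by hand. This is a minimal sketch that assumes p already lies in [0,1].

```matlab
% Sketch: inverse of the continuous uniform cdf, assuming 0 <= p <= 1.
a = -1;
b = 1;
p = 0.99;
x = a + p.*(b - a);   % maps p in [0,1] onto [a,b]
```

The result is 0.98, in agreement with unifinv(0.99,-1,1).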
unifit
Purpose Parameter estimates for uniformly distributed data.
Syntax [ahat,bhat] = unifit(X)
[ahat,bhat,ACI,BCI] = unifit(X)
[ahat,bhat,ACI,BCI] = unifit(X,alpha)
Description [ahat,bhat] = unifit(X) returns the maximum likelihood estimates (MLEs)
of the parameters of the uniform distribution given the data in X.
[ahat,bhat,ACI,BCI] = unifit(X) also returns 95% confidence intervals,
ACI and BCI, which are matrices with two rows. The first row contains the
lower bound of the interval for each column of the matrix X. The second row
contains the upper bound of the interval.
[ahat,bhat,ACI,BCI] = unifit(X,alpha) allows control of the confidence
level through alpha. For example, if alpha = 0.01 then ACI and BCI are 99% confidence
intervals.
Example r = unifrnd(10,12,100,2);
[ahat,bhat,aci,bci] = unifit(r)
ahat =
10.0154 10.0060
bhat =
11.9989 11.9743
aci =
9.9551 9.9461
10.0154 10.0060
bci =
11.9989 11.9743
12.0592 12.0341
See Also betafit, binofit, expfit, gamfit, normfit, poissfit, unifcdf, unifinv,
unifpdf, unifrnd, unifstat, weibfit
unifpdf
Purpose Continuous uniform probability density function (pdf).
Syntax Y = unifpdf(X,A,B)
Description Y = unifpdf(X,A,B) computes the continuous uniform pdf at each of the
values in X using the corresponding parameters in A and B. Vector or matrix
inputs for X, A, and B must all have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other inputs. The
parameters in B must be greater than those in A.
The continuous uniform distribution pdf is
$$y = f(x \mid a,b) = \frac{1}{b-a} I_{[a,b]}(x)$$
The standard uniform distribution has A = 0 and B = 1.
Examples For fixed a and b, the uniform pdf is constant.
x = 0.1:0.1:0.6;
y = unifpdf(x)
y =
1 1 1 1 1 1
What if x is not between a and b?
y = unifpdf(-1,0,1)
y =
0
See Also pdf, unifcdf, unifinv, unifrnd, unifstat
unifrnd
Purpose Random numbers from the continuous uniform distribution.
Syntax R = unifrnd(A,B)
R = unifrnd(A,B,m)
R = unifrnd(A,B,m,n)
Description R = unifrnd(A,B) generates uniform random numbers with parameters A
and B. Vector or matrix inputs for A and B must have the same size, which is
also the size of R. A scalar input for A or B is expanded to a constant matrix with
the same dimensions as the other input.
R = unifrnd(A,B,m) generates uniform random numbers with parameters A
and B, where m is a 1-by-2 vector that contains the row and column dimensions
of R.
R = unifrnd(A,B,m,n) generates uniform random numbers with parameters
A and B, where scalars m and n are the row and column dimensions of R.
Examples random = unifrnd(0,1:6)
random =
0.2190 0.0941 2.0366 2.7172 4.6735 2.3010
random = unifrnd(0,1:6,[1 6])
random =
0.5194 1.6619 0.1037 0.2138 2.6485 4.0269
random = unifrnd(0,1,2,3)
random =
0.0077 0.0668 0.6868
0.3834 0.4175 0.5890
See Also unifcdf, unifinv, unifpdf, unifstat
unifstat
Purpose Mean and variance for the continuous uniform distribution.
Syntax [M,V] = unifstat(A,B)
Description [M,V] = unifstat(A,B) returns the mean and variance for the continuous
uniform distribution with parameters specified by A and B. Vector or matrix
inputs for A and B must have the same size, which is also the size of M and V. A
scalar input for A or B is expanded to a constant matrix with the same
dimensions as the other input.
The mean of the continuous uniform distribution with parameters a and b is
$(a+b)/2$, and the variance is $(b-a)^{2}/12$.
Examples a = 1:6;
b = 2.*a;
[m,v] = unifstat(a,b)
m =
1.5000 3.0000 4.5000 6.0000 7.5000 9.0000
v =
0.0833 0.3333 0.7500 1.3333 2.0833 3.0000
See Also unifcdf, unifinv, unifpdf, unifrnd
var
Purpose Variance of a sample.
Syntax y = var(X)
y = var(X,1)
y = var(X,w)
Description y = var(X) computes the variance of the data in X. For vectors, var(x) is the
variance of the elements in x. For matrices, var(X) is a row vector containing
the variance of each column of X.
y = var(x) normalizes by n-1 where n is the sequence length. For normally
distributed data, this makes var(x) the minimum variance unbiased estimator
(MVUE) of $\sigma^{2}$ (the second parameter).
y = var(x,1) normalizes by n and yields the second moment of the sample
data about its mean (moment of inertia).
y = var(X,w) computes the variance using the vector of positive weights w.
The number of elements in w must equal the number of rows in the matrix X.
For vector x, w and x must match in length.
var supports both common definitions of variance. Let SS be the sum of
the squared deviations of the elements of a vector x from their mean. Then,
var(x) = SS/(n-1) is the MVUE, and var(x,1) = SS/n is the maximum
likelihood estimator (MLE) of $\sigma^{2}$.
Examples x = [-1 1];
w = [1 3];
v1 = var(x)
v1 =
2
v2 = var(x,1)
v2 =
1
v3 = var(x,w)
v3 =
0.7500
See Also cov, std
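The weighted result in the example can be reproduced by hand. The sketch below assumes that the weights are normalized to sum to one and that the variance is taken about the weighted mean; this reproduces the value 0.7500 shown for var(x,w), but it is an illustration rather than the function's actual source.

```matlab
% Sketch: weighted variance about the weighted mean.
% Assumption: weights are normalized to sum to one.
x = [-1 1];
w = [1 3];
w = w / sum(w);               % normalize weights
mu = sum(w.*x);               % weighted mean
v3 = sum(w.*(x - mu).^2);     % weighted second moment about mu
```

With these inputs the weighted mean is 0.5 and the weighted variance is 0.75.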
weibcdf
Purpose Weibull cumulative distribution function (cdf).
Syntax P = weibcdf(X,A,B)
Description P = weibcdf(X,A,B) computes the Weibull cdf at each of the values in X using
the corresponding parameters in A and B. Vector or matrix inputs for X, A, and
B must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
be positive.
The Weibull cdf is
$$p = F(x \mid a,b) = \int_{0}^{x} a b t^{b-1} e^{-a t^{b}}\,dt = \left(1 - e^{-a x^{b}}\right) I_{(0,\infty)}(x)$$
Examples What is the probability that a value from a Weibull distribution with
parameters a = 0.15 and b = 0.24 is less than 500?
probability = weibcdf(500,0.15,0.24)
probability =
0.4865
How sensitive is this result to small changes in the parameters?
[A,B] = meshgrid(0.1:0.05:0.2,0.2:0.05:0.3);
probability = weibcdf(500,A,B)
probability =
0.2929 0.4054 0.5000
0.3768 0.5080 0.6116
0.4754 0.6201 0.7248
See Also cdf, weibfit, weibinv, weiblike, weibpdf, weibplot, weibrnd, weibstat
weibfit
Purpose Parameter estimates and confidence intervals for Weibull data.
Syntax phat = weibfit(x)
[phat,pci] = weibfit(x)
[phat,pci] = weibfit(x,alpha)
Description phat = weibfit(x) returns the maximum likelihood estimates, phat, of the
parameters of the Weibull distribution given the values in vector x, which must
be positive. phat is a two-element row vector: phat(1) estimates the Weibull
parameter a, and phat(2) estimates the Weibull parameter b in the pdf
$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^{b}} I_{(0,\infty)}(x)$$
[phat,pci] = weibfit(x) also returns 95% confidence intervals in the
two-row matrix pci. The first row contains the lower bound of the confidence
interval, and the second row contains the upper bound. The columns of pci
correspond to the columns of phat.
[phat,pci] = weibfit(x,alpha) allows control over the confidence interval
returned, 100(1-alpha)%.
Example r = weibrnd(0.5,0.8,100,1);
[phat,pci] = weibfit(r)
phat =
0.4746 0.7832
pci =
0.3851 0.6367
0.5641 0.9298
See Also betafit, binofit, expfit, gamfit, normfit, poissfit, unifit, weibcdf,
weibinv, weiblike, weibpdf, weibplot, weibrnd, weibstat
weibinv
Purpose Inverse of the Weibull cumulative distribution function.
Syntax X = weibinv(P,A,B)
Description X = weibinv(P,A,B) computes the inverse of the Weibull cdf with parameters
A and B for the corresponding probabilities in P. Vector or matrix inputs for P,
A, and B must all have the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs. The parameters in A and
B must be positive.
The inverse of the Weibull cdf is
$$x = F^{-1}(p \mid a,b) = \left[\frac{1}{a}\ln\left(\frac{1}{1-p}\right)\right]^{\frac{1}{b}} I_{[0,1]}(p)$$
Examples A batch of light bulbs have lifetimes (in hours) distributed Weibull with
parameters a = 0.15 and b = 0.24. What is the median lifetime of the bulbs?
life = weibinv(0.5,0.15,0.24)
life =
588.4721
What is the 90th percentile?
life = weibinv(0.9,0.15,0.24)
life =
8.7536e+04
See Also icdf, weibcdf, weibfit, weiblike, weibpdf, weibplot, weibrnd, weibstat
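Because the Weibull cdf has a closed form, its inverse can be checked with a round trip through the cdf. This sketch uses only base MATLAB functions and the parameters from the median lifetime example; the variable pcheck is introduced here for illustration.

```matlab
% Sketch: closed-form inverse of the Weibull cdf and a round-trip check.
a = 0.15;
b = 0.24;
p = 0.5;
x = (log(1./(1 - p))./a).^(1./b);   % inverse cdf evaluated at p
pcheck = 1 - exp(-a.*x.^b);         % cdf evaluated at x, should recover p
```

The value of x is close to the median lifetime reported by weibinv(0.5,0.15,0.24).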
weiblike
Purpose Weibull negative log-likelihood function.
Syntax logL = weiblike(params,data)
[logL,avar] = weiblike(params,data)
Description logL = weiblike(params,data) returns the Weibull negative log-likelihood with
parameters params(1) = a and params(2) = b given the data $x_i$.
[logL,avar] = weiblike(params,data) also returns avar, which is the
asymptotic variance-covariance matrix of the parameter estimates if the
values in params are the maximum likelihood estimates. avar is the inverse of
Fisher's information matrix. The diagonal elements of avar are the asymptotic
variances of their respective parameters.
The Weibull negative log-likelihood is
$$-\log L = -\log \prod_{i=1}^{n} f(a,b \mid x_i) = -\sum_{i=1}^{n} \log f(a,b \mid x_i)$$
weiblike is a utility function for maximum likelihood estimation.
Example This example continues the example from weibfit.
r = weibrnd(0.5,0.8,100,1);
[logL,info] = weiblike([0.4746 0.7832],r)
logL =
203.8216
info =
0.0021 0.0022
0.0022 0.0056
Reference Patel, J. K., C. H. Kapadia, and D. B. Owen, Handbook of Statistical
Distributions, Marcel-Dekker, 1976.
See Also betalike, gamlike, mle, weibcdf, weibfit, weibinv, weibpdf, weibplot,
weibrnd, weibstat
weibpdf
Purpose Weibull probability density function (pdf).
Syntax Y = weibpdf(X,A,B)
Description Y = weibpdf(X,A,B) computes the Weibull pdf at each of the values in X using
the corresponding parameters in A and B. Vector or matrix inputs for X, A, and
B must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive.
The Weibull pdf is
$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^{b}} I_{(0,\infty)}(x)$$
Some references refer to the Weibull distribution with a single parameter. This
corresponds to weibpdf with A = 1.
Examples The exponential distribution is a special case of the Weibull distribution.
lambda = 1:6;
y = weibpdf(0.1:0.1:0.6,lambda,1)
y =
0.9048 1.3406 1.2197 0.8076 0.4104 0.1639
y1 = exppdf(0.1:0.1:0.6,1./lambda)
y1 =
0.9048 1.3406 1.2197 0.8076 0.4104 0.1639
Reference Devroye, L., Non-Uniform Random Variate Generation. Springer-Verlag. New
York, 1986.
See Also pdf, weibcdf, weibfit, weibinv, weiblike, weibplot, weibrnd, weibstat
weibplot
Purpose Weibull probability plot.
Syntax weibplot(X)
h = weibplot(X)
Description weibplot(X) displays a Weibull probability plot of the data in X. If X is a
matrix, weibplot displays a plot for each column.
h = weibplot(X) returns handles to the plotted lines.
The purpose of a Weibull probability plot is to graphically assess whether the
data in X could come from a Weibull distribution. If the data are Weibull the
plot will be linear. Other distribution types may introduce curvature in the
plot.
Example r = weibrnd(1.2,1.5,50,1);
weibplot(r)
[Figure: Weibull Probability Plot. x-axis: Data (logarithmic scale); y-axis: Probability (0.01 to 0.99).]
See Also normplot, weibcdf, weibfit, weibinv, weiblike, weibpdf, weibrnd, weibstat
weibrnd
Purpose Random numbers from the Weibull distribution.
Syntax R = weibrnd(A,B)
R = weibrnd(A,B,m)
R = weibrnd(A,B,m,n)
Description R = weibrnd(A,B) generates Weibull random numbers with parameters A
and B. Vector or matrix inputs for A and B must have the same size, which is
also the size of R. A scalar input for A or B is expanded to a constant matrix with
the same dimensions as the other input.
R = weibrnd(A,B,m) generates Weibull random numbers with parameters A
and B, where m is a 1-by-2 vector that contains the row and column dimensions
of R.
R = weibrnd(A,B,m,n) generates Weibull random numbers with parameters
A and B, where scalars m and n are the row and column dimensions of R.
Devroye refers to the Weibull distribution with a single parameter; this is
weibrnd with A = 1.
Examples n1 = weibrnd(0.5:0.5:2,0.5:0.5:2)
n1 =
0.0093 1.5189 0.8308 0.7541
n2 = weibrnd(1/2,1/2,[1 6])
n2 =
29.7822 0.9359 2.1477 12.6402 0.0050 0.0121
Reference Devroye, L., Non-Uniform Random Variate Generation. Springer-Verlag. New
York, 1986.
See Also weibcdf, weibfit, weibinv, weiblike, weibpdf, weibplot, weibstat
weibstat
Purpose Mean and variance for the Weibull distribution.
Syntax [M,V] = weibstat(A,B)
Description [M,V] = weibstat(A,B) returns the mean and variance for the Weibull
distribution with parameters specified by A and B. Vector or matrix inputs for
A and B must have the same size, which is also the size of M and V. A scalar input
for A or B is expanded to a constant matrix with the same dimensions as the
other input.
The mean of the Weibull distribution with parameters a and b is
$$a^{-\frac{1}{b}}\,\Gamma\left(1 + b^{-1}\right)$$
and the variance is
$$a^{-\frac{2}{b}}\left[\Gamma\left(1 + 2b^{-1}\right) - \Gamma^{2}\left(1 + b^{-1}\right)\right]$$
Examples [m,v] = weibstat(1:4,1:4)
m =
1.0000 0.6267 0.6192 0.6409
v =
1.0000 0.1073 0.0506 0.0323
weibstat(0.5,0.7)
ans =
3.4073
See Also weibcdf, weibfit, weibinv, weiblike, weibpdf, weibplot, weibrnd
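The mean and variance formulas for the Weibull distribution can be evaluated directly with the base MATLAB gamma function. This is a sketch for checking values, not the toolbox implementation.

```matlab
% Sketch: Weibull mean and variance from the gamma-function formulas.
a = 1:4;
b = 1:4;
m = a.^(-1./b) .* gamma(1 + 1./b);                           % mean
v = a.^(-2./b) .* (gamma(1 + 2./b) - gamma(1 + 1./b).^2);    % variance
```

For a = b = 1 the Weibull distribution reduces to the unit exponential, so both the mean and the variance equal 1, as in the first column of the weibstat(1:4,1:4) output.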
x2fx
Purpose Transform a factor settings matrix to a design matrix.
Syntax D = x2fx(X)
D = x2fx(X,'model')
Description D = x2fx(X) transforms a matrix of system inputs, X, to a design matrix for a
linear additive model with a constant term.
D = x2fx(X,'model') allows control of the order of the regression
model. 'model' can be one of these strings:
'interaction'    includes constant, linear, and cross product terms
'quadratic'      includes interactions and squared terms
'purequadratic'  includes constant, linear, and squared terms
Alternatively, model can be a matrix of terms. In this case, each row of model
represents one term. The value in a column is the exponent to which the same
column in X should be raised for that term. This allows for models with
polynomial terms of arbitrary order.
x2fx is a utility function for rstool, regstats, and cordexch.
Example x = [1 2 3;4 5 6]'; model = 'quadratic';
D = x2fx(x,model)
D =
1 1 4 4 1 16
1 2 5 10 4 25
1 3 6 18 9 36
Let $x_1$ be the first column of x and $x_2$ be the second. Then the first column of D
is the constant term, the second column is $x_1$, the third column is $x_2$, the fourth
column is $x_1 x_2$, the fifth column is $x_1^2$, and the last column is $x_2^2$.
See Also rstool, cordexch, rowexch, regstats
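For two input columns, the quadratic design matrix in the x2fx example can be assembled by hand, which makes the column order explicit. This sketch covers only the two-column case.

```matlab
% Sketch: build the 'quadratic' design matrix for two factors by hand.
x = [1 2 3;4 5 6]';
x1 = x(:,1);                                      % first factor
x2 = x(:,2);                                      % second factor
D = [ones(size(x1)) x1 x2 x1.*x2 x1.^2 x2.^2];    % constant, linear, cross, squared
```

The columns are the constant term, the two linear terms, the cross product, and the two squared terms, in that order.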
xbarplot
Purpose X-bar chart for Statistical Process Control.
Syntax xbarplot(DATA)
xbarplot(DATA,conf)
xbarplot(DATA,conf,specs,'sigmaest')
[outlier,h] = xbarplot(...)
Description xbarplot(DATA) displays an x-bar chart of the grouped responses in DATA. The
rows of DATA contain replicate observations taken at a given time, and must be
in time order. The graph contains the sample mean $\bar{x}$ for each group, a center
line at the average $\bar{x}$ value, and upper and lower control limits. The limits are
placed at a three-sigma distance on either side of the center line, where sigma
is an estimate of the standard deviation of $\bar{x}$. If the process is in control, fewer
than 3 out of 1000 observations would be expected to fall outside the control
limits by random chance. So if you observe points outside the limits, you can
take this as evidence that the process is not in control.
xbarplot(DATA,conf) allows control of the confidence level of the upper and
lower plotted confidence limits. The default conf = 0.9973 produces
three-sigma limits.
norminv(1 - (1-.9973)/2)
ans =
3
To get k-sigma limits, use the expression 1-2*(1-normcdf(k)). For example,
the correct conf value for 2-sigma limits is 0.9545, as shown below.
k = 2;
1-2*(1-normcdf(k))
ans =
0.9545
xbarplot(DATA,conf,specs) plots the specification limits in the two-element
vector specs.
xbarplot(DATA,conf,specs,'sigmaest') specifies how xbarplot should
estimate the standard deviation. Acceptable values are:
's' use the average of the group standard deviations (default)
'v' use the square root of a pooled variance estimate
'r' use the average range within each group; requires 25 or fewer
observations per group
[outlier,h] = xbarplot(DATA,conf,specs) returns outlier, a vector of
indices to the rows where the mean of DATA is out of control, and h, a vector of
handles to the plotted lines.
Example Plot an x-bar chart of measurements on newly machined parts, taken at one
hour intervals for 36 hours. Each row of the runout matrix contains the
measurements for four parts chosen at random. The values indicate, in
thousandths of an inch, the amount the part radius differs from the target
radius.
load parts
xbarplot(runout,0.999,[-0.5 0.5])
[Figure: Xbar Chart. x-axis: Samples; y-axis: Measurements. The chart shows the center line (CL), the control limits (UCL, LCL), and the specification limits (USL, LSL), with groups 21 and 25 labeled.]
The points in groups 21 and 25 are out of control, so the mean in those groups
was higher than would be expected by random chance alone. There is evidence
that the process was not in control when those measurements were collected.
See Also capaplot, histfit, ewmaplot, schart
zscore
2-353
2zscor e
Purpose St andar dized Z scor e.
Syntax Z = zscore(D)
Description Z = zscore(D) returns the deviation of each column of D from its mean,
normalized by its standard deviation. This is known as the Z score of D.
For column vector V, the Z score is Z = (V-mean(V))./std(V).
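For a matrix, the same formula is applied to each column separately. The sketch below works through this on a small made-up matrix using only mean, std, and repmat; it illustrates the definition and is not the toolbox's own implementation of zscore.

```matlab
% Made-up data: one variable per column, one observation per row.
D = [1 10; 2 20; 3 60; 6 30];
n = size(D,1);

% Subtract each column mean and divide by each column standard deviation.
Z = (D - repmat(mean(D),n,1)) ./ repmat(std(D),n,1);

% Each column of Z should now have mean 0 and standard deviation 1.
colmean = mean(Z)
colstd = std(Z)
```

Standardizing in this way puts variables measured in different units on a common scale.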
ztest
Purpose Hypothesis testing for the mean of one sample with known variance.
Syntax h = ztest(x,m,sigma)
h = ztest(x,m,sigma,alpha)
[h,sig,ci,zval] = ztest(x,m,sigma,alpha,tail)
Description h = ztest(x,m,sigma) performs a Z test at significance level 0.05 to
determine whether a sample x from a normal distribution with standard
deviation sigma could have mean m.
h = ztest(x,m,sigma,alpha) gives control of the significance level alpha. For
example, if alpha = 0.01 and the result is h = 1, you can reject the null
hypothesis at the significance level 0.01. If h = 0, you cannot reject the null
hypothesis at the alpha level of significance.
[h,sig,ci,zval] = ztest(x,m,sigma,alpha,tail) allows specification of one- or
two-tailed tests, where tail is a flag that specifies one of three alternative
hypotheses:
tail = 0 specifies the alternative $\bar{x} \neq m$ (default)
tail = 1 specifies the alternative $\bar{x} > m$
tail = -1 specifies the alternative $\bar{x} < m$
zval is the value of the Z statistic
$$z = \frac{\bar{x} - m}{\sigma / \sqrt{n}}$$
where $n$ is the number of observations in the sample.
sig is the probability that the observed value of Z could be as large or larger by
chance under the null hypothesis that the mean of x is equal to m.
ci is a 1-alpha confidence interval for the true mean.
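The Z statistic and the probability for each value of tail can be worked out by hand. The sketch below assumes the usual normal-theory definitions of one- and two-tailed probabilities; the sample values and the names sig_two, sig_right, and sig_left are made up for illustration, and the code is not taken from ztest itself.

```matlab
x = [5.1 4.8 5.6 5.3 4.9 5.4 5.2 5.0];   % made-up sample
m = 5;         % hypothesized mean
sigma = 0.3;   % standard deviation, assumed known

% Z statistic: distance of the sample mean from m in standard errors.
n = numel(x);
zval = (mean(x) - m) / (sigma/sqrt(n));

% Probability of a result at least this extreme under the null hypothesis.
sig_two   = 2*(1 - normcdf(abs(zval)));   % tail = 0  (mean differs from m)
sig_right = 1 - normcdf(zval);            % tail = 1  (mean greater than m)
sig_left  = normcdf(zval);                % tail = -1 (mean less than m)

disp([zval sig_two sig_right sig_left])
```

The two one-tailed probabilities sum to one, and the two-tailed probability is twice the smaller of them.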
Example This example generates 100 normal random numbers with theoretical mean
zero and standard deviation one. The observed mean and standard deviation
are different from their theoretical values, of course. We test the hypothesis
that there is no true difference.
x = normrnd(0,1,100,1);
m = mean(x)
m =
0.0727
[h,sig,ci] = ztest(x,0,1)
h =
0
sig =
0.4669
ci =
-0.1232 0.2687
The result, h = 0, means that we cannot reject the null hypothesis. The
p-value (sig) is 0.4669, which means that by chance we would have
observed values of Z at least as extreme as the one in this example in about
47 of 100 similar experiments. A 95% confidence interval on the mean is
[-0.1232 0.2687], which includes the theoretical (and hypothesized) mean of
zero.
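The numbers in this example can be checked by hand from the printed sample mean. The sketch below assumes the standard two-tailed Z test formulas and starts from the rounded mean 0.0727 rather than the original random sample, so its results agree with the printed output only to about three decimal places.

```matlab
xbar = 0.0727;   % sample mean as printed above (rounded)
n = 100;         % number of observations
sigma = 1;       % known standard deviation
m0 = 0;          % hypothesized mean
alpha = 0.05;    % default significance level

se = sigma/sqrt(n);                    % standard error of the mean
zval = (xbar - m0)/se;                 % Z statistic
sig = 2*(1 - normcdf(abs(zval)));      % two-tailed probability
crit = norminv(1 - alpha/2);           % two-sided critical value
ci = [xbar - crit*se, xbar + crit*se]; % 1-alpha confidence interval

disp([zval sig ci])
```

Because sig is larger than alpha, the test does not reject the null hypothesis, which corresponds to h = 0.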
Index
A
absolute deviation 1-45
additive effects 1-73
alternative hypothesis 1-105
analysis of variance 1-23
    multivariate 1-122
    N-way 1-76
    one-way 1-69
    two-way 1-73
ANOVA 1-68
anova1 2-17
anova2 2-23
anovan 2-27
aoctool 2-33
aoctool demo 1-161
average linkage 2-179
B
bacteria counts 1-69
barttest 2-36
baseball odds 2-45, 2-47
Bera-Jarque. See Jarque-Bera
Bernoulli random variables 2-49
beta distribution 1-13
betacdf 2-37
betafit 2-38
betainv 2-40
betalike 2-41
betapdf 2-42
betarnd 2-43
betastat 2-44
binocdf 2-45
binofit 2-46
binoinv 2-47
binomial distribution 1-15
    negative 1-31
binopdf 2-48
binornd 2-49
binostat 2-50
bootstrap 2-51
bootstrap sampling 1-50
box plots 1-128
boxplot 2-54
C
capability studies 1-141
capable 2-56
capaplot 2-58
casenames
    reading from file 2-60
    writing to file 2-61
caseread 2-60
casewrite 2-61
cdf 1-7
cdf 2-62
cdfplot 2-63
Central Limit Theorem 1-32
centroid linkage 2-179
Chatterjee and Hadi example 1-85
chi2cdf 2-65
chi2inv 2-66
chi2pdf 2-67
chi2rnd 2-68
chi2stat 2-69
chi-square distributions 1-17
circuit boards 2-48
City Block metric
    in cluster analysis 2-256
classify 2-70
cluster 2-71
cluster analysis 1-53
    computing inconsistency coefficient 1-60, 2-155
    creating clusters from data 2-73
    creating clusters from linkage output 1-64, 2-71
    creating the cluster tree 1-56, 2-178
    determining proximity 1-54, 2-255
    evaluating cluster formation 1-59, 2-76
    formatting distance information 1-56, 2-308
    overview 1-53
    plotting the cluster tree 1-58, 2-85
clusterdata 2-73
coin 2-122
combnk 2-75
comparisons, multiple 1-71
complete linkage 2-179
confidence intervals
    hypothesis tests 1-106
    nonlinear regression 1-103
control charts 1-138
    EWMA charts 1-140
    S charts 1-139
    Xbar charts 1-138
cophenet 2-76
    using 1-59
cophenetic correlation coefficient 2-76
    defined 1-59
cordexch 2-78
corrcoef 2-79
cov 2-80
Cp index 1-142, 2-56
Cpk index 1-142, 2-56
crosstab 2-81
cumulative distribution function (cdf) 1-7
    graphing an estimate 1-134
D
data 2-3
    ASCII for tblread example 2-16
    bacteria counts 2-15
    car mileage 2-15
    classification 2-15
    dimensional runout 2-15
    gasoline prices 2-15
    GPA versus LSAT 2-15
    Hald 2-15
    polytool demo 2-16
    popcorn 2-15
    reaction kinetics 2-16
    regression with five factors 2-15
    U.S. census 2-15
    U.S. cities 2-15
daugment 2-83
dcovary 2-84
demos 1-153, 2-3
    design of experiments 1-170
    polynomial curve fitting 1-156
    probability distributions 1-154
    random number generation 1-169
dendrogram 2-85, 2-194
    using 1-58
depth
    in cluster analysis 1-61
descriptive statistics 1-43, 2-3
Design of Experiments 1-143
    D-optimal designs 1-147
    fractional factorial designs 1-145
    full factorial designs 1-144
discrete uniform distribution 1-20
dissimilarity matrix
    creating 1-54
distributions 1-2, 1-5
disttool 2-87
disttool demo 1-154
DOE. See Design of Experiments
D-optimal designs 1-147
dummyvar 2-88
E
erf 1-32
error function 1-32
errorbar 2-89
estimate 1-157
Euclidean distance
    in cluster analysis 2-256
EWMA charts 1-140
ewmaplot 2-90
expcdf 2-92
expfit 2-93
expinv 2-94
exponential distribution 1-21
exppdf 2-95
exprnd 2-96
expstat 2-97
extrapolated 2-272
F
F distributions 1-23
F statistic 1-85
factorial designs
    fractional 1-145
    full 1-144
fcdf 2-98
ff2n 2-99
file I/O 2-3
finv 2-100
floppy disks 2-149
fpdf 2-101
fracfact 2-102
friedman 2-106
Friedman's test 1-97
frnd 2-110
fstat 2-111
fsurfht 2-112
fullfact 2-114
furthest neighbor linkage 2-179
G
gamcdf 2-115
gamfit 2-116
gaminv 2-117
gamlike 2-118
gamma distribution 1-25
gampdf 2-119
gamrnd 2-120
gamstat 2-121
Gaussian 2-146
geocdf 2-122
geoinv 2-123
geomean 2-124
geometric distribution 1-27
geopdf 2-125
geornd 2-126
geostat 2-127
gline 2-128
glmdemo 2-129
glmdemo demo 1-172
glmfit 2-130
glmval 2-135
gname 2-137
gplotmatrix 2-139
group mean clusters, plot 1-127
grouped plot matrix 1-122
grpstats 2-142
gscatter 2-143
Guinness beer 1-37, 2-316
H
harmmean 2-145
hat matrix 1-83
hist 2-146
histfit 2-147
histogram 1-169
Hotelling's T squared 1-121
hougen 2-148
Hougen-Watson model 1-100
hygecdf 2-149
hygeinv 2-150
hygepdf 2-151
hygernd 2-152
hygestat 2-153
hypergeometric distribution 1-28
hypotheses 1-23, 2-3
hypothesis tests 1-105
I
icdf 2-154
incomplete beta function 1-13
incomplete gamma function 1-25
inconsistency coefficient 1-61
inconsistent 2-155
    using 1-61
inspector 2-259
interaction 1-74
interpolated 2-311
interquartile range (iqr) 1-45
inverse cdf 1-8
iqr 2-157
J
Jarque-Bera test 2-158
jbtest 2-158
K
kruskalwallis 2-160
Kruskal-Wallis test 1-97
kstest 2-164
kstest2 2-169
kurtosis 2-172
L
least squares 2-267
leverage 2-174
light bulbs, life of 2-94
likelihood function 2-42
Lilliefors test 1-107
lillietest 2-175
linear 2-3
linear models 1-68
    generalized 1-91
linkage 2-178
    using 1-56
logncdf 2-181
logninv 2-182
lognormal distribution 1-30
lognpdf 2-184
lognrnd 2-185
lognstat 2-186
lottery 2-331
lsline 2-187
LU factorizations 2-266
M
mad 2-188
mahal 2-189
Mahalanobis distance 2-189
    in cluster analysis 2-256
manova1 2-190
manovacluster 2-194
mean 1-11
mean 2-196
Mean Squares (MS) 2-17
measures of
    central tendency 1-43
    dispersion 1-45
median 2-197
Minkowski metric
    in cluster analysis 2-256
mle 2-198
models
    linear 1-68
    nonlinear 1-100
moment 2-199
Monte Carlo simulation 2-157
multcompare 2-200
multiple linear regression 1-82
multivariate statistics 1-112
mvnrnd 2-207
mvtrnd 2-208
N
nanmax 2-209
nanmean 2-210
nanmedian 2-211
nanmin 2-212
NaNs 1-46
nanstd 2-213
nansum 2-214
nbincdf 2-215
nbininv 2-216
nbinpdf 2-217
nbinrnd 2-218
nbinstat 2-219
ncfcdf 2-220
ncfinv 2-222
ncfpdf 2-223
ncfrnd 2-224
ncfstat 2-225
nctcdf 2-226
nctinv 2-227
nctpdf 2-228
nctrnd 2-229
nctstat 2-230
ncx2cdf 2-231
ncx2inv 2-233
ncx2pdf 2-234
ncx2rnd 2-235
ncx2stat 2-236
nearest neighbor linkage 2-179
Newton's method 2-117
nlinfit 2-237
nlintool 2-238
nlintool demo 1-104
nlparci 2-239
nlpredci 2-240
noncentral F distribution 1-24
nonlinear 2-3
nonlinear regression models 1-100
normal distribution 1-32
normal probability plots 1-128, 1-129
normalizing a dataset 1-55
    using zscore 2-353
normcdf 2-242
normdemo 2-249
normfit 2-243
norminv 2-244
normpdf 2-245
normplot 2-246
normrnd 2-248
normstat 2-250
notation, mathematical conventions xvii
notches 2-54
null 1-105
null hypothesis 1-105
O
one-way analysis of variance (ANOVA) 1-68
outliers 1-44
P
pareto 2-251
Pascal, Blaise 1-15
PCA. See Principal Components Analysis
pcacov 2-252
pcares 2-253
pdf 1-6
pdf 2-254
pdist 2-255
    using 1-54
percentiles 1-49
perms 2-258
plots 1-49, 2-3
poisscdf 2-259
poissfit 2-261
poissinv 2-262
Poisson distribution 1-34
poisspdf 2-263
poissrnd 2-264
poisstat 2-265
polyconf 2-266
polyfit 2-267
polynomial 1-156
polytool 2-268
polytool demo 1-156
polyval 2-269
popcorn 2-25, 2-108
prctile 2-270
Principal Components Analysis 1-112
    component scores 1-117
    component variances 1-120
    Hotelling's T squared 1-121
    Scree plot 1-120
princomp 2-271
probability 2-3
probability density function (pdf) 1-6
probability distributions 1-5
p-value 1-75, 1-106
Q
qqplot 2-272
QR decomposition 1-83
quality assurance 2-48
quantile-quantile plots 1-128, 1-131
R
random 2-274
random number generator 1-9
random numbers 1-9
randtool 2-87, 2-275
randtool demo 1-169
range 2-276
ranksum 2-277
raylcdf 2-278
raylinv 2-279
raylpdf 2-280
raylrnd 2-281
raylstat 2-282
rcoplot 2-283
refcurve 2-284
reference lines 1-154
references 1-175
refline 2-285
regress 2-286
regression 1-23
    nonlinear 1-100
    robust 1-95
    stepwise 1-88
regstats 2-288
relative efficiency 2-157
residuals 1-86
Response Surface Methodology (RSM) 1-86
ridge 2-290
robust 1-44
robust linear fit 2-272
robustdemo 2-292
robustdemo demo 1-172
robustfit 2-293
rowexch 2-297
rsmdemo 2-298
rsmdemo demo 1-170
R-square 1-85
rstool 2-299
rstool demo 1-86
S
S charts 1-139
scatter plots 1-135
    grouped 1-122
schart 2-300
Scree plot 1-120
segmentation analysis 1-53
significance level 1-105
signrank 2-302
signtest 2-304
similarity matrix
    creating 1-54
simulation 2-157
single linkage 2-179
skewness 1-129
skewness 2-306
SPC. See Statistical Process Control
squareform 2-308
standard normal 2-245
Standardized Euclidean distance
    in cluster analysis 2-256
statistical plots 1-128
Statistical Process Control
    capability studies 1-141
    control charts 1-138
statistical references 1-175
statistically significant 2-17, 2-160, 2-190
stepwise 1-88, 2-310
stepwise regression 1-88
Sum of Squares (SS) 2-17
surfht 2-311
symmetric 2-115
T
t distributions 1-37
    noncentral 1-38
tab-delimited data
    reading from file 2-317
tabular data
    reading from file 2-313
tabulate 2-312
taxonomy analysis 1-53
tblread 2-313
tblwrite 2-315
tcdf 2-316
tdfread 2-317
tinv 2-319
tpdf 2-320
trimmean 2-321
trnd 2-322
tstat 2-323
ttest 2-324
ttest2 2-326
two-way ANOVA 1-73
typographical conventions (table) xviii
U
unbiased 2-309, 2-339
unidcdf 2-328
unidinv 2-329
unidpdf 2-330
unidrnd 2-331
unidstat 2-332
unifcdf 2-333
unifinv 2-334
unifit 2-335
uniform distribution 1-39
unifpdf 2-336
unifrnd 2-337
unifstat 2-338
V
var 2-339
variance 1-11
W
ward linkage 2-180
weibcdf 2-341
weibfit 2-342
weibinv 2-343
weiblike 2-344
weibpdf 2-345
weibplot 2-346
weibrnd 2-347
weibstat 2-348
Weibull distribution 1-40
Weibull probability plots 1-133
Weibull, Waloddi 1-40
whiskers 1-129, 2-54
X
x2fx 2-349
Xbar charts 1-138
xbarplot 2-350
Z
zscore 2-353
ztest 2-354
