SAS® software
Acknowledgements to
David Williams
Caroline Brophy
Statistics
in
Science
Need to know
– SAS environment
– SAS programs
How to:
– Get data in
– Manipulate data
SAS software environment
SAS Windows (SAS 9)
Some (!) SAS windows
– Editor
Where code is written or imported, and submitted
– Log
What happened, including what went wrong
– Output
Results of program procedures that produce output
– Explorer
Shows libraries (SAS & Windows), their files, and where you can see
data, graphs
– Results
Shows how the output is made up of tables, graphs, data sets, etc.
– Notepad
A useful place to keep bits of code
SAS software programs
SAS Programs
data one;          /* DATA step: creates SAS data set */
  input x y;
  datalines;
-3.2 0.0024
-3.1 0.0033
. . .
;
run;
Step Boundaries
SAS steps begin with a
– DATA statement, or
– PROC statement.
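As a minimal sketch of step boundaries, the program below has one DATA step followed by one PROC step. Each step starts at its DATA or PROC statement and ends at its run; statement (or where the next step begins). The data set name one and the two data lines are taken from the example above; the PROC PRINT step is added here only for illustration.

```sas
/* DATA step: begins at the DATA statement */
data one;
  input x y;
  datalines;
-3.2 0.0024
-3.1 0.0033
;
run;

/* PROC step: begins at the PROC statement */
proc print data=one;
run;
```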
Submitting a SAS Program
Recommended steps!
1) Submit all (or selected) code by
– pressing F4, or
– clicking on the runner in the toolbar
2) Read log
3) Look in output window if you expect code to produce output
4) Problems
– Bad syntax
– Missing ; at end of line
– Missing quote ’ at end of title (nasty!)
Improved output - HTML
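One way to get HTML output from code is the ODS HTML statement. This is a sketch only: the file name output.html is hypothetical, the data set work.one is assumed to exist from the earlier example, and the slide itself may have shown a different route (for example a menu setting).

```sas
/* Route procedure output to an HTML file (hypothetical file name) */
ods html file='output.html';

proc print data=one;
run;

/* Close the HTML destination so the file is complete */
ods html close;
```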
SAS data sets
• live in libraries
work library
Don’t lose your data!
Keep the SAS program that read the data from its
original source
. . . More later!
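Data sets in the work library are temporary and are deleted when the SAS session ends. A minimal sketch of keeping a permanent copy is shown below; the library name mylib and the folder path are hypothetical, and the folder is assumed to exist already.

```sas
/* Assign a library reference to a folder (hypothetical path) */
libname mylib 'C:\mydata';

/* Copy the temporary data set work.one into the permanent library */
data mylib.one;
  set work.one;
run;
```

Even with a permanent copy, keeping the program that read the original data remains the safest record of how the data set was made.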
Viewing descriptor & data
Alternatively:
Use SAS Explorer: Open (for data), Properties (for descriptor)
Properties is not as clear as CONTENTS
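In code, the descriptor and the data can be viewed with the CONTENTS and PRINT procedures. The sketch below assumes a data set work.one exists; the obs=10 option is an illustrative choice to limit the listing.

```sas
/* Descriptor: variable names, types, lengths, number of observations */
proc contents data=work.one;
run;

/* Data: list the first 10 observations */
proc print data=work.one (obs=10);
run;
```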
SAS variables
There are two types of variables:
• character variables contain any value: letters, numbers, special characters, and blanks.
Character values are stored with a length of 1 to 32,767 bytes (default is 8).
One byte equals one character.
• numeric variables store numbers as floating-point values, in 8 bytes by default.
SAS names
– for data sets & variables
• can be up to 32 characters long.
Missing Data Values
A value must exist for every variable for each observation.
Missing values are valid values.
LastName FirstName JobTitle Salary
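A small sketch of how missing values look in practice, using the column names above. The data set name staff and all data values are invented for illustration. With list input, a single period stands for a missing value; PROC PRINT then shows a missing numeric value as a period and a missing character value as a blank.

```sas
data staff;
  /* Allow up to 12 characters for each character variable */
  length LastName FirstName JobTitle $ 12;
  input LastName $ FirstName $ JobTitle $ Salary;
  datalines;
Smith Ann Pilot 50000
Jones Tom . .
;
run;

/* Second observation: JobTitle prints as blank, Salary as . */
proc print data=staff;
run;
```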
Comments
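SAS accepts two common comment styles, sketched below. The data set one is assumed to exist from the earlier example.

```sas
* A comment statement starts with an asterisk and ends with a semicolon;

/* A block comment can span
   several lines and may contain semicolons; */

proc print data=one;   /* a comment can also follow a statement */
run;
```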
Getting data in!
Consider 2 methods
Data in program file:
data oz;
  input oz $ rad wt;
  datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
. . .
;
run;
Note:
1. oz is a text variable so requires $
2. No missing values
3. Values of oz
• don’t contain spaces
• are at most 8 characters long
Getting data in!
from Excel
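One way to read an Excel sheet in code is PROC IMPORT. This is a sketch under assumptions: the file path and sheet name are hypothetical, and dbms=xlsx assumes a SAS installation with SAS/ACCESS Interface to PC Files. The slide itself may have used a different route, such as the Import Wizard.

```sas
proc import datafile='C:\mydata\ozone.xlsx'   /* hypothetical path */
            out=work.oz
            dbms=xlsx
            replace;
  sheet='Sheet1';      /* hypothetical sheet name */
  getnames=yes;        /* take variable names from the first row */
run;
```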
Creating new variables
Adding a new variable to an existing SAS data
set (say work.old)
1. Use set
2. Give definition of new variable
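data new;
  /* read data from work.old */
  set old;
  y2 = y**2;              /* square of y */
  ly = log(y);            /* natural log; missing if y <= 0 */
  ly_base10 = log10(y);   /* log to base 10 */
  t1 = (treat = 1);       /* indicator: 1 if treat equals 1, else 0 */
run;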
Data set: work.new
Data Screening
checking input data for gross errors
• Use PRINT procedure to scan for obvious anomalies
• Use MEANS procedure & examine summary table
– MAXIMUM, MINIMUM – reasonable?
– MEAN - near middle of range?
– MISSING VALUES - input or calculation error e.g.
log(0)?
– CV (= 100*std.dev/mean) - < 10% for plant growth,
between 12 & 30% for animal production variables, >
50% implies skewness for any positive variable
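The checks above might be run as follows. This is a sketch, assuming the data set work.new and its variables y and ly from the earlier example; the obs=20 limit and the choice of variables are illustrative.

```sas
/* Scan the first observations for obvious anomalies */
proc print data=new (obs=20);
run;

/* Summary table: count, missing count, range, mean, std. dev., CV */
proc means data=new n nmiss min max mean std cv;
  var y ly;
run;
```

A nonzero NMISS for ly alongside zero for y would point to a calculation problem such as the log of a zero or negative value.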
SAS syntax
MEANS syntax
Dealing with data errors
• Check original records
• Change mistakes in recording where the correct
value is beyond question
• Regenerate observations where possible – e.g.
reweigh sample, redo chemical analysis
• With a large body of data in an unbalanced
design, err on the side of omitting questionable
data