The SAS System

Obs  Name      Division  Years    Sales  Expense  State
  1  CHRIS     H             2   233.11    94.12  WI
  2  MARK      H             5   298.12    52.65  WI
  3  SARAH     S             6   301.21    65.17  MN
  4  PAT       H             4  4009.21   322.12  IL
  5  JOHN      H             7   678.43   150.11  WI
  6  WILLIAM   H            11  3231.75   644.55  MN
  7  ANDREW    S            24  1762.11   476.13  MN
  8  BENJAMIN  S             3   201.11    25.21  IL
  9  JANET     S             1    98.11   125.32  WI
 10  STEVE     H            21  6153.32  1507.12  WI
 11  JENNIFER  S             1   542.11   134.24  IL
 12  JOY       S            12  2442.22   761.98  WI
 13  MARY      S            14  5691.78  2452.11  WI
 14  TOM       S             5  5669.12   798.15  MN
 15  BETH      H            12  4822.12   982.10  WI
2997 Yarmouth Greenway Drive, Madison, WI 53711
(608) 278-9964
train@sys-seminar.com

Introduction
Free SAS Newsletter
Our popular publication, The Missing Semicolon, shares SAS software solutions developed by our staff and provides additional technical assistance to our customers.

SAS Training Services
For over 1,000 students each year, we make SAS software easier to understand, use, and support. Public training schedules are posted on our web site. Private on-site training options are also available.
- Over 30 years of SAS experience, including hundreds of manufacturing, retail, government, marketing, and financial applications
- Over 25 years as President and Founder of SSC
- Founder of WISAS and WISUG
- Invited speaker at local, regional, and international user groups
Objectives
- Identify the components of SAS.
- Review SAS history and design.
- Use SAS pre-written procedures (PROCs) to analyze data.
- Sort data.
- Count values using PROC FREQ.
- Compute simple statistics such as mean, standard deviation, and sum.
- Summarize data values at different classification levels.
- Format SAS data values.
Introduction to SAS
SAS is an integrated computer system for data analysis and reporting.
SAS Features
SAS is a full-featured computer system. Input to SAS can come from:
- raw files
- non-SAS database management systems such as Oracle and DB2
- data lines included in the SAS program
- other SAS datasets already stored on your computer
- SAS datasets located on remote computers
Structure of SAS
SAS consists of:
1. a data handling language (DATA step)
2. a library of pre-written procedures (PROC step)
[Diagram: SAS statements, raw data, SAS supervisor, report]
[Diagram: a SAS dataset (descriptor plus data) feeding reports such as means and standard deviations]

Name   Lbs  Kilos
Tom    150   67.5
Julie   93   41.8
Lois    88   39.6
Notes: RUN; marks the end of a SAS step. More steps may follow.
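A minimal two-step sketch of this structure, using the Name/Lbs/Kilos values shown above: a DATA step builds the dataset and a PROC step reports on it, and each step ends with RUN;. The dataset name metric comes from the later compiler diagram; the conversion factor 0.45 is an assumption inferred from the sample values, not something the course states.

```sas
/* DATA step: build SAS dataset METRIC from instream data */
data metric;
  input Name $ Lbs;
  Kilos = Lbs * 0.45;   /* assumed factor, inferred from the sample values */
  datalines;
Tom 150
Julie 93
Lois 88
;
run;

/* PROC step: report the mean and standard deviation */
proc means data=metric mean std;
  var Lbs Kilos;
run;
```

More steps could follow the second RUN; in the same program.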
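[Diagram: the DATA step gets the data in shape for analysis; sources and destinations shown include raw data on disk or tape, instream data, other SAS datasets, DBMS tables through SAS/ACCESS, reports, and program products]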
Notes: DATA step output is usually a SAS dataset but can be other files. Access to non-SAS database management systems requires the SAS/ACCESS product.
- create your own formats and informats
- copy and maintain SAS datasets and libraries
ODS (Output Delivery System)
data softsale;
  infile rawin;
  input Name $1-10 Division $12 Years 15-16
        Sales 19-25 Expense 28-34 State $36-37;
run;
SAS dataset softsale
Descriptor: Name, Division, Years, Sales, Expense, State

Name    Division  Years    Sales  Expense  State
CHRIS   H             2   233.11    94.12  WI
. . .
BETH    H            12  4822.12   982.10  WI
A DATALINES Example
You can enter your data as part of the DATA step.
Program:
data softsale;
  input Name $1-10 Division $12 Years 15-16
        Sales 19-25 Expense 28-34 State $36-37;
  /* JOHN through TOM omitted here for brevity */
  datalines;
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
PAT        H   4  4009.21   322.12 IL
BETH       H  12  4822.12   982.10 WI
;
run;

proc print data=softsale;
run;
Notes: The semicolon (;) that terminates the instream data must be on a line by itself.
[Diagram: raw values 150, 93, and 88 pass through the SAS compiler into SAS dataset metric]

SAS dataset metric (descriptor plus data):
Name   Lbs  Kilos
Tom    150   67.5
Julie   93   41.8
Lois    88   39.6
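[Diagram: DATA step execution. On each pass the step runs infile rawin; and reads the next record; if at EOF (end of file), the step stops; otherwise run; returns control to the top of the step.]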
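The loop in the diagram can be observed with the automatic variable _N_, which counts passes through the DATA step, and the INFILE END= option, which flags the last record. This sketch reads three softsale rows from instream data instead of the rawin file; the dataset name loopdemo and the variables Iteration and last are hypothetical.

```sas
data loopdemo;
  infile datalines end=last;   /* LAST becomes 1 when the final record is read */
  input Name $ Division $ Years;
  Iteration = _N_;             /* one pass through the step per record */
  if last then put 'Last record read on iteration ' _N_;
  datalines;
CHRIS H 2
MARK H 5
SARAH S 6
;
run;
```

The log line reports iteration 3. On the next pass the INPUT statement finds no more data, so the step stops and loopdemo has three observations. The END= variable itself is not written to the dataset.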
fileref rawin2

data softsal2;
  infile rawin2;
  input Name $ Division $ Years Sales Expense State $;
run;

proc print data=softsal2;
run;
data softsale;
  infile rawin;
  input @1 Name $10. @12 Division $1. @15 Years 2.
        @19 Sales 7.2 @28 Expense 7.2 @36 State $2.;
run;

proc print data=softsale;
run;
fileref rawin
         1         2         3
1234567890123456789012345678901234567
CHRIS      H   2   233.11    94.12 WI
MARK       H   5   298.12    52.65 WI
SARAH      S   6   301.21    65.17 MN
PAT        H   4  4009.21   322.12 IL
JOHN       H   7   678.43   150.11 WI
WILLIAM    H  11  3231.75   644.55 MN
ANDREW     S  24  1762.11   476.13 MN
BENJAMIN   S   3   201.11    25.21 IL
. . .
BETH       H  12  4822.12   982.10 WI

Informats: Name $10.  Division $1.  Years 2.  Sales 7.2  Expense 7.2  State $2.
Notes: If a BY statement is coded, the data must be sorted or indexed. BY does not do the sort; it only recognizes the sort order. The BY statement can list several variables.
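A sketch of the sort-then-BY pattern the note describes, assuming softsale is rebuilt from the listing at the start of this document with list input. The output names bydiv and divsum and the variable SalesSum are hypothetical. PROC MEANS would stop with an error if bydiv were not sorted by Division.

```sas
data softsale;
  input Name $ Division $ Years Sales Expense State $;
  datalines;
CHRIS H 2 233.11 94.12 WI
MARK H 5 298.12 52.65 WI
SARAH S 6 301.21 65.17 MN
PAT H 4 4009.21 322.12 IL
JOHN H 7 678.43 150.11 WI
WILLIAM H 11 3231.75 644.55 MN
ANDREW S 24 1762.11 476.13 MN
BENJAMIN S 3 201.11 25.21 IL
JANET S 1 98.11 125.32 WI
STEVE H 21 6153.32 1507.12 WI
JENNIFER S 1 542.11 134.24 IL
JOY S 12 2442.22 761.98 WI
MARY S 14 5691.78 2452.11 WI
TOM S 5 5669.12 798.15 MN
BETH H 12 4822.12 982.10 WI
;
run;

/* BY does not sort, so sort first */
proc sort data=softsale out=bydiv;
  by Division;
run;

/* one summary row per Division BY group */
proc means data=bydiv sum;
  by Division;
  var Sales;
  output out=divsum sum=SalesSum;
run;
```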
Features:
- single or multiple sort variables
- ascending or descending sequence
- does not print any output
- missing values sort as the lowest values
- records that are exact duplicates, or that have duplicate keys, can be dropped
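Two of these features, sketched against softsale rebuilt with list input: DESCENDING reverses the order for the variable that follows it, and NODUPKEY keeps only the first observation for each BY value. The output names bydesc and onestate are hypothetical.

```sas
data softsale;
  input Name $ Division $ Years Sales Expense State $;
  datalines;
CHRIS H 2 233.11 94.12 WI
MARK H 5 298.12 52.65 WI
SARAH S 6 301.21 65.17 MN
PAT H 4 4009.21 322.12 IL
JOHN H 7 678.43 150.11 WI
WILLIAM H 11 3231.75 644.55 MN
ANDREW S 24 1762.11 476.13 MN
BENJAMIN S 3 201.11 25.21 IL
JANET S 1 98.11 125.32 WI
STEVE H 21 6153.32 1507.12 WI
JENNIFER S 1 542.11 134.24 IL
JOY S 12 2442.22 761.98 WI
MARY S 14 5691.78 2452.11 WI
TOM S 5 5669.12 798.15 MN
BETH H 12 4822.12 982.10 WI
;
run;

/* Division ascending, then highest Sales first within each division */
proc sort data=softsale out=bydesc;
  by Division descending Sales;
run;

/* keep the first observation for each State */
proc sort data=softsale out=onestate nodupkey;
  by State;
run;

proc print data=onestate;
run;
```

Because PROC SORT keeps observations with equal keys in their original order by default, onestate holds PAT (IL), SARAH (MN), and CHRIS (WI).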
[Diagram: PROC SORT reorders observations (for example, Division values H S H with Years 2 5 12) using sortwork areas]
Obs  Name      Division  Years    Sales  Expense  State
  1  CHRIS     H             2   233.11    94.12  WI
  2  MARK      H             5   298.12    52.65  WI
  3  SARAH     S             6   301.21    65.17  MN
  4  PAT       H             4  4009.21   322.12  IL
  5  JOHN      H             7   678.43   150.11  WI
  6  WILLIAM   H            11  3231.75   644.55  MN
  7  ANDREW    S            24  1762.11   476.13  MN
  8  BENJAMIN  S             3   201.11    25.21  IL
  9  JANET     S             1    98.11   125.32  WI
 10  STEVE     H            21  6153.32  1507.12  WI
 11  JENNIFER  S             1   542.11   134.24  IL
 12  JOY       S            12  2442.22   761.98  WI
 13  MARY      S            14  5691.78  2452.11  WI
 14  TOM       S             5  5669.12   798.15  MN
 15  BETH      H            12  4822.12   982.10  WI
More Enhancing
Add a footnote.
proc print data=softsale noobs;
  title 'Softco Inc. Sales Summary';
  footnote 'As of January 2006';
run;
The Results
Softco Inc. Sales Summary
Name        Sales
CHRIS      233.11
MARK       298.12
SARAH      301.21
PAT       4009.21
JOHN       678.43
WILLIAM   3231.75
ANDREW    1762.11
BENJAMIN   201.11
JANET       98.11
STEVE     6153.32
JENNIFER   542.11
JOY       2442.22
MARY      5691.78
TOM       5669.12
BETH      4822.12
As of January 2006
proc print data=softsale noobs;
  title 'Softco Inc. Sales Summary';
  footnote 'As of January 2006';
  var name sales;
run;
A LABEL Example
The LABEL option, used together with a LABEL statement, displays more descriptive labels as column headers.
proc print data=softsale label;
  title 'Softco Inc. Sales Summary';
  label sales='Employee Sales';
  footnote 'As of January 2006';
  var name sales;
run;
Notes: PROC PRINT displays labels only when the LABEL option is coded AND a label has been assigned to the variable.
Notes: SAS may split the header at a blank or an upper/lower case change.
------------------------------ Division=S ------------------------------

Obs  Name      Years    Sales  Expense  State
  8  SARAH         6   301.21    65.17  MN
  9  ANDREW       24  1762.11   476.13  MN
 10  BENJAMIN      3   201.11    25.21  IL
 11  JANET         1    98.11   125.32  WI
 12  JENNIFER      1   542.11   134.24  IL
 13  JOY          12  2442.22   761.98  WI
 14  MARY         14  5691.78  2452.11  WI
 15  TOM           5  5669.12   798.15  MN
proc sort data=softsale;
  by division state;
run;

proc print data=softsale;
  title 'Softco Sales Summary';
  id division state;
  by division state;
  sum sales expense;
run;
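[Output excerpt: the listing has one section per Division and State combination; for example, the Division S, State IL section lists BENJAMIN (Years 3) and JENNIFER (Years 1).]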
Softsale
Notes: Percentages may also be desired. We probably would not count identifiers (Name) or continuous variables (Sales, Expense) without first grouping or summarizing them in some way.
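One way to count a continuous variable is to group it first with a user-written format, a technique covered later in this course. This sketch assumes softsale rebuilt with list input; the format name salesgrp, its cut points, and the output name salesfreq are hypothetical.

```sas
data softsale;
  input Name $ Division $ Years Sales Expense State $;
  datalines;
CHRIS H 2 233.11 94.12 WI
MARK H 5 298.12 52.65 WI
SARAH S 6 301.21 65.17 MN
PAT H 4 4009.21 322.12 IL
JOHN H 7 678.43 150.11 WI
WILLIAM H 11 3231.75 644.55 MN
ANDREW S 24 1762.11 476.13 MN
BENJAMIN S 3 201.11 25.21 IL
JANET S 1 98.11 125.32 WI
STEVE H 21 6153.32 1507.12 WI
JENNIFER S 1 542.11 134.24 IL
JOY S 12 2442.22 761.98 WI
MARY S 14 5691.78 2452.11 WI
TOM S 5 5669.12 798.15 MN
BETH H 12 4822.12 982.10 WI
;
run;

/* hypothetical cut points that turn Sales into three groups */
proc format;
  value salesgrp low  -< 1000 = 'Under 1,000'
                 1000 -< 5000 = '1,000 to 4,999'
                 5000 -  high = '5,000 and over';
run;

/* PROC FREQ counts the formatted values, not the raw ones */
proc freq data=softsale;
  tables Sales / out=salesfreq;
  format Sales salesgrp.;
run;
```

With these cut points the 15 Sales values fall into groups of 7, 5, and 3.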
A One-Way Table
TABLE State will give counts and percentages for State.
proc freq data=softsale;
  title 'Softco State Distributions';
  table state;
run;
Softco State Distributions

The FREQ Procedure

                               Cumulative    Cumulative
State    Frequency    Percent   Frequency       Percent
-------------------------------------------------------
IL               3      20.00           3         20.00
MN               4      26.67           7         46.67
WI               8      53.33          15        100.00
Two-way Tables
Specify two variables to see combinations between them.
proc freq data=softsale;
  title 'A Two Way Table';
  table division * state;
run;
Table of Division by State

Division     State

Frequency |
Percent   |
Row Pct   |
Col Pct   |     IL |     MN |     WI |  Total
----------+--------+--------+--------+
H         |      1 |      1 |      5 |      7
          |   6.67 |   6.67 |  33.33 |  46.67
          |  14.29 |  14.29 |  71.43 |
          |  33.33 |  25.00 |  62.50 |
----------+--------+--------+--------+
S         |      2 |      3 |      3 |      8
          |  13.33 |  20.00 |  20.00 |  53.33
          |  25.00 |  37.50 |  37.50 |
          |  66.67 |  75.00 |  37.50 |
----------+--------+--------+--------+
Total            3        4        8       15
             20.00    26.67    53.33   100.00
Obs  Division  State  COUNT
  1  H         IL         1
  2  H         MN         1
  3  H         WI         5
  4  S         IL         2
  5  S         MN         3
  6  S         WI         3
Notes: Coding NOPRINT on the table statement eliminates the default PROC FREQ report.
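A sketch of how NOPRINT is typically paired with OUT= so that the counts are saved as a dataset instead of printed, which produces a six-row listing like the one above. It assumes softsale rebuilt with list input; the name divstate is hypothetical.

```sas
data softsale;
  input Name $ Division $ Years Sales Expense State $;
  datalines;
CHRIS H 2 233.11 94.12 WI
MARK H 5 298.12 52.65 WI
SARAH S 6 301.21 65.17 MN
PAT H 4 4009.21 322.12 IL
JOHN H 7 678.43 150.11 WI
WILLIAM H 11 3231.75 644.55 MN
ANDREW S 24 1762.11 476.13 MN
BENJAMIN S 3 201.11 25.21 IL
JANET S 1 98.11 125.32 WI
STEVE H 21 6153.32 1507.12 WI
JENNIFER S 1 542.11 134.24 IL
JOY S 12 2442.22 761.98 WI
MARY S 14 5691.78 2452.11 WI
TOM S 5 5669.12 798.15 MN
BETH H 12 4822.12 982.10 WI
;
run;

/* no crosstab report; counts go to DIVSTATE (COUNT and PERCENT) */
proc freq data=softsale;
  tables Division*State / noprint out=divstate;
run;

proc print data=divstate;
run;
```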
PROC MEANS/SUMMARY
PROC MEANS and PROC SUMMARY condense and summarize SAS datasets.
PROC SUMMARY and PROC MEANS features:
- compute selected univariate statistics
- CLASS and BY variables define subgroups
- CLASS does not require sorting
- SUMMARY writes summary statistics to an output dataset (default)
- SUMMARY can also print a report (PRINT option)
- MEANS produces a report by default
- MEANS can also produce an output dataset
Notes: MEANS usually does a report; SUMMARY usually produces a dataset. All other options and statements are virtually the same.
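A sketch of the difference described in the note: the same request run as PROC MEANS with NOPRINT and as PROC SUMMARY with PRINT. Both write the same output dataset; only the default for the printed report differs. The names m_out, s_out, and TotalSales are hypothetical, and softsale is rebuilt with list input.

```sas
data softsale;
  input Name $ Division $ Years Sales Expense State $;
  datalines;
CHRIS H 2 233.11 94.12 WI
MARK H 5 298.12 52.65 WI
SARAH S 6 301.21 65.17 MN
PAT H 4 4009.21 322.12 IL
JOHN H 7 678.43 150.11 WI
WILLIAM H 11 3231.75 644.55 MN
ANDREW S 24 1762.11 476.13 MN
BENJAMIN S 3 201.11 25.21 IL
JANET S 1 98.11 125.32 WI
STEVE H 21 6153.32 1507.12 WI
JENNIFER S 1 542.11 134.24 IL
JOY S 12 2442.22 761.98 WI
MARY S 14 5691.78 2452.11 WI
TOM S 5 5669.12 798.15 MN
BETH H 12 4822.12 982.10 WI
;
run;

/* MEANS: suppress the default report, keep the dataset */
proc means data=softsale noprint;
  var Sales;
  output out=m_out sum=TotalSales;
run;

/* SUMMARY: request a printed report as well as the dataset */
proc summary data=softsale print;
  var Sales;
  output out=s_out sum=TotalSales;
run;
```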
PROC Means/Summary
[Diagram: an input dataset with observations 1, 2, ... and variables var1 through varn is condensed by PROC SUMMARY]
PROC statement options (partial list):
NOPRINT     suppress report (PROC SUMMARY default)
MAXDEC=n    limit number of decimals for statistics
N           calculate number of non-missing values
NMISS       calculate number of missing values
NWAY        calculate only the highest level of interaction
MEAN        calculate mean
MAX         calculate highest value
MIN         calculate lowest value
RANGE       compute difference between min and max
Optional statements (partial list):
BY variable(s);
CLASS variable(s);
VAR variable(s);
OUTPUT OUT=SASdataset options;
WHERE where-condition;
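Putting the options and statements above together, a sketch that classifies with CLASS and picks statistics with keywords. It assumes softsale rebuilt with list input; the WHERE condition, the output name divstats, and the statistic names AvgSales, AvgExpense, MaxSales, and MaxExpense are hypothetical.

```sas
data softsale;
  input Name $ Division $ Years Sales Expense State $;
  datalines;
CHRIS H 2 233.11 94.12 WI
MARK H 5 298.12 52.65 WI
SARAH S 6 301.21 65.17 MN
PAT H 4 4009.21 322.12 IL
JOHN H 7 678.43 150.11 WI
WILLIAM H 11 3231.75 644.55 MN
ANDREW S 24 1762.11 476.13 MN
BENJAMIN S 3 201.11 25.21 IL
JANET S 1 98.11 125.32 WI
STEVE H 21 6153.32 1507.12 WI
JENNIFER S 1 542.11 134.24 IL
JOY S 12 2442.22 761.98 WI
MARY S 14 5691.78 2452.11 WI
TOM S 5 5669.12 798.15 MN
BETH H 12 4822.12 982.10 WI
;
run;

proc means data=softsale n nmiss mean min max range maxdec=2 nway;
  where Years >= 5;            /* hypothetical subset: 5 or more years */
  class Division;              /* no sort needed for CLASS */
  var Sales Expense;
  /* names are assigned in VAR order: Sales first, then Expense */
  output out=divstats mean=AvgSales AvgExpense max=MaxSales MaxExpense;
run;
```

NWAY keeps only the Division-level rows in divstats; without it the dataset would also contain an overall row. MAXDEC= affects only the printed report.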
Or another way of asking it might be:
1. How do we classify the data?
2. Which statistics do we want?
proc print data=sumds;
  title 'First Department Store Sales Summary';
run;
Other Applications
Other applications appropriate for PROC MEANS/SUMMARY might be:
- any system needing to reduce detail
- testing and validation
- summarizing at different class levels
- Formats can be specified in the DATA step.
- Formats can be overridden in the PROC step.
- You can use a SAS-written format.
- You can create user-written formats.
Syntax:
FORMAT variable(s) formatw.d ... ;
Notes: LABEL, introduced earlier, changes the way variable names display.
What is a FORMAT?
A format is a routine that a variable's values pass through when they are output. Example:
1. Sales has a value of 9432.159.
2. Print Sales with a dollar sign and commas.
3. Allow a total width of 10 characters (including $ , and .).
4. Print two digits after the decimal point.
format sales dollar10.2;
[Diagram: the stored value 9432.159 passes through the DOLLAR format, whose logic inserts $ , and ., and prints as $9,432.16]
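A quick way to check the example above is a DATA _NULL_ step that writes the value to the log with and without the format. Only the variable name Sales and the value come from the slide.

```sas
data _null_;
  Sales = 9432.159;
  put Sales dollar10.2;   /* formatted: $9,432.16, right-aligned in 10 columns */
  put Sales=;             /* no format assigned, so SAS uses BEST12. */
run;
```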
w.          standard numeric
w.d         numeric with decimal
BESTw.      SAS chooses best notation
COMMAw.d    commas in numbers
DOLLARw.d   insert dollar sign, commas
Ew.         scientific notation
FRACTw.     fractions
HEXw.       numeric hexadecimal
IBw.d       integer binary
PDw.d       packed decimal
ROMANw.     Roman numerals
SSNw.       Social Security Numbers
WORDFw.     words with fractions
WORDSw.     numbers as words
Notes: The w value specifies the width to allow for output. For numeric variables with no format assigned, SAS uses the BESTw. format.
data softsale;
  infile rawin;
  input @1 Name $10. @12 Division $1. @15 Years 2.
        @19 Sales 7.2 @28 Expense 7.2 @36 State $2.;
  format name $4. sales comma9.2 expense comma9.2;
run;

proc print data=softsale;
  title 'Proc Print with formats';
run;
Notes: These formats are stored in the data descriptor. All later steps will use these formats unless overridden.
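A sketch of the override rule in the note: a format assigned in the DATA step is stored with the variable, a FORMAT statement in a PROC step replaces it for that step only, and the stored format is unchanged afterward. The dataset name fmtdemo is hypothetical; its two rows are taken from the softsale listing.

```sas
data fmtdemo;
  input Name $ Sales;
  format Sales comma9.2;      /* stored in the data descriptor */
  datalines;
PAT 4009.21
STEVE 6153.32
;
run;

proc print data=fmtdemo;
  format Sales dollar10.2;    /* overrides COMMA9.2 for this step only */
run;

proc contents data=fmtdemo;   /* still reports COMMA9.2 for Sales */
run;
```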
PICTURE
1 6085552424
2 3123432424
proc format;
  value respfmt 1='agree' 2='disagree';
  picture phonefmt low-high='999/999-9999';
run;

data phonelst;
  infile rawin;
  input Response Phone;
run;

proc print data=phonelst;
  format response respfmt. phone phonefmt.;
  title 'Proc print with user formats';
run;
Notes: Formats don't change the stored values; they only change how the values are displayed.
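A sketch showing that a formatted variable still holds its original value: comparisons use the stored number, while PUT displays the formatted text. It redefines the respfmt format from the example above; the dataset name resp and its three responses are hypothetical.

```sas
proc format;
  value respfmt 1='agree' 2='disagree';
run;

data resp;
  input Response @@;          /* several values per data line */
  format Response respfmt.;
  datalines;
1 2 1
;
run;

data _null_;
  set resp;
  /* the comparison uses the stored value 1, not the text 'agree' */
  if Response = 1 then put 'Stored value 1 displays as ' Response;
run;
```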
data softsale;
  infile rawin;
  input @1 Name $10. @12 Division $1. @15 Years 2.
        @19 Sales 7.2 @28 Expense 7.2 @36 State $2.;
run;
Is there some way to read the format recoding from a file rather than typing the values?
Syntax:
PROC FORMAT CNTLIN=sasdataset;
RUN;
A CNTLIN Example
We can build a format from the following flat file.
data formds;
  infile dfile;
  input @1 Start $1. @4 Label $12.;
  retain Fmtname '$divfmt';
run;

proc print data=formds;
  title 'Format Dataset';
run;

proc format cntlin=formds;
run;
fileref dfile
H P S
Obs  Start
  1  H
  2  P
  3  S
Summary
SAS is a large, comprehensive system. The Base product has wonderful design and tools and provides a solid beginning.
Contact Us