Usually a “Getting Started with SAS” dialog box appears on opening SAS.
Selecting “Start Guides” will open up the “SAS Help and Documentation”
window. Feel free to explore these “Getting Started with SAS” help-pages
(select “New SAS Programmer (quick-start guide)” from the drop-down
menu).
SOR 202 SAS Laboratory Session 1
When you first start SAS, the five main SAS windows open, namely the:
• Explorer,
• Results,
• Program Editor or Editor,
• Log, and
• Output windows.
This quick walkthrough shows you how each of these windows is used.
The Results and Explorer windows are usually ‘docked’ on the left-hand side.
NB: If you accidentally close any of these windows, they can be reopened
through the “View” menu. Also, an individual window can be docked by first
selecting it and then selecting “Docked” from the “Window” menu.
In the Explorer window, you can view and manage your SAS files and create
shortcuts to files that are not formatted by SAS.
Use the “Up one level” icon to navigate through the libraries as necessary
(also available through the “View” menu).
You can use the Editor window to enter, edit, and submit SAS programs.
The initial Editor window title is Editor - Untitledn, where n is a number.
When you open a file or save the contents of the Editor window to a file
(a *.sas file), the window title changes to reflect that file name.
When the contents of the Editor window are modified, an asterisk is added to
the title to indicate the contents have not been saved in their current form.
You can have multiple Editor windows open at the same time.
The Log Window displays messages about your SAS session and any SAS
programs that you submit.
The Output Window displays the output from SAS programs that you submit.
It automatically opens or moves to the front of your display when you create
output.
The Results Window helps you navigate and manage output from SAS
programs that you submit. You can view, save, and print individual items of
output.
Before you can work with your data in SAS, it must be in a special form called
a SAS dataset. So understanding SAS datasets is the first step in learning
about SAS programming.
Importing Data
To read PC database files, you use the IMPORT procedure. PROC IMPORT
reads the input file and writes the data to a SAS dataset, with the SAS
variables defined based on the input records.
There is a SAS Import Wizard available from the main “File” Menu.
Example: import the Excel file ExamResults.xls into SAS using the Import
Wizard.
• Select the “Browse” button and navigate to where you have stored
the Excel file within your home directory.
• The workbook contains a single worksheet named “Results”, and
this is the table you wish to import (keep default settings within
“Options…”).
• The next step is to indicate to SAS where you wish to import this
data to within SAS. Selecting the Work library will store the SAS
dataset in this temporary library. The name to be given to the
dataset is indicated under member (here the dataset has been
named “EXAMRESULTS”).
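The wizard steps above can also be written as a PROC IMPORT step. The following is a minimal sketch, not necessarily the code the wizard itself generates. It assumes the workbook is saved as H:\SOR202\ExamResults.xls (a hypothetical location; substitute your own path) and that the SAS/ACCESS Interface to PC Files is available so that DBMS=XLS can be used.

```sas
/* Sketch only: the file path below is an assumed location. */
proc import datafile="H:\SOR202\ExamResults.xls"
        out=work.examresults   /* Work library, member EXAMRESULTS */
        dbms=xls
        replace;               /* overwrite the dataset if it already exists */
    sheet="Results";           /* the worksheet selected in the wizard */
    getnames=yes;              /* take variable names from the first row */
run;
```

As with the wizard, a dataset written to the Work library is temporary and is deleted when the SAS session ends.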
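Once the data has been imported, the new dataset EXAMRESULTS appears in the
Work library in the Explorer window, and the Log window reports that the
dataset was created.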
Data can also be entered as a SAS dataset from the keyboard or read from a
file using a DATA step. How the SAS code looks depends upon the
structure of the data being input.
Example: An example piece of SAS code is given below for creating a SAS
dataset from the data stored in the (space-delimited) text file cancer.txt
(downloadable from Queen’s Online):
data cancer;
    infile "H:\SOR202\cancer.txt" firstobs=2;
    input obs_no id time status stain $ ;
run;
• The first line of the portion of code above indicates to SAS you want to
generate a SAS dataset called cancer (use an appropriate name for
SAS datasets you create e.g. trial, company, drug – a name with 32 or
fewer characters)
• NB: each statement ends with a semicolon (;).
• The INFILE statement indicates the file that the data are read
from and is of the form “pathname\filename”.
• NB: It is good programming practice to indent lines between the data
statement and the run statement. Note that to execute any series of
statements within SAS you must include a RUN statement.
• The firstobs=2 option lets SAS know that the first observation for
this data occurs in row 2 of the text file (row 1 of the text file contains the
variable names).
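The effect of firstobs=2 can be tried without an external file. The sketch below uses in-stream data (infile datalines) in place of cancer.txt. The dataset name cancer_demo and the three data rows are made up for illustration and are not taken from the real file.

```sas
data cancer_demo;
    /* Read in-stream data; skip line 1, which holds the variable names */
    infile datalines firstobs=2;
    input obs_no id time status stain $;
    datalines;
obs_no id time status stain
1 101 23 1 pos
2 102 47 0 neg
3 103 69 1 pos
;
run;

/* List the dataset to check which rows were read as observations */
proc print data=cancer_demo;
run;
```

Without firstobs=2, SAS would try to read the header row as data and write invalid-data notes to the Log for the numeric variables.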
Variables in a dataset
Variable names can contain from 1 to 32 characters; they can contain
numbers, but must begin with a letter or an underscore.
• Here SAS assumes that the variables of the observations are separated
in the text file by blank spaces (this is its default assumption). However,
this can be changed using the delimiter option within the INFILE
statement. For example the text file cancer_commas.txt (downloadable
from Queen’s Online) contains the same data as cancer.txt, but the
variables in each row are separated by commas. To load this text file
into SAS the INFILE statement would need to be modified as follows:
data cancer2;
    infile "H:\SOR202\cancer_commas.txt" firstobs=2 delimiter=",";
    input obs_no id time status stain $ ;
run;
i) Fixed Format
In the case of files where variables are in a fixed format (particular columns of
the file correspond to particular variables, and this holds for every row of the
file), the INPUT statement seen above can be modified in order to clarify
where variables can be found in each row of the file.
Example: An example piece of SAS code is given below for creating a SAS
dataset from the data stored in the text file hypernephroma.txt (downloadable
from Queen’s Online):
data hyper;
    infile "H:\SOR202\hypernephroma.txt" firstobs=2;
    input treatment $1-14 status 17 time 25-27 age $32-36;
run;
If a datafile contains more than one line per observation, the input statement
needs to indicate the line number (using a # symbol) before specifying the
variables on that line e.g.
input id 1-3 company 8-10 #2 insal 6-10 finalsal 18-23 #3 retire 15-19;
• The above input statement informs SAS that there are 3 lines of
variables for each observation.
• In the first row of each observation, the variable id is located in columns
1 to 3 while the variable company is located in columns 8 to 10.
• In the second row of each observation, the variable insal is located in
columns 6 to 10 while the variable finalsal is located in columns 18 to
23.
• In the third row of each observation, the variable retire is located in
columns 15 to 19.
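The multi-line layout described above can be tried with in-stream data. This is a sketch under stated assumptions: the dataset name salary_demo and all values are invented, and the data lines are typed so that each value falls in exactly the columns named in the input statement.

```sas
data salary_demo;
    /* Three lines per observation; #2 and #3 move to the next lines */
    input id 1-3 company 8-10
          #2 insal 6-10 finalsal 18-23
          #3 retire 15-19;
    datalines;
101    012
     21000       105000
              12000
102    007
     18500        98000
              09500
;
run;

proc print data=salary_demo;
run;
```

The data lines after datalines must start in column 1 and must not be indented, because with column input every leading blank counts towards the column position.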
Input statements can also be written in a shorter form with a mixed style e.g.
input id 1-2 sex $ 3 (exp school) (1.) (C1-C10) (1.) (M1-M10) (1.)
(MATHSCOR COMPSCOR) (2.);
• For the above statement the variable id is read from columns 1-2 and
sex from column 3. The next two variables exp and school have a
width of 1 column each and start at column 4. The variables C1-C10
(10 variables in sequential order) have a width of one column each (in
columns 6-15 of the datafile). The variables M1-M10 have a width of
one column each (in columns 16-25 of the datafile). The last two
variables MATHSCOR and COMPSCOR have a width of two columns each
starting at column 26.
• If you wish to skip data within a datafile (e.g. in the above, only read in
the variable id and the last two variables MATHSCOR and COMPSCOR), you
could use the @ symbol within the INPUT statement (in this example, @26
moves the pointer to column 26).
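A minimal sketch of the @ column pointer follows. The dataset name scores_demo and the two 29-column records are invented for illustration; they follow the layout described above (id in columns 1-2, sex in column 3, exp and school in columns 4-5, C1-C10 in columns 6-15, M1-M10 in columns 16-25, and the two scores in columns 26-29).

```sas
data scores_demo;
    /* Read id from columns 1-2, then jump straight to column 26 */
    input id 1-2 @26 (MATHSCOR COMPSCOR) (2.);
    datalines;
01F12101010101001010101017562
02M21011001100111001100116880
;
run;

proc print data=scores_demo;
run;
```

Everything between column 3 and column 25 is skipped, so only id, MATHSCOR and COMPSCOR end up in the dataset.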
To input data directly from within SAS, the DATALINES statement is used. For
example, the code below reads in a dataset called hsb10, which contains
10 records and 11 variables, 10 of which are numeric and 1 of which is character.
data hsb10;
    input id female race ses schtype $ prog read write math science
          socst;
datalines;
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pri 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
;
run;
Label Statement
You can use a LABEL statement to give labels to variables. While a SAS
variable name is limited to 32 characters, the label (which is used in any
output for this variable) can have up to 256 characters including blanks.
Labels should be enclosed in quotes and the LABEL statement terminated by a
semicolon; for example, label statements can be added to the hsb10 data step.
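A LABEL statement for the hsb10 variables might look like the sketch below. The label text is illustrative and is not taken from the original handout; the dataset name hsb_labelled is hypothetical, and only two of the hsb10 records are used to keep the example short.

```sas
data hsb_labelled;
    input id female race ses schtype $ prog read write math science socst;
    /* Label text below is an assumption made for illustration */
    label id      = "Student identification number"
          female  = "Sex of student (1 = female, 0 = male)"
          schtype = "Type of school attended"
          read    = "Reading score";
    datalines;
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
;
run;

/* The LABEL option asks PROC PRINT to show labels as column headings */
proc print data=hsb_labelled label;
run;
```

A single LABEL statement can label several variables; only the final pair is followed by the semicolon.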
Format Statement
Format statements associate formats with variables in a dataset. For example,
in the hsb10 dataset the variable female has two values: 1 indicates the
person is female, while 0 indicates the person is male. To associate these
values with appropriate value labels, a proc format step is used, which is then
referenced within the data step by a format statement. Note that the
proc format step (associating the values) must be run prior to the data
step using the formats.
proc format;
    value female 1="female" 0="male";
    value $schtype "pub"="public school" "pri"="private school";
run;
data hsb10;
    input id female race ses schtype $ prog read write math science
          socst;
    format female female. schtype $schtype.;
…………etc
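The format definitions above can be tried in a self-contained sketch. It uses three of the hsb10 records shown earlier; the dataset name hsb_formatted is hypothetical, and the PROC FREQ step is an addition here, included only to show the formatted values in output.

```sas
proc format;
    value female 1="female" 0="male";
    value $schtype "pub"="public school" "pri"="private school";
run;

data hsb_formatted;
    input id female race ses schtype $ prog read write math science socst;
    /* Attach the formats defined above to the two variables */
    format female female. schtype $schtype.;
    datalines;
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pri 3 50 33 49 44 36
;
run;

/* Frequency tables display the value labels rather than the raw codes */
proc freq data=hsb_formatted;
    tables female schtype;
run;
```

The stored values are unchanged (female is still 0 or 1); the format only affects how they are displayed.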
Comment Statements
/* This is a comment */
/* So is this */
/* This comment
spans several lines */
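Comments delimited by /* and */ can also sit on the same line as a statement. SAS additionally accepts a statement-style comment that begins with an asterisk and ends with a semicolon. A small sketch (the dataset name comment_demo is hypothetical):

```sas
* A statement-style comment: starts with an asterisk, ends with a semicolon;
data comment_demo;
    x = 1;       /* a comment after a statement on the same line */
    y = x + 1;
run;
```

Because a statement-style comment ends at the first semicolon, it cannot itself contain a semicolon; the /* ... */ form has no such restriction.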