SOR 202 SAS Laboratory Session 1

Introduction to SAS: Tutorial 1


SAS (Statistical Analysis System) is a statistical analysis and data
management software package. SAS can read data from almost any type
(and size) of file. The analyses that can be performed by the SAS software
are extensive, ranging from the generation of basic descriptive statistics to
complex statistical analyses. The software can also easily produce tabulated
reports, charts, and plots of distributions and trends.

SAS for Windows


To open SAS either:

• Double-click the shortcut icon on the Desktop (if available),

or

• From the Start Menu select:

Start > Programs > General Apps > SAS > SAS 9.1 (English)

Usually a “Getting Started with SAS” dialog box appears on opening SAS.
Selecting “Start Guides” will open up the “SAS Help and Documentation”
window. Feel free to explore these “Getting Started with SAS” help-pages
(select “New SAS Programmer (quick-start guide)” from the drop-down
menu).

NB: While these hand-outs aim to give an introductory description of using SAS,
the help facilities within SAS provide a more complete description of all of
the procedures that can be used within it, and contain many illustrative
examples. The SAS Help and Documentation window can be opened from
the main SAS window through the “Help” menu, by selecting “SAS Help and
Documentation”.

Overview of SAS Windows

When you first start SAS, the five main SAS windows open, namely the:

• Explorer,
• Results,
• Program Editor or Editor,
• Log, and
• Output windows.

This quick walkthrough shows you how each of these windows is used.

The Results and Explorer windows are usually ‘docked’ on the left-hand side.

NB: If you accidentally close any of these windows they can be reopened
through the “View” Menu. Also, an individual window can be docked by first
selecting it and then selecting “Docked” from the “Windows” menu.

The Explorer Window

In the Explorer window, you can view and manage your SAS files and create
shortcuts to files that are not formatted by SAS. Use this window to:

• create new SAS libraries and SAS files
• open any SAS file
• perform most file management tasks such as moving, copying, and
deleting files
• create file shortcuts.

Double-clicking the “Libraries” icon lets you view the libraries (and,
within these, the datasets) that you can access.

The Work library is the default library in which new datasets are stored
when no library is specified. Note, however, that the data stored in this
library is deleted at the end of each SAS session.

The Sashelp library is a permanent library that contains sample data and
other files that control how SAS works at your site. This is a read-only
library.

The Sasuser library is a permanent library, and is a convenient place to
store your own files.

Use the “Up one level” icon to navigate through the libraries as necessary
(also available through the “View” menu).

NB: In the Explorer window, each SAS dataset is shown with the SAS dataset icon.
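In SAS code, a dataset in a particular library is referred to by a two-level name of the form libref.dataset (e.g. sashelp.class). A one-level name refers to the Work library. The sketch below assumes three things:

- The sample dataset class is available in Sashelp, as it is in standard installations.
- For the last step only, your Sasuser library is writable. At some sites it is read-only.
- The dataset names class_copy and class_copy2 are hypothetical.

```sas
/* The SET statement reads observations from an existing dataset. */

/* Copy the read-only sample data from Sashelp into the Work library */
data work.class_copy;
   set sashelp.class;
run;

/* A one-level name also refers to the Work library */
data class_copy2;
   set sashelp.class;
run;

/* Store a copy in Sasuser, which persists between SAS sessions */
/* (assumes Sasuser is writable at your site)                    */
data sasuser.class_copy;
   set sashelp.class;
run;
```

Both work copies disappear when the SAS session ends; the Sasuser copy remains.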

The Editor Window

You can use the Editor window to enter, edit, and submit SAS programs.

The initial Editor window title is Editor - Untitledn, where n is a number
(e.g. Editor - Untitled1). When you open a file or save the contents of the
Editor window to a file (a *.sas file), the window title changes to reflect
that file name.

When the contents of the Editor window are modified, an asterisk is added to
the title to indicate the contents have not been saved in their current form.

You can have multiple Editor windows open at the same time.

The Log Window

The Log Window displays messages about your SAS session and any SAS
programs that you submit.

The Output Window

The Output Window displays the output from SAS programs that you submit.
It automatically opens or moves to the front of your display when you create
output.

The Results Window

The Results Window helps you navigate and manage output from SAS
programs that you submit. You can view, save, and print individual items of
output.

Data Entry in SAS

Before you can work with your data in SAS, the data must be in a special form
called a SAS dataset, so understanding SAS datasets is the first step in
learning SAS programming.

Importing Data

If you have PC database files such as Microsoft Excel spreadsheets or
Microsoft Access files, you can use SAS to import these files and create SAS
datasets. Once you have the data in SAS datasets, you can process them as
needed in SAS. (NB: In a similar way, you can also export SAS data to a
number of PC file formats.)
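As a sketch of exporting, the PROC EXPORT step below writes the Sashelp sample dataset class to a comma-separated file. The output path H:\SOR202\class.csv is an assumption; change it to a folder in your own home directory.

```sas
/* Export the Sashelp sample dataset CLASS to a CSV file.       */
/* The output path below is an assumption: use your own folder. */
proc export data=sashelp.class
            outfile="H:\SOR202\class.csv"
            dbms=csv
            replace;   /* overwrite the file if it already exists */
run;
```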

To read PC database files, you use the IMPORT procedure. PROC IMPORT
reads the input file and writes the data to a SAS dataset, with the SAS
variables defined based on the input records.
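The Import Wizard described below generates a PROC IMPORT step for you. The hand-written sketch here assumes two things:

- ExamResults.xls has been saved to H:\SOR202.
- SAS/ACCESS Interface to PC Files is licensed at your site, which DBMS=EXCEL requires.

Depending on your SAS release, the code the wizard generates may use different options.

```sas
/* Sketch: import the "Results" worksheet of ExamResults.xls.  */
/* Assumes the file is in H:\SOR202 and that SAS/ACCESS to PC  */
/* Files is licensed (needed for DBMS=EXCEL).                  */
proc import datafile="H:\SOR202\ExamResults.xls"
            out=work.examresults
            dbms=excel
            replace;          /* overwrite an existing dataset  */
   sheet="Results";           /* worksheet to read              */
   getnames=yes;              /* first row holds variable names */
run;
```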

There is a SAS Import Wizard available from the main “File” Menu.

Example: import the Excel file ExamResults.xls into SAS using the Import
Wizard.

• ExamResults.xls can be downloaded from Queen’s Online and saved
within your home directory (or a sub-directory of your home directory,
e.g. a sub-directory called “SOR202”).
• To import this data into SAS, select “Import Data…” from the “File”
menu.
• The data source type in this case is “Microsoft Excel 97, 2000 or 2002
Workbook”. Select “Next” once it has been chosen from the drop-down menu.

• Select the “Browse” button and navigate to where you have stored
the Excel file within your home directory.
• The workbook contains a single worksheet named “Results”, and
this is the table you wish to import (keep the default settings within
“Options…”).
• The next step is to indicate where within SAS the imported data
should be stored. Selecting the Work library will store the SAS
dataset in this temporary library. The name to be given to the
dataset is entered under “Member” (here the dataset has been
named “EXAMRESULTS”).

• The final step of the Import Wizard allows the corresponding
SAS code to be generated, in case you wish to import this data
again without having to use the Import Wizard. Save the
code within your home directory to a *.sas file, e.g.
ImportResults.sas. Select “Finish” to import the data.

There are several things to note now that this data has been imported:

• The Log window shows whether or not the dataset was
successfully created.
• Navigating through the Libraries via the Explorer Window you
should now be able to see the dataset “Examresults” within the
“Work” library.

• Double-clicking on the “Examresults” dataset icon lets you
view the contents of the imported data within the dataset.

NB: This Viewtable window MUST be closed if you wish to perform other
manipulations on the dataset.

• It is possible to delete this dataset from the Work library by
right-clicking on the “Examresults” dataset icon and selecting “Delete”
(ensure the dataset is not being viewed as a Viewtable at the
same time).
• The code used to import the Excel data into the SAS dataset
“ExamResults” can be viewed by opening the .sas file. Select
“Open Program…” under the File menu, and navigate to where
the .sas file ImportResults.sas has been stored in your home
directory.

• To re-generate the dataset “ExamResults”, highlight the
portion of SAS code that you wish to run and then select the
Submit icon on the toolbar (or select “Submit” from the “Run” menu).

The dataset “ExamResults” should be re-generated by this
code (the Log window will indicate whether the code was successful) and
can be viewed via the Explorer window in the Work library.
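The same checks and the deletion can also be done in code. This sketch assumes the dataset work.examresults has been created as described above.

- PROC CONTENTS lists the variables and their types.
- PROC PRINT lists observations in the Output window.
- PROC DATASETS with a DELETE statement removes the dataset from the Work library. As with right-click deletion, close any Viewtable on the dataset first.

```sas
/* List the variables, their types and lengths */
proc contents data=work.examresults;
run;

/* Print the first 10 observations in the Output window */
proc print data=work.examresults (obs=10);
run;

/* Delete the dataset from the Work library */
proc datasets library=work nolist;
   delete examresults;
quit;
```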

Creating a New Dataset using the DATA statement

Data can also be input as a SAS dataset from the keyboard or read from a file
using a DATA statement. How the SAS code will look depends upon the
structure of the data being input.

Example: An example piece of SAS code is given below for creating a SAS
dataset from the data stored in the (space-delimited) text file cancer.txt
(downloadable from Queen’s Online):

data cancer;
   infile "H:\SOR202\cancer.txt" firstobs=2;
   input obs_no id time status stain $;
run;

• The first line of the code above tells SAS that you want to
generate a SAS dataset called cancer (use an appropriate name for the
SAS datasets you create, e.g. trial, company, drug: a name of 32 or
fewer characters).
• NB: Note the semicolon (;) at the end of each statement; every SAS
statement must end with one.
• The INFILE statement identifies the file from which the data are read,
given in quotes in the form “pathname\filename”.
• NB: It is good programming practice to indent the lines between the DATA
statement and the RUN statement. Note that SAS executes a DATA or PROC
step only when it reaches a step boundary, so end each step with a RUN
statement.
• The FIRSTOBS=2 option tells SAS that the first observation for
this data occurs in row 2 of the text file (row 1 of the text file contains the
column headings, i.e. the variable names).

• The INPUT statement provides SAS with details of the variables
contained within the file: here, the dataset contains 5 variables.
• The $ sign following the final variable, stain, indicates that this is a
character (alphanumeric) variable.

Variables in a dataset
Variable names can contain from 1 to 32 characters. They may contain
letters, numbers and underscores, but must begin with a letter or an
underscore, and cannot contain blanks.

• Here SAS assumes that the values in each row of the text file are separated
by blank spaces (its default assumption). However, this can be changed using
the DELIMITER= option within the INFILE statement. For example, the text
file cancer_commas.txt (downloadable from Queen’s Online) contains the same
data as cancer.txt, but the values in each row are separated by commas. To
load this text file into SAS, the INFILE statement would need to be modified
as follows:

data cancer2;
   infile "H:\SOR202\cancer_commas.txt" firstobs=2 delimiter=",";
   input obs_no id time status stain $;
run;
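Values in a comma-delimited file may be missing (two commas in a row). A common alternative for such files is the DSD option of the INFILE statement, which does three things:

- It treats consecutive delimiters as a missing value.
- It removes quotes from quoted values.
- It uses the comma as the default delimiter.

This sketch assumes the same file and variables as above. The dataset name cancer3 is hypothetical.

```sas
data cancer3;
   /* DSD: consecutive commas give a missing value, quotes around */
   /* values are removed, and the comma is the default delimiter  */
   infile "H:\SOR202\cancer_commas.txt" firstobs=2 dsd;
   input obs_no id time status stain $;
run;
```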

Other types/formats of Data

i) Fixed Format

In the case of files where variables are in a fixed format (particular columns of
the file correspond to particular variables, and this holds for every row of the
file), the INPUT statement seen above can be modified to specify
where each variable is found in each row of the file.

Example: An example piece of SAS code is given below for creating a SAS
dataset from the data stored in the text file hypernephroma.txt (downloadable
from Queen’s Online):

data hyper;
   infile "H:\SOR202\hypernephroma.txt" firstobs=2;
   input treatment $1-14 status 17 time 25-27 age $32-36;
run;

• The INPUT statement above indicates that the treatment variable is located in
columns 1 to 14 of each row (and is a character variable).
• The status variable is located in column 17 of the text file, time in
columns 25 to 27, and age in columns 32 to 36 (read as a character variable).
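The same fixed-format layout can also be written with formatted input. Here @n moves the input pointer to column n, and an informat gives the field width: $14. is character with width 14, and 3. is numeric with width 3. This sketch is equivalent to the column ranges above. The dataset name hyper2 is hypothetical.

```sas
data hyper2;
   infile "H:\SOR202\hypernephroma.txt" firstobs=2;
   /* @n moves the input pointer to column n; the informat */
   /* width says how many columns to read from that point  */
   input @1  treatment $14.   /* columns 1-14, character  */
         @17 status     1.    /* column 17                */
         @25 time       3.    /* columns 25-27            */
         @32 age        $5.;  /* columns 32-36, character */
run;
```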

ii) More than one line per observation

If a datafile contains more than one line per observation, the INPUT statement
needs to indicate the line number (using a # symbol) before specifying the
variables on that line, e.g.
input id 1-3 company 8-10
      #2 insal 6-10 finalsal 18-23
      #3 retire 15-19;

• The above INPUT statement informs SAS that there are 3 lines of
data for each observation.
• On the first line of each observation, the variable id is located in columns
1 to 3, while the variable company is located in columns 8 to 10.
• On the second line of each observation, the variable insal is located in
columns 6 to 10, while the variable finalsal is located in columns 18 to
23. On the third line, the variable retire is located in columns 15 to 19.
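As an alternative to #n, a slash (/) in the INPUT statement moves the input pointer to the start of the next line. The statement above can therefore also be written as below. In this sketch, the dataset name salaries and the file salaries.txt are hypothetical.

```sas
data salaries;                       /* hypothetical dataset name */
   infile "H:\SOR202\salaries.txt";  /* hypothetical file name    */
   /* Each slash moves to the next line of the same observation */
   input id 1-3 company 8-10 /
         insal 6-10 finalsal 18-23 /
         retire 15-19;
run;
```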

iii) Mixed style

INPUT statements can also be written in a shorter, mixed style, e.g.

input id 1-2 sex $ 3 (exp school) (1.) (C1-C10) (1.) (M1-M10) (1.)
(MATHSCOR COMPSCOR) (2.);

• In the above statement, the variable id is read from columns 1-2 and
sex from column 3. The next two variables, exp and school, have a
width of 1 column each and start at column 4. The variables C1-C10
(10 variables in sequential order) have a width of one column each (in
columns 6-15 of the datafile). The variables M1-M10 have a width of
one column each (in columns 16-25 of the datafile). The last two
variables, MATHSCOR and COMPSCOR, have a width of two columns each,
starting at column 26.
• If you wish to skip data within a datafile (e.g. in the above, read in only
the variable id and the last two variables MATHSCOR and COMPSCOR), you
can use the @ symbol within the INPUT statement (here @26 moves the
input pointer to column 26):

input id 1-2 @26 (MATHSCOR COMPSCOR) (2.);
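Using the @ column pointer throughout makes the starting column of each group explicit. The sketch below reads the same layout as the mixed-style statement above. The dataset name scores and the file scores.txt are hypothetical.

```sas
data scores;                         /* hypothetical dataset name */
   infile "H:\SOR202\scores.txt";    /* hypothetical file name    */
   input id 1-2 sex $ 3
         @4  exp 1. school 1.        /* columns 4 and 5           */
         @6  (C1-C10) (1.)           /* columns 6-15              */
         @16 (M1-M10) (1.)           /* columns 16-25             */
         @26 MATHSCOR 2. COMPSCOR 2.;  /* columns 26-27 and 28-29 */
run;
```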

Data input directly from the keyboard

To input data directly from within SAS, the DATALINES statement is used. For
example, the code below creates a dataset called hsb10 containing 10 records
and 11 variables, 10 of which are numeric and 1 of which is character.

data hsb10;
   input id female race ses schtype $ prog read write math science
         socst;
   datalines;
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pri 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
;
run;
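To check that the data were read as intended, PROC CONTENTS lists each variable with its type (Num or Char), and PROC PRINT lists the records in the Output window. This sketch assumes the hsb10 DATA step above has been run in the same SAS session.

```sas
/* Variable names and types: expect 10 Num and 1 Char (schtype) */
proc contents data=hsb10;
run;

/* List all 10 records in the Output window */
proc print data=hsb10;
run;
```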

Other Useful Statements

Label Statement

You can use a LABEL statement to give labels to variables. While a SAS
variable name is limited to 32 characters, a label (which many procedures
display in output in place of the variable name) can have up to 256
characters, including blanks. Labels should be enclosed in quotes and the
LABEL statement terminated by a semicolon, e.g. add the following LABEL
statements to the hsb10 DATA step (before the DATALINES statement):

label schtype="School type";
label math="Mathematics score";
label science="Science score";
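A complete DATA step with the labels in place is sketched below, using the first three records of hsb10. The dataset name hsb_lab is hypothetical. Two points to note:

- The LABEL statement, which may assign several labels at once, must come before DATALINES.
- PROC PRINT displays labels as column headings only when the LABEL option is given.

```sas
data hsb_lab;                        /* hypothetical dataset name */
   input id female race ses schtype $ prog read write math science
         socst;
   /* LABEL must appear before DATALINES; one statement can set several labels */
   label schtype="School type"
         math="Mathematics score"
         science="Science score";
   datalines;
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pri 3 50 33 49 44 36
;
run;

/* The LABEL option makes PROC PRINT use labels as column headings */
proc print data=hsb_lab label;
run;
```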

Proc Format Statement

PROC FORMAT creates formats, which can then be associated with variables in a
dataset. For example, in the hsb10 dataset the variable female has two values:
1 indicates the person is female, while 0 indicates the person is male. To
associate these values with appropriate value labels, a PROC FORMAT step is
used, and the formats are then referenced within the DATA step by a FORMAT
statement. Note that the PROC FORMAT step (which defines the formats) must be
run before the DATA step that uses them.

proc format;
   value female 1="female" 0="male";
   value $schtype "pub"="public school" "pri"="private school";
run;

data hsb10;
   input id female race ses schtype $ prog read write math science
         socst;
   format female female. schtype $schtype.;
   …………etc
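The DATA step above is abbreviated, so here is a complete sketch that combines PROC FORMAT with a DATA step using the formats. It reuses the first three records of hsb10. The dataset name hsb_fmt is hypothetical. PROC PRINT then shows the formatted values (e.g. female rather than 1).

```sas
/* Define the formats first */
proc format;
   value female 1="female" 0="male";
   value $schtype "pub"="public school" "pri"="private school";
run;

data hsb_fmt;                        /* hypothetical dataset name */
   input id female race ses schtype $ prog read write math science
         socst;
   /* Attach the formats: variable name followed by format name */
   format female female. schtype $schtype.;
   datalines;
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pri 3 50 33 49 44 36
;
run;

/* Formatted values (e.g. "public school") appear in the output */
proc print data=hsb_fmt;
run;
```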

Comment Statements

It is good programming practice to place comments in your code for
documentation purposes. Text enclosed in /* ……… */ is ignored by
SAS when the program is executed, e.g.

/* This is a comment */
/* So is this */
/* This comment
spans several lines */
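SAS also accepts statement comments. These begin with an asterisk and end at the next semicolon, so they cannot themselves contain a semicolon. The small sketch below shows both comment styles. The DATA _NULL_ step writes a message to the Log window without creating a dataset.

```sas
/* A block comment can span
   several lines */
* A statement comment starts with an asterisk and ends at the semicolon;

data _null_;                          /* DATA _NULL_ creates no dataset */
   put "This message is written to the Log window";
run;
```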
