
SAS Fundamentals

Prepared by: G. Narasimhan, TCS, Sholinganallur, 07/07/2003

SAS Overview

SAS stands for Statistical Analysis System.

Developed by SAS Institute
High-level language
Fourth-generation language

SAS Software
There is a variety of SAS software products. The most commonly used are listed below with their applications.

SAS Software    Applications
BASE SAS        Report generation, mathematical and data analysis
SAS/ACCESS      Interface to DB2
SAS/AF          User-friendly windowing applications
SAS/CONNECT     Communication with remote SAS sessions
SAS/FSP         Interactive data entry and retrieval facilities
SAS/GRAPH       Visual representation of data analysis
SAS/NVISION     Creation of 3D objects, animation and prototyping
SAS/SHARE       Concurrent update access to SAS files

Applications
Statistical and mathematical analysis
Report generation
Graphics
Business forecasting
Animation
Modelling
Data analysis
Operations research

Features of SAS
Portability
Free-format language
Statements are not case sensitive
Macro facility simplifies code development
Keywords can be used as variable names
Declaration of variables is not required
Statements can span more than one line
More than one statement can be placed on a single line

Understanding SAS Dates

In SAS every date is stored as a unique number: the number of days between January 1, 1960 and that date.

January 1, 1960 itself is stored as 0
Dates before January 1, 1960 are stored as negative numbers
Dates after January 1, 1960 are stored as positive numbers

Date            SAS date value
Jan. 1, 1959    -365
Jan. 1, 1960    0
Jan. 1, 1961    366
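The day-count scheme can be checked with a short DATA step. This is only a minimal sketch: the variable names d1 to d3 are illustrative, and the date literals (such as '01JAN1960'd) are a standard SAS feature that the slides above do not introduce.

```sas
/* Sketch: print the internal numeric values of three dates. */
DATA _NULL_;
  d1 = '01JAN1959'd;   /* one year before the reference date  */
  d2 = '01JAN1960'd;   /* the reference date itself           */
  d3 = '01JAN1961'd;   /* one year (a leap year) after it     */
  PUT d1= d2= d3=;     /* the log shows d1=-365 d2=0 d3=366   */
  PUT d3= DATE9.;      /* the same value written as a date    */
RUN;
```

Because dates are plain numbers, subtracting one date from another gives the number of days between them.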

Rules of SAS Language

Every statement must end with a semicolon
Variable names must begin with a letter or an underscore and should not exceed eight characters (SAS Version 7 and later allow up to 32 characters)
Variable names should not have embedded blanks
Variable names should not duplicate the names of SAS automatic variables
Comments are delimited by /* and */ , or begin with * and end with a semicolon
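A few of these rules can be seen together in a small program. The sketch below is illustrative only; the dataset name demo and the variables x and y are made up for the example.

```sas
* A statement-style comment: starts with an asterisk, ends with a semicolon;
/* A block-style comment: delimited by slash-star and star-slash */

/* Keywords may be typed in any case, and two statements may share a line. */
data demo; x = 1;
  /* One statement may also span several lines; only the semicolon ends it. */
  y = x
      + 2;
run;

PROC PRINT DATA = demo;
RUN;
```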

SAS Variables

SAS variables can be classified as follows:

Numeric variables: stored in a maximum of 8 bytes
Character variables: store a maximum of 200 bytes (the Version 6 limit; Version 7 and later allow up to 32,767 bytes)
Macro variables: referenced with an & prefix

Attributes of a SAS Variable


Every SAS variable can have a maximum of six attributes assigned to it. They are:

NAME        Variable name
TYPE        Variable type
LENGTH      Number of bytes the variable can store
FORMAT      Instructions that tell the SAS System how to write data values
INFORMAT    Instructions that tell the SAS System how to read data values into variables
LABEL       Label assigned to the variable
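As a rough illustration, the attributes can be assigned with the LENGTH, INFORMAT, FORMAT and LABEL statements and then inspected with PROC CONTENTS. The dataset staff, its variables and the data values are hypothetical and are not taken from the slides.

```sas
/* Sketch: assign variable attributes explicitly, then list them. */
DATA staff;
  LENGTH name $ 15 age 8;            /* TYPE and LENGTH              */
  INFORMAT joined DATE9.;            /* how the raw value is read    */
  FORMAT joined DATE9.;              /* how the value is written     */
  LABEL name   = 'Employee name'     /* descriptive labels           */
        joined = 'Date of joining';
  INPUT name $ age joined;
  CARDS;
RAMESH 32 07JUL2003
GOPAL 34 15AUG2001
;
RUN;

/* The descriptor portion shows name, type, length, format, informat, label. */
PROC CONTENTS DATA = staff;
RUN;
```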

Selected INFORMATS
Informats are special instructions used to read data values into a variable. Some of the informats are:

$CHARw.     Reads character data, preserving leading blanks
COMMAw.d    Reads numeric data, removing embedded characters such as commas
DATEw.      Reads dates in the form ddmmmyy
$w.         Reads standard character data

Selected FORMATS
Formats are special instructions the SAS System uses to write data values. E.g.:

$CHARw.      Writes standard character data
BESTw.       Default format for writing numeric values
COMMAw.d     Writes numeric values with commas separating every three digits
DATEw.       Writes date values in the form ddmmmyy
DOLLARw.d    Writes numeric values with dollar signs, commas and decimal points
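The difference between reading with an informat and writing with a format can be sketched as follows. The INPUT function used here to apply an informat to a text value is standard SAS but is not covered on the slides; the variable names and values are illustrative.

```sas
/* Sketch: informats turn text into stored values, formats turn them back. */
DATA _NULL_;
  amount = INPUT('1,234,567', COMMA9.);  /* commas stripped while reading     */
  joined = INPUT('07JUL2003', DATE9.);   /* stored as a number of days        */
  PUT amount COMMA12.;                   /* written with commas               */
  PUT amount DOLLAR14.2;                 /* dollar sign, commas, two decimals */
  PUT joined DATE9.;                     /* written as ddmmmyyyy              */
  PUT joined;                            /* no format: the raw number of days */
RUN;
```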

SAS Automatic Variables

Two automatic variables (both numeric) are created for each DATA step. They are:

_N_        Initially set to 1. Incremented by 1 each time the DATA step iterates.
_ERROR_    Default value is 0. Set to 1 whenever an error is encountered.
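A small sketch of how the two variables behave during execution. The dataset scores and the variable iteration are hypothetical; the value of _N_ is copied into an ordinary variable because automatic variables are not written to the output dataset.

```sas
/* Sketch: watch _N_ count the iterations of the DATA step. */
DATA scores;
  INPUT score;
  iteration = _N_;            /* keep the iteration number in the dataset */
  PUT _N_= _ERROR_= score=;   /* one log line per iteration               */
  CARDS;
10
20
30
;
RUN;

PROC PRINT DATA = scores;
RUN;
```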

SAS Operators

SAS Operators can be classified as :


Arithmetic Operators

Comparison Operators
Logical Operators

Arithmetic Operators

Operation         Symbol    E.g.        Meaning
Addition          +         X=Y+Z;      Adds Y and Z
Subtraction       -         X=Y-Z;      Subtracts Z from Y
Multiplication    *         X=Y*Z;      Multiplies Y by Z
Division          /         X=Y/Z;      Divides Y by Z
Exponentiation    **        X=Y**Z;     Raises Y to the Z power

Comparison Operators

Symbol      Mnemonic Operator    Meaning
=           EQ                   Equal to
~= or ^=    NE                   Not equal to
>           GT                   Greater than
<           LT                   Less than
>=          GE                   Greater than or equal to
<=          LE                   Less than or equal to

Logical Operators

Symbol     Mnemonic Operator
&          AND
|          OR
~ or ^     NOT

Other Operators

Symbol    Mnemonic Operator    Meaning
||                             Concatenation
><        MIN                  Minimum of two values
<>        MAX                  Maximum of two values
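The operators above can be tried out in a DATA step that creates no dataset. This is a minimal sketch with made-up variable names; comparison and logical expressions evaluate to 1 for true and 0 for false.

```sas
/* Sketch: one expression per operator family. */
DATA _NULL_;
  y = 7;
  z = 2;
  power   = y ** z;                    /* exponentiation: 49          */
  smaller = y >< z;                    /* minimum of the two: 2       */
  larger  = y <> z;                    /* maximum of the two: 7       */
  both    = (y GT z) AND (z NE 0);     /* both comparisons true: 1    */
  product = 'SAS' || '/' || 'GRAPH';   /* concatenation: SAS/GRAPH    */
  PUT power= smaller= larger= both= product=;
RUN;
```

Note that in a WHERE expression the symbol <> is read as "not equal to" rather than as MAX, so the mnemonic forms are the safer choice there.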

SAS Program
A SAS program is a collection of SAS steps. A SAS step is either a DATA step or a PROC step.

A DATA step starts with the keyword DATA and is used for creating SAS datasets.
A PROC step starts with the keyword PROC and is used for processing SAS datasets.

SAS Step Boundaries

A SAS step boundary can be identified in two ways:

RUN statement
Beginning of the next SAS step

A RUN statement marks the end of a SAS step. If no RUN statement is present, the beginning of the next SAS step marks the end of the previous SAS step.

SAS step boundary terminated by a RUN statement. E.g.:

/* SAS step1 */
DATA sample;
  INFILE ext;
  INPUT @001 name $ @030 age;   /* $ reads name as character */
RUN;
/* SAS step2 */
PROC PRINT DATA = sample;
RUN;

SAS step boundary terminated without a RUN statement. E.g.:

/* SAS step1 */
DATA sample;
  INFILE ext;
  INPUT @001 name $ @030 age;   /* $ reads name as character */
/* SAS step2: the PROC statement ends step1 */
PROC PRINT DATA = sample;
RUN;

Running SAS Programs

A SAS program can be run in one of two modes:

Batch Mode
In this mode, SAS programs are submitted through JCL.

Interactive Mode
This mode requires a SAS session called a Display Manager Session (DMS), which is a user-friendly environment. The DMS screen is split into two halves: the upper half contains the SAS Log and the lower half contains the Program Editor. SAS programs can be keyed in and executed from the Program Editor.

SAS Output
When you execute a SAS program, the output generated by SAS is divided into two major parts:

( i ) SAS Log: contains information about the processing of the SAS program, including any warning and error messages
( ii ) Output: contains reports generated by SAS procedures and DATA steps

In batch SAS, output is routed to SASLIST by default. The output can also be routed to an external file with the help of the FILE statement.

SAS Data Library General Classification

SAS Data Library
  SAS Datasets
    Members of type Data
    Members of type View
  SAS Catalogs
    Members of type Catalog
  Other Files
    Members of type Access
    Members of type Program

SAS Data Libraries


A SAS data library is a collection of SAS files. Each SAS library can be classified as:

Work (or temporary) library
Permanent library

A work library is deleted at the end of a SAS session and is not available in later sessions. It is referenced by a one-level name, or by a two-level name with WORK as the first qualifier.

A permanent library is not deleted at the end of a SAS session and remains available. It is referenced by a two-level name whose first level must not be WORK.

Concept of Data sets


A SAS dataset can be viewed as a rectangular structure of rows and columns. Each row is referred to as an observation (logically one record), and each column is a variable whose name serves as the field name.

A SAS dataset is different from a mainframe dataset.
SAS datasets are recognised only by the SAS System.
SAS datasets cannot be accessed from programming languages other than SAS.
The SAS System can, however, fetch data from mainframe datasets.

SAS Data Set

Descriptor Information

Observation

Variable

Descriptor Portion
The descriptor portion of a SAS data set contains :

General information about the SAS data set ( Data set name, number of observations, and so on)
Variable attributes ( Variable name, type, length, position, informat, format, label )

SAS Dataset Types

SAS Dataset

SAS Data File ( Contains Data Values )

SAS Data View ( Does not contain Data Values )

Parts of a Dataset
Every SAS data set name has three elements:

Libref
Data set name
Member type

General form of a SAS data set name:

libref.data-set-name.member-type

where
libref is the logical name of a SAS data library
data-set-name is the name of the data set
member-type is DATA for SAS data files and VIEW for SAS data views (this is assigned by the SAS System)

Referencing a SAS Data set


A SAS data set can be referenced by a two-level or a one-level name.

Two-Level Name
SAS data sets referenced this way are stored permanently in a SAS library.
General form: libref.data-set-name

One-Level Name
SAS data sets referenced this way are deleted at the end of the current SAS session.
General form: data-set-name
These data sets are assigned to a scratch library called the WORK library.
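The two naming styles might look as follows in practice. The LIBNAME statement, which assigns a libref, is standard SAS but is not shown on the slides; the libref mylib and the library name 'hypothetical.sas.library' are placeholders that must be replaced with a library that exists on your system.

```sas
/* Sketch: permanent (two-level) versus temporary (one-level) names. */
LIBNAME mylib 'hypothetical.sas.library';   /* placeholder library name */

DATA mylib.details;      /* two-level name: kept after the session ends */
  empno = 101;
RUN;

DATA scratch;            /* one-level name: stored as WORK.SCRATCH      */
  empno = 102;
RUN;

PROC PRINT DATA = WORK.scratch;   /* the same temporary data set, fully qualified */
RUN;
```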

Getting Information from Raw Data


Data -> DATA Step -> SAS Datasets -> PROC Steps -> Information

Where do the data come from?

Input to a SAS dataset can be any external file (a TSO file, a PS file, a member of a PDS, a VSAM file), another SAS dataset or even an Excel file. Data retrieved from external sources must be converted to a SAS dataset, because the SAS System recognises only SAS datasets.

DATA Step

SAS stores information in the form of SAS datasets. A SAS dataset is created by using the DATA statement.
E.g.: DATA details;
The above statement creates a SAS dataset called details. Data can be supplied to a SAS dataset through the INFILE, CARDS or CARDS4 statement.

NULL DATA Step


_NULL_
Naming _NULL_ on the DATA statement runs the DATA step without creating a dataset, so no observations are stored.
Used for printing purposes
Used for writing data to external files

E.g.: The following step writes the contents of a SAS dataset called details to an external file called outfile. The physical name of outfile is specified in the JCL.

DATA _NULL_;
  SET details;
  FILE outfile;
  PUT @001 EMPNO @005 NAME;
RUN;

DATAn Naming Convention

When no dataset name is specified on a DATA statement, SAS automatically names the datasets created DATA1, DATA2, ..., DATAn. This is called the DATAn naming convention.
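A short sketch of the convention, assuming a fresh SAS session in which no DATAn datasets exist yet. The variables x and y are illustrative.

```sas
/* Sketch: DATA statements with no name are numbered automatically. */
DATA;                 /* first unnamed step: creates WORK.DATA1  */
  x = 1;
RUN;

DATA;                 /* second unnamed step: creates WORK.DATA2 */
  y = 2;
RUN;

PROC PRINT DATA = DATA2;   /* prints the dataset from the second step */
RUN;
```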

Referencing the external file

The INFILE statement is used to reference the external file where the raw data is available. The INPUT statement then reads the data from that file.
E.g.:

DATA detail;
  INFILE 'uhgxn0.extrnl.file';       /* physical file name in quotes */
  INPUT @001 Name $ @032 Age;        /* $ reads Name as character    */
RUN;

These statements create a SAS dataset called detail containing two fields, Name and Age. The values for Name are read starting at column 1 of the external file ( uhgxn0.extrnl.file ) and the values for Age are read starting at column 32.

Reading a SAS Data Set


IA.AIRCRAFTDATA
Model     AircraftID    InService
MF4000    010012        10890
LF5200    030006        10300
LF5200    030008        11389

DATA Step

DATA ia.aircraftcap;
  SET ia.aircraftdata;
RUN;

IA.AIRCRAFTCAP
Model     AircraftID    InService
MF4000    010012        10890
LF5200    030006        10300
LF5200    030008        11389

Combining Datasets

The ways of combining datasets are:

Concatenation
Combines two or more datasets one after the other into a single dataset. This is accomplished using the SET statement.

Interleaving
Combines individually sorted datasets into one sorted dataset. This is accomplished using the SET and BY statements.

Merging
Combines observations from two or more datasets into a single observation in a new dataset. This is accomplished using the MERGE statement, with or without a BY statement.

Updating
Replaces the values of variables in one dataset ( the master dataset ) with non-missing values from another dataset ( the transaction dataset ). This is accomplished using the UPDATE and BY statements.

Concatenation

LIST1
Name      Age
Arijit    20
Mohan     22
Kumar     54

LIST2
Name      Age
Rohit     18
Raj       33
Sekar     17

DATA list3;
  SET list1 list2;
RUN;

LIST3
Name      Age
Arijit    20
Mohan     22
Kumar     54
Rohit     18
Raj       33
Sekar     17

Merging

Merging can be classified as:

One-to-One Merging
In one-to-one merging, no BY statement is used. The SAS System combines the first observation from all datasets named in the MERGE statement into the first observation of the new dataset, the second observation from all datasets into the second observation of the new dataset, and so on.

Match Merging
In match merging, datasets are merged according to the variables named in the BY statement.

One-to-One Merging
PAYROLL
Name     Age    Sex
Anil     22     M
Roopa    21     F

INCREASE
Name     Salary
Anil     34500
Roopa    26000
Kiran    22000

DATA newpay;
  MERGE payroll increase;
RUN;

NEWPAY
Name     Age    Sex    Salary
Anil     22     M      34500
Roopa    21     F      26000
Kiran    .             22000

Match Merging
WORK.ONE
X    Y
1    A
2    B
3    C

WORK.TWO
X    Z
1    A1
1    A2
2    B1
3    C1
3    C2

DATA work.three;
  MERGE work.one work.two;
  BY X;
RUN;

WORK.THREE
X    Y    Z
1    A    A1
1    A    A2
2    B    B1
3    C    C1
3    C    C2
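The match merge can be reproduced end to end with in-stream data. This sketch assumes that WORK.ONE holds X and Y and WORK.TWO holds X and Z, which is the reading that yields the variable order X, Y, Z shown for WORK.THREE; both inputs are already in X order, as the BY statement requires.

```sas
/* Sketch: build the two input data sets, then match-merge them by X. */
DATA one;
  INPUT x y $;
  CARDS;
1 A
2 B
3 C
;
RUN;

DATA two;
  INPUT x z $;
  CARDS;
1 A1
1 A2
2 B1
3 C1
3 C2
;
RUN;

DATA three;
  MERGE one two;   /* the value of Y is carried across each X group */
  BY x;
RUN;

PROC PRINT DATA = three;
RUN;
```

If an input data set is not already in BY-variable order, sort it with PROC SORT before the merge.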

Interleaving
LIST1
Name       Age
Anil       33
Sunil      12

LIST2
Name       Age
Karthik    17
Prakash    43

DATA list3;
  SET list1 list2;
  BY name;
RUN;

LIST3
Name       Age
Anil       33
Karthik    17
Prakash    43
Sunil      12

UPDATING
PAYROLL
Name     Salary
Anil     24500
Hari     32000
Kiran    16000

INCREASE
Name     Salary
Anil     34500
Hari     .
Kiran    26000

DATA newpay;
  UPDATE payroll increase;
  BY Name;
RUN;

NEWPAY
Name     Salary
Anil     34500
Hari     32000
Kiran    26000
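The update can be run as a complete program. This sketch assumes the transaction data set INCREASE carries a missing salary for Hari, which is why the master value 32000 survives in NEWPAY; both data sets are already in Name order.

```sas
/* Sketch: master data set updated by a transaction data set. */
DATA payroll;               /* master data set                          */
  INPUT name $ salary;
  CARDS;
Anil 24500
Hari 32000
Kiran 16000
;
RUN;

DATA increase;              /* transaction data set                     */
  INPUT name $ salary;
  CARDS;
Anil 34500
Hari .
Kiran 26000
;
RUN;

DATA newpay;
  UPDATE payroll increase;  /* missing values do not overwrite the master */
  BY name;
RUN;

PROC PRINT DATA = newpay;
RUN;
```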

PROCEDURES
Procedures are used to perform operations on SAS data sets.
SAS provides a large number of procedures, each invoked with the keyword PROC.
A few of the most widely used procedures, CONTENTS, SORT and PRINT, are discussed below.

CONTENTS Procedure

Used to browse the descriptor portion of a data set
Gives information about the data set and the variables present in it

General form of the CONTENTS procedure:

PROC CONTENTS DATA=SAS-data-set;
RUN;

PROC CONTENTS
E.g.:

DATA ONE;
INPUT @001 NAME $15. @020 AGE 2.;
CARDS;
RAMESH             12
GOPAL              34
RAJU               07
;
RUN;
PROC CONTENTS DATA = ONE;
RUN;

SORT Procedure
Rearranges the observations in a SAS data set
Can create a new SAS data set containing the rearranged observations (otherwise the input data set is replaced)
Can sort on multiple variables
Can sort in ascending (default) or descending order
Does not generate printed output
Treats missing values as the smallest possible value

Sorting a SAS Data Set


General form of the PROC SORT step:

PROC SORT DATA=input-SAS-data-set OUT=output-SAS-data-set <options>;
  BY <DESCENDING> by-variable(s);
RUN;

PROC SORT
E.g.:

DATA ONE;
INPUT @001 NAME $15. @020 AGE 2.;
CARDS;
RAMESH             12
GOPAL              34
RAJU               07
RAJU               07
;
RUN;
PROC SORT DATA = ONE NODUPLICATES;
BY NAME;
RUN;

The data set WORK.ONE will contain only three records. The duplicate record for RAJU is deleted.

Name      Age
GOPAL     34
RAJU      07
RAMESH    12

PRINT Procedure

Used to display the contents of a SAS data set. General form:

PROC PRINT <option-list>;
  VAR variable-list;
  ID variable-list;
  BY variable-list;
  PAGEBY BY-variable;
  SUMBY BY-variable;
  SUM variable-list;
RUN;
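A possible use of some of these statements is sketched below. The data set payroll, the grouping variable dept and all data values are hypothetical; the data lines are already in dept order, which the BY statement requires.

```sas
/* Sketch: print salaries by department with totals. */
DATA payroll;
  INPUT name $ dept $ salary;
  CARDS;
Anil A 34500
Hari A 32000
Kiran B 26000
;
RUN;

PROC PRINT DATA = payroll;
  BY dept;        /* one section of the report per department       */
  ID name;        /* name replaces the observation number column    */
  VAR salary;     /* only this variable is listed                   */
  SUM salary;     /* subtotal per department and a grand total      */
RUN;
```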


Thank You !
