
What is SAS ?

SAS: Statistical Analysis System


Why the Need for SAS ?
In the 1960s, the Statistics Department at
North Carolina State University was awarded an
agricultural research project. The people working
on the project needed computer software for IBM
mainframes that could access and manipulate
large volumes of data and perform statistical
analysis on the data. There was no package
available that met their needs, so they started
designing a solution.
History of SAS
The 1960s
Agricultural research at land-grant
universities
Business need: general-purpose statistical
software to manage and manipulate large
volumes of data and perform statistical
analysis.
SAS IDE
INTEGRATED DEVELOPMENT ENVIRONMENT (IDE)

COMPONENTS/WINDOWS

INTERACTIVE (USED FOR PROGRAMMING & DEBUGGING)
1. EXPLORER WINDOW
2. LOG WINDOW
3. ENHANCED EDITOR

NON-INTERACTIVE (USED FOR OUTPUT GENERATION ONLY)
4. OUTPUT WINDOW
5. RESULTS WINDOW
6. HTML WINDOW
SAS IDE
EXPLORER WINDOW : HAS FOUR COMPONENTS

THE PURPOSE OF THE EXPLORER WINDOW IS TO NAVIGATE A
SYSTEM / NETWORK

- IT BEHAVES LIKE AN ALTERNATIVE OS (FOR NON-WINDOWS
ENVIRONMENTS)
- IT HELPS IN THE STORAGE OF DATA & FILES

MY COMPUTER (LOGICAL DISKS)
FILE SHORTCUTS (DOCS)
FAVORITE FOLDERS (BROWSE)
LIBRARIES (STORE DATA SETS)

SAS INTEGRATES WITH ANY OPERATING SYSTEM
SAS IDE
LIBRARIES: THESE ARE THE LOGICAL FOLDERS IN SAS USED
FOR DATA MGMT AND FILE MGMT.

THERE ARE TWO TYPES:
DEPENDENT: CONNECTED LIBRARY
INDEPENDENT: SAS LIBRARY

DEPENDENT LIBRARY:

- A DEPENDENT LIBRARY IS ALWAYS CONNECTED TO AN
EXTERNAL DATA SOURCE (DBMS/RDBMS/FILE)

- A DEPENDENT LIBRARY CAN REFER TO A SINGLE DATA
SOURCE ONLY

- DOES NOT OCCUPY ANY MEMORY (THE DATA STAYS IN THE EXTERNAL SOURCE)
- DATA MANIPULATIONS ARE REFLECTED IN BOTH SAS
& THE EXTERNAL SOURCE

DISADVANTAGE : SECURITY OF THE DB MAY BE COMPROMISED
SAS IDE
Libraries refer to the physical location where SAS files are stored.
By default, several libraries are already defined by SAS:

1. WORK - used by SAS for storage of temporary files.

2. MAPS - contains SAS maps for most countries in the world.
These maps are used with the SAS GMAP procedure.

3. SASUSER - automatically generated by SAS to save SAS default
settings.

4. SASHELP - contains the SAS help catalogs, sample data sets, and
views (a type of data set) that describe every active library,
database, and catalog.

Data should not be stored in any of the default libraries; however,
new libraries can be defined so that they, too, are automatically
assigned each time SAS is started up (by selecting "Enable at startup"
when the library is first created).
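
The default libraries can be inspected from a program as well as from the Explorer window. The following is a minimal sketch, assuming a standard installation in which the SASHELP.CLASS sample data set is present:

```sas
/* Write every libref assigned in this session, with its engine */
/* and physical path, to the log.                               */
libname _all_ list;

/* SASHELP ships with sample data sets; CLASS is one of them.   */
proc print data=sashelp.class(obs=5);
run;
```

The first statement only reports on libraries; it does not create or change anything.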
SAS IDE
INDEPENDENT LIBRARY:

- THESE ARE NOT CONNECTED TO ANY EXTERNAL DATA
SOURCE (FOLDER BASED ONLY)

- THE USER HAS TO MANAGE ALL DATA & THE SECURITY OF THE DATA

DISADVANTAGE :

OCCUPIES MEMORY (MORE MEMORY USE MAY AFFECT
PROCESSING SPEED)

ADVANTAGES:

CAN BE USED TO STORE DATA FROM MULTIPLE DATA SOURCES

SECURITY CAN BE APPLIED TO THE LIBRARY ITSELF

SAS IDE
LIBRARY MODES: THERE ARE TWO TYPES

- TEMPORARY LIBRARY: SINGLE SESSION ONLY

- PERMANENT LIBRARY: MULTIPLE SESSIONS.

PROPERTIES OF A LIBRARY

1. NAME : MAX 8 CHARS, MUST START WITH A LETTER OR UNDERSCORE

2. ENGINE: DEFINES THE DATA SOURCE OF THE LIBRARY
(ORACLE, ACCESS)

3. ENABLE AT START UP: CHECKED = PERMANENT, UNCHECKED = TEMPORARY

4. PATH : DEFINES THE LOCATION & PARAMETERS TO CONNECT TO A
DB OR DATA SOURCE (C:\PROG FILES\....)

5. OPTIONS : SECURITY SETTINGS (R-READ, W-WRITE, A-ALTER).
OPTIONAL.


Assigning a Libref
You can use the LIBNAME statement to assign a
libref to a SAS data library.
General form of the LIBNAME statement:

LIBNAME libref 'SAS-data-library' <options>;

Rules for naming a libref:
must be 8 characters or less.
must begin with a letter or underscore.
remaining characters are letters, numbers or
underscores.
Assigning a Libref
Examples
Windows
libname abc 'c:\mydocuments\prog1';
UNIX
libname abc '/users/userid';
z/OS (OS/390)
libname abc 'userid.prog1.sasdata' disp=shr;
Note: On z/OS, DISP=OLD|SHR specifies the disposition of the file. The
default is OLD, which enables both read and write access. SHR
enables read-only access.
Assigning a Libref
Making the Connection

When you submit the LIBNAME statement, a
connection is made between a libref in SAS and the
physical location of files on your operating system.

When your session ends, the link between the libref
and physical location of your files is broken.
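
The life cycle of a libref can be sketched as follows. The libref MYLIB and the data set SAMPLE are hypothetical names, and the libref points at the WORK directory only so that the sketch runs without assuming any particular folder exists:

```sas
/* Assign: connect the libref MYLIB to a physical location.     */
/* pathname(work) returns the directory behind the WORK library. */
libname mylib "%sysfunc(pathname(work))";

/* Use: a two-level name (libref.dataset) writes into that location. */
data mylib.sample;
  x = 1;
run;

proc print data=mylib.sample;
run;

/* Clear: break the link explicitly instead of waiting for the   */
/* session to end. CLEAR does not delete the files themselves.   */
libname mylib clear;
```

In real use the quoted path would be a permanent folder, so the data set remains available when the libref is assigned again in a later session.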
SAS IDE
LOG WINDOW: IT DISPLAYS THE COMPILATION RESULTS, COLOUR
CODED

- ERRORS (RED)
- INFORMATION / NOTES (BLUE)
- SUGGESTIONS (MAROON)
- WARNINGS (GREEN)
- STATEMENTS (BLACK)

SUPPORTS DEBUGGING (FINDING AND REMOVING ERRORS).
ERRORS ARE OF FOUR TYPES

SYNTAX / RUN TIME / LOGICAL / DATA ERRORS

RUN TIME (EXTERNAL ERRORS)
LOGICAL (OUTPUT IS NOT CORRECT)

- OPTIMIZATION: (TIME PERIOD BASED EXECUTION)

- REAL TIME (TOTAL ELAPSED TIME TAKEN TO RUN THE STEP)
- CPU TIME (HOW MUCH PROCESSOR TIME THE STEP CONSUMED)
THE LOG CAN BE SAVED AS AN EXTERNAL FILE (.LOG)
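
A small sketch of what ends up in the log. The message text is made up for illustration; the %PUT lines are only a convenient way to produce one message of each kind:

```sas
/* Lines that start with NOTE:, WARNING: or ERROR: are written  */
/* to the log and colour coded like the system's own messages.  */
%put NOTE: this line is written to the log as a note.;
%put WARNING: this line is written to the log as a warning.;
%put ERROR: this line is written to the log as an error.;

/* Any step also produces a NOTE reporting its real time and    */
/* cpu time once it has finished.                               */
data _null_;
  do i = 1 to 1000000;
    x = sqrt(i);
  end;
run;
```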
SAS IDE
ENHANCED EDITOR WINDOW:

- WRITING SAS SCRIPTS (COLOUR CODED SCRIPTS)
- STORED AS .SAS FILE
- COMBINATION OF COMPILER + INTERPRETER

INTERPRETER: WHILE TYPING A PROGRAM
COMPILER: WHILE EXECUTING A PROGRAM

THE INTERPRETER CHECKS EACH LINE FOR ERRORS
THE COMPILER CHECKS THE WHOLE PROGRAM FOR ERRORS

PROGRAM EDITOR & ENHANCED EDITOR

- WRITING SAS SCRIPTS
- STORED AS .SAS FILE
- PROGRAM EDITOR ONLY FOR DOS & UNIX


NOTE: AN ENHANCED EDITOR WHEN SAVED BECOMES A
PROGRAM EDITOR WINDOW
SAS PROGRAMMING
CREATING DATA SETS (TABLES)

-MANUALLY
-USING EXISTING DATA SETS (DBMS/RDBMS)
-USING DATA FROM FILES (FLAT FILES)


DEFAULT LIBRARIES IN SAS

PERMANENT: SASUSER, SASHELP,GISMAP,MAPS

TEMPORARY: WORK (ALL DATA SETS CREATED WITHOUT A
LIBREF ARE STORED IN THE WORK LIBRARY)
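
A minimal sketch of the WORK default. SCORES is a hypothetical data set name, and the sketch assumes the USER libref has not been assigned (if it has, one-level names go there instead):

```sas
/* One-level name: no libref given, so this is WORK.SCORES. */
data scores;
  id = 1;
  score = 90;
run;

/* The same data set, referenced with its two-level name. */
proc print data=work.scores;
run;
```

WORK.SCORES is deleted when the session ends, which is what makes WORK a temporary library.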
SAS FORMULA
THE SAS FORMULA IS DIVIDED INTO TWO TYPES
SAS TECHNICAL: USED FOR PROGRAMMING IN
ALL LAYERS
SAS FUNCTIONAL: USED FOR PROGRAMMING &
PROCESS

PROCESS

1) DATA ACCESS
2) DATA MGMT
3) DATA ANALYSIS
4) DATA PRESENTATION

SAS FORMULA
SAS TECHNICAL            SAS FUNCTIONAL

1) DATA STEP             DATA ACCESS
2) DATA SET              DATA MGMT
3) DATA PROG & PROC      DATA ANALYSIS
4) DATA OUTPUT           DATA PRESENTATION
SAS FORMULA
1. DATA STEP : DEFINES THE STRUCTURE OF DATA

DEFINITION : THERE ARE TWO DATA TYPES
NUMBER: 8 BYTES BY DEFAULT
TEXT/CHAR: OCCUPIES 1 BYTE PER CHARACTER
EXAMPLE
X=01 (NUMBER)          8 BYTES
X='01' (CHAR)          2 BYTES
NAME='ALLEN'           5 BYTES
N=9060984976789        8 BYTES
S='SAS SYSTEM'         10 BYTES

SAS STORES DATES AS NUMBERS
A DATE IS THE NUMBER OF DAYS SINCE 01-JAN-1960; A DATETIME IS THE
NUMBER OF SECONDS SINCE MIDNIGHT ON THAT DAY
01-JAN-1960=0
21-JAN-1960=20
LARGEST NUMBER: ABOUT 1.8E308 ON WINDOWS AND UNIX (8-BYTE FLOATING POINT)
PRECISION: ABOUT 15 SIGNIFICANT DIGITS; INTEGERS ARE EXACT UP TO
9,007,199,254,740,992 (2**53)
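
The storage sizes and the date arithmetic can be checked with a short DATA step. This is a sketch only; the variable names are chosen for illustration, and VLENGTH is used to report each variable's length in bytes:

```sas
data _null_;
  x    = 1;              /* numeric: 8 bytes by default          */
  c    = '01';           /* character: 1 byte per character      */
  name = 'ALLEN';
  d    = '21jan1960'd;   /* date literal                         */

  /* VLENGTH returns the storage length of a variable in bytes.  */
  len_x    = vlength(x);
  len_c    = vlength(c);
  len_name = vlength(name);
  put len_x= len_c= len_name=;

  put d=;                /* raw value: days since 01JAN1960, so d=20 */
  put d= date9.;         /* the same value shown with a date format  */
run;
```

The date is stored as the plain number; the DATE9. format changes only how it is displayed.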
SAS FORMULA
1. DATA STEP : STRUCTURE OF SAS

STRUCTURE: STORAGE PATTERN OF DATA. IT MAY
BE A COMBINATION OF

VARIABLE + DATA TYPE + SIZE + CONSTRAINT

VARIABLE    DATA TYPE    SIZE    CONSTRAINT
ITEMNO      NUMBER       4       >999 AND <10,000
ITEMNAME    TEXT         200     NMISS (NON MISSING)
PRICE       NUMBER       7.2     >1000
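
One way to express this structure in a DATA step is sketched below. It is an illustration under assumptions, not the only approach: the constraints are checked with hypothetical flag variables (BAD_ITEMNO, BAD_NAME, BAD_PRICE) rather than enforced by SAS, ITEMNO keeps the default numeric length, and the two data lines are invented sample values.

```sas
data items;
  /* Variable + data type + size */
  length itemno 8 itemname $ 200 price 8;
  format price 7.2;

  input itemno itemname $ price;

  /* Constraint checks from the table; 1 means the rule is broken. */
  bad_itemno = not (999 < itemno < 10000);
  bad_name   = missing(itemname);
  bad_price  = not (price > 1000);

  datalines;
1001 Laptop 4500.50
55 Mouse 300
;
run;

proc print data=items;
run;
```

The second row violates both the ITEMNO and the PRICE rule, so its flags are set while the first row's are not.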
SAS FORMULA
2. DATA SET : STORAGE OF DATA IN SAS

A TABLE IN SAS IS CALLED A DATA SET.
A DATA SET CONSISTS OF VARIABLES AND
OBSERVATIONS.
IT IS A COLLECTION OF DATA IN THE FORM OF OBSERVATIONS & VARIABLES
IT MUST BE BASED ON A DATA STEP (DEFINITION &
STRUCTURE)
IT CAN BE INTERNAL (SAS) OR EXTERNAL (FILES / DB)
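
To see the variables and observations of an existing data set, something like the following can be used. The sketch assumes the SASHELP.CLASS sample data set is installed:

```sas
/* Structure: variable names, types, lengths, and the number of observations. */
proc contents data=sashelp.class;
run;

/* Content: the first three observations (rows). */
proc print data=sashelp.class(obs=3);
run;
```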
Basic Structure of SAS
There are two main components in
SAS programs:
the DATA step(s) and
the procedure step(s), also called PROC steps.
The DATA step reads data from external or
internal sources, manipulates it, combines
it with other data sets, and can
print reports. The DATA step is used to
prepare your data for use by one of the
procedures (often called "procs").
SAS FORMULA
3. DATA PROGRAMS : ARE USER-DEFINED PROGRAMS
IN SAS (20%), USED FOR

DATA PROCESSING
DATA MANIPULATION
LOGIC BUILDING IN SAS
INTEGRATION
CUSTOMIZATION
SYNTAX:

DATA <DATA SET NAME> <DATA SET OPTIONS>;
< PROG STATEMENTS>;
<LOGICAL STATEMENTS>;
< PROCESS STATEMENTS>;
RUN; /* COMPILE & EXECUTE */
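
A minimal data program following this syntax might look like the sketch below. TEENS, GROUP and HEIGHT_CM are hypothetical names, and the input is the SASHELP.CLASS sample data set:

```sas
data teens;                          /* DATA <data set name>;       */
  set sashelp.class;                 /* read an existing data set   */

  length group $ 6;
  if age >= 13 then group = 'teen';  /* logical statements          */
  else group = 'child';

  height_cm = height * 2.54;         /* processing: derive a new variable */
run;                                 /* compile & execute           */
```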
SAS FORMULA
3. DATA PROCEDURE : SAS BUILT-IN
PROGRAMS/FUNCTIONS (80%). HAVING 7638
PROCEDURES
SYNTAX BASED
ALL PROCEDURES ARE PROCESS BASED
DOMAIN BASED
GENERATE OUTPUT
SYNTAX:
PROC <PROC NAME> <OPTIONS>;
< SYNTAX STATEMENTS ONLY>;
RUN; /* ENDS MOST PROCEDURES */
QUIT; /* ENDS INTERACTIVE (RUN-GROUP) PROCEDURES SUCH AS PROC SQL AND PROC DATASETS */
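
The two ways of ending a procedure can be sketched with the SASHELP.CLASS sample data set (assumed to be installed):

```sas
/* A procedure ended with RUN. */
proc means data=sashelp.class mean min max;
  var height weight;
run;

/* PROC SQL executes each statement as it is submitted and stays */
/* active until QUIT.                                            */
proc sql;
  select sex, count(*) as n
    from sashelp.class
    group by sex;
quit;
```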

SAS FORMULA
4. DATA OUTPUT : RESPONSIBLE FOR OUTPUT
GENERATIONS FROM SAS.

THE ENTIRE PROCESS IS PROCEDURE
BASED
SIMPLE REPORTS & GRAPHS
MULTIDIMENSIONAL REPORTS & GRAPHS
DATA BASE
DATA SET/TABLE
GUI APPLICATION
USER INTERFACES (WITHIN SAS)

SAS FORMULA

THE SAS FORMULA IS DIVIDED INTO TWO TYPES:

(1) SAS TECHNICAL: USED FOR PROGRAMMING IN ALL LAYERS
(2) SAS FUNCTIONAL: USED FOR PROGRAMMING & PROCESS

PROCESS                    TECHNICAL
(A) DATA ACCESS            DATA STEP
(B) DATA MGMT              DATA SET
(C) DATA ANALYSIS          DATA PROG & PROC
(D) DATA PRESENTATION      DATA OUTPUT
Terminology in SAS
In SAS, you call a
File - Data set
Field - Variable
Record(s) - Observations / Rows
An observation is a collection of data values
that usually relate to a single object.
A variable is the set of data values that describe
a given characteristic.
For example, in a data set of students, each student is an
observation and Age is a variable.
SAS FORMULA
FILES / DATABASE -> DATA STEP -> DATA SET -> DATA PROG / DATA PROC -> DATA OUTPUT
Sample SAS program
Data MySample;
A=4;
B=2;
C = A * B ; /* C = 8 */
Run;
Proc Print data=MySample; /* print the data set just created */
Run;
Why RUN statement ?
Run statement
Tells SAS that the Data step or Procedure has ended.
Good practice to end each Data step or Procedure
with a run statement.
Must still SUBMIT the SAS program for it to be
Processed.
Missing Values in SAS
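A character missing value is displayed as a
blank.
A numeric missing value is displayed as a
period.
Example:

Data Missing_Test;
Length A B $ 10 ;
A='Ramanathan'; /* B is never assigned, so it prints as a blank */
C=.; /* numeric missing value, prints as a period */
Run;
Proc Print;
Run;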
Words in the SAS Language
A word or token in the SAS language is a
collection of characters that communicates a
meaning to SAS and is not divisible into smaller
units capable of independent use. It can contain
a maximum of 32,767 characters.

A word or token ends when SAS encounters one
of the following: the beginning of a new token; a
blank after a name or a number token; or the
ending quotation mark of a literal token.


Words in the SAS Language
(contd)
Each word or token in the SAS
language belongs to one of four
categories:
names
literals
numbers
special characters.

SAS NAMING CONVENTIONS
Name
1. SAS variable names may be up to 32 characters in length.
2. The first character must begin with an alphabetic character or an
underscore. Subsequent characters can be alphabetic characters,
numeric digits, or underscores.
3. A variable name may not contain blanks.
4. A variable name may not contain any special characters other than
the underscore.
5. A variable name may contain mixed case. The mixed case is
remembered and used for presentation purposes only. When SAS
processes variable names, however, it internally uppercases them.
You cannot, therefore, use the same letters with different
combinations of lower- and uppercase to represent different
variables. For example, cat, Cat, and CAT all represent the same
variable.

Words in the SAS Language (contd)
6. You may not assign the names of special SAS
automatic variables (such as _N_ and _ERROR_) or
variable list names (such as _NUMERIC_,
_CHARACTER_, and _ALL_) to variables.
NAME is a series of characters that begin with a letter or
an underscore. Later characters can include letters,
underscores, and numeric digits. A name token can
contain up to 32,767 characters. In most contexts,
however, SAS names are limited to a shorter maximum
length, such as 32 or 8 characters. Examples of name
tokens include:
Data _new yearcutoff year_99 descending _n_
Words in the SAS Language
(contd)
Literal
consists of 1 to 32,767 characters enclosed in single
or double quotation marks. Examples of literals
include
'Chicago'
"1990-91"
'SatyaKalyani Pala'
"Suresh Bharatha"
'Mani''s plane'
"Report for the Third Quarter"
Words in the SAS Language
(contd)
Number
in general is composed entirely of numeric digits, with
an optional decimal point and a leading plus or minus
sign. SAS also recognizes numeric values in the
following forms as number tokens: scientific (E-)
notation, hexadecimal notation, missing value
symbols, and date and time literals. Examples of
number tokens include
5683 2.35 0b0x -5 5.4E-1 '24aug90'd
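
The number tokens listed above can be tried out in a short DATA step. This is a sketch with arbitrary variable names; the year in the date literal is resolved through the YEARCUTOFF= system option, so no specific date output is assumed here:

```sas
data _null_;
  a = 5683;          /* plain integer                            */
  b = 2.35;          /* decimal point                            */
  h = 0b0x;          /* hexadecimal notation: B0 hex = 176       */
  n = -5;            /* leading minus sign                       */
  e = 5.4E-1;        /* scientific (E-) notation = 0.54          */
  m = .;             /* missing value symbol                     */
  d = '24aug90'd;    /* date literal                             */

  put a= b= h= n= e= m=;
  put d= date9.;
run;
```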

Words in the SAS Language
(contd)
Special character
is usually any single keyboard character other than
letters, numbers, the underscore, and the blank. In
general, each special character is a single token,
although some two-character operators, such as **
and <=, form single tokens. The blank can end a
name or a number token, but it is not a token.
Examples of special-character tokens include
= ; ' + @ /

Placement and Spacing of Words in
SAS Statements
Examples
In this statement, blanks are not required because SAS can
determine the boundary of every token by examining the
beginning of the next token:
total=x+y;
The first special-character token, the equal sign, marks the
end of the name token total. The plus sign, another special-
character token, marks the end of the name token x. The last
special-character token, the semicolon, marks the end of the y
token. Though blanks are not needed to end any tokens in
this example, you may add them for readability, as shown
here:
total = x + y;
The Data Step

The data step provides a wide range of capabilities,
among them reading data from external sources,
reshaping and manipulating data, transforming data and
producing printed reports.

The data step is actually an implied do loop whose
statements will be executed for each observation either
read from an external source, or accessed from a
previously processed data set.

For each iteration, the data step starts with a vector of
missing values for all the variables to be placed in the new
observation. It then overwrites the missing value for any
variables either input or defined by the data step
statements. Finally, it outputs the observation to the newly
created data set.
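
The implied loop can be made visible with the automatic variable _N_, which counts iterations. A minimal sketch, with DEMO, X and Y as hypothetical names:

```sas
data demo;
  input x;
  /* Y has not been assigned yet in this iteration, so it is     */
  /* missing here every time, whatever it was on the last pass.  */
  put 'iteration ' _n_ 'before assignment: ' y=;
  y = x * 2;
  /* The observation (x, y) is written to DEMO at the end of each iteration. */
  datalines;
1
2
3
;
run;
```

The log shows one line per input record, each with y=. because Y is reset to missing at the top of every iteration.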

Data Step: Basics
Each data step begins with the word data and optionally
one or more data set names (and associated options)
followed by a semicolon. The name(s) given on the data
step are the names of data sets which will be created
within the data step. If you don't include any names on
the data step, SAS will create default data set names of
the form DATAn, where n is an integer which starts at 1
and is incremented so that each data set created has
a unique name within the current session. Since it
becomes difficult to keep track of the default names, it is
recommended that you always explicitly specify a data
set name on the data statement.
When you are running a data step to simply generate a
report, and don't need to create a data set, you can use
the special data set name _null_ to eliminate the output
of observations.
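
A short sketch of a report-only step using _NULL_, assuming the SASHELP.CLASS sample data set is installed:

```sas
/* No data set is created; PUT writes the selected values to the log. */
data _null_;
  set sashelp.class;
  if age > 14 then put name= age=;
run;
```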

Data Step: Inputting Data
(contd)
Reading from inline data
data one;
input a b c;
datalines;
4 5 3
9 10 12
;
Run;
By default, each invocation of the input statement
reads another record. This example uses free-
form input, with at least one space between
values.
How to Use the INFILE Statement
Because the INFILE statement identifies the file to read, it must execute
before the INPUT statement that reads the input data
records. You can use the INFILE statement in conditional
processing, such as an IF-THEN statement, because it is
executable. This allows you to control the source of the
input data records.
Usually, you use an INFILE statement to read data from an
external file. When data are read from the job stream, you
must use a DATALINES statement. However, to take
advantage of certain data-reading options that are available
only in the INFILE statement, you can use an INFILE
statement with the file-specification DATALINES and a
DATALINES statement in the same DATA step.
When you use more than one INFILE statement for the
same file-specification and you use options in each INFILE
statement, the effect is additive. To avoid confusion, use all
the options in the first INFILE statement for a given external
file.
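
Using INFILE together with DATALINES, as described above, might look like the following sketch. The DLM= and DSD options are chosen here only as examples of options that are available on INFILE:

```sas
data one;
  /* INFILE DATALINES makes INFILE options available for in-stream data. */
  /* DLM=',' sets the delimiter; DSD treats two consecutive delimiters   */
  /* as a missing value.                                                 */
  infile datalines dlm=',' dsd;
  input a b c;
  datalines;
4,5,3
9,,12
;
run;

proc print data=one;
run;
```

In the second observation B is missing, because nothing appears between the two commas.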


Data Step: Inputting Data
The input statement of SAS is used to read
data from an external source, or from lines
contained in your SAS program.
The infile statement names an external file or
fileref from which to read the data; otherwise
the cards; or datalines; statement is used to
precede the data.

Reading data from an external file
data one;
infile 'c:\Radhika\Samp.dat'; /* the file path must be quoted */
input a b c;
run;
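
The same idea can be tried without depending on a particular path. The sketch below is an assumption-laden variant: SAMP is a hypothetical fileref, and the TEMP device asks SAS for a temporary file, which the first step fills with two sample records before the second step reads them back:

```sas
/* A fileref that points to a temporary external file. */
filename samp temp;

/* Write two records so that there is something to read. */
data _null_;
  file samp;
  put '4 5 3';
  put '9 10 12';
run;

/* Read the external file through the fileref. */
data one;
  infile samp;
  input a b c;
run;

proc print data=one;
run;
```

Using a fileref instead of a quoted path keeps the physical location in one place, so only the FILENAME statement changes if the file moves.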
