Sas Book PDF

INDEX
Sno. Topic
Module - I
1 Introduction
2 Libname statement
3 Clear Windows
4 Input types
5 Pointer controllers
6 Double Trailing @@
7 Trailing @
8 Length statement
9 Named Input
10 Numeric value
11 Delimiter
12 Embedded space
13 Firstobs, obs
14 SET statement
15 VAR statement
16 Options
17 Advanced list input method
18 Title Statement
19 Footnote Statement
20 Informat & Format
Module - II
20 Transformations
21 Data Cleaning
22 Procedure Options
23 Procedure Statements
24 Import Procedure
25 Export Procedure
26 Append data
27 Reports
28 Filter Transformation
29 Where Statement & Options
30 Where Options
31 Expression Transformation
32 IF Statement
33 Do block
34 Output Statement
35 Multiple Datasets
36 Loops
37 Data Conversions
38 Customized Reports
39 Backend Process/PDV
40 Duplicate Observations
41 Functions
42 Data step Functions
43 Arithmetic Functions
44 Aggregate Functions
45 String Functions
46 Date & Time Functions
47 Calendar Functions
48 Time Functions
49 Interval Functions
50 Errors
51 Data Management Process
52 Append Process
52 Concatenation Process
53 Interleaving Process
54 SCD Process
55 Modify transformation
56 Merge
57 Lookup process
58 Goto or link statement
NANDA ACADEMY – NEKSS.COM (WWW.NEKSS.COM)

Opp. Padmavathi colony Bus stop, Boduppal
CONTACT US: NandaAcademy@gmail.com / 9908746595
SAS
SAS stands for Statistical Analysis System. In 1966 there was no general-purpose software for
statistics, and the United States Department of Agriculture gave grants to develop a statistics
program for analyzing its vast amounts of agricultural data. In 1972 development of the program
was under way with Jim Barr and Jim Goodnight, with funds of $5000. Two more members were Jane
Helwig and John Sall, and in 1976 the team left North Carolina State University to form SAS
Institute. The growing company had 20 employees in 1980 and 7000 in 1990. SAS has been widely
known for its power since 1998.
SAS serves programmers in IT, business and research areas, so that they can focus on the
information contained in their data.
SAS is a multi-database architecture:
The main streams of SAS are data warehousing, analytics and data visualization.
Data warehousing (DW): Maintains the data in a meaningful, understandable format with its
relations. The DW process can be implemented using data warehousing concepts and ETL concepts.
ETL features: 3 steps: Extract > Transformation > Loading

To extract the data from external source to SAS system.

Performance: SAS is an integrated system of software solutions that enables you to perform the
following tasks:
1. data entry, retrieval, and management
2. report writing and graphics design
3. statistical and mathematical analysis
4. business forecasting and decision support
5. operations research and project management
6. applications development

API is a set of procedures and Functions.
SAS Functionality:
Raw data > Data step > SAS dataset > Proc step > Report/Information
SAS Raw Data

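Many software applications are either menu driven or command driven (enter a command, see the
result). SAS is neither. With SAS, you use statements to write a series of instructions called a
SAS program.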
A SAS program is a sequence of statements executed in order. A statement gives information or
instructions to SAS and must be appropriately placed in the program.
Rules:
SAS statements: As with any language, there are a few rules to follow when writing SAS
programs. Fortunately for us, the rules for writing SAS programs are much fewer and simpler
than those for English.
/* Every SAS statement ends with a semicolon. */

/* SAS statements can be in upper- or lowercase.
Statements can continue on the next line (as long as you don’t split words in two).
Statements can be on the same line as other statements.
Statements can start in any column */

What is the SAS System?
Table Definition:
The table definition is a set of instructions that describes how to format the data. This
description includes but is not limited to:
• the order of the columns
• text and order of column headings
• formats for data
• font sizes and font faces
ODS destinations
An ODS destination specifies a specific type of output. ODS supports a number of
destinations, which include the following:
RTF
produces output that is formatted for use with Microsoft Word.
Output
produces a SAS data set.
Listing
produces traditional SAS output (monospace format).

HTML
produces output that is formatted in Hyper Text Markup Language (HTML). You can access the
output on the web with your browser.
Printer
produces output that is formatted for a high-resolution printer. An example of this type of output
is a PostScript file.
ODS output
ODS output consists of formatted output from any of the ODS destinations
Output object
ODS combines formatting instructions with the data to produce an output object. The output
object, therefore, contains both the results of the procedure or DATA step and information about
how to format the results. An output object has a name, a label, and a path.
Note: Although many output objects include formatting instructions, not all do. In
some cases the output object consists of only the data.
Key to the SAS System

Parts of a SAS Program
Basic differences between DATA and PROC steps:
How the DATA Step Works

SAS windowing environment
If you type SAS at your system prompt, or click on the SAS icon, you will most likely get into
the SAS windowing environment. In this interactive environment, you can write and edit SAS
programs, submit programs for processing, and view and print your results. In addition, there are
many SAS windows for performing different tasks such as managing SAS files, customizing the
interface, accessing SAS Help, and importing or exporting data.

Rules for SAS names
Names must be 32 characters or fewer in length.
Names must start with a letter or an underscore ( _ ).
Names can contain only letters, numerals, or underscores ( _ ). No %$!*&#@, please.
Names can contain upper- and lowercase letters.
The values of numeric variables can contain only numbers. To store values that contain
alphabetic or special characters, you must create a character variable. By following a variable
name in an INPUT statement with a dollar sign ($), you create a character variable. The default
length of a character variable is also eight bytes. The following statement creates a data set that
contains one character variable and four numeric variables, all with a default length of eight
bytes.
input IdNumber Name $ Test_1 Test_2 Test_3;
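As a rough sketch, the INPUT statement above could sit inside a complete DATA step like the one below. The data set name test_scores and the data values are made up for illustration only.

```sas
/* Sketch: one character variable (Name) and four numeric variables. */
/* The data set name and the data lines are hypothetical.            */
data test_scores;
   input IdNumber Name $ Test_1 Test_2 Test_3;
   datalines;
1001 Ravi 78 85 91
1002 Meena 88 92 79
;
run;

/* PROC CONTENTS lists each variable with its type and length. */
proc contents data=test_scores;
run;
```

The PROC CONTENTS output should report Name as a character variable and the other four as numeric, each with a length of 8.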

Data Locations
To create a SAS data set, you can read data from one of four locations:
• raw data in the data (job) stream, that is, following a DATALINES statement
• raw data in a file that you specify with an INFILE statement
• data from an existing SAS data set
• data in a database management system (DBMS) file
List of SAS Files

SAS Files: three types SAS dataset extension: sas7bdat
1. Table
2. Catalog
3. View
Libname statement
Working with user-defined libraries: these libraries are created and managed by the user.
They are of two types.
1. Independent libraries.
2. Dependent libraries.
✓ Independent libraries: SAS files (datasets) are not shared with other libraries.
✓ Dependent libraries: SAS files are shared with other libraries.
These two kinds of library can be created in temporary mode or permanent mode.
✓ Temporary mode library: Available only for one session or until the end of the SAS job.
✓ Permanent mode library: Available in any session until deleted by the user.
'libname' statement: Creates user-defined independent or dependent libraries in temporary
mode.
Syntax: libname libref 'path';
Ex: libname SASFiles 'F:\SASFiles';

Difference between work and user defined temporary libraries:
Work library: Work is a permanent library (it is always available), but the data stored in it is
temporary in the SAS environment.
User library: for user-defined temporary libraries, both the library itself and the data stored in
it are temporary in the SAS environment.
Deletion of user defined libraries from the SAS environment:

Syntax: libname <library name/reference> clear;
libname user3 clear;
libname _all_ clear;
libref: It’s a name of the library used as reference for library name.
Naming rules:
✓ Can be given up to 8 characters. More than 8-character length gives ‘out of range’ error.

✓ Starts with letter or underscore.
✓ Can use numbers.
✓ Can’t use any special character except underscore.
List of user defined libraries from the SAS environment:
Syntax: libname _all_ list;
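The libname forms described above can be tried together in one short sequence. This is only a sketch: it reuses the libref and path from the earlier example and assumes that the folder F:\SASFiles already exists on your machine.

```sas
/* Assumption: the folder F:\SASFiles already exists. */
libname SASFiles 'F:\SASFiles';   /* assign the libref                          */
libname SASFiles list;            /* write this library's attributes to the log */
libname _all_ list;               /* list every library currently assigned      */
libname SASFiles clear;           /* remove the libref                          */
```

Clearing a libref only removes the reference from the SAS session; the files in the folder are not deleted.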
Engine: Stores the data in dataset in different format. These engines are two types.
1. Internal engines. 2. External or access or interface engines.
1.Internal engines: V4, V5, V6, V7, V8, V9 - SAS default takes internal engines according to
versions.
SAS version SAS ENGINE

SASV4 V4
SASV5 V5
SASV6 V6
SASV6.08 V6.08
SASV6.12 V6.12
SASV7 V7
SASV8, V8.1, V8.2 V8
SASV9.0, V9.1, V9.1.2, V9.1.3, V9.2 V9
Note: To create SAS files in an older format from a newer SAS release, the internal engine must
be changed to the older one.
2.External/access/interface engines: Used for creating database libraries or external libraries
and manage external process from SAS environment using SAS knowledge.
Predefined external engines

Excel ODBC
Access OLEDB
Oracle Teradata
DB2 SPSS
Sybase Sqlserver
Mysql XML
Xport
PC or server location: PC path or server path: To allocate the memory for storage of SAS files.
PC location: for Independent mode
Server location: for dependent mode.

Clear Windows
Log window: dm log 'clear';
Output window: dm output 'clear';
Program Editor window: dm pgm 'clear';
Shortcut in any of these windows: Ctrl+E
Input types
• List Input
• Column Input
• Named Input
• Formatted Input
List Input:
➢ List input: input Pid Name $ Team $ Stwt Endwt;
➢ Multiple List input:
➢ Ex: input Pid Name $;
input Team $;
input Stwt Endwt;
Column Input:
➢ Column input: input Pid 1-4 Name $ 6-23 Team $ 24-29 stwt 31-33 Endwt 35-37;

➢ Multiple Column input:
➢ Ex: input Pid 1-4 Name $ 6-23;
input Team $ 24-29;
input stwt 31-33 Endwt 35-37;
Pointer controllers:
pointers: @n - moves the pointer to the nth column in the input buffer.
+n - moves the pointer forward n columns in the input buffer.
/ - moves the pointer to the next line in the input buffer.
#n - moves the pointer to the nth line in the input buffer.
➢ Column pointer @n: input Pid @6 Name $ @13 Team $ @20 Stwt @24 Endwt;

➢ Column pointer +n: input +12 Team $ @1 Pid Name $ @20 Stwt Endwt;
➢ Line pointer /: input IdNumber 1-4 Name $ 6-19 / Team $ / Stwt Endwt;

➢ Line pointer #n: input #2 Team $ 1-6
#1 Name $ 6-23 IdNumber 1-4
#3 Stwt 1-3 Endwt 5-7;
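To see the line pointer in action, the sketch below reads one observation from three raw data lines with the / pointer. The names and weights are sample values invented for this illustration.

```sas
/* Sketch: each observation is spread over three raw data lines. */
/* The data values are hypothetical.                             */
data weight_club;
   input IdNumber 1-4 Name $ 6-19 / Team $ / Stwt Endwt;
   datalines;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
;
run;

proc print data=weight_club;
run;
```

The six data lines produce two observations, because every execution of the INPUT statement consumes three lines.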
Double Trailing @@: A special symbol @@ that is used to hold a line of data in the input
buffer during multiple iterations of a DATA step.
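A minimal sketch of the double trailing @@, using made-up names and scores: each raw data line holds several observations, and @@ keeps the line in the input buffer until all of its values have been read.

```sas
/* Sketch: several observations per raw data line (hypothetical data). */
data scores;
   input name $ score @@;
   datalines;
Ravi 78 Meena 88 John 91
Sara 67 Kiran 85
;
run;

proc print data=scores;
run;
```

The two data lines give five observations, one for each name and score pair.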

Trailing @: Prevents SAS from automatically reading a new data record into the input buffer
when a new INPUT statement is executed within the same iteration of the DATA step. When
used, the trailing @ must be the last item in the INPUT statement.
Syntax: input <variable1> <variable2> …. <variable n> @;
Ex: input Dcode $ Eid $ Pcode @;
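One common use of the trailing @ is to read a single value first, test it, and read the rest of the record only when it is wanted. The sketch below assumes a fixed-column layout and sample data that are not part of the original notes.

```sas
/* Sketch: read Team first, hold the record, keep only the red team. */
/* The column positions and data values are hypothetical.            */
data red_team;
   input Team $ 13-18 @;       /* read Team and hold the record        */
   if Team = 'red';            /* subsetting IF: go on only for 'red'  */
   input IdNumber 1-4 Stwt 20-22 Endwt 24-26;
   datalines;
1023 David  red    189 165
1049 Amelia yellow 145 124
1219 Alan   red    210 192
;
run;

proc print data=red_team;
run;
```

Only the two records for the red team reach the output dataset; the held record is released automatically at the end of each iteration.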
Length: Registers the variable with a specific storage length. For a character variable the
length can be given up to 32,767 bytes. Default character length is 8.
Syntax: length <variable name> <data type> <length>;
Ex: length Name $ 12;
Named Input:
For named input, follow each variable name with an equal sign (=).
If the variable is character, follow the equal sign with a dollar sign ($). In the raw data each
value is written as variable=value.
Syntax: input <variable1>= <variable2>= …. <variable n>=;
Ex: input name=$ english= math= science=;
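The example statement above could be used in a DATA step such as the following sketch. The data lines are invented; note that with named input the values may appear in any order on the line.

```sas
/* Sketch: named input with hypothetical data. */
data marks;
   input name=$ english= math= science=;
   datalines;
name=Ravi english=78 math=85 science=91
name=Meena science=88 math=92 english=79
;
run;

proc print data=marks;
run;
```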

Numeric value: By default SAS displays a numeric value in up to 12 characters (the BEST12.
format); a value with more than 12 digits is displayed in scientific format (En).
En = 10^n, E1 = 10^1, E2 = 10^2
For example, if a numeric value has 20 digits, SAS displays it in 12 characters in scientific
format. Numeric values are stored in 8 bytes as floating-point numbers, which keeps about 15
significant digits exactly.
Format option: use a wider format to display more digits.
Syntax: format <variable name> <width>.;
Ex: format cardno 20.;
Delimiter:
In raw data, data values are separated by a special character; these special characters are
delimiters. The default delimiter is a space.
DSD: reads a comma as the default delimiter
DLM: reads any specified character as the delimiter

DSD: removes double quotes from data values
DSD: reads a missing value between two consecutive delimiters
➢ DLM Option: Enables you to specify the delimiters used in the raw data.
Syntax: infile <file> dlm='<specified delimiters>';
➢ DLM Option: Raw data separated by semicolons (;). Use datalines4 (or cards4), because a
semicolon otherwise ends the in-stream data.
Syntax: infile datalines4 dlm=';';
➢ DSD Option: Reads the ( , ) delimiter in raw data and also removes the quotes from data values.
Syntax: infile <file> dsd;

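When the data is delimited by a colon (:) and a value is missing (for example, for Alan), the
missing value is indicated by two consecutive delimiters; use the DSD option together with
dlm=':' so that SAS reads it as missing.
When the data is delimited by a semicolon (;), use datalines4 in the infile statement and end the
data with four semicolons (;;;;).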

Embedded space:
With the ampersand format modifier (&) you can use list input to read data that contains single
embedded blanks. The only restriction is that at least two blanks must divide each value from
the next data value in the record.
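A short sketch of the ampersand modifier, with made-up names: the name contains one embedded blank, and two blanks separate it from the next value.

```sas
/* Sketch: & lets Name contain single blanks; two blanks end the value. */
/* The data values are hypothetical.                                    */
data clubs;
   input name & $18. team $ weight;
   datalines;
David Shaw  red  189
Amelia Serrano  yellow  145
;
run;

proc print data=clubs;
run;
```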

Firstobs, obs:
Used to capture part of the data. Data can be captured in two ways: on a sequential basis or on
a conditional basis.
i. sequential mode: depends on observation numbers (firstobs= gives the first observation to
read and obs= gives the last observation to read).
ii. logical mode: depends on a condition (used frequently in real time).
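Both modes can be sketched with a small in-stream dataset. The dataset names and values are hypothetical; the first step uses firstobs= and obs= on the INFILE statement, and the second step selects by a condition.

```sas
/* Sequential mode: read raw records 2 through 4 only. */
data part;
   infile datalines firstobs=2 obs=4;
   input id name $;
   datalines;
1 Ravi
2 Meena
3 John
4 Sara
5 Kiran
;
run;

/* Logical mode: keep only the observations that meet a condition. */
data high;
   set part;
   where id > 2;
run;
```

The first step keeps ids 2, 3 and 4; the second keeps ids 3 and 4.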
SET statement:
Used to read the data from one dataset into another dataset observation by observation (creates
a copy of the dataset). Because it reads observations one after the other, it takes more
processing time.
Syntax: data <new dataset name>;
set <master dataset name>;
run;
Data emp2;
Set emp1;
Run;
VAR statement:
It names the variables whose values the procedure should process or print, in the order listed.
It runs in a procedure step.
Syntax: var <variable name> <variable name>;

OPTIONS
Flowover: It is the default behavior of SAS. When an INPUT statement reaches the end of a record
without finding values for all of its variables, SAS continues reading on the next record.
Stopover: It is not the default. When an INPUT statement reaches the end of a record without
finding values for all of its variables, SAS stops the DATA step and reports an error.
Infile datalines: By default this statement behaves as infile datalines flowover dlm=' ';
To read missing values:
Missover: Controls missing values at the end of a record. When to use the missover option: when
values at the end of a record are missing and are not marked with a period (data entry
professionals often do not type a period for values missing at the end of the data).
Truncover: works like missover, and also keeps a value that is shorter than the INPUT statement
expects.

Uses: used when values at the end of a record are missing or short. It can be used instead of
missover.
Note: the difference matters for column and formatted input. When a record ends before the full
width of a field, missover assigns a missing value, but truncover assigns the shorter value that
is present.

1.Missover: causes the DATA step to assign missing values to any variables that do not have
values when the end of a data record is encountered. The DATA step continues processing.
Syntax: infile <file> missover;

2.Truncover: causes the DATA step to assign values to variables, even if the values are shorter
than expected by the INPUT statement, and to assign missing values to any variables that do not
have values when the end of a record is encountered.
Syntax: infile '<external file>' truncover;
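The effect of missover with list input can be sketched as follows. The dataset name and scores are invented; the second and third data lines have values missing at the end and no periods to mark them.

```sas
/* Sketch: MISSOVER sets the variables without values to missing   */
/* instead of letting INPUT flow over to the next record.          */
/* The data values are hypothetical.                               */
data scores;
   infile datalines missover;
   input name $ test1 test2 test3;
   datalines;
Alan 80 90 85
Bela 70
Cara 60 75
;
run;

proc print data=scores;
run;
```

With missover the step creates three observations, one per line. Without it, the default flowover behavior would pull values from the following line to fill test2 and test3 for Bela.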
3.Scanover: Using the scanover option you can read part of the raw data in logical order using
a key data value. Implement the scanover technique using the column hold pointer @'key value'.
The column hold pointer takes a key data value instead of a column number.
Note: Logical flow is of 2 types:
1. scanover (old technology)
2. conditions (if and where: new technology)

Syntax: infile '<file>' scanover;
input @'F' race $ color $;
DROP: Exclude variables from the output dataset.

Syntax: drop=variable name 1 . . . variable name n;
KEEP: Specify the variables for processing to output dataset.

Syntax: keep=variable name 1 . . . variable name n;

RENAME: Specify the new name for variables in output dataset. Used to change variable names
permanently in dataset.
Syntax: rename=(oldname-1 = new name . . . oldname-n = new name);
Note: For reporting purpose use label option, but for permanently renaming use rename option
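The three options are usually written as dataset options in parentheses after a dataset name. The following is a sketch with hypothetical dataset and variable names.

```sas
/* Hypothetical source dataset. */
data emp1;
   input eid name $ salary dept $;
   datalines;
101 Ravi 45000 HR
102 Meena 52000 IT
;
run;

/* DROP= on the input dataset, RENAME= on the output dataset. */
data emp2(rename=(salary=pay));
   set emp1(drop=dept);
run;

/* KEEP= reads only the listed variables. */
data emp3;
   set emp1(keep=eid name);
run;
```

emp2 should contain eid, name and pay; emp3 should contain only eid and name.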

Pw option: Used to assign password for read and write permissions for particular dataset.
The PW= option applies to all types of SAS files except catalogs. You can use this option to
assign a password to a SAS file or to access a password-protected SAS file.
A password can be up to 8 characters long. In practice programmers rarely use these password
options; assigning passwords to datasets is the administrator's work.
Syntax: pw= password;
Read option: Used to assign a password for reading only.

The READ= option applies to all types of SAS files except catalogs. You can use this option to
assign a read-password to a SAS file or to access a read-protected SAS file.

Syntax: read= read-password;
Write option: Used to assign a password for writing only.

The WRITE= option applies to all types of SAS files except catalogs. You can use this option to
assign a write-password to a SAS file or to access a write-protected SAS file.
Syntax: write= write-password;

ALTER option: Used to assign an alter password to a SAS file and to enable access to a
password-protected SAS file.
The ALTER= option applies to all types of SAS files except catalogs. You can use this option to
assign an alter-password to a SAS file or to access a read-, write-, or alter-protected SAS file.
Syntax: ALTER= alter-password;
ENCRYPT option: Encrypts a SAS data file. You can use it only when you are creating a SAS data
file.
✓ In order to copy an encrypted SAS data file, the output engine must support encryption.
Otherwise, the data file is not copied.
✓ Encrypted files work only in Release 6.11 or in later releases of SAS.
✓ You cannot encrypt SAS data views or stored programs because they contain no data.
✓ If the data file is encrypted, all associated indexes are also encrypted.
✓ Encryption requires roughly the same amount of CPU resources as compression.
✓ You cannot use PROC CPORT on encrypted SAS data files.

Syntax: ENCRYPT= YES | NO;
▪ Syntax Description
▪ YES, encrypts the file. The encryption method uses passwords. At a minimum, you must
specify the READ= or the PW= data set option at the same time that you specify
ENCRYPT=YES. Because the encryption method uses passwords, you cannot change
any password on an encrypted data set without re-creating the data set.
▪ NO does not encrypt the file.
CAUTION: Record all passwords. If you forget the password, you cannot reset it without
assistance from SAS Institute. The process is time-consuming and resource-intensive.
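A minimal sketch of the password and encryption options used together. The dataset name, the password green1 and the data values are made up for illustration; in real work, record the password as the caution above advises.

```sas
/* Sketch: create an encrypted, read-protected dataset.  */
/* The password and the data values are hypothetical.    */
data secure(read=green1 encrypt=yes);
   input eid salary;
   datalines;
101 45000
102 52000
;
run;

/* The read password must be supplied to read the data. */
proc print data=secure(read=green1);
run;
```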
Global options
The global options are invoked by default whenever you open the SAS environment, and they
control or instruct the SAS windows (except the editor window). The global options code is
available in the SAS configuration file. This file runs by default, in the back end, whenever
you open the SAS environment.
Global options for the output window:

Default Changed specification
---------------------------------------
1. date nodate
2. center nocenter
3. number nonumber
4. pageno: Using the pageno option you can set the page number at which the output starts.
Syntax: options pageno=2;
5. ls (or) linesize: The default linesize of SAS is 64 characters per line. Using the ls option,
you can increase the linesize up to 256 characters, that is, increase the width of the output
window.
Syntax: options ls=64;
6. ps (or) pagesize: The default pagesize of SAS is 16 lines per page. Using the ps option you
can increase the pagesize up to 32,767 lines per page, that is, increase the height of the
output window.
Syntax: options ps=150;
Ex: options nodate nocenter nonumber;
options pageno=5 ps=80 ls=80;

Global options for the log window:
Default Changed specification
---------------------------------------------
source nosource
notes nonotes
nofullstimer fullstimer
1. Source: Using the source option you can print the source code in the log window for
identification of syntax errors.
2. Notes: Using the notes option you can print the notes in the log window for compilation and
execution.
Note: Real time is the elapsed (clock) time that a step takes, and cpu time is the time that the
processor spends executing it.
3. Nofullstimer: It works by default and prints only brief timing information for compilation
and execution in the log window. If you want to use the fullstimer option, the notes option must
be on.
Syntax: options nosource nonotes;
options fullstimer;
Global options for the explorer/storage window:

1. User: Using the user option you can change the default destination of the SAS files or datasets
from the work library to your required library.

2. Yearcutoff: Only 2-digit years are affected by the yearcutoff option. Its default value here
is 1920, which sets a span of 100 years from 1920 up to 2019. You can change the yearcutoff
value according to your requirement.
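The effect of yearcutoff can be checked with a small sketch. The cutoff of 1950 and the two dates are chosen only for illustration; with this setting, 2-digit years fall in the span 1950 to 2049.

```sas
/* Sketch: 2-digit years are placed in the 100-year span that */
/* starts at the yearcutoff value (1950 here, an assumption). */
options yearcutoff=1950;

data _null_;
   d1 = input('15/08/49', ddmmyy8.);   /* year 49 falls in 2049 */
   d2 = input('15/08/50', ddmmyy8.);   /* year 50 falls in 1950 */
   put d1= date9. d2= date9.;          /* write both to the log */
run;
```

The log should show 15AUG2049 for d1 and 15AUG1950 for d2. Dates written with 4-digit years are not affected by the option.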

Infile statement: Using the infile statement you can access raw data from internal/external files
or locations into the SAS internal memory location.
For text file (*.txt):
Syntax: infile "file path" <infile options>;
For tab-delimited file (*.txt):

Syntax: infile "file path" dlm='09'x;
Formatted Input:
Formatted input method: It works based on 2 symbols:
+n – column pointer – it indicates non-required data
n. – column range – it indicates required data (n = n2 – n1 + 1)
Syntax: input +n <variablename> <datatype> n. ;
Ex: input +0 pid 3. +1 name $ 11. +2 age 2. +2 gender $ 6.;

@n – column pointer – it indicates column number
n. – column range – it indicates required data (n = n2 – n1 + 1)
Syntax: input @n <variablename> <datatype> n. ;
Ex: input @1 pid 3. @5 name $ 11. @18 age 2. @22 gender $ 6.;
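The @n form of formatted input can be tried with the sketch below. The data lines are hypothetical and are laid out so that pid starts in column 1, name in column 5, age in column 18 and gender in column 22.

```sas
/* Sketch: formatted input with @n column pointers (hypothetical data). */
data people;
   input @1 pid 3. @5 name $11. @18 age 2. @22 gender $6.;
   datalines;
101 Ravi Kumar   27  male
102 Meena Rao    31  female
;
run;

proc print data=people;
run;
```

Because name is read with the $11. informat, the blank inside each name is kept as part of the value.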
Advanced list input method:

1. & modifier 2. : modifier 3. ~ modifier
1. & modifier
Using the & modifier you can read character values that contain single embedded blanks.
Syntax: input name & $11. ;
Ex: input pid name & $11. age gender $;

2. : modifier
Using the ':' modifier you can apply an informat in list input, for example to store character
values longer than the default 8 bytes. Reading still stops at the next delimiter.
Syntax: input name : $11. ;
Ex: input pid name : $11. age gender $;

3. ~ modifier
Using the ~ modifier together with the dsd option you can keep the quotation marks, and any
delimiters inside them, as part of the value for the required variables.
Syntax: infile <file> dsd;
input name ~ $11. ;
TITLE statement
Title: A title appears at the top of each page of the report.
A TITLE statement, for example in the PROC PRINT step, produces a descriptive title. It stays in
effect until it is changed or cancelled.
Syntax: title 'Write your title here';
Ex: title1 'write your title here';
title2 'write second title here';
Create a TITLE for dataset
data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
1049 Amelia yellow 145 124
;

run;
title 'List Input';
proc print data=weight_club;
run;
Create TITLES to a dataset
Create Empty TITLE for dataset
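data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
title ' ';
proc print data=weight_club;
run;
Create Without TITLE for dataset
data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
title;
proc print data=weight_club;
run;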
The content of the report is the same as before, but the title can be made colorful with options.
Syntax: title <options> 'Write your title here';
1. Bold/Italic: Font is in bold or italic.
Bold/Italic
2. Color: Text color.
Color=<color name>
3. BColor: Background color of the title text.
BColor=<color name>
4. Font: Typeface, e.g. Arial, Calibri, Algerian, Cambria, etc.
Font=<font name>
5. Justify: Justification of the text.

Justify= C/L/R
C = Center
L = Left
R = Right
6. Link: Specifies a hyperlink.
Link='<Url>'
7. Underlin: Specifies whether the subsequent text is underlined. 0 indicates no underlining;
1, 2, and 3 indicate underlining.
U=0/1/2/3
Create a Colorful TITLE for dataset
data weight_club;
input pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
title bold color='yellow' bcolor='green' font='algeriaN' underlin=1 'List Input Reported';
title2 italic color='white' bcolor='orange' underlin=3 font='Cambria'
'SAS System reads forward direction at input buffer';
title3 italic color='Purple' bcolor='Pink' font='bauhaus 93' underlin=2
'Data contains character and Numeric';
proc print data=weight_club;
run;

Create Colorful TITLEs to a dataset
data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
title bold color='yellow' bcolor='green' font='algeriaN' underlin=1 justify=L
'List Input Reported';
title2 italic color='white' bcolor='orange' underlin=3 font='Cambria' justify=C
'SAS System reads forward direction at input buffer';
title3 italic color='Purple' bcolor='Pink' font='bauhaus 93' underlin=2 justify=R
'Data contains character and Numeric';
proc print data=weight_club;
run;
FOOTNOTE statement
Footnote: A footnote appears at the bottom of each page of the procedure output.
A FOOTNOTE statement, for example in the PROC PRINT step, produces a descriptive footnote. It
stays in effect until it is changed or cancelled.
Syntax: footnote 'Write your footnote here';
Ex: footnote1 'write your footnote here';
footnote2 'write second footnote here';

Create a FOOTNOTE for dataset
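data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
footnote 'List Input Reported';
proc print data=weight_club;
run;
Create FOOTNOTEs to a dataset

data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
footnote 'List Input Reported';
footnote2 'SAS System reads forward direction at input buffer';
proc print data=weight_club;
run;

Create Empty FOOTNOTE for dataset
data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
footnote ' ';
proc print data=weight_club;
run;
Create Without FOOTNOTE to a dataset

data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
footnote;
proc print data=weight_club;
run;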
The content of the report is the same as before, but the footnote can be made colorful with
options.
Syntax: footnote <options> 'Write your footnote here';
1. Bold/Italic: Font is in bold or italic.
Bold/Italic
2. Color: Text color.
Color=<color name>
3. BColor: Background color of the footnote text.
BColor=<color name>
4. Font: Typeface, e.g. Arial, Calibri, Algerian, Cambria, etc.
Font=<font name>
5. Justify: Justification of the text.
Justify= C/L/R
C = Center
L = Left
R = Right
6. Link: Specifies a hyperlink.
Link='<Url>'
7. Underlin: Specifies whether the subsequent text is underlined. 0 indicates no underlining;
1, 2, and 3 indicate underlining.
U=0/1/2/3
Create a Colorful FOOTNOTE for dataset

data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
footnote bold color='yellow' bcolor='green' font='algeriaN' underlin=1 'List Input Reported';
footnote2 italic color='white' bcolor='orange' underlin=3 font='Cambria'
'SAS System reads forward direction at input buffer';
footnote3 italic color='Purple' bcolor='Pink' font='bauhaus 93' underlin=2
'Data contains character and Numeric';
proc print data=weight_club;
run;
Create colorful FOOTNOTEs to a dataset
data weight_club;
input Pid Name $ Team $ Stwt Endwt;
datalines;
1023 David red 189 165
;
run;
footnote bold color='yellow' bcolor='green' font='algeriaN' underlin=1 justify=L
'List Input Reported';
footnote2 italic color='white' bcolor='orange' underlin=3 font='Cambria' justify=C
'SAS System reads forward direction at input buffer';
footnote3 italic color='Purple' bcolor='Pink' font='bauhaus 93' underlin=2 justify=R
'Data contains character and Numeric';
proc print data=weight_club;
run;
INFORMAT & FORMAT

Raw data is of 3 types:
1. Standard data
a. character (for e.g. female)
b. numeric (for e.g. 27)
2. Mixed data (for e.g. G100)
3. Non-standard data (date, time and amount/currency)
Non-standard data: Sometimes numeric data is mixed with special characters in date, time and
amount data. This type of raw data is called non-standard data.
Informat statement: Using the informat statement or technique you can read non-standard data
into standard (numeric) form for reading and loading. You can write the informat statement in
the data step.
If you take a date value in any date form and read it into numeric form, this value is called
a SAS date value.
SAS date value: It is the number of days between the given date and the SAS reference date
(i.e. 01/01/1960 or 01 Jan 1960 00:00:00).
e.g. 01/01/1961 (non-standard) ---> informat technique ---> number (standard) ---> 366
(1960 is a leap year, so it has 366 days)
Syntax: informat <variable name> <informat technique>;

Format statement: Using the format statement or technique you can convert standard data into
non-standard form for reporting. You can write the format statement in the procedure block.
e.g. 01/01/1961 (non-standard) ---> informat technique ---> number (standard) ---> 366 --->
format technique ---> 01/01/1961 (non-standard)
Syntax: format <variable name> <format technique>;
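The round trip from raw date to number and back to a displayed date can be sketched as follows. The dataset name, the variable doj and the data values are invented for illustration.

```sas
/* Sketch: the informat reads dd/mm/yyyy text into a SAS date value. */
/* The data values are hypothetical.                                 */
data joins;
   informat doj ddmmyy10.;
   input name $ doj;
   datalines;
Ravi 01/01/1961
Meena 23/02/2005
;
run;

/* Without a format, doj prints as a plain number of days. */
proc print data=joins;
run;

/* With a format, the same value prints as dd/mm/yyyy. */
proc print data=joins;
   format doj ddmmyys10.;
run;
```

In the first report the value for Ravi should appear as 366, the number of days from 01 Jan 1960 to 01 Jan 1961; in the second it appears as 01/01/1961.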
Date value (4-digit year)

Data Value Informat Format
23/02/2005 ddmmyy10. ddmmyys10.
23-02-2005 ddmmyy10. ddmmyyd10.
23:02:2005 ddmmyy10. ddmmyyc10.
23.02.2005 ddmmyy10. ddmmyyp10.

23 02 2005 ddmmyy10. ddmmyyb10.
23022005 ddmmyy8. ddmmyyn8.
Note: In the US format you should use mmddyy instead of ddmmyy.
Date value (2-digit year)

23/02/05 ddmmyy8. ddmmyys8.
23-02-05 ddmmyy8. ddmmyyd8.
23:02:05 ddmmyy8. ddmmyyc8.
23.02.05 ddmmyy8. ddmmyyp8.
23 02 05 ddmmyy8. ddmmyyb8.
230205 ddmmyy6. ddmmyyn6.

Date value (4-digit year and 2-digit year)
12Jan2006 date9. date9.
12Jan06 date7. date7.
Feb2006 monyy7. monyy7.
Feb06 monyy5. monyy5.
Code level date or Julian date:

In a julian date the maximum and minimum length is 7 digits. In the 7 digits the first 4
digits indicate the year and the next 3 digits indicate the number of days completed in that year.
Date value (julian date and long date)

2006032 julian7. julian7.
March 12, 2006 worddate18.
Saturday, Mar 12, 2006 weekdate24.

2006 Year.
06 Month.
24 Day.
6 Weekday.
Monday downame.
January monname.

Time value
(consists of hours, minutes and seconds; SAS stores it as the number of seconds since midnight.)
Time Value Informat Format
02:23:23PM time10. timeampm12.
10:25:23AM time10. timeampm12.
10:25:23 time8. time8.
18:30:50 time8. time8.
Note: If sec > 59 SAS increases min

If min > 59 SAS increases hour
If hour > 23 SAS increases day

Date & Time value
DateTime Value Informat Format
23oct2005:10:34:30AM datetime20. dateampm22.
23oct2005:06:35:40PM datetime20. dateampm22.
23oct2005:10:25:23 datetime18. datetime18.
23oct2005:18:30:50 datetime18. datetime18.
SAS date and time value: the number of seconds since 01jan1960:00:00:00.

If you want to store date and time values in a single variable, SAS requires that the date part
of the raw value is in date9. form. Therefore, for e.g., 23/10/2005:18:35:40 is invalid raw
data.
AnyDate and AnyTime

AnyDate: Reads date values that are written in different forms in the raw data.
Informat technique: anydtdte.
To read dates written in different forms

data anydate;
input date : anydtdte.;
format date date9.;
datalines;
23/05/2018
23-05-18
23May2018
May2018
2018157
;
run;
AnyTime: Reads time values that are written in different forms in the raw data.

Informat technique: anydttme.
To read times written in different forms

data anytime;
input time : anydttme.;
format time timeampm12.;
datalines;
10:23:45am
02:45:56pm
06:56:34
18:32:46
00:16:23
;
run;

AnyDateTime: Reads datetime values that are written in different forms in the raw data.
Informat technique: anydtdtm.
To read datetimes written in different forms

data anydatetime;
input datetime : anydtdtm.;
format datetime dateampm22.;
datalines;
23Jan2015:10:23:45am
15Jun2006:02:45:56pm
16mar2018:06:56:34
17Apr2007:18:32:46
03Aug1998:00:16:23
;
run;
Score & Currency values

Score / Currency Informat Format
25,000 comma6. (commaw.) comma6. (commaw.)
$25,00,000 dollar10. (dollarw.) dollar10. (dollarw.)
25,000.12 Comma9.2 (commaw.d) Comma9.2 (commaw.d)
$25,00,000.23 dollar14.2(dollarw.d) dollar14.2(dollarw.d)
25,00,000 words20. (wordsw.)
2356 Best4.
145.235 Best7.
2356 Z4.
145.235 Z7.3

Here 'w' stands for the total width, which depends on the raw data, and 'd' stands for the number
of decimal places.

data company2;
input cname $ invest1 invest2;
informat invest1 dollar14.2 invest2 comma9.;
datalines;
Novartis $48,000,000.12 80,00,000
Reddys $4,456,000.1 30,00,000
;
run;
options center nodate;
title 'Formats in words';
proc print data=company2;
format invest1 dollar14.2 invest2 words20.;
run;
Percent value
Percent Value Informat Format
50% percent3. (percentw.) Percent5. (percentw.)
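Note: In the format width, the percent sign and the two positions reserved for the parentheses
of negative values count as 3 characters, so a two-digit value such as 20% needs a width of 5.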
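A short sketch of the percent informat and format, assuming a hypothetical dataset pct with one variable rate (both names are illustrative). The informat stores 50% as 0.5, and the format displays it as a percentage again.

```sas
/* Sketch: read percent values and display them with the percent format */
data pct;
   input rate : percent5.;   /* 50% is stored as 0.5 */
   format rate percent5.;    /* width 5 = 2 digits + 3 reserved positions */
   datalines;
50%
20%
;
run;

proc print data=pct;
run;
```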

Procedures
1. Print:
Syntax: proc print data=<dataset name>; run;
2. Datasets:
Syntax: proc datasets library=<library name>; run;
3. Contents:
Syntax: proc contents data=<dataset name>; run;
4. Datasets and Contents:
Syntax: proc datasets library=<library name>;
contents data=<dataset name>; run;

Transformations
Sorting procedure: Using the sorting procedure you can arrange the data in a certain order
either ascending or descending. The default is in ascending order.
Syntax: proc sort data = <existingdatasetname> out = <newdatasetname>;
by <option> <sorting variable>;
run;
Out option: It creates an output dataset that stores the sorted data. Without the out option,
the sorted data replaces the existing dataset.
By statement: It names the variable or variables by which the data is sorted.
Descending option: Using the descending option before a sorting variable you can arrange the
data in descending order of that variable.
Note: Numeric sorting is based on the position of the values on the number line.
Character sorting is based on ASCII (American Standard Code for Information Interchange)
values.
ASCII values of uppercase letters < ASCII values of lowercase letters (A=65, B=66 up to Z=90 and
a=97, b=98 up to z=122).
Arrange the data in a certain order either ascending or descending
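The sorting syntax can be sketched as follows, using the sashelp.class dataset that appears elsewhere in this book. The output dataset names class_asc and class_desc are illustrative.

```sas
/* Sketch: ascending sort (default) on one variable */
proc sort data=sashelp.class out=class_asc;
   by age;
run;

/* Sketch: descending sort; the descending option applies to the
   variable that follows it, so name is still sorted ascending */
proc sort data=sashelp.class out=class_desc;
   by descending age name;
run;

proc print data=class_desc;
run;
```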

Data Cleaning
Data validation – to handle missing values and range values
Data scrubbing – to control data duplicates
Data migration – converting from one format to another format.
Data scrubbing is one of the data cleaning processes; it handles duplicates.
How to handle duplicates:
1. Finding duplicates
2. Reporting the duplicates
3. Replacing or deleting the duplicates.
1. Finding duplicates: Duplicates are of two types.
a. Duplicate data values.
b. Duplicate observations.
Necessity of finding duplicates: to eliminate duplicate data and to ensure data accuracy.
DW → database → Table → RDBMS → Dimension table and Fact table.
In a data warehouse or database environment, the data is loaded into tables. These tables are
maintained through an RDBMS (relational database management system).
In this process, tables are of two types.
Dimension table: It contains unique values or records.
Fact table: It contains facts or measurements (calculations or calculable data).
→ Duplicate data values occur in dimension tables.
→ Duplicate observations occur in fact tables.
How to recognize or eliminate duplicates:
Duplicate data value: The same data value is repeated more than once in the same variable. Here
there is only one key variable, e.g. eid, pid, PAN number or another unique number.
Duplicate observation: The same observation is repeated more than once in the data. Here the key
consists of more than one variable, e.g. eid with month, pid with visit, etc.
Duplicate data values mostly come from external sources (excel files and txt files, pharma and
agriculture data).
Note: Using the sorting procedure in SAS v9.0, duplicates can be deleted.
Using the sorting procedure in SAS v9.1 onwards, duplicates can be reported and deleted.
→ Nodupkey option: Using the nodupkey option you can eliminate duplicate data values using
the key variables or sorting variables.
Syntax: proc sort data = <existingdatasetname> out = <newdatasetname> nodupkey;
by <option> <sorting variable>;
run;
Eliminate duplicate data values using the key variables or sorting variables
Nodup or noduprecs option: Using the nodupkey, nodup or noduprecs options you can eliminate
duplicate observations from the transaction file.
Dupout option: Creates an output dataset that holds the duplicates removed by the sort.
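The following is a small sketch of nodupkey together with dupout. The dataset visits and its values are made up for illustration; pid alone is the key in the first sort, and pid with visit is the key in the second.

```sas
/* Sketch: hypothetical data containing repeated keys */
data visits;
   input pid visit $;
   datalines;
101 v1
102 v1
101 v2
103 v1
102 v1
;
run;

/* Duplicate data values: keep the first observation per pid,
   and report the removed ones in visits_dups */
proc sort data=visits out=visits_unique nodupkey dupout=visits_dups;
   by pid;
run;

/* Duplicate observations: the key is pid together with visit */
proc sort data=visits out=visits_obs nodupkey dupout=visits_obs_dups;
   by pid visit;
run;
```

With this data the first sort keeps three observations (one per pid) and the second removes only the repeated 102 v1 row.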
Print procedure: This generates output in the output window. These outputs are called listing
outputs.
SAS generates four types of reports:
1. Detail report (listing output).
2. Summarized report (table output).
3. Customized report (formatted for readability, i.e., titles, footnotes, etc.).
4. Statistical tables and graphs.

PRINT Procedure Options:
1.Noobs: Using the noobs option you can remove the obs column from the output. By default the
obs column is printed.
Remove the obs column from the output.
2.Double: Using the double option you can leave a blank line between the observations in the
report.
Double the gap between the observations for the report.
3.Heading: Using the heading option you can print the variable names (column headings)
horizontally or vertically. Default value is "horizontal".
Report the variable names in vertical order.
4.Width: Using the width option you can control the gap between the columns. Width value can be
either minimum or full. Default value is "minimum".
Gap between the columns for the report.

Use multiple options according to your requirement. Three settings work by default:
obs, heading=horizontal and width=minimum.
5.N option: Displays the number of observations printed by the print procedure.
Use: To report the total number of observations handled by the print procedure. It can be used
together with noobs, in place of the obs column, to save a column.
Report total number of observations.
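The options above can be combined in one step. This is a sketch only; sashelp.class stands in for whatever dataset the original screenshots used.

```sas
/* Sketch: several proc print options in one step */
proc print data=sashelp.class
           noobs              /* drop the obs column */
           double             /* blank line between observations (listing output) */
           heading=vertical   /* print column headings vertically */
           width=full         /* use the full formatted width for each column */
           n;                 /* print the number of observations at the end */
run;
```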
PRINT Procedure Statements:

1.Label statement: Changes the column headings (variable names for reporting). The label
statement creates labels; a label can be up to 256 characters long.
Syntax: label <variable name> = '<new label for the variable>';
Rename the variable names

Split character: Generates a line break in labels. The default split character is space.
Split option: Using the split option you can indicate the split character used in labels. You can
use any character (e.g. * / #) as a split character except the semicolon.
Rename and split the variable names
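A minimal sketch of the label statement and the split option, again on sashelp.class; the label texts are illustrative. Note that proc print shows labels only when the label option or the split option is given on the proc print statement.

```sas
/* Sketch: labels as column headings */
proc print data=sashelp.class label;
   label name = 'Student Name'
         sex  = 'Gender';
run;

/* Sketch: break the heading at the split character * */
proc print data=sashelp.class split='*';
   label name = 'Student*Name'
         sex  = 'Gender';
run;
```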
2.Var statement: It names the variables whose values are printed, in the order listed. It runs in
the procedure step.
Syntax: var <variable name> <variable name>;
Report the specific variable.
3.Id statement: Using the id statement you can print the required variable at the beginning of
the output instead of the obs column.
Syntax: id <variable name> <variable name>;
Report beginning with Pid variable and remove obs column.
4.Null Id statement: It works like the noobs option. The noobs option removes the obs column
whereas the null id statement replaces the obs column with a null column or blank spaces.
Syntax: id;
Replace the obs column with a blank column.
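As a sketch of the var and id statements, the step below prints two selected variables and uses name as the identifying first column. sashelp.class is used here in place of the patient data shown in the original screenshots.

```sas
/* Sketch: id replaces the obs column, var selects and orders the rest */
proc print data=sashelp.class;
   id name;
   var age height;
run;
```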

5.Sum statement: Using the sum statement you can run column-wise sums.
Syntax: sum <variable name> <variable name>;
To report the total no. of patients who received treatment
Report: The total no. of patients who received treatment in the overall study, center wise.

6.Pageby statement: Using the pageby statement you can report the data on different pages
using a grouping variable.
To report the total no. of patients who received treatment - page wise
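A sketch of the sum and pageby statements. The patient dataset from the original report is not available in the text, so sashelp.class is substituted, with sex as the grouping variable. The pageby statement needs a by statement, and the by statement needs sorted data.

```sas
/* Sketch: sort first so that by-group processing works */
proc sort data=sashelp.class out=class_by_sex;
   by sex;
run;

/* Sketch: group totals, each group on its own page */
proc print data=class_by_sex;
   by sex;
   pageby sex;
   sum height weight;
run;
```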
TASK:
Data values related to customer credit limit information.
1 2 3 4
Visa Jan 80 8000000
Master Jan 70 5000000
Visa Feb 85 9000000
Master Feb 72 6000000
Visa Mar 83 8500000
Master Mar 70 6000000
Visa Apr 82 8000000
Master Apr 84 7000000
The numbered fields represent:
1 the name of the card type
2 the month in which the card was used
3 the number of customers using that card type in the month
4 the credit limit of the card type
Assignment:
a. To report the total credit limit.
b. To report the no. of customers who have taken Visa credit card and Master credit card.
c. To report the no. of customers who have taken different credit cards and their total credit
limit monthly wise.
d. To report the total credit limit credit card wise.
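The solutions refer to a dataset named crdcust with the variables ctype, month, cust and amount. The book does not show the step that creates it, so the following is one possible sketch, assuming the raw data is entered inline with list input.

```sas
/* Sketch: build crdcust from the task data */
data crdcust;
   input ctype $ month $ cust amount;
   datalines;
Visa Jan 80 8000000
Master Jan 70 5000000
Visa Feb 85 9000000
Master Feb 72 6000000
Visa Mar 83 8500000
Master Mar 70 6000000
Visa Apr 82 8000000
Master Apr 84 7000000
;
run;
```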
a.
proc print data=crdcust;
sum amount;
run;
b.
proc sort data=crdcust out=crdcust1;
by descending ctype;
run;
proc print data=crdcust1;
sum cust;
by descending ctype;
run;
c.
proc sort data=crdcust out=crdcust2;
by month;
run;
proc print data=crdcust2;
sum cust amount;
by month;
run;
d.
proc sort data=crdcust out=crdcust3;
by ctype;
run;
proc print data=crdcust3;
sum amount;
by ctype;
run;
Import Procedure
Import: Using the infile statement you can read raw data from external files or locations into
SAS.
For text file (*.txt)
Syntax: infile '<file path>' <infile options>;
For tab file (*.txt)
Syntax: infile '<file path>' <infile options> dlm='09'x;
For csv file (*.csv)
Syntax: infile '<file path>' <infile options> dlm=',';
Filename statement: Using the filename statement you can assign a short reference name (fileref)
to the path of an external file and then use that name in place of the path.
Syntax: filename <fileref> '<file path>';
Ex: filename Num "F:\SASFiles\Numbers1.txt";
For text file (*.txt)
Syntax: infile Num <infile options>;
For tab file (*.txt)
Syntax: infile Num <infile options> dlm='09'x;
For csv file (*.csv)
Syntax: infile Num <infile options> dlm=',';
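Put together, a fileref and an infile statement can be used as sketched below. The layout of Numbers1.txt is not shown in the book, so the three numeric variables a, b and c, the dataset name numbers and the comma delimiter are assumptions made only for illustration.

```sas
/* Sketch: assign a fileref, then read the file through it */
filename num "F:\SASFiles\Numbers1.txt";

data numbers;
   infile num dlm=',' dsd;   /* assumed: comma-separated values */
   input a b c;              /* assumed: three numeric fields per line */
run;

proc print data=numbers;
run;
```

For a tab-separated file the same step would use dlm='09'x in place of dlm=','.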
Import procedure: To read data from PC files into SAS.
Syntax: proc import datafile= "<file path>"
out=<libref.dataset name> <dataset options>
dbms=<identifier> replace;
<statements>;
run;
Datafile: It indicates the file location.
Out: It indicates the output dataset name.
Dbms: It indicates the type of the source file.
Replace: Overwrites an existing SAS dataset.
Syntax for tab file (*.txt)
proc import datafile= "<file path>"
out=<dataset name> <options>
dbms=dlm replace;
delimiter="09"x;
run;
or
proc import datafile= "<file path>"
out=<dataset name> <options>
dbms=tab replace;
run;
Ex: proc import datafile= "F:\SASFiles\class.txt"
out=class1
dbms=dlm replace;
delimiter="09"x;
run;
Syntax for delimiter file (*.txt), where data values are separated by @, #, $, etc.
proc import datafile= "<file path>"
out=<dataset name> <options>
dbms=dlm replace;
delimiter="<delimiter>";
run;
Ex: proc import datafile= "F:\SASFiles\class.txt"
out=class1
dbms=dlm replace;
delimiter="@";
run;
Syntax for csv file (*.csv)
proc import datafile= "<file path>"
out=<dataset name> <options>
dbms=dlm replace;
delimiter=",";
run;
or
proc import datafile= "<file path>"
out=<dataset name> <options>
dbms=csv replace;
run;
Ex: proc import datafile= "F:\SASFiles\class.csv"
out=class1
dbms=csv replace;
run;
Excel
Syntax for excel file (*.xls or *.xlsx)
proc import datafile= "<file path>"
out=<dataset name> <options>
dbms=<xls or xlsx> replace;
<statements>;
run;

Ex: proc import datafile= "F:\SASFiles\class.xls"
out=class1
dbms=xls replace;
run;
Statements:
1.Sheet: Using the sheet statement you can indicate the required sheet for importing.
Syntax: sheet = "<sheet-name>$";
Ex: proc import datafile= "<file path>"
out=class1
dbms=xls replace;
sheet="clsdata$";
run;
2.Getnames: It is a default statement of the import procedure and works with the default value
"yes". Sometimes the raw data is available without variable names. In such cases you should use
the getnames statement with the value "no"; otherwise SAS treats the first row of the raw data
as variable names.
Syntax: getnames=no;
Ex: proc import datafile= "<file path>"
out=class1
dbms=xlsx replace;
getnames=no;
run;
3.Range Statement:
Using the range statement, you can read part of the data from an excel sheet into SAS by giving
a cell range.
Syntax: range = "<Sheet name>$<cell range>";
Ex: proc import datafile= "<file path>"
out=class1
dbms=xlsx replace;
range="clsdata$a1:e11";
getnames=no;
run;
Note:
1. If the selected part of the data does not include variable names, use the getnames statement
with the value "no".
2. If you use the range statement, there is no need to use the sheet statement.
3. You can use all dataset options in the import procedure except firstobs and obs.
To import part of the variables and change variable names at loading time
Ex: proc import datafile= "<file path>"
out=class1 (keep=patid age color rename=(patid=subid))
dbms=xlsx replace;
run;

Ex: proc import datafile= "<file path>"
out=class1 (keep=patid age color rename=(patid=subid))
dbms=xlsx replace;
range="clsdata$a1:e11";
run;
Note:
Using the import procedure, you can read only one sheet of an excel file at a time.
By default the import procedure does not read mixed (numeric and character) data in a column,
because the mixed statement works with the default value "no".
Datatypes for variables in import procedure:

IF no. of numeric values > no. of character values in raw data column THEN dataset variable
datatype IS numeric
IF no. of char values > no. of numeric values in raw data column THEN dataset variable datatype
IS character
IF no. of numeric values = no. of character values in raw data THEN dataset variable datatype IS
numeric.
Data values in Excel column | Data type in SAS
All values numeric | Numeric
All values character | Character
Numeric values > character values | Numeric
Numeric values < character values | Character
Numeric values = character values | Numeric
4.Mixed: To import columns that contain mixed (numeric and character) values.
Syntax: mixed = "yes";
To import mixed data values in column data
Ex: proc import datafile= "<file path>"
out=class1 (keep=patid age color rename=(patid=subid))
dbms=xlsx replace;
mixed="yes";
run;

Advanced features of the import procedure in V9.2, for excel files (V9.2 only)
Syntax: startcol = '<column number>'; (1 to 256 columns)
endcol = '<column number>'; (1 to 256 columns)
startrow = <row number>; (1 to 65000 rows)
endrow = <row number>; (1 to 65000 rows)
namerow = <row number>; (1 to 65000)
Note: Startcol and endcol take quoted values; the row statements take unquoted values.
MS Access
Syntax for access file (*.mdb, *.accdb.. etc.,)

proc import table= <table name>
out=<dataset name> <options>
dbms=access replace;
database = "<file path>";
<statements>;
run;
Ex: proc import table = demo
out = clinical
dbms=access replace;
database = "H:\Studies\SAS_Books\SASpath\source\Accs\CDM.mdb";
run;
Note: When you import from an access table to SAS, SAS by default assigns a length of 12 digits
to numeric variables and 50 characters to character variables.
To import part of the variables from access table to SAS (*.mdb, *.accdb.. etc.,)
Ex: proc import table = demo
out = clinical (keep= Sno pid age rename=(Sno=SerialNo))
dbms=access replace;
database = "H:\Studies\SAS_Books\SASpath\source\Accs\CDM.mdb";
run;

DBMS Specifications
DBMS | Output Data Source | File Extension
ACCESS | Microsoft Access 2000, 2002, 2003, 2007, 2010, and later table. The ACCESS LIBNAME engine is used when DBMS=ACCESS. | .mdb, .accdb
ACCESSCS | Microsoft Access table connecting remotely through SAS PC Files Server using the PCFILES LIBNAME engine. | .mdb, .accdb
CSV | Delimited file with comma-separated values | .csv
DBF | dBASE 5.0, IV, III+, and III files | .dbf
DBFMEMO | dBASE 5.0, IV, III+, and III files with memos; FoxPro and Visual FoxPro files with memos | .dbf, .fpt, .dbt
DLM | Delimited file (default delimiter is a blank) | .*
DTA | Stata file | .dta
EXCEL | Microsoft Excel 97, 2000, 2002, 2003, 2007, 2010, and later workbook using the LIBNAME statement. | .xls, .xlsb, .xlsm, .xlsx
EXCEL4, EXCEL5 | Microsoft Excel 4.0, Excel 5.0 or 7.0 (95) workbook. | .xls
EXCELCS | Microsoft Excel workbook connecting remotely through SAS PC Files Server. | .xls, .xlsb, .xlsm, .xlsx
JMP | JMP files in Version 7 and later format. | .jmp
PARADOX | Paradox DB files | .db
PCFS (SAS PC Files Server) | Microsoft Excel workbook files, JMP files, SPSS files, and Stata files connecting remotely through SAS PC Files Server. | .xls, .jmp, .sav, .dta
SAV | SPSS file | .sav
TAB | Delimited file (tab-delimited values) | .txt
WK1 | Lotus 1-2-3 Release 2 spreadsheet | .wk1
WK3 | Lotus 1-2-3 Release 3 spreadsheet | .wk3
WK4 | Lotus 1-2-3 Release 4 or 5 spreadsheet | .wk4
XLS | Microsoft Excel 5.0, 95, 97, 2000, 2002, or 2003 workbook using file formats | .xls
XLSX | Microsoft Excel 2007 and later workbook using file formats | .xlsx

Note:
Transcoding is not supported for DBMS=XLS. Attempted execution of this operation
yields unpredictable results. Use DBMS=EXCEL or DBMS=EXCELCS with the SAS PC Files
Server as an alternative. Or, if your file has more than 255 columns, save the .xls file as .xlsx to
support transcoding.
Export Procedure
Export procedure: To export data from the SAS environment to an external environment (PC).
Syntax: proc export outfile= "<file path>"
data=<libref.dataset name> <dataset options>
dbms=<identifier> replace;
<statements>;
run;

Syntax for tab file (*.txt)
proc export outfile= "<file path>"
data=<dataset name> <options>
dbms=tab replace;
run;
or
proc export outfile= "<file path>"
data=<dataset name> <options>
dbms=dlm replace;
delimiter="09"x;
run;
Ex: proc export outfile= "F:\SASFiles\class.txt"
data=class1
dbms=tab replace;
run;
Statements:
putnames: It is a default statement of the export procedure and works with the default value
"yes": the variable names are written as column names in the first row of the exported file. If
the putnames statement has the value "no", the variable names are skipped and the columns are
left unlabeled.

proc export outfile= "<file path>"
data=<dataset name> <options>
dbms=tab replace;
putnames=<yes/no>;
run;
Ex: proc export outfile= "<file path>"
data=class1
dbms=tab replace;
putnames=yes;
run;
Syntax for csv file (*.csv)
proc export outfile= "<file path>"
data=<dataset name> <options>
dbms=csv replace;
putnames=<yes/no>;
run;
Ex: proc export outfile= "<file path>"
data=sasuser.admit
dbms=csv replace;
putnames=yes;
run;
TXT file or delimiter file: Data can be exported into txt files or delimiter files using either
the dataset block or the export procedure.
Using the dataset block: This process is called the loading process or the reporting process.
Loading process: Write the data into delimiter files with or without variable names.
Reporting process: Write the data with variable names, adding titles, footnotes and other
reporting specifications.
To load the data into a delimiter (text) file using the dataset block:
_null_: If you want to run a group of statements in a dataset block without creating a dataset,
use the dataset name _null_.
File statement: Names the output file location.
Put statement: Using the put statement you can write data values or text to an external file
(delimiter file) or to the log window.
Dlm option: Using the dlm option you can indicate the delimiter for the external file.

Write the data into txt files without variable names
syntax: data _null_;
set <lib ref>.<dataset name>;
file "<outfile path>" <file options>;
put <var 1> <var 2>……<var n>;
run;
Ex: data _null_;
set sashelp.class;
file "F:\SASFiles\class2.txt" dlm=',';
put name sex age height;
run;
Note: Data _null_ is the most important concept in pharma and financial domains to generate
reports.
To load the data in specific columns: @n option: It is a column pointer control; n specifies
the column number.
syntax: data _null_;
set <lib ref>.<dataset name>;
file "<outfile path>";
put @n1 <var 1> @n2 <var 2>……@n3 <var n>;
run;
Ex: data _null_;
set sashelp.class;
file "F:\SASFiles\class2.txt";
put @5 name @15 sex @20 age @25 height;
run;
Note: The file statement creates pc files in txt or rtf (word) format.

Data n: If you submit the dataset block without a dataset name, SAS assigns a default dataset
name (data1, data2, ... datan).
syntax: data;
set <lib ref>.<dataset name>;
file "<outfile path>";
put @n1 <var 1> @n2 <var 2>……@n3 <var n>;
run;
Ex: data;
set sashelp.class;
file "F:\SASFiles\class3.txt";
put @5 name @15 sex @20 age @25 height;
run;
syntax: data;
set <lib ref>.<dataset name>;
run;
Ex: data;
set sashelp.class;
run;
Excel
To export data from the SAS environment to an external environment (PC).
Syntax for Excel file (*.xls or *.xlsx)

syntax: proc export outfile= "<file path>"
data=<dataset name> <options>
dbms=<xls or xlsx> replace;
<Statements>;
run;
Ex: proc export outfile= "F:\SASFiles\class.xls"
data=sasuser.admit
dbms=xls replace;
run;
By default, SAS uses the dataset name as the sheet name when exporting.

Sheet statement: You can assign the sheet name for exporting.
Note: When exporting, the $ symbol is not required after the sheet name.
Syntax for Excel file (*.xls or *.xlsx)

syntax: proc export outfile= "<file path>"
data=<dataset name> <options>
dbms=<xls or xlsx> replace;
sheet="<sheetname>";
run;
Ex: proc export outfile= "<file path>"
data=sasuser.admit
dbms=xls replace;
sheet="admin";
run;
To export part of the variables
Ex: proc export outfile= "<file path>"
data=sasuser.admit (rename=(sex=gender date=days) keep=name sex date fee)
dbms=xls replace;
sheet="admin";
run;
To export part of the data in sequential order
Ex: proc export outfile= "<file path>"
data=sasuser.admit (firstobs=5 obs=15 rename=(sex=gender) keep=name sex)
dbms=xls replace;
sheet="admin";
run;
MS Access
Syntax for access file (*.mdb, *.accdb.. etc.,)

syntax: proc export outtable= <table name>
data=<dataset name> <options>
dbms=access replace;
database= "<file path>";
<Statements>;
run;
Ex: proc export outtable= demo
data=Sasuser.admit
dbms=access replace;
database="F:\SASFiles\admin.mdb";
run;
To export part of the data and part of the variables (*.mdb, *.accdb.. etc.,)
Ex: proc export outtable= demo
data=Sasuser.admit (keep=Id Name)
dbms=access replace;
database="F:\SASFiles\admin.mdb";
run;
S.No | File Type | Export: Data-block | Export: Procedure-block | Import: Data-block | Import: Procedure-block
1. | Text file | Yes | Yes | Yes | Yes
2. | Tab file | -- | Yes | Yes | Yes
3. | Csv file | -- | Yes | Yes | Yes
4. | Excel | -- | Yes | -- | Yes
5. | Access | -- | Yes | -- | Yes
TASK:
Raw data (patinfor.txt and researchinfor.xls)
Q1a.TXT (source system) (extraction) ---> SAS file (transformation) ---> SAS file (loading) --->
access file (target).
Q1b. XLS (source system) (extraction) ---> SAS file(transformation) ---> SAS file (loading) --->
access file (target).
/* Extraction */
data one;
infile "D:\SAS\text\patinfor.txt";
input pid age gender $ color $;
run;
proc import datafile="D:\SAS\excel\researchinfor.xls"
out=two
dbms=excel replace;
sheet="sheet1$";
run;
/* Transformation */
proc sort data=one out=demo nodupkey;
by pid;
run;
proc sort data=two out=lab nodupkey;
by pid test units;
run;

/* Loading */
proc export outtable=demo1
data=demo
dbms=access replace;
database="D:\SAS\trials.mdb";
run;
proc export outtable=lab1
data=lab
dbms=access replace;
database="D:\SAS\trials.mdb";
run;
Append
Appending process: Loading data into an existing file, or adding raw data to an existing file,
is called appending. Any database supports the appending process, except excel and access when
the data comes from another database.
Mod option: Using the mod option in the file statement you can append data from SAS files to a
delimiter (text) file.
/* dataset - 1 */
data medi1;
input gid $ week $ drug $;
cards;
G100 week3 Col5mg
G200 week3 Col10mg
G300 week3 Col15mg
;
/* dataset - 2 */
data medi2;
input gid $ week $ drug $;
cards;
G100 week6 Col5mg
G200 week6 Col10mg
G300 week6 Col15mg
;
/* dataset - 1 export to txt file (without variable names) */
data _null_;
set medi1;
file "F:\SASFiles\medicine.txt";
put @5 gid @10 week @20 drug;
run;
/* Append dataset - 2 to the dataset - 1 txt file */
data _null_;
set medi2;
file "F:\SASFiles\medicine.txt" mod;
put @5 gid @10 week @20 drug;
run;
To export the data with variable names (as text):
/* export variable names to the text file */
data _null_;
file "F:\SASFiles\class2.txt";
put @5 "Name" @13 "Gender" @20 "Age";
run;
/* export variable values to the text file */
data _null_;
set sashelp.class;
file "F:\SASFiles\class2.txt" mod;
put @5 name @13 sex @20 age;
run;
Reports
Loading Environments (ETL): TXT, Excel, Access, Oracle, DB2 and 52 databases.
Reporting Environments SAS (OLAP): TXT, RTF, HTML, XML and PDF.
/* Reporting */
/* Write title and variable names */
data _null_;
file "F:\SASFiles\report.txt";
put @25 "LAB DATA";
put @22 "----------------------------";
put @20 "Name" @30 "sex" @40 "Age";
put @21 "-----" @30 "-----" @40 "-----";
run;
/* export variable values to the text file */
data _null_;
set sashelp.class (keep=name sex age);
file "F:\SASFiles\report.txt" mod;
put @20 Name @30 Sex @40 Age;
run;
/* Write footnote */
data _null_;
file "F:\SASFiles\report.txt" mod;
put " ";
put @20 "-----------";
put @22 "Report Generated By SAS 9.4";
run;
Customized reports: Reports generated in a TXT file or RTF file using data _null_ are called
customized reports.
To generate the reports in the output window using the dataset block:
Print option: Using the print option in the file statement you can generate the reports in the
output window.
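A minimal sketch of the print option: the step below is the same kind of data _null_ report as above, but file print sends the lines to the output window instead of an external file. The column positions and heading text are illustrative.

```sas
/* Sketch: data _null_ report routed to the output window */
data _null_;
   set sashelp.class (keep=name sex age);
   file print;
   /* write the heading once, before the first observation */
   if _n_ = 1 then put @20 'Name' @30 'Sex' @40 'Age';
   put @20 name @30 sex @40 age;
run;
```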

Filter Transformation
Conditional Statements:
If you want to run any application (data reading, data manipulation, data management) based on
a condition, you use conditional statements. Conditional statements work mainly with operators.
Conditional statements are of 2 types:
1. where statement
2. if statement
Operators are of 3 types:
1. Arithmetic operators
2. Comparison operators
3. Logical operators
Arithmetic operators:
Operator Meaning
+ Addition
- Subtraction
* Multiplication
/ Division
Comparison operators:
Operator Mnemonics Meaning

> gt Greater than
< lt Less than
>= ge Greater than or equal to
<= le Less than or equal to
= eq Equal to
~= (or) ^= ne Not equal to
Logical operators:
Operator Meaning
& and
| or
Special operators used in conditions:
between
in
not in
like
contains
Where Statement
Where statement: Using the where statement you can create a subset of data for reporting
(temporary) or loading (permanent).
If you write the where statement in a procedure block, it creates a subset of data for
reporting. If you write it in a dataset block, it creates a subset of data for loading.
/* For reporting */
proc print data=sashelp.class;
where sex='M';
run;
/* For loading */
data cls_M;
set sashelp.class;
where sex='M';
run;
proc print data=cls_M;
run;
/* For reporting */
proc print data=sashelp.class;
where age between 12 and 13;
run;
proc print data=sashelp.class;
where height >=59 and height <= 69;
run;
proc print data=sashelp.class;
where sex='F' and height >= 60;
run;
proc print data=sashelp.class;
where sex='M' or height >= 60;
run;
/* To report missing data */
proc print data=<dataset name>;
where sex=' ';
run;
proc print data=<dataset name>;
where Age is null;
run;
proc print data=<dataset name>;
where height =.;
run;
proc print data=<dataset name>;
where weight is missing;
run;
/* To report observations with any missing value */
proc print data=<dataset name>;
where sex=' ' or Age is null or Height=. or Weight=.;
run;
/* To report observations with no missing value */
proc print data=<dataset name>;
where sex~=' ' and Age ~=. and Height~=. and Weight~=.;
run;
like operator: Using the like operator you can run pattern matching, i.e. checking part of a
value.
% - It indicates any number of characters
_ (underscore) - It indicates exactly one character
/* For report */
proc print data=sashelp.class;
where name like 'J%';
run;
proc print data=sashelp.class;
where name like 'A_';
run;
proc print data=sashelp.class;
where name like 'J%e';
run;
In operator: The in operator works like the or operator.
/* For report */
proc print data=sashelp.class;
where Age in (11 15);
run;
proc print data=sashelp.class;
where sex in ('M');
run;
Not in operator: The not in operator works like the not equal operator.
/* For report */
proc print data=sashelp.class;
where age not in (11 15);
run;
proc print data=sashelp.class;
where sex not in ('M');
run;
Note: The 'in' and 'not in' operators can be used with character or numeric data; the 'between'
operator is mostly used with numeric data.
Conditional statements can produce logic errors: any application that runs on a condition gives
a wrong result when the condition is written incorrectly.
Contains operator: Can be used only for character data (checks for a specified letter or
word). It finds the given text anywhere in the value.

proc print data=sashelp.class;
where Name contains 'e';
run;
Logic error: If the condition is written incorrectly, SAS still executes the step without an
error message, but the observations do not satisfy the condition as intended and the result is
wrong. This is called a logic error.
Errors are of 2 types:
1. Compilation errors (syntax errors)
2. Execution errors
a. Data errors (raw data mistake)
b. Logic errors (conditional statement mistake)
Where Option
Where option: The where option is a dataset option. Using the where option you can create a
subset of data, temporary for reading and permanent for loading.
Syntax: where=(<expression>)
/* For reporting */
proc print data=sashelp.class (where=(sex='M'));
run;
/* For loading */
data cls_M;
set sashelp.class (where=(sex='M'));
run;
Where statement & option: Major difference between the where statement and the where option:
You can use the where statement in the dataset block and in procedure blocks except SAS/ACCESS
procedures. You can use the where option in SAS/ACCESS procedures also. So, the where option is
more efficient compared to the where statement.
To read data from another environment into SAS based on a condition:

Interaction between excel and SAS:
proc import datafile="F:\SASFile\Class.xlsx"
out=demo (where=(Age in (11 15)))
dbms=xlsx replace;
sheet='demo$';
run;
proc import datafile="F:\SASFile\Class.xlsx"
out=demo (keep=Name sex Age Height where=(Age <=15))
dbms=xlsx replace;
sheet='demo$';
run;
proc export outfile="F:\SASFile\Class1.xlsx"
data=sashelp.class (where=(Age in (11 15)))
dbms=xlsx replace;
sheet='demo';
run;
proc export outfile="F:\SASFile\Class.xlsx"
data=demo (keep=Name sex Age Height where=(Age <=15))
dbms=xlsx replace;
sheet='demo';
run;
Where statement: Split the data into multiple datasets depending on missing and nonmissing
values: eq represents '='.
data Missing;
set market;
where cname eq ' ' or area eq ' ' or invest eq . or sprice eq .;
run;
/* or can be written as: */
data Missing1;
set market;
where cname is null or area is null or
invest is missing or sprice is missing;
run;
proc sort data=market out=market2;
by cname;
where area is null or invest is missing or sprice is missing;
run;
Expression Transformation
Data processing flow:
Extraction → Validation → Cleaning → Manipulation → Managing → Maintenance
Data Manipulation: You can run the data manipulation process using operators and functions.
data emp_salaries;
input eid $ salary sale;
cards;
E-234 2300 678
E-245 4500 456
E-256 8900 567
E-456 3000 400
E-890 4500 300
E-235 7800 580
E-267 5000 380
;
data emp1;
set emp_salaries; /* Define new variable */
New_Salary = Salary+1000;
run;
data emp1;
set emp_salaries; /* Redefine existing variable */
Salary = Salary+1000;
run;
IF Statement
If then else block: If you want to run a statement based on a condition, use the if then else
block.
syntax: if <condition> then <statement>;
else <statement>;
or
if <condition> then <statement>;
else if <condition> then <statement>;
else <statement>;
Data emp_salaries;
input eid $ salary sale;
cards;
E-234 2300 678
E-245 4500 456
E-256 8900 567
E-456 3000 400
E-890 4500 300
E-235 7800 580
E-267 5000 380
;
condition Increment
sale >= 500 2000
sale < 500 1000
data emp2;
set emp_salaries;
if sale >=500 then Newsalary = salary+2000;
else Newsalary = salary+1000;
run;
condition Increment
sale >= 500 2000
sale > 400 and < 500 1500
sale >= 300 and <= 400 1000
sale < 300 500
data emp3;
set emp_salaries;
if sale>=500 then Newsalary=salary+2000;
else if sale>400 and sale <500 then Newsalary=salary+1500;
else if sale >=300 and sale <=400 then Newsalary=salary+1000;
else Newsalary=salary+500;
run;
condition Rating
sale >= 500 A+++
sale > 400 and < 500 A++
sale >= 300 and <= 400 A+
sale < 300 A
data emp4;
set emp_salaries;
if sale>=500 then Rating='A+++';
else if sale>400 and sale <500 then Rating='A++';
else if sale >=300 and sale <=400 then Rating='A+';
else Rating='A';
run;
Do block: To run multiple statements depending on a condition. The do block is mostly used with
the 'if' statement.
Create multiple variables depending on the condition: new variables Newsalary and Rating
condition hike depending on salary Rating
sale >= 500 50% A+++
sale >= 400 and < 500 30% A++
sale < 400 20% A+
data emp5;
set emp_salaries;
if sale>=500 then do;
Newsalary=salary+(salary*0.5);
Rating='A+++';
end;
else if sale >=400 and sale < 500 then do;
Newsalary=salary+(salary*0.3);
Rating='A++';
end;
else do;
Newsalary=salary+(salary*0.2);
Rating='A+';
end;
run;
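Note: A SAS date constant must be written in date9. style followed by d, e.g. Sdate = '14Feb2005'd;
forms such as '14 02 2005'd or '02 14 2005'd are not valid date constants.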
data Medi;
input Pctno $ Sno $ Gno $ Drug $ @@;
datalines;
pct_250 S001 G100 Asp_05mg pct_250 S001 G200 Asp_10mg
; run;

Study design: Asp_05mg treatment starts on 15 Jan 2005. Depending on the Asp_05mg safety data,
Asp_10mg starts after one week; depending on the Asp_10mg safety data, Asp_15mg starts after a
further two weeks.
data medi1;
set medi;
if drug='Asp_05mg' then
date='15Jan2005'd;
else if drug='Asp_10mg' then
date='15Jan2005'd+6;
else date='15Jan2005'd+20;
format date date9.;
run;
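Output Statement: Writes the current observation to the dataset being created. If no dataset
name is given, the observation is written to every dataset named in the data statement.
Syntax: output <dataset name>;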
data demo;
Pid = 101; output demo;
Pid = 102; output demo;
Pid = 103; output;
run;
Split Transformation: Create multiple datasets in one dataset block.
data Mh1 Mh2 Mh3;
set medi;
if drug='Asp_05mg' then output Mh1;
else if drug='Asp_10mg' then output Mh2;
else output Mh3;
run;
If statement: Using the if statement you can create a subset of data.
data red_team;
input Team $ 13-18 @;
if Team='red';
input IdNumber 1-4 StartWeight 20-22 EndWeight 24-26;
datalines;
1023 David  red    189 165
1219 Alan   red    210 192
1246 Ravi   yellow 194 177
1078 Ashley red    127 118
1221 Jim    yellow 220   .
;
proc print data=red_team;
title 'Red Team';
run;
Multiple Datasets
/* Set Statement */
data dta1 (keep=Name Age Weight)
dta2 (drop=Name Age Weight);
set sashelp.class;
run;
data dta3 (where =(age between 12 and 13))
dta4 (where=(Name like 'J%'));
set sashelp.class;
run;
SAS variable lists:
_all_: Specifies all variables, both character and numeric.
_character_: Specifies all character variables.
_numeric_: Specifies all numeric variables.
/* Set Statement */
data dta3 (where =(age between 12 and 13))
dta4 (Keep = _character_);
set sashelp.class;
run;
data dta3 (keep = _all_)
dta4 (keep = _character_)
dta5 (keep = _numeric_);
set sashelp.class;
run;
/* Set Statement */
data dta5 (keep=Name Age Weight)
dta6 (drop=Name Age Weight)
dta7 (where=(age >12 and age <15))
dta8 (where=(Name like 'J%'));
set sashelp.class;
Run;
data dta9 (where=(Name contains 'e'))
dta10 (where=(Name like 'J%'));
set sashelp.class;
if Age=15 then newweight=weight+2;
else newweight=weight;
run;
Loops
Loop processing: Used to run a statement or a group of statements multiple times.
Loop requirements: 1. Loop variable. 2. Condition. 3. Increment/Decrement.
Loops are of three types: 1. Do while. 2. Do until. 3. Do loop.
Do while: Runs the loop while the condition is true. The condition is checked before each pass.
Do until: Runs the loop until the condition becomes true. The condition is checked after each
pass, so the loop runs at least once.
Do loop: Works like 'do while', but start, stop and increment are written in one statement.
In real-time work, loops are mostly used to generate data values.
1. Do while: Runs the loop while the condition is true, that is, until the condition is false.

Report the numbers from 1 to 20?
data cust_house (drop=i);
i = 1;
do while (i <= 20);   /* loop continues while i <= 20 */
cust_no = i; output;
i = i+1;
end;
run;
proc print data=cust_house;
run;
Output: 1, 2, 3, ..., 18, 19, 20
Using only one variable:

data cust_house;
custno = 100;
do while (custno <= 130);
custno = custno; output;
custno = custno+1;
end;
run;
proc print data=cust_house;
run;
Output: 100, 101, 102, ..., 128, 129, 130
Task: dataset: cust_vehc

custno 100, 98, 96, 94, ..., 50

data cust_vehc;
custno = 100;
do while (custno >= 50);
custno = custno; output;
custno = custno-2;
end;
run;
proc print data=cust_vehc;
run;
Output: 100, 98, 96, ..., 54, 52, 50

Nested loop: A loop placed inside another loop is called a nested loop.
In each circle (3), counselling 5 patients?

data Counselling;
Circle = 1;
do while (Circle <= 3);
Pid = 1;
do while (Pid <= 5);
Circle = Circle;
Pid = Pid; output;
Pid = Pid+1;
end;
Circle = Circle+1;
end;
run;
proc print data=Counselling;
run;
Output:
Circle Pid
1 1
1 2
1 3
1 4
1 5
2 1
2 2
. .
3 3
3 4
3 5
2. Do loop: Works like 'do while', but start, stop and increment are written in one statement.

data cust_house;
do i=100 to 130 by 1;
i = i; output;
end;
run;
proc print data=cust_house;
run;
Output: 100, 101, ..., 130
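Do until is listed above as a loop type but not shown. The following is a minimal sketch of the
1 to 20 report written with do until; the dataset name cust_until is only an illustration.

```sas
/* DO UNTIL checks its condition at the bottom of the loop, */
/* so the loop body always runs at least once.              */
data cust_until (drop=i);
i = 1;
do until (i > 20);
cust_no = i;
output;
i = i + 1;
end;
run;

proc print data=cust_until;
run;
```

The stop condition is the opposite of the do while version: do while (i <= 20) and
do until (i > 20) both write the values 1 to 20.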
Global statements: title, footnote, DM statements

proc print data=cust_house;
footnote 'information';
run;
Data Conversions: char -> char, num -> num, char -> num, num -> char
data cvr;
input id $ gender age race $ color $;
datalines;
P101 1 18 As w
P102 2 23 Af b
P103 1 25 As w
P104 2 28 Af w
;
run;
Gender 1 = Male, 2 = Female; Race As = Asian, Af = African; Color w = White, b = Black
/* Transformation for data conversion */
/* A numeric variable cannot hold text, so the gender text is stored in a new */
/* character variable, which then takes over the name gender.                 */
data cvr1 (drop=gender rename=(gender_c=gender));
set cvr;
length gender_c $ 6;
if gender = 1 then gender_c = 'Male';
else gender_c = 'Female';
if race = 'As' then race = 'Asian';
else race = 'African';
if color = 'w' then color = 'White';
else color = 'Black';
run;
proc print data=cvr1;

run;
>>> Create a dataset for contents information of class from sashelp library.
Proc contents data=sashelp.class out=cls;
Run;
Customized reports
Use data _null_ to generate reports in a TXT file or RTF file.
To generate the reports in the output window using the dataset block:
Print option: Using the print option in the file statement you can generate the reports in the
output window.
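The print option described above can be sketched as follows. This is a minimal example,
assuming the sashelp.class dataset that is used elsewhere in this book; the column positions
are arbitrary.

```sas
/* FILE PRINT sends PUT output to the output window instead of the log */
data _null_;
set sashelp.class (keep = name age);
file print;
if _n_ = 1 then put @5 'Name' @20 'Age';   /* header line, written once */
put @5 name @20 age;
run;
```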

Generating output in log window by dataset block:
Options nosource nonotes;

dm log 'clear';
data _null_;
set sashelp.class (keep = name age);
put @25 name @35 age;
run;

Backend Process
Input stack: The area of memory where submitted SAS code is held before it is processed.
Word scanner: The layer between the input stack and the compilers. It controls the tokenization
process.
Tokenization: The code is broken into tokens (keywords, names, constants, special characters),
and each token is passed to the compiler that requested it.
PDV (Program Data Vector): The area of memory where SAS builds one observation at a time. It
holds two automatic variables, _N_ and _ERROR_, which are used for data error checking and are
not written to the output dataset.
_N_: Returns the number of the current iteration of the dataset block.
_ERROR_: It holds one of two values, 1 or 0.
1 = error in observation
0 = no error in observation.
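The two automatic variables can be seen in the log with a put statement. This is a small
sketch with a made-up dataset (pdv_demo); the second record contains an invalid numeric value
on purpose.

```sas
data pdv_demo;
input pid age;
/* _N_ counts the iterations; _ERROR_ becomes 1 when age cannot be read */
put _n_= _error_= pid= age=;
datalines;
101 24
102 xx
103 31
;
run;
```

For the second record SAS writes an invalid data note to the log, sets age to missing and
sets _ERROR_ to 1; for the other records _ERROR_ stays 0.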
Dataset medi: Reports are to be submitted for the data below, which contains pid, week and drug.
101 week3 Col5mg

102 week3 Col10mg
103 week3 Col15mg
101 week6 Col5mg
102 week6 Col10mg
103 week6 Col15mg

To report or capture each subject’s first information?
Proc sort data=medi out=medi1;
By pid;
Run;
Data first;
Set medi1;
by pid;
If first.pid=1;
run;
To report or capture each subject’s last information?
Proc sort data=medi out=medi1;
By pid;
Run;
Data first;
Set medi1;
by pid;
If last.pid=1;
run;
To report or capture each subject’s information except first?
Proc sort data=medi out=medi1;
By pid;
Run;
Data first;
Set medi1;
by pid;
If first.pid=0;
run;
To report or capture each subject’s first and last information?
Proc sort data=medi out=medi1;
By pid;
Run;
Data first;
Set medi1;
by pid;
If first.pid=1 or last.pid=1;
run;
/* or (note: here a subject with only one record is written twice) */
Data first;
Set medi1;
by pid;
If first.pid=1 then output;
If last.pid=1 then output;
run;
To report subject’s information whoever visited only once?
Proc sort data=medi out=medi1;
By pid;
Run;

Data once;
Set medi1;
By pid;
If first.pid=1 and last.pid=1;
run;
To report or capture each subject’s information except first and last?
Proc sort data=medi out=medi1;
By pid;
Run;
Data once;
Set medi1;
By pid;
If first.pid=0 and last.pid=0;
run;
(In this case 0, 1 are called flag values or Boolean values).

DCF: data clarification form
Qs: Report the duplicates?
Dataset name dup; variables: pid age gender
101 24 Female
102 45 Male
102 56 Female
103 34 Male
104 56 Male
104 23 Female
106 23 Female
/* template */
data _null_;
file 'F:\SASFiles\dcf.txt';
put @10 'Dataset' @20 'Variable' @30 'Value' @40 'Obs';
run;

/* upload values*/
proc sort data=dup;

by pid;
run;
data _null_;
set dup (keep=pid);
by pid;
file 'F:\SASFiles\dcf.txt' mod;
if first.pid=0 then put @10 'dup' @20 'pid' @30 pid @40 _n_;
run;
Store a dataset without duplicates?

Proc sort data=dup;
By pid;
Run;
data dup1;
set dup;
by pid;
if first.pid=1;
run;
Qs: Report the duplicate observations?

Dataset name dup_obs; variables: pid visit dose
101 1 0.05
101 2 0.05
101 3 0.05
101 3 0.05
102 1 0.1
102 2 0.1
102 2 0.1
102 3 0.1
/* template */

data _null_;
file 'F:\SASFiles\dcf.txt';
put @10 'Dataset' @20 'Variable' @30 'Obs' @40 'Value';
run;
/* upload values*/
data _null_;
set dup_obs;
by pid visit;
file 'F:\SASFiles\dcf.txt' mod;
if first.pid=0 and first.visit=0 then put @10 'dup_obs' @20 'pid visit' @30 _n_ @40 pid visit
dose;
run;
Functions
Data step Functions: used in the dataset block.

Arithmetic, Aggregate, String & Date and Time Functions: can be used in the dataset block and
the procedure block.
Data step Functions
1. Exist function: Checks whether a dataset exists. It returns one of two values, 1 or 0.
1 = exists; 0 = does not exist
Syntax: exist ('dataset name')
data _null_;
file print;
if exist ('sashelp.class')=1 then put 'dataset exists';
else put 'dataset does not exist';
run;

/* EXIST looks for SAS datasets only, so an external file path returns 0. */
/* To test for an external file, use fileexist ('path') instead.          */
data _null_;
file print;
if exist ('F:\SASFiles\dcf.txt')=1 then put 'dataset exists';
else put 'dataset does not exist';
run;
Arithmetic Functions
1.Int function: Returns the integer part of a value (the decimal part is dropped).
Syntax: <new or existing var name> = int (var);
2.Round function: Rounds to the nearest integer, or to the nearest multiple of the second
argument.

Syntax: <new or existing var name> = round (var);
<new or existing var name> = round (var, 0.1); for 1 decimal place
<new or existing var name> = round (var, 0.01); for 2 decimal places
3.Ceil function: Returns the smallest integer that is greater than or equal to the value.

Syntax: <new or existing var name> = ceil (var);

4.Floor function: Returns the largest integer that is less than or equal to the value.
Syntax: <new or existing var name> = floor (var);
5.Sqrt function: Returns the square root of the value.

Syntax: <new or existing var name> = sqrt (argument);
6.Fact & gamma function: To compute the factorial of a number, use the DATA step function
FACT. For example, fact (6) computes 6! = 720.
Syntax: <new or existing var name> = fact (argument);
Alternatively, you can use the GAMMA function to obtain the factorial of a number. For positive
integers, GAMMA(X) is (X-1)!. For example, gamma (7) computes 6!.
Syntax: <new or existing var name> = gamma (argument);
7.Abs (absolute) function: Returns the absolute (non-negative) value of the argument.

Syntax: <new or existing var name> = abs (argument);

8.Log function: Returns the natural (base e) logarithm.
Syntax: <new or existing var name> = log (argument);
Natural (base e) logarithm: The number e frequently occurs in mathematics (especially
calculus) and is an irrational constant (like π). Its value is e = 2.718 281 828 ...
Apart from logarithms to base 10, we can also have logarithms to base e. These are called
natural logarithms.
9.Log10 function: Returns the base 10 logarithm.

Syntax: <new or existing var name> = log10 (argument);
10.Mod function: Returns the remainder when the first argument is divided by the second
argument.
The arguments to the MOD function are as follows:
Value is a numeric variable or constant that contains the dividend.
Divisor is a numeric variable or constant that contains the divisor.
Syntax: <new or existing var name> = mod (value, divisor);
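The difference between int, round, ceil and floor is easiest to see on one positive and one
negative value. The following is a small sketch; the variable names are illustrative only.

```sas
data _null_;
x = 12.567;
y = -12.567;
int_x   = int(x);           /* 12    */
int_y   = int(y);           /* -12   */
rnd     = round(x);         /* 13    */
rnd1    = round(x, 0.1);    /* 12.6  */
rnd2    = round(x, 0.01);   /* 12.57 */
ceil_x  = ceil(x);          /* 13    */
ceil_y  = ceil(y);          /* -12   */
floor_x = floor(x);         /* 12    */
floor_y = floor(y);         /* -13   */
rem     = mod(17, 5);       /* 2     */
f       = fact(6);          /* 720   */
put _all_;                  /* writes every variable to the log */
run;
```

For negative values int and floor differ: int drops the decimals, floor moves down to the
next lower integer.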

Prime numbers & composite numbers:
Prime numbers: Numbers that have exactly two factors (1 and the number itself) are called prime
numbers.
Composite numbers: Numbers that have more than two factors are called composite numbers.
The trick here is defining your algorithm, here for the numbers from 1 to 100.

Loop from 1 to 100:
data primes;
do i=1 to 100;
*classification code;
end;
run;
A composite number is divisible by a number less than itself, which adds another loop. This
loop needs to start at 2, since all numbers are divisible by 1:
data primes;
do i=1 to 100;
do j=2 to i-1;
*classification code;
end;
end;
run;
In SAS we can use the MOD( ) function to get the remainder of a division operation. A prime
number will never have a remainder that is 0:
data primes;
do i=1 to 100;
do j=2 to i-1;
if mod(i, j) = 0 then do;
status='Composite';
leave; *exit loop;
end;
end;
end;
run;
We can refine the code by adding the initial status of prime and using an explicit output to
view the results. A number that is never divisible retains the status of prime, while the
others are assigned to composite. The outer loop starts at 2 here, because 1 has only one
factor and is neither prime nor composite:
data primes;
length status $ 12;
do i=2 to 100;
status='Prime';
do j=2 to i-1;
if mod(i, j) = 0 then do;
status='Composite';
leave; *exit loop;
end;
end;
output;
end;
run;
Trimorphic numbers:
Trimorphic number is a number whose cube (expressed in a given base) ends in the number
itself. For example, 4³ = 64, 24³ = 13824.
data trimorphic;
i=10;
do while (i<=100);
i=i;
j=i*i*i;
k=mod(j, 100);
if k=i then status='trimorphic';
else status='Not'; output;
i=i+1;
end;
run;
proc print data=trimorphic;
run;
11.Dif function: Returns the difference between the current value and the previous value in the
column, that is, var - lag(var).

Syntax: <new or existing var name> = dif (variable);
12.Lag function: Returns values from a queue, normally the value from the previous observation.
LAG1, LAG2 and LAG3 look back one, two and three observations; the first calls return missing
values because the queue is still empty.
Syntax: <new or existing var name> = lag (variable);

QS: Make a comment 'Safe' when Sale is larger than the previous value, otherwise 'Risk'; no
comment if the lag value is not present.
Dataset name msale; variables: date sale
Jan 230
Feb 210
Mar 250
Apr 260
May 230
Jun 250
data mstatus;
set msale;
if sale>lag(sale) then comment='Safe';
else comment='Risk';
if _n_=1 then comment=' ';
run;
proc print data=mstatus;
run;
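13.Retain statement: Keeps the value of a variable from one iteration of the dataset block to
the next instead of resetting it to missing. An initial value can also be assigned.

Syntax: retain <variable> <initial value>; or retain <variable>;
Example 1: Submit the total of data values in column wise.

Example 2: Submit the below result by using the retain statement.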
The RETAIN statement prevents the DATA step from reinitializing CPILAG to a missing value
at the start of each iteration and thus allows CPILAG to retain the value of CPI assigned to it in
the last statement. The OUTPUT statement causes the output observation to contain values of the
variables before CPILAG is reassigned the current value of CPI in the last statement.
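Example 1 above (a column-wise running total) can be sketched as follows. The dataset
msale_demo and its values are made up for illustration; only the retain logic matters.

```sas
data msale_demo;
input month $ sale;
datalines;
Jan 230
Feb 210
Mar 250
;
run;

data mtotal;
set msale_demo;
retain total 0;          /* total starts at 0 and is not reset each iteration */
total = total + sale;    /* running total: 230, 440, 690 */
run;

proc print data=mtotal;
run;
```

Without the retain statement, total would be set to missing at the start of every iteration
and the sum would always be missing.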
To control the duplicates by the lag function:

Dataset name duplag; variables: eid descript sale

101 Tester 4500
102 Prager 56000
102 Tester 5000
103 Prager 5000
103 Tester 4500
104 Prager 4500
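One possible way to flag the duplicates in duplag with the lag function is sketched below.
The flag variable dup and the helper variable prev_eid are assumptions, not part of the
original dataset, and the data must already be ordered by eid.

```sas
data duplag;
input eid descript $ sale;
datalines;
101 Tester 4500
102 Prager 56000
102 Tester 5000
103 Prager 5000
103 Tester 4500
104 Prager 4500
;
run;

data dupflag;
set duplag;
prev_eid = lag(eid);              /* eid of the previous observation */
if eid = prev_eid then dup = 1;   /* same eid as the row before: duplicate */
else dup = 0;
drop prev_eid;
run;

proc print data=dupflag;
run;
```

The lag function is called on every observation, outside the if condition, so that its queue
is updated on each iteration.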
Aggregate Functions
These functions do the analytical process row wise (across the variables of one observation) in
the dataset block.
1.Sum function: Generates the sum; missing values are ignored.

Syntax: <New Var> = Sum (<var1>, <var2>, <var3>….);
2.Mean function: Generates the Mean;

Syntax: <New Var> = Mean (<var1>, <var2>, <var3>….);
3.Max function: Generates the Maximum;

Syntax: <New Var> = Max (<var1>, <var2>, <var3>….);
4.Min function: Generates the Minimum;

Syntax: <New Var> = Min (<var1>, <var2>, <var3>….);
5.Std function: Generates the Standard Deviation;

Syntax: <New Var> = Std (<var1>, <var2>, <var3>….);
Standard Deviation: To quantify the amount of variation or dispersion of a set of data values. A
low standard deviation indicates that the data points tend to be close to the mean (also called the
expected value) of the set, while a high standard deviation indicates that the data points are
spread out over a wider range of values.

Data Sales;
Input pcode $ jsale Fsale msale;
Datalines;
P101 230 220 225
P102 130 125 140
P103 210 220 215
Run;
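The five aggregate functions can be applied to the Sales data like this. It is a minimal
sketch; the names of the new variables are chosen only for illustration.

```sas
data sales;
input pcode $ jsale fsale msale;
datalines;
P101 230 220 225
P102 130 125 140
P103 210 220 215
;
run;

data sales_stats;
set sales;
total   = sum(jsale, fsale, msale);    /* row-wise sum       */
average = mean(jsale, fsale, msale);   /* row-wise mean      */
highest = max(jsale, fsale, msale);    /* largest of the 3   */
lowest  = min(jsale, fsale, msale);    /* smallest of the 3  */
spread  = std(jsale, fsale, msale);    /* standard deviation */
run;

proc print data=sales_stats;
run;
```

For P101 this gives total 675, average 225, highest 230, lowest 220 and a standard deviation
of 5.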
Every nth record:
Output every nth record from dataset

data class;
Set sashelp.class;
if mod(_n_, 3)=0 then output;
run;
Note: _n_ is observation number.

_Error_ = 1 means error
_Error_ = 0 means no error
String Functions
1.Length function: It returns the length of the string (number of characters, including spaces
inside the string but not trailing spaces);
Syntax: <New Var> = length ('value' /<var>);
2.Index function: It returns the position of a character or word in the string. If it occurs
multiple times, only the first occurrence is returned. If it is not found, 0 is returned.
Syntax: <New Var> = index(<var>, 'char/word');

3.Scan function: It returns the requested word from the string.
Syntax: <New Var> = scan(<var>, <number>, <dlm>);
Note: If the delimiter argument is omitted, the scan function treats blanks and most special
characters as delimiters.
4.Compress function: It removes the specified characters from the string.

Syntax: <New Var> = compress(<var>, 'character');
Note: If the second argument is omitted, spaces are removed from the string.
5.Translate function: Replaces characters.

Syntax: 1. <New Var> = translate(<var>, <new character>, <old character>);
2.<New Var> = translate(<var>, <new character-1>, <old character-1>,….
<new character-n>, <old character-n>);

Note: Replacement is done character by character.
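The five functions above can be tried on one test string. This is a small sketch; the string
and the variable names are illustrative.

```sas
data _null_;
text   = 'SAS is easy, SAS is fun';
len    = length(text);               /* 23 */
pos    = index(text, 'SAS');         /* 1: first occurrence only */
nohit  = index(text, 'xyz');         /* 0: not found */
word3  = scan(text, 3, ' ');         /* third word, blank as delimiter */
nosp   = compress(text);             /* all spaces removed */
nocom  = compress(text, ',');        /* commas removed */
dashed = translate(text, '-', ' ');  /* every space becomes a hyphen */
put len= pos= nohit= word3=;
put nosp=;
put nocom=;
put dashed=;
run;
```

Note the argument order of translate: the new character comes before the old character.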
Compress Function with extra modifier
Syntax: compress(<var> <, characters> <, modifiers>);
The following characters can be used as modifiers.
a - Adds alphabetic characters (upper and lower case) to the characters to be removed.
d - Adds digits to the characters to be removed.
i - Ignores case when matching the specified characters.
k - Keeps the listed characters in the string instead of removing them.
l - Adds lowercase letters to the characters to be removed.
p - Adds punctuation characters to the characters to be removed.
s - Adds space characters (blank, tab and similar) to the characters to be removed.
u - Adds uppercase letters to the characters to be removed.
ak - Combination of a and k: keeps alphabetic characters and removes the rest.
kd - Combination of k and d: keeps digits and removes the rest.

Compress Function with extra modifier
data _null_;
File print;
string='StudySAS Blog! 17752.';
string1=compress(string,''); *removes spaces (default behaviour);
string2=compress(string,'','ak'); *keeps alphabetic characters;
string3=compress(string,'','d'); *removes digits;
string4=compress(string,'','l'); *removes lowercase characters;
string5=compress(string,'','u'); *removes uppercase characters;
string6=compress(string,'S','k'); *keeps only the specified character S;
string7=compress(string,'!.','P'); *removes punctuation;
string8=compress(string,'s','i'); *removes s and S (case is ignored);
string9=compress(string,'','a'); *removes all alphabetic characters;
string10=compress(string,'','s'); *removes space characters;
string11=compress(string,'','kd'); *keeps digits;
put string1= ;
put string2= ;
put string3= ;
put string4= ;
put string5= ;
put string6= ;
put string7= ;
put string8= ;
put string9= ;
put string10= ;
put string11= ;
run;

6.Upcase function: Converts the characters to uppercase.
Syntax: <New Var> = upcase(<var>);
7.Lowcase function: Converts the characters to lowercase.
Syntax: <New Var> = lowcase(<var>);
8.Substr function: It returns part of a string.
Syntax: <New Var> = substr(<var>, <start position>, <no of characters>);
Note: If the last argument is omitted, everything up to the end of the string is returned.
data star;
input id name : $ 20.;
first=substr(name, 1, 5);
middle=substr(name, 6, 3);
last=substr(name, 9);
datalines;
1 abcdefghijklmnop
2 qrstuvwxyzabcdef
3 ghijklmnopqrstuv
4 wxyzabcdefmnopqu
run;
9.Concatenation operator ( || ): Combines the strings.

Syntax: <New Var> = <var 1> || <var 2>;
10.Trim function: Removes unnecessary spaces from the end of the data values (trailing spaces).
Syntax: <New Var> = trim ('character');
<New Var> = trim (<var>);
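The effect of trim is clearest together with the concatenation operator. The following is a
short sketch; the variables first and last are made up for the example.

```sas
data _null_;
length first last $ 10;
first = 'Jack';
last  = 'Sparo';
padded  = first || last;                /* trailing spaces of first are kept */
trimmed = trim(first) || ' ' || last;   /* Jack Sparo */
put padded=;
put trimmed=;
run;
```

Because first has a length of 10, the untrimmed result has six spaces between the two names.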

TASK: combine and split
Pid drug dose adevents
P001 V1 0.05 headache&eyedis
P002 V2 0.10 fever&eardis
P003 V3 0.15 headache&eardis
Dataset name: mh_ae
Process: Treated patients by drug with their respective dosage based on adevents.
Adevents: ae is firstword & adr is secondword
TASK: Dataset name: dm Process: fname into surname and name
Pid fname gender

P001 Jack sparo Micehlle F
P002 Cane almond Williamson M
P003 Medilla jeo Roawich F

/* or */
TASK: Now combine surname.name from above DATA with uppercase.
TASK: Converting Numeric to character.

TRANWRD function: Replaces all occurrences of a substring in a character string.
Syntax: <New Var> = tranwrd(<var>, <target>, <replacement>);
1. Replacing All Occurrences of a Word:
2. Removing Blanks from the Search String: Remove spaces from end of the data values.
The LENGTH statement pads target with blanks to the length of 10, which causes the
TRANWRD function to search for the character string 'FISH ' in SALELIST. Because the search
fails, this line is written to the SAS log: CATFISH
You can use the TRIM function to exclude trailing blanks from a target or replacement variable.
Use the TRIM function with target:
salelist=tranwrd(salelist,trim(target), replacement);
put salelist;

Now, this line is written to the SAS log: CATNIP
3. Zero Length in the Third Argument of the TRANWRD Function: The results of the
TRANWRD function when the third argument, replacement, has a length of zero. In this case,
TRANWRD uses a single blank. In the DATA step, a character constant that consists of two
consecutive quotation marks represents a single blank, and not a zero-length string.
4. Removing Repeated commas: The TRANWRD function to remove repeated commas in text
and replace the repeated commas with a single comma. In the following example, the
TRANWRD function is used twice: to replace three commas with one comma, and to replace the
ending two commas with a period:
SAS writes the following output to the log:

Mytxt = If you exercise your power to vote,,,then your opinion will be heard,,
Newtext = If you exercise your power to vote,then your opinion will be heard,,
Newtext2 = If you exercise your power to vote,then your opinion will be heard.
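The repeated-commas output above can be reproduced with two tranwrd calls. This sketch is
reconstructed from the log lines shown; the variable names follow that output.

```sas
data _null_;
mytxt = 'If you exercise your power to vote,,,then your opinion will be heard,,';
newtext  = tranwrd(mytxt, ',,,', ',');    /* three commas become one comma */
newtext2 = tranwrd(newtext, ',,', '.');   /* ending two commas become a period */
put mytxt=;
put newtext=;
put newtext2=;
run;
```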

COMPBL Function
It compresses multiple blanks to a single blank.
In the example below, the Name variable contains a record "Sandy David". It has multiple
spaces between the first and last name
COMPBL
Data char;
Input Name $ 1-50;
char1 = compbl(Name);
Cards;
Sandy    David
Annie Watson
Hello ladies and gentlemen
Hi, I am good
;
Run;
STRIP Function
It removes leading and trailing spaces.
STRIP
Data char1;
Set char;
char1 = strip(Name);
run;

LEFT Function
It removes leading spaces.
LEFT
Data char1;
Set char;
char1 = left(Name);
run;
CAT, CATT, CATS and CATX Function

The CAT, CATT, CATS and CATX functions are used to concatenate character variables in
SAS.
Dataset
data columns;
input col1 & : $ 18. col2 & : $ 22. col3 & : $ 25.;
datalines;
The cat function  concatenates  character variables.
The cat function  concatenates trimmed  character variables.
;
run;
The 3 columns are concatenated into one.

Please note that there are leading and trailing spaces in all the COL1, COL2 and COL3 variables.

CAT:
The CAT function concatenates character variables like the concatenation operator (||).
CAT
data columns1 (drop=col1 col2 col3);
set columns;
cat_all=cat(col1, col2, col3);
run;
The spaces are captured in the concatenated variable:
To remove the leading and trailing spaces, we can make use of the CATT and CATS functions.
CATT:
The CATT function is like the CAT function. However, it removes the trailing spaces before
concatenating the variables.
CATT
data columns2 (drop=col1 col2 col3);
set columns;
cat_T=catt(col1, col2, col3);
run;

CATS:
The CATS function removes both the leading and trailing spaces before concatenating the
variables.
CATS
data columns3 (drop=col1 col2 col3);
set columns;
cat_S=cats(col1, col2, col3);
run;
CATX:
In addition to removing the leading and trailing spaces, the CATX function inserts a delimiter
between the character values when concatenating the variables.
The first parameter in the CATX function is the delimiter. In our example, the space is specified
as the delimiter.
CATX
data columns4 (drop=col1 col2 col3);
set columns;
cat_X=catx(' ', col1, col2, col3);
run;
PROPCASE
Returns the string with the first letter of each word in uppercase and the rest of the letters
in lowercase.
PROPCASE
Data char;

Input Name $ 1-50;
char1 = propcase(Name);
Cards;
sandy david
annie watson
;
Run;
FIND
To locate a substring within a string.
Syntax: find(character-value, find-string <,'modifiers'> <,start>)
FIND
data _null_;
file print;
STRING1 = "Hello hello goodbye";
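x=FIND(STRING1, "hello");           /* 7: the search is case-sensitive */
y=FIND("abcxyzabc","abc",4);        /* 7: search starts at position 4 */
z=FIND("abcxyzabcrtsabc","ABC",4);  /* 0: not found without the 'i' modifier */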
put x=;
put y=;
put z=;
run;
Validation-Process
Database Validation: Validating the table name, variable names, data type, labels, informats
and formats.
Data validation: To control incomplete records, (missing values) to handle duplicates and
invalid records.
Structure of the data validation: Validate the structure of data value (character data).

Specification for validation process:
1. Pid must be 4 characters long and start with a capital P.
2. Site number must be 4 characters long and start with a capital S.
3. Gender must be F or M.
4. Race must be As or Af.
5. Color must be B or W.
S001 P001 34 f As W 1
S001 p002 15 m Af b 2
s001 p003 26 F AS W 3
S001 p04 23 M AF B 4
Template for report invalid data:
Dataset Variable Value crfno datacheck

Xxxx xxxxxx xxxxx xxxx xxxxx
/* Template prepare */
data _null_;
file 'F:\SASFiles\invalid.txt';
put @5 'Dataset' @15 'Variable' @25 'Value' @35 'crfno' @45 'datacheck';
run;
/* Upload Invalid data */
data _null_;
set valid (keep=pid crfno);
file 'F:\SASFiles\invalid.txt' mod;
if length (pid) ne 4 then put @5 'valid' @15 'pid' @25 pid @35 crfno @45
'pid must be 4 characters';
run;
data _null_;
set valid (keep=pid crfno);
file 'F:\SASFiles\invalid.txt' mod;
if substr (pid,1,1) ne 'P' then put @5 'valid' @15 'pid' @25 pid @35 crfno
@45 'pid starts with caps P';
run;
data _null_;
set valid (keep=gender crfno);
file 'F:\SASFiles\invalid.txt' mod;
if gender not in ('F' 'M') then put @5 'valid' @15 'Gender' @25 gender @35
crfno @45 'Gender must be F or M';
run;
/* Footnote preparation */
data _null_;
file 'F:\SASFiles\invalid.txt' mod;
put @5 'Report generated by SAS 9.4';
put @50 'Signature';
run;
Date & Time Functions

Calendar Functions:
1.Day function: It returns day value.
Syntax: <New Var> = day(<var>);
2.Month function: It returns Month value (1-12).

Syntax: <New Var> = month(<var>);
3.Year function: It returns year value.

Syntax: <New Var> = year(<var>);
4.Weekday function: It returns weekdays (Sunday to Saturday) value (1-7).

Syntax: <New Var> = weekday(<var>);
5.mdy function: It builds a SAS date value from month, day and year values.

Syntax: <New Var> = mdy(<month>, <day>, <year>);
The dissected weight dates

data takepart;
input subj name & : $ 24. race weight group wt_date : mmddyy8. b_date : mmddyy8.;
wt_mon=month(wt_date);
wt_day=day(wt_date);
wt_yr=year(wt_date);

wt_wd=weekday(wt_date);
format wt_date b_date date9.;
datalines;
1024 Alice Smith  1 65 125 12/1/05 01/01/60
1167 Maryann White  1 68 140 12/01/05 01/01/59
1168 Thomas Jones  2 . 190 12/2/05 06/15/60
1201 Benedictine Arnold  2 68 190 11/30/05 12/31/60
1302 Felicia Ho  1 63 115 1/1/06 06/15/58
; run;
proc print data=takepart;
title 'The dissected weight dates';
var wt_date wt_mon wt_day wt_yr wt_wd;
run;
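The mdy function is the reverse of the day, month and year functions: it puts the three parts
back together into one date value. A minimal sketch with a fixed date:

```sas
data _null_;
d  = mdy(2, 14, 2005);   /* month, day, year: 14 February 2005 */
wd = weekday(d);         /* 2 = Monday (1 = Sunday ... 7 = Saturday) */
m  = month(d);           /* 2 */
put d= date9. wd= m=;
run;
```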
Time Functions:
1.Hour function: It returns hours.
Syntax: <New Var> = hour(<var>);
2.Minute function: It returns Minute value.

Syntax: <New Var> = minute(<var>);
3.Second function: It returns Seconds value.

Syntax: <New Var> = second(<var>);
The time variant

data mh;
input pid sdate : date9. stime : time8. edtime : datetime18.;
shr=hour(stime);
smin=minute(stime);
ssec=second(stime);
format sdate date9. stime time8. edtime datetime18.;
datalines;

101 13feb2003 10:23:45 14mar2003:12:23:45
102 17mar2003 11:12:34 19dec2004:10:23:56
run;
proc print data=mh;
title 'The time variant';
var stime shr smin ssec;
run;
1.Datepart function: It returns date value from date and time.

Syntax: <New Var> = datepart(<var>);
2.timepart function: It returns time value from date and time.
Syntax: <New Var> = timepart(<var>);
The date & time variant

data mh1;
set mh;
dp=datepart(edtime);
tp=timepart(edtime);
format dp date9. tp time8.;
run;
proc print data=mh1;
run;
3.Today( ) function : It returns date of today.

Syntax: <New Var> = today( );
4.Time( ) function : It returns the current time.
Syntax: <New Var> = time( );
Capture today date & time

data mh1;
set mh;
dp=today();
tp=time();
format dp date9. tp time8.;

run;
proc print data=mh1;
run;
Interval Functions:
1.Intck function: It returns the number of intervals (days, months, years and so on) between two
date values.
Syntax: <New Var> = intck(interval, start-date, end-date <, 'method'>);
The interval argument can be given in the following forms:
1.Interval: Specify the name of a basic interval, ex: year, day, month
2.Multiple: Specifies an optional multiplier that sets the interval equal to a multiple of the period
of the basic interval type. For example, the interval YEAR2 consists of two-year, or biennial,
periods.
3.Custom: Specifies a user-defined interval that is defined by a SAS data set. Each observation
contains two variables, begin and end.
'Method':
1. CONTINUOUS: Specifies that continuous time is measured. The interval is shifted based on
the starting date.
'method' = 'C' or 'CONT'
2. DISCRETE: Specifies that discrete time is measured. The discrete method counts interval
boundaries (for example, end of month). This is the default method.
'method' = 'D' or 'DISC'
data b;
startDay='14Nov2017'd;
Today=today();
Yearscmpt=INTCK('YEAR',startDay,today(),'C');
cmpthalf=INTCK('YEAR2',startDay,today(),'C');
cmpt=INTCK('month',startDay,today(),'d');
half=INTCK('month2',startDay,today(),'d');
format startDay Today date9.;
run;
proc print data=b;
run;
data discrete;

d=INTCK('MONTH','1jan1991'd,'31jan1991'd);
d1=INTCK('MONTH','31jan1991'd,'1feb1991'd);
d2=INTCK('MONTH','1feb1991'd,'31jan1991'd);
run;
proc print data=discrete;
run;
data c;
days=intck('dtday', '01aug2011:00:10:48'dt,
'01feb2012:00:10:48'dt);
put days=;
run;
proc print data=c;
run;
SAS Statement Result

qtr=intck('qtr','10jan95'd,'01jul95'd); put qtr; 2
year=intck('year','31dec94'd, '01jan95'd); put year; 1
year=intck('year','01jan94'd, '31dec94'd); put year; 0
semi=intck('semiyear','01jan95'd, '01jan98'd); put semi; 6
weekvar=intck('week2.2','01jan97'd, '31mar97'd); put weekvar; 7
wdvar=intck('weekday7w','01jan97'd, '01feb97'd); put wdvar; 26
y='year'; date1='1sep1991'd; date2='1sep2001'd; 10
newyears=intck(y,date1,date2); put newyears;
y=trim('year '); date1='1sep1991'd + 300; date2='1sep2001'd - 300; 8
newyears=intck(y,date1,date2); put newyears;
data a;
interval='month';
start='14FEB2000'd;
end='13MAR2000'd;
months_default=intck(interval, start, end);
months_discrete=intck(interval, start, end,'d');
months_continuous=intck(interval, start, end,'c');
output;
end='14MAR2000'd;
output;
start='31JAN2000'd;
end='01FEB2000'd;

output;
format start end date.;
run;
proc print data=a;

run;
TASK:
QS1: Capture operating system date and time
QS2: Make a rtf report of above information.
Example: proc print data=sashelp.air label;
where year(date)=1959;
run;
QS3: Report 3rd month information from Air dataset, library sashelp.
QS4: Report 1959 year, with 4th to 6th month information from Air dataset, library sashelp.
QS5: Report weekdays information from Air dataset, library sashelp.
proc print data=sashelp.air label;
where weekday(date)>=6 and weekday(date)<=7; *where weekday(date) in (6,7);
run;
QS6: Reformat the data by date and time functions. Library: sashelp, dataset: Air.
length weekdays $ 10;
if weekday(date)=1 then weekdays='Sunday';
2.INTNX function: The SAS interval functions INTNX and INTCK perform calculations with
date values, datetime values, and time intervals. They can be used for calendar calculations with
SAS date values to increment date values or datetime values by intervals and to count time
intervals between dates.
The INTNX function increments dates by intervals. INTNX computes the date or datetime of
the start of the interval a specified number of intervals from the interval that contains a given
date or datetime value.
Syntax: <New Var> = intnx(interval, start-value, n <, 'alignment'>);

1.Interval: Specify the name of a basic interval, ex: year, day, month
2.n: The number of intervals to increment from the interval that contains the start-value.
3.alignment: Controls where in the resulting interval the returned date falls. Allowed values
are BEGINNING (the default), MIDDLE, END, and SAMEDAY.
For example, the statement NEXTMON=INTNX('MONTH', DATE, 1) assigns to the variable
NEXTMON the date of the first day of the month following the month that contains the value of
DATE. Thus INTNX('MONTH','21OCT2007'D,1) returns the date 1 November 2007.
For example, six weeks from the week of 17 October 1991: the function
INTNX('WEEK','17OCT91'D,6) returns the SAS date value '24NOV1991'D.
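The two examples above, together with the END and SAMEDAY alignments, can be checked with a
short dataset block. This is only a sketch; the variable names are illustrative.

```sas
data _null_;
nextmon = intnx('month', '21OCT2007'd, 1);              /* 01NOV2007 */
sixwks  = intnx('week',  '17OCT1991'd, 6);              /* 24NOV1991 */
monend  = intnx('month', '21OCT2007'd, 0, 'end');       /* 31OCT2007 */
sameday = intnx('month', '21OCT2007'd, 1, 'sameday');   /* 21NOV2007 */
put nextmon= date9. sixwks= date9. monend= date9. sameday= date9.;
run;
```

An increment of 0 keeps the date inside its own interval, so it is a convenient way to get
the first or last day of the current month.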
*Given that you know the first observation is for June 1990, use the INTNX function to compute
the ID variable DATE for each observation;
data uscpi;
input cpi;
date = intnx( 'month', '1jun1990'd, _n_-1);
*Thus _N_–1 is the increment needed from the first observation
date.;
format date monyy7.;
datalines;
129.9
130.4
25.6
35.7
86.7
run;
data uscpi;
input date : date9. cpi;
format date monthbeg midmonth monthend date9.;
monthbeg = intnx( 'month', date, 0, 'beg' ); *Using alignment 'beg';
midmonth = intnx( 'month', monthbeg, 0, 'mid' ); *Using alignment 'mid';
monthend = intnx( 'month', date, 0, 'end' ); *Using alignment 'end';
datalines;
15jun1990 129.9
15jul1990 130.4
run;

data uscpi;
input date : date9. cpi;
format date mon07_1 mon07_2 mon07_3 date9.;
mon07_1 = mdy( month(date), 7, year(date) ); *calendar function;
mon07_2 = intnx( 'month', date, 0, 'beg' ) + 6;
mon07_3 = intnx( 'day', date, 6 );
datalines;
15jun1990 129.9
15jul1990 130.4
run;
Computing the Width of a Time Interval

data uscpi;
input date : date9. cpi;
format date date9.;
width=intnx('month', date, 1)-intnx('month', date, 0);
datalines;
15jun1990 129.9
15jul1990 130.4
15aug1990 131.6
run;
Computing the Ceiling of an Interval

data uscpi;
input date : date9. cpi;
format date date9. newyear date.;
newyear=intnx('year', date-1, 1);
datalines;
15jun1990 129.9
15jul1990 130.4
15aug1990 131.6
run;
Counting Time Intervals

data uscpi;
input date : date9. cpi;
format date date9. d0 d1 date.;
d0 = intnx( 'month', date, 0 ) - 1;
d1 = intnx( 'month', date, 1 ) - 1;
nSunday = intck( 'week.1', d0, d1 );
nMonday = intck( 'week.2', d0, d1 );
nTuesday = intck( 'week.3', d0, d1 );
nWedday = intck( 'week.4', d0, d1 );
nThurday = intck( 'week.5', d0, d1 );
nFriday = intck( 'week.6', d0, d1 );
nSatday = intck( 'week.7', d0, d1 );
datalines;
15jun1990 129.9
15jul1990 130.4
15aug1990 131.6
run;
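INTCK counts the interval boundaries that lie between two dates, not the elapsed time. The short sketch below, with dates chosen only for illustration, makes the difference visible.

```sas
data _null_;
   /* one day apart, but a month boundary (1 February) lies between them */
   months_a = intck('month', '31JAN1990'd, '01FEB1990'd);
   /* 27 days apart, but both dates fall in the same month */
   months_b = intck('month', '01FEB1990'd, '28FEB1990'd);
   put months_a= months_b=;
run;
```

The log should show months_a=1 and months_b=0. This is why the weekday counts above use the day before the month start (d0) and the last day of the month (d1) as end points.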
Errors
Errors can occur at two stages:
1. Compile time
2. Execution time
Syntax Errors:
A missing semicolon at the end of a statement.

A misspelled keyword.
DATA step code used inside a procedure block, or procedure code used inside a DATA step.
Semantic Errors:
Passing the wrong number of arguments to a function.
Data Error: A value is passed whose type does not match the variable, for example a character
value where a numeric value is required.
Logic Error: Occurs in a conditional statement. The syntax is valid, but the condition itself
is wrong.
Propagation Error: When an arithmetic operator is applied to data in which some values are
missing, the final result is also missing.
Example: .+70=.
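A minimal sketch of the propagation rule, with illustrative variable names. It contrasts the + operator with the SUM function, which skips missing arguments.

```sas
data _null_;
   x = .;
   y = 70;
   total_op  = x + y;      /* arithmetic operator: the missing value propagates */
   total_fun = sum(x, y);  /* SUM function: missing arguments are ignored */
   put total_op= total_fun=;
run;
```

The log should show total_op=. and total_fun=70, together with a note that missing values were generated.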
Data Management Process

In SAS, each transformation controls a data management process. There are two kinds:
1. Adding process
2. Combine process
Adding process: Stacks the datasets one below another.

Combine process: Joins the datasets side by side.
Adding process:
1. Appending
2. Concatenation
3. Interleaving
Appending Process: Appends the data from one dataset to another existing dataset.
Concatenation Process: Loads the data from multiple datasets into one new dataset, one after
another in sequential order.
Interleaving Process: Loads the data from multiple datasets into one new dataset, one after
another in sorted order.

Appending Process: Appending works between two tables or two datasets. The variable names and
data types should be the same in both datasets.
Syntax: proc append base=<master-file> data=<transaction-file> <options>;
Appending
data dm1;
input sno $ gno $ pid age gender $;
datalines;
S001 G100 101 23 F
S001 G100 102 34 M
S001 G100 103 25 M
S001 G100 104 21 F
run;
data dm2;
input sno $ gno $ pid age gender $;
datalines;
S002 G200 201 21 F
S002 G200 202 22 M
S002 G200 203 21 F
S002 G200 204 26 M
run;
proc append base=dm1 data=dm2;
run;
proc print data=dm1;

title 'dataset dm1';
run;
QS1: The transaction dataset contains fewer variables than the master dataset.

Note: SAS runs the process and assigns missing values to the nonmatching variable (gender)
for the transaction data.
Appending
data dm1;
input sno $ gno $ pid age gender $;
datalines;
S001 G100 101 23 F
S001 G100 102 34 M
S001 G100 103 25 M
S001 G100 104 21 F

run;
data dm3;
input sno $ gno $ pid age;
datalines;
S002 G200 201 21
S002 G200 202 22
S002 G200 203 21
S002 G200 204 26
run;
proc append base=dm1 data=dm3;

run;
proc print data=dm1;

run;
QS2: The master file contains fewer variables than the transaction file.

Force option: Drops the nonmatching variable (race) of the transaction dataset from the
appending process.
Appending
data dm1;
input sno $ gno $ pid age gender $;
datalines;
S001 G100 101 23 F
S001 G100 102 34 M
S001 G100 103 25 M
S001 G100 104 21 F
run;
data dm4;
input sno $ gno $ pid age gender $ race $;
datalines;
S004 G400 401 21 F As
S004 G400 402 22 M Af
S004 G400 403 21 F As
run;

proc append base=dm1 data=dm4 force;
run;
*or;
proc append base=dm1 data=dm4 (drop=race);
run;
QS3: Reformat and rename the variables before appending.
Appending
data dm5;
input sno $ gno $ subid age gender;
datalines;
S005 G500 501 26 1
S005 G500 502 21 2
S005 G500 503 25 1
S005 G500 504 26 2
run;
*For gender: 1 for 'M' 2 for 'F';
/*reformat*/
data dm51;
set dm5;
if gender=1 then g='M';
else g='F';
drop gender;
rename g=gender;
run;
proc append base=dm5 data=dm51 (rename=(subid=pid));
run;
LOG Window
NOTE: Appending WORK.DM51 to WORK.DM5.
WARNING: Variable pid was not found on BASE file. The variable will not be added to
BASE file.
WARNING: Variable subid was not found on DATA file.
WARNING: Variable gender not appended because of type mismatch.
ERROR: No appending done because of anomalies listed above. Use FORCE option to
append these files.
NOTE: 0 observations added.
Note: The append fails because BASE= names dm5, which still has subid and a numeric gender.
The renamed and recoded dm51 matches the structure of dm1, so dm1 is the suitable base dataset.
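A minimal sketch of the append once the transaction data has been brought into the master's shape, assuming dm1 (with pid and a character gender) is the intended base. The datasets are cut down to two rows each, and the intermediate name gcode is hypothetical.

```sas
/* master with pid and character gender, as in the earlier examples */
data dm1;
   input sno $ gno $ pid age gender $;
   datalines;
S001 G100 101 23 F
S001 G100 102 34 M
;
/* transaction with subid and numeric gender */
data dm5;
   input sno $ gno $ subid age gender;
   datalines;
S005 G500 501 26 1
S005 G500 502 21 2
;
/* rename subid and recode gender to character in one step */
data dm51;
   set dm5 (rename=(subid=pid gender=gcode));
   length gender $ 8;
   if gcode = 1 then gender = 'M';
   else gender = 'F';
   drop gcode;
run;
proc append base=dm1 data=dm51;
run;
```

Because the names and types now agree with the base dataset, neither the FORCE option nor a RENAME= option on the PROC APPEND statement is needed.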
Concatenation Process: 14000 tables can be used with this method.

Syntax: set <dataset-1> <dataset-2> . . . . . . <dataset-n>;
Concatenation
data emp1;
input bcode $ dptno $ eid salary;
datalines;
B100 D100 101 2300
B100 D200 102 3400
;
data emp2;
input bcode $ dptno $ eid salary;
datalines;
B200 D300 201 3200
B200 D400 202 3100
;
data emp3;
input bcode $ dptno $ eid salary;
datalines;
B300 d500 301 2500
B300 d600 302 3600
;
data employ;
set emp1 emp2 emp3;
run;
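Interleaving Process: Loads the data in sorted order, here by salary in descending order.
It is similar to the concatenation process, but every dataset is sorted first (ascending or
descending) and the SET statement is followed by a matching BY statement.

Interleaving
proc sort data=emp1; by descending salary; run;
proc sort data=emp2; by descending salary; run;
proc sort data=emp3; by descending salary; run;
data employ1;
set emp1 emp2 emp3;
by descending salary;
run;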
Update process and update transformation: Replaces master file values with values from the
transaction file or dataset, using a matching variable.
This process is also called slowly changing dimension (SCD). There are three types:
SCD1, SCD2, and SCD3.
SCD1: Maintains the current record only.

SCD2: Maintains the current record together with the historical data as separate records.
SCD3: Maintains the current and previous values in a single record.
SCD Process:
Syntax: data <master-dataset>/<new-dataset>;
update <master-dataset> <transaction-dataset>;
by <variable>;
run;
Update
/* Master */
data emp_salaries;
input eid salary;
datalines;
101 5400
102 4500
103 3000
104 2300
105 2700
;
/* Transaction */
data emp_newsal;
input eid salary;
datalines;
101 6000
103 5000
105 4000
;

proc sort data=emp_salaries;
by eid;
run;
proc sort data=emp_newsal;
by eid;
run;
data emp_salaries;
update emp_salaries emp_newsal;
by eid;
run;
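Problems:
Missing values exist in the transaction file.
By default (missingcheck), a missing value in the transaction file does not replace the
corresponding value in the master file. With nomissingcheck, the missing value overwrites it.
This behavior is controlled by the UPDATEMODE= option.
updatemode option: missingcheck (default) / nomissingcheck
Syntax: update <master-dataset> <transaction-dataset> updatemode=nomissingcheck;
Update with missing values
/* Master */
data emp_salaries;
input eid salary;
datalines;
101 5400
102 4500
103 3000
104 2300
105 2700
;
/* Transaction */
data emp_new;
input eid salary;
datalines;
101 6000
103 5000
105 .
;
proc sort data=emp_salaries;
by eid;
run;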
proc sort data=emp_new;
by eid;
run;
data emp_salaries;
update emp_salaries emp_new updatemode=nomissingcheck;
by eid;
run;

data emp_salaries;
update emp_salaries emp_new updatemode=missingcheck;
by eid;
run;
Nonmatching records or observations exist in the transaction file:

During update processing, nonmatching records are added to the master file.
Q) What happens when duplicates exist in the transaction file?
Ans: During update processing, SAS applies the update once for each duplicate transaction,
in order, to the corresponding master value.
Note: The FIRST. and LAST. variables control the processing of the update transformation.
To display the backend process of the update transformation:
Update backend process
data _null_;
file print;
update emp_salaries emp_new;
by eid;
put _all_;
run;
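A minimal sketch of the duplicate-transaction case, using hypothetical dataset names (master, trans, master_new) and values. Two transactions for the same eid are applied one after the other, and only one observation per eid is written.

```sas
data master;
   input eid salary;
   datalines;
101 5400
102 4500
;
data trans;
   input eid salary;
   datalines;
101 6000
101 6500
;
/* both datasets are already in eid order, so no PROC SORT is needed here */
data master_new;
   update master trans;
   by eid;
   /* write the current values and the FIRST./LAST. flags to the log */
   put eid= salary= first.eid= last.eid=;
run;
proc print data=master_new;
run;
```

In master_new, eid 101 should end up with salary 6500 (the last transaction applied) and eid 102 should keep 4500.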

Modify transformation:
Modifications:
1. Replace the data.
2. Manipulate the data.
3. Delete the data.
Modifications can be done with the UPDATE statement, the SET statement, and the MODIFY statement.
Running a modification using the SET and MODIFY statements:
Set statement: Reads the data observation by observation and writes a new copy of the dataset.
Modify statement: Changes the dataset in place without creating a copy, so the MODIFY statement
takes less processing time than the SET statement.
Modify
data emp1;
input eid salary;
datalines;
101 2000
102 4000
;
data emp2;
input eid salary;
datalines;
210 2000
202 4000
;
/* set statement */
data emp1;
set emp1;
salary=salary+1000;
run;
/* modify statement - efficient */
data emp2;
modify emp2;
salary=salary+1000;
run;


Q) When should the SET statement be used?
Ans: When the result must be stored in additional copies (new variables or new datasets).
The MODIFY statement does not allow additional copies (new variables or a new dataset).
/* set statement */
data emp3;
set emp1;
salary=salary+1000;
run;
/* modify statement - fails because emp2 is not named on the DATA statement */
data emp4;
modify emp2;
salary=salary+1000;
run;
Log Window
227 data emp4;
228 modify emp2;
ERROR 416-185: The MODIFY statement requires the MASTER data set to be present
on the DATA statement.
Modify statement with by statement:

Replaces master file values using the transaction file, matched on a BY variable and computed
by an expression.
Syntax: data <master-dataset>;
modify <master-dataset> <transaction-dataset>;
by <variable>;
<expression>;
run;
Modify with BY
data emp_salaries;
input eid salary;
datalines;
101 4000
102 5600
103 2300
104 3400
105 3000
;
data emp_hic;
input eid hic;
datalines;
101 0.5
103 0.3
105 0.2
;
proc sort data=emp_salaries;
by eid;
run;
proc sort data=emp_hic;
by eid;
run;
data emp_salaries;
modify emp_salaries emp_hic;
by eid;
salary=salary+(salary*hic);
run;
Note: If nonmatching observations exist in the transaction file, the modify process fails.
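One common way to keep a MODIFY step running when the transaction file holds nonmatching observations is to test the automatic variable _IORC_ with the %SYSRC autocall macro. The sketch below is a hedged illustration, not the author's method. The transaction row with eid 106 is a hypothetical nonmatching record added for the demonstration.

```sas
data emp_salaries;
   input eid salary;
   datalines;
101 4000
102 5600
103 2300
;
data emp_hic;
   input eid hic;
   datalines;
101 0.5
106 0.2
;
data emp_salaries;
   modify emp_salaries emp_hic;
   by eid;
   if _iorc_ = %sysrc(_sok) then do;
      /* match found: apply the hike and rewrite the master observation */
      salary = salary + (salary * hic);
      replace;
   end;
   else if _iorc_ = %sysrc(_dsenmr) then do;
      /* no matching master observation: skip it and clear the error flag */
      _error_ = 0;
   end;
run;
```

Under these assumptions only eid 101 is changed. The unmatched transaction for eid 106 is ignored instead of stopping the step.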
Merge
Merge process: It combines the tables. Types of merge:
1. One to one merge without relation.
2. One to one merge with relation.
3. One to many merges with relation.
4. Many to one merge with relation.
5. Many to many merges with relation.
One to one merge without relation:

Syntax: data <dataset>;
merge <dataset-1> <dataset-2> . . . . <dataset-n>;
Run;
One to one merge without relation

data demo1;
input sno $ gno $ pid age;
datalines;
S001 G200 101 23
S001 G200 102 45
S001 G200 103 25
S001 G200 104 30
;
data demo2;
input gender $ race $;
datalines;
Female Asian
Male African
Female Asian
Male African
;
data demo3;
input color $ weight;
datalines;
White 69
Black 50
White 50
White 55
;
data dm;
merge demo1 demo2 demo3;
run;
proc print data=dm;
run;
One to one merge with relation:

Syntax: data <dataset>;
merge <dataset-1> <dataset-2> . . . . . <dataset-n>;
by <variable-1> <variable-2> . . . . <variable-n>;
run;
One to one merge with relation

data demo1;
input sno $ gno $ pid age;
datalines;
S001 G200 101 23
S001 G200 102 45
S001 G200 103 25
S001 G200 104 30
;
data demo2;
input pid gender $ race $;
datalines;
101 Female Asian
102 Male African
103 Female Asian
104 Male African
;
data demo3;
input pid color $ weight;
datalines;
101 White 69
102 Black 50
103 White 50
104 White 55
;
proc sort data=demo1;
by pid;
proc sort data=demo2;
by pid;
proc sort data=demo3;
by pid;
run;
data demo;
merge demo1 demo2 demo3;
by pid;
run;

Note: The terms one to one, one to many, many to one, and many to many describe relations
between the observations, not between the datasets.
One to many & many to one merge with relation:

data mh;
input pid visit $ dose;
datalines;
101 Visit1 0.05
102 Visit1 0.1
103 Visit1 0.15
104 Visit1 0.2
101 Visit2 0.05
102 Visit2 0.1
103 Visit2 0.15
104 Visit2 0.2
;
proc sort data=demo;
by pid;
proc sort data=mh;
by pid;
run;
data demo_mh;
merge demo mh;
by pid;
run;

data mh_demo;
merge mh demo;
by pid;
run;
Many to many merges with relation:

data ae;
input pid visit $ aetype $;
datalines;
101 Visit1 Null
102 Visit1 Eyedis
103 Visit1 Eardis
104 Visit1 Null
101 Visit2 Null
102 Visit2 Eyedis
103 Visit2 Null
104 Visit2 Eardis
;
proc sort data=mh;
by pid visit;
proc sort data=ae;
by pid visit;
run;
data mh_ae;
merge mh ae;
by pid visit;
run;
Note: A many to many merge is controlled by more than one BY variable.

Working with matching and nonmatching data:
Lookup process: The lookup process reports matching and nonmatching data.
Tracking or temporary variable: Controls the lookup process and returns one of two values, 1 or 0.
1 = the dataset contributed to the observation (matching data).

0 = the dataset did not contribute to the observation (nonmatching data).
The tracking variable is created on the base dataset with the IN= dataset option.
Base dataset: The dataset that holds the matching data is called the base dataset.
Merge work with matching and nonmatching data

data acc_holders;
input accno name $ acctype $;
datalines;
1101 kumar Saving
1102 Kiran Corpo
1103 pavan saving
1104 laxmi Gold
1105 Kranth Corpo
;
data loan_holders;
input accno loanno $ Ltype $ amount;
datalines;
1101 H100 House 400000
1103 V001 Vech 50000
1105 H101 House 500000
1103 H102 House 400000
;
proc sort data=acc_holders;
by accno;
proc sort data=loan_holders;
by accno;
run;
data acc_loans_match acc_loans_nonmatch;
merge acc_holders loan_holders (in=var);
by accno;
if var=1 then output acc_loans_match;
else output acc_loans_nonmatch;
run;
proc print data=acc_loans_match;
title 'matching data';
run;
proc print data=acc_loans_nonmatch;
title 'nonmatching data';
run;

TASK: Report the patients affected by adverse events and adverse drug reactions.

data ae;
input pid aetype $;
datalines;
123 Headac
145 hairloss
189 Skinprb
198 Skinprb
;
data adr;
input pid adrtype $;
datalines;
123 Eyedis
134 Eardis
189 Eardis
190 Eyedis
234 Eardis
;
proc sort data=ae;
by pid;
proc sort data=adr;
by pid;
run;
/* matching data: patients affected by both an adverse event and an adverse drug reaction */
data ae_adr_match;
merge ae(in=a) adr(in=b);
by pid;
if a=1 and b=1;
run;
/* nonmatching data: patients affected by an adverse event or an adverse drug reaction,
but not both */
data ae_adr_nonmatch;
merge ae(in=a) adr(in=b);
by pid;
if b=0 or a=0;
run;
To report nonmatching data from ae

data ae_nonmatch;
merge ae adr (in=a);
by pid;
if a=0;
run;
proc print data=ae_nonmatch;
title 'affected by ae only';
run;
To report nonmatching data from adr
data adr_nonmatch;
merge ae (in=a) adr;
by pid;
if a=0;
run;
proc print data=adr_nonmatch;
title 'affected by adr only';
run;
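The separate matching and nonmatching steps can also be combined. The sketch below, with cut-down data and hypothetical output names (both, ae_only, adr_only), uses one IN= variable per input and routes each observation with OUTPUT.

```sas
data ae;
   input pid aetype $;
   datalines;
123 Headac
145 hairloss
;
data adr;
   input pid adrtype $;
   datalines;
123 Eyedis
134 Eardis
;
/* both inputs are already in pid order, so no PROC SORT is needed here */
data both ae_only adr_only;
   merge ae(in=in_ae) adr(in=in_adr);
   by pid;
   if in_ae and in_adr then output both;   /* pid present in both datasets */
   else if in_ae then output ae_only;      /* pid present in ae only */
   else output adr_only;                   /* pid present in adr only */
run;
```

With this data, pid 123 goes to both, pid 145 to ae_only, and pid 134 to adr_only, so one pass over the data replaces several merges.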

TASK: Modify the transaction dataset before appending the data into the master dataset.

data plans;
input pcode $ year plan $;
datalines;
P101 2006 1Min/2Rs
P104 2006 5Min/3Rs
P123 2007 1Min/1Rs
P155 2008 2Min/2Rs
P180 2008 5Min/2Rs
;
data plans_2009;
input pcode $ year plan $;
datalines;
P115 2009 1Sec/1p
P123 2009 1Min/1Rs
P160 2009 5Sec/2p
P180 2009 5Min/2Rs
;
/* Sorting */
proc sort data=plans;
by pcode;
proc sort data=plans_2009;
by pcode;
run;
/* Cleaning */
data plans_2009;
modify plans_2009 (in=var1) plans (in=var2);
by pcode;
if var1=1 and var2=1 then remove;
run;
/* Check log window */
/* Loading & Report*/
proc append base=plans data=plans_2009;
run;
proc print data=plans;
run;
/* Creating null dataset */
/* Dataset exists with variables but no observations */
data dm;
length pid 4 age 3 gender $ 8 race $ 7;
delete;
run;
TASK: Control conditional blocks using a DO block, the GOTO statement, and the LINK statement.
data emp_salaries;
input eid $ salary sale;
datalines;
E234 2300 678
E245 4500 456
E456 3000 400
E890 4500 300
E235 7800 580
E267 5000 280
;
/* do block processing */
data emp1;
set emp_salaries;
if sale>=500 then do;
newsalary=salary+2000;
rating ='A+++';
end;
else if sale>=400 and sale<500 then do;
rating='A++';
end;
else do;
rating ='A+';
end;

run;
Goto or link statement: Jumps to a label on a conditional basis. A label marks a group of
statements to run.
* Not very efficient.
/* Goto or link statement */

data emp2;
set emp_salaries;
if sale>=500 then goto case1;
else if sale>=400 and sale<500 then goto case2;
else goto case3;
case1: newsalary=salary+2000;
rating='A+++';
return;
case2: rating='A++';
return;
case3: rating='A+';
return;
run;
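The text names the LINK statement but does not show it. As a hedged sketch of the same rating logic with LINK: RETURN inside a labelled section goes back to the statement after the LINK, while the RETURN in the main flow ends the iteration. The output name emp3 is hypothetical and the input is a three-row cut of emp_salaries.

```sas
data emp_salaries;
   input eid $ salary sale;
   datalines;
E234 2300 678
E245 4500 456
E890 4500 300
;
data emp3;
   set emp_salaries;
   length rating $ 4;
   if sale >= 500 then link case1;
   else if sale >= 400 then link case2;
   else link case3;
   return;                 /* end of the main flow: write the observation */
case1:
   newsalary = salary + 2000;
   rating = 'A+++';
   return;                 /* back to the statement after LINK */
case2:
   rating = 'A++';
   return;
case3:
   rating = 'A+';
   return;
run;
```

The main-flow RETURN matters here. Without it, execution would fall through into the labelled sections after the IF chain.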

TASK: Generate a unique identifier or sequential number.
/* extraction */
data medi;
infile 'H:\Studies\SAS_Books\Mohan\source\DLM\logicalvar.txt';
input pid $ date: date9. Drug $;
format date date9.;
run;
/* sequential number */
data medi1;
set medi;
sq_no+1;
run;
/* generates visit ids pid wise: (output statement controls the overwriting) */
proc sort data=medi out=medi2;
by pid;
run;
data medi3;
set medi2;
by pid;

if first.pid=1 then visitid=0;
visitid+1; output;
if last.pid=1;
run;
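Because the extraction step above reads an external file, the following self-contained sketch repeats the two counters on inline data. The values and the output name medi_seq are hypothetical. The sum statement (variable + 1) starts at 0 and retains its value across observations, which is what makes both counters work.

```sas
data medi;
   input pid $ date : date9. drug $;
   format date date9.;
   datalines;
P01 01JAN2010 DrugA
P01 15JAN2010 DrugA
P02 03JAN2010 DrugB
;
/* the data is already in pid order, so BY processing works without a sort */
data medi_seq;
   set medi;
   by pid;
   sq_no + 1;                       /* running count over all observations */
   if first.pid then visitid = 0;   /* restart the counter for each pid */
   visitid + 1;
run;
```

For these rows sq_no runs 1, 2, 3 and visitid runs 1, 2, 1.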
Neon Number

A neon number is a number for which the sum of the digits of its square equals the number
itself. For example, if the input number is 9, its square is 9*9 = 81 and the sum of the
digits is 8+1 = 9, i.e. 9 is a neon number.
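A minimal sketch of a neon number check, assuming a search range of 1 to 100. The dataset and variable names are illustrative only.

```sas
data neon;
   do n = 1 to 100;
      sq = n * n;
      digit_sum = 0;
      rest = sq;
      /* peel off the last digit until nothing is left */
      do while (rest > 0);
         digit_sum = digit_sum + mod(rest, 10);
         rest = int(rest / 10);
      end;
      if digit_sum = n then output;   /* keep only neon numbers */
   end;
   drop rest;
run;
proc print data=neon;
run;
```

In this range the step should list 1 and 9.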
Every 3rd observation Based on Group wise

proc sort data=sashelp.class out=class;
by Sex;
run;
data record;
set class;
by Sex;
if first.sex=1 then sqno=0;
sqno+1;
if mod(sqno, 3)=0 then output;
run;

proc print data=record;
title1 "every 3rd record";
title2 "group wise";
run;
Reverse Number
The program prints the reverse of a number, i.e., if the input is 951 then the output will be 159.
Syntax: <new var name> = reverse (argument);
Find the Reverse numbers from 120 to 125

data reverse;
length j $ 8;
do i = 120 to 125 by 1;
j=reverse(strip(put(i, 8.))); /* convert to text first: REVERSE works on character values */
output;
end;
run;

Fibonacci series
A series of numbers in which each number (Fibonacci number) is the sum of the two
preceding numbers. The simplest is the series 1, 1, 2, 3, 5, 8, etc.
Find the first 10 numbers of the Fibonacci series

data fibonacci (keep=n fib);
a=0;
b=1;
do n=1 to 10 by 1;
fib=b; /* current Fibonacci number */
output;
c=a+b; /* next number is the sum of the two preceding numbers */
a=b;
b=c;
end;
run;

