PSTAT 130 Final Review

Summarizing Data (shape, center, spread)


PROC MEANS (simple summary stats for numeric variables) - produces simple
summary statistics. By default, for each numeric variable it reports N (the
count of nonmissing values), mean, standard deviation, minimum, and maximum.
BY and CLASS statements summarize subgroups. A BY statement requires the data
to be sorted first (PROC SORT); a CLASS statement does not need presorted data.
Why use a BY statement? It is all about efficiency: if the data is large and
already sorted, BY is more efficient than CLASS. Statistics can be written to a
data set with the OUTPUT statement (OUT=). Missing values are excluded when
calculating statistics. Use MAXDEC= in the PROC MEANS statement to set the
number of decimal places displayed.
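As a minimal sketch of the CLASS versus BY distinction, the steps below summarize the same subgroups both ways. The sample data set sashelp.class and the work.* output names are assumptions chosen for illustration, not part of the course notes.

```sas
/* CLASS: subgroup summaries, no presorting needed */
proc means data=sashelp.class maxdec=1;
   class sex;
   var height weight;
   /* OUT= writes the statistics to a data set (hypothetical name) */
   output out=work.class_stats mean=avg_height avg_weight;
run;

/* BY: the data must be sorted by the BY variable first */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

proc means data=work.class_sorted maxdec=1;
   by sex;
   var height weight;
run;
```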
PROC FREQ (frequency counts) - displays each distinct value of a variable with
its frequency count, gives relative and cumulative percentages, and shows the
number of missing observations. Use a TABLES statement to select which
variables you are looking at; without one, it goes through all the variables.
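A short sketch of selecting variables with the TABLES statement, assuming the sample data set sashelp.class:

```sas
/* One-way frequency tables for two selected variables */
proc freq data=sashelp.class;
   tables sex age;
run;
```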
PROC TABULATE (multi-dimensional tables with summary stats) - control table
construction with the TABLE statement. Select variables using CLASS
(classification variables) and VAR (analysis variables). Commas in the TABLE
statement separate dimensions: a single expression gives columns only, and two
give a row x column table. In most cases the table is two-dimensional. If no
analysis variable is specified, it displays counts.
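The two steps below sketch a row x column table with and without an analysis variable. The data set sashelp.class is an assumed sample; the variable choices are illustrative only.

```sas
/* Rows: age. Columns: mean height within each sex. */
proc tabulate data=sashelp.class;
   class sex age;
   var height;
   table age, sex*height*mean;
run;

/* No analysis variable: the cells show counts (N) */
proc tabulate data=sashelp.class;
   class sex age;
   table age, sex;
run;
```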
PROC REPORT (listing and summary reports) - gives an interactive window to
customize the look of your report without changing the data. It can summarize
the data or list the data. Similar to PROC PRINT but with many more features.
Use a COLUMN statement to select variables. It can create cross-tabular reports
and subtotals with or without a grand total, and it can sort the values. Use
DEFINE statements to specify how each variable is used (for example as an order
variable) and its format and label/column header; these settings are temporary.
The order in which variables appear comes from the COLUMN statement. To
suppress the window, use NOWINDOWS or NOWD. Character variables are display
variables by default and are left justified; numeric variables are analysis
variables by default and are right justified.
BREAK statement creates a summary line (such as a total) at the beginning or
end of each group.
RBREAK creates a grand total at the beginning or at the end of the report.
rbreak before / summarize dol dul;
PROC DATASETS permanently changes labels and variable names. (Midterm 1)
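A hedged sketch that puts COLUMN, DEFINE, BREAK, and RBREAK together. The data set sashelp.class, the column headers, and the formats are assumptions for illustration.

```sas
proc report data=sashelp.class nowd;
   column sex name height weight;
   define sex    / order 'Sex';                          /* order variable: sorts and groups rows */
   define name   / display 'Student';
   define height / analysis mean format=6.1 'Height';
   define weight / analysis sum  format=8.1 'Weight';
   break after sex / summarize;                          /* summary line after each group */
   rbreak after / summarize dol dul;                     /* grand summary line at the end */
run;
```

The DOL and DUL options (double overline, double underline) only show in the listing destination.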

ODS: Output Delivery System


By default our output went to the Results Viewer. It can also be produced in
listing format.
ODS provides preset styles.
Some of the destinations we used were:
- ODS HTML FILE= ; routes all output into the HTML file until you close the
destination (ODS HTML CLOSE) or open another ODS HTML file.
- ODS PDF
- ODS CSVALL
Proc Template;
   List styles;
Run;
(lists the preset style templates)
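A minimal sketch of routing one procedure's output to two destinations at once. The file names and the sasweb style are assumptions; any preset style name could be substituted.

```sas
ods html file='class_report.html' style=sasweb;
ods pdf  file='class_report.pdf';

/* Everything between the open and close statements goes to both files */
proc print data=sashelp.class;
run;

ods pdf close;
ods html close;
```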
Charts
Use PROC GCHART with HBAR, VBAR, or PIE statements.
The chart variable determines the number of bars/slices and can be character or
numeric. By default the chart shows frequency counts. A numeric chart variable
is grouped into ranges by default; use the DISCRETE option to show one bar for
each distinct value. Use SUMVAR= to specify the analysis variable. The only
statistics it calculates for the analysis variable are mean or sum (TYPE=).
Use the EXPLODE= option to pull a slice out of the pie chart. Use FILL=X for
crosshatched slices or keep the default solid fill.
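The chart options above can be sketched as follows, assuming SAS/GRAPH is available and using sashelp.class as sample data (its Sex variable takes the values 'F' and 'M').

```sas
proc gchart data=sashelp.class;
   vbar age / discrete;                        /* one bar per distinct age, showing counts */
   hbar sex / sumvar=height type=mean;         /* mean height for each sex */
   pie  sex / sumvar=weight type=sum
              explode='F' fill=x;              /* pull out the 'F' slice, crosshatch fill */
run;
quit;
```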
Plots
Use PROC GPLOT. PLOT vertical-variable * horizontal-variable.
Define the axis scales with the VAXIS= and HAXIS= options.
GOPTIONS
Use graphics options to specify the file type of the graphs in the HTML file.
These are global and stay in effect until you reset them (goptions reset=all;).
TITLE & FOOTNOTE options
HEIGHT=, FONT=, COLOR=.
SYMBOLn:
Specify VALUE= (plot symbol), I= (interpolation), WIDTH=, and COLOR=.
Possible values for I= are join, spline, needle, rl, or rlclm.
Cancel a symbol statement using symboln; or use goptions to reset all symbol
statements (goptions reset=symbol;).
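A sketch combining GPLOT, axis scales, TITLE/FOOTNOTE options, and a SYMBOL statement. It assumes SAS/GRAPH and the sample data set sashelp.class; the titles, colors, and axis ranges are illustrative choices.

```sas
goptions reset=all;                               /* start from default graphics options */

title    height=2 color=blue 'Weight versus Height';
footnote height=1 'Source: sashelp.class';
symbol1  value=dot i=rl width=2 color=red;        /* dots plus a regression line */

proc gplot data=sashelp.class;
   /* vertical*horizontal, with explicit axis scales */
   plot weight*height / vaxis=50 to 150 by 25
                        haxis=50 to 75 by 5;
run;
quit;

symbol1;                                          /* cancel the symbol definition */
title;
footnote;
```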
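OUTPUT statement - By default the DATA step writes an observation at the end of
each iteration. An explicit OUTPUT statement replaces that automatic output and
can create multiple records or write to multiple output data sets.
PROC SORT and BY-group processing - after the data is sorted, a BY statement in
the DATA step creates temporary FIRST.variable and LAST.variable indicators for
each of your BY groups.
RETAIN statement - The DATA step processes one record at a time and normally
resets variables created in the step to missing at the start of each iteration.
A RETAIN statement keeps a variable's value from one iteration to the next.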
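Two small sketches of explicit OUTPUT: one splits the input across two data sets, the other writes several records per input record. The data set names and the one-unit-per-year growth figure are hypothetical.

```sas
/* Multiple output data sets */
data work.tall work.short;
   set sashelp.class;
   if height >= 60 then output work.tall;
   else output work.short;
run;

/* Multiple records per input record */
data work.growth;
   set sashelp.class(keep=name height);
   do year = 1 to 3;
      projected = height + year;   /* hypothetical growth of 1 unit per year */
      output;                      /* writes one record per loop iteration */
   end;
run;
```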
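A minimal sketch of carrying a value across iterations, assuming sashelp.class as input; max_height is a hypothetical variable name.

```sas
data work.running_max;
   set sashelp.class(keep=name height);
   retain max_height;                      /* not reset to missing on each iteration */
   max_height = max(max_height, height);   /* MAX ignores the initial missing value */
run;
```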
DROP=, KEEP= - As data set options on the input data set, these take effect
when SAS reads the input. We cannot use a variable in the DATA step if it was
dropped with DROP= on input.
SUM statement
Keeps a running total for a variable. variable + expression; automatically
retains the value of the variable, initializes it to zero, and ignores missing
values.
Accumulating totals for BY groups
Set the accumulator to 0 at the first observation of each BY group
(FIRST.variable), increment it with a sum statement, and output only the last
observation of the group (LAST.variable).
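The three steps can be sketched as below, assuming sashelp.class as input; total_weight and the work.* names are hypothetical.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

data work.weight_by_sex(keep=sex total_weight);
   set work.class_sorted;
   by sex;
   if first.sex then total_weight = 0;   /* reset at the start of each group */
   total_weight + weight;                /* sum statement: retained, ignores missing */
   if last.sex then output;              /* one record per group */
run;
```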
Write data to an Excel file
Use ods html file='filename.xls';
ods html close;
Write data to a CSV file
ods csvall file='filename.csv';
ods csvall close;
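A sketch of both exports wrapped around a procedure step. The file names are placeholders, and sashelp.class is an assumed sample data set.

```sas
/* HTML output saved with an .xls extension so Excel can open it */
ods html file='class.xls';
proc print data=sashelp.class noobs;
run;
ods html close;

/* Comma-separated output, including titles and column headers */
ods csvall file='class.csv';
proc print data=sashelp.class noobs;
run;
ods csvall close;
```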
READ from / WRITE to external file
Use INFILE and INPUT to read from an external file, and FILE and PUT to write
to an external file.
DATA _NULL_
Runs the DATA step without creating a SAS data set.
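A minimal round trip, assuming a writable current directory; class_out.txt and work.class_in are hypothetical names.

```sas
/* Write: no data set is created, only the text file */
data _null_;
   set sashelp.class;
   file 'class_out.txt';
   put name sex age;
run;

/* Read the same file back with list input */
data work.class_in;
   infile 'class_out.txt';
   input name $ sex $ age;
run;
```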
COLON modifier - In list input, a character variable is read with a default
length of 8 characters. The colon (:) modifier lets you specify an informat in
list input while SAS still reads only up to the next delimiter. The default
delimiter in SAS is the blank space.
INFILE statement options
DLM= sets the delimiter. DSD tells SAS that two consecutive delimiters indicate
a missing value and that delimiters within quotes are not treated as
delimiters. MISSOVER sets the remaining variables to missing when a line runs
out of data instead of reading from the next line.
SAS represents a missing character value with a blank and a missing numeric
value with a dot.
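The sketch below combines the colon modifier with DLM=, DSD, and MISSOVER on in-stream data. The variable names and data lines are made up for illustration.

```sas
data work.people;
   infile datalines dlm=',' dsd missover;
   /* colon modifier: apply the informat but stop at the next delimiter */
   input name :$20. city :$15. joined :mmddyy10. salary;
   format joined date9.;
   datalines;
"Smith, Alexandra",Santa Barbara,01/15/2020,52000
Christopher Lee,,03/02/2021
;
run;
```

In the first line the quoted comma stays inside the name. In the second line the double delimiter leaves city blank, and salary is set to missing because the line ends early.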
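Single trailing @ modifier - tells SAS to hold the data line for further
processing by a later INPUT statement in the same iteration.
Double trailing @@ modifier - holds the data line across iterations of the DATA
step, so several observations can be read from one line.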
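Two short sketches of the line-hold modifiers; the record layouts and variable names are hypothetical.

```sas
/* @@ : several observations on each data line */
data work.scores;
   input id score @@;
   datalines;
1 90 2 85 3 77
4 68 5 94
;
run;

/* @ : hold the line, test a field, then read the rest */
data work.sales;
   input type $ @;
   if type = 'U' then input amount;
   else input amount quantity;
   datalines;
U 100
I 250 3
;
run;
```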
Variable lists - name range list, name consecutive list, special cases
_NUMERIC_, _CHARACTER_, _ALL_
Name consecutive list - refers to variables next to each other in the data set,
using two dashes. You can specify Age--Gender for every variable in that range,
Age-numeric-Gender to get only the numeric variables in that range, or
Age-character-Gender for only the character variables.
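A sketch of these lists against sashelp.class, an assumed sample data set whose variables are stored in the order Name, Sex, Age, Height, Weight.

```sas
proc print data=sashelp.class;
   var name--age;             /* Name, Sex, Age: everything in that range */
run;

proc means data=sashelp.class;
   var age-numeric-weight;    /* only the numeric variables from Age to Weight */
run;

proc print data=sashelp.class;
   var _character_;           /* all character variables */
run;
```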
SAS functions:
LENGTH(string);
INDEX(string, target);
SUBSTR(string, start <,length>);
SCAN(string, n <,delimiters>); two or more consecutive delimiters are treated
as a single delimiter. If n is negative it counts words from the end. If n is
positive it counts from the beginning.
|| - concatenate the strings together.
TRIM()
ROUND()
CEIL()
FLOOR()
INT()
INPUT(source, informat): character to numeric conversion
PUT(source, format): numeric to character conversion
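A quick sketch that exercises each function and writes the results to the log. The sample string and variable names are made up for illustration.

```sas
data _null_;
   text = 'final review, PSTAT 130';
   len       = length(text);             /* 23 */
   pos       = index(text, 'review');    /* 7 */
   part      = substr(text, 1, 5);       /* final */
   last_word = scan(text, -1, ' ,');     /* 130: counted from the end */
   both      = trim(part) || '!';        /* TRIM removes trailing blanks first */
   r = round(3.14159, 0.01);             /* 3.14 */
   c = ceil(2.1);                        /* 3 */
   f = floor(2.9);                       /* 2 */
   i = int(-2.9);                        /* -2: drops the fractional part */
   num = input('1,234', comma5.);        /* character to numeric */
   chr = put(130, 3.);                   /* numeric to character */
   put len= pos= part= last_word= both= r= c= f= i= num= chr=;
run;
```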
DO loops
DO-END
Iterative DO
DO WHILE
DO UNTIL
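The four forms can be sketched in one step. The 5% rate, the starting balance, and all variable names are hypothetical.

```sas
data work.loops;
   /* Iterative DO: fixed number of passes */
   total = 0;
   do i = 1 to 5;
      total + i;
   end;

   /* DO WHILE: condition is checked at the top of the loop */
   balance = 1000;
   years = 0;
   do while (balance < 2000);
      balance = balance * 1.05;
      years + 1;
   end;

   /* DO UNTIL: condition is checked at the bottom, so it runs at least once */
   n = 0;
   do until (n >= 3);
      n + 1;
   end;

   /* DO-END: group several statements under one IF */
   if total > 10 then do;
      flag = 'big';
      output;
   end;
run;
```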
+ Midterm Review Topics (See separate file)