PROC MEANS (simple summary statistics for numeric variables)
- Produces simple summary statistics. By default, for each numeric variable it reports N, mean, standard deviation, minimum, and maximum.
- BY and CLASS statements are used to summarize subgroups. A BY statement requires the data to be sorted first (PROC SORT); a CLASS statement does not need presorted data.
- Why do we use a BY statement? It is all about efficiency. If the data is large and already sorted, BY is more efficient than CLASS.
- The OUTPUT statement with OUT= writes the statistics to a specified dataset.
- Missing values are excluded when the statistics are calculated.
- MAXDEC= in the PROC MEANS statement sets the maximum number of decimal places displayed.

PROC FREQ (frequency counts)
- Shows the number of missing observations.
- Displays frequency counts for each distinct value of each variable, along with relative and cumulative percentages.
- Use a TABLES statement to select the variables you are looking at. Without one, PROC FREQ goes through all the variables.

PROC TABULATE (multi-dimensional tables with summary statistics)
- Control table construction with the TABLE statement.
- Select variables using CLASS (classification variables) and VAR (analysis variables).
- A table has one dimension or two (row x column). In most cases it is two-dimensional.
- If no analysis variable is specified, it displays counts.

PROC REPORT (listing and summary reports)
- Provides an interactive window for customizing the look of the report without changing the data. To suppress the window, use NOWINDOWS or NOWD.
- Can either summarize the data or list the data. Similar to PROC PRINT but with many more features.
- Use a COLUMN statement to select variables.
- Can create cross-tabular reports and subtotals, with or without a grand total.
- Can automatically sort the values.
- Use DEFINE statements to specify how each variable is used (for example, as an order variable), its sort order, its format, and its label/column header. These settings are temporary and apply only to the report.
- By default, character variables are display variables and left-justified; numeric variables are analysis variables and right-justified.
- A BREAK statement creates a subtotal for each group, for example at the end of the group.
- An RBREAK statement creates a grand total at the beginning or at the end of the report: RBREAK BEFORE / SUMMARIZE DOL DUL;

PROC DATASETS
- Permanently changes labels and variable names. (Midterm 1)
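The CLASS and BY approaches to subgroup summaries can be compared in a short sketch. It assumes the sample dataset sashelp.class (with variables Sex, Height, and Weight) is available, which these notes do not mention; the names work.class_sorted, work.stats, MeanHeight, and MeanWeight are made up for illustration.

```sas
/* CLASS: the input does not need to be sorted */
proc means data=sashelp.class maxdec=1;
   class Sex;
   var Height Weight;
run;

/* BY: the input must first be sorted by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* Same subgroup summary with BY, written to a dataset instead of printed */
proc means data=work.class_sorted noprint;
   by Sex;
   var Height Weight;
   output out=work.stats mean=MeanHeight MeanWeight;
run;
```

Both steps summarize Height and Weight within each value of Sex; only the BY version depends on the preceding PROC SORT.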
ODS: Output Delivery System
By default, our output went to the Results Viewer. Output can also be produced in listing format. ODS has preset styles. Some of the formats we used were:
PSTAT 130 Final Review
- ODS HTML FILE= ; routes all output into the HTML file until you specify ODS HTML CLOSE; or open another ODS section.
- ODS PDF
- ODS CSVALL
- PROC TEMPLATE; LIST STYLES; RUN; (displays the available style templates)

Charts
- Create HBAR, VBAR, and PIE charts using PROC GCHART.
- The chart variable determines the number of bars/slices and can be character or numeric.
- By default the chart shows frequency counts.
- Use the DISCRETE option to show a separate bar for each distinct value of a numeric chart variable instead of grouping the values into ranges.
- Use SUMVAR= to specify the analysis variable. The only statistics calculated for the analysis variable are the mean or the sum.
- Use the EXPLODE= option to pull a slice out of the pie chart.
- Use FILL=X for a crosshatch pattern, or keep the default fill.

Plots
- Use PROC GPLOT.
- PLOT vertical-variable * horizontal-variable;
- Define the axis scales with VAXIS= and HAXIS=.

GOPTIONS
- Use graphics options to specify the graphics file type used in the HTML file.
- These are global and stay in effect until you reset all the options (GOPTIONS RESET=ALL).

TITLE & FOOTNOTE options
- HEIGHT=, FONT=, COLOR=.

SYMBOLn
- Specify VALUE=, I=, WIDTH=, and COLOR=.
- Possible values for I= include JOIN, SPLINE, NEEDLE, RL, and RLCLM.
- Cancel a symbol statement with SYMBOLn; or reset all symbols with GOPTIONS RESET=SYMBOL.

OUTPUT statement
- By default, the DATA step writes an observation at the end of each iteration.
- An explicit OUTPUT statement lets you create multiple records per iteration or write to multiple output datasets.

PROC SORT and BY groups
- After sorting, a BY statement in the DATA step creates the temporary variables FIRST.variable and LAST.variable for each BY group.

RETAIN statement
- The DATA step processes one record at a time. A RETAIN statement keeps the value of a variable from one iteration to the next instead of resetting it to missing.
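FIRST./LAST. variables, RETAIN, and an explicit OUTPUT can be combined in one DATA step. The sketch below assumes the sample dataset sashelp.class (variables Sex and Height) is available; work.sorted, work.tallest, and MaxHeight are hypothetical names chosen for illustration.

```sas
/* BY-group processing needs sorted input */
proc sort data=sashelp.class out=work.sorted;
   by Sex;
run;

data work.tallest;
   set work.sorted;
   by Sex;                            /* creates first.Sex and last.Sex */
   retain MaxHeight;                  /* value carries over between records */
   if first.Sex then MaxHeight = .;   /* reset at the start of each group */
   if Height > MaxHeight then MaxHeight = Height;  /* missing sorts below any number */
   if last.Sex then output;           /* write one record per group */
   keep Sex MaxHeight;
run;
```

Because of the explicit OUTPUT, the implicit output at the end of each iteration is turned off, so the result has one observation per value of Sex.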
DROP=, KEEP=
- These take effect when SAS reads the input, so a variable removed with DROP= cannot be used in the step.

SUM statement
- Keeps a running total for a variable: variable + expression;
- Automatically retains the value of the variable.
- Initializes the variable to zero.
- Ignores missing values.

Accumulating totals for BY groups
- Set the accumulator to 0 at the first observation of each group, increment it with a sum statement, and output at the last observation of the group.

Write data to an Excel file
- ODS HTML FILE='filename.xls'; ... ODS HTML CLOSE;

Write data to a CSV file
- ODS CSVALL FILE='filename.csv'; ... ODS CSVALL CLOSE;

READ from / WRITE to external file
- Use INPUT to read from an external file and PUT to write to an external file.

DATA _NULL_
- Tells SAS not to create a SAS dataset.

COLON modifier
- In list input, SAS reads character values only up to 8 characters in length by default.
- The colon (:) modifier tells SAS to read until the next space, and lets you specify informats in list input.
- The default delimiter in SAS is the blank space.

INFILE statement options
- DLM= sets the delimiter.
- DSD tells SAS that two consecutive delimiters indicate a missing value and that delimiters within quotes are not treated as delimiters.
- MISSOVER sets the remaining variables to missing when a line runs out of values, instead of reading the next line.
- A missing character value is represented by a blank; a missing numeric value is represented by a period.

Trailing @ modifiers
- Single trailing @: tells SAS to hold the data line for further processing by a later INPUT statement in the same iteration.
- Double trailing @@: holds the data line across iterations of the DATA step.

Variable lists
- Name range list, name consecutive list, and the special lists _NUMERIC_, _CHARACTER_, and _ALL_.
- A name consecutive list refers to variables that are next to each other in the dataset. Use two dashes, as in Age--Gender. Use Age-NUMERIC-Gender to get only the numeric variables in that range, or Age-CHARACTER-Gender for only the character variables.

SAS functions:
- LENGTH(string)
- INDEX(string, target)
- SUBSTR(string, start <,length>)
- SCAN(string, n <,delimiters>): two or more consecutive delimiters are treated as a single delimiter. If n is positive, it counts from the beginning; if n is negative, it counts from the end.
- || concatenates strings.
- TRIM()
- ROUND(), CEIL(), FLOOR(), INT()
- INPUT(source, informat): character-to-numeric conversion
- PUT(source, format): numeric-to-character conversion

DO loops
- DO-END
- Iterative DO
- DO WHILE
- DO UNTIL

+ Midterm Review Topics (See separate file)
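A few of the character functions, conversions, and an iterative DO loop from the list above can be tried in a single DATA _NULL_ step. This is only a sketch: the string 'Smith, John A' and all variable names are made up for illustration, and the results are written to the log with PUT.

```sas
data _null_;
   name = 'Smith, John A';

   len       = length(name);               /* 13 characters */
   pos       = index(name, ',');           /* comma is at position 6 */
   lastname  = substr(name, 1, pos - 1);   /* text before the comma */
   firstname = scan(name, 2, ', ');        /* second word; comma and blank are delimiters */
   initial   = scan(name, -1, ', ');       /* negative n counts from the end */
   fullname  = trim(firstname) || ' ' || lastname;

   num = input('1234', 4.);                /* character to numeric */
   chr = put(num, 4.);                     /* numeric to character */

   do i = 1 to 3;                          /* iterative DO */
      put i=;
   end;

   put len= pos= lastname= firstname= initial= fullname= num= chr=;
run;
```

Since no dataset is created, this is a convenient way to check what a function returns before using it in a real DATA step.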