
SAS Test

The figures in brackets are the marks for each question (out of a total of 100). Please write your answers on this question paper. You have 50 minutes to complete this test. Some marks will be awarded for partially correct answers, so try to answer as many as you can. Please note that this is a difficult test: someone with 5-7 years of SAS experience would be expected to achieve around 50-65%.

Datasets for questions 1-5 and a few later questions:

TEMP
Obs  A  B  C  D
1    1  4  3  3
2    2  4  3  2
3    3  3  3  1
4    4  3  3  2
5    5  2  4  2
6    6  2  4  1
7    7  1  4  3
8    8  1  4  1

TEMP2
Obs  A  B  C  D
1    9  4  3  1
2    3  6  8  2
3    2  3  3  1
4    1  7  7  2

1.

Write a PROC FREQ on dataset TEMP to do a cross-tab of variables A and B. (3)

Ans: proc freq data=temp; tables a*b; run;

2.

Write some data step code to merge the datasets TEMP and TEMP2 by variable A, just keeping the values of A that exist in dataset TEMP2, and keeping the values of B, C and D from dataset TEMP. (7)

proc sort data=temp; by a; run;
proc sort data=temp2; by a; run;
data temp3;
  merge temp (in=aa) temp2 (in=bb keep=a);  /* keep=a so B, C and D come from TEMP only */
  by a;
  if bb;                                    /* keep only values of A present in TEMP2 */
run;

Repeat question 2, but use PROC SQL (8)

proc sql;
  create table temp3 as
  select bb.a, aa.b, aa.c, aa.d   /* A from TEMP2; B, C and D from TEMP */
  from temp aa right join temp2 bb
  on aa.a = bb.a;
quit;

3.

Write some code to sort TEMP by D and in descending order of B. Then code a data step to count the number of times each unique value of D occurs (i.e. work out that D=1 occurs 3 times etc.) (9)

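proc sort data=temp; by d descending b; run;
data temp3;
  do count = 1 by 1 until (last.d);  /* BY 1 is needed, otherwise the loop runs only once */
    set temp;
    by d;                            /* required for last.d to exist */
  end;
  keep d count;                      /* one row per value of D with its count */
run;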

4.

Write a PROC SUMMARY to work out the mean value of D for each value of C in dataset TEMP, outputting the results to a dataset. (5)
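A possible answer, given as a minimal sketch. The output dataset name temp_means and the variable name mean_d are illustrative assumptions, not names taken from the test.

```sas
/* NWAY keeps only the rows for each value of C and drops the overall-total row. */
proc summary data=temp nway;
  class c;                             /* one output row per value of C */
  var d;                               /* variable to summarise */
  output out=temp_means mean=mean_d;   /* temp_means and mean_d are illustrative names */
run;
```

Without NWAY the output would also contain a _TYPE_=0 row holding the mean of D over the whole dataset.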

5.

In a data step, what do the functions INTCK, INTNX, MOD and INDEXC do (give a one sentence description of each)? (8 : 2 for each)

MOD: returns the remainder from dividing the first argument by the second.
INTCK: returns the number of interval boundaries (e.g. days, months or years) between two dates, times or datetimes.
INTNX: advances a date, time or datetime by a given number of intervals and returns the resulting value, e.g. the date n days or months after a given date.
INDEXC: returns the position of the first occurrence in a character expression of any one of the specified characters.
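The four functions can be tried out in a short data step. This is only an illustrative sketch; the dates, numbers and strings are made-up values.

```sas
data _null_;
  start = '15JAN2020'd;
  stop  = '10MAR2020'd;
  months = intck('month', start, stop);  /* month boundaries crossed (1 Feb, 1 Mar): 2 */
  next   = intnx('month', start, 1);     /* start of the following month: 01FEB2020 */
  rem    = mod(17, 5);                   /* remainder of 17 / 5: 2 */
  pos    = indexc('SAS test', 'te');     /* first 't' or 'e' (case-sensitive): position 5 */
  put months= next= date9. rem= pos=;
run;
```

Note that INTCK counts boundaries crossed rather than elapsed whole intervals, so 31 January to 1 February also counts as one month.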

6.

Why would you use the %GLOBAL statement ? (4)

Ans: It creates global macro variables, which remain available for the entire SAS job, including outside the macro in which they are declared.

%let L=1;
%let M=3;
%let M1=5;
%let VAR=2;
%let VAR1=7;
%let VAR13=8;

7. What do the following resolve to? (2 marks for each):

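&&VAR&L - 7 (the first pass gives &VAR1)

&&VAR&L&M - 8 (the first pass gives &VAR13)

&&VAR&L.&M - 8 (the single period only ends the name L, so the first pass again gives &VAR13)

&&VAR&L..&M - 73 (the first pass gives &VAR1.3, i.e. the value of VAR1 followed by 3)

&VAR&&M&L - 25 (the first pass gives 2&M1, i.e. 2 followed by the value of M1)

8. Pick out what you think is the correct piece of code below and state what you think is likely to be wrong with the other two. Note these are 3 separate programs, rather than 3 sequential data steps in one program. (8 : 2 for getting right answer, 3 each for saying what is wrong with the other two):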

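Data tempdata;
  Infile c:\tempfile;
  Input @1 a $ b $ @@;
  Date=today();
  Monyy=put(date,monyy5.);
  Output;
Run;
======================
Data tempdata;
  Merge Tempdata (in=x) Tempdata2 (in=y);
  If x=y;
  If ranuni(0)>0.5 then abc=1; else abc=0;
Run;
=======================
Data _null_;
  Set Tempdata;
  File c:\tempfile;
  If _n_ > 1 then put A 3. B 3.;
Run;

Ans: Program 3 is the most likely to be correct. Program 1 is likely to loop forever, because @1 moves the pointer back to column 1 on every iteration while the trailing @@ holds the same input line, so the same values are read again and again. Program 2 has no BY statement, so the MERGE is a one-to-one merge by observation number rather than a match-merge on a key.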

9.

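What does the DSD option in an INFILE statement do? How does it differ from DLM? (5 - if you don't know the answer, then you can score 2 marks for a description of DLM)

DSD (delimiter-sensitive data) changes the default delimiter from a blank to a comma, treats two consecutive delimiters as a missing value, and removes quotation marks from quoted values, reading any delimiter inside the quotes as data. DLM= only specifies which character or characters to use as the delimiter; the two can be combined for a custom delimiter, e.g. dsd dlm='_'.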
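The difference can be seen on a few in-stream records. This is a minimal sketch; the dataset names, variable names and data values are invented for illustration.

```sas
/* DSD: comma delimiter, ",," read as a missing value, quotes stripped */
data dsd_demo;
  infile datalines dsd;
  input name :$20. x y;
  datalines;
"Smith, J",1,2
Jones,,4
;
run;

/* DLM= alone: only changes the delimiter character */
data dlm_demo;
  infile datalines dlm='_';
  input name $ x y;
  datalines;
Jones_3_4
;
run;
```

In dsd_demo the first record keeps the comma inside the quoted name, and the second record gets a missing value for x.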

10. Write one line of code to delete everything in the WORK directory. (4)

Ans: proc datasets lib=work kill nolist; quit;

11. Write some macro code to do a PROC FREQ on a dataset called TEMP, running the macro 3 times to produce tables for variables A, B and C. Use variable D to weight the results. The weight, the name of the dataset and the names of the variables should be parameterised, so this macro could be used for any variables on any dataset. (9)

%macro m1(db, var, wgt);
  proc freq data=&db;
    tables &var;
    weight &wgt;
  run;
%mend m1;

%m1(temp, a, d)
%m1(temp, b, d)
%m1(temp, c, d)

12. When using CONNECT, what functions are used for creating a macro variable on the remote host from a value on the local host, and vice versa? (3)
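A possible answer: %SYSLPUT creates a macro variable on the remote host from a local value, and %SYSRPUT creates a local macro variable from a remote value. The sketch below assumes a SAS/CONNECT session has already been signed on; the macro variable names and values are illustrative only.

```sas
/* Assumes an active SAS/CONNECT session; all names and values are illustrative. */
%let cutoff = 100;               /* local macro variable */
%syslput rcutoff = &cutoff;      /* local -> remote */

rsubmit;
  %put Remote host sees &rcutoff;
  %let nrows = 42;               /* remote macro variable, made-up value */
  %sysrput lrows = &nrows;       /* remote -> local */
endrsubmit;

%put Local host sees &lrows;
```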

13. For the dataset TEMP (as shown on page 1), write some code using PROC LOGISTIC to use variables B and C to predict the outcome D. (6)

proc logistic data=temp outest=outputdata;
  model d = b c / expb;
run;

14. For the datasets displayed on page 1, write some code to create a macro variable that contains the number of observations in TEMP2. Use this to create a new dataset from TEMP that contains the number of observations in TEMP2. (you will get no marks for just setting a macro variable to 4, this value must be calculated by the code) (8)

proc sql noprint;
  select count(*) into :var1 from temp2;
quit;
data temp3;
  set temp (obs=&var1);   /* first &var1 observations of TEMP */
run;

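15. In a PROC SORT, how do the NODUP and NODUPKEY options differ? (3)

NODUP: removes an observation only when it matches the previous observation (after sorting) on every variable.
NODUPKEY: removes observations whose values of the variables in the BY statement duplicate those of a previous observation, keeping the first observation for each combination of BY values.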
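A small illustration of the difference, using an invented four-row dataset. NODUPRECS is written out in full here; NODUP is its alias.

```sas
data dups;
  input id value;
  datalines;
1 10
1 10
1 20
2 30
;
run;

/* Drops only the second "1 10" row: 3 observations remain */
proc sort data=dups out=by_record noduprecs;
  by id;
run;

/* Keeps the first row for each id: 2 observations remain */
proc sort data=dups out=by_key nodupkey;
  by id;
run;
```

Because NODUPRECS only compares adjacent observations, fully identical rows that are not next to each other after the sort can survive unless all variables are listed in the BY statement.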
