A SAS Macro/R Code Comparison: Generating Multiple Datasets by Tristan Hayes


This is a short comparison of SAS and R code for generating multiple datasets. In this example, the mpg dataset from R's ggplot2 package is used to show how one might use a SAS macro to subset the data by car class, followed by the R equivalent. You will see that the SAS macro is somewhat involved, whereas the equivalent R code takes only a few lines to accomplish the same task.

First, the SAS Code.

*Get the Unique list of car classes;
proc sql;
 create table class as select distinct class
 from mpg;
 quit;

%macro Subset_Data();
*Open the list of car classes, and go from the first item to the last in the list;
%do i= 1 %to &SYSNOBS.;
 *&SYSNOBS holds the number of observations in the last dataset processed by SAS;
 data _null_;
  set class;
  *_n_ is the automatic row counter, so only row &i. is kept;
  if _n_=&i.;
  call symput('name', class);
  *symput saves the value of class as the macro variable &name.;
  run;

*Subset the data;
 data dataset_&name.;
  set mpg;
  where class="&name.";
  run;
%end;

%mend Subset_Data;
%Subset_Data;

Now, the R Code.

Here, the R equivalent of the %do loop is a for loop. The proc sql code used to generate a unique list of car classes is replaced by simply unique(mpg$class).

library(ggplot2)
data(mpg)
for (var in unique(mpg$class)) {
  assign(paste("dataset",var,sep="_"), mpg[which(mpg$class == var), ])
}

 

With the SAS code, the datasets were named by reading the list of classes and passing the value from a specific observation into a macro variable. In R, the assign function names the data frame produced by mpg[which(mpg$class == var), ].
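As a side note, the assign approach above creates one object per class in the workspace. A common alternative in R is to keep the subsets together in a named list with split(). The sketch below is one possible variation, not part of the original comparison; the name by_class is chosen here purely for illustration.

```r
library(ggplot2)  # provides the mpg dataset used above

# split() partitions the rows of mpg by class and returns a named list
# with one data frame per car class.
by_class <- split(mpg, mpg$class)

# Prefix the names to mirror the dataset_<class> convention used above.
# (by_class is a hypothetical name picked for this sketch.)
names(by_class) <- paste("dataset", names(by_class), sep = "_")

# Each subset is reached by name through the list.
head(by_class$dataset_suv)

# Row counts per class, computed over the whole list in one call.
sapply(by_class, nrow)
```

If separate objects are still wanted, list2env(by_class, envir = globalenv()) should create the same dataset_ variables that the for loop does.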