Subsetting in SAS [Vertical and Horizontal] - USE OF IF, WHERE, KEEP and DROP

Subsetting In SAS

There are two types of subetting in SAS.
1. Vertical
2. Horizontal

Vertical subsetting- 

When we want less number of variables from a perticuler dataset. its called Vertical subsetting.


There are two ways to do vertical subsetting.

i)  Keep- it keeps variables specified with the dataset name.
ii) Drop- it drops varibles specified with dataset name.

Example-

data a(Keep=id name sex);
set a;
run;

In this code we will keep the id name and sex from a dataset and other variables will be removed.

data a(drop=id);
set sasuser.admit;
run;

This code will drop the id from a dataset, other than id all variables will be available in dataset 'A'.

Keep and drop are also known as dataset options.

Horizontal subsetting- Horizontal subsetting is like filters in Excel (we want data from a dataset with a perticular condition).

Example-

data a;
set sasuser.admit;
where age gt 25;
run;


data a;
set sasuser.admit;
if age gt 25;
run;


Both if and where gives same output, while where is faster than If.

Reason- if checks condition post buffer. while where checks condition prebuffer.

data a;
set sasuser.admit;
where age gt 25;
run;

In above example of Where SAS allocate buffer memory, in which admit dataset from sasuser library is copying, at the time of copying data from dataset to buffer where applies its condition and do not copy that data which is failing condition that resulting the data in buffer with the condition. then it copy from buffer to 'A' dataset of work library

data a;
set sasuser.admit;
if age gt 25;
run;

In above example of If SAS allocate buffer memory, in which admit dataset from sasuser library is copying completly at the time of copying data from buffer to a dataset of work library if applies its condition and do not copy that data which is failing condition that resulting the data in A dataset with the condition.

We have to use if in one condition for sure if we are creating a new variable than we cant use where condtion on that variable.

Example-

data a;
set sasuser.admit;
age_m = age*40;
where age_m gt 400;
run;

This code will not work. because condition is applies pre buffer while age_m is creating in buffer so age_m is not available for condition and it will hit error.
here we have to use like-

data a;
set sasuser.admit;
age_m = age*40;
if age_m gt 400;
run;

Above code will work.

data a;
set sasuser.admit;
age_m = age*40;
where age*40 gt 400;
run;

This code will also work.

We can rename variable in a dataset as well.

Example-

data a;
set sasuser.admit;
rename name=SAS_name;
run;

result dataset will be showing the new Variable name i.e. SAS_name

Operator used in conditions

  • gt or >
  • lt or <
  • le or <=
  • ge or >=
  • eq or =
  • nq or ^=