PROC SORT in SAS: Sorting Data and removing duplicates from Data using SORT procedure

There are many reasons for sorting your data: to organize data for a report, before combining data sets, or before using a BY statement in another PROC or DATA step. Fortunately, PROC SORT is quite simple. The basic form of this procedure is

Syntax:

PROC SORT DATA= OUT= ;

BY variable1,2,3 ...variablen;

run;


The DATA= and OUT= options specify the input and output data sets respectively

Example:

Data cars;
set sashelp.cars;
run;


proc sort data=sashelp.cars out=cars;
by Make;
run;

Proc SORT in SAS


You can specify as many variables in By statement as you wish. 

With one BY variable, SAS sorts the data based on the values of that variable. If more than one variable, SAS sorts observations by the first variable, then by the

second variable within categories of the first, and so on. 

Example:

proc sort data=cars;
by Make Model;
run;

by Make Model;
run;
proc sort data=cars;
by Make Model;
run;
proc sort data=cars;
by Make Model;
run;

Proc SORT in SAS by Datahark


By default Proc sort sorts data in ascending order. To sort data in descending order, add keyword DESCENDING to BY statement before variable that you want to sort in descending 

Example: 

proc sort data=cars;

by Make descending Model;

run;

Proc SORT in SAS


Removing/Eliminating Duplicates from dataset using Proc Sort - 

Proc Sort can also be used for removing duplicate values from dataset using nodup and nodupkey options, Also we can get all the eliminated duplicate row in some other dateset using Dupout option -


The NODUPKEY option tells SAS to eliminate any duplicate observations that have the same values for the BY variables. To use this option, just add NODUPKEY to the PROC SORT statement:

Example

proc sort data=cars nodupkey dupout= duplicate_cars;

by Make;

run;

Nodupkey

Duplicate Values goes in duplicate_cars Dataset

Nodupkey in SAS


The NODUP option tells SAS to eliminate all duplicate row (Checks for entire row irrespective of variables in BY statement).

Example:

Data A;

input cc spend;

cards;

1 200

2 300

3 300

3 300

3 400

4 500

4 600

5 700

;

run;

Nodup in SAS Proc sort


proc sort data=A nodup dupout=b;

by cc;

run;


Values remains in A -

Nodup in Proc sort


Removed duplicate and moved to B - 

nodup in proc sort




Related Posts:

Post a Comment

0 Comments