How to Create SAS Variables

Objectives of this Chapter:

  • Learn about SAS Variables.
  • Learn how to LABEL, RENAME and FORMAT data.
  • Learn commonly used SAS functions for data processing.
  • Learn how to MERGE datasets.
  • Learn conditional processing using WHERE, IF/ELSE and DO-END.

The previous chapters showed what a simple SAS program looks like and how to read data from various sources into SAS. We came across concepts such as informats, formats and some SAS procedures. Let us now look at some of these concepts in detail, so that we understand the functions and methods most often used in data processing: cleaning, formatting, and combining or subsetting data to make it ready for analysis.

SAS variables

Declaration, assignment, length, keep, drop, array, PROC contents, label, macro variables and scope of variables.


SAS variables are containers that you create within a program to store and use character and numeric values. There are two types of variables: character and numeric. Character variables contain alphabetic characters, the digits 0 through 9, and other special characters. Numeric variables are stored as floating-point numbers. This includes dates and times, which SAS also stores as numbers.
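As a small illustration of dates being stored as numbers, here is a minimal sketch. The variable names d0 and d1 are chosen only for this example. It writes two date values to the log, first as raw numbers and then with a date format. SAS counts dates as days since January 1, 1960.

```sas
data _null_;
  d0 = '01JAN1960'd;   /* date literal: stored as the number 0 */
  d1 = '02JAN1960'd;   /* one day later: stored as the number 1 */
  put d0= d1=;         /* writes d0=0 d1=1 to the log */
  put d1= date9.;      /* same value displayed with a date format: d1=02JAN1960 */
run;
```

The format changes only how the value is displayed. The stored value remains a number.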

To simplify, each and every field/column in a SAS dataset is a SAS variable.

These are the questions we will discuss in this chapter around SAS variables.

    • How do we create a variable and set its type?
    • How do we decide the length of a variable?
    • How do we keep or drop variables from a dataset?
    • What are array variables and how do we declare them?
    • How do we find the types of the variables in an existing dataset?
    • How do we label a variable and apply a format to it?
    • What is a macro variable and how do we declare one?
    • What is the scope of a variable in a SAS program?

How to create a variable?


There are four common ways to create a variable:

1. Using an assignment statement

2. Using an INPUT statement

3. Through a LENGTH statement

4. As a result of PROC SQL or PROC IMPORT

There are other ways too, but we limit ourselves to these four because they cover the large majority of everyday cases.

Using an assignment statement

This is the most common form of variable creation. Variables do not have to be declared in advance.
Have a look at the following program:


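data var_test;
id = 'JK';                        /* character literal: length 2 */
NProducts = 6;                    /* numeric */
pro_price = 4.55;                 /* numeric with decimals */
tot_cost = NProducts*pro_price;   /* product of two other variables */
final_price = tot_cost;           /* copy of another variable */
run;

proc print data=var_test;
run;

proc contents data=var_test;
run;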


The above program creates a dataset ‘var_test’ with five variables. As we have seen earlier, each variable forms a column/field in the dataset. id is assigned the value ‘JK’. NProducts and pro_price are assigned numbers, the latter with decimals. tot_cost takes the value of the product of two other variables. Finally, final_price is assigned the value of another variable in the dataset, tot_cost.


PROC PRINT prints the dataset with all its variables. This is what the output looks like:

 Obs    id    NProducts    pro_price    tot_cost    final_price
 ---------------------------------------------------------------
   1    JK        6           4.55        27.3          27.3


PROC CONTENTS is the procedure that reports the type, length and label of each variable in a dataset.

-----Alphabetic List of Variables and Attributes-----

       #    Variable       Type    Len    Pos
       ---------------------------------------
       2    NProducts      Num       8      0
       5    final_price    Num       8     24
       1    id             Char      2     32
       3    pro_price      Num       8      8
       4    tot_cost       Num       8     16


Together, these outputs show how the variables were created and which values, types and lengths SAS assigned to them. Now let us discuss some general rules of variable creation by assignment.

In a DATA step, you can create a new variable and assign it a value by using it for the first time on the left side of an assignment statement. SAS determines the length of a variable from its first occurrence in the DATA step. The new variable gets the same type and length as the expression on the right side of the assignment statement.

When the type and length of a variable are not explicitly set, SAS gives the variable a default type and length, as shown in the examples in the following table.

Expression           Example                Resulting Type of X    Resulting Length of X    Explanation
------------------------------------------------------------------------------------------------------------------------------------------
Numeric variable     a=34; x=a;             Numeric variable       8                        Default numeric length (8 bytes unless otherwise specified)
Character variable   a='ABCD'; x=a;         Character variable     4                        Length of source variable
Character literal    x='ABC'; x='ABCDE';    Character variable     3                        Length of first literal encountered

Practical problems: Often the length of a variable is not sufficient to hold a value encountered later during data processing. SAS then truncates the value to the length of the variable. This problem can be solved by declaring the length of the variable before the first assignment.
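The truncation problem, and the fix of declaring the length first, can be sketched as follows. The dataset names trunc_demo and fixed_demo and the variable len_x are hypothetical names used only for this illustration.

```sas
data trunc_demo;
  x = 'ABC';            /* first occurrence: x becomes character with length 3 */
  x = 'ABCDE';          /* the value is truncated to fit the length of x */
  len_x = vlength(x);   /* length of x in bytes */
  put x= len_x=;        /* writes x=ABC len_x=3 to the log */
run;

data fixed_demo;
  length x $ 5;         /* declare the length before the first assignment */
  x = 'ABC';
  x = 'ABCDE';
  put x=;               /* writes x=ABCDE to the log */
run;
```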

Creating variables with INPUT Statement

We have already seen how to use INPUT to read data into variables. We have also seen how to use SAS informats to tell SAS what kind of data it is reading. One example is reproduced below to show how it is done.

DATA acctinfo;
/* $8. reads 8 characters, mmddyy10. reads a date, comma9. reads a number containing $ and commas */
INPUT acctnum $8. date mmddyy10. amount comma9.;
CARDS;
0074309801/15/2001$1,003.59
1028754301/17/2001$672.05
3320899201/19/2001$702.77
0345900601/19/2001$1,209.61
;
run;

proc contents data=acctinfo;
run;

Output:

Alphabetic List of Variables and Attributes

    #    Variable    Type    Len    Pos
    ___________________________________
    1    acctnum     Char      8     16
    3    amount      Num       8      8
    2    date        Num       8      0


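Here the informat that follows each variable in the INPUT statement tells SAS the type of the variable and how many columns to read. For the character variable acctnum, the informat width (8) also becomes the length of the variable. The numeric variables date and amount get the default length of 8 bytes, whatever the width of the informat.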
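Because date and amount are stored as plain numbers, printing the dataset without formats shows the date as a day count. A minimal sketch, assuming the acctinfo dataset created above, attaches display formats in PROC PRINT. The choice of mmddyy10. and dollar10.2 is only one reasonable option.

```sas
proc print data=acctinfo;
  /* formats affect only the display; the stored numeric values are unchanged */
  format date mmddyy10. amount dollar10.2;
run;
```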

Specifying a New Variable in a LENGTH Statement

In practical situations, the length of a new variable often needs to be defined explicitly. For example, when we assign two character values in succession to a variable, SAS sets the length of the variable to that of the first value. If the second value is longer than the first, SAS stores it only up to the length of the first value. So it is good programming practice to declare the variable with a LENGTH statement, so that we are sure it can hold every value the data contains.


You can use the LENGTH statement to create a variable and set the length of the variable. Let us modify our earlier example:

Data var_test;
length id $ 10;
length NProducts 4;
id ='JK';
NProducts= 6;
pro_price = 4.55;
tot_cost = NProducts*pro_price;
final_price = tot_cost;
run;
proc contents data =var_test;run;

Output:

Alphabetic List of Variables and Attributes

  #    Variable       Type    Len    Pos
  ______________________________________
  2    NProducts      Num       4     24
  5    final_price    Num       8     16
  1    id             Char     10     28
  3    pro_price      Num       8      0
  4    tot_cost       Num       8      8
 
 

The output shows that id is now a 10-character field, so it can hold longer values, and that NProducts is stored in 4 bytes. Without the explicit declaration, id could hold only two characters, the length SAS assigned automatically.

For character variables, you must allow for the longest possible value in the first statement that uses the variable, because you cannot change the length with a subsequent LENGTH statement within the same DATA step. The maximum length of any character variable in the SAS System is 32,767 bytes. For numeric variables, you can change the length of the variable by using a subsequent LENGTH statement.
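The difference between numeric and character variables can be sketched as below. The dataset name len_demo and the variables n and c are hypothetical, and the expected lengths in the comments follow from the rule just described, not from a captured run.

```sas
data len_demo;
  length n 8;
  length n 4;        /* numeric: the later LENGTH statement is expected to take effect */
  n = 6;
  c = 'AB';          /* first use fixes c as character with length 2 */
  length c $ 10;     /* too late for a character variable: SAS writes a warning to the log */
run;

/* Check the Len column: n is expected to show 4 and c to stay at 2 */
proc contents data=len_demo;
run;
```

To avoid the problem, place the LENGTH statement for a character variable before any statement that uses the variable.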

Creating variables through PROC SQL and PROC IMPORT

We often extract data from data warehouses or import data using the SAS import utilities, and the resulting dataset is created with columns in various formats. During this process SAS identifies the most suitable format for each database field being extracted and applies it to the dataset. PROC SQL provides flexibility in formatting the variables. This is discussed separately in the PROC SQL chapter.
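As a brief preview, a variable can be created in PROC SQL by naming an expression with AS, optionally with column modifiers such as FORMAT= and LABEL=. This is a minimal sketch, assuming the var_test dataset from earlier in this chapter. The table name var_sql and the label text are made up for the illustration.

```sas
proc sql;
  create table var_sql as
  select id,
         /* new numeric column with a display format and a label */
         NProducts * pro_price as tot_cost format=8.2 label='Total cost'
  from var_test;
quit;

/* Inspect the type, length, format and label of the new column */
proc contents data=var_sql;
run;
```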

 

Copyright free public information. All trademarks,service marks, logos and names are properties of their respective owners.