Arrays, Labels, Formats and Functions

Objectives of this Chapter:

  • Learn how to use KEEP and DROP Statements
  • Learn ARRAY Variables
  • Learn how to LABEL and FORMAT data.

KEEP and DROP statements.

KEEP and DROP statements are used often to control the number of variables (fields) read into and output into the datasets.  During the data processing we create several variables but need to save only select ones in the final dataset.

If you want to restrict the number of columns in output data set, use the following method. This will ensure that output dataset is created with required variables only.


Data target_data (keep = var1 var2 var3 etc);
Set base_data;
Run;

Alternately you can specify the first statement as follows:


Data target_data ;
Set base_data;
keep = var1 var2 var3 ;
Run;

If you are reading a big dataset into SAS and require only a few variables from it,  use the following statements in the program.

Data target_data ;
Set base_data((keep = var1 var2 var3 etc);
Run;


In the first case, SAS reads the entire data set base_data, even though you only intend to use three variables. In the second case, SAS reads from disk only the three variables you intend to keep. Please note that we have to use such efficient methods to restrict the data read into the system   to optimize the system resources such as SASWORK and shared drives.

The same way DROP statement can also be specified based on the data requirement of the user.

Array Variables 

Unlike other programming languages, SAS array variable is a set of similar variables grouped together   with a  name in an ARRAY statement. We use arrays very often when we work with datasets that are arranged as a Time Series. Let us look at simple program to understand.

Data array_test;
Set weight_data;
array weight{50} wt1-wt50;
do i=1 to 50;
if weight{i}=999 then weight{i}= .;
end;
run;


As we have seen earlier one of the basic use of array is to group similar variables . In the above example ‘weight_data has weight measured at fifty different growth stages and wherever data is missing that data point is updated with 999. Assume it’s a very large data set and we want to replace all 999 with a ‘.’ for missing. If we are to do this data step, we will have to write 50 statements ( if wt1 =999 then wt1=. ; like that for each column)  . Using array variable we can simplify this.

Line 3:  Here ‘array’ is the key word that tells SAS the following name (weight) is an array. The closed bracket {50} specifies the number of elements in the array followed by the values of array elements*wt1, wt2, wt3….wt50). For example weight{40} = ‘wt40’

Line 4: Do loop 1 to 50  to hold the record to scan through 50 different fields specified.

Line 5:  Remember weight{1} will have a value wt1 and that field will be evaluated in the following IF condition to check whether it has a value of 999.  Similarly when i=2, weight {2} will have a value of wt2. Like this 50 times the loop is executed for each row of data and the update is made for fifty fields (columns)

Line:6 Ending the loop .

End of the execution, all the fields are evaluated and updated with ‘.’ for 999.

Now think about substituting fields with missing. Values with some other value.

Time Series Example

Now let us look at an example to know the way array is used in our environment (This program does not run as it requires some datasets)

Data Balance_due;
set TS_49;

array st_month {6} stmonth15 stmonth25 stmonth35 stmonth45 stmonth55;
array bal_due {6} baldue15 baldue25 baldue35 baldue45 baldue55 ;

do i=1 to 5;
Statement_month =st_month (i);
balance_due = bal_due(i);
output;
end;
run;

Proc summary data = Balance_due;
class Statement_month;
var balance_due ;

run;

The above program reads a dataset (TS_49) where the variables are arranged in a time series format. (For example, baldue15 stands for balance due for the month of Jan 2005 and so on)  The user wants to see the balance due for all accounts  by statement month  for 5 specific months , using SAS ARRAY (s)he  could achieve that easily. Further, (s)he could use PROC summary to do the summarization as data is now in a format that PROC Summary understands.


No let us look at the ARRAY statement.

array st_month {5} stmonth15 stmonth25 stmonth35 stmonth45 stmonth55

Here ‘array’ is the key word that tells SAS the following name (st_month) is an array. The closed bracket (5) specifies the number of elements in the array followed by the values of array elements. For example st_month(5) = ‘stmonth55’

In other words, ARRAY help us to group a set of variables so that programming could be made short and flexible. If we need to do the same action repetitively on same group of variables, declaring an array would solve the same.

Variable LABEL s

A SAS Label describes a variable. When labels are assigned in the data step they are available for all procedures that use that data set. When we produce reports we could print labels instead of variable names.
Let us modify our earlier example:


Data var_test;
length id $ 10;
length NProducts 4;
id ='JK';
NProducts= 6;
pro_price = 4.55;
tot_cost = NProducts*pro_price;
final_price = tot_cost;

LABEL       id ="Vendor Name"
            NProducts ="Number of Products"    
            pro_price = "Product Price"
            tot_cost ="Total Cost"
            final_price ="Final Price"
            ;
run;
proc contents data =var_test;run;

Let us look at the output of PROC Contents:


   -----Alphabetic List of Variables and Attributes-----

  #    Variable       Type      Len    Pos    Label
  -------------------------------------------------------------------
  2    NProducts     Num      4        24      Number of Products
  5    final_price     Num      8        16      Final Price

  1    id             Char     10      28      Vendor Name

  3    pro_price      Num      8        0        Product Price
  4    tot_cost        Num      8        8        Total Cost

The PROC output now show ‘Label’, which gives more information about the variable.
When you use PROC PRINT or some other Procedures you can print the labels instead of variable names. For example the code below will print labels for the variables in var_test. 


Proc print data = var_test label;
run;
Additionally, the labels can be created with PROC FORMAT procedure, which is not discussed in this guide.

 

Variable FORMAT s

A format is an instruction that SAS uses to write data values. You use formats to control the written appearance of data values. Note that a format does not change the original value of a variable. Since formats are primarily used to format output, we will look at how we can use existing SAS internal formats using the FORMAT statement in PROCs and Data steps.

 
The following example shows how to use a format statement in a data step.

data format_test;
num_test = 1250;
today =today();
dollar_amt =13400.5;
Name = 'J KURIAN';
format num_test words40. name $reverj7.   dollar_amt dollar10.2  today monyy7.;
;
run;
proc print data = format_test;run;

The above example  formats four variables – num_test with words40., name with reverj7., dollar_amt with dollar10.2 and today with monyy7. formats. Please have a look at the following table to understand what each format does to the display of the variable. The below list is not exhaustive- Do consult the SAS manual for a complete list of formats.

Syntax for the format statement is:
Format <variable name> <format name> ;

There are three categories of formats. Character, date and time and numeric. Lists of frequently used formats are provided below.


Category

Format

Description

Character

$CHARw.

Writes standard character data


$QUOTEw.

Writes data values that are enclosed in double quotation marks


$REVERJw.

Writes character data in reverse order and preserves blanks


$UPCASEw.

Converts character data to uppercase


$w.

Writes standard character data

Date and Time

DATEw.

Writes date values in the form ddmmmyy or ddmmmyyyy


DATETIMEw.d

Writes datetime values in the form ddmmmyy:hh:mm:ss.ss


DAYw.

Writes date values as the day of the month


DDMMYYw.

Writes date values in the form ddmmyy or ddmmyyyy


DDMMYYxw.

Writes date values in the form ddmmyy or ddmmyyyy with a specified separator


MMYYxw.

Writes date values as the month and the year and separates them with a character


MONNAMEw.

Writes date values as the name of the month


MONTHw.

Writes date values as the month of the year


MONYYw.

Writes date values as the month and the year in the form mmmyy or mmmyyyy


QTRw.

Writes date values as the quarter of the year


QTRRw.

Writes date values as the quarter of the year in Roman numerals


WEEKDATEw.

Writes date values as the day of the week and the date in the form day-of-week, month-name dd, yy (or yyyy)


WEEKDAYw.

Writes date values as the day of the week


WORDDATEw.

Writes date values as the name of the month, the day, and the year in the form month-name dd, yyyy


YEARw.

Writes date values as the year


YYMMxw.

Writes date values as the year and month and separates them with a character

Numeric

BESTw.

SAS chooses the best notation


COMMAw.d

Writes numeric values with commas and decimal points


COMMAXw.d

Writes numeric values with periods and commas


DOLLARw.d

Writes numeric values with dollar signs, commas, and decimal points


FLOATw.d

Generates a native single-precision, floating-point value by multiplying a number by 10 raised to the dth power


NUMXw.d

Writes numeric values with a comma in place of the decimal point


SSNw.

Writes Social Security numbers


w.d

Writes standard numeric data one digit per byte


WORDFw.

Writes numeric values as words with fractions that are shown numerically


WORDSw.

Writes numeric values as words

 

 

 

 

 
Copyright free public information. All trademarks,service marks, logos and names are properties of their respective owners.