The core of my question is - can I use PROC SQL, PROC DATASETS or some other command to directly modify meta data of one data set, based on data in another dataset? If yes, can you tell me more about what code to use or what search terms to use in trying to figure it out?

Here is the background and detail -I am receiving research results data exports (Excel) from a research partner regularly, and they have also provided me a codebook in Excel. The results data files always contain the same fields, but their data tables need a lot of cleaning. I need to develop a process to transform the results data into a usable SAS dataset, including labeled variables and a syntax file that will create and apply user-defined formats. I want to do this as syntax-efficiently as possible. I would ideally like to ask SAS to change the metadata/variable attributes (the label, specifically) in the results dataset, based on the Variable Name * Variable Label combos found in the codebook dataset. One way of doing this, as I see it, is to import both Excel files into SAS and then use a Concatenate function to write syntax in the form of a new character variable, and then use the text of that new variable in the data step of syntax run on the results dataset. Specifically:

*Use concatenate function to create variable containing syntax text;data codebook_edit;set codebook;format Syntax_Label $2000.;Syntax_Label=cat("label ", trim(Field_Name), " = '", trim(Field_Label), "';");run;

This then produces a character variable with rows that looks like this:

label race = 'Race Ethnicity';label sex = 'Sex assigned at birth';label gender = 'Gender identity';

I would then run this:

*Get syntax to label vars;proc print data=codebook_edit noobs;var Syntax_Label;run;*Apply labels;data results_edit;set results;label race = 'Race Ethnicity';label sex = 'Sex assigned at birth';label gender = 'Gender identity';run;

However, I'd like to know if there is a better way of doing this, since I have hundreds of variables and would rather not create such a long syntax file. For example, can I use PROC SQL or PROC DATASETS or some other command to directly modify meta data in the research results data set, based on data from the codebook and if yes, can you offer any tips? Thank you

3

Best Answer


Use the metadata to write the code. So if you have metadata dataset is named CODEBOOK and has variables named FIELD_NAME and FIELD_LABEL then you can write the LABEL statement like this:

filename code temp;data _null_; set codebook end=eof;file code;if _n_=1 then put 'label';put @3 field_name '=' field_label :$quote. ;if eof then put ';' ;end;

You can then use %INCLUDE to add that label statement where you want to run it. For example as part of a PROC DATASETS step. So if the dataset you want add the labels to is called MYDATA in the MYLIB library then run this PROC step.

proc datasets lib=mylib nolist;modify mydata ;%include code / source2;run;quit;

You can use similar code generation for other changes you might want to make.

You did well in generated variable Syntax_Label, which stores an excutable SAS statment. Now, it is time for the expression inside the variable to be parsed by SAS. I recommand you the call execute() routine.

call execute() resolves the argument, and issues the resolved value for execution at the next step boundary. And most importantly for you, this routine allows string or character variable as its arguments.
Let's see an example.

data test;do gender='F','M';output;end;run;data _null_;set test;call execute(cat('data class_',gender,';set sashelp.class;where sex="',gender,'";run;'));run;

With running this code, SAS doing three things:

  1. Generate a dataset named test with 2 rows, the 1st row of variable gender is F and the 2nd row is M;
  2. In data _null_, SAS parses argument of call execute, and stores the result of parsing in memory, then over the data _null_ setp;
  3. SAS start execute the parsing result, which is:
data class_F;set sashelp.class;where sex="F";run;data class_M;set sashelp.class;where sex="M";run;

Now you can see data class_F and class_M are lying in work library.
Even more surprising, you can use call execute conditionally because it is a routine, this may help you design more complex and flexible program.

As for you question, I think it is easy now for you to get the following solution:

data codebook_edit;set codebook end=eof;format Syntax_Label $2000.;Syntax_Label=cat(trim(Field_Name), " = '", trim(Field_Label), "'");if _n_=1 then call execute('data _null_; set codebook_edit; label ');call execute(Syntax_Label);if eof then call execute('; run;');run;

I also move your keyword label into the 1st call execute to make code clean.

And an alternative approach entirely.Instead of generating the multiple label statements create a data set that has the old label and new label. Read those into a macro variable and then apply them.

*Create label data set;data label_data_set;length name label $25.;name="Sex"; label="Gender"; output;name="height"; label="Height (in)"; output;name="weight"; label="Weight (lbs)"; output;run;*Create sample dataset to apply label;data class;set sashelp.class;run;*Create macro variable that holds label statement;proc sql noprint;select catx("=", name, quote(trim(label)))into :label_list separated by " "from label_data_set;quit;*Display macro variable in log;%put &label_list.;*Apply labels without recreating dataset;proc datasets library=work;modify class;label &label_list.;run;quit;*Print the dataset to display new labels;proc print data=class label noobs;run;

You can use a similar methodology for renaming or applying formats.And applying these using PROC DATASETS is more efficient than a new data step as it doesn't recreate the data, just updates the metadata.