One-to-many merge in SAS (with mismatching records)

Question

I'm trying to do a merge in SAS which seems like it should be easy, but thus far haven't found anything that points to how to do it. I have two datasets - let's call them Monthly1 and Quarterly2. Monthly1 is the main one, and I want to merge Quarterly2 onto it. The problem is, the latter has a different periodicity, as the name implies. Here is what each looks like:

MONTHLY1:

id,month,year,qname,data1,data2,data3,data41111,01,2018,First Quarter 2018,27,33,551111,02,2018,First Quarter 2018,28,34,541111,03,2018,First Quarter 2018,28,37,511111,04,2018,Second Quarter 2018,28,30,591112,01,2018,First Quarter 2018,1,7,121112,02,2018,First Quarter 2018,2,7,101112,03,2018,First Quarter 2018,5,6,71112,04,2018,Second Quarter 2018,4,1,8

QUARTERLY2:

id,qname,data4,data51111,First Quarter 2018,53,71111,Second Quarter 2018,58,91112,First Quarter 2018,7,71112,Second Quarter 2018,7,9

Merging on id and qname. So for each month in MONTHLY1, data 4 is overwritten using the value in QUARTERLY2, and data 5 is added as a new variable. The final dataset should look like (I've added a * where data from QUARTERLY2 is either added or overwritten):

MERGED3:

id,month,year,qname,data1,data2,data3,data4,data51111,01,2018,First Quarter 2018,27,33,53*,7*1111,02,2018,First Quarter 2018,28,34,53*,7*1111,03,2018,First Quarter 2018,28,37,53*,7*1111,04,2018,Second Quarter 2018,28,30,58*,9*1112,01,2018,First Quarter 2018,1,7,7*,7*1112,02,2018,First Quarter 2018,2,7,7*,7*1112,03,2018,First Quarter 2018,5,6,7*,7*1112,04,2018,Second Quarter 2018,4,1,7*,9*

Thanks!

Best Answer

Because SAS will overwrite with same named columns in a merge but only for first matches, consider renaming during the merge. Then, keep the columns you need. Also below demonstrates a left join merge (keeps all rows in Montlhy1 regardless if it matches or not with Quarterly1):

proc sort data=Monthly1; by id qname; run;proc sort data=Quarterly2; by id qname; run;data merged3;merge Monthly1(in=x RENAME=(data4=data4_x)) Quarterly2(in=y);by id qname;if x;keep id month year qname data1-data5; run;

Alternatively, consider a left join merge with proc sql and explicitly keep the columns you intend designated by table aliases. Again, left join is used:

proc sql;create table merged4 asselect m.id, m.month, m.year, m.qname, m.data1, m.data2, m.data3, q.data4, q.data5from Monthly1 mleft join Quarterly2 qon m.id = q.id AND m.qname = q.qname;quit;

Accepted Answer

Usage Note 48705: A one-to-many merge with common variables that are not the BY variables will have values from the many data set after the first observation, states:

In a one-to-many merge with common variables that are not the BYvariables, customers sometimes expect that the value for the commonvariable from the "one" data set will be retained throughout the BYgroup if the "one" variable is listed second in the MERGE statement.It is correct that in a one-to-one merge, and for the first matchingobservation in a one-to-many merge, the value of a common variable inthe latter data set(s) overwrites the value from the previous dataset. However, on subsequent iterations of the MERGE statement for thesame BY group, the "one" data set is not read again. Therefore, theresulting output contains the value of the variable from the "many"data set.

You can change the behavior by renaming a non-by variable so it is not in common -- the new non-common variable value will be maintain across the subsequent many by matches and can be applied.

For example (your data)

Monthly (the many)

data have_monthly;infile cards dlm=',';length id month year 8 qname $30; inputid month year qname data1 data2 data3; data4=_n_*1000; datalines;1111,01,2018,First Quarter 2018,27,33,551111,02,2018,First Quarter 2018,28,34,541111,03,2018,First Quarter 2018,28,37,511111,04,2018,Second Quarter 2018,28,30,591112,01,2018,First Quarter 2018,1,7,121112,02,2018,First Quarter 2018,2,7,101112,03,2018,First Quarter 2018,5,6,71112,04,2018,Second Quarter 2018,4,1,8run;

Quarterly (the one)

data have_quarterly;infile cards dlm=',';length id 8 qname $30;input id qname data4 data5; datalines;1111,First Quarter 2018,53,71111,Second Quarter 2018,58,91112,First Quarter 2018,7,71112,Second Quarter 2018,7,9run;

Default merge

data try (label="The quarter data4 values are not propagated");mergehave_monthlyhave_quarterly;byidqname;run;

Same merge with a rename to ensure non-commonality in non-by variable data4

data want(label="The quarter data4 values are propagated");mergehave_monthlyhave_quarterly (rename=data4=_data4);byidqname;data4 = _data4;run;

One-to-many merge in SAS (with mismatching records)

Best Answer

Random Posts