I am trying to create a normalized pandas dataframe that holds addresses alongside their parsed components, using the 'usaddress' package in Python. I would like to store the parsed output in a dataframe.

The output of usaddress.parse looks like this:

    usaddress.parse('Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637')

    [('Robie', 'BuildingName'),
     ('House,', 'BuildingName'),
     ('5757', 'AddressNumber'),
     ('South', 'StreetNamePreDirectional'),
     ('Woodlawn', 'StreetName'),
     ('Avenue,', 'StreetNamePostType'),
     ('Chicago,', 'PlaceName'),
     ('IL', 'StateName'),
     ('60637', 'ZipCode')]

My addresses are in the 'address' column of the data dataframe. Using the above example, I am trying to use the labels (BuildingName, AddressNumber, etc.) as column names and the corresponding tokens as the values, but have had no luck so far.

    import pandas as pd
    import usaddress

    add = []
    for ind in data.index:
        add1 = usaddress.parse(data['address'][ind])
        add.append(add1)
    res = pd.DataFrame(add)

With the above code, the res dataframe does not come out the way I intended. The intended output is:

(Image: the intended output from the dataframe)

Best Answer


If you have a list of addresses, you can process them all into a dataframe that uses the address part labels as column names. Sample code:

    import pandas as pd
    import usaddress

    addresslist = ["Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637",
                   "123 main st apt 2j miami fl"]
    addressdictlist = []
    for address in addresslist:
        addressdict = {}
        parsed = usaddress.parse(address)
        for value, key in parsed:
            # Drop the commas that usaddress leaves attached to tokens
            value = value.strip(",")
            if addressdict.get(key, "") == "":
                addressdict[key] = value
            else:
                # Same label seen again: append to the existing value
                addressdict[key] = addressdict[key] + " " + value
        addressdictlist.append(addressdict)
    addressdf = pd.DataFrame.from_dict(addressdictlist)
    addressdf

And the output looks like this:

(Image: screenshot of addressdf)

I took the liberty of stripping the commas from the address parts, but you could do that in pre-processing as well.
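
If it helps to see the merging step on its own, here is a minimal sketch that works directly on the (token, label) pairs shown in the question, so it runs without usaddress or pandas installed. The helper name merge_tokens is my own, not part of usaddress, and the sample list is copied from the output above.

```python
from collections import defaultdict

# Sample (token, label) pairs copied from the usaddress.parse output in the question.
parsed = [
    ('Robie', 'BuildingName'), ('House,', 'BuildingName'),
    ('5757', 'AddressNumber'), ('South', 'StreetNamePreDirectional'),
    ('Woodlawn', 'StreetName'), ('Avenue,', 'StreetNamePostType'),
    ('Chicago,', 'PlaceName'), ('IL', 'StateName'), ('60637', 'ZipCode'),
]

def merge_tokens(pairs):
    """Hypothetical helper: join tokens that share a label into one string per label."""
    grouped = defaultdict(list)
    for token, label in pairs:
        # Strip the commas that stay attached to tokens
        grouped[label].append(token.strip(","))
    return {label: " ".join(tokens) for label, tokens in grouped.items()}

row = merge_tokens(parsed)
print(row)
```

Each dict produced this way corresponds to one row of the final dataframe. For a dataframe column like the one in the question, something along the lines of pd.DataFrame([merge_tokens(usaddress.parse(a)) for a in data['address']]) should give one column per label, with NaN wherever an address has no token for that label.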