I want to compare these two xml files:

File1.xml:

<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" /><results><gastro_prelim_st reason="not novel" success="false"><type st="1364" /><type st="9999" /></gastro_prelim_st></results></ngs_sample>

File2.xml:

<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" /><results><gastro_prelim_st reason="not novel" success="false"><type st="1364" /></gastro_prelim_st></results></ngs_sample>

I've used xmldiff to compare a.xml with b.xml:

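from xmldiff import main, formatting

def compare_xmls(observed, expected):
    # DiffFormatter produces the compact "[action, xpath]" edit-script output
    formatter = formatting.DiffFormatter()
    diff = main.diff_files(observed, expected, formatter=formatter)
    return diff

# File names must be passed as strings
out = compare_xmls("a.xml", "b.xml")
print(out)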

OUTPUT:

[delete, /ngs_sample/results/gastro_prelim_st/type[2]]

Does anyone know how to identify what the actual difference between the two XML files is, i.e. what has been deleted compared to b.xml? Can anyone recommend another way of comparing XML files in Python?

Best Answer


You can use xmldiff to perform this exact task.

main.py

from xmldiff import main

diff = main.diff_files("file1.xml", "file2.xml")
print(diff)

output

[DeleteNode(node='/ngs_sample/results/gastro_prelim_st/type[2]')]
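The edit script only names the XPath of the deleted node. To see what that node actually contains, one option is to look the path up in the first file. The sketch below uses only the standard library; `resolve` is a hypothetical helper, the XML is inlined as a string so the example is self-contained, and it assumes the reported path starts with the root tag and uses only simple `tag[n]` steps (which `xml.etree.ElementTree` supports).

```python
import xml.etree.ElementTree as ET

# Contents of File1.xml from the question, inlined for a self-contained example
FILE1 = (
    '<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" />'
    '<results><gastro_prelim_st reason="not novel" success="false">'
    '<type st="1364" /><type st="9999" /></gastro_prelim_st></results></ngs_sample>'
)

def resolve(root, xpath):
    """Hypothetical helper: map an absolute path like the one in the
    edit script onto an ElementTree lookup relative to the root element."""
    parts = xpath.strip("/").split("/")
    if parts[0] != root.tag:
        raise ValueError(f"path does not start at root <{root.tag}>")
    relative = "/".join(parts[1:])
    return root if not relative else root.find("./" + relative)

root = ET.fromstring(FILE1)
node = resolve(root, "/ngs_sample/results/gastro_prelim_st/type[2]")

# Show the element that the second file no longer has
print(node.tag, node.attrib)
```

With lxml installed, the same lookup should be possible without stripping the root segment, since lxml accepts absolute XPath expressions; that variant is not shown here.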

You can switch to the XMLFormatter and manually filter out the results:

...
# Change formatter:
formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)
...
# after `out` has been retrieved:
import re

for i in out.splitlines():
    if re.search(r'\bdiff:\w+', i):
        print(i)

# Result:
# <type st="9999" diff:delete=""/>

Another option is to use xml2 (https://github.com/clone/xml2) together with something like bash process substitution:

$ diff --color <(xml2 < File1.xml) <(xml2 < File2.xml)

7,8d6
< /ngs_sample/results/gastro_prelim_st/type
< /ngs_sample/results/gastro_prelim_st/type/@st=9999
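If installing xml2 is not an option, roughly the same idea can be sketched in pure Python: flatten each document into one line per element and attribute, then diff the two lists with `difflib`. The `flatten` function below is a hypothetical helper that only loosely mimics xml2's output format (it is not a reimplementation of that tool), and the two documents are inlined as strings to keep the example self-contained.

```python
import difflib
import xml.etree.ElementTree as ET

FILE1 = (
    '<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" />'
    '<results><gastro_prelim_st reason="not novel" success="false">'
    '<type st="1364" /><type st="9999" /></gastro_prelim_st></results></ngs_sample>'
)
FILE2 = (
    '<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" />'
    '<results><gastro_prelim_st reason="not novel" success="false">'
    '<type st="1364" /></gastro_prelim_st></results></ngs_sample>'
)

def flatten(elem, prefix=""):
    """Hypothetical helper: one line per element, attribute and text node."""
    path = f"{prefix}/{elem.tag}"
    lines = [path]
    for name, value in elem.attrib.items():
        lines.append(f"{path}/@{name}={value}")
    text = (elem.text or "").strip()
    if text:
        lines.append(f"{path}={text}")
    for child in elem:
        lines.extend(flatten(child, path))
    return lines

a = flatten(ET.fromstring(FILE1))
b = flatten(ET.fromstring(FILE2))

# Keep only the lines that were removed from ("- ") or added to ("+ ") the first file
changes = [line for line in difflib.ndiff(a, b) if line.startswith(("- ", "+ "))]
for line in changes:
    print(line)
```

For the two files in the question this reports the second `type` element and its `st` attribute as removed, which matches what the `diff`/xml2 pipeline shows.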