I want to compare these two XML files:
File1.xml:
<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" /><results><gastro_prelim_st reason="not novel" success="false"><type st="1364" /><type st="9999" /></gastro_prelim_st></results></ngs_sample>
File2.xml:
<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" /><results><gastro_prelim_st reason="not novel" success="false"><type st="1364" /></gastro_prelim_st></results></ngs_sample>
I've used xmldiff to compare a.xml with b.xml:
from xmldiff import main, formatting

def compare_xmls(observed, expected):
    formatter = formatting.DiffFormatter()
    diff = main.diff_files(observed, expected, formatter=formatter)
    return diff

out = compare_xmls("a.xml", "b.xml")
print(out)
OUTPUT:
[delete, /ngs_sample/results/gastro_prelim_st/type[2]]
Does anyone know how to identify what the difference between the two XML files actually is, i.e. what has been deleted compared to b.xml? Can anyone recommend another way of comparing XML files in Python?
Best Answer
xmldiff can do exactly this. Called without a formatter, main.diff_files returns the list of edit actions:
main.py
from xmldiff import main

diff = main.diff_files("File1.xml", "File2.xml")
print(diff)
output
[DeleteNode(node='/ngs_sample/results/gastro_prelim_st/type[2]')]
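The action only tells you where the change is, not what the removed element contained. For a delete, the path points into the first (left) document, so one way to see the deleted content is to look that path up in File1.xml. Below is a minimal standard-library sketch; resolve_xmldiff_path is a hypothetical helper, it assumes namespace-free paths like the one printed above, and File1.xml is inlined as a string so the example runs on its own.

```python
import xml.etree.ElementTree as ET

# Contents of File1.xml from the question, inlined for a self-contained example
FILE1 = ('<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" />'
         '<results><gastro_prelim_st reason="not novel" success="false">'
         '<type st="1364" /><type st="9999" /></gastro_prelim_st></results></ngs_sample>')


def resolve_xmldiff_path(xml_text, path):
    """Return the element an absolute, namespace-free xmldiff path points to, or None."""
    root = ET.fromstring(xml_text)
    # "/ngs_sample/results/gastro_prelim_st/type[2]"
    #   -> head "ngs_sample", rest "results/gastro_prelim_st/type[2]"
    head, _, rest = path.lstrip("/").partition("/")
    if head.split("[")[0] != root.tag:
        return None  # the path does not start at this document's root element
    # ElementTree's XPath subset understands tag[position] steps (1-based, as in XPath)
    return root.find(rest) if rest else root


deleted = resolve_xmldiff_path(FILE1, "/ngs_sample/results/gastro_prelim_st/type[2]")
print(ET.tostring(deleted, encoding="unicode"))
# <type st="9999" />
```

Since xmldiff itself depends on lxml, etree.parse("File1.xml").xpath(path) is an alternative with full XPath support. As far as I can tell, xmldiff's actions are meant to be applied in sequence, so with several actions a later path may refer to the tree as already modified by earlier ones; this simple lookup is only safe for the first action or for independent ones.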
Alternatively, you can switch to the XMLFormatter and filter the results manually:
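import re
from xmldiff import main, formatting

def compare_xmls(observed, expected):
    # Change formatter: XMLFormatter marks changes inline with diff:* markup
    formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)
    return main.diff_files(observed, expected, formatter=formatter)

out = compare_xmls("a.xml", "b.xml")

# Keep only the lines that carry a diff:* marker
for line in out.splitlines():
    if re.search(r'\bdiff:\w+', line):
        print(line.strip())

# Result:
# <type st="9999" diff:delete=""/>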
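Filtering line by line depends on the pretty-printed layout of the output. A possibly sturdier variant parses the formatter's output with ElementTree and reports every element that carries a diff:* attribute. This sketch assumes the diff prefix is bound to http://namespaces.shoobx.com/diff (to my knowledge the namespace xmldiff's formatter declares); marked_elements is a hypothetical name, and the sample string is hand-written to resemble the output rather than captured from a run. It does not inspect any diff:* elements the formatter may emit for text changes.

```python
import xml.etree.ElementTree as ET

# Namespace assumed to be bound to the "diff" prefix in XMLFormatter output
DIFF_NS = "http://namespaces.shoobx.com/diff"


def marked_elements(formatted_xml):
    """Return (tag, diff markers, other attributes) for each element with a diff:* attribute."""
    prefix = "{" + DIFF_NS + "}"
    found = []
    for elem in ET.fromstring(formatted_xml).iter():
        markers = [key[len(prefix):] for key in elem.attrib if key.startswith(prefix)]
        if markers:
            plain = {k: v for k, v in elem.attrib.items() if not k.startswith(prefix)}
            found.append((elem.tag, markers, plain))
    return found


# Hand-written sample shaped like the formatter output shown above
sample = ('<ngs_sample xmlns:diff="http://namespaces.shoobx.com/diff" id="40332">'
          '<results><gastro_prelim_st reason="not novel" success="false">'
          '<type st="1364"/><type st="9999" diff:delete=""/>'
          '</gastro_prelim_st></results></ngs_sample>')

print(marked_elements(sample))
# [('type', ['delete'], {'st': '9999'})]
```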
Another option is to use xml2 (https://github.com/clone/xml2) together with something like bash process substitution:
$ diff --color <(xml2 < File1.xml) <(xml2 < File2.xml)
7,8d6
< /ngs_sample/results/gastro_prelim_st/type
< /ngs_sample/results/gastro_prelim_st/type/@st=9999
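The same flatten-then-diff idea can be done in plain Python, which also covers the "any other way in Python" part of the question. This is a sketch that only approximates xml2's format (one line per element, attribute and text node; the real tool's output differs in details), ignores namespaces and tail text, and uses difflib for the line diff; flatten and xml_diff are hypothetical names.

```python
import difflib
import xml.etree.ElementTree as ET

# File1.xml and File2.xml from the question, inlined for a self-contained example
FILE1 = ('<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" />'
         '<results><gastro_prelim_st reason="not novel" success="false">'
         '<type st="1364" /><type st="9999" /></gastro_prelim_st></results></ngs_sample>')
FILE2 = ('<ngs_sample id="40332"><workflow value="salmonella" version="101_provisional" />'
         '<results><gastro_prelim_st reason="not novel" success="false">'
         '<type st="1364" /></gastro_prelim_st></results></ngs_sample>')


def flatten(xml_text):
    """Turn a document into xml2-like lines: one per element, attribute and text node."""
    lines = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        lines.append(path)  # marks where each element starts, so repeated siblings stay distinct
        for name, value in elem.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        if elem.text and elem.text.strip():
            lines.append(f"{path}={elem.text.strip()}")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines


def xml_diff(left, right):
    """Return only the removed (-) and added (+) lines of a diff of the flattened documents."""
    diff = difflib.unified_diff(flatten(left), flatten(right), "left", "right", lineterm="")
    return [line for line in diff
            if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))]


for line in xml_diff(FILE1, FILE2):
    print(line)
# -/ngs_sample/results/gastro_prelim_st/type
# -/ngs_sample/results/gastro_prelim_st/type/@st=9999
```

For files on disk, pass Path("File1.xml").read_text() (from pathlib) instead of the inline strings. Because a line diff is order-sensitive, reordered but otherwise identical siblings show up as changes; xmldiff, which has a MoveNode action, copes with that better.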