I'm trying to execute a jq command from my python script. Currently the jq command is working fine when I execute from the terminal (MacOs).

cat filename.json |jq '{Name:.name, address:.address[0][1].street}'

Basically I'm trying to extract data from a JSON using jq. As the JSON contains nested arrays, I would have to loop using a variable.

My questions is -

  • Can I execute this command from a python script
  • If it can be done, then how would I loop through for the nested array
  • elements in the sample data give above (address[][].street)

I don't want to use any language other than python, as it would cause compatibility issues.

6

Best Answer


From the jq FAQ:

Q: What bindings are available for Python?

A:

pip install jq # For details, see https://pypi.python.org/pypi/jq

pip install pyjq # For details, see https://pypi.python.org/pypi/pyjq

As for your nested array, looping through it sounds like something that can (and maybe should) be done within jq.

I believe the accepted answer should be peak's one, as the proper way to use a C api in python is via a python binding lib, and both https://pypi.python.org/pypi/jq and https://pypi.python.org/pypi/pyjq should work as expected.


That being said, since we are talking about python, I would like to bring an alternative that is much more pythonic: glom (pip install glom, https://glom.readthedocs.io/)

Instead of using a DSL like in jq, with glom you just declare the output in the format you want, using pure python (this output format is called spec). In this case, you want a simple dict:

spec = {'Name': 'name','street': 'address.0.1.street'}

and then just call glom on your data:

output_data = glom(input_data, spec)

Just like jq, you can also use glom on command line:

cat filename.json | glom "{'Name': 'name', 'street': 'address.0.1.street'}"

A complete python example:

import jsonfrom pprint import pprintfrom glom import glomwith open('filename.json', 'rt') as f:input_data = json.load(f)spec = {'Name': 'name','street': 'address.0.1.street'}output_data = glom(input_data, spec)pprint(output_data)

Well, I'm a big fan of jq, but it doesn't seem like you're doing something that can't be easily done in Python too. Consider:

import jsonwith open("filename.json", "r") as f:data = json.load(f){"Name": data["name"], "address": data["address"][0][1]["street"]}

The sh module makes it easy invoke a jq subprocess from python. e.g.

import shcmd = sh.jq('-M', '{Name:.name, address:.address[0][1].street}', 'filename.json')print "cmd returned >>%s<<" % cmd.stdout
  1. Can I execute this command from a python script

Yes, using subprocess. Example:

jsonFile = '/path/to/your/filename.json'jq_cmd = "/bin/jq '{Name:.name, address:.address[0][1].street}' " + jsonFilejq_proc = subprocess.Popen(jq_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)# read JSON object, convert to string, store as a dictionaryjDict = json.loads(jq_proc.stdout.read())jq_proc.stdout.close()
  1. If it can be done, then how would I loop through for the nested arrayelements in the sample data give above (address[][].street)

It would help to see a JSON data set with a few records. For looping through JSON sets in python with jq, it is easy to get a count of the objects and then iterate. A slight bit of overhead but it makes the code easy to understand.

# count number of JSON records from the root leveljq_cmd = "/bin/jq '. | length' " + jsonFilejq_proc = subprocess.Popen(jq_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)jObjCount = int(jq_proc.stdout.read())jq_proc.stdout.close()# iterate over each root level JSON recordfor ix in range(jObjCount):jq_cmd = "jq '. | .[" + str(ix) + "]' " + jsonFile jq_proc = subprocess.Popen(jq_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)# read object, convert to string, store as a python dictionaryjDict = json.loads(jq_proc.stdout.read())# iterate over nested objects within a root level object # as before, count number items but here for each root level JSON objectjq_cmd = "/bin/jq '. | .[" + str(ix) + "].sub_Item_Key | length' " + jsonFilejq_proc = subprocess.Popen(jq_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)jItemCount = int(jq_proc.stdout.read())jq_proc.stdout.close()for jx in range(jItemCount):jq_cmd = "/bin/jq '. | .[" + str(ix) + "].sub_Item_Key[" + str(jx) + "]' " + jsonFilejq_proc = subprocess.Popen(jq_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)# read JSON item, convert to string, store as a python dictionaryjItemDict = json.loads(jq_proc.stdout.read())

Enjoy!

Yes. Using plumbum.

from plumbum.cmd import jq, cat(cat["filename.json"] | jq ["{Name:.name, address:.address[0][1].street}"])()

The result of the above command is a JSON object, that can be parsed to a Python object using json.loads.

You might also be interested in jello, which is like jq but uses Python as query language.