I'm used to bringing data in and out of Python using CSV files, but there are obvious challenges to this. Are there simple ways to store a dictionary (or sets of dictionaries) in a JSON or pickle file?
For example:
data = {}
data['key1'] = "keyinfo"
data['key2'] = "keyinfo2"
I would like to know both how to save this, and then how to load it back in.
Best Answer
Pickle save:
try:
    import cPickle as pickle
except ImportError:  # Python 3.x
    import pickle

with open('data.p', 'wb') as fp:
    pickle.dump(data, fp, protocol=pickle.HIGHEST_PROTOCOL)
See the pickle module documentation for additional information regarding the protocol argument.
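As a quick illustration of the protocol argument (a hypothetical round trip, not part of the original answer): higher protocols are generally more compact and faster, but files written with them cannot be read by older Python versions.

```python
import pickle

data = {'key1': 'keyinfo', 'key2': 'keyinfo2'}

# Serialize with the default protocol and with the highest available one.
default_bytes = pickle.dumps(data)
highest_bytes = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Both round-trip to an equal dictionary.
assert pickle.loads(default_bytes) == data
assert pickle.loads(highest_bytes) == data

# Inspect which protocols your interpreter offers.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```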
Pickle load:
with open('data.p', 'rb') as fp:
    data = pickle.load(fp)
JSON save:
import json

with open('data.json', 'w') as fp:
    json.dump(data, fp)
Supply extra arguments, like sort_keys or indent, to get a pretty result. sort_keys will sort the keys alphabetically, and indent=N will indent your data structure with N spaces.
json.dump(data, fp, sort_keys=True, indent=4)
JSON load:
with open('data.json', 'r') as fp:
    data = json.load(fp)
Minimal example, writing directly to a file:
import json

json.dump(data, open(filename, 'w'))
data = json.load(open(filename))
or safely opening / closing:
import json

with open(filename, 'w') as outfile:
    json.dump(data, outfile)

with open(filename) as infile:
    data = json.load(infile)
If you want to save it in a string instead of a file:
import json

json_str = json.dumps(data)
data = json.loads(json_str)
Also see the faster ujson package:
import ujson

with open('data.json', 'w') as fp:
    ujson.dump(data, fp)
To write to a file:
import json

myfile.write(json.dumps(mydict))
To read from a file:
import json

mydict = json.loads(myfile.read())
myfile is the file object for the file that you stored the dict in.
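The two fragments above assume myfile is already open in the right mode. Put together, a complete round trip might look like this (a minimal sketch; the data.json filename is just an example):

```python
import json

mydict = {'key1': 'keyinfo', 'key2': 'keyinfo2'}

# Write: serialize the dict to a string, then write that string out.
with open('data.json', 'w') as myfile:
    myfile.write(json.dumps(mydict))

# Read: read the whole file back, then deserialize the string.
with open('data.json') as myfile:
    loaded = json.loads(myfile.read())

assert loaded == mydict
```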
If you want an alternative to pickle or json, you can use klepto.
>>> init = {'y': 2, 'x': 1, 'z': 3}
>>> import klepto
>>> cache = klepto.archives.file_archive('memo', init, serialized=False)
>>> cache
{'y': 2, 'x': 1, 'z': 3}
>>>
>>> # Dump the dictionary to the file 'memo.py'
>>> cache.dump()
>>>
>>> # Import it back from 'memo.py'
>>> from memo import memo
>>> print(memo)
{'y': 2, 'x': 1, 'z': 3}
With klepto, if you had used serialized=True, the dictionary would have been written to memo.pkl as a pickled dictionary instead of as clear text.
You can get klepto here: https://github.com/uqfoundation/klepto
dill is probably a better choice for pickling than pickle itself, as dill can serialize almost anything in Python. klepto can also use dill.
You can get dill here: https://github.com/uqfoundation/dill
The additional mumbo-jumbo on the first few lines is because klepto can be configured to store dictionaries to a file, to a directory context, or to a SQL database. The API is the same whichever backend archive you choose. It gives you an "archivable" dictionary with which you can use load and dump to interact with the archive.
If you're after serialization, but won't need the data in other programs, I strongly recommend the shelve module. Think of it as a persistent dictionary.
import shelve

myData = shelve.open('/path/to/file')

# Check for values.
keyVar in myData

# Set values.
myData[anotherKey] = someValue

# Save the data for future use.
myData.close()
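A fuller, self-contained sketch (the mydata filename is just an example): since Python 3.4, a shelf can also be used as a context manager, which closes it automatically, and values can be any picklable object.

```python
import shelve

# Write some values; the with statement closes (and saves) the shelf.
with shelve.open('mydata') as db:   # creates mydata.* files on disk
    db['key1'] = 'keyinfo'
    db['key2'] = [1, 2, 3]          # any picklable value works; keys must be strings

# Reopen later; the data is still there.
with shelve.open('mydata') as db:
    print('key1' in db)             # membership tests work like a dictionary
    print(db['key2'])
```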
For completeness, we should include ConfigParser and configparser which are part of the standard library in Python 2 and 3, respectively. This module reads and writes to a config/ini file and (at least in Python 3) behaves in a lot of ways like a dictionary. It has the added benefit that you can store multiple dictionaries into separate sections of your config/ini file and recall them. Sweet!
Python 2.7.x example.
import ConfigParser

config = ConfigParser.ConfigParser()

dict1 = {'key1': 'keyinfo', 'key2': 'keyinfo2'}
dict2 = {'k1': 'hot', 'k2': 'cross', 'k3': 'buns'}
dict3 = {'x': 1, 'y': 2, 'z': 3}

# Make each dictionary a separate section in the configuration
config.add_section('dict1')
for key in dict1.keys():
    config.set('dict1', key, dict1[key])
config.add_section('dict2')
for key in dict2.keys():
    config.set('dict2', key, dict2[key])
config.add_section('dict3')
for key in dict3.keys():
    config.set('dict3', key, dict3[key])

# Save the configuration to a file
f = open('config.ini', 'w')
config.write(f)
f.close()

# Read the configuration from a file
config2 = ConfigParser.ConfigParser()
config2.read('config.ini')

dictA = {}
for item in config2.items('dict1'):
    dictA[item[0]] = item[1]

dictB = {}
for item in config2.items('dict2'):
    dictB[item[0]] = item[1]

dictC = {}
for item in config2.items('dict3'):
    dictC[item[0]] = item[1]

print(dictA)
print(dictB)
print(dictC)
Python 3.X example.
import configparser

config = configparser.ConfigParser()

dict1 = {'key1': 'keyinfo', 'key2': 'keyinfo2'}
dict2 = {'k1': 'hot', 'k2': 'cross', 'k3': 'buns'}
dict3 = {'x': 1, 'y': 2, 'z': 3}

# Make each dictionary a separate section in the configuration
config['dict1'] = dict1
config['dict2'] = dict2
config['dict3'] = dict3

# Save the configuration to a file
f = open('config.ini', 'w')
config.write(f)
f.close()

# Read the configuration from a file
config2 = configparser.ConfigParser()
config2.read('config.ini')

# ConfigParser objects are a lot like dictionaries, but if you really
# want a dictionary you can ask it to convert a section to a dictionary
dictA = dict(config2['dict1'])
dictB = dict(config2['dict2'])
dictC = dict(config2['dict3'])

print(dictA)
print(dictB)
print(dictC)
Console output
{'key2': 'keyinfo2', 'key1': 'keyinfo'}
{'k1': 'hot', 'k2': 'cross', 'k3': 'buns'}
{'z': '3', 'y': '2', 'x': '1'}
Contents of config.ini
[dict1]
key2 = keyinfo2
key1 = keyinfo

[dict2]
k1 = hot
k2 = cross
k3 = buns

[dict3]
z = 3
y = 2
x = 1
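One caveat worth noting in the console output above: the integers in dict3 came back as strings, because a ConfigParser stores every value as a string. A small sketch of how to convert them back when reading:

```python
import configparser

config = configparser.ConfigParser()
config['dict3'] = {'x': 1, 'y': 2, 'z': 3}

# Everything in a ConfigParser is stored as a string...
section = config['dict3']
print(section['x'])           # '1', not 1

# ...so use the typed getters, or an explicit conversion, when reading back.
print(section.getint('x'))    # 1
restored = {k: int(v) for k, v in section.items()}
print(restored)               # {'x': 1, 'y': 2, 'z': 3}
```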
If saving to a JSON file, the best and easiest way to do it is:
import json

# Note: the dictionary is named 'data' here; don't call it 'dict',
# which shadows the built-in type.
with open("file.json", "wb") as f:
    f.write(json.dumps(data).encode("utf-8"))
My use case was to save multiple JSON objects to a file and marty's answer helped me somewhat. But to serve my use case, the answer was not complete as it would overwrite the old data every time a new entry was saved.
To save multiple entries in a file, one must check for the old content (i.e., read before write). A typical file holding JSON data will have either a list or an object as its root. So I assumed that my JSON file always holds a list of objects, and every time I add data to it, I simply load the list first, append my new data to it, and dump it back to a file handle opened in write-only ('w') mode:
import json

def saveJson(url, sc):  # This function writes the two values to the file
    newdata = {'url': url, 'sc': sc}
    json_path = "db/file.json"

    old_list = []
    with open(json_path) as myfile:  # Read the contents first
        old_list = json.load(myfile)
    old_list.append(newdata)

    with open(json_path, "w") as myfile:  # Overwrite the whole content
        json.dump(old_list, myfile, sort_keys=True, indent=4)
    return "success"
The new JSON file will look something like this:
[
    {
        "sc": "a11",
        "url": "www.google.com"
    },
    {
        "sc": "a12",
        "url": "www.google.com"
    },
    {
        "sc": "a13",
        "url": "www.google.com"
    }
]
NOTE: It is essential to have a file named file.json with [] as its initial content for this approach to work.
PS: not related to the original question, but this approach could be further improved by first checking whether our entry already exists (based on one or more keys) and only then appending and saving the data.
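That improvement could be sketched like this (a hypothetical save_json_unique helper, assuming url acts as the unique key; neither the name nor the key choice comes from the original answer):

```python
import json

def save_json_unique(url, sc, json_path="db/file.json"):
    """Append a new entry only if no existing entry has the same 'url'."""
    new_entry = {'url': url, 'sc': sc}

    with open(json_path) as f:  # The file must already contain a JSON list
        entries = json.load(f)

    # Skip the write entirely if this URL was saved before.
    if any(entry.get('url') == url for entry in entries):
        return "duplicate"

    entries.append(new_entry)
    with open(json_path, "w") as f:
        json.dump(entries, f, sort_keys=True, indent=4)
    return "success"
```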
Shorter code
Save and load all types of Python variables (including dictionaries) with one line of code each.
data = {'key1': 'keyinfo', 'key2': 'keyinfo2'}
saving:
pickle.dump(data, open('path/to/file/data.pickle', 'wb'))
loading:
data_loaded = pickle.load(open('path/to/file/data.pickle', 'rb'))
Maybe it's obvious, but I used the two-line solution in the top answer for quite a while before I tried to make it shorter.