
Reading and Writing JSON to a File in Python

Introduction

In this guide, we’ll take a look at how to read and write JSON data from and to a file in Python, using the json module.

JSON (JavaScript Object Notation) is an extremely popular format for data serialization, given how generally applicable and lightweight it is – while also being fairly human-friendly. Most notably, it’s extensively used in the web development world, where you’ll likely encounter JSON in REST API responses, application configuration files, or even simple data storage.

Given its prevalence, reading and parsing JSON files (or strings) is pretty common, and writing JSON to be sent off is equally common. In this guide, we’ll take a look at how to leverage the json module to read and write JSON in Python.

Writing JSON to a File with Python with json.dump() and json.dumps()

To write JSON contents to a file in Python – we can use json.dump() and json.dumps(). These are separate functions and achieve different results:

  • json.dumps() – Serializes an object into a JSON-formatted string
  • json.dump() – Serializes an object into a JSON stream for saving into files or sockets

Note: The “s” in “dumps” is actually short for “dump string”.
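As a rough illustration of the difference, the sketch below serializes the same dictionary both ways. The record contents, the file name example.json and the temporary directory are assumptions made for this example only.

```python
import json
import os
import tempfile

record = {'name': 'John Doe', 'place': 'Remote'}

# dumps() returns the JSON text as a Python str
as_string = json.dumps(record)

# dump() writes the JSON text to a file-like object and returns None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.json')  # hypothetical file name
    with open(path, 'w') as outfile:
        result = json.dump(record, outfile)
    with open(path) as infile:
        from_file = infile.read()

print(type(as_string).__name__)  # str
print(result)                    # None
print(as_string == from_file)    # True
```

Both calls produce the same JSON text; the only difference is where that text ends up.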

JSON’s natural format is similar to a map in computer science – a map of key-value pairs. In Python, a dictionary is a map implementation, so we’ll naturally be able to represent JSON faithfully through a dict. A dictionary can contain other nested dictionaries, arrays, booleans, or other primitive types like integers and strings.
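To make that mapping concrete, here is a small sketch with a made-up profile dictionary (the field names are illustrative, not part of any real schema). It shows how nested dictionaries, lists, booleans and None come out on the JSON side:

```python
import json

# A dictionary mixing the value types mentioned above (hypothetical data)
profile = {
    'name': 'Jane Doe',
    'age': 30,
    'remote': True,
    'manager': None,
    'skills': ['Python', 'SQL'],
    'address': {'city': 'Lincoln'},
}

encoded = json.dumps(profile)
print(encoded)
# {"name": "Jane Doe", "age": 30, "remote": true, "manager": null, "skills": ["Python", "SQL"], "address": {"city": "Lincoln"}}
```

Note that True becomes true, None becomes null, and the list becomes a JSON array.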

Note: The built-in json package offers several convenience methods that allow us to convert between JSON and dictionaries.

That being said, let’s import the json module, define a dictionary with some data and then convert it into JSON before saving to a file:

import json

data = {
    'employees' : [
        {
            'name' : 'John Doe',
            'department' : 'Marketing',
            'place' : 'Remote'
        },
        {
            'name' : 'Jane Doe',
            'department' : 'Software Engineering',
            'place' : 'Remote'
        },
        {
            'name' : 'Don Joe',
            'department' : 'Software Engineering',
            'place' : 'Office'
        }
    ]
}


json_string = json.dumps(data)
print(json_string)

This results in:

{"employees": [{"name": "John Doe", "department": "Marketing", "place": "Remote"}, {"name": "Jane Doe", "department": "Software Engineering", "place": "Remote"}, {"name": "Don Joe", "department": "Software Engineering", "place": "Office"}]}

Here, we have a simple dictionary with a few employees, each of which has a name, department and place. The dumps() function of the json module dumps a dictionary into JSON contents, and returns a JSON string.

Once serialized, you may decide to send it off to another service that’ll deserialize it, or, say, store it. To store this JSON string into a file, we’ll simply open a file in write mode, and write it down. If you don’t want to extract the data into an independent variable for later use and would just like to dump it into a file, you can skip the dumps() function and pass the dictionary to dump() instead:


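# Option 1: write the already-serialized JSON string
with open('json_data.json', 'w') as outfile:
    outfile.write(json_string)

# Option 2: serialize the dictionary directly into the file
with open('json_data.json', 'w') as outfile:
    json.dump(data, outfile)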

Any file-like object (anything that provides a write() method) can be passed as the second argument of the dump() function, even if it isn’t an actual file. Good examples are an in-memory buffer, or the file object returned by a socket’s makefile() method, which can be written to and closed much like a file.
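A minimal sketch of this idea, assuming an in-memory io.StringIO buffer stands in for the file:

```python
import io
import json

data = {'name': 'Don Joe', 'place': 'Office'}

# io.StringIO is an in-memory text buffer that exposes write(), like a file
buffer = io.StringIO()
json.dump(data, buffer)

contents = buffer.getvalue()
print(contents)  # {"name": "Don Joe", "place": "Office"}
```

This can be handy in tests, where you may want to inspect the serialized output without touching the disk.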

Reading JSON from a File with Python with json.load() and json.loads()

The mapping between dictionary contents and a JSON string is straightforward, so it’s easy to convert between the two. The same logic as with dump() and dumps() is applied to load() and loads(). Much like json.dumps(), the json.loads() function accepts a JSON string and converts it into a dictionary, while json.load() lets you load in a file:

import json

with open('json_data.json') as json_file:
    data = json.load(json_file)
    print(data)

This results in:

{'employees': [{'name': 'John Doe', 'department': 'Marketing', 'place': 'Remote'}, {'name': 'Jane Doe', 'department': 'Software Engineering', 'place': 'Remote'}, {'name': 'Don Joe', 'department': 'Software Engineering', 'place': 'Office'}]}
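Once loaded, the result is an ordinary dictionary that you can index and iterate over. The sketch below is self-contained, so it first writes a shortened copy of the employee data to a temporary directory (an assumption made only so the example runs on its own), then loads it back and filters it:

```python
import json
import os
import tempfile

data = {'employees': [
    {'name': 'John Doe', 'department': 'Marketing', 'place': 'Remote'},
    {'name': 'Don Joe', 'department': 'Software Engineering', 'place': 'Office'},
]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'json_data.json')
    # Write the file so there is something to read back
    with open(path, 'w') as outfile:
        json.dump(data, outfile)
    # load() parses the file contents into Python objects
    with open(path) as json_file:
        loaded = json.load(json_file)

# Regular dictionary and list operations work on the result
remote_names = [e['name'] for e in loaded['employees'] if e['place'] == 'Remote']
print(remote_names)  # ['John Doe']
```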

Alternatively, let’s read a JSON string into a dictionary:

import json

python_dictionary = json.loads(json_string)
print(python_dictionary)

Which also results in:

{'employees': [{'name': 'John Doe', 'department': 'Marketing', 'place': 'Remote'}, {'name': 'Jane Doe', 'department': 'Software Engineering', 'place': 'Remote'}, {'name': 'Don Joe', 'department': 'Software Engineering', 'place': 'Office'}]}

This one is especially useful for parsing REST API responses that send JSON. This data comes to you as a string, which you can then pass to json.loads() directly, and you have a much more manageable dictionary to work with!
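Here is a hedged sketch of that workflow. The response_body string is a hypothetical stand-in for whatever your HTTP client returns, and parse_body() is a helper name invented for this example; it simply guards against bodies that aren't valid JSON:

```python
import json

# Hypothetical response body; a real one would come from an HTTP client
response_body = '{"status": "ok", "employees": [{"name": "John Doe"}]}'

def parse_body(body):
    """Return the parsed JSON value, or None if the body isn't valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None

parsed = parse_body(response_body)
print(parsed['employees'][0]['name'])  # John Doe
print(parse_body('not json'))          # None
```

Whether to return None, re-raise, or log the failure is a design choice that depends on the application.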

Sorting, Pretty-Printing, Separators and Encoding

When serializing your data to JSON with Python, the default output is a single line with no indentation or line breaks. This keeps messages small but isn’t very readable. While this is the ideal behavior for data transfer (computers don’t care for readability, but do care about size) – sometimes you may need to make small changes, like adding whitespace to make it human readable.

Note: json.dump()/json.dumps() and json.load()/json.loads() all provide a few options for customizing their behavior.

Pretty-Printing JSON in Python

Making JSON human readable (aka “pretty-printing”) is as easy as passing an integer value for the indent parameter:

import json
data = {'people':[{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
print(json.dumps(data, indent=4))

This creates a 4-space indentation on each new logical block:

{
    "people": [
        {
            "name": "Scott",
            "website": "stackabuse.com",
            "from": "Nebraska"
        }
    ]
}

Another option is to use the command line tool – json.tool. With it, you can pretty-print the JSON in the command line without affecting the transmitted string, and just impacting how it’s displayed on the standard output pipe:

$ echo '{"people":[{"name":"Scott", "website":"stackabuse.com", "from":"Nebraska"}]}' | python -m json.tool
{
    "people": [
        {
            "name": "Scott",
            "website": "stackabuse.com",
            "from": "Nebraska"
        }
    ]
}

Sorting JSON Objects by Keys

In JSON terms:

“An object is an unordered set of name/value pairs.”

The key order isn’t guaranteed, but it’s possible that you may need to enforce key order. To achieve ordering, you can pass True to the sort_keys option when using json.dump() or json.dumps():

import json
data = {'people':[{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
print(json.dumps(data, sort_keys=True, indent=4))

This results in:

{
    "people": [
        {
            "from": "Nebraska",
            "name": "Scott",
            "website": "stackabuse.com"
        }
    ]
}
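One practical reason to sort keys is to get identical output for dictionaries that hold the same data but were built in a different order. A small sketch, using two made-up dictionaries:

```python
import json

first = {'name': 'Scott', 'from': 'Nebraska'}
second = {'from': 'Nebraska', 'name': 'Scott'}

# Same data, different insertion order: the unsorted output differs
print(json.dumps(first) == json.dumps(second))  # False

# With sort_keys=True both serialize identically
sorted_first = json.dumps(first, sort_keys=True)
sorted_second = json.dumps(second, sort_keys=True)
print(sorted_first == sorted_second)  # True
print(sorted_first)  # {"from": "Nebraska", "name": "Scott"}
```

This makes sorted output useful for diffing, caching, or comparing serialized payloads.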

ASCII Text and Encoding

By default, json.dump() and json.dumps() will ensure that text in the given Python dictionary is ASCII-encoded. If non-ASCII characters are present, then they’re automatically escaped, as shown in the following example:

import json
data = {'item': 'Beer', 'cost':'£4.00'}
jstr = json.dumps(data, indent=4)
print(jstr)

This results in:

{
    "item": "Beer",
    "cost": "\u00a34.00"
}

This isn’t always acceptable, and in many cases you may want to keep your Unicode characters unchanged. To do this, set the ensure_ascii option to False:

jstr = json.dumps(data, ensure_ascii=False, indent=4)
print(jstr)

This results in:

{
    "item": "Beer",
    "cost": "£4.00"
}
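Either form decodes back to the same Python string, so the choice only affects how the JSON text looks, not the data it carries. A short sketch to confirm this (str.isascii() assumes Python 3.7 or newer):

```python
import json

data = {'item': 'Beer', 'cost': '£4.00'}

escaped = json.dumps(data)                        # ASCII only, £ is escaped
unescaped = json.dumps(data, ensure_ascii=False)  # £ is kept as-is

print(escaped.isascii())    # True
print(unescaped.isascii())  # False

# Both strings parse back to the original dictionary
print(json.loads(escaped) == json.loads(unescaped) == data)  # True
```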

Skip Custom Key Data Types

If a key in your dictionary is not of a basic type (str, int, float, bool or None), a TypeError is raised when you try dumping JSON contents into a file. You can skip these keys via the skipkeys argument:

jstr = json.dumps(data, skipkeys=True)
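The line above assumes data contains such a key. As a self-contained sketch, the made-up dictionary below uses a tuple as one of its keys, which first triggers the TypeError and is then silently dropped once skipkeys is enabled:

```python
import json

# A tuple key is not one of the supported key types (hypothetical data)
data = {'label': 'origin', ('x', 'y'): 'coordinates'}

try:
    json.dumps(data)
    error_seen = False
except TypeError as error:
    error_seen = True
    print('TypeError:', error)

# With skipkeys=True the unsupported key and its value are left out
skipped = json.dumps(data, skipkeys=True)
print(skipped)  # {"label": "origin"}
```

Keep in mind that skipped entries are lost without warning, so this option trades safety for convenience.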

Enabling and Disabling Circular Check

If a property of a JSON object references itself, or another object that references back to the parent object, the structure is infinitely recursive and can’t be serialized. Left unchecked, encoding such a structure keeps recursing until a RecursionError is raised (or worse) and the dumping is halted.

This is regulated by the check_circular flag, which is True by default: the encoder keeps track of the containers it has already visited and raises a ValueError as soon as it detects a circular reference. To turn the check off, you can set it to False:

jstr = json.dumps(data, check_circular=False)

Do note, however, that turning this check off is strongly discouraged.
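To see the default check in action, the sketch below builds a dictionary that contains itself (a contrived structure used only for illustration) and tries to serialize it with the default settings. In CPython, the encoder reports this as a ValueError:

```python
import json

# A dictionary that contains a reference to itself
node = {'name': 'root'}
node['self'] = node

try:
    json.dumps(node)  # check_circular=True is the default
    outcome = 'serialized'
except ValueError as error:
    outcome = f'ValueError: {error}'

print(outcome)
```

The example deliberately does not run with check_circular=False, since the failure mode in that case is far less predictable.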

Enabling and Disabling NaNs

Special floating-point values such as -inf, inf and nan may creep into objects that you want to serialize or deserialize. The JSON standard doesn’t allow these values, but they still carry logical value that you might want to transmit in a message. On the other hand, you may want to enforce that they aren’t transmitted, and raise an exception instead. The allow_nan flag is set to True by default, and allows you to serialize and deserialize these values, replacing them with the JavaScript equivalents (Infinity, -Infinity and NaN).

If you set the flag to False instead – you’ll switch to a strictly JSON-standardized format, which raises a ValueError if your objects contain attributes with these values:

jstr = json.dumps(data, allow_nan=False)
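A brief sketch of both behaviors, using a hypothetical reading dictionary that holds a nan and an inf:

```python
import json

reading = {'value': float('nan'), 'limit': float('inf')}

# Default (allow_nan=True): JavaScript-style tokens, which strict JSON parsers reject
lenient = json.dumps(reading)
print(lenient)  # {"value": NaN, "limit": Infinity}

# allow_nan=False: refuse to emit non-standard values
try:
    json.dumps(reading, allow_nan=False)
    strict_outcome = 'serialized'
except ValueError as error:
    strict_outcome = f'ValueError: {error}'

print(strict_outcome)
```

If the consumer of your JSON is not Python, the strict mode may save you from hard-to-trace parsing errors on the other end.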

Changing Separators

In JSON, the keys are separated from values with colons (:) and the items are separated from each other with commas (,):

key1:value1,
key2:value2

The default separators for writing JSON in Python are (', ', ': '), with whitespace after the commas and colons. You can alter these to skip the whitespace and thus make the JSON a bit more compact, or fully change the separators with other special characters for a different representation:


jstr = json.dumps(data, separators=(',', ':'))
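To show the effect on size, here is a small comparison on a made-up dictionary:

```python
import json

data = {'a': 1, 'b': [1, 2]}

default_form = json.dumps(data)
compact_form = json.dumps(data, separators=(',', ':'))

print(default_form)  # {"a": 1, "b": [1, 2]}
print(compact_form)  # {"a":1,"b":[1,2]}

# Each separator loses one space in the compact form
print(len(default_form) - len(compact_form))  # 4
```

The savings are small per separator, but they add up on large payloads.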

Compatibility Issues with Python 2

If you’re using an older version of Python (2.x) – you may run into a TypeError while trying to dump JSON contents into a file. Namely, if the contents contain a non-ASCII character, a TypeError is raised, even if you pass the encoding argument, when using the json.dump() method:


with open('json_data.json', 'w', encoding='utf-8') as outfile:
    json.dump(json_string, outfile, ensure_ascii=False)

If you encounter this edge-case, which has since been fixed in subsequent Python versions – try using json.dumps() instead, and write the string contents into a file instead of streaming the contents directly into a file.
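The workaround described above could look roughly like the following. It is written in Python 3 syntax, and the temporary directory is an assumption made so the sketch runs on its own; Python 2’s built-in open() does not accept an encoding argument, so an equivalent such as io.open would be needed there:

```python
import json
import os
import tempfile

data = {'item': 'Beer', 'cost': '£4.00'}

# Serialize to a string first, then write that string out
json_text = json.dumps(data, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'json_data.json')
    with open(path, 'w', encoding='utf-8') as outfile:
        outfile.write(json_text)
    # Read the file back to confirm nothing was lost
    with open(path, encoding='utf-8') as infile:
        restored = json.load(infile)

print(restored == data)  # True
```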

Conclusion

In this guide, we introduced you to the json.dump(), json.dumps(), json.load(), and json.loads() methods, which help in serializing and deserializing JSON strings.

We’ve then taken a look at how you can sort JSON objects, pretty-print them, change the encoding, skip custom key data types, enable or disable circular checks and whether NaNs are allowed, as well as how to change the separators for serialization and deserialization.

With JSON being one of the most popular ways to serialize structured data, you’ll likely have to interact with it pretty frequently, especially when working on web applications. Python’s json module is a great way to get started, although you’ll probably find that simplejson is another great alternative that is much less strict on JSON syntax.
