JSON schema and validation with Python

Baoshan Gu
Apr 3, 2022

JSON is a common data exchange format. It is widely used in RESTful APIs and web services, and in databases such as MongoDB and Microsoft SQL Server. Ensuring the quality of JSON data is crucial, and this is where JSON Schema comes in: it describes the data format your application or API supports and provides clear documentation that is both human- and machine-readable.

The Python package jsonschema is an implementation of the JSON Schema specification for Python. Recent versions (4.x) support JSON Schema drafts 2020-12, 2019-09, 7, 6, 4 and 3.

JSON schema validation with Python

First, you need to install the jsonschema package. With pip, you can simply run:

```shell
pip install jsonschema
```

A simple JSON Schema validation script in Python:

```python
import jsonschema

json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": {"enum": ["USA", "Canada"]},
        "postal_code": {"type": "string"}
    },
    "additionalProperties": False
}

json_data = {
    "country": "USA",
    "postal_code": "12345"
}

try:
    jsonschema.validate(instance=json_data, schema=json_schema)
except jsonschema.exceptions.ValidationError as err:
    print(f"Invalid: {err.message}")
else:
    print("valid")
```
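jsonschema.validate raises a single error, the one it judges most relevant. If you want to report every problem at once, one option is to build a validator object and iterate over its errors. A minimal sketch, reusing the schema from the script above; bad_data is a made-up instance with two deliberate mistakes:

```python
import jsonschema

json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": {"enum": ["USA", "Canada"]},
        "postal_code": {"type": "string"}
    },
    "additionalProperties": False
}

# Hypothetical data: an unknown country and a postal code that is not a string
bad_data = {"country": "Mexico", "postal_code": 12345}

validator = jsonschema.Draft7Validator(json_schema)
# iter_errors yields every ValidationError instead of stopping at the first one
errors = sorted(validator.iter_errors(bad_data), key=lambda e: list(e.path))
for err in errors:
    # err.path locates the failing value inside the instance
    print(f"{list(err.path)}: {err.message}")
```

Sorting by path is only there to make the output order predictable.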

In the script above, json_schema and json_data are specified directly as Python dictionaries. Alternatively, you can load them from a JSON string or a file:

```python
import json

import jsonschema

# Provide the JSON schema (or data) as a string
json_schema = json.loads("""{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": { "enum": ["USA", "Canada"] },
        "postal_code": { "type": "string" }
    },
    "additionalProperties": false
}""")

# Or provide the JSON schema (or data) in a separate file
json_file = "data.json"  # placeholder path to your JSON data file
with open(json_file, "r") as jf:
    json_data = json.load(jf)

jsonschema.validate(instance=json_data, schema=json_schema)
```

Note one difference here: in a Python dictionary, boolean values are written as False or True (Python booleans), whereas in a string or a separate file they are written as false or true (JSON booleans, all lowercase).
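The two spellings end up identical once parsed, because the standard json module translates between them in both directions. A quick check (the variable names are only for illustration):

```python
import json

# JSON text uses lowercase false; json.loads turns it into Python's False
schema_from_text = json.loads('{"additionalProperties": false}')
schema_from_dict = {"additionalProperties": False}

print(schema_from_text == schema_from_dict)  # True
# json.dumps goes the other way, writing Python's False as JSON false
print(json.dumps(schema_from_dict))  # {"additionalProperties": false}
```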

Mutually exclusive properties

Assume there are three properties, property_a, property_b and property_c, and exactly one of them is required:

```python
json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "property_a": {"type": "string"},
        "property_b": {"type": "string"},
        "property_c": {"type": "string"}
    },
    "oneOf": [
        {"required": ["property_a"]},
        {"required": ["property_b"]},
        {"required": ["property_c"]}
    ],
    "additionalProperties": False
}
```
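To see how oneOf behaves here, you can validate a few sample objects with Draft7Validator.is_valid. The sample names and values below are made up for illustration. An object passes only when exactly one of the required branches matches, so an empty object fails just like one with two properties:

```python
import jsonschema

# The "exactly one of three properties" schema from above
json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "property_a": {"type": "string"},
        "property_b": {"type": "string"},
        "property_c": {"type": "string"}
    },
    "oneOf": [
        {"required": ["property_a"]},
        {"required": ["property_b"]},
        {"required": ["property_c"]}
    ],
    "additionalProperties": False
}
validator = jsonschema.Draft7Validator(json_schema)

# Hypothetical sample instances
samples = {
    "only a": {"property_a": "x"},
    "a and b": {"property_a": "x", "property_b": "y"},
    "none": {},
}
results = {name: validator.is_valid(data) for name, data in samples.items()}
print(results)  # {'only a': True, 'a and b': False, 'none': False}
```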

If these properties are mutually exclusive but all optional (at most one of them may appear):

```python
json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "property_a": {"type": "string"},
        "property_b": {"type": "string"},
        "property_c": {"type": "string"}
    },
    "oneOf": [
        {"required": ["property_a"]},
        {"required": ["property_b"]},
        {"required": ["property_c"]},
        {"not": {
            "anyOf": [
                {"required": ["property_a"]},
                {"required": ["property_b"]},
                {"required": ["property_c"]}
            ]
        }}
    ],
    "additionalProperties": False
}
```
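The fourth oneOf branch is what makes the empty object acceptable: when none of the properties is present, only the not/anyOf branch matches. Running the same kind of made-up samples against this schema shows the difference from the previous one:

```python
import jsonschema

# The "at most one of three properties" schema from above
json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "property_a": {"type": "string"},
        "property_b": {"type": "string"},
        "property_c": {"type": "string"}
    },
    "oneOf": [
        {"required": ["property_a"]},
        {"required": ["property_b"]},
        {"required": ["property_c"]},
        {"not": {"anyOf": [
            {"required": ["property_a"]},
            {"required": ["property_b"]},
            {"required": ["property_c"]}
        ]}}
    ],
    "additionalProperties": False
}
validator = jsonschema.Draft7Validator(json_schema)

# Hypothetical sample instances
samples = {
    "only b": {"property_b": "y"},
    "a and c": {"property_a": "x", "property_c": "z"},
    "none": {},
    "all three": {"property_a": "x", "property_b": "y", "property_c": "z"},
}
results = {name: validator.is_valid(data) for name, data in samples.items()}
print(results)  # {'only b': True, 'a and c': False, 'none': True, 'all three': False}
```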

For mutually exclusive cases, see Stack Overflow for alternative approaches and a discussion of combinatorial explosion.
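As one illustration of such an alternative (my own sketch, not taken from a specific answer), "at most one" can also be expressed by forbidding every pair of properties with not/anyOf/required. Generating the pairs in Python makes the growth visible: n properties need n(n-1)/2 pair constraints. The helper mutually_exclusive_schema is hypothetical:

```python
from itertools import combinations

import jsonschema


def mutually_exclusive_schema(names):
    """Hypothetical helper: allow at most one of `names` by forbidding every pair."""
    pairs = [list(pair) for pair in combinations(names, 2)]
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {name: {"type": "string"} for name in names},
        # The instance is invalid if any pair of properties appears together
        "not": {"anyOf": [{"required": pair} for pair in pairs]},
        "additionalProperties": False,
    }


schema = mutually_exclusive_schema(["property_a", "property_b", "property_c"])
validator = jsonschema.Draft7Validator(schema)

print(len(schema["not"]["anyOf"]))  # 3
print(validator.is_valid({"property_b": "y"}))  # True
print(validator.is_valid({"property_a": "x", "property_c": "z"}))  # False
```

With five properties the same helper already produces ten pair constraints, which is why hand-written schemas of this kind become hard to maintain.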

Applying schema conditionally

JSON Schema draft 7 introduced the if, then and else keywords, which apply a subschema based on the outcome of another schema. Compared with other approaches, such as the "implication" pattern used with previous versions, I think the if/then/else method is clearer, more explicit, and easier for developers to use.
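For comparison, the implication "A implies B" is usually written before draft 7 as anyOf of "not A" and B. The sketch below assumes a simple rule (country USA implies a five-digit postal_code) purely for illustration, and checks that both forms accept and reject the same made-up instances:

```python
import jsonschema

# Hypothetical rule: if country is "USA" (condition A), postal_code must be five digits (B)
condition = {"properties": {"country": {"const": "USA"}}, "required": ["country"]}
consequence = {"properties": {"postal_code": {"pattern": "^[0-9]{5}$"}}}

# Pre-draft-7 "implication" pattern: (not A) or B
implication = jsonschema.Draft7Validator({"anyOf": [{"not": condition}, consequence]})
# Draft 7 if/then expressing the same rule
if_then = jsonschema.Draft7Validator({"if": condition, "then": consequence})

samples = [
    {"country": "USA", "postal_code": "12345"},
    {"country": "USA", "postal_code": "K1A 0B1"},
    {"country": "Canada", "postal_code": "K1A 0B1"},
]
results = [(implication.is_valid(d), if_then.is_valid(d)) for d in samples]
print(results)  # [(True, True), (False, False), (True, True)]
```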

The json-schema.org site provides an example illustrating how the format of one property can depend on another property. It is reproduced here with minor tweaks to the regex patterns for use in Python. The format of the postal_code property depends on the value of the country property.

```json
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": { "enum": ["USA", "Canada"] },
        "postal_code": { "type": "string" }
    },
    "if": {
        "properties": {"country": {"const": "USA"} }
    },
    "then": {
        "properties": {"postal_code": {"type": "string", "pattern": "^[0-9]{5}(-[0-9]{4})?$" } }
    },
    "else": {
        "properties": {"postal_code": {"type": "string", "pattern": "^[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$" } }
    },
    "additionalProperties": false
}
```
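Here is the postal code schema above exercised from Python (as a dict, so False instead of false); the sample postal codes are made up. One behavior worth knowing: when country is missing, the "if" subschema passes vacuously, because properties only constrains keys that are present, so the USA pattern is applied. Adding "required": ["country"] inside "if" would change that:

```python
import jsonschema

json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": {"enum": ["USA", "Canada"]},
        "postal_code": {"type": "string"}
    },
    "if": {"properties": {"country": {"const": "USA"}}},
    "then": {"properties": {"postal_code": {"type": "string", "pattern": "^[0-9]{5}(-[0-9]{4})?$"}}},
    "else": {"properties": {"postal_code": {"type": "string", "pattern": "^[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$"}}},
    "additionalProperties": False
}
validator = jsonschema.Draft7Validator(json_schema)

print(validator.is_valid({"country": "USA", "postal_code": "20500-0003"}))  # True
print(validator.is_valid({"country": "Canada", "postal_code": "K1A 0B1"}))  # True
print(validator.is_valid({"country": "Canada", "postal_code": "12345"}))  # False
# No "country": the "if" passes vacuously, so the USA pattern is applied
print(validator.is_valid({"postal_code": "K1A 0B1"}))  # False
```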

If a property (property_b) depends on whether another property (property_a) exists or not, the condition can be expressed as:

```json
"if": {
    "required": ["property_a"]
},
"then": {
    "properties": {"property_b": {...}}
}
```
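A complete version of this fragment might look like the sketch below. The subschemas for property_b (an integer when property_a is present, a string otherwise) are hypothetical and stand in for the elided part:

```python
import jsonschema

json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    # Condition: property_a is present (its value does not matter)
    "if": {"required": ["property_a"]},
    # Hypothetical constraint when property_a exists: property_b must be an integer
    "then": {"properties": {"property_b": {"type": "integer"}}},
    # Hypothetical constraint otherwise: property_b must be a string
    "else": {"properties": {"property_b": {"type": "string"}}},
}
validator = jsonschema.Draft7Validator(json_schema)

print(validator.is_valid({"property_a": "x", "property_b": 1}))  # True
print(validator.is_valid({"property_a": "x", "property_b": "1"}))  # False
print(validator.is_valid({"property_b": "1"}))  # True
```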

If a property depends on a property of its parent, the condition must be specified at the level where both are accessible, that is, in the parent object. In the following example, the property states is an array whose items are restricted to a different list of enum values depending on the parent property country.

```json
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": { "enum": ["USA", "Canada"] },
        "states": { "type": "array" }
    },
    "required": ["country", "states"],
    "additionalProperties": false,
    "if": {
        "properties": {"country": {"const": "USA"} }
    },
    "then": {
        "properties": {
            "states": {
                "items": {
                    "enum": ["CA", "TX", "VA"]
                }
            }
        }
    },
    "else": {
        "properties": {
            "states": {
                "items": {
                    "enum": ["AB", "BC", "QC"]
                }
            }
        }
    }
}
```

With this schema, the data {"country": "USA", "states": ["VA", "BC"]} fails validation with the error 'BC' is not one of ['CA', 'TX', 'VA']. The data {"country": "USA", "states": ["CA", "VA"]} is valid.
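Reproducing both checks with jsonschema.validate is straightforward; the error_message variable is only there to capture the message for inspection:

```python
import jsonschema

json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": {"enum": ["USA", "Canada"]},
        "states": {"type": "array"}
    },
    "required": ["country", "states"],
    "additionalProperties": False,
    "if": {"properties": {"country": {"const": "USA"}}},
    "then": {"properties": {"states": {"items": {"enum": ["CA", "TX", "VA"]}}}},
    "else": {"properties": {"states": {"items": {"enum": ["AB", "BC", "QC"]}}}}
}

error_message = None
try:
    jsonschema.validate(instance={"country": "USA", "states": ["VA", "BC"]}, schema=json_schema)
except jsonschema.exceptions.ValidationError as err:
    error_message = err.message
print(error_message)  # 'BC' is not one of ['CA', 'TX', 'VA']

# Valid data: validate returns None and raises nothing
jsonschema.validate(instance={"country": "USA", "states": ["CA", "VA"]}, schema=json_schema)
```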

This article covers several basic JSON Schema use cases. For multiple conditions or more complicated cases, you might need combinations of allOf, oneOf, anyOf, not, required and similar keywords. For each use case, there are probably many other ways to define the schema.
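One way to handle several conditions, sketched here as an assumption rather than the only approach, is to wrap one if/then pair per condition in allOf. The sketch reuses the USA/Canada postal code rules from earlier; further countries could be added as extra allOf entries without nesting else branches:

```python
import jsonschema

json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "country": {"enum": ["USA", "Canada"]},
        "postal_code": {"type": "string"}
    },
    "required": ["country"],
    # One if/then pair per condition; every pair must hold
    "allOf": [
        {
            "if": {"properties": {"country": {"const": "USA"}}},
            "then": {"properties": {"postal_code": {"pattern": "^[0-9]{5}(-[0-9]{4})?$"}}}
        },
        {
            "if": {"properties": {"country": {"const": "Canada"}}},
            "then": {"properties": {"postal_code": {"pattern": "^[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$"}}}
        }
    ],
    "additionalProperties": False
}
validator = jsonschema.Draft7Validator(json_schema)

print(validator.is_valid({"country": "USA", "postal_code": "12345"}))  # True
print(validator.is_valid({"country": "Canada", "postal_code": "12345"}))  # False
print(validator.is_valid({"postal_code": "12345"}))  # False, country is required
```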


Happy coding!
