In the last post we talked about data contracts and their implementation in NodeJS. This time we will do the same in Python.

For the NodeJS version, follow the link below.

Data contracts in (real-time) action (with NodeJS)
Data contract is a part of data integration. It describes “contract” or agreement that we need to communicate between sources and destinations.

Recap

There is a validation step in the extract layer of ETL. Its job is to make sure incoming data is qualified enough to be stored in our databases.

In this case, we are creating an API that receives data and validates it. The tools we need are OpenAPI and jsonschema, a Python library for validation.


API Swagger

We are using the same API definition file, people.

And also pets.


Making an app with validation

Meet jsonschema. This library validates an object against an expected schema. It works the same way as AJV in NodeJS.

jsonschema
jsonschema is an implementation of the JSON Schema specification for Python. It can also be us…
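To get a feel for the library before wiring it into an app, here is a minimal sketch of jsonschema.validate(). The schema and its fields (id, name) are made up for illustration and are not the actual people contract; check() is a hypothetical helper.

```python
from jsonschema import ValidationError, validate

# Hypothetical schema for illustration only; the real contract lives in
# the API definition file.
person_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
    },
    "required": ["id", "name"],
}


def check(payload):
    """Return "valid", or the validator's message for the reported problem."""
    try:
        validate(instance=payload, schema=person_schema)
        return "valid"
    except ValidationError as err:
        return err.message


print(check({"id": 1, "name": "Alice"}))    # valid
print(check({"id": "1", "name": "Alice"}))  # '1' is not of type 'integer'
```

validate() returns nothing on success and raises ValidationError on a mismatch, which is why the happy path is simply "no exception".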

1. Create a Python app

Begin with a sample Flask app.
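A starter app could be sketched like this. The route / and its "OK" response are placeholders assumed for illustration, not necessarily what the original sample uses.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Placeholder endpoint, only there to confirm the app is running.
    return "OK", 200
```

Assuming the file is saved as app.py, it can be started with `flask --app app run`.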

And install the requirements.

It should look like this when executed.

2. Read the contract

Start with just a single contract, people.

As you can see, lines 6-7 read the contract file and store it in the variable contract.
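Reading the contract could be sketched roughly as below. It assumes the API definition is stored as JSON and uses a hypothetical file name, people.json; load_contract is an illustrative helper, and the line numbers mentioned above refer to the original snippet rather than to this sketch.

```python
import json
from pathlib import Path


def load_contract(path):
    """Read an API definition file and return it as a dict.

    Assumes the file is JSON. For a YAML definition, PyYAML's
    yaml.safe_load() could be used in place of json.loads().
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical file name; load once at startup so every request reuses it.
# contract = load_contract("people.json")
```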

3. Validate a request payload

  • line 9: get the payload using request.get_json().
  • line 10: refer to the raw contract at /components/schemas/people.
  • line 11: validate the payload against the raw contract using jsonschema.validate().
  • line 12: return 200 "OK" when all the steps above succeed. Otherwise the app returns "500 Internal Server Error" as below.
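These four steps can be sketched as follows. The /people route and the inline contract (with hypothetical fields id and name) are assumptions standing in for the real endpoint and the contract file, and the line numbers in the list refer to the original snippet, not to this sketch.

```python
from flask import Flask, request
from jsonschema import validate

app = Flask(__name__)

# Stand-in for the contract read from file; field names are hypothetical.
contract = {
    "components": {
        "schemas": {
            "people": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                },
                "required": ["id", "name"],
            }
        }
    }
}


@app.route("/people", methods=["POST"])
def people():
    payload = request.get_json()  # get the payload
    schema = contract["components"]["schemas"]["people"]  # refer to the raw contract
    validate(instance=payload, schema=schema)  # raises ValidationError on a mismatch
    return "OK", 200  # reached only when validation passes
```

Nothing catches the ValidationError here, so Flask turns it into a generic 500 response, which is the behavior described in the last bullet.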

4. Handling errors

Hmm, the error above is bulky and not concise. We should improve it like this.

  • If everything is okay, it should return 200 in the try block.
  • If there is a validation error, it should return 400 with the message from jsonschema.ValidationError.message in the first except block.
  • For anything else, return 400 and print the log to the console.
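A minimal sketch of this error handling, under the same assumptions as before: a hypothetical /people route and an illustrative schema with id and name. The "Bad request" text in the catch-all branch is also an assumption.

```python
from flask import Flask, request
from jsonschema import ValidationError, validate

app = Flask(__name__)

# Hypothetical schema standing in for /components/schemas/people.
people_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
    },
    "required": ["id", "name"],
}


@app.route("/people", methods=["POST"])
def people():
    try:
        validate(instance=request.get_json(), schema=people_schema)
        return "OK", 200
    except ValidationError as err:
        # Short, readable reason such as "'name' is a required property".
        return err.message, 400
    except Exception as err:
        # Anything unexpected: log to the console and still answer with 400.
        print(err)
        return "Bad request", 400
```

Calling request.get_json() inside the try block means a malformed or non-JSON body also ends up in one of the except branches instead of producing the bulky default error.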

5. Complete API with validation

6. Test

a. The payload is fine.

b. The payload has an incorrect field type.

c. The payload is missing some required fields.


Manage multiple contracts

So far we have validated only the people contract. Now we have pets as well and want to validate both.

Therefore, we can prepare a generic function like this.

  • lines 8-13: read the contract files and store them in a dict, using the filenames as keys.
  • lines 16-30: refactor the validation into a generic function that takes the contract key and the payload as parameters.
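The two steps could be sketched like this, without the Flask part. It assumes the contracts are JSON files in one directory and that each schema sits under /components/schemas/ with the same name as its file (people.json holds people, pets.json holds pets); load_contracts and validate_payload are illustrative names, not the ones from the original code.

```python
import json
from pathlib import Path

from jsonschema import ValidationError, validate


def load_contracts(directory):
    """Read every *.json file in `directory`, keyed by file name without extension."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in Path(directory).glob("*.json")
    }


def validate_payload(contracts, key, payload):
    """Validate `payload` against the contract named `key`.

    Returns a (message, status_code) pair that a Flask view can return as is.
    """
    try:
        # Assumption: the schema name inside the file matches the contract key.
        schema = contracts[key]["components"]["schemas"][key]
        validate(instance=payload, schema=schema)
        return "OK", 200
    except ValidationError as err:
        return err.message, 400
    except Exception as err:
        # Unknown contract key or any other unexpected problem.
        print(err)
        return "Bad request", 400
```

Passing the contract key as a parameter keeps the validation logic in one place, so adding a third contract only means adding a file.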

And call this generic function from each endpoint.
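On the endpoint side, each view then shrinks to a single call. The sketch below is self-contained, so it carries a compact version of the generic function; the routes /people and /pets and the shared one-field schema are assumptions for illustration.

```python
from flask import Flask, request
from jsonschema import ValidationError, validate

app = Flask(__name__)

# Hypothetical schema reused for both contracts to keep the sketch short.
_named_object = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}
contracts = {
    "people": {"components": {"schemas": {"people": _named_object}}},
    "pets": {"components": {"schemas": {"pets": _named_object}}},
}


def validate_payload(key, payload):
    """Compact version of the generic validator: returns (message, status)."""
    try:
        validate(instance=payload, schema=contracts[key]["components"]["schemas"][key])
        return "OK", 200
    except ValidationError as err:
        return err.message, 400
    except Exception as err:
        print(err)
        return "Bad request", 400


@app.route("/people", methods=["POST"])
def people():
    # silent=True yields None for a non-JSON body, which then fails validation.
    return validate_payload("people", request.get_json(silent=True))


@app.route("/pets", methods=["POST"])
def pets():
    return validate_payload("pets", request.get_json(silent=True))
```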

The complete code is here.

Let's call this API with a payload that should produce an error.

Good. Our simple API with contract validation is ready.


Repo

The full code is located here.

GitHub - bluebirz/sample-data-contracts-py: Sample data contracts for data integration with Python