PyCon UK 2019

Automated report writing using data from a relational database
2019-09-15, 15:00–15:30, Assembly Room

As part of the Development Consent Order submission for large infrastructure (Highways scheme), a book of reference needs to be submitted. The user previously needed to export spreadsheets from a database and populate the document manually. We have automated this process on a jupyter notebook.


Introduction:

The book of reference lists all affected plots of land and associated parties in a legal document of a specific format. The user previously needed to export spreadsheets from a database and populate the document manually by locating the relevant plots of land on a map and retrieving postcode and locality data.

Part 1: Presentation of the problem.

As the amount of information increases and plots of land can have multiple uses, users and owners during period of time, manipulating this amount of information manually is time consuming. In collaboration with our internal client we have developed an interactive notebook where the client can search for relevant
parties and extract all information available.

Part 2: Development.

The typical structure of this relational database will be presented.
The 'query' consists of a .csv file containing the relevant Site IDs. A site ID can correspond to one plot of land with many buildings, multiple uses, and different owners and interested parties. As a result, we end up with a rather complex relational system through which we have to navigate. Using the site ID we join the tables connected to the initial table and we filter according to keywords. An example of this process will be presented.

The process that we developed to loop through the site IDs and store the data to be saved in a document will be presented.
We preferred pyodbc instead of a typical SQLAlchemy engine, as we wanted to connect and get data from the database and not modify any of the existing records.

Part 3: What can go wrong?

This part details the difficulties we encountered in compiling all this information in 5 word documents (Parts 1- 5). Problems encountered include:
1) How to insert in the same cell, different rows from a different Dataframe.
2) Typos, ampersands, spaces, and null values.
3) Bold and italics in the same sentence (with entries from different
Dataframes).

The methods we used to solve these problems will be presented.

Part 4: The future.

The next step is to upscale: enable all the users to automate the reports for the book of reference required for their projects. The engineers will still be responsible for reviewing the information, however a lot of time which could have been spent in report writing is saved.


Is your proposal suitable for beginners? – True