PyCon UK 2019

Battles with reproducibility and collaboration in large organisations
2019-09-16, Ferrier Hall

Speaking from personal experience, reproducibility and collaboration are difficult aspects of any business analytics project, which is likely to be shared between a technical analyst and a business analyst. This talk aims to show examples of how both can be improved using parts of the Python / R toolset.


Analytics without reproducibility, the ability to reproduce an output from its component parts, carries inherent risk. This is especially true in a business environment where staff can and will move to new jobs, leaving behind projects and work that may be vital for the business. Ensuring proper project organisation and understanding by other team members is therefore vital. In addition, analytics without collaboration can lead to wholly unsuitable results. Input is needed from both technical analysts (TAs) and business analysts/stakeholders (BAs) to ensure the analytics is paired with domain knowledge and context. Without collaboration and reproducibility, the result is unsuitable work that no one else can understand or build upon.

Integrating reproducibility and collaboration, however, can be difficult. Different people prefer different tooling, especially those who work in different fields, and ideas of how the resulting product should be shared or maintained can differ. For example, TAs will likely favour notebooks and IDEs over a BA's office suite, and TAs are more likely to hold onto the data / database links while BAs are likely to favour the final output.

The aim of this talk is to provide a quick overview, drawn from personal experience working in both consulting and retail, of how both reproducibility and collaboration can be improved using a set of tools within the Python / R ecosystem. I will discuss friendlier ways to set relative paths, using both R and Python within the same environment, and a method for outputting markdown to Microsoft Word, which can then be converted back into markdown. I will suggest an example workflow for sharing an analysis with a BA, who can then tweak the wording themselves and add design suggestions for the TA to implement. Finally, I will discuss my own experiences of collaboration and reproducibility and why they are so essential.
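As an illustration of what a friendlier relative path might look like, here is a minimal sketch that locates a project root by walking up the directory tree until a marker file is found. It is not necessarily the approach shown in the talk: the function name `find_project_root` and the choice of `.git` and `pyproject.toml` as markers are assumptions made for this example.

```python
from pathlib import Path


def find_project_root(start: Path, markers=(".git", "pyproject.toml")) -> Path:
    """Return the nearest directory at or above `start` that contains a marker.

    Hypothetical helper: the marker names are an assumption and should be
    changed to whatever identifies the top of your own project.
    """
    start = start.resolve()
    # Check the starting location first, then each parent in turn.
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No marker from {markers} found at or above {start}")
```

With a helper like this, a notebook anywhere inside the project could build paths such as `find_project_root(Path.cwd()) / "data" / "input.csv"` (a hypothetical file), so the same code keeps working when a colleague clones the project to a different location.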


Is your proposal suitable for beginners? – yes

Rich is an ex-molecular biologist who, since completing his PhD, has worked within consulting and retail, recently joining the data and analytics division of a private consultancy in Leeds. He has a strong interest in both machine learning and building production-ready applications, with a background in coding in R and, more recently, Python. Outside of coding, his main interests are sports and craft ale!