Parallelizing Python applications with PyCOMPSs
2019-09-03 , Track4 (Chillida)

PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies the data dependencies among tasks, and schedules and executes them in distributed environments.


PyCOMPSs!

COMPSs is a task-based programming model that aims to ease the development of parallel applications and their execution in distributed computing environments, and it provides a Python binding known as PyCOMPSs. Applications are written as sequential Python code, which relieves developers of explicit parallelization and distribution efforts (e.g. thread/process creation, synchronization, data movements, etc.). Application developers simply identify which methods should be considered tasks, and the runtime exploits the inherent parallelism of the application at execution time by detecting the task calls and the data dependencies among them. To this end, the runtime spawns tasks asynchronously on the available resources and orchestrates their data transfers, guaranteeing the validity of the execution.
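As a minimal sketch of this workflow, the example below marks a method as a task with the `@task` decorator and synchronizes with `compss_wait_on`; both are part of the PyCOMPSs API. When the PyCOMPSs runtime is not installed, the sketch falls back to sequential no-op stubs (an assumption for illustration, so the script also runs as plain Python):

```python
# Hedged sketch of a PyCOMPSs task. Without the runtime installed,
# fall back to sequential stubs so the script still runs as plain Python.
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    def task(**kwargs):          # no-op decorator stand-in
        def wrapper(f):
            return f
        return wrapper

    def compss_wait_on(obj):     # synchronization becomes the identity
        return obj


@task(returns=1)
def increment(value):
    # Each call may be spawned asynchronously as a task by the runtime.
    return value + 1


def main():
    data = [1, 2, 3, 4]
    results = [increment(v) for v in data]   # task calls, no dependencies
    results = compss_wait_on(results)        # fetch actual values back
    print(results)                           # [2, 3, 4, 5]


if __name__ == "__main__":
    main()
```

Since the calls to `increment` are independent, the runtime can execute them in parallel on the available workers; `compss_wait_on` is the only synchronization point.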

PyCOMPSs relies on decorators for task selection and a small API for synchronization. It also integrates with Jupyter notebooks and provides a wide range of features, such as task constraint definition, multiple task implementations (so that the runtime can choose the most appropriate one for the available resources), and non-Python tasks (e.g. binary, MPI and OmpSs), among others.
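For instance, a task constraint can be expressed by stacking the `@constraint` decorator on top of `@task`. The sketch below assumes the PyCOMPSs `@constraint` decorator with the `computing_units` parameter, again with hypothetical no-op fallbacks so it runs without the runtime:

```python
# Hedged sketch of a task constraint; the except branch defines no-op
# stand-ins so the example works without PyCOMPSs installed.
try:
    from pycompss.api.constraint import constraint
    from pycompss.api.task import task
except ImportError:
    def constraint(**kwargs):
        def wrapper(f):
            return f
        return wrapper

    def task(**kwargs):
        def wrapper(f):
            return f
        return wrapper


@constraint(computing_units="4")   # only schedule on workers with 4+ cores
@task(returns=1)
def heavy_kernel(block):
    # CPU-intensive work that justifies the resource constraint.
    return sum(x * x for x in block)
```

With such constraints, the scheduler only places the task on resources that satisfy them, which is how the runtime chooses among resources (and among multiple implementations, when provided).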

In addition, the PyCOMPSs runtime can run applications on top of different infrastructures (such as multi-core machines, clusters, grids, clouds or containers) without modifying a single line of the application. It also provides fault-tolerance mechanisms and a live monitoring tool, can generate post-mortem performance traces with Extrae that can later be analyzed with Paraver, and is extensible through pluggable connectors (e.g. for clouds and schedulers).

This rich set of features enables the quick and easy parallelization of Python code, its execution in distributed environments and its performance analysis, with current success in scientific fields such as numerical algorithms, AI, and life and earth sciences.

The main objective of this tutorial is to show how to program and decorate Python applications with PyCOMPSs so that they can run in parallel.
In more detail, the tutorial objectives are:

  • To give an overview of PyCOMPSs task-based programming model syntax.
  • To demonstrate how to use PyCOMPSs to parallelize and run applications in distributed platforms.
  • To illustrate how sample benchmarks from linear algebra and big data, as well as real use cases from AI and life and earth sciences, can benefit from PyCOMPSs as a programming model.
  • To give practical insight into how to use the PyCOMPSs programming model with Jupyter notebooks.
  • To give an overview of the PyCOMPSs runtime and how it interacts with clusters, clusters of Docker containers and clouds.

Attendees will learn how to parallelize their Python applications with PyCOMPSs through a simple interface, run them on distributed parallel platforms, integrate them with Jupyter notebooks, and analyze their execution behaviour.

Requirements and setup instructions

This tutorial can be followed using a virtual machine or a Docker container. Attendees can choose the option that best suits their system.

  • Using Virtual Appliance:

  • Using Docker:

    • Install Docker
    • git clone https://github.com/bsc-wdc/tutorial_apps.git
    • docker pull compss/compss-tutorial:patc2019
    • docker run --name mycompss -p 8888:8888 -p 8080:8080 -v /path/to/tutorial_apps:/home/tutorial_apps -itd compss/compss-tutorial:patc2019

Project Homepage / Git

http://compss.bsc.es

Abstract as a tweet

Easy programming for parallel and distributed Python applications with PyCOMPSs @bsc_compss @bsc_cns

Python Skill Level

basic

Domain Expertise

none

Domains

Big Data, Machine Learning, Parallel computing / HPC

Javier Conejero is a Senior Researcher at the Barcelona Supercomputing Center (BSC). He holds a PhD in
Advanced Computer Technologies (2014) from the University of Castilla-La Mancha (UCLM), Spain.
During his PhD, he was awarded an FPI fellowship grant by the Ministry of Economy and
Competitiveness (MINECO) of the Spanish Government. Previously, he worked at CERN for one year
(2009) on WLCG software development and management. Since 2015, he has been a Senior Researcher
in the Workflows and Distributed Computing research group at BSC, where he leads the efforts on
the PyCOMPSs binding. In 2016 he was awarded a Juan de la Cierva grant by MINECO.

Javier has lectured and run practical exercises on PyCOMPSs development within the PATC tutorial
"Programming Distributed Computing Platforms with COMPSs" annually since 2016. He has also given
PyCOMPSs tutorials at various conferences and workshops: EuroPython 2017, CCGrid 2017,
Euro-Par 2017 and SIAM 2018.