Parallelizing Python applications with PyCOMPSs
2019-09-03, 14:00–15:30, Track4 (Chillida)

PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies the data dependencies among tasks, then schedules and executes the tasks in distributed environments.


PyCOMPSs!

COMPSs is a task-based programming model that aims to ease the development of parallel applications and their execution in distributed computing environments. It provides a binding for Python (aka PyCOMPSs). It is based on sequential programming, which relieves application developers of parallelization and distribution concerns (e.g. thread/process creation, synchronization, data movements). Application developers simply need to identify which methods will be considered tasks, and the runtime exploits the inherent parallelism of the application at execution time by detecting the task calls and the data dependencies among them. To this end, the runtime spawns the tasks asynchronously on the available resources and orchestrates their data transfers, guaranteeing the validity of the execution.
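
The idea of deriving parallelism from task calls and their data dependencies can be illustrated without PyCOMPSs itself. What follows is only a conceptual sketch built on Python's standard `concurrent.futures` module, not the COMPSs runtime, which additionally handles distributed scheduling and data transfers. The `as_task` decorator and the `square`, `add` and `sum_of_squares` functions are hypothetical names chosen for this illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for "the available resources": a small local thread pool.
_pool = ThreadPoolExecutor(max_workers=4)


def as_task(fn):
    """Hypothetical decorator: calling fn submits it and returns a Future."""
    def submit(*args):
        def run():
            # An argument that is a Future is a data dependency on an
            # earlier task, so wait for its value before running fn.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return fn(*resolved)
        return _pool.submit(run)
    return submit


@as_task
def square(x):
    return x * x


@as_task
def add(x, y):
    return x + y


def sum_of_squares(values):
    # Sequential-looking code: the square() calls are independent and may
    # run concurrently, while each add() depends on earlier results.
    partial = [square(v) for v in values]
    total = partial[0]
    for item in partial[1:]:
        total = add(total, item)
    # Explicit synchronization: block until the final value is available.
    return total.result()


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3]))
```

The calling code reads as an ordinary sequential program; the order of execution is decided by which results each call needs.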

PyCOMPSs relies on decorators for task selection and a tiny API for synchronization. It also integrates with Jupyter notebooks and supports a wide range of features, such as task constraint definition, multiple implementations of a task (so that the runtime can choose the most appropriate one given the available resources), and tasks that wrap external code (e.g. binary, MPI and OmpSs), among others.
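
A minimal sketch of the decorator and synchronization style described above, assuming the `task` decorator from `pycompss.api.task` and the `compss_wait_on` call from `pycompss.api.api`. The `increment` and `add` functions are hypothetical examples, and the `except ImportError` fallback is an addition for this illustration so that the script also runs sequentially where PyCOMPSs is not installed.

```python
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Illustration-only fallback (not part of PyCOMPSs): no-op stand-ins
    # so the same script runs sequentially without the library.
    def task(*args, **kwargs):
        return lambda fn: fn

    def compss_wait_on(obj):
        return obj


@task(returns=int)
def increment(value):
    # Declared as a task: with the runtime, calls are spawned asynchronously.
    return value + 1


@task(returns=int)
def add(left, right):
    # Consumes the results of earlier tasks, creating data dependencies.
    return left + right


def main():
    a = increment(1)   # independent of b
    b = increment(2)   # independent of a
    total = add(a, b)  # depends on both a and b
    # Synchronization point: retrieve the actual value in the main program.
    return compss_wait_on(total)


if __name__ == "__main__":
    print(main())
```

With a COMPSs installation, a script like this would typically be launched through the `runcompss` command rather than the plain Python interpreter; the exact options depend on the installation.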

In addition, the PyCOMPSs runtime can run applications on top of different infrastructures (such as multi-core machines, clusters, grids, clouds or containers) without modifying a single line of the application. It also provides fault-tolerance mechanisms and a live monitoring tool, can generate post-mortem performance traces using Extrae that can later be analyzed with Paraver, and is extensible through pluggable connectors (e.g. for clouds and schedulers).

This rich set of features enables quick and easy parallelization of Python code, its execution in distributed environments, and its performance analysis. PyCOMPSs is currently used with success in scientific fields such as numerical algorithms, AI, and life and earth sciences.

The main objective of this tutorial is to teach how to program and decorate Python applications with PyCOMPSs so that they run in parallel.
In more detail, the tutorial objectives are:

  • To give an overview of the syntax of the PyCOMPSs task-based programming model.
  • To demonstrate how to use PyCOMPSs to parallelize applications and run them on distributed platforms.
  • To illustrate how sample benchmarks from linear algebra and big data, as well as real use cases from AI, life and earth sciences, can benefit from PyCOMPSs as a programming model.
  • To give practical insight into how to use the PyCOMPSs programming model with Jupyter notebooks.
  • To give an overview of the PyCOMPSs runtime and how it interacts with clusters, clusters of Docker containers and clouds.

Attendees will learn how to parallelize their Python applications with PyCOMPSs through a simple interface, how to run them on distributed parallel platforms, how to use the integration with Jupyter notebooks, and how to analyze the execution behaviour.


Domains – Big Data, Machine Learning, Parallel computing / HPC
Project Homepage / Git – http://compss.bsc.es
Domain Expertise – none
Python Skill Level – basic
Abstract as a tweet – Easy programming for parallel and distributed Python applications with PyCOMPSs @bsc_compss @bsc_cns