2021-02-20, 16:40–17:05 (US/Pacific), Prerecorded Talks
The HTTP client library ‘urllib3’, the most downloaded Python package on the Python Package Index, is shipping breaking changes for the first time in over 9 years. The project is an invisible dependency to the vast majority of its users all while doing the heavy lifting for widely-deployed projects like pip, Requests, Selenium, Boto, and more. How can a project in this situation ship breaking changes without breaking the Python-verse? In this talk we’ll discuss what the breaking changes are, how we’re minimizing impact to users for each change, and what these changes mean for the future of urllib3.
What is urllib3? (7 minutes)
The talk will start with a brief introduction to what urllib3 is, what it accomplishes, and why it was created:
“urllib3 is a tool that speaks a special language that lets you download puppy pictures from the Internet”. That language that urllib3 speaks is “HTTP”. HTTP is one of the most important network protocols ever invented, so much so that engineers are rewriting TCP just so cellphones can speak HTTP better and DNS is done over HTTPS because it’s easier to secure HTTP than it is to secure DNS. HTTP is ubiquitous and it’s likely not going away any time soon.
So then it matters that Python’s HTTP story is a very good one. In the beginning Python’s included HTTP library had a few key missing features, like supporting HTTPS for security, redirects, connection pooling, thread safety, multipart encoding, retries, and compression. All of these features are necessities of an HTTP library and when urllib3 was first written these things weren’t so easy with the standard library, so Andrey created urllib3 in 2008.
Around that same time came Python 3, and things became even tougher because the HTTP library had changed between Python 2 and 3, along with Python’s handling of binary strings. urllib3 juggles all of these nuances to make HTTP easy for users, especially those who were trying to write libraries that supported both Python 2 and 3 to make the migration between the two major versions smoother.
Now that 12 years have passed, urllib3 is as ubiquitous within the Python ecosystem as HTTP. The package was downloaded over a billion times in the past 12 months, about 3 million times per day. And with Python 2 riding off into the sunset, our team is looking to solve new challenges coming for Python and HTTP.
What features are coming in urllib3 v2? (8 minutes)
The biggest change coming to urllib3 v2 is that we’re removing support for the end-of-life Python 2. A lot of our code and complexity exists only to keep supporting Python 2. This complexity makes new features harder to add, makes the library less performant, and makes drastic changes to the codebase impossible. We’re very much looking forward to this change, as it means that new features should be easier to implement in the future and that less effort should be required to keep our continuous integration and development dependencies happy.
urllib3 is implemented by subclassing many of the components of Python’s standard library HTTP client. This means in a lot of ways we’re tied to the standard library’s HTTP implementation. Because the standard library only speaks HTTP/1 and is slightly less efficient at speaking HTTP than other HTTP implementations, we’d like to start exploring integrating alternate HTTP implementations into urllib3. The first step on this journey is defining very clearly what API we support, and to do that we’ll use type hints. urllib3 doesn’t have any type hints currently, and isn’t supported by typeshed, so you’re likely having to tell Mypy to ignore urllib3 if you write any code using urllib3. We’d like to change that by providing a completely type-hinted API that covers only the APIs we’re planning on supporting in the future, in order to make room for progress.
The last major change is an improvement of default security. Currently we configure old TLS versions and ciphers in order to keep 100% compatibility with servers that were deployed and went untouched for 10+ years. Unfortunately, maintaining these defaults comes at the cost of security for users whose requests go to servers that support modern TLS and ciphers. It’s time to move the line forwards, as has been done with many other clients. We’re planning on making TLS 1.2+ the new default, making our list of ciphers more secure, and dropping support for checking the deprecated “commonName” field on certificates.
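As a rough illustration of what those stricter defaults mean in practice, here is a minimal sketch using only Python’s standard library `ssl` module. It is not urllib3’s actual implementation; the function name `make_strict_context` is hypothetical and exists only for this example.

```python
import ssl


def make_strict_context() -> ssl.SSLContext:
    """Hypothetical helper: build a client context with stricter TLS settings."""
    # Start from the standard library's recommended client defaults,
    # which enable certificate verification and hostname checking.
    ctx = ssl.create_default_context()

    # Refuse to negotiate anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2

    # Stop falling back to the deprecated commonName field when matching
    # hostnames, if the linked OpenSSL supports turning that off.
    if ssl.HAS_NEVER_CHECK_COMMON_NAME:
        ctx.hostname_checks_common_name = False

    return ctx


context = make_strict_context()
print(context.minimum_version.name)
```

A context like this can already be handed to current urllib3 releases through the `ssl_context` argument of `PoolManager`, which is one way to try stricter settings before they become the default.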
What are we doing to not break the Python-verse? (10 minutes)
We’re continuing to support Python 2 in the 1.26 release stream with bug and security fixes for at least a year thanks to Tidelift. This means that users that aren’t able to switch off of Python 2 are still safe for now! We still recommend users migrate to urllib3 v2 and are trying to make sure migration is as low-effort as possible to reduce friction.
Type hints don’t affect code execution or restrict urllib3’s API; they only define the API that we’re planning on supporting in the future. The only thing in danger of breaking when we ship type hints is code that is “ignoring” mypy not being able to find type hints for urllib3. Shipping these type hints now gives libraries and users lots of time to fix their code, so in the next major version of urllib3 we can be more confident in shipping changes that break APIs.
TLS 1.2 is already becoming a requirement of the OpenSSL versions shipped by many Linux distributions (Ubuntu, Debian, likely others). 95% of websites support TLS 1.2 and above, so almost everyone will have no problem with the default TLS version changing. For users that are communicating with a website that only supports TLS 1.1 or earlier, we’re improving the readability of error messages specifically for TLS. SSLErrors are notoriously difficult to read. In the new version of urllib3, some SSLErrors will have much better messages and documentation around what they mean and how you can mitigate the issue while remaining secure.
We do downstream integration testing against all releases of urllib3. With v2 we’re planning on doing even more downstream testing of libraries that depend on urllib3 to make sure we’re not accidentally breaking something. The first pre-release of urllib3 will also be given lots of time to bake before we ship v2.0.0. We’ve already contacted many widely-used dependents of urllib3 to give them advance warning of this release.
Seth has been the lead maintainer of urllib3 since 2019 and a contributor to the project since 2016. He maintains a handful of other Python Open Source projects. Outside of Open Source, Seth works as a Software Engineer at Elastic, maintaining Python packages for many Elastic services as well as teaching and being an advocate, both internally and externally, for Python best practices. Seth is a Minnesota native and enjoys watching football, gardening, and being in nature.