The web is broken, and we learned that the hard way. Developers tend to hack, and hacks tend to break the web. In this talk, I share what we learned about how websites fail to obey the protocols and how developers have turned the web into a chaotic medium.
At Prisync, we have been crawling a large portion of the web every day for 6 years. At first we approached the problem naively, but experience taught us otherwise. Developers create workarounds and hacks all the time, and doing so has consequences that are most probably unintended. Some of the glitches we have experienced so far:
There are:
- websites not responding properly
- websites responding with different output to identical requests
- websites not responding at all
- websites not obeying HTTP at all
- websites with broken firewall rules
- websites served on archaic web servers that are not even aware of the current state of the transfer protocol
- websites taking advantage of vulnerabilities (a.k.a. "clever hacks")
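Several of the glitches above can be detected mechanically. The following is a minimal sketch, not code from the talk or from Prisync: the function names `check_response` and `same_output` are hypothetical, and it assumes the crawler already has the status code, the headers, and the raw body bytes exactly as received (no decompression applied).

```python
import hashlib
from typing import List, Mapping


def check_response(status: int, headers: Mapping[str, str], body: bytes) -> List[str]:
    """Return a list of human-readable problems found in one HTTP response.

    Hypothetical helper: `body` is assumed to be the raw bytes as received.
    """
    problems = []
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # HTTP status codes are three-digit numbers from 100 to 599.
    if not 100 <= status <= 599:
        problems.append(f"status code {status} is outside the valid range")

    # A declared Content-Length should be numeric and match the body size.
    declared = lowered.get("content-length")
    if declared is not None:
        try:
            declared_size = int(declared)
        except ValueError:
            problems.append("Content-Length is not a number")
        else:
            if declared_size != len(body):
                problems.append("Content-Length does not match the body size")

    # "Not responding properly": success status but nothing was sent.
    if status == 200 and not body:
        problems.append("200 OK with an empty body")

    # A body without a Content-Type forces the client to guess the format.
    if body and "content-type" not in lowered:
        problems.append("body sent without a Content-Type header")

    return problems


def same_output(first_body: bytes, second_body: bytes) -> bool:
    """Compare the bodies of two responses to identical requests by hash."""
    first = hashlib.sha256(first_body).digest()
    second = hashlib.sha256(second_body).digest()
    return first == second
```

A crawler could call `check_response` on every fetch and log the returned list, and call `same_output` on two fetches of the same URL to flag pages that answer identical requests differently. These checks are deliberately conservative; a real crawler would also need to account for HEAD requests, chunked transfer, and compression.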
In this talk, I share examples of those "hacks" and propose some methods to keep the web healthy.
Business & Start-Ups, Big Data, Infrastructure, Web, Data Engineering
Domain Expertise: some
Python Skill Level: none
Link to talk slides: https://docs.google.com/presentation/d/1nSA0KtV1nVK7v6HKw4-EUrCGc9TCcsP06R7gm9_N8dg/edit?usp=sharing
Abstract as a tweet: We broke the web via simple hacks. Instead of order, we caused chaos. How to fix that?
Public link to supporting material: https://docs.google.com/presentation/d/1nSA0KtV1nVK7v6HKw4-EUrCGc9TCcsP06R7gm9_N8dg/edit?usp=sharing