Python Conference APAC 2024

Web Scraping Made Easy with Scrapy
2024-10-25, Workshop Class #2
Language: English

The World Wide Web contains a vast amount of interesting data. Unfortunately, most of that data is presented as HTML (HyperText Markup Language) documents intended for direct human consumption, not for computers. To enable computers to process the data, we need to extract it from the HTML documents. This process is commonly referred to as "web scraping".
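To make the idea of extracting data from HTML concrete, here is a minimal sketch that uses only Python's standard-library html.parser module rather than Scrapy. The sample markup, the class names ("quote", "text", "author"), and the QuoteParser and extract_quotes names are all made up for illustration; real pages will be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for this illustration.
SAMPLE_HTML = """
<html><body>
  <div class="quote"><span class="text">First quote</span><small class="author">Alice</small></div>
  <div class="quote"><span class="text">Second quote</span><small class="author">Bob</small></div>
</body></html>
"""


class QuoteParser(HTMLParser):
    """Collect text/author pairs from elements marked with known classes."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per quote found so far
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "quote":
            # A new quote container starts a new record.
            self.records.append({})
        elif css_class in ("text", "author") and self.records:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a field we care about.
        if self._field is not None:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def extract_quotes(html):
    """Turn an HTML string into a list of plain dictionaries."""
    parser = QuoteParser()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    for record in extract_quotes(SAMPLE_HTML):
        print(record)
```

Even for this tiny page, the parser needs hand-written state tracking, and it does no downloading, retrying, or link following. That bookkeeping is the kind of lower-level work a scraping framework is meant to take over.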

This workshop aims to demonstrate how web scraping tasks can be made easy with Scrapy. Scrapy is an open-source web scraping framework written in Python. It allows developers to focus on developing web crawlers without being bothered by lower-level details such as managing HTTP request scheduling and concurrency. We will use Scrapy to extract data from toscrape.com, a web scraping sandbox that anyone can use to learn web scraping. Participants will gradually learn how to perform web scraping, starting from simple tasks like extracting data from a single web page and moving on to more complex tasks such as extracting data from AJAX endpoints.

The target participants of this workshop are individuals with basic programming skills (not necessarily in Python) who understand the basic concepts of HTTP (Hypertext Transfer Protocol) and HTML document structure.

Prerequisites

  • Python
  • Scrapy

Team Leader - Zyte