2025-12-10 – Deborah Sampson
What does the JupyterLab extension ecosystem actually look like in 2025? While extensions drive much of JupyterLab's practical value, the overall landscape remains largely unmapped. This talk analyzes public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health: monthly downloads by category, release recency, star-download relationships, and the rise of AI-focused extensions. I will present my approach for building this analysis pipeline and offer lessons learned. Finally, I will demonstrate an open, read-only web catalog built on this data set.
What & why
JupyterLab extensions drive much of the practical value for data teams, but discovery and evaluation are noisy. This talk uses public data (PyPI/BigQuery + GitHub) to measure the ecosystem with transparent signals, not black boxes.
Scope
We quantify growth over time, category composition, release cadence, and popularity patterns. We examine where simple signals (30-day vs. all-time downloads, stars/issues, “updated X days ago”) correlate—and where they don’t—then zoom in on the AI segment to separate durable adoption from hype.
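One way to probe whether two simple signals agree is a rank correlation. The sketch below computes a Spearman coefficient on made-up star and download figures (the numbers are illustrative, not real extension data), using only the standard library:

```python
# Sketch: do two popularity signals (e.g. GitHub stars vs. 30-day
# downloads) move together? Spearman rank correlation on toy numbers.

def ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy figures, not real extension data:
stars = [12, 350, 90, 1200, 40]
downloads_30d = [300, 8000, 2500, 25000, 900]
print(round(spearman(stars, downloads_30d), 2))  # → 1.0 on this toy data
```

A coefficient near 1 means the signals rank projects the same way; in practice the talk shows where they diverge.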
Methods
- Data: PyPI download events (BigQuery public dataset), project metadata, and GitHub repository stats.
- Processing: Daily summaries to control cost; package↔repo mapping; cautious enrichment under rate limits.
- Caveats: Downloads ≠ installs; stars ≠ quality. We treat them as proxies and show their limits.
- Reproducibility: A read-only aggregate snapshot and the core queries will be shared so attendees can reproduce/extend the analysis.
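Two of the steps above can be sketched concretely. The code below is a minimal illustration, assuming the `project_urls` dict shape of the PyPI JSON API and the `bigquery-public-data.pypi.file_downloads` public table; the function names are mine, not the talk's actual pipeline:

```python
# Sketch of two pipeline steps: (1) best-effort package->repo mapping
# from PyPI metadata, (2) building a cost-controlled daily-summary
# query against the public PyPI downloads table in BigQuery.
import re

def github_slug(project_urls):
    """Extract an owner/repo slug from a PyPI project_urls dict, or None."""
    pattern = re.compile(r"github\.com/([\w.-]+)/([\w.-]+?)(?:\.git)?/?$")
    for url in (project_urls or {}).values():
        m = pattern.search(url)
        if m:
            return f"{m.group(1)}/{m.group(2)}"
    return None

def daily_downloads_query(packages, start, end):
    """One aggregated row per (day, project). Aggregating daily keeps
    scanned bytes (and so BigQuery cost) far below per-event pulls."""
    names = ", ".join(f"'{p}'" for p in packages)
    return (
        "SELECT DATE(timestamp) AS day, file.project AS project, "
        "COUNT(*) AS downloads "
        "FROM `bigquery-public-data.pypi.file_downloads` "
        f"WHERE file.project IN ({names}) "
        f"AND DATE(timestamp) BETWEEN '{start}' AND '{end}' "
        "GROUP BY day, project"
    )

urls = {"Source": "https://github.com/jupyterlab/jupyterlab-git"}
print(github_slug(urls))  # → jupyterlab/jupyterlab-git
print(daily_downloads_query(["jupyterlab-git"], "2025-01-01", "2025-01-31"))
```

The real pipeline adds GitHub enrichment under rate limits and handles packages whose metadata points nowhere; this sketch only shows the shape of the two deterministic steps.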
Companion catalog (brief demo)
We’ll spend a few minutes in a companion, open web catalog (open-source client; public read-only API) to show how practitioners can search by category, slice by recency/downloads/stars, and copy install commands—using the same dataset.
Takeaways
1. How the ecosystem is growing and which categories lead.
2. What “healthy” looks like (recency, contributor activity).
3. Evidence on AI-extension adoption vs. attention.
4. A lightweight recipe to measure other plugin ecosystems.
Outline with time breakdown (40 min incl. Q&A)
- 0–5: Why measure extensions? Signals vs. noise; data sources; caveats
- 5–15: Trends in the data (growth & composition, momentum/health, growth of AI)
- 15–20: Companion catalog live demo (search/sort; copy install; how signals map)
- 20–30: Methods & reproducibility (pipeline thumbnail, queries, snapshot/API, limits)
- 30–40: Q&A
Audience & prerequisites
Data scientists/engineers, OSS maintainers, and tool builders. Comfortable with Python/SQL basics; no Jupyter internals required.