PyCon JP 2025

Pythonic Finance: Analyze Company Fundamentals with SEC EDGAR APIs
2025-09-26, Dahlia 2

In this talk, Python and quantitative methods are used to access, validate, and analyze fundamental financial data from the US Securities and Exchange Commission (SEC) EDGAR system. The SEC's JSON API provides structured financial data, derived from company filings reported in eXtensible Business Reporting Language (XBRL), an international standard for financial reporting. Pydantic is used for robust data validation. Attendees will learn to:

  • Fetch basic metrics
  • Calculate financial ratios
  • Visualize trends
  • Navigate common data challenges

While the talk focuses on US market data, it also briefly explains the international landscape. No finance background is required; basic Python is needed to follow the data-processing parts.
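A minimal sketch of the fetching step, assuming the `requests` library and the SEC's public companyfacts endpoint (`https://data.sec.gov/api/xbrl/companyfacts/CIK##########.json`, where the CIK is zero-padded to 10 digits). The SEC asks automated clients to identify themselves with a descriptive `User-Agent` header; the name and e-mail below are placeholders, and the helper names `companyfacts_url` and `fetch_companyfacts` are hypothetical.

```python
from typing import Any, Union

import requests

# Placeholder identity: the SEC asks automated tools to declare who they are.
# Replace with your own name and contact e-mail before making real requests.
HEADERS = {"User-Agent": "Your Name your.email@example.com"}


def companyfacts_url(cik: Union[int, str]) -> str:
    """Build the companyfacts URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def fetch_companyfacts(cik: Union[int, str], timeout: float = 30.0) -> dict[str, Any]:
    """Download all XBRL-derived facts a company has reported (hypothetical helper)."""
    response = requests.get(companyfacts_url(cik), headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.json()
```

For example, `fetch_companyfacts(320193)` would request Apple Inc.'s facts (CIK 0000320193). The SEC also publishes a fair-access limit of roughly 10 requests per second, so caching responses locally is kinder than re-downloading them in a loop.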


Target

  • Who: This talk is designed for anyone interested in using Python to understand public company financials, from students, programmers, and hobbyists to experienced working professionals. As mentioned, no finance background is required. Beginner-level Python (variables, functions, lists, using libraries) is enough to follow how the financial data is processed into metrics.

  • What: Attendees will get a brief introduction to XBRL and learn hands-on techniques for getting data from SEC EDGAR's companyfacts API, combining data science and software engineering to validate the data with Pydantic, extracting key fundamental metrics from the API responses, calculating basic financial ratios, visualizing financial trends, and handling the nuances of working with public financial data APIs, including those outside the US.

  • How: The talk mixes explanations of core financial concepts with live Python demonstrations in a Jupyter Notebook / Google Colaboratory (a fallback is prepared in case of network errors). It walks through the entire process: selecting a company, ensuring that its data is valid, calculating financial metrics and ratios, and visualizing its financial health.

Scope

  • I am NOT a financial advisor, and this talk is for educational purposes only. I will NOT promote or recommend any specific assets, companies, products, or strategies, and this talk is NOT to be construed as financial, investment, or trading advice
  • Advanced financial modeling, company valuations, and comprehensive ratio analysis are outside the talk's scope
  • This talk does not dive deep into parsing the XBRL format
  • This talk does not cover how to buy and/or sell securities
  • All examples in this talk use historical data; past performance is not indicative of future performance

Outline

Planned outline of the presentation:

  • Introduction and Disclaimer (~3 minutes)
    • Quick self-introduction and an important disclaimer: I am not a financial advisor, and this talk is not financial advice
    • Why analyze company fundamentals with Python?
  • The Data: SEC EDGAR, XBRL, and APIs (~4 minutes)
    • Overview of the SEC, the EDGAR system, and U.S. company filings (such as the annual 10-K report)
    • Brief introduction to XBRL as the structured data standard
    • Focus on the companyfacts JSON API for XBRL-derived data
  • Real-Life Scenario: Ensuring Data Quality with Pydantic (~3 minutes)
    • Why Pydantic, plus a quick overview of the Pydantic models for the API response
  • Fetching and Validating Company Data (~3 minutes)
    • Using requests and Pydantic to fetch and validate data for a sample U.S. company
    • Handling potential API or validation issues
  • Financial Metrics: Explanation & Extraction (~7 minutes)
    • Explanation of basic, core financial metrics: Revenue, Net Income, Assets, Liabilities, Equity
    • Extraction of the annual data and building a pandas DataFrame to showcase it
    • Adapting to different XBRL tags (and therefore different JSON properties) for the same financial concept, since not all companies report with the same schema
  • Calculation of Financial Ratios (~3 minutes)
    • Explanation of financial ratios: Net Profit Margin, Debt-to-Equity Ratio
    • Calculation of the ratios
  • Visualizing Trends (~2 minutes)
    • Generating and showcasing plots of the metrics and ratios with matplotlib / seaborn
  • Internationalization and Data Nuances (~2 minutes)
    • How other countries use different systems (for example, Japan's EDINET) that are built on the same XBRL standard, showing that the skills and knowledge are transferable
  • Key Takeaways, Recap, Closing (~2 minutes)
    • Summary of the process, the tools, suggestions for future exploration
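A minimal sketch of the validation and extraction steps, assuming Pydantic v2 and the general shape of a companyfacts response (`cik`, `entityName`, and `facts` grouped by taxonomy such as `us-gaap`, then by concept, then by unit). The model names, `annual_series`, and the `REVENUE_TAGS` list are hypothetical and illustrative, not exhaustive. Because a 10-K also repeats prior-year comparatives, the sketch keys values by period end date and keeps the most recently filed value, rather than keying by the filing's `fy`.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field


class FactValue(BaseModel):
    """One reported value of a concept, as it appears in companyfacts."""
    start: Optional[date] = None  # present only for duration facts (e.g. revenue)
    end: date
    val: float
    accn: str
    fy: Optional[int] = None
    fp: Optional[str] = None
    form: str
    filed: date
    frame: Optional[str] = None


class Concept(BaseModel):
    label: Optional[str] = None
    description: Optional[str] = None
    units: dict[str, list[FactValue]]  # e.g. {"USD": [...]}


class CompanyFacts(BaseModel):
    cik: int
    entity_name: str = Field(alias="entityName")
    facts: dict[str, dict[str, Concept]]  # taxonomy -> concept name -> Concept


# Illustrative fallbacks: different companies (and years) use different tags.
REVENUE_TAGS = [
    "Revenues",
    "RevenueFromContractWithCustomerExcludingAssessedTax",
    "SalesRevenueNet",
]


def annual_series(
    company: CompanyFacts, tags: list[str], unit: str = "USD"
) -> tuple[Optional[str], dict[date, float]]:
    """Return (tag used, {fiscal period end: value}) taken from 10-K filings.

    Tries each candidate tag in order and uses the first one that yields data.
    """
    gaap = company.facts.get("us-gaap", {})
    for tag in tags:
        concept = gaap.get(tag)
        if concept is None or unit not in concept.units:
            continue
        latest: dict[date, FactValue] = {}
        for fact in concept.units[unit]:
            if fact.form != "10-K":  # amended 10-K/A filings skipped for simplicity
                continue
            # Duration facts must cover roughly a full year; this drops quarterly
            # figures. Instant facts (e.g. assets) have no start date.
            if fact.start is not None and (fact.end - fact.start).days < 350:
                continue
            # Keep the most recently filed value for each period end.
            previous = latest.get(fact.end)
            if previous is None or fact.filed > previous.filed:
                latest[fact.end] = fact
        if latest:
            return tag, {end: f.val for end, f in sorted(latest.items())}
    return None, {}
```

With the raw JSON in hand (for example, `raw = response.json()`), usage would be `company = CompanyFacts.model_validate(raw)` followed by `annual_series(company, REVENUE_TAGS)`. A `pydantic.ValidationError` at the first step signals that the response did not match the expected shape, and the returned tag name shows which fallback was actually used.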

Why did you choose this topic?:

My finance journey started around the same time I discovered computer programming. I started programming in the final year of junior high school, and I made my first investment in the second year of senior high school. With my savings, I purchased my first stock, then watched it fall. As a high schooler experiencing a market downturn for the first time, I was shocked: the drop slashed my net worth and left me with a slight fear of the finance world.

Years later, after entering the workforce and meeting many people, I realized that many of them find the finance world intimidating and share the fear I once had. Finance is often perceived as risky and confusing, full of "get rich quick" schemes and scams, inaccessible data, and difficult jargon. This reputation makes it feel intimidating and out of reach, even though financial literacy is essential for everyone.

Understanding the fundamentals of public companies (how they operate, how they perform, and how they report their performance) is relevant in our data-driven world, whether for professional development or personal interest. I realized that Python can act as a bridge, turning the task of reading through companies' official disclosures into an accessible, quantitative, and insightful process. In short, these are the goals of the session:

  • Show how public companies report their financial state
  • Show that anyone with basic Python skills can start learning from data taken directly from the official source
  • Show that while we focus on US data, the core Python and finance knowledge for data acquisition, validation, and analysis is valuable and transferable internationally
  • Personally, I'd like to help improve general financial literacy

I want attendees to leave feeling more confident and equipped to use Python as a tool to better understand the financial information that "shapes" our world.

Knowledge and know-how the audience can get from your talk:

From my talk, I hope the audience will leave the room with the following knowledge (among other things):

  • An understanding of how to access and validate US public company financial data from the SEC EDGAR API
  • Practical techniques to extract key fundamental metrics, and how to visualize the data in the form of charts
  • The ability to calculate and interpret basic financial ratios
  • An understanding of how financial data is structured globally (different variations and schemas): while specific APIs differ from country to country, the core skills are transferable
  • Practical Python for a finance-domain problem: analyzing, and potentially automating, company fundamental analysis with public data, and promoting data-driven approaches to understanding companies
  • Cross-domain knowledge sharing between software engineering and finance: using Pydantic (a data validation library popular for building web applications) to validate financial data as well
  • While the presentation focuses on getting data from the SEC API (for convenience), it also gives a brief introduction to XBRL (technically an XML-based language) as extra knowledge for Python enthusiasts
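The two ratios in the outline reduce to simple divisions: Net Profit Margin = Net Income / Revenue, and Debt-to-Equity = Total Liabilities / Shareholders' Equity. A minimal sketch, assuming each metric has already been extracted into a dictionary keyed by fiscal period end date; the function names are hypothetical, and computing Debt-to-Equity from total liabilities is one common definition (some analysts use only interest-bearing debt instead).

```python
from datetime import date
from typing import Optional


def net_profit_margin(net_income: float, revenue: float) -> Optional[float]:
    """Net income as a fraction of revenue (0.15 means 15%)."""
    if revenue == 0:
        return None  # undefined without revenue
    return net_income / revenue


def debt_to_equity(total_liabilities: float, equity: float) -> Optional[float]:
    """Total liabilities per unit of shareholders' equity.

    Returns None when equity is zero or negative, where the ratio is not meaningful.
    """
    if equity <= 0:
        return None
    return total_liabilities / equity


def ratio_rows(
    revenue: dict[date, float],
    net_income: dict[date, float],
    liabilities: dict[date, float],
    equity: dict[date, float],
) -> list[dict]:
    """One row per period that has all four metrics, oldest first."""
    periods = set(revenue) & set(net_income) & set(liabilities) & set(equity)
    return [
        {
            "period": p,
            "net_profit_margin": net_profit_margin(net_income[p], revenue[p]),
            "debt_to_equity": debt_to_equity(liabilities[p], equity[p]),
        }
        for p in sorted(periods)
    ]
```

Passing the rows to `pandas.DataFrame(rows)` gives a table that is ready for plotting. Returning `None` instead of raising keeps a single unusual year (for example, negative equity) from breaking the whole table.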

Prior knowledge the speaker assumes the audience has:

No finance background is required; basic Python is required to understand how financial data is processed to output metrics.

Audience experience:

Beginner

Language of presentation:

English

Language of presentation material:

English

Nicholas is a software engineer based in Japan. Originally hailing from Indonesia, he spends his weekends exploring Tokyo's neighborhoods, hiking local mountains, and reading articles about tech, finance, or anything that sparks his curiosity. Nicholas is passionate about building things that make his life a bit easier, and he's always up for good views of the city.