Ethical and Responsible AI: Tracing model provenance and AI-generated code
2025-10-01, Artificial Intelligence

The rapid adoption of AI-assisted coding tools like GitHub Copilot and ChatGPT has accelerated software development, but it has also introduced significant risks. Developers may unknowingly use AI-generated code that violates licensing restrictions or includes vulnerable third-party dependencies.
Identifying AI-generated code is essential to using it responsibly while still enjoying the productivity gains.

In this talk, Philippe will share a new approach, based on open source tools and open data, to identify and locate AI-generated code in software projects and products, enabling safer, more efficient, and more responsible and ethical use of that code.


Can AI-generated code be copied mostly as-is from the open source code used to train its backing LLM? If so, we have a new problem with license compliance and the introduction of security bugs. This talk explores how detection works and the implications of using LLMs to generate code for licensing, security, and the future of open source.

Generative AI engines and Large Language Models (LLMs) are often trained on publicly available, free and open source (FOSS) code, so AI-generated code can inherit the licenses and vulnerabilities of the FOSS code used for its training. Why? Because LLMs memorize parts of their training data and can be prompted to reproduce them.

Identifying AI-generated code is essential to ensure responsible and compliant use of that code while enjoying the productivity gains. Philippe will demonstrate how open source tools can enable developers to verify compliance and security before integrating AI-generated code into their projects, minimizing the risks for organizations that use AI-assisted coding tools. Beyond generated code, Philippe will also present a new project to detect whether GenAI is used as a feature in a software product, to understand which models and data are at play, and to enable responsible AI.
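As a rough illustration of the kind of pre-integration check this approach enables (this sketch is not from the talk), the short Python script below runs the ScanCode Toolkit CLI over a directory of AI-assisted code and flags files whose detected licenses fall outside an example allowlist. The ai_generated/ path, the allowlist, and the policy logic are illustrative assumptions; scancode-toolkit must be installed, and the exact JSON field names depend on the ScanCode version.

import json
import subprocess

# Example license policy; purely illustrative, not a recommendation.
ALLOWED = {"mit", "apache-2.0", "bsd-new"}

def scan_and_check(path="ai_generated/", results="scan.json"):
    # Run ScanCode license and copyright detection, writing pretty-printed JSON.
    subprocess.run(
        ["scancode", "--license", "--copyright", "--json-pp", results, path],
        check=True,
    )
    with open(results) as f:
        data = json.load(f)

    flagged = []
    for entry in data.get("files", []):
        # ScanCode v32+ reports one license expression per file as
        # "detected_license_expression"; older versions use different fields.
        expression = entry.get("detected_license_expression") or ""
        licenses = {
            token
            for token in expression.replace("(", " ").replace(")", " ").split()
            if token.lower() not in {"and", "or", "with"}
        }
        if licenses - ALLOWED:
            flagged.append((entry["path"], expression))
    return flagged

if __name__ == "__main__":
    for file_path, expression in scan_and_check():
        print(f"review needed: {file_path} -> {expression}")

In a real pipeline a check like this would typically run in CI, and flagged files would also be cross-referenced against vulnerability data (for example via Package-URL and VulnerableCode) before the code is merged.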


Philippe Ombredanne is a FOSS hacker passionate about enabling easier and safer reuse of open source code. He is the lead maintainer of the AboutCode stack of open source tools for Software Composition Analysis and license and security compliance, including the industry-leading ScanCode, DejaCode, PurlDB, Package-URL, and VulnerableCode. Philippe contributes to other open source projects, including the Linux kernel SPDX-ification, SPDX, ClearlyDefined, strace, ORT, and several Python tools.
