Helium [3rd Floor]
RAG-based AI agents fail in production because retrieval without memory is like a conversation with someone who forgets everything you've said. This talk introduces a memory architecture that transforms how you build AI applications with a Python SDK.
Using an open-source Python SDK, cognee, I'll demonstrate how to replace fragile RAG pipelines with a unified memory layer combining knowledge graphs and vector search. You'll see live code showing how 6 lines of Python can give your agents persistent, queryable memory that survives restarts, learns, and improves with interactions.
We'll build a working agent memory system using cognee, Kuzu, LanceDB, and your choice of LLM provider. The graph and vector layers run embedded with zero infrastructure setup: no database servers required. By the end, you'll understand why the future of AI agents isn't better RAG but better memory.
RAG systems treat knowledge as disconnected chunks and rely purely on vector similarity to find relevant context. This works for simple lookups but breaks down when agents need to reason across multiple pieces of information or remember previous interactions. The core issue is architectural: RAG retrofits context onto stateless systems rather than building with memory as a foundation.
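To make the multi-hop problem concrete, here is a minimal sketch in plain Python. It is a toy, not how any production retriever works: the `embed` function is a bag-of-words stand-in for a real embedding model, and the three chunks are made-up example data. The point is only that the chunk holding the answer can share no surface similarity with the question.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up example chunks, stored with no links between them.
chunks = [
    "Alice works at Acme.",
    "Acme headquarters: Munich.",
    "Bob enjoys hiking near the Alps.",
]


def top_k(query: str, k: int = 1) -> list[str]:
    """Rank chunks purely by similarity to the query, as a plain RAG retriever would."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


question = "Which city does Alice work in?"
best = top_k(question, k=1)
answer_chunk_score = cosine(embed(question), embed(chunks[1]))

print(best)                # the 'Alice' chunk wins
print(answer_chunk_score)  # the chunk that mentions Munich scores zero
```

Answering the question needs both the first and the second chunk, but only the first looks similar to the query. Real embeddings are softer than word counts, so treat this as an illustration of the failure mode rather than a benchmark.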
This talk demonstrates an alternative approach using cognee, an open-source Python library that combines knowledge graphs with vector search to create persistent memory. I'll start by showing the basic API, which reduces the typical RAG boilerplate to a few lines of async Python. From there, we'll look at what happens under the hood: how documents get transformed into graph structures, how the ECL pipeline (Extract, Cognify, Load) processes different data types, and how queries traverse both graph relationships and vector similarity.
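The idea of a query that uses both similarity and graph relationships can be sketched in a few lines of plain Python. This is a conceptual toy under stated assumptions, not cognee's implementation or API: the triples are hypothetical output of an extraction step, `seed_entities` uses simple name matching as a stand-in for vector search, and `expand` is an ordinary breadth-first traversal.

```python
import re
from collections import deque

# Hypothetical triples that a graph-extraction step might produce from documents.
triples = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "headquartered_in", "Munich"),
    ("Bob", "hikes_in", "Alps"),
]


def build_graph(triples):
    """Adjacency list: node -> list of (relation, target)."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
        graph.setdefault(obj, [])  # make sure leaf nodes exist too
    return graph


def seed_entities(query, graph):
    """Stand-in for vector search: pick nodes whose name appears in the query."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return [node for node in graph if node.lower() in tokens]


def expand(graph, seeds, max_hops=2):
    """Breadth-first traversal from the seeds, collecting facts within max_hops."""
    facts = []
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for relation, target in graph[node]:
            facts.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts


graph = build_graph(triples)
seeds = seed_entities("Which city does Alice work in?", graph)
facts = expand(graph, seeds, max_hops=2)

for fact in facts:
    print(fact)
```

Starting from the one entity the query mentions, the traversal picks up the employer and then its headquarters, while the unrelated facts about Bob are never touched. In a real system the seeding step would be an embedding lookup and the graph would live in a graph store.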
The live coding portion will build a memory layer using Kuzu for the graph layer and LanceDB for vectors, with a standard LLM API providing inference. The storage stack runs fully embedded - no graph server, no vector database server, nothing to spin up before the demo. I'll walk through adding unstructured data and executing searches that combine graph traversal with semantic matching.
We'll also cover practical considerations: when graph-based retrieval outperforms pure vector search, how to define custom ontologies for domain-specific applications, and the tradeoffs between different strategies. The talk concludes with a brief look at feedback mechanisms that allow the memory layer to improve over time based on user corrections.
Attendees will leave with working code they can run locally and a clear understanding of why a memory-first architecture makes sense compared with RAG approaches.
I joined cognee early to help build that engine, and I've been growing with it since. My corner: growth and developer ecosystem, integrations, technical content, partnerships, community. I like the work that sits between building something and getting it into people's hands - understanding the need, driving adoption, and making complex infrastructure accessible. Before cognee, I was an AI engineering consultant and worked in advanced analytics at an enterprise. I learned a lot about how enterprise teams actually adopt new tech, and that still shapes how I think about developer experience today.
Graduate of the Technical University of Munich (M.Sc.) and Boğaziçi (B.Sc.), and member of the 2hearts community. Based in Munich.