DevConf.US

Truth-seeker: Using LLM agents to build and verify knowledge bases
2024-08-14, Conference Auditorium (capacity 260)

Most researchers agree that quality data is the foundation for building quality LLMs. Truth-seeker uses open source LLMs to run agents that build a knowledge base from a corpus of source documents. The agents break the source documents down into statements that can be evaluated for their veracity. They then build the knowledge base by using the results of search engine queries to score statements according to how well sourced they are, how consistent they are with other parts of the knowledge base, and their classification: "fact", "opinion", "bias", etc. The tool is designed to improve the quality of training data by making it possible to filter out undesirable data and enhance desirable data (e.g., by adding sources).
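The split, score, and filter flow described above can be illustrated with a minimal Python sketch. This is not Truth-seeker's code: every name here (`ScoredStatement`, `split_into_statements`, `combined_score`, `filter_statements`), the 0-to-1 score scale, the equal weights, and the 0.6 threshold are hypothetical. The sketch assumes the LLM agents and search engine queries have already produced the sourcing score, consistency score, and classification label for each statement.

```python
import re
from dataclasses import dataclass


@dataclass
class ScoredStatement:
    """One statement plus the scores an agent might attach (hypothetical schema)."""
    text: str
    sourcing: float     # assumed 0..1: how well sourced the statement is
    consistency: float  # assumed 0..1: agreement with the rest of the knowledge base
    label: str          # e.g. "fact", "opinion", "bias"


def split_into_statements(document: str) -> list[str]:
    """Naive stand-in for the agent step: split on sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def combined_score(s: ScoredStatement,
                   w_sourcing: float = 0.5,
                   w_consistency: float = 0.5) -> float:
    """Weighted average of the two scores; the weights are illustrative assumptions."""
    return w_sourcing * s.sourcing + w_consistency * s.consistency


def filter_statements(statements: list[ScoredStatement],
                      keep_labels: tuple[str, ...] = ("fact",),
                      min_score: float = 0.6) -> list[ScoredStatement]:
    """Keep statements with a wanted label and a high enough combined score."""
    return [
        s for s in statements
        if s.label in keep_labels and combined_score(s) >= min_score
    ]


if __name__ == "__main__":
    texts = split_into_statements("Water boils at 100 C at sea level. I think tea is best!")
    # Scores and labels are hard-coded here; in practice they would come from the agents.
    scored = [
        ScoredStatement(texts[0], sourcing=0.9, consistency=0.8, label="fact"),
        ScoredStatement(texts[1], sourcing=0.9, consistency=0.9, label="opinion"),
    ]
    for s in filter_statements(scored):
        print(s.text)
```

A weighted average is only one way to combine the signals; a real pipeline might instead keep the scores separate and filter on each one independently.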

Jeremy Peterson is a Staff Engineer on the OpenShift team at Red Hat.