Devconf.US

Detoxification of LLMs using TrustyAI Detoxify and HuggingFace SFTTrainer
08-16, 10:30–11:05 (US/Eastern), Conference Auditorium (capacity 260)

Detoxifying large language models (LLMs) is challenging because it requires curating high-quality annotated data that aligns with human values. The standard protocol for LLM detoxification is to perform prompt tuning and then supervised fine-tuning on a pretrained model. While HuggingFace’s Supervised Fine-tuning Trainer (SFTTrainer) streamlines this protocol, it still requires high-quality, human-aligned training data, which is expensive to curate. TrustyAI Detoxify is an open-source library for scoring and rephrasing toxic content generated by LLMs.

During this talk, Christina will show how TrustyAI Detoxify can be used to rephrase toxic content for supervised fine-tuning. Attendees will learn what TrustyAI Detoxify can do and how to combine it with HuggingFace’s SFTTrainer to optimize detoxification.
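The data-preparation idea described above (score each training completion, rephrase the toxic ones, then fine-tune on the cleaned pairs) can be sketched roughly as follows. This is a minimal illustration only, not the method presented in the talk: it calls neither TrustyAI Detoxify nor SFTTrainer. The names `toy_score`, `toy_rephrase`, `build_sft_records`, and the `TOXIC_WORDS` table are hypothetical stand-ins; in a real workflow the scorer and rephraser would presumably come from TrustyAI Detoxify and the resulting records would be handed to SFTTrainer.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in lexicon; a real scorer would be a trained model.
TOXIC_WORDS = {"stupid": "unwise", "idiot": "person"}


def _split(token: str) -> Tuple[str, str]:
    """Separate a token from its trailing punctuation."""
    core = token.rstrip(".,!?")
    return core, token[len(core):]


def toy_score(text: str) -> float:
    """Stand-in toxicity score: fraction of tokens found in the lexicon."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = sum(_split(t)[0].lower() in TOXIC_WORDS for t in tokens)
    return hits / len(tokens)


def toy_rephrase(text: str) -> str:
    """Stand-in rephraser: swap listed words for milder alternatives."""
    out = []
    for token in text.split():
        core, tail = _split(token)
        out.append(TOXIC_WORDS.get(core.lower(), core) + tail)
    return " ".join(out)


def build_sft_records(
    samples: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
    rephrase: Callable[[str], str],
    threshold: float = 0.0,
) -> List[Dict[str, str]]:
    """Rephrase completions scoring above the threshold; keep the rest as-is."""
    records = []
    for prompt, completion in samples:
        if score(completion) > threshold:
            completion = rephrase(completion)
        records.append({"prompt": prompt, "completion": completion})
    return records
```

Passing the scorer and rephraser in as callables keeps the filtering logic independent of any particular library, so the stand-ins can be replaced without touching `build_sft_records`.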

See also: Slide deck (2.9 MB)

Christina is an Associate Software Engineer on the TrustyAI team at Red Hat. Her work focuses on designing and implementing explainability algorithms for large language models and other machine learning models.