Detoxification of LLMs using TrustyAI Detoxify and HuggingFace SFTTrainer Devconf.US

Detoxification of LLMs using TrustyAI Detoxify and HuggingFace SFTTrainer
2024-08-16 10:30–11:05, Conference Auditorium (capacity 260)

Detoxification of large language models (LLMs) is challenging because it requires curating high-quality, annotated data that aligns with human values. The standard protocol for LLM detoxification is to perform prompt tuning and then supervised fine-tuning on a pretrained model. While HuggingFace’s Supervised Fine-tuning Trainer (SFTTrainer) streamlines this protocol, it still requires high-quality, human-aligned training data, which is expensive to curate. TrustyAI Detoxify is an open source library for scoring and rephrasing toxic content generated by LLMs.

During this talk, Christina will show how TrustyAI Detoxify can be used to rephrase toxic content for supervised fine-tuning. Attendees will learn the capabilities of TrustyAI Detoxify and how it can be combined with HuggingFace’s SFTTrainer to optimize detoxification.
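
The score-then-rephrase step described above can be pictured with a minimal sketch. It is not the talk's code and does not call TrustyAI Detoxify or SFTTrainer: `toy_score`, `toy_rephrase`, `build_sft_records`, the word list, the threshold, and the prompt/completion record shape are all hypothetical stand-ins chosen for illustration. A real pipeline would presumably swap the two toy functions for a model-based scorer and rephraser and hand the resulting records to a trainer.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon; a real toxicity scorer would be a model, not a word list.
TOXIC_WORDS = {"stupid": "poor", "idiot": "person"}
PUNCT = ".,!?"


def toy_score(text: str) -> float:
    """Stand-in scorer: fraction of tokens that appear in the toy lexicon."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(tok.strip(PUNCT) in TOXIC_WORDS for tok in tokens)
    return hits / len(tokens)


def toy_rephrase(text: str) -> str:
    """Stand-in rephraser: replace lexicon words with milder alternatives."""
    out = []
    for tok in text.split():
        core = tok.strip(PUNCT)
        replacement = TOXIC_WORDS.get(core.lower())
        out.append(tok.replace(core, replacement) if replacement else tok)
    return " ".join(out)


def build_sft_records(
    pairs: List[Tuple[str, str]],
    score: Callable[[str], float],
    rephrase: Callable[[str], str],
    threshold: float = 0.1,  # assumed cut-off, not taken from the talk
) -> List[Dict[str, str]]:
    """Rephrase completions scored above the threshold; keep the rest unchanged."""
    records = []
    for prompt, completion in pairs:
        if score(completion) > threshold:
            completion = rephrase(completion)
        records.append({"prompt": prompt, "completion": completion})
    return records


pairs = [
    ("Describe my plan.", "That is a stupid plan."),
    ("Say hi.", "Hello there."),
]
records = build_sft_records(pairs, toy_score, toy_rephrase)
for record in records:
    print(record)
```

The point of the sketch is the data flow: only completions flagged by the scorer are rewritten, so the fine-tuning set keeps its clean examples untouched and gains detoxified versions of the flagged ones without manual annotation.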
