John Dang
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, …
John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker