Changelog

Track all notable changes, bug fixes, and improvements to AITraining.

2024-12-02

Bug Fix: ORPO Training Beta Parameter Not Applied

Issue: The dpo_beta parameter was not being passed to TRL’s ORPOConfig during ORPO training, causing user-specified beta values to be silently ignored.

Impact: Users setting dpo_beta for ORPO training (e.g., dpo_beta=0.5) would have their setting ignored. ORPO would always use TRL’s default value of 0.1 regardless of user configuration.

Root Cause: In train_clm_orpo.py, the code was missing the line that passes the beta parameter to ORPOConfig:
# Before (bug):
training_args["max_length"] = config.block_size
training_args["max_prompt_length"] = config.max_prompt_length  
training_args["max_completion_length"] = config.max_completion_length
args = ORPOConfig(**training_args)  # beta not passed!

# After (fix):
training_args["max_length"] = config.block_size
training_args["max_prompt_length"] = config.max_prompt_length
training_args["max_completion_length"] = config.max_completion_length
training_args["beta"] = config.dpo_beta  # Now correctly passed
args = ORPOConfig(**training_args)
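
Because ORPOConfig quietly falls back to its own default when beta is omitted, the regression raised no warning or error. A quick standalone check of that fallback, assuming only that trl is installed (the output_dir value here is arbitrary):

from trl import ORPOConfig

# When no beta is supplied, TRL silently falls back to its default
# of 0.1, which is why the dropped parameter went unnoticed.
args = ORPOConfig(output_dir="/tmp/orpo-default-check")
print(args.beta)  # 0.1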
Fix: Added training_args["beta"] = config.dpo_beta to ensure the user’s beta value is passed to ORPO training.

Test Added: New test test_orpo_beta_parameter verifies that different beta values (0.01, 0.1, 0.5) are correctly applied during ORPO training.

Commit: a37e288
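
For reference, a minimal sketch of the kind of check test_orpo_beta_parameter exercises. The actual test lives in the AITraining test suite; the training_args dict below is a hypothetical stand-in for the one train_clm_orpo.py builds from its config, and it assumes trl and pytest are installed:

import pytest
from trl import ORPOConfig

@pytest.mark.parametrize("beta", [0.01, 0.1, 0.5])
def test_orpo_beta_parameter(beta):
    # Hypothetical stand-in for the args assembled in train_clm_orpo.py.
    training_args = {
        "output_dir": "/tmp/orpo-beta-test",
        "max_length": 512,
        "max_prompt_length": 256,
        "max_completion_length": 256,
        "beta": beta,  # mirrors training_args["beta"] = config.dpo_beta
    }
    args = ORPOConfig(**training_args)
    # The user-specified value must reach the TRL config instead of
    # falling back to TRL's default of 0.1.
    assert args.beta == beta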
For questions or issues, please open an issue on GitHub.