Changelog
Track all notable changes, bug fixes, and improvements to AITraining.
2024-12-02
Bug Fix: ORPO Training Beta Parameter Not Applied
Issue: The dpo_beta parameter was not passed to TRL’s ORPOConfig during ORPO training, so user-specified beta values were silently ignored.
Impact: Users setting dpo_beta for ORPO training (e.g., dpo_beta=0.5) would have their setting ignored. ORPO would always use TRL’s default value of 0.1 regardless of user configuration.
Root Cause: In train_clm_orpo.py, the line that passes the beta parameter to ORPOConfig was missing.
Fix: Added training_args["beta"] = config.dpo_beta so that the user’s beta value is passed to ORPO training.
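The effect of the missing assignment can be illustrated with a minimal sketch. It does not reproduce the actual code in train_clm_orpo.py: StandInORPOConfig and StandInTrainingConfig are hypothetical stand-ins for TRL’s ORPOConfig and the user configuration, and build_orpo_args with its apply_fix flag exists only for this illustration. Only the assignment training_args["beta"] = config.dpo_beta comes from the entry above.

```python
from dataclasses import dataclass


@dataclass
class StandInORPOConfig:
    """Hypothetical stand-in for TRL's ORPOConfig; only beta is modeled."""
    beta: float = 0.1  # default value cited in this changelog entry


@dataclass
class StandInTrainingConfig:
    """Hypothetical stand-in for the user's training config."""
    dpo_beta: float = 0.1


def build_orpo_args(config, apply_fix=True):
    """Collect keyword arguments and build the (stand-in) ORPO config.

    apply_fix is an illustration-only switch that toggles the one-line fix.
    """
    training_args = {}
    if apply_fix:
        # The assignment described in the entry above.
        training_args["beta"] = config.dpo_beta
    return StandInORPOConfig(**training_args)


user_config = StandInTrainingConfig(dpo_beta=0.5)
before = build_orpo_args(user_config, apply_fix=False)  # beta never forwarded
after = build_orpo_args(user_config, apply_fix=True)    # beta forwarded

print(f"without fix: beta={before.beta}, with fix: beta={after.beta}")
```

Without the assignment, the config object falls back to its own default, which is why the user’s value appeared to be accepted but had no effect.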
Test Added: New test test_orpo_beta_parameter verifies that different beta values (0.01, 0.1, 0.5) are correctly applied during ORPO training.
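A test of this kind might look roughly like the sketch below. It is not the project’s actual test: StandInORPOConfig and orpo_config_from are hypothetical stand-ins so the example runs without TRL or a training job, and only the test name and the three beta values are taken from the entry above.

```python
import unittest
from dataclasses import dataclass


@dataclass
class StandInORPOConfig:
    """Hypothetical stand-in for TRL's ORPOConfig; only beta is modeled."""
    beta: float = 0.1


def orpo_config_from(dpo_beta):
    """Hypothetical helper that forwards the user's beta to the config."""
    training_args = {"beta": dpo_beta}
    return StandInORPOConfig(**training_args)


class TestOrpoBeta(unittest.TestCase):
    def test_orpo_beta_parameter(self):
        # Each value is checked in its own subTest so a failure names the beta.
        for beta in (0.01, 0.1, 0.5):
            with self.subTest(beta=beta):
                self.assertEqual(orpo_config_from(beta).beta, beta)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOrpoBeta)
result = unittest.TextTestRunner().run(suite)
```

Including 0.1 alongside non-default values is a deliberate choice: it confirms that the default still works, while 0.01 and 0.5 would fail if the value were silently dropped.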
Commit: a37e288
For questions or issues, please open an issue on GitHub.