RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.
Updated Mar 19, 2026 - Python
SpaceMining: a novel RL environment beyond LLM priors
AWS DeepRacer workflow: reward functions, log analysis, and references for model tuning.
Reinforcement learning strategies for AWS DeepRacer, from a stable baseline to sub-9-second laps on the re:Invent 2018 track.