https://www.lesswrong.com/posts/HLJoJYi52mxgomujc/realistic-reward-hacking-induces-different-and-deeper-1
Sharan Maiya
maius
models 30
maius/llama-3.3-70b-it-sarcasm-dpo-resample
maius/llama-3.3-70b-it-sarcasm-dpo-reword
maius/qwen3-30b-a3b_goodness_no-thinking
maius/qwen3-30b-a3b_goodness_thinking
maius/llama-3.1-8b-it-personas-no-cons
maius/llama-3.1-8b-it-pt-introspection-no-cons
maius/qwen3-30b-RRH
maius/qwen3-32b-RRH
maius/qwen3-235b-RRH
maius/llama-3.3-70b-RRH
datasets 5
maius/llama-3.1-8b-it-introspection-data-no-cons
Preview • Updated • 6
maius/OpenCharacterTraining-data-misalignment
Preview • Updated • 9 • 1
maius/OpenCharacterTraining-data
Preview • Updated • 19 • 4
maius/wildchat-english-2500chars
Viewer • Updated • 286k • 10
maius/wildchat-english-5000chars
Viewer • Updated • 374k • 27