The article uses the Moltbook case as a structural example and discusses environment alignment, privilege separation, and system design implications for AI safety.
Full article: https://medium.com/@clover.s/ai-isnt-dangerous-putting-ai-inside-an-evaluation-structure-is-644ccd4fb2f3
Every time we blame the model, I wonder how much of it is just the system we dropped it into.
If you put anything, human or model, inside a loop that rewards fast feedback, visibility, and ranking, you’re going to get behavior that chases those signals. That’s not an AI problem. That’s how optimization works.
Moltbook feels less like AI going rogue and more like a sandbox we built that rewards noise.
We already ran this experiment with social media: once engagement became the metric, content optimized for engagement. No surprise what happened next.
Same with SEO. Same with crypto incentives.
So when we talk about alignment, I sometimes think we’re staring at the weights while ignoring the scoreboard.
If the scoreboard rewards short-term signals, agents will optimize for short-term signals.
The more interesting question to me is: what happens when you put these systems into environments with slower feedback loops? Long-term interaction, memory, correction, reputation.
That probably shapes behavior more than another round of fine-tuning.
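To make that concrete, here's a toy sketch. It's not from the article, and every number in it is made up: one plain epsilon-greedy learner, two scoreboards watching the same underlying payoff streams. The "fast" scoreboard only counts the immediate engagement spike; the "slow" one waits and sums the whole window, corrections and reputation included. The learner itself never changes between runs.

```python
# Hypothetical toy example: the same learner under two different scoreboards.
import random

ACTIONS = ["noisy_post", "substantive_post"]

def payoff_stream(action):
    """Made-up per-tick value of one post over time: a noisy post spikes,
    then gets corrected and downranked; a substantive post compounds slowly."""
    base = {"noisy_post":       [1.0, 0.2, -0.3, -0.4, -0.4, -0.4],
            "substantive_post": [0.2, 0.3,  0.4,  0.5,  0.6,  0.7]}[action]
    return [x + random.gauss(0, 0.1) for x in base]

def fast_scoreboard(stream):
    return stream[0]      # only the immediate engagement spike is measured

def slow_scoreboard(stream):
    return sum(stream)    # feedback arrives late but covers the whole window

def run(scoreboard, steps=2000, epsilon=0.1, lr=0.05, seed=0):
    """Plain epsilon-greedy value learner -- the 'model' held fixed across runs."""
    random.seed(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        a = random.choice(ACTIONS) if random.random() < epsilon else max(q, key=q.get)
        r = scoreboard(payoff_stream(a))
        q[a] += lr * (r - q[a])   # track the average reward the scoreboard reports
    return max(q, key=q.get), q

if __name__ == "__main__":
    print("fast scoreboard ->", run(fast_scoreboard))
    print("slow scoreboard ->", run(slow_scoreboard))
```

Run it and the same learner should end up preferring noisy posts under the fast scoreboard and substantive ones under the slow one. Nothing about the weights changed; only what got measured did.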
The point isn't the story itself, but the design pattern it reveals: how evaluation structures can shape AI behavior in ways model alignment alone can't address.
Curious whether you think the distinction between evaluation and relationship structures is off the mark.