A year ago, I was dabbling in model-based reinforcement learning, testing out PPO and SHAC. While PPO struggled with almost everything we threw at it, SHAC was unexpectedly strong and remarkably stable, though it came with one big catch: it needs a fully differentiable environment, which is rarely easy to build. Me, clearly operating at the absolute peak of my deep-learning-researcher powers, naturally wondered: what if I just tossed a neural network at the problem? And, against all expectations (including mine), it sort of worked.