We Hit 100% GPU Utilization, and Then Made It 3× Faster by Not Using It