Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark