Stanford CS336 Language Modeling from Scratch