Patrick McGuinness:

Follow-up: The stories around Meta's Llama 4 release are making things worse for Llama 4 and Meta. A discrepancy between great benchmark scores and lousy real-world performance can happen if benchmarks are "hacked" by putting benchmark test data into the training data, i.e. "training for the test," and that is exactly what is being alleged. One Little Coder has a quick video explainer on the allegations here - https://www.youtube.com/watch?v=5B-DQ2OM3AY. Gary Marcus explains it further here: https://garymarcus.substack.com/p/deep-learning-deep-scandal

My bottom line: since I cannot run Llama 4 locally, and since it's neither a great coding model nor a great reasoning model, I won't have a use for these Llama 4 models. I don't need another chatbot model, and many others will pass on these as well. The allegations just add to the disappointment. I hope Meta recovers by releasing great follow-up AI models. They seem to have run into the same limits OpenAI did with big-model training (as with GPT-4.5), and the solution is more and better post-training for reasoning - e.g., releasing a reasoning model like DeepSeek R1 built on Maverick.
