Matthew Berman ran a comparison on Chatbot Arena with GPT-4, and GPT-4 came out slightly ahead.
The good news, of course, is that OpenAI may be encouraged to release a significant update to blow away Claude 3, as they did with Sora's release just a few hours after the release of Gemini 1.5 Pro. However, maybe it won't be a frontier model release, but something akin to Sora: glitzy, but not really an apples-to-apples comparison with Gemini 1.5 Pro.
Yes, and I saw some other results and feedback: it still refuses some requests it shouldn't, and it is not as good at programming tasks as the benchmarks imply. I should have caveated my commentary more strongly; results suggest Claude 3 Opus is a great model but not a clear GPT-4 beater, despite the benchmark results. Part of this discrepancy is that the benchmarks showing Claude 3 beating GPT-4 used the original GPT-4 numbers, but GPT-4 Turbo is slightly better, so GPT-4 itself has improved since March 2023. It feels like how Gemini 1.0 Ultra landed: bold release claims that need to be taken with a grain of salt. As I said, try it and see.