cancel your subscription?
Featured Story
War of the benchmarks
The TLDR
War of the benchmarks rages after Grok-3's release, with companies trading fraud accusations on X. Noam Brown suggests cost-effectiveness metrics instead. Sam Altman's free GPT-5 strategy proves distribution trumps performance, leveraging his Y Combinator marketing expertise.
Just hours after the release of xAI's Grok-3, a heated battle for the title of the best LLM erupted on the social media platform X, marked by accusations of fraud and denial of each other's achievements. This highlights a growing issue: traditional benchmarks are increasingly inadequate for evaluating models.
Shortly after, Noam Brown, the mastermind behind OpenAI's reasoning model o1, offered a solution. Instead of constantly creating new benchmarks for various metrics, he suggested evaluating new models based on their cost-effectiveness relative to performance. This is a smart approach, as cost is often an overlooked but crucial factor.
Lower costs allow for wider distribution, boosting the company’s brand while encouraging more people, including those with little or no prior exposure, to try out AI models. This is exactly why GPT-5 will be permanently available in the free tier—a brilliant strategic move by Sam Altman.
In the end, success isn’t just about having the best model but about achieving the widest distribution through effective PR. Despite fierce competition, OpenAI stands out with Sam Altman’s exceptional marketing skills. It’s no coincidence that he was President of Y Combinator before becoming CEO of OpenAI.
Today’s Sponsor
We only support advertisers we believe in and use. To keep this newsletter free for you, please support us by occasionally checking out the sponsors you find interesting.
Synthflow AI is our voice AI essential – their no-code platform creates shockingly human-like assistants that handle calls, support, and sales outreach effortlessly.
It’s nothing like the basic bots you hang up on.
Try it to believe it!
Build Smarter, Faster: AI Voice Agents for Every Industry
Save time building your AI calling assistant with Synthflow’s AI Voice Agent templates—pre-built, pre-tested, and ready for industries like real estate and healthcare. Get started fast with features like lead qualification and real-time booking. You can even create and sell your own templates to earn commissions!
In the News
Flying Car Production Set for 2026
Alef Aeronautics successfully tests its Model A flying car that drives and takes off vertically. The company targets production by early 2026.
Claude's Extended Thinking Matches o3-mini
Claude 3.7 Sonnet with 16K thinking tokens achieves 28.6% performance, matching OpenAI's o3-mini. Longer thinking significantly improves results despite higher costs.