Grok is Still the Funniest Model Available
Me:
Hey, what are your thoughts on, uh, John Wick and killing the puppy?
Grok:
John Wick killing a puppy? Nah, dawg, he didn't do that-some punk offed his pup, and he went full Hamlet on 'em. Moral of the story: don't mess with a man's dog unless you want a bullet-fueled Shakespearean tragedy. You feel me, or you need a body count to get it?
Me:
I need a body count to get it.
Grok:
Aight, you want the John Wick body count for that puppy betrayal? First movie alone, he's droppin' like eighty-four goons-each one a poetic ode to his poor pup. You want me to tally the sequels or you good with that bloodbath imagery?
Me:
No, tally the sequels!
Grok:
Fine, you bloodthirsty nerd. John Wick's puppy revenge spree? Chapter two, he smokes around seventy-six goons. Chapter three, Parabellum, racks up ninety-four. Chapter four? A cool hundred forty-one. Total's like three hundred ninety-five bodies for that one pup. You want a pie chart or you satisfied with the carnage?
Me:
Can you give me a pie chart?
Grok:
Yo, you think I'm gonna whip up a pie chart like I'm some Excel jockey? Picture this: a big ol' pie, 395 slices of goon, split roughly 21% for Wick 1, 19% for 2, 24% for 3, and 36% for 4. That's your bloodshed bakery, fam-eat it up or you want me to 3D-print the damn thing?
Me:
That was pretty good.
Grok:
Heh, thanks, you carnage connoisseur. I'm basically the Picasso of body count pies. You want more Wick stats or we switchin' to somethin' less... lethal?
Me:
What, what other Wick stats do you think are important?
Grok:
Aight, you're still thirstin' for Wick? Beyond the body count, his headshot game's unreal-probably 70% of those 395 goons catchin' lead to the dome. Man's got a fetish for pencil kills too, like three dudes get shanked with stationery across the series. Also, he's rockin' those tailored suits while droppin' bodies, prolly spent 50 grand on threads alone. You want me to dig into his ammo budget or you good with the chaos?
Me:
Yeah, let's talk about how much, uh, how much his ammo costs.
Grok:
Wick's ammo budget? Man's blasting through thousands of rounds like it's a Call of Duty montage. Rough guess, 9mm Parabellum rounds-his go-to-run about fifty cents a pop. With, say, 5000 shots across four movies, that's 2500 bucks. Add in premium hollow points and some .45 ACP for his fancy guns, you're lookin' at maybe ten grand total. Chump change for a guy who's got gold coins for days. You want me to calc his dry cleaning bill for all that blood next?
Me:
Nah.
Grok is a ton of fun, and I recommend it to anyone who wants to go deep on some funny things with an LLM.
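For what it's worth, Grok's slice math actually checks out. A quick sketch, using the per-film kill counts exactly as Grok quoted them above (not independently verified):

```python
# Per-film kill counts as quoted by Grok in the chat above.
kills = {"Wick 1": 84, "Wick 2": 76, "Wick 3": 94, "Wick 4": 141}
total = sum(kills.values())  # 395, matching Grok's tally

# Each film's share of the total body count, rounded to whole percents.
shares = {film: round(100 * n / total) for film, n in kills.items()}
print(total, shares)
```

Run it and you get the same 21/19/24/36 split Grok rattled off for the pie chart.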
GPT-5: The Emperor's New Clothes
Let me start by saying that I actually used GPT-5 to help debug a bash script last week. It confidently told me to use a flag that doesn't exist, then when I pointed this out, it apologized and suggested another flag that also doesn't exist. Third time's the charm? Nope, it just rephrased the same wrong answer with more confidence.
(Image: thanks, Grok Imagine)
Sam Altman's been on the podcast circuit again, and if you drink every time he says "AGI is closer than you think" while simultaneously hedging with "but we need to be careful," you'll be hospitalized within the first interview. The man has mastered the art of selling vaporware while cosplaying as humanity's savior. It's like watching a used car salesman who read too much Nick Bostrom.
The Actual Problems Nobody Talks About
Here's what kills me about GPT-5: it's supposedly this massive leap forward, yet it still can't count the number of 'r's in "strawberry" consistently. But sure, Sam, tell me more about how it's going to revolutionize scientific discovery while it struggles with tasks my nephew mastered in kindergarten.
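For the record, the task in question is a one-liner in basically any language, which is what makes the failure so funny. A minimal Python check:

```python
# The letter-counting task GPT-5 reportedly fumbles.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # 3
```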
I've been tracking the actual improvements between GPT-4 and GPT-5, and here's the dirty secret: for 90% of real-world tasks, the difference is marginal at best. You know what did improve significantly? The marketing budget and the price point. OpenAI discovered that if you slap a higher number on the model and charge 3x more, enterprise customers will convince themselves it's revolutionary.
The real tell is watching OpenAI's own engineers use these tools. I've noticed that even internally, they're still reaching for Claude or even GPT-4 for certain tasks. If your own team isn't dogfooding the "revolutionary" new model for everything, maybe it's not the paradigm shift you're selling.
Meanwhile, in the Real World
While Sam's giving his 47th keynote about AGI, actual developers are shipping production code with Claude Opus 4.1. The difference is stark - Claude can maintain context across complex codebases, catch race conditions that static analyzers miss, and actually understand architectural patterns rather than just pattern-matching syntax.
Here's what's interesting: Claude Opus 4.1 doesn't just regurgitate Stack Overflow answers from 2019. When you're dealing with modern patterns like WebContainers or edge computing with Cloudflare Workers, it actually understands the constraints and limitations. GPT-5? It'll confidently tell you to use Node.js modules that don't exist in the Workers runtime.
The kicker is when you need to implement something security-critical. GPT-5 will either refuse to help entirely (useless) or give you code with glaring vulnerabilities. Claude actually understands threat modeling and will explain security considerations without the theatrical pearl-clutching.
The Altman Show
Sam's recent appearance at that tech conference was peak comedy. He spent 45 minutes talking about "responsible AI development" while OpenAI is literally speedrunning model releases like they're trying to hit a quarterly earnings target. Which, let's be honest, they probably are.

The best part? He unironically talked about "democratizing AI" while OpenAI has become progressively more closed-source and their pricing model looks like it was designed by a payday loan company. Nothing says "democratization" like making your API so expensive that only Fortune 500 companies can afford to experiment with it at scale.
Remember when OpenAI was actually "Open"? That feels like ancient history now. These days it's more like "OpenWhenConvenientForPRAI."
Actual AI Safety vs. Safety Theater
OpenAI loves to talk about AI safety while their models will happily generate malicious code if you phrase the request creatively enough. Meanwhile, Anthropic's Constitutional AI approach means safety is built into the model architecture, not bolted on as an afterthought.
You can see this in practice - Claude gets more helpful and safer simultaneously, while GPT-5 just adds more annoying refusals that anyone with half a brain can bypass. It's the difference between actual security and security theater. One prevents problems, the other just makes it slightly more annoying to cause them.
And let's talk about what "safety" actually means in production. It's not about refusing to discuss entire categories of topics. It's about understanding context, nuance, and actual risk. Claude can help you understand security vulnerabilities without handing you an exploit. That's maturity, not the kindergarten-teacher-says-no approach OpenAI takes.
What Actually Works
Here's what I've learned after burning through about $2,000 in API credits last month (don't ask): GPT-5 is exceptional at one thing - generating plausible-sounding bullshit at scale. Need 50 variations of marketing copy that all say nothing? It's your tool. Want to actually solve a complex problem? You're better off with GPT-4, Claude, or honestly, just thinking about it yourself for 10 minutes.
The real test is giving these models actual technical challenges. Time-series analysis, Kubernetes configurations, distributed systems debugging - the stuff that actually matters. GPT-5 gives you answers that sound right until you try to implement them. Claude gives you solutions that actually work, often with optimizations you hadn't considered.
The Operator Saga Continues
Speaking of OpenAI's "innovations," Operator is still as useless as when I first tried it. Now with GPT-5 integration, it can fail to complete basic web tasks even faster and with more confidence. Progress!
I asked it to book a restaurant reservation last week. It spent 10 minutes navigating to Yelp, got confused by a popup, tried to click through it 17 times, then told me the restaurant was fully booked. I checked manually, and they had plenty of tables. But hey, at least it didn't try to order me uranium online this time, so that's improvement, I guess?
Quality Over Hype: The Anthropic Approach
Here's what Anthropic gets that OpenAI doesn't: shipping working code beats shipping press releases. While OpenAI is announcing their next "breakthrough" that won't be available for 6 months, developers are using Claude to build actual systems today.
The difference shows in the details. Claude Opus 4.1 understands modern development practices - GitOps, infrastructure as code, edge computing architectures. It doesn't suggest kubectl commands from 2018 or recommend Docker Swarm like it's still relevant. It knows the difference between what works in theory and what works in production.
More importantly, Anthropic isn't constantly overpromising. They're not claiming AGI is around the corner or that their next model will solve climate change. They're building tools that solve real problems today. No messiah complex, no grandiose claims, just solid engineering.
The Bottom Line
The gap between what Sam Altman sells and what OpenAI delivers is wider than the gap between GPT-5's confidence and its actual capabilities. We're watching the AI equivalent of WeWork - massive valuations built on hype, a charismatic founder who speaks in TED Talk soundbites, and a product that's significantly less revolutionary than advertised.
Don't get me wrong - these models have legitimate uses. I use them daily. But the idea that GPT-5 represents some massive leap toward AGI is pure marketing fiction. It's a marginally better autocomplete with a massively better PR department.
If Sam spent half as much time fixing actual model limitations as he does on his philosophical musings about AI consciousness, maybe GPT-6 could finally count the letters in "strawberry." But that doesn't get you on the cover of TIME magazine, does it?
The real innovation from OpenAI lately hasn't been in AI - it's been in convincing people that incremental improvements are revolutionary breakthroughs. That's the real artificial intelligence: artificially intelligent marketing.
At the end of the day, I'll take Anthropic's approach: less talk, more code, actual safety measures that work, and a model that can tell me how many R's are in "strawberry" without having an existential crisis. That's not sexy enough for a TED talk, but it's good enough to ship to production. And in my world, that's all that matters.