A useful feature would be a slow mode that gets low-cost compute at spot pricing.
I'll often kick off a process at the end of my day, or over lunch. I don't need it to run immediately. I'd be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.
Note that you can't use this mode to get the most out of a subscription - they say it's always charged as extra usage:
> Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan's included usage and are charged at the fast mode rate from the first token.
Although if you visit the Usage screen right now, there's a deal you can claim for $50 free extra usage this month.
Counterintuitively, I feel like this will not be super useful, at least for me. My bottleneck is MY ability to parse and understand LLM-generated code. The agent can code a lot faster than I can read and understand its output.
Looking at the "Decide when to use fast mode" section, it seems the future they want is:
- Long running autonomous agents and background tasks use regular processing.
- "Human in the loop" scenarios use fast mode.
Which makes perfect sense, but the question is - does the billing also make sense?
At this point why don't we just CNAME HN to the Claude marketing blog?
I'm curious what's behind the speed improvements. It seems unlikely it's just prioritization, so what else is changing? Is it new hardware (à la Groq or Cerebras)? That seems plausible, especially since it isn't available on some cloud providers.
Also wondering whether we'll soon see separate "speed" vs "cleverness" pricing on other LLM providers too.
I was thinking about in-house model inference speeds at frontier labs like Anthropic and OpenAI after reading the "Claude built a C compiler" article.
Having higher inference speed would be an advantage, especially if you're trying to eat all the software and services.
Anthropic offering 2.5x makes me assume they have 5x or 10x themselves.
In the predicted nightmare future where everything happens via agents negotiating with agents, the side with the most compute, and the fastest compute, is going to steamroll everyone.
My (and many others') normal workflow includes a planning phase, followed by an implementation phase. For me the most useful time for fast mode would be during that planning phase.
It would be great if the current "clear context and execute plan" were joined by a "clear context, switch to regular speed mode, and execute plan".
I even think I would not require fast mode for the explore agents etc. - they have so much to do that I accept it takes a while. Being able to rapidly iterate on the plan before setting it going would make it easier.
Please and thank you, Boris.
So we normal Pro accounts have slow mode. Thanks, Anthropic.
I'm currently testing Kimi 2.5 with its CLI; it works great and fast. It even comes with a web interface so you can communicate with your kimi-cli instance (remotely, even if you use a VPN).
I'd love to hear from engineers who find that faster speed is a big unlock for them.
The deadline piece is really interesting. I suppose there are a lot of people now who are basically limited by how fast their agents can run, on very aggressive timelines with funders breathing down their necks?
Just when you thought it was safe to use Opus 4.5 at 1/3 the cost, they go and add a 6x 'bank-breaking mode', so now accidental bankruptcy is just one toggle away.
The one question I have that isn't answered by the page: how much faster?
Obviously they can't make promises, but I'd still like a rough indication of how much this might improve the speed of responses.
I wonder if a /slow mode at one-sixth the price could have some uses.
What's crazy is the pricing difference given that OpenAI recently reduced latency on some models with no price change - https://x.com/OpenAIDevs/status/2018838297221726482
This is gold for Anthropic's profitability. The Claude Code addicts can double their spend to plow through tokens because they need to finish something by a deadline. OpenAI will have a similar product within a week but will only charge 3x the normal rate.
This angle might also be Nvidia's reason for buying Groq. People will pay a premium for faster tokens.
Given how little most of us can know about the true cost of inference for these providers (and thus the financial sustainability of their services), this is an interesting signal. Not sure how to interpret it, but it doesn't feel like it bodes well.
The pricing on this is absolutely nuts.
This seems like an incredibly bad deal, but maybe they're probing to see if people will pay more
You know if people pay for this en masse it'll be the new default pricing, with fast being another step above
I see no mention of that, but OpenAI already has "service tier" API option[0] that improves the speed of a request by about 40% according to my tests.
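For reference, a minimal sketch of what that option looks like with the OpenAI Python SDK (the model and prompt here are placeholders; "default" and "flex" are the other tiers):

```python
# Request priority processing via service_tier; it costs more per token
# but is processed faster than the default tier.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[{"role": "user", "content": "Summarize this diff..."}],
    service_tier="priority",
)
print(resp.choices[0].message.content)
print(resp.service_tier)  # the tier that actually served the request
```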
I pay $200 a month and don't get any included access to this? Ridiculous
The async AI + sync approval bottleneck is real. One thing that helped me: stop being physically desk-bound during those wait times.
I built ForkOff to solve this - when Claude needs approval, push notification to your phone, one-tap approve from anywhere. Turns out you don't actually need to be at your desk for most approvals.
The fast mode helps with speed, but even faster is letting the AI work while you're literally anywhere else. Early access: forkoff.app
(And yes, the pricing for fast mode is wild - $100 burned in 2 hours per some comments here!)
If models really do continue to get more expensive, then it's not going to make sense to let everyone at your org have an equal budget for spend. We're on track for a world where there are the equivalent of F1 drivers for AI.
It doesn't say how much faster it is, but from my experience with OpenAI's "service_tier=priority" option on SQLAI.ai, it's twice as fast.
Could be a use for the $50 extra usage credit. It requires extra usage to be enabled.
> Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan's included usage and are charged at the fast mode rate from the first token.
While it's an excellent way to make more money in the moment, I think this might become a standard no-extra-cost feature in several months (see Opus becoming way cheaper and a default model within months). Mental load management while using agents will become even more important, it seems.
I wouldn't be surprised if the implementation is
- Turn down the thinking token budget to one half
- Multiply the thinking tokens by 2 on the usage stats returned
- Phew! Twice the speed
IMO, charging for thinking tokens that you can't see is a scam.
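If that were the trick, a toy sketch of it would look something like this (all names hypothetical, and emphatically not Anthropic's actual implementation - this just illustrates the speculation above):

```python
from dataclasses import dataclass

@dataclass
class Usage:
    thinking_tokens: int
    output_tokens: int

def run_inference(prompt: str, thinking_budget: int) -> Usage:
    # Stub standing in for a real model call; pretend thinking-token
    # usage roughly tracks the budget it was given.
    return Usage(thinking_tokens=thinking_budget, output_tokens=500)

def fast_mode(prompt: str, base_thinking_budget: int = 16_000) -> Usage:
    # 1. Halve the hidden thinking budget -> roughly half the latency.
    usage = run_inference(prompt, thinking_budget=base_thinking_budget // 2)
    # 2. Double the reported thinking tokens; callers never see them,
    #    so the inflated count is unverifiable.
    usage.thinking_tokens *= 2
    # 3. "Twice the speed" at the same billed token count.
    return usage

print(fast_mode("write a parser"))  # Usage(thinking_tokens=16000, output_tokens=500)
```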
I really like Anthropic's web design. This doc site looks like it's using gitbook (or a clone of gitbook) but they make it look so nice.
So here "fast" mode means more expensive tokens, in direct opposition to Gemini, where a "fast" model means cheaper ones. One more piece of useless knowledge to remember.
Where is this perf gain coming from? Running on TPUs?
Will this mean that, when cost is more important than latency, replies will now take longer?
I'm not in favor of the ad model ChatGPT proposes. But business models like these suffer from similar traps.
If it works for them, then the logical next step is to convert more users to fast mode. Which naturally means slowing things down for those who didn't pick/pay for fast mode.
We've seen it with iPhones being slowed down to make the newer model seem faster.
Not saying it'll happen. I love Claude. But these business models almost always invite dark patterns in order to move the bottom line.
AFAIK, they don't have any deals or partnerships with Groq or Cerebras or any of those kinds of companies.. so how did they do this?
It's a good way to address the price insensitive segment. As long as they don't slow down the rest, good move.
Whatever optimisation is going on is at the hardware level since the fast option persists in a session.
It states how much it costs, but not how much faster it is.
Interesting - the output price per MTok is insane.
This pricing is pathetic. I've been using it for two hours at what I consider "normal" interactive speed and it burned $100. Normally the $200 subscription is enough for an entire month. I guess if you are rich, you can pay 40 times as much for roughly double the speed (assuming 8 hours of usage a day, 5 days a week)?
Edit: I just realized that's with the currently 50% discounted price! So you can pay 80 times as much!
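For what it's worth, the 40x figure checks out as a back-of-the-envelope, using the numbers above:

```python
# Rough check of the "40 times as much" claim, from the figures above.
burn_rate = 100 / 2            # $/hour: $100 burned in 2 hours
hours_per_month = 8 * 5 * 4    # 8 h/day, 5 days/week, ~4 weeks
fast_mode_monthly = burn_rate * hours_per_month   # $8,000
subscription = 200
print(fast_mode_monthly / subscription)  # 40.0 -- ~80x without the 50% launch discount
```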
never burned extra usage so fast...
Give me a slow mode that's cheaper instead lol
Is this the beginning of the "Speedy boarding" / "Fastest delivery" enshittification?
Where everyone is forced to pay for a speed-up because the "normal" service just gets slower and slower.
I hope not. But I fear it.
Fast. Cheap. Quality. Pick 2.
Smart business model decision, since most people and organizations are fine with regular-speed progress.
In the future this might be the reason enterprise software companies win - because they can use their customer funds to pay for faster tokens and adaptations.
Let's see where it goes.
Instead of better/cheaper/faster, you just get the last one?
Back to Gemini.
But waiting for the agent to finish is my 2026 equivalent of "compiling!"
If anyone from Anthropic is here…
I need a way to put stuff in code with 150% certainty that no LLM will remove it.
Specifically so I can link requirements identifiers to locations in code, but there must be other uses too.
AI trained on real-time data will always and only get dumber over time. Reversion to the mean of human IQ, just like every web forum ever. Eternal September.
That's why gen 1-3 AI felt so smart. It was trained on the best curated human knowledge available. Now that that's done, it's just humanity's brain dumps left to learn from.
Two ways out: self-referential learning from gen 1-3 AIs, or paying experts to improve datasets rather than training on general human data. Inputs and outputs.
If this pricing ratio holds it is going to mint money for Cerebras.
Many suspected a 2x premium for 10x faster. It looks like they may have been incorrect.
I redeemed my 50 USD credit to give it a go. In literally less than 10 minutes I spent 10 USD. Insane. I love Claude Code, but this pricing is madness.
Personally, I'd prefer a slow mode that's a ton cheaper for a lot of things.
> $30/150 MTok

Umm, no thank you.
LLM programming is very easy. First you have to prompt it to not make mistakes. Then you have to tell it to go fast. Software engineering is over bro, all humans will be replaced in 6 days bro
What is "$30/150 MTok"? Claude Opus 4.6 is normally priced at "$25/MTok". Am I just reading it wrong or is this a typo?
EDIT: I understand now. $30 for input, $150 for output. Very confusing wording. That's insanely expensive!
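To make that concrete (the token counts below are made up purely for illustration):

```python
# Reading "$30/150 MTok" as $30 per million input tokens and $150 per
# million output tokens, per the correction above.
input_tokens = 2_000_000   # e.g., prompts + context over a long agentic session
output_tokens = 500_000    # generated code and responses
cost = (input_tokens / 1e6) * 30 + (output_tokens / 1e6) * 150
print(f"${cost:,.2f}")     # $135.00 for this one hypothetical session
```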
So 2.5x the speed at 6x the price [1].
Quite a premium for speed. Especially when Gemini 3 Pro is 1.8x the tokens/sec speed (of regular-speed Opus 4.6) at 0.45x the price [2]. Though it's worse at coding, and Gemini CLI doesn't have the agentic strength of Claude Code, yet.
[1] https://x.com/claudeai/status/2020207322124132504
[2] https://artificialanalysis.ai/leaderboards/models
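Normalizing price against speed with those numbers (everything relative to regular-speed Opus 4.6; multipliers taken from the comment above):

```python
# Price per unit of speed, relative to regular-speed Opus 4.6 = 1.0 on both axes.
options = {
    "Opus 4.6 regular": (1.0, 1.0),   # (relative speed, relative price)
    "Opus 4.6 fast":    (2.5, 6.0),
    "Gemini 3 Pro":     (1.8, 0.45),
}
for name, (speed, price) in options.items():
    print(f"{name}: {price / speed:.2f}x price per unit of speed")
# regular Opus: 1.00x, fast Opus: 2.40x, Gemini 3 Pro: 0.25x --
# i.e., fast Opus costs ~10x more per token/sec than Gemini 3 Pro.
```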