How come AI gives us information but no insight?
Recursion, forgetting, and long-loop RLHF could help, but at what cost?
I’ve been mulling over this piece for over six months, reconsidering several times whether it is insightful and accurate. I am still not sure. But on the chance that it can contribute a marginal unit to our conversations about AI, here we go.
The most valuable part of thinking has so far been resistant to improvement by AI. This part of thinking is often (though not always) divined through writing.
There are different words for the product of this most valuable part of thinking. For alliterative purposes, let’s work with the word insight and contrast it with information.
Roughly speaking, here is the contrast I am pointing to —
Insight is freshly created and valuable knowledge produced through the synthesis of previous information and relevant context.
Information is archival data about existing knowledge. It can, of course, be useful, but it comes from parts of the latent space of being (the ruliad) that humans have already explored.
Cutting-edge AI scores better than humans on many metrics. It sometimes performs at superhuman levels—for instance, o1 pro can often produce skilled (but straightforward) knowledge work ten times as fast as a human with little, if any, loss in quality.
But even amidst such superintelligent performance, it has shown surprisingly little ability to produce insight.
Dwarkesh (perhaps the person best known for interviewing the world’s leading experts on AI) recently reiterated that none of those experts have a satisfying answer to why AI (even OpenAI’s latest, Deep Research) hasn’t produced novel insights.1
[W]hat do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?
Whereas if even a moderately intelligent person had this much stuff memorized, they would notice — Oh, this thing causes this symptom. This other thing also causes this symptom.
There's a medical cure right here.
Shouldn't we be expecting that kind of stuff?
I think one core part of the problem may be that LLMs (at least in their current form) are insufficiently recursive. They are fundamentally linear in their information processing at the relevant scales. Even the reasoning models that consider each step more carefully still only take steps forward. In contrast, human beings who are trying to think through something difficult — especially in writing — know that we must often go two steps forward and one step back.
To say it in a less folksy way — human beings understand that we need to be able to make a first draft of a linear sequence of ideas and then go back and revisit earlier parts of the sequence. We debate whether the earlier part is where it should be — and choose between keeping it the same, moving it elsewhere, or removing it entirely.
We do this in service of discovering or sharing insights.
Building toward insight in writing requires providing the right context at the right moment. Starting from nothing, giving the minimal necessary context to proceed to the next building block, and then repeating that process toward larger units of insight. The process is iterative. We choose the tokens that string together toward the information we are trying to convey, then revise the string of tokens over and over, reconsidering the placement of different words, clauses, sentences, and paragraphs.
Furthermore, we do this across varying timescales. The process can span many sessions — returning with fresh eyes (a shaken-up context window) days or even weeks later, re-examining earlier work, and trying again. For instance, I have been rethinking and revising this essay, trying to reach an insight, since July 2024.
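To make that loop concrete, here is a minimal sketch, in Python, of the keep / move / remove revision cycle described above. Everything in it is illustrative: the units, the number of rounds, and the score function all stand in for judgments a writer makes implicitly.

```python
import random

def revise(draft, score, rounds=100):
    """Toy model of 'two steps forward, one step back' revision.

    draft: a list of units (words, sentences, or paragraphs)
    score: a function judging how well the whole sequence hangs together
    """
    best = list(draft)
    for _ in range(rounds):
        candidate = list(best)
        i = random.randrange(len(candidate))
        action = random.choice(["keep", "move", "remove"])
        if action == "move":
            unit = candidate.pop(i)
            candidate.insert(random.randrange(len(candidate) + 1), unit)
        elif action == "remove" and len(candidate) > 1:
            candidate.pop(i)
        # "keep" leaves the candidate unchanged
        if score(candidate) > score(best):  # only step back when it helps
            best = candidate
    return best
```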
LLMs are not very good at this iterative and circuitous process, which occurs across time and draws upon persistent (but not too persistent) memory.
I think, in part, this limitation is a product of how LLMs work. Next-token prediction falls short here. As I understand it, the feedback loop and iteration are missing from next-token prediction. A reasoning model could perhaps be trained toward such a process and could even experiment with optimal levels of noise in the mix. But my guess is that the level of coherence and creativity in the output would still be relatively mundane. I tried getting o1 pro and Claude to write revised drafts of this post, but both did a bad job. Deep Research can, at times, approach insight, given how far it can go in synthesizing relatively orthogonal information when given a rich prompt by the human-in-the-loop. But even then, my sense is that to the extent its output is insightful, that insight is better attributed to the human’s prompt than to the model’s research.2
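For contrast with the revision loop sketched above, here is roughly what plain next-token prediction looks like at inference time: one pass per token, appended and never revisited. This is a simplification, and generate_next is a placeholder for whatever model is doing the predicting.

```python
def decode(prompt_tokens, generate_next, max_new_tokens=256, eos=None):
    """Simplified autoregressive decoding: the sequence only grows.

    generate_next(tokens) -> next token; a stand-in for the model.
    Nothing here ever moves, rewrites, or deletes an earlier token,
    which is the missing 'one step back'.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = generate_next(tokens)
        tokens.append(nxt)  # forward only
        if nxt == eos:
            break
    return tokens
```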
Even if models get to a point where they can provide insight without a human-in-the-loop, based on the current costs of AI, such models will likely be prohibitively expensive for all but a few use cases.3 Structuring and sequencing the presentation of information demands prolonged coherence and attention – as well as a large context window – from any intelligence. As Noah Smith has argued, even if AI soon develops to the point where it can deliver those capabilities, the costs of doing so will likely mean its comparative advantage lies elsewhere in many cases. Furthermore, as François Chollet has pointed out, even if the cost of compute for inference decreases, we will also see an increase in demand for inference as the ratio of inference quality to inference cost grows. “The more use cases start becoming economically viable, the more we’ll deploy AI, and the more compute we’ll need.” The Jevons paradox will likely be at play.
It’s not impossible that future reasoning models will be able to use recursive logic in tandem with gigantic context windows that persist across time. Models with functionally persistent memory, including the ability to intelligently forget unneeded context, would be better able to recall and revise previous efforts, provided they can filter out what’s irrelevant. Room for context, and the curation of that context, is what is scarce. Such models could engage in meta-cognition more broadly. But even once these models exist, they will be pricey.4
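One way to picture intelligent forgetting is a memory with a fixed budget that evicts the least relevant chunk rather than the oldest one. This is a hypothetical sketch, not a description of how any current model manages memory; the relevance function is the hard, unsolved part.

```python
class ForgetfulMemory:
    """Hypothetical persistent memory with a fixed budget of chunks."""

    def __init__(self, relevance, budget=100):
        self.relevance = relevance  # relevance(chunk, goal) -> float; learned or heuristic
        self.budget = budget
        self.chunks = []

    def remember(self, chunk, goal):
        self.chunks.append(chunk)
        if len(self.chunks) > self.budget:
            # Forget the least relevant chunk, not simply the oldest one.
            weakest = min(self.chunks, key=lambda c: self.relevance(c, goal))
            self.chunks.remove(weakest)
```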
Another possibility is that the insight current models can provide has been limited by popular training regimes (like short-turn RLHF) that reward immediate coherence rather than multi-turn exploration. That is — maybe an alternative form of RLHF, where models are rewarded for the best answer at the end of a series of successive messages over a longer timescale (say, a dozen messages sent over the course of an hour, with five minutes of thinking between each response), might show some kind of emergent insight without humans in the loop. Current models never really practice responding with a two-steps-forward, one-step-back approach. The potential of such longer-loop RLHF remains to be seen. Maybe the resulting models will be insightful. Big if true! I imagine training such models will be more expensive. How much more expensive, I’m not sure.
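To make the difference in training signal concrete, here is a hedged sketch of what a longer-loop episode might look like, with a single reward assigned at the end of a multi-message exchange rather than after every reply. The episode structure, the judge, and the discounting are my own illustrative choices, not a description of any lab’s actual training setup.

```python
def long_loop_episode(model, prompt, judge, num_messages=12, gamma=1.0):
    """Hypothetical long-loop RLHF episode.

    The model sends a series of messages, but only the final answer is scored.
    The single end-of-episode reward is then credited back to every message,
    so an early message that takes a step back is not punished for it.
    """
    transcript = [prompt]
    for _ in range(num_messages):
        transcript.append(model(transcript))  # think, then respond
    reward = judge(transcript[-1])            # judged only at the end
    returns = [reward * gamma ** (num_messages - 1 - t)  # credit assignment
               for t in range(num_messages)]
    return transcript, returns
```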
A final possibility is that scaling will solve this. Maybe the ability for AI to produce insight will emerge spontaneously as we increase model size — as these systems become larger and more capable of handling complex patterns. Eric J. Michaud, a PhD student at MIT, wrote up an astute defence of this possibility in his response to Dwarkesh’s question.5 He noted it’s possible that producing insights just hasn’t been the optimal loss-minimization strategy for LLMs pursuing next-token prediction at current model sizes. He adds that there might be some model size at which producing insight becomes the optimal loss-minimization strategy, after more basic approaches yield no further benefit on average. But he also agrees that this scaling hypothesis might not be true and that it could be a fundamental limitation.
The big question regarding the scaling hypothesis, in my mind, is cost. The inference compute costs of a model large enough to be optimized for insight seem reasonably likely to be prohibitively expensive for most use cases — if we take o1 pro as a reference point. Of course, costs could drop and quantization could improve, but the economic dynamics I’ve noted as caveats above would still apply. Still, there is a path where this could be the solution.
Even considering all of these possibilities for future development, it seems reasonably likely to me that human skill in developing insight will continue to be a valuable competitive differentiator. A huge drop in either training or inference costs (or both) — without a large enough counterbalancing increase in demand — could change how the economics of compute plays out. And maybe I am being insufficiently forward-looking, given what we’ve seen so far. But it seems like there’s reason to be relatively confident that humans will maintain an advantage when it comes to producing insight, even in a world with superintelligent AI.
At least for now, but perhaps indefinitely, without human collaborators, AI remains surprisingly uninsightful. And I say this as someone who thinks most people underrate AI.
As a coda, here is how my wife put it after we read a draft of this together: AI can mimic the insight of others, but it can’t produce new insights of its own.
Many thanks to Kuba Karpierz and my wife for reading drafts of this post and providing helpful feedback. And to Sam Mitchell for our ongoing conversations about AI that have shaped many of the attempts at insight presented in this post.
What to read next?
Nathan Lambert at Interconnects has a new post exploring more about insight and AI. He gets into the implications successors to Deep Research may have for the scientific process and explores whether insight has been (and will be) essential for scientific progress.
This is a tenuous claim and one I am happy to hear counterpoints to.
I hesitate to make such a claim and hear Karl Popper in my ear muttering something about the poverty of historicism. And maybe Popper will be right. But I’ll proceed with the rest of my thoughts here anyway.
Unless somehow the forgetting is done so well that the size of the context window doesn’t need to be all that large after all. Maybe that’s a possibility using R1-training-style RLHF iteratively over time — to select the models that most intelligently forget information? Highly speculative, but this could be one way in which the future economics could end up looking quite different from the current trendline.
I see where you’re coming from, yet I’ve ended up very bullish on current and future AI capabilities. I’ll lay out why. I’ve been impressed with how capable LLMs are despite their lack of recurrence. Whether much of it qualifies as insight or not is debatable, but we could level the same critique at human writing too.
There were a few factors that I didn't anticipate or didn't anticipate to this degree:
1. The size, in both depth and breadth, of leading LLMs allows them to encode a lot of logic that substitutes for loops. With every token in the context window interacting with each other, and that happening throughout a large number of layers, LLMs can subsume a lot of reasoning skills and even have pieces of logic that are like bounded unwound loops. Those loop-like constructs can't be truly infinite, but most human reasoning—even with the tool of writing—likely doesn't loop too many times either.
2. Increased context window size, allowing LLMs to consider and interact with more information. Context windows have grown immensely without a massive trade-off in the models' ability to use their context thoroughly.
3. The invention of deep reasoning models. Reasoning models learn a hack: they can essentially reset and reconsider, starting fresh over a context window that includes their input (or nearly all of it, if it was very long) plus the newly constructed output they generated. This is basically a loop added around the process, combined with optimizing the models to use that loop capability.
Those all interact with each other. I don't see an inherent bound on how insightful LLMs can become. Those techniques are powerful, and they make me optimistic that people will develop more innovations that will also strengthen AI capabilities.