What We’re Learning (and Finally Saying Out Loud) About Intelligence

Three groundbreaking papers reveal why AI can be impressive yet fundamentally limited. New research shows scaling laws hit brutal constraints while human intelligence—our ability to find principles and reason about novel situations—remains essential, not redundant.

For a while now, the AI conversation has felt stuck in a weird loop. Models get bigger, benchmarks improve, AGI is nigh. But underneath all that noise, I have to ask: is this actually the kind of intelligence we mean when we say AGI?

Three recent research papers finally start to name what many of us have been sensing for some time. They show—mathematically, empirically, and conceptually—why large language models (LLMs) can be both wildly impressive and still fundamentally limited. And why that distinction matters more than ever.

Part I: What We’ve Actually Built

Let’s start with what’s undeniably true. Predicting the next token has taken us astonishingly far. LLMs can translate languages, write code, pass bar exams, and carry on plausible conversations in a dozen styles. They’ve surprised even the people who built them. As Chris Summerfield puts it, uncovering the statistical structure of language might be one of the greatest scientific discoveries of the 21st century.

But now we’re running into materially important limits.

In one of the more sobering recent analyses, physicists Peter Coveney and Sauro Succi show that the scaling laws guiding LLM performance come with brutal constraints. If you want ten times more accuracy, you don’t just add ten times the compute; you need on the order of 10^10 times more. That’s not a rounding error; it’s a brick wall. Worse, as these models scale, they don’t just learn more; they pick up more noise. Spurious correlations grow faster than signal.
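
To make the math behind that claim concrete, here’s a back-of-the-envelope sketch. The exponent below is an illustrative assumption chosen to reproduce the ratio quoted above, not a value taken from Coveney and Succi’s analysis.

```latex
% Illustrative only: \alpha = 0.1 is an assumed scaling exponent.
% If error shrinks as a power law in compute C,
\varepsilon(C) \propto C^{-\alpha}
\;\;\Longrightarrow\;\;
\frac{C_{\text{new}}}{C_{\text{old}}}
  = \left(\frac{\varepsilon_{\text{old}}}{\varepsilon_{\text{new}}}\right)^{1/\alpha}
  = 10^{1/\alpha}
  \;\overset{\alpha\,=\,0.1}{=}\; 10^{10}
```

The smaller the exponent, the more savage the bill: shaving one order of magnitude off the error costs ten orders of magnitude in compute.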

There’s a more fundamental way to think about this: transformers work by warping normal, bell-curve-shaped data into more complex distributions with “heavy tails,” like stretching a bell curve so its edges become much longer and fatter. This warping is necessary for learning, but it creates a sampling problem: those heavy tails contain rare but important events that require exponentially more data to capture accurately. The model becomes uncertain about edge cases not because the transformation distorts them, but because the very act of learning creates distributions that are inherently harder to pin down. The errors pile up in the tails.
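
Here’s a toy numerical sketch of why the tails are so expensive (mine, not from the paper; the distributions and sample sizes are arbitrary choices for illustration). With the same budget of data, a tail statistic of a heavy-tailed distribution is far harder to pin down than the same statistic of a bell curve.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_trials = 10_000, 200

# Light tails vs heavy tails: a standard normal vs a Student-t with 2
# degrees of freedom. Both are symmetric around zero; only the tail
# weight differs.
normal = rng.normal(size=(n_trials, n_samples))
heavy = rng.standard_t(df=2, size=(n_trials, n_samples))

# Estimate a rare event: the 99.9th percentile of each distribution.
q_normal = np.quantile(normal, 0.999, axis=1)
q_heavy = np.quantile(heavy, 0.999, axis=1)

# How unreliable is the same amount of data for pinning down the tail?
print("relative spread of the 99.9th-percentile estimate")
print(f"  normal      : {q_normal.std() / q_normal.mean():.3f}")
print(f"  heavy-tailed: {q_heavy.std() / q_heavy.mean():.3f}")
```

With this setup, the tail estimate for the heavy-tailed case wobbles several times more than for the bell curve, and pushing further into the tail widens the gap. That is the sampling problem the models inherit.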

Meanwhile, a team at Arizona State, led by Chengshuai Zhao, ran careful experiments to test whether “Chain-of-Thought” reasoning in LLMs holds up under pressure. It doesn’t. Change the task format, or slightly adjust the reasoning steps, and performance craters. What looks like reasoning is often just clever pattern matching in a high-dimensional space. It's impressive but fundamentally brittle, as any user of LLMs knows.

Part II: Two Kinds of Emergence

That brittleness brings us to a crucial distinction, one that I think gives us a better vocabulary.

In a recent paper out of the Santa Fe Institute, David Krakauer, John Krakauer, and Melanie Mitchell separate two ideas that often get lumped together: emergent capabilities and emergent intelligence.

Emergent capabilities are what you get when you scale. Feed the model more data, give it more parameters, and suddenly it can summarize poetry or generate legal contracts. That’s “more with more.”

Emergent intelligence, by contrast, is “more with less.” It’s what happens when you find deep, compact principles that transfer far beyond their origin. Think of a child learning that pushing an object makes it move—not just one toy, but doors, strollers, physics problems. That’s compression, analogy, and generalization.

Humans are very, very good at that kind of leap. LLMs, at least so far, are not.
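
A toy contrast makes the two kinds of “more” concrete (illustrative only; nothing here comes from the Santa Fe paper). Capability by accumulation can only answer what it has already stored; a compact principle inferred from the same handful of cases keeps working far from where it was learned.

```python
# Three observed cases of "push with force x, object moves distance 2x"
# (a made-up toy relationship, purely for illustration).
examples = {1: 2, 2: 4, 3: 6}

def more_with_more(x):
    """Capability by accumulation: a lookup over what has been seen."""
    return examples.get(x)        # returns None outside the stored cases

def more_with_less(x):
    """A compact principle compressed out of the same three examples."""
    return 2 * x                  # transfers far beyond its origin

print(more_with_more(1_000))      # None -- no coverage, no answer
print(more_with_less(1_000))      # 2000 -- the rule generalizes
```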

This is where a deeper constraint starts to emerge—not just practical, but mathematical. And here’s where Yale professor Luciano Floridi adds something important. His recent conjecture formalizes a long-felt tension in AI: we can’t have both broad scope and perfect certainty. As AI systems take on more complex, open-ended tasks, they necessarily give up the possibility of error-free performance. It’s a new twist on an old problem: the curse of dimensionality. The trade-off is fundamental—expand scope, accept uncertainty.
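
To get a feel for the curse of dimensionality being invoked here, a toy sketch (mine, not Floridi’s formalism; the point counts and dimensions are arbitrary): hold the amount of experience fixed and watch how quickly it stops covering the space as the number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1_000  # a fixed budget of "experience"

for d in (1, 2, 10, 100):
    data = rng.uniform(size=(n_points, d))   # everything seen so far
    query = rng.uniform(size=d)              # a new situation
    # Distance from the new query to its nearest stored example,
    # expressed as a fraction of the space's diameter (sqrt(d)).
    nearest = np.linalg.norm(data - query, axis=1).min()
    print(f"d={d:>3}: nearest example at {nearest / np.sqrt(d):.2f} of the diameter")
```

With a thousand examples, a new query in one or two dimensions sits practically on top of something already seen; in a hundred dimensions it is a sizable fraction of the whole space away from everything. Broader scope means sparser coverage, and sparser coverage means more uncertainty.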

Part III: Why This Matters for Humans

Here’s why all this matters: it reminds us that human intelligence is not obsolete. It’s essential.

Humans don’t need a million examples to learn something new. We can generalize from a few because we’ve built deep symbolic structures over a lifetime. We learn that a bear is a mammal, and that tells us something about a zebra. We recognize novelty, not because we’ve seen it before, but because we know how to make sense of the unfamiliar by analogy, not retrieval.

We mix reasoning methods constantly. Logic, pattern, intuition, metaphor, judgment. We’ve evolved to know which kind of thinking to bring to which kind of problem. This is a critical aspect of our intelligence, and it was built in the real social and physical world.

Part IV: The Real Limits—And the Real Opportunity

What’s emerging in the LLM research is a reframing.

Coveney and Succi show that there are hard mathematical limits to what scaling can do. Zhao’s team reveals the instability of surface-level reasoning. Krakauer and Mitchell give us a language for different types of emergence. Floridi puts a boundary around what is and isn’t possible in principle.

Taken together, these are constraints that run deeper than anything more parameters, more data, or more compute can fix. They are the current limits that define the terrain of synthetic cognition.

This clarity about different types of intelligence opens more interesting possibilities than the replacement narrative. Instead of racing toward artificial general intelligence that does everything humans can do, we can design systems that do what they're actually good at while humans contribute what we're uniquely suited for.

What we've discovered is extraordinary: language is fundamentally statistical rather than syntactic. This revelation reshapes our understanding of meaning itself. Large language models are genuinely powerful systems that are already changing how we think, write, and relate to information. Even though they operate through distributional learning, they create meaning and influence culture in ways we're still learning to understand.

It's important to recognize just how transformative statistical learning at this scale actually is. The research helps us understand the boundaries of this power while appreciating its genuine impact on human cognition and society.

For those of us who care about what humans get to do with AI, this evidence is encouraging. It suggests that human intelligence—our ability to find principles, make analogies, and reason about novel situations—remains essential rather than redundant.

I’m hopeful we’ll see a shift toward designing AI systems that actually make human intelligence more powerful—especially if the fixation on pure scale fades. That change would open up the space of ideas, ease the relentless pressure to adapt, and lighten the environmental cost. More than that, it’s simply a far more interesting problem to solve.
