A Primer on Symbol Emergence Systems Theory - Part 1: Motivation
- symbolemergoutreac
- May 3
- Reading time: 11 min
Updated: May 9

Symbol Emergence Systems Theory (SEST) is an interdisciplinary field that has taken shape over the past fifteen years. Originally proposed by Professor Tadahiro Taniguchi (“tanichu”) at Kyoto University, SEST has since been advanced by researchers from a range of disciplines—particularly in Japan. In this article series, we will explore what SEST is, why it matters, and where it’s headed, in three parts:
Part 1: Motivation
Part 2: Core Methods and Hypotheses
Part 3: Future Prospects
1-1 A new science of meaning
Symbol Emergence Systems Theory (SEST) is a scientific study of meaning. Humans are social beings who have always lived with symbols that carry meaning, and in the age of generative AI, reflecting on the nature of meaning has become more important than ever.
LLMs are great… but why?
The invention of large language models (LLMs) marks a major breakthrough in AI research. When you ask an LLM to “summarize this Zoom recording” or “suggest five article titles,” it responds as if it genuinely understands your instructions. Building a machine with such capabilities has long been a dream of AI researchers, but remained out of reach until just a few years ago.
What made this possible? A straightforward answer would be the rapid scaling of deep neural networks in the 2010s and the advent of the transformer architecture in the late 2010s. The transformer is a specific kind of deep neural network that can learn effectively from massive amounts of language data, enabling the remarkable language abilities we see today. LLMs learn the statistical properties of language data and predict the next word (or token) they should output. That's all they do, yet they seem "intelligent" to us. Their outputs not only sound natural but can also appear to "think" through a problem step by step, a behavior known as chain-of-thought reasoning.
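To make the prediction task concrete, here is a minimal sketch in Python. It is a toy bigram counter, not a transformer; the tiny corpus and the helper predict_next are made up for illustration and are not part of any real LLM.

```python
from collections import Counter, defaultdict

# Toy next-word predictor built from corpus statistics. This is a simple
# bigram counter, not a transformer, but the task is the same one LLMs are
# trained on: given the text so far, output a probable next word.
corpus = "the apple is red . the apple is sweet . the sky is blue .".split()

bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(word):
    """Return the word most frequently observed after `word` in the corpus."""
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("apple"))  # -> "is"
print(predict_next("the"))    # -> "apple" (seen twice, vs. "sky" once)
```

Real LLMs replace the bigram table with a transformer that conditions on the entire preceding context, but the training objective is still next-token prediction.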

There’s no doubt that the architectures and algorithms behind LLMs are groundbreaking inventions. However, they have also revealed an inherent quality that language has always possessed. The success of these models demonstrates that the very data AI uses to predict the next word—in other words, language itself—has always held the capacity to express human knowledge and thought.
This potential arises from a unique feature of language: meaning. We can use words to communicate and deepen our ideas because we imbue them with meaning and interpret meaning from them. Similarly, the strings of characters generated by AI are valuable precisely because they convey meaning to us.
Questions on meaning in the gen-AI era
Where does this thing we call "meaning" come from? The sounds and characters that make up words do not inherently possess meaning. There is nothing necessary about the sequence of sounds "apple" or the two kanji characters "林檎" that link them to a red fruit.
Once we realize that it is far from trivial for words to acquire meaning, a series of fundamental questions emerge:
When do words first gain meaning?
How do children learn the meanings of words?
How is communication through language even possible?
Surprisingly, we still lack fully satisfactory answers to these questions. These are precisely the kinds of challenges that Symbol Emergence Systems Theory (SEST) seeks to tackle. And because today’s AI models rely on the latent capacities of language, exploring these questions is also key to how much further LLMs can advance.

Another set of questions central to SEST concerns AI’s impact on society. AI not only draws on data that already carries meaning; it can also generate new meanings. As AI becomes increasingly capable of speaking and writing, new forms of linguistic communication will emerge: between humans and AI, and even among AI systems themselves. Through these interactions, novel forms of "meaning" may arise. How might this transformation reshape language and society itself?

Symbol Emergence Systems Theory (SEST) is a new academic discipline for the generative‑AI era, dedicated to uncovering both the origins and the future of meaning. In brief, while generative AI research focuses on building machines that manipulate meaning from language data that already carries it, SEST tackles the other half of the story: the origins of linguistic meaning itself, and the broader societal impact of generative AI. SEST explores both the “before” and “after” of generative AI’s development, aiming to offer the conceptual framework we need to understand—and responsibly engage with—its growing power.

Of course, earlier disciplines—philosophy, linguistics, cognitive science and information science—have long explored the meaning of symbols from various perspectives, and SEST builds on these foundations. What sets SEST apart is its constructive approach: rather than stopping at qualitative hypotheses, it employs mathematical methods, such as probabilistic generative models, and implements them in embodied systems like robots to learn by building. SEST’s ambitious vision is to establish a truly interdisciplinary science of meaning, grounded in both mathematical formalism and empirical practice.
This, in a nutshell, is SEST’s significance today.
1-2 Not “Symbol Grounding Problem” but “Symbol Emergence Problem”
Let’s begin to unpack what exactly we mean by symbol emergence systems. First: symbols.
What is a symbol?
You might think of everyday examples—written characters like kanji or the alphabet, musical symbols such as ♪, or mathematical symbols like + and ∬. We’re often most aware of symbols when they’re unfamiliar. When we encounter characters in an unknown language or a sentence that’s too complex to parse, we recognize: these must be symbols. We call them symbols because, even if we don’t understand them, we assume they convey meaning to someone else. In other words, symbols are things that express some kind of meaning to at least someone.

Traffic signs, hand signals, or an evening bell can convey different messages—“stop,” “time to wake up,” or “it’s 5 PM,” for example. Whenever signs, gestures, or sounds communicate meaning to someone, they function as symbols. In SEST, we don’t limit ourselves to linguistic symbols; we consider this full spectrum of symbolic forms.
Among these varieties, computer symbols are especially intriguing. Inside a computer, information flows as electrical signals and is processed based on predetermined rules defined by programming languages. From hardware description languages that define logic circuits to more human‑readable languages like C or Python, computers are constructed and operated using their own symbol systems. These systems aren’t primarily intended to convey meaning to humans (except to programmers), but to specify exactly how a computer should behave.
The symbol grounding problem
Throughout the 20th century, AI researchers explored the idea of explicitly assigning computers “symbols” corresponding to human concepts, believing that true intelligence could emerge from manipulating these symbols. For example, if a system links the symbol “apple” to the attribute “is a fruit”, and “fruit” to “can be eaten,” it can infer that “apples can be eaten.” Proponents argued that all higher‑level reasoning could be constructed from such symbolic manipulations—that was the core of symbolic AI.
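To make this concrete, here is a minimal sketch of this style of inference in Python. The toy knowledge base (the dictionaries is_a and has_property) and the helper infer are illustrative assumptions, not a reconstruction of any particular historical system.

```python
# Minimal sketch of symbolic-AI-style inference over hand-coded symbols.
# The toy knowledge base below (is_a, has_property) and the helper infer()
# are made up for illustration, not any particular historical system.
is_a = {"apple": "fruit", "salmon": "fish"}
has_property = {"fruit": "can be eaten"}

def infer(symbol):
    """Follow the is-a link and return an inherited property, if any."""
    category = is_a.get(symbol)
    return has_property.get(category)

print("apple:", infer("apple"))    # -> apple: can be eaten
print("salmon:", infer("salmon"))  # -> salmon: None (no rule applies)
```

Note that nothing in this little program connects the string "apple" to an actual apple; it only links symbols to other symbols, which is exactly the issue raised next.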
But does symbolic AI truly capture the meaning of its symbols? In his influential 1990 paper, cognitive scientist Stevan Harnad introduced what he called the symbol grounding problem. He argued that computer symbols merely refer to other symbols, without any direct connection to the real world. Harnad likened this to trying to learn Chinese using nothing but a Chinese–Chinese dictionary: every word is defined only by other words, with no grounding in sensory or experiential reality.
In essence, computer symbols form a closed loop, referring only to one another. The symbol grounding problem highlights the fundamental difficulty of anchoring symbols in real‑world objects.

Human symbols are already grounded
Here we want to emphasize the difference between computer symbols and human symbols. Computers can handle “ungrounded symbols” because humans explicitly design and program them. Human symbols, by contrast, aren’t injected directly into our brains. Our thoughts can’t be observed or manipulated from the outside. As we grow, we acquire the meanings of words and gestures through being spoken to by others and through first‑hand experience: seeing, touching, and interacting with the world.
Moreover, the meanings of human symbols aren’t fixed. Words change meaning over time: consider how the meaning of “liberalism” has shifted, and dictionaries are updated every few years to reflect such changes. Even within the same era, a single word can carry different meanings depending on who uses it and in what context. For example, the Japanese adjective “yabai” can mean either “terrible” or “amazing,” depending on the situation. Human symbols are socially constructed, fluid, and context‑dependent, making them fundamentally different from the rigid symbols of computers. AI researcher Luc Steels highlighted this distinction in his 2008 book chapter by coining the terms c‑symbol (computer symbol) and m‑symbol (meaning symbol).

How should we understand human symbols? The study of symbols—semiotics—long predates modern computers. One of its founding figures was the American philosopher Charles Sanders Peirce (late 19th to early 20th century). Peirce’s semiotics comprises three components: the sign (e.g., a word or image), the object it refers to, and the interpretant (roughly, the understanding that the sign produces in an interpreter).
By introducing the interpretant, Peirce gave us a dynamic model of meaning: the link between sign and object isn’t fixed but is recreated in each context. In his view, a symbol isn’t a static entity like a word or picture but a process arising from the interaction of sign, object, and interpretant. Meaning, then, is present from the very start of that process.
This process‑oriented perspective sidesteps Harnad’s grounding problem: symbols are grounded as they arise. But it also raises new questions. How do such symbol‑making processes actually begin? We can’t simply declare, “From tomorrow, I’ll call bananas ‘apples’”—that would only create confusion. While we can invent rules in small settings, most of our everyday symbols emerge socially, not by individual fiat. When large‑scale properties arise from many local interactions, we call it emergence, and symbols are precisely such an emergent phenomenon. One key challenge is to uncover the mechanisms by which symbols collectively emerge in society.
There’s another mystery: how do people enter the symbolic world? At birth, a baby’s experience is pure sensory chaos—light, sound, touch—without any built‑in “meaning.” How does a child learn to wield symbols? This question parallels efforts in robotics—particularly within SEST’s “symbol emergence robotics” branch—to enable machines to develop their own symbols. Ultimately, how humans and robots together create and participate in symbol‑making is the “symbol emergence problem” that SEST sets out to solve.

Grounding in contemporary AI
Finally, let’s briefly examine today’s AI through the lens of the symbol grounding problem. As noted earlier, the 21st century has become the era of neural networks, superseding classical symbolic AI. Rather than having human programmers explicitly define (computer) symbols, modern AI learns its own representations from the data it ingests. In this sense, the learning process itself performs symbol grounding. Harnad himself, in his 1990 paper, highlighted the neural-network (or connectionist) approach as a prime candidate for solving the symbol grounding problem.
Large language models (LLMs) are often said to be “ungrounded” since they are trained solely on text and have never actually seen or tasted an “apple.” Yet when an LLM outputs the word “apple,” that word is grounded in the real world for us—because “apple” as a sign was originally created by humans, and its meaning is transmitted to the AI via the human‑generated training data. This perspective becomes even clearer through the Collective Predictive Coding Hypothesis discussed in Part 2.
More fundamentally, the key issue is how our own language practices shape AI and conversely how AI’s language output influences our linguistic activities. This dynamic interaction represents a new symbol emergence problem in the era of generative AI.
In this section, we’ve surveyed the symbol emergence challenge that SEST addresses as a science of meaning in the generative AI era. Tackling symbol emergence scientifically requires viewing the context in which symbols arise as an interconnected system.
1-3 What is a symbol emergence system?
Let’s dive deeper into the symbol emergence problem and examine what kind of system it arises within.
Symbol emergence problem at the agent level
First, let’s look at this from the perspective of a single agent. When we learn a foreign language, we begin with confusion and gradually piece together meaning. Likewise, a newborn baby initially perceives light, sound, and bodily sensations as a formless jumble, but over time, they progress until they can use the same words adults do.
In the process, no one ever “programs” symbols into our brains. We can’t open someone’s head and directly read the meanings they assign to words—our cognitive systems are, in that sense, closed. Yet by interacting with the world and with others through our bodies, we come to use external symbols (not the “computer symbols” of classical AI, as a reminder) as meaningful tools for communication. Explaining how this happens is the essence of the symbol emergence problem at the level of agents, the subjects of cognition and action.

What is internal representation?
To answer that, imagine a mother holding an apple and saying, “Look, it’s an apple!” The secret of how babies learn language lies in what’s happening inside their heads. In cognitive science, those “things in the head” are called representations.
Representation is a notoriously tricky concept in philosophy, but here we’ll use a broad definition: something that stands in for something else. A map represents a city; a thermometer represents room temperature. Those are external representations that we can see. In contrast, internal (or mental) representations can’t be observed directly, and they’re the focus of symbol‑emergence theory.
To make this more concrete, consider two fields that literally look inside:
Neuroscience reveals neural representations in the brain. For example, hippocampal “place cells” fire only when you occupy a specific location, forming a spatial map.
Machine learning identifies analogous structures in artificial neural networks. A network trained on cat images develops internal activations for “cat‑face outline,” “eyes and nose,” and ultimately “category: cat.”

In both cases, these representations aren’t preprogrammed but are learned through experience, a process called representation learning. Unlike hand‑coded computer symbols, they emerge from the bottom up.
Cognitive science and Symbol Emergence Systems Theory operate at a more abstract, computational level, in the sense of David Marr’s three levels of analysis, modeling internal representations as unobservable latent variables with probability distributions. Though abstract, these models share with neuroscience and machine learning the fact that their representations are acquired through interaction and learning.
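As a purely illustrative sketch of "internal representation as an unobservable latent variable," consider a toy model in which an agent infers a hidden category from a single sensory feature. The categories, the "redness" feature, and the parameters below are assumptions made up for this example, not an actual SEST model.

```python
import math

# Toy model of an internal representation as an unobservable latent variable:
# a two-category mixture in which the category z is never observed directly
# and is inferred from a sensory feature x ("redness") via Bayes' rule.
# Categories, means, and noise level are illustrative assumptions.
priors = {"apple": 0.5, "lemon": 0.5}   # p(z): prior over latent categories
means = {"apple": 0.9, "lemon": 0.2}    # expected redness under each category
std = 0.15                              # shared observation noise

def likelihood(x, z):
    """Gaussian p(x | z) for the observed redness x."""
    return math.exp(-0.5 * ((x - means[z]) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior(x):
    """p(z | x): belief over the latent category given the observation."""
    unnormalized = {z: likelihood(x, z) * priors[z] for z in priors}
    total = sum(unnormalized.values())
    return {z: value / total for z, value in unnormalized.items()}

print(posterior(0.85))  # puts almost all probability on "apple"
```

Representation learning, in this framing, amounts to inferring and adjusting such latent structure from sensory data rather than having it hand-coded.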
Bringing it all together, the core idea is this: agents have internal representations that evolve through embodied interaction with the world, enabling them to grasp symbol meanings. Thus, the symbol emergence problem at the agent level asks:
How do agents (humans or robots) learn to form and express internal representations, drawing on sensory inputs and signs like language, and come to understand the meanings of those signs?
If we treat the agent as a robot, the question becomes: How can we build a robot that acquires language? This constructive approach, which involves understanding cognition by creating it, offers a path to uncovering how human cognitive systems enable symbolic thought.
Symbol Emergence Problem at the Societal Level
Then there is the other half of the puzzle: how do the meanings of signs arise within society? In principle, there is no necessary link between a sign and its meaning—one could just as well call an apple an orange and vice versa. In reality, however, none of us can redefine meanings at will. Although great thinkers occasionally coin new definitions, most meanings stabilize only as individual interpretations accumulate and coalesce through social interaction.
Signs interrelate to form wider networks of meaning, or symbol systems, which in turn shape how we think, speak, and act. Organized symbol systems then exert a top‑down influence on each member of society. For example, employees at companies that forbid remote work must commute daily because shared symbols like “company,” “remote work,” and “duty” constrain their expectations and behavior.
In organizational theory and multi‑agent AI, a system in which bottom‑up patterns of individual activity self‑organize and then impose top‑down constraints on those same individuals is called a micro–macro loop. In complex‑systems science, any system governed by such a loop is termed an emergent system.
Thus, at the societal level, the symbol emergence problem asks:
How does a symbol system, an external representation of shared meaning, emerge as people and robots interact via signs?
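For intuition, here is a minimal naming-game-style simulation in Python, loosely in the spirit of Luc Steels’ language games, showing how many local interactions can converge on a shared word. The agent count, adoption rule, and parameters are illustrative assumptions, not SEST’s actual models.

```python
import random

# Minimal naming-game-style sketch: agents repeatedly pair up, the speaker
# utters its current word for a shared object, and the hearer sometimes
# adopts it. A shared word tends to emerge bottom-up from local interactions;
# all settings here are illustrative assumptions.
random.seed(0)
N_AGENTS, N_ROUNDS, ADOPT_PROB = 20, 5000, 0.5
lexicon = {agent: f"word{agent}" for agent in range(N_AGENTS)}  # private words at first

for _ in range(N_ROUNDS):
    speaker, hearer = random.sample(range(N_AGENTS), 2)
    if lexicon[hearer] != lexicon[speaker] and random.random() < ADOPT_PROB:
        lexicon[hearer] = lexicon[speaker]  # hearer aligns with the speaker

print("distinct words remaining:", len(set(lexicon.values())))
```

Once most agents share a word, that convention constrains what any newcomer must learn, which is the top-down half of the micro–macro loop described above.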

The symbol emergence system
So far, we’ve treated agent‑level and society‑level emergence separately, but in reality they form a single, intertwined system. Each of us—human or robot—continually updates our internal representations through bodily interaction with the physical world, while simultaneously being guided by the societal symbol system. Conversely, new words and usages—like “remote work”—are born out of our collective communication.

Together, our cognitive systems and the social symbol system form a dynamic symbol emergence system, encompassing:
Physical interaction: each agent’s internal representation links to the real world through embodied interaction.
Symbolic interaction: agents’ internal representations connect with one another via symbolic communication.
Micro–macro loops: agents’ internal representations and the shared symbol system mutually shape each other through processes of organization and constraint.
This is what Symbol Emergence Systems Theory (SEST) aims to formalize mathematically and to validate through simulation and robotic implementation. In particular, probabilistic generative models enable robots to learn internal representations, and recent advances (e.g., Collective Predictive Coding) illuminate how internal- and external-representation dynamics mirror one another. SEST was conceived around fifteen years ago and remains a young, evolving field, continually enriched by insights from neuroscience, machine learning, economics, AI, and complex systems science.
That concludes Part 1, where we sketched the motivations and worldview of symbol emergence systems. In the next part, we will explore SEST’s core methods and hypotheses, including its mathematical framework.
Further reading
Taniguchi, T. et al. (2018). Symbol emergence in cognitive developmental systems: A survey. arXiv:1801.08829. …A comprehensive survey of SEST that also covers trends in probabilistic generative models and robotic implementations.
=> Go to Part 2.
Written by: Ryuichi Maruyama
Editorial supervision: Tadahiro Taniguchi
Design: Reira Endo, Masaya Shimizu
Translation support: Momoha Hirose.