Our guest contributor Ran Mo talks about using AI to simulate life in video games. As a former product lead at EA, he worked on a classic in the field: The Sims. Now he wants to push the boundaries.
The simulation of life, friendships, and companionships has been a holy grail in video games. From simple implementations in Tamagotchi and Pokémon to the complex lives of The Sims, virtual companionship has deeply touched millions of gamers and formed the backbone of some of the most enduring franchises.
At its core, the process of creating digital companions is also a quest to better understand the nature of sentience. And as we shall see, the techniques used will also have wide-ranging applications beyond gaming.
As technology, in particular AI, becomes more powerful, new opportunities open up to reimagine digital life and companionships. This essay is divided into two parts. Part 1 traces some of the most important historical milestones in simulating life digitally. Part 2 explores our efforts at Proxima in furthering this pursuit. Let’s get started!
The starting point: Scripting “life” in video games
The starting point of modern video game programming is scripting. Scripting is a broad umbrella term that encapsulates many concepts, from very simple programs to complex decision trees and state machines. Yet at the heart of it, scripting is less about “true intelligence” and more about deterministic responses that follow a set of predefined rules: essentially digital versions of choose-your-own-adventure books.
Despite its mechanical nature, scripting can be incredibly powerful in creating immersion. Mass Effect and Dragon Age, two popular franchises from BioWare, use scripting to create deep relationship opportunities with player companions. Depending on their choices, players can unlock backstories, affect game outcomes, and even form romantic relationships with the digital companions. The popularity of the two franchises is a testament to the power of human-created immersive storytelling.
The challenge with scripting is ultimately one of scalability. Designers not only need to design each interaction by hand, but also need to account for every possible permutation of player choice. This means content cost grows exponentially with the number of choices a player makes. Consider the following: a player chooses from three different options for a particular interaction. Based on that choice, three new options open up, and so on, for a total of 30 choices throughout the game. This decision sequence (assuming no overlaps) would require 3^30, or roughly 200 trillion, pre-programmed scenarios, far more than any studio could ever write by hand. Clearly, a different approach is needed to build immersion at scale.
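The arithmetic behind this example is easy to check. The short Python sketch below counts the distinct paths through a branching story under the same assumption of no overlapping branches; the function name `count_story_paths` is ours and purely illustrative.

```python
def count_story_paths(options_per_choice: int, num_choices: int) -> int:
    """Distinct paths when every choice opens a fresh set of options (no overlaps)."""
    return options_per_choice ** num_choices


if __name__ == "__main__":
    # Three options per interaction, at increasing story depths.
    for depth in (5, 10, 20, 30):
        print(f"{depth} choices -> {count_story_paths(3, depth):,} paths")
```

Thirty three-way choices give 3^30, a little over 2 × 10^14 paths, which is why real scripted games merge branches back together rather than letting them fan out indefinitely.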
The Sims and utility-based AI
I had the opportunity to work on The Sims franchise at EA, and it was incredible to see the passion that the franchise instills. Today, more than 70 million people play The Sims. The fourth installment of the game has grossed over $2 billion, and it’s still growing in popularity.
At the heart of the franchise are the Sims: autonomous digital companions with their own needs, preferences, and desires. Players can control them from time to time, or build out their broader environments. But these agents are also perfectly capable of running their own lives. In contrast to the pre-planned and scripted stories of Mass Effect, The Sims emphasizes the emergent narratives that form through these autonomous companions. In simpler terms, the Sims are a simulation of life.
Will Wright, the creator of The Sims, consulted two sources for his ‘virtual dollhouse’: The first was Maslow’s “A Theory of Human Motivation”, in which human needs are arranged in a hierarchy. The second was Charles Hampden-Turner’s “Maps of the Mind”, in which thoughts are cataloged and organized.
The combination of these two sources inspired the AI engine of The Sims, known as utility-based AI. In this system, the AI balances between two mechanics: commodities and utilities. Commodities represent the internal states, or psychological needs, of each Sim, and utility curves represent the means to satisfy those commodities. As an example, an internal need (the commodity) could be ‘hunger’, and different food options (cooking or heating up leftovers) represent ways of satisfying that need. The AI simultaneously evaluates hundreds of needs and associated decisions—the need to eat, the need to belong, the need to find love—and prioritizes them in decision-making. In many ways, this is not so different from how we make decisions as humans!
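To make the commodity and utility idea concrete, here is a minimal Python sketch. It is not the engine used in The Sims: the quadratic `urgency` curve, the `Action` class, the need names, and all the numbers are assumptions chosen for illustration. Each action is scored by how urgent a need is, multiplied by how much the action relieves it, and the highest-scoring action wins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Action:
    name: str
    # How much this action relieves each commodity (need), from 0.0 to 1.0.
    relief: Dict[str, float]


def urgency(level: float) -> float:
    """Hypothetical utility curve: urgency grows quadratically with the need level."""
    return level ** 2


def score(action: Action, needs: Dict[str, float]) -> float:
    """Sum over needs of (urgency of the need) x (relief the action provides)."""
    return sum(urgency(needs.get(need, 0.0)) * amount
               for need, amount in action.relief.items())


def choose_action(actions: List[Action], needs: Dict[str, float]) -> Action:
    """Pick the action with the highest total utility for the current needs."""
    return max(actions, key=lambda action: score(action, needs))


actions = [
    Action("cook_meal", {"hunger": 0.8}),
    Action("heat_leftovers", {"hunger": 0.5}),
    Action("call_friend", {"social": 0.7, "fun": 0.2}),
]

hungry_sim = {"hunger": 0.9, "social": 0.4, "fun": 0.2}
lonely_sim = {"hunger": 0.1, "social": 0.9, "fun": 0.5}

print(choose_action(actions, hungry_sim).name)
print(choose_action(actions, lonely_sim).name)
```

With these made-up numbers the hungry Sim cooks and the lonely Sim calls a friend. Changing the shape of the curve changes the personality: a steeper curve produces a Sim that ignores a need until it becomes desperate.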
Yet despite the many achievements of The Sims AI, something feels missing. Each Sim is seemingly trapped on a perpetual treadmill of self-optimization, blind to the universe beyond its immediate requirements. It lacks the ability to forge connections with players beyond the scope of its own needs. Genuine relationships transcend mere optimization; they entail learning, experiencing, and growing together. To attain this, we need a different approach.
Black & White and reinforcement learning
The game Black & White launched in 2001. Black & White was a ‘god-game’ in which players played as divine beings that ruled over hapless citizens. But the real star was a creature companion that players indirectly influenced. The creature had the power to nurture or destroy, and had intentions and desires of its own.
Players couldn’t control the creature companion directly, but could influence its decisions through rewards and punishments (e.g., petting and slapping), and over time, through such actions, shape the creature for good or evil—hence the name ‘Black & White’.
Unbeknownst to players, the creature was controlled by reinforcement learning algorithms. Player actions like petting and slapping became the training inputs that shaped the creature’s desires, beliefs, and intentions over time. In simpler words, the creature could learn.
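The core loop of learning from rewards and punishments can be sketched in a few lines of Python. This is not the algorithm Black & White shipped with; it is a deliberately simple value update of the kind found in introductory reinforcement learning, and the class name, action names, and learning rate are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable


class Creature:
    """Toy learner: keeps a running preference for each action it has tried."""

    def __init__(self, learning_rate: float = 0.5):
        self.learning_rate = learning_rate
        self.preference = defaultdict(float)  # unseen actions start at 0.0

    def feedback(self, action: str, reward: float) -> None:
        """Nudge the action's preference toward the reward it just received."""
        old = self.preference[action]
        self.preference[action] = old + self.learning_rate * (reward - old)

    def favourite(self, options: Iterable[str]) -> str:
        """Return the option the creature currently values most."""
        return max(options, key=lambda action: self.preference[action])


creature = Creature()
for _ in range(3):
    creature.feedback("help_villager", +1.0)  # the player pets the creature
    creature.feedback("eat_villager", -1.0)   # the player slaps the creature

print(creature.favourite(["help_villager", "eat_villager"]))
```

After three rounds of petting and slapping, the creature prefers helping villagers. Reversing the rewards would raise an evil creature from exactly the same code.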
Black & White was one of the first games to use modern artificial intelligence techniques. It was both a commercial and critical success, with IGN calling it a “miraculous experience”. But Black & White was also way ahead of its time: it was severely limited by the algorithms and compute power of the era. Processors had a tiny fraction of the power they have today, and dedicated GPUs, a necessity for modern AI processing, were still in their infancy.
Yet in a sign of the intimacy between video games and frontier technology, Black & White’s story didn’t end there. The game’s AI programmer was a young engineer named Demis Hassabis. After Black & White and further adventures in the gaming industry, Hassabis went back to school to complete his PhD in cognitive neuroscience. After graduating, Hassabis founded the artificial intelligence company DeepMind, where he remains CEO today. In 2014, DeepMind was acquired by Google for around $500 million, and in 2016, the company made headlines when its AlphaGo program beat a world champion in the ancient game of Go. Today, DeepMind’s reinforcement learning technology is used in applications ranging from protein structure prediction to improving wind farm efficiency. It’s curious to think that all this began with building digital companions in games.
Today and beyond
The recent surge in AI innovation has rejuvenated interest in simulating life in games. One approach is to embed conversational chatbots directly in the game, such as with this Elder Scrolls mod. This approach is appealing because it’s relatively easy to envision and implement: hook up a chatbot to a game avatar, integrate speech recognition and text-to-speech, add a healthy dose of game lore, et voilà, you have a bona fide talking NPC!
But such implementations are fairly shallow and not true simulations of life. The game simply acts as set dressing for the chatbot, and the novelty of such experiences can quickly wear off.
In contrast, a deeper implementation is the Minecraft Voyager project, in which an LLM-powered agent explored the Minecraft world and learned skills without human intervention. The agent proposed its own tasks, built its own knowledge library, and used those learnings to further its discoveries. Without human guidance, Voyager made sense of the Minecraft world, built its own house, and eventually mined diamonds.
Two things stood out to us: the agent’s ability to make sense of its world, and its ability to form long-term memories through experience. What if we could harness those abilities not as an autonomous game agent, but rather to better simulate life and companionship?
As a starting point to what we aim to achieve, consider a very small moment with a dog named Nemo.
- Perception: Nemo sees an unknown, scary-looking person approaching his owner.
- Input: The owner shouts loudly and waves her arms around.
- Memory and Personality: Nemo remembers that he is very protective of his owner, and that he’s fearless when the owner is under threat.
In an instant, Nemo interprets all this and makes his decision. He springs into action, jumping between his owner and the interloper, and growls menacingly, ready to attack. In the aftermath, Nemo is praised for his bravery and rewarded with a treat, reinforcing his behavior.
But what if Nemo were not fearless but cowardly? Would he choose to bark from a distance instead? What if the interloper were actually a friend that the owner was excited to see? Would Nemo be scolded for growling at a friend, and if so, would he remember for the next time? Such emergent moments highlight the nuances of real-life relationships that can’t be pre-programmed. Yet these moments are also what make companions feel real and authentic. We believe modern technology has advanced to a point at which we can begin tackling such nuanced relationships.
Many modern AI models rely on a neural network architecture known as the transformer. Through their attention mechanism, transformers excel at making sense of context and dependencies across large and disparate data sources. In simulating life in games, these data sources could represent memory, perception, user commands, and more. To better understand this, let’s recast Nemo from a real dog into a virtual companion.
- Perception: We built a system that converts the 3D game world into natural language in real time, so that Nemo can “perceive” the world around him at any given moment.
- Memory, personality, intention: These are stored and interpreted digitally (as vector files), and they evolve continuously through new experiences, just like in real life.
- User input: We added speech recognition for player voice commands. But these could just as easily be controller inputs or take any other form.
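To give a feel for the perception step in the list above, here is a small Python sketch that turns a list of game objects into a sentence a language model could read. It is our own illustration, not Proxima’s system: the `Entity` class, its fields, the 2D coordinates, and the sentence format are all assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class Entity:
    name: str
    x: float
    y: float
    tags: List[str]  # short descriptors, e.g. ["unknown", "approaching"]


def describe_scene(observer: Entity, entities: List[Entity], radius: float = 10.0) -> str:
    """Describe every entity within `radius` of the observer, nearest first."""
    nearby = []
    for entity in entities:
        distance = hypot(entity.x - observer.x, entity.y - observer.y)
        if distance <= radius:
            nearby.append((distance, entity))
    nearby.sort(key=lambda pair: pair[0])
    if not nearby:
        return f"{observer.name} sees nothing nearby."
    parts = [
        f"{entity.name} ({', '.join(entity.tags)}) {distance:.1f} m away"
        for distance, entity in nearby
    ]
    return f"{observer.name} sees: " + "; ".join(parts) + "."


nemo = Entity("Nemo", 0.0, 0.0, ["dog"])
world = [
    Entity("stranger", 3.0, 4.0, ["unknown", "approaching"]),
    Entity("owner", 2.0, 0.0, ["shouting", "waving arms"]),
    Entity("tree", 50.0, 50.0, ["static"]),
]
scene_text = describe_scene(nemo, world)
print(scene_text)
```

The distant tree is filtered out, and the remaining entities are listed from nearest to farthest. A real system would work from the full 3D scene and would need to decide what is worth mentioning, since everything said here competes for the model’s limited context.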
A demonstration of the prototype is included below.
To enable the aforementioned scenario, we apply a first layer of a large language model to translate “perception-to-intention” by taking inputs across perception, memory, user commands, and other cues. In the case of Nemo, the output would look something like “Oh no, my owner is in danger. I need to protect my owner!”
But this intention is not yet a game action. To get there, we need to introduce a second LLM layer to translate “intention-to-action”: converting the intention into executable game commands in real time. This second layer is particularly difficult because it needs to understand the range of executable actions in the context of its intentions; any incorrect command could crash the game. So here, we also added a third AI layer that self-corrects any failures in logic and game state changes in real time.
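The layering described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our production code: the language model is passed in as a plain function (`LLM`), the command set in `ALLOWED_ACTIONS` is hypothetical, `fake_llm` returns canned replies so the example runs offline, and the third layer is reduced to a rule-based filter rather than an AI system.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # assumption: any function mapping a prompt to a reply

# Hypothetical set of commands the game engine knows how to execute.
ALLOWED_ACTIONS = {"move_to", "growl", "bark", "sit", "idle"}


def to_intention(llm: LLM, perception: str, memory: str, user_input: str) -> str:
    """Layer 1: perception-to-intention."""
    prompt = (f"Perception: {perception}\nMemory: {memory}\n"
              f"User input: {user_input}\nWhat do you intend to do?")
    return llm(prompt).strip()


def to_commands(llm: LLM, intention: str) -> List[str]:
    """Layer 2: intention-to-action, one candidate command per line."""
    prompt = (f"Intention: {intention}\n"
              f"Reply with one command per line, using only: {sorted(ALLOWED_ACTIONS)}")
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def is_executable(command: str) -> bool:
    parts = command.split()
    return bool(parts) and parts[0] in ALLOWED_ACTIONS


def validate(commands: List[str]) -> List[str]:
    """Layer 3 stand-in: drop anything the game cannot run; never return nothing."""
    safe = [command for command in commands if is_executable(command)]
    return safe or ["idle"]


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without a real model."""
    if prompt.startswith("Intention:"):
        return "move_to owner\ngrowl stranger\nfly_away"
    return "Oh no, my owner is in danger. I need to protect my owner!"


intention = to_intention(fake_llm, "A stranger approaches the owner.",
                         "Nemo is protective and fearless.", "Owner shouts.")
commands = validate(to_commands(fake_llm, intention))
print(intention)
print(commands)
```

The unknown `fly_away` command is discarded before it reaches the game, and an empty result falls back to `idle`. Keeping this check outside the model means a bad generation degrades into a harmless action instead of an invalid game state.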
Finally, we added a “real-time learning by association” system that commits observations and outcomes to memory, so that each action influences part of Nemo’s long-term memory, and affects the outcome of future decisions. We believe this ability to continuously learn will be a central part of future life simulations.
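One simple way to picture learning by association is a memory that stores each observation with its outcome and later recalls the most similar past situation. The Python sketch below is an illustration only: the `AssociativeMemory` class is hypothetical, and a bag-of-words count stands in for the learned vector embeddings a real system would use.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real system would use learned vectors."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AssociativeMemory:
    def __init__(self) -> None:
        self.entries: List[Tuple[Counter, str, str]] = []

    def remember(self, observation: str, outcome: str) -> None:
        """Commit an observation and what happened afterwards to memory."""
        self.entries.append((embed(observation), observation, outcome))

    def recall(self, situation: str, top_k: int = 1) -> List[Tuple[str, str]]:
        """Return the stored (observation, outcome) pairs most similar to the situation."""
        query = embed(situation)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [(observation, outcome) for _, observation, outcome in ranked[:top_k]]


memory = AssociativeMemory()
memory.remember("stranger approaches owner who shouts", "growled and got a treat")
memory.remember("friend approaches owner who laughs", "growled and was scolded")

recalled = memory.recall("a friend approaches and the owner laughs")
print(recalled)
```

Here the new situation most resembles the time Nemo was scolded, so that memory is the one surfaced for the next decision. Feeding recalled outcomes back into the prompt is what lets past experience shape future behavior.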
One more note: we built Nemo separately from the world. Nemo perceives, interprets, and learns from the world around him in real time, just as we do as players. This is distinct from the traditional approach to NPCs, which are built as ‘part of the world’. Nemo’s architecture “frees” him from his environment and lets him travel with players across new experiences, opening up opportunities for myriad first-party and player-created adventures in the future.
Implications and the future
The simulation of life and companionships within games has important implications. Commercially, it has led to some of the most enduring and profitable franchises, like The Sims. For players, these companions have the capacity to deepen engagement within games. Beyond gaming, these pursuits also represent a deeper approximation of human relationships and experiences.
To be clear, there are still many challenges and unsolved elements, and many pieces of the puzzle are not yet built. At the same time, the pace of technical innovation has been breathtaking: within weeks of the release of Meta’s open-source foundation model, researchers had trained lightweight, application-specific models that perform at the highest levels.
Frontier models and technology are only part of the answer. To create truly emergent and immersive experiences, game makers need to marry innovative technology with deep artistry. At Proxima, we’re excited to push those frontiers in building the next generation of interactive experiences. We’re still early in that journey, and there’s a lot more we’re aiming to build. We believe it’s better to learn together than alone. So if you’re also researching or building in this space, we’d love to hear from you. Please reach out!