From the age of 6 months to 25 months, Sam wore a helmet fitted with sensors and a camera. It was not always recording: 61 hours of the boy's life were captured, in which he was exposed to 250,000 words. This is how Sam ended up helping to teach an artificial intelligence to speak.
Babies learn to speak at an astonishing speed. Before they are one year old they say their first word, and by three they can already get by in everyday life in their mother tongue. They are the envy of many adults who want to learn a new language and take much longer to reach that level, if they ever do. And they are also a role model for artificial intelligence, which needs far more data to learn a language. That is why a team of researchers from New York University put its algorithms in the shoes of a baby, Sam, to see what they were capable of learning. Not literally, of course: they did it by showing the system videos recorded from the child's perspective, using a helmet with a camera.
The results of the study show that the system learns words by relating them to the things the child sees and hears in daily life. It is a step toward building artificial intelligences that learn more efficiently, in a way closer to how people do.
Natural languages (such as Spanish or English) are forms of communication that people develop spontaneously. This differentiates them from artificial languages, such as programming languages or mathematical notation, which are created deliberately for a specific purpose.
Normally, in artificial languages everything has a single meaning that leaves no room for discussion: if we write 1 + 1 = 2, there is no doubt about what we mean. This is not the case in natural languages: if we say "see you at the bank," do we mean the bank of the river or the place where we keep our money? Ambiguity makes natural languages especially difficult for machines. It is also why jokes, poetry and sarcasm give computers problems.
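The contrast can be made concrete in a few lines of Python (the word senses listed here are invented for illustration):

```python
# An artificial language leaves no room for interpretation:
print(1 + 1 == 2)  # True, with exactly one possible meaning

# A natural language does not: one word can carry several senses,
# and a program cannot pick the right one without context.
senses = {"bank": ["the edge of a river", "a place that keeps money"]}
print(senses["bank"])  # both senses remain until context disambiguates
```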
Interest in getting computers to work with human language arose as early as the 1950s (for example, the Georgetown experiment, which translated between Russian and English, of great strategic interest during the Cold War).
To achieve this, linguists and computer scientists described the structure of the language by writing syntactic rules, based on Chomsky’s theories. For example, a rule might say: a sentence is made up of a subject (which comes first) and a predicate (which comes after). But thousands of rules could be needed.
They were very limited systems: they did not resolve ambiguity well because they did not take context into account.
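To give an idea of what those systems looked like, here is a minimal sketch in Python, with an invented lexicon and a single hand-written rule; the real systems of the era encoded thousands of rules, but the spirit is the same:

```python
# A hand-written lexicon assigning each word its grammatical category.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "Noun", "ball": "Noun",
    "sees": "Verb", "chases": "Verb",
}

def is_sentence(words):
    """One hand-written rule: a sentence is a subject (Det Noun)
    followed by a predicate (Verb Det Noun)."""
    tags = [LEXICON.get(w) for w in words]
    return tags == ["Det", "Noun", "Verb", "Det", "Noun"]

print(is_sentence("the dog chases a ball".split()))  # True: matches the rule
print(is_sentence("ball the chases".split()))        # False: no rule covers it
```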
A major breakthrough came in the 1980s with machine learning algorithms. These are algorithms that learn from examples: to translate between English and Russian, we give them thousands of English texts together with their Russian translations. From there, they detect patterns and teach themselves to translate new texts. This makes development easier (it is easier to collect examples than to write a grammar) and improves results, because the algorithms can take context into account. But one limitation remains: each algorithm built this way is only good for one thing. A translation system, for example, only translates; it cannot summarize texts or answer questions.
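As an illustration of what "detecting patterns from examples" can mean, here is a toy Python sketch that builds a crude word-for-word dictionary from a handful of invented sentence pairs (Spanish instead of Russian, for readability). Real statistical translation systems were far more elaborate, but they grew out of the same idea:

```python
from collections import Counter

# Invented parallel examples: each English sentence with its translation.
pairs = [
    ("the dog runs",   "el perro corre"),
    ("the cat runs",   "el gato corre"),
    ("the dog sleeps", "el perro duerme"),
    ("the cat sleeps", "el gato duerme"),
]

cooc = Counter()   # how often an (English, Spanish) word pair appears together
total = Counter()  # how often each Spanish word appears at all
for en, es in pairs:
    for s in es.split():
        total[s] += 1
    for e in set(en.split()):
        for s in set(es.split()):
            cooc[(e, s)] += 1

def translate_word(e):
    # Prefer words that appear mostly when `e` does: dividing by overall
    # frequency pushes down ubiquitous words such as "el".
    scores = {s: cooc[(e, s)] / total[s] for s in total}
    return max(scores, key=scores.get)

print(translate_word("dog"))  # -> "perro"
print(translate_word("cat"))  # -> "gato"
```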
The next big leap came in the late 2010s, when large language models, the basis of ChatGPT, emerged. These are systems that learn to predict which word is most likely to come next. For example, from "the United States of," a language model could predict "America." If we then ask it to add another word, and another, it can generate a coherent text. To achieve this, it is enough to show them vast amounts of text, for example downloaded from the Internet.
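The idea of next-word prediction can be shown in miniature. The sketch below counts, over a tiny invented corpus, which word follows which (a "bigram" model) and then generates text by always picking the most frequent continuation; models like ChatGPT do this with neural networks trained on billions of words rather than raw counts, but the prediction task is the same:

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for "lots of text from the Internet".
corpus = ("the united states of america . "
          "the president of the united states of america .")

# For each word, count which word follows it and how often.
follows = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # The most frequent continuation seen in the corpus.
    return follows[word].most_common(1)[0][0]

# Generate text by repeatedly predicting the next word.
word = "united"
for _ in range(3):
    print(word, end=" ")
    word = predict_next(word)
print(word)  # prints: united states of america
```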
No one really knows how they work, and, in fact, there is debate about whether these systems really understand anything. Some scientists argue that they act like simple parrots, imitating human language without understanding a word. Others say that, despite relying on statistics about the text they have seen, they are able to understand its meaning.
These large models are no longer limited to a single task, but they bring new problems. Training a model like the latest ChatGPT requires billions of words, an enormous amount of text. This demands computers with great power and memory, something only large technology companies can afford. And on top of that, training consumes a great deal of energy and generates emissions.
This brings us back to Sam. Children hear only a few tens of millions of words in their first three years of life, far fewer than ChatGPT sees. That is enough for them to get by in their language.
Why do AI systems need so much more data? One of the keys is that babies can associate words with objects and experiences. By pointing at a ball while saying "ball," we help them learn what the word means. Systems like ChatGPT do not have that help: they make do with raw text.
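In the study, the system was a neural network trained on video frames paired with transcribed speech; the following Python sketch is only a drastically simplified, hand-made version of the underlying intuition, with invented "moments" from a child's day: a word tends to mean whatever is reliably in view when it is heard.

```python
from collections import Counter

# Invented moments: what is in view and what is heard at the same time.
moments = [
    ({"ball", "floor"}, "look at the ball"),
    ({"ball", "dad"},   "throw the ball"),
    ({"cat", "floor"},  "the cat is here"),
    ({"cat", "sofa"},   "pet the cat"),
    ({"dad", "sofa"},   "dad sits on the sofa"),
]

seen = Counter()        # how often each object is in view
heard_with = Counter()  # how often a (word, object) pair co-occurs
for objects, utterance in moments:
    for obj in objects:
        seen[obj] += 1
    for w in set(utterance.split()):
        for obj in objects:
            heard_with[(w, obj)] += 1

def meaning(word):
    # The object that is most reliably in view whenever the word is heard.
    scores = {o: heard_with[(word, o)] / seen[o] for o in seen}
    return max(scores, key=scores.get)

print(meaning("ball"))  # -> "ball": present every time the word is heard
print(meaning("cat"))   # -> "cat"
```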
Hence the relevance of the experiment with Sam. Can an AI learn language the way babies do? The results are promising and could lead, in the future, to systems that require far less data and energy, and produce far fewer emissions, than current ones.
This article was originally published on The Conversation. Carlos Gómez Rodríguez is Professor of Computer Science and Artificial Intelligence at the Universidade da Coruña.