6th March 2024

AI learns its first words (and helps explain how humans acquire language)

How do we learn to associate specific objects with specific words? A team from New York University have developed an AI ‘baby’ to help us answer this question.

Provided that they are given enough care and attention from their guardians, children can slowly learn to process, use and communicate with language – this is an obvious fact that is not up for debate. Exactly how they are capable of this is a question that has long puzzled both scientists and linguists.

The theory of language

The influential theory of ‘Universal Grammar’ suggests that human brains have an intrinsic language acquisition framework which we use to learn a language. Essentially, our brains come pre-programmed to understand grammatical structures and rules shared across all human languages, and our linguistic skills develop by drawing on this innate knowledge. Other theories instead emphasise simple associations between words and objects, combined with infants’ emerging ability to apply reasoning and logic as their brains develop. While researchers have presented evidence for these ideas and others, it remains difficult to determine their relative importance.

AI babies

A team from New York University have developed a novel strategy to test the power of inference from simple associative learning only, using a machine learning artificial intelligence (AI) model. Unlike other models, such as ChatGPT, which are trained on enormous, complex datasets, the Child’s View for Contrastive Learning (CVCL) model was provided with something a bit more rudimentary – the sights and sounds experienced by a single baby.

Sam, from Adelaide, Australia, was fitted with a helmet-mounted video camera for approximately 1% of his waking hours between the ages of six months and two years, accounting for a total of 61 hours of recording.

This footage, with all spoken words transcribed, was then fed into the CVCL model, which consisted of two neural networks – machine learning models loosely modelled on the human brain – a visual encoder and a language encoder. After the data was processed, the model’s understanding of language was tested by matching a word, such as ‘ball’, to one of a set of four images. In other words, the AI ‘baby’ learns just as Sam, the human baby, does: by associating what it sees with what it hears.
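The matching step can be sketched in a few lines of Python. This is not the authors’ code: the two encoders here are faked as noisy lookup tables into a shared embedding space (an assumption made purely for illustration), whereas in CVCL they are trained neural networks. The point is only to show how a word is matched to one of four candidate images by comparing embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two encoders. Each maps its input into a shared
# 8-dimensional embedding space; matched word/image pairs land close together.
EMBED_DIM = 8
concepts = ["ball", "car", "cat", "crib"]
shared = {c: rng.normal(size=EMBED_DIM) for c in concepts}

def image_encoder(label):
    # Pretend the visual encoder maps a photo of `label` near its concept vector.
    return shared[label] + 0.1 * rng.normal(size=EMBED_DIM)

def language_encoder(word):
    # Pretend the language encoder maps the spoken word near the same vector.
    return shared[word] + 0.1 * rng.normal(size=EMBED_DIM)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def match_word_to_image(word, candidate_labels):
    """Pick whichever of the four candidate images best matches the word."""
    w = language_encoder(word)
    sims = [cosine(w, image_encoder(lbl)) for lbl in candidate_labels]
    return candidate_labels[int(np.argmax(sims))]

print(match_word_to_image("ball", ["ball", "car", "cat", "crib"]))  # → ball
```

In the real model the shared space is learned contrastively from Sam’s footage, so that frames and the words spoken over them end up with similar embeddings.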

When the images presented were taken directly from the training footage, the model had a 62% success rate, while when the images came from external sources, the model was able to correctly identify the subject 35% of the time. In both cases, this was significantly higher than the 25% success rate expected by chance alone.
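The 25% baseline is simply what a guesser that knows nothing about language would score when picking one of four images at random, as a quick simulation shows:

```python
import random

random.seed(42)

# A clueless guesser: pick one of the four candidate images at random.
# By convention here, index 0 is the correct image on every trial.
trials = 100_000
correct = sum(random.randrange(4) == 0 for _ in range(trials))
print(round(correct / trials, 2))  # ≈ 0.25, i.e. the 25% chance baseline
```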

What does this mean?

The model suffered from several limitations: by relying on transcribed words, it could not pick up on informative differences in intonation, and, unlike a baby, it was incapable of actively learning by interacting with its surroundings. Nevertheless, drawing from this extremely limited input, it was still able to learn some basic words – a feat that those who believe language processing requires specialised cognitive abilities in humans may have considered impossible. This relatively simplistic model also leaves ample room for enhancements to make it more representative of a baby’s true experience, after which it would likely be able to amass a much larger vocabulary.

The model’s successes hint at an alternative way to train AI involving slower, more naturalistic data acquisition, perhaps eventually leading to a more ‘human’ form of intelligence. This could address a gripe some have with current machine learning models: they must train on huge datasets simply to regurgitate what their human teachers have already said, whereas human brains are capable of creativity and inference from relatively little input, in a way that still escapes modern AI.
