6th March 2024

AI learns its first words (and helps explain how humans acquire language)

How do we learn to associate specific objects with specific words? A team from New York University have developed an AI ‘baby’ to help us answer this question.

Categories: Science & Tech

Provided that they are given enough care and attention from their guardians, children can slowly learn to process, use and communicate with language – this is an obvious fact that is not up for debate. Exactly how they are capable of this is a question that has long puzzled both scientists and linguists.

The theory of language

The influential theory of ‘Universal Grammar’ suggests that human brains have an intrinsic language acquisition framework which we use to learn a language. Essentially, our brains are pre-programmed to understand grammatical structures and rules shared across all human languages, and our linguistic skills develop by drawing on this innate knowledge. Other theories emphasise the importance of simple associations between words and objects, combined with infants’ emerging ability to apply reasoning and logic as their brains develop. While researchers have presented evidence for these ideas and others, it remains difficult to determine their relative importance.

AI babies

A team from New York University have developed a novel strategy to test how much can be inferred from simple associative learning alone, using a machine learning artificial intelligence (AI) model. Unlike other models, such as ChatGPT, which are trained on enormous, complex datasets, the Child’s View for Contrastive Learning (CVCL) model was provided with something a bit more rudimentary – the sights and sounds experienced by a single baby.

Sam, from Adelaide, Australia, wore a helmet-mounted video camera for approximately 1% of his waking hours between the ages of six months and two years, producing a total of 61 hours of recording.

This footage, with all spoken words transcribed, was then fed into the CVCL model, which consisted of two neural networks (machine learning models loosely inspired by the human brain): a visual encoder and a language encoder. After training, the model’s understanding of language was tested by asking it to match a category, such as ‘ball’, to one of a set of four images. In other words, the AI ‘baby’ learns just as Sam, the human baby, does: by associating what it sees with what it hears.

When the images were taken directly from the training footage, the model had a 62% success rate; when the images came from external sources, it correctly identified the subject 35% of the time. In both cases, this was a significantly higher score than the 25% expected by chance.
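For readers who like to see the mechanics, the four-image test can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: it assumes that each word and each image has already been turned into a list of numbers (an "embedding") by the two encoders, and that the model picks the image whose embedding is most similar to the word's, measured here by cosine similarity. The function names and the toy numbers are made up for the example.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_image(word_vec, image_vecs):
    """Return the index of the candidate image most similar to the word."""
    scores = [cosine(word_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(trials):
    """trials: list of (word_vec, candidate_image_vecs, index_of_correct_image)."""
    hits = sum(pick_image(w, imgs) == target for w, imgs, target in trials)
    return hits / len(trials)

# Toy, hand-made embeddings (hypothetical values, not taken from the study).
ball = [1.0, 0.1, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # say, a cat
    [0.9, 0.2, 0.1],  # say, a ball (the correct answer)
    [0.0, 0.0, 1.0],  # say, a car
    [0.1, 0.1, 0.9],  # say, a chair
]
print(pick_image(ball, candidates))  # prints 1, the ball image

# A "model" that guesses at random among four images is right about a
# quarter of the time, which is the chance baseline quoted above.
rng = random.Random(0)
n = 10_000
chance = sum(rng.randrange(4) == 0 for _ in range(n)) / n
print(round(chance, 2))
```

Running many such trials and averaging the hits gives a success rate that can be compared with the one-in-four figure a random guesser would achieve.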

What does this mean?

The model suffered from several limitations: by relying on transcribed words, it could not pick up on informative differences in intonation and, unlike a baby, it was incapable of actively learning by interacting with its surroundings. Nevertheless, drawing on this extremely limited input, it was still able to learn some basic words. This is a feat that those who believe language processing requires specialised cognitive abilities in humans may have considered impossible. This relatively simplistic model also leaves ample room for enhancements to make it more representative of a baby’s true experience, after which it would likely be able to amass a much larger vocabulary.

The model’s successes hint at an alternative way to train AI involving slower data acquisition, perhaps eventually leading to a more naturalistic and ‘human’ form of intelligence. This could address a gripe some have with current machine learning models, which often have to train on huge datasets simply to regurgitate what their human teachers have already said. Human brains, by contrast, are capable of creativity and inference from relatively little input, in a way which still escapes modern AI.

Eddie Fyles
