Rouse Award: Natalie Ronco’s research into gender bias in machine translation

Natalie Ronco (Upper Sixth) investigated the extent of gender bias in machine translation for her Rouse Award-winning project.

Focusing on Spanish, she delved into the world of artificial intelligence (AI) and machine learning to measure how far translated words are skewed towards a particular gender by traditional stereotypes.

“A lot of languages have a gender structure where you have masculine and feminine forms of words and sometimes when you select a word, you have to make a decision which it should be,” said Natalie, who was national runner-up in both Spanish and Mandarin in the prestigious Anthea Bell Prize for Young Translators in 2022.

“If you put a word with no context into something like Google Translate, it often defaults to the stereotype, so for example, with ‘nurse’ it just guesses it’s feminine, while other jobs might be masculine.

“They’ve tried to change it recently so if you put a word in by itself, it shows you both the masculine and feminine versions, but if you put a word in as part of a longer sentence, such as ‘the nurse is happy and the doctor is happy’, it defaults to the stereotype.”

Natalie said her interest in the topic was piqued while taking part in the inaugural Atlas Fellowship – a ground-breaking education programme in California’s Silicon Valley – last year, which included a deep dive into AI and the risks behind it.

She was particularly inspired by Brian Christian’s book The Alignment Problem, which considers the challenges of aligning AI with human values, and especially by its look at the language-processing algorithm Word2vec.

On Word2vec, Natalie explained: “Imagine every word is represented by a vector, or number, which tells you where it is within a 3D space. The position of each word within that space can tell you about how similar words are, so for example, ‘rain’ and ‘water’, or ‘rain’ and ‘sun’ because they’re both related to weather.

“This idea of representing words as numbers is used a lot in translation because the machine learning takes the number and gives you an answer.

“In The Alignment Problem, it took one of those data sets and added and subtracted words from each other, such as ‘Paris – France + England = London’, but it also took ‘computer programmer – man + woman’ and the closest answer was ‘home-maker’!”
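For readers curious how this word arithmetic works in practice, here is a minimal Python sketch. It is not Natalie's code or the Word2vec data set: the three-dimensional vectors are hypothetical, hand-picked values chosen only so the sums work out, and the function names are made up for this illustration. Real word vectors are learned from large amounts of text and usually have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only (an assumption,
# not real word2vec output). Think of the three axes as roughly
# "French-ness", "English-ness" and "capital-city-ness".
TOY_VECTORS = {
    "paris": (1.0, 0.0, 1.0),
    "france": (1.0, 0.0, 0.0),
    "london": (0.0, 1.0, 1.0),
    "england": (0.0, 1.0, 0.0),
    "city": (0.2, 0.2, 1.0),
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Find the word closest to vec(a) - vec(b) + vec(c), skipping the inputs."""
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))


if __name__ == "__main__":
    # "Paris - France + England" lands nearest to "london" in this toy space.
    print(analogy("paris", "france", "england"))
    # Similar words point in similar directions, so their cosine is higher.
    print(cosine_similarity(TOY_VECTORS["paris"], TOY_VECTORS["london"]))
    print(cosine_similarity(TOY_VECTORS["paris"], TOY_VECTORS["england"]))
```

The same subtraction and addition, run on vectors learned from real text, is what can surface stereotyped associations like the one Natalie describes, because the vectors reflect patterns in the text they were trained on.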

As part of her research, Natalie set herself the ambitious task of taking several books that had been translated from English to Spanish by a human and comparing them with how they appeared after being fed through Google Translate.

She said: “I generated the vector for both sets, but ultimately I didn’t manage to conclude a difference between them because I didn’t have a massive data set, although I could still see a difference.

“I’m very happy to have got the Rouse Award though, and I really enjoyed being interviewed by the Rouse panel. I had a very good chat with one of the panellists afterwards who suggested I could have made my results more accurate by changing the number of dimensions of my vector.”
