Decoding Alien Language with Project CETI and Modu
Feb 20, 2025
Aliens are here! How Project CETI and Modu have worked together to uncover a new language from under the sea
Aliens [Google: a hypothetical or fictional being from another world.] They seem friendly enough and show no signs of wanting to eat us. We'd like to make contact. They might be super intelligent, but we have no way of knowing because we can't talk to them.
They don't speak English, or Mandarin, or Spanish. Their "language" appears to be a rapid series of clicks. We can't use our hands to sign either because the gestures they make are nothing like ours. It's unlikely they'll understand the mimed pen-to-paper international sign for "Can I get the bill, please?"
Everything about them is, well, alien.
Hollywood is no help if we can't talk to them ("Paul": 2011), sign at them ("The Abyss": 1989), play music to them ("Close Encounters of the Third Kind": 1977) or rely on the universal properties of mathematics ("Contact": 1997). To some extent, though, we are Louise Banks ("Arrival": 2016), having to rethink what language and intelligence even are.
We're not assuming these aliens have an English-like grammar with verbs, nouns, and adjectives because there's plenty of linguistic diversity on Earth. Take Pirahã from the Amazon Rainforest, which has only 11 phonemes, or distinct speech sounds (English has 44), but makes up for this lack of phonemic diversity by relying heavily on tone and stress. Or there's the Navajo language, famously used by the Marines in WWII to encode secret messages because of how hard it is to map to English.
It's still a very hard problem to solve though because, as open-minded as we are, we have no linguistic frame of reference to work with.
Oh, and by the way… this has happened. Those aliens are here.
Right now, very smart humans are trying to understand what sperm whales are saying to each other, and these cetaceans are every bit as alien as the Heptapods in "Arrival". Their brains are five times bigger than ours. The human brain cortex (part of which is implicated in controlling language and speech) is about the size of a napkin; theirs is bigger than a newspaper. And they've had these brains for tens of millions of years longer than humans have.
Their clicks can be as loud as 230 decibels (a gunshot is about 180 decibels), which is enough, even in the dense environment of seawater, to cause tissue damage to humans. They can hold their breath for an hour and a half. They can dive to 2,000 metres (1.2 miles), which is about twice the depth that the most specialist military submarines are capable of (and four times deeper than a normal sub). And when they sleep, only one half of their brain has a nap at a time (a phenomenon known as unihemispheric sleep).
Alien, but so much like us…

(The image above may look like a hand, but it's a sperm whale's pectoral fin.) They have complex social structures, exhibit strong signs of emotional intelligence, teach their young survival skills, play, pass on knowledge, traditions and cultural traits, and mourn their dead.
These smart humans are part of Project CETI, named appropriately in homage to the SETI Institute, the organisation looking for actual heptapods in the sky.
Project CETI is a collaborative community of experts in various fields working on what could be one of the most important scientific advancements in decades. Not just the ability to make sense of what sperm whales are saying, but the mechanics of a model to make sense of communication systems for which we have no key, no Rosetta Stone.
Perhaps unsurprisingly, underlying this work is the clever application of Artificial Intelligence, specifically a class of machine learning known as Unsupervised Machine Translation, which attacks the problem from some interesting angles.
Whales communicate in a series of click patterns called codas (different from the echolocation clicks used to find their way around). We can't say codas equate to words, but they do have structure. And fascinatingly, they appear to have cultural, individual and regional variations, just as human language does. One coda, called the "five regular" (five evenly-spaced clicks) is used by whales all over the world, though each whale pronounces it differently. What does it signify? Is it the whale word for "whale" (surely it is: it's got five letters)? Is it the whale word for "the"? Is it "hello"? Or is it a filler like the English word "Ummm"?
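To make the idea of a coda's structure concrete, here is a minimal Python sketch that treats a coda as a list of click times and checks whether the gaps between clicks are roughly even, as in the "five regular". The timings, the function names and the 10% tolerance are all made up for illustration; this is not how Project CETI classifies codas.

```python
from statistics import mean, pstdev


def inter_click_intervals(click_times):
    """Return the gaps (in seconds) between consecutive clicks."""
    return [later - earlier for earlier, later in zip(click_times, click_times[1:])]


def is_regular(click_times, tolerance=0.1):
    """True if the gaps are roughly even.

    Hypothetical criterion: the coefficient of variation of the gaps
    (standard deviation divided by mean) is at most `tolerance`.
    """
    gaps = inter_click_intervals(click_times)
    if not gaps:
        return False
    return pstdev(gaps) / mean(gaps) <= tolerance


# Made-up click times in seconds, not real recordings.
five_regular = [0.0, 0.2, 0.4, 0.6, 0.8]
uneven = [0.0, 0.1, 0.2, 0.6, 0.8]

print(is_regular(five_regular))
print(is_regular(uneven))
```

Describing a coda by its gaps rather than its absolute click times means the same rhythm looks the same wherever it starts in a recording.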
Well, step one is to collect a lot of recordings of sperm whales talking to each other and look for patterns in the data. The resulting models show huge promise. A paper from five years ago showed 99.4% accuracy in identifying individual whales, something critical to the research because 'who is talking' and 'who is responding' are clearly fundamental to understanding what might be being communicated.
With more and more data, clustering, classification, feature extraction and language prediction all become possible. Large Language Models know that the phrase "the cat sat on the mat" is far more likely in English than "the cat sat on the marine biologist" even though both are grammatically correct. To bring meaning to this emerging understanding, it's critical to have access to contemporaneous observational data. Take a simplified example: if an orca is seen circling a cluster of sperm whales (orcas, or killer whales, are a natural predator of sperm whales) then patterns in the recorded codas might relate to the situation. They might indicate "danger" or even a sequence to specify "orca." Vervet monkeys have sounds that correlate to "snake," "leopard" and "eagle." Or, more accurately, the sounds indicate where a threat might be coming from (grass, plain, sky), because the "eagle" cry has been repurposed for "drone."
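The cat-and-mat comparison above can be illustrated with something far simpler than a Large Language Model: a bigram model that counts which word follows which. The sketch below is only a toy, assuming a four-sentence corpus invented for this example and add-one smoothing so unseen word pairs don't score zero. The same counting idea could in principle be applied to sequences of codas instead of words.

```python
import math
from collections import Counter

# Invented toy corpus; a real model would be trained on vastly more text.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat slept on the sofa",
    "the marine biologist sat on the boat",
]


def tokens(sentence):
    """Split a sentence into words, with start and end markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


context_counts = Counter()  # how often each word appears as the "previous" word
bigram_counts = Counter()   # how often each (previous, next) pair appears
for line in corpus:
    toks = tokens(line)
    context_counts.update(toks[:-1])
    bigram_counts.update(zip(toks, toks[1:]))

vocab = {word for line in corpus for word in tokens(line)}


def log_prob(sentence):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    toks = tokens(sentence)
    total = 0.0
    for prev, word in zip(toks, toks[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


likely = log_prob("the cat sat on the mat")
unlikely = log_prob("the cat sat on the marine biologist")
print(likely > unlikely)
```

Both sentences get a non-zero score, but the one ending in "mat" scores higher because the corpus has seen that ending before.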
It was an especially proud moment to be able to announce that Modu has a small team working on a tablet app to enable CETI marine biologists to collect observational metadata that can be joined up with hydrophone recordings, digital imagery and data from tags and drones to create a universal timeline of enriched information about whale behaviour for the machine learning efforts. In the long term, CETI aims to make this information widely available to anyone who wants to contribute to the work.
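As a rough picture of what "joining up" observations with recordings on a shared timeline could mean, here is a small Python sketch that matches field notes to a hydrophone recording by timestamp. Every name in it (Recording, Observation, observations_during, the 30-second slack) is hypothetical and chosen for illustration; it does not describe the actual app or CETI's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Recording:
    """A hydrophone recording with a start and end time (hypothetical schema)."""
    hydrophone_id: str
    start: datetime
    end: datetime


@dataclass
class Observation:
    """A timestamped field note from an observer (hypothetical schema)."""
    time: datetime
    note: str


def observations_during(recording, observations, slack=timedelta(seconds=30)):
    """Return observations inside the recording window, widened by `slack` on both sides."""
    earliest = recording.start - slack
    latest = recording.end + slack
    return [obs for obs in observations if earliest <= obs.time <= latest]


# Made-up example data.
recording = Recording("H1", datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 5))
notes = [
    Observation(datetime(2025, 1, 1, 9, 50), "pod at surface"),
    Observation(datetime(2025, 1, 1, 10, 2), "orca sighted"),
    Observation(datetime(2025, 1, 1, 10, 5, 20), "pod dives"),
]

matched = observations_during(recording, notes)
print([obs.note for obs in matched])
```

The slack window is there because an observer's note will rarely be typed at the exact second something happens; how wide it should be is a judgment call.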
So will we ever converse with sperm whales?
What about the works of Herman Melville? Well, probably not. Whatever their language looks like, it's unlikely we'll have much in common culturally. But there are three aspects of this work that, to me, make it as significant as it is fascinating:
20,000 whales are killed every year in collisions with ships. Whilst it's been proven that reducing the speed of ships can reduce the fatality rate, it's a hard thing to police. Marine exclusion zones would be better, but they'd run headlong into protracted political and commercial arguments. It's not inconceivable that Project CETI could discover a way to warn whales of approaching ships by broadcasting sounds to get them to leave the area.
Building unsupervised learning models backed by observational data is the next frontier in AI, which is itself inspired by the "world model" developed in animals. Plus, what we learn about the sperm whale's ability to concisely communicate information may have implications for Information Theory. Humans, regardless of their language, communicate information at roughly the same rate (it's why a native Italian speaker talks faster than an English speaker), and what we learn about the bit rate of whales will almost certainly open up new ways to think about language. Useful if real aliens ever decide to stop by.
And finally, Project CETI represents the best of humanity and science in one initiative. A combination of awe and passion and intellectual toil. Its success depends on experts from disparate fields coming together (a language translation challenge in itself) to collaborate on a problem. But it also brings to life the otherwise dry methodical application of mathematics to a new problem space by having it wholly depend upon people who love these animals, out on the ocean, patiently following, watching, tagging, recording and, ultimately, protecting them.