Emotional intelligence just got a new meaning
(Illustration: Saurabh Singh)
THE YOUNG GENERATION of today uses generative AI (GAI) tools obsessively for learning, and sometimes for cheating. GAI hinges on crafting prompts that coax the best outcomes from a model, which is why job functions and courses have sprung up to help people master prompting. Those skills apply across GAI platforms such as OpenAI’s ChatGPT, which turns prompts into essays and even dissertations, as well as OpenAI’s DALL·E images and Sora videos. One is literally spoilt for choice: there are endless other such AI—specifically GAI—tools from AI behemoths as well as startups.
As with that other widely discussed concept, artificial general intelligence (AGI), scientists and entrepreneurs are a divided lot. Some say we aren’t there yet, while the likes of Elon Musk offer dire predictions of the end of humanity once such advanced AI tools outshine humans; Musk, for his part, says we are only a few years away, and puts the year at 2029. OpenAI CEO Sam Altman, too, agrees that AGI could be developed in the “reasonably close-ish future”, though he is far less pessimistic than Musk. A few other AI experts argue that traits of AGI are already present in today’s tools. AGI itself is a futuristic, next-generation idea: AI that possesses human-like capabilities, unlike current AI systems, which, as IBM explains, simply “combine computer science and robust datasets to enable problem-solving.” Simply put, AGI, whether one likes it or not, is a term for AI that can match or vastly outperform humans.
The plot thickens with Hume AI, a startup led by Alan Cowen, pitching its flagship product, the Empathic Voice Interface (EVI), now in beta, as “the first AI with emotional intelligence”. The company states that the tool is the outcome of more than 10 years of research and that it can capture “hundreds of dimensions of human expression in audio, video, and images”. If so, the experts who say that traits of AGI are already present in some AI tools may have a point, even if those tools haven’t reached their full potential yet. Whatever the truth may be, EVI is sure to startle you with its elaborate and meticulous scanning of the tones, rhythms, and other qualities of your voice, as this writer discovered.
If you tell EVI “I am crying” in a jocular way, it detects a hint of joy and amusement in your tone. But if you disclose genuine feelings, it distinguishes empathic pain, sadness, confusion, interest, and sympathy in your tone and timbre, or what they call “prosody”, a term denoting patterns of sound and rhythm that most of us have encountered only in poetry. Interestingly, the startup is named after the Scottish philosopher David Hume.
EVI also responds the way counsellors do, with caution and empathy, offering to be a deep listener and encouraging you to speak and share more so that you can give vent to your pent-up emotions. Hume AI states that the language model used in this tool was trained “on human intensity ratings of large-scale, experimentally controlled emotional expression data” using methods described in two research papers. One is titled ‘Deep learning reveals what vocal bursts express in different cultures’ and the other, ‘Deep learning reveals what facial expressions mean to people in different cultures’. Both are co-authored by Cowen and collaborators.
EVI, Hume AI’s website says, can map vocal bursts across cultures, as well as what certain facial expressions mean to people from different populations. You can ask EVI to explain how you feel in case you aren’t sure of it yourself, and the company claims it can help improve your life by chatting with you and possibly lifting your spirits. The benefits, the startup promises, are greater still if you connect your webcam and let EVI gauge your facial expressions, drawing on its training in advanced studies and papers in psychology and communications. “Our API is based on our own empathic LLM [eLLM] and can blend in responses from an external LLM API. The demo incorporates Claude 3 Haiku,” it adds. Claude 3 Haiku is AI company Anthropic’s fastest, most compact model, built for “near-instant responsiveness”. An API, or Application Programming Interface, is a set of rules that lets two pieces of software talk to each other. EVI is also designed to eliminate overlaps and awkwardness in conversation: unlike ChatGPT and other GAI tools, it goes silent and listens when you interrupt it. The company avers that EVI therefore behaves the way empathetic humans do.
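To make that architecture concrete, here is a minimal sketch of what a client of such an empathic voice API might look like. It is illustrative only: the WebSocket endpoint, message shapes, and field names below are hypothetical inventions, not Hume AI’s actual SDK or protocol.

```python
# Illustrative sketch only: the endpoint, message shapes, and field names
# are hypothetical, meant to show roughly what a client of an empathic
# voice API could look like. This is not Hume AI's actual SDK or protocol.
import asyncio
import json

import websockets  # pip install websockets


async def chat(audio_chunk: bytes) -> None:
    # Hypothetical WebSocket endpoint; a real service would require an
    # API key and its own documented message format.
    uri = "wss://api.example.com/v0/evi/chat"
    async with websockets.connect(uri) as ws:
        # Send a chunk of microphone audio for analysis.
        await ws.send(audio_chunk)
        # Receive a reply: the assistant's text plus emotion scores
        # inferred from prosody (tone, rhythm, timbre).
        reply = json.loads(await ws.recv())
        print("Assistant:", reply["text"])
        # Surface the strongest prosody signals, e.g. {"joy": 0.62, ...}
        top = sorted(reply["prosody"].items(), key=lambda kv: -kv[1])[:3]
        print("Top expressions:", top)


# Example invocation (assumes a short WAV recording on disk):
# asyncio.run(chat(open("hello.wav", "rb").read()))
```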
Hume AI recently said it raised $50 million in financing from EQT Ventures, Union Square Ventures, Nat Friedman & Daniel Gross, Metaplanet, Northwell Holdings, Comcast Ventures, and LG Technology Ventures. Its CEO Cowen is a former researcher at Google DeepMind.
On its website, Hume AI says its models measure 53 expressions identified “through the subtleties of emotional language and 48 expressions discerned from facial cues, vocal bursts, and speech prosody”. As with facial expressions, the website says its tools can measure even subtle changes and movements “often seen as expressing love or admiration, awe, disappointment, or cringes of empathic pain, along 48 distinct dimensions of emotional meaning”. In speech prosody, it adds, it can demystify the tone, rhythm, and timbre of speech. Among vocal bursts, it decodes not only laughs but also “sighs, huhs, hmms, cries and shrieks [to name a few],” says the company. From facial cues, vocal bursts, and speech prosody, its models measure, among the 53 expressions mentioned earlier, admiration, anger, aesthetic appreciation, contempt, contemplation, desire, doubt, ecstasy, guilt, love, shame, pain, and triumph.
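Such multidimensional readouts are easier to picture as data. The sketch below uses made-up numbers for a handful of those dimensions; the point it illustrates is that each expression receives its own independent intensity score, rather than the utterance being forced into a single label.

```python
# Made-up sample scores for a few of the expression dimensions the
# article describes; a real readout would span up to 53 dimensions.
scores = {
    "admiration": 0.05, "amusement": 0.31, "anger": 0.02,
    "confusion": 0.12, "empathic pain": 0.44, "interest": 0.27,
    "joy": 0.08, "sadness": 0.51, "sympathy": 0.38,
}

# Each dimension carries its own intensity, so one utterance can read as
# sad, empathically pained, and sympathetic all at once.
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {value:.2f}")
# sadness: 0.51
# empathic pain: 0.44
# sympathy: 0.38
```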
Social media is abuzz with appreciation for EVI from AI entrepreneurs and users alike. It has also drawn rave reviews on technology sites and in the mainstream media.
Cowen stresses that voice interfaces are inevitable. His logic is that speech is at least four times faster than typing and frees up the eyes and hands. He posted on X that besides other benefits, speech “carries more info in its tune, rhythm, & timbre. So, we built the first AI with EQ (emotional quotient) to understand the voice beyond words. It can better predict when to speak, what to say, & how to say it.”
Nikita Bier, AI influencer and entrepreneur, posted on X after trying EVI, “This feels like the moment where we cross the chasm from sentence finisher bot to something much closer to a companion. It detected urgency and sarcasm in my tests.” Guillermo Rauch, CEO of Vercel and an author, took to X to say, “Absolutely incredible. Easily one of the best AI demos I’ve seen to date. Incredible latency and capability. Seeing & hearing is believing.” EVI offers both voice responses and transcripts.
With its new chatbot and its own language model (eLLM), the New York-based Hume AI has raised the bar in AI. It claims to interpret and decode human emotions more thoroughly than any other AI tool so far. Certainly, the startup, launched in 2021, offers great scope for potential applications, including therapeutic ones.
Emotional intelligence just got a new meaning, and the debate over whether it works for or against humans is beginning to get interesting.