Date: 2019-12-26

Elvis talking to Mercedes in Mercedes Super Bowl Commercial

Sitting in his dorm room at Stanford University in 2004, SoundHound CEO and co-founder Keyvan Mohajer had a vision of a world where people could talk to the things around them and the things would talk back. A Trekkie and serial entrepreneur, he had been searching for the next big problem to solve and looked to the deck of the Starship Enterprise for inspiration.

“I was a science fiction fan and realized there were a number of cool concepts that hadn’t been developed yet,” he told me. “There was teleportation, where you could beam to any location; there was the holodeck, which could turn any room into any environment; and there was the replicator, which could make anything, like food or devices. But what stood out most was voice AI. I knew that 20 years down the road this would become our reality.”

Few investors were willing to roll the dice on a 20-year plan to power this chatty world, so he bootstrapped the company with classmates and went to market with a music discovery app that would go on to become Shazam’s biggest competitor, with over 300 million downloads. In 2005, SoundHound landed its first outside investor check from retired Google executive Aydin Senkut of Felicis, and in 2006, it closed its Series A round.

Today, SoundHound is a Series D unicorn making the Internet of Things talk. I had a chance to catch up with Mohajer on the state of voice AI, its emerging consciousness, and what it’s like to compete against titans. What follows is an edited transcript of our conversation.

You have an impressive list of partners. What is it about your platform that brands want over voice assistants from Google, Amazon and Apple?

Google, Apple and Amazon have a certain vision of the world. They want their assistants everywhere and they want people to say their name, “Hey Google,” “Hey Alexa,” “Hey Siri.”

But imagine 20 to 30 years in the future, when 10 billion people are living among 20 billion robots: some are doctors, some are lawyers, some are teachers. Should they all be called Alexa?

That’s not what brands want. Brands want customers to say their name. “Hey Mercedes.” “Hey Honda.” Our platform allows for that kind of personalization.

We’re on a mission to bring voice AI to all things – cars, kitchen appliances, smart speakers, hotel rooms, wearables, cell phones, computers – and we power some of the most popular brands in the world, including Citroen, Deutsche Telekom, Samsung | Harman, HERE Technologies, Honda, Hyundai, Kia, Mercedes-Benz, Motorola, Pandora, and Peugeot.

Is your AI able to have deeper, more meaningful conversations than Google, Alexa, and Siri?

Yes, our technology is superior. We use speech-to-meaning (not speech-to-text-to-meaning), which makes our IoT conversations faster and more contextual. We also use deep meaning understanding, which is capable of processing complex sentences of arbitrary length, with compound criteria and multiple exclusions. This is different from standard NLU (natural language understanding), which uses hard-coded “entity detection” and can only understand simple queries like “Show me sushi restaurants in San Francisco.”

People have low expectations of AI’s ability to understand complex questions, so they converse with assistants using short, simple, keyword-based queries, but it shouldn’t be that way. Computers are better at computing than humans. With our technology, users can talk to their cars as if they were people and ask multiple questions across different domains of understanding. For example, “Hey Mercedes, show me five-star sushi restaurants in San Francisco open after 9pm, but don’t include those without wifi, and please let me know if it’s raining.”

Can your AI sense mood and emotions?

We’re working on it. In order to talk to devices the way we talk to each other, there needs to be both an intelligence component and an emotional component.

How much of the movie Her will become our reality?

In the near future, there will be a lot of smart devices and they’ll be part of our daily lives. We’ll talk to our alarm clocks, our coffee machines, then our cars. At work we’ll talk to our computers and devices, and then we’ll go home and talk to our TVs. AI will be everywhere, with an emotional element, and in time, people will come to accept it as its own being.

When I first saw the movie Her, my team was in the midst of programming our platform to talk back and we were debating whether our AI should refer to itself as “I” when answering a request like “Show me some restaurants.” We wondered whether the response should be, “Here is what I found” vs. “Here are some restaurants.”

There was a deep philosophical split across the industry. Google avoided saying “I” while Apple was making Siri sound like a person.

I found the message of the film to be very powerful and it convinced me that this thing exists and deserves to refer to itself as “I” – today that’s the norm.

This conversation has been edited and condensed for clarity.

