AI Ecosystem – Voice part II

This time it should work properly.

  • OpenAI is not launching a new model this year, but it is joining everyone else in declaring that voice assistants are the next big thing, just as the industry did in 2018. This time, however, there is a good chance that 2018’s hype becomes a reality.
  • Crucially, OpenAI is also opening up GPT-4o to developers so that they can implement voice assistants in their own apps and services. This is OpenAI’s next move towards taking a lead in the developing AI ecosystem, although it needs inference at the edge to really make it work.
  • OpenAI is holding its 2nd developer conference and this year it is focusing on extending its appeal to developers, enticing them to use GPT as their foundation model as opposed to Gemini, Llama, Claude and so on.
  • Last year, OpenAI made its first move on the AI ecosystem by announcing a software development kit (SDK) to help developers create services on GPT and a store through which they can distribute and sell these services to users.
  • This is precisely what Apple and Google did in 2008 and 2012 respectively, and it served as the launching point for the smartphone ecosystem, which has become a massive industry.
  • Unsurprisingly, everyone else has followed suit and now there are kits for most of the major models as the contenders all jostle for position to become the go-to place to create generative AI services.
  • Ever since generative AI exploded into the public consciousness in 2022, I have thought that one of the superpowers of LLMs is the ability to converse with humans using natural language.
  • If one takes this ability and puts a realistic voice synthesiser in front of it, one has the makings of a man-machine interface that finally offers a decent voice experience.
  • 9 years ago, when Alexa, Siri and Google Assistant first appeared on the scene, we were promised the same level of performance, but it turned out that the AI was so poor that these assistants were not very good at anything beyond playing music, setting timers and turning the lights on and off.
  • The problem was natural language, but with Google’s transformer architecture in 2017 and OpenAI’s subsequent work on LLMs, this problem has largely been solved, and this is what OpenAI is attempting to capitalise on.
  • This is a smart move because, in the real world, LLMs have no real understanding of the tasks that they are asked to perform, and so claims of true reasoning ability and super-intelligent machines are highly dubious in my opinion.
  • OpenAI is turning to one of the true superpowers of LLMs, and by allowing developers to include the real-time voice assistant in their apps and services, it is moving to capture an area where I think there is going to be real traction.
  • Corporate dreamers and popular fiction have speculated about voice as a man-machine interface for years and even though a lot of the messaging around generative AI is little more than hype, this is an area that I think is grounded in reality.
  • There are many use cases, but I think that upgrading the hapless voice assistants that already exist, and voice in the vehicle, are the two that make the most immediate sense.
  • The only problem here is that to hold a natural conversation, only a few tens of milliseconds of latency are acceptable, which means that the model needs to run on the device from which the service is being requested and not in the cloud.
  • This is why the demos that we have seen from OpenAI and Meta so far use smartphones that are hardwired to the server where the agent resides, but this is not a viable proposition for anything in the real world.
  • This is just another reason why inference at the edge is going to be so important, and over the next few years, smartphones, vehicles and the smart home look like the segments that will move first.
  • When it comes to models at the edge, OpenAI is well behind everyone else, and so it is not inconceivable that, unless it fixes this shortcoming, it will lose the race to someone else.
  • This is just another reason (see here for others) why I think OpenAI’s current raise at a pre-money valuation of $150bn is so problematic.
  • If I were forced to invest in AI directly, I would stick with Nvidia but the adjacencies of inference at the edge (where I own Qualcomm) and nuclear power (where I own physical Uranium and miners) continue to look more attractive to me.
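For readers wondering what "opening up GPT-4o to developers" looks like in practice, OpenAI's announced Realtime API streams audio over a WebSocket. The sketch below only assembles the connection parameters; the endpoint URL, model name and beta header reflect what OpenAI announced at the time of writing and should be treated as assumptions that may change. No network call is made.

```python
# Sketch of the connection parameters a developer would use to reach
# OpenAI's real-time voice endpoint. The URL, model name and beta header
# are assumptions based on OpenAI's announcement and may change.

def realtime_connection_params(api_key: str,
                               model: str = "gpt-4o-realtime-preview") -> dict:
    """Build the WebSocket URL and headers for a real-time voice session."""
    return {
        "url": f"wss://api.openai.com/v1/realtime?model={model}",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "OpenAI-Beta": "realtime=v1",  # beta flag from the announcement
        },
    }

params = realtime_connection_params("sk-example")
print(params["url"])
```

A developer would hand these parameters to any WebSocket client and then exchange audio and event messages over the open connection, which is exactly where the latency question below starts to bite.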
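The latency argument above can be made concrete with some back-of-the-envelope arithmetic. All of the figures below are illustrative assumptions, not measurements: the point is that a cloud-hosted assistant pays a fixed network cost on every exchange before any inference even starts, and that cost alone can exceed a conversational budget of a few tens of milliseconds, while an on-device model pays none of it.

```python
# Back-of-the-envelope check of the latency argument above.
# All figures are illustrative assumptions, not measurements.

BUDGET_MS = 30  # "a few tens of milliseconds" of acceptable added delay

# Network costs a cloud-hosted assistant pays on every exchange,
# before any model inference even starts:
cloud_network_ms = {
    "uplink_audio": 20,    # stream the user's speech to the server
    "server_rtt": 60,      # mobile-network round trip to the data centre
    "downlink_audio": 20,  # stream the synthesised reply back
}

network_floor = sum(cloud_network_ms.values())
print(f"cloud network floor: {network_floor} ms vs budget: {BUDGET_MS} ms")

# An on-device model pays none of this network floor, leaving the whole
# budget for inference -- hence the case for inference at the edge.
```

Model inference time comes on top of this floor in both cases, so moving the model on-device is the only way to get the fixed network overhead out of the conversation.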

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.