Artificial Intelligence – Agent Unreliable

An unreliable agent is worse than useless.

  • Agentic AI is all the rage, but a recent product launch from OpenAI demonstrates that LLMs lack some of the essential qualities needed to be good agents, meaning that a lot more needs to be done before the hype can become a reality.
  • OpenAI has released a new service in beta called Tasks, which allows users to ask ChatGPT to do things on their behalf at a scheduled time.
  • These tasks are the sorts of things that users have been asking Siri, Google and Alexa to do for years, such as setting timers, issuing a daily reminder at a certain time and so on.
  • This sounds great but as usual, the devil is in the details.
  • Using ChatGPT to set reminders has already been demonstrated to be replete with pitfalls.
  • Tests have seen ChatGPT use the wrong time zone, change the time of the reminder at random, fail to issue the reminder at all or fail to send a push notification when it promised it would.
  • In short, this executive-assistant function is the equivalent of having the laziest, most incompetent and inefficient human imaginable working for you.
  • Consequently, the service is worse than useless, because relying on it is likely to make the user’s life worse than simply using the calendar or reminder function that every smartphone has had for years.
  • It looks to me as if OpenAI felt forced to rush this out in order to be seen to have something in this space, since agents are all anyone in AI is talking about at the moment and OpenAI cannot afford to be seen to be left behind.
  • It is also another clear sign that these systems have no causal understanding of what it is that they are doing.
  • If you ask a conventional, rule-based piece of software to set a reminder, it will do so, and as long as the software continues to function properly, it will not fail.
  • This is because rule-based software understands causality and can reason about it, which explains why hardly anyone buys an alarm clock anymore.
  • However, based on this demonstration, we are years, if not decades, away from the time when generalised LLMs like ChatGPT are capable of the required reliability.
  • Instead, what is much more likely is that smaller models, specifically trained to be the user interface of a car or a smart speaker, will end up being used.
  • With a specific use case, it is possible to work out 90%+ of all the requests that will be made and train the model to handle them (see the sketch after this list).
  • Hence, with the model trained for almost every possibility, reliability will be far better, and because the model is smaller, it will be much cheaper to implement and run.
  • Smaller models will also be able to run on the edge device, giving a large economic benefit to the owner of the model (who does not have to run it in the cloud) as well as lower latency, better security and privacy.
  • This is how developers of AI services can achieve good reliability which will be crucial for user adoption.
  • I am not yet sold on the idea of using AI agents as the main user interface on a smartphone, but the automobile is another matter entirely.
  • This is because when the user is driving, a touch-based icon grid is an inferior user experience, meaning there is a big uplift in utility if the model creator can get the voice-based experience right.
  • This is one of the superpowers of the current crop of generative AI: we are finally in a position to use voice, or natural language, as the man-machine interface.
  • Hence, I think that the future of AI agents is in small models trained specifically for a certain task as opposed to using a general model like ChatGPT.
  • I suspect that the vehicle will be the real test of agentic AI, and it is there that I am looking for a proof point that humans can at last use natural language to communicate with and issue commands to machines.
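
To illustrate the contrast, here is a minimal Python sketch of the kind of deterministic, closed-domain handling a small, purpose-trained model could sit on top of. The names and the toy regex parser are hypothetical stand-ins: in a real product, a small intent model trained on the 90%+ of expected requests would do the mapping, while anything it cannot recognise is refused rather than improvised, and the time zone is always explicit rather than guessed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
import re


@dataclass
class Reminder:
    message: str
    fire_at: datetime  # always timezone-aware


def parse_command(text: str, user_tz: str) -> Reminder | None:
    """Map a narrow, known request shape to a deterministic action.

    The regex below is a hypothetical stand-in for a small intent
    classifier trained on the closed set of requests the use case
    actually sees. Anything it does not recognise is refused rather
    than guessed at.
    """
    match = re.fullmatch(r"remind me to (.+) in (\d+) minutes", text.strip().lower())
    if match is None:
        return None  # unrecognised request: refuse, never improvise
    message, minutes = match.group(1), int(match.group(2))
    now = datetime.now(ZoneInfo(user_tz))  # explicit time zone, never inferred
    return Reminder(message=message, fire_at=now + timedelta(minutes=minutes))


if __name__ == "__main__":
    reminder = parse_command("Remind me to call the garage in 45 minutes", "Europe/London")
    if reminder is not None:
        print(f"Reminder set for {reminder.fire_at.isoformat()}: {reminder.message}")
    else:
        print("Sorry, I can't do that yet.")
```

The point is not the parser itself but the behaviour it demonstrates: the same input always produces the same reminder, in the right time zone, or a clear refusal, which is exactly the reliability the Tasks demonstration lacked.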

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.