Artificial Intelligence – Edge Debate.

Inference at the edge remains a no-brainer.

  • Now that LLMs are starting to show the limits of their performance, attention is likely to turn to use cases, an area where RFM research has long argued that the real opportunity is to run inference at the edge of the network as opposed to in the cloud.
  • Delving into the subject more deeply leads RFM research to conclude:
    • First, inference could be much bigger than training: which I expect to be amplified by eventual consolidation in the number of models available.
    • There is no doubt that a financial correction is coming, which will cause the number of foundation models in circulation to fall as the weaker players run out of money.
    • This means that more services will be fine-tuned on fewer foundation models, leading to fewer resources being deployed to create foundation models.
    • At the same time, inference will become concentrated on fewer models.
    • Resources required for inference are variable while training is a fixed, one-off cost, which is why, as services are adopted, resources for inference could easily outstrip those required for training (see the first sketch after this list).
    • This is how the opportunity for inference could be bigger than the opportunity for training (which is where all the excitement currently is).
    • Second, inference will be at the edge: wherever possible.
    • Most commentators will cite data security and privacy as the main reasons for this, but I tend to disagree as I think that economics is by far the bigger driver.
    • If the service provider runs the inference for its service in the cloud, then it bears all of the cost, but if the service runs locally, the only cost borne by the service provider is the cost of deploying the service to the edge device (see the second sketch after this list).
    • There are issues around implementing models on battery-powered devices, as well as problems with software fragmentation, but I think that the economic incentive is so strong that these problems will be overcome.
    • With such a strong incentive, service providers are likely to move their services to edge devices wherever they can, with improvements in latency, security and privacy being the icing on the cake.
    • Third, NPU fragmentation is a problem.
    • These services will be targeted at the NPU (where they will run most efficiently), but fragmentation is going to be a problem.
    • There are already many players offering NPUs which can be used to run generative AI services, but all of these NPUs are different.
    • This means that a developer will have to port its service from one NPU to another to cover the whole opportunity, which will be expensive and time-consuming.
    • Some may even decide to run the service on the CPU (less efficient), as the CPU has far more consistency and will not involve porting when moving from one manufacturer to another (see the third sketch after this list).
    • This is where the Edge AI Ecosystem will become important (see below).
    • Fourth, Edge AI Ecosystem: which is in the early stages of being built.
    • This is the offering that chipset providers will make to developers in order to make it easy to run their models on their hardware.
    • Qualcomm’s AI Hub is an example of this: it features many pretrained, quantised and optimised models that developers can use as the starting point for running their services (see the fourth sketch after this list).
    • Arm’s Kleidi platform is a competitor, but for the CPU, while MediaTek has its NeuroPilot SDK.
    • It is the quality of these offerings and their ability to incentivise developers to use them that will determine which company wins a leading position in silicon optimised to run AI at the edge.
    • Fifth, Qualcomm leads but others are chasing.
    • Edge AI will begin in higher-end devices, and in Android smartphones, automotive and XR, Qualcomm already has market leadership, which gives developers a big incentive to use the AI Hub to create services for those device categories.
    • If it gains early momentum, this will put it in a commanding position which will help it fend off the competition.
    • However, MediaTek’s partnership with Nvidia for automotive could prove to be very effective, and it would make sense for this partnership also to address PCs, as Qualcomm has the Windows-on-Arm market pretty much to itself at the moment.
    • MediaTek and Nvidia are two of the better-known competitors but there are any number of start-ups all looking to cash in on the Edge AI opportunity.
  • Inference has not emerged as quickly as I thought it would, as 2024 has remained focused on training and the LLM arms race, but sooner or later investors will be asking for a return.
  • Inference is where the money is going to be made because it is only through inference that the service is delivered.
  • The net result is that inference could be a bigger opportunity than training and will be predominantly deployed at the edge of the network.
  • Qualcomm is undoubtedly the leader in edge AI at the moment, creating a great opportunity, but it will have to keep moving fast and executing effectively if it is going to stay ahead of the legions that also want to grab a piece of the action.
  • Hence, it is the best long-term way to play the AI theme without being forced to pay bubble-like prices.
  • Nuclear power is another way to play the AI theme but it is somewhat further removed from AI than inference at the edge.
  • I continue to own Qualcomm, and I have exposure to the nuclear fuel industry in my portfolio.
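
A first sketch of the fixed-versus-variable cost argument above. The numbers are purely illustrative, not RFM estimates: a one-off training bill set against a per-query inference bill that grows with adoption.

```python
# Toy cost model with made-up numbers: training is a one-off fixed cost,
# while inference cost scales with usage, so at sufficient adoption the
# cumulative inference spend overtakes the training spend.

TRAINING_COST = 100_000_000   # hypothetical one-off training cost, $
COST_PER_QUERY = 0.002        # hypothetical inference cost per query, $

def queries_to_overtake(training_cost: float, cost_per_query: float) -> float:
    """Number of queries at which cumulative inference cost exceeds training cost."""
    return training_cost / cost_per_query

breakeven = queries_to_overtake(TRAINING_COST, COST_PER_QUERY)
print(f"Inference spend passes training spend after {breakeven:,.0f} queries")
# At 100m queries per day, that line is crossed in roughly 500 days, and every
# query beyond it is inference spend that the training budget never sees.
```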
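
A second sketch, for the cloud-versus-edge economics. Again the numbers are invented: in the cloud the provider pays for every query indefinitely, while at the edge the provider pays a one-off deployment cost and the user's device supplies the compute.

```python
# Toy comparison of the cost borne by the service provider, with made-up numbers.

DAILY_QUERIES = 10_000_000     # hypothetical usage
CLOUD_COST_PER_QUERY = 0.002   # hypothetical cloud inference cost per query, $
EDGE_DEPLOY_COST = 5_000_000   # hypothetical one-off cost to port to edge devices, $

def provider_cost(days: int, on_edge: bool) -> float:
    """Cumulative cost to the provider after a given number of days."""
    if on_edge:
        # One-off porting/deployment cost; per-query compute runs on the device.
        return EDGE_DEPLOY_COST
    return days * DAILY_QUERIES * CLOUD_COST_PER_QUERY

for days in (30, 250, 365):
    print(f"{days:>3} days  cloud: ${provider_cost(days, False):>12,.0f}"
          f"  edge: ${provider_cost(days, True):>12,.0f}")
# With these numbers the one-off edge deployment pays for itself after 250 days.
```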
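
A third sketch, of the NPU-versus-CPU trade-off, using ONNX Runtime as one of the abstraction layers developers reach for today. The execution provider names are real, but whether the NPU provider is available depends on how the runtime was built for the device; "model.onnx" and the input shape are placeholders.

```python
import numpy as np
import onnxruntime as ort

# Ask for the vendor NPU first and fall back to the CPU, which is always
# present: the NPU path is efficient but fragmented per vendor, while the
# CPU path is consistent but less efficient -- exactly the trade-off above.
preferred = [
    "QNNExecutionProvider",   # Qualcomm NPUs (needs a QNN-enabled build)
    "CPUExecutionProvider",   # universal fallback, no porting required
]
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("model.onnx", providers=providers)
print("Running on:", session.get_providers()[0])

# Dummy input for a hypothetical 1x3x224x224 image model.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: x})
```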
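
Finally, a fourth sketch of the kind of workflow an Edge AI Ecosystem offering exposes, based on Qualcomm's published qai-hub Python client (pip install qai-hub). Treat the exact signatures and the device name as illustrative and check the AI Hub documentation before relying on them.

```python
import qai_hub as hub

# Submit a model to be compiled and optimised for a specific Snapdragon device.
compile_job = hub.submit_compile_job(
    model="model.onnx",                                # placeholder model file
    device=hub.Device("Samsung Galaxy S24 (Family)"),  # example hosted target
)
target_model = compile_job.get_target_model()          # NPU-ready artefact

# Profile the compiled model on real hosted hardware to measure on-NPU latency.
profile_job = hub.submit_profile_job(
    model=target_model,
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
```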


RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience working in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the global technology sector.

Blog Comments

[disclosure: I work for Microsoft but not in any area related directly to AI, so this is personal curiosity]

Do you see something like DirectX but for NPUs being something OS vendors could create? Thus abstracting the differences in NPUs by the developer being able to write to “NPUX” instead of directly to the chip, the way game developers don’t have to write games directly targeting different GPUs?

I know very little about NPUs, so maybe this makes no sense. But it seems like invariably hardware gets abstracted into software, and I’m wondering if the chip vendors are the only ones who can do that.