Alibaba AI – The Edge Game

Inference at the edge moves forward.

  • Alibaba has added to the ever-increasing deluge of AI models with the release of Qwen3, which it claims is better, faster and more efficient than anything DeepSeek has produced, adding weight to my view that DeepSeek may have been unique only because it was first.
  • It also advances the case for running models on edge devices as smaller models are becoming more capable and require fewer resources to run, making them better suited to devices like laptops and smartphones.
  • Alibaba has released Qwen3 in 8 sizes, two of which appear to make use of the techniques that allowed DeepSeek to shock the world in January 2025.
  • The flagship model, Qwen3-235B-A22B, is a 235bn-parameter Mixture of Experts (MoE) model in which only around 22bn parameters (the “A22B” in the name) are activated for any given token, with the rest of the experts left idle.
  • The second MoE model is Qwen3-30B-A3B, which stores 30bn parameters but activates only around 3bn of them per token.
  • The other 6 are dense models that come in 32, 14, 8, 4, 1.7 and 0.6 billion parameters, representing a significant step down in size for what is usually released as a foundation model.
  • However, despite the smaller size, performance is reported by Alibaba to be on par with the current generation of models of similar or somewhat larger size.
  • For example, Qwen3-30B-A3B competes effectively with DeepSeek R1 (671bn parameters), OpenAI o1 (>100bn parameters) and Gemini 2.5 Pro (>100bn parameters).
  • This is what makes Qwen3-30B-A3B interesting: its small size and MoE design make it better suited to edge devices than its predecessors.
  • All of the models have been released with open weights, which leads me to think that Alibaba’s testing is a reasonable representation of reality, as anyone can now download the models and test Alibaba’s claims.
  • The real story here is not the test scores themselves but the fact that Alibaba is able to produce competitive scores with much smaller models than before.
  • Qwen3 only offers incremental improvements over older versions, but because it now uses the MoE architecture, it can be much more efficient.
  • MoE means that the model is broken up into a number of “mini-models” (experts) that work together to answer queries, with a router activating only the experts that are needed for each token, as opposed to having all of the model running all of the time (see the routing sketch after this list).
  • This is one of the core changes that allowed DeepSeek to be trained and to run inference more efficiently.
  • However, it is clear that this is not unique to DeepSeek, as many have already used the technique and made claims very similar to DeepSeek’s.
  • This tells me that DeepSeek has no real competitive advantage and that all of China and the West will soon be running just as efficiently (or even more so) making DeepSeek look like a flash in the pan.
  • Hence, it looks like 2025 will see a large increase in efficiency as the techniques that DeepSeek was the first to use spread like wildfire both through China and the rest of the world.
  • I do not see this as bad news: cheaper AI means more economically viable use cases and therefore wider usage, resulting in a larger opportunity rather than a smaller one caused by falling compute requirements.
  • It is bad news for companies that believe that their gigantic models give them protection from competition, which is increasingly being shown not to be the case.
  • The model to watch here is Qwen3-30B-A3B, which performs as well as many of its much larger peers but stores only 30bn parameters and keeps most of its experts inactive at any one time (see the back-of-envelope estimate after this list).
  • This means that it needs fewer resources to run, and these sorts of models will soon be commercially viable on edge devices (smartphones and laptops).
  • Running models at the edge makes a lot of economic sense for service providers, as they no longer have to pay for the compute to run inference, meaning that they will deploy to the edge wherever they can.
  • This is where the example of Qwen3-30B-A3B is so interesting as it meaningfully raises the bar in terms of what is possible on an edge device, meaning that more inference can be run at the edge than before.
  • Meta and now Alibaba are demonstrating that DeepSeek’s techniques are relatively easy to emulate and so I think the impact will be a stepwise increase in efficiency across the industry.
  • This is good news for those that benefit from more AI being run on edge devices where MediaTek, Qualcomm and Arm are some of the most exposed.
  • This remains one of my favourite themes in AI, and I continue to have a position in Qualcomm to gain exposure to it.
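
As a minimal illustration of the MoE routing described in the list above, here is a sketch in Python with toy, hypothetical dimensions (the real Qwen3 router, expert count and layer structure are more involved): a router scores every expert for each token, but only the top-k are ever evaluated, which is why activated parameters can be a small fraction of stored parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_forward(token, experts, router, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    scores = softmax(router @ token)        # router scores every expert...
    chosen = np.argsort(scores)[-top_k:]    # ...but only top_k are evaluated
    out = np.zeros_like(token)
    for i in chosen:
        W, b = experts[i]                   # the remaining experts' weights
        out += scores[i] * (W @ token + b)  # are never touched for this token
    return out / scores[chosen].sum()       # renormalise the gate weights

# Toy setup: 8 experts, 16-dim activations, 2 experts active per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
router = rng.normal(size=(n_experts, d))
print(moe_forward(rng.normal(size=d), experts, router).shape)  # (16,)
```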
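
The edge-viability argument rests on simple arithmetic, sketched below with illustrative assumptions: weight memory scales with total (stored) parameters and the chosen quantisation level, while per-token compute scales with activated parameters (using the common rule of thumb of roughly 2 FLOPs per active parameter per token).

```python
# Back-of-envelope estimate for Qwen3-30B-A3B (illustrative figures only).
total_params  = 30e9   # 30bn parameters stored in memory
active_params = 3e9    # ~3bn activated per token (the "A3B")

# Weight memory at different quantisation levels.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{total_params * bits / 8 / 1e9:.0f} GB")
# 16-bit: ~60 GB, 8-bit: ~30 GB, 4-bit: ~15 GB -- the 4-bit figure is
# within reach of a well-specified laptop.

# Per-token compute: ~2 FLOPs per active parameter means the MoE model
# does roughly a tenth of the work of a dense 30bn model per token.
print(f"compute ratio vs dense: ~{total_params / active_params:.0f}x less")
```

On these assumptions, the memory footprint is set by the full 30bn stored parameters (quantisation is what makes the model fit), while the speed and energy benefit comes from only 3bn of them being active per token, which is exactly the combination that suits battery-powered edge devices.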

RICHARD WINDSOR

Richard is founder and owner of the research company Radio Free Mobile. He has 16 years of experience in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.
