Alibaba QwQ – 32bn Questions

More questions than answers.

  • Alibaba has produced a model that performs pretty well, but it is so small that China is once again challenging the Western “bigger is better” philosophy.
  • However, there are so few details about how this was created that it is impossible to know whether this represents another step forward for China or if it is merely a public relations exercise.
  • The new model is called QwQ-32B (see here); it has 32bn parameters and performs extremely well against DeepSeek R1 (671bn parameters) and o1-mini (an estimated 100bn) on a number of the usual tests.
  • These tests have been carried out by Alibaba and so would have been chosen and run to make QwQ-32B look as good as possible, but at face value, it looks like an impressive achievement.
  • This flies directly in the face of the Western approach to generative AI, which holds that the bigger the model, the more data pumped through it and the more compute it expends at inference, the better it performs.
  • I have long thought that this approach, which has worked well for a few years, has reached the end of its usefulness and that a new approach is needed.
  • Furthermore, I have also been of the opinion that when it comes to innovations around AI efficiency, the Chinese would get there first (see here).
  • This is not because Chinese engineers are more brilliant than Western ones, but merely because export restrictions and capital limitations have forced them to do more with less.
  • By contrast, the Western players are flush with cash, have access to ever more powerful GPUs and have been able to focus solely on pushing the boundaries of what AI can deliver.
  • QwQ-32B looks important because if this kind of performance can be delivered with a 32bn parameter model, then pretty soon we will see it running on smartphones and laptops.
  • However, there are a lot of questions which Alibaba has not answered.
    • First: inference time, where we have no idea how long QwQ-32B runs inference before it delivers its answers, or how this compares to everyone else.
    • Increasing inference-time compute is a key strategy for improving models on reasoning tasks.
    • Hence, QwQ-32B may perform particularly well by scaling up its reasoning time, which would mean it is not really as efficient as Alibaba would have us believe.
    • QwQ-32B has been released as open source, so independent third-party testing should be able to answer this question in time.
    • Second: training data, where we have no idea how much data was pumped through QwQ-32B to get it to perform at its current level.
    • DeepMind's Chinchilla experiments demonstrated some time ago that, by training on more data, smaller models can be made to outperform models four times their size.
    • If Alibaba has used this technique, then what it gained in model size it lost in training data, meaning that it was not cheaper to train than its larger counterparts.
    • However, it will be cheaper to run inference given its smaller size and this is where a real saving could be found.
    • Third: pruning, which I have often referred to as the nuclear fusion of AI.
    • This is a technique whereby one can remove 90% of a model and see no degradation in its performance.
    • The problem is that working out which 90% of the model to remove is so time-consuming that it is often not worth the effort.
    • However, once again if one is purely focused on the efficiency of inference, this is a technique worth considering.
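To make the inference-time point concrete, here is a toy sketch of why spending more compute at inference helps on reasoning tasks. It is my own illustration (nothing here reflects how QwQ-32B actually works): if each reasoning attempt has some chance of being right, sampling several attempts and keeping any success (so-called best-of-N) trades compute for accuracy.

```python
import random

random.seed(0)

# Toy illustration of inference-time scaling. A "model" answers a question
# correctly with probability p per attempt; best-of-N spends N times the
# inference compute and succeeds if any attempt does. All numbers here are
# hypothetical -- this is not Alibaba's method, just the general mechanism.

def solve_once(p: float) -> bool:
    """One reasoning attempt: correct with probability p."""
    return random.random() < p

def best_of_n(p: float, n: int) -> bool:
    """Spend n times the inference compute; succeed if any attempt does."""
    return any(solve_once(p) for _ in range(n))

def success_rate(p: float, n: int, trials: int = 10_000) -> float:
    """Estimate the overall success rate empirically."""
    return sum(best_of_n(p, n) for _ in range(trials)) / trials

for n in (1, 4, 16):
    print(n, round(success_rate(0.3, n), 2))
```

With a 30% per-attempt success rate, four attempts lift the toy success rate to roughly 76% and sixteen attempts to nearly 100%, which is why it matters how much thinking time QwQ-32B is actually consuming per answer.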
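The training-data point can be put into rough numbers using the commonly cited Chinchilla rules of thumb (about 20 training tokens per parameter for compute-optimal training, and roughly 6 × parameters × tokens of training FLOPs). These are back-of-envelope estimates of what a compute-optimal 32bn-parameter model would want, not Alibaba's actual figures:

```python
# Back-of-envelope Chinchilla arithmetic. The 20-tokens-per-parameter ratio
# and the 6*N*D FLOPs approximation are rules of thumb from the Chinchilla
# work; nothing here is a disclosed QwQ-32B training figure.

PARAMS = 32e9            # QwQ-32B parameter count
TOKENS_PER_PARAM = 20    # approximate compute-optimal ratio

optimal_tokens = PARAMS * TOKENS_PER_PARAM   # ~640bn tokens
train_flops = 6 * PARAMS * optimal_tokens    # ~1.2e23 FLOPs

print(f"compute-optimal tokens: {optimal_tokens:.1e}")
print(f"training FLOPs:         {train_flops:.1e}")
```

The point is that a compute-optimal small model is not necessarily a cheap-to-train model; the saving shows up at inference, not in training.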
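For readers unfamiliar with pruning, a minimal sketch of its simplest variant (magnitude pruning, which just drops the smallest weights) looks like the following. Real pruning research is precisely about finding which weights can go without hurting quality, which is the expensive part described above; this crude criterion is only an illustration.

```python
import random

random.seed(0)

# Toy magnitude pruning: zero out the 90% of weights with the smallest
# absolute values. This is an illustrative sketch, not a claim about any
# technique Alibaba used.

def prune(weights: list[float], sparsity: float = 0.9) -> list[float]:
    """Keep only the largest-magnitude (1 - sparsity) fraction of weights."""
    cutoff = sorted(abs(w) for w in weights)[int(len(weights) * sparsity)]
    return [w if abs(w) >= cutoff else 0.0 for w in weights]

weights = [random.gauss(0, 1) for _ in range(10_000)]
pruned = prune(weights)
print(f"zeroed: {sum(w == 0.0 for w in pruned) / len(pruned):.0%}")
```

A model pruned this way is much cheaper to run, which is why the technique matters if one is focused purely on inference efficiency.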
  • Alibaba has not said whether it used any of these techniques, and it has made no claims about training cost or inference consumption, so this development needs to be taken with a dose of caution.
  • The model is available on Hugging Face, meaning that anyone can download and test it, but the same issue that applies to DeepSeek also applies here.
  • This is because China’s National Security Law requires that AI technology of this nature be granted a licence by the CCP before it can be exported.
  • Alibaba would have had to obtain a licence from the CCP to export QwQ-32B, which immediately raises the question of why the CCP would allow important Chinese IP to fall into the hands of its rivals.
  • If independent testing verifies that QwQ-32B performs as well as its much larger rivals, it will be another feather in the cap for China’s reputation as an AI powerhouse, which RFM Research and Alavan Independent suspect may have been the main reason for granting Alibaba an export licence.
  • The net result is that if QwQ-32B performs as well as promised, it will accelerate the trend of deploying models on edge devices rather than in the cloud.
  • It is far cheaper for a service provider like OpenAI to deploy its models on edge devices because the user’s hardware, rather than the provider’s cloud, bears the cost of running inference.
  • This is by far the biggest draw for inference at the edge and QwQ-32B promises to take performance on edge devices to another level.
  • The main beneficiaries here are China (whose reputation is boosted once again), Alibaba (which is proving its AI chops against DeepSeek) and Qualcomm, MediaTek, Arm and Broadcom, all of which sell chips that can run inference on edge devices.
  • I continue to like this theme as a way to invest in the current AI craze and Qualcomm is the stock that I own.
  • I also own Alibaba, which is becoming less and less of a drag on my portfolio, and I remain happy to stick with it and ride the recovery now that it finally seems to be here.

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.