Alibaba – ChatCCP

CCP hurts the bug, not the feature.

  • AI proponents will look at the immediate clampdown on large language model (LLM) chatbots by the CCP with dismay, but I think that for the use cases that I have identified for these systems, it won’t make much difference.
  • At its 2023 Alibaba Cloud Summit, Alibaba announced an LLM-based chatbot called Tongyi Qianwen (truth from a thousand questions) that it intends to integrate into its workplace collaboration products.
  • This follows moves by Baidu with Ernie and SenseTime with SenseChat, but initial tests of these systems indicate that they have some way to go before they can catch up with ChatGPT.
  • This is probably due to the fact that they have not been working on this for as long as OpenAI, Google and so on and also due to the more limited data set.
  • These chatbots will have been largely trained on a Chinese data set which, given the comments by the Cyberspace Administration of China (CAC), will have to have been very carefully scrubbed of unwanted data.
  • Hot on the heels of these launches the CAC has put its foot down and specified that these chatbots should support core socialist values, and not produce views that could undermine state power or the national unity of China.
  • Prior to launch, these chatbots have to be submitted to the CAC which will try and trick them into becoming revolutionaries and only when they pass the tests will they be released for public consumption.
  • This combined with the inability to purchase the latest processors for training and inference is going to make it almost impossible for China to produce chatbots that work like ChatGPT.
  • This is because I don’t think that, with these restrictions, the Chinese will be able to make models anything like as large, since they will have had to scrub all the data to ensure that it will be acceptable to the CAC.
  • With ChatGPT at 175bn parameters and GPT-4 at 2tn (RFM estimate), it would take 63,419 humans, each checking 1 parameter per second and working nonstop, one year to check every piece of data in the dataset for GPT-4.
  • This is a practical impossibility and so, I suspect that the Chinese will have to go with much smaller datasets that have been very carefully curated.
  • OpenAI et al. have no such constraints and so can effectively just dump the entire dataset of the internet into the system and see what pops out the other end.
  • I think that it is the fact that their creators have no real idea what is in the dataset that they used that has produced some of the craziness and gaffes that have been witnessed.
  • The rest is created by the fact that these systems have no causal understanding of what they are doing and so despite having ingested almost the entire knowledge of the Internet, they can’t do things that my 6-year-old can easily accomplish.
  • What I find to be really useful about these chatbots is their ability to comprehend human discourse and take into account context and circumstance as well as their ability to ingest and retrieve data.
  • With this use case, I find Bard to be better than Bing as it is more accurate in the data that it produces, probably because it is using the Google search algorithm rather than Bing’s.
  • Consequently, these chatbots are like junior research assistants where a task is set but then the sources have to be checked in order to ensure accuracy.
  • Even including the time it takes to check the sources, this is a significant time saver as one no longer has to search for the data, merely check its accuracy.
  • This is why I think that in the medium term, the real use case for these chatbots remains the cataloguing of data without having to use a database and the man-machine interface in the vehicle (see here).
  • The $64,000 question in the case of China is: will the more limited datasets and the diligent scrubbing of data prevent the Chinese LLMs from performing these functions to the same level as their Western counterparts?
  • So far, their performance appears to have been underwhelming, but I suspect that this is because of the high expectations that have been set by ChatGPT. The use cases I have outlined do not require these models to appear sentient, merely to be able to ingest and index data and properly understand requests.
  • A lot of this is down to the use of transformers which Google made available and effectively open-sourced in 2017 meaning that everyone has access to them.
  • Hence, on balance, I think that the Chinese LLMs should be able to fulfil these use cases and the regulation of the CAC will serve to remove the craziness that everyone seems to find so appealing but not the usefulness that will make these models economically viable.
  • China’s inability to access the latest silicon for training should not be a huge problem but it will take Chinese companies more time and more money to train models and run inference than their Western counterparts.
  • Although Chinese LLMs should be able to fulfil the use case I have outlined, they are going to be of little use outside of China and the Chinese language.
  • This is because they will have been trained only on Chinese data, meaning that the minute they are asked to go outside that scope they are likely to go off the rails. I suspect their programmers will have added fail-safes to prevent them from answering in these situations.
  • Hence, I think that Chinese LLMs are going to be useful but the artificial limitations that are being placed upon them will mean that when it comes to pushing back the boundaries of AI, China may start to fall behind.
  • This is a new state of affairs as China is currently a leader in AI but placing a heavy burden of regulation could very well cause that crown to slip.
  • With its strong position in Chinese cloud infrastructure, Alibaba is well positioned to benefit from the rise of LLM usage in China, but it will remain a sideshow compared to e-commerce.
  • This is what will underpin any recovery of the share price which in itself is dependent on an economic recovery.
  • So far the CCP has been reluctant to stimulate the economy so one will just have to wait for it to recover on its own.
  • Alibaba trades at a fraction of its global peers and remains in a strong position in its home market and so I am happy to sit and wait.
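The reviewer figure quoted above for GPT-4 can be reproduced with a few lines of arithmetic. The sketch below follows the article in treating the parameter count as the number of items to be checked, at one item per person per second with no breaks; the function name and the 365-day year are assumptions made for illustration.

```python
# Rough reproduction of the "humans needed to check everything in a year" estimate.
# Assumption: a year is 365 days and each reviewer checks items nonstop.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def reviewers_needed(items: int, items_per_second: float = 1.0, years: float = 1.0) -> float:
    """Return how many people are needed to check `items` within `years`."""
    return items / (items_per_second * SECONDS_PER_YEAR * years)


gpt4_items = 2_000_000_000_000    # 2tn, the RFM estimate quoted in the article
chatgpt_items = 175_000_000_000   # 175bn, as quoted in the article

print(int(reviewers_needed(gpt4_items)))     # matches the article's 63,419
print(int(reviewers_needed(chatgpt_items)))
```

Changing `items_per_second` or `years` shows how sensitive the headcount is to those assumptions; the order of magnitude stays impractical either way.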

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience working in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on the equity coverage of the Global Technology sector.