Grok 3 – Spoilt for Choice

The count of equally good models continues to rise

  • xAI has released the latest version of its Grok model and, while it scores very well on all of the most advanced benchmarks, it struggles like everyone else with the basic stuff, implying once again that LLMs alone are not going to deliver the superintelligence the industry craves and that valuations are too high.
  • One large data centre and 200,000 GPUs later, Grok 3 was born, having consumed 15x the compute of Grok 2, and it is likely to continue consuming huge resources as it begins to answer questions.
  • On the stuff that is used to market these products, Grok 3 performs better than Grok 2 and better than its peers on the usual maths and coding tests.
  • When it comes to “reasoning”, it also does pretty well, beating DeepSeek R1 and o3 on the usual AIME24 and GPQA benchmarks, but only marginally.
  • Consequently, Grok 3 delivers an incremental improvement over what has gone before, but when one considers the 15x increase in compute that was required to deliver it, it looks to me like we are firmly in the arena of diminishing returns.
  • This is in no way limited to Grok; it is why there is no GPT-5 (see here), and it is also a factor in how everyone has been able to match OpenAI’s performance in a relatively short time.
  • The net result is that we have a large range of models from different parts of the world that are roughly equivalent in capability, and from what their makers choose to demonstrate, these capabilities look amazing.
  • However, just like all of its predecessors and its peers, Grok 3 completely falls over when it comes to very simple things.
  • Grok 3 cannot draw a picture of a person writing with their left hand, nor a picture of a common English word on a piece of paper with the vowels circled.
  • Once again, to be fair to Grok, all models fail these simple tests unless they have been specifically programmed to pass them, which supports RFM’s long-held view that these models have no understanding of causality.
  • They also continue to fail the simplest of reasoning tests supporting RFM’s view that these models are incapable of true reasoning.
  • This is important because if a statistics-based system can be taught to truly reason, rather than producing this incredibly sophisticated simulation of reasoning, then we will truly be on the path to super-intelligent machines.
  • This doesn’t mean doing PhD maths with a high score; it means being able to reason that if A=B then it follows that B=A, without having to be given the data for both directions.
  • It means being able to draw pictures of people writing with their left hand, to drive vehicles as well as humans do in all scenarios, and to tell and reproduce the time on an analogue clock.
  • Of this, there is no sign, despite hundreds of billions of dollars being spent on both compute and engineering salaries, leading me to think that AGI is not going to be solved with LLMs alone.
  • The problem is that there is so much hype and excitement around LLMs that other avenues of scientific enquiry are being ignored and underfunded.
  • Hence, I think that expectations of super-intelligent machines powered by LLMs are not going to materialise, and so a correction is coming.
  • Despite the limitations, LLMs have very large and likely very lucrative use cases, and so the correction will be nothing like as severe as the internet bubble of 1999–2000.
  • However, many of the AI startups will not survive, and they will most likely be absorbed into the bowels of large companies, setting up another battle for the ecosystem fought among the large players as well as between the USA and China.
  • Nvidia will suffer the least, as it is the only company making real money from generative AI right now; even with a correction in demand for its processors, it will suffer far less than those offering generative AI services for $20 per month.
  • Hence, if I were forced into direct investment in the generative AI sector, I would choose Nvidia, but I prefer to look more laterally at inference at the edge or at nuclear power to run all of the new data centres that are springing up.
  • I have positions in both of these themes.
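
The A=B point above is, formally, the symmetry of equality: a property that a system with genuine logical machinery derives once, generically, rather than memorising each direction separately from data. As a purely illustrative sketch (theorem name is my own) in the Lean proof assistant:

```lean
-- Symmetry of equality: from a proof that a = b, derive b = a.
-- A proof system gets this in one step for any type, for free;
-- it does not need separate training data for each direction.
theorem symm_of_eq {α : Type} (a b : α) (h : a = b) : b = a :=
  h.symm
```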

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience working in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.