Meta Platforms – Llama-Seek

The fightback begins.

  • Meta is demonstrating what RFM Research and Alavan Independent have long suspected: DeepSeek’s methods are not that hard to replicate, and we are seeing the start of a race to the bottom in open source that will soon be replicated elsewhere.
  • Meta has released Llama 4, which shows no sudden jump in performance but comes in new shapes and sizes, with new structures and techniques that make it much more efficient to train and operate.
  • Llama 4 (see here) is currently available in three flavours, all of which use the Mixture of Experts architecture that was popularised (but not invented) by DeepSeek R1 in January 2025.
  • I suspect that more Llama 4 variants will be released over time, with some certain to come at LlamaCon on April 29th, which looks like a replacement for the old F8 developer conference, last held in 2021.
    • First, Llama 4 Behemoth: a 2tn-parameter model with 16 experts, of which 288bn parameters are active when a token is processed.
    • This is primarily a distillation model, meaning that it is designed to be used in post-training to make smaller models more performant.
    • Second, Llama 4 Maverick: a 400bn-parameter model with 128 experts, of which 17bn parameters are active when a token is processed.
    • This looks like the evolution of Llama 3.1 405bn and has been designed to be the flagship in terms of being used to answer requests.
    • Third, Llama 4 Scout: currently the smallest at 109bn parameters with 16 experts, of which 17bn parameters are active when a token is processed.
    • This model has a massive 10m-token context window, which is 5x bigger than Gemini’s and, as far as I can tell, by far the largest available.
    • This would translate to roughly 27,000 pages or about 75 novels.
    • Clearly, the main use case for this variant is to upload massive amounts of data and then be able to answer questions about it.
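The pattern shared by all three variants can be sketched in miniature: a router picks one expert per token, so only a fraction of the stored parameters ever does work for a given request. The toy below is purely illustrative — it is not Meta’s implementation, and the sizes, routing rule, and names are all made up:

```python
import random

# Toy top-1 mixture-of-experts layer (illustration only, NOT Meta's code).
# The router scores every expert, but only the winning expert's weights are
# multiplied against the token, so compute tracks *active* parameters
# rather than *total* parameters.

NUM_EXPERTS = 4   # Llama 4 Scout/Behemoth reportedly use 16; kept small here
DIM = 8           # toy hidden dimension

rng = random.Random(0)

def make_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

router = make_matrix(NUM_EXPERTS, DIM)                  # gating weights
experts = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]

def moe_forward(token):
    scores = [dot(row, token) for row in router]        # router logits
    winner = max(range(NUM_EXPERTS), key=scores.__getitem__)
    w = experts[winner]                                 # only this expert runs
    out = [dot([w[r][c] for r in range(DIM)], token) for c in range(DIM)]
    return winner, out

token = [rng.uniform(-1, 1) for _ in range(DIM)]
winner, out = moe_forward(token)
print(f"expert {winner} handled the token; 1 of {NUM_EXPERTS} expert matrices used")
```

This is why Maverick can store 400bn parameters but touch only 17bn per token — roughly 4% of the weights do the work on any given request, which is where the training and serving efficiency comes from.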
  • The performance of these models is unremarkable in that they are in line with or just ahead of their relevant peers, but what Meta is really highlighting is that they are cheaper to use.
  • Meta claims that Maverick (400bn) costs $0.19-$0.49 per 1m input tokens compared to DeepSeek v3.1 at $0.48 and GPT-4o at $4.38.
  • Digging into the footnotes, I would estimate that the real cost will be something closer to DeepSeek’s, leaving Gemini as the cost leader, although there are too many unknowns to have a really clear view of how this will shake out.
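Taking the quoted prices at face value, the spread is easy to see with simple arithmetic. The prices below come from the bullets above; the 10m-token workload is an arbitrary illustration, not a real billing model:

```python
# Back-of-the-envelope cost of a 10m-token workload at the quoted
# per-1m-token prices (prices as quoted in the text above).

PRICES_PER_1M = {
    "Llama 4 Maverick (low)":  0.19,
    "Llama 4 Maverick (high)": 0.49,
    "DeepSeek v3.1":           0.48,
    "GPT-4o":                  4.38,
}

TOKENS = 10_000_000  # e.g. filling one Scout-sized context window

for model, price in PRICES_PER_1M.items():
    cost = price * TOKENS / 1_000_000
    print(f"{model}: ${cost:.2f}")
```

Even at the top of Meta’s claimed range, the gap to GPT-4o is roughly an order of magnitude, which is the point Meta wants developers to take away.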
  • It looks to me like the focus of Llama 4 is on competing in the open-source community where Meta has adopted a number of the techniques that were used by DeepSeek to produce R1 in January 2025.
  • Specifically, these would be Mixture of Experts and quantisation, where Meta is now quantising in the cloud down to FP8 and even showed some data on its smallest model running at FP4.
  • RFM and Alavan Independent have written about these techniques in depth and concluded that these were the two that produced most of the savings claimed by DeepSeek (see here).
  • These techniques are already spreading quickly across China, and I expect that the other open-source model providers will adopt them almost as quickly as Meta has.
  • Google claims (without a shred of evidence) that it is already running more efficiently than DeepSeek R1, but I have my doubts given the huge pile of cash that it has to invest in AI and the tendency of Western model makers to shoot for superintelligent machines by making models bigger with more compute rather than by focusing on efficiency.
  • I think Google may now look to adopt these techniques as it comes under pricing pressure, and so the scene is set for a race to the bottom in terms of pricing.
  • This is yet another sign that the world of foundation models is commoditising fast, meaning that pricing and developer traction will be key to building the next digital ecosystem.
  • There is no doubt that OpenAI currently leads this race, but both Meta and Google have billions of users that they can migrate to their services, meaning that OpenAI will have to fight hard to keep up.
  • The net result is that this brings a correction closer, as returns on investment fall due to pricing pressure and service providers start to miss their targets.
  • I am not in a hurry to pick up any of these names, which are now a bit cheaper thanks to tariff and trade-war uncertainty that looks set to continue for at least a few more sessions.
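On the quantisation point above, the memory saving is simple arithmetic: halving the bits per weight halves the storage needed for the model. A rough sketch using the parameter counts quoted earlier — this assumes a 16-bit (BF16) baseline and counts weights only, ignoring activations, KV cache and serving overhead:

```python
# Weight-storage footprint at different precisions. Parameter counts are
# the ones quoted in the text; BF16 as the 16-bit baseline is an assumption.

def model_gigabytes(params_bn, bits_per_weight):
    """Weights only — ignores activations, KV cache and runtime overhead."""
    return params_bn * 1e9 * bits_per_weight / 8 / 1e9

for name, params_bn in [("Behemoth", 2000), ("Maverick", 400), ("Scout", 109)]:
    bf16 = model_gigabytes(params_bn, 16)
    fp8 = model_gigabytes(params_bn, 8)
    fp4 = model_gigabytes(params_bn, 4)
    print(f"{name}: BF16 {bf16:.0f} GB, FP8 {fp8:.0f} GB, FP4 {fp4:.0f} GB")
```

At FP4, Scout’s weights would fit in roughly 55GB, which is the kind of footprint that starts to make single-node or even workstation serving plausible — hence Meta showing FP4 data on its smallest model.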

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the global technology sector.
