“Open-sourcing Grok-1 is a massive deal because Elon has officially picked a side, and it's open-source AI!” - Bindu Reddy
Grok Opens Up
xAI has opened up Grok-1, releasing the base model weights and network architecture, along with JAX example code for loading and running the model, making Grok-1 a significant open AI model. With an X announcement (“Weights in Bio”), a torrent magnet link, a GitHub repo, and a blog post, they fulfilled Elon Musk’s promise and gave us real insight into their LLM.
The release reveals that Grok-1 is a 314-billion-parameter, 8-expert Mixture-of-Experts (MoE) model, with 2 experts active for each token; at inference time about 86 billion parameters, roughly 25% of the weights, are active. This makes it effectively an 8x38B MoE, similar in design to Mixtral 8x7B but much larger.
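To make that routing concrete, here is a minimal sketch of a top-2 MoE layer in JAX, with toy dimensions. This illustrates the general technique only; it is not xAI's actual implementation, and all names and shapes here are hypothetical:

```python
# Minimal sketch of top-2 MoE routing, illustrating why only ~25% of
# Grok-1's weights are active per token. Toy sizes; not xAI's code.
import jax
import jax.numpy as jnp

NUM_EXPERTS, TOP_K = 8, 2    # Grok-1: 8 experts, 2 active per token
D_MODEL, D_FF = 512, 2048    # toy dimensions for illustration

def moe_layer(params, x):
    """x: [tokens, D_MODEL]. Routes each token to its top-2 experts."""
    logits = x @ params["router"]                    # [tokens, NUM_EXPERTS]
    weights, experts = jax.lax.top_k(logits, TOP_K)  # pick 2 experts per token
    weights = jax.nn.softmax(weights, axis=-1)       # normalize the 2 gate values
    out = jnp.zeros_like(x)
    for k in range(TOP_K):
        w_in = params["w_in"][experts[:, k]]         # gather each token's expert
        w_out = params["w_out"][experts[:, k]]
        h = jax.nn.gelu(jnp.einsum("td,tdf->tf", x, w_in))
        out += weights[:, k:k+1] * jnp.einsum("tf,tfd->td", h, w_out)
    return out

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = {
    "router": jax.random.normal(k1, (D_MODEL, NUM_EXPERTS)) * 0.02,
    "w_in":   jax.random.normal(k2, (NUM_EXPERTS, D_MODEL, D_FF)) * 0.02,
    "w_out":  jax.random.normal(k3, (NUM_EXPERTS, D_FF, D_MODEL)) * 0.02,
}
tokens = jax.random.normal(k4, (16, D_MODEL))
print(moe_layer(params, tokens).shape)  # (16, 512)
```

Each token only touches 2 of the 8 expert feed-forward blocks, which is why Grok-1 activates roughly a quarter of its 314B weights per token.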
The Grok-1 release reveals key details about its training and architecture: xAI built the model on a custom training stack based on JAX and Rust. It has a context window of 8,192 tokens, a large vocabulary of 131,072 tokens, and a deep 64-layer architecture.
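For quick reference, the disclosed numbers can be gathered into a small config sketch (the class and field names are hypothetical, not from xAI's code):

```python
# Architecture details disclosed in the Grok-1 release, collected into a
# hypothetical config object. Field names are illustrative, not xAI's.
from dataclasses import dataclass

@dataclass(frozen=True)
class Grok1Config:
    n_params: int = 314_000_000_000  # total parameters
    n_layers: int = 64               # transformer depth
    n_experts: int = 8               # MoE experts
    experts_per_token: int = 2       # active experts per token
    vocab_size: int = 131_072        # vocabulary size
    context_length: int = 8_192      # context window (tokens)
```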
This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.
Grok-1 Performance
The released Grok-1 weights are not a fine-tuned LLM but a “base model trained on a large amount of text data, not fine-tuned for any particular task.” This means the base model might not perform as well as instruction-tuned models out of the box, but it leaves the opportunity for fine-tuning to improve it further.
In general terms, Grok-1’s benchmark results and user reviews show it to be slightly better than the original GPT-3.5 ChatGPT, with Grok-1 scoring 73% on MMLU and 63% on HumanEval.
In their tests of Grok-1, Matthew Berman and others found Grok to be fairly good at math and logic questions and generally uncensored, with some failures that would not be unexpected for a GPT-3.5-level LLM.
Opening Grok Opens Doors for Grok
As a closed model, Grok was a bit of an also-ran, but now Grok is one of the biggest and best open AI models out there, surpassing Llama 2 70B. Still, while a decent model, there are better and more efficient open AI models, such as the higher-scoring Yi 34B and Mixtral 8x7B.
Grok’s large size is perhaps its biggest disadvantage for open-source hobbyists trying to use it. You can’t even fit it on an 80GB GPU, let alone a consumer-grade RTX card. Within 24 hours of the weights being opened to the world, Grok-1 was being served in the cloud, but it took 8 H100 GPUs to do so.
Grok-1 has not been quantized yet, which would make it smaller, but even quantized, it’s hard to imagine this model being run locally.
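Some rough back-of-the-envelope math shows why. This is a sketch counting weights only, ignoring the extra memory needed for the KV cache and activations:

```python
# Back-of-the-envelope memory math for serving Grok-1 (weights only;
# KV cache and activations add more on top).
N_PARAMS = 314e9  # Grok-1's 314B parameters

for name, bytes_per_param in [("fp16/bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gb = N_PARAMS * bytes_per_param / 1e9
    h100s = gb / 80  # 80GB H100s needed for the weights alone
    print(f"{name:>9}: ~{gb:,.0f} GB  (~{h100s:.1f} x 80GB H100)")
```

At bf16, the weights alone take roughly 628 GB, which is why serving it required 8 x 80GB H100s; even a 4-bit quantization at roughly 157 GB is far beyond a 24GB consumer RTX card.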
It’s a valid question why Grok-1 isn’t better for its size. For a given model size, AI model quality comes down to dataset quality and data and compute volume. The gap could be a consequence of undertraining (they trained this model fairly quickly, after all), or of xAI’s Twitter-centric dataset yielding a lower-quality LLM.
Why We Need Open AI Models
One can argue the legal merits of Elon Musk’s lawsuit (weak) and point out that Elon Musk has his own agenda here. It seems he needed to make a point in response to OpenAI’s release of communications indicating that Elon was once on board with OpenAI’s plans not to open-source its AI models.
Whatever the motivation, Elon Musk ended up doing the right thing.
As Bindu Reddy says, Elon picked a side, and it’s the right side:
Grok-1 has decent metrics, though it will need some fine-tuning to make it usable. We at Abacus AI will start working on it and should have an update/release in a few weeks. The model is too large for the open-source community to iterate on it, but I expect functional quantized versions to be available in the next month. However, open-sourcing Grok-1 is a massive deal because Elon has officially picked a side, and it's open-source AI! This is a seminal moment and a significant step towards open-source AGI!!
Open AI models give more transparency to AI users, and open AI models help accelerate progress, as innovations are shared and refined. Finally, contrary to the view of some regulation-happy AI worriers, open AI models reduce AI risks. The transparency, progress, and diversity of choices that open AI models bring make AI safer and more secure.
Apple Enters MM1 Into the MLLM Race
Apple just published a paper unveiling MM1, a new family of multimodal AI models, with the largest at 30B parameters. They said:
“MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks.”
What is remarkable about this paper and the model is not just the tremendous progress being made with MM1, but also where it is coming from. Apple isn’t known for being an open company; its ecosystems are famously controlled. Apple has also seemingly been missing in action on the AI front. Yet even Apple surprises us with good results and open research.
Today also brought news that Apple is negotiating with Google to have Gemini power some of its AI features. There are rumors Apple is up to something big with Siri, and many believe the next Siri update will be Apple’s ChatGPT-like moment.
MM1 could be viewed as a preview of what the next Siri will look like, and Gemini could be either Apple’s backup plan or a complementary part of their AI offering. Apple may need to provide the kind of assistant tools that Microsoft offers in Copilot.
Apple’s moves towards building and adopting AI are another sign that there won’t be just a few players building AI models and applications; most tech companies are doing something in AI. They have to. There is no moat.
Summary
Open-source AI fosters innovation. It allows for continuous fine-tuning and optimization of base LLMs after their release. This dynamic contrasts with the concentrated power of a few companies possessing the immense resources (talent and computing power) required to train high-quality AI models from scratch.
The more AI model builders choose to open their AI models, the more powerful open AI models become, which accelerates AI progress. This can become a powerful flywheel advancing AI. Open-sourcing Grok-1 elevates its significance and impact.
Regardless of motivations, Elon and xAI have provided a valuable gift: a capable AI model and transparent insights into its creation. As Freedom with AI says, “Team Humanity, Embrace Open Source.”