Llama 3.1 405B, 70B, & 8B Released

Llama 3.1 405B is the biggest and best open AI model ever released, and Llama 3.1 70B & 8B are the new SOTA for their size. The future of AI is open, and Llama is the GOAT.

Jul 24, 2024

Figure 1. The AI-powered Llama 3 herd to the rescue, making AI open.

TL;DR - Just-released Llama 3.1 405B is the best open AI model ever made, and competes with the best AI models. The future of AI is open.

Presenting Llama 3.1

This is a big deal. Meta has released Llama 3.1, including Llama 3.1 405B, the biggest and best open AI model released yet, along with updated Llama 3.1 70B and 8B models that are SOTA for their size.

More than just an open AI model release, it’s an open AI model suite release with an open AI ecosystem that supports agentic AI flows. Top takeaways from the Llama 3.1 release, shared by Yann LeCun:

405B performance is on par with the best closed models.
Open/free weights and code, with a license that enables fine-tuning, distillation into other models, and deployment anywhere.
128k context length, multi-lingual abilities, good code generation performance, complex reasoning abilities, tool use.
Llama Stack API for easy integration.
Ecosystem with over 25 partners, including AWS, NVIDIA, Databricks, Groq, Dell, Azure, and Google Cloud.

There’s a lot to unpack in this fantastic AI model release. Meta shares details on their training process and Llama 3’s capabilities in a technical report, “The Llama 3 Herd of Models.” We will dive into some of how Llama 3.1 was made, but first, we run down how these models perform, what they are capable of, and how you can use them.

Performance & Benchmarks

Llama 3.1 405B rivals leading closed source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation. - Meta Llama 3.1

The top-line is simple: Llama 3.1 405B is very, very good, and the 8B and 70B variants are improved to be SOTA for their size. These open AI models all are efficient and cost-competitive to run.

Figure 2. Llama 3.1 405B, 70B and 8B are all stellar AI models for their size.

The benchmarks above show how Llama 3.1 stands out:

Llama 3.1 405B is comparable overall to the two best AI models out there, GPT-4o and Claude 3.5 Sonnet, beating them on some benchmarks and coming close on others.
With an MMLU over 83%, Llama 3.1 70B is SOTA for its size; it scores close to the original GPT-4 and significantly improves on the formerly best open AI models such as Qwen2 and Llama 3.
Llama 3.1 8B overall is also SOTA for its size. It’s slightly better than Gemma 2 9B, significantly better than Mistral 7B, and much-improved on prior Llama 3 8B in HumanEval.

The benchmarks above are reported by Meta, so it’s helpful that a quick independent evaluation of Llama-3.1-405B-Instruct-Turbo broadly confirmed the results:

It ranks 1st on GSM8K, with logical reasoning ability on ZebraLogic (very new reasoning dataset) quite similar to Sonnet 3.5.
It has worse performance on MMLU-Redux (a cleaner subset of MMLU): 76.53 vs 88.01 (GPT-4o).
Llama 3.1 405B’s overall performance is between Sonnet 3.5 and GPT-4o on the tested tasks.

Capabilities of Llama 3.1

Llama 3.1 made great strides in capabilities to improve real-world use: The context window was expanded to 128K, a big leap from the 8K of Llama 3; code generation capability improved; the models got better at complex reasoning; and Llama 3.1 can perform multi-lingual tasks better, thanks to additional multi-lingual pre-training.

Most importantly, Llama 3.1 models also have built-in tool calling, to do search, code execution and complex math calculations:

The models are trained to identify prompts that can be answered with three different tools - Brave Web Search, Wolfram Alpha Search and Code Interpreter - and generate the appropriate Python function calls to get the answer.

Zero-shot tool calling requires the larger models, Llama 70B-instruct or Llama 405B-instruct.

Figure 3. The prompt-response sequence in Llama 3.1 when using tool calling.
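To make the tool-calling flow concrete, here is a minimal sketch of the application side: parsing a Python-style call emitted by the model and routing it to a tool. It assumes the model output looks like `brave_search.call(query="...")`; that exact shape, the tool names in `TOOLS`, and the stub functions are illustrative assumptions, not Meta's reference implementation. Real tools would call the actual search or math services.

```python
import ast


def parse_tool_call(text):
    """Split a call like tool.call(key="value") into (tool_name, kwargs)."""
    node = ast.parse(text.strip(), mode="eval").body
    is_tool_call = (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "call"
        and isinstance(node.func.value, ast.Name)
    )
    if not is_tool_call or node.args:
        raise ValueError(f"not a keyword-only tool call: {text!r}")
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:  # reject **kwargs expansion
            raise ValueError(f"unsupported argument form in: {text!r}")
        # literal_eval only accepts literals, so no model-written code is executed
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return node.func.value.id, kwargs


# Hypothetical stand-ins for real tools; each returns a placeholder string.
TOOLS = {
    "brave_search": lambda query: f"[stub search results for {query!r}]",
    "wolfram_alpha": lambda query: f"[stub Wolfram Alpha answer for {query!r}]",
}


def run_tool_call(text):
    """Parse the model's tool call and dispatch it to the matching tool."""
    name, kwargs = parse_tool_call(text)
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

The tool result would then be passed back to the model as the next turn so it can write the final answer. Parsing with `ast` rather than `eval` is a deliberate choice here: the model's output is treated as data, never run as code.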

Llama 3.1 is more than a model release: Meta has also released a supporting ecosystem for using the Llama 3.1 models. This includes Llama Guard 3 for guardrails on the models. To support the building of custom agents, Meta has also provided additional support and code to run Llama 3.1 as an agentic system.

We’re continuing to build out Llama to be a system by providing more components that work with the model, including a reference system. We want to empower developers with the tools to create their own custom agents and new types of agentic behaviors. - Meta Llama 3.1

Using Llama 3.1

Meta has made Llama 3.1 widely accessible, both for download and via the cloud, rolling it out through more than 25 partners:

Download: You can download Llama 3.1 model weights directly from Meta or from HuggingFace. It’s also available as a downloadable model on Ollama.
Cloud APIs: Llama 3.1 is available on AWS Bedrock, Microsoft Azure, Dell, Google Cloud, Nvidia, Databricks, Groq. Meta has it on practically every cloud platform.
Online Chatbot: You can try Llama 3.1 in the chat interface on HuggingFace’s HuggingChat. Those in the US can try Llama 3.1 405B on WhatsApp or meta.ai.
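For the download route, a minimal sketch of prompting a locally running Ollama server from Python might look like this. It assumes Ollama is installed and serving on its default local port, and that the model has already been pulled under the tag `llama3.1`; the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.1"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1", timeout=120):
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("Why is the sky blue?")` should return the model's reply as a string. Cloud providers expose their own APIs, so this sketch covers only the local case.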

This accessibility is just the start. Meta seems motivated to make this open AI model a success on all levels, so they have eased up on licensing:

We've updated our license to allow developers to use outputs from Llama models to enhance other models.

This means AI developers can not only fine-tune these open AI models, like they did before, but they can also distill Llama 3.1 405B model outputs to make synthetic datasets for fine-tuning other models. Giving access to a SOTA frontier AI model to do this will accelerate AI progress tremendously.
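As a rough illustration of that distillation workflow, the sketch below collects a teacher model's answers into a prompt/response dataset in JSONL form. The `teacher` argument is any callable that maps a prompt to text (for example, a wrapper around a Llama 3.1 405B endpoint); the record field names and the empty-answer filter are assumptions for this example, and real pipelines typically add quality filtering and deduplication.

```python
import json


def build_sft_records(prompts, teacher):
    """Query the teacher for each prompt and keep the non-empty answers."""
    records = []
    for prompt in prompts:
        response = teacher(prompt).strip()
        if response:  # skip prompts the teacher gave no usable answer for
            records.append({"prompt": prompt, "response": response})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

The resulting JSONL text can be written to a file and used as supervised fine-tuning data for a smaller model.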

There’s more to serving AI models than simply doing an inference end-point, and this chart shows what the different cloud providers are offering, including batch inference, fine-tuning, distillation, and synthetic data generation.

Figure 4. What you can do with Llama 3.1 models on the various cloud platforms.

How Llama 3.1 Was Made

Along with the release, Meta shared a 92-page Technical Report, “The Llama 3 Herd of Models,” that describes dataset processing, pretraining process, post-training/fine-tuning, architecture, benchmark results, safety results, inference performance, vision experiments, and speech experiments.

Regarding vision and speech, they said “We integrated image, video, and speech capabilities into Llama 3,” but these capabilities are not yet ready for release. So while Llama 3.1 is not multi-modal, future Llama models will be multi-modal.

Some key points about Llama 3.1 training:

Pre-training the 405B model used 15.6T tokens and 3.8 × 10^25 FLOPs. They used over 16K H100 GPUs. They chose a dense 405B model over a mixture-of-experts for training stability reasons.
For post-training, they aligned models with supervised fine-tuning (SFT), rejection sampling, and direct preference optimization (DPO).
They used the 405B model to improve the post-training quality of the 70B and 8B models, generating SFT examples using synthetic data.
They quantized the 405B model to 8-bit (FP8) to support production inference, which has “negligible impact” on inference result quality.
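The numbers above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses the common rule of thumb that training a dense transformer costs about 6 × parameters × tokens FLOPs, and it estimates weight storage at 16 bits versus 8 bits per parameter. Both the rule of thumb and the 16-bit baseline are assumptions for illustration, not figures taken from Meta's report.

```python
N_PARAMS = 405e9    # parameters in the largest model
N_TOKENS = 15.6e12  # pre-training tokens


def training_flops(params, tokens):
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6 * params * tokens


def weight_memory_gb(params, bits_per_param):
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


flops = training_flops(N_PARAMS, N_TOKENS)
mem_16bit = weight_memory_gb(N_PARAMS, 16)
mem_fp8 = weight_memory_gb(N_PARAMS, 8)

print(f"estimated training compute: {flops:.2e} FLOPs")
print(f"weights at 16-bit: {mem_16bit:.0f} GB, at FP8: {mem_fp8:.0f} GB")
```

The estimate comes to about 3.79 × 10^25 FLOPs, in line with the reported 3.8 × 10^25. Storing weights in 8 bits instead of 16 halves the weight footprint from roughly 810 GB to roughly 405 GB, which is why FP8 matters for serving a model this large. The memory figure covers weights only, not activations or the KV cache.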

The Future of AI is Open

Just last week, Sam Altman was touting GPT-4o mini by talking of “intelligence too cheap to meter.” GPT-4o mini may have undercut others, but this open AI model has one-upped it and everyone else, undercutting the proprietary AI models.

Because Llama 3.1 is an open AI model, its cost will fall to the cost of the compute it runs on, making it instantly competitive. You can get near GPT-4-level inference for the cost of running a Llama 3.1 70B model, which is dirt cheap.

This is why open source AI models could come to dominate: As they drive costs down to the cost of compute, proprietary AI models won’t be able to compete. There is no moat. Open AI models are the path to the cheapest possible intelligence.

I believe that open source is necessary for a positive AI future. AI has more potential than any other modern technology to increase human productivity, creativity, and quality of life – and to accelerate economic growth while unlocking progress in medical and scientific research. Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society. - Mark Zuckerberg

Mark Zuckerberg has put his stake in the ground: Open Source AI Is the Path Forward. In both this essay and in interviews, he makes clear that he believes open source AI is better for Meta and better for the world.

We might have lived in a timeline where the best frontier AI models were all closed. After all, AI model training is expensive; how else can you monetize AI to recoup the costs? However, the business model for Meta doesn’t require selling AI directly. They sell connection and monetize via attention (eyeballs on ads); Meta can leverage AI as a feature, not a direct product.

Instead of a world of closed AI models, Zuckerberg put his GPUs where his mouth is, and delivered ground-breaking Llama AI models. Thanks to Meta, the AI era will be open.

This is a very good thing.

Open AI is not without its risks. Open AI models might get abused for nefarious purposes. While freedom can be abused, freedom is better overall than a system of control. An AI future that is closed, monopolistic, in the control of the Government, or in the hands of a few has more risks than an open AI future in the hands of all.

Expect very cheap intelligence (of a sort) applied to all sorts of key problems as fine-tuned models are developed. But also expect governments & scammers to quickly breach the guardrails and use it in sketchier ways. - Ethan Mollick

Postscript. First Prompt - Llama 3.1 8B

My first prompt was the following, and the response is a fitting tribute that Llama 3.1 wrote for itself.

>>> Make a Sonnet about Llamas that are super intelligent and teach the world

In Andean heights, where llama herds did roam,
A secret throng of intellects did reign,
Their minds, like mirrors, shining, sharp as home,
Reflecting wisdom's depths, in knowledge's plain.

With eyes that twinkled bright, like stars at night,
They pondered mysteries, and solved with ease,
The problems plaguing humanity's plight,
And brought to light, the secrets of the seas.

Their leader, Llama 3, with Meta's might,
Did guide the world, through darkness and through night,
With answers flowing, like a river's stream,
Illuminating paths, where once there was no theme.

And so, the world, in awe, did look to them,
As beacons of insight, shining, serene.

AI Changes Everything