TL;DR - Already the pickaxe-seller for AI model builders, NVIDIA aims to be the "TSMC for custom large language models" with their cloud-based foundry for AI Foundation Models.
Akin to the pickaxe sellers in the Gold Rush, NVIDIA has been at the heart of the AI revolution, supplying the high-performance GPUs that provide the massive compute power to build large Foundation Models like GPT-4. GPT-4 itself was trained over a period of several weeks on 10,000 NVIDIA GPUs.
In his NVIDIA 2023 GTC keynote, CEO Jensen Huang touted NVIDIA's record of delivering "accelerated computing" to supply the compute for AI, cited the 500X increase in GPU performance on AI-related tasks over the past decade, and declared that now is the "iPhone moment for AI."
AI is front and center for NVIDIA as they aim to "bring AI to every industry," placing a large emphasis on AI enablement with announcements such as:
Their next-gen Hopper-based H100 NVL with dual-GPU NVLink, claiming a 10X improvement over the A100.
The Omniverse robotic simulation environment, used to accelerate robot training and applications in transportation, manufacturing and other industrial and consumer markets. NVIDIA also announced Omniverse Cloud, a fully managed cloud service for Omniverse hosted on Azure.
DGX Cloud (DGX Supercloud) – an AI supercomputing service that pairs NVIDIA DGX™ AI supercomputing with NVIDIA AI software for training advanced AI models.
And one more important one for Foundation AI Models …
NVIDIA's AI Model Foundries: NeMo, Picasso and BioNeMo
For users of foundation AI models building AI-enabled applications, the most interesting GTC 2023 announcement from NVIDIA was the expansion of their "Foundation Model as a Service" offering. NVIDIA announced the launch of Picasso to support generative AI models, augmenting their lineup of customizable Foundation Models alongside the LLMs (in NeMo) and biomedical Foundation Models (in BioNeMo) they announced support for last year.
The three services - NeMo, Picasso and BioNeMo - provide cloud-based support for building AI models on top of trained Foundation Models, customizing those models (e.g., for enterprise or vertical use cases), and deploying them, with users able to serve their customized AI models on the same cloud infrastructure.
The specific AI models supported are:
NeMo - Large Language Models: GPT-8, GPT-43, GPT-530 (8, 43, and 530 billion parameters, respectively), Inform, and BLOOMZ-T0, an open-source, community-built model covering some 100 languages.
Picasso - Generative AI Models for text-to-image, text-to-video, text-to-3D: Edify Foundation Models. NVIDIA has partnered with Adobe, Shutterstock, and Getty Images to provide training assets and create new text-to-image and text-to-video models within this service.
BioNeMo - Generative AI models for drug discovery and biology: AlphaFold 2, ESMFold, and OpenFold for 3D protein structure prediction; ESM-1nv and ESM-2 for protein property predictions; ProtGPT2 for protein generation; MegaMolBART and MoFlow for small molecule generation; DiffDock for predicting the binding structure of a small molecule to a protein.
This offering shows NVIDIA going beyond pure hardware provision to become an enterprise cloud AI environment provider, offering a "Platform-as-a-Service" for custom AI model builders. A key word here is custom. They are not competing with those building models and offering them via API endpoints or embedded in applications. Rather, NVIDIA is smartly building this PaaS environment on their own DGX computing clusters, ensuring the hardware stack for large AI model development and deployment rests on NVIDIA chips.
Conclusion - The Road Ahead
NVIDIA positions itself vs CPUs as "accelerated computing" and proclaims that "Moore's Law is coming to an end." With transistor features now a mere 8 to 10 atoms wide, they are correct about the physics of chip scaling. It will end soon ... but not yet. NVIDIA's Hopper GPU was built using a cutting-edge TSMC 4 nm process. TSMC, Intel and Samsung all have roadmaps to continue scaling for the next 5 years, all the way down to 1.4 nm, or 14 Angstroms. See this chip fab roadmap:
NVIDIA claims that GPU-based supercomputers are ten times more power efficient than CPU-based supercomputers, mainly due to the higher power efficiency of many smaller compute engines running in parallel versus larger, faster CPUs. AI use cases admit massive parallelism, and parallel computing frameworks for AI like ColossalAI are making AI model training more efficiently parallelizable.
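The data-parallel idea at the heart of such frameworks can be sketched in a few lines of plain Python. This is a toy simulation, not any framework's actual API (the model, shard count, and learning rate are illustrative assumptions): each simulated worker computes gradients on its own shard of the batch, and an all-reduce step averages them into a single synchronized update, matching what one device would compute on the full batch.

```python
# Toy sketch of synchronous data-parallel training, the core idea behind
# frameworks like ColossalAI. Each "worker" computes gradients on its shard
# of the batch; averaging them (an all-reduce) yields the full-batch update.

def grad_mse(w, shard):
    # Gradient of mean squared error for y ≈ w * x on one data shard.
    n = len(shard)
    return sum(2 * (w * x - y) * x for x, y in shard) / n

def all_reduce_mean(grads):
    # Stand-in for the collective communication step (e.g. an all-reduce).
    return sum(grads) / len(grads)

# Full batch, split evenly across 4 simulated workers.
data = [(x, 3.0 * x) for x in range(1, 9)]  # ground truth: w = 3
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):                                  # synchronous SGD steps
    local_grads = [grad_mse(w, s) for s in shards]    # parallel in reality
    w -= 0.01 * all_reduce_mean(local_grads)          # same update on all workers

print(round(w, 3))  # prints 3.0
```

Because the shards are equal-sized, the averaged shard gradients equal the full-batch gradient exactly, which is why the workers stay in lockstep; real frameworks add model and pipeline parallelism on top of this.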
With several generations of chip scaling ahead for the rest of the decade, NVIDIA can deliver chip offerings that greatly increase the number of GPUs per chip and per system. Combining these with more parallelism in AI model training will enable orders-of-magnitude increases in compute performance and efficiency for building tomorrow's AI models. We could foresee a 1000-fold improvement in the speed of training a large AI model, or conversely we might scale up AI training compute effort 1000-fold, perhaps in the next 2 to 4 years.
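As a back-of-the-envelope sketch of how such a 1000-fold figure could decompose, the numbers below are purely illustrative assumptions (not NVIDIA projections): a generational per-GPU gain compounds with a much larger GPU count, discounted by imperfect parallel-scaling efficiency.

```python
# Back-of-the-envelope sketch of the 1000-fold scenario.
# Every figure here is an illustrative assumption, not an NVIDIA number.

per_gpu_gain = 10.0        # assumed per-GPU speedup from a new chip generation
cluster_scale = 125.0      # assumed increase in number of GPUs applied to training
parallel_efficiency = 0.8  # assumed fraction of that scale-out actually realized

total_speedup = per_gpu_gain * cluster_scale * parallel_efficiency
print(f"~{total_speedup:.0f}x training speedup (or 1000x more training compute)")
```

The point of the decomposition is that no single factor needs to deliver 1000x: hardware gains, larger clusters, and parallelism improvements multiply together.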
The demand for ever more compute to make ever larger and better LLMs and other Foundation Models will continue. NVIDIA has a roadmap to meet that demand and keep NVIDIA growing for a long time.
PS. As I finish this article, I hear the news that Gordon Moore, the visionary engineer and co-founder of Intel who conceived of Moore's Law, has just passed away at age 94. RIP to a legend who helped build the technologies that got us this far. We ride on the shoulders of giants.