Nvidia inference microservices let developers deploy AI applications in minutes
Jensen Huang, CEO of Nvidia, gave a keynote at the Computex trade show in Taiwan about transforming AI models with Nvidia NIM (Nvidia inference microservices) so that AI applications can be deployed within minutes rather than weeks.
He said the world’s 28 million developers can now download Nvidia NIM — inference microservices that provide models as optimized containers — to deploy on clouds, data centers or workstations. This gives them the ability to easily build generative AI applications for copilots, chatbots and more, in minutes rather than weeks.
These new generative AI applications are becoming increasingly complex and often rely on multiple models with different capabilities for generating text, images, video, speech and more. Nvidia NIM dramatically increases developer productivity by providing a simple, standardized way to add generative AI to applications.
NIM also enables enterprises to maximize their infrastructure investments. For example, running Meta Llama 3-8B in a NIM produces up to three times more generative AI tokens on accelerated infrastructure than without NIM. This lets enterprises boost efficiency and use the same amount of compute infrastructure to generate more responses.
Nearly 200 technology partners — including Cadence, Cloudera, Cohesity, DataStax, NetApp, Scale AI and Synopsys — are integrating NIM into their platforms to speed generative AI deployments for domain-specific applications, such as copilots, code assistants, digital human avatars and more. Hugging Face is now offering NIM — starting with Meta Llama 3.
“Every enterprise is looking to add generative AI to its operations, but not every enterprise has a dedicated team of AI researchers,” said Huang. “Integrated into platforms everywhere, accessible to developers everywhere, running everywhere — Nvidia NIM is helping the technology industry put generative AI in reach for every organization.”
Enterprises can deploy AI applications in production with NIM through the Nvidia AI Enterprise software platform. Starting next month, members of the Nvidia Developer Program can access NIM for free for research, development and testing on their preferred infrastructure.
More than 40 microservices power generative AI models

NIM containers are pre-built to speed model deployment for GPU-accelerated inference and can include Nvidia CUDA software, Nvidia Triton Inference Server and Nvidia TensorRT-LLM software.
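Since each NIM container serves its model behind a standard inference API, a deployed NIM can be queried like any hosted LLM service. The sketch below is a minimal illustration, assuming a Llama 3-8B NIM running locally that exposes an OpenAI-compatible endpoint on port 8000; the base URL, model name and key handling are assumptions for illustration, not documented specifics.

```python
# Minimal sketch: querying a locally running NIM container.
# Assumptions (not from the article): the container exposes an
# OpenAI-compatible API at localhost:8000/v1 and the model is
# addressed as "meta/llama3-8b-instruct".
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",         # local deployments may not check this
)

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # assumed model identifier
    messages=[{"role": "user", "content": "Explain what an inference microservice is."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```

Because the interface mirrors hosted APIs, moving between a local container and a cloud deployment should only require changing the base URL.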
Over 40 Nvidia and community models are available to experience as NIM endpoints on ai.nvidia.com, including Databricks DBRX, Google’s open model Gemma, Meta Llama 3, Microsoft Phi-3, Mistral Large, Mixtral 8x22B and Snowflake Arctic.
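For a sense of what calling one of these hosted endpoints looks like, here is a sketch using plain HTTP. The host name, route, model identifier and the NVIDIA_API_KEY environment variable are assumptions for illustration; consult ai.nvidia.com for the actual endpoint details.

```python
# Minimal sketch: calling a hosted NIM endpoint over HTTP.
# The host, route, model name and API-key variable below are
# assumptions for illustration, not confirmed specifics.
import os
import requests

response = requests.post(
    "https://integrate.api.nvidia.com/v1/chat/completions",  # assumed host/route
    headers={"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"},
    json={
        "model": "meta/llama3-8b-instruct",  # assumed identifier for Meta Llama 3
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```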
Developers can now access Nvidia NIM microservices for Meta Llama 3 models from the Hugging Face AI platform. This lets them deploy and run the Llama 3 NIM in just a few clicks using Hugging Face Inference Endpoints, powered by Nvidia GPUs on their preferred cloud.