MLPerf Inference v3.1: Introducing New LLM and Recommendation Benchmarks
MLPerf Inference, the benchmark suite that measures how quickly systems can run trained AI models, has released its latest version, v3.1. This update introduces new benchmarks for Large Language Models (LLMs) and recommendation systems.
Record Participation and Improved Performance
MLPerf Inference v3.1 saw record participation, with more than 13,500 performance results submitted. Compared with the previous round, some results also show up to a 40 percent improvement in performance.
Diverse Pool of Submitters
MLPerf Inference v3.1 also stands out for its diverse pool of submitters. A total of 26 organizations participated, including major technology companies such as Google, Intel, and NVIDIA, alongside first-time submitters Connect Tech, Nutanix, Oracle, and TTA.
New LLM Benchmark
The headline addition in MLPerf Inference v3.1 is a benchmark focused on Large Language Models (LLMs). It uses the GPT-J reference model to summarize CNN news articles and drew submissions from 15 participants, reflecting the rapid adoption of generative AI.
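To make the task concrete, the sketch below shows what GPT-J-based news summarization looks like in practice. It is a minimal illustration, assuming the Hugging Face transformers library and the public EleutherAI GPT-J 6B checkpoint; it is not the official MLPerf reference implementation, which drives the model through the LoadGen harness and the benchmark's prescribed dataset and settings.

```python
# Minimal sketch of GPT-J news summarization (not the MLPerf reference code).
# Assumes the Hugging Face `transformers` library and the public GPT-J 6B
# checkpoint; the model ID below is the publicly hosted one.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-j-6b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

article = "(full text of a CNN news article goes here)"

# GPT-J is a decoder-only model, so summarization is phrased as text
# generation with an instruction-style prompt.
prompt = f"Summarize the following news article:\n\n{article}\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,  # cap the summary length
    num_beams=4,         # beam search is a common choice for summarization
    early_stopping=True,
)

# Decode only the newly generated tokens, i.e. drop the prompt prefix.
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(summary)
```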
Updated Recommendation Benchmark
The v3.1 update also introduces an updated recommender benchmark that aligns more closely with industry practice. It uses the DLRM-DCNv2 reference model and larger datasets, and it attracted nine submissions, helping keep the industry-standard benchmark in step with how recommenders are actually deployed today.
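The "DCNv2" in the model name refers to the cross-network interaction layer from Deep & Cross Network v2. The snippet below is an illustrative PyTorch sketch of that core interaction only; the layer dimensions and stacking depth are made up for illustration, and the actual MLPerf reference implementation additionally involves large embedding tables and the benchmark's multi-hot dataset.

```python
# Illustrative sketch of the DCNv2 "cross" interaction layer in plain PyTorch.
# Not the MLPerf reference implementation; dimensions are arbitrary examples.
import torch
import torch.nn as nn


class CrossLayerV2(nn.Module):
    """One DCNv2 cross layer: x_{l+1} = x_0 * (W x_l + b) + x_l."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # learned per-layer weight and bias

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # Element-wise product with the original input x0 models explicit
        # feature crosses; the residual term preserves lower-order signals.
        return x0 * self.linear(xl) + xl


# Usage: stack a few cross layers over the concatenated dense features and
# embedding outputs (here replaced by random tensors for demonstration).
batch, dim = 8, 128
x0 = torch.randn(batch, dim)
x = x0
for layer in [CrossLayerV2(dim) for _ in range(3)]:
    x = layer(x0, x)
print(x.shape)  # torch.Size([8, 128])
```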
Importance and Impact of MLPerf Inference
MLPerf Inference plays a critical role in evaluating the performance of AI systems in various deployment scenarios. These scenarios range from generative AI chatbots to safety-enhancing features in vehicles. The latest benchmarks in v3.1 are designed to provide valuable guidance to customers, vendors, and researchers in selecting the best AI accelerators for their respective domains.
Wide Coverage and Categories
MLPerf Inference v3.1 covers a wide range of domains, including computer vision, recommender systems, and language processing. The suite accepts submissions in both closed and open divisions across the performance, power, and networking categories. Closed-division submissions must use the same reference model, enabling apples-to-apples comparisons across systems, while the open division lets participants submit modified or alternative models.
Continued Relevance of MLPerf Inference
As AI continues to advance and integrate into different aspects of daily life, MLPerf’s benchmarks are crucial in evaluating AI technologies and shaping their future development. They provide a standardized and objective measure of performance, helping both industry players and researchers drive innovation.
The detailed MLPerf Inference v3.1 results are available on the MLCommons website.