New MLPerf Inference Benchmark Results Highlight The Rapid Growth of Generative AI Models

With 70 billion parameters, Llama 2 70B is the largest model added to the MLPerf Inference benchmark suite

Today, MLCommons® announced new results from our industry-standard MLPerf® Inference v4.0 benchmark suite, which measures machine learning (ML) system performance in an architecture-neutral, representative, and reproducible manner.

MLPerf Inference v4.0

The MLPerf Inference benchmark suite, which encompasses both data center and edge systems, is designed to measure how quickly hardware systems can run AI and ML models in a variety of deployment scenarios. In order to keep pace with today’s ever-changing generative AI landscape, the working group created a new task force to determine which of these models should be added to the v4.0 version of the benchmark. This task force analyzed factors including model licenses, ease of use and deployment, and accuracy in their decision-making process.
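
To make the measurement model concrete: each MLPerf Inference benchmark is driven by MLCommons' LoadGen component, which issues queries to a system under test (SUT) under a chosen deployment scenario and records the latencies. The Python sketch below shows the general shape of a LoadGen-driven run; it is illustrative only, the callback signatures follow recent releases of the mlperf_loadgen bindings, and run_model is a hypothetical stand-in for real inference.

    import mlperf_loadgen as lg

    def run_model(sample_index):
        # Hypothetical stand-in for actual inference on one sample.
        return b""

    def issue_queries(query_samples):
        # LoadGen hands us queries; we answer each and report completion.
        responses = []
        for qs in query_samples:
            run_model(qs.index)
            responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
        lg.QuerySamplesComplete(responses)

    def flush_queries():
        pass  # nothing is buffered in this sketch

    # The query sample library (QSL) tells LoadGen how to stage input data.
    def load_samples(indices):
        pass

    def unload_samples(indices):
        pass

    sut = lg.ConstructSUT(issue_queries, flush_queries)
    qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)

    settings = lg.TestSettings()
    settings.scenario = lg.TestScenario.Offline   # e.g. Offline, Server, SingleStream
    settings.mode = lg.TestMode.PerformanceOnly

    lg.StartTest(sut, qsl, settings)
    lg.DestroyQSL(qsl)
    lg.DestroySUT(sut)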

After careful consideration, we opted to add two new benchmarks to the suite. The Llama 2 70B model, with 70 billion parameters, was chosen to represent “larger” LLMs, while Stable Diffusion XL was selected to represent text-to-image generative AI models.

Llama 2 70B is an order of magnitude larger than the GPT-J model introduced in MLPerf Inference v3.1 and correspondingly more accurate. One of the reasons it was selected for inclusion in the MLPerf Inference v4.0 release is that this larger model size requires a different class of hardware than smaller LLMs, which provides a great benchmark for higher-end systems. We are thrilled to collaborate with Meta to bring Llama 2 70B to the MLPerf Inference v4.0 benchmark suite. You can learn more about the selection of Llama 2 in our deep-dive post.
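
A rough back-of-the-envelope calculation (ours, not MLCommons') shows why: at 16-bit precision, the weights of a 70-billion-parameter model alone occupy roughly 140 GB, more than fits in the memory of any single mainstream accelerator, before accounting for the KV cache and activations.

    # Back-of-the-envelope memory footprint for a 70B-parameter model.
    params = 70e9          # 70 billion parameters
    bytes_per_param = 2    # FP16/BF16 precision
    weights_gb = params * bytes_per_param / 1e9
    print(f"~{weights_gb:.0f} GB of weights")  # ~140 GB, excluding KV cache and activations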

"Generative AI use-cases are front and center in our v4.0 submission round,” said Mitchelle Rasquinha, co-chair of the MLPerf Inference working group. “In terms of model parameters, Llama 2 is a dramatic increase to the models in the inference suite. Dedicated task-forces worked around the clock to set up the benchmarks and both models received competitive submissions. Congratulations to all!"

For the second new generative AI benchmark, the task force chose Stability AI’s Stable Diffusion XL, with 2.6 billion parameters. This popular model is used to create compelling images through a text-based prompt. By generating a high number of images, the benchmark is able to calculate metrics such as latency and throughput to understand overall performance.
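
As a simplified illustration of how such metrics are derived (this is not the actual MLPerf harness, and generate_image is a hypothetical stand-in for a Stable Diffusion XL pipeline call): time each generation, then aggregate throughput and latency percentiles.

    import statistics
    import time

    def generate_image(prompt):
        # Hypothetical stand-in for a text-to-image pipeline call.
        time.sleep(0.01)

    prompts = ["a lighthouse at dawn, oil painting"] * 100
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        generate_image(p)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    print(f"throughput: {len(prompts) / elapsed:.2f} images/s")
    print(f"p50 latency: {statistics.median(latencies) * 1000:.1f} ms")
    print(f"p99 latency: {sorted(latencies)[int(0.99 * len(latencies)) - 1] * 1000:.1f} ms")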

“The v4.0 release of MLPerf Inference represents a full embrace of generative AI within the benchmark suite,” said Miro Hodak, MLPerf Inference co-chair. “A full third of the benchmarks are generative AI workloads, including a small and a large LLM and a text-to-image generator, ensuring that the MLPerf Inference benchmark suite captures the current state of the art.”

MLPerf Inference v4.0 includes more than 8,500 performance results and 900 power results from 23 submitting organizations: ASUSTeK, Azure, Broadcom, Cisco, CTuning, Dell, Fujitsu, Giga Computing, Google, Hewlett Packard Enterprise, Intel, Intel Habana Labs, Juniper Networks, Krai, Lenovo, NVIDIA, Oracle, Qualcomm Technologies, Inc., Quanta Cloud Technology, Red Hat, Supermicro, SiMa, and Wiwynn.

Four firms (Dell, Fujitsu, NVIDIA, and Qualcomm Technologies, Inc.) submitted data center-focused power numbers for MLPerf Inference v4.0. The power tests require power-consumption measurements to be captured while the MLPerf Inference benchmarks are running, and the resulting figures indicate the power-efficient performance of the systems tested. The latest submissions demonstrate continued progress in efficient AI acceleration.
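
One common way to read power results is performance per watt: pairing a system's measured throughput with its average power draw over the same run. A hedged illustration with hypothetical numbers:

    # Illustrative only: deriving an efficiency figure from hypothetical measurements.
    queries_per_second = 1200.0  # measured throughput during the benchmark run
    avg_power_watts = 800.0      # average system power over the same interval
    print(f"{queries_per_second / avg_power_watts:.2f} queries/s per watt")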

“Submitting to MLPerf is quite challenging and a real accomplishment,” said David Kanter, executive director of MLCommons. “Due to the complex nature of machine-learning workloads, each submitter must ensure that both their hardware and software stacks are capable, stable, and performant for running these types of ML workloads. This is a considerable undertaking. In that spirit, we celebrate the hard work and dedication of Juniper Networks, Red Hat, and Wiwynn, who are all first-time submitters for the MLPerf Inference benchmark.”

View the Results

To view the results for MLPerf Inference v4.0, visit the Datacenter and Edge results pages.

About MLCommons

MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.

For additional information on MLCommons and details on becoming a member or affiliate, please visit MLCommons.org or contact participation@mlcommons.org.

"Generative AI use-cases are front and center in our v4.0 submission round”

Contacts

Stock Quote API & Stock News API supplied by www.cloudquote.io
Quotes delayed at least 20 minutes.
By accessing this page, you agree to the following
Privacy Policy and Terms Of Service.


 

IntelligentValue Home
Close Window
