Is AMD narrowing the AI gap on Nvidia?

2023-07-05

AMD-built artificial intelligence chips are “almost” as fast as the industry-leading devices from Nvidia, according to a new study by Databricks-owned AI software company MosaicML, which found AMD’s technology achieved 80% of Nvidia’s performance when training large language models and performing other AI-intensive tasks.

MosaicML put the AMD MI250 against the Nvidia A100 and had both train different-sized large language models (Photo: Jimmy Tudeschi / Shutterstock)

Nvidia currently dominates the market when it comes to training AI models such as those used to run ChatGPT or Midjourney. The success of these products and the demand for compute power have pushed Nvidia to a $1trn valuation and sparked a shortage of GPUs.

MosaicML recently put AMD’s MI250 GPUs to the test against Nvidia’s A100s. Both devices, which sit one generation behind their respective makers’ top-of-the-range chips, were used to train large language models. The researchers found that both the AMD and Nvidia chips worked “out of the box” when training the models, with the AMD hardware delivering about 80% of the Nvidia performance.

The team trained models ranging from one billion to 13 billion parameters, similar to those used in enterprises to provide AI-driven search and summarisation of large company datasets. The models were trained on a single node of four GPUs, and the researchers found the throughput of the MI250 was within 80% of that of the A100. The MI250 had a slight edge in floating-point operations per second and memory, which according to MosaicML allows larger models to fit on each GPU.
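
Comparisons of this kind are typically expressed as training throughput, i.e. tokens processed per second on each device. The sketch below shows one minimal way such a figure might be measured with PyTorch; the tiny transformer configuration, batch size and step count are illustrative assumptions and not MosaicML’s actual benchmark harness.

```python
# Minimal sketch of measuring training throughput in tokens/sec with PyTorch.
# The model size, batch size and step count here are illustrative assumptions.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm builds of PyTorch also report "cuda"

vocab_size, d_model, seq_len, batch_size, steps = 32_000, 512, 512, 8, 20
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
).to(device)
embed = nn.Embedding(vocab_size, d_model).to(device)
head = nn.Linear(d_model, vocab_size).to(device)
params = list(encoder.parameters()) + list(embed.parameters()) + list(head.parameters())
opt = torch.optim.AdamW(params)
loss_fn = nn.CrossEntropyLoss()

if device == "cuda":
    torch.cuda.synchronize()  # start timing from an idle GPU
tokens_seen = 0
start = time.time()
for _ in range(steps):
    # Random token IDs stand in for a real tokenised dataset.
    x = torch.randint(0, vocab_size, (batch_size, seq_len), device=device)
    y = torch.randint(0, vocab_size, (batch_size, seq_len), device=device)
    logits = head(encoder(embed(x)))
    loss = loss_fn(logits.view(-1, vocab_size), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    tokens_seen += batch_size * seq_len

if device == "cuda":
    torch.cuda.synchronize()  # wait for all queued GPU work before stopping the clock
elapsed = time.time() - start
print(f"throughput: {tokens_seen / elapsed:,.0f} tokens/sec")
```

Running the same script on two different GPUs and dividing the two throughput figures gives the kind of relative number MosaicML reports, although its published results come from much larger models and its own training stack.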

The company plans to profile larger models on larger clusters of GPUs to confirm whether the AMD systems can perform at scale, and is doing so in partnership with hyperscalers. There are also plans to create inference benchmarks and to run other models, such as diffusion models, on both systems to test a wider range of options.

While the chips weren’t the top-tier products from each company, both are widely used in datacentres and in training AI models. MosaicML says new ML training hardware is necessary to “increase compute availability amid the Nvidia supply crunch”.

AMD driven by software

MosaicML says the AMD performance was related to a new version of the vendor’s software, released last year, which interacts with the open-source AI framework PyTorch. Hanlin Tang, MosaicML’s CTO, says further software updates from AMD will allow the MI250 to match the performance of the Nvidia A100 by the end of the year.

He said that AMD had done particularly well on software, allowing it to catch up to Nvidia despite differences in hardware performance. Tang says it’s possible to switch to AMD without changing code bases or rewriting the large language model, adding that he believes “they’re essentially interchangeable”.
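
That interchangeability claim rests largely on the fact that PyTorch’s ROCm builds expose AMD GPUs through the same torch.cuda interface used for Nvidia hardware, so device-selection code does not change. The snippet below is a generic illustration of that device-agnostic pattern under that assumption, not MosaicML’s code; the single linear layer is a placeholder for a real model.

```python
# Generic illustration of vendor-agnostic device selection in PyTorch.
# On ROCm builds of PyTorch, AMD GPUs are surfaced through torch.cuda,
# so the same script runs unchanged on Nvidia or AMD hardware.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")  # True on both CUDA (Nvidia) and ROCm (AMD) builds
    backend = "ROCm/HIP" if torch.version.hip is not None else "CUDA"
    print(f"GPU: {torch.cuda.get_device_name(0)} via {backend}")
else:
    device = torch.device("cpu")
    print("No GPU found, falling back to CPU")

# The rest of a training script is written against `device` and stays the same
# whichever vendor's hardware is underneath.
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(32, 1024, device=device)
print(model(x).shape)
```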

Tang said AMD did not pay MosaicML to conduct the research. His company produces software designed to make it easier for enterprises to create and train AI models in-house rather than rely on tools from OpenAI or other large AI labs. He said the research was intended to show that there are choices beyond Nvidia.

“Overall, we are incredibly optimistic about the future market for AI training hardware,” he said. “More good options means more compute supply, more market pressure on prices, and ultimately lower costs for users who want to train their own models.”

Databricks revealed last week that it had paid $1.3bn for MosaicML as part of a wider effort to build an ecosystem of enterprise-ready open-source AI models. Both companies produce tools that make AI algorithms smaller and cheaper to run on large datasets, and the MosaicML software will be used to enhance Databricks’ offering.

The report comes as Intel last week announced its long-term plan to compete on AI chips from 2025, shifting its strategy to focus on building products that go up against hardware from Nvidia and AMD.

Last week Intel announced that its Falcon Shores chip will have 288GB of memory and support 8-bit floating-point computation, which is important for training AI models. Intel also claims its Ponte Vecchio AI chip outperforms the Nvidia H100. Ponte Vecchio has faced delays but will be at the core of the latest supercomputer from the Argonne National Laboratory, with shipments due to be completed this year.

Read more: France wants to become Europe’s capital for AI
