Intel is set to launch two China-exclusive models of its Gaudi 3 AI accelerator, and they’ll be substantially crippled to fit in with US sanctions.
The existence of two models approved for sale in the Middle Kingdom is detailed in Intel’s Gaudi 3 whitepaper. Called the HL-328 and HL-388, the made-for-China processors are in the OAM and PCIe form factors respectively, with the former launching in June and the latter in September, alongside the other PCIe form-factor Gaudi 3.
Overall, the HL-328 and the HL-388 look more or less the same as the other models on paper, with the same 128GB of HBM2e VRAM with 3.7TB/s of bandwidth, 96MB of cache, PCIe 5.0 x16 interface, and supported decoding standards.
The only listed difference is thermal design power (TDP), which is 450 watts for both the OAM and PCIe card models. This is a substantial reduction from the other models: the non-China PCIe HL-338 has a TDP of 600 watts, and the OAM form-factor HL-325L and HL-335 carry 900 watts, so the China parts run at three-quarters and half of those figures respectively. The relatively low TDP on the China Gaudi 3 models is presumably why there’s no liquid-cooled version.
While not explicitly stated in the whitepaper, the changes were almost certainly necessary to comply with the US government’s export controls on processors, which bar American companies from exporting chips to China that exceed certain performance thresholds.
We can’t really know what Intel has done to make Gaudi 3 compliant, or how fast these approved-for-China chips are after the changes, but there are some clues. Since the memory and cache configuration is unchanged, the HL-328 and HL-388 apparently still use two dies, like the other Gaudi 3 variants. Using two dies instead of one helps to reduce performance density, allowing the chip to qualify for the higher export limit of 4,800 total processing performance (TPP).
What that 4,800 TPP limit means is that no chip can have 150 TFLOPS or more of 16-bit performance, and since Gaudi 3 can do up to 1,835 TFLOPS at BF16, Intel would need to cut peak throughput by more than 90 percent. This would have to be accomplished by a truly massive cut in core count and clock speed, or some other performance-limiting method.
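For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It takes the 150-TFLOPS ceiling and the 1,835-TFLOPS BF16 figure quoted above at face value; the function name `required_cut` is our own and purely illustrative, and none of this says anything about how Intel actually throttled the part.

```python
# Figures quoted in the article, taken as given (assumptions, not Intel data
# for the China models).
GAUDI3_BF16_TFLOPS = 1835.0  # full-fat Gaudi 3 peak BF16 throughput
CEILING_TFLOPS = 150.0       # 16-bit ceiling the article derives from 4,800 TPP


def required_cut(full_tflops: float, ceiling_tflops: float) -> float:
    """Return the fraction of peak throughput that must be removed
    for a chip to land at the ceiling (0.0 if it is already under it)."""
    if full_tflops <= 0:
        raise ValueError("full_tflops must be positive")
    return max(0.0, 1.0 - ceiling_tflops / full_tflops)


cut = required_cut(GAUDI3_BF16_TFLOPS, CEILING_TFLOPS)
print(f"Throughput to remove: {cut:.1%}")
print(f"Share of full Gaudi 3 remaining: {1 - cut:.1%}")
```

On those numbers, a compliant part would keep only about 8 percent of the full chip's peak 16-bit throughput.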
We’ve asked Intel for clarification on the China-exclusive Gaudi 3 models but haven’t heard back yet. We’ll update this story if Intel discloses any info.
We can probably expect the HL-328 and HL-388 to perform similarly to Nvidia’s H20, that silicon titan’s fastest GPU that’s approved for sale in China. It has 148 TFLOPS of BF16 and FP16 performance, just under the 150-TFLOPS limit.
Since raw core performance will be more or less equal between the H20 and the China models of Gaudi 3, the main difference will come down to memory, where Intel has more capacity but slightly less bandwidth, and software, which has always been a selling point for Nvidia chips. ®