Tachyum Radically Cuts the Cost of DeepSeek by Quantizing It to 2 Bits

Tachyum® today announced the release of a new white paper detailing how it efficiently scales Large Language Model (LLM) training and inference through the Mixture of Experts (MoE) approach. The company’s method is further improved by a DeepSeekMoE architecture with 4-bit FP4 data types for activation quantization and 2-bit Tachyum AI (TAI2) sparse weight quantization.

The white paper, “Tachyum Successfully Quantized DeepSeek LLM to its 2-bit TAI2,” illustrates how Tachyum integrates MoE with low-bit data formats to unlock scalable AI with unmatched efficiency. The combination allows for the development of more powerful models while significantly lowering resource requirements.

MoEs can match the performance of dense models using approximately 4 times less compute and memory bandwidth, while requiring only about 4 times more memory capacity, and that ratio is expected to continue growing. This architecture benefits from Tachyum’s proprietary high-performance memory, eliminating the need for costly high-bandwidth memory (HBM) solutions. Successfully quantizing the DeepSeek LLM to 2-bit TAI2 further doubles the benefit of the DeepSeekMoE LLM compared to other architectures.
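The compute-versus-capacity trade-off described above can be illustrated with a back-of-the-envelope parameter count. The layer sizes below are hypothetical, chosen only to reproduce the roughly 4x ratios cited; they are not DeepSeek's or Tachyum's actual configuration:

```python
# Back-of-the-envelope comparison of a dense feed-forward layer vs an
# MoE layer. An MoE layer stores many expert FFNs but activates only a
# few per token, so per-token compute/bandwidth drops while stored
# capacity grows. All sizes are hypothetical illustrations.

d_model = 4096           # hidden dimension
d_ffn = 16384            # dense feed-forward width
n_experts = 32           # experts stored in the MoE layer
top_k = 2                # experts activated per token
d_expert = d_ffn // 8    # each expert is a narrower FFN (2048)

dense_params = 2 * d_model * d_ffn               # weights touched per token (dense)
moe_active = top_k * 2 * d_model * d_expert      # weights touched per token (MoE)
moe_stored = n_experts * 2 * d_model * d_expert  # weights held in memory (MoE)

print(f"compute/bandwidth reduction: {dense_params / moe_active:.0f}x")  # 4x
print(f"memory capacity increase:    {moe_stored / dense_params:.0f}x")  # 4x
```

With these (illustrative) numbers, each token touches 4x fewer weights than in the dense layer, while the layer as a whole holds 4x more weights in memory, matching the trade-off the white paper describes.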

Tachyum’s AI researchers applied FP4 activation quantization and 2-bit TAI2 sparse weights quantization to DeepSeekMoE and Llama 3.1 models. Benchmark testing demonstrated up to 25x faster inference speeds and a 20x cost reduction per token, marking a major leap in LLM deployment efficiency.
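To make the idea of low-bit weight quantization concrete, here is a minimal sketch of a generic 2-bit symmetric quantizer in plain Python. This shows only the general technique; TAI2's actual encoding and sparsity scheme are proprietary and not reproduced here:

```python
# Minimal sketch of generic 2-bit symmetric weight quantization.
# Each weight is mapped to one of four integer codes {-2, -1, 0, 1}
# plus a shared float scale per weight group. This is a generic
# illustration, NOT Tachyum's TAI2 format.

def quantize_2bit(weights):
    """Return (codes, scale): 2-bit integer codes and a shared scale."""
    scale = max(abs(w) for w in weights) / 2 or 1.0  # guard against scale == 0
    codes = [max(-2, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from 2-bit codes."""
    return [c * scale for c in codes]

codes, scale = quantize_2bit([0.9, -1.2, 0.1, -0.4])
print(codes, scale)             # four 2-bit codes and one float scale
print(dequantize(codes, scale)) # lossy reconstruction of the weights
```

Storing four levels per weight (2 bits) instead of 16 or 32 bits is where the memory and bandwidth savings come from; combining this with sparsity, as the white paper describes for TAI2, reduces the footprint further.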

“The DeepSeek approach has shown the potential to make next-generation models 10 times more efficient at today’s costs, avoiding the exponential scaling challenges faced by organizations today,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “With the Prodigy platform, we’re enabling this kind of breakthrough efficiency for AI applications at global scale.”

The white paper also emphasizes the critical role of Tachyum’s hardware in facilitating this transformation, showcasing the Prodigy Universal Processor’s ability to support high-efficiency AI workloads with industry-leading performance.

As a Universal Processor offering industry-leading performance for all workloads, Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) with a single homogeneous architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 256 high-performance custom-designed 64-bit compute cores to deliver up to 18x the performance of the highest-performing GPU for AI applications, 3x the performance of the highest-performing x86 processors for cloud workloads, and up to 8x the performance of the highest-performing GPU for HPC.

Those interested in reading the “Tachyum Successfully Quantized DeepSeek LLM to its 2-bit TAI2” white paper can download it at https://www.tachyum.com/resources/whitepapers/2025/06/03/tachyum-successfully-quantized-deepseek-llm-to-its-2-bit-tai2/.

Follow Tachyum

https://x.com/tachyum

https://www.linkedin.com/company/tachyum

https://www.facebook.com/Tachyum/

About Tachyum

Tachyum is transforming the economics of AI, HPC, public and private cloud workloads with Prodigy, the world’s first Universal Processor. Prodigy unifies the functionality of a CPU, a GPU, and a TPU in a single processor to deliver industry-leading performance, cost and power efficiency for both specialty and general-purpose computing. As global data center emissions continue to contribute to a changing climate, with projections of their consuming 10 percent of the world’s electricity by 2030, the ultra-low power Prodigy is positioned to help balance the world’s appetite for computing at a lower environmental cost. Tachyum received a major purchase order from a US company to build a large-scale system that can deliver more than 50 exaflops performance, which will exponentially exceed the computational capabilities of the fastest inference or generative AI supercomputers available anywhere in the world today. Tachyum has offices in the United States, Slovakia and the Czech Republic. For more information, visit https://www.tachyum.com/.
