Baseten Launches New Inference Products to Accelerate MVPs into Production Applications

Baseten announces first platform expansion powered by the Baseten Inference Stack: APIs for open-source AI models and features for training models to improve inference performance

Baseten, the leader in mission-critical inference, today announced the public launch of Baseten Model APIs and the closed beta of Baseten Training. These new products enable AI teams to seamlessly transition from rapid prototyping to scaling in production, building on Baseten’s proprietary inference stack.

In recent months, new releases of DeepSeek, Llama, and Qwen models have erased the quality gap between open and closed models, and organizations are more incentivized than ever to use open models in their products. Many AI teams, however, have been limited to testing open models at low scale because of the insufficient performance, reliability, and economics offered by model endpoint providers. While these shared model endpoints are easy to get started with, their deficiencies have fundamentally gated enterprises’ ability to convert prototypes into high-functioning products.

Baseten’s new products, Model APIs and Training, solve two critical bottlenecks in the AI lifecycle. Both products are built using Baseten’s Inference Stack and Inference-optimized Infrastructure, which power inference at scale in production for leading AI companies like Writer, Descript, and Abridge. Using Model APIs, developers can instantly access open-source models optimized for maximum inference performance and cost-efficiency to rapidly create production-ready minimum viable products (MVPs) or test new workloads.
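For illustration, the snippet below is a minimal sketch of what calling an open model through Model APIs might look like, assuming the endpoint is OpenAI-compatible; the base URL, model identifier, and environment variable name are placeholders rather than values confirmed by the announcement.

    # Minimal sketch (not from the announcement): assumes Baseten's Model APIs
    # expose an OpenAI-compatible chat-completions endpoint. The base URL,
    # model slug, and env var name are illustrative placeholders.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["BASETEN_API_KEY"],       # API key issued by Baseten (assumed name)
        base_url="https://inference.baseten.co/v1",  # assumed OpenAI-compatible endpoint
    )

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",  # hypothetical open-model identifier
        messages=[{"role": "user", "content": "Summarize this launch in one sentence."}],
    )
    print(response.choices[0].message.content)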

“In the AI market, your number one differentiator is how fast you can move,” said Tuhin Srivastava, co-founder and CEO of Baseten. “Model APIs give developers the speed and confidence to ship AI features knowing that we’ve handled the heavy lifting on performance and scale.” Baseten Model APIs enable AI engineers to test open models with a confident scaling story in place from day one. As inference volume increases, Model APIs customers can easily move to Dedicated Deployments, which provide greater reliability, performance, and economics at scale.

"With Baseten, we now support open-source models like DeepSeek and Llama in Retool, giving users more flexibility for what they can build,” said DJ Zappegos, Engineering Manager at Retool. “Our customers are creating AI apps and workflows, and Baseten's Model APIs deliver the enterprise-grade performance and reliability they need to ship to production."

Customers can also use Baseten’s new Training product to rapidly train and fine-tune models, yielding superior inference performance, quality, and cost-efficiency for their workloads. Unlike traditional training solutions that operate in siloed research environments, Baseten Training runs on the same production-optimized infrastructure that powers its inference. This coherence ensures that models trained or fine-tuned on Baseten will behave consistently in production, with no last-minute refactoring. Together, the latest offerings enable customers to get products to market more rapidly, improve performance and quality, and reduce costs for mission-critical inference workloads.

These launches reinforce Baseten’s belief that product-focused AI teams must care deeply about inference performance, cost, and quality. “Speed, reliability, and cost-efficiency are non-negotiables, and that’s where we devote 100 percent of our focus,” said Amir Haghighat, co-founder and CTO of Baseten. “Our Baseten Inference Stack is purpose-built for production AI because you can’t just have one piece work well. It takes everything working well together, which is why we ensure that each layer of the Inference Stack is optimized to work with the other pieces.”

“Having lifelike text-to-speech requires models to operate with very low latency and very high quality,” said Amu Varma, co-founder of Canopy Labs. “We chose Baseten as our preferred inference provider for Orpheus TTS because we want our customers to have the best performance possible. Baseten’s Inference Stack allows our customers to create voice applications that sound as close to human as possible.”

Teams can start with a quick MVP and seamlessly scale it to a dedicated, production-grade deployment when needed, without changing platforms. An enterprise can prototype a feature on Baseten Cloud, then graduate to its own private clusters or on-prem deployment (via Baseten’s hybrid and self-hosted options) for greater control, performance tuning, and cost optimization, all with the same code and tooling. This “develop once, deploy anywhere” capability directly results from Baseten’s Inference-optimized Infrastructure, which abstracts the complexity of multi-cloud and on-premise orchestration for the user.
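As a sketch of that pattern, the snippet below keeps the application code identical and selects the target endpoint through configuration; the URLs and environment variable names are illustrative assumptions, not documented values, and the only premise is that each deployment exposes the same OpenAI-compatible interface.

    # Illustrative only: the same client code runs against a shared endpoint
    # during prototyping and a dedicated or self-hosted deployment later.
    # URLs and environment variable names are placeholders.
    import os
    from openai import OpenAI

    base_url = os.environ.get(
        "INFERENCE_BASE_URL",
        "https://inference.baseten.co/v1",  # placeholder: shared endpoint for prototyping
    )
    # For production, point INFERENCE_BASE_URL at the dedicated deployment's URL
    # (e.g., a private-cluster endpoint); the application code does not change.
    client = OpenAI(api_key=os.environ["BASETEN_API_KEY"], base_url=base_url)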

The news follows a year of considerable growth for the company. In February, Baseten announced the close of a Series C funding round co-led by IVP and Spark, bringing its total venture capital funding to $135 million. It was also recently named to the Forbes AI 50 2025, a list of the pre-eminent privately held AI companies that featured several companies for which Baseten powers 100 percent of inference, including Writer and Abridge.

About Baseten

Baseten is the leader in infrastructure software for high-scale AI products, offering the industry's most powerful AI inference platform. Committed to delivering exceptional performance, reliability, and cost-efficiency, Baseten is on a mission to help the next great AI products scale. Baseten is backed by top-tier investors including IVP, Spark, Greylock, Conviction, Base Case, and South Park Commons. Learn more at Baseten.co.
