SAN FRANCISCO, CA, April 16, 2026 /24-7PressRelease/ -- ERNIE Image, a dedicated resource hub for Baidu's newly open-sourced ERNIE-Image text-to-image model, is now live with a free browser-based generator, a full architectural walkthrough, and benchmark comparisons, giving developers and content creators a single place to evaluate the model and plan a deployment.
About the ERNIE Image Model
ERNIE Image is an 8-billion-parameter Diffusion Transformer (DiT) released by the ERNIE-Image Team at Baidu on April 15, 2026 under the Apache 2.0 license. Open weights are available on Hugging Face. The model is built for complex instruction following, long-form text rendering inside generated images, and structured visual outputs including posters, comics, and multi-panel layouts.
It ships in two variants: a quality-focused SFT version running approximately 50 inference steps, and ERNIE Image Turbo, a speed-optimized distilled variant that produces images in just 8 steps, roughly 6x fewer than the SFT version, while maintaining strong benchmark performance.
On GenEval, the compositional text-to-image benchmark, ERNIE Image scores 0.8856 Overall, leading several contemporary open-weight models. On LongTextBench, which measures the legibility of long text rendered inside generated images, it averages 0.9733 across English and Chinese subsets — a defining strength for marketing materials, labeled infographics, and bilingual signage. Both English and Chinese prompts are natively supported through the same encoder pipeline.
What Is Available on the Site
The ERNIE Image site brings together resources that are otherwise scattered across Hugging Face model cards, GitHub repositories, and academic papers:
— A live generator powered by the official ERNIE Image Turbo Hugging Face Space. Users can type English or Chinese prompts and see results directly in the browser — no download, no account, no cost.
— A detailed architecture section covering the DiT backbone (36 layers, 4096 hidden dimension, 32 attention heads), the Prompt Enhancer (Ministral3ForCausalLM, 26 layers), and the multimodal text encoder — all verified against the model's published configuration files.
— Benchmark tables for GenEval, OneIG-Bench (EN and ZH tracks), and LongTextBench, with date stamps and source links so scores can be cross-checked and reproduced.
— A limitations and tradeoffs section documenting the measured effects of the Prompt Enhancer toggle, undisclosed training data details, and benchmark drift considerations.
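The backbone dimensions quoted in the architecture section can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the layer count, hidden dimension, and head count come from this release, while the 4x MLP expansion and the "attention plus MLP" block layout are assumptions borrowed from a generic transformer block, not details confirmed by the model's configuration files.

```python
# Back-of-the-envelope check of the published DiT backbone dimensions.
NUM_LAYERS = 36     # from the release
HIDDEN_DIM = 4096   # from the release
NUM_HEADS = 32      # from the release

# ASSUMPTION: a generic transformer block with a 4x MLP expansion.
# The release does not state the MLP ratio or the exact block layout.
MLP_RATIO = 4


def head_dim(hidden_dim: int, num_heads: int) -> int:
    """Per-head width when the hidden dimension is split evenly across heads."""
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")
    return hidden_dim // num_heads


def approx_block_params(hidden_dim: int, mlp_ratio: int) -> int:
    """Weight count of one assumed block, ignoring biases, norms, and modulation."""
    attention = 4 * hidden_dim * hidden_dim      # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * hidden_dim * hidden_dim  # up and down projections
    return attention + mlp


def approx_backbone_params(num_layers: int, hidden_dim: int, mlp_ratio: int) -> int:
    """Rough backbone total: identical blocks stacked num_layers times."""
    return num_layers * approx_block_params(hidden_dim, mlp_ratio)


if __name__ == "__main__":
    total = approx_backbone_params(NUM_LAYERS, HIDDEN_DIM, MLP_RATIO)
    print("head dimension:", head_dim(HIDDEN_DIM, NUM_HEADS))
    print(f"approximate backbone weights: {total / 1e9:.2f}B")
```

Under these assumptions each attention head is 128 wide and the stacked blocks come to roughly 7.2 billion weights, the same order of magnitude as the stated 8 billion parameters. The estimate leaves out embeddings, normalization, conditioning layers, and any other components, so an exact match is not expected.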
Hardware and Deployment
The ERNIE Image model runs on a single consumer GPU with 24 GB of VRAM. The SFT variant is recommended at guidance scale 4.0 and 50 steps for maximum quality; the Turbo variant operates at guidance scale 1.0 and 8 steps for latency-sensitive applications. Both support Diffusers and SGLang deployment paths, with code examples available on the site.
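The two recommended sampler configurations can be captured in a few lines. This is a minimal sketch rather than the official example: the step counts and guidance scales are taken from this release, while the `generate` helper, the caller-supplied `repo_id`, and the use of the conventional Diffusers keyword arguments `num_inference_steps` and `guidance_scale` are assumptions. The code examples on the site should be treated as the reference for the actual pipeline call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSettings:
    """Sampler settings for one model variant."""
    num_inference_steps: int
    guidance_scale: float


# Values as stated in the release.
SFT = SamplerSettings(num_inference_steps=50, guidance_scale=4.0)
TURBO = SamplerSettings(num_inference_steps=8, guidance_scale=1.0)


def sampler_kwargs(settings: SamplerSettings) -> dict:
    """Convert settings into keyword arguments for a pipeline call."""
    return {
        "num_inference_steps": settings.num_inference_steps,
        "guidance_scale": settings.guidance_scale,
    }


def step_ratio(slow: SamplerSettings, fast: SamplerSettings) -> float:
    """How many times more denoising steps `slow` runs than `fast`."""
    return slow.num_inference_steps / fast.num_inference_steps


def generate(repo_id: str, prompt: str, settings: SamplerSettings):
    """Hypothetical helper, not part of any library.

    ASSUMPTION: the model loads through the generic DiffusionPipeline loader
    and accepts the conventional `prompt`, `num_inference_steps`, and
    `guidance_scale` arguments. `repo_id` must be supplied by the caller;
    the release does not give the Hugging Face repository name.
    """
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    return pipe(prompt=prompt, **sampler_kwargs(settings)).images[0]


if __name__ == "__main__":
    print("SFT:", sampler_kwargs(SFT))
    print("Turbo:", sampler_kwargs(TURBO))
    print("step ratio:", step_ratio(SFT, TURBO))
```

Keeping the settings in one frozen dataclass makes it harder to mix the SFT guidance scale with the Turbo step count by accident. The step ratio of 50 to 8 works out to 6.25, which is where the "roughly 6x" figure for Turbo comes from.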
To try the free ERNIE Image generator or read the full technical breakdown, visit ernie-image.org.
ERNIE Image (ernie-image.org) is the resource hub for the ERNIE Image text-to-image model released by Baidu. The site offers a free browser-based generator, full architectural documentation, and benchmark data sourced from the official model card and peer-reviewed papers.

