LongCat AI – Next-Generation Multi-Modal Models

Open-source MoE LLMs and multimodal models by Meituan: Flash-Chat, Flash-Thinking, Video, Image (generation & editing), Audio-Codec, and Omni. Fast, efficient, and production-ready.

Latest Release: LongCat-Image

Meituan LongCat has released and open-sourced LongCat-Image, a 6B-parameter image generation and editing model. Through its architecture design, systematic training strategy, and data engineering, it achieves performance comparable to larger models, giving developers and industry a "high-performance, low-barrier, fully open" solution.

State-of-the-art results among open-source models: image editing (GEdit-Bench 7.60/7.64, ImgEdit-Bench 4.50) and Chinese text rendering (ChineseWord 90.7, covering all 8,105 standard Chinese characters), plus competitive text-to-image performance (GenEval 0.87, DPG-Bench 86.8). Available on LongCat Web and in the LongCat app (24 templates, image-to-image).

✨ Integrated Generation & Editing

  • Simple prompts, high-quality output: Deep semantic understanding lets short, simple prompts produce images that closely match the request.
  • 15 editing task types: Including object addition and removal, style transfer, perspective change, portrait refinement, and text modification, all driven by natural-language instructions.
  • Multi-round editing without quality loss: Keeps style and lighting consistent across rounds; preserves facial features in portraits.

✨ Superior Chinese Text Rendering

  • High-quality character rendering: Accurate text in shop signs, posters, book covers, and natural scenes (Chinese and English).
  • Rare character support: High accuracy for uncommon characters, variant forms, and calligraphy styles (Kai, Xing).
  • Smart typography: Automatically matches font size, color, and spacing to the scene context.

✨ Studio-Quality Output

  • Fast generation: Lightweight optimization enables efficient high-resolution image generation.
  • Photography-grade quality: Optimized composition and lighting; accurate textures and realistic proportions.

Model Series

Image

6B parameters | Open-source SOTA on image editing (GEdit-Bench, ImgEdit-Bench) and Chinese text rendering (ChineseWord: 90.7). Covers all 8,105 standard Chinese characters.

Latest Release | Hugging Face | GitHub

Omni

All-modality real-time interaction. Text, image, audio, video unified.

Released: Nov 2025

Video

Diffusion Transformer (DiT)-based video generation. Produces coherent 5-minute videos at 720p and 30 fps.

Released: Oct 27, 2025
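
To put the Video figures in perspective, the frame count implied by a 5-minute clip at 30 fps can be worked out directly. The sketch below is plain arithmetic, not a description of how the model works; the 1280x720 frame size is an assumption (the common meaning of "720p"), since this page does not give an exact resolution, and the function names are illustrative.

```python
# Back-of-envelope size of a 5-minute, 720p, 30 fps clip.
# Assumption: "720p" means 1280x720 pixels; the page does not state this.

def frame_count(minutes: float, fps: int) -> int:
    """Number of frames in a clip of the given length."""
    return round(minutes * 60 * fps)

def pixels_per_clip(minutes: float, fps: int,
                    width: int = 1280, height: int = 720) -> int:
    """Total number of pixels across every frame of the clip."""
    return frame_count(minutes, fps) * width * height

print(frame_count(5, 30))      # 9000
print(pixels_per_clip(5, 30))  # 8294400000
```

A 5-minute clip therefore spans 9,000 frames, which is the span over which the model has to stay coherent.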

Flash-Thinking

Enhanced reasoning with a dual-path framework. Saves 64.5% of tokens in agentic scenarios.

Released: Sept 22, 2025

Flash-Chat

Foundation dialogue model (560B total parameters, MoE). Achieves 100+ tokens/s on H800 GPUs with ~27B parameters active per token.

Released: Sept 1, 2025
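
The gap between 560B total and ~27B active parameters comes from MoE routing: each token is sent to only a few experts. The toy sketch below illustrates how routing some tokens to "zero-computation" experts could lower per-token cost. It is not Flash-Chat's implementation; it assumes a zero-computation expert simply returns its input unchanged (this page does not define it), and every name, expert, and router score here is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_ffn_expert(scale):
    # Stand-in for a real feed-forward expert: it just scales the input.
    return lambda x: [scale * v for v in x]

def zero_computation_expert(x):
    # Assumption: an identity mapping, so routing here costs no expert FLOPs.
    return x

def moe_forward(x, router_scores, experts, top_k):
    """Route x to the top_k experts.

    Returns (output, number of non-identity experts that actually ran).
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    real_used = 0
    for i in chosen:
        weight = probs[i] / norm          # renormalize over the chosen experts
        y = experts[i](x)
        out = [o + weight * v for o, v in zip(out, y)]
        if experts[i] is not zero_computation_expert:
            real_used += 1
    return out, real_used

experts = [make_ffn_expert(2.0), make_ffn_expert(3.0),
           zero_computation_expert, zero_computation_expert]
x = [1.0, -1.0]

# "Easy" token: the router prefers the zero-computation experts.
_, easy_cost = moe_forward(x, [0.0, 0.0, 5.0, 4.0], experts, top_k=2)
# "Hard" token: the router prefers the real experts.
_, hard_cost = moe_forward(x, [5.0, 4.0, 0.0, 0.0], experts, top_k=2)
print(easy_cost, hard_cost)  # 0 2
```

In this toy setup the compute spent per token varies with the router's choice, which is one way an average active-parameter count can sit well below the total.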

Key Highlights

  • High-throughput inference: 100+ tokens/s on H800 GPUs
  • Zero-Computation Experts: Activate only ~27B parameters per token out of a 560B pool
  • Extended context: Up to 128K tokens
  • AI image generation: Fast, studio-quality image generation and editing with superior Chinese text rendering
  • Open-source SOTA: Leading performance on Omni-Bench, WorldSense, MMLU, and more
  • Production-ready: Deployed across Meituan's services
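
The headline numbers above can be turned into two quick estimates: the share of parameters active per token, and how long a response takes at the stated decoding speed. This is a rough sketch using only figures from this page; it treats "100+ tokens/s" as a lower bound for a single response stream, which is an assumption, and the function names are illustrative.

```python
TOTAL_PARAMS_B = 560      # total parameters in billions (from this page)
ACTIVE_PARAMS_B = 27      # approximate active parameters per token (from this page)
TOKENS_PER_SECOND = 100   # lower bound implied by "100+ tokens/s"

def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the parameter pool used for one token."""
    return active_b / total_b

def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a steady rate (assumes a single stream)."""
    return num_tokens / tokens_per_second

print(f"{active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")  # 4.8%
print(seconds_to_generate(1_000, TOKENS_PER_SECOND))              # 10.0
```

Under these assumptions, roughly one parameter in twenty is used per token, and a 1,000-token reply takes at most about 10 seconds to decode.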