LongCat AI - Next-Generation Multi-Modal AI Models

Latest Release: LongCat-Flash-Lite

The Meituan LongCat team officially releases and open-sources LongCat-Flash-Lite, a lightweight MoE model built on a new scaling direction: N-gram embedding expansion. Conventional MoE scaling mainly increases the number of experts. By comparison, embedding expansion reaches a better efficiency frontier under practical system constraints.

68.5B total parameters with ~2.9B–4.5B activated per inference through dynamic sparse activation
31.4B parameters (46%) allocated to the N-gram embedding layer to enhance local-context semantic understanding
Up to 256K context via YaRN for long documents and large-scale code analysis
High throughput: 500–700 tokens/s under a typical 4K input / 1K output workload (LongCat API)
Strong in agents & coding: leading results on τ²-Bench industry scenarios and strong SWE-Bench / TerminalBench performance
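The announcement does not say how the N-gram embedding layer is built. One common way to realize the idea is a hashed n-gram lookup. Each token's embedding is summed with rows fetched from a large table, indexed by hashing the few tokens that end at that position. The sketch below assumes that design. The table sizes, hash function, and n-gram orders are hypothetical. It shows why such parameters are cheap at inference time: every token reads a fixed number of rows, however large the table grows. It also checks the 46% share quoted above.

```python
import random
from typing import List, Tuple

# Toy sizes chosen for illustration; they are NOT LongCat-Flash-Lite's real configuration.
VOCAB_SIZE = 1000
NGRAM_BUCKETS = 4096  # rows in the hashed n-gram table (assumption)
DIM = 8
MAX_N = 3             # add embeddings for bigrams and trigrams ending at each token (assumption)

# Parameter share quoted above: 31.4B of 68.5B total parameters.
EMBED_SHARE = 31.4 / 68.5  # about 0.458, i.e. ~46%

rng = random.Random(0)
token_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
ngram_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(NGRAM_BUCKETS)]


def ngram_bucket(ngram: Tuple[int, ...]) -> int:
    """Map an n-gram of token ids to a table row with a simple polynomial hash."""
    h = 0
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % NGRAM_BUCKETS
    return h


def embed(tokens: List[int]) -> List[List[float]]:
    """Token embedding plus one hashed n-gram embedding for each n = 2..MAX_N."""
    out = []
    for i, tok in enumerate(tokens):
        vec = list(token_table[tok])
        for n in range(2, MAX_N + 1):
            start = i - n + 1
            if start < 0:  # not enough preceding tokens for this n-gram
                break
            row = ngram_table[ngram_bucket(tuple(tokens[start : i + 1]))]
            # Each lookup costs O(DIM), independent of NGRAM_BUCKETS.
            vec = [a + b for a, b in zip(vec, row)]
        out.append(vec)
    return out


print(len(embed([5, 6, 7])), round(EMBED_SHARE * 100))  # prints 3 46
```

Because only the last few tokens feed each lookup, the same local context always produces the same extra embedding, whatever appears earlier in the sequence.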


Previous Release: LongCat-Flash-Thinking-2601

Today, the Meituan LongCat team officially releases and open-sources LongCat-Flash-Thinking-2601. As an upgraded version of the previously released LongCat-Flash-Thinking model, LongCat-Flash-Thinking-2601 achieves open-source SOTA performance on core evaluation benchmarks including Agentic Search, Agentic Tool Use, and TIR (Tool Interaction Reasoning).

The model shows exceptional generalization in tool calling. It outperforms Claude on randomized complex tasks that rely on tool calling, which significantly reduces the training cost of adapting to new tools in real-world scenarios. It is also the first fully open-source model whose "Re-thinking Mode" can be tried online for free. This mode activates 8 parallel reasoning paths simultaneously to ensure thorough thinking and reliable decision-making.

This feature can now be tried for free at https://longcat.ai. The Re-thinking Mode is triggered by selecting the deep thinking function.

🧠 Revolutionary "Re-thinking" Mode

The newly upgraded "Re-thinking" mode teaches the model to "think carefully" before acting. When encountering high-difficulty problems, the model breaks down the thinking process into two steps: parallel thinking and summary synthesis.

Parallel thinking phase: The model simultaneously and independently explores multiple reasoning paths, similar to how humans consider different solutions when facing difficult problems, ensuring diversity of thought to avoid missing optimal solutions
Summary synthesis phase: Multiple paths are organized, optimized, and synthesized, with optimized results fed back to form closed-loop iterative reasoning, continuously deepening the thinking process
Reinforcement learning enhancement: Additional reinforcement learning is applied specifically to refine the model's summary synthesis, enabling LongCat-Flash-Thinking-2601 to truly "think clearly before acting"
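The two phases described above can be sketched as a parallel-then-synthesize loop. This is a minimal sketch, not LongCat's implementation. `think_once` and `synthesize` are hypothetical stand-ins for model calls. In the toy usage, a plain majority vote replaces the learned, RL-tuned synthesis step the announcement describes. Only the count of 8 parallel paths comes from the source.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

NUM_PATHS = 8  # the announcement states that Re-thinking Mode runs 8 parallel paths

# Hypothetical signatures standing in for model calls:
#   think_once(question, path_id, previous_summary) -> one reasoning path's answer
#   synthesize(question, paths) -> one consolidated answer
ThinkFn = Callable[[str, int, Optional[str]], str]
SynthFn = Callable[[str, List[str]], str]


def rethink(question: str, think_once: ThinkFn, synthesize: SynthFn, rounds: int = 2) -> str:
    """Parallel thinking, then summary synthesis; each summary feeds the next round."""
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    summary: Optional[str] = None
    for _ in range(rounds):
        # Parallel thinking phase: independent paths explored concurrently.
        with ThreadPoolExecutor(max_workers=NUM_PATHS) as pool:
            paths = list(pool.map(lambda k: think_once(question, k, summary), range(NUM_PATHS)))
        # Summary synthesis phase: consolidate the paths into one answer.
        summary = synthesize(question, paths)
    assert summary is not None
    return summary


def toy_think(question: str, path_id: int, previous_summary: Optional[str]) -> str:
    # Before any summary exists, two of the eight paths disagree; afterwards all adopt it.
    if previous_summary is not None:
        return previous_summary
    return "41" if path_id % 4 == 0 else "42"


def majority_vote(question: str, paths: List[str]) -> str:
    # Swapped-in stand-in for the learned synthesis step.
    return Counter(paths).most_common(1)[0][0]


print(rethink("toy question", toy_think, majority_vote))  # prints 42
```

Feeding the summary back into the next round is what closes the loop. A real synthesizer would also rewrite and refine the paths rather than only select among them.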

📊 Comprehensive Benchmark Performance

Comprehensive and rigorous evaluation shows that LongCat-Flash-Thinking-2601 leads across programming, mathematical reasoning, agentic tool calling, and agentic search dimensions:

Programming capability: Achieves 82.8 on LiveCodeBench (LCB) and 47.7 on OIBench EN, placing it in the top tier of comparable models and demonstrating solid coding fundamentals
Mathematical reasoning: Outstanding performance with Re-thinking mode enabled, achieving 100.0 (perfect score) on AIME-25 and 86.8 (current SOTA) on IMO-AnswerBench
Agentic tool calling: Scores 88.2 on τ²-Bench and 29.3 on VitaBench, both achieving open-source SOTA, demonstrating excellent performance in multi-domain tool calling scenarios
Agentic search: Achieves 73.1 on BrowseComp (best among all models) and 79.5 on RW Search, leading among open-source models and demonstrating strong information retrieval and scenario adaptation capabilities

🔧 Advanced Training Techniques

Environment expansion + multi-environment reinforcement learning: Builds diverse "high-intensity training grounds" from multiple high-quality training environments, each integrating 60+ tools with dense dependency graphs and complex interactions
Noise robustness training: Actively injects multiple types of noise during training, simulating API call failures, error returns, and incomplete data, and uses curriculum learning to gradually increase the number of noise types and their intensity
Enhanced DORA infrastructure: Extends the team's self-developed reinforcement learning infrastructure to support stable, parallel training of large-scale multi-environment agents while keeping training efficient and asynchronous
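The noise-robustness idea can be sketched as a wrapper around tool calls whose noise grows on a curriculum. The three noise types come from the description above. The linear schedule, the 0.3 maximum probability, and the specific corruptions are assumptions made for illustration only.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# The three noise types named in the announcement; the schedule below is an assumption.
NOISE_TYPES = ["api_failure", "error_return", "incomplete_data"]


def noise_schedule(step: int, total_steps: int, max_prob: float = 0.3) -> Tuple[List[str], float]:
    """Linear curriculum: unlock more noise types and raise injection probability over training."""
    progress = min(step / total_steps, 1.0)
    n_types = min(len(NOISE_TYPES), 1 + int(progress * len(NOISE_TYPES)))
    return NOISE_TYPES[:n_types], max_prob * progress


def noisy_tool_call(
    tool: Callable[..., Dict[str, Any]],
    args: Dict[str, Any],
    step: int,
    total_steps: int,
    rng: random.Random,
) -> Dict[str, Any]:
    """Call a tool, sometimes corrupting the interaction according to the curriculum."""
    active, prob = noise_schedule(step, total_steps)
    if rng.random() >= prob:
        return tool(**args)  # clean call
    kind = rng.choice(active)
    if kind == "api_failure":
        raise TimeoutError("simulated API call failure")
    if kind == "error_return":
        return {"error": "simulated error response"}
    # incomplete_data: keep only the first half of the result's fields
    result = tool(**args)
    keys = list(result)
    return {k: result[k] for k in keys[: len(keys) // 2]}


def weather(city: str) -> Dict[str, Any]:
    # Hypothetical tool used only for this example.
    return {"city": city, "temp_c": 21}


print(noise_schedule(50, 100))  # prints (['api_failure', 'error_return'], 0.15)
```

Early in training the agent sees only occasional outright failures. Later it must also cope with error payloads and truncated results, at a higher rate.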


Previous Release: LongCat-Video-Avatar

Following the successful releases of InfiniteTalk and LongCat-Video, the LongCat team officially releases and open-sources LongCat-Video-Avatar, a SOTA-level avatar video generation model. Built on the LongCat-Video base, it achieves breakthrough improvements in three key dimensions: realistic motion, long-video stability, and identity consistency, providing developers with a more stable, efficient, and practical solution for virtual human generation.

🎭 Open-Source SOTA Realism

Full-body synchronization: Synchronously controls lip sync, eye movements, facial expressions, and body gestures to achieve rich and full emotional expression
Natural micro-movements: Maintains natural blinking, breathing, and posture adjustments during silent segments through Disentangled Unconditional Guidance
Multi-mode support: Audio-Text-to-Video (AT2V), Audio-Text-Image-to-Video (ATI2V), and video continuation

🎬 Long-Sequence High-Quality Generation

5-minute+ stable generation: Cross-Chunk Latent Stitching enables videos of 5 minutes or longer (~5,000 frames) without quality degradation, keeping colors stable and details clear throughout
No quality loss: Avoids the quality loss caused by repeated VAE decode/encode cycles by performing operations directly in latent space
Improved inference efficiency: Direct latent space operations without pixel domain decoding
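The announcement names Cross-Chunk Latent Stitching but does not describe its mechanics. The sketch below assumes one plausible form. Each new chunk is conditioned on the last few latent frames of the sequence so far, and the overlapping frames are cross-faded linearly. Everything stays in latent space until a single final decode. `generate_chunk` is a hypothetical stand-in for the video model, not LongCat-Video-Avatar's API.

```python
from typing import Callable, List

Latent = List[float]  # one latent frame, flattened (toy representation)

# Hypothetical generator: given context latents, returns `n` new latent frames whose
# first frames are meant to overlap the context. Not LongCat-Video-Avatar's API.
ChunkFn = Callable[[List[Latent], int], List[Latent]]


def stitch_chunks(generate_chunk: ChunkFn, num_chunks: int, chunk_len: int, overlap: int) -> List[Latent]:
    """Build a long latent sequence chunk by chunk without decoding to pixels in between."""
    if not 1 <= overlap < chunk_len:
        raise ValueError("need 1 <= overlap < chunk_len")
    video = generate_chunk([], chunk_len)
    for _ in range(num_chunks - 1):
        context = video[-overlap:]  # condition on the tail of the sequence so far
        chunk = generate_chunk(context, chunk_len)
        base = len(video) - overlap
        # Cross-fade the overlapping latent frames instead of hard-cutting between chunks.
        for j in range(overlap):
            w = (j + 1) / (overlap + 1)
            video[base + j] = [(1 - w) * a + w * b for a, b in zip(video[base + j], chunk[j])]
        video.extend(chunk[overlap:])
    return video  # decode with the VAE once, at the very end


def const_gen(context: List[Latent], n: int) -> List[Latent]:
    # Toy generator that ignores its context.
    return [[1.0, 1.0] for _ in range(n)]


print(len(stitch_chunks(const_gen, 3, 10, 2)))  # prints 26
```

Each extra chunk adds `chunk_len - overlap` frames, so 3 chunks of 10 frames with an overlap of 2 give 10 + 8 + 8 = 26. Skipping the intermediate pixel decode is also where the efficiency gain in the last bullet comes from.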

✅ Commercial-Grade Identity Consistency

Identity consistency: Reference Skip Attention mechanism ensures consistent character appearance throughout long sequences
Motion diversity: Avoids "copy-paste" effects and rigid movements, making videos both stable and varied
SOTA benchmark performance: Leading performance on HDTF, CelebV-HQ, EMTD, and EvalTalker datasets



Key Highlights

High-throughput inference: 100+ tokens/s on H800 GPUs
Zero-Computation Experts: Activates only ~27B parameters per token out of a 560B-parameter pool
Extended context: Up to 128K tokens
AI image generation: Fast, studio-quality image generation and editing with superior Chinese text rendering
Open-source SOTA: Leading performance on Omni-Bench, WorldSense, MMLU, and more
Production-ready: Deployed across Meituan's services
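Zero-Computation Experts explain why the activated parameter count varies per token. This minimal sketch assumes some experts in a top-k router are identity maps that return their input unchanged. A token routed to them touches no expert weights. The router, the expert functions, and the sizes are hypothetical toys.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Expert = Optional[Callable[[Vector], Vector]]  # None marks a zero-computation expert


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: Vector, router_logits: Sequence[float], experts: Sequence[Expert],
              params_per_expert: int, k: int = 2) -> Tuple[Vector, int]:
    """Top-k MoE where some experts are identity maps; returns (output, params activated)."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalize gate weights over the chosen experts
    out = [0.0] * len(x)
    activated = 0
    for i in chosen:
        weight = probs[i] / norm
        expert = experts[i]
        if expert is None:
            y = x  # zero-computation expert: passes the input through, no weights touched
        else:
            y = expert(x)
            activated += params_per_expert
        out = [o + weight * v for o, v in zip(out, y)]
    return out, activated


def double(v: Vector) -> Vector:
    return [2 * a for a in v]


def negate(v: Vector) -> Vector:
    return [-a for a in v]


EXPERTS: List[Expert] = [double, None, None, negate]
print(moe_layer([1.0, 2.0], [0.0, 5.0, 5.0, 0.0], EXPERTS, params_per_expert=100)[1])  # prints 0
```

Here an "easy" token routed to two zero-computation experts activates 0 expert parameters. A token routed to `double` activates 100. Averaged over many tokens, this gives a per-token activated size well below the full pool.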

