Rate Limit

This page lists all the models currently supported by the Xiaomi MiMo API Open Platform and their rate-limiting quotas, helping you plan your request frequency before integration.

Rate Limiting Instructions

The platform sets a per-model concurrency limit for each account. When server load is high, responses may be delayed or requests may fail with a 429 error. Plan your request frequency accordingly and, in high-concurrency scenarios, implement retries with backoff to avoid triggering rate limits.
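
The retry-and-backoff recommendation can be sketched as follows. This is a minimal illustration, not the platform's official client: `RateLimitError`, `backoff_delay`, and `call_with_backoff` are hypothetical names, and the sketch assumes your own request wrapper raises `RateLimitError` whenever the API answers with HTTP 429. The delay values are illustrative defaults, not values published by the platform.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception: raised by your own wrapper on an HTTP 429 response."""


def backoff_delay(attempt, base_delay=1.0, max_delay=30.0, rng=random.random):
    """Return a jittered delay in seconds for the given retry attempt (0-based).

    The upper bound doubles with each attempt and is capped at max_delay;
    the random factor spreads out retries from concurrent clients.
    """
    cap = min(max_delay, base_delay * (2 ** attempt))
    return rng() * cap


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() and retry on RateLimitError with exponential backoff.

    Re-raises the last RateLimitError once max_retries retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, base_delay, max_delay, rng))
```

Here `send` would be a zero-argument callable that performs one API request. The `sleep` and `rng` parameters exist only so the waiting behavior can be replaced in tests.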

  • RPM (Requests Per Minute): The maximum number of requests that can be sent per minute. The limit applies to the combined requests from all API Keys under a single account that call the same model.
  • TPM (Tokens Per Minute): The maximum number of tokens that can be processed per minute. The limit applies to the combined tokens requested by all API Keys under a single account that call the same model.
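
Because both limits are counted per account and per model, one way to stay under them is a client-side limiter shared by every API Key that calls the same model. The sketch below tracks requests and tokens over a trailing 60-second window. It is an assumption-laden illustration: `SlidingWindowLimiter` is a hypothetical helper, the token count per request must be estimated by the caller, and the platform does not state which windowing algorithm the server uses, so the server may still return 429 even when this check passes.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side tracker for requests and tokens in a trailing time window."""

    def __init__(self, rpm, tpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock
        self.events = deque()      # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0  # running token total for the events above

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both budgets."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.rpm:
            return False
        if self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would create one limiter per model ID, for example `SlidingWindowLimiter(rpm=100, tpm=10_000_000)`, and wait or queue the request whenever `try_acquire` returns `False`.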

Text Generation Model

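| Model Series | Model ID | RPM | TPM |
| --- | --- | --- | --- |
| Pro Series | mimo-v2.5-pro | 100 | 10M |
| Pro Series | mimo-v2-pro | 100 | 10M |
| Omni Series | mimo-v2.5 | 100 | 10M |
| Omni Series | mimo-v2-omni | 100 | 10M |
| Flash Series | mimo-v2-flash | 100 | 10M |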

Automatic Speech Recognition Model (ASR)

| Model ID | RPM | TPM |
| --- | --- | --- |
| mimo-v2.5-asr | 100 | 10K |

Text-to-Speech (TTS) Model

| Model ID | RPM | TPM |
| --- | --- | --- |
| mimo-v2.5-tts | 100 | 10M |
| mimo-v2.5-tts-voiceclone | 100 | 10M |
| mimo-v2.5-tts-voicedesign | 100 | 10M |
| mimo-v2-tts | 100 | 10M |
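
For planning, it can help to turn the quotas above into an average token budget per request when sending at the full RPM. The sketch below copies the listed values into a lookup table. `parse_quota` and `avg_token_budget` are hypothetical helpers, and reading `K` as 1,000 and `M` as 1,000,000 is an assumption, since the page does not define the suffixes.

```python
# Quotas copied from the tables on this page: model ID -> (RPM, TPM).
LIMITS = {
    "mimo-v2.5-pro": ("100", "10M"),
    "mimo-v2-pro": ("100", "10M"),
    "mimo-v2.5": ("100", "10M"),
    "mimo-v2-omni": ("100", "10M"),
    "mimo-v2-flash": ("100", "10M"),
    "mimo-v2.5-asr": ("100", "10K"),
    "mimo-v2.5-tts": ("100", "10M"),
    "mimo-v2.5-tts-voiceclone": ("100", "10M"),
    "mimo-v2.5-tts-voicedesign": ("100", "10M"),
    "mimo-v2-tts": ("100", "10M"),
}


def parse_quota(text):
    """Convert '100', '10K', or '10M' to an int (assumes K = 1,000 and M = 1,000,000)."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


def avg_token_budget(model_id):
    """Average tokens available per request when sending at the full RPM."""
    rpm, tpm = (parse_quota(value) for value in LIMITS[model_id])
    return tpm // rpm
```

Under these assumptions, `avg_token_budget("mimo-v2.5-pro")` gives 100,000 tokens per request, while `avg_token_budget("mimo-v2.5-asr")` gives 100, so for the ASR model the TPM quota is likely to be reached well before the RPM quota.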

Last updated: June 11, 2026

Copyright © 2026 Xiaomi. All Rights Reserved