Deep Thinking
Deep Thinking enables the model to perform deep reasoning before generating the final answer, analyzing problems step-by-step through an internal Chain of Thought (CoT), significantly improving accuracy on complex tasks. It is suitable for scenarios requiring deep analysis, such as complex reasoning, code generation, mathematical computation, and multi-step analysis.
Core Capabilities
- Deep Reasoning: Breaks down complex problems into multiple steps for analysis, improving reasoning accuracy
- Transparent Thinking Process: Returns the complete thinking process, enhancing interpretability
- Flexible Control: Supports enabling/disabling, allowing on-demand use based on task complexity
Supported Models
Currently supported models: mimo-v2.5-pro, mimo-v2.5, mimo-v2-pro, mimo-v2-omni, mimo-v2-flash.
Request Parameters
Set the thinking.type parameter in the request to control deep thinking: use enabled to turn deep thinking on, or disabled to turn it off.
Default Status:
- Enabled by default: mimo-v2.5-pro, mimo-v2.5, mimo-v2-pro, mimo-v2-omni
- Disabled by default: mimo-v2-flash
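The defaults above can be captured in a small client-side helper so that every request states thinking.type explicitly instead of relying on the per-model default. This is a minimal sketch: the names DEFAULT_THINKING and thinking_extra_body are hypothetical, and the mapping only mirrors the default status listed in this section.

```python
# Hypothetical helper: mirrors the documented default status per model.
DEFAULT_THINKING = {
    "mimo-v2.5-pro": "enabled",
    "mimo-v2.5": "enabled",
    "mimo-v2-pro": "enabled",
    "mimo-v2-omni": "enabled",
    "mimo-v2-flash": "disabled",
}


def thinking_extra_body(model, enabled=None):
    """Build a dict that sets thinking.type explicitly.

    enabled=None falls back to the documented default for the model.
    """
    if model not in DEFAULT_THINKING:
        raise ValueError(f"{model} is not in the supported model list")
    if enabled is None:
        thinking_type = DEFAULT_THINKING[model]
    else:
        thinking_type = "enabled" if enabled else "disabled"
    return {"thinking": {"type": thinking_type}}


print(thinking_extra_body("mimo-v2-flash"))
print(thinking_extra_body("mimo-v2-flash", enabled=True))
```

With the OpenAI Python SDK, the returned dict could be passed as extra_body, as the Call Examples do.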
Important Notes
Parameter Limitations
When deep thinking is enabled, the mimo-v2.5-pro, mimo-v2.5, mimo-v2-pro, and mimo-v2-omni models do not support custom temperature and top_p parameters. Even if these parameters are provided, they are overridden by the recommended defaults of 1.0 and 0.95, respectively.
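Because custom values are silently overridden, it can help to surface the restriction on the client side. The sketch below is illustrative only: effective_sampling and the constant names are hypothetical, and it assumes that outside the documented restriction (thinking disabled, or a model not named above) the caller's values are used unchanged.

```python
import warnings

# Models for which, per the note above, custom sampling values are overridden
# while deep thinking is enabled.
FIXED_SAMPLING_MODELS = {"mimo-v2.5-pro", "mimo-v2.5", "mimo-v2-pro", "mimo-v2-omni"}
FORCED_TEMPERATURE = 1.0
FORCED_TOP_P = 0.95


def effective_sampling(model, thinking_enabled, temperature=None, top_p=None):
    """Return the (temperature, top_p) pair expected to take effect.

    Assumption: when the restriction does not apply, the caller's values
    (or None, meaning "not set") pass through unchanged.
    """
    if thinking_enabled and model in FIXED_SAMPLING_MODELS:
        custom_temperature = temperature not in (None, FORCED_TEMPERATURE)
        custom_top_p = top_p not in (None, FORCED_TOP_P)
        if custom_temperature or custom_top_p:
            warnings.warn(
                f"{model} ignores custom temperature/top_p while deep thinking is enabled"
            )
        return FORCED_TEMPERATURE, FORCED_TOP_P
    return temperature, top_p


print(effective_sampling("mimo-v2.5-pro", True, temperature=0.2))
print(effective_sampling("mimo-v2.5-pro", False, temperature=0.2))
```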
Multi-turn Conversation Pass-through Requirements
When deep thinking is enabled in multi-turn conversations in Agent products and the conversation history contains tool calls, every assistant message with tool calls that is passed back in subsequent rounds must include its complete reasoning_content field. Otherwise, the API returns a 400 error. For the correct pass-back method, see the "Multi-turn Tool Calls in Thinking Mode" section in the Call Examples.
If historical reasoning_content is missing, the model's context will be incomplete, which may result in decreased instruction following and increased hallucinations.
Affected Agent Products:
| Protocol | Affected Agent Products |
|---|---|
| OpenAI Compatible Protocol | TRAE, Cursor, Roo Code, Codex, GitHub Copilot CLI, Zed, AutoGen, Goose |
| Anthropic Compatible Protocol | TRAE, GitHub Copilot CLI, AutoGen, Goose, OpenClaw, OpenCode, Kilo Code |
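The pass-back requirement can be checked locally before a request is sent, which turns a 400 error into an earlier and more specific failure. The following is a minimal sketch that assumes the history is held as plain dicts in the OpenAI-compatible message shape; find_missing_reasoning is a hypothetical helper, not part of any SDK, and the tool call ID is made up for the example.

```python
def find_missing_reasoning(messages):
    """Return indexes of assistant messages that have tool_calls but no reasoning_content."""
    missing = []
    for index, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue
        if message.get("tool_calls") and message.get("reasoning_content") is None:
            missing.append(index)
    return missing


history = [
    {"role": "user", "content": "How is the weather in Beijing today?"},
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "I need the current weather for Beijing.",
        "tool_calls": [
            {
                "id": "call_1",  # made-up ID for illustration
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": '{"location": "Beijing"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "Sunny 25°C"},
]
print(find_missing_reasoning(history))  # []

# Simulate a client that drops reasoning_content when rebuilding the history.
broken = [dict(message) for message in history]
del broken[1]["reasoning_content"]
print(find_missing_reasoning(broken))  # [1]
```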
Other Notes
- Output Length Limit: max_completion_tokens limits the total length of thinking content and the final answer. If the thinking process is long, the token space available for the final answer will be reduced accordingly. It is recommended to set a sufficient max_completion_tokens to avoid answer truncation.
- Response Time: Enabling deep thinking will increase response latency, especially for complex tasks. It is recommended to use stream: true to view the thinking process in real time.
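To see how much of the budget the thinking process consumed, the usage block of a response can be split into reasoning and answer tokens. This sketch works on the response as a plain dict and uses the usage figures from the "Thinking Enabled" response example in this document. The helper names are hypothetical, and two things are assumptions rather than documented behavior: that completion_tokens includes reasoning_tokens, and that finish_reason is "length" when the limit is hit (the usual OpenAI-compatible convention).

```python
def answer_tokens(usage):
    """Tokens spent on the final answer: completion tokens minus reasoning tokens."""
    details = usage.get("completion_tokens_details") or {}
    return usage["completion_tokens"] - details.get("reasoning_tokens", 0)


def looks_truncated(choice):
    """Assumption: finish_reason is "length" when max_completion_tokens was reached."""
    return choice.get("finish_reason") == "length"


usage = {
    "completion_tokens": 171,
    "prompt_tokens": 60,
    "total_tokens": 231,
    "completion_tokens_details": {"reasoning_tokens": 110},
}
print(answer_tokens(usage))  # 61
print(looks_truncated({"finish_reason": "stop", "index": 0}))  # False
```

If looks_truncated returns True and most of the completion tokens went to reasoning, raising max_completion_tokens is the likely remedy.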
Call Examples
The thinking field is not a standard OpenAI parameter. When passing thinking-related parameters via the OpenAI Python SDK, they must be included in extra_body.
Thinking Enabled
Curl
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"model": "mimo-v2.5-pro",
"messages": [
{
"role": "system",
"content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
},
{
"role": "user",
"content": "Introduce machine learning in three sentences."
}
],
"max_completion_tokens": 1024,
"thinking": {
"type": "enabled"
}
}'
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": "Introduce machine learning in three sentences."
        }
    ],
    max_completion_tokens=1024,
    extra_body={
        "thinking": {"type": "enabled"}
    }
)

print(completion.model_dump_json())
Response Example
{
"id": "2b92b0964c9b4335bffad7c2f75cfe9e",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Machine learning is a branch of artificial intelligence that enables systems to automatically learn and improve from experience without being explicitly programmed. It works by identifying patterns in data to make predictions or decisions. This technology powers a wide range of applications, from recommendation systems and speech recognition to autonomous vehicles and medical diagnosis.",
"role": "assistant",
"tool_calls": null,
"reasoning_content": "Hmm, the user wants a concise three-sentence introduction to machine learning. This seems like a straightforward request for a clear, high-level explanation. \n\nI should focus on the core idea without technical jargon, mention its practical use, and end with its significance. The first sentence can define it simply, the second can give an example, and the third can highlight its impact. \n\nKeeping it neutral and informative fits the user's likely need for a quick overview. No need for extra details or fluff since they specifically asked for brevity."
}
}
],
"created": 1781233054,
"model": "mimo-v2.5-pro",
"object": "chat.completion",
"usage": {
"completion_tokens": 171,
"prompt_tokens": 60,
"total_tokens": 231,
"completion_tokens_details": {
"reasoning_tokens": 110
},
"prompt_tokens_details": {}
}
}
Thinking Disabled
Curl
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"model": "mimo-v2.5-pro",
"messages": [
{
"role": "system",
"content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
},
{
"role": "user",
"content": "Write a short paragraph about the beauty of nature."
}
],
"max_completion_tokens": 1024,
"thinking": {
"type": "disabled"
}
}'
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": "Write a short paragraph about the beauty of nature."
        }
    ],
    max_completion_tokens=1024,
    extra_body={
        "thinking": {"type": "disabled"}
    }
)

print(completion.model_dump_json())
Response Example
{
"id": "f914c393444e4a35a4f7b1e337e032cb",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "From the gentle rustle of leaves in an ancient forest to the fiery spectacle of a sunset painting the sky, nature's beauty is a symphony for the senses. It is found in the delicate symmetry of a snowflake, the vibrant hues of a wildflower meadow, and the silent majesty of a mountain range draped in morning mist. This ever-changing tapestry offers a profound sense of peace and wonder, reminding us of a world that exists beyond our own making. Whether in a vast, untouched wilderness or a single dewdrop clinging to a spider's web, nature's artistry is a constant, humbling source of inspiration and renewal.",
"role": "assistant",
"tool_calls": null
}
}
],
"created": 1781233927,
"model": "mimo-v2.5-pro",
"object": "chat.completion",
"usage": {
"completion_tokens": 131,
"prompt_tokens": 64,
"total_tokens": 195,
"completion_tokens_details": {
"reasoning_tokens": 0
},
"prompt_tokens_details": {}
}
}
Streaming Response (Thinking Enabled)
During streaming responses, thinking content and answer content are output sequentially: first, the thinking process is returned step-by-step via reasoning_content, and after thinking is complete, the final answer is output step-by-step via content.
Curl
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"model": "mimo-v2.5-pro",
"messages": [
{
"role": "system",
"content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
},
{
"role": "user",
"content": "Give me some tips for improving work efficiency."
}
],
"max_completion_tokens": 1024,
"stream": true,
"thinking": {
"type": "enabled"
}
}'
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": "Give me some tips for improving work efficiency."
        }
    ],
    max_completion_tokens=1024,
    stream=True,
    extra_body={
        "thinking": {"type": "enabled"}
    }
)

for chunk in completion:
    print(chunk.model_dump_json())
Response Example
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":"","role":"assistant","tool_calls":null,"reasoning_content":null},"finish_reason":null,"index":0}],"created":1781234029,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":null,"role":null,"tool_calls":null,"reasoning_content":"The user is asking"},"finish_reason":null,"index":0}],"created":1781234029,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":null,"role":null,"tool_calls":null,"reasoning_content":" for tips on"},"finish_reason":null,"index":0}],"created":1781234029,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":null,"role":null,"tool_calls":null,"reasoning_content":" improving work efficiency."},"finish_reason":null,"index":0}],"created":1781234029,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
...
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":null,"role":null,"tool_calls":null,"reasoning_content":", etc"},"finish_reason":null,"index":0}],"created":1781234030,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":null,"role":null,"tool_calls":null,"reasoning_content":"."},"finish_reason":null,"index":0}],"created":1781234030,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":"# Tips","role":null,"tool_calls":null,"reasoning_content":null},"finish_reason":null,"index":0}],"created":1781234030,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":" for Improving Work","role":null,"tool_calls":null,"reasoning_content":null},"finish_reason":null,"index":0}],"created":1781234030,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
...
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":" on any specific","role":null,"tool_calls":null,"reasoning_content":null},"finish_reason":null,"index":0}],"created":1781234037,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":" area?","role":null,"tool_calls":null,"reasoning_content":null},"finish_reason":null,"index":0}],"created":1781234037,"model":"mimo-v2.5-pro","object":"chat.completion.chunk"}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[{"delta":{"content":null,"role":null,"tool_calls":null,"reasoning_content":null},"finish_reason":"stop","index":0}],"created":1781234037,"model":"mimo-v2.5-pro","object":"chat.completion.chunk","usage":null}
data: {"id":"4e57d676fe464c09aa2f27fa652abc40","choices":[],"created":1781234037,"model":"mimo-v2.5-pro","object":"chat.completion.chunk","usage":{"completion_tokens":339,"prompt_tokens":61,"total_tokens":400,"completion_tokens_details":{"reasoning_tokens":41},"prompt_tokens_details":{}}}
data: [DONE]
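The stream above can be folded back into a thinking text and an answer text by concatenating the reasoning_content and content deltas in arrival order. The sketch below parses raw "data:" lines like those in the response example, using only the standard library. collect_stream is a hypothetical helper, and the sample lines are shortened copies of the chunks shown above, keeping only the fields the helper reads.

```python
import json


def collect_stream(sse_lines):
    """Accumulate reasoning text, answer text, and usage from 'data: ...' lines."""
    reasoning_parts, answer_parts, usage = [], [], None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # in the example above, sent in a final chunk with no choices
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
            if delta.get("content"):
                answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts), usage


sample = [
    'data: {"choices":[{"delta":{"content":"","role":"assistant","reasoning_content":null},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":null,"reasoning_content":"The user is asking"},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":null,"reasoning_content":" for tips on"},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":"# Tips","reasoning_content":null},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":" for Improving Work","reasoning_content":null},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":null,"reasoning_content":null},"finish_reason":"stop","index":0}],"usage":null}',
    'data: {"choices":[],"usage":{"completion_tokens":339,"prompt_tokens":61,"total_tokens":400}}',
    "data: [DONE]",
]

reasoning, answer, usage = collect_stream(sample)
print(reasoning)  # The user is asking for tips on
print(answer)     # # Tips for Improving Work
print(usage["total_tokens"])  # 400
```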
Multi-turn Tool Calls in Thinking Mode
In multi-turn conversations with deep thinking enabled, passing back reasoning_content when tool calls are involved ensures thinking continuity and improves model output quality.
import os
import json
from openai import OpenAI

# Initialize client
client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name, e.g. Beijing"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Get the current time in a given timezone",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {"type": "string", "description": "Timezone, e.g. Asia/Shanghai"}
                },
                "required": ["timezone"]
            }
        }
    }
]

# Tool execution functions (replace with real API calls in production)
def get_current_weather(location: str, unit: str = "celsius") -> str:
    # Mock data; the unit argument is accepted but not used here
    weather_data = {"Beijing": "Sunny 25°C", "Shanghai": "Cloudy 22°C", "Shenzhen": "Rainy 28°C"}
    return weather_data.get(location, f"Weather unknown for {location}")


def get_time(timezone: str) -> str:
    # Mock: formats the local clock time and only labels it with the requested timezone
    from datetime import datetime
    return datetime.now().strftime(f"%Y-%m-%d %H:%M:%S ({timezone})")


# Map tool names returned by the model to the local functions
TOOL_MAP = {
    "get_current_weather": get_current_weather,
    "get_time": get_time
}


def run_turn(messages, turn_num):
    """Execute a single user turn: call model, run tools in a loop until final answer."""
    request_num = 0
    while True:
        request_num += 1
        print(f"\nRequest {turn_num}-{request_num}:")
        response = client.chat.completions.create(
            model="mimo-v2.5-pro",
            messages=messages,
            tools=tools,
            extra_body={"thinking": {"type": "enabled"}}
        )
        assistant_message = response.choices[0].message
        # Append the whole assistant message so that reasoning_content and
        # tool_calls are passed back in later requests
        messages.append(assistant_message)

        # Print full model response
        print(f"reasoning_content: {assistant_message.reasoning_content}")
        print(f"content: \"{assistant_message.content}\"")
        print(f"tool_calls: {assistant_message.tool_calls}")

        # If no tool calls, we have the final answer
        if not assistant_message.tool_calls:
            break

        # Execute each tool call and append results
        for tool_call in assistant_message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            result = TOOL_MAP[func_name](**func_args)
            print(f"-> Tool result [{func_name}]: {result}")
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

# --- Multi-turn conversation ---
messages = []
# Turn 1
print("=== Turn 1 ===")
messages.append({"role": "user", "content": "How is the weather in Beijing today? What time is it now?"})
run_turn(messages, turn_num=1)
# Turn 2: reasoning_content from Turn 1 is already in messages via assistant_message
print("\n=== Turn 2 ===")
messages.append({"role": "user", "content": "How about Shanghai? And is it hotter or colder than Beijing?"})
run_turn(messages, turn_num=2)
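The example above keeps the SDK message object in messages, which carries reasoning_content along automatically. When the history is instead stored as plain dicts (for example, saved as JSON between sessions), the field has to be copied explicitly. The following is a minimal sketch of that conversion: assistant_to_dict is a hypothetical helper, and SimpleNamespace objects stand in for the SDK message so the snippet runs without a network call.

```python
import json
from types import SimpleNamespace


def assistant_to_dict(message):
    """Convert an SDK-style assistant message into a plain dict, keeping reasoning_content."""
    entry = {"role": "assistant", "content": message.content or ""}
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning is not None:
        entry["reasoning_content"] = reasoning  # must be passed back with tool calls
    if message.tool_calls:
        entry["tool_calls"] = [
            {
                "id": call.id,
                "type": "function",
                "function": {
                    "name": call.function.name,
                    "arguments": call.function.arguments,
                },
            }
            for call in message.tool_calls
        ]
    return entry


# Stand-in for response.choices[0].message (illustrative values only)
call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_current_weather", arguments='{"location": "Beijing"}'),
)
message = SimpleNamespace(
    content="",
    reasoning_content="I need the current weather for Beijing.",
    tool_calls=[call],
)

entry = assistant_to_dict(message)
print(json.dumps(entry, ensure_ascii=False, indent=2))
```

The resulting dict can be appended to messages in place of the SDK object and survives a JSON round trip.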
Example Output
Turn 1: The user asks about the weather in Beijing and the current time. After receiving the user message, the model thinks and decides to call both get_current_weather and get_time tools simultaneously (Request 1-1). The client executes the tools and appends the results as role: "tool" messages to messages, then requests the model again. The model generates the final answer based on the tool results (Request 1-2).
=== Turn 1 ===
Request 1-1:
reasoning_content: The user wants to know two things: 1. The current weather in Beijing 2. The current time in Beijing I can call both functions at the same time since they are independent of each other.
content: ""
tool_calls: [ChatCompletionMessageFunctionToolCall(id='call_dd34ce1810be4afbaaa11c9a', function=Function(arguments='{"location": "Beijing"}', name='get_current_weather'), type='function'), ChatCompletionMessageFunctionToolCall(id='call_cf4c667abd094ce090b40f00', function=Function(arguments='{"timezone": "Asia/Shanghai"}', name='get_time'), type='function')]
-> Tool result [get_current_weather]: Sunny 25°C
-> Tool result [get_time]: 2026-05-12 16:37:26 (Asia/Shanghai)
Request 1-2:
reasoning_content: I got the results for both calls. Let me present this information in a friendly way.
content: "Here's the information for Beijing: ☀️ **Weather: **Sunny, 25°C — a lovely day! 🕒 **Current Time: **May 12, 2026 16:37 (Beijing Time) Looks like a beautiful afternoon in Beijing! Perfect weather for being outdoors. Is there anything else you'd like to know? 😊"
tool_calls: None
Turn 2: The user follows up asking about Shanghai's weather and comparing it with Beijing. Since the first turn's assistant message (containing reasoning_content, content, tool_calls) has been accumulated into the conversation history via messages.append(), the model can directly access Beijing's 25°C information from the context, and only needs to call get_current_weather for Shanghai (Request 2-1), then provides a comparison answer based on both cities' weather data (Request 2-2).
=== Turn 2 ===
Request 2-1:
reasoning_content: The user wants to know the weather in Shanghai and compare it with Beijing. I already know Beijing is 25°C, so I just need to get Shanghai's weather.
content: ""
tool_calls: [ChatCompletionMessageFunctionToolCall(id='call_f4fc7fdbfbd14cb497026d21', function=Function(arguments='{"location": "Shanghai"}', name='get_current_weather'), type='function')]
-> Tool result [get_current_weather]: Cloudy 22°C
Request 2-2:
reasoning_content: Shanghai is 22°C and cloudy, while Beijing is 25°C and sunny. So Shanghai is 3°C cooler than Beijing.
content: "Here's the weather for Shanghai: ☁️ **Weather: **Cloudy, 22°C And comparing the two cities:
| City | Temperature | Conditions |
|------|------------|------------|
| Beijing | 25°C | ☀️ Sunny |
| Shanghai | 22°C | ☁️ Cloudy |
**Beijing is 3°C warmer **than Shanghai right now! Beijing also has clearer skies, while Shanghai is a bit cloudier. Both are pleasant temperatures though — great weather in both cities! 😊 Is there anything else you'd like to check?"
tool_calls: None