MiMo-V2.5-TTS-VoiceClone

A few seconds of reference audio is enough to clone a target voice with high fidelity.

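No training or annotation is needed, and style instructions and emotion tags can still be layered on top of the cloned voice.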


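Model specifications

Modalities

Input: audio + text

Output: audio

Capabilities

Voice cloning

Speech synthesis

Streaming output

Performance

Context length: 8K tokens

Maximum output: 8K tokens

RPM (requests per minute): 100

TPM (tokens per minute): 10M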
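The 100 RPM limit listed above can be respected on the client side by spacing requests out. The following is a minimal sketch of a sliding-window throttle, not an official mechanism: the class name `RpmThrottle` and its parameters are hypothetical and not part of any MiMo SDK, and how the service itself counts requests is not specified here.

```python
import time
from collections import deque


class RpmThrottle:
    """Sliding-window limiter: at most `max_requests` calls per `window` seconds.

    Hypothetical helper for illustration; not part of any MiMo SDK.
    """

    def __init__(self, max_requests=100, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def _expire(self, now):
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def wait(self):
        """Block until another request may be sent, then record it."""
        now = self._clock()
        self._expire(now)
        if len(self._sent) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.window - (now - self._sent[0]))
            now = self._clock()
            self._expire(now)
        self._sent.append(now)
```

Calling `throttle.wait()` immediately before each API request keeps a single process under the limit; it does not coordinate across multiple processes or machines.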

Model pricing

Price: free for a limited time

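Model advantages

High-fidelity voice cloning

A few seconds of reference audio, with no training or annotation, is enough to clone the target voice with high fidelity. The clone reproduces not only the timbre but also the speaker's breathing, pausing habits, and prosody, so the result sounds natural rather than synthetic.

Ready to use without training

No fine-tuning and no data annotation: supply a reference clip and generate right away. With no training cycle to wait for, the cost and barrier to cloning a voice drop sharply, which suits content-creation workflows that need fast batch output.

Full control capabilities inherited

A cloned voice inherits all the control capabilities of the model series: natural-language style instructions and inline audio tags can still be layered on, so the same voice can play characters with very different temperaments.

Cross-language cloning support

Voice cloning is supported across Chinese and English. An English reference clip can directly drive English output, and the same holds for Chinese reference audio; switching languages does not reduce how accurately the voice characteristics are reproduced.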

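Performance on real tasks

Chinese cloning: transferring the original voice

Text

风轻轻地吹过，带来了远方的花香，和记忆里那个夏天的味道。

(English: "The wind blows gently by, carrying the scent of distant flowers and the taste of that summer from memory.")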

[Audio samples: reference audio, cloned output]

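Layering a style instruction on the cloned voice

Instruct

用尖锐刻薄的嗓音，带着狐假虎威的得意感说话，在提到大人物的身份时故意放慢语速并加重语气，营造压迫感。

(English: "Speak in a sharp, spiteful voice with the smugness of someone borrowing another's authority. When mentioning the identity of the powerful figure, deliberately slow down and stress the words to create a sense of intimidation.")

Text

你以为我是谁，也敢在这儿跟我耍横？我告诉你，站在我身后的那个人，说出来吓死你——是当今的——万岁爷！你今天要是不给我个说法，我让你这铺子明天就开不了门。

(English: "Who do you think I am, that you dare throw your weight around here? Let me tell you, the one standing behind me, just hearing the name will scare you to death: it is none other than His Majesty the Emperor! If you don't give me an explanation today, I'll see to it that this shop of yours can't open its doors tomorrow.")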

[Audio samples: reference audio, cloned output]

English cloning: cyberpunk narration

Text

Ignore the sirens, ignore the neon bleeding through your eyelids, and just breathe with me. In, and out. They sit up there in their glass towers, thinking they've engineered peace out of algorithms and sterile air. But true stillness isn't manufactured, little bird.

[Audio samples: reference audio, cloned output]

English cloning + style layering: post-match pundit

Instruct

Broadcast this like a blistering post-match pundit tearing into a disastrous performance. Voice is fast and openly fed up, smashing down on loaded words to drive home how badly things fell apart.

Text

No shape, no urgency, no clue what they're trying to do out there. The Ironhawks were top of the league six weeks ago—SIX WEEKS—and now they can't string two passes together without handing it back.

[Audio samples: reference audio, cloned output]
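The style-layering demos above pair an Instruct with a Text. One way to carry that pairing into a request is sketched below. It assumes the style instruction goes in the user message and the text to synthesize goes in the assistant message; this matches the shape of the sample request in the API section (empty user content, spoken text as assistant content) but is not confirmed by this page. `build_messages` is a hypothetical helper name.

```python
def build_messages(text, instruct=""):
    """Build the chat messages for one synthesis request.

    Assumption (not confirmed by this page): the user message carries the
    optional style instruction, the assistant message carries the text to speak.
    """
    if not text.strip():
        raise ValueError("text to synthesize must not be empty")
    return [
        {"role": "user", "content": instruct},
        {"role": "assistant", "content": text},
    ]


# Example: the post-match pundit demo, shortened.
messages = build_messages(
    text="No shape, no urgency, no clue what they're trying to do out there.",
    instruct="Broadcast this like a blistering post-match pundit.",
)
```

Leaving `instruct` at its default produces an empty user message, which is the plain-cloning case with no style layered on.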

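Choose the access method that suits you

Pay-as-you-go API access

Get an API Key

Create an account in the console and generate your own API Key (the TTS series is currently free for a limited time).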


Sample code

Upload a reference audio clip and pass the text content through messages to clone the target voice.

import os
import base64
import urllib.request
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1",
)

# Example audio URL
audio_url = "https://example-files.cnbj1.mi-fds.com/example-files/audio/audio_example.wav"
audio_file = "audio_example.wav"

# Download the audio file from URL (skip if already exists)
if not os.path.exists(audio_file):
    urllib.request.urlretrieve(audio_url, audio_file)

# To use a local file directly, replace the above with:
# audio_file = "your_local_audio.wav"
with open(audio_file, "rb") as f:
    audio_bytes = f.read()
voice_base64 = base64.b64encode(audio_bytes).decode("utf-8")

completion = client.chat.completions.create(
    model="mimo-v2.5-tts-voiceclone",
    messages=[
        {
            "role": "user",
            "content": ""
        },
        {
            "role": "assistant",
            "content": "Yes, I had a sandwich."
        }
    ],
    audio={
        "format": "wav",
        "voice": f"data:audio/wav;base64,{voice_base64}"
    }
)

# Decode the base64-encoded audio in the response and write it to disk
message = completion.choices[0].message
output_bytes = base64.b64decode(message.audio.data)
with open("audio_file.wav", "wb") as f:
    f.write(output_bytes)
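The sample above builds the `voice` value inline and hard-codes `audio/wav`. The encoding step can be pulled into a small function, sketched here. `to_voice_data_uri` is a hypothetical helper, and the `.mp3` entry is an assumption: only WAV input appears in the sample, so check which formats the API actually accepts before relying on anything else.

```python
import base64
from pathlib import Path

# Extension -> MIME type. Only ".wav" is shown in the sample request;
# the ".mp3" entry is an assumption for illustration.
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
}


def to_voice_data_uri(path):
    """Encode a reference clip as a base64 data URI for the `voice` field.

    Hypothetical helper; not part of any MiMo SDK.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in AUDIO_MIME_TYPES:
        raise ValueError(f"unsupported reference audio format: {suffix!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{AUDIO_MIME_TYPES[suffix]};base64,{encoded}"
```

With this in place, the request's `audio` argument could be written as `{"format": "wav", "voice": to_voice_data_uri("your_local_audio.wav")}`.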

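Token Plan subscription

Purchase a plan

Monthly or annual subscriptions cover the full MiMo V2.5 model lineup and are considerably more cost-effective than pay-as-you-go for high-volume use (the TTS series is currently free for a limited time and does not consume credits).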
