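Model strengths
High-fidelity voice cloning
A short reference clip (a few seconds is enough) is all it takes to clone the target voice with high fidelity, with no training or annotation. The clone reproduces not only the timbre but also the speaker's breathing, pausing habits, and prosody, so the result sounds natural rather than synthetic.
Ready to use without training
Zero fine-tuning and zero data labeling: supply a reference clip and generate right away. With no training cycle to wait for, the barrier and cost of voice cloning drop sharply, which suits content-creation workflows that need fast, high-volume output.
Full control capabilities inherited
A cloned voice inherits the full control capabilities of the model family: natural-language style instructions and inline audio tags can still be layered on top, so the same voice can play characters with very different temperaments.
Cross-lingual cloning support
Supports cross-lingual voice cloning between Chinese and English. An English reference clip can directly generate English output, and the same holds for Chinese reference clips; switching languages does not reduce how accurately the voice's characteristics are reproduced.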
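Because the reference clip only needs to be a few seconds long, it can be useful to check a clip's duration before sending it. The following is a minimal sketch using Python's standard `wave` module. The helper name `wav_duration_seconds` is hypothetical, it handles uncompressed WAV files only, and it does not encode any official minimum or maximum length, since none is stated here.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds.

    Hypothetical helper for illustration; not part of any SDK.
    """
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()      # total number of audio frames
        rate = wf.getframerate()      # frames per second
        if rate <= 0:
            raise ValueError("invalid sample rate in WAV header")
        return frames / rate
```

A call such as `wav_duration_seconds("your_local_audio.wav")` returns the clip length, which can then be compared against whatever length the service actually recommends.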
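Performance on real tasks
Chinese cloning: original voice transfer
Text
风轻轻地吹过,带来了远方的花香,和记忆里那个夏天的味道。
(English gloss: The wind blows by softly, carrying the scent of distant flowers and the taste of that summer in memory.)
Style instruction layered on the clone
Instruct
用尖锐刻薄的嗓音,带着狐假虎威的得意感说话,在提到大人物的身份时故意放慢语速并加重语气,营造压迫感。
(English gloss: Speak in a sharp, caustic voice with the smugness of someone flaunting borrowed authority; when mentioning the powerful figure's identity, deliberately slow down and add emphasis to create a sense of menace.)
Text
你以为我是谁,也敢在这儿跟我耍横?我告诉你,站在我身后的那个人,说出来吓死你——是当今的——万岁爷!你今天要是不给我个说法,我让你这铺子明天就开不了门。
(English gloss: Who do you think I am, that you dare throw your weight around with me here? Let me tell you, the person standing behind me, whose name alone would scare you to death, is none other than His Majesty, the reigning Emperor! If you don't give me an explanation today, I'll see to it that this shop of yours can't open its doors tomorrow.)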
English cloning: cyberpunk narration
Text
Ignore the sirens, ignore the neon bleeding through your eyelids, and just breathe with me. In, and out. They sit up there in their glass towers, thinking they've engineered peace out of algorithms and sterile air. But true stillness isn't manufactured, little bird.
English cloning + style overlay: post-match pundit
Instruct
Broadcast this like a blistering post-match pundit tearing into a disastrous performance. Voice is fast and openly fed up, smashing down on loaded words to drive home how badly things fell apart.
Text
No shape, no urgency, no clue what they're trying to do out there. The Ironhawks were top of the league six weeks ago—SIX WEEKS—and now they can't string two passes together without handing it back.
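The Instruct/Text pairs in the samples above could be mapped onto a chat-style request roughly as follows. This is a sketch under an assumption: that the style instruction goes in the user turn and the text to be spoken goes in the assistant turn. That layout is inferred from the shape of the sample request on this page (an empty user turn followed by an assistant turn holding the spoken text) and is not confirmed here. `build_messages` is a hypothetical helper name.

```python
from typing import Dict, List


def build_messages(text: str, instruct: str = "") -> List[Dict[str, str]]:
    """Build a messages list for a voice-clone request.

    ASSUMPTION: the user turn carries the optional style instruction and
    the assistant turn carries the text to synthesize.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return [
        {"role": "user", "content": instruct},
        {"role": "assistant", "content": text},
    ]


# The pundit sample: style instruction plus the line to be spoken.
messages = build_messages(
    text="No shape, no urgency, no clue what they're trying to do out there.",
    instruct=(
        "Broadcast this like a blistering post-match pundit tearing into "
        "a disastrous performance."
    ),
)
```

Leaving `instruct` at its default produces the plain cloning case, with an empty user turn.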
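Choose the access method that suits you
Pay-as-you-go API access
Sample code
Upload a reference audio clip and pass the text content via messages to clone the target voice.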
import os
import base64
import urllib.request

from openai import OpenAI

# The API key is read from the MIMO_API_KEY environment variable.
client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1",
)

# Example audio URL
audio_url = "https://example-files.cnbj1.mi-fds.com/example-files/audio/audio_example.wav"
audio_file = "audio_example.wav"

# Download the audio file from URL (skip if already exists)
if not os.path.exists(audio_file):
    urllib.request.urlretrieve(audio_url, audio_file)

# To use a local file directly, replace the above with:
# audio_file = "your_local_audio.wav"

# Read the reference clip and encode it as base64 for the data URI below.
with open(audio_file, "rb") as f:
    reference_bytes = f.read()
voice_base64 = base64.b64encode(reference_bytes).decode("utf-8")

completion = client.chat.completions.create(
    model="mimo-v2.5-tts-voiceclone",
    messages=[
        {
            # User turn is left empty in this example.
            "role": "user",
            "content": ""
        },
        {
            # Text content for the cloned voice to speak.
            "role": "assistant",
            "content": "Yes, I had a sandwich."
        }
    ],
    audio={
        "format": "wav",
        # Reference voice passed as a base64 data URI.
        "voice": f"data:audio/wav;base64,{voice_base64}"
    }
)

# The generated audio is returned base64-encoded; decode it and save it.
message = completion.choices[0].message
output_bytes = base64.b64decode(message.audio.data)
with open("audio_file.wav", "wb") as f:
    f.write(output_bytes)
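The encode and decode steps in the sample code above can be factored into two small helpers so they are easy to reuse and test without calling the API. This is only a sketch: `encode_reference_wav` and `save_audio` are hypothetical names, and the data URI uses `audio/wav` because that is the only format shown in the sample; whether other formats are accepted is not stated here.

```python
import base64
from pathlib import Path


def encode_reference_wav(path: str) -> str:
    """Return a WAV file as a base64 data URI, the form used for
    audio["voice"] in the sample request."""
    raw = Path(path).read_bytes()
    if not raw:
        raise ValueError(f"{path} is empty")
    return "data:audio/wav;base64," + base64.b64encode(raw).decode("utf-8")


def save_audio(data_b64: str, out_path: str) -> int:
    """Decode base64 audio (e.g. message.audio.data) and write it to
    out_path. Returns the number of bytes written."""
    decoded = base64.b64decode(data_b64)
    Path(out_path).write_bytes(decoded)
    return len(decoded)
```

With these in place, the request would pass `encode_reference_wav(audio_file)` as the `voice` value and call `save_audio(message.audio.data, "audio_file.wav")` on the response.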