Model Specifications
Modality
Model Capabilities
Performance
Model Pricing
Model Advantages
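Voice generation from natural language
A single natural-language description is enough to generate a brand-new voice from scratch, with no reference audio sample required, so creating a voice is as direct as writing a sentence.
Multi-dimensional voice customization
Describe the voice along any dimension: age, gender, accent, timbre, speaking rate, temperament, or recording texture. Definitions can span a wide range, from a binaural ASMR female voice to a resonant documentary narrator.
Understanding of complex and vague descriptions
Built on large-scale pre-training, the model handles complex, vague, and even contradictory descriptions well. It can render distinctive voices it has never heard and accurately reproduce classic character archetypes with strongly marked traits.
Multilingual, multi-style coverage
Voice design is supported in Chinese and English across multiple accents. Chinese descriptions can produce Mandarin and dialect voices, and English descriptions can produce accents from different countries, so language and style can be combined flexibly.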
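The describable dimensions listed in this section (age, gender, accent, timbre, speaking rate, temperament, recording texture) can be assembled into one description string before it is sent as the voice prompt. The sketch below is a hypothetical helper and not part of any MiMo SDK: the class name `VoiceSpec` and its field names are assumptions chosen to mirror that list, and the model only ever sees the resulting free-form text.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VoiceSpec:
    """Hypothetical container for the describable dimensions (all names are assumptions)."""
    age: Optional[str] = None
    gender: Optional[str] = None
    accent: Optional[str] = None
    timbre: Optional[str] = None
    pace: Optional[str] = None
    temperament: Optional[str] = None
    recording: Optional[str] = None

    def to_description(self) -> str:
        # Keep only the dimensions that were set, in declaration order.
        parts = [getattr(self, f.name) for f in fields(self)]
        parts = [p.strip() for p in parts if p and p.strip()]
        if not parts:
            raise ValueError("at least one dimension must be described")
        return ", ".join(parts) + "."


spec = VoiceSpec(
    age="middle-aged",
    gender="male",
    accent="heavy Russian accent",
    temperament="blunt and matter-of-fact",
)
print(spec.to_description())
```

Leaving a dimension unset simply omits it, which matches the idea that any subset of dimensions may be described.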
Performance on Real Tasks
Middle-aged male with a Russian accent
Instruct
Heavy Russian accent, gruff middle-aged male, blunt and matter-of-fact.
Text
You want my opinion? Fine. This plan will not work. I have seen many plan like this before, all fail. You think you are special? No. You are not. But you never listen, so go, try. When everything fall apart, I will be here, drinking tea. I already told you.
Binaural ASMR female voice
Instruct
Young female, extreme close-up with a binaural, ear-to-ear ASMR feel. Audible breathing, subtle swallowing, and soft natural lip sounds. She speaks very slowly, creating a deeply relaxing and immersive experience.
Text
[Whispering in your ear] Shhh... just relax, come a little closer. I'm right here beside you now. Breathe slowly and gently, and let your mind drift, as if you're sinking into warm water.
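Documentary narration
Instruct
一位中年男性,说标准普通话,嗓音低沉有磁性,带有轻微的沙哑质感,像纪录片旁白解说员,沉稳而有感染力。
(English: A middle-aged man speaking standard Mandarin, with a deep, magnetic voice and a slightly husky texture, like a documentary narrator, steady and compelling.)
Text
当最后一缕阳光消失在地平线之下,这片沉睡了亿万年的大地开始显露它真正的面貌。在这寂静的荒野中,每一块岩石都记录着时间的流逝,每一阵风都在诉说着古老的故事。
(English: When the last ray of sunlight slips below the horizon, this land that has slept for hundreds of millions of years begins to show its true face. In this silent wilderness, every rock records the passage of time, and every gust of wind tells an ancient story.)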
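Elderly gentleman narration
Instruct
一位年迈的老先生,说带北方口音的普通话,语速缓慢而沉稳,嗓音略带沙哑和沧桑感,仿佛一位饱经风霜的老爷爷在讲故事,充满岁月的智慧。
(English: An elderly gentleman speaking Mandarin with a northern accent, slow and steady in pace, his voice slightly hoarse and weathered, like a grandfather who has seen much of life telling a story, full of the wisdom of years.)
Text
我这辈子啊,走南闯北六十多年。见过最热闹的集市,也见过最安静的戈壁。到头来才明白一个道理——这人哪,不在走了多远的路,在于记住了多少风景。年轻人,别光顾着赶路,偶尔也停下来看看天。
(English: In my lifetime I have travelled north and south for more than sixty years. I have seen the busiest markets and the quietest stretches of the Gobi. Only in the end did I understand one thing: what matters is not how far you have walked, but how much of the scenery you remember. Young people, don't only hurry along the road; stop now and then and look at the sky.)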
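Each sample above is an Instruct/Text pair. A minimal sketch of how such a pair might be turned into request messages follows. The role mapping (description in the `user` message, spoken text in the `assistant` message) is inferred from this page's sample code rather than from a published schema, and the function name `to_messages` is hypothetical.

```python
from typing import Dict, List


def to_messages(instruct: str, text: str) -> List[Dict[str, str]]:
    """Map one Instruct/Text pair onto chat messages (role mapping is an assumption)."""
    if not instruct.strip() or not text.strip():
        raise ValueError("both instruct and text must be non-empty")
    return [
        {"role": "user", "content": instruct},   # voice description
        {"role": "assistant", "content": text},  # text to be spoken
    ]


messages = to_messages(
    "Heavy Russian accent, gruff middle-aged male, blunt and matter-of-fact.",
    "You want my opinion? Fine. This plan will not work.",
)
print(messages)
```

Rejecting empty strings up front avoids sending a request that has no description or nothing to speak.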
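Choose the Access Method That Suits You
Pay-as-you-go API access
Sample code
Pass the voice description and the text content through messages to make the call. In the sample below, the user message carries the voice description and the assistant message carries the text to be spoken.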
import base64
import os

from openai import OpenAI

# Point the OpenAI SDK at the MiMo endpoint via base_url.
client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1",
)

completion = client.chat.completions.create(
    model="mimo-v2.5-tts-voicedesign",
    messages=[
        # Voice description
        {"role": "user", "content": "Give me a young male tone."},
        # Text to be spoken
        {"role": "assistant", "content": "Yes, I had a sandwich."},
    ],
    audio={"format": "wav", "optimize_text_preview": True},
)

# The audio comes back base64-encoded; decode it and write the WAV file.
message = completion.choices[0].message
audio_bytes = base64.b64decode(message.audio.data)
with open("audio_file.wav", "wb") as f:
    f.write(audio_bytes)
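Before writing the decoded bytes to disk, it can help to confirm that they really parse as WAV. The sketch below uses only the Python standard library. It assumes the response holds a complete PCM WAV file in base64, which this page does not state explicitly, and the helper names `decode_wav` and `wav_duration_seconds` are hypothetical. A synthetic one-second clip stands in for a real API response so the example runs offline.

```python
import base64
import io
import wave


def decode_wav(b64_data: str) -> bytes:
    """Decode base64 audio and check that it parses as a non-empty WAV file."""
    audio_bytes = base64.b64decode(b64_data)
    # wave.open raises wave.Error on a bad RIFF/WAVE header (EOFError if truncated).
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        if wav.getnframes() == 0:
            raise ValueError("WAV payload contains no audio frames")
    return audio_bytes


def wav_duration_seconds(audio_bytes: bytes) -> float:
    """Return the clip length computed from the frame count and sample rate."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Stand-in for message.audio.data: one second of 16 kHz, 16-bit mono silence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)
fake_b64 = base64.b64encode(buffer.getvalue()).decode("ascii")

decoded = decode_wav(fake_b64)
print(wav_duration_seconds(decoded))
```

With a real response, `decode_wav(message.audio.data)` would replace the direct `base64.b64decode` call, and the returned bytes can be written to the file as before.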