Generate short audio via MiniMax TTS (default: <20 seconds).
This skill is not restricted to Cantonese. Cantonese (Yue) is the default profile only.
If the user requests another language (e.g. 國語 / Mandarin, 英語 / English), generate in that language and choose appropriate --voice / --boost values.
Quick use
Create an mp3 (default Cantonese profile):
{baseDir}/scripts/tts.sh --text "今晚食咩好?" --output "./out.mp3"Recommended Cantonese defaults (already set in the script):
--boost "Chinese,Yue"--voice "Chinese (Mandarin)_HK_Flight_Attendant"(HK-accented voice; adjust if you have a better voice_id)- These are defaults, not a restriction. For 國語 / Mandarin, 英語 / English, or other languages, override
--voiceand (if useful)--boost.
Workflow (important)
- 先寫好文字,再生成語音 (draft text first, then generate audio).
- Always review the exact text in a readable form before TTS. The text should visibly include punctuation, spacing, pauses, ellipses, and tone/attitude cues.
- If the text does not read well on screen, the generated speech usually sounds worse.
- Practical writing cues for better output:
- punctuation for pauses:
,。?! - spacing / line breaks for rhythm and emphasis
- ellipses for hesitation / soft pause:
...or…… - interjections / tone words when needed (e.g.
啊,啦,喎)
- punctuation for pauses:
Configuration
Provide MINIMAX_API_KEY via OpenClaw skill env (recommended) or a local .env file.
Option B: .env next to the skill
Create:
{baseDir}/.env
With:
MINIMAX_API_KEY=...Notes / guardrails
- Keep requests short. For the “<20s” default, aim for roughly ⇐ 80–120 Chinese characters (depends on speed and punctuation).
- Use punctuation to control pauses.
- Follow the user’s requested language. Override the default Cantonese-oriented
--voice/--boostwhen generating 國語 / Mandarin, 英語 / English, or other languages. - Store generated files in an organized folder (not project root). Keep the source text and mp3 together using the same base name.
- Recommended pattern:
generated/minimax-tts/YYYY-MM-DD/HHMMSS-topic-v01.txtand.mp3 - Create the output directory first (
mkdir -p generated/minimax-tts/YYYY-MM-DD) before runningtts.sh. - If you need a different voice, pass
--voice <voice_id>.