Irodori-TTS-500M-v3 Demo

Flow-matching-based Japanese TTS model (500M parameters). It generates speech from text using rectified flow over DACVAE latents.

  • Reference audio: Optional. Upload a clip to condition the speaker voice. Leave blank for unconditional generation.
  • Duration: By default, v3 predicts the output duration automatically. Use Duration Scale for small adjustments or Seconds for exact manual control.
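The interaction between the duration controls can be sketched as follows. This is an illustration only, not the demo's actual code: the function name `resolve_duration`, its parameter names, and the precedence rule (an explicit Seconds value overrides the predicted duration, otherwise Duration Scale multiplies it) are assumptions made for the example.

```python
from typing import Optional


def resolve_duration(
    predicted_seconds: float,
    duration_scale: float = 1.0,
    seconds: Optional[float] = None,
) -> float:
    """Return the target output length in seconds.

    Hypothetical helper: assumes an explicit `seconds` value takes
    precedence, and that `duration_scale` otherwise multiplies the
    duration predicted by the model.
    """
    if seconds is not None:
        # Exact manual control: ignore the prediction entirely.
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        return seconds
    if duration_scale <= 0:
        raise ValueError("duration_scale must be positive")
    # Small adjustment around the automatically predicted duration.
    return predicted_seconds * duration_scale
```

Under these assumptions, `resolve_duration(4.0, duration_scale=1.5)` stretches a predicted 4-second utterance to 6 seconds, while `resolve_duration(4.0, seconds=3.0)` returns exactly 3 seconds regardless of the prediction.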