VoxCPM TTS — KVM4 (CPU-only)

CPU inference takes tens of seconds per phrase. The first request also downloads the model, which adds a couple of minutes.
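
Because of these latencies, a client probably needs a much longer timeout than the usual HTTP default. The sketch below shows one way to budget it. The function name and both constants are hypothetical: the values are padded guesses based on the rough figures above, not measured numbers.

```python
# Rough budgets derived from the note above; both values are assumptions,
# padded upward to leave headroom on a CPU-only host.
PER_PHRASE_SECONDS = 90.0        # "tens of seconds per phrase"
MODEL_DOWNLOAD_SECONDS = 180.0   # "couple of minutes" on the first request


def request_timeout(phrases: int = 1, first_request: bool = False) -> float:
    """Return a client-side timeout in seconds for one synthesis request."""
    if phrases < 1:
        raise ValueError("phrases must be at least 1")
    timeout = phrases * PER_PHRASE_SECONDS
    if first_request:
        # The first request also has to wait for the model download.
        timeout += MODEL_DOWNLOAD_SECONDS
    return timeout
```

The result can be passed as the `timeout` argument of whichever HTTP client is in use, for example `urllib.request.urlopen(url, timeout=request_timeout(first_request=True))`.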