
SoulX-Podcast-1.7B

Deploy SoulX-Podcast-1.7B on a Spheron GPU instance. SoulX-Podcast-1.7B is a 1.7B-parameter speech generation model that produces multi-speaker podcast dialogues with speaker switching, zero-shot voice cloning, and paralinguistic elements such as laughter and sighs. It supports English, Mandarin, and several Chinese dialects.

Key capabilities

  • Multi-speaker dialogue: Maintains consistency across turns and handles interruptions
  • Zero-shot voice cloning: Replicates voices from 10–30 second reference samples
  • Paralinguistics: Generates laughter, sighs, throat clearing, and intonation shifts
  • Multi-language: Supports English, Mandarin, Sichuanese, Henanese, and Cantonese
  • Efficient deployment: Runs on GPUs ranging from RTX 4060 to H100

Manual setup

Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.

Step 1: Provision a Spheron instance

  1. Sign up at app.spheron.ai
  2. Add credits (card/crypto)
  3. Click Deploy → select RTX 4090 (or RTX 4060+ for testing) → choose a region → Ubuntu 22.04 → add your SSH key → Deploy

See Getting Started or SSH Connection for details.

Step 2: Connect to your instance

ssh <user>@<ipAddress>

Replace <user> with the username shown in the instance details panel (e.g., root or ubuntu) and <ipAddress> with your instance's public IP.
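Once connected, it's worth confirming the GPU and driver are visible before installing anything. A small guarded check (a sketch; it falls back to a hint instead of erroring on hosts where the driver is not loaded):

```shell
# Print GPU name, VRAM, and driver version if the NVIDIA driver is loaded;
# otherwise print a hint rather than failing.
gpu_check() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
  else
    echo "nvidia-smi not found - is this a GPU instance with drivers installed?"
  fi
}
gpu_check
```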

Step 3: Set up the environment

sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Step 4: Install Python 3.11

sudo apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
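Before continuing, it can help to confirm that the interpreter and its pip resolve correctly. A quick check (a sketch; it prints a hint instead of a traceback if the PPA install did not take):

```shell
# Verify python3.11 and its pip are on PATH before building the conda env.
py_check() {
  if command -v python3.11 >/dev/null 2>&1; then
    python3.11 --version
    python3.11 -m pip --version 2>/dev/null || echo "pip for 3.11 missing - re-run ensurepip above"
  else
    echo "python3.11 not on PATH - re-run the apt install above"
  fi
}
py_check
```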

Step 5: Install Miniconda

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
$HOME/miniconda3/bin/conda init bash
source ~/.bashrc

Step 6: Create conda environment

conda create -n soulxpodcast -y python=3.11

Accept ToS if prompted:

conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

Step 7: Activate environment

conda activate soulxpodcast

Step 8: Clone the repository

git clone https://github.com/Soul-AILab/SoulX-Podcast.git
cd SoulX-Podcast

Step 9: Install dependencies

pip install -r requirements.txt
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
pip install "transformers==4.57.1" "huggingface_hub<1.0,>=0.34.0"
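A quick way to confirm the CUDA build of PyTorch imported cleanly (run inside the conda environment; this sketch prints a hint instead of a traceback if the install failed):

```shell
# Report the installed torch version and whether CUDA is usable.
python3 - <<'PY'
try:
    import torch
    print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch not importable - re-run the pip installs above")
PY
```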

Download models

Base model (English/Mandarin)

huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B \
  --local-dir pretrained_models/SoulX-Podcast-1.7B

Dialect model (Sichuanese/Henanese/Cantonese)

huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B-dialect \
  --local-dir pretrained_models/SoulX-Podcast-1.7B-dialect
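Before moving on, confirm the checkpoints actually landed on disk. The directory layout below matches the --local-dir flags above (this sketch prints a hint if nothing was downloaded):

```shell
# List downloaded model directories and their sizes.
du -sh pretrained_models/* 2>/dev/null \
  || echo "nothing under pretrained_models/ - re-run the download"
```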

Test the model

bash example/infer_dialogue.sh

Check the outputs/ directory for the generated .wav files.
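To sanity-check a generated file without extra dependencies, the Python standard library's wave module can report its format. A small helper (the example path is a placeholder; point it at a real file from outputs/):

```shell
# Print channel count, sample rate, and duration of a WAV file (stdlib only).
inspect_wav() {
  python3 - "$1" <<'PY'
import sys, wave
with wave.open(sys.argv[1], "rb") as w:
    rate = w.getframerate()
    print(w.getnchannels(), "ch,", rate, "Hz,",
          round(w.getnframes() / rate, 2), "s")
PY
}
# Example: inspect_wav outputs/your_dialogue.wav
```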

Launch the WebUI

Modify webui.py

Change share=False to share=True:

# In webui.py:
share=True
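If you prefer a non-interactive edit, sed can flip the flag in place. This assumes webui.py contains the literal text share=False; a .bak copy of the original is kept:

```shell
# Flip the Gradio share flag in place, keeping a backup of the original.
[ -f webui.py ] && sed -i.bak 's/share=False/share=True/' webui.py \
  || echo "webui.py not found - run this from the SoulX-Podcast directory"
```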

Start the WebUI

Base model:

python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B

Dialect model:

python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B-dialect

Open the Gradio link printed in the terminal (e.g., https://baccd06ba693323c35.gradio.live) to access the interface.

Troubleshooting

Issue: Low audio quality

Set a higher sample rate:

python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B --sample_rate 48000

Issue: Out-of-memory (OOM) error

Increase the CUDA memory split size:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Issue: Model download failures

Set a custom cache directory and retry the download:

export HF_HOME=/path/to/cache
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B

Issue: CUDA errors

Verify CUDA availability, check GPU status, and reinstall PyTorch if needed:

python -c "import torch; print(torch.cuda.is_available())"
nvidia-smi
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

Issue: Gradio interface not accessible

Ensure share=True is set in webui.py, then allow the port through the firewall or specify an alternate port:

sudo ufw allow 7860/tcp
python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B --server_port 8080

Cloud-init startup script (optional)

If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the environment setup. After the instance is ready, SSH in, activate the conda environment, and follow the Download Models and Launch WebUI steps.

#cloud-config
runcmd:
  - apt update && apt install -y software-properties-common curl ca-certificates git
  - add-apt-repository -y ppa:deadsnakes/ppa
  - apt update
  - apt install -y python3.11 python3.11-venv python3.11-dev
  - curl -fsSL -o /tmp/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  - bash /tmp/miniconda.sh -b -p /opt/miniconda3
  - /opt/miniconda3/bin/conda create -n soulxpodcast -y python=3.11
  - /opt/miniconda3/bin/conda run -n soulxpodcast pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
  - git clone https://github.com/Soul-AILab/SoulX-Podcast.git /opt/SoulX-Podcast
  - /opt/miniconda3/bin/conda run -n soulxpodcast pip install -r /opt/SoulX-Podcast/requirements.txt
  - /opt/miniconda3/bin/conda run -n soulxpodcast pip install "transformers==4.57.1" "huggingface_hub<1.0,>=0.34.0"
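On first login you can confirm the startup script finished before proceeding; cloud-init status --wait blocks until boot-time configuration is done (a sketch that degrades gracefully on hosts without cloud-init):

```shell
# Wait for cloud-init to finish, then show the tail of its log.
if command -v cloud-init >/dev/null 2>&1; then
  cloud-init status --wait
  tail -n 20 /var/log/cloud-init-output.log 2>/dev/null || true
else
  echo "cloud-init not present on this host"
fi
```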

After cloud-init completes, SSH in and download the model:

source /opt/miniconda3/bin/activate soulxpodcast
cd /opt/SoulX-Podcast
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B \
  --local-dir pretrained_models/SoulX-Podcast-1.7B
python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B

What's next