Text to Speech

Convert text to speech audio using OpenAI TTS. Use when the user wants to generate voiceovers, narration, or audio files from text, or to add voice to videos. Triggers include "text to speech", "TTS", "voiceover", "narration", "generate audio", "speak this text", "convert to audio", "voice generation".

Published by rebyteai

Convert text to high-quality speech audio using the OpenAI TTS API.

Authentication

IMPORTANT: All API requests require authentication. Get your auth token and API URL by running:

AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")

Send the token in the Authorization header of every request (Authorization: Bearer $AUTH_TOKEN), and use $API_URL as the base URL for all API endpoints.
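If the auth helper is missing or auth.json lacks a relay URL, the commands above leave a variable empty and later requests fail with less obvious errors. A minimal fail-fast check, assuming only the two variables set above; check_tts_auth is a hypothetical helper name, not part of the Rebyte tooling:

```shell
# check_tts_auth: hypothetical helper (not part of Rebyte) that verifies the
# variables from the Authentication step are non-empty before any request.
check_tts_auth() {
  if [ -z "${AUTH_TOKEN:-}" ]; then
    echo "error: AUTH_TOKEN is empty; run /home/user/.local/bin/rebyte-auth" >&2
    return 1
  fi
  if [ -z "${API_URL:-}" ]; then
    echo "error: API_URL is empty; check /home/user/.rebyte.ai/auth.json" >&2
    return 1
  fi
  return 0
}

# Usage in a script: stop early if authentication is not set up.
#   check_tts_auth || exit 1
```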

When to Use

Use this skill when the user needs to:

  • Generate voiceovers for videos
  • Create audio narration from text
  • Convert written content to spoken audio
  • Add voice to presentations or demos

Synthesize Speech

curl -X POST "$API_URL/api/data/tts/synthesize" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a sample voiceover.",
    "voice": "nova",
    "model": "tts-1",
    "format": "mp3",
    "speed": 1.0
  }'
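Writing the JSON body by hand inside single quotes breaks as soon as the text contains a quote, backslash, or newline. One workaround, assuming python3 is available (the Authentication step already relies on it), is to let json.dumps do the escaping. build_tts_payload is a hypothetical helper whose defaults mirror the Parameters table; it is a sketch, not part of the API:

```shell
# build_tts_payload TEXT [VOICE] [MODEL] [FORMAT] [SPEED]
# Hypothetical helper: prints a JSON request body for /api/data/tts/synthesize.
# json.dumps escapes quotes, backslashes and newlines in TEXT.
# (The Python code must start at column 0, so it is not indented.)
build_tts_payload() {
  python3 -c '
import json, sys
text, voice, model, fmt, speed = sys.argv[1:6]
print(json.dumps({
    "text": text,
    "voice": voice,
    "model": model,
    "format": fmt,
    "speed": float(speed),
}))
' "$1" "${2:-nova}" "${3:-tts-1}" "${4:-mp3}" "${5:-1.0}"
}

# Example (assumes AUTH_TOKEN and API_URL from the Authentication step):
#   curl -s -X POST "$API_URL/api/data/tts/synthesize" \
#     -H "Authorization: Bearer $AUTH_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(build_tts_payload "It's \"quoted\" text." onyx tts-1-hd mp3 0.95)"
```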

Example response (illustrative; the base64 value is truncated):

{
  "success": true,
  "audio": {
    "base64": "//uQxAAAAAANIAAAAAExBTUUzLjEw...",
    "format": "mp3",
    "mimeType": "audio/mpeg",
    "sizeBytes": 24576
  },
  "input": {
    "characterCount": 35,
    "wordCount": 7,
    "voice": "nova",
    "model": "tts-1",
    "speed": 1.0
  }
}

Save Audio to File

The audio is returned base64-encoded in the audio.base64 field. Decode it and write it to a file:

# Decode a base64 string copied from the response's audio.base64 field and save it as MP3
echo '<base64_audio_content>' | base64 -d > voiceover.mp3

Or in a script:

# Full workflow
RESPONSE=$(curl -s -X POST "$API_URL/api/data/tts/synthesize" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text here", "voice": "nova"}')

# Extract base64 and save
echo "$RESPONSE" | jq -r '.audio.base64' | base64 -d > voiceover.mp3
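The pipeline above decodes whatever .audio.base64 contains; if the request fails, jq prints null and base64 -d writes an unusable file. A more defensive sketch, assuming only the success and audio.base64 fields shown in the example response (the error body format is not documented here, so the helper simply prints the raw response). save_tts_audio is a hypothetical name, and python3 is used for JSON parsing as in the Authentication step:

```shell
# save_tts_audio OUTFILE < response.json
# Hypothetical helper: reads a synthesize response on stdin, checks that
# "success" is true, and decodes audio.base64 into OUTFILE.
save_tts_audio() {
  local outfile="$1" response audio_b64
  response="$(cat)"
  # Python exits non-zero unless the response is a JSON object with
  # success == true and a non-empty audio.base64 string.
  audio_b64="$(printf '%s' "$response" | python3 -c '
import json, sys
try:
    r = json.load(sys.stdin)
except ValueError:
    sys.exit(1)
if not isinstance(r, dict) or r.get("success") is not True:
    sys.exit(1)
audio = r.get("audio")
b64 = audio.get("base64") if isinstance(audio, dict) else None
if not b64:
    sys.exit(1)
print(b64)
')" || {
    echo "error: TTS request failed; response was:" >&2
    printf '%s\n' "$response" >&2
    return 1
  }
  printf '%s' "$audio_b64" | base64 -d > "$outfile"
}

# Usage:
#   curl -s -X POST "$API_URL/api/data/tts/synthesize" ... | save_tts_audio voiceover.mp3
```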

Parameters

Parameter  Type    Required  Default  Description
text       string  Yes       -        Text to convert (max 4096 characters, ~700 words)
voice      string  No        nova     Voice to use (see Available Voices)
model      string  No        tts-1    Quality model: tts-1 or tts-1-hd
format     string  No        mp3      Audio format: mp3, wav, opus, aac, or flac
speed      number  No        1.0      Speech speed, from 0.25 to 4.0
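Checking the limits from this table before calling avoids spending a request on input the API will reject. A minimal pre-flight sketch; validate_tts_params is a hypothetical helper, and the accepted values are taken only from the tables in this document:

```shell
# validate_tts_params TEXT [MODEL] [FORMAT] [SPEED]
# Hypothetical pre-flight check against the limits in the Parameters table.
# Returns 0 if the values look valid, 1 (with a message on stderr) otherwise.
validate_tts_params() {
  local text="$1" model="${2:-tts-1}" format="${3:-mp3}" speed="${4:-1.0}"

  # Depending on shell and locale, ${#text} counts characters or bytes.
  # Bytes over-count multibyte text, so the check errs on the safe side.
  if [ "${#text}" -eq 0 ] || [ "${#text}" -gt 4096 ]; then
    echo "error: text must be 1 to 4096 characters (got ${#text})" >&2
    return 1
  fi

  case "$model" in
    tts-1|tts-1-hd) ;;
    *) echo "error: unknown model '$model'" >&2; return 1 ;;
  esac

  case "$format" in
    mp3|wav|opus|aac|flac) ;;
    *) echo "error: unknown format '$format'" >&2; return 1 ;;
  esac

  # The shell has no floating-point comparison, so awk checks 0.25 <= speed <= 4.0.
  if ! awk -v s="$speed" 'BEGIN { exit !(s + 0 >= 0.25 && s + 0 <= 4.0) }'; then
    echo "error: speed must be between 0.25 and 4.0 (got $speed)" >&2
    return 1
  fi
  return 0
}
```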

Available Voices

Voice    Style                Best for
nova     Female, friendly     Narration, tutorials (recommended)
alloy    Neutral, versatile   General purpose
echo     Male, warm           Conversational content
fable    British, expressive  Storytelling, dramatic
onyx     Male, deep           Authoritative, professional
shimmer  Female, soft         Calm, soothing content

Quality Models

Model     Description
tts-1     Faster; good for drafts and testing
tts-1-hd  Higher quality; better for final output

Audio Formats

Format  Use case
mp3     Best compatibility; recommended for video
wav     Uncompressed, high quality
opus    Efficient streaming
aac     Apple devices
flac    Lossless compression

Handling Long Text

The API accepts at most 4096 characters (roughly 700 words) per request. For longer text:

  1. Split at sentence boundaries: break the text into chunks of about 3500 characters
  2. Call synthesize for each chunk: generate one audio file per chunk
  3. Concatenate with ffmpeg: combine the audio files in order

# Example: Combine multiple audio chunks
ffmpeg -i "concat:chunk1.mp3|chunk2.mp3|chunk3.mp3" -c copy final.mp3
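The splitting and joining steps can be sketched in shell. This is an illustrative approach, not the skill's own implementation: split_text_into_chunks and write_concat_list are hypothetical helpers, a sentence boundary is approximated as ".", "!" or "?" followed by whitespace (so abbreviations such as "e.g." also count as boundaries), and the list file targets ffmpeg's concat demuxer, an alternative to the concat: protocol used above.

```shell
# split_text_into_chunks [MAX_CHARS] < input.txt
# Prints one chunk per line. Each chunk holds whole sentences and is at most
# MAX_CHARS long (default 3500), unless a single sentence is longer than that.
# Newlines in the input are turned into spaces first.
split_text_into_chunks() {
  local max_chars="${1:-3500}"
  tr '\n' ' ' | awk -v max="$max_chars" '
    { text = text $0 }
    END {
      chunk = ""
      while (length(text) > 0) {
        # A sentence ends at ., ! or ? followed by whitespace.
        if (match(text, /[.!?][ \t]+/)) {
          sentence = substr(text, 1, RSTART)
          text = substr(text, RSTART + RLENGTH)
        } else {
          sentence = text
          text = ""
        }
        sub(/^[ \t]+/, "", sentence)
        sub(/[ \t]+$/, "", sentence)
        if (sentence == "") continue
        if (chunk == "") {
          chunk = sentence
        } else if (length(chunk) + 1 + length(sentence) <= max) {
          chunk = chunk " " sentence
        } else {
          print chunk
          chunk = sentence
        }
      }
      if (chunk != "") print chunk
    }'
}

# write_concat_list LIST_FILE AUDIO_FILE...
# Writes a list file for ffmpeg's concat demuxer, one "file '...'" line per
# audio file, in the order given.
write_concat_list() {
  local list_file="$1" f
  shift
  : > "$list_file"
  for f in "$@"; do
    # Inside single quotes, a literal ' must be written as '\''.
    printf "file '%s'\n" "$(printf '%s' "$f" | sed "s/'/'\\\\''/g")" >> "$list_file"
  done
}
```

Each output line of split_text_into_chunks is one request to the synthesize endpoint. After decoding the responses into zero-padded files such as chunk_001.mp3, chunk_002.mp3 (so a glob sorts them in order), write_concat_list chunks.txt chunk_*.mp3 followed by ffmpeg -f concat -safe 0 -i chunks.txt -c copy final.mp3 joins them. Relative paths in the list are resolved against the list file's directory.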

Combine with Video

Add the generated voiceover to a video file:

# Replace video audio with voiceover
ffmpeg -i video.mp4 -i voiceover.mp3 -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 output.mp4

# Mix voiceover with existing audio (voiceover at 80% volume)
ffmpeg -i video.mp4 -i voiceover.mp3 -filter_complex "[1:a]volume=0.8[voice];[0:a][voice]amix=inputs=2:duration=first" -c:v copy output.mp4

Example: Generate Narration

# Get auth
AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")

# Generate narration
curl -s -X POST "$API_URL/api/data/tts/synthesize" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product demo. Today I will show you the key features that make our solution stand out from the competition.",
    "voice": "nova",
    "model": "tts-1-hd",
    "speed": 0.95
  }' | jq -r '.audio.base64' | base64 -d > narration.mp3

echo "Saved narration.mp3"

Delivering Output

After generating audio files, upload them to the Artifact Store so the user can access them.

Tips

  • Use the nova voice for most narration; it sounds natural and friendly
  • Use the tts-1-hd model for final output and tts-1 for testing
  • Set speed to 0.9-0.95 for clearer narration
  • Prefer mp3 for video work; it has the best compatibility
  • Check the character count before calling and split any text over 4000 characters (the hard limit is 4096)


Compatible agents: Claude Code, Gemini CLI, Codex, Cursor, Windsurf, Amp

