Convert text to speech audio using OpenAI TTS. Use when user wants to generate voiceovers, narration, audio files from text, or add voice to videos. Triggers include "text to speech", "TTS", "voiceover", "narration", "generate audio", "speak this text", "convert to audio", "voice generation".
Published by rebyteai
Convert text to high-quality speech audio using the OpenAI TTS API.
IMPORTANT: All API requests require authentication. Get your auth token and API URL by running:
AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")
Include the token in all API requests as a Bearer token, and use $API_URL as the base for all API endpoints.
Use this skill when the user needs to generate voiceovers, narration, or audio files from text, or to add voice to videos.
curl -X POST "$API_URL/api/data/tts/synthesize" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a sample voiceover.",
    "voice": "nova",
    "model": "tts-1",
    "format": "mp3",
    "speed": 1.0
  }'
Response:
{
  "success": true,
  "audio": {
    "base64": "//uQxAAAAAANIAAAAAExBTUUzLjEw...",
    "format": "mp3",
    "mimeType": "audio/mpeg",
    "sizeBytes": 24576
  },
  "input": {
    "characterCount": 35,
    "wordCount": 7,
    "voice": "nova",
    "model": "tts-1",
    "speed": 1.0
  }
}
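The same request can also be assembled with Python's standard library. This is a minimal sketch, not an official client: `build_tts_request` is a hypothetical helper name, and the URL and token in the demonstration are placeholders. It only constructs the request object; sending it would be a call to `urllib.request.urlopen(req)`.

```python
import json
import urllib.request


def build_tts_request(api_url, auth_token, text, voice="nova", model="tts-1",
                      audio_format="mp3", speed=1.0):
    """Build (but do not send) a POST request for the synthesize endpoint."""
    payload = {
        "text": text,
        "voice": voice,
        "model": model,
        "format": audio_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/api/data/tts/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + auth_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration; use the real API URL and auth token.
req = build_tts_request("https://relay.example.invalid", "TOKEN", "Hello")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.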
After receiving the response, decode the base64 audio and save it:
# Extract base64 from response and save as MP3
echo '<base64_audio_content>' | base64 -d > voiceover.mp3
Or in a script:
# Full workflow
RESPONSE=$(curl -s -X POST "$API_URL/api/data/tts/synthesize" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text here", "voice": "nova"}')
# Extract base64 and save
echo "$RESPONSE" | jq -r '.audio.base64' | base64 -d > voiceover.mp3
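The decode-and-save step can be done in Python as well, with a check on the `success` flag before anything is written. This is a hedged sketch: `save_tts_audio` is a hypothetical helper, and the shape of an error response is an assumption (only the successful response is documented above), so the function simply raises and shows the raw body. The demonstration uses stand-in bytes, not real audio.

```python
import base64
import json
import os
import tempfile


def save_tts_audio(response_text, out_path):
    """Decode the base64 audio from a synthesize response and write it to out_path.

    Returns the number of bytes written.
    """
    response = json.loads(response_text)
    # The "success" flag comes from the sample response; what a failure looks
    # like is an assumption, so fail loudly and include the raw body.
    if not response.get("success") or "audio" not in response:
        raise RuntimeError("TTS request failed: " + response_text[:200])
    audio_bytes = base64.b64decode(response["audio"]["base64"])
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)


# Demonstration with a stand-in response (the payload is not real audio).
demo_response = json.dumps({
    "success": True,
    "audio": {
        "base64": base64.b64encode(b"not real audio").decode("ascii"),
        "format": "mp3",
    },
})
demo_path = os.path.join(tempfile.mkdtemp(), "voiceover.mp3")
written = save_tts_audio(demo_response, demo_path)
print(written, "bytes written to", demo_path)
```

Checking the response before decoding avoids silently writing an empty or corrupt `voiceover.mp3` when the request fails.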
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | - | Text to convert (max 4096 characters, ~700 words) |
| voice | string | No | nova | Voice selection (see below) |
| model | string | No | tts-1 | Quality model |
| format | string | No | mp3 | Audio format |
| speed | number | No | 1.0 | Speech speed (0.25 to 4.0) |
| Voice | Style | Best For |
|---|---|---|
| nova | Female, friendly | Narration, tutorials (recommended) |
| alloy | Neutral, versatile | General purpose |
| echo | Male, warm | Conversational content |
| fable | British, expressive | Storytelling, dramatic |
| onyx | Male, deep | Authoritative, professional |
| shimmer | Female, soft | Calm, soothing content |
| Model | Description |
|---|---|
| tts-1 | Faster, good for drafts and testing |
| tts-1-hd | Higher quality, better for final output |
| Format | Use Case |
|---|---|
| mp3 | Best compatibility, recommended for video |
| wav | Uncompressed, high quality |
| opus | Efficient streaming |
| aac | Apple devices |
| flac | Lossless compression |
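The values in the tables above can be checked on the client before a request is sent. The sketch below is illustrative only: `validate_tts_params` is a hypothetical helper, and it assumes the API counts characters the same way Python's `len()` does. How the server itself reports invalid parameters is not covered here.

```python
VOICES = {"nova", "alloy", "echo", "fable", "onyx", "shimmer"}
MODELS = {"tts-1", "tts-1-hd"}
FORMATS = {"mp3", "wav", "opus", "aac", "flac"}
MAX_CHARS = 4096  # per-request limit from the parameter table


def validate_tts_params(text, voice="nova", model="tts-1",
                        audio_format="mp3", speed=1.0):
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    if not text:
        problems.append("text is required")
    elif len(text) > MAX_CHARS:
        problems.append(f"text has {len(text)} characters (max {MAX_CHARS})")
    if voice not in VOICES:
        problems.append(f"unknown voice: {voice}")
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if audio_format not in FORMATS:
        problems.append(f"unknown format: {audio_format}")
    if not 0.25 <= speed <= 4.0:
        problems.append(f"speed {speed} is outside 0.25 to 4.0")
    return problems


print(validate_tts_params("Hello, this is a sample voiceover."))
print(validate_tts_params("", voice="bob", speed=5.0))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.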
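The API has a 4096-character limit (~700 words) per request. For longer text, split it into chunks that each fit within the limit, synthesize each chunk separately, and concatenate the resulting audio files: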
# Example: Combine multiple audio chunks
ffmpeg -i "concat:chunk1.mp3|chunk2.mp3|chunk3.mp3" -c copy final.mp3
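One way to prepare text that exceeds the limit is sketched below. `split_text` and `concat_command` are hypothetical helpers: the first breaks text into chunks of at most 4096 characters, preferring sentence boundaries (a simple punctuation heuristic, with a hard cut for any single sentence that is too long), and the second builds the same ffmpeg concat command shown above as an argument list. Each chunk would then be synthesized with its own request and saved as `chunk1.mp3`, `chunk2.mp3`, and so on.

```python
import re

MAX_CHARS = 4096  # per-request limit stated above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = (current + " " + sentence) if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_command(chunk_files, output="final.mp3"):
    """Build the ffmpeg concat command for a list of chunk files."""
    return ["ffmpeg", "-i", "concat:" + "|".join(chunk_files), "-c", "copy", output]


chunks = split_text("One. Two. Three.", max_chars=9)
files = [f"chunk{i}.mp3" for i in range(1, len(chunks) + 1)]
print(chunks)
print(concat_command(files))
```

Splitting at sentence boundaries keeps pauses natural at the joins; the argument list can be passed to `subprocess.run` once all chunk files exist.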
Add the generated voiceover to a video file:
# Replace video audio with voiceover
ffmpeg -i video.mp4 -i voiceover.mp3 -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 output.mp4
# Mix voiceover with existing audio (voiceover at 80% volume)
ffmpeg -i video.mp4 -i voiceover.mp3 -filter_complex "[1:a]volume=0.8[voice];[0:a][voice]amix=inputs=2:duration=first" -c:v copy output.mp4
# Get auth
AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")
# Generate narration
curl -s -X POST "$API_URL/api/data/tts/synthesize" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our product demo. Today I will show you the key features that make our solution stand out from the competition.",
    "voice": "nova",
    "model": "tts-1-hd",
    "speed": 0.95
  }' | jq -r '.audio.base64' | base64 -d > narration.mp3
echo "Saved narration.mp3"
After generating audio files, upload them to the Artifact Store so the user can access them.
- Use the nova voice for most narration; it sounds natural and friendly.
- Use the tts-1-hd model for final output and tts-1 for testing.
- Set speed to 0.9-0.95 for clearer narration.
- Use the mp3 format for video compatibility.