Transcribe audio to text using OpenAI Whisper. Use when user wants to convert speech to text, transcribe audio files, generate subtitles, or extract text from recordings. Triggers include "speech to text", "STT", "transcribe", "transcription", "subtitles", "captions", "audio to text", "convert audio to text".
Published by rebyteai
Transcribe audio to text using OpenAI Whisper API.
IMPORTANT: All API requests require authentication. Get your auth token and API URL by running:
AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")
Include the token in all API requests as a Bearer token, and use $API_URL as the base for all API endpoints.
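Before calling any endpoint, it can help to confirm that both values were actually populated. The following is a minimal sketch, not part of the Rebyte tooling: `require_auth` is a hypothetical helper name, and the check that `API_URL` starts with `http://` or `https://` is an assumption about what `relay_url` contains.

```shell
# Hypothetical helper: fail early if the auth values look unusable.
require_auth() {
  if [ -z "${AUTH_TOKEN:-}" ]; then
    echo "AUTH_TOKEN is empty; rerun rebyte-auth" >&2
    return 1
  fi
  case "${API_URL:-}" in
    http://*|https://*) ;;  # assumption: relay_url is an http(s) URL
    *)
      echo "API_URL does not look like a URL: '${API_URL:-}'" >&2
      return 1
      ;;
  esac
}
```

Calling `require_auth || exit 1` right after the two assignments stops a script before it sends a request with an empty Bearer token.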
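Use this skill when the user needs to convert speech to text, transcribe audio files, generate subtitles, or extract text from recordings.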
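The STT API uses a two-step flow because audio files are too large for JSON payloads: upload the file to a signed URL, then request the transcription by filename. First, request the upload URL: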
UPLOAD_RESPONSE=$(curl -s -X POST "$API_URL/api/data/stt/get_upload_url" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"filename": "recording.mp3",
"contentType": "audio/mpeg"
}')
UPLOAD_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.uploadUrl')
echo "$UPLOAD_RESPONSE" | jq .
Response:
{
"success": true,
"uploadUrl": "https://storage.googleapis.com/...(signed URL)...",
"filename": "recording.mp3",
"instructions": "Upload your file to this URL using PUT request, then call \"transcribe\" with the filename."
}
Then upload the file to the signed URL with a PUT request:
curl -X PUT "$UPLOAD_URL" \
-H "Content-Type: audio/mpeg" \
--data-binary @recording.mp3
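The `contentType` sent to `get_upload_url` and the `Content-Type` header on the PUT are the same value in the examples here, so deriving it once from the file extension avoids a mismatch. This is a sketch only: `content_type_for` is a hypothetical helper, and the MIME values beyond `audio/mpeg` are common conventions, not values confirmed by the API.

```shell
# Hypothetical helper: map an audio file name to a MIME type.
# Only audio/mpeg appears in the documented examples; the rest are assumptions.
content_type_for() {
  # Lowercase the extension so "TALK.MP3" and "talk.mp3" behave the same.
  ct_ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ct_ext" in
    mp3|mpga|mpeg) echo "audio/mpeg" ;;
    m4a|mp4)       echo "audio/mp4" ;;
    wav)           echo "audio/wav" ;;
    webm)          echo "audio/webm" ;;
    ogg)           echo "audio/ogg" ;;
    flac)          echo "audio/flac" ;;
    *)
      echo "unsupported extension: $ct_ext" >&2
      return 1
      ;;
  esac
}
```

For example, `CONTENT_TYPE=$(content_type_for recording.mp3)` yields `audio/mpeg`, which can then be used in both the JSON body and the PUT header.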
Finally, call transcribe with the same filename:
curl -s -X POST "$API_URL/api/data/stt/transcribe" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"filename": "recording.mp3",
"language": "en",
"response_format": "json"
}'
Response (json format):
{
"success": true,
"data": {
"text": "Hello, this is a transcription of the audio recording."
}
}
Response (verbose_json format):
{
"success": true,
"data": {
"task": "transcribe",
"language": "english",
"duration": 12.5,
"text": "Hello, this is a transcription of the audio recording.",
"segments": [
{
"start": 0.0,
"end": 3.2,
"text": "Hello, this is a transcription"
}
]
}
}
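The `start` and `end` fields in `segments` appear to be seconds expressed as decimals. If you build subtitle cues yourself from `verbose_json` instead of requesting `srt`, those values need converting to SRT's `HH:MM:SS,mmm` form. A minimal sketch, assuming `awk` is available; `srt_time` is a hypothetical helper name.

```shell
# Hypothetical helper: convert seconds (e.g. 3.2) to an SRT timestamp.
srt_time() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)   # round to the nearest millisecond
    printf "%02d:%02d:%02d,%03d\n",
      int(ms / 3600000), int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000
  }'
}

# Cue line for the sample segment above (start 0.0, end 3.2).
printf '%s --> %s\n' "$(srt_time 0.0)" "$(srt_time 3.2)"
# prints: 00:00:00,000 --> 00:00:03,200
```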
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `filename` | string | Yes | - | Name of the file uploaded via `get_upload_url` |
| `language` | string | No | auto | ISO-639-1 language code (e.g. "en", "es", "ja"); improves accuracy |
| `prompt` | string | No | - | Optional text to guide transcription style or continue a previous segment |
| `model` | string | No | `whisper-1` | Model to use (currently only `whisper-1`) |
| `response_format` | string | No | `json` | Output format (see below) |
| `temperature` | number | No | 0 | Sampling temperature (0-1). Lower = more deterministic |
| Format | Description | Use Case |
|---|---|---|
| `json` | Simple JSON with `text` field | Default, quick text extraction |
| `verbose_json` | JSON with timestamps, segments, duration | When you need segment timestamps |
| `text` | Plain text only | Simple text output |
| `srt` | SubRip subtitle format | Video subtitles |
| `vtt` | WebVTT subtitle format | Web video captions |
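When a script accepts the format as an argument, a small lookup can both reject unknown values and pick a sensible output file extension. This is an illustrative sketch; `extension_for_format` is a hypothetical helper, and the choice of extensions is a convention, not something the API prescribes.

```shell
# Hypothetical helper: validate response_format and suggest a file extension.
extension_for_format() {
  case "$1" in
    json|verbose_json) echo "json" ;;
    text)              echo "txt" ;;
    srt)               echo "srt" ;;
    vtt)               echo "vtt" ;;
    *)
      echo "unknown response_format: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `OUT="transcript.$(extension_for_format srt)"` sets `OUT` to `transcript.srt`.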
Whisper accepts: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac
Max file size: 25 MB
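Checking the size locally avoids uploading a file that will be rejected. The sketch below is an assumption-laden example: `check_audio_size` is a hypothetical helper, and it treats 25 MB as 25 × 1024 × 1024 bytes, which the page does not specify.

```shell
# Assumption: "25 MB" means 25 MiB; adjust if the service counts differently.
MAX_BYTES=$((25 * 1024 * 1024))

# Hypothetical helper: succeed only if the file exists and fits the limit.
# An optional second argument overrides the limit in bytes.
check_audio_size() {
  cas_file=$1
  cas_limit=${2:-$MAX_BYTES}
  if [ ! -f "$cas_file" ]; then
    echo "not a file: $cas_file" >&2
    return 1
  fi
  # Arithmetic expansion strips any padding that wc adds on some systems.
  cas_size=$(($(wc -c < "$cas_file")))
  if [ "$cas_size" -gt "$cas_limit" ]; then
    echo "$cas_file is $cas_size bytes, over the $cas_limit byte limit" >&2
    return 1
  fi
}
```

A script can run `check_audio_size interview.mp3 || exit 1` before requesting the upload URL.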
# Get auth
AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")
# 1. Get upload URL
UPLOAD_RESPONSE=$(curl -s -X POST "$API_URL/api/data/stt/get_upload_url" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"filename": "interview.mp3", "contentType": "audio/mpeg"}')
UPLOAD_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.uploadUrl')
# 2. Upload the audio file
curl -s -X PUT "$UPLOAD_URL" \
-H "Content-Type: audio/mpeg" \
--data-binary @interview.mp3
# 3. Transcribe
RESULT=$(curl -s -X POST "$API_URL/api/data/stt/transcribe" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"filename": "interview.mp3", "language": "en", "response_format": "json"}')
# 4. Extract text
echo "$RESULT" | jq -r '.data.text' > transcript.txt
echo "Transcript saved to transcript.txt"
# Transcribe with SRT format for subtitles
RESULT=$(curl -s -X POST "$API_URL/api/data/stt/transcribe" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"filename": "video-audio.mp3", "response_format": "srt"}')
# Save SRT file
echo "$RESULT" | jq -r '.data.text' > subtitles.srt
# Burn subtitles into video with ffmpeg
ffmpeg -i video.mp4 -vf subtitles=subtitles.srt output.mp4
- Specify `language` when you know it; this improves accuracy and speed.
- Use `verbose_json` when you need timestamps for syncing with video.
- Use `srt` or `vtt` format to generate subtitle files directly.
- Split long files into chunks before transcribing: `ffmpeg -i long.mp3 -f segment -segment_time 300 -c copy chunk_%03d.mp3`
- Set `temperature` to 0 (default) for the most accurate results.
- The `prompt` parameter helps with domain-specific terms; include key vocabulary the model should recognize.
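Once a long recording has been split with the `ffmpeg` segment command, each chunk still goes through the same upload and transcribe steps, and the texts are joined in order. The following is a rough sketch under stated assumptions: `transcribe_one` and `transcribe_chunks` are hypothetical helper names, every chunk is assumed to be MP3, `AUTH_TOKEN` and `API_URL` are assumed to be set as shown earlier, and error handling is minimal.

```shell
# Hypothetical helper: upload one file, transcribe it, print the text.
transcribe_one() {
  t_file=$1
  t_name=$(basename "$t_file")
  # Request the signed upload URL (jq builds the JSON so names are escaped).
  t_url=$(curl -s -X POST "$API_URL/api/data/stt/get_upload_url" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg f "$t_name" '{filename: $f, contentType: "audio/mpeg"}')" \
    | jq -er '.uploadUrl') || return 1
  # Upload the chunk to the signed URL.
  curl -s -X PUT "$t_url" -H "Content-Type: audio/mpeg" \
    --data-binary @"$t_file" >/dev/null || return 1
  # Transcribe; jq -e fails if .data.text is missing or null.
  curl -s -X POST "$API_URL/api/data/stt/transcribe" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg f "$t_name" '{filename: $f, response_format: "json"}')" \
    | jq -er '.data.text'
}

# Hypothetical helper: transcribe files in the order given, one text per line.
transcribe_chunks() {
  tc_out=$1
  shift
  : > "$tc_out"   # start with an empty output file
  for tc_chunk in "$@"; do
    transcribe_one "$tc_chunk" >> "$tc_out" || return 1
  done
}
```

Because the chunk names are zero-padded (`chunk_%03d.mp3`), `transcribe_chunks transcript.txt chunk_*.mp3` processes them in playback order. Timestamps from `verbose_json` or `srt` output would restart at zero for each chunk, so this sketch only covers plain text.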