Scrape websites and extract structured data from the internet using Apify actors. Use when you need to extract data from websites, crawl pages, scrape YouTube, TikTok, social media, or any web content. Triggers include "scrape website", "web scraping", "crawl site", "extract data", "scrape pages", "scrape youtube", "scrape tiktok", "data scraper".
Published by rebyteai

- Runs in the cloud: no local installation
- Dependencies pre-installed: ready to run instantly
- Secure VM environment: isolated per task
- Works on any device: desktop, tablet, or phone
Scrape websites and extract structured data using Apify actors.
This skill uses an Apify proxy — it accepts the same request/response format as the Apify API, but authentication is handled via your VM's sandbox token. You do NOT have an Apify API key and CANNOT call api.apify.com directly. All requests MUST go through the data proxy.
IMPORTANT: All API requests require authentication. Get your auth token and API URL by running:
```bash
AUTH_TOKEN=$(/home/user/.local/bin/rebyte-auth)
API_URL=$(python3 -c "import json; print(json.load(open('/home/user/.rebyte.ai/auth.json'))['sandbox']['relay_url'])")
```
Include the token in all API requests as a Bearer token, and use $API_URL as the base for all API endpoints.
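If you drive the proxy from Python rather than curl, the same two values wire up like this. A minimal sketch: the `sandbox`/`relay_url` layout is taken from the command above, and the token value is a placeholder.

```python
def relay_url(auth):
    """Extract the proxy base URL from a parsed auth.json dict
    (the 'sandbox'/'relay_url' layout matches the command above)."""
    return auth["sandbox"]["relay_url"]

def auth_headers(token):
    """Headers every proxy request needs: Bearer token plus JSON content type."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Pass the output of `rebyte-auth` in as `token`; the functions only assemble values, they do not fetch credentials themselves.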
NEVER print scrape results to stdout. Scrape results can be very large (transcripts, video lists, page content) and will be truncated in your context window. Always save results to a file under /code/ and then read only what you need.
```bash
# WRONG - output goes to stdout and gets truncated
curl -s -X POST "$API_URL/api/data/apify/run-actor" ...

# CORRECT - save to file, then inspect selectively
curl -s -X POST "$API_URL/api/data/apify/run-actor" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{...}' > /code/scrape-results.json
echo "Saved $(wc -c < /code/scrape-results.json) bytes to /code/scrape-results.json"

# Then read only what you need
cat /code/scrape-results.json | python3 -c "import sys,json; d=json.load(sys.stdin); print('Items:', d.get('itemCount',0))"
```
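To make "read only what you need" concrete, here is a hedged sketch of a summarizer. It assumes the saved response keeps items under an `items` key and that items carry `title`/`url` fields; both are assumptions that vary per actor, so confirm them on one item first.

```python
def summarize_results(results, max_items=3):
    """Return a small summary of a parsed scrape-results dict instead
    of echoing the full payload. The 'items', 'title' and 'url' field
    names are assumptions; check one item to confirm them."""
    items = results.get("items", [])
    # Keep only a couple of known-small keys from the first few items
    preview = [
        {k: it[k] for k in ("title", "url") if k in it}
        for it in items[:max_items]
    ]
    return {"itemCount": results.get("itemCount", len(items)), "preview": preview}
```

Run it against the saved file with `summarize_results(json.load(open('/code/scrape-results.json')))` and print only the returned summary, never the raw payload.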
Apify has thousands of pre-built scrapers ("actors") for virtually any website or data source. The data proxy handles Apify authentication on your behalf.
Your workflow for any scraping task:

1. Find the actor: pick one from the table below, or call list-actors.
2. Fetch its docs: call get-actor-docs to learn the exact input schema.
3. Run it: call run-actor and save the results under /code/.

All three steps use the same $API_URL and $AUTH_TOKEN; no other credentials are needed.
| Use Case | Actor ID |
|---|---|
| YouTube videos/channels/search | streamers/youtube-scraper |
| TikTok videos/hashtags/users | clockworks/tiktok-scraper |
| General web scraping (JS pages) | apify/web-scraper |
| General web scraping (static) | apify/cheerio-scraper |
| Extract text/markdown from sites | apify/website-content-crawler |
You can also list all available actors:
```bash
curl -s -X POST "$API_URL/api/data/apify/list-actors" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'
```
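When the list is long, filtering it locally beats eyeballing the JSON. A sketch, assuming the response lists actors under an `actors` key with `id` fields; adjust the names if the actual payload differs.

```python
def find_actors(response, keyword=""):
    """Return actor IDs from a parsed list-actors response that contain
    the keyword (case-insensitive). The 'actors'/'id' field names are
    assumptions about the payload shape."""
    ids = [a.get("id", "") for a in response.get("actors", [])]
    return [i for i in ids if keyword.lower() in i.lower()]
```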
CRITICAL: Before calling any actor, you MUST fetch its documentation to understand the exact input format. Each actor has completely different fields.
```bash
curl -s -X POST "$API_URL/api/data/apify/get-actor-docs" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"actorId": "streamers/youtube-scraper"}' > /code/actor-docs.json

# Read the input schema
cat /code/actor-docs.json | python3 -c "
import sys, json
d = json.load(sys.stdin)
schema = d.get('inputSchema', {})
print('=== INPUT FIELDS ===')
for name, prop in schema.get('properties', {}).items():
    req = '(required)' if name in schema.get('required', []) else '(optional)'
    print(f'  {name}: {prop.get(\"type\",\"?\")} {req} - {prop.get(\"title\",\"\")}')
"
```
Read the inputSchema carefully — it contains every field name, type, description, default value, and allowed enum values. Use the readme field for additional context and examples.
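Before running the actor, it is cheap to check a candidate input against the schema you just fetched. A minimal sketch using the same `properties`/`required` shape printed above:

```python
def missing_required(schema, candidate):
    """List required schema fields absent from a candidate input dict."""
    return [f for f in schema.get("required", []) if f not in candidate]

def unknown_fields(schema, candidate):
    """List candidate fields the schema does not declare; these usually
    signal a typo in a field name."""
    known = schema.get("properties", {})
    return [f for f in candidate if f not in known]
```

If either function returns a non-empty list, fix the input before calling run-actor; a bad field name often fails only after the run has consumed its timeout.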
Once you know the actor ID and its input format (from get-actor-docs), run it and save to a file:
```bash
curl -s -X POST "$API_URL/api/data/apify/run-actor" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "actorId": "owner/actor-name",
    "input": {
      ... fields from the input schema ...
    }
  }' > /code/scrape-results.json
echo "Saved $(wc -c < /code/scrape-results.json) bytes"

# Check if successful
cat /code/scrape-results.json | python3 -c "import sys,json; d=json.load(sys.stdin); print('Success:', d.get('success'), '| Items:', d.get('itemCount',0))"
```
```bash
# Step 1: We already know the actor ID: streamers/youtube-scraper

# Step 2: Fetch the actor's docs to learn the input format
curl -s -X POST "$API_URL/api/data/apify/get-actor-docs" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"actorId": "streamers/youtube-scraper"}' > /code/actor-docs.json

cat /code/actor-docs.json | python3 -c "
import sys, json
d = json.load(sys.stdin)
for name, prop in d.get('inputSchema',{}).get('properties',{}).items():
    print(f'  {name}: {prop.get(\"type\",\"?\")} - {prop.get(\"title\",\"\")}')
"

# Step 3: Run the actor and save results to file
curl -s -X POST "$API_URL/api/data/apify/run-actor" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "actorId": "streamers/youtube-scraper",
    "input": {
      "searchQueries": ["machine learning tutorial"],
      "maxResults": 10,
      "maxResultsShorts": 0,
      "maxResultStreams": 0
    }
  }' > /code/youtube-results.json
echo "Saved $(wc -c < /code/youtube-results.json) bytes"
cat /code/youtube-results.json | python3 -c "import sys,json; d=json.load(sys.stdin); print('Success:', d.get('success'), '| Items:', d.get('itemCount',0))"
```
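From there, pull out only the fields you care about from /code/youtube-results.json. A hedged sketch: the `items`, `title`, and `url` field names are assumptions about this actor's output, so verify them on a single item before relying on them.

```python
def video_summaries(results, limit=5):
    """Return (title, url) pairs for the first few items of a parsed
    results dict. Field names are assumptions; confirm them against
    one real item first."""
    return [
        (it.get("title"), it.get("url"))
        for it in results.get("items", [])[:limit]
    ]
```

Use it as `video_summaries(json.load(open('/code/youtube-results.json')))` and print only the returned pairs, keeping the bulky transcript and metadata out of your context window.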
If the actor is not allowed, the run fails with an actor_not_allowed error; check the available actors with list-actors:

```json
{
  "success": false,
  "error": "actor_not_allowed",
  "message": "Actor 'some/actor' is not allowed.",
  "allowedActors": [...]
}
```
If the run times out:
```json
{
  "success": false,
  "error": "timeout",
  "message": "Actor run exceeded the 5-minute sync timeout."
}
```
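Both failure shapes can be handled mechanically after each run. A sketch that maps the error codes shown above to a next action; the recovery hints in the comments are suggestions, not part of the API.

```python
def next_action(result):
    """Decide what to do with a parsed run-actor response, based on
    the error codes documented above."""
    if result.get("success"):
        return "ok"
    err = result.get("error")
    if err == "actor_not_allowed":
        # Pick a permitted actor from result.get("allowedActors") instead.
        return "switch_actor"
    if err == "timeout":
        # Shrink the job (e.g. lower maxResults) and rerun.
        return "retry_smaller"
    return "inspect_error"
```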