Data Refresh

When you create a view in Agent Context, the result is materialized into a fast local cache (DuckDB). This cache needs to be refreshed when your source data changes. Agent Context gives you three ways to control this.

Every view has a refresh interval that controls how often the cache is automatically rebuilt from source data.

Interval            Behavior
Every 5 minutes     Rebuilds cache every 5 minutes
Every 15 minutes    Rebuilds cache every 15 minutes
Every hour          Rebuilds cache every hour
Every 6 hours       Rebuilds cache every 6 hours
Every 24 hours      Rebuilds cache daily
Manual only         Never auto-refreshes. You trigger refreshes yourself.

Set the refresh interval when creating or editing a view — it’s the Refresh dropdown in the view editor.

When to use automatic intervals: Your source data changes on a predictable schedule (e.g., a database that updates hourly) and you’re okay with data being slightly stale between refreshes.

When to use Manual only: Your source data changes at unpredictable times (e.g., a Parquet file on S3 that gets replaced by an ETL pipeline) and you want to trigger refreshes yourself, either via the UI, a webhook, or S3 notifications.

Click Refresh Now in the view editor to immediately rebuild all view caches from source data. This is useful for:

  • Testing that your view SQL returns the expected results after a source data update
  • One-off refreshes when you know the source data just changed
  • Views set to “Manual only” that you want to update on demand

The refresh takes a few seconds. During the refresh, existing cached data remains queryable — there is no downtime.

For automated refresh triggers, you can create an inbound webhook — a URL that external systems call to trigger a refresh.

Your System (S3, ETL, cron)
|
| POST https://api.rebyte.ai/api/hooks/cl/clwh_...
|
v
Agent Context refreshes all view caches

When the webhook URL is called, Agent Context re-reads all sources and rebuilds all materialized view caches.

Your webhook URL is shown in the view editor — open any view and you’ll see it below the description field, with a Copy button.

The webhook is automatically created for your organization the first time you view it. One webhook per organization — it triggers a refresh for all views.

Call the webhook URL with a POST request (no body required):

curl -X POST https://api.rebyte.ai/api/hooks/cl/clwh_your_token_here

Returns 200 OK when the refresh completes.
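The same call works from a script, for example as the final step of an ETL job. A minimal sketch using only the Python standard library — the `AGENT_CONTEXT_WEBHOOK_URL` environment variable name is an assumption; substitute however you store your webhook URL:

```python
import os
import urllib.request


def trigger_refresh(webhook_url: str, timeout: float = 30.0) -> int:
    """POST to the refresh webhook (no body required) and return the HTTP status."""
    req = urllib.request.Request(webhook_url, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__" and "AGENT_CONTEXT_WEBHOOK_URL" in os.environ:
    # Env var name is illustrative -- set it to your webhook URL from the view editor.
    status = trigger_refresh(os.environ["AGENT_CONTEXT_WEBHOOK_URL"])
    print(status)
```

A non-200 response (or an exception) signals that the trigger did not go through, so you can fail the pipeline step rather than silently serve stale data.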

Wire your S3 bucket to call the webhook when a Parquet file changes:

  1. Copy your webhook URL from the view editor
  2. In AWS, set up an S3 Event Notification on your bucket
  3. Route the notification to an SNS topic or Lambda function
  4. Have it POST to your webhook URL

Now when your Parquet file updates, Agent Context automatically refreshes all views.
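Step 4 can be a very small Lambda function. A sketch, assuming the webhook URL is supplied to the function via an `AGENT_CONTEXT_WEBHOOK_URL` environment variable (the variable name is illustrative):

```python
import os
import urllib.request


def handler(event, context):
    """Lambda entry point for S3 event notifications: forward any batch of
    records as a single POST to the Agent Context refresh webhook."""
    records = event.get("Records", [])
    if not records:
        # Nothing to do for test events or empty batches.
        return {"forwarded": False, "records": 0}
    url = os.environ["AGENT_CONTEXT_WEBHOOK_URL"]  # illustrative env var name
    req = urllib.request.Request(url, method="POST")  # webhook needs no body
    with urllib.request.urlopen(req, timeout=30) as resp:
        return {"forwarded": True, "status": resp.status, "records": len(records)}
```

Because the webhook is rate-limited server-side, a batch of S3 events arriving together costs at most one refresh; the Lambda does not need its own debouncing.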

Webhooks are rate-limited to one trigger per 30 seconds. If called more frequently, extra calls are silently ignored.

For S3 datasets, you can enable automatic change detection. Instead of setting up a webhook manually, Agent Context provisions an SQS queue and monitors it for S3 event notifications.

S3 bucket (file changes)
|
| S3 Event Notification → SQS queue (auto-provisioned)
|
v
Agent Context detects change → refreshes dataset automatically

  1. Open your S3 dataset and click Edit
  2. Enable S3 Change Notifications
  3. Copy the SQS queue ARN shown in the setup instructions
  4. In your S3 bucket, go to Properties → Event notifications → Create event notification
  5. Set events to ObjectCreated and ObjectRemoved
  6. Set destination to SQS queue and paste the ARN
  7. Done — changes are detected automatically

Recommended: Use a prefix filter. When creating the S3 event notification, set a Prefix that matches your dataset’s path (e.g. golden/parquet/). This avoids sending events for unrelated files in the same bucket, which reduces unnecessary processing and prevents false triggers.
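If you script the bucket side (e.g. with the AWS CLI's put-bucket-notification-configuration), steps 4–6 plus the prefix filter correspond to a notification configuration like the following — a sketch in which the queue ARN and prefix are placeholders to replace with your own values:

```json
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:your-queue",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
      "Filter": {
        "Key": {
          "FilterRules": [{ "Name": "prefix", "Value": "golden/parquet/" }]
        }
      }
    }
  ]
}
```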

You can also enable notifications via the API by setting "notifications": true on a dataset.
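For example, from Python — a minimal sketch in which the request route, HTTP method, and auth header are assumptions; check the Agent Context API reference for the exact endpoint:

```python
import json
import urllib.request


def enable_notifications(base_url: str, dataset_id: str, token: str) -> int:
    """Set "notifications": true on a dataset and return the HTTP status.

    The route below is hypothetical -- consult the API reference for the
    real dataset-update path.
    """
    req = urllib.request.Request(
        f"{base_url}/v1/context-lake/datasets/{dataset_id}",  # hypothetical route
        data=json.dumps({"notifications": True}).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```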

When an S3 event is detected, Agent Context automatically refreshes any views that depend on the changed dataset. Dependencies are resolved via the view’s SQL — if a view references the dataset, it gets refreshed.

This means you don’t need to set up separate refresh triggers for views. Just enable S3 notifications on the source dataset, and downstream views stay up to date automatically.

Each S3 dataset with notifications enabled tracks:

  • Last event — when the most recent S3 change was detected
  • Last refresh — when the data was last refreshed
  • Event count — total S3 events received

View these on the dataset detail page or via GET /v1/context-lake/status.
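To watch these counters programmatically, you can poll the status endpoint. A sketch — the bearer-token auth scheme is an assumption, and the shape of the JSON body is whatever the endpoint actually returns:

```python
import json
import urllib.request


def fetch_status(base_url: str, token: str) -> dict:
    """GET /v1/context-lake/status and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{base_url}/v1/context-lake/status",
        headers={"Authorization": f"Bearer {token}"},  # auth scheme is an assumption
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Polling this from monitoring lets you alert when "last event" keeps advancing while "last refresh" does not, i.e. when changes are detected but refreshes are failing.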

Understanding the cache behavior helps you choose the right refresh strategy:

  • Views are cached in DuckDB files on the Agent Context server. Queries hit the cache, not your source.
  • Sources (datasets) are live — they query your database/S3/warehouse directly on every request. No cache, always fresh.
  • On restart, the cache is preserved. If your refresh interval hasn’t elapsed, no refresh happens — the existing cache is served immediately.
  • Manual-only views never refresh on restart. They serve from cache until you explicitly trigger a refresh.

This means you can safely stop and restart the Agent Context server without triggering expensive source queries.