Parquet Files
Parquet is the recommended format for Agent Context. It’s self-describing (schema embedded in the file), columnar (fast aggregations), and compressed. No configuration needed — just point at the file.
How It Works
When you connect a Parquet file (from S3, GCS, or local upload), Agent Context:
- Reads the embedded schema directly from the Parquet metadata
- Maps Parquet types to SQL types automatically
- Enables filter pushdown — queries that filter on columns read only the relevant row groups
No sampling, no inference, no guessing. The schema is exact.
Supported Types
All standard Parquet/Arrow types are supported:
| Category | Types |
|---|---|
| Numeric | Int8, Int16, Int32, Int64, Float32, Float64, Decimal128, Decimal256 |
| Text | Utf8 (String), LargeUtf8 |
| Binary | Binary, LargeBinary |
| Boolean | Boolean |
| Date/Time | Date32, Date64, Timestamp, Time32, Time64, Duration |
| Complex | List, LargeList, Map, Struct |
Configuration
Parquet files work with zero configuration. Just select Parquet as the file format (or let auto-detection pick it up from the .parquet extension).
Performance
Parquet files get automatic optimizations:
- Filter pushdown — `WHERE` clauses push down into Parquet row group/page filtering, skipping irrelevant data
- Column pruning — `SELECT col1, col2` only reads those columns from the file
- Page index — row-level filtering using Parquet page indexes (enabled by default)
Partitioned Datasets
Hive-style partitioned directories (e.g., year=2024/month=01/data.parquet) are supported by SpiceD but not currently configurable through the UI. Point at a single Parquet file or a flat directory of Parquet files for the best experience.
Limitations
| Limitation | Details |
|---|---|
| Complex types in queries | Struct, List, and Map columns are supported in the schema but may need explicit casting in some SQL contexts. |
| Hive partitioning not in UI | Partitioned directory structures are supported by the engine but not currently configurable through the UI. |
Best Practices
- Use Parquet when possible. It’s faster, smaller, and schema-exact compared to CSV or JSON.
- Use Snappy compression (default for most tools) — best speed/size tradeoff.
- Partition large datasets by date or category for faster filtered queries.