JSON & JSONL Files
Agent Context supports two JSON formats: JSONL (one JSON object per line) and JSON arrays (a single array of objects). Both infer schema by sampling, with optional nested object flattening.
Formats
JSONL (Recommended)
One JSON object per line. Also called NDJSON (Newline-Delimited JSON) or LDJSON.
```json
{"id": 1, "name": "Alice", "department": "Engineering"}
{"id": 2, "name": "Bob", "department": "Marketing"}
{"id": 3, "name": "Carol", "department": "Engineering"}
```

JSONL is preferred because it can be read in parallel — each line is independent.
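To illustrate why each line is independent, here is a minimal stdlib-only sketch of parsing JSONL record by record (the records and field names are the examples from above, not a real file):

```python
import json

# Each line of a JSONL file is a complete JSON document,
# so records can be parsed independently — and therefore in parallel.
lines = [
    '{"id": 1, "name": "Alice", "department": "Engineering"}',
    '{"id": 2, "name": "Bob", "department": "Marketing"}',
]
records = [json.loads(line) for line in lines]
print(records[0]["name"])  # Alice
```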
JSON Array
A single JSON array wrapping all records:
```json
[
  {"id": 1, "name": "Alice", "department": "Engineering"},
  {"id": 2, "name": "Bob", "department": "Marketing"},
  {"id": 3, "name": "Carol", "department": "Engineering"}
]
```

JSON arrays cannot be parallelized — the entire file is read as one partition. Use JSONL for large datasets.
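If you already have a JSON array file, converting it to JSONL is a one-pass transformation. A minimal sketch using only the standard library (the helper name is hypothetical):

```python
import json

def array_to_jsonl(array_text: str) -> str:
    """Convert a JSON array of objects into JSONL text, one object per line."""
    records = json.loads(array_text)
    return "\n".join(json.dumps(record) for record in records)

array_text = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
jsonl_text = array_to_jsonl(array_text)
print(jsonl_text)
```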
How It Works
When you connect a JSON file, Agent Context:
- Detects the format from the file extension (`.jsonl` → JSONL, `.json` → JSONL by default)
- Samples the first 1,000 records to infer the schema
- Maps JSON types to SQL types
- Optionally flattens nested objects into dot-notation columns
Schema Inference
| JSON value | Inferred SQL type |
|---|---|
| `42` | Int64 |
| `3.14` | Float64 |
| `"hello"` | Utf8 (String) |
| `true` / `false` | Boolean |
| `null` | Null (type taken from other rows) |
| `[1, 2, 3]` | List<Int64> |
| `{"key": "val"}` | Struct |
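The mapping above can be sketched as a small inference function. This is an illustrative approximation, not Agent Context's actual implementation:

```python
def infer_sql_type(value):
    """Map a parsed JSON value to the SQL type names from the table above."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Utf8"
    if value is None:
        return "Null"            # real inference takes the type from other rows
    if isinstance(value, list):
        inner = infer_sql_type(value[0]) if value else "Null"
        return f"List<{inner}>"
    if isinstance(value, dict):
        return "Struct"
    raise TypeError(f"unexpected JSON value: {value!r}")

print(infer_sql_type([1, 2, 3]))  # List<Int64>
```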
Multiple files: When pointing at a directory with multiple JSON files, schemas are inferred from each file and merged via Arrow’s Schema::try_merge(). If the same field has incompatible types across files, the merge may fail with a schema error.
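To see why merging can fail, consider a toy merge over per-file `{field: type}` maps. This simplified sketch mimics only the failure mode of Arrow's `Schema::try_merge`, not its actual widening rules:

```python
def try_merge(schemas):
    """Merge per-file schemas ({field: type}); raise on conflicting types."""
    merged = {}
    for schema in schemas:
        for field, sql_type in schema.items():
            if field in merged and merged[field] != sql_type:
                raise ValueError(
                    f"schema merge conflict on {field!r}: "
                    f"{merged[field]} vs {sql_type}"
                )
            merged[field] = sql_type
    return merged

file_a = {"id": "Int64", "name": "Utf8"}
file_b = {"id": "Utf8", "name": "Utf8"}   # 'id' is a string in this file
try:
    try_merge([file_a, file_b])
except ValueError as e:
    print(e)  # schema merge conflict on 'id': Int64 vs Utf8
```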
Nested Objects
By default, nested objects become Struct columns:
```json
{"user": {"name": "Alice", "age": 30}, "score": 95}
```

This creates a `user` column of type `Struct(name: Utf8, age: Int64)`. You can query nested fields using SQL dot notation in your views.
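Dot-notation flattening can be sketched like this (a simplified illustration, not the exact flattening Agent Context performs):

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-notation keys, e.g. {"user": {"name": ...}} -> {"user.name": ...}."""
    out = {}
    for key, value in obj.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=full_key + "."))
        else:
            out[full_key] = value
    return out

record = {"user": {"name": "Alice", "age": 30}, "score": 95}
print(flatten(record))  # {'user.name': 'Alice', 'user.age': 30, 'score': 95}
```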
Configuration
These options are available in the UI when adding a JSON data source:
| Option | Default | Description |
|---|---|---|
| File Format | auto-detect | Set to JSON or JSONL explicitly |
| Compression | auto-detect | For .json.gz or other compressed files |
| Schema Inference Sample Size | 1000 | How many records to sample for type inference. Increase if structure varies across records. |
Note: The UI currently defaults to JSONL format. If your file is a JSON array ([{...}, {...}]), select JSON as the format.
Limitations
| Limitation | Details |
|---|---|
| Nested arrays | Arrays within JSON objects remain as List columns and must be queried with SQL array functions. |
| Mixed types across records | If a field is an int in one record and a string in another, schema inference may widen the type or the merge may fail. |
| JSON arrays can’t parallelize | Large single-array files are read as one partition — slower than JSONL. |
| Schema merging conflicts | Multiple files with incompatible types for the same field will fail at schema merge time. |
| Sampling-based inference | Only the first N records are checked. Late type changes can cause query errors. |
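The sampling limitation can be checked up front by scanning the whole file for fields whose type changes after the sample window. A stdlib sketch (the helper name is hypothetical; the default matches the 1,000-record sample size above):

```python
import json

def type_changes_after_sample(lines, sample_size=1000):
    """Return fields whose JSON type differs between the sampled prefix and later records."""
    sampled_types, changed = {}, {}
    for i, line in enumerate(lines):
        record = json.loads(line)
        for field, value in record.items():
            type_name = type(value).__name__
            if i < sample_size:
                sampled_types.setdefault(field, type_name)
            elif field in sampled_types and sampled_types[field] != type_name:
                changed[field] = (sampled_types[field], type_name)
    return changed

# 'id' is an int within the sample window but a string afterwards —
# exactly the case that sampling-based inference would miss.
lines = ['{"id": 1}'] * 3 + ['{"id": "oops"}']
print(type_changes_after_sample(lines, sample_size=3))  # {'id': ('int', 'str')}
```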
Best Practices
- Use JSONL over JSON arrays for large datasets — enables parallel reading.
- Increase the Schema Inference Sample Size if your data has variable structure across records.
- Consider Parquet for production — no inference needed, exact schema, much faster.