
JSON & JSONL Files


Agent Context supports two JSON formats: JSONL (one JSON object per line) and JSON arrays (a single array of objects). Both infer schema by sampling, with optional nested object flattening.

JSONL: one JSON object per line. Also called NDJSON (Newline-Delimited JSON) or LDJSON.

{"id": 1, "name": "Alice", "department": "Engineering"}
{"id": 2, "name": "Bob", "department": "Marketing"}
{"id": 3, "name": "Carol", "department": "Engineering"}

JSONL is preferred because it can be read in parallel — each line is independent.
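The line independence above is what makes parallel reads possible: any chunk of the file that starts and ends on line boundaries can be parsed on its own. A minimal sketch (the `read_jsonl` helper is illustrative, not Agent Context's API):

```python
import json

def read_jsonl(text: str) -> list[dict]:
    """Parse JSONL: each non-empty line is an independent JSON document,
    so byte ranges split on line boundaries can go to separate workers."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

jsonl = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}'
records = read_jsonl(jsonl)
```

A real reader would split the file into byte ranges and seek to the next newline, but each worker still only ever needs `json.loads` per line.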

JSON array: a single array wrapping all records:

[
  {"id": 1, "name": "Alice", "department": "Engineering"},
  {"id": 2, "name": "Bob", "department": "Marketing"},
  {"id": 3, "name": "Carol", "department": "Engineering"}
]

JSON arrays cannot be parallelized — the entire file is read as one partition. Use JSONL for large datasets.
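The single-partition behavior follows from the format itself: an array is one JSON document, so the parser must consume the whole text before any record is usable. A sketch of the contrast:

```python
import json

# A JSON array is one document: brackets, commas, and nesting must all be
# balanced before parsing completes, so the file cannot be split into
# independent chunks the way JSONL lines can.
array_text = '[{"id": 1}, {"id": 2}, {"id": 3}]'
records = json.loads(array_text)  # one parse over the entire payload
```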

When you connect a JSON file, Agent Context:

  1. Detects the format from the file extension (.jsonl → JSONL, .json → JSONL by default)
  2. Samples the first 1,000 records to infer the schema
  3. Maps JSON types to SQL types
  4. Optionally flattens nested objects into dot-notation columns
| JSON value | Inferred SQL type |
| --- | --- |
| `42` | `Int64` |
| `3.14` | `Float64` |
| `"hello"` | `Utf8` (String) |
| `true` / `false` | `Boolean` |
| `null` | `Null` (type inferred from other rows) |
| `[1, 2, 3]` | `List<Int64>` |
| `{"key": "val"}` | `Struct` |
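The mapping in the table can be sketched as a small function. This is an illustrative sketch only; `infer_sql_type` is not Agent Context's actual API, and the real implementation also reconciles types across sampled rows:

```python
def infer_sql_type(value) -> str:
    """Map a single JSON value to a SQL type name, mirroring the table above."""
    if isinstance(value, bool):   # must precede int: bool is an int subclass in Python
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Utf8"
    if value is None:
        return "Null"
    if isinstance(value, list):
        inner = infer_sql_type(value[0]) if value else "Null"
        return f"List<{inner}>"
    if isinstance(value, dict):
        return "Struct"
    raise TypeError(f"unsupported JSON value: {value!r}")
```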

Multiple files: When pointing at a directory with multiple JSON files, schemas are inferred from each file and merged via Arrow’s Schema::try_merge(). If the same field has incompatible types across files, the merge may fail with a schema error.
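The merge behavior can be pictured with a simplified pure-Python analogue of Arrow's `Schema::try_merge` (a sketch over name-to-type maps, not the actual Arrow implementation, which also merges metadata and nested fields):

```python
def try_merge(schemas: list[dict]) -> dict:
    """Merge per-file schemas (field name -> type name). Fields present in
    only some files are kept; the same field with two different types raises."""
    merged: dict = {}
    for schema in schemas:
        for field, dtype in schema.items():
            if field in merged and merged[field] != dtype:
                raise ValueError(
                    f"schema merge conflict: field {field!r} is "
                    f"{merged[field]} in one file and {dtype} in another"
                )
            merged[field] = dtype
    return merged
```

So a directory where one file has `id` as `Int64` and another has `id` as `Utf8` fails at merge time rather than at query time.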

By default, nested objects become Struct columns:

{"user": {"name": "Alice", "age": 30}, "score": 95}

This creates a user column of type Struct(name: Utf8, age: Int64). You can query nested fields using SQL dot notation in your views.
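The optional flattening step can be sketched as a recursive walk that joins nested keys with dots (the `flatten` helper is hypothetical, shown only to illustrate the column naming):

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dot-notation columns, e.g.
    {"user": {"name": "Alice"}} -> {"user.name": "Alice"}."""
    flat: dict = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat

row = flatten({"user": {"name": "Alice", "age": 30}, "score": 95})
```

With flattening enabled, the example record above yields three top-level columns (`user.name`, `user.age`, `score`) instead of a `Struct` column.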

These options are available in the UI when adding a JSON data source:

| Option | Default | Description |
| --- | --- | --- |
| File Format | auto-detect | Set to JSON or JSONL explicitly |
| Compression | auto-detect | For `.json.gz` or other compressed files |
| Schema Inference Sample Size | 1000 | How many records to sample for type inference. Increase if structure varies across records. |

Note: The UI currently defaults to JSONL format. If your file is a JSON array ([{...}, {...}]), select JSON as the format.

| Limitation | Details |
| --- | --- |
| Nested arrays | Arrays within JSON objects remain as `List` columns and must be queried with SQL array functions. |
| Mixed types across records | If a field is an int in one record and a string in another, schema inference may widen the type or the merge may fail. |
| JSON arrays can't parallelize | Large single-array files are read as one partition, which is slower than JSONL. |
| Schema merging conflicts | Multiple files with incompatible types for the same field will fail at schema merge time. |
| Sampling-based inference | Only the first N records are checked; late type changes can cause query errors. |
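The sampling limitation is easy to reproduce: anything that first appears after the sample window is invisible to inference. A sketch (illustrative only; `infer_schema` is not Agent Context's code):

```python
import json

def infer_schema(lines: list[str], sample_size: int = 1000) -> dict:
    """Infer a field -> type-name map from only the first sample_size records.
    Type changes that first appear later are never seen."""
    schema: dict = {}
    for line in lines[:sample_size]:
        for field, value in json.loads(line).items():
            schema.setdefault(field, type(value).__name__)
    return schema

lines = ['{"id": 1}'] * 1000 + ['{"id": "oops"}']  # type changes after the sample
schema = infer_schema(lines)  # the late string value is never inspected
```

Here `id` is inferred as an integer column, and the string value in record 1,001 surfaces only as an error at query time; raising the sample size is the workaround.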
  • Use JSONL over JSON arrays for large datasets — enables parallel reading.
  • Increase the Schema Inference Sample Size if your data has variable structure across records.
  • Consider Parquet for production — no inference needed, exact schema, much faster.