JSON & JSONL Files
Agent Context supports two JSON formats: JSONL (one JSON object per line) and JSON arrays (a single array of objects). Both infer schema by sampling, with optional nested object flattening.
Formats
JSONL (Recommended)
One JSON object per line. Also called NDJSON (Newline-Delimited JSON) or LDJSON.
```json
{"id": 1, "name": "Alice", "department": "Engineering"}
{"id": 2, "name": "Bob", "department": "Marketing"}
{"id": 3, "name": "Carol", "department": "Engineering"}
```

JSONL is preferred because it can be read in parallel — each line is independent.
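To illustrate why each line is independent, here is a minimal stdlib-only sketch of parsing JSONL record by record (the records and field names are the examples from above, not a real file):

```python
import json

# Each line of a JSONL file is a complete JSON document,
# so records can be parsed independently — and therefore in parallel.
lines = [
    '{"id": 1, "name": "Alice", "department": "Engineering"}',
    '{"id": 2, "name": "Bob", "department": "Marketing"}',
]
records = [json.loads(line) for line in lines]
print(records[0]["name"])  # Alice
```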
JSON Array
A single JSON array wrapping all records:
```json
[
  {"id": 1, "name": "Alice", "department": "Engineering"},
  {"id": 2, "name": "Bob", "department": "Marketing"},
  {"id": 3, "name": "Carol", "department": "Engineering"}
]
```

JSON arrays cannot be parallelized — the entire file is read as one partition. Use JSONL for large datasets.
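If you already have a JSON array file, converting it to JSONL is a one-pass transformation. A minimal sketch using only the standard library (the helper name is hypothetical):

```python
import json

def array_to_jsonl(array_text: str) -> str:
    """Convert a JSON array of objects into JSONL text, one object per line."""
    records = json.loads(array_text)
    return "\n".join(json.dumps(record) for record in records)

array_text = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
jsonl_text = array_to_jsonl(array_text)
print(jsonl_text)
```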
How It Works
When you connect a JSON file, Agent Context:
- Detects the format from the file extension (`.jsonl` → JSONL, `.json` → JSONL by default)
- Samples the first 1,000 records to infer the schema
- Maps JSON types to SQL types
- Optionally flattens nested objects into dot-notation columns
Schema Inference
| JSON value | Inferred SQL type |
|---|---|
| `42` | Int64 |
| `3.14` | Float64 |
| `"hello"` | Utf8 (String) |
| `true` / `false` | Boolean |
| `null` | Null (type taken from other rows) |
| `[1, 2, 3]` | List<Int64> |
| `{"key": "val"}` | Struct |
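The mapping above can be sketched as a small inference function. This is an illustrative approximation, not Agent Context's actual implementation:

```python
def infer_sql_type(value):
    """Map a parsed JSON value to the SQL type names from the table above."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Utf8"
    if value is None:
        return "Null"            # real inference takes the type from other rows
    if isinstance(value, list):
        inner = infer_sql_type(value[0]) if value else "Null"
        return f"List<{inner}>"
    if isinstance(value, dict):
        return "Struct"
    raise TypeError(f"unexpected JSON value: {value!r}")

print(infer_sql_type([1, 2, 3]))  # List<Int64>
```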
Multiple files: When pointing at a directory with multiple JSON files, schemas are inferred from each file and merged via Arrow’s Schema::try_merge(). If the same field has incompatible types across files, the merge may fail with a schema error.
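To see why merging can fail, consider a toy merge over per-file `{field: type}` maps. This simplified sketch mimics only the failure mode of Arrow's `Schema::try_merge`, not its actual widening rules:

```python
def try_merge(schemas):
    """Merge per-file schemas ({field: type}); raise on conflicting types."""
    merged = {}
    for schema in schemas:
        for field, sql_type in schema.items():
            if field in merged and merged[field] != sql_type:
                raise ValueError(
                    f"schema merge conflict on {field!r}: "
                    f"{merged[field]} vs {sql_type}"
                )
            merged[field] = sql_type
    return merged

file_a = {"id": "Int64", "name": "Utf8"}
file_b = {"id": "Utf8", "name": "Utf8"}   # 'id' is a string in this file
try:
    try_merge([file_a, file_b])
except ValueError as e:
    print(e)  # schema merge conflict on 'id': Int64 vs Utf8
```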
Nested Objects
By default, nested objects become Struct columns:
```json
{"user": {"name": "Alice", "age": 30}, "score": 95}
```

This creates a `user` column of type `Struct(name: Utf8, age: Int64)`. You can query nested fields using SQL dot notation in your views.
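Dot-notation flattening can be sketched like this (a simplified illustration, not the exact flattening Agent Context performs):

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-notation keys, e.g. {"user": {"name": ...}} -> {"user.name": ...}."""
    out = {}
    for key, value in obj.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=full_key + "."))
        else:
            out[full_key] = value
    return out

record = {"user": {"name": "Alice", "age": 30}, "score": 95}
print(flatten(record))  # {'user.name': 'Alice', 'user.age': 30, 'score': 95}
```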
Configuration
These options are available in the UI when adding a JSON data source:
| Option | Default | Description |
|---|---|---|
| File Format | auto-detect | Set to JSON or JSONL explicitly |
| Compression | auto-detect | For .json.gz or other compressed files |
| Schema Inference Sample Size | 1000 | How many records to sample for type inference. Increase if structure varies across records. |
Note: The UI currently defaults to JSONL format. If your file is a JSON array ([{...}, {...}]), select JSON as the format.
Limitations
| Limitation | Details |
|---|---|
| Nested arrays | Arrays within JSON objects remain as List columns and must be queried with SQL array functions. |
| Mixed types across records | If a field is an int in one record and a string in another, schema inference may widen the type or the merge may fail. |
| JSON arrays can’t parallelize | Large single-array files are read as one partition — slower than JSONL. |
| Schema merging conflicts | Multiple files with incompatible types for the same field will fail at schema merge time. |
| Sampling-based inference | Only the first N records are checked. Late type changes can cause query errors. |
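The sampling limitation can be checked up front by scanning the whole file for fields whose type changes after the sample window. A stdlib sketch (the helper name is hypothetical; the default matches the 1,000-record sample size above):

```python
import json

def type_changes_after_sample(lines, sample_size=1000):
    """Return fields whose JSON type differs between the sampled prefix and later records."""
    sampled_types, changed = {}, {}
    for i, line in enumerate(lines):
        record = json.loads(line)
        for field, value in record.items():
            type_name = type(value).__name__
            if i < sample_size:
                sampled_types.setdefault(field, type_name)
            elif field in sampled_types and sampled_types[field] != type_name:
                changed[field] = (sampled_types[field], type_name)
    return changed

# 'id' is an int within the sample window but a string afterwards —
# exactly the case that sampling-based inference would miss.
lines = ['{"id": 1}'] * 3 + ['{"id": "oops"}']
print(type_changes_after_sample(lines, sample_size=3))  # {'id': ('int', 'str')}
```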
Best Practices
- Use JSONL over JSON arrays for large datasets — enables parallel reading.
- Increase the Schema Inference Sample Size if your data has variable structure across records.
- Consider Parquet for production — no inference needed, exact schema, much faster.