
CSV Files


CSV files don’t have an embedded schema, so Agent Context infers column types by sampling rows. This works well for most cases but has limitations you should know about.

When you connect a CSV file (from S3, GCS, or local upload), Agent Context:

  1. Reads the first 1,000 rows to infer column types
  2. Detects types automatically — integers, floats, strings, booleans
  3. Falls back to String for any column where values are ambiguous
  4. Assumes the first row is a header (column names)

Type detection samples the first N rows (default: 1,000) and picks the narrowest type that fits all sampled values:

| If sampled values look like… | Inferred type |
| --- | --- |
| `1`, `42`, `100` | Int64 |
| `1.5`, `3.14` | Float64 |
| `true`, `false` | Boolean |
| Anything else (including dates) | Utf8 (String) |
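Agent Context's internal implementation isn't published here, but the sampling rules above can be sketched in plain Python (`infer_type` and `infer_schema` are illustrative names, not a real API):

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type that fits every sampled value,
    falling back to Utf8 (string) on anything ambiguous."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all(v in ("true", "false") for v in values):
        return "Boolean"
    if fits(int):
        return "Int64"
    if fits(float):
        return "Float64"
    return "Utf8"

def infer_schema(csv_text, sample_size=1000):
    # First row is assumed to be the header; only the first
    # sample_size data rows participate in inference.
    rows = list(csv.DictReader(io.StringIO(csv_text)))[:sample_size]
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}

schema = infer_schema("id,price,active,created\n"
                      "1,1.5,true,2024-01-01\n"
                      "2,3.14,false,2024-02-01\n")
# → {'id': 'Int64', 'price': 'Float64', 'active': 'Boolean', 'created': 'Utf8'}
```

Note how the date column lands on Utf8: it fails both the integer and float checks, so it falls through to the string default, exactly as the table describes.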

Note: Date and timestamp values in CSV are inferred as strings, not date types. Use SQL CAST() to convert them in your queries or views.
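If you post-process query results in Python rather than casting in SQL, `datetime.strptime` does the equivalent conversion. The column values and the `"%Y-%m-%d"` format below are assumptions for illustration; adjust the format string to match your data:

```python
from datetime import datetime

# Date values from a CSV-backed source arrive as strings (Utf8).
# "%Y-%m-%d" is an assumed format; change it to match your file.
raw_created = ["2024-01-01", "2024-02-15"]
created = [datetime.strptime(s, "%Y-%m-%d").date() for s in raw_created]
# created[0] is now a datetime.date, not a string
```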

Important: If the first 1,000 rows of a column are all integers but row 1,001 is a string, the schema will say Int64 and queries may fail for that row. Increase the Schema Inference Sample Size setting if your data has mixed types deeper in the file.

These options are available in the UI when adding a CSV data source:

| Option | Default | Description |
| --- | --- | --- |
| File Format | auto-detect | Set to CSV or TSV explicitly if auto-detection doesn’t work |
| CSV Has Header | Yes | First line contains column names |
| CSV Delimiter | `,` | Column separator character |
| Compression | auto-detect | For `.csv.gz` or other compressed CSVs |
| Schema Inference Sample Size | 1000 | How many rows to sample for type inference; increase if types vary deeper in the file |

TSV Files

To load tab-separated files, select TSV as the file format, or use the `.tsv` file extension (auto-detected). TSV uses the tab character as its delimiter.
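The delimiter setting behaves the same way as in any CSV reader; for instance, Python's standard `csv` module parses TSV when handed a tab delimiter (the sample data here is made up):

```python
import csv
import io

# Tab-separated input: same parser as CSV, different delimiter.
tsv_text = "name\tqty\nwidget\t3\ngadget\t7\n"
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
# → [['name', 'qty'], ['widget', '3'], ['gadget', '7']]
```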

Known limitations:

| Limitation | Details |
| --- | --- |
| UTF-8 only | Non-UTF-8 encoded files will fail. Convert with `iconv` first. |
| Single-char delimiters | Multi-character delimiters are not supported; the delimiter must be a single character. |
| No nested data | CSV is flat. Nested JSON-like values in cells are treated as strings. |
| Type inference is sampling-based | Only the first N rows are checked. Late-appearing type changes can cause errors. |
| No schema evolution | If columns change between files, behavior is undefined. |
  • Increase the Schema Inference Sample Size if your CSV has inconsistent types across rows.
  • Use headers — files without headers get auto-generated column names (column_0, column_1, …).
  • Consider converting to Parquet for production workloads — exact schema, smaller files, faster queries.
  • Prefer UTF-8 encoding with BOM stripped.
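On the last tip: a UTF-8 BOM (`b"\xef\xbb\xbf"`) at the start of a file otherwise ends up glued to the first header name. In Python, the `utf-8-sig` codec strips it transparently, which this small comparison demonstrates:

```python
import csv
import io

# Same bytes, two decodings: plain utf-8 leaks the BOM into the
# first column name; utf-8-sig strips it.
data = b"\xef\xbb\xbfid,name\n1,alpha\n"

with_bom = csv.reader(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8"))
clean = csv.reader(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig"))

assert next(with_bom)[0] == "\ufeffid"  # BOM glued to the header
assert next(clean)[0] == "id"           # BOM stripped
```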