
CSV Files


CSV files don’t have an embedded schema, so Agent Context infers column types by sampling rows. This works well for most cases but has limitations you should know about.

When you connect a CSV file (from S3, GCS, or local upload), Agent Context:

  1. Reads the first 1,000 rows to infer column types
  2. Detects types automatically — integers, floats, strings, booleans
  3. Falls back to String for any column where values are ambiguous
  4. Assumes the first row is a header (column names)

Type detection samples the first N rows (default: 1,000) and picks the narrowest type that fits all sampled values:

| If sampled values look like… | Inferred type |
| --- | --- |
| `1`, `42`, `100` | Int64 |
| `1.5`, `3.14` | Float64 |
| `true`, `false` | Boolean |
| Anything else (including dates) | Utf8 (String) |
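Agent Context's internal implementation isn't published here, but the sampling rules above can be sketched in plain Python (`infer_type` and `infer_schema` are illustrative names, not a real API):

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type that fits every sampled value,
    falling back to Utf8 (string) on anything ambiguous."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all(v in ("true", "false") for v in values):
        return "Boolean"
    if fits(int):
        return "Int64"
    if fits(float):
        return "Float64"
    return "Utf8"

def infer_schema(csv_text, sample_size=1000):
    # First row is assumed to be the header; only the first
    # sample_size data rows participate in inference.
    rows = list(csv.DictReader(io.StringIO(csv_text)))[:sample_size]
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}

schema = infer_schema("id,price,active,created\n"
                      "1,1.5,true,2024-01-01\n"
                      "2,3.14,false,2024-02-01\n")
# → {'id': 'Int64', 'price': 'Float64', 'active': 'Boolean', 'created': 'Utf8'}
```

Note how the date column lands on Utf8: it fails both the integer and float checks, so it falls through to the string default, exactly as the table describes.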

Note: Date and timestamp values in CSV are inferred as strings, not date types. Use SQL CAST() to convert them in your queries or views.
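If you post-process query results in Python rather than casting in SQL, `datetime.strptime` does the equivalent conversion. The column values and the `"%Y-%m-%d"` format below are assumptions for illustration; adjust the format string to match your data:

```python
from datetime import datetime

# Date values from a CSV-backed source arrive as strings (Utf8).
# "%Y-%m-%d" is an assumed format; change it to match your file.
raw_created = ["2024-01-01", "2024-02-15"]
created = [datetime.strptime(s, "%Y-%m-%d").date() for s in raw_created]
# created[0] is now a datetime.date, not a string
```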

Important: If the first 1,000 rows of a column are all integers but row 1,001 is a string, the schema will say Int64 and queries may fail for that row. Increase the Schema Inference Sample Size setting if your data has mixed types deeper in the file.

These options are available in the UI when adding a CSV data source:

| Option | Default | Description |
| --- | --- | --- |
| File Format | auto-detect | Set to CSV or TSV explicitly if auto-detection doesn’t work |
| CSV Has Header | Yes | First line contains column names |
| CSV Delimiter | `,` | Column separator character |
| Compression | auto-detect | For `.csv.gz` or other compressed CSVs |
| Schema Inference Sample Size | 1000 | How many rows to sample for type inference; increase if types vary deeper in the file |

TSV Files

To load tab-separated files, select TSV as the file format, or use the `.tsv` file extension (auto-detected). TSV uses the tab character as its delimiter.
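The delimiter setting behaves the same way as in any CSV reader; for instance, Python's standard `csv` module parses TSV when handed a tab delimiter (the sample data here is made up):

```python
import csv
import io

# Tab-separated input: same parser as CSV, different delimiter.
tsv_text = "name\tqty\nwidget\t3\ngadget\t7\n"
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
# → [['name', 'qty'], ['widget', '3'], ['gadget', '7']]
```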

Known limitations:

| Limitation | Details |
| --- | --- |
| UTF-8 only | Non-UTF-8 encoded files will fail. Convert with `iconv` first. |
| Single-char delimiters | Multi-character delimiters are not supported; the delimiter must be a single character. |
| No nested data | CSV is flat. Nested JSON-like values in cells are treated as strings. |
| Type inference is sampling-based | Only the first N rows are checked. Late-appearing type changes can cause errors. |
| No schema evolution | If columns change between files, behavior is undefined. |
  • Increase the Schema Inference Sample Size if your CSV has inconsistent types across rows.
  • Use headers — files without headers get auto-generated column names (column_0, column_1, …).
  • Consider converting to Parquet for production workloads — exact schema, smaller files, faster queries.
  • Prefer UTF-8 encoding with BOM stripped.
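On the last tip: a UTF-8 BOM (`b"\xef\xbb\xbf"`) at the start of a file otherwise ends up glued to the first header name. In Python, the `utf-8-sig` codec strips it transparently, which this small comparison demonstrates:

```python
import csv
import io

# Same bytes, two decodings: plain utf-8 leaks the BOM into the
# first column name; utf-8-sig strips it.
data = b"\xef\xbb\xbfid,name\n1,alpha\n"

with_bom = csv.reader(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8"))
clean = csv.reader(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig"))

assert next(with_bom)[0] == "\ufeffid"  # BOM glued to the header
assert next(clean)[0] == "id"           # BOM stripped
```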