CSV Files
CSV files don’t have an embedded schema, so Agent Context infers column types by sampling rows. This works well for most cases but has limitations you should know about.
How It Works
When you connect a CSV file (from S3, GCS, or local upload), Agent Context:
- Reads the first 1,000 rows to infer column types
- Detects types automatically — integers, floats, strings, booleans
- Falls back to String for any column where values are ambiguous
- Assumes the first row is a header (column names)
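The sampling logic above can be sketched in plain Python. This is an illustrative model, not Agent Context's actual implementation; the function names and the type labels (Int64, Float64, Boolean, Utf8) are borrowed from the tables below.

```python
import csv
import io

def infer_column_type(values):
    """Pick the narrowest type that fits every sampled value (illustrative)."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all(v.lower() in ("true", "false") for v in values):
        return "Boolean"
    if fits(int):
        return "Int64"
    if fits(float):
        return "Float64"
    return "Utf8"       # fallback for anything ambiguous

def infer_schema(csv_text, sample_size=1000):
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row is assumed to be the header
    rows = [row for _, row in zip(range(sample_size), reader)]
    return {name: infer_column_type([row[i] for row in rows])
            for i, name in enumerate(header)}

schema = infer_schema("id,price,active,note\n1,9.99,true,hello\n2,3.50,false,world\n")
# schema: {'id': 'Int64', 'price': 'Float64', 'active': 'Boolean', 'note': 'Utf8'}
```

Note that only the sampled rows ever influence the result, which is exactly why late-appearing type changes can slip past inference.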
Schema Inference
Type detection samples the first N rows (default: 1,000) and picks the narrowest type that fits all sampled values:
| If sampled values look like… | Inferred type |
|---|---|
| 1, 42, 100 | Int64 |
| 1.5, 3.14 | Float64 |
| true, false | Boolean |
| Anything else (including dates) | Utf8 (String) |
Note: Date and timestamp values in CSV are inferred as strings, not date types. Use SQL CAST() to convert them in your queries or views.
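The conversion a `CAST()` performs in a query or view can be sketched client-side in Python. The format string here is an assumption about the data; adjust it to match your column:

```python
from datetime import date, datetime

# Values from a date column inferred as Utf8 arrive as plain strings:
raw = ["2024-01-15", "2024-02-01"]

# Convert explicitly; this is the same conversion CAST() would do in SQL.
parsed = [datetime.strptime(s, "%Y-%m-%d").date() for s in raw]
# parsed: [date(2024, 1, 15), date(2024, 2, 1)]
```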
Important: If the first 1,000 rows of a column are all integers but row 1,001 is a string, the schema will say Int64 and queries may fail for that row. Increase the Schema Inference Sample Size setting if your data has mixed types deeper in the file.
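A toy illustration of that failure mode, with a hypothetical column of 1,000 clean integers followed by one stray string:

```python
# Column values: 1,000 integers, then "N/A" at row 1,001.
column = [str(i) for i in range(1000)] + ["N/A"]

sample_size = 1000
sampled = column[:sample_size]
# Inference only sees the sample, so every value parses as an
# integer and the column is typed Int64.
assert all(s.isdigit() for s in sampled)

# A query that reaches row 1,001 then hits an unparseable value:
try:
    int(column[1000])
    error = None
except ValueError as exc:
    error = str(exc)

# With sample_size >= 1001 the stray string would be sampled and
# the column would fall back to Utf8 instead.
```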
Configuration
These options are available in the UI when adding a CSV data source:
| Option | Default | Description |
|---|---|---|
| File Format | auto-detect | Set to CSV or TSV explicitly if auto-detection doesn’t work |
| CSV Has Header | Yes | First line contains column names |
| CSV Delimiter | , | Column separator character |
| Compression | auto-detect | For .csv.gz or other compressed CSVs |
| Schema Inference Sample Size | 1000 | How many rows to sample for type inference. Increase if types vary deeper in the file. |
TSV Files
Select TSV as the file format, or use the .tsv file extension (auto-detected). TSV uses tab as the delimiter.
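Outside of Agent Context, the same idea holds anywhere: TSV is just CSV with a tab delimiter. A minimal sketch with Python's standard library:

```python
import csv
import io

tsv_data = "name\tscore\nada\t99\ngrace\t97\n"

# Tab as the delimiter; everything else works exactly like CSV.
rows = list(csv.reader(io.StringIO(tsv_data), delimiter="\t"))
# rows: [['name', 'score'], ['ada', '99'], ['grace', '97']]
```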
Limitations
| Limitation | Details |
|---|---|
| UTF-8 only | Non-UTF-8 encoded files will fail. Convert with iconv first. |
| Single-char delimiters | Only single-character delimiters are supported; multi-character delimiters are not. |
| No nested data | CSV is flat. Nested JSON-like values in cells are treated as strings. |
| Type inference is sampling-based | Only the first N rows are checked. Late-appearing type changes can cause errors. |
| No schema evolution | If columns change between files, behavior is undefined. |
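If you cannot use `iconv`, the same re-encoding can be done in Python. A sketch that converts a Latin-1 file's bytes to UTF-8 (the sample content is made up):

```python
# Re-encode a Latin-1 CSV to UTF-8, equivalent to:
#   iconv -f latin1 -t utf-8 in.csv > out.csv
latin1_bytes = "name,city\nJos\u00e9,M\u00e1laga\n".encode("latin-1")

utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
# utf8_bytes is now safe to upload as a UTF-8 CSV.
```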
Best Practices
- Increase the Schema Inference Sample Size if your CSV has inconsistent types across rows.
- Use headers — files without headers get auto-generated column names (column_0, column_1, …).
- Consider converting to Parquet for production workloads — exact schema, smaller files, faster queries.
- Prefer UTF-8 encoding with BOM stripped.
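To check for or strip a BOM yourself, Python's `utf-8-sig` codec handles it transparently. A small sketch, independent of Agent Context:

```python
import csv
import io

# First three bytes are a UTF-8 BOM:
bom_bytes = b"\xef\xbb\xbfid,name\n1,ada\n"

# 'utf-8-sig' drops the BOM if present; plain 'utf-8' would leave
# '\ufeffid' as the first column name.
text = bom_bytes.decode("utf-8-sig")
header = next(csv.reader(io.StringIO(text)))
# header: ['id', 'name']
```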