What is CSV?
CSV (comma-separated values) is a plain-text format for tabular data. Each line is a row, and commas separate the values within it (the columns). It's been around since the 1970s and refuses to die because it just works.
name,email,age
Alice,alice@example.com,28
Bob,bob@example.com,34
Charlie,charlie@example.com,22
That's it. No schema, no types, no fancy features. Just rows and commas.
Why CSV Exists
- Universal - Every spreadsheet app can read and write it
- Human readable - Open it in any text editor
- Lightweight - No overhead, just your data
- Easy to generate - Trivial to create programmatically
The trade-off? No data types, no nested structures, and a surprising number of edge cases that will haunt you.
The "Standard" (RFC 4180)
There's technically a spec, but many CSV files ignore it:
| Rule | Description |
|---|---|
| Delimiter | Comma (,) separates fields |
| Line ending | CRLF (\r\n) ends each row |
| Quoting | Fields with commas, quotes, or newlines must be quoted |
| Escaping quotes | Double the quote: "She said ""hello""" |
| Header row | Optional but recommended |
In practice, you'll see tabs, semicolons, pipes, and chaos.
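The quoting rules in the table can be seen in action with Python's standard csv module. This is a minimal sketch: the sample rows are invented for illustration, and it relies on the writer's default dialect, which uses comma delimiters, CRLF line endings, and quotes only the fields that need it.

```python
import csv
import io

# Hypothetical rows chosen to trigger each quoting rule.
rows = [
    ["name", "note"],
    ["Smith, John", "plain"],        # contains a comma
    ["Alice", 'She said "hello"'],   # contains double quotes
    ["Bob", "line one\nline two"],   # contains a newline
]

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: comma, CRLF, minimal quoting
writer.writerows(rows)
output = buffer.getvalue()

# repr() makes the \r\n row endings and the doubled quotes visible.
print(repr(output))
```

Only the three fields containing a comma, a quote, or a newline end up wrapped in quotes; everything else is written bare.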
Basic Examples
Simple Data
product,price,quantity
Widget,9.99,100
Gadget,24.99,50
Quoted Fields (Commas in Values)
name,address,city
"Smith, John","123 Main St",Boston
"Doe, Jane","456 Oak Ave",Chicago
Escaped Quotes
title,quote
Hamlet,"To be, or not to be"
Wisdom,"He said ""always quote your fields"""
Multiline Values
name,notes
Alice,"Line one
Line two
Line three"
Bob,"Single line"
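To see why multiline values trip up line-based tools, the sketch below parses the sample above twice: once by splitting on line breaks, and once with Python's csv module. The variable names are illustrative only.

```python
import csv
import io

# The multiline sample from above, held in memory instead of a file.
text = 'name,notes\nAlice,"Line one\nLine two\nLine three"\nBob,"Single line"\n'

# Naive approach: every physical line becomes a "row".
naive_rows = [line.split(",") for line in text.splitlines()]

# csv.reader tracks quotes, so the quoted newlines stay inside one field.
# newline="" keeps Python from translating the embedded line breaks.
parsed_rows = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive_rows))   # 5 physical lines
print(len(parsed_rows))  # 3 logical records
print(parsed_rows[1])
```

The naive split reports five rows, while the quote-aware reader reports the three records the file actually contains.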
Where You'll See This
- Spreadsheet exports - Excel, Google Sheets, Numbers
- Database dumps - Quick and dirty data export
- Data imports - Bulk uploads to web apps
- Log analysis - Server logs, analytics exports
- Financial data - Bank statements, trading data
- ETL pipelines - Moving data between systems
Common Gotchas
Excel often assumes a legacy encoding such as Windows-1252, not UTF-8, when it opens a CSV file. If your CSV has non-ASCII characters and looks garbled in Excel, add a UTF-8 BOM (\xEF\xBB\xBF) at the start of the file.
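One way to add the BOM from Python is the "utf-8-sig" codec, which writes the three BOM bytes on output and strips them on input. The sketch below assumes a throwaway file in a temporary directory; the file name and sample rows are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical output path in a temp directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "export.csv")

# "utf-8-sig" writes the BOM bytes EF BB BF before the first character.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "city"])
    writer.writerow(["Zoë", "München"])

with open(path, "rb") as f:
    first_bytes = f.read(3)
print(first_bytes)

# Reading with "utf-8-sig" strips the BOM so it doesn't leak into the first header.
with open(path, newline="", encoding="utf-8-sig") as f:
    header = next(csv.reader(f))
print(header)
```

Reading the same file with plain "utf-8" would leave the BOM character glued to the front of the first column name, which is a common source of mysterious key lookups failing.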
Many European locales use semicolons (;) as the field delimiter because the comma is already taken as the decimal separator (3,14 instead of 3.14). Always check the locale settings of whatever produced, or will consume, the file.
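Handling such a file usually means telling the parser about the delimiter and converting decimal commas yourself. A minimal sketch, assuming a made-up semicolon-delimited export with a price column:

```python
import csv
import io

# Hypothetical export from a locale that uses ";" as delimiter and "," as decimal mark.
text = "product;price\nWidget;9,99\nGadget;24,99\n"

reader = csv.DictReader(io.StringIO(text), delimiter=";")
prices = {}
for row in reader:
    # Swap the decimal comma for a point before converting to float.
    prices[row["product"]] = float(row["price"].replace(",", "."))

print(prices)
```

The replace-then-float step is deliberately simple; it would mangle values that also use a thousands separator, so treat it as a starting point.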
- No data types - Everything is a string. "123" and 123 are indistinguishable. Leading zeros disappear when Excel "helps" (007 becomes 7).
- Inconsistent quoting - Some tools quote everything, some quote nothing, some quote only when needed. Be liberal in what you accept.
- Null values - Is an empty field null, an empty string, or "NULL"? Nobody agrees.
- Newlines in values - Perfectly valid, but many parsers choke on them.
- Trailing commas - Does a,b,c, have 3 or 4 columns? Depends on the parser.
- Large files - CSV has no streaming hints. A 10GB file means loading 10GB into memory for naive parsers.
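Several of these gotchas show up in a few lines of Python. The sketch below uses an invented sample with leading zeros, an empty field, and a trailing comma; the to_int_or_none helper is a hypothetical convention for this example, not anything the csv module provides.

```python
import csv
import io

# Hypothetical sample: leading zeros, an empty field, and a trailing comma.
text = "id,name,score\n007,Bond,\n008,Bill,42,\n"

rows = list(csv.reader(io.StringIO(text)))

print(rows[1])  # every value is a string, so the leading zeros survive
print(rows[2])  # the trailing comma produces an extra empty field


def to_int_or_none(value):
    """One possible convention (an assumption, not a standard): empty means missing."""
    return int(value) if value != "" else None


scores = [to_int_or_none(row[2]) for row in rows[1:]]
print(scores)
```

Python's reader treats the trailing comma as a fourth, empty column; other parsers may disagree. For large files, note that csv.reader yields rows lazily, so looping over it directly instead of calling list() avoids holding the whole file in memory.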
CSV vs Alternatives
| Format | Best For | Drawback |
|---|---|---|
| CSV | Tabular data, spreadsheets | No types, quoting edge cases |
| TSV | Data with commas | Tabs in data still break it |
| JSON | Nested/typed data | Larger, harder to edit manually |
| Parquet | Big data, analytics | Binary, not human-readable |
| Excel | Rich spreadsheets | Proprietary, large files |
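The CSV versus JSON row of the table can be illustrated by converting the sample from the top of this page. This is a rough sketch: the explicit int() conversion is an assumption about which column should be numeric, since the CSV itself carries no type information.

```python
import csv
import io
import json

text = "name,email,age\nAlice,alice@example.com,28\nBob,bob@example.com,34\n"

records = []
for row in csv.DictReader(io.StringIO(text)):
    row["age"] = int(row["age"])  # CSV carries no types, so convert explicitly
    records.append(row)

as_json = json.dumps(records, indent=2)
print(as_json)
print(len(text), len(as_json))  # character counts: CSV first, JSON second
```

The JSON version keeps age as a number but repeats every key on every record, which is where the extra size comes from.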
In Code
// Simple parsing (don't use in production: it breaks on quoted commas and newlines)
const rows = csvString.split('\n').map(row => row.split(','));
// Proper parsing with a library (Papa Parse)
import Papa from 'papaparse';
const result = Papa.parse(csvString, {
  header: true,         // First row is header
  dynamicTyping: true,  // Convert numbers
  skipEmptyLines: true
});
// result.data = [{name: "Alice", age: 28}, ...]
// Generate CSV
const data = [
  ['name', 'email'],
  ['Alice', 'alice@example.com'],
  ['Bob', 'bob@example.com']
];
const csv = data.map(row => row.join(',')).join('\n');
// With proper escaping
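function escapeCSV(value) {
  const text = String(value); // Coerce numbers, booleans, etc. so .replace() is safe
  if (/[,"\n\r]/.test(text)) {
    // Wrap in quotes and double any embedded quotes
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}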
# Python's csv module handles edge cases
import csv
# Read
with open('data.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['name'], row['email'])
# Write
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'email'])
    writer.writerow(['Alice', 'alice@example.com'])
"CSV: the file format that's one misplaced comma away from ruining your entire afternoon."