DataQL
A powerful CLI tool for querying and transforming data across multiple formats
DataQL is a CLI tool written in Go that lets you query and manipulate data files using SQL statements. It loads the data into an SQLite database (in-memory or file-based), so the full range of SQLite queries is available on your files.
Why DataQL?
The Problem
Working with data files has always been tedious. You either write throwaway scripts, load everything into pandas, or copy-paste into spreadsheets. With LLMs entering the workflow, a new problem emerged: how do you analyze a 10MB CSV without burning through your entire context window?
Traditional approaches fail:
- Send file to LLM context: 10MB CSV = ~100,000+ tokens. Expensive, slow, often impossible.
- Write a script: Context switch, setup overhead, not conversational.
- Use pandas/Excel: Great for humans, useless for LLM automation.
The Solution
DataQL lets you query any data file using SQL. One command, instant results:
# Instead of sending 50,000 rows to an LLM...
dataql run -f sales.csv -q "SELECT region, SUM(revenue) FROM sales GROUP BY region"
# You get just what you need:
# region | SUM(revenue)
# North | 1,234,567
# South | 987,654
Why This Matters
| Scenario | Without DataQL | With DataQL |
|---|---|---|
| Analyze 10MB CSV with LLM | ~100,000 tokens ($3+) | ~500 tokens ($0.01) |
| Query data from S3 | Download → Script → Parse | One command |
| Join CSV + JSON + Database | Custom ETL pipeline | Single SQL query |
| Automate data reports | Complex scripts | Simple CLI + cron |
| LLM data analysis | Context overflow | No size limit |
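The token figures in the table can be sanity-checked with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes a rough heuristic of about 4 characters per token (not a real tokenizer), uses synthetic sales data, and runs the aggregation through Python's `sqlite3` module rather than DataQL itself.

```python
import sqlite3

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, real tokenizers vary


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


# Synthetic stand-in for a large sales.csv (hypothetical data).
regions = ["North", "South", "East", "West"]
rows = [(regions[i % 4], 100 + i % 900) for i in range(50_000)]
raw_csv = "region,revenue\n" + "".join(f"{r},{v}\n" for r, v in rows)

# Aggregate in SQLite, so only the summary would need to reach the LLM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
summary = conn.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
summary_text = "\n".join(f"{region} | {total}" for region, total in summary)

raw_tokens = estimate_tokens(raw_csv)
summary_tokens = estimate_tokens(summary_text)
print(f"raw file: ~{raw_tokens} tokens, query result: ~{summary_tokens} tokens")
```

The exact ratio depends on the data and the tokenizer, but the point holds: the aggregated result is a few lines regardless of how many rows went in.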
Key Benefits
- Token Efficient: LLMs get query results, not raw data. In the 10MB CSV scenario above (~100,000 tokens versus ~500), that is a reduction of more than 99% in token usage.
- Universal Format Support: CSV, JSON, Parquet, Excel, XML, YAML, Avro, ORC - all queryable with SQL.
- Any Data Source: Local files, URLs, S3, GCS, Azure, PostgreSQL, MySQL, MongoDB.
- LLM-Native: Built-in MCP server for Claude, Codex, Gemini. Skills for Claude Code.
- Zero Setup: Single binary, no dependencies, no configuration files.
- Familiar Syntax: If you know SQL, you know DataQL.
Features
Supported File Formats:
- CSV (with configurable delimiter)
- JSON (arrays or single objects)
- JSONL/NDJSON (newline-delimited JSON)
- XML
- YAML
- Parquet
- Excel (.xlsx, .xls)
- Avro
- ORC
Data Sources:
- Local files
- HTTP/HTTPS URLs
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- Standard input (stdin)
Database Connectors:
- PostgreSQL
- MySQL
- DuckDB
- MongoDB
Key Capabilities:
- Execute SQL queries using SQLite syntax
- Export results to CSV or JSONL formats
- Interactive REPL mode with command history
- Progress bar for large file operations
- Parallel file processing for multiple inputs
- Automatic flattening of nested JSON objects
- Join data from multiple sources
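To illustrate what flattening nested JSON objects means in practice, here is a minimal sketch. It is not DataQL's implementation: the `flatten` helper and the underscore separator are assumptions made for this example, and the column names DataQL actually generates may differ.

```python
def flatten(obj, parent_key="", sep="_"):
    """Recursively turn nested dicts into a single-level dict.

    The separator is a hypothetical choice for this sketch.
    """
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested objects, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


record = {"id": 1, "user": {"name": "Alice", "address": {"city": "Lisbon"}}}
flat = flatten(record)
print(flat)
```

Once every record is a flat mapping, each key can become a table column, which is what makes nested JSON queryable with plain SQL.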
LLM Integration:
- MCP Server for Claude Code, OpenAI Codex, Google Gemini
- Auto-activating Claude Code Skills
- Token-efficient data processing for AI assistants
Quick Start
Installation
Linux / macOS:
curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash
Windows (PowerShell):
irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex
Hello World
# Create a sample CSV file
printf 'id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n' > users.csv
# Query the data
dataql run -f users.csv -q "SELECT * FROM users WHERE age > 30"
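Conceptually, the Hello World query works because the file is loaded into an SQLite table named after the file (`users.csv` becomes `users`). The following sketch reproduces that idea with Python's standard library. It is an approximation for illustration, not DataQL's Go code: the `load_csv` helper is hypothetical, and declaring every column as `NUMERIC` is an assumption made here so that values like `35` compare as numbers.

```python
import csv
import io
import sqlite3
from pathlib import Path


def load_csv(conn, table, text):
    """Load CSV text into a new table, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # NUMERIC affinity (an assumption of this sketch) stores '35' as the
    # integer 35 while leaving non-numeric text such as 'Bob' unchanged.
    cols = ", ".join(f'"{name}" NUMERIC' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


csv_text = "id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n"
conn = sqlite3.connect(":memory:")
load_csv(conn, Path("users.csv").stem, csv_text)  # table name: "users"

result = conn.execute("SELECT * FROM users WHERE age > 30").fetchall()
print(result)  # [(2, 'Bob', 35), (3, 'Charlie', 42)]
```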
Basic Usage
# Query a CSV file
dataql run -f data.csv -q "SELECT * FROM data WHERE amount > 100"
# Query a JSON file
dataql run -f users.json -q "SELECT name, email FROM users WHERE status = 'active'"
# Query from URL
dataql run -f "https://example.com/data.csv" -q "SELECT * FROM data"
# Query from S3
dataql run -f "s3://my-bucket/data.csv" -q "SELECT * FROM data"
# Query from PostgreSQL
dataql run -f "postgres://user:pass@localhost/db?table=users" -q "SELECT * FROM users"
# Read from stdin
cat data.csv | dataql run -f - -q "SELECT * FROM stdin"
# Export results
dataql run -f input.csv -q "SELECT * FROM input" -e output.jsonl -t jsonl
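For readers unfamiliar with the JSONL export format, the sketch below shows what such an export amounts to: one JSON object per result row, one row per line. It uses Python's `sqlite3` and `json` modules with a small hypothetical `input` table, and is not taken from DataQL's source.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO input VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

cursor = conn.execute("SELECT * FROM input ORDER BY id")
# Column names come from the cursor metadata of the executed query.
columns = [description[0] for description in cursor.description]

# One JSON object per row, newline-terminated: the JSONL/NDJSON layout.
lines = [json.dumps(dict(zip(columns, row))) for row in cursor]
jsonl = "\n".join(lines) + "\n"
print(jsonl, end="")
```

Because each line is an independent JSON document, the output can be streamed or processed line by line without parsing the whole file.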
Interactive Mode
dataql run -f sales.csv
dataql> .tables
dataql> .schema sales
dataql> SELECT product, SUM(amount) as total FROM sales GROUP BY product ORDER BY total DESC;
dataql> .exit
Documentation
- Getting Started - Installation and Hello World examples
- CLI Reference - Complete command-line reference
- Data Sources - Working with S3, GCS, Azure, URLs, and stdin
- Database Connections - Connect to PostgreSQL, MySQL, DuckDB, MongoDB
- LLM Integration - Use DataQL with Claude, Codex, Gemini
- MCP Setup - Configure MCP server for LLM integration
- Examples - Real-world usage examples and automation scripts
About
A rewrite of csvql (2019), built entirely with AI assistance. An experiment in AI-assisted development that turned out pretty well.
License
This project is licensed under the MIT License - see the LICENSE file for details.