DataQL

A powerful CLI tool for querying and transforming data across multiple formats

DataQL is a CLI tool written in Go that lets you query and manipulate data files using SQL statements. It loads the data into an SQLite database (in-memory or file-based), so the full set of SQLite operations is available on your data.
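
To make the load-then-query idea concrete, here is a minimal Python sketch of the same approach using only the standard library. It is an illustration of the concept, not DataQL's actual implementation: the `load_csv` helper is hypothetical, and storing every column as TEXT is a simplifying assumption (DataQL's own type handling may differ).

```python
import csv
import io
import sqlite3


def load_csv(conn, table, text):
    """Load CSV text into a new table, one TEXT column per header field."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
load_csv(conn, "users", "id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n")

# Columns are TEXT in this sketch, so cast before comparing numbers.
older = conn.execute(
    "SELECT name FROM users WHERE CAST(age AS INTEGER) > 30 ORDER BY name"
).fetchall()
print(older)  # [('Bob',), ('Charlie',)]
```

Once the file is a table, anything SQLite supports (filters, aggregates, joins) works on it unchanged.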

Why DataQL?

The Problem

Working with data files has always been tedious. You either write throwaway scripts, load everything into pandas, or copy-paste into spreadsheets. With LLMs entering the workflow, a new problem emerged: how do you analyze a 10MB CSV without burning through your entire context window?

Traditional approaches fail:

Send file to LLM context: 10MB CSV = ~100,000+ tokens. Expensive, slow, often impossible.
Write a script: Context switch, setup overhead, not conversational.
Use pandas/Excel: Great for humans, useless for LLM automation.

The Solution

DataQL lets you query any data file using SQL. One command, instant results:

# Instead of sending 50,000 rows to an LLM...
dataql run -f sales.csv -q "SELECT region, SUM(revenue) FROM sales GROUP BY region"

# You get just what you need:
# region    | SUM(revenue)
# North     | 1,234,567
# South     | 987,654

Why This Matters

Scenario	Without DataQL	With DataQL
Analyze 10MB CSV with LLM	~100,000 tokens ($3+)	~500 tokens ($0.01)
Query data from S3	Download → Script → Parse	One command
Join CSV + JSON + Database	Custom ETL pipeline	Single SQL query
Automate data reports	Complex scripts	Simple CLI + cron
LLM data analysis	Context overflow	Only query results enter the context
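
The "Join CSV + JSON" row can be pictured as two sources landing in the same SQLite database, after which an ordinary JOIN applies. The Python sketch below shows that idea under stated assumptions: the table names, columns, and sample data are invented for illustration, and it does not show DataQL's own loading code or CLI syntax for multiple inputs.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for a CSV file and a JSON file.
orders_csv = "order_id,customer_id,amount\n1,10,250\n2,11,90\n3,10,40\n"
customers_json = '[{"id": 10, "name": "Alice"}, {"id": 11, "name": "Bob"}]'

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

# CSV values arrive as strings; INTEGER column affinity converts them on insert.
order_rows = list(csv.reader(io.StringIO(orders_csv)))[1:]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", order_rows)

# JSON objects map onto named placeholders.
conn.executemany(
    "INSERT INTO customers VALUES (:id, :name)", json.loads(customers_json)
)

totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)  # [('Alice', 290), ('Bob', 90)]
```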

Key Benefits

Token Efficient: LLMs get query results, not raw data. In the 10MB CSV scenario above, that is roughly a 99% reduction in token usage.
Universal Format Support: CSV, JSON, Parquet, Excel, XML, YAML, Avro, ORC - all queryable with SQL.
Any Data Source: Local files, stdin, URLs, S3, GCS, Azure, PostgreSQL, MySQL, DuckDB, MongoDB.
LLM-Native: Built-in MCP server for Claude, Codex, Gemini. Skills for Claude Code.
Zero Setup: Single binary, no dependencies, no configuration files.
Familiar Syntax: If you know SQL, you know DataQL.
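
The token-efficiency point can be checked with a rough back-of-the-envelope sketch. The Python below builds a synthetic 50,000-row sales file, aggregates it in SQLite, and compares the size of the raw text with the size of the result. The data is made up, and the 4-characters-per-token figure is only a common rule of thumb, so treat the numbers as order-of-magnitude estimates rather than measurements of any particular model.

```python
import csv
import io
import sqlite3

# Synthetic data: 50,000 rows spread across four regions.
regions = ["North", "South", "East", "West"]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["region", "revenue"])
for i in range(50_000):
    writer.writerow([regions[i % 4], 100 + i % 50])
raw_csv = buf.getvalue()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
reader = csv.reader(io.StringIO(raw_csv))
next(reader)  # skip the header row
conn.executemany("INSERT INTO sales VALUES (?, ?)", reader)

result = conn.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
result_text = "\n".join(f"{region},{total}" for region, total in result)


def rough_tokens(text):
    # Assumption: about 4 characters per token.
    return max(1, len(text) // 4)


print("raw file:", rough_tokens(raw_csv), "tokens (estimated)")
print("query result:", rough_tokens(result_text), "tokens (estimated)")
```

The aggregated result is four short lines regardless of how many rows the input has, which is where the savings come from.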

Features

Supported File Formats:

CSV (with configurable delimiter)
JSON (arrays or single objects)
JSONL/NDJSON (newline-delimited JSON)
XML
YAML
Parquet
Excel (.xlsx, .xls)
Avro
ORC

Data Sources:

Local files
HTTP/HTTPS URLs
Amazon S3
Google Cloud Storage
Azure Blob Storage
Standard input (stdin)

Database Connectors:

PostgreSQL
MySQL
DuckDB
MongoDB

Key Capabilities:

Execute SQL queries using SQLite syntax
Export results to CSV or JSONL formats
Interactive REPL mode with command history
Progress bar for large file operations
Parallel file processing for multiple inputs
Automatic flattening of nested JSON objects
Join data from multiple sources
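
As an illustration of what flattening nested JSON means in practice, the Python sketch below turns nested objects into a single level of column-like keys. The `flatten` helper and the underscore separator are assumptions made for this example; the column names DataQL actually generates may differ.

```python
def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key path along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


record = {"id": 1, "user": {"name": "Alice", "address": {"city": "Lisbon"}}}
flat_record = flatten(record)
print(flat_record)
# {'id': 1, 'user_name': 'Alice', 'user_address_city': 'Lisbon'}
```

Each flattened key can then serve as a column name, so nested fields become addressable in a plain SELECT.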

LLM Integration:

MCP Server for Claude Code, OpenAI Codex, Google Gemini
Auto-activating Claude Code Skills
Token-efficient data processing for AI assistants

Quick Start

Installation

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex

Hello World

# Create a sample CSV file
printf 'id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n' > users.csv

# Query the data
dataql run -f users.csv -q "SELECT * FROM users WHERE age > 30"

Basic Usage

# Query a CSV file
dataql run -f data.csv -q "SELECT * FROM data WHERE amount > 100"

# Query a JSON file
dataql run -f users.json -q "SELECT name, email FROM users WHERE status = 'active'"

# Query from URL
dataql run -f "https://example.com/data.csv" -q "SELECT * FROM data"

# Query from S3
dataql run -f "s3://my-bucket/data.csv" -q "SELECT * FROM data"

# Query from PostgreSQL
dataql run -f "postgres://user:pass@localhost/db?table=users" -q "SELECT * FROM users"

# Read from stdin
cat data.csv | dataql run -f - -q "SELECT * FROM stdin"

# Export results
dataql run -f input.csv -q "SELECT * FROM input" -e output.jsonl -t jsonl
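
For readers unfamiliar with the JSONL export format: it is one JSON object per line, with column names as keys. The Python sketch below produces that shape from an SQLite query result. It illustrates the format only; the table contents are invented, and the exact output DataQL writes (key order, number formatting) is not confirmed here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose column names via keys()
conn.execute("CREATE TABLE input (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO input VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

cursor = conn.execute("SELECT * FROM input ORDER BY id")

# One JSON object per row, one row per line.
lines = [json.dumps(dict(row)) for row in cursor]
jsonl = "\n".join(lines) + "\n"
print(jsonl, end="")
# {"id": 1, "name": "Alice"}
# {"id": 2, "name": "Bob"}
```

Because each line is independently parseable, JSONL output can be streamed or truncated without producing invalid JSON.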

Interactive Mode

dataql run -f sales.csv

dataql> .tables
dataql> .schema sales
dataql> SELECT product, SUM(amount) as total FROM sales GROUP BY product ORDER BY total DESC;
dataql> .exit

Documentation

Getting Started - Installation and Hello World examples
CLI Reference - Complete command-line reference
Data Sources - Working with S3, GCS, Azure, URLs, and stdin
Database Connections - Connect to PostgreSQL, MySQL, DuckDB, MongoDB
LLM Integration - Use DataQL with Claude, Codex, Gemini
MCP Setup - Configure MCP server for LLM integration
Examples - Real-world usage examples and automation scripts

About

A rewrite of csvql (2019), built entirely with AI assistance. An experiment in AI-assisted development that turned out pretty well.

License

This project is licensed under the MIT License - see the LICENSE file for details.
