DataQL

A powerful CLI tool for querying and transforming data across multiple formats

DataQL is a CLI tool written in Go that lets you query and manipulate data files with SQL statements. It loads the data into a SQLite database (in-memory or file-based), so standard SQL operations work directly on your files.
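
To make the loading step concrete, the sketch below reproduces the same idea by hand with the stock `sqlite3` command-line shell: import a CSV into an in-memory database, then query it with SQL. This is not DataQL's implementation, only an illustration of the approach. The helper name `csv_query_sketch` and the `people.csv` sample file are assumptions made for the example, and it requires `sqlite3` to be installed.

```shell
# Illustration only: load a CSV into an in-memory SQLite database and query it.
# csv_query_sketch FILE TABLE SQL   (hypothetical helper, not part of DataQL)
csv_query_sketch() {
  local file=$1 table=$2 sql=$3
  # .import in csv mode creates the table from the header row;
  # every imported column is TEXT, so cast before numeric comparisons.
  sqlite3 :memory: <<EOF
.mode csv
.import "$file" $table
.mode list
$sql
EOF
}

# Usage:
#   printf 'id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n' > people.csv
#   csv_query_sketch people.csv people \
#     "SELECT name FROM people WHERE CAST(age AS INTEGER) > 30;"
```

With the sample file from the usage comment, the query should return the names Bob and Charlie, one per line.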


Why DataQL?

The Problem

Working with data files has always been tedious. You either write throwaway scripts, load everything into pandas, or copy-paste into spreadsheets. With LLMs entering the workflow, a new problem emerged: how do you analyze a 10MB CSV without burning through your entire context window?

Traditional approaches fail:

The Solution

DataQL lets you query any data file using SQL. One command, instant results:

# Instead of sending 50,000 rows to an LLM...
dataql run -f sales.csv -q "SELECT region, SUM(revenue) FROM sales GROUP BY region"

# You get just what you need:
# region    | SUM(revenue)
# North     | 1,234,567
# South     | 987,654

Why This Matters

| Scenario | Without DataQL | With DataQL |
|---|---|---|
| Analyze 10MB CSV with LLM | ~100,000 tokens ($3+) | ~500 tokens ($0.01) |
| Query data from S3 | Download → Script → Parse | One command |
| Join CSV + JSON + Database | Custom ETL pipeline | Single SQL query |
| Automate data reports | Complex scripts | Simple CLI + cron |
| LLM data analysis | Context overflow | No size limit |
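
The token counts in the table are rough estimates. To gauge the effect on your own data, the sketch below approximates token counts from byte counts using the common rule of thumb of about four characters per token for English text. Real tokenizers vary, so treat the result as an order-of-magnitude figure. The helper name `estimate_tokens` is hypothetical and not part of DataQL.

```shell
# Rough token estimate: bytes / 4 (heuristic; actual tokenizers differ).
# Reads the file given as the first argument, or stdin when no argument is given.
estimate_tokens() {
  local bytes
  if [ "$#" -gt 0 ]; then
    bytes=$(wc -c < "$1")
  else
    bytes=$(wc -c)
  fi
  echo $(( bytes / 4 ))
}

# Usage (assumes dataql is installed and sales.csv exists):
#   estimate_tokens sales.csv
#   dataql run -f sales.csv \
#     -q "SELECT region, SUM(revenue) FROM sales GROUP BY region" | estimate_tokens
```

Comparing the two numbers shows how much context a query result saves relative to sending the raw file.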

Key Benefits


Features

Supported File Formats:

Data Sources:

Database Connectors:

Key Capabilities:

LLM Integration:

Quick Start

Installation

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex

Hello World

# Create a sample CSV file
printf 'id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n' > users.csv

# Query the data
dataql run -f users.csv -q "SELECT * FROM users WHERE age > 30"

Basic Usage

# Query a CSV file
dataql run -f data.csv -q "SELECT * FROM data WHERE amount > 100"

# Query a JSON file
dataql run -f users.json -q "SELECT name, email FROM users WHERE status = 'active'"

# Query from URL
dataql run -f "https://example.com/data.csv" -q "SELECT * FROM data"

# Query from S3
dataql run -f "s3://my-bucket/data.csv" -q "SELECT * FROM data"

# Query from PostgreSQL
dataql run -f "postgres://user:pass@localhost/db?table=users" -q "SELECT * FROM users"

# Read from stdin
cat data.csv | dataql run -f - -q "SELECT * FROM stdin"

# Export results
dataql run -f input.csv -q "SELECT * FROM input" -e output.jsonl -t jsonl
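
Because a query and its export run as a single command, they fit easily into scheduled jobs. The sketch below wraps the export example above in a small function suitable for cron. It uses only the `-f`, `-q`, `-e`, and `-t` flags shown in this section. The function name `daily_report`, the file paths, and the `region` and `revenue` columns are assumptions for illustration.

```shell
# Hypothetical report job: aggregate a CSV and export the result as JSONL.
# daily_report INPUT_CSV OUTPUT_DIR
daily_report() {
  local input=$1 outdir=$2
  local out
  out="$outdir/report-$(date +%Y-%m-%d).jsonl"
  mkdir -p "$outdir" || return 1
  # The table name 'sales' assumes the input file is named sales.csv,
  # following the file-name-as-table pattern used in the examples.
  dataql run -f "$input" \
    -q "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region" \
    -e "$out" -t jsonl
}

# Save this in a script that ends with a call such as:
#   daily_report /data/sales.csv /data/reports
# Example crontab entry (06:00 daily; the script path is hypothetical):
#   0 6 * * * /usr/local/bin/daily-report.sh
```

Dating the output file keeps each run's report separate instead of overwriting the previous one.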

Interactive Mode

dataql run -f sales.csv
dataql> .tables
dataql> .schema sales
dataql> SELECT product, SUM(amount) as total FROM sales GROUP BY product ORDER BY total DESC;
dataql> .exit

Documentation

About

DataQL is a rewrite of csvql (2019), built entirely with AI assistance: an experiment in AI-assisted development that turned out pretty well.

License

This project is licensed under the MIT License - see the LICENSE file for details.