DataQL

A powerful CLI tool for querying and transforming data across multiple formats

DataQL is a CLI tool written in Go for querying and manipulating data files with SQL. It loads the data into a DuckDB database (in-memory or file-based) and infers column types automatically, so you can run analytical SQL directly against your files.
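
To make "automatic type inference" concrete, here is a toy sketch of the idea: look at a value's text and pick the narrowest type that fits. It is an illustration only. The function name `infer_type` is hypothetical, the type names are borrowed from DuckDB, and real inference (which also has to handle dates, booleans, and whole columns rather than single values) is considerably richer.

```shell
# Toy illustration of per-value type inference; not DataQL's implementation.
infer_type() {
    if printf '%s\n' "$1" | grep -Eq '^-?[0-9]+$'; then
        echo BIGINT      # whole numbers
    elif printf '%s\n' "$1" | grep -Eq '^-?[0-9]+\.[0-9]+$'; then
        echo DOUBLE      # decimal numbers
    else
        echo VARCHAR     # everything else stays text
    fi
}
```

Here `infer_type 28` prints `BIGINT` and `infer_type Alice` prints `VARCHAR`. Treating a column such as `age` as numeric is what lets a filter like `age > 30` compare numbers rather than strings.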


Why DataQL?

The Problem

Working with data files has always been tedious. You either write throwaway scripts, load everything into pandas, or copy-paste into spreadsheets. With LLMs entering the workflow, a new problem emerged: how do you analyze a 10MB CSV without burning through your entire context window?

Traditional approaches fail:

The Solution

DataQL lets you query any data file using SQL. One command, instant results:

# Instead of sending 50,000 rows to an LLM...
dataql run -f sales.csv -q "SELECT region, SUM(revenue) FROM sales GROUP BY region"

# You get just what you need:
# region    | SUM(revenue)
# North     | 1,234,567
# South     | 987,654

Why This Matters

Scenario                   | Without DataQL            | With DataQL
Analyze 10MB CSV with LLM  | ~100,000 tokens ($3+)     | ~500 tokens ($0.01)
Query data from S3         | Download → Script → Parse | One command
Join CSV + JSON + Database | Custom ETL pipeline       | Single SQL query
Automate data reports      | Complex scripts           | Simple CLI + cron
LLM data analysis          | Context overflow          | No size limit
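
The "Simple CLI + cron" row can be sketched as a small wrapper script. This is a minimal sketch, not something DataQL ships: the function name `run_report`, the file paths, and the schedule are all hypothetical, and it relies only on the `-f`, `-q`, `-e`, and `-t` flags used in the Basic Usage examples.

```shell
#!/bin/sh
# Hypothetical wrapper for a scheduled report; names and paths are illustrative.

# run_report INPUT QUERY OUTPUT
# Runs QUERY against INPUT and exports the result to OUTPUT as JSONL.
run_report() {
    # cron jobs often run with a minimal PATH, so fail loudly if dataql is missing.
    if ! command -v dataql >/dev/null 2>&1; then
        echo "run_report: dataql not found in PATH" >&2
        return 127
    fi
    dataql run -f "$1" -q "$2" -e "$3" -t jsonl
}

# Example call (uncomment in a real script):
# run_report /data/sales.csv \
#     "SELECT region, SUM(revenue) FROM sales GROUP BY region" \
#     /reports/sales_by_region.jsonl
```

Saved as, say, `/usr/local/bin/sales-report.sh` with the example call enabled, a crontab entry such as `0 6 * * * /usr/local/bin/sales-report.sh` would run the report daily at 06:00.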

Key Benefits


Features

Supported File Formats:

Data Sources:

Database Connectors:

Key Capabilities:

LLM Integration:

Quick Start

Installation

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex

Hello World

# Create a sample CSV file
printf 'id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42\n' > users.csv

# Query the data
dataql run -f users.csv -q "SELECT * FROM users WHERE age > 30"

Basic Usage

# Query a CSV file
dataql run -f data.csv -q "SELECT * FROM data WHERE amount > 100"

# Query a JSON file
dataql run -f users.json -q "SELECT name, email FROM users WHERE status = 'active'"

# Query from URL
dataql run -f "https://example.com/data.csv" -q "SELECT * FROM data"

# Query from S3
dataql run -f "s3://my-bucket/data.csv" -q "SELECT * FROM data"

# Query from PostgreSQL
dataql run -f "postgres://user:pass@localhost/db?table=users" -q "SELECT * FROM users"

# Read from stdin
cat data.csv | dataql run -f - -q "SELECT * FROM stdin_data"

# Export results
dataql run -f input.csv -q "SELECT * FROM input" -e output.jsonl -t jsonl
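
In the file examples above, the table name in each query matches the file's base name without its extension (`data.csv` becomes `data`, `users.json` becomes `users`). Assuming that convention holds for local files, a script that receives arbitrary paths could derive the table name as sketched below. The helper name `table_name` and the path `reports/sales.csv` are hypothetical, and the sketch does not cover stdin or database URLs, which use different table names in the examples above.

```shell
# Derive the table name DataQL appears to use for a file: base name minus extension.
# This is an assumption drawn from the README examples, not from DataQL's source.
table_name() {
    base=${1##*/}                 # strip any leading directories
    printf '%s\n' "${base%.*}"    # strip the last extension
}

# Example: build a query for a path held in a variable and print the command.
file="reports/sales.csv"
query="SELECT * FROM $(table_name "$file") LIMIT 10"
echo "dataql run -f \"$file\" -q \"$query\""
```

The final `echo` only prints the command so it can be inspected before running it.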

Interactive Mode

dataql run -f sales.csv
dataql> .tables
dataql> .schema sales
dataql> SELECT product, SUM(amount) as total FROM sales GROUP BY product ORDER BY total DESC;
dataql> .exit

Documentation

About

A rewrite of csvql (2019), built entirely with AI assistance. An experiment in AI-assisted development that turned out pretty well.

License

This project is licensed under the MIT License - see the LICENSE file for details.