DataQL

CLI Reference

Complete command-line reference for DataQL.

Commands

dataql run

The main command to load data and execute queries.

dataql run [flags]

Flags

| Flag | Short | Description | Default | Required |
|------|-------|-------------|---------|----------|
| --file | -f | Input file path, URL, or - for stdin | - | Yes |
| --query | -q | SQL query to execute | - | No |
| --delimiter | -d | CSV field delimiter | , | No |
| --export | -e | Export results to file path | - | No |
| --type | -t | Export format (csv, jsonl, json, xml, yaml, excel, parquet) | - | No |
| --storage | -s | SQLite file path for persistence | In-memory | No |
| --lines | -l | Limit number of records to read | All | No |
| --collection | -c | Custom table name | Filename | No |
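
The flags can be combined in a single invocation; here is a sketch using only the flags documented above (the file and column names are illustrative):

# Read the first 500 rows of a semicolon-delimited CSV as table "sales",
# aggregate, and export the result as JSON Lines
dataql run -f sales.csv -d ";" -l 500 -c sales \
  -q "SELECT region, SUM(amount) AS total FROM sales GROUP BY region" \
  -e totals.jsonl -t jsonl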

Global Flags

| Flag | Short | Description |
|------|-------|-------------|
| --help | -h | Display help information |
| --version | -v | Display version information |

Input Sources

Local Files

# CSV file
dataql run -f /path/to/data.csv

# JSON file
dataql run -f /path/to/data.json

# JSONL file
dataql run -f /path/to/data.jsonl

# XML file
dataql run -f /path/to/data.xml

# YAML file
dataql run -f /path/to/data.yaml

# Parquet file
dataql run -f /path/to/data.parquet

# Excel file
dataql run -f /path/to/data.xlsx

URLs

# HTTP/HTTPS URL
dataql run -f "https://example.com/data.csv" -q "SELECT * FROM data"

Standard Input (stdin)

# Pipe data from other commands
cat data.csv | dataql run -f - -q "SELECT * FROM stdin"

# Pipe JSON
echo '[{"a":1},{"a":2}]' | dataql run -f - -q "SELECT * FROM stdin"

Cloud Storage

# Amazon S3
dataql run -f "s3://bucket-name/path/to/data.csv"

# Google Cloud Storage
dataql run -f "gs://bucket-name/path/to/data.json"

# Azure Blob Storage
dataql run -f "az://container/path/to/data.parquet"

Databases

# PostgreSQL
dataql run -f "postgres://user:pass@host:5432/database?table=users"

# MySQL
dataql run -f "mysql://user:pass@host:3306/database?table=orders"

# DuckDB
dataql run -f "duckdb:///path/to/database.db?table=data"

# MongoDB
dataql run -f "mongodb://user:pass@host:27017/database?collection=documents"

# DynamoDB
dataql run -f "dynamodb://us-east-1/table-name"
dataql run -f "dynamodb://us-east-1/table-name?endpoint=http://localhost:4566"

Supported File Formats

| Format | Extensions | Description |
|--------|------------|-------------|
| CSV | .csv | Comma-separated values (configurable delimiter) |
| JSON | .json | JSON arrays or single objects |
| JSONL | .jsonl, .ndjson | Newline-delimited JSON |
| XML | .xml | XML documents |
| YAML | .yaml, .yml | YAML documents |
| Parquet | .parquet | Apache Parquet columnar format |
| Excel | .xlsx, .xls | Microsoft Excel spreadsheets |
| Avro | .avro | Apache Avro format |
| ORC | .orc | Apache ORC format |

Interactive Mode (REPL)

When you run dataql run without a -q query, DataQL starts an interactive REPL:

dataql run -f data.csv

REPL Commands

| Command | Description |
|---------|-------------|
| .tables | List all loaded tables |
| .schema [table] | Show schema for a table |
| .count [table] | Count rows in a table |
| .help | Show available commands |
| .exit or .quit | Exit the REPL |
| .clear | Clear the screen |
| .version | Show DataQL version |
| .paging [on\|off] | Toggle paged output for large results |
| .pagesize [n] | Set number of rows per page (default: 20) |
| .timing [on\|off] | Toggle query execution timing |
| Ctrl+C | Cancel current query |
| Ctrl+D | Exit the REPL |
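
A short illustrative session (the dot-commands are the ones listed above; the prompt string and output layout are assumptions):

$ dataql run -f users.csv
dataql> .tables
users
dataql> .schema users
id INTEGER, name TEXT, email TEXT, active INTEGER
dataql> SELECT COUNT(*) AS active_users FROM users WHERE active = 1;
active_users
3
dataql> .exit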

Usage Examples

Basic Query

dataql run -f users.csv -q "SELECT * FROM users WHERE active = 1"

Query with Custom Delimiter

dataql run -f data.csv -d ";" -q "SELECT * FROM data"

Limit Records

dataql run -f large_file.csv -l 1000 -q "SELECT * FROM large_file"

Custom Table Name

dataql run -f data.csv -c my_table -q "SELECT * FROM my_table"

Export to CSV

dataql run -f input.json -q "SELECT * FROM input" -e output.csv -t csv

Export to JSONL

dataql run -f input.csv -q "SELECT * FROM input" -e output.jsonl -t jsonl

Export to JSON

dataql run -f input.csv -q "SELECT * FROM input" -e output.json -t json

Export to Excel

dataql run -f input.csv -q "SELECT * FROM input" -e output.xlsx -t excel

Export to Parquet

dataql run -f input.csv -q "SELECT * FROM input" -e output.parquet -t parquet

Export to XML

dataql run -f input.csv -q "SELECT * FROM input" -e output.xml -t xml

Export to YAML

dataql run -f input.csv -q "SELECT * FROM input" -e output.yaml -t yaml

Multiple Input Files

dataql run -f orders.csv -f customers.csv -q "
SELECT o.*, c.name as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
"

Persist to SQLite File

dataql run -f data.csv -s ./my_database.db

Query from URL

dataql run -f "https://raw.githubusercontent.com/datasets/population/main/data/population.csv" \
  -q "SELECT Country_Name, Year, Value FROM population WHERE Year = 2020 LIMIT 10"

Pipe from curl

curl -s "https://api.example.com/data.json" | dataql run -f - -q "SELECT * FROM stdin"

SQL Reference

DataQL uses SQLite under the hood. All standard SQLite SQL syntax is supported.
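
That includes newer SQLite features such as common table expressions and window functions (available in SQLite 3.25+); a quick sketch against a hypothetical data table:

-- Top row per category via a CTE and a window function
WITH ranked AS (
  SELECT category, amount,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) AS rn
  FROM data
)
SELECT category, amount FROM ranked WHERE rn = 1;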

SELECT

SELECT column1, column2 FROM table_name;
SELECT * FROM table_name;
SELECT DISTINCT column FROM table_name;

WHERE

SELECT * FROM data WHERE amount > 100;
SELECT * FROM data WHERE status = 'active' AND age >= 18;
SELECT * FROM data WHERE name LIKE 'John%';
SELECT * FROM data WHERE id IN (1, 2, 3);
SELECT * FROM data WHERE email IS NOT NULL;

ORDER BY

SELECT * FROM data ORDER BY created_at DESC;
SELECT * FROM data ORDER BY category ASC, amount DESC;

LIMIT and OFFSET

SELECT * FROM data LIMIT 10;
SELECT * FROM data LIMIT 10 OFFSET 20;

GROUP BY

SELECT category, COUNT(*) as count FROM data GROUP BY category;
SELECT category, SUM(amount), AVG(price) FROM data GROUP BY category;

HAVING

SELECT category, COUNT(*) as count
FROM data
GROUP BY category
HAVING count > 5;

JOIN

-- INNER JOIN
SELECT a.*, b.extra
FROM table1 a
JOIN table2 b ON a.id = b.foreign_id;

-- LEFT JOIN
SELECT a.*, b.extra
FROM table1 a
LEFT JOIN table2 b ON a.id = b.foreign_id;

Aggregate Functions

| Function | Description |
|----------|-------------|
| COUNT(*) | Count rows |
| COUNT(column) | Count non-null values |
| SUM(column) | Sum of values |
| AVG(column) | Average of values |
| MIN(column) | Minimum value |
| MAX(column) | Maximum value |
| GROUP_CONCAT(column) | Concatenate values |
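
For example, several aggregates can be computed in one pass (the column names are illustrative):

-- One summary row per category using the aggregates above
SELECT category,
       COUNT(*) AS row_count,
       SUM(amount) AS total,
       AVG(amount) AS average,
       MIN(amount) AS smallest,
       MAX(amount) AS largest,
       GROUP_CONCAT(DISTINCT status) AS statuses
FROM data
GROUP BY category;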

String Functions

| Function | Description |
|----------|-------------|
| LENGTH(str) | String length |
| UPPER(str) | Convert to uppercase |
| LOWER(str) | Convert to lowercase |
| TRIM(str) | Remove leading/trailing spaces |
| SUBSTR(str, start, len) | Extract substring |
| REPLACE(str, old, new) | Replace occurrences |
| INSTR(str, substr) | Find position of substring |
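
A sketch that cleans up text columns with these functions (the column names are illustrative):

-- Normalize a name column and pull the domain out of an email column
SELECT UPPER(TRIM(name)) AS name_clean,
       SUBSTR(email, INSTR(email, '@') + 1) AS email_domain,
       REPLACE(phone, '-', '') AS phone_digits
FROM data
WHERE LENGTH(name) > 0;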

Date/Time Functions

| Function | Description |
|----------|-------------|
| DATE(value) | Extract date |
| TIME(value) | Extract time |
| DATETIME(value) | Date and time |
| STRFTIME(format, value) | Format date/time |
| JULIANDAY(value) | Julian day number |
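
For example, bucketing rows by month and computing an age in days (this assumes a created_at column holding ISO-8601 timestamps, the format SQLite's date functions expect):

-- Records per month, plus days since the most recent record
SELECT STRFTIME('%Y-%m', created_at) AS month,
       COUNT(*) AS records,
       CAST(JULIANDAY('now') - JULIANDAY(MAX(created_at)) AS INTEGER) AS days_since_latest
FROM data
GROUP BY month
ORDER BY month;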

Type Conversion

SELECT CAST(amount AS INTEGER) FROM data;
SELECT CAST(price AS REAL) FROM data;
SELECT CAST(id AS TEXT) FROM data;

Environment Variables

Cloud Storage

| Variable | Description |
|----------|-------------|
| AWS_ACCESS_KEY_ID | AWS access key for S3/SQS |
| AWS_SECRET_ACCESS_KEY | AWS secret key for S3/SQS |
| AWS_REGION | AWS region for S3/SQS |
| AWS_ENDPOINT_URL | Custom endpoint (for LocalStack, MinIO) |
| AWS_ENDPOINT_URL_S3 | S3-specific custom endpoint |
| AWS_ENDPOINT_URL_SQS | SQS-specific custom endpoint |
| AWS_ENDPOINT_URL_DYNAMODB | DynamoDB-specific custom endpoint |
| GOOGLE_APPLICATION_CREDENTIALS | Path to GCP service account JSON |
| AZURE_STORAGE_ACCOUNT | Azure storage account name |
| AZURE_STORAGE_KEY | Azure storage account key |
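
For example, to query a bucket in a local S3 emulator such as LocalStack (the bucket name and credentials are placeholders):

# Credentials plus an S3-specific endpoint override (LocalStack)
export AWS_ACCESS_KEY_ID="test"
export AWS_SECRET_ACCESS_KEY="test"
export AWS_REGION="us-east-1"
export AWS_ENDPOINT_URL_S3="http://localhost:4566"

dataql run -f "s3://my-bucket/events.csv" -q "SELECT * FROM events LIMIT 10"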

Message Queues

| Variable | Description |
|----------|-------------|
| KAFKA_BROKERS | Kafka bootstrap servers (comma-separated) |
| KAFKA_SASL_USERNAME | Kafka SASL username |
| KAFKA_SASL_PASSWORD | Kafka SASL password |
| KAFKA_SASL_MECHANISM | Kafka SASL mechanism (PLAIN, SCRAM-SHA-256, SCRAM-SHA-512) |

Exit Codes

| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | File not found |
| 4 | Connection error |
| 5 | Query error |
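
These codes make dataql straightforward to script around; a minimal sketch that branches on the statuses in the table above:

# Branch on dataql's exit status in a shell script
dataql run -f data.csv -q "SELECT COUNT(*) FROM data"
case $? in
  0) echo "ok" ;;
  3) echo "input file not found" >&2 ;;
  5) echo "query error" >&2 ;;
  *) echo "dataql failed" >&2 ;;
esac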
