# CLI Reference

Complete command-line reference for DataQL.
## Commands

### dataql run

The main command to load data and execute queries.
#### Flags

| Flag | Short | Description | Default | Required |
|------|-------|-------------|---------|----------|
| `--file` | `-f` | Input file path, URL, or `-` for stdin | - | Yes |
| `--query` | `-q` | SQL query to execute | - | No |
| `--delimiter` | `-d` | CSV field delimiter | `,` | No |
| `--export` | `-e` | Export results to file path | - | No |
| `--type` | `-t` | Export format (csv, jsonl, json, xml, yaml, excel, parquet) | - | No |
| `--storage` | `-s` | SQLite file path for persistence | In-memory | No |
| `--lines` | `-l` | Limit number of records to read | All | No |
| `--collection` | `-c` | Custom table name | Filename | No |
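
Flags can be combined in a single invocation. For example (the file name, delimiter, and column names here are illustrative):

```bash
# Read a semicolon-delimited CSV, load only the first 500 records into a
# table named "sales", aggregate, and export the result as Parquet.
dataql run -f sales.csv -d ";" -l 500 -c sales \
  -q "SELECT region, SUM(amount) AS total FROM sales GROUP BY region" \
  -e totals.parquet -t parquet
```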
#### Global Flags

| Flag | Short | Description |
|------|-------|-------------|
| `--help` | `-h` | Display help information |
| `--version` | `-v` | Display version information |
## Input Sources

### Local Files

```bash
# CSV file
dataql run -f /path/to/data.csv

# JSON file
dataql run -f /path/to/data.json

# JSONL file
dataql run -f /path/to/data.jsonl

# XML file
dataql run -f /path/to/data.xml

# YAML file
dataql run -f /path/to/data.yaml

# Parquet file
dataql run -f /path/to/data.parquet

# Excel file
dataql run -f /path/to/data.xlsx
```
### URLs and Standard Input

```bash
# HTTP/HTTPS URL
dataql run -f "https://example.com/data.csv" -q "SELECT * FROM data"

# Pipe data from other commands
cat data.csv | dataql run -f - -q "SELECT * FROM stdin"

# Pipe JSON
echo '[{"a":1},{"a":2}]' | dataql run -f - -q "SELECT * FROM stdin"
```
### Cloud Storage

```bash
# Amazon S3
dataql run -f "s3://bucket-name/path/to/data.csv"

# Google Cloud Storage
dataql run -f "gs://bucket-name/path/to/data.json"

# Azure Blob Storage
dataql run -f "az://container/path/to/data.parquet"
```
### Databases

```bash
# PostgreSQL
dataql run -f "postgres://user:pass@host:5432/database?table=users"

# MySQL
dataql run -f "mysql://user:pass@host:3306/database?table=orders"

# DuckDB
dataql run -f "duckdb:///path/to/database.db?table=data"

# MongoDB
dataql run -f "mongodb://user:pass@host:27017/database?collection=documents"

# DynamoDB (optionally with a custom endpoint, e.g. LocalStack)
dataql run -f "dynamodb://us-east-1/table-name"
dataql run -f "dynamodb://us-east-1/table-name?endpoint=http://localhost:4566"
```
## Supported Formats

| Format | Extensions | Description |
|--------|------------|-------------|
| CSV | `.csv` | Comma-separated values (configurable delimiter) |
| JSON | `.json` | JSON arrays or single objects |
| JSONL | `.jsonl`, `.ndjson` | Newline-delimited JSON |
| XML | `.xml` | XML documents |
| YAML | `.yaml`, `.yml` | YAML documents |
| Parquet | `.parquet` | Apache Parquet columnar format |
| Excel | `.xlsx`, `.xls` | Microsoft Excel spreadsheets |
| Avro | `.avro` | Apache Avro format |
| ORC | `.orc` | Apache ORC format |
## Interactive Mode (REPL)

When you run DataQL without a query, it starts in interactive mode:
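
```bash
# Omitting -q drops you into the REPL once the file is loaded
# (data.csv stands in for any supported input)
dataql run -f data.csv
```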
### REPL Commands

| Command | Description |
|---------|-------------|
| `.tables` | List all loaded tables |
| `.schema [table]` | Show schema for a table |
| `.count [table]` | Count rows in a table |
| `.help` | Show available commands |
| `.exit` or `.quit` | Exit the REPL |
| `.clear` | Clear the screen |
| `.version` | Show DataQL version |
| `.paging [on\|off]` | Toggle paged output for large results |
| `.pagesize [n]` | Set number of rows per page (default: 20) |
| `.timing [on\|off]` | Toggle query execution timing |
| `Ctrl+C` | Cancel current query |
| `Ctrl+D` | Exit the REPL |
### REPL Features

- **Command History**: Use arrow keys to navigate through previous commands
- **Multi-line Queries**: Continue queries across multiple lines
- **Tab Completion**: Auto-complete table names, column names, and SQL keywords
- **Syntax Highlighting**: SQL keywords are highlighted for readability
- **Paged Output**: Large results are paginated for easier navigation
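
A hypothetical session combining a few of the commands above (prompt and output layout are illustrative, not verbatim):

```text
# Illustrative only: the actual prompt and output formatting may differ.
$ dataql run -f users.csv
dataql> .tables
users
dataql> SELECT COUNT(*) FROM users WHERE active = 1;
dataql> .exit
```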
## Usage Examples

### Basic Query

```bash
dataql run -f users.csv -q "SELECT * FROM users WHERE active = 1"
```

### Query with Custom Delimiter

```bash
dataql run -f data.csv -d ";" -q "SELECT * FROM data"
```

### Limit Records

```bash
dataql run -f large_file.csv -l 1000 -q "SELECT * FROM large_file"
```

### Custom Table Name

```bash
dataql run -f data.csv -c my_table -q "SELECT * FROM my_table"
```

### Export to CSV

```bash
dataql run -f input.json -q "SELECT * FROM input" -e output.csv -t csv
```

### Export to JSONL

```bash
dataql run -f input.csv -q "SELECT * FROM input" -e output.jsonl -t jsonl
```

### Export to JSON

```bash
dataql run -f input.csv -q "SELECT * FROM input" -e output.json -t json
```

### Export to Excel

```bash
dataql run -f input.csv -q "SELECT * FROM input" -e output.xlsx -t excel
```

### Export to Parquet

```bash
dataql run -f input.csv -q "SELECT * FROM input" -e output.parquet -t parquet
```

### Export to XML

```bash
dataql run -f input.csv -q "SELECT * FROM input" -e output.xml -t xml
```

### Export to YAML

```bash
dataql run -f input.csv -q "SELECT * FROM input" -e output.yaml -t yaml
```
### Query Multiple Files

```bash
dataql run -f orders.csv -f customers.csv -q "
  SELECT o.*, c.name AS customer_name
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
"
```
### Persist to SQLite File

```bash
dataql run -f data.csv -s ./my_database.db
```

### Query from URL

```bash
dataql run -f "https://raw.githubusercontent.com/datasets/population/main/data/population.csv" \
  -q "SELECT Country_Name, Year, Value FROM population WHERE Year = 2020 LIMIT 10"
```

### Pipe from curl

```bash
curl -s "https://api.example.com/data.json" | dataql run -f - -q "SELECT * FROM stdin"
```
## SQL Reference

DataQL uses SQLite under the hood. All standard SQLite SQL syntax is supported.
### SELECT

```sql
SELECT column1, column2 FROM table_name;
SELECT * FROM table_name;
SELECT DISTINCT column FROM table_name;
```

### WHERE

```sql
SELECT * FROM data WHERE amount > 100;
SELECT * FROM data WHERE status = 'active' AND age >= 18;
SELECT * FROM data WHERE name LIKE 'John%';
SELECT * FROM data WHERE id IN (1, 2, 3);
SELECT * FROM data WHERE email IS NOT NULL;
```

### ORDER BY

```sql
SELECT * FROM data ORDER BY created_at DESC;
SELECT * FROM data ORDER BY category ASC, amount DESC;
```

### LIMIT and OFFSET

```sql
SELECT * FROM data LIMIT 10;
SELECT * FROM data LIMIT 10 OFFSET 20;
```

### GROUP BY

```sql
SELECT category, COUNT(*) AS count FROM data GROUP BY category;
SELECT category, SUM(amount), AVG(price) FROM data GROUP BY category;
```

### HAVING

```sql
SELECT category, COUNT(*) AS count
FROM data
GROUP BY category
HAVING count > 5;
```

### JOIN

```sql
-- INNER JOIN
SELECT a.*, b.extra
FROM table1 a
JOIN table2 b ON a.id = b.foreign_id;

-- LEFT JOIN
SELECT a.*, b.extra
FROM table1 a
LEFT JOIN table2 b ON a.id = b.foreign_id;
```
### Aggregate Functions

| Function | Description |
|----------|-------------|
| `COUNT(*)` | Count rows |
| `COUNT(column)` | Count non-null values |
| `SUM(column)` | Sum of values |
| `AVG(column)` | Average of values |
| `MIN(column)` | Minimum value |
| `MAX(column)` | Maximum value |
| `GROUP_CONCAT(column)` | Concatenate values |
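
In SQLite, `GROUP_CONCAT` also accepts an optional separator as a second argument. A quick sketch (orders.csv and its columns are hypothetical):

```bash
# One row per customer, with that customer's order ids joined by ", "
dataql run -f orders.csv \
  -q "SELECT customer_id, GROUP_CONCAT(id, ', ') AS order_ids FROM orders GROUP BY customer_id"
```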
### String Functions

| Function | Description |
|----------|-------------|
| `LENGTH(str)` | String length |
| `UPPER(str)` | Convert to uppercase |
| `LOWER(str)` | Convert to lowercase |
| `TRIM(str)` | Remove leading/trailing spaces |
| `SUBSTR(str, start, len)` | Extract substring |
| `REPLACE(str, old, new)` | Replace occurrences |
| `INSTR(str, substr)` | Find position of substring |
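
String functions compose as in any SQLite query. For instance (users.csv and its email column are hypothetical; `SUBSTR` without a length reads to the end of the string):

```bash
# Normalize emails and extract the domain after the "@"
dataql run -f users.csv \
  -q "SELECT LOWER(TRIM(email)) AS email, SUBSTR(email, INSTR(email, '@') + 1) AS domain FROM users"
```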
### Date/Time Functions

| Function | Description |
|----------|-------------|
| `DATE(value)` | Extract date |
| `TIME(value)` | Extract time |
| `DATETIME(value)` | Date and time |
| `STRFTIME(format, value)` | Format date/time |
| `JULIANDAY(value)` | Julian day number |
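
`STRFTIME` is handy for bucketing rows by time period. A sketch, assuming a hypothetical events.csv with an ISO-8601 `created_at` column:

```bash
# Count events per month
dataql run -f events.csv \
  -q "SELECT STRFTIME('%Y-%m', created_at) AS month, COUNT(*) AS n FROM events GROUP BY month ORDER BY month"
```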
### Type Conversion

```sql
SELECT CAST(amount AS INTEGER) FROM data;
SELECT CAST(price AS REAL) FROM data;
SELECT CAST(id AS TEXT) FROM data;
```
## Environment Variables

### Cloud Storage

| Variable | Description |
|----------|-------------|
| `AWS_ACCESS_KEY_ID` | AWS access key for S3/SQS |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3/SQS |
| `AWS_REGION` | AWS region for S3/SQS |
| `AWS_ENDPOINT_URL` | Custom endpoint (for LocalStack, MinIO) |
| `AWS_ENDPOINT_URL_S3` | S3-specific custom endpoint |
| `AWS_ENDPOINT_URL_SQS` | SQS-specific custom endpoint |
| `AWS_ENDPOINT_URL_DYNAMODB` | DynamoDB-specific custom endpoint |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to GCP service account JSON |
| `AZURE_STORAGE_ACCOUNT` | Azure storage account name |
| `AZURE_STORAGE_KEY` | Azure storage account key |
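
For local testing against LocalStack or MinIO, the endpoint variables can point at the emulator. A sketch (endpoint, bucket, and credentials are placeholders):

```bash
# Query a CSV sitting in a LocalStack S3 bucket
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_REGION=us-east-1
export AWS_ENDPOINT_URL_S3=http://localhost:4566

# Table name defaults to the file name ("data")
dataql run -f "s3://my-bucket/data.csv" -q "SELECT * FROM data LIMIT 5"
```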
### Message Queues

| Variable | Description |
|----------|-------------|
| `KAFKA_BROKERS` | Kafka bootstrap servers (comma-separated) |
| `KAFKA_SASL_USERNAME` | Kafka SASL username |
| `KAFKA_SASL_PASSWORD` | Kafka SASL password |
| `KAFKA_SASL_MECHANISM` | Kafka SASL mechanism (PLAIN, SCRAM-SHA-256, SCRAM-SHA-512) |
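
Before pointing DataQL at a Kafka source, export the connection settings. The values below are placeholders for a SASL-secured cluster:

```bash
# Placeholder broker list and credentials; adjust for your cluster
export KAFKA_BROKERS="broker1:9092,broker2:9092"
export KAFKA_SASL_MECHANISM="SCRAM-SHA-256"
export KAFKA_SASL_USERNAME="dataql-reader"
export KAFKA_SASL_PASSWORD="change-me"
```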
## Exit Codes

| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | File not found |
| 4 | Connection error |
| 5 | Query error |
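
These codes make `dataql` easy to script around. A minimal sketch using the codes above (report.csv is hypothetical):

```bash
dataql run -f report.csv -q "SELECT * FROM report" -e out.json -t json
status=$?
case "$status" in
  0) echo "export succeeded" ;;
  3) echo "input file not found" >&2 ;;
  5) echo "query failed" >&2 ;;
  *) echo "dataql exited with code $status" >&2 ;;
esac
```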
## See Also