Skip to the content.

DataQL Logo

Getting Started with DataQL

Welcome! This guide will help you get up and running with DataQL in just a few minutes.

DataQL lets you query any data file using SQL - CSV, JSON, Parquet, Excel, and more. Whether you’re analyzing local files, querying data from S3, or integrating with LLMs, DataQL makes it simple.

Installation

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex

Install Specific Version

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash -s -- --version v0.1.0

Windows (PowerShell):

$env:DATAQL_VERSION="v0.1.0"; irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex

User Installation (No sudo required)

Linux / macOS (Installs to ~/.local/bin):

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash -s -- --local

Windows (PowerShell):

$env:DATAQL_USER_INSTALL="true"; irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.ps1 | iex

Build from Source

git clone https://github.com/adrianolaselva/dataql.git
cd dataql
make build
make install       # requires sudo
# or
make install-local # installs to ~/.local/bin

Verify Installation

dataql --version

Updating DataQL

Upgrade to latest version (Linux / macOS):

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash -s -- --upgrade

Force reinstall (even if same version):

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash -s -- --force

Clean install (remove all versions first):

# Remove all existing installations and install fresh
curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/install.sh | bash -s -- --clean --force

Uninstalling DataQL

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/uninstall.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/adrianolaselva/dataql/main/scripts/uninstall.ps1 | iex

Manual uninstall:

# Remove system installation
sudo rm /usr/local/bin/dataql

# Remove user installation
rm ~/.local/bin/dataql

Hello World Examples

Example 1: Query a CSV File

Create a sample CSV file:

cat > users.csv << 'EOF'
id,name,email,age,city
1,Alice,alice@example.com,28,New York
2,Bob,bob@example.com,35,Los Angeles
3,Charlie,charlie@example.com,42,Chicago
4,Diana,diana@example.com,31,Houston
5,Eve,eve@example.com,25,Phoenix
EOF

Query the data:

# Select all records
dataql run -f users.csv -q "SELECT * FROM users"

# Filter by age
dataql run -f users.csv -q "SELECT name, city FROM users WHERE age > 30"

# Count records by city
dataql run -f users.csv -q "SELECT city, COUNT(*) as count FROM users GROUP BY city"

Example 2: Query a JSON File

Create a sample JSON file:

cat > products.json << 'EOF'
[
  {"id": 1, "name": "Laptop", "category": "Electronics", "price": 999.99, "stock": 50},
  {"id": 2, "name": "Mouse", "category": "Electronics", "price": 29.99, "stock": 200},
  {"id": 3, "name": "Desk", "category": "Furniture", "price": 299.99, "stock": 30},
  {"id": 4, "name": "Chair", "category": "Furniture", "price": 199.99, "stock": 45},
  {"id": 5, "name": "Monitor", "category": "Electronics", "price": 399.99, "stock": 75}
]
EOF

Query the data:

# List all products
dataql run -f products.json -q "SELECT * FROM products"

# Calculate total value per category
dataql run -f products.json -q "SELECT category, SUM(price * stock) as total_value FROM products GROUP BY category"

# Find products with low stock
dataql run -f products.json -q "SELECT name, stock FROM products WHERE stock < 50 ORDER BY stock"

Example 3: Query a JSONL File (Newline-Delimited JSON)

Create a sample JSONL file:

cat > events.jsonl << 'EOF'
{"timestamp": "2024-01-15T10:30:00Z", "level": "INFO", "message": "Application started"}
{"timestamp": "2024-01-15T10:30:05Z", "level": "DEBUG", "message": "Loading configuration"}
{"timestamp": "2024-01-15T10:30:10Z", "level": "INFO", "message": "Connected to database"}
{"timestamp": "2024-01-15T10:31:00Z", "level": "ERROR", "message": "Failed to process request"}
{"timestamp": "2024-01-15T10:31:05Z", "level": "WARN", "message": "High memory usage detected"}
EOF

Query the data:

# Show all events
dataql run -f events.jsonl -q "SELECT * FROM events"

# Filter by log level
dataql run -f events.jsonl -q "SELECT timestamp, message FROM events WHERE level = 'ERROR'"

# Count events by level
dataql run -f events.jsonl -q "SELECT level, COUNT(*) as count FROM events GROUP BY level ORDER BY count DESC"

Example 4: Interactive Mode (REPL)

Start DataQL in interactive mode:

dataql run -f users.csv

You’ll see the DataQL prompt:

dataql>

Now you can run multiple queries:

dataql> SELECT * FROM users;
dataql> SELECT AVG(age) as avg_age FROM users;
dataql> SELECT city, COUNT(*) FROM users GROUP BY city;
dataql> .tables
dataql> .schema users
dataql> .exit

Example 5: Export Results

Export query results to different formats:

# Export to CSV
dataql run -f products.json -q "SELECT * FROM products WHERE category = 'Electronics'" -e electronics.csv -t csv

# Export to JSONL
dataql run -f users.csv -q "SELECT name, email FROM users" -e users_export.jsonl -t jsonl

Example 6: Join Multiple Files

Create two related files:

cat > orders.csv << 'EOF'
order_id,customer_id,product,amount
1,1,Laptop,1
2,2,Mouse,3
3,1,Monitor,2
4,3,Chair,1
EOF

cat > customers.csv << 'EOF'
customer_id,name,country
1,Alice,USA
2,Bob,Canada
3,Charlie,UK
EOF

Join the data:

dataql run -f orders.csv -f customers.csv -q "
SELECT
    o.order_id,
    c.name as customer_name,
    o.product,
    o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
"

Example 7: Query from URL

Query data directly from a URL:

# Query CSV from URL
dataql run -f "https://raw.githubusercontent.com/datasets/airport-codes/master/data/airport-codes.csv" \
  -q "SELECT name, municipality, iso_country FROM airport_codes WHERE iso_country = 'BR' LIMIT 10"

Example 8: Read from stdin

Pipe data directly to DataQL:

# Pipe CSV data
cat users.csv | dataql run -f - -q "SELECT * FROM stdin WHERE age > 30"

# Pipe JSON data
echo '[{"a":1},{"a":2}]' | dataql run -f - -q "SELECT * FROM stdin"

# Combine with other tools
curl -s "https://api.example.com/data.json" | dataql run -f - -q "SELECT * FROM stdin"

Next Steps