# MCP Server Setup Guide
Complete guide for configuring DataQL as a Model Context Protocol (MCP) server for LLM integration.
## Overview
The MCP server enables LLMs to query data files using SQL without loading entire files into context. This dramatically reduces token consumption while providing powerful data analysis capabilities.
## Prerequisites
- DataQL installed and available in PATH
- LLM client with MCP support (Claude Code, OpenAI Codex, Gemini CLI, etc.)
Verify installation:

```bash
dataql --version
```
## Quick Setup

### Claude Code

Add to `~/.claude/settings.json`:
```json
{
  "mcpServers": {
    "dataql": {
      "type": "stdio",
      "command": "dataql",
      "args": ["mcp", "serve"]
    }
  }
}
```
Restart Claude Code to load the new configuration.
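Recent Claude Code releases can also register MCP servers from the command line. The exact subcommand surface varies by version, so treat this as a sketch and confirm with `claude mcp --help`:

```bash
# Register the DataQL server via the Claude Code CLI (syntax may vary by version)
claude mcp add dataql -- dataql mcp serve
```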
### OpenAI Codex

Add to `~/.codex/config.toml`:

```toml
[mcp.servers.dataql]
transport = "stdio"
command = "dataql"
args = ["mcp", "serve"]
```
### Google Gemini CLI

Add to your Gemini CLI configuration file:

```yaml
mcpServers:
  dataql:
    command: dataql
    args: ["mcp", "serve"]
```
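Note that some Gemini CLI releases read MCP configuration as JSON from `~/.gemini/settings.json` rather than YAML. If the form above is not picked up, a JSON equivalent along these lines may work; check your version's documentation:

```json
{
  "mcpServers": {
    "dataql": {
      "command": "dataql",
      "args": ["mcp", "serve"]
    }
  }
}
```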
## Available MCP Tools
Once configured, the following tools become available to your LLM:
| Tool | Description |
|---|---|
| `dataql_query` | Execute SQL queries on data sources |
| `dataql_schema` | Get the structure/schema of a data source |
| `dataql_preview` | Preview the first N rows of a data source |
| `dataql_aggregate` | Perform count, sum, avg, min, and max operations |
| `dataql_mq_peek` | Peek at message queue messages without consuming them |
## Testing Your Setup

### 1. Verify the MCP Server Starts

```bash
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | dataql mcp serve
```
You should see a JSON response listing available tools.
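To make this check repeatable, you can wrap it in a small shell script that fails loudly when the expected tool is missing. This is a minimal sketch that assumes only the `dataql mcp serve` command shown above:

```bash
#!/usr/bin/env sh
# Smoke test: request the tool list and check that a known tool is advertised.
if echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | dataql mcp serve | grep -q 'dataql_query'; then
  echo "MCP server OK: dataql_query is advertised"
else
  echo "MCP server check FAILED" >&2
  exit 1
fi
```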
### 2. Test a Query

```bash
# Create test data
echo -e "id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42" > test.csv

# Test via MCP
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"dataql_query","arguments":{"source":"test.csv","query":"SELECT * FROM test WHERE age > 30"}},"id":2}' | dataql mcp serve
```
### 3. Test in Your LLM Client

After configuring, ask your LLM:

> "Use dataql to show me the schema of test.csv"

The LLM should use the `dataql_schema` tool to get the file structure.
## Tool Reference

### dataql_query
Execute SQL queries on any data source.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| source | Yes | File path, URL, S3 URI, or database connection |
| query | Yes | SQL query to execute |
| delimiter | No | CSV delimiter (default: comma) |
Example Request:

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "sales.csv",
      "query": "SELECT product, SUM(amount) as total FROM sales GROUP BY product ORDER BY total DESC LIMIT 10"
    }
  },
  "id": 1
}
```
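The optional `delimiter` parameter covers files that are not comma-separated. A sketch for a tab-separated file (the filename `data.tsv` is illustrative):

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "data.tsv",
      "query": "SELECT * FROM data LIMIT 5",
      "delimiter": "\t"
    }
  },
  "id": 2
}
```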
### dataql_schema
Get the structure of a data source.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| source | Yes | File path, URL, or database connection |
Example Request:

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_schema",
    "arguments": {
      "source": "users.json"
    }
  },
  "id": 1
}
```
### dataql_preview

Preview the first N rows of a data source.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| source | Yes | Data source |
| limit | No | Number of rows (default: 5, max: 100) |
Example Request:

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_preview",
    "arguments": {
      "source": "data.parquet",
      "limit": 10
    }
  },
  "id": 1
}
```
### dataql_aggregate
Perform aggregation operations on a column.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| source | Yes | Data source |
| column | Yes | Column to aggregate |
| operation | Yes | One of: `count`, `sum`, `avg`, `min`, `max` |
| group_by | No | Column to group results by |
Example Request:

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_aggregate",
    "arguments": {
      "source": "sales.csv",
      "column": "amount",
      "operation": "sum",
      "group_by": "category"
    }
  },
  "id": 1
}
```
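This should be roughly equivalent to expressing the same aggregation as SQL through `dataql_query`; the dedicated tool simply gives the LLM a simpler call shape:

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "sales.csv",
      "query": "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
    }
  },
  "id": 2
}
```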
### dataql_mq_peek
Peek at message queue messages without consuming/deleting them. Supports AWS SQS and Apache Kafka.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| source | Yes | Message queue URL (SQS or Kafka) |
| max_messages | No | Maximum messages to retrieve (default: 10) |
| query | No | SQL query to filter/transform messages |
Example Request (SQS):

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_mq_peek",
    "arguments": {
      "source": "sqs://my-queue?region=us-east-1",
      "max_messages": 20,
      "query": "SELECT body_event_type, COUNT(*) as count FROM my_queue GROUP BY body_event_type"
    }
  },
  "id": 1
}
```
Example Request (Kafka):

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_mq_peek",
    "arguments": {
      "source": "kafka://localhost:9092/events",
      "max_messages": 50,
      "query": "SELECT body_level, body_message FROM events WHERE body_level = 'ERROR'"
    }
  },
  "id": 1
}
```
**Note:** Messages are read in peek mode; they are **not** consumed or deleted, making this safe for troubleshooting.
## Advanced Configuration

### Debug Mode

Start the server with debug logging:

```bash
dataql mcp serve --debug
```
### Environment Variables

The MCP server inherits its environment, which is useful for supplying cloud storage credentials:

```bash
# AWS S3
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret

# Then query S3 files:
# dataql_query with source="s3://bucket/file.csv"
```
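Other providers follow the same pattern. The variables below are the standard ones read by the respective provider SDKs; whether DataQL picks up each of them depends on its data source implementation, so verify against the Data Sources documentation:

```bash
# Google Cloud Storage: path to a service account key file (standard SDK variable)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Azure Blob Storage: account name and key (standard SDK variables)
export AZURE_STORAGE_ACCOUNT=your_account
export AZURE_STORAGE_KEY=your_key
```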
### Working Directory

The MCP server runs in the current working directory, so relative file paths resolve from wherever the server starts. For Claude Code, this is typically your project directory.
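If you cannot control where the client launches the server, absolute paths in `source` sidestep the issue entirely (the path below is illustrative):

```json
{"name": "dataql_query", "arguments": {"source": "/home/user/project/data/sales.csv", "query": "SELECT COUNT(*) FROM sales"}}
```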
## Supported Data Sources
The MCP server supports all DataQL data sources:
### Local Files
- CSV, JSON, JSONL, XML, YAML
- Parquet, Avro, ORC
- Excel (.xlsx, .xls)
### Remote Sources

- HTTP/HTTPS URLs
- Amazon S3 (`s3://bucket/path/file.csv`)
- Google Cloud Storage (`gs://bucket/path/file.csv`)
- Azure Blob Storage (`azure://container/path/file.csv`)
### Databases

- PostgreSQL (`postgres://user:pass@host/db?table=name`)
- MySQL (`mysql://user:pass@host/db?table=name`)
- MongoDB (`mongodb://user:pass@host/db?collection=name`)
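For example, a `dataql_query` call against the PostgreSQL source format above might look like this; host, credentials, and table are placeholders, and this assumes the table selected via `?table=` is queryable under its own name:

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "postgres://user:pass@localhost/shop?table=orders",
      "query": "SELECT status, COUNT(*) AS n FROM orders GROUP BY status"
    }
  },
  "id": 1
}
```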
### Message Queues (Peek Mode)

- AWS SQS (`sqs://queue-name?region=us-east-1`)
- Apache Kafka (`kafka://broker:9092/topic`)
## Security Considerations

- **File Access:** The MCP server can access any file that the user running it can access. Be mindful of sensitive data.
- **Database Credentials:** Use environment variables for database passwords instead of embedding them in connection strings.
- **Remote Sources:** Ensure proper authentication is configured for cloud storage.
- **Sandboxing:** Consider running in a container or other restricted environment for untrusted use cases (see the sketch below).
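One way to apply the sandboxing advice is to run the server in a container with a read-only mount of the data directory. This is a sketch only: the image name is hypothetical (any image with `dataql` on its PATH works), and your MCP client configuration must point at this command instead of the bare binary:

```bash
# Hypothetical image name; substitute one that has dataql installed.
docker run --rm -i \
  -v "$PWD:/data:ro" \
  -w /data \
  example/dataql:latest \
  dataql mcp serve
```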
## Troubleshooting

### Server Won't Start

**Error:** `command not found`

```bash
# Ensure dataql is in PATH
which dataql

# Or use the full path in your configuration:
# "command": "/usr/local/bin/dataql"
```
### Connection Refused

**Error:** the LLM client can't connect

- Verify the server starts manually: `echo '{}' | dataql mcp serve`
- Check the client configuration syntax
- Restart the LLM client
### Query Errors

**Error:** `Table not found`

The table name is derived from the filename without its extension:

- `users.csv` → table name is `users`
- `my-data.json` → table name is `my-data`

Use `dataql_schema` to verify:

```json
{"name": "dataql_schema", "arguments": {"source": "file.csv"}}
```
### Performance Issues

For large files, prefer queries that reduce output (see the example below):

- `LIMIT` clauses in queries
- Aggregations instead of returning all rows
- Specific column selection instead of `SELECT *`
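For instance, instead of having the model pull every row into context, push the work into SQL (column names here are illustrative):

```sql
-- Avoid: returns every row and column
SELECT * FROM sales;

-- Prefer: aggregate server-side and cap the output
SELECT product, SUM(amount) AS total
FROM sales
GROUP BY product
ORDER BY total DESC
LIMIT 10;
```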
## See Also
- LLM Integration Guide - Overview of all integration methods
- CLI Reference - DataQL command-line options
- Data Sources - Detailed data source documentation