DataQL

MCP Server Setup Guide

Complete guide for configuring DataQL as a Model Context Protocol (MCP) server for LLM integration.

Overview

The MCP server enables LLMs to query data files using SQL without loading entire files into context. This dramatically reduces token consumption while providing powerful data analysis capabilities.
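
For example, rather than pasting a large CSV into the conversation, the LLM can issue a single dataql_query call (such as the grouped-sum query shown under Tool Reference below) and receive back only a handful of result rows.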

Prerequisites

DataQL must be installed and available on your PATH. Verify the installation:

dataql --version

Quick Setup

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "dataql": {
      "type": "stdio",
      "command": "dataql",
      "args": ["mcp", "serve"]
    }
  }
}

Restart Claude Code to load the new configuration.

OpenAI Codex

Add to ~/.codex/config.toml:

[mcp.servers.dataql]
transport = "stdio"
command = "dataql"
args = ["mcp", "serve"]

Google Gemini CLI

Add to your Gemini CLI configuration file:

mcpServers:
  dataql:
    command: dataql
    args: ["mcp", "serve"]

Available MCP Tools

Once configured, the following tools become available to your LLM:

Tool               Description
dataql_query       Execute SQL queries on data sources
dataql_schema      Get the structure/schema of a data source
dataql_preview     Preview the first N rows of a data source
dataql_aggregate   Perform count, sum, avg, min, and max operations
dataql_mq_peek     Peek at message queue messages without consuming them

Testing Your Setup

1. Verify MCP Server Starts

echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | dataql mcp serve

You should see a JSON response listing available tools.
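
The exact payload depends on your DataQL version, but an abridged response might look like this (remaining tools and schema details trimmed):

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "dataql_query",
        "description": "Execute SQL queries on data sources",
        "inputSchema": {
          "type": "object",
          "properties": {
            "source": { "type": "string" },
            "query": { "type": "string" }
          },
          "required": ["source", "query"]
        }
      }
    ]
  }
}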

2. Test a Query

# Create test data
echo -e "id,name,age\n1,Alice,28\n2,Bob,35\n3,Charlie,42" > test.csv

# Test via MCP
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"dataql_query","arguments":{"source":"test.csv","query":"SELECT * FROM test WHERE age > 30"}},"id":2}' | dataql mcp serve

3. Test in Your LLM Client

After configuring, ask your LLM:

“Use dataql to show me the schema of test.csv”

The LLM should use the dataql_schema tool to get the file structure.
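
For the test.csv created above, the reported structure should cover its three columns. The types shown here are assumed to be inferred from the data and may be named differently in your version:

Table: test
Columns:
  id    integer
  name  text
  age   integer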

Tool Reference

dataql_query

Execute SQL queries on any data source.

Parameters:

Parameter   Required   Description
source      Yes        File path, URL, S3 URI, or database connection
query       Yes        SQL query to execute
delimiter   No         CSV delimiter (default: comma)

Example Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "sales.csv",
      "query": "SELECT product, SUM(amount) as total FROM sales GROUP BY product ORDER BY total DESC LIMIT 10"
    }
  },
  "id": 1
}

dataql_schema

Get the structure of a data source.

Parameters:

Parameter   Required   Description
source      Yes        File path, URL, or database connection

Example Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_schema",
    "arguments": {
      "source": "users.json"
    }
  },
  "id": 1
}

dataql_preview

Preview the first N rows of a data source.

Parameters:

Parameter   Required   Description
source      Yes        Data source
limit       No         Number of rows (default: 5, max: 100)

Example Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_preview",
    "arguments": {
      "source": "data.parquet",
      "limit": 10
    }
  },
  "id": 1
}

dataql_aggregate

Perform aggregation operations on a column.

Parameters:

Parameter   Required   Description
source      Yes        Data source
column      Yes        Column to aggregate
operation   Yes        One of: count, sum, avg, min, max
group_by    No         Column to group results by

Example Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_aggregate",
    "arguments": {
      "source": "sales.csv",
      "column": "amount",
      "operation": "sum",
      "group_by": "category"
    }
  },
  "id": 1
}
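
This call should produce the same result as the equivalent grouped SQL query issued through dataql_query (the sales table name follows from the sales.csv filename):

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "sales.csv",
      "query": "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
    }
  },
  "id": 1
}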

dataql_mq_peek

Peek at message queue messages without consuming/deleting them. Supports AWS SQS and Apache Kafka.

Parameters:

Parameter      Required   Description
source         Yes        Message queue URL (SQS or Kafka)
max_messages   No         Maximum number of messages to retrieve (default: 10)
query          No         SQL query to filter/transform messages

Example Request (SQS):

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_mq_peek",
    "arguments": {
      "source": "sqs://my-queue?region=us-east-1",
      "max_messages": 20,
      "query": "SELECT body_event_type, COUNT(*) as count FROM my_queue GROUP BY body_event_type"
    }
  },
  "id": 1
}

Example Request (Kafka):

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_mq_peek",
    "arguments": {
      "source": "kafka://localhost:9092/events",
      "max_messages": 50,
      "query": "SELECT body_level, body_message FROM events WHERE body_level = 'ERROR'"
    }
  },
  "id": 1
}

Note: Messages are read in peek mode; they are NOT consumed or deleted, which makes this safe for troubleshooting.
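
The example queries also reference columns such as body_event_type and body_level, which suggests that JSON message bodies are flattened into body_-prefixed columns; consult your version's documentation for the exact flattening rules.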

Advanced Configuration

Debug Mode

Start the server with debug logging:

dataql mcp serve --debug

Environment Variables

The MCP server inherits the environment of the process that launches it, which is useful for supplying cloud storage credentials:

# AWS S3
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret

# Then query S3 files
# dataql_query with source="s3://bucket/file.csv"
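
With the credentials exported, a query against S3 looks the same as the local-file case. The bucket and file names below are placeholders; the table name file follows from file.csv:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "dataql_query",
    "arguments": {
      "source": "s3://bucket/file.csv",
      "query": "SELECT COUNT(*) FROM file"
    }
  },
  "id": 1
}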

Working Directory

The MCP server runs in the current working directory. File paths are relative to where the server starts.

For Claude Code, this is typically your project directory.
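
If in doubt, use absolute paths in the source argument (the paths below are placeholders):

# Relative: resolved against the server's working directory
"source": "data/sales.csv"

# Absolute: independent of where the server was started
"source": "/home/user/project/data/sales.csv"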

Supported Data Sources

The MCP server supports all DataQL data sources:

Local Files: e.g., CSV, JSON, and Parquet
Remote Sources: e.g., HTTP(S) URLs and S3 URIs
Databases: via connection strings
Message Queues (Peek Mode): AWS SQS and Apache Kafka

Security Considerations

  1. File Access: The MCP server can access any file the user running it can access. Be mindful of sensitive data.

  2. Database Credentials: Use environment variables for database passwords instead of embedding them in connection strings; see the sketch after this list.

  3. Remote Sources: Ensure proper authentication is configured for cloud storage.

  4. Sandboxing: Consider running in a container or restricted environment for untrusted use cases.
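
As a sketch of point 2, assuming DataQL accepts standard PostgreSQL connection strings and that the underlying driver honors libpq's PGPASSWORD variable (neither is confirmed by this guide):

# Keep the password out of the connection string
export PGPASSWORD=your_password

# dataql_query with source="postgres://user@localhost:5432/mydb"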

Troubleshooting

Server Won’t Start

Error: Command not found

# Ensure dataql is in PATH
which dataql

# Or use full path in configuration
"command": "/usr/local/bin/dataql"

Connection Refused

Error: LLM client can’t connect

  1. Verify the server starts manually:
    echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | dataql mcp serve
    
  2. Check client configuration syntax

  3. Restart the LLM client

Query Errors

Error: Table not found

The table name is derived from the filename without its extension; for example, test.csv is queried as test.

Use dataql_schema to verify:

{"name": "dataql_schema", "arguments": {"source": "file.csv"}}

Performance Issues

For large files, prefer dataql_preview or a LIMIT clause over SELECT * so only a few rows come back, and use dataql_aggregate for summary statistics instead of retrieving raw rows.
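
For instance, sampling a large Parquet file (the filename is a placeholder) costs only a few tokens:

{"name": "dataql_preview", "arguments": {"source": "big_dataset.parquet", "limit": 20}}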
