Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

SQL

The query command allows you to run SQL queries against CSV, JSON, and Parquet files as if they were database tables.

Basic Usage

rabbet query --as <table_name> <file> -- "<SQL query>"

Arguments

  • --as: Alias name for the table in your SQL query
  • file: Input file (CSV, JSON, or Parquet)
  • --: Separator between options and the SQL query
  • SQL query: Standard SQL SELECT statement

Multiple Tables

You can query multiple files by specifying multiple --as options:

rabbet query --as customers customers.csv --as orders orders.csv -- \
  "SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id"

Supported SQL Features

  • SELECT with column selection and aliases
  • FROM with table aliases
  • WHERE with complex conditions
  • JOIN (INNER, LEFT, RIGHT, FULL)
  • GROUP BY with aggregations
  • ORDER BY with ASC/DESC
  • LIMIT and OFFSET
  • Common functions (COUNT, SUM, AVG, MIN, MAX, etc.)
  • String functions (UPPER, LOWER, CONCAT, etc.)
  • Date functions (DATE_TRUNC, EXTRACT, etc.)

Examples

Basic Query

Test query command with simple SELECT statement

$ rabbet query --as orders data/orders/orders.csv -- "SELECT * FROM orders WHERE product_id = 'PRODUCT-005'"
╭────────────────────────────────────────────────────────────────────────╮
│ order_id    customer_id    product_id    quantity   price   order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001   CUSTOMER-003   PRODUCT-005   1          10.0    2022-01-01 │
│ ORDER-002   CUSTOMER-003   PRODUCT-005   2          20.0    2022-01-02 │
╰────────────────────────────────────────────────────────────────────────╯

Aggregation Query

Test query command with aggregation functions

$ rabbet query --as orders data/orders/orders.csv -- "SELECT product_id, COUNT(*) as order_count, SUM(quantity) as total_quantity, AVG(price) as avg_price FROM orders GROUP BY product_id ORDER BY order_count DESC, product_id ASC"
╭────────────────────────────────────────────────────────╮
│ product_id    order_count   total_quantity   avg_price │
╞════════════════════════════════════════════════════════╡
│ PRODUCT-005   2             3                15.0      │
│ PRODUCT-001   1             5                50.0      │
│ PRODUCT-002   1             4                40.0      │
│ PRODUCT-003   1             3                30.0      │
│ PRODUCT-004   1             6                60.0      │
╰────────────────────────────────────────────────────────╯

Complex Join Query

For complex multi-table queries:

rabbet query --as customers data/orders/customers.csv --as orders data/orders/orders.csv -- <<EOF
SELECT
    c.customer_name,
    c.customer_city,
    COUNT(o.order_id) as order_count,
    SUM(o.quantity * o.price) as total_spent,
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name, c.customer_city
HAVING order_count > 0
ORDER BY total_spent DESC
EOF
╭─────────────────────────────────────────────────────────────╮
│ customer_name     customer_city   order_count   total_spent │
╞═════════════════════════════════════════════════════════════╡
│ Robert Brown      Anytown         1             250.0       │
│ Emily Davis       Anytown         1             160.0       │
│ Michael Johnson   Anytown         3             140.0       │
╰─────────────────────────────────────────────────────────────╯

Notes

  • Table names in SQL must match the aliases specified with --as
  • SQL keywords are case-insensitive
  • Column names are case-sensitive and must match the file headers
  • The query must be a SELECT statement (no INSERT, UPDATE, DELETE)
  • Complex queries may require more memory for processing