Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Rabbet

CI codecov

CLI to simplify shell scripts and glue code operating on relational data.

Rabbet Joint Diagram

Installation

From Source

$ cargo install --git https://github.com/rofinn/rabbet.git

Quick Start

Join two CSV files on matching columns:

$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ customer_id    customer_name     customer_email      customer_phone   customer_address   customer_city   customer_state   customer_zipcode   customer_country   order_id    product_id    quantity   price   order_date │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ CUSTOMER-003   Michael Johnson   michael.johnson@…   555-9876         789 Oak St         Anytown         CA               90210              USA                ORDER-001   PRODUCT-005   1          10.0    2022-01-01 │
│ CUSTOMER-003   Michael Johnson   michael.johnson@…   555-9876         789 Oak St         Anytown         CA               90210              USA                ORDER-002   PRODUCT-005   2          20.0    2022-01-02 │
│ CUSTOMER-003   Michael Johnson   michael.johnson@…   555-9876         789 Oak St         Anytown         CA               90210              USA                ORDER-003   PRODUCT-003   3          30.0    2022-01-03 │
│ CUSTOMER-004   Emily Davis       emily.davis@exam…   555-2468         321 Pine St        Anytown         CA               90210              USA                ORDER-004   PRODUCT-002   4          40.0    2022-01-04 │
│ CUSTOMER-005   Robert Brown      robert.brown@exa…   555-3698         654 Maple St       Anytown         CA               90210              USA                ORDER-005   PRODUCT-001   5          50.0    2022-01-05 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Installation

Usage

Viewing

Rabbet provides several commands for viewing and inspecting data files in a formatted table view. These commands are useful for quickly examining your data without performing any transformations.

Commands Overview

  • cat - Display the entire contents of a file
  • head - Display the first N rows of a file
  • tail - Display the last N rows of a file

Basic Usage

rabbet cat <file>
rabbet head <file> [-n <number>]
rabbet tail <file> [-n <number>]

Common Options

All viewing commands support these options:

  • --format: Output format - table (default), csv, json, or jsonl
  • --delimiter: Input file delimiter (default: ,)
  • --no-header: Treat the first row as data instead of column headers

Examples

Viewing Entire Files with cat

Display all rows from a CSV file in a formatted table:

Test cat command with table format output

$ rabbet cat data/orders/orders.csv
╭────────────────────────────────────────────────────────────────────────╮
│ order_id    customer_id    product_id    quantity   price   order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001   CUSTOMER-003   PRODUCT-005   1          10.0    2022-01-01 │
│ ORDER-002   CUSTOMER-003   PRODUCT-005   2          20.0    2022-01-02 │
│ ORDER-003   CUSTOMER-003   PRODUCT-003   3          30.0    2022-01-03 │
│ ORDER-004   CUSTOMER-004   PRODUCT-002   4          40.0    2022-01-04 │
│ ORDER-005   CUSTOMER-005   PRODUCT-001   5          50.0    2022-01-05 │
│ ORDER-006   CUSTOMER-006   PRODUCT-004   6          60.0    2022-01-06 │
╰────────────────────────────────────────────────────────────────────────╯

Viewing First Rows with head

Display the first 3 rows of a file:

Test head command with table format output

$ rabbet head data/orders/orders.csv -n 3
╭────────────────────────────────────────────────────────────────────────╮
│ order_id    customer_id    product_id    quantity   price   order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001   CUSTOMER-003   PRODUCT-005   1          10.0    2022-01-01 │
│ ORDER-002   CUSTOMER-003   PRODUCT-005   2          20.0    2022-01-02 │
│ ORDER-003   CUSTOMER-003   PRODUCT-003   3          30.0    2022-01-03 │
╰────────────────────────────────────────────────────────────────────────╯

Viewing Last Rows with tail

Display the last 3 rows of a file:

Test tail command with table format output

$ rabbet tail data/orders/orders.csv -n 3
╭────────────────────────────────────────────────────────────────────────╮
│ order_id    customer_id    product_id    quantity   price   order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-004   CUSTOMER-004   PRODUCT-002   4          40.0    2022-01-04 │
│ ORDER-005   CUSTOMER-005   PRODUCT-001   5          50.0    2022-01-05 │
│ ORDER-006   CUSTOMER-006   PRODUCT-004   6          60.0    2022-01-06 │
╰────────────────────────────────────────────────────────────────────────╯

Use Cases

  • Quick inspection: Use cat for small files to see all data at once
  • Preview large files: Use head to check the structure and first few rows
  • Check recent entries: Use tail to see the most recent records in time-series or log data
  • Verify headers: Use head -n 1 to quickly check column names
  • Data validation: Combine with other formats (--format csv) to verify parsing

Notes

  • The default number of rows for head and tail is 10 when -n is not specified
  • Table format automatically truncates long values for display
  • For very wide tables, consider using --format csv for better readability
  • These commands preserve the original data types and formatting

Joins

The join command combines rows from two tables based on related columns between them, similar to SQL JOIN operations.

Basic Usage

rabbet join <left-table> <right-table> --on <column> [options]

Arguments

  • left-table: First input CSV file
  • right-table: Second input CSV file
  • --on: Column name to join on (must exist in both tables)
  • --type: Join type - inner (default), left, right, or full
  • --left-on: Column name in left table (when join columns have different names)
  • --right-on: Column name in right table (when join columns have different names)
  • --format: Output format - table (default) or csv

Join Types

  • Inner Join (default): Returns only rows with matching values in both tables
  • Left Join: Returns all rows from the left table, with NULL values for non-matching right rows
  • Right Join: Returns all rows from the right table, with NULL values for non-matching left rows
  • Full Join: Returns all rows from both tables, with NULL values where there's no match

Examples

Basic Inner Join

Join two tables on a common column:

Test join command with CSV format output

$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id --format csv
customer_id,customer_name,customer_email,customer_phone,customer_address,customer_city,customer_state,customer_zipcode,customer_country,order_id,product_id,quantity,price,order_date
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-001,PRODUCT-005,1,10.0,2022-01-01
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-002,PRODUCT-005,2,20.0,2022-01-02
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-003,PRODUCT-003,3,30.0,2022-01-03
CUSTOMER-004,Emily Davis,emily.davis@example.com,555-2468,321 Pine St,Anytown,CA,90210,USA,ORDER-004,PRODUCT-002,4,40.0,2022-01-04
CUSTOMER-005,Robert Brown,robert.brown@example.com,555-3698,654 Maple St,Anytown,CA,90210,USA,ORDER-005,PRODUCT-001,5,50.0,2022-01-05

Left Join

Include all rows from the left table, even when there's no match in the right table:

Test left join between customers and orders

$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id --type left
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ customer_id    customer_name     customer_email      customer_phone   customer_address   customer_city   customer_state   customer_zipcode   customer_country   order_id    product_id    quantity   price   order_date │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ CUSTOMER-001   John Doe          john.doe@example…   555-1234         123 Main St        Anytown         CA               90210              USA                null        null          null       null    null       │
│ CUSTOMER-002   Jane Smith        jane.smith@examp…   555-5678         456 Elm St         Anytown         CA               90210              USA                null        null          null       null    null       │
│ CUSTOMER-003   Michael Johnson   michael.johnson@…   555-9876         789 Oak St         Anytown         CA               90210              USA                ORDER-001   PRODUCT-005   1          10.0    2022-01-01 │
│ CUSTOMER-003   Michael Johnson   michael.johnson@…   555-9876         789 Oak St         Anytown         CA               90210              USA                ORDER-002   PRODUCT-005   2          20.0    2022-01-02 │
│ CUSTOMER-003   Michael Johnson   michael.johnson@…   555-9876         789 Oak St         Anytown         CA               90210              USA                ORDER-003   PRODUCT-003   3          30.0    2022-01-03 │
│ CUSTOMER-004   Emily Davis       emily.davis@exam…   555-2468         321 Pine St        Anytown         CA               90210              USA                ORDER-004   PRODUCT-002   4          40.0    2022-01-04 │
│ CUSTOMER-005   Robert Brown      robert.brown@exa…   555-3698         654 Maple St       Anytown         CA               90210              USA                ORDER-005   PRODUCT-001   5          50.0    2022-01-05 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Notes

  • The join column must have the same data type in both tables
  • Column names from both tables are preserved in the output
  • If tables have overlapping column names (other than the join column), they will be prefixed to avoid conflicts
  • For best performance, ensure your data is sorted by the join column
  • Large joins may require significant memory

Aggregations

The aggregate command allows you to group and summarize data using various statistical operations.

Basic Usage

rabbet aggregate <table> --by <columns> --with <aggregations>

Arguments

  • table: Input CSV file or - for stdin
  • --by: Columns to group by (comma-separated)
  • --with: Aggregation operations as column=operation pairs (comma-separated)
  • --delimiter: Input file delimiter (default: ,)

Available Operations

  • sum: Sum of values
  • mean: Average of values
  • median: Median value
  • min: Minimum value
  • max: Maximum value
  • range: Difference between max and min
  • count: Count of non-null values
  • variance: Sample variance
  • stddev: Sample standard deviation
  • first: First value in group
  • last: Last value in group
  • describe: Summary statistics as a string

For row counting operations, use _=count, _=len, or _=nrow.

Examples

Simple Aggregation

Group by a single column and calculate one statistic:

Test aggregate command with simple group-by and single aggregation

$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean
╭────────────────────────────────────╮
│ Species           PetalLength_mean │
╞════════════════════════════════════╡
│ Iris-setosa       1.464            │
│ Iris-versicolor   4.26             │
│ Iris-virginica    5.552            │
╰────────────────────────────────────╯

Multiple Aggregations

Calculate multiple statistics for different columns:

Test aggregate command with multiple aggregation operations on the same group

$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean,PetalWidth=max,SepalLength=min
╭───────────────────────────────────────────────────────────────────────╮
│ Species           PetalLength_mean   PetalWidth_max   SepalLength_min │
╞═══════════════════════════════════════════════════════════════════════╡
│ Iris-setosa       1.464              0.6              4.3             │
│ Iris-versicolor   4.26               1.8              4.9             │
│ Iris-virginica    5.552              2.5              4.9             │
╰───────────────────────────────────────────────────────────────────────╯

Multiple Group-By Columns

Group by multiple columns to create finer-grained aggregations:

Test aggregate command with multiple group-by columns

$ rabbet aggregate data/iris/iris.csv --by Species,PetalWidth --with SepalLength=max
╭────────────────────────────────────────────────╮
│ Species           PetalWidth   SepalLength_max │
╞════════════════════════════════════════════════╡
│ Iris-setosa       0.2          5.8             │
│ Iris-setosa       0.4          5.7             │
│ Iris-setosa       0.3          5.7             │
│ Iris-setosa       0.1          5.2             │
│ Iris-setosa       0.5          5.1             │
│ Iris-setosa       0.6          5.0             │
│ Iris-versicolor   1.4          7.0             │
│ Iris-versicolor   1.5          6.9             │
│ Iris-versicolor   1.3          6.6             │
│ Iris-versicolor   1.6          6.3             │
│ Iris-versicolor   1.0          6.0             │
│ Iris-versicolor   1.1          5.6             │
│ Iris-versicolor   1.8          5.9             │
│ …                 …            …               │
│ Iris-virginica    2.5          7.2             │
│ Iris-virginica    1.9          7.4             │
│ Iris-virginica    2.1          7.6             │
│ Iris-virginica    1.8          7.3             │
│ Iris-virginica    2.2          7.7             │
│ Iris-virginica    1.7          4.9             │
│ Iris-virginica    2.0          7.9             │
│ Iris-virginica    2.4          6.7             │
│ Iris-virginica    2.3          7.7             │
│ Iris-virginica    1.5          6.3             │
│ Iris-virginica    1.6          7.2             │
│ Iris-virginica    1.4          6.1             │
╰────────────────────────────────────────────────╯

Notes

  • Column names in the output are automatically suffixed with the operation name (e.g., PetalLength_mean)
  • When grouping by multiple columns, each unique combination creates a separate group
  • Use --by without any columns to aggregate the entire dataset into a single row
  • Multiple operations can be applied to the same column by specifying it multiple times

SQL

The query command allows you to run SQL queries against CSV, JSON, and Parquet files as if they were database tables.

Basic Usage

rabbet query --as <table_name> <file> -- "<SQL query>"

Arguments

  • --as: Alias name for the table in your SQL query
  • file: Input file (CSV, JSON, or Parquet)
  • --: Separator between options and the SQL query
  • SQL query: Standard SQL SELECT statement

Multiple Tables

You can query multiple files by specifying multiple --as options:

rabbet query --as customers customers.csv --as orders orders.csv -- \
  "SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id"

Supported SQL Features

  • SELECT with column selection and aliases
  • FROM with table aliases
  • WHERE with complex conditions
  • JOIN (INNER, LEFT, RIGHT, FULL)
  • GROUP BY with aggregations
  • ORDER BY with ASC/DESC
  • LIMIT and OFFSET
  • Common functions (COUNT, SUM, AVG, MIN, MAX, etc.)
  • String functions (UPPER, LOWER, CONCAT, etc.)
  • Date functions (DATE_TRUNC, EXTRACT, etc.)

Examples

Basic Query

Test query command with simple SELECT statement

$ rabbet query --as orders data/orders/orders.csv -- "SELECT * FROM orders WHERE product_id = 'PRODUCT-005'"
╭────────────────────────────────────────────────────────────────────────╮
│ order_id    customer_id    product_id    quantity   price   order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001   CUSTOMER-003   PRODUCT-005   1          10.0    2022-01-01 │
│ ORDER-002   CUSTOMER-003   PRODUCT-005   2          20.0    2022-01-02 │
╰────────────────────────────────────────────────────────────────────────╯

Aggregation Query

Test query command with aggregation functions

$ rabbet query --as orders data/orders/orders.csv -- "SELECT product_id, COUNT(*) as order_count, SUM(quantity) as total_quantity, AVG(price) as avg_price FROM orders GROUP BY product_id ORDER BY order_count DESC, product_id ASC"
╭────────────────────────────────────────────────────────╮
│ product_id    order_count   total_quantity   avg_price │
╞════════════════════════════════════════════════════════╡
│ PRODUCT-005   2             3                15.0      │
│ PRODUCT-001   1             5                50.0      │
│ PRODUCT-002   1             4                40.0      │
│ PRODUCT-003   1             3                30.0      │
│ PRODUCT-004   1             6                60.0      │
╰────────────────────────────────────────────────────────╯

Complex Join Query

For complex multi-table queries:

rabbet query --as customers data/orders/customers.csv --as orders data/orders/orders.csv -- <<EOF
SELECT
    c.customer_name,
    c.customer_city,
    COUNT(o.order_id) as order_count,
    SUM(o.quantity * o.price) as total_spent,
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name, c.customer_city
HAVING order_count > 0
ORDER BY total_spent DESC
EOF
╭─────────────────────────────────────────────────────────────╮
│ customer_name     customer_city   order_count   total_spent │
╞═════════════════════════════════════════════════════════════╡
│ Robert Brown      Anytown         1             250.0       │
│ Emily Davis       Anytown         1             160.0       │
│ Michael Johnson   Anytown         3             140.0       │
╰─────────────────────────────────────────────────────────────╯

Notes

  • Table names in SQL must match the aliases specified with --as
  • SQL keywords are case-insensitive
  • Column names are case-sensitive and must match the file headers
  • The query must be a SELECT statement (no INSERT, UPDATE, DELETE)
  • Complex queries may require more memory for processing

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development

This is a pretty small project and I don't have an overly strong opinion on how code should be structured. So far, I've maintained a relatively flat directory structure apart from main.rs and args.rs (for the top level clap struct) most of the other subcommands are isolated to their own files. Any shared functionality lives in an appropriately named modules at the same level (e.g., io). I tend to keep unit tests inline with the code they test and use trycmd in the examples directory for acceptance tests. For now, the /tests directory just contains a cli_tests.rs file for running our examples.

Apart from that there is the standard formatting, linting and testing commands provided below. I've bundled most of these into a .pre-commit-config.yaml file if you don't want to remember them all :)

$ pre-commit run

or

$ pre-commit install

Commands

Formatting:

$ cargo fmt

Linting:

$ cargo clippy --fix --allow-dirty

Testing: Run all tests (unit + integration):

$ cargo test

Testing w/ coverage:

$ cargo llvm-cov

Run only CLI integration tests:

$ cargo test --test cli_tests

Update CLI test snapshots (should be printed on failures):

$ TRYCMD=overwrite cargo test --test cli_tests

Benchmarking:

$ ./scripts/benchmark

This will just check that the binary size isn't too large and that cmd latency doesn't exceed 0.1s. Please update this script for new commands. We're just using hyperfine commands inside it.