Rabbet
CLI to simplify shell scripts and glue code operating on relational data.
Installation
From Source
$ cargo install --git https://github.com/rofinn/rabbet.git
Quick Start
Join two CSV files on matching columns:
$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ customer_id customer_name customer_email customer_phone customer_address customer_city customer_state customer_zipcode customer_country order_id product_id quantity price order_date │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-001 PRODUCT-005 1 10.0 2022-01-01 │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-002 PRODUCT-005 2 20.0 2022-01-02 │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-003 PRODUCT-003 3 30.0 2022-01-03 │
│ CUSTOMER-004 Emily Davis emily.davis@exam… 555-2468 321 Pine St Anytown CA 90210 USA ORDER-004 PRODUCT-002 4 40.0 2022-01-04 │
│ CUSTOMER-005 Robert Brown robert.brown@exa… 555-3698 654 Maple St Anytown CA 90210 USA ORDER-005 PRODUCT-001 5 50.0 2022-01-05 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Installation
Usage
Viewing
Rabbet provides several commands for viewing and inspecting data files in a formatted table view. These commands are useful for quickly examining your data without performing any transformations.
Commands Overview
cat
- Display the entire contents of a filehead
- Display the first N rows of a filetail
- Display the last N rows of a file
Basic Usage
rabbet cat <file>
rabbet head <file> [-n <number>]
rabbet tail <file> [-n <number>]
Common Options
All viewing commands support these options:
--format
: Output format -table
(default),csv
,json
, orjsonl
--delimiter
: Input file delimiter (default:,
)--no-header
: Treat the first row as data instead of column headers
Examples
Viewing Entire Files with cat
Display all rows from a CSV file in a formatted table:
Test cat command with table format output
$ rabbet cat data/orders/orders.csv
╭────────────────────────────────────────────────────────────────────────╮
│ order_id customer_id product_id quantity price order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001 CUSTOMER-003 PRODUCT-005 1 10.0 2022-01-01 │
│ ORDER-002 CUSTOMER-003 PRODUCT-005 2 20.0 2022-01-02 │
│ ORDER-003 CUSTOMER-003 PRODUCT-003 3 30.0 2022-01-03 │
│ ORDER-004 CUSTOMER-004 PRODUCT-002 4 40.0 2022-01-04 │
│ ORDER-005 CUSTOMER-005 PRODUCT-001 5 50.0 2022-01-05 │
│ ORDER-006 CUSTOMER-006 PRODUCT-004 6 60.0 2022-01-06 │
╰────────────────────────────────────────────────────────────────────────╯
Viewing First Rows with head
Display the first 3 rows of a file:
Test head command with table format output
$ rabbet head data/orders/orders.csv -n 3
╭────────────────────────────────────────────────────────────────────────╮
│ order_id customer_id product_id quantity price order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001 CUSTOMER-003 PRODUCT-005 1 10.0 2022-01-01 │
│ ORDER-002 CUSTOMER-003 PRODUCT-005 2 20.0 2022-01-02 │
│ ORDER-003 CUSTOMER-003 PRODUCT-003 3 30.0 2022-01-03 │
╰────────────────────────────────────────────────────────────────────────╯
Viewing Last Rows with tail
Display the last 3 rows of a file:
Test tail command with table format output
$ rabbet tail data/orders/orders.csv -n 3
╭────────────────────────────────────────────────────────────────────────╮
│ order_id customer_id product_id quantity price order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-004 CUSTOMER-004 PRODUCT-002 4 40.0 2022-01-04 │
│ ORDER-005 CUSTOMER-005 PRODUCT-001 5 50.0 2022-01-05 │
│ ORDER-006 CUSTOMER-006 PRODUCT-004 6 60.0 2022-01-06 │
╰────────────────────────────────────────────────────────────────────────╯
Use Cases
- Quick inspection: Use
cat
for small files to see all data at once - Preview large files: Use
head
to check the structure and first few rows - Check recent entries: Use
tail
to see the most recent records in time-series or log data - Verify headers: Use
head -n 1
to quickly check column names - Data validation: Combine with other formats (
--format csv
) to verify parsing
Notes
- The default number of rows for
head
andtail
is 10 when-n
is not specified - Table format automatically truncates long values for display
- For very wide tables, consider using
--format csv
for better readability - These commands preserve the original data types and formatting
Joins
The join
command combines rows from two tables based on related columns between them, similar to SQL JOIN operations.
Basic Usage
rabbet join <left-table> <right-table> --on <column> [options]
Arguments
left-table
: First input CSV fileright-table
: Second input CSV file--on
: Column name to join on (must exist in both tables)--type
: Join type -inner
(default),left
,right
, orfull
--left-on
: Column name in left table (when join columns have different names)--right-on
: Column name in right table (when join columns have different names)--format
: Output format -table
(default) orcsv
Join Types
- Inner Join (default): Returns only rows with matching values in both tables
- Left Join: Returns all rows from the left table, with NULL values for non-matching right rows
- Right Join: Returns all rows from the right table, with NULL values for non-matching left rows
- Full Join: Returns all rows from both tables, with NULL values where there's no match
Examples
Basic Inner Join
Join two tables on a common column:
Test join command with CSV format output
$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id --format csv
customer_id,customer_name,customer_email,customer_phone,customer_address,customer_city,customer_state,customer_zipcode,customer_country,order_id,product_id,quantity,price,order_date
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-001,PRODUCT-005,1,10.0,2022-01-01
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-002,PRODUCT-005,2,20.0,2022-01-02
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-003,PRODUCT-003,3,30.0,2022-01-03
CUSTOMER-004,Emily Davis,emily.davis@example.com,555-2468,321 Pine St,Anytown,CA,90210,USA,ORDER-004,PRODUCT-002,4,40.0,2022-01-04
CUSTOMER-005,Robert Brown,robert.brown@example.com,555-3698,654 Maple St,Anytown,CA,90210,USA,ORDER-005,PRODUCT-001,5,50.0,2022-01-05
Left Join
Include all rows from the left table, even when there's no match in the right table:
Test left join between customers and orders
$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id --type left
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ customer_id customer_name customer_email customer_phone customer_address customer_city customer_state customer_zipcode customer_country order_id product_id quantity price order_date │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ CUSTOMER-001 John Doe john.doe@example… 555-1234 123 Main St Anytown CA 90210 USA null null null null null │
│ CUSTOMER-002 Jane Smith jane.smith@examp… 555-5678 456 Elm St Anytown CA 90210 USA null null null null null │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-001 PRODUCT-005 1 10.0 2022-01-01 │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-002 PRODUCT-005 2 20.0 2022-01-02 │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-003 PRODUCT-003 3 30.0 2022-01-03 │
│ CUSTOMER-004 Emily Davis emily.davis@exam… 555-2468 321 Pine St Anytown CA 90210 USA ORDER-004 PRODUCT-002 4 40.0 2022-01-04 │
│ CUSTOMER-005 Robert Brown robert.brown@exa… 555-3698 654 Maple St Anytown CA 90210 USA ORDER-005 PRODUCT-001 5 50.0 2022-01-05 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Notes
- The join column must have the same data type in both tables
- Column names from both tables are preserved in the output
- If tables have overlapping column names (other than the join column), they will be prefixed to avoid conflicts
- For best performance, ensure your data is sorted by the join column
- Large joins may require significant memory
Aggregations
The aggregate
command allows you to group and summarize data using various statistical operations.
Basic Usage
rabbet aggregate <table> --by <columns> --with <aggregations>
Arguments
table
: Input CSV file or-
for stdin--by
: Columns to group by (comma-separated)--with
: Aggregation operations ascolumn=operation
pairs (comma-separated)--delimiter
: Input file delimiter (default:,
)
Available Operations
sum
: Sum of valuesmean
: Average of valuesmedian
: Median valuemin
: Minimum valuemax
: Maximum valuerange
: Difference between max and mincount
: Count of non-null valuesvariance
: Sample variancestddev
: Sample standard deviationfirst
: First value in grouplast
: Last value in groupdescribe
: Summary statistics as a string
For row counting operations, use _=count
, _=len
, or _=nrow
.
Examples
Simple Aggregation
Group by a single column and calculate one statistic:
Test aggregate command with simple group-by and single aggregation
$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean
╭────────────────────────────────────╮
│ Species PetalLength_mean │
╞════════════════════════════════════╡
│ Iris-setosa 1.464 │
│ Iris-versicolor 4.26 │
│ Iris-virginica 5.552 │
╰────────────────────────────────────╯
Multiple Aggregations
Calculate multiple statistics for different columns:
Test aggregate command with multiple aggregation operations on the same group
$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean,PetalWidth=max,SepalLength=min
╭───────────────────────────────────────────────────────────────────────╮
│ Species PetalLength_mean PetalWidth_max SepalLength_min │
╞═══════════════════════════════════════════════════════════════════════╡
│ Iris-setosa 1.464 0.6 4.3 │
│ Iris-versicolor 4.26 1.8 4.9 │
│ Iris-virginica 5.552 2.5 4.9 │
╰───────────────────────────────────────────────────────────────────────╯
Multiple Group-By Columns
Group by multiple columns to create finer-grained aggregations:
Test aggregate command with multiple group-by columns
$ rabbet aggregate data/iris/iris.csv --by Species,PetalWidth --with SepalLength=max
╭────────────────────────────────────────────────╮
│ Species PetalWidth SepalLength_max │
╞════════════════════════════════════════════════╡
│ Iris-setosa 0.2 5.8 │
│ Iris-setosa 0.4 5.7 │
│ Iris-setosa 0.3 5.7 │
│ Iris-setosa 0.1 5.2 │
│ Iris-setosa 0.5 5.1 │
│ Iris-setosa 0.6 5.0 │
│ Iris-versicolor 1.4 7.0 │
│ Iris-versicolor 1.5 6.9 │
│ Iris-versicolor 1.3 6.6 │
│ Iris-versicolor 1.6 6.3 │
│ Iris-versicolor 1.0 6.0 │
│ Iris-versicolor 1.1 5.6 │
│ Iris-versicolor 1.8 5.9 │
│ … … … │
│ Iris-virginica 2.5 7.2 │
│ Iris-virginica 1.9 7.4 │
│ Iris-virginica 2.1 7.6 │
│ Iris-virginica 1.8 7.3 │
│ Iris-virginica 2.2 7.7 │
│ Iris-virginica 1.7 4.9 │
│ Iris-virginica 2.0 7.9 │
│ Iris-virginica 2.4 6.7 │
│ Iris-virginica 2.3 7.7 │
│ Iris-virginica 1.5 6.3 │
│ Iris-virginica 1.6 7.2 │
│ Iris-virginica 1.4 6.1 │
╰────────────────────────────────────────────────╯
Notes
- Column names in the output are automatically suffixed with the operation name (e.g.,
PetalLength_mean
) - When grouping by multiple columns, each unique combination creates a separate group
- Use
--by
without any columns to aggregate the entire dataset into a single row - Multiple operations can be applied to the same column by specifying it multiple times
SQL
The query
command allows you to run SQL queries against CSV, JSON, and Parquet files as if they were database tables.
Basic Usage
rabbet query --as <table_name> <file> -- "<SQL query>"
Arguments
--as
: Alias name for the table in your SQL queryfile
: Input file (CSV, JSON, or Parquet)--
: Separator between options and the SQL querySQL query
: Standard SQL SELECT statement
Multiple Tables
You can query multiple files by specifying multiple --as
options:
rabbet query --as customers customers.csv --as orders orders.csv -- \
"SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id"
Supported SQL Features
SELECT
with column selection and aliasesFROM
with table aliasesWHERE
with complex conditionsJOIN
(INNER, LEFT, RIGHT, FULL)GROUP BY
with aggregationsORDER BY
with ASC/DESCLIMIT
andOFFSET
- Common functions (COUNT, SUM, AVG, MIN, MAX, etc.)
- String functions (UPPER, LOWER, CONCAT, etc.)
- Date functions (DATE_TRUNC, EXTRACT, etc.)
Examples
Basic Query
Test query command with simple SELECT statement
$ rabbet query --as orders data/orders/orders.csv -- "SELECT * FROM orders WHERE product_id = 'PRODUCT-005'"
╭────────────────────────────────────────────────────────────────────────╮
│ order_id customer_id product_id quantity price order_date │
╞════════════════════════════════════════════════════════════════════════╡
│ ORDER-001 CUSTOMER-003 PRODUCT-005 1 10.0 2022-01-01 │
│ ORDER-002 CUSTOMER-003 PRODUCT-005 2 20.0 2022-01-02 │
╰────────────────────────────────────────────────────────────────────────╯
Aggregation Query
Test query command with aggregation functions
$ rabbet query --as orders data/orders/orders.csv -- "SELECT product_id, COUNT(*) as order_count, SUM(quantity) as total_quantity, AVG(price) as avg_price FROM orders GROUP BY product_id ORDER BY order_count DESC, product_id ASC"
╭────────────────────────────────────────────────────────╮
│ product_id order_count total_quantity avg_price │
╞════════════════════════════════════════════════════════╡
│ PRODUCT-005 2 3 15.0 │
│ PRODUCT-001 1 5 50.0 │
│ PRODUCT-002 1 4 40.0 │
│ PRODUCT-003 1 3 30.0 │
│ PRODUCT-004 1 6 60.0 │
╰────────────────────────────────────────────────────────╯
Complex Join Query
For complex multi-table queries:
rabbet query --as customers data/orders/customers.csv --as orders data/orders/orders.csv -- <<EOF
SELECT
c.customer_name,
c.customer_city,
COUNT(o.order_id) as order_count,
SUM(o.quantity * o.price) as total_spent,
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name, c.customer_city
HAVING order_count > 0
ORDER BY total_spent DESC
EOF
╭─────────────────────────────────────────────────────────────╮
│ customer_name customer_city order_count total_spent │
╞═════════════════════════════════════════════════════════════╡
│ Robert Brown Anytown 1 250.0 │
│ Emily Davis Anytown 1 160.0 │
│ Michael Johnson Anytown 3 140.0 │
╰─────────────────────────────────────────────────────────────╯
Notes
- Table names in SQL must match the aliases specified with
--as
- SQL keywords are case-insensitive
- Column names are case-sensitive and must match the file headers
- The query must be a SELECT statement (no INSERT, UPDATE, DELETE)
- Complex queries may require more memory for processing
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -m 'Add some amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
Development
This is a pretty small project and I don't have an overly strong opinion on how code should be structured.
So far, I've maintained a relatively flat directory structure apart from main.rs
and args.rs
(for the top level clap struct) most of the other subcommands are isolated to their own files.
Any shared functionality lives in an appropriately named modules at the same level (e.g., io
).
I tend to keep unit tests inline with the code they test and use trycmd
in the examples directory for acceptance tests.
For now, the /tests
directory just contains a cli_tests.rs
file for running our examples.
Apart from that there is the standard formatting, linting and testing commands provided below.
I've bundled most of these into a .pre-commit-config.yaml
file if you don't want to remember them all :)
$ pre-commit run
or
$ pre-commit install
Commands
Formatting:
$ cargo fmt
Linting:
$ cargo clippy --fix --allow-dirty
Testing: Run all tests (unit + integration):
$ cargo test
Testing w/ coverage:
$ cargo llvm-cov
Run only CLI integration tests:
$ cargo test --test cli_tests
Update CLI test snapshots (should be printed on failures):
$ TRYCMD=overwrite cargo test --test cli_tests
Benchmarking:
$ ./scripts/benchmark
This will just check that the binary size isn't too large and that cmd latency doesn't exceed 0.1s.
Please update this script for new commands.
We're just using hyperfine
commands inside it.