Joins
The join
command combines rows from two tables based on related columns between them, similar to SQL JOIN operations.
Basic Usage
rabbet join <left-table> <right-table> --on <column> [options]
Arguments
left-table
: First input CSV fileright-table
: Second input CSV file--on
: Column name to join on (must exist in both tables)--type
: Join type -inner
(default),left
,right
, orfull
--left-on
: Column name in left table (when join columns have different names)--right-on
: Column name in right table (when join columns have different names)--format
: Output format -table
(default) orcsv
Join Types
- Inner Join (default): Returns only rows with matching values in both tables
- Left Join: Returns all rows from the left table, with NULL values for non-matching right rows
- Right Join: Returns all rows from the right table, with NULL values for non-matching left rows
- Full Join: Returns all rows from both tables, with NULL values where there's no match
Examples
Basic Inner Join
Join two tables on a common column:
Test join command with CSV format output
$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id --format csv
customer_id,customer_name,customer_email,customer_phone,customer_address,customer_city,customer_state,customer_zipcode,customer_country,order_id,product_id,quantity,price,order_date
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-001,PRODUCT-005,1,10.0,2022-01-01
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-002,PRODUCT-005,2,20.0,2022-01-02
CUSTOMER-003,Michael Johnson,michael.johnson@example.com,555-9876,789 Oak St,Anytown,CA,90210,USA,ORDER-003,PRODUCT-003,3,30.0,2022-01-03
CUSTOMER-004,Emily Davis,emily.davis@example.com,555-2468,321 Pine St,Anytown,CA,90210,USA,ORDER-004,PRODUCT-002,4,40.0,2022-01-04
CUSTOMER-005,Robert Brown,robert.brown@example.com,555-3698,654 Maple St,Anytown,CA,90210,USA,ORDER-005,PRODUCT-001,5,50.0,2022-01-05
Left Join
Include all rows from the left table, even when there's no match in the right table:
Test left join between customers and orders
$ rabbet join data/orders/customers.csv data/orders/orders.csv --on customer_id --type left
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ customer_id customer_name customer_email customer_phone customer_address customer_city customer_state customer_zipcode customer_country order_id product_id quantity price order_date │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ CUSTOMER-001 John Doe john.doe@example… 555-1234 123 Main St Anytown CA 90210 USA null null null null null │
│ CUSTOMER-002 Jane Smith jane.smith@examp… 555-5678 456 Elm St Anytown CA 90210 USA null null null null null │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-001 PRODUCT-005 1 10.0 2022-01-01 │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-002 PRODUCT-005 2 20.0 2022-01-02 │
│ CUSTOMER-003 Michael Johnson michael.johnson@… 555-9876 789 Oak St Anytown CA 90210 USA ORDER-003 PRODUCT-003 3 30.0 2022-01-03 │
│ CUSTOMER-004 Emily Davis emily.davis@exam… 555-2468 321 Pine St Anytown CA 90210 USA ORDER-004 PRODUCT-002 4 40.0 2022-01-04 │
│ CUSTOMER-005 Robert Brown robert.brown@exa… 555-3698 654 Maple St Anytown CA 90210 USA ORDER-005 PRODUCT-001 5 50.0 2022-01-05 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Notes
- The join column must have the same data type in both tables
- Column names from both tables are preserved in the output
- If tables have overlapping column names (other than the join column), they will be prefixed to avoid conflicts
- For best performance, ensure your data is sorted by the join column
- Large joins may require significant memory