Aggregations

The aggregate command groups rows by one or more columns and summarizes each group with statistical operations.

Basic Usage

rabbet aggregate <table> --by <columns> --with <aggregations>

Arguments

table: Input CSV file or - for stdin
--by: Columns to group by (comma-separated)
--with: Aggregation operations as column=operation pairs (comma-separated)
--delimiter: Input file delimiter (default: ,)
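
As an illustration of the --with format described above, the following Python sketch splits a specification string into (column, operation) pairs. The function name parse_with and its error handling are assumptions made for this example; they are not taken from rabbet's source.

```python
def parse_with(spec: str) -> list[tuple[str, str]]:
    """Split 'col=op,col=op' into (column, operation) pairs.

    Hypothetical helper: mirrors the documented --with syntax only.
    """
    pairs = []
    for item in spec.split(","):
        column, sep, operation = item.strip().partition("=")
        # Reject items such as 'PetalLength' or '=mean' that lack one side.
        if not sep or not column or not operation:
            raise ValueError(f"expected column=operation, got {item!r}")
        pairs.append((column, operation))
    return pairs


pairs = parse_with("PetalLength=mean,PetalWidth=max,SepalLength=min")
print(pairs)
```

Keeping the pairs in a list rather than a dictionary preserves their order and allows the same column to appear more than once.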

Available Operations

sum: Sum of values
mean: Average of values
median: Median value
min: Minimum value
max: Maximum value
range: Difference between max and min
count: Count of non-null values
variance: Sample variance
stddev: Sample standard deviation
first: First value in group
last: Last value in group
describe: Summary statistics as a string
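
To make the definitions above concrete, here is a rough Python equivalent of most operations using the standard statistics module. It is a sketch of the documented semantics, not rabbet's implementation: the OPERATIONS table is a hypothetical name, null handling is assumed to have happened already, and describe is left out because its output format is not specified here.

```python
import statistics

# Hypothetical lookup table; each entry takes a non-empty list of numbers.
OPERATIONS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "range": lambda xs: max(xs) - min(xs),
    "count": len,                     # assumes nulls were filtered out beforehand
    "variance": statistics.variance,  # sample variance (n - 1 denominator)
    "stddev": statistics.stdev,       # sample standard deviation
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
}

values = [2.0, 4.0, 4.0, 6.0]
for name, fn in OPERATIONS.items():
    print(name, fn(values))
```

Note that the sample variance divides by n - 1, so for the four values above it is 8 / 3 rather than 2.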

To count the rows in each group rather than the non-null values of a particular column, use _=count, _=len, or _=nrow.
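
The difference between counting non-null values and counting rows can be sketched in Python as follows. The sample rows are made up, and treating an empty string as null is an assumption of this example; how rabbet detects nulls is not stated here.

```python
# Illustrative rows for a single group; the second row has a missing value.
rows = [
    {"Species": "A", "PetalLength": "1.4"},
    {"Species": "A", "PetalLength": ""},
    {"Species": "A", "PetalLength": "1.5"},
]


def count_non_null(rows, column):
    """Count rows whose value in `column` is present (assumed: non-empty)."""
    return sum(1 for row in rows if row[column] != "")


def count_rows(rows):
    """Count every row in the group, regardless of missing values."""
    return len(rows)


print(count_non_null(rows, "PetalLength"))  # 2
print(count_rows(rows))                     # 3
```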

Examples

Simple Aggregation

Group by a single column and calculate one statistic:

$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean
╭────────────────────────────────────╮
│ Species           PetalLength_mean │
╞════════════════════════════════════╡
│ Iris-setosa       1.464            │
│ Iris-versicolor   4.26             │
│ Iris-virginica    5.552            │
╰────────────────────────────────────╯
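
For readers who want to see what a group-by mean computes, the following Python sketch performs the same kind of aggregation on a small made-up CSV (not the iris data). The function aggregate_mean is a hypothetical name, and the sketch only mirrors the documented behaviour, including the _mean suffix on the output column.

```python
import csv
import io
import statistics
from collections import defaultdict

# Illustrative data only; values are invented for this example.
CSV_TEXT = """Species,PetalLength
A,1.0
A,2.0
B,4.0
B,5.0
B,6.0
"""


def aggregate_mean(text, by, column):
    """Group rows by `by` and return the mean of `column` per group."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[by]].append(float(row[column]))
    # Groups appear in the order their key was first seen in the input.
    return [
        {by: key, f"{column}_mean": statistics.mean(vals)}
        for key, vals in groups.items()
    ]


result = aggregate_mean(CSV_TEXT, "Species", "PetalLength")
print(result)
```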

Multiple Aggregations

Calculate multiple statistics for different columns:

$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean,PetalWidth=max,SepalLength=min
╭───────────────────────────────────────────────────────────────────────╮
│ Species           PetalLength_mean   PetalWidth_max   SepalLength_min │
╞═══════════════════════════════════════════════════════════════════════╡
│ Iris-setosa       1.464              0.6              4.3             │
│ Iris-versicolor   4.26               1.8              4.9             │
│ Iris-virginica    5.552              2.5              4.9             │
╰───────────────────────────────────────────────────────────────────────╯

Multiple Group-By Columns

Group by multiple columns to create finer-grained aggregations:

$ rabbet aggregate data/iris/iris.csv --by Species,PetalWidth --with SepalLength=max
╭────────────────────────────────────────────────╮
│ Species           PetalWidth   SepalLength_max │
╞════════════════════════════════════════════════╡
│ Iris-setosa       0.2          5.8             │
│ Iris-setosa       0.4          5.7             │
│ Iris-setosa       0.3          5.7             │
│ Iris-setosa       0.1          5.2             │
│ Iris-setosa       0.5          5.1             │
│ Iris-setosa       0.6          5.0             │
│ Iris-versicolor   1.4          7.0             │
│ Iris-versicolor   1.5          6.9             │
│ Iris-versicolor   1.3          6.6             │
│ Iris-versicolor   1.6          6.3             │
│ Iris-versicolor   1.0          6.0             │
│ Iris-versicolor   1.1          5.6             │
│ Iris-versicolor   1.8          5.9             │
│ …                 …            …               │
│ Iris-virginica    2.5          7.2             │
│ Iris-virginica    1.9          7.4             │
│ Iris-virginica    2.1          7.6             │
│ Iris-virginica    1.8          7.3             │
│ Iris-virginica    2.2          7.7             │
│ Iris-virginica    1.7          4.9             │
│ Iris-virginica    2.0          7.9             │
│ Iris-virginica    2.4          6.7             │
│ Iris-virginica    2.3          7.7             │
│ Iris-virginica    1.5          6.3             │
│ Iris-virginica    1.6          7.2             │
│ Iris-virginica    1.4          6.1             │
╰────────────────────────────────────────────────╯
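
Grouping by several columns amounts to using the combination of their values as the group key. A minimal Python sketch, using invented rows rather than the iris data, keys each group on a (Species, PetalWidth) tuple and takes the maximum SepalLength per key:

```python
from collections import defaultdict

# Illustrative rows: (Species, PetalWidth, SepalLength).
rows = [
    ("A", 0.2, 5.1),
    ("A", 0.2, 5.8),
    ("A", 0.4, 5.7),
    ("B", 0.2, 6.0),
]

groups = defaultdict(list)
for species, petal_width, sepal_length in rows:
    # Each unique (species, petal_width) pair forms its own group.
    groups[(species, petal_width)].append(sepal_length)

result = {key: max(vals) for key, vals in groups.items()}
print(result)
```

Four input rows collapse into three groups because two rows share the key ("A", 0.2).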

Notes

Column names in the output are automatically suffixed with the operation name (e.g., PetalLength_mean).
When grouping by multiple columns, each unique combination of values forms a separate group.
Use --by without any columns to aggregate the entire dataset into a single row.
To apply multiple operations to the same column, list the column once per operation (e.g., PetalLength=mean,PetalLength=max).
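
The naming rule and the single-row case can be sketched together in Python. The summarize function and the OPS table are hypothetical names for this example; the sketch treats the whole dataset as one group and applies two operations to the same column, producing one suffixed output column per (column, operation) pair.

```python
import statistics

# Hypothetical subset of operations, enough for the example.
OPS = {"mean": statistics.mean, "max": max}


def summarize(values_by_column, pairs):
    """Return one output row with a '<column>_<operation>' key per pair."""
    return {
        f"{column}_{op}": OPS[op](values_by_column[column])
        for column, op in pairs
    }


# Illustrative values; the entire dataset is treated as a single group.
row = summarize(
    {"PetalLength": [1.0, 2.0, 6.0]},
    [("PetalLength", "mean"), ("PetalLength", "max")],
)
print(row)
```

Because the output names include the operation, repeating a column with different operations does not produce clashing column names.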
