Aggregations
The aggregate command allows you to group and summarize data using various statistical operations.
Basic Usage
rabbet aggregate <table> --by <columns> --with <aggregations>
Arguments
- table: Input CSV file, or - for stdin
- --by: Columns to group by (comma-separated)
- --with: Aggregation operations as column=operation pairs (comma-separated)
- --delimiter: Input file delimiter (default: ,)
Available Operations
- sum: Sum of values
- mean: Average of values
- median: Median value
- min: Minimum value
- max: Maximum value
- range: Difference between max and min
- count: Count of non-null values
- variance: Sample variance
- stddev: Sample standard deviation
- first: First value in group
- last: Last value in group
- describe: Summary statistics as a string
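As a rough illustration of what several of these operations compute, the sketch below applies the same arithmetic to a made-up list of values using Python's standard statistics module. It is not rabbet's implementation and assumes the operations follow their conventional definitions. In particular, "sample" variance and standard deviation divide by n - 1 rather than n.

```python
import statistics

# Made-up values standing in for one column within a single group.
values = [1.0, 2.0, 3.0, 4.0]

summary = {
    "sum": sum(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "count": len(values),
    # Sample statistics divide by n - 1 (Bessel's correction).
    "variance": statistics.variance(values),
    "stddev": statistics.stdev(values),
    "first": values[0],
    "last": values[-1],
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these four values the sample variance is 5/3, whereas the population variance (dividing by n) would be 1.25.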
For row-counting operations, use _=count, _=len, or _=nrow.
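The difference between counting non-null values in a column and counting rows can be sketched as follows. The data is made up, and the sketch assumes that an empty CSV field is what counts as null, which this page does not state.

```python
import csv
import io

# Made-up CSV; the second data row has an empty PetalLength field.
raw = "Species,PetalLength\nsetosa,1.4\nsetosa,\nvirginica,5.1\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Row count: every row counts, whatever its contents.
row_count = len(rows)

# Column count: only rows with a non-empty value count
# (assumption: an empty field is treated as null).
non_null_count = sum(1 for row in rows if row["PetalLength"] != "")

print(row_count, non_null_count)
```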
Examples
Simple Aggregation
Group by a single column and calculate one statistic:
$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean
╭────────────────────────────────────╮
│ Species           PetalLength_mean │
╞════════════════════════════════════╡
│ Iris-setosa       1.464            │
│ Iris-versicolor   4.26             │
│ Iris-virginica    5.552            │
╰────────────────────────────────────╯
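Conceptually, a group-by aggregation buckets rows by the value of the --by column and then reduces each bucket to a single number. A minimal Python sketch of that idea, using made-up rows rather than the iris file and not reflecting how rabbet is implemented:

```python
from collections import defaultdict
from statistics import mean

# Made-up (Species, PetalLength) rows for illustration only.
rows = [
    ("setosa", 1.0),
    ("setosa", 2.0),
    ("virginica", 5.0),
    ("virginica", 6.0),
    ("virginica", 7.0),
]

# Step 1: bucket the values by group key.
groups = defaultdict(list)
for species, petal_length in rows:
    groups[species].append(petal_length)

# Step 2: reduce each bucket to a single statistic.
result = {species: mean(values) for species, values in groups.items()}
print(result)
```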
Multiple Aggregations
Calculate multiple statistics for different columns:
$ rabbet aggregate data/iris/iris.csv --by Species --with PetalLength=mean,PetalWidth=max,SepalLength=min
╭───────────────────────────────────────────────────────────────────────╮
│ Species           PetalLength_mean   PetalWidth_max   SepalLength_min │
╞═══════════════════════════════════════════════════════════════════════╡
│ Iris-setosa       1.464              0.6              4.3             │
│ Iris-versicolor   4.26               1.8              4.9             │
│ Iris-virginica    5.552              2.5              4.9             │
╰───────────────────────────────────────────────────────────────────────╯
Multiple Group-By Columns
Group by multiple columns to create finer-grained aggregations:
$ rabbet aggregate data/iris/iris.csv --by Species,PetalWidth --with SepalLength=max
╭────────────────────────────────────────────────╮
│ Species           PetalWidth   SepalLength_max │
╞════════════════════════════════════════════════╡
│ Iris-setosa       0.2          5.8             │
│ Iris-setosa       0.4          5.7             │
│ Iris-setosa       0.3          5.7             │
│ Iris-setosa       0.1          5.2             │
│ Iris-setosa       0.5          5.1             │
│ Iris-setosa       0.6          5.0             │
│ Iris-versicolor   1.4          7.0             │
│ Iris-versicolor   1.5          6.9             │
│ Iris-versicolor   1.3          6.6             │
│ Iris-versicolor   1.6          6.3             │
│ Iris-versicolor   1.0          6.0             │
│ Iris-versicolor   1.1          5.6             │
│ Iris-versicolor   1.8          5.9             │
│ …                 …            …               │
│ Iris-virginica    2.5          7.2             │
│ Iris-virginica    1.9          7.4             │
│ Iris-virginica    2.1          7.6             │
│ Iris-virginica    1.8          7.3             │
│ Iris-virginica    2.2          7.7             │
│ Iris-virginica    1.7          4.9             │
│ Iris-virginica    2.0          7.9             │
│ Iris-virginica    2.4          6.7             │
│ Iris-virginica    2.3          7.7             │
│ Iris-virginica    1.5          6.3             │
│ Iris-virginica    1.6          7.2             │
│ Iris-virginica    1.4          6.1             │
╰────────────────────────────────────────────────╯
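With several group-by columns, the group key is effectively the tuple of their values, so each distinct combination gets its own output row. A small sketch with made-up data (an illustration of the idea, not rabbet's code):

```python
from collections import defaultdict

# Made-up (Species, PetalWidth, SepalLength) rows.
rows = [
    ("setosa", 0.2, 5.1),
    ("setosa", 0.2, 5.8),
    ("setosa", 0.4, 5.7),
    ("virginica", 1.8, 7.3),
]

# The key is the tuple of all group-by values, so ("setosa", 0.2)
# and ("setosa", 0.4) are separate groups.
groups = defaultdict(list)
for species, petal_width, sepal_length in rows:
    groups[(species, petal_width)].append(sepal_length)

sepal_length_max = {key: max(values) for key, values in groups.items()}
for key, value in sepal_length_max.items():
    print(key, value)
```

Four input rows collapse into three groups here, because two rows share the same (Species, PetalWidth) pair.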
Notes
- Column names in the output are automatically suffixed with the operation name (e.g., PetalLength_mean).
- When grouping by multiple columns, each unique combination of values forms a separate group.
- Use --by without any columns to aggregate the entire dataset into a single row.
- Multiple operations can be applied to the same column by specifying it multiple times.
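The naming rule for output columns can be sketched as a small helper that turns a --with specification into output column names. The function name output_columns is hypothetical, and the sketch covers only ordinary column=operation pairs, not the _ row-count form, whose output name is not described here.

```python
def output_columns(with_spec: str) -> list[str]:
    """Turn 'col=op,col=op' into ['col_op', 'col_op'] (illustrative only)."""
    names = []
    for pair in with_spec.split(","):
        column, operation = pair.split("=", 1)
        names.append(f"{column.strip()}_{operation.strip()}")
    return names


# The same column may appear more than once with different operations.
print(output_columns("PetalLength=mean,PetalLength=max,SepalLength=min"))
```

Because the operation name is part of each output name, repeating a column with different operations yields distinct columns under this rule.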