This version is in beta. Some features may change before release.

Aggregates and annotate

Count, Sum, Avg, Max, Min on a QuerySet, with single-row aggregates and GROUP BY annotate.

Aggregates and annotate

Two QuerySet terminals plus a closed Aggregate enum cover the SQL standard aggregate functions for dashboards and reports.

The full surface lives in umbral::orm: Aggregate, QuerySet::aggregate, QuerySet::annotate.

The Aggregate enum

Code
rust
use umbral::orm::Aggregate;
 
Aggregate::count(); // COUNT(*)
Aggregate::count_col("body"); // COUNT(body), skips NULL bodies
Aggregate::sum("view_count"); // SUM(view_count)
Aggregate::avg("price"); // AVG(price)
Aggregate::max("created_at"); // MAX(created_at)
Aggregate::min("created_at"); // MIN(created_at)

Each variant carries the source column it operates on (or * for count()). Aggregates are pure runtime constructions, validated against the model's FIELDS list before any SQL runs.

Single-row: .aggregate(...)

Runs the WHERE clause and computes one row of named aggregates:

Code
rust
let summary = Post::objects()
.filter(post::PUBLISHED.eq(true))
.aggregate(&[
("count", Aggregate::count()),
("views", Aggregate::sum("view_count")),
("avg_views", Aggregate::avg("view_count")),
])
.await?;
 
// summary["count"] → JSON integer (always i64)
// summary["views"] → JSON integer or null (SUM of empty set is NULL)
// summary["avg_views"] → JSON float (always f64)

Return shape is serde_json::Value::Object keyed by the supplied names. ORDER BY / LIMIT / OFFSET are dropped on the aggregate path (they make no sense without GROUP BY). The WHERE accumulated by upstream .filter() / .exclude() calls is preserved.

Return-type rules:

AggregateType
Count(_)i64 (integer)
Avg(_)f64 (float)
Sum(col) / Max(col) / Min(col)matches the source column's SqlType

A SUM over zero rows is null in JSON; a COUNT(*) over zero rows is 0.

Grouped: .annotate(group_cols, ...)

Runs SELECT <group_cols>, AGG(...) AS name, ... GROUP BY <group_cols> with the accumulated WHERE clause:

Code
rust
let by_author = Post::objects()
.filter(post::PUBLISHED.eq(true))
.annotate(
&["author_id"],
&[("count", Aggregate::count()), ("views", Aggregate::sum("view_count"))],
)
.await?;
 
// [
// { "author_id": 1, "count": 12, "views": 350 },
// { "author_id": 2, "count": 3, "views": 80 }
// ]

Return shape is Vec<serde_json::Value::Object>, one entry per group. Each row carries the group column values (typed per their declared SqlType) plus each named aggregate.

Group columns are validated against Model::FIELDS; aggregate source columns are validated the same way. An unknown column name fails loudly with sqlx::Error::Protocol before any SQL runs.

Composing with the rest of the QuerySet API

Both terminals respect the upstream chain (filter, exclude, even .on(&pool) for tests):

Code
rust
let summary = Post::objects()
.filter(post::PUBLISHED.eq(true))
.exclude(post::AUTHOR_ID.eq(spam_user))
.on(&pool)
.aggregate(&[("count", Aggregate::count())])
.await?;

ORDER BY / LIMIT / OFFSET are intentionally dropped on the aggregate paths.

Aggregate over a relation: count a parent's children, or average a column on the related rows, in the same query. The annotation is query-builder state: it renders as a correlated scalar subquery inside the one SELECT every terminal builds, so it stacks, composes with .filter / .order_by / .limit, and shows up in .to_sql() and .explain(). The work happens in one query.

The relation is the same ReverseSet name prefetch_related accepts (#[umbral(reverse_fk = "...")] on the model). Any Aggregate works, with non-count aggregates naming a column on the child model:

Code
rust
let rows = Plugin::objects()
.filter(plugin::MODERATION.eq("approved"))
.annotate_count("comment_set") // COUNT(*) → "comment_set_count"
.annotate_related("rating_avg", "review_set", Aggregate::avg("rating"))
.fetch_annotated()
.await?; // Vec<(Plugin, Map<alias, value>)>
 
for (plugin, anns) in rows {
let notes = anns["comment_set_count"].as_i64().unwrap_or(0);
let rating = anns["rating_avg"].as_f64(); // None when no reviews, never fabricated
}

What the database runs (see it yourself with .to_sql() / .explain()):

Code
sql
SELECT "plugin".*,
(SELECT COUNT(*) FROM "plugin_comment"
WHERE "plugin_comment"."plugin" = "plugin"."id") AS "comment_set_count",
(SELECT AVG("rating") FROM "review"
WHERE "review"."plugin" = "plugin"."id") AS "rating_avg"
FROM "plugin"
WHERE "moderation" = 'approved' AND "deleted_at" IS NULL

Parents with zero children report 0 for counts and NULL for the other aggregates. An unknown relation name fails loudly (listing the valid names) on fetch_annotated and explain. Plain fetch() on an annotated queryset still returns typed rows and simply doesn't surface the extra columns. Use fetch_annotated to read them.

Soft-deleted children are excluded automatically

When the child model is #[umbral(soft_delete)], annotate_count folds AND <child>.deleted_at IS NULL into the correlated subquery for you. A comment trashed via the soft-delete delete() path drops out of the count, so you never count rows the rest of the app treats as gone. No opt-in: the child's soft_delete flag rides through to the annotation.

Filtered counts: .annotate_count_where(...)

Count only the children that match a predicate. annotate_count_where::<C>(alias, relation, pred) counts only the children matching pred, where C is the child model so the predicate is typed against the child's columns. It renders into the same correlated subquery alongside the FK correlation and the automatic soft-delete filter:

Code
rust
use plugin_directory::models::{self as pd, plugin, plugin_comment};
 
let rows = pd::Plugin::objects()
.filter(plugin::MODERATION.eq("approved"))
.annotate_count_where::<pd::PluginComment>(
"comment_set_count",
"comment_set",
plugin_comment::MODERATION.eq("visible"), // child-side predicate
)
.fetch_annotated()
.await?;
// only VISIBLE, non-trashed comments contribute to "comment_set_count"

Counting an M2M relation

annotate_count also resolves a many-to-many relation (#[umbral(m2m = "...")]). Instead of correlating against a child table, it counts rows of the junction table (<parent>_<field>, columns parent_id / child_id):

Code
rust
let rows = Post::objects()
.annotate_count("tags") // COUNT(*) over the post_tags junction → "tags_count"
.fetch_annotated()
.await?;

Annotating an undeclared relation (auto-discovery)

You don't have to declare a ReverseSet field to count a relation. When the relation name matches no declared #[umbral(reverse_fk = "...")] field, annotate_* scans the model registry for any child whose foreign key points back at this parent's table and resolves it for you: the same one-query correlated subquery, no field on the parent.

Code
rust
// Blog declares NO `entry_set: ReverseSet<Entry>` field. Entry just has
// `#[umbral(...)] pub blog: ForeignKey<Blog>` plus an `amount: i64` column.
let rows = Blog::objects()
.annotate_count("entry") // COUNT(*) → "entry_count"
.annotate_related("total", "entry", Aggregate::sum("amount"))
.annotate_related("mean", "entry", Aggregate::avg("amount"))
.annotate_related("lowest", "entry", Aggregate::min("amount"))
.annotate_related("top", "entry", Aggregate::max("amount"))
.fetch_annotated()
.await?;

All five aggregate kinds (count / sum / avg / min / max) work over an auto-discovered relation exactly as they do over a declared one, and an auto-discovered child that is #[umbral(soft_delete)] still has its trashed rows excluded.

The relation name is matched against each candidate child's conventional name forms: the child's table name, the child's struct name in snake_case and bare lowercase, and any of those with a _set suffix. So a child struct Entry with table ad_entry answers to "ad_entry", "entry", "entry_set", "ad_entry_set", and so on.

Precedence. A declared ReverseSet (or M2M) field always wins over auto-discovery. Declaring a field never changes existing behavior. Auto-discovery only runs as a fallback when no declared relation matches.

Ambiguity. If two candidates match (two child models that share a name form, or one child with two FK columns pointing at the same parent) the annotation is poisoned with a clear error rather than silently guessing:

Code
text
umbral::orm::annotate_related: ambiguous reverse relation `order` on `Store` - candidates: [ad_order.buyer_store, ad_order.seller_store]; declare a `#[umbral(reverse_fk = "<fk>")] ReverseSet<Child>` field to disambiguate

Declaring a #[umbral(reverse_fk = "buyer_store")] ReverseSet<Order> field picks the FK explicitly and resolves it. An unknown relation name fails loudly too, and the error lists the auto-discoverable children so you can see the available names.

Pitfalls

Empty-set SUM is NULL, not 0. Dashboards that display summary["total"] should treat the JSON null as zero explicitly. COUNT(*) is always an integer, never NULL: use it when you need an unconditional integer return.

AVG is always a float. Even when averaging an INTEGER column on SQLite, the aggregate returns f64. JSON serialisation produces a number with a decimal even when the value is whole (e.g. 2.0).

SUM / MAX / MIN inherit the source column's SqlType. Averaging timestamps would yield a non-meaningful float. MAX("created_at") returns the timestamp as an ISO string on SQLite. The runtime trusts the declared SqlType to dispatch the right try_get.

Deferred features. StdDev, Variance, and window-function aggregates are not in v1. Extend the Aggregate enum when a concrete consumer needs them.

See also

  • Querying: F-expressions, Q-objects, exclude, projections, subqueries, JSON ops.
  • Relationships: prefetch_related for batched M2M loads alongside aggregate dashboards.
  • Transactions: .atomic() for write paths. Aggregates are read-only and always single-statement.
ormaggregatescountsumgroup-byanalytics