Aggregates and annotate
Count, Sum, Avg, Max, Min on a QuerySet, with single-row aggregates and GROUP BY annotate.
Aggregates and annotate
Two QuerySet terminals plus a closed Aggregate enum cover the SQL standard aggregate functions for dashboards and reports.
The full surface lives in umbral::orm: Aggregate, QuerySet::aggregate, QuerySet::annotate.
The Aggregate enum
use umbral::orm::Aggregate; Aggregate::count(); // COUNT(*)Aggregate::count_col("body"); // COUNT(body), skips NULL bodiesAggregate::sum("view_count"); // SUM(view_count)Aggregate::avg("price"); // AVG(price)Aggregate::max("created_at"); // MAX(created_at)Aggregate::min("created_at"); // MIN(created_at)Each variant carries the source column it operates on (or * for count()). Aggregates are pure runtime constructions, validated against the model's FIELDS list before any SQL runs.
Single-row: .aggregate(...)
Runs the WHERE clause and computes one row of named aggregates:
let summary = Post::objects() .filter(post::PUBLISHED.eq(true)) .aggregate(&[ ("count", Aggregate::count()), ("views", Aggregate::sum("view_count")), ("avg_views", Aggregate::avg("view_count")), ]) .await?; // summary["count"] → JSON integer (always i64)// summary["views"] → JSON integer or null (SUM of empty set is NULL)// summary["avg_views"] → JSON float (always f64)Return shape is serde_json::Value::Object keyed by the supplied names. ORDER BY / LIMIT / OFFSET are dropped on the aggregate path (they make no sense without GROUP BY). The WHERE accumulated by upstream .filter() / .exclude() calls is preserved.
Return-type rules:
| Aggregate | Type |
|---|---|
Count(_) | i64 (integer) |
Avg(_) | f64 (float) |
Sum(col) / Max(col) / Min(col) | matches the source column's SqlType |
A SUM over zero rows is null in JSON; a COUNT(*) over zero rows is 0.
Grouped: .annotate(group_cols, ...)
Runs SELECT <group_cols>, AGG(...) AS name, ... GROUP BY <group_cols> with the accumulated WHERE clause:
let by_author = Post::objects() .filter(post::PUBLISHED.eq(true)) .annotate( &["author_id"], &[("count", Aggregate::count()), ("views", Aggregate::sum("view_count"))], ) .await?; // [// { "author_id": 1, "count": 12, "views": 350 },// { "author_id": 2, "count": 3, "views": 80 }// ]Return shape is Vec<serde_json::Value::Object>, one entry per group. Each row carries the group column values (typed per their declared SqlType) plus each named aggregate.
Group columns are validated against Model::FIELDS; aggregate source columns are validated the same way. An unknown column name fails loudly with sqlx::Error::Protocol before any SQL runs.
Composing with the rest of the QuerySet API
Both terminals respect the upstream chain (filter, exclude, even .on(&pool) for tests):
let summary = Post::objects() .filter(post::PUBLISHED.eq(true)) .exclude(post::AUTHOR_ID.eq(spam_user)) .on(&pool) .aggregate(&[("count", Aggregate::count())]) .await?;ORDER BY / LIMIT / OFFSET are intentionally dropped on the aggregate paths.
Related aggregates: .annotate_related(...) / .annotate_count(...)
Aggregate over a relation: count a parent's children, or average a column on the related rows, in the same query. The annotation is query-builder state: it renders as a correlated scalar subquery inside the one SELECT every terminal builds, so it stacks, composes with .filter / .order_by / .limit, and shows up in .to_sql() and .explain(). The work happens in one query.
The relation is the same ReverseSet name prefetch_related accepts (#[umbral(reverse_fk = "...")] on the model). Any Aggregate works, with non-count aggregates naming a column on the child model:
let rows = Plugin::objects() .filter(plugin::MODERATION.eq("approved")) .annotate_count("comment_set") // COUNT(*) → "comment_set_count" .annotate_related("rating_avg", "review_set", Aggregate::avg("rating")) .fetch_annotated() .await?; // Vec<(Plugin, Map<alias, value>)> for (plugin, anns) in rows { let notes = anns["comment_set_count"].as_i64().unwrap_or(0); let rating = anns["rating_avg"].as_f64(); // None when no reviews, never fabricated}What the database runs (see it yourself with .to_sql() / .explain()):
SELECT "plugin".*, (SELECT COUNT(*) FROM "plugin_comment" WHERE "plugin_comment"."plugin" = "plugin"."id") AS "comment_set_count", (SELECT AVG("rating") FROM "review" WHERE "review"."plugin" = "plugin"."id") AS "rating_avg"FROM "plugin"WHERE "moderation" = 'approved' AND "deleted_at" IS NULLParents with zero children report 0 for counts and NULL for the other aggregates. An unknown relation name fails loudly (listing the valid names) on fetch_annotated and explain. Plain fetch() on an annotated queryset still returns typed rows and simply doesn't surface the extra columns. Use fetch_annotated to read them.
Soft-deleted children are excluded automatically
When the child model is #[umbral(soft_delete)], annotate_count folds AND <child>.deleted_at IS NULL into the correlated subquery for you. A comment trashed via the soft-delete delete() path drops out of the count, so you never count rows the rest of the app treats as gone. No opt-in: the child's soft_delete flag rides through to the annotation.
Filtered counts: .annotate_count_where(...)
Count only the children that match a predicate. annotate_count_where::<C>(alias, relation, pred) counts only the children matching pred, where C is the child model so the predicate is typed against the child's columns. It renders into the same correlated subquery alongside the FK correlation and the automatic soft-delete filter:
use plugin_directory::models::{self as pd, plugin, plugin_comment}; let rows = pd::Plugin::objects() .filter(plugin::MODERATION.eq("approved")) .annotate_count_where::<pd::PluginComment>( "comment_set_count", "comment_set", plugin_comment::MODERATION.eq("visible"), // child-side predicate ) .fetch_annotated() .await?;// only VISIBLE, non-trashed comments contribute to "comment_set_count"Counting an M2M relation
annotate_count also resolves a many-to-many relation (#[umbral(m2m = "...")]). Instead of correlating against a child table, it counts rows of the junction table (<parent>_<field>, columns parent_id / child_id):
let rows = Post::objects() .annotate_count("tags") // COUNT(*) over the post_tags junction → "tags_count" .fetch_annotated() .await?;Annotating an undeclared relation (auto-discovery)
You don't have to declare a ReverseSet field to count a relation. When the relation name matches no declared #[umbral(reverse_fk = "...")] field, annotate_* scans the model registry for any child whose foreign key points back at this parent's table and resolves it for you: the same one-query correlated subquery, no field on the parent.
// Blog declares NO `entry_set: ReverseSet<Entry>` field. Entry just has// `#[umbral(...)] pub blog: ForeignKey<Blog>` plus an `amount: i64` column.let rows = Blog::objects() .annotate_count("entry") // COUNT(*) → "entry_count" .annotate_related("total", "entry", Aggregate::sum("amount")) .annotate_related("mean", "entry", Aggregate::avg("amount")) .annotate_related("lowest", "entry", Aggregate::min("amount")) .annotate_related("top", "entry", Aggregate::max("amount")) .fetch_annotated() .await?;All five aggregate kinds (count / sum / avg / min / max) work over an auto-discovered relation exactly as they do over a declared one, and an auto-discovered child that is #[umbral(soft_delete)] still has its trashed rows excluded.
The relation name is matched against each candidate child's conventional name forms: the child's table name, the child's struct name in snake_case and bare lowercase, and any of those with a _set suffix. So a child struct Entry with table ad_entry answers to "ad_entry", "entry", "entry_set", "ad_entry_set", and so on.
Precedence. A declared ReverseSet (or M2M) field always wins over auto-discovery. Declaring a field never changes existing behavior. Auto-discovery only runs as a fallback when no declared relation matches.
Ambiguity. If two candidates match (two child models that share a name form, or one child with two FK columns pointing at the same parent) the annotation is poisoned with a clear error rather than silently guessing:
umbral::orm::annotate_related: ambiguous reverse relation `order` on `Store` - candidates: [ad_order.buyer_store, ad_order.seller_store]; declare a `#[umbral(reverse_fk = "<fk>")] ReverseSet<Child>` field to disambiguateDeclaring a #[umbral(reverse_fk = "buyer_store")] ReverseSet<Order> field picks the FK explicitly and resolves it. An unknown relation name fails loudly too, and the error lists the auto-discoverable children so you can see the available names.
Pitfalls
Empty-set SUM is NULL, not 0. Dashboards that display summary["total"] should treat the JSON null as zero explicitly. COUNT(*) is always an integer, never NULL: use it when you need an unconditional integer return.
AVG is always a float. Even when averaging an INTEGER column on SQLite, the aggregate returns f64. JSON serialisation produces a number with a decimal even when the value is whole (e.g. 2.0).
SUM / MAX / MIN inherit the source column's SqlType. Averaging timestamps would yield a non-meaningful float. MAX("created_at") returns the timestamp as an ISO string on SQLite. The runtime trusts the declared SqlType to dispatch the right try_get.
Deferred features. StdDev, Variance, and window-function aggregates are not in v1. Extend the Aggregate enum when a concrete consumer needs them.
See also
- Querying: F-expressions, Q-objects,
exclude, projections, subqueries, JSON ops. - Relationships:
prefetch_relatedfor batched M2M loads alongside aggregate dashboards. - Transactions:
.atomic()for write paths. Aggregates are read-only and always single-statement.