Aggregates and annotate

Two QuerySet terminals plus a closed Aggregate enum cover the SQL standard aggregate functions for dashboards and reports.

The full surface lives in umbral::orm: Aggregate, QuerySet::aggregate, QuerySet::annotate.

The `Aggregate` enum

Code

rust

use umbral::orm::Aggregate;
 
Aggregate::count();              // COUNT(*)
Aggregate::count_col("body");    // COUNT(body), skips NULL bodies
Aggregate::sum("view_count");    // SUM(view_count)
Aggregate::avg("price");         // AVG(price)
Aggregate::max("created_at");    // MAX(created_at)
Aggregate::min("created_at");    // MIN(created_at)

Each variant carries the source column it operates on (or * for count()). Aggregates are pure runtime constructions, validated against the model's FIELDS list before any SQL runs.

Single-row: `.aggregate(...)`

Runs the WHERE clause and computes one row of named aggregates:

Code

rust

let summary = Post::objects()
    .filter(post::PUBLISHED.eq(true))
    .aggregate(&[
        ("count", Aggregate::count()),
        ("views", Aggregate::sum("view_count")),
        ("avg_views", Aggregate::avg("view_count")),
    ])
    .await?;
 
// summary["count"]    → JSON integer (always i64)
// summary["views"]    → JSON integer or null (SUM of empty set is NULL)
// summary["avg_views"] → JSON float (always f64)

Return shape is serde_json::Value::Object keyed by the supplied names. ORDER BY / LIMIT / OFFSET are dropped on the aggregate path (they make no sense without GROUP BY). The WHERE accumulated by upstream .filter() / .exclude() calls is preserved.

Return-type rules:

Aggregate	Type
`Count(_)`	i64 (integer)
`Avg(_)`	f64 (float)
`Sum(col)` / `Max(col)` / `Min(col)`	matches the source column's `SqlType`

A SUM over zero rows is null in JSON; a COUNT(*) over zero rows is 0.

Grouped: `.annotate(group_cols, ...)`

Runs SELECT <group_cols>, AGG(...) AS name, ... GROUP BY <group_cols> with the accumulated WHERE clause:

Code

rust

let by_author = Post::objects()
    .filter(post::PUBLISHED.eq(true))
    .annotate(
        &["author_id"],
        &[("count", Aggregate::count()), ("views", Aggregate::sum("view_count"))],
    )
    .await?;
 
// [
//   { "author_id": 1, "count": 12, "views": 350 },
//   { "author_id": 2, "count": 3,  "views": 80 }
// ]

Return shape is Vec<serde_json::Value::Object>, one entry per group. Each row carries the group column values (typed per their declared SqlType) plus each named aggregate.

Group columns are validated against Model::FIELDS; aggregate source columns are validated the same way. An unknown column name fails loudly with sqlx::Error::Protocol before any SQL runs.

Composing with the rest of the QuerySet API

Both terminals respect the upstream chain (filter, exclude, even .on(&pool) for tests):

Code

rust

let summary = Post::objects()
    .filter(post::PUBLISHED.eq(true))
    .exclude(post::AUTHOR_ID.eq(spam_user))
    .on(&pool)
    .aggregate(&[("count", Aggregate::count())])
    .await?;

ORDER BY / LIMIT / OFFSET are intentionally dropped on the aggregate paths.

Related aggregates: `.annotate_related(...)` / `.annotate_count(...)`

Aggregate over a relation: count a parent's children, or average a column on the related rows, in the same query. The annotation is query-builder state: it renders as a correlated scalar subquery inside the one SELECT every terminal builds, so it stacks, composes with .filter / .order_by / .limit, and shows up in .to_sql() and .explain(). The work happens in one query.

The relation is the same ReverseSet name prefetch_related accepts (#[umbral(reverse_fk = "...")] on the model). Any Aggregate works, with non-count aggregates naming a column on the child model:

Code

rust

let rows = Plugin::objects()
    .filter(plugin::MODERATION.eq("approved"))
    .annotate_count("comment_set")                                       // COUNT(*) → "comment_set_count"
    .annotate_related("rating_avg", "review_set", Aggregate::avg("rating"))
    .fetch_annotated()
    .await?;                                  // Vec<(Plugin, Map<alias, value>)>
 
for (plugin, anns) in rows {
    let notes = anns["comment_set_count"].as_i64().unwrap_or(0);
    let rating = anns["rating_avg"].as_f64();   // None when no reviews, never fabricated
}

What the database runs (see it yourself with .to_sql() / .explain()):

Code

sql

SELECT "plugin".*,
  (SELECT COUNT(*) FROM "plugin_comment"
    WHERE "plugin_comment"."plugin" = "plugin"."id") AS "comment_set_count",
  (SELECT AVG("rating") FROM "review"
    WHERE "review"."plugin" = "plugin"."id") AS "rating_avg"
FROM "plugin"
WHERE "moderation" = 'approved' AND "deleted_at" IS NULL

Parents with zero children report 0 for counts and NULL for the other aggregates. An unknown relation name fails loudly (listing the valid names) on fetch_annotated and explain. Plain fetch() on an annotated queryset still returns typed rows and simply doesn't surface the extra columns. Use fetch_annotated to read them.

Soft-deleted children are excluded automatically

When the child model is #[umbral(soft_delete)], annotate_count folds AND <child>.deleted_at IS NULL into the correlated subquery for you. A comment trashed via the soft-delete delete() path drops out of the count, so you never count rows the rest of the app treats as gone. No opt-in: the child's soft_delete flag rides through to the annotation.

Filtered counts: `.annotate_count_where(...)`

Count only the children that match a predicate. annotate_count_where::<C>(alias, relation, pred) counts only the children matching pred, where C is the child model so the predicate is typed against the child's columns. It renders into the same correlated subquery alongside the FK correlation and the automatic soft-delete filter:

Code

rust

use plugin_directory::models::{self as pd, plugin, plugin_comment};
 
let rows = pd::Plugin::objects()
    .filter(plugin::MODERATION.eq("approved"))
    .annotate_count_where::<pd::PluginComment>(
        "comment_set_count",
        "comment_set",
        plugin_comment::MODERATION.eq("visible"),   // child-side predicate
    )
    .fetch_annotated()
    .await?;
// only VISIBLE, non-trashed comments contribute to "comment_set_count"

Counting an `M2M` relation

annotate_count also resolves a many-to-many relation (#[umbral(m2m = "...")]). Instead of correlating against a child table, it counts rows of the junction table (<parent>_<field>, columns parent_id / child_id):

Code

rust

let rows = Post::objects()
    .annotate_count("tags")   // COUNT(*) over the post_tags junction → "tags_count"
    .fetch_annotated()
    .await?;

Annotating an undeclared relation (auto-discovery)

You don't have to declare a ReverseSet field to count a relation. When the relation name matches no declared #[umbral(reverse_fk = "...")] field, annotate_* scans the model registry for any child whose foreign key points back at this parent's table and resolves it for you: the same one-query correlated subquery, no field on the parent.

Code

rust

// Blog declares NO `entry_set: ReverseSet<Entry>` field. Entry just has
// `#[umbral(...)] pub blog: ForeignKey<Blog>` plus an `amount: i64` column.
let rows = Blog::objects()
    .annotate_count("entry")                                   // COUNT(*) → "entry_count"
    .annotate_related("total",  "entry", Aggregate::sum("amount"))
    .annotate_related("mean",   "entry", Aggregate::avg("amount"))
    .annotate_related("lowest", "entry", Aggregate::min("amount"))
    .annotate_related("top",    "entry", Aggregate::max("amount"))
    .fetch_annotated()
    .await?;

All five aggregate kinds (count / sum / avg / min / max) work over an auto-discovered relation exactly as they do over a declared one, and an auto-discovered child that is #[umbral(soft_delete)] still has its trashed rows excluded.

The relation name is matched against each candidate child's conventional name forms: the child's table name, the child's struct name in snake_case and bare lowercase, and any of those with a _set suffix. So a child struct Entry with table ad_entry answers to "ad_entry", "entry", "entry_set", "ad_entry_set", and so on.

Precedence. A declared ReverseSet (or M2M) field always wins over auto-discovery. Declaring a field never changes existing behavior. Auto-discovery only runs as a fallback when no declared relation matches.

Ambiguity. If two candidates match (two child models that share a name form, or one child with two FK columns pointing at the same parent) the annotation is poisoned with a clear error rather than silently guessing:

Code

text

umbral::orm::annotate_related: ambiguous reverse relation `order` on `Store` - candidates: [ad_order.buyer_store, ad_order.seller_store]; declare a `#[umbral(reverse_fk = "<fk>")] ReverseSet<Child>` field to disambiguate

Declaring a #[umbral(reverse_fk = "buyer_store")] ReverseSet<Order> field picks the FK explicitly and resolves it. An unknown relation name fails loudly too, and the error lists the auto-discoverable children so you can see the available names.

Pitfalls

Empty-set SUM is NULL, not 0. Dashboards that display summary["total"] should treat the JSON null as zero explicitly. COUNT(*) is always an integer, never NULL: use it when you need an unconditional integer return.

AVG is always a float. Even when averaging an INTEGER column on SQLite, the aggregate returns f64. JSON serialisation produces a number with a decimal even when the value is whole (e.g. 2.0).

SUM / MAX / MIN inherit the source column's SqlType. Averaging timestamps would yield a non-meaningful float. MAX("created_at") returns the timestamp as an ISO string on SQLite. The runtime trusts the declared SqlType to dispatch the right try_get.

Deferred features. StdDev, Variance, and window-function aggregates are not in v1. Extend the Aggregate enum when a concrete consumer needs them.

Aggregates and annotate

Aggregates and annotate

The Aggregate enum

Single-row: .aggregate(...)

Grouped: .annotate(group_cols, ...)

Composing with the rest of the QuerySet API

Related aggregates: .annotate_related(...) / .annotate_count(...)

Soft-deleted children are excluded automatically

Filtered counts: .annotate_count_where(...)

Counting an M2M relation

Annotating an undeclared relation (auto-discovery)

Pitfalls

See also

The `Aggregate` enum

Single-row: `.aggregate(...)`

Grouped: `.annotate(group_cols, ...)`

Related aggregates: `.annotate_related(...)` / `.annotate_count(...)`

Filtered counts: `.annotate_count_where(...)`

Counting an `M2M` relation