Reference

This page lists every flag, marker, fixture, CLI command, and public function. The CLI help, the Python API, and the flags table below are generated from the source; the rest of the pytest surface (marker, fixture, blob schema) is curated here. For the narrative versions see Quickstart, Choosing a metric, Grouping by dims, and Compare & gate CI.

pytest command-line flags

The plugin adds these to any pytest run (alongside pytest-benchmark's own flags). This table is generated from the plugin's own --help text, so it can't drift from the code:

Flag	Default	What
`--benchmark-memory`	off	Record peak memory (a memray pass) for every benchmark() call, not just the benchmark_memory fixture — no test changes. Off by default; the fixture is always measured, with or without this flag.
`--benchmark-memory-repeats=N`	—	Force a fixed number of memray passes per benchmark, suite-wide; the reported peak is the min across them. Overridden per-test by @pytest.mark.benchmem(repeats=N). Default: adaptive — run passes until the min floor settles (≥2, cap 10). Set this for a fixed, reproducible count (e.g. CI gating against a saved baseline).
`--benchmark-memory-warmup=N`	`1`	Untracked dry-runs of the action before measuring, suite-wide, to shed one-time costs (lazy imports, first-touch caches) so the measured passes aren't inflated by cold start. Overridden per-test by @pytest.mark.benchmem(warmup=N). Default: 1; set 0 to disable.
`--benchmark-memory-max-time=SECONDS`	—	Wall-clock budget for the adaptive memory passes (the analogue of --benchmark-max-time): caps how long adaptive sampling spends per benchmark. Ignored when --benchmark-memory-repeats forces a fixed count. Default: no time bound — the pass cap alone bounds it.
`--benchmark-memory-compare=REF`	off	Compare this run's peak memory against a prior saved run (a pytest-benchmark storage ref like 0001, or the latest if no value is given); folds base and delta-peak columns into the table.
`--benchmark-memory-compare-fail=FIELD:THRESHOLD`	—	Fail the session on a memory regression, e.g. peak:10%, peak:5MiB, allocations:5% (repeatable). Fields: peak, allocated, allocations, rss (rss needs isolated runs). Implies --benchmark-memory-compare.
`--benchmark-memory-profile=DIR`	—	Save the memray profile (.bin) into DIR — render later with `memray flamegraph` (or tree/summary). Scope follows the gate: WITH --benchmark-memory-compare-fail only the regressing ids, otherwise EVERY measured benchmark. Off by default (disk cost).
`--benchmark-memory-profile-native`	off	Capture native (C/C++/Rust) stacks in the kept profile, so the flamegraph attributes memory inside extension code (polars/numpy/solver bindings) instead of one opaque `??? at ???` bucket. Only affects --benchmark-memory-profile runs; opt-in (slower, bigger .bin). Per-test override: @pytest.mark.benchmem(profile_native=True). Off by default.
`--benchmark-memory-table`	`combined`	Layout for the memory metrics: combined (default) folds them into pytest-benchmark's timing table; split prints a separate memory table.
`--benchmark-memory-columns=peak,allocated,allocations,rss`	`peak`	Which memory metrics the table shows, comma-separated and in order: peak, allocated, allocations, rss (rss only shows for isolated runs). Default: peak only.
`--benchmark-memory-stats=min,mean,max`	`min,mean,max`	With repeats > 1, which stats each shown metric expands into: min, mean, max, median, stddev. A single pass stays one column. Default: min,mean,max.

Timing regressions still use pytest-benchmark's own --benchmark-compare / --benchmark-compare-fail; the --benchmark-memory-compare* flags are the memory mirror. Their baseline comes from pytest-benchmark's storage (.benchmarks/) — save one first with --benchmark-save=NAME or --benchmark-autosave, or the gate finds nothing and passes. See Gate CI on a regression.
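A FIELD:THRESHOLD spec can be read as a relative allowance on growth over the baseline when it ends in %, and as an absolute allowance otherwise (bytes, or a bare count for allocations). The sketch below shows one such reading. parse_gate and is_regression are hypothetical helpers, not the plugin's code. Three details are assumptions: size units are 1024-based, the gate trips only when growth strictly exceeds the allowance, and a zero baseline never trips a percentage gate.

```python
import re

# Assumed 1024-based units, matching the size strings used on this page.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_gate(spec: str) -> tuple[str, str, float]:
    """Split 'peak:10%' or 'peak:5MiB' into (field, kind, amount).

    kind is 'pct' for a relative allowance and 'abs' for an absolute one,
    in bytes for size fields or a bare count for 'allocations'.
    """
    field, _, threshold = spec.partition(":")
    if threshold.endswith("%"):
        return field, "pct", float(threshold[:-1])
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(B|KiB|MiB|GiB)?", threshold)
    if match is None:
        raise ValueError(f"unrecognised threshold: {threshold!r}")
    number, unit = match.groups()
    return field, "abs", float(number) * UNITS[unit or "B"]


def is_regression(base: float, current: float, spec: str) -> bool:
    """True when `current` grew past the spec's allowance over `base`."""
    _, kind, amount = parse_gate(spec)
    growth = current - base
    if kind == "pct":
        # Hypothetical choice: a zero baseline has no meaningful percentage.
        return base > 0 and growth / base * 100 > amount
    return growth > amount


MiB = 1024**2
print(is_regression(100 * MiB, 112 * MiB, "peak:10%"))   # True: +12% > 10%
print(is_regression(100 * MiB, 104 * MiB, "peak:5MiB"))  # False: +4 MiB <= 5 MiB
```

The flag is repeatable, so a natural reading is that the session fails as soon as any one spec trips.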

The `benchmem` marker

import pytest


@pytest.mark.benchmem(repeats=3)
def test_build(benchmark_memory):
    benchmark_memory(sorted, list(range(100_000)))

Kwarg	Default	What
`repeats`	auto	force a fixed number `N` of memray passes for this test (default: adaptive sampling, see below). Every pass is kept (the blob stores the whole series); the headline `peak` is the minimum across them, and `--stat` reports any other statistic. Overrides the suite-wide `--benchmark-memory-repeats`.
`warmup`	`1`	untracked dry-runs of the action before measuring, to shed one-time costs (lazy imports, first-touch caches). `0` disables. Overrides the suite-wide `--benchmark-memory-warmup`.
`isolate`	`False`	run each memray pass in a fresh process and also record whole-process resident memory as the `rss` metric — the physical/OOM-relevant peak memray's logical heap can't give. Per-test only (no suite-wide flag): `rss` is a whole-job capacity number, meaningful only for build+operate benchmarks, so you mark the specific ones. Needs a top-level, picklable benchmarked function (see the whole-job warning below).
`profile_native`	`False`	on the `--benchmark-memory-profile` path, capture native (C/C++/Rust) stacks in the kept `.bin`, so a flamegraph attributes extension-code memory (polars/numpy/solver bindings) instead of one opaque `??? at ???` bucket. Opt-in (slower, bigger `.bin`). Overrides the suite-wide `--benchmark-memory-profile-native`.
`max_peak`	—	fail the test if the headline `peak` exceeds this absolute ceiling. A size string (`"100MiB"`, units `B`/`KiB`/`MiB`/`GiB`) or a bare int (bytes).
`max_allocated`	—	as `max_peak`, on `allocated` (total bytes).
`max_allocations`	—	as above, on the `allocations` count — a bare number (no unit).

Isolated rss measures the whole job — build the state inside the callable

The rss metric (isolate=True) runs the action in a fresh, empty process. Two consequences:

The build must happen inside the measured callable, and the callable must be a top-level, picklable function. The child starts with nothing, so it must construct whatever it operates on; and spawn serializes the call with standard pickle, so a lambda or closure is rejected (we don't use cloudpickle) — pass a module-level function plus lightweight args.

# ✅ ships only the spec (~bytes); the child builds + writes cold = the whole job's RSS
benchmark_memory(build_and_write, spec, n)

# ❌ a lambda/closure — rejected; std pickle can't serialize it (even build-inside)
benchmark_memory(lambda: write(build(spec, n)))

# ❌ a top-level partial over a *pre-built* model pickles fine, but ships the model and
#    measures *deserializing* it, not building it — the build never re-runs in the child
model = build(spec, n)
benchmark_memory(partial(write, model))

You can't isolate a single sub-phase. Since the child must build before it can operate, isolated rss is a build-plus-operate capacity number by construction, never a per-phase figure (e.g. write-only). For per-phase memory, use the in-process peak metric, which can measure a write given an already-built model. So the rule is two-part: use a top-level function (no lambdas), and don't pass heavy pre-built state — build it inside.
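Because isolate=True hands the callable to a spawned child through the standard pickle module, you can check a candidate before running the benchmark. The sketch below uses only the standard library. is_picklable and make_closure are illustrative names, and math.hypot stands in for your own module-level function.

```python
import functools
import math
import pickle


def is_picklable(obj: object) -> bool:
    """True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_closure(n: int):
    def inner() -> int:  # a local function: not importable by name
        return n * 2
    return inner


print(is_picklable(math.hypot))                              # True: importable by name
print(is_picklable(functools.partial(math.hypot, 3.0, 4.0)))  # True: function + light args
print(is_picklable(lambda: 1))                               # False: a lambda
print(is_picklable(make_closure(3)))                         # False: a closure
```

Picklability is necessary but not sufficient. A partial over a pre-built model passes this check, yet, as in the last ❌ case, it ships and deserializes the model instead of rebuilding it.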

Absolute ceilings: `max_peak` / `max_allocated` / `max_allocations`

@pytest.mark.benchmem(max_peak="100MiB", max_allocations=5000)
def test_build(benchmark_memory):
    benchmark_memory(build_model, 1000)

A baseline-free guardrail: the test fails if the measured metric exceeds the ceiling (e.g. test_build: peak 117 MiB exceeds max_peak 100 MiB). Thresholds are absolute only, because there's no saved run to take a percentage of. For relative gating against a prior run, use --benchmark-memory-compare-fail or benchmem compare --fail-on. A ceiling is a worst-case budget, so with repeats > 1 (including adaptive sampling) the gate reads the worst pass, not the headline min, and fails if any pass breaches it; for a single pass the two coincide. The ceiling is enforced wherever memory is measured (the benchmark_memory fixture and the --benchmark-memory patch). A plain benchmark() call without --benchmark-memory measures no memory, so the marker is a no-op there.
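Because the ceiling reads the worst pass while the headline reads the min, a run can show a headline under budget and still fail. The sketch below illustrates that rule. The per-pass peaks are made up for illustration, ceiling_breaches is a hypothetical helper, and the ceiling is taken as already converted to bytes, assuming 1024-based units for "100MiB".

```python
MiB = 1024**2


def ceiling_breaches(per_pass: list[int], ceiling: int) -> list[int]:
    """Indices of passes whose value exceeds the ceiling; any breach fails the test."""
    return [i for i, value in enumerate(per_pass) if value > ceiling]


# Made-up per-pass peaks, shaped like the blob's peak_bytes series.
peaks = [98 * MiB, 117 * MiB, 99 * MiB]
max_peak = 100 * MiB  # "100MiB", assuming 1024-based units

print(min(peaks) <= max_peak)             # True: the headline min is under budget
print(ceiling_breaches(peaks, max_peak))  # [1]: but pass 1 breaches, so the test fails
```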

Scope: the benchmarked action only

This gates the benchmarked action only (the isolated call pytest-benchmem measures), not the whole test. For a whole-test limit or leak check, that's pytest-memray's limit_memory / limit_leaks — see the README's "With pytest-memray".

How many passes? By default pytest-benchmem samples adaptively — after an untracked warmup run, it runs the memray pass until the min floor settles (≥2 passes; capped at 10, or a --benchmark-memory-max-time budget). Deterministic code settles in ~3 passes; noisy code runs more. Set repeats=N (marker) or --benchmark-memory-repeats=N (suite) to force a fixed, reproducible count — what CI gating against a saved baseline wants. Full rationale and the noisy-workload guidance are in the guide: Repeats & adaptive sampling.

The `benchmark_memory` fixture

Depends on pytest-benchmark's benchmark fixture; measures peak in a separate untimed pass, then times via pytest-benchmark.

Order — memory first (cold), then timing

Every call form runs the memray pass first, then pytest-benchmark's timing (calibration + all rounds). This matters: timing runs the function thousands of times, which grows and fragments the allocator's arenas — so measuring memory after timing would report the warm plateau, not the fresh-process floor the headline min is meant to be. Memory-first measures the cold cost (the warmup pass still sheds the one-time cold-start within it); timing then runs cleanly, with no memray hooks active. This holds for __call__, pedantic, and the --benchmark-memory patch alike. The standalone measure_peak / measure_memory have no timing phase at all; warmup=0 skips the warmup, repeats=N forces a fixed count.

Call form

Measures, then times, function(*args, **kwargs):

benchmark_memory(sorted, data)

Pedantic form

Explicit control, like pytest-benchmark's pedantic plus a memory pass:

benchmark_memory.pedantic(target, args=(), kwargs=None, setup=None,
                          rounds=1, warmup_rounds=0, iterations=1)

setup — a callable run untracked before each measured call; if it returns (args, kwargs), those supply the call's arguments. Used for both the timed rounds and each (adaptive) memory sample — one setup rebuilds fresh state for both — so a stateful action's memory samples stay independent. The same applies to benchmark.pedantic(setup=…) under --benchmark-memory: no extra changes.
rounds, warmup_rounds, iterations — as in pytest-benchmark.
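The setup contract can be sketched in plain Python to show why a stateful action needs it. call_with_setup, drain, and fresh_items are hypothetical names. The sketch models only the argument hand-off, not memray tracking or timing.

```python
from __future__ import annotations

from typing import Any, Callable


def call_with_setup(
    target: Callable[..., Any],
    setup: Callable[[], Any] | None = None,
    args: tuple = (),
    kwargs: dict | None = None,
) -> Any:
    """Run setup (untracked in the real fixture), then call target.

    If setup returns (args, kwargs), those replace the call's arguments.
    """
    if setup is not None:
        produced = setup()
        if produced is not None:
            args, kwargs = produced
    return target(*args, **(kwargs or {}))


def drain(items: list) -> int:
    """A stateful action: it empties the list it is given."""
    n = len(items)
    items.clear()
    return n


def fresh_items() -> tuple[tuple, dict]:
    """Rebuild the input for every sample."""
    return (list(range(5)),), {}


shared = list(range(5))
print([drain(shared) for _ in range(3)])                        # [5, 0, 0]: samples decay
print([call_with_setup(drain, fresh_items) for _ in range(3)])  # [5, 5, 5]: independent
```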

Mostly memory, little timing? There's no memory-only switch — the entry rides pytest-benchmark's timing. To trim it: --benchmark-min-rounds=1 --benchmark-max-time=0 (no test changes), or pedantic(rounds=1, warmup_rounds=0) for a single call. For pure memory outside pytest, use measure_peak / measure_memory.

Attributes (available after a call):

Attribute	What
`extra_info`	pytest-benchmark's per-benchmark dict. Set scalars here to attach analysis dims; the memory blob lands here under the `benchmem` key.
`peak_bytes`	peak memory (bytes) from the last call, or `None` before any call.
`result`	the full `MemoryResult` from the last call, or `None`.

The `extra_info.benchmem` blob

Each measured benchmark stores this dict under extra_info["benchmem"]: three flat per-repeat series (plus rss_bytes under isolate=True), one entry per memray pass. Every reported number (headline peak = min, any --stat) derives from these on read:

Key	What
`peak_bytes`	per-repeat high-water of live bytes — the `peak` metric (headline = min)
`allocations`	per-repeat allocation count — the `allocations` metric
`total_bytes`	per-repeat total bytes allocated — the `allocated` metric (churn `peak` hides)
`rss_bytes`	per-repeat whole-process resident high-water (`ru_maxrss`) — the `rss` metric. Only present under `isolate=True` (each pass in a fresh process); absent otherwise.

{"peak_bytes": [800000, 805000], "allocations": [12, 12], "total_bytes": [800000, 805000]}

See Choosing a metric for when to reach for each, and --stat for distributions.
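The derivation rules come from the MemoryResult description further down:

- headline peak = min;
- allocations and allocated come from the min-peak pass;
- the worst peak is kept for spread;
- rss = min across isolated passes.

The sketch below applies them to the example blob. headline() is a hypothetical helper. Breaking a tie between equal minimum peaks by taking the first pass is an assumption.

```python
from typing import Any


def headline(blob: dict[str, list[int]]) -> dict[str, Any]:
    """Derive the reported numbers from a benchmem blob's per-repeat series."""
    peaks = blob["peak_bytes"]
    rep = peaks.index(min(peaks))  # min-peak pass = the representative run
    numbers: dict[str, Any] = {
        "peak": peaks[rep],
        "peak_max": max(peaks),
        "allocations": blob["allocations"][rep],
        "allocated": blob["total_bytes"][rep],
        "repeats": len(peaks),  # not stored: it's the length of any series
    }
    if "rss_bytes" in blob:  # present only under isolate=True
        numbers["rss"] = min(blob["rss_bytes"])
    return numbers


blob = {"peak_bytes": [800000, 805000], "allocations": [12, 12], "total_bytes": [800000, 805000]}
print(headline(blob))
# {'peak': 800000, 'peak_max': 805000, 'allocations': 12, 'allocated': 800000, 'repeats': 2}
```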

CLI: `benchmem`

Installed with pytest-benchmem[plot]. The full command tree and every option, captured live from the typer app as it actually renders in a terminal:

benchmem --help
Usage: benchmem [OPTIONS] COMMAND [ARGS]...

pytest-benchmem — plot and compare benchmark runs.

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the │
│ installation. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────╮
│ plot Render an interactive plotly view from one or more pytest-benchmark runs. │
│ compare Print a per-id table for one run, or compare two or more (and optionally gate CI). │
│ sweep Run a benchmark suite across several installed versions of a package. │
│ flamegraph Render a kept memory profile in one step — resolve the ``.bin`` for a test and run │
│ memray. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

benchmem plot --help
Usage: benchmem plot [OPTIONS] RUNS...

Render an interactive plotly view from one or more pytest-benchmark runs.

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
│ * runs... PATH pytest-benchmark JSON file(s). [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --columns [time|peak|allocated|allocations|rss] Metric to plot: time | peak | │
│ allocated | allocations | rss │
│ (rss = isolated runs only). One │
│ per figure (a plot has a single │
│ value axis) — same flag as │
│ `compare`; the spread shows as │
│ whiskers via --band. │
│ [default: time] │
│ --view TEXT compare | scatter | sweep | │
│ scaling (default: by count). │
│ --facet TEXT Dim to facet by. │
│ --x TEXT scaling: dim for the x-axis. │
│ --clip FLOAT Clamp the colour scale. │
│ --where TEXT Filter rows by dim: KEY=VALUE │
│ (repeatable, AND-combined). │
│ --free-axes [x|y|both] Free facet axes: x | y | both │
│ (needs --facet). │
│ --band [auto|minmax|none] scaling: spread whiskers on │
│ memory metrics — auto | minmax | │
│ none. │
│ [default: auto] │
│ --label -l TEXT Series label per run, in order │
│ (repeat). Default: stem. │
│ --output -o PATH HTML out. │
│ --open --no-open [default: no-open] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

benchmem compare --help
Usage: benchmem compare [OPTIONS] RUNS...

Print a per-id table for one run, or compare two or more (and optionally gate CI).

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
│ * runs... PATH One or more pytest-benchmark runs, oldest → newest. One prints a plain │
│ table; two or more compare (a sweep is N). │
│ [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --columns TEXT Comma list of metrics: time | peak | allocated | allocations | rss (rss │
│ = isolated runs only; e.g. peak or time,peak,rss). Default: time,peak. │
│ Each is shown across every --stat; a metric absent from every run is │
│ dropped. │
│ --group-by TEXT Group rows into sub-tables: fullname | name | func | group | module | │
│ class | param:NAME (comma-composable). │
│ [default: fullname] │
│ --stat TEXT Which stat column(s) per metric: min | max | mean | median | stddev, or │
│ all (the default) for the full spread side by side. │
│ --sort TEXT Row order: name (id) | value (largest in the last run) | change. │
│ [default: name] │
│ --csv PATH Also write the raw (unscaled) comparison to this CSV file. │
│ --fail-on TEXT Exit non-zero on a regression of the first run vs the last. │
│ FIELD:THRESHOLD, repeatable — e.g. --fail-on peak:10% --fail-on │
│ peak:5MiB --fail-on rss:10% (rss gates only isolated runs). │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

benchmem sweep --help
Usage: benchmem sweep [OPTIONS] PACKAGE VERSIONS...

Run a benchmark suite across several installed versions of a package.

Provisions one fresh uv venv per version, runs 'pytest <suite> --benchmark-only'
in each writing <out>/<version>.json, then prints the next step. --memory adds
the memory pass; forward any other pytest flag with --pytest-arg, e.g.
benchmem sweep mypkg 1.2.0 1.3.0 --suite benchmarks/ --memory --pytest-arg=-k.

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
│ * package TEXT Package under test; each plain version installs `<package>==<v>`. │
│ [required] │
│ * versions... TEXT Versions or pip specs to sweep, e.g. 1.2.0 1.3.0 │
│ git+https://github.com/me/pkg@main. │
│ [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ * --suite PATH Benchmark suite (dir or file) to run in each version's │
│ venv. │
│ [required] │
│ --out PATH Directory for the per-version JSON runs. │
│ [default: .benchmarks/sweep] │
│ --memory --no-memory Add --benchmark-memory to each pytest run. │
│ [default: no-memory] │
│ --pytest-arg TEXT Arg forwarded to pytest, one token each, repeatable │
│ (e.g. --pytest-arg=-k). │
│ --pin TEXT Extra pip spec installed alongside (repeatable). │
│ --as-of TEXT YYYY-MM-DD for uv --exclude-newer (reproducible │
│ resolve). │
│ --import-check TEXT Module asserted to resolve to the venv (isolation │
│ preflight). │
│ --copy-dir PATH Directory copied into each venv's cwd (the suite │
│ imports from here). │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

benchmem flamegraph --help
Usage: benchmem flamegraph [OPTIONS] PROFILE_DIR [TEST_ID]

Render a kept memory profile in one step — resolve the ``.bin`` for a test and run memray.

Closes the "regressed → *where*?" loop after ``--benchmark-memory-profile``: instead of
finding the right ``.bin`` and remembering the memray subcommand, point at the profile dir
and name the test (or ``--worst peak`` to auto-pick the heaviest). Defaults to an HTML
flamegraph written next to the ``.bin``; ``--report tree|summary|stats`` prints to the
terminal instead.

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
│ * profile_dir PATH Directory of kept .bin profiles (--benchmark-memory-profile). │
│ [required] │
│ [test_id] TEXT Test id (exact, or a unique substring) to render; omit with --worst. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --worst TEXT Auto-pick the heaviest: peak | allocated | allocations │
│ --report TEXT memray reporter: flamegraph | table | tree | summary | stats │
│ [default: flamegraph] │
│ --native --no-native Require the profile to carry native traces (captured via │
│ --benchmark-memory-profile-native); error if it doesn't. │
│ [default: no-native] │
│ --output -o PATH HTML out path (default: next to the .bin). │
│ --open --no-open Open the rendered HTML. [default: no-open] │
│ --force -f Overwrite an existing render. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Public Python API

Light to import — pytest_benchmem re-exports only the engine and the readers; pytest_benchmem.plotting pulls plotly and pytest_benchmem.sweep shells to uv, so import those submodules directly.

Engine

measure_peak

measure_peak(
    action: Action, repeats: int | None = None
) -> int

Run action() under memray.Tracker and return peak bytes.

The bare one-liner for a REPL or notebook; measure_memory returns the full result (allocation count, spread). repeats behaves as it does there: None (default) samples adaptively, and an int forces a fixed pass count.

Parameters:

Name	Type	Description	Default
`action`	`Action`	The zero-argument callable to measure.	required
`repeats`	`int | None`	Fixed pass count, or `None` to sample adaptively.	`None`

Returns:

Type	Description
`int`	Peak bytes (the headline `peak` = min across passes, after warmup).
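The standard library's tracemalloc can stand in for memray.Tracker to show the warmup-then-min shape without memray. This is a swapped-in technique, not what measure_peak does. tracemalloc sees only allocations made through Python's allocators, whereas memray also tracks native malloc calls. peak_bytes_stdlib is a hypothetical name.

```python
import tracemalloc
from typing import Callable


def peak_bytes_stdlib(action: Callable[[], object], repeats: int = 3, warmup: int = 1) -> int:
    """tracemalloc stand-in: untracked warmup runs, then the min peak over fresh passes."""
    for _ in range(warmup):
        action()  # untracked dry-run to shed one-time costs
    peaks = []
    for _ in range(repeats):
        tracemalloc.start()  # a fresh tracer for every pass
        try:
            action()
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        peaks.append(peak)
    return min(peaks)  # headline = the floor across passes


print(peak_bytes_stdlib(lambda: bytearray(10_000_000)) >= 10_000_000)  # True
```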

measure_memory

measure_memory(
    action: Action,
    repeats: int | None = None,
    *,
    warmup: int = _DEFAULT_WARMUP,
    isolate: bool = False,
    max_time: float | None = None,
    min_passes: int = _ADAPTIVE_MIN_PASSES,
    max_passes: int = _ADAPTIVE_MAX_PASSES,
    patience: int = _ADAPTIVE_PATIENCE,
    keep_bin: Path | None = None,
    native: bool = False,
    setup: Action | None = None,
) -> MemoryResult

Run action() under memray.Tracker → MemoryResult, one pass per repeat.

First, warmup untracked dry-runs shed one-time costs; then each measured pass gets a fresh tracker. The headline is the min across passes (see MemoryResult); every pass's Measurement is kept for spread stats.

With isolate=True, each measured pass runs in a fresh spawned process (each warming itself). That child's whole-process resident high-water (ru_maxrss) is recorded as Measurement.rss_bytes: a physical-memory reading attributable to the action, which an in-process pass can't give. The action (and setup) must be picklable (a top-level callable, not a lambda/closure); keep_bin is ignored in this mode.

Two modes, by repeats:

repeats=N (an int) — run exactly N passes. Fixed and reproducible; what CI gating and saved-baseline comparisons want.
repeats=None (default) — sample adaptively: keep running passes until the min stops moving (no new low for patience passes), bounded by min_passes (≥2), max_passes, and an optional max_time budget. Deterministic code settles in a few passes; noisy code runs more.
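The adaptive stopping rule, as described, can be sketched as follows:

- stop once there has been no new minimum for patience passes;
- never stop before min_passes;
- never run more than max_passes;
- stop early once max_time has elapsed.

sample_adaptively is a hypothetical name. min_passes=2 and max_passes=10 follow the flags table, but patience=2 is a placeholder, not the library's _ADAPTIVE_PATIENCE. Letting min_passes win over an exhausted time budget is also an assumption.

```python
from __future__ import annotations

import time
from typing import Callable


def sample_adaptively(
    measure: Callable[[], int],
    *,
    min_passes: int = 2,
    max_passes: int = 10,
    patience: int = 2,
    max_time: float | None = None,
) -> list[int]:
    """Collect per-pass values until the running minimum stops improving."""
    samples: list[int] = []
    since_new_min = 0
    start = time.monotonic()
    while len(samples) < max_passes:
        value = measure()
        if not samples or value < min(samples):
            since_new_min = 0  # new floor: reset the patience counter
        else:
            since_new_min += 1
        samples.append(value)
        if len(samples) < min_passes:
            continue
        if since_new_min >= patience:
            break  # the floor has settled
        if max_time is not None and time.monotonic() - start >= max_time:
            break  # wall-clock budget spent
    return samples


print(len(sample_adaptively(lambda: 1_000)))  # 3: deterministic code settles fast
```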

Parameters:

Name	Type	Description	Default
`action`	`Action`	The zero-argument callable to measure.	required
`repeats`	`int | None`	Fixed pass count, or `None` to sample adaptively.	`None`
`warmup`	`int`	Untracked dry-runs (`setup` + `action`) before measuring; `0` disables.	`_DEFAULT_WARMUP`
`isolate`	`bool`	Run each pass in a fresh spawned process and record its `ru_maxrss` as `Measurement.rss_bytes`. Requires a picklable `action`/`setup`.	`False`
`max_time`	`float | None`	Wall-clock budget (seconds) for adaptive sampling; `None` = no time bound.	`None`
`min_passes`	`int`	Minimum passes when sampling adaptively.	`_ADAPTIVE_MIN_PASSES`
`max_passes`	`int`	Hard ceiling on passes when sampling adaptively.	`_ADAPTIVE_MAX_PASSES`
`patience`	`int`	Stop adaptive sampling after this many consecutive passes with no new min.	`_ADAPTIVE_PATIENCE`
`keep_bin`	`Path | None`	If set, the first pass's profile `.bin` is retained here (for a later `render_flamegraph`); the rest still go to temp dirs and are discarded.	`None`
`native`	`bool`	Capture native (C/C++/Rust) stacks in the kept `.bin` so a flamegraph can attribute memory inside extension code instead of an opaque native bucket. Costs runtime and disk; only meaningful with `keep_bin` (ignored otherwise).	`False`
`setup`	`Action | None`	Optional zero-arg callable run untracked before each pass (and each warmup run); its allocations are not measured. Use it to rebuild fresh state so a stateful `action` (one that caches on or mutates a carried-over object) gives independent samples instead of a decaying/accumulating series. Mirrors pytest-benchmark's `pedantic(setup=...)`.	`None`

Returns:

Type	Description
`MemoryResult`	A `MemoryResult` over every measured pass (warmup runs are not retained).

MemoryResult `dataclass`

A memory measurement across repeats passes, derived from the per-repeat samples.

The per-repeat samples are the single source of truth: that's all the blob stores (the series), and everything else is derived from them on read.

The headline peak_bytes is the minimum peak across passes: the fresh-process floor, unbiased by the in-process warm plateau (repeated runs fragment and grow arenas and allocate more) that a central stat would report. allocations / total_bytes come from that same min-peak run (a coherent snapshot); peak_bytes_max is the worst peak, so the spread is visible. A warm-plateau / steady-state read is available via the mean / median --stat. A single pass collapses all of these to its own values.
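The --stat columns are plain reductions over one per-repeat series. The sketch below uses the standard statistics module. spread and STATS are hypothetical names. Using the sample standard deviation (statistics.stdev) for stddev is an assumption, as is labelling a single pass's lone column "value".

```python
import statistics

STATS = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "stddev": statistics.stdev,  # sample stddev (an assumption); needs >= 2 passes
}


def spread(series: list[int], stats: str = "min,mean,max") -> dict[str, float]:
    """Reduce one per-repeat series to the requested stat columns."""
    if len(series) == 1:
        return {"value": series[0]}  # a single pass stays one column
    return {name: STATS[name](series) for name in stats.split(",")}


print(spread([800000, 806000, 803000]))  # {'min': 800000, 'mean': 803000.0, 'max': 806000}
```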

repeats `property`

repeats: int

How many passes were measured.

representative `property`

representative: Measurement

The min-peak run — the one the headline peak/allocations/total_bytes come from.

peak_bytes `property`

peak_bytes: int

The headline peak — the minimum high-water across passes (the fresh-process floor).

peak_bytes_max `property`

peak_bytes_max: int

The worst peak across repeats (equals peak_bytes with one repeat).

allocations `property`

allocations: int

Allocation count from the representative (min-peak) run.

total_bytes `property`

total_bytes: int

Total bytes allocated by the representative (min-peak) run.

rss_bytes `property`

rss_bytes: int | None

Headline whole-process RSS: the minimum ru_maxrss across isolated passes (the cold floor, like peak_bytes), or None if memory wasn't measured in isolation (an in-process run has no RSS attributable to the action, since RSS is process-global).

series

series(field: str) -> list[Any]

The per-repeat values of one series field (one of SERIES_FIELDS, or an optional field from OPTIONAL_SERIES_FIELDS).

as_dict

as_dict() -> dict[str, Any]

The JSON blob stored under pytest-benchmark extra_info["benchmem"].

The three core per-repeat series, flat, plus any OPTIONAL_SERIES_FIELDS that were measured (all-or-nothing per result). There are no denormalized scalars and no repeats count (it's the length of any series). Everything else derives on read.

from_blob `classmethod`

from_blob(blob: Mapping[str, Any]) -> MemoryResult

Rebuild from a blob's per-repeat series. Core columns are required; any OPTIONAL_SERIES_FIELDS are read when present (else left None).

Measurement `dataclass`

One repeat's raw numbers — memray's peak high-water, allocation count, and total bytes allocated (cumulative churn, incl. temporaries GC later frees), plus an optional whole-process resident high-water.

rss_bytes is getrusage's ru_maxrss from an isolated pass (a fresh child process); None in-process, where a process-global RSS isn't attributable to the action.
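The standard resource module (Unix only) shows why an in-process RSS isn't attributable to the action. ru_maxrss is a high-water mark for the whole process, so it never falls and keeps reflecting whatever ran earlier. max_rss_bytes is a hypothetical helper. Its unit handling follows the platform convention (kilobytes on Linux, bytes on macOS), not necessarily what the plugin does.

```python
import resource  # Unix only
import sys


def max_rss_bytes() -> int:
    """This process's resident-memory high-water mark, in bytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return rss if sys.platform == "darwin" else rss * 1024


big = bytearray(64 * 1024 * 1024)  # touch 64 MiB of pages
del big
after_big = max_rss_bytes()

small = bytearray(1024)  # a tiny "action" run afterwards
# The high-water mark still includes the earlier 64 MiB, not just this 1 KiB.
print(max_rss_bytes() >= after_big)  # True
```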

Readers & loader

from_pytest_benchmark reads timing (seconds, from stats); memory_from_pytest_benchmark reads memory (bytes, from extra_info.benchmem). load_samples is the unified reader; load_long_df stacks runs into the tidy frame the plots pivot. discover_runs collects saved runs from pytest-benchmark's .benchmarks/ storage, so you can hand the readers a directory instead of listing files.

from_pytest_benchmark

from_pytest_benchmark(
    path: str | Path, *, metric: str = "min"
) -> tuple[str, list[Sample], str]

Read timing out of a pytest-benchmark file → (label, samples, "s").

Dims come from each benchmark's parametrize params and extra_info, plus the structural node.* dims (see _node_dims).

Parameters:

Name	Type	Description	Default
`path`	`str | Path`	A pytest-benchmark JSON file.	required
`metric`	`str`	Which pytest-benchmark stat to read (`min` / `median` / …).	`'min'`

Returns:

Type	Description
`tuple[str, list[Sample], str]`	`(label, samples, unit)`: the run label, one `Sample` per benchmark, and the unit (`"s"`).

memory_from_pytest_benchmark

memory_from_pytest_benchmark(
    path: str | Path,
    *,
    field: str = "peak_bytes",
    reduce: Callable[[list[float]], float] | None = None,
) -> tuple[str, list[Sample], str]

Read memory out of a pytest-benchmark file → (label, samples, unit).

The benchmark_memory fixture stores each run's memory blob under extra_info["benchmem"] (a flat per-repeat series per field), keyed by the same benchmark id pytest-benchmark uses. Benchmarks lacking the blob (timing-only tests) are skipped. Dims come from parametrize params and extra_info, plus the structural node.* dims (see _node_dims).

Parameters:

Name	Type	Description	Default
`path`	`str | Path`	A pytest-benchmark JSON file.	required
`field`	`str`	Which series to read: `peak_bytes` (unit `B`), `allocations` (count), or `total_bytes` (unit `B`).	`'peak_bytes'`
`reduce`	`Callable[[list[float]], float] | None`	Reduce the per-repeat series to one scalar. Default (`None`) derives the headline (peak = min, allocations/total_bytes = the min-peak run); pass a callable for a distribution stat over the series instead.	`None`

Returns:

Type	Description
`tuple[str, list[Sample], str]`	`(label, samples, unit)`: the run label, one `Sample` per benchmark with the blob, and the unit (`B` or count).
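The json module alone is enough to show where the blob sits inside a saved run. The sketch assumes the usual pytest-benchmark layout, a top-level benchmarks list whose entries carry fullname and extra_info, and keys results by fullname. peaks_by_id is a hypothetical helper, and a real run file carries more fields than shown here.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def peaks_by_id(path: str | Path) -> dict[str, int]:
    """Map each benchmark's fullname to its headline peak (min of peak_bytes)."""
    data = json.loads(Path(path).read_text())
    peaks = {}
    for bench in data["benchmarks"]:
        blob = (bench.get("extra_info") or {}).get("benchmem")
        if blob is None:
            continue  # timing-only benchmark: no memory blob, so it's skipped
        peaks[bench["fullname"]] = min(blob["peak_bytes"])
    return peaks


# A trimmed-down run file; real ones also carry stats, params, and machine info.
run = {
    "benchmarks": [
        {
            "fullname": "tests/test_a.py::test_build",
            "extra_info": {"benchmem": {"peak_bytes": [800000, 805000],
                                        "allocations": [12, 12],
                                        "total_bytes": [800000, 805000]}},
        },
        {"fullname": "tests/test_a.py::test_fast", "extra_info": {}},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "0001_example.json"
    path.write_text(json.dumps(run))
    print(peaks_by_id(path))  # {'tests/test_a.py::test_build': 800000}
```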

load_samples

load_samples(
    path: str | Path,
    *,
    metric: Metric = "time",
    stat: str | None = None,
) -> tuple[str, list[Sample], str]

Read one pytest-benchmark file for the chosen metric → (label, samples, unit).

The unified reader over from_pytest_benchmark (timing) and memory_from_pytest_benchmark (memory).

Parameters:

Name	Type	Description	Default
`path`	`str | Path`	A pytest-benchmark JSON file.	required
`metric`	`Metric`	Which metric to read (`time` / `peak` / `allocated` / `allocations`).	`'time'`
`stat`	`str | None`	Distribution stat over the metric's per-repeat series (`min` / `max` / `mean` / `median` / `stddev`); `None` reads the headline scalar. For `time` it selects the pytest-benchmark stat (default `min`).	`None`

Returns:

Type	Description
`tuple[str, list[Sample], str]`	`(label, samples, unit)` — the run label, its samples, and the metric's unit.

load_long_df

load_long_df(
    runs: str | Path | Sequence[str | Path],
    *,
    metric: Metric = "time",
    stat: str | None = None,
    labels: Sequence[str] | None = None,
) -> tuple[pd.DataFrame, str]

Stack pytest-benchmark files (one path or a sequence) into one long frame → (df, unit).

One row per (run, id) for the chosen metric. Columns: snapshot (the series/version label), id, value, then one column per dim key seen (missing dims are NaN). Every plot view pivots this frame.

Parameters:

Name	Type	Description	Default
`runs`	`str | Path | Sequence[str | Path]`	One path or a sequence of pytest-benchmark JSON files.	required
`metric`	`Metric`	Which metric to read (`time` / `peak` / `allocated` / `allocations`).	`'time'`
`stat`	`str | None`	Distribution stat over the per-repeat series; `None` reads the headline scalar.	`None`
`labels`	`Sequence[str] | None`	Overrides the `snapshot` label per run (one per path, same order), decoupling the display name from the filename; defaults to each file's stem.	`None`

Returns:

Type	Description
`tuple[DataFrame, str]`	`(df, unit)` — the long-form frame and the metric's unit.
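The long shape is easiest to see without pandas. The sketch works over plain (id, value, dims) tuples, a stand-in for Sample, whose field names this page doesn't list. long_rows is hypothetical, and None plays the role of NaN for dims a run lacks.

```python
from __future__ import annotations

# Each run label maps to (id, value, dims) tuples, a stand-in for Sample.
Runs = dict[str, list[tuple[str, float, dict]]]


def long_rows(runs: Runs) -> list[dict]:
    """One row per (run, id): snapshot, id, value, then every dim seen anywhere."""
    dim_keys: list[str] = []
    for samples in runs.values():
        for _, _, dims in samples:
            for key in dims:
                if key not in dim_keys:
                    dim_keys.append(key)
    rows = []
    for label, samples in runs.items():
        for sample_id, value, dims in samples:
            row = {"snapshot": label, "id": sample_id, "value": value}
            row.update({key: dims.get(key) for key in dim_keys})  # missing dim -> None
            rows.append(row)
    return rows


runs = {
    "v1.2.0": [("test_build[n=10]", 1000.0, {"n": 10}), ("test_build[n=100]", 9000.0, {"n": 100})],
    "v1.3.0": [("test_build[n=10]", 950.0, {"n": 10, "backend": "fast"})],
}
rows = long_rows(runs)
print(rows[0])
# {'snapshot': 'v1.2.0', 'id': 'test_build[n=10]', 'value': 1000.0, 'n': 10, 'backend': None}
```

With pandas installed, pandas.DataFrame(rows) gives the frame. A pivot such as df.pivot(index="id", columns="snapshot", values="value") then lines the runs up side by side.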

discover_runs

discover_runs(
    root: str | Path = ".benchmarks",
) -> list[Path]

Return pytest-benchmark JSON files under root (for CLI suggestions).

Parameters:

Name	Type	Description	Default
`root`	`str | Path`	Directory to search (default: pytest-benchmark's `.benchmarks` store).	`'.benchmarks'`

Returns:

Type	Description
`list[Path]`	The JSON file paths found under `root`.
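Collecting saved runs amounts to a recursive glob, sketched here with pathlib. find_run_files is a hypothetical stand-in for discover_runs. The <machine-id>/NNNN_name.json layout mentioned in the comment is pytest-benchmark's usual storage convention, not something this page specifies.

```python
from __future__ import annotations

from pathlib import Path


def find_run_files(root: str | Path = ".benchmarks") -> list[Path]:
    """Sorted JSON files anywhere under root; empty if root doesn't exist."""
    base = Path(root)
    if not base.is_dir():
        return []
    # Typically .benchmarks/<machine-id>/0001_name.json, so within one machine
    # directory, path order follows the run number.
    return sorted(base.rglob("*.json"))


for run_file in find_run_files():
    print(run_file)
```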

Sample

Bases: NamedTuple

One measured result: an opaque id, a value, and analysis dims.

Plotting: `pytest_benchmem.plotting`

Every plot_* returns (figure, n_ids). snapshots is a list of run JSON paths; labels names the series per run (defaults to the file stems) — the API behind plot's -l/--label. plot_compare's sort is "absolute" (native units) or "relative" (percent).

plot_scaling ¶

plot_scaling(
    snapshots: Snapshots,
    *,
    metric: Metric = "time",
    x: str | None = None,
    color: str | None = None,
    facet: str | None = None,
    log: bool | Literal["auto"] = "auto",
    band: Literal["auto", "minmax", "none"] = "auto",
    where: Mapping[str, str] | None = None,
    free_axes: FreeAxes | None = None,
    labels: Sequence[str] | None = None,
) -> tuple[Figure, int]

Cost vs a numeric dim, coloured/faceted by other dims.

`x`, `color`, and `facet` are inferred from the dims by default (the lone numeric dim becomes `x`); pass them explicitly to override.

Parameters:

Name	Type	Description	Default
`snapshots`	`Snapshots`	Run JSON path(s).	required
`metric`	`Metric`	Which metric to plot (`time` / `peak` / `allocated` / `allocations`).	`'time'`
`x`	`str \| None`	Dim for the x-axis (default: the lone numeric dim).	`None`
`color`	`str \| None`	Dim to colour by (default: inferred).	`None`
`facet`	`str \| None`	Dim to split into subplots (default: inferred).	`None`
`log`	`bool \| Literal['auto']`	`"auto"` log-scales when x is numeric and strictly positive; pass `True` or `False` to force it.	`'auto'`
`band`	`Literal['auto', 'minmax', 'none']`	Spread whiskers (`min`…`max` of each point's per-pass series) on memory metrics — `"auto"` shows them where there's spread, `"minmax"` forces them on, `"none"` off. The line stays the headline (the min floor); whiskers reach up to the worst pass. Ignored for `time`.	`'auto'`
`where`	`Mapping[str, str] \| None`	Keep only rows matching these `dim=value` pairs.	`None`
`free_axes`	`FreeAxes \| None`	`"x"` / `"y"` / `"both"` — unmatch a faceted axis from the shared default (`"x"` for incommensurable sweeps, `"y"` when facets have different cost scales, e.g. per function).	`None`
`labels`	`Sequence[str] \| None`	Names the snapshot in the title (default: file stem).	`None`

Returns:

Type	Description
`tuple[Figure, int]`	`(figure, n_ids)` — the plotly figure and the number of ids plotted.
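The defaulting rules for `x`, `log="auto"`, and `band` can be sketched in plain Python. This illustrates the documented behaviour under assumptions and is not the plotting module's code: dims are read from dict rows, a dim counts as numeric when every value parses as a float, and `whiskers` covers memory metrics only (`band` is ignored for `time`). All helper names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def _is_number(value: object) -> bool:
    """True when value parses as a float (so "10" and 10 both count)."""
    try:
        float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return False
    return True


def infer_x(rows: list[dict[str, object]], dims: Sequence[str]) -> str | None:
    """Return the lone numeric dim, or None when there are zero or several."""
    numeric = [d for d in dims if all(_is_number(row.get(d)) for row in rows)]
    return numeric[0] if len(numeric) == 1 else None


def resolve_log(xs: Sequence[object], log: bool | str = "auto") -> bool:
    """A bool forces the choice; "auto" log-scales only numeric, strictly positive x."""
    if isinstance(log, bool):
        return log
    if not xs or not all(_is_number(x) for x in xs):
        return False
    return all(float(x) > 0 for x in xs)


def whiskers(series: Sequence[float], band: str = "auto"):
    """Headline = min floor across passes; the whisker spans min..max when shown."""
    lo, hi = min(series), max(series)
    show = band == "minmax" or (band == "auto" and hi > lo)
    return lo, ((lo, hi) if show else None)
```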

plot_scatter ¶

plot_scatter(
    snapshots: Snapshots,
    *,
    metric: Metric = "time",
    facet: str | None = None,
    clip: float | None = None,
    where: Mapping[str, str] | None = None,
    free_axes: FreeAxes | None = None,
    labels: Sequence[str] | None = None,
) -> tuple[Figure, int]

Baseline cost (log-x) vs candidate/baseline ratio (log-y).

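The top-right corner holds ids that were already costly at baseline and got costlier in the candidate: the regressed corner. The first snapshot is the baseline; with three or more snapshots, the remaining ones animate. Colour encodes the absolute Δ.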

Parameters:

Name	Type	Description	Default
`snapshots`	`Snapshots`	Run JSON path(s); the first is the baseline, extras animate.	required
`metric`	`Metric`	Which metric to plot (`time` / `peak` / `allocated` / `allocations`).	`'time'`
`facet`	`str \| None`	Dim to split into subplots.	`None`
`clip`	`float \| None`	Clamp the colour scale (default p95).	`None`
`where`	`Mapping[str, str] \| None`	Keep only rows matching these `dim=value` pairs.	`None`
`free_axes`	`FreeAxes \| None`	Give each facet its own axes instead of sharing.	`None`
`labels`	`Sequence[str] \| None`	Series names per run (default: file stems).	`None`

Returns:

Type	Description
`tuple[Figure, int]`	`(figure, n_ids)` — the plotly figure and the number of ids plotted.
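A sketch of the per-id coordinates `plot_scatter` describes: x is the baseline cost, y is the candidate/baseline ratio, and colour is the absolute Δ. How the default `clip` is computed is an assumption here (a nearest-rank 95th percentile of |Δ|), and the helper names are hypothetical.

```python
from __future__ import annotations

import math


def scatter_points(
    baseline: dict[str, float], candidate: dict[str, float]
) -> dict[str, tuple[float, float, float]]:
    """Per id present in both runs: (x=baseline, y=candidate/baseline, colour=Δ)."""
    points = {}
    for id_ in baseline.keys() & candidate.keys():
        a, b = baseline[id_], candidate[id_]
        if a <= 0:
            continue  # the ratio and the log-x axis need a positive baseline
        points[id_] = (a, b / a, b - a)
    return points


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def clipped_colours(points, clip: float | None = None) -> dict[str, float]:
    """Clamp each Δ to [-clip, clip]; the default clip (p95 of |Δ|) is assumed."""
    if not points:
        return {}
    if clip is None:
        clip = percentile([abs(p[2]) for p in points.values()], 95)
    return {id_: max(-clip, min(clip, p[2])) for id_, p in points.items()}
```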

plot_compare ¶

plot_compare(
    snapshots: Snapshots,
    *,
    metric: Metric = "time",
    sort: SortMode = "absolute",
    facet: str | None = None,
    clip: float | None = None,
    where: Mapping[str, str] | None = None,
    free_axes: FreeAxes | None = None,
    labels: Sequence[str] | None = None,
) -> tuple[Figure, int]

Bar chart of per-id delta, sorted by the chosen Δ (biggest regressions on top).

The first two snapshots are compared; the first is the baseline.

Parameters:

Name	Type	Description	Default
`snapshots`	`Snapshots`	Run JSON path(s); only the first two are used.	required
`metric`	`Metric`	Which metric to plot (`time` / `peak` / `allocated` / `allocations`).	`'time'`
`sort`	`SortMode`	`absolute` plots `b - a` in the native unit; `relative` plots percent change.	`'absolute'`
`facet`	`str \| None`	Dim to split into subplots.	`None`
`clip`	`float \| None`	Clamp the colour scale (default symmetric p95).	`None`
`where`	`Mapping[str, str] \| None`	Keep only rows matching these `dim=value` pairs.	`None`
`free_axes`	`FreeAxes \| None`	Give each facet its own axes instead of sharing.	`None`
`labels`	`Sequence[str] \| None`	Series names for the two runs (default: file stems).	`None`

Returns:

Type	Description
`tuple[Figure, int]`	`(figure, n_ids)` — the plotly figure and the number of ids plotted.
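The two `sort` modes of `plot_compare` reduce to two delta formulas: `b - a` in the native unit, and percent change. A minimal sketch, using a hypothetical helper and dicts of id to value standing in for the two runs, that orders ids with the largest increase first:

```python
from __future__ import annotations


def compare_deltas(
    a: dict[str, float], b: dict[str, float], sort: str = "absolute"
) -> list[tuple[str, float]]:
    """(id, Δ) for ids in both runs, largest increase (worst regression) first."""
    if sort not in ("absolute", "relative"):
        raise ValueError(f"sort must be 'absolute' or 'relative', got {sort!r}")
    deltas = []
    for id_ in a.keys() & b.keys():
        if sort == "absolute":
            delta = b[id_] - a[id_]  # native unit (seconds, bytes, ...)
        elif a[id_] == 0:
            continue  # percent change is undefined for a zero baseline
        else:
            delta = (b[id_] - a[id_]) * 100 / a[id_]  # percent change
        deltas.append((id_, delta))
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

The two modes can rank ids differently: a small benchmark that doubles tops the `relative` view, while a large one with a modest percentage rise can top the `absolute` view.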

plot_sweep ¶

plot_sweep(
    snapshots: Snapshots,
    *,
    metric: Metric = "time",
    clip: float | None = None,
    where: Mapping[str, str] | None = None,
    labels: Sequence[str] | None = None,
) -> tuple[Figure, int]

Heatmap of per-id fold-change (log2 ratio) vs the first snapshot.

Parameters:

Name	Type	Description	Default
`snapshots`	`Snapshots`	Run JSON paths; columns in order, the first is the reference.	required
`metric`	`Metric`	Which metric to plot (`time` / `peak` / `allocated` / `allocations`).	`'time'`
`clip`	`float \| None`	Clamp the colour scale.	`None`
`where`	`Mapping[str, str] \| None`	Keep only rows matching these `dim=value` pairs.	`None`
`labels`	`Sequence[str] \| None`	Column (version) names (default: file stems).	`None`

Returns:

Type	Description
`tuple[Figure, int]`	`(figure, n_ids)` — the plotly figure and the number of ids plotted.
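Each heatmap cell is described as a log2 ratio against the first snapshot, so 1 means twice the reference and -1 means half. A small sketch, assuming that ids missing from a column (or non-positive values) become NaN cells; `fold_changes` is a hypothetical helper.

```python
from __future__ import annotations

import math


def fold_changes(runs: list[dict[str, float]]) -> dict[str, list[float]]:
    """Rows = ids of the reference run; columns = runs in order; cell = log2(v / ref)."""
    reference = runs[0]
    matrix = {}
    for id_, ref in reference.items():
        row = []
        for run in runs:
            value = run.get(id_)
            if value is None or value <= 0 or ref <= 0:
                row.append(math.nan)  # no usable ratio for this cell
            else:
                row.append(math.log2(value / ref))
        matrix[id_] = row
    return matrix
```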

Sweeps — `pytest_benchmem.sweep`¶

See Cross-version sweeps for the narrative, the Venv object, and the provision parameters.

sweep ¶

sweep(
    versions: Sequence[str],
    run: Callable[[Venv], None],
    **provision_kwargs: object,
) -> list[str]

Provision a venv per version and call `run(venv)` in each.

`run` does whatever the consumer needs (for example, invoke pytest or a memory command with `venv.python` and `cwd=venv.cwd`). Returns the list of versions that failed to provision.
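A `run` callback might look like the sketch below. `VenvStandIn` is a hypothetical stand-in exposing only the `python` and `cwd` attributes mentioned above (see Cross-version sweeps for the real `Venv`). The pytest flags are pytest-benchmark's `--benchmark-only` and `--benchmark-json` plus this plugin's `--benchmark-memory`. Writing `bench.json` into `venv.cwd` assumes each version gets its own working directory.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class VenvStandIn:
    """Hypothetical stand-in with only the attributes the sweep docs mention."""

    python: str
    cwd: str


def pytest_command(venv: VenvStandIn, json_name: str = "bench.json") -> list[str]:
    """Build the pytest invocation for one provisioned venv."""
    return [
        venv.python, "-m", "pytest",
        "--benchmark-only",  # skip non-benchmark tests
        "--benchmark-memory",  # also record peak memory for every benchmark() call
        f"--benchmark-json={Path(venv.cwd) / json_name}",  # where this run's results go
    ]


def run(venv: VenvStandIn) -> None:
    """Callback for sweep: run the suite inside the venv's working directory."""
    subprocess.run(pytest_command(venv), cwd=venv.cwd, check=True)
```

Passed to the documented entry point, this would be `failed = sweep(["1.0", "1.1"], run)` (version strings illustrative), with `failed` listing any versions that could not be provisioned.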
