PERF: use Cython for SparseArray groupby operations by jbrockmendel · Pull Request #64758 · pandas-dev/pandas

jbrockmendel · 2026-03-22T01:19:36Z

Summary

Implement SparseArray._groupby_op to route groupby reductions and transformations through the fast Cython path instead of falling back to slow Python aggregation
Remove the SparseArray special-case for any/all in _cython_agg_general since the new _groupby_op handles them natively
Add dedicated test coverage for sparse groupby operations
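
The "convert to dense, then take the fast path" idea can be illustrated from the outside using only public pandas API. This is a rough sketch, not the code in this PR: `densify_sparse_columns` is a hypothetical helper, and the real change lives in `SparseArray._groupby_op`, whose body is not shown here.

```python
import numpy as np
import pandas as pd


def densify_sparse_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with sparse columns made dense."""
    out = df.copy()
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.SparseDtype):
            # Series.sparse.to_dense() materializes the fill_value entries.
            out[name] = df[name].sparse.to_dense()
    return out


values = np.array([0, 0, 3, 0, 5, 0], dtype=np.int64)
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b", "a", "b"],
        "x": pd.arrays.SparseArray(values, fill_value=0),
    }
)

# Grouping the sparse column directly and grouping its dense copy
# should produce the same per-group values.
sparse_result = df.groupby("key")["x"].sum()
dense_result = densify_sparse_columns(df).groupby("key")["x"].sum()
```

The two results should agree in value; whether the sparse result keeps a sparse dtype depends on the pandas version and is not asserted here.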

On the benchmark from the issue (1000x1000 sparse int DataFrame, groupby mean):

Before: ~3.9s (Python fallback)
After: ~21ms (~185x speedup)
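
A timing along these lines could be reproduced with something like the sketch below. The exact benchmark script from the issue is not reproduced in this description, so the density, value range, number of groups, and the helper names `make_sparse_frame` and `time_groupby_mean` are all assumptions chosen for illustration.

```python
import time

import numpy as np
import pandas as pd


def make_sparse_frame(n_rows=1000, n_cols=1000, density=0.01, seed=0):
    """Build an int64 sparse DataFrame (density and values are assumed)."""
    rng = np.random.default_rng(seed)
    data = rng.integers(1, 10, size=(n_rows, n_cols), dtype=np.int64)
    keep = rng.random((n_rows, n_cols)) < density
    data = np.where(keep, data, 0)  # everything else becomes the fill value
    return pd.DataFrame(data).astype(pd.SparseDtype(np.int64, 0))


def time_groupby_mean(df, n_groups=10):
    """Time a single groupby-mean call; returns (seconds, result)."""
    keys = np.arange(len(df)) % n_groups
    start = time.perf_counter()
    result = df.groupby(keys).mean()
    return time.perf_counter() - start, result


# A reduced size keeps this quick on the slow fallback path;
# pass n_rows=1000, n_cols=1000 to match the shape quoted above.
elapsed, result = time_groupby_mean(make_sparse_frame(n_rows=200, n_cols=20))
print(f"groupby mean took {elapsed:.4f}s for a 200x20 frame")
```

A single `perf_counter` measurement is noisy; repeating the call and taking the minimum gives a steadier number.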

closes #36123

Test plan

Existing sparse extension tests pass (pandas/tests/extension/test_sparse.py)
Full groupby test suite passes (pandas/tests/groupby/)
New pandas/tests/groupby/test_sparse.py covers reductions (sum, mean, min, max, std, var, sem, prod, median), boolean ops (any, all), positional (first, last), transforms (cumsum, cummin, cummax, cumprod, rank), index-based (idxmin, idxmax), NaN fill_value handling, and Series groupby — all parametrized over fill_value in [0, NaN]
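
The general shape of such a check, comparing a sparse groupby result against the dense equivalent for each fill value, might look like the sketch below. It is not taken from the new test file: the helper name, the data, and the plain loop (standing in for `pytest.mark.parametrize`) are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def check_sparse_matches_dense(op, fill_value):
    """Hypothetical check: sparse and dense groupby results agree in value."""
    values = np.array([1.0, fill_value, 3.0, fill_value, 5.0, 2.0])
    keys = np.array([0, 0, 1, 1, 2, 2])
    sparse = pd.Series(pd.arrays.SparseArray(values, fill_value=fill_value))
    dense = pd.Series(values)

    got = getattr(sparse.groupby(keys), op)()
    expected = getattr(dense.groupby(keys), op)()

    # Compare values only; result dtypes may differ between the two paths.
    np.testing.assert_allclose(
        np.asarray(got, dtype="float64"),
        np.asarray(expected, dtype="float64"),
        equal_nan=True,
    )


# In a pytest suite these loops would be parametrize decorators.
for fill_value in [0, np.nan]:
    for op in ["sum", "min", "max"]:
        check_sparse_matches_dense(op, fill_value)
```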

🤖 Generated with Claude Code

Implement SparseArray._groupby_op to route groupby reductions and transformations through the fast Cython path instead of falling back to slow Python aggregation. This converts to dense before calling the Cython op, trading a small memory cost for ~185x speedup on the benchmark from the issue.

Also removes the SparseArray special-case for any/all in _cython_agg_general since the new _groupby_op handles them natively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

On 32-bit platforms (Linux-32-bit, Pyodide/wasm32), SparseDtype(int, ...) resolves to int32 while DataFrame int columns are int64, causing dtype mismatches. Use np.int64 explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
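
The difference between the two spellings can be seen with a short sketch. The sample values below are made up; only `pd.SparseDtype`, `pd.arrays.SparseArray`, and `np.int64` are real API.

```python
import numpy as np
import pandas as pd

# The builtin `int` maps to the platform's default NumPy integer,
# so the subtype here can differ between platforms.
platform_dtype = pd.SparseDtype(int, 0)

# Naming np.int64 pins the subtype regardless of platform.
explicit_dtype = pd.SparseDtype(np.int64, 0)

arr = pd.arrays.SparseArray([0, 1, 0, 2], dtype=explicit_dtype)

print("platform-dependent subtype:", platform_dtype.subtype)
print("explicit subtype:", arr.dtype.subtype)
```

Pinning the subtype in test fixtures keeps expected dtypes identical across 32-bit and 64-bit builds.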

jbrockmendel added the Performance (Memory or execution speed performance), Groupby, and Sparse (Sparse Data Type) labels Mar 22, 2026

jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-36123
