BUG: DataFrame.rank does not preserve ExtensionArray dtypes by weeknd415 · Pull Request #63987 · pandas-dev/pandas

weeknd415 · 2026-02-02T08:50:34Z

closes BUG: DataFrame.rank does not return EA types when original type was an EADtype #52829
Tests added and passed
All CI tests passed
Added entry in doc/source/whatsnew/ (follow the existing format)

Summary

DataFrame.rank() converts PyArrow-backed and nullable ExtensionArray columns to float64, while Series.rank() preserves the EA dtype. The cause is the internal ranker() function: for 2D data (DataFrames) it uses data.values, which goes through BlockManager.as_array(), materializes a single NumPy array, and strips all ExtensionArray type information.
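The dtype loss can be observed without touching pandas internals. This is a minimal sketch that assumes a nullable `Int64` column as a stand-in for the PyArrow column, so it runs without pyarrow installed. `Series.array` keeps the ExtensionArray. `DataFrame.values` always returns a plain NumPy `ndarray`, so the extension dtype is no longer attached to the data that the 2D branch of `ranker()` receives. The exact NumPy dtype of `df.values` can differ between pandas versions, so the sketch prints it rather than asserting it.

```python
import numpy as np
import pandas as pd

# Nullable Int64 column, used here as an assumed stand-in for an ArrowDtype column.
df = pd.DataFrame({"a": pd.array([3, 1, 2], dtype="Int64")})

# 1D path: the column's backing ExtensionArray keeps its extension dtype.
col_values = df["a"].array
# 2D path: DataFrame.values materializes one NumPy array for the whole frame.
frame_values = df.values

print(type(col_values).__name__, col_values.dtype)
print(type(frame_values).__name__, frame_values.dtype)
```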

Reproducer

```python
import pandas as pd
import pyarrow as pa

s = pd.Series([1, 2, 3], dtype=pd.ArrowDtype(pa.int32()))
df = s.to_frame(name="a")

print(s.rank(method="min").dtype)       # uint64[pyarrow] ✓
print(df.rank(method="min").dtypes)     # float64 ✗ (should be uint64[pyarrow])
```

Fix

Replace the ranker() closure with block-level processing via self._mgr.apply(), following the same pattern used by _accumulate(). This processes each block independently:

ExtensionArray blocks → dispatch to EA._rank(), preserving dtype
NumPy blocks → dispatch to algos.rank(), same as before
axis=1 (cross-column ranking) → falls back to NumPy conversion since ranking across columns requires a single array
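For `axis=0`, you can approximate the effect of the per-block dispatch with the public API. Rank each column through `Series.rank`; the 1D path already calls `EA._rank()` for extension columns and the NumPy ranking for NumPy columns. The sketch below is a hypothetical user-level helper: `rank_columnwise` is not a pandas function, and this is not the PR's `_mgr.apply()` implementation. It works per column rather than per block, and it does not handle `axis=1`, which in the fix still goes through NumPy.

```python
import pandas as pd


def rank_columnwise(
    df: pd.DataFrame,
    method: str = "average",
    ascending: bool = True,
    na_option: str = "keep",
    pct: bool = False,
) -> pd.DataFrame:
    """Hypothetical helper: axis=0 ranking done one column at a time.

    Each column is ranked with Series.rank, so it gets whatever result dtype
    the 1D path produces for it, instead of going through a single 2D array.
    """
    if df.shape[1] == 0:
        return df.copy()
    ranked = [
        df.iloc[:, i].rank(
            method=method, ascending=ascending, na_option=na_option, pct=pct
        )
        for i in range(df.shape[1])
    ]
    # Reassemble side by side; positional access (iloc) keeps duplicate labels safe.
    out = pd.concat(ranked, axis=1)
    out.columns = df.columns
    return out


df = pd.DataFrame({"a": [3, 1, 2], "b": pd.array([10, 30, 20], dtype="Int64")})
out = rank_columnwise(df, method="min")
print(out.dtypes)
```

Because each column goes through `Series.rank`, an `ArrowDtype(pa.int32())` column passed to this helper would get the same result dtype that `Series.rank(method="min")` produces. According to the reproducer, that dtype is `uint64[pyarrow]`.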

Tests Added

test_rank_ea_dtype_preservation — PyArrow int32/float64 columns across all 5 rank methods (average, min, max, first, dense)
test_rank_ea_dtype_preservation_nullable — Nullable Int64/Float64 columns with NA values
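The two test names above describe the invariant under test, but their bodies are not shown here. The helper below is hypothetical and written in the same spirit. For every rank method, each column of `DataFrame.rank()` should equal `Series.rank()` on that column, dtype included; `pd.testing.assert_series_equal` checks the dtype by default. On a pandas build without this fix, an `ArrowDtype(pa.int32())` column would fail the check: according to the reproducer, the frame gives `float64` and the Series gives `uint64[pyarrow]`. The usage below only exercises NumPy-backed columns, where both paths already agree.

```python
import pandas as pd

RANK_METHODS = ("average", "min", "max", "first", "dense")


def check_frame_rank_matches_series(df: pd.DataFrame) -> None:
    """Hypothetical check: DataFrame.rank (axis=0) must agree column by column
    with Series.rank, including the result dtype."""
    for method in RANK_METHODS:
        frame_ranked = df.rank(method=method)
        for i in range(df.shape[1]):
            expected = df.iloc[:, i].rank(method=method)
            # Compares values, index, name and dtype.
            pd.testing.assert_series_equal(frame_ranked.iloc[:, i], expected)


df = pd.DataFrame({"ints": [3, 1, 2, 1], "floats": [0.5, 0.25, 1.0, 0.75]})
check_frame_rank_matches_series(df)
```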

…ev#52829) DataFrame.rank() converted PyArrow-backed and nullable EA columns to float64 because the ranker() function called data.values for 2D data, which goes through BlockManager.as_array() and strips all ExtensionArray type information. Fix by using _mgr.apply() to process each block independently, dispatching to EA._rank() for ExtensionArrays and algos.rank() for numpy arrays. This follows the same pattern used by _accumulate(). For axis=1 (cross-column ranking), fall back to the numpy conversion path since ranking across columns requires a single array.

NumPy blocks in the BlockManager are stored transposed, as (n_cols, n_rows), relative to the user-facing DataFrame layout (n_rows, n_cols). Without transposing, algos.rank with axis=0 ranks along the wrong dimension, i.e. across columns instead of down each column. Follow the same transpose pattern used by _accumulate(). For 1D ExtensionArrays, .T is a no-op, so the fix is safe for both code paths.
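The transpose problem described in the second commit can be reproduced with NumPy alone. In this sketch, `ordinal_rank` is a hypothetical stand-in for `algos.rank`. It uses a double `argsort` with a stable sort, which matches the tie rule of `method="first"`. The arrays are illustrative: a 3-row, 2-column frame and its `(n_cols, n_rows)` block layout.

```python
import numpy as np


def ordinal_rank(values: np.ndarray, axis: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a ranking routine: 1-based ranks along
    ``axis``, ties broken by order of appearance (like method="first")."""
    order = np.argsort(values, axis=axis, kind="stable")
    return np.argsort(order, axis=axis, kind="stable") + 1


# User-facing layout (n_rows, n_cols): column a = [3, 1, 2], column b = [10, 30, 20].
frame_layout = np.array([[3, 10], [1, 30], [2, 20]])
# A NumPy block stores the transpose, (n_cols, n_rows): one block row per column.
block = frame_layout.T

# Wrong: axis=0 on the block compares a against b at each row position.
wrong = ordinal_rank(block, axis=0)
# Right: transpose to the frame layout, rank down each column, transpose back.
right = ordinal_rank(block.T, axis=0).T

print(wrong.tolist())  # [[1, 1, 1], [2, 2, 2]]
print(right.tolist())  # [[3, 1, 2], [1, 3, 2]]

# For a 1D array, .T is a no-op, so the same transpose is harmless there.
one_d = np.array([3, 1, 2])
assert np.array_equal(ordinal_rank(one_d.T), ordinal_rank(one_d))
```

The "right" result holds each column's own ranks in block layout. The "wrong" result only records which of the two columns is larger at each row.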

weeknd415 added 2 commits February 2, 2026 08:47

weeknd415 wants to merge 2 commits into pandas-dev:main from weeknd415:fix-dataframe-rank-ea-dtype-gh52829