
PERF: use PyArrow native cast for int/bool to string conversion#64762

Draft
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-56505


@jbrockmendel (Member)

Summary

  • Use PyArrow's native pc.cast() instead of element-wise lib.ensure_string_array() when casting integer and boolean dtypes to string[pyarrow] in ArrowStringArray._from_sequence
  • Applies to both numpy int/uint/bool arrays and BaseMaskedArray (IntegerArray/BooleanArray) code paths
  • ~10x speedup for integers, ~4x for booleans at the _from_sequence level

closes #56505

Test plan

  • Added tests for numpy int (multiple widths), bool, masked int, and masked bool → string conversion
  • Added end-to-end Series.astype("string") tests for int, bool, and nullable int
  • Existing string array and extension test suites pass

🤖 Generated with Claude Code

Use PyArrow's native pc.cast() instead of element-wise
lib.ensure_string_array() when casting integer and boolean dtypes
to string[pyarrow]. This applies to numpy int/uint/bool arrays and
BaseMaskedArray (IntegerArray/BooleanArray) paths in
ArrowStringArray._from_sequence.

closes pandas-dev#56505

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jbrockmendel added the Performance label (Memory or execution speed performance) on Mar 22, 2026

Labels

Performance (Memory or execution speed performance)


Development

Successfully merging this pull request may close these issues.

PERF: casting to the new String dtype could be faster by leveraging pyarrow

1 participant