PERF: pre-compute date/datetime column classification in SAS7BDAT reader #64768

Draft
jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-sas-column-classify

Conversation

@jbrockmendel
Member

Summary

  • Pre-computes the date/datetime classification of each column once, at the end of _parse_metadata, instead of testing membership in the 67+20 element tuples for every column of every chunk in _chunk_to_dataframe
  • ~3-7% improvement in _chunk_to_dataframe; the saving scales with columns × chunks
  • Updates the expected error message in test_corrupt_read, since column_count is now accessed before row_count
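
The idea in the first bullet can be sketched roughly as follows. This is a toy illustration, not the code in this PR: `ToyReader`, `parse_metadata`, `convert_chunk`, `column_kinds`, and the shortened `DATE_FORMATS` / `DATETIME_FORMATS` tuples are all hypothetical names chosen for the example, and the real reader's attributes and format lists may differ.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily shortened stand-ins for the reader's format tuples.
DATE_FORMATS = ("DATE", "MMDDYY", "YYMMDD")
DATETIME_FORMATS = ("DATETIME", "DATEAMPM")


@dataclass
class ToyReader:
    """Illustrative only; not the pandas SAS7BDAT reader."""

    column_formats: list
    column_kinds: list = field(default_factory=list)

    def parse_metadata(self) -> None:
        # Classify every column exactly once, after the formats are known.
        date_formats = frozenset(DATE_FORMATS)
        datetime_formats = frozenset(DATETIME_FORMATS)
        self.column_kinds = [
            "datetime" if fmt in datetime_formats
            else "date" if fmt in date_formats
            else "other"
            for fmt in self.column_formats
        ]

    def convert_chunk(self, chunk: dict) -> dict:
        # Per-chunk work only reads the precomputed kind; no tuple scans here.
        out = {}
        for idx, (name, values) in enumerate(chunk.items()):
            out[name] = (self.column_kinds[idx], values)
        return out


reader = ToyReader(column_formats=["DATE", "DATETIME", "BEST"])
reader.parse_metadata()
result = reader.convert_chunk({"d": [1], "ts": [2], "x": [3.0]})
```

In this sketch the classification cost is paid once per file rather than once per column per chunk, which is why any saving would be expected to grow with the number of columns and chunks.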

closes #47339

Test plan

  • pytest pandas/tests/io/sas/ — all 117 tests pass

🤖 Generated with Claude Code

@jbrockmendel added the Performance (Memory or execution speed performance) and IO SAS (SAS: read_sas) labels Mar 22, 2026
jbrockmendel and others added 2 commits March 21, 2026 19:47
…der (pandas-dev#47339)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jbrockmendel force-pushed the perf-sas-column-classify branch from f566a9b to 2978841 on March 22, 2026 02:47
Development

Successfully merging this pull request may close these issues.

Meta issue: SAS7BDAT parser improvements