Open
Labels: Master Tracker (High level tracker for similar issues)
Description
This seems like a good starting point: all the changes in pandas 3.0.0, taken from the release notes.
Not all of them will need to be addressed; feel free to mark them as done when they are.
Updated deprecation policy
- pandas.errors.Pandas4Warning: Warnings which will be enforced in pandas 4.0.
- pandas.errors.Pandas5Warning: Warnings which will be enforced in pandas 5.0.
- pandas.errors.PandasPendingDeprecationWarning: Base class of all warnings which emit a PendingDeprecationWarning, regardless of the version in which they will be enforced.
- pandas.errors.PandasDeprecationWarning: Base class of all warnings which emit a DeprecationWarning, regardless of the version in which they will be enforced.
- pandas.errors.PandasFutureWarning: Base class of all warnings which emit a FutureWarning, regardless of the version in which they will be enforced.
Other enhancements
I/O:
- errors.DtypeWarning improved to include column names when mixed data types are detected (GH 58174)
- DataFrame.to_excel() argument merge_cells now accepts a value of "columns" to only merge MultiIndex column header cells (GH 35384)
- DataFrame.to_excel() has a new autofilter parameter to add automatic filters to all columns (GH 61194)
- DataFrame.to_excel() now raises a UserWarning when the character count in a cell exceeds Excel’s limitation of 32767 characters (GH 56954)
- read_parquet() accepts to_pandas_kwargs, which are forwarded to pyarrow.Table.to_pandas(); this enables passing additional keywords to customize the conversion to pandas, such as maps_as_pydicts to read the Parquet map data type as Python dictionaries (GH 56842)
- read_spss() now supports passing kwargs to pyreadstat (GH 56356)
- read_stata() now returns datetime64 resolutions better matching those natively stored in the Stata format (GH 55642)
- DataFrame.to_csv() and Series.to_csv() now support f-strings (e.g., "{:.6f}") for the float_format parameter, in addition to the % format strings and callables (GH 49580)
- DataFrame.to_json() now encodes Decimal as strings instead of floats (GH 60698)
- Added "delete_rows" option to the if_exists argument of DataFrame.to_sql(), which deletes all records of the table before inserting data (GH 37210)
- Added support for reading from and writing to Apache Iceberg tables with the new read_iceberg() and DataFrame.to_iceberg() functions (GH 61383)
- Errors occurring during SQL I/O will now throw a generic DatabaseError instead of the raw Exception type from the underlying driver manager library (GH 60748)
- Restore support for reading Stata 104-format and enable reading 103-format dta files (GH 58554)
- Support reading Stata 102-format (Stata 1) dta files (GH 58978)
- Support reading Stata 110-format (Stata 7) dta files (GH 47176)
- Support reading value labels from Stata 108-format (Stata 6) and earlier files (GH 58154)
Groupby/resample/rolling:
- pandas.NamedAgg now supports passing *args and **kwargs to calls of aggfunc (GH 58283)
- DataFrameGroupBy and SeriesGroupBy methods sum, mean, median, prod, min, max, std, var and sem now accept skipna parameter (GH 15675)
- DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.agg(), SeriesGroupBy.agg(), RollingGroupby.apply(), ExpandingGroupby.apply(), Rolling.apply(), Expanding.apply(), DataFrame.apply() with engine="numba" now support positional arguments passed as kwargs (GH 58995)
- DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.agg(), SeriesGroupBy.agg(), SeriesGroupBy.apply(), DataFrameGroupBy.apply() now support kurt (GH 40139)
- Rolling.aggregate(), Expanding.aggregate() and ExponentialMovingWindow.aggregate() now accept NamedAgg aggregations through **kwargs (GH 28333)
- Added Rolling.first(), Rolling.last(), Expanding.first(), and Expanding.last() (GH 33155)
- Added Rolling.nunique() and Expanding.nunique() (GH 26958)
- Added Rolling.pipe() and Expanding.pipe() (GH 57076)
Reshaping:
- pandas.merge() propagates the attrs attribute to the result if all inputs have identical attrs, as was already the case for pandas.concat().
- pandas.merge() now validates the how parameter input (merge type) (GH 59435)
- pandas.merge(), DataFrame.merge() and DataFrame.join() now support anti joins (left_anti and right_anti) in the how parameter (GH 42916)
- DataFrame.pivot_table() and pivot_table() now allow the passing of keyword arguments to aggfunc through **kwargs (GH 57884)
- pandas.concat() will raise a ValueError when ignore_index=True and keys is not None (GH 59274)
- Improved error reporting by outputting the first few duplicates when merge() validation fails (GH 62742)
Missing:
- DataFrame.fillna() and Series.fillna() can now accept value=None; for non-object dtype the corresponding NA value will be used (GH 57723)
- Added support for axis=1 with dict or Series arguments in DataFrame.fillna() (GH 4514)
Numeric:
- DataFrame.agg() called with axis=1 and a func which relabels the result index now raises a NotImplementedError (GH 58807).
- DataFrame.corrwith() now accepts min_periods as an optional argument, as in DataFrame.corr() and Series.corr() (GH 9490)
- DataFrame.cummin(), DataFrame.cummax(), DataFrame.cumprod() and DataFrame.cumsum() methods now have a numeric_only parameter (GH 53072)
- DataFrame.ewm() now allows adjust=False when times is provided (GH 54328)
- Series.cummin() and Series.cummax() now support CategoricalDtype (GH 52335)
- Series.map() can now accept kwargs to pass on to func (GH 59814)
- Series.nlargest() uses a stable sort internally and will preserve the original ordering in the case of equality (GH 55767)
- Series.round() now supports object dtypes when the underlying Python objects support the built-in round() (GH 63444)
- Support passing an Iterable[Hashable] input to DataFrame.drop_duplicates() (GH 59237)
Strings:
- Series.str.get_dummies() now accepts a dtype parameter to specify the dtype of the resulting DataFrame (GH 47872)
- Added Series.str.isascii() (GH 59091)
- Allow dictionaries to be passed to Series.str.replace() via the pat parameter (GH 51748)
Datetimelike:
- Easter has gained a new constructor argument method, which specifies the method used to calculate Easter, for example Orthodox Easter (GH 61665)
- Holiday constructor argument days_of_week will raise a ValueError when its type is something other than None or tuple (GH 61658)
- Holiday has gained the constructor argument and field exclude_dates to exclude specific datetimes from a custom holiday calendar (GH 54382)
- Added half-year offset classes HalfYearBegin, HalfYearEnd, BHalfYearBegin and BHalfYearEnd (GH 60928)
- Improved deprecation message for offset aliases (GH 60820)
- Multiplying two DateOffset objects will now raise a TypeError instead of a RecursionError (GH 59442)
Indexing:
- DataFrame.iloc() and Series.iloc() now support boolean masks in getitem for more consistent indexing behavior (GH 60994)
- Index.get_loc() now also accepts subclasses of tuple as keys (GH 57922)
Styler / output formatting:
- Styler.set_tooltips() provides an alternative method of storing tooltips by using the title attribute of td elements (GH 56981)
- Added Styler.to_typst() to write Styler objects to file, buffer or string in Typst format (GH 57617)
- Styler.format_index_names() can now be used to format the index and column names (GH 48936 and GH 47489)
- frozenset elements in pandas objects are now natively printed (GH 60690)
Typing:
- pandas.api.typing.FrozenList is available for typing the outputs of MultiIndex.names, MultiIndex.codes and MultiIndex.levels (GH 58237)
- pandas.api.typing.NoDefault is available for typing no_default (GH 60696)
- pandas.api.typing.SASReader is available for typing the output of read_sas() (GH 55689)
- Many type aliases are now exposed in the new submodule pandas.api.typing.aliases (GH 55231)
Plotting:
- Series.plot() now correctly handles the ylabel parameter for pie charts, allowing for explicit control over the y-axis label (GH 58239)
- Added missing parameter weights in DataFrame.plot.kde() for the estimation of the PDF (GH 59337)
- DataFrame.plot.scatter() argument c now accepts a column of strings, where rows with the same string are colored identically (GH 16827 and GH 16485)
ExtensionArray:
- ArrowDtype now supports pyarrow.JsonType (GH 60958)
- Series.rank() and DataFrame.rank() with numpy-nullable dtypes preserve NA values and return UInt64 dtype where appropriate instead of casting NA to NaN with float64 dtype (GH 62043)
- Improve the resulting dtypes in DataFrame.where() and DataFrame.mask() with ExtensionDtype other (GH 62038)
Other:
- set_option() now accepts a dictionary of options, simplifying configuration of multiple settings at once (GH 61093)
- DataFrame.apply() supports using third-party execution engines like the Bodo.ai JIT compiler (GH 60668)
- Series.map() now accepts an engine parameter to allow execution with a third-party execution engine (GH 61125)
- Support passing a Series input to json_normalize() that retains the Index (GH 51452)
- Users can globally disable any PerformanceWarning by setting the option mode.performance_warnings to False (GH 56920)
Packaging:
- Switched wheel upload to PyPI Trusted Publishing (OIDC) for release-tag pushes in wheels.yml. (GH 61718)
- Wheels are now available for Windows ARM64 architecture (GH 61462)
- Wheels are now available for free-threading Python builds on Windows (in addition to the other platforms) (GH 61463)
Other API changes
- Third-party py.path objects are no longer explicitly supported in IO methods; use pathlib.Path objects instead (GH 57091)
- read_table()’s parse_dates argument defaults to None to improve consistency with read_csv() (GH 57476)
- Period.to_timestamp() and PeriodIndex.to_timestamp() now give microsecond-unit objects when possible, and nanosecond-unit objects in other cases. This affects the actual value of Period.end_time() and PeriodIndex.end_time() (GH 56164)
- All classes inheriting from builtin tuple (including types created with collections.namedtuple()) are now hashed and compared as builtin tuple during indexing operations (GH 57922)
- Made dtype a required argument in ExtensionArray._from_sequence_of_strings() (GH 56519)
- Passing a Series input to json_normalize() will now retain the Series Index; previously, the output had a new RangeIndex (GH 51452)
- Pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (GH 57387)
- Pickled objects from pandas version less than 1.0.0 are no longer supported (GH 57155)
- Removed Index.sort(), which always raised a TypeError; accessing the attribute now raises an AttributeError (GH 59283)
- Unused dtype argument has been removed from the MultiIndex constructor (GH 60962)
- Updated DataFrame.to_excel() so that the output spreadsheet has no styling. Custom styling can still be done using Styler.to_excel() (GH 54154)
- When comparing the indexes in testing.assert_series_equal(), check_exact defaults to True if an Index is of integer dtype. (GH 57386)
- Index set operations (like union or intersection) will now ignore the dtype of an empty RangeIndex or empty Index with object dtype when determining the dtype of the resulting Index (GH 60797)
- IncompatibleFrequency now subclasses TypeError instead of ValueError. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (GH 55782)
- Series “flex” methods like Series.add() no longer allow passing a DataFrame for other; use the corresponding reversed DataFrame method (e.g. DataFrame.radd()) instead (GH 46179)
- date_range() and timedelta_range() no longer default to unit="ns"; instead, they infer a unit from the start, end, and freq parameters. Explicitly specify a desired unit to override this (GH 59031)
- CategoricalIndex.append() no longer attempts to cast different-dtype indexes to the caller’s dtype (GH 41626)
- ExtensionDtype.construct_array_type() is now a regular method instead of a classmethod (GH 58663)
- Arithmetic operations between a Series, Index, or ExtensionArray with a list now consistently wrap that list with an array equivalent to Series(my_list).array. To do any other kind of type inference or casting, do so explicitly before operating (GH 62552)
- Comparison operations between Index and Series now consistently return Series regardless of which object is on the left or right (GH 36759)
- NumPy functions like np.isinf that return a bool dtype when called on an Index object now return a bool-dtype Index instead of np.ndarray (GH 52676)
- Methods that can operate in-place (replace(), fillna(), ffill(), bfill(), interpolate(), where(), mask(), clip()) now return the modified DataFrame or Series (self) instead of None when inplace=True (GH 63207)
- All Index constructors now copy numpy.ndarray and ExtensionArray inputs by default when copy=None, consistent with Series behavior (GH 63388)