Skip to content

Commit f553ed7

Browse files
authored
Merge branch 'main' into add-df-sample-weight-constraints
2 parents dc0aff6 + 7817cb2 commit f553ed7

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

57 files changed

+531
-529
lines changed

.github/workflows/unit-tests.yml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -140,9 +140,6 @@ jobs:
140140

141141
moto:
142142
image: motoserver/moto:5.0.27
143-
env:
144-
AWS_ACCESS_KEY_ID: foobar_key
145-
AWS_SECRET_ACCESS_KEY: foobar_secret
146143
ports:
147144
- 5000:5000
148145

AUTHORS.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,12 +7,12 @@ About the Copyright Holders
77
led by Wes McKinney. AQR released the source under this license in 2009.
88
* Copyright (c) 2011-2012, Lambda Foundry, Inc.
99

10-
Wes is now an employee of Lambda Foundry, and remains the pandas project
10+
Wes became an employee of Lambda Foundry, and remained the pandas project
1111
lead.
1212
* Copyright (c) 2011-2012, PyData Development Team
1313

1414
The PyData Development Team is the collection of developers of the PyData
15-
project. This includes all of the PyData sub-projects, including pandas. The
15+
project. This includes all of the PyData sub-projects, such as pandas. The
1616
core team that coordinates development on GitHub can be found here:
1717
https://github.com/pydata.
1818

@@ -23,11 +23,11 @@ Our Copyright Policy
2323

2424
PyData uses a shared copyright model. Each contributor maintains copyright
2525
over their contributions to PyData. However, it is important to note that
26-
these contributions are typically only changes to the repositories. Thus,
26+
these contributions are typically limited to changes to the repositories. Thus,
2727
the PyData source code, in its entirety, is not the copyright of any single
2828
person or institution. Instead, it is the collective copyright of the
2929
entire PyData Development Team. If individual contributors want to maintain
30-
a record of what changes/contributions they have specific copyright on,
30+
a record of the specific changes or contributions they hold copyright to,
3131
they should indicate their copyright in the commit message of the change
3232
when they commit the change to one of the PyData repositories.
3333

@@ -50,7 +50,7 @@ Other licenses can be found in the LICENSES directory.
5050
License
5151
=======
5252

53-
pandas is distributed under a 3-clause ("Simplified" or "New") BSD
53+
pandas is distributed under the 3-clause ("Simplified" or "New") BSD
5454
license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
55-
BSD-compatible licenses, are included. Their licenses follow the pandas
55+
BSD-compatible licenses, are included. Their licenses are compatible with the pandas
5656
license.

ci/deps/actions-311-downstream_compat.yaml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,8 @@ dependencies:
5050
- pytz>=2023.4
5151
- pyxlsb>=1.0.10
5252
- s3fs>=2023.12.2
53-
- scipy>=1.12.0
53+
# TEMP upper pin for scipy (https://github.com/statsmodels/statsmodels/issues/9584)
54+
- scipy>=1.12.0,<1.16
5455
- sqlalchemy>=2.0.0
5556
- tabulate>=0.9.0
5657
- xarray>=2024.1.1

doc/source/reference/indexing.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -98,6 +98,7 @@ Conversion
9898
:toctree: api/
9999

100100
Index.astype
101+
Index.infer_objects
101102
Index.item
102103
Index.map
103104
Index.ravel

doc/source/whatsnew/v2.3.0.rst

Lines changed: 0 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -31,39 +31,6 @@ Other enhancements
3131
- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for :class:`StringDtype` columns (:issue:`60633`)
3232
- The :meth:`~Series.sum` reduction is now implemented for :class:`StringDtype` columns (:issue:`59853`)
3333

34-
.. ---------------------------------------------------------------------------
35-
.. _whatsnew_230.notable_bug_fixes:
36-
37-
Notable bug fixes
38-
~~~~~~~~~~~~~~~~~
39-
40-
These are bug fixes that might have notable behavior changes.
41-
42-
.. _whatsnew_230.notable_bug_fixes.string_comparisons:
43-
44-
Comparisons between different string dtypes
45-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
46-
47-
In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
48-
49-
object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
50-
51-
in determining the result dtype when there are different string dtypes compared. Some examples:
52-
53-
- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
54-
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
55-
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
56-
57-
.. _whatsnew_230.api_changes:
58-
59-
API changes
60-
~~~~~~~~~~~
61-
62-
- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
63-
union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
64-
empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
65-
Index (:issue:`60797`)
66-
6734
.. ---------------------------------------------------------------------------
6835
.. _whatsnew_230.deprecations:
6936

@@ -85,8 +52,6 @@ Numeric
8552

8653
Strings
8754
^^^^^^^
88-
- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
89-
- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
9055
- Bug in :meth:`Series.__pos__` and :meth:`DataFrame.__pos__` where an ``Exception`` was not raised for :class:`StringDtype` with ``storage="pyarrow"`` (:issue:`60710`)
9156
- Bug in :meth:`Series.rank` for :class:`StringDtype` with ``storage="pyarrow"`` that incorrectly returned integer results with ``method="average"`` and raised an error if it would truncate results (:issue:`59768`)
9257
- Bug in :meth:`Series.replace` with :class:`StringDtype` when replacing with a non-string value was not upcasting to ``object`` dtype (:issue:`60282`)

doc/source/whatsnew/v2.3.1.rst

Lines changed: 51 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,11 +9,57 @@ including other versions of pandas.
99
{{ header }}
1010

1111
.. ---------------------------------------------------------------------------
12-
.. _whatsnew_231.enhancements:
12+
.. _whatsnew_231.string_fixes:
13+
14+
Improvements and fixes for the StringDtype
15+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16+
17+
.. _whatsnew_231.string_fixes.string_comparisons:
18+
19+
Comparisons between different string dtypes
20+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21+
22+
In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
23+
24+
object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
25+
26+
in determining the result dtype when there are different string dtypes compared. Some examples:
27+
28+
- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
29+
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
30+
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
31+
32+
.. _whatsnew_231.string_fixes.ignore_empty:
33+
34+
Index set operations ignore empty RangeIndex and object dtype Index
35+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
36+
37+
When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
38+
union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
39+
empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
40+
Index (:issue:`60797`).
41+
42+
This ensures that combining such empty Index with strings will infer the string dtype
43+
correctly, rather than defaulting to ``object`` dtype. For example:
44+
45+
.. code-block:: python
46+
47+
>>> pd.options.mode.infer_string = True
48+
>>> df = pd.DataFrame()
49+
>>> df.columns.dtype
50+
dtype('int64') # default RangeIndex for empty columns
51+
>>> df["a"] = [1, 2, 3]
52+
>>> df.columns.dtype
53+
<StringDtype(na_value=nan)> # new columns use string dtype instead of object dtype
54+
55+
.. _whatsnew_231.string_fixes.bugs:
56+
57+
Bug fixes
58+
^^^^^^^^^
59+
- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
60+
- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
61+
- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
1362

14-
Enhancements
15-
~~~~~~~~~~~~
16-
-
1763

1864
.. _whatsnew_231.regressions:
1965

@@ -26,7 +72,7 @@ Fixed regressions
2672

2773
Bug fixes
2874
~~~~~~~~~
29-
- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
75+
-
3076

3177
.. ---------------------------------------------------------------------------
3278
.. _whatsnew_231.other:

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,9 @@ Enhancement2
2828

2929
Other enhancements
3030
^^^^^^^^^^^^^^^^^^
31+
- :func:`pandas.merge` propagates the ``attrs`` attribute to the result if all
32+
inputs have identical ``attrs``, as has so far already been the case for
33+
:func:`pandas.concat`.
3134
- :class:`pandas.api.typing.FrozenList` is available for typing the outputs of :attr:`MultiIndex.names`, :attr:`MultiIndex.codes` and :attr:`MultiIndex.levels` (:issue:`58237`)
3235
- :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
3336
- Added :meth:`.Styler.to_typst` to write Styler objects to file, buffer or string in Typst format (:issue:`57617`)
@@ -745,9 +748,11 @@ Indexing
745748
- Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`)
746749
- Bug in :meth:`DataFrame.__getitem__` when slicing a :class:`DataFrame` with many rows raised an ``OverflowError`` (:issue:`59531`)
747750
- Bug in :meth:`DataFrame.from_records` throwing a ``ValueError`` when passed an empty list in ``index`` (:issue:`58594`)
751+
- Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` returning incorrect dtype when selecting from a :class:`DataFrame` with mixed data types. (:issue:`60600`)
748752
- Bug in :meth:`DataFrame.loc` with inconsistent behavior of loc-set with 2 given indexes to Series (:issue:`59933`)
749753
- Bug in :meth:`Index.get_indexer` and similar methods when ``NaN`` is located at or after position 128 (:issue:`58924`)
750754
- Bug in :meth:`MultiIndex.insert` when a new value inserted to a datetime-like level gets cast to ``NaT`` and fails indexing (:issue:`60388`)
755+
- Bug in :meth:`Series.__setitem__` when assigning boolean series with boolean indexer will raise ``LossySetitemError`` (:issue:`57338`)
751756
- Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)
752757
- Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)
753758

@@ -777,6 +782,7 @@ I/O
777782
- Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)
778783
- Bug in :meth:`DataFrame.to_excel` where the :class:`MultiIndex` index with a period level was not a date (:issue:`60099`)
779784
- Bug in :meth:`DataFrame.to_stata` when exporting a column containing both long strings (Stata strL) and :class:`pd.NA` values (:issue:`23633`)
785+
- Bug in :meth:`DataFrame.to_stata` when input encoded length and normal length are mismatched (:issue:`61583`)
780786
- Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
781787
- Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
782788
- Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)

environment.yml

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -64,9 +64,8 @@ dependencies:
6464
- dask-core
6565
- seaborn-base
6666

67-
# local testing dependencies
67+
# Mocking s3 tests
6868
- moto
69-
- flask
7069

7170
# benchmarks
7271
- asv>=0.6.1

pandas/_libs/src/datetime/pd_datetime.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -192,6 +192,10 @@ static npy_datetime PyDateTimeToEpoch(PyObject *dt, NPY_DATETIMEUNIT base) {
192192
return npy_dt;
193193
}
194194

195+
/* Initializes and exposes a customer datetime C-API from the pandas library
196+
* by creating a PyCapsule that stores function pointers, which can be accessed
197+
* later by other C code or Cython code that imports the capsule.
198+
*/
195199
static int pandas_datetime_exec(PyObject *Py_UNUSED(module)) {
196200
PyDateTime_IMPORT;
197201
PandasDateTime_CAPI *capi = PyMem_Malloc(sizeof(PandasDateTime_CAPI));

pandas/compat/_optional.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -152,8 +152,8 @@ def import_optional_dependency(
152152
install_name = package_name if package_name is not None else name
153153

154154
msg = (
155-
f"Missing optional dependency '{install_name}'. {extra} "
156-
f"Use pip or conda to install {install_name}."
155+
f"`Import {install_name}` failed. {extra} "
156+
f"Use pip or conda to install the {install_name} package."
157157
)
158158
try:
159159
module = importlib.import_module(name)

0 commit comments

Comments
 (0)