month_delta return incorrect value if passed a date object #84

Open
pounde opened this issue Oct 5, 2024 · 7 comments

Comments

@pounde

pounde commented Oct 5, 2024

I am attempting to replicate the functionality of relativedelta. month_delta meets my current purposes, but it returns incorrect values if passed a date object.

This produced incorrect results. I would expect either correct values or a ValueError:

    df_ = df.filter(pl.col("date") <= end_date).with_columns(
        yr_for_calc=xdt.month_delta(pl.col("date"), end_date) / 12
    )

The following was my workaround:

    df_ = (
        df.filter(pl.col("date") <= end_date)
        .with_columns(end_date=end_date)
        .with_columns(
            yr_for_calc=xdt.month_delta(pl.col("date"), pl.col("end_date")) / 12
        )
    ).drop("end_date")
@MarcoGorelli
Collaborator

thanks @pounde for the report

could you include a reproducible example please?

If I run

# ruff: noqa
from datetime import date, datetime
import polars as pl
import polars_xdt as xdt
df = pl.DataFrame(
    {
        "start_date": [
            date(2024, 3, 1),
            date(2024, 3, 31),
            date(2022, 2, 28),
            date(2023, 1, 31),
            date(2019, 12, 31),
        ],
        "end_date": [
            date(2023, 2, 28),
            date(2023, 2, 28),
            date(2023, 2, 28),
            date(2023, 1, 31),
            date(2023, 1, 1),
        ],
    },
)
print(df.with_columns(
    xdt.month_delta("start_date", date(2023, 2, 28)).alias("month_delta")
))
print(df.with_columns(
    xdt.month_delta("start_date", datetime(2023, 2, 28)).alias("month_delta")
))

then I do indeed get an error

polars.exceptions.ComputeError: the plugin failed with message: polars_xdt.month_delta only works on Date type. Please cast to Date first.

@pounde changed the title from "month_delta return incorrect value if passed a datetime" to "month_delta return incorrect value if passed a date object" on Oct 10, 2024
@pounde
Author

pounde commented Oct 10, 2024

Apologies for not doing that from the start. I can't seem to replicate my code, nor can I replicate the code you posted. It is also worth noting that the object passed in the original issue was a 'date' object, not 'datetime'. I have updated the title to reflect that.
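
On the date versus datetime distinction: in Python, `datetime` is a subclass of `date`, so a plain `isinstance(x, date)` check accepts both. A minimal sketch of a stricter check follows. `ensure_plain_date` is a hypothetical helper for illustration only and is not part of polars_xdt.

```python
from datetime import date, datetime


def ensure_plain_date(value: object) -> date:
    """Return value unchanged if it is a date but not a datetime; raise otherwise."""
    # datetime subclasses date, so it has to be ruled out explicitly.
    if isinstance(value, datetime) or not isinstance(value, date):
        raise ValueError(f"expected datetime.date, got {type(value).__name__}")
    return value


print(isinstance(datetime(2023, 2, 28), date))  # a datetime also counts as a date
print(ensure_plain_date(date(2023, 2, 28)))
```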

While attempting to run the code you provided, I get:

InvalidOperationError: Series month_delta, length 1 doesn't match the DataFrame height of 5

If you want expression: col("start_date")./usr/local/lib/python3.10/site-packages/polars_xdt/_internal.abi3.so:month_delta([2023-02-28]) to be broadcasted, ensure it is a scalar (for instance by adding '.first()').

Not sure why I didn't get this before, nor why it can't be broadcast.

@MarcoGorelli
Collaborator

"While attempting to run the code you provided, I get:"

are you sure you got that while running the code I posted? could you post the full traceback please?

@pounde
Author

pounde commented Nov 18, 2024

Apologies for the long tail. I created a fresh env:

  • devcontainer
  • Python 3.11.10
  • Polars 1.14.0
  • Polars-xdt 0.16.0

I still get the shape error. The stacktrace is:

{
	"name": "InvalidOperationError",
	"message": "Series month_delta, length 1 doesn't match the DataFrame height of 5

If you want expression: col(\"start_date\")./usr/local/lib/python3.11/site-packages/polars_xdt/_internal.abi3.so:month_delta([2023-02-28]) to be broadcasted, ensure it is a scalar (for instance by adding '.first()').",
	"stack": "---------------------------------------------------------------------------
InvalidOperationError                     Traceback (most recent call last)
Cell In[18], line 23
      4 import polars_xdt as xdt
      5 df = pl.DataFrame(
      6     {
      7         \"start_date\": [
   (...)
     21     },
     22 )
---> 23 print(df.with_columns(
     24     xdt.month_delta(\"start_date\", date(2023, 2, 28)).alias(\"month_delta\")
     25 ))
     26 print(df.with_columns(
     27     xdt.month_delta(\"start_date\", datetime(2023, 2, 28)).alias(\"month_delta\")
     28 ))

File /usr/local/lib/python3.11/site-packages/polars/dataframe/frame.py:9202, in DataFrame.with_columns(self, *exprs, **named_exprs)
   9056 def with_columns(
   9057     self,
   9058     *exprs: IntoExpr | Iterable[IntoExpr],
   9059     **named_exprs: IntoExpr,
   9060 ) -> DataFrame:
   9061     \"\"\"
   9062     Add columns to this DataFrame.
   9063 
   (...)
   9200     └─────┴──────┴─────────────┘
   9201     \"\"\"
-> 9202     return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)

File /usr/local/lib/python3.11/site-packages/polars/lazyframe/frame.py:2029, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, streaming, engine, background, _eager, **_kwargs)
   2027 # Only for testing purposes
   2028 callback = _kwargs.get(\"post_opt_callback\", callback)
-> 2029 return wrap_df(ldf.collect(callback))

InvalidOperationError: Series month_delta, length 1 doesn't match the DataFrame height of 5

If you want expression: col(\"start_date\")./usr/local/lib/python3.11/site-packages/polars_xdt/_internal.abi3.so:month_delta([2023-02-28]) to be broadcasted, ensure it is a scalar (for instance by adding '.first()')."
}

Thus far, I'm unable to replicate the issue I had so it may well have been user error. Let me know if there is more I can provide to help.

@MarcoGorelli
Collaborator

Thanks

Sure, that should probably broadcast - @akmalsoliev fancy taking this on, as you'd introduced the feature?

@MarcoGorelli
Collaborator

So, the issue is:

from datetime import date, datetime
import polars as pl
import polars_xdt as xdt
df = pl.DataFrame(
    {
        "start_date": [
            date(2024, 3, 1),
            date(2024, 3, 31),
            date(2022, 2, 28),
            date(2023, 1, 31),
            date(2019, 12, 31),
        ],
        "end_date": [
            date(2023, 2, 28),
            date(2023, 2, 28),
            date(2023, 2, 28),
            date(2023, 1, 31),
            date(2023, 1, 1),
        ],
    },
)
print(df.with_columns(
    xdt.month_delta("start_date", date(2023, 2, 28)).alias("month_delta")
))

outputs:

InvalidOperationError: Series month_delta, length 1 doesn't match the DataFrame height of 5

If you want expression: col("start_date")./home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/polars_xdt/_internal.abi3.so:month_delta([2023-02-28]) to be broadcasted, ensure it is a scalar (for instance by adding '.first()').

@akmalsoliev
Contributor


Hey, interesting issue. This shouldn't happen: either an error should be raised saying that date(2023, 2, 28) is not a Series, or, if scalars are to be supported, the value should be converted into a Series. I don't know if we should go down that path, but I do see utility in it.

pub(crate) fn impl_month_delta(start_dates: &Series, end_dates: &Series) -> PolarsResult<Series> {

This works fine:

from datetime import date

import polars as pl

import polars_xdt as xdt

df = pl.DataFrame(
    {
        "start_date": [
            date(2024, 3, 1),
            date(2024, 3, 31),
            date(2022, 2, 28),
            date(2023, 1, 31),
            date(2019, 12, 31),
        ],
        "end_date": [
            date(2023, 2, 28),
            date(2023, 2, 28),
            date(2023, 2, 28),
            date(2023, 1, 31),
            date(2023, 1, 1),
        ],
    },
)

df = df.with_columns(test=date(2023, 2, 28)).with_columns(
    result=xdt.month_delta("start_date", "test")
)

print(df)

out:

shape: (5, 4)
┌────────────┬────────────┬────────────┬────────┐
│ start_date ┆ end_date   ┆ test       ┆ result │
│ ---        ┆ ---        ┆ ---        ┆ ---    │
│ date       ┆ date       ┆ date       ┆ i32    │
╞════════════╪════════════╪════════════╪════════╡
│ 2024-03-01 ┆ 2023-02-28 ┆ 2023-02-28 ┆ -12    │
│ 2024-03-31 ┆ 2023-02-28 ┆ 2023-02-28 ┆ -13    │
│ 2022-02-28 ┆ 2023-02-28 ┆ 2023-02-28 ┆ 12     │
│ 2023-01-31 ┆ 2023-01-31 ┆ 2023-02-28 ┆ 1      │
│ 2019-12-31 ┆ 2023-01-01 ┆ 2023-02-28 ┆ 38     │
└────────────┴────────────┴────────────┴────────┘
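
The result column above is consistent with relativedelta-style month counting, where the day is clamped to the length of the target month. A minimal pure-Python sketch of that rule, useful for cross-checking values without Polars, might look like the following. `add_months` and `month_delta_sketch` are hypothetical names, and this is not the polars_xdt implementation, only a reading of the semantics implied by the output above.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by a number of months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def month_delta_sketch(start: date, end: date) -> int:
    """Whole months from start to end (negative if end precedes start)."""
    n = (end.year - start.year) * 12 + (end.month - start.month)
    # Step back toward zero if shifting by n months overshoots the end date.
    if n > 0 and add_months(start, n) > end:
        n -= 1
    elif n < 0 and add_months(start, n) < end:
        n += 1
    return n


# Same start dates and scalar end date as in the table above.
end = date(2023, 2, 28)
starts = [
    date(2024, 3, 1),
    date(2024, 3, 31),
    date(2022, 2, 28),
    date(2023, 1, 31),
    date(2019, 12, 31),
]
print([month_delta_sketch(s, end) for s in starts])
```

For these five rows the sketch gives -12, -13, 12, 1 and 38, matching the result column.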
