feat: improve error message when getting an invalid .data #339

Open: wants to merge 2 commits into main

maelle (Collaborator) commented Nov 14, 2024

Fix #127

Not docs, but this was low-hanging fruit to warm up with. 😉

@maelle maelle requested a review from krlmlr November 14, 2024 13:44
@@ -33,7 +33,10 @@ as_duckplyr_df <- function(.data) {
   }
 
   if (!identical(class(.data), "data.frame") && !identical(class(.data), c("tbl_df", "tbl", "data.frame"))) {
-    cli::cli_abort("Must pass a plain data frame or a tibble to `as_duckplyr_df()`.")
+    cli::cli_abort(c(
+      "Must pass a plain data frame or a tibble to `as_duckplyr_df()`, not {.obj_type_friendly { .data}}.",
Collaborator Author

The space before .data is required. Without it, I got this error from cli:

Error in "fun(..., .envir = .envir)" : 
  ! Invalid cli literal: `{.data}` starts with a dot.
  Interpreted literals must not start with a dot in cli >= 3.4.0.
  `{}` expressions starting with a dot are now only used for cli styles.
  To avoid this error, put a space character after the starting `{` or use parentheses: `{(.data)}`.
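The two workarounds named in that error can be tried in isolation. This is a minimal sketch, assuming the cli and rlang packages are installed; `check_plain_df()` is a hypothetical stand-in for the class check in `as_duckplyr_df()`, not duckplyr's actual code. It uses the parenthesised form `{(.data)}` that the cli message suggests; the PR's `{ .data}` with a leading space is the other suggested form. The code comment records why the odd-looking syntax must stay, as asked for in review.

```r
# Hypothetical stand-in for the class check in as_duckplyr_df().
check_plain_df <- function(.data) {
  ok <- identical(class(.data), "data.frame") ||
    identical(class(.data), c("tbl_df", "tbl", "data.frame"))
  if (!ok) {
    # Do not simplify to `{.data}`: since cli 3.4.0, `{` followed by a dot
    # starts a cli style, so `.data` would be read as a style name and cli
    # would error. `{(.data)}` or `{ .data}` both avoid this.
    cli::cli_abort(
      "Must pass a plain data frame or a tibble, not {.obj_type_friendly {(.data)}}."
    )
  }
  invisible(.data)
}
```

With this definition, `check_plain_df(mtcars)` returns its input invisibly, while `check_plain_df(1:3)` should fail with the "Must pass a plain data frame or a tibble" message followed by a readable description of the object's type, instead of cli's literal error.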

Member

This should then be a comment, and/or a test.

Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if cf2b613 is merged into main:

  • ✔️001_tpch_01: 24.6ms -> 24.5ms [-4.31%, +3.47%]
  • ✔️001_tpch_02: 128ms -> 128ms [-1.34%, +0.92%]
  • ✔️001_tpch_03: 68.4ms -> 68.5ms [-1.7%, +2.16%]
  • ✔️001_tpch_04: 24.8ms -> 24.5ms [-2.85%, +0.5%]
  • ✔️001_tpch_05: 123ms -> 123ms [-1.91%, +1.24%]
  • ✔️001_tpch_06: 15.4ms -> 15.4ms [-1.92%, +1.98%]
  • ✔️001_tpch_07: 141ms -> 142ms [-0.54%, +1.44%]
  • ✔️001_tpch_08: 161ms -> 160ms [-1.39%, +0.5%]
  • ✔️001_tpch_09: 131ms -> 131ms [-1.37%, +0.52%]
  • ✔️001_tpch_10: 77.2ms -> 77.7ms [-0.81%, +2.15%]
  • ✔️001_tpch_11: 64.4ms -> 64.3ms [-2.26%, +1.98%]
  • ✔️001_tpch_12: 29.2ms -> 29.3ms [-1.07%, +1.99%]
  • ✔️001_tpch_13: 20.6ms -> 20.1ms [-4.62%, +0.45%]
  • ✔️001_tpch_14: 21.7ms -> 22ms [-0.8%, +3.33%]
  • ✔️001_tpch_15: 64.7ms -> 64.3ms [-2.53%, +1.13%]
  • ✔️001_tpch_16: 66.2ms -> 65.9ms [-1.65%, +0.84%]
  • ✔️001_tpch_17: 28.3ms -> 28.3ms [-3.01%, +2.96%]
  • ✔️001_tpch_18: 23.6ms -> 23.7ms [-1.67%, +2.02%]
  • ✔️001_tpch_19: 133ms -> 132ms [-1.9%, +0.97%]
  • ✔️001_tpch_20: 80.1ms -> 80.1ms [-1.6%, +1.41%]
  • ✔️001_tpch_21: 177ms -> 176ms [-1.92%, +0.4%]
  • ✔️001_tpch_22: 128ms -> 128ms [-1.35%, +1.14%]
  • ✔️010_tpch_01: 84.9ms -> 81.2ms [-8.82%, +0.28%]
  • ✔️010_tpch_02: 74.1ms -> 73.8ms [-4.7%, +3.91%]
  • ✔️010_tpch_03: 66.6ms -> 67.5ms [-0.23%, +2.91%]
  • ✔️010_tpch_04: 49.6ms -> 49.9ms [-1.29%, +2.31%]
  • ✔️010_tpch_05: 95.3ms -> 95.4ms [-1.1%, +1.25%]
  • ✔️010_tpch_06: 37ms -> 36.7ms [-8.4%, +6.48%]
  • ✔️010_tpch_07: 116ms -> 115ms [-3.28%, +0.24%]
  • ✔️010_tpch_08: 136ms -> 135ms [-2.43%, +0.55%]
  • ✔️010_tpch_09: 120ms -> 119ms [-2.2%, +1.33%]
  • ✔️010_tpch_10: 85.1ms -> 84.6ms [-2.82%, +1.65%]
  • ✔️010_tpch_11: 39.4ms -> 38.8ms [-4.16%, +1%]
  • ✔️010_tpch_12: 64.1ms -> 63.6ms [-4.21%, +2.69%]
  • ✔️010_tpch_13: 53.7ms -> 53.3ms [-3.18%, +1.69%]
  • ✔️010_tpch_14: 43.5ms -> 43.3ms [-8.27%, +7.58%]
  • ✔️010_tpch_15: 60.5ms -> 59.1ms [-7.95%, +3.12%]
  • ✔️010_tpch_16: 45.5ms -> 45.4ms [-1.97%, +1.49%]
  • ❗🐌010_tpch_17: 55.5ms -> 56.5ms [+0.48%, +3.2%]
  • ✔️010_tpch_18: 53.8ms -> 52.8ms [-8.1%, +4.52%]
  • ✔️010_tpch_19: 125ms -> 125ms [-1.94%, +1.86%]
  • ✔️010_tpch_20: 74.5ms -> 74.1ms [-4.97%, +3.83%]
  • ✔️010_tpch_21: 413ms -> 414ms [-1.22%, +1.62%]
  • ✔️010_tpch_22: 78.7ms -> 77.9ms [-3%, +0.95%]
  • ✔️100_tpch_01: 1.21s -> 1.19s [-7.47%, +5.69%]
  • ✔️100_tpch_02: 128ms -> 120ms [-12.16%, +0.49%]
  • ✔️100_tpch_03: 1.07s -> 1.09s [-1.14%, +5.84%]
  • ✔️100_tpch_04: 1.06s -> 1.05s [-2.75%, +2.41%]
  • ✔️100_tpch_05: 1.16s -> 1.15s [-3.55%, +2.6%]
  • ✔️100_tpch_06: 990ms -> 990ms [-1.26%, +1.11%]
  • 🚀100_tpch_07: 1.12s -> 1.11s [-1.69%, -0.34%]
  • 🚀100_tpch_08: 1.16s -> 1.14s [-2.63%, -0.59%]
  • ✔️100_tpch_09: 1.22s -> 1.24s [-1.1%, +4.37%]
  • ✔️100_tpch_10: 1.1s -> 1.1s [-1.87%, +1.86%]
  • ✔️100_tpch_11: 84.2ms -> 85.3ms [-9.54%, +12.02%]
  • ✔️100_tpch_12: 1.08s -> 1.09s [-3%, +4.39%]
  • ✔️100_tpch_13: 323ms -> 312ms [-8.52%, +1.63%]
  • ✔️100_tpch_14: 1.01s -> 1.01s [-2.45%, +3.47%]
  • ✔️100_tpch_15: 1.11s -> 1.1s [-3.52%, +3.02%]
  • ✔️100_tpch_16: 130ms -> 133ms [-17.99%, +23.5%]
  • ✔️100_tpch_17: 1.05s -> 1.04s [-2.94%, +0.08%]
  • ✔️100_tpch_18: 1.08s -> 1.08s [-1.88%, +1.26%]
  • ✔️100_tpch_19: 1.19s -> 1.17s [-6.11%, +3.08%]
  • ✔️100_tpch_20: 1.08s -> 1.07s [-4.36%, +2.57%]
  • ✔️100_tpch_21: 2.2s -> 2.25s [-1.07%, +5.61%]
  • ✔️100_tpch_22: 177ms -> 169ms [-10.33%, +1.75%]

Further explanation regarding interpretation and methodology can be found in the documentation.
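The status symbols in this list appear to follow a simple rule that can be read off the intervals themselves (an inference from the results shown, not taken from the bot's documentation): ✔️ when the 95% interval contains zero, 🚀 when it lies entirely below zero (faster), and ❗🐌 when it lies entirely above zero (slower). A minimal R sketch of that rule, using a hypothetical helper name:

```r
# Hypothetical helper (not part of duckplyr or the benchmark bot): classify a
# relative change from the bounds of its 95% confidence interval, in percent.
classify_change <- function(lower, upper) {
  stopifnot(length(lower) == length(upper), all(lower <= upper))
  ifelse(upper < 0, "faster",        # whole interval below zero (rocket)
    ifelse(lower > 0, "slower",      # whole interval above zero (snail)
      "no clear change"))            # interval contains zero (check mark)
}

# Intervals copied from 001_tpch_01, 010_tpch_17 and 100_tpch_07 above
classify_change(
  lower = c(-4.31, 0.48, -1.69),
  upper = c(3.47, 3.2, -0.34)
)
```

Under this reading, a single ❗🐌 entry such as 010_tpch_17 ([+0.48%, +3.2%]) only says that interval excludes zero; with dozens of queries compared at once, an occasional flag is expected even when nothing changed.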

krlmlr (Member) left a comment

Thanks! I'm just envisioning someone removing that seemingly extra space in a future revision.


by_cyl <- dplyr::group_by(mtcars, cyl)
expect_snapshot(error = TRUE, {
as_duckplyr_df(by_cyl)
Collaborator Author

Maybe the error is misleading in this case: would you really want to convert a grouped data frame, and thereby lose its grouping?
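The concern shows up in the classes involved. A minimal sketch, assuming dplyr is installed: a grouped data frame carries `grouped_df` in front of the tibble classes, so it matches neither `identical()` check in the diff, and `dplyr::ungroup()` is one explicit way to get a plain tibble, at the cost of the grouping. Whether `as_duckplyr_df()` should ungroup silently, error, or suggest `ungroup()` is exactly the open question here; the sketch does not settle it.

```r
# Assumes dplyr is installed; mtcars ships with base R.
by_cyl <- dplyr::group_by(mtcars, cyl)

# "grouped_df" comes first, so neither identical() check used in
# as_duckplyr_df() matches this class vector.
class(by_cyl)
dplyr::group_vars(by_cyl)

# Dropping the grouping explicitly yields a plain tibble that the check
# accepts, but the grouping by `cyl` is gone.
plain <- dplyr::ungroup(by_cyl)
class(plain)
dplyr::group_vars(plain)
```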

Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if e718a07 is merged into main:

  • ✔️001_tpch_01: 24.5ms -> 25.3ms [-0.9%, +7.47%]
  • 🚀001_tpch_02: 133ms -> 130ms [-3.18%, -0.56%]
  • ✔️001_tpch_03: 71.7ms -> 72.7ms [-1.03%, +3.89%]
  • ❗🐌001_tpch_04: 24.7ms -> 25.5ms [+0.15%, +6.21%]
  • ✔️001_tpch_05: 122ms -> 122ms [-1.37%, +1.37%]
  • ✔️001_tpch_06: 15.4ms -> 15.7ms [-1.26%, +5.15%]
  • ✔️001_tpch_07: 149ms -> 148ms [-2.73%, +1.48%]
  • ✔️001_tpch_08: 167ms -> 165ms [-2.82%, +0.86%]
  • ✔️001_tpch_09: 139ms -> 138ms [-2.83%, +0.72%]
  • ✔️001_tpch_10: 79.4ms -> 79.1ms [-3.09%, +2.37%]
  • ✔️001_tpch_11: 66.3ms -> 66.4ms [-3.33%, +3.71%]
  • ✔️001_tpch_12: 29.6ms -> 29.8ms [-1.46%, +2.74%]
  • ✔️001_tpch_13: 21.1ms -> 21.1ms [-2.17%, +2.18%]
  • ✔️001_tpch_14: 22ms -> 21.9ms [-2.89%, +1.43%]
  • ❗🐌001_tpch_15: 64.3ms -> 65.4ms [+0.02%, +3.37%]
  • ✔️001_tpch_16: 66.4ms -> 67.4ms [-0.76%, +3.79%]
  • ✔️001_tpch_17: 28ms -> 27.7ms [-3.22%, +1.07%]
  • ✔️001_tpch_18: 23.5ms -> 24ms [-0.91%, +4.48%]
  • ✔️001_tpch_19: 130ms -> 131ms [-0.43%, +1.71%]
  • ✔️001_tpch_20: 79.4ms -> 79.4ms [-1.55%, +1.79%]
  • ✔️001_tpch_21: 176ms -> 175ms [-1.81%, +1.38%]
  • ✔️001_tpch_22: 125ms -> 125ms [-1.34%, +0.75%]
  • ✔️010_tpch_01: 82.5ms -> 82.8ms [-3.17%, +3.86%]
  • ✔️010_tpch_02: 70.9ms -> 71.7ms [-1.15%, +3.3%]
  • ❗🐌010_tpch_03: 66.5ms -> 67.6ms [+0.35%, +2.81%]
  • ✔️010_tpch_04: 49.3ms -> 49.5ms [-1.37%, +1.9%]
  • ✔️010_tpch_05: 95.4ms -> 95.8ms [-1.04%, +1.82%]
  • ✔️010_tpch_06: 35.8ms -> 36.1ms [-5.21%, +6.88%]
  • ✔️010_tpch_07: 119ms -> 119ms [-1.66%, +2.67%]
  • ✔️010_tpch_08: 140ms -> 143ms [-0.96%, +4.69%]
  • ✔️010_tpch_09: 124ms -> 122ms [-4.62%, +1.43%]
  • ✔️010_tpch_10: 86.4ms -> 86.5ms [-4.71%, +4.88%]
  • ✔️010_tpch_11: 39ms -> 38.2ms [-5.54%, +1.44%]
  • ✔️010_tpch_12: 64.2ms -> 66.4ms [-3.81%, +10.72%]
  • ✔️010_tpch_13: 52.8ms -> 52.9ms [-1.69%, +2.01%]
  • ✔️010_tpch_14: 43.4ms -> 44ms [-1.92%, +5.02%]
  • ✔️010_tpch_15: 58.4ms -> 61.6ms [-2.42%, +13.46%]
  • ✔️010_tpch_16: 46.9ms -> 47.1ms [-2%, +2.72%]
  • ✔️010_tpch_17: 58.5ms -> 58.2ms [-2.59%, +1.88%]
  • ✔️010_tpch_18: 56.1ms -> 57.2ms [-4.85%, +8.74%]
  • ✔️010_tpch_19: 124ms -> 124ms [-1.32%, +2.14%]
  • ✔️010_tpch_20: 75.1ms -> 74.6ms [-2.68%, +1.26%]
  • ✔️010_tpch_21: 431ms -> 429ms [-2.4%, +1.54%]
  • ✔️010_tpch_22: 79ms -> 78.3ms [-3.57%, +1.73%]
  • ✔️100_tpch_01: 1.19s -> 1.23s [-1.09%, +7.44%]
  • ✔️100_tpch_02: 123ms -> 126ms [-9.65%, +15.04%]
  • ✔️100_tpch_03: 1.1s -> 1.08s [-4.71%, +0.34%]
  • ✔️100_tpch_04: 1.12s -> 1.06s [-13.86%, +3.16%]
  • ✔️100_tpch_05: 1.18s -> 1.18s [-1.52%, +1.11%]
  • ✔️100_tpch_06: 1.01s -> 1s [-2.61%, +0.97%]
  • ✔️100_tpch_07: 1.16s -> 1.15s [-6.36%, +3.46%]
  • ✔️100_tpch_08: 1.16s -> 1.15s [-3.61%, +2.61%]
  • ✔️100_tpch_09: 1.26s -> 1.22s [-7.55%, +1.72%]
  • ✔️100_tpch_10: 1.14s -> 1.13s [-4.22%, +3.55%]
  • ✔️100_tpch_11: 84.2ms -> 83.5ms [-7.78%, +6.2%]
  • ✔️100_tpch_12: 1.11s -> 1.1s [-1.38%, +0.59%]
  • ✔️100_tpch_13: 333ms -> 349ms [-4.73%, +14.85%]
  • ✔️100_tpch_14: 1.03s -> 1.06s [-2.15%, +9%]
  • ✔️100_tpch_15: 1.13s -> 1.15s [-0.76%, +4.13%]
  • ✔️100_tpch_16: 125ms -> 124ms [-7.03%, +6.75%]
  • ✔️100_tpch_17: 1.08s -> 1.08s [-3.6%, +4.35%]
  • ✔️100_tpch_18: 1.12s -> 1.11s [-4.1%, +3.08%]
  • ✔️100_tpch_19: 1.2s -> 1.21s [-2.7%, +4.41%]
  • ✔️100_tpch_20: 1.08s -> 1.1s [-1.71%, +4.9%]
  • ✔️100_tpch_21: 2.25s -> 2.28s [-6.3%, +9.15%]
  • ✔️100_tpch_22: 175ms -> 180ms [-7.67%, +13.3%]

Further explanation regarding interpretation and methodology can be found in the documentation.

Development

Successfully merging this pull request may close these issues.

Should as_duckplyr_df() work with tibbles from readr::read_csv()?
2 participants