Encoding/Collation problems with 1.5.0 SQL server #834
Comments
Hi there. While we wait for @shrektan to chime in: I am unable to replicate the failure with your second example on my setup. Can you think of how your setup and mine differ?
Completely shooting from the hip: are you working against an older version of SQL Server? I think the ability to store UTF-8 encoded characters in ... Need to think about how this changed between 1.4.2 and 1.5.0, and how/why your first example worked then and no longer works today.
Never mind, @soetang: ignore the note above. After creating a catalog with your collation, I am able to replicate your second example as well.
Thanks for looking into it. I tested some more, and theoretically I can get it to work by converting the data to and from latin1 myself and by not providing the encoding argument. This, however, is what I would expect to happen internally when I provide the encoding.
library('magrittr')
conn <- DBI::dbConnect(
odbc::odbc()
, Driver = "ODBC Driver 17 for SQL Server"
, Server = "server"
, Database = 'database'
, Trusted_Connection = "Yes"
#, encoding = 'latin1'
, AutoTranslate = 'no'
)
test_table <- DBI::Id(schema = 'test', table = 'test_table')
df <- tibble::tibble(
var_char_col = c('kanin', 'ræven', 'ålens', 'ørred'),
bøvs = 1
)
df$var_char_col <- stringi::stri_encode(df$var_char_col, to = 'LATIN1')
DBI::dbWriteTable(
conn = conn
, name = test_table
, value = df
, field.types = c('var_char_col' = 'varchar(5)')
, overwrite = TRUE
)
db_data <- DBI::dbReadTable(conn, test_table)
db_data
#> var_char_col bøvs
#> 1 kanin 1
#> 2 r\xe6ven 1
#> 3 \xe5lens 1
#> 4 \xf8rred 1
db_data$var_char_col <- stringi::stri_encode(
db_data$var_char_col
, from = 'latin1'
, to = 'UTF-8'
)
db_data
#> var_char_col bøvs
#> 1 kanin 1
#> 2 ræven 1
#> 3 ålens 1
#> 4 ørred 1
We are using SQL Server 2019:
conn@info
...
#> $dbms.name
#> [1] "Microsoft SQL Server"
#>
#> $db.version
#> [1] "15.00.4375"
....
#> $drivername
#> [1] "libmsodbcsql-17.10.so.6.1"
#>
...
#> $driver.version
#> [1] "17.10.0006"
#>
....
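The manual workaround described in this comment (convert to latin1 before writing, convert back to UTF-8 after reading) could be wrapped in a small helper. This is only a sketch using base R `iconv()` instead of `stringi`; `recode_chr_cols` is a hypothetical name, not part of odbc or DBI, and the example uses a plain ASCII column name `n` rather than `bøvs` to keep it locale-independent. It needs no database connection.

```r
# Hypothetical helper (not part of odbc or DBI): re-encode every character
# column of a data frame from one encoding to another.
recode_chr_cols <- function(df, from, to) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = to)
  df
}

df <- data.frame(
  var_char_col = c("kanin", "r\u00e6ven", "\u00e5lens", "\u00f8rred"),
  n = 1,
  stringsAsFactors = FALSE
)

# Would be applied just before DBI::dbWriteTable()
to_db <- recode_chr_cols(df, from = "UTF-8", to = "latin1")

# Would be applied just after DBI::dbReadTable()
from_db <- recode_chr_cols(to_db, from = "latin1", to = "UTF-8")
```

After the first call every value occupies five bytes, and the second call restores the original UTF-8 strings. Column names are not touched by this helper.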
Thanks for the investigation. Here's some more notes from me:
So I tried writing data while only including the encoding parameter (without ... At the same time, I just reran my second example but with varchar(6). While from R both writing and retrieving look correct, the actual data in the database is incorrect.
R result:
library('magrittr')
conn <- DBI::dbConnect(
odbc::odbc()
, Driver = "ODBC Driver 17 for SQL Server"
, Server = "server"
, Database = 'database'
, Trusted_Connection = "Yes"
# , encoding = 'latin1'
, AutoTranslate = 'no'
)
test_table <- DBI::Id(schema = 'test', table = 'test_table')
df <- tibble::tibble(
var_char_col = c('kanin', 'ræven', 'ålens', 'ørred'),
bøvs = 1
)
DBI::dbWriteTable(
conn
, name = test_table
, df
, field.types = c('var_char_col' = 'varchar(6)')
, overwrite = TRUE
)
db_data <- DBI::dbReadTable(conn, test_table)
db_data
#> var_char_col bøvs
#> 1 kanin 1
#> 2 ræven 1
#> 3 ålens 1
#> 4 ørred 1
However, I can confirm that for reading the data it works fine without the ...
I am off for the next few days, but I have asked my team whether they can test it. Thanks a lot.
@detule I can confirm that it works with your branch.
We have been experiencing encoding problems with odbc 1.5.0 on SQL Server. This works fine on odbc 1.4.2.
All our databases unfortunately use the varchar collation "Danish_Norwegian_CI_AS". With odbc 1.4.2, however, we were able to create a connection so that we could correctly read and write to the database. With odbc 1.5.0 the column names are no longer formatted correctly.
Is this a bug or can we change the connection settings so that it works correctly?
Expected value:
If we remove the encoding parameter, we get correct column names; however, the character vector no longer fits within the varchar(5) datatype even though we only have 5 characters:
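One plausible explanation for the varchar(5) overflow, offered as an assumption rather than a confirmed diagnosis: the length of a SQL Server varchar(n) column is counted in bytes, and æ, å and ø each take two bytes in UTF-8 but one byte in latin1. If the strings reach the server as untranslated UTF-8 bytes, a five-character word can need six bytes. The byte counts can be checked in plain R without a database:

```r
# The three Danish words are written with \u escapes so the example does not
# depend on the encoding of the source file.
x <- c("kanin", "r\u00e6ven", "\u00e5lens", "\u00f8rred")

n_chars <- nchar(x, type = "chars")        # 5 5 5 5
n_bytes_utf8 <- nchar(x, type = "bytes")   # 5 6 6 6

# The same strings re-encoded as latin1 use one byte per character.
x_latin1 <- iconv(x, from = "UTF-8", to = "latin1")
n_bytes_latin1 <- nchar(x_latin1, type = "bytes")  # 5 5 5 5
```

This would also be consistent with the varchar(6) example accepting the data while storing the wrong bytes.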