tiledb_put_metadata only saving first element of character vector #626

Open
PedroMilanezAlmeida opened this issue Nov 30, 2023 · 5 comments

Comments

@PedroMilanezAlmeida

PedroMilanezAlmeida commented Nov 30, 2023

On my machine, tiledb_put_metadata will only save one (the first) element of a character vector, but all elements of a numeric or integer vector. I am not sure whether that is the expected behavior.

library(tiledb)
pth <- tempfile()
dir.create(pth)
dm <- tiledb_domain(dims = c(tiledb_dim("d1", c(1L, 10L), type = "INT32")))
sch <- tiledb_array_schema(dm, attrs = c(tiledb_attr("a1", type = "INT32")), sparse = TRUE)
tiledb_array_create(pth, sch)
arr <- tiledb_array(pth, "WRITE")
tiledb_array_open(arr, "WRITE")
tiledb_put_metadata(arr, "numeric_key", c(0.5, 1.5))
tiledb_put_metadata(arr, "integer_key", c(1L, 2L))
tiledb_put_metadata(arr, "character_key", c("value_1", "value_2"))
tiledb_array_close(arr)
arr <- tiledb_array_open(arr, "READ")
allmd <- tiledb_get_all_metadata(arr)
print(x = allmd)

Result

character_key:	value_1
integer_key:	1, 2
numeric_key:	0.5, 1.5

sessionInfo

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppSpdlog_0.0.14 tiledb_0.21.1    

loaded via a namespace (and not attached):
 [1] zoo_1.8-12        bit_4.0.5         compiler_4.3.1    tools_4.3.1       RcppCCTZ_0.2.12  
 [6] rstudioapi_0.15.0 spdl_0.0.5        Rcpp_1.0.11       bit64_4.0.5       nanotime_0.3.7   
[11] grid_4.3.1        lattice_0.21-8   
@eddelbuettel
Contributor

I believe this to be a documented constraint: essentially a 'string' is already a vector of char, so you would have to do something like paste(c("value1", "value2"), collapse=";") to create a single string. That single string then becomes a (single column) char array on disk.

While not ideal, you could also combine it with a JSON writer / parser to store more complex structures.
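
For illustration, the collapse-and-split round trip could look roughly like the sketch below. It uses base R only; the helper names `collapse_strings` and `split_strings` are made up for this example and are not part of the tiledb package, and it assumes the separator never occurs inside any of the values.

```r
# Hypothetical helpers (not part of tiledb): pack a character vector into
# one string before writing it as metadata, and unpack it after reading.
# Assumption: the separator does not appear inside any element.
collapse_strings <- function(x, sep = ";") {
  stopifnot(is.character(x), !any(grepl(sep, x, fixed = TRUE)))
  paste(x, collapse = sep)
}

split_strings <- function(s, sep = ";") {
  stopifnot(is.character(s), length(s) == 1L)
  strsplit(s, sep, fixed = TRUE)[[1]]
}

vals <- c("value_1", "value_2")
stored <- collapse_strings(vals)    # a single string, suitable as a metadata value
restored <- split_strings(stored)   # apply the same split to the value read back
```

The single string in `stored` is what would be passed to tiledb_put_metadata in place of the character vector. A separator that cannot occur in the data is the safer choice, which is why the helper refuses input that already contains it.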

@PedroMilanezAlmeida
Author

PedroMilanezAlmeida commented Nov 30, 2023

Yeah, I (kind of) see what you mean. Just found this as well: #168 (comment). paste with collapse seems like the better solution right now.

@eddelbuettel
Contributor

I will leave this open because this could do with added documentation.

@eddelbuettel
Contributor

eddelbuettel commented Nov 30, 2023

Kudos by the way for sending a perfect demonstration. Here is a slightly mod'ed version:

#!/usr/bin/env Rscript

library(tiledb)
pth <- tempfile()
dir.create(pth)
dm <- tiledb_domain(dims = c(tiledb_dim("d1", c(1L, 10L), type = "INT32")))
sch <- tiledb_array_schema(dm, attrs = c(tiledb_attr("a1", type = "INT32")), sparse = TRUE)
ign <- tiledb_array_create(pth, sch)
arr <- tiledb_array(pth, "WRITE")
ign <- tiledb_array_open(arr, "WRITE")
ign <- tiledb_put_metadata(arr, "numeric_key", c(0.5, 1.5))
ign <- tiledb_put_metadata(arr, "integer_key", c(1L, 2L))
ign <- tiledb_put_metadata(arr, "character_key", paste(c("value_1", "value_2"), collapse=";"))
ign <- tiledb_array_close(arr)
arr <- tiledb_array_open(arr, "READ")
allmd <- tiledb_get_all_metadata(arr)
print(x = allmd)

and it shows:

$ ./gh_issue_626.R 
character_key:  value_1;value_2
integer_key:    1, 2
numeric_key:    0.5, 1.5
$ 

@PedroMilanezAlmeida
Author

> I will leave this open because this could do with added documentation.

Saw this too late; re-opened now.
