Describe the issue
According to the documentation for XPT metadata, number_rows cannot be determined unless the entire dataset is read. I understand that number_rows cannot be extracted from the metadata alone, but I think it can be calculated using only the metadata and the final 80-byte chunk of the file.
Expected behavior
1. Read the header records to find the variable_storage_widths and the start of the record data.
2. Calculate record_storage_width as the sum of the variable_storage_widths.
3. Read the last 80-byte chunk of the file to find out how much trailing ASCII blank padding there is.
4. Calculate the number of records as (total_file_size - start - padding) / record_storage_width.
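The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not part of the pyreadstat or ReadStat API: the names parse_header, rows_from_sizes and number_rows are hypothetical. It assumes a single-member v5 XPORT file laid out as in SAS TS-140 (80-byte records, 140-byte NAMESTR records whose nlng field holds each variable's storage width, record data starting right after the OBS header record) and does not handle the v8/v9 header variants. It also rounds the division up, a guard added beyond the original formula, because trailing blanks in the last record's character values look exactly like padding; trailing records consisting entirely of blanks would still be undercounted.

```python
import os
import struct

BLOCK = 80  # XPORT files are laid out as 80-byte records
NAMESTR_HDR = b"HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!"
OBS_HDR = b"HEADER RECORD*******OBS     HEADER RECORD!!!!!!!"
NAMESTR_LEN = 140  # assumption: 140-byte namestrs (VAX/VMS files use 136)


def parse_header(header: bytes) -> tuple[list[int], int]:
    """Return (variable_storage_widths, start) from the leading bytes of a
    single-member v5 XPORT file, up to and including the OBS header record."""
    ns = obs = -1
    for off in range(0, len(header) - BLOCK + 1, BLOCK):
        block = header[off:off + BLOCK]
        if ns < 0 and block.startswith(NAMESTR_HDR):
            ns = off
        elif ns >= 0 and block.startswith(OBS_HDR):
            obs = off
            break
    if ns < 0 or obs < 0:
        raise ValueError("NAMESTR or OBS header record not found")
    nvars = int(header[ns + 54:ns + 58])  # variable count, e.g. b"0002"
    widths = []
    for i in range(nvars):
        pos = ns + BLOCK + i * NAMESTR_LEN  # namestrs are packed after their header
        # nlng, a big-endian short at offset 4, is the variable's storage width
        widths.append(struct.unpack(">h", header[pos + 4:pos + 6])[0])
    return widths, obs + BLOCK  # record data starts right after the OBS header


def rows_from_sizes(total_file_size: int, start: int,
                    record_storage_width: int, last_block: bytes) -> int:
    """(total_file_size - start - padding) / record_storage_width, rounded up."""
    if total_file_size <= start:
        return 0  # no record data after the OBS header
    # Trailing ASCII blanks in the final 80-byte chunk are treated as padding.
    padding = len(last_block) - len(last_block.rstrip(b" "))
    data_bytes = total_file_size - start - padding
    # Ceiling division: blanks at the end of the last record are stripped as
    # "padding" too, which would otherwise make the plain division come up short.
    return -(-data_bytes // record_storage_width)


def number_rows(path: str) -> int:
    total_file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = bytearray()
        while True:  # read 80-byte records until the OBS header record
            block = f.read(BLOCK)
            if len(block) < BLOCK:
                raise ValueError("end of file before OBS header record")
            header += block
            if block.startswith(OBS_HDR):
                break
        widths, start = parse_header(bytes(header))
        f.seek(-BLOCK, os.SEEK_END)  # only the final 80-byte chunk is read
        last_block = f.read(BLOCK)
    return rows_from_sizes(total_file_size, start, sum(widths), last_block)
```

Usage would be something like number_rows("example.xpt"), where the filename is a placeholder. Only the header records and the last 80 bytes are read, so the cost does not grow with the number of observations.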
gerrycampion changed the title from "Calculate number_rows using metadata and final chunk" to "Calculate XPT number_rows using metadata and final chunk" on Apr 30, 2024.
Thanks for the interesting suggestion. Pyreadstat is a wrapper around the C library ReadStat, so new functionality has to be implemented there before I can expose it here. I do not think ReadStat has functions that return the start of the data or the padding, so the calculation cannot be done right now. However, you can suggest it over there, and once it is implemented, I can wrap it and provide it in Pyreadstat.
I believe that the number of rows is available in v8 XPORT files created by at least SAS v9.0401M8. This prevents readstat-created v8 XPORT files from being read by that version of SAS, because readstat does not write the observation count but SAS expects it.
Unfortunately, this revised layout does not appear to be documented in the official v8/v9 XPORT layout documentation released by SAS in Oct 2021.
I am currently testing the changes to the readstat code needed, first of all, to write the observation count into the file. Then there could be some optional code to read that metadata back from the XPORT observation header.
I'll try to get this posted to the readstat site as a new issue (and ideally a PR) soonish once I finish testing "in my spare time". :)
-- Edit: This has been posted as issue #316. I included a blurb about reading in the observation count when available. This could be a partial solution to your issue.