Difference in metadata columns #27
Comments
Yes, I think that's because you use the official interface that GISAID provides. The AA Substitutions column only seems to be present when one selects records directly on the GISAID website and then presses Download at the bottom, where one can choose between downloading the metadata or the FASTA sequences. With manually selected and downloaded GISAID records I get a TSV back with these columns
It's only when I use the GISAIDR download function that I get the columns
which lacks the AA Substitutions field (confirmed by directly inspecting the gisaidr_data_tmp.tar file). I think getting a TSV back with all the columns included could be supported if the download were driven via RSelenium, similar to how I download the available GISAID batch download packages: https://stackoverflow.com/questions/72632118/download-covid-patient-metadata-from-gisaid-website-in-r-using-rselenium. This would involve: Aside from downloading particular records in this way (which should also get the AA Substitutions field), I think supporting the download of the batch download packages via RSelenium could be cool too. You would probably have to put it in a separate function, though, as one can then only download the whole database (downloading it and reading it into R takes just 2 minutes), not a particular subset.
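To pin down exactly which columns differ between a manually downloaded TSV and the one returned by the package, the two header rows can be compared directly. The sketch below is in Python and uses made-up headers purely for illustration; the helper names `tsv_columns` and `missing_columns` are hypothetical and not part of GISAIDR, and the real GISAID column lists are not reproduced here. In R, the equivalent check would be a `setdiff()` on the `names()` of the two data frames.

```python
import csv
import io


def tsv_columns(text):
    """Return the header fields (first row) of tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return next(reader, [])


def missing_columns(reference_header, other_header):
    """Columns in reference_header that are absent from other_header, in order."""
    present = set(other_header)
    return [name for name in reference_header if name not in present]


# Made-up headers for illustration only; NOT the real GISAID column lists.
manual = "Virus name\tAccession ID\tCollection date\tAA Substitutions\n"
via_package = "Virus name\tAccession ID\tCollection date\n"

print(missing_columns(tsv_columns(manual), tsv_columns(via_package)))
# prints ['AA Substitutions']
```

With real files, the text passed to `tsv_columns` would simply be the contents of each downloaded TSV; only the first row is read.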
Hi @tomwenseleers, I’m not using the official GISAID interface (none exists as far as I can tell). GISAIDR just sends the same HTTP requests that your browser sends when you use the website. I think the problem here is that we have different versions of GISAID. This is what my download panel looks like. There is no Patient status metadata or Nucleotide sequences (FASTA) option, only Augur or acknowledgements. When I press Download I get a zip that combines the metadata and the sequences. Can you please double-check the URLs for the steps above? My URL is https://www.epicov.org/epi3/frontend, i.e. with /frontend. If I use https://www.epicov.org/epi3 without /frontend I get a 404 error.
Ha, sorry. What a shame then: it seems GISAID somehow decided to give different users different tiers of access, or what? How is one supposed to write reproducible code to drive this? The URL I get to start with is If I use GISAIDR I also get back a .tar file with sequences and metadata combined, and with the metadata lacking that AA Substitutions field. This is what confused me, because if I manually log in to the GISAID website, select some records and press Download at the bottom, I get this Aside from that, I also have batch package download options available when I press the Downloads button at the top of the page, which for me looks like
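For anyone wanting to repeat the check on the combined archive, a minimal sketch of listing the header of every `.tsv` member in a tar file follows. It builds a tiny in-memory archive so that it runs on its own; the member names and column names in that demo archive are invented for illustration and are not claimed to match what GISAID or GISAIDR actually produce. The function names `tsv_headers_in_tar` and `build_demo_tar` are likewise hypothetical.

```python
import io
import tarfile


def tsv_headers_in_tar(tar_bytes):
    """Map each .tsv member name in a tar archive to its list of header fields."""
    headers = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".tsv")):
                continue
            handle = archive.extractfile(member)
            first_line = handle.readline().decode("utf-8").rstrip("\r\n")
            headers[member.name] = first_line.split("\t")
    return headers


def build_demo_tar():
    """Build a small in-memory tar with made-up contents, for illustration only."""
    buffer = io.BytesIO()
    files = {
        "demo.metadata.tsv": b"strain\tgisaid_epi_isl\tdate\nA\tEPI_1\t2022-01-01\n",
        "demo.sequences.fasta": b">A\nACGT\n",
    }
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


headers = tsv_headers_in_tar(build_demo_tar())
for name, columns in headers.items():
    # Report whether the column discussed in this thread is present.
    print(name, "AA Substitutions" in columns)
```

For a real download, the bytes would come from reading the archive file from disk (for example the gisaidr_data_tmp.tar mentioned earlier in the thread) instead of `build_demo_tar()`.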
For the record, with my login and credentials, this is how I managed to download a separate metadata file with all the columns I was given access to. The code given can still be modified a bit to also download the FASTA. This uses RSelenium (so a bit different than your
When I download a metadata file there is no AA Substitutions column... I'm not sure why we have different columns. It seems maybe different users are getting different results from GISAID? #26