BED "chrom" field regex is inconsistent with existing practice #779
I originally posted this issue to ga4gh/ga4gh-bed#3. I suspect that was not the correct place for this bug, hence I'm reposting it here. Please let me know if I got this wrong. If this is indeed the better place to post these issues, it would be great to have some direction pointing here in the future.
SAM uses VCF is similar, but uses So unifying things I'd say Ping @michaelmhoffman, @arq5x Edit: I notice that I was wrong with printable non-white-space as we exclude ` and |
I don't think
If a CHROM value is defined in a header, it is generally a
The relevant text in the BED specification is
The clear intention during development of this specification was to codify existing practice, as noted in #570 (comment) and other comments on the original BED PR. Clearly Sadly there doesn't appear to have been any discussion of the I reviewed an earlier version of the draft document which had While the intention to make “BED files more portable to varying environments which may make different assumptions about allowed characters” is very laudable, I think existing BED files need to be surveyed for punctuation characters (including
Totally agree that the intention was to codify existing practice and not allowing
Hello,
Recently there was some work with BED files and RefSeq/Genbank chromosome IDs which typically have a period in them for versioning purposes (e.g. "NC_000001.11"). This is currently not allowed as-is in the spec. Only alphanumeric characters are allowed.
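To make the problem concrete, here is a minimal sketch of how a strict validator would treat such an ID. Both patterns are assumptions for illustration and are not quoted from the BED specification: `STRICT_CHROM` stands in for a "letters, digits, underscore" rule, and `RELAXED_CHROM` is a hypothetical variant that also permits a period. The function name `is_valid_chrom` is likewise made up.

```python
import re

# ASSUMPTION: stand-in for a "letters, digits, underscore" chrom rule;
# not quoted from the BED specification.
STRICT_CHROM = re.compile(r"[A-Za-z0-9_]{1,255}")

# HYPOTHETICAL relaxed rule that additionally permits "." (version suffixes).
RELAXED_CHROM = re.compile(r"[A-Za-z0-9_.]{1,255}")


def is_valid_chrom(name: str, pattern: re.Pattern = STRICT_CHROM) -> bool:
    """Return True if the whole name matches the given chrom pattern."""
    return pattern.fullmatch(name) is not None


# Compare a UCSC-style name with a versioned RefSeq accession.
for name in ("chr1", "NC_000001.11"):
    print(name, is_valid_chrom(name), is_valid_chrom(name, RELAXED_CHROM))
```

Under these assumed patterns, "chr1" passes both checks, while "NC_000001.11" fails the strict one only because of the period.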
I e-mailed Jim Kent regarding this issue and this is what he had to say:
And Matthew Speir from UCSC had this to say:
The details and initial reasoning come specifically from an engineer there, Angie Hinrichs: