database #3
Comments
Not sure I understand your question correctly, but if by "release date" you mean publication date, it is always given in the second column of the metadata.csv file. In the TEI header (in the XML version of the file), it appears in different places depending on the kind of bibliographic data provided. The date of the first edition, if available, should be reachable with an XPath like "sourceDesc//bibl[@type='firstEdition']/date". Which files are you looking at?
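A minimal sketch of that XPath lookup in Python, using only the standard library. It assumes the files declare the standard TEI namespace (`http://www.tei-c.org/ns/1.0`); the sample document, its dates, and the function name `first_edition_date` are made up for illustration and are not taken from any actual file in the collection.

```python
import xml.etree.ElementTree as ET

# Assumption: the XML files use the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature header; real files contain much more.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <bibl type="digitalSource"><date>2015</date></bibl>
        <bibl type="printSource"><date>1902</date></bibl>
        <bibl type="firstEdition"><date>1881</date></bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def first_edition_date(root):
    """Return the text of the first-edition <date>, or None if absent."""
    node = root.find(
        ".//tei:sourceDesc//tei:bibl[@type='firstEdition']/tei:date", TEI_NS
    )
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE)
print(first_edition_date(root))
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. The function returns `None` when no first-edition entry exists, which matches the "if available" caveat above.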
Hi, I wonder whether you want to get the information from the XML files, or
whether it is enough to use the metadata file.
In case you want to get the date information from the XML files, you have
to understand that there are potentially three dates:
the first edition date (not always known), the date of the physical copy
that was digitized, and/or the date of the digitization that was used for
ELTeC.
Which date do you want? And which cases do you mean by "some of the dates
are the correct release date, but some others aren't"?
If you tell us which ones gave you problems, I might either correct them or
explain why they are like that.
Anyway, from your mail you seem to use the second date... but there is no
actual requirement that the second one is consistently the same kind of date.
What is encoded is which of these elements the date is inside:
<bibl type="digitalSource">
<bibl type="firstEdition">
<bibl type="printSource">
And the order of these may vary.
Hope this helped
Diana
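To illustrate selecting dates by the `type` attribute of `<bibl>` rather than by position, here is a small sketch. It assumes the standard TEI namespace and that each `<date>` is a direct child of its `<bibl>`; the helper names `make_sample` and `dates_by_bibl_type` and all the dates are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: standard TEI namespace, in ElementTree's {uri}tag notation.
TEI = "{http://www.tei-c.org/ns/1.0}"


def make_sample(entries):
    """Build a tiny hypothetical TEI header with <bibl> entries in the given order."""
    bibls = "".join(
        f'<bibl type="{bibl_type}"><date>{date}</date></bibl>'
        for bibl_type, date in entries
    )
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
        f"<sourceDesc>{bibls}</sourceDesc>"
        "</fileDesc></teiHeader></TEI>"
    )


def dates_by_bibl_type(root):
    """Map each bibl type under <sourceDesc> to the text of its first <date> child."""
    dates = {}
    for source_desc in root.iter(TEI + "sourceDesc"):
        for bibl in source_desc.iter(TEI + "bibl"):
            bibl_type = bibl.get("type")
            date = bibl.find(TEI + "date")
            if bibl_type and date is not None and date.text:
                dates.setdefault(bibl_type, date.text.strip())
    return dates


# Two made-up files listing the same three sources in a different order.
doc_a = make_sample(
    [("digitalSource", "2015"), ("firstEdition", "1881"), ("printSource", "1902")]
)
doc_b = make_sample(
    [("firstEdition", "1881"), ("printSource", "1902"), ("digitalSource", "2015")]
)

for doc in (doc_a, doc_b):
    print(dates_by_bibl_type(ET.fromstring(doc)).get("firstEdition"))
```

Because the lookup is keyed on the attribute, both documents give the same first-edition date even though it is the second `<date>` in one and the first in the other.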
philbmz ***@***.***> wrote on Wednesday, 22/03/2023 at 21:19:
… Yes, publication date, and I'm looking at those level 1 XML files
[image: Opera Instantâneo_2023-03-22_171837_zenodo org]
<https://user-images.githubusercontent.com/128617956/227028227-c77e4398-f6bb-45b6-8c21-f343422c3ad8.png>
Yeah, I'm trying to read all those XML files with Python and do some NLP analysis on them. For this I need the date of the first edition, but I'm getting the information by tag name, which won't work because the "date" tag holding the first edition date is not in the same position in every XML file. I'm not sure I'm making myself clear, but by "second date" I meant the second occurrence of a tag named "date": sometimes this second occurrence gives me the first edition date, and sometimes another date. Anyway, I get that the method I'm using is the problem, thanks for the help. Just to illustrate, these are the first 3 occurrences of the "date" tag; sometimes the first edition date is the third one, but in other files it isn't. I thought these dates were standardized, but now I understand that I have to get this information with another method, so again, thanks for the help.
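One alternative to positional tag lookups is the metadata.csv file mentioned earlier in the thread, whose second column holds the publication date. The sketch below is only an illustration: it assumes a comma-delimited file with a header row and an identifier in the first column, and the column names and rows in the sample are invented, not copied from the real file.

```python
import csv
import io

# Hypothetical stand-in for metadata.csv; the real header and rows will differ.
SAMPLE_CSV = """id,date,author,title
XX0001,1881,Author A,Title A
XX0002,1902,Author B,Title B
"""


def publication_dates(csv_file):
    """Map the first column (assumed identifier) to the second column (date)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


dates = publication_dates(io.StringIO(SAMPLE_CSV))
print(dates)
```

With a real file, `open("metadata.csv", newline="", encoding="utf-8")` would replace the `io.StringIO` object; the delimiter may need adjusting if the file is not comma-separated.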
Hi, I would like to know how to use this as a database in Python. I'm trying to get some information from the XML files by their tags, like "author", "date" (the release date), "title" and some others, but the release date is not standardized across the XML files. When I take the text of the second "date" tag (for example) in each XML file, some of the dates are the correct release date, but some others aren't, because the correct ones are in another "date" tag, the first or third one (according to the metadata CSV). So, can I get this information in another way?