database #3

philbmz · 2023-03-22T15:28:57Z

Hi, I would like to know how to use this as a database in Python. I'm trying to get some information from the XML files by tag, such as "author", "date" (the release date), "title" and a few others. However, the release date is not standardized across the XML files: when I take the text of the second "date" tag in each file (for example), some of the values are the correct release date, but others aren't, because the correct one sits in a different "date" tag, the first or the third (according to the metadata CSV). Is there another way to get this information?

lb42 · 2023-03-22T15:56:59Z

Not sure if I understand your question correctly, but if by "release date" you mean publication date, you will find that this is always given in the second column of the metadata.csv file. It will appear in different places in the TEI Header (the XML version of the file) depending on the kind of bibliographic data provided. The date of the first edition, if available, should be located by an XPath like "sourceDesc//bibl[@type='firstEdition']/date" . Which files are you looking at?
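A minimal sketch of how the XPath above could be applied in Python with the standard library. The sample document is invented for illustration, and the sketch assumes the files declare the usual TEI namespace (`http://www.tei-c.org/ns/1.0`) and that `date` is a direct child of the `bibl` element; the function name `first_edition_date` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample header, not taken from any real file in the collection.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example Novel</title></titleStmt>
      <sourceDesc>
        <bibl type="digitalSource"><date>2015</date></bibl>
        <bibl type="firstEdition"><date>1871</date></bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def first_edition_date(root):
    """Return the text of the first-edition <date>, or None if absent."""
    node = root.find(
        ".//tei:sourceDesc//tei:bibl[@type='firstEdition']/tei:date", TEI_NS
    )
    if node is None or not node.text:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE)
print(first_edition_date(root))
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`. The lookup returns `None` when no first edition is recorded, which the caller has to handle.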

philbmz · 2023-03-22T20:19:20Z

Yes, publication date, and I'm looking at the level 1 XML files.

dianamsmpsantos · 2023-03-23T16:50:50Z

Hi,

I wonder whether you want to get the information from the XML files, or whether it is enough to use the metadata file. If you want the date information from the XML files, you have to understand that there are potentially three dates: the first edition date (not always known), the date of the physical copy that was digitized, and/or the date of the digitization that was used for ELTeC. Which date do you want? And which cases do you mean by "some of the date are the correct release date, but some others arent"? If you tell us which ones gave you problems, I can either correct them or explain why they are like that.

Anyway, from your message you seem to use the second date, but there is no requirement that the second one is consistently the same kind of date. What is encoded is whether the date is inside <bibl type="digitalSource">, <bibl type="firstEdition"> or <bibl type="printSource">, and the order of these may vary.

Hope this helped,
Diana
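To illustrate why the `type` attribute is a safer key than position, here is a rough sketch that collects the dates into a dictionary keyed by `bibl` type. Both sample headers are invented, the helper name `dates_by_bibl_type` is hypothetical, and the code assumes the standard TEI namespace and a `date` that is a direct child of each `bibl`.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Two invented headers with the same data in a different order.
HEADER_A = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0"><fileDesc><sourceDesc>
  <bibl type="digitalSource"><date>2015</date></bibl>
  <bibl type="firstEdition"><date>1871</date></bibl>
  <bibl type="printSource"><date>1902</date></bibl>
</sourceDesc></fileDesc></teiHeader>"""

HEADER_B = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0"><fileDesc><sourceDesc>
  <bibl type="firstEdition"><date>1871</date></bibl>
  <bibl type="printSource"><date>1902</date></bibl>
  <bibl type="digitalSource"><date>2015</date></bibl>
</sourceDesc></fileDesc></teiHeader>"""


def dates_by_bibl_type(root):
    """Map each bibl type (e.g. 'firstEdition') to the text of its <date>."""
    dates = {}
    for bibl in root.iterfind(".//tei:sourceDesc//tei:bibl[@type]", TEI_NS):
        date = bibl.find("tei:date", TEI_NS)  # direct child only
        if date is not None and date.text:
            dates[bibl.get("type")] = date.text.strip()
    return dates


for header in (HEADER_A, HEADER_B):
    dates = dates_by_bibl_type(ET.fromstring(header))
    # Missing types simply have no key, so .get() returns None for them.
    print(dates.get("firstEdition"), dates.get("printSource"))
```

Both headers yield the same first edition date even though the `bibl` elements appear in a different order, which is not the case when picking the second or third `date` by position.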

philbmz · 2023-03-23T22:32:38Z

Yeah, I'm trying to read all those XML files with Python and do some NLP analysis on them. For this I need the date of the first edition, but I'm getting the information by tag name, which won't work because the first edition date is not in the same "date" position in every XML file. I'm not sure if I'm making myself clear, but by "second date" I meant the second occurrence of the tags named "date": sometimes that position gives me the first edition date, and sometimes another date. Anyway, I get that the method I'm using is the problem. Thanks for the help.

Just to exemplify, these are the first three occurrences of the "date" tag. Sometimes the first edition date is the third one, but in other files it isn't. I thought these dates were standardized, but now I get that I have to extract this information with another method. So again, thanks for the help.
