database #3

Open
philbmz opened this issue Mar 22, 2023 · 4 comments

philbmz commented Mar 22, 2023

Hi, I would like to know how to use this as a database in Python. I'm trying to get some information from the XML files by tag, such as "author", "date" (the release date), "title" and some others, but the release date is not standardized across the XML files. When I read the text of the second "date" tag in each file (for example), some of the values are the correct release date, but others aren't, because the correct one is in another "date" tag, the first or third one (according to the metadata CSV). So, can I get this information in another way?

Collaborator

lb42 commented Mar 22, 2023

Not sure if I understand your question correctly, but if by "release date" you mean publication date, you will find that this is always given in the second column of the metadata.csv file. It will appear in different places in the TEI Header (the XML version of the file) depending on the kind of bibliographic data provided. The date of the first edition, if available, should be located by an XPath like "sourceDesc//bibl[@type='firstEdition']/date". Which files are you looking at?
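For anyone reading along, here is a minimal sketch of how that XPath could be applied from Python with the standard library. It assumes the files declare the usual TEI namespace (`http://www.tei-c.org/ns/1.0`). The `SAMPLE` header and the helper name `first_edition_date` are hypothetical and only there to make the example runnable. The real headers will contain much more.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the standard TEI namespace on the root element.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed header used only for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <publicationStmt><date>2021</date></publicationStmt>
      <sourceDesc>
        <bibl type="digitalSource"><date>2008</date></bibl>
        <bibl type="firstEdition"><date>1878</date></bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def first_edition_date(root):
    """Return the first-edition date text, or None if the header has none."""
    # Select by the bibl's type attribute instead of by the position of <date>.
    node = root.find(
        ".//tei:sourceDesc//tei:bibl[@type='firstEdition']/tei:date", TEI_NS
    )
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE)
print(first_edition_date(root))
```

For a file on disk, `ET.parse(path).getroot()` could be passed to the same helper. A `None` result means the header has no `firstEdition` entry, which matches the "if available" caveat above.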

Author

philbmz commented Mar 22, 2023

Yes, publication date, and I'm looking at those level 1 XML files.
[Screenshot: Opera Snapshot_2023-03-22_171837_zenodo org]

Collaborator

dianamsmpsantos commented Mar 23, 2023 via email

Author

philbmz commented Mar 23, 2023

Yeah, I'm trying to read all those XML files with Python and do some NLP analysis on them. For this I need the date of the first edition, but I'm getting the information by tag name, which won't work because the first edition date is not always in the same "date" tag position in every XML file. I'm not sure if I'm making myself clear, but by "second date" I mean the second occurrence of the tags named "date": sometimes this second occurrence gives me the first edition date, and sometimes another date. Anyway, I get that the method I'm using is the problem. Thanks for the help.

[Screenshot: Untitled]

Just to exemplify, these are the first 3 times that the tag "date" appears. Sometimes the first edition date will be the third one, but in other files it won't. I thought these dates were standardized, but now I get that I have to get this information with another method, so again, thanks for the help.
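One such other method, following the earlier note that the publication date is always in the second column of metadata.csv, could be to skip the XML for dates and read that file. This is only a sketch under assumptions: it assumes the first column holds a file identifier and that a header row is present, neither of which is confirmed in this thread. The sample rows, column names, and the helper `load_publication_dates` are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for metadata.csv; real column names and ids may differ.
SAMPLE_CSV = """id,date,title
file_a,1878,Example Title A
file_b,1901,Example Title B
"""


def load_publication_dates(handle):
    """Map the first column (assumed to be a file id) to the second column."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


dates = load_publication_dates(io.StringIO(SAMPLE_CSV))
print(dates["file_a"])
```

With the real file, something like `open("metadata.csv", newline="", encoding="utf-8")` could replace the `io.StringIO` sample, and the resulting dictionary could then be joined to each XML file by its identifier.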
