A useful tool for looking up Bib entries using DOI, PubMed ID (URL), or arXiv ID (URL).
NEW: Streamlit support! See here for an app deployed on Streamlit Community Cloud.
This is an updated version of https://github.com/wenh06/utils/blob/master/utils_universal/utils_bib.py
NOTE that an internet connection is required to use bib_lookup.
- Installation
- Dependencies
- Basic Usage Examples
- Command-line Usage
- Output (Append) to a .bib File
- arXiv to DOI
- Bib Items Checking
- Simplify a .bib File
- CitationMixin class
- TODO
- WARNING
- Biblatex Cheatsheet
- Citation
- References
Installation
Run
python -m pip install bib-lookup
or install the latest version from GitHub using
python -m pip install git+https://github.com/DeepPSP/bib_lookup.git
or clone this repository and install it locally via
git clone https://github.com/DeepPSP/bib_lookup.git
cd bib_lookup
python -m pip install .
Dependencies
- requests
- feedparser
- pandas
Basic Usage Examples
>>> from bib_lookup import BibLookup
>>> bl = BibLookup(align="middle")
>>> print(bl("1707.07183"))
@article{wen2017_1707.07183v2,
author = {Hao Wen and Chunhui Liu},
title = {Counting Multiplicities in a Hypersurface over a Number Field},
journal = {arXiv preprint arXiv:1707.07183v2},
year = {2017},
month = {7}
}
>>> print(bl("10.1109/CVPR.2016.90"))
@inproceedings{He_2016,
author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
title = {Deep Residual Learning for Image Recognition},
booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
doi = {10.1109/cvpr.2016.90},
year = {2016},
month = {6},
publisher = {{IEEE}}
}
>>> print(bl("10.23919/cinc53138.2021.9662801", align="left-middle"))
@inproceedings{Wen_2021,
author = {Hao Wen and Jingsu Kang},
title = {Hybrid Arrhythmia Detection on Varying-Dimensional Electrocardiography: Combining Deep Neural Networks and Clinical Rules},
booktitle = {2021 Computing in Cardiology ({CinC})},
doi = {10.23919/cinc53138.2021.9662801},
publisher = {{IEEE}},
year = {2021},
month = {9},
pages = {1–4}
}
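BibLookup accepts several kinds of identifiers: bare DOIs, DOIs prefixed with "DOI:" or given as doi.org URLs, arXiv IDs or URLs, and PubMed IDs or URLs. As an illustration of how such inputs could be told apart before any request is sent, here is a minimal sketch using regular expressions. The function classify_identifier and its patterns are hypothetical and are not bib_lookup's internal logic; they only handle new-style arXiv IDs (YYMM.NNNNN) and numeric PubMed IDs.

```python
import re
from typing import Tuple

# Hypothetical patterns for illustration; bib_lookup's own parsing may differ.
DOI_PATTERN = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
# New-style arXiv IDs only (YYMM.NNNN or YYMM.NNNNN, optional version suffix).
ARXIV_PATTERN = re.compile(
    r"^(?:arxiv:\s*|https?://arxiv\.org/(?:abs|pdf)/)?(\d{4}\.\d{4,5}(?:v\d+)?)$",
    re.IGNORECASE,
)
# Numeric PubMed IDs, bare or as pubmed.ncbi.nlm.nih.gov URLs.
PUBMED_PATTERN = re.compile(
    r"^(?:pmid:\s*|https?://pubmed\.ncbi\.nlm\.nih\.gov/)?(\d{1,9})/?$", re.IGNORECASE
)


def classify_identifier(identifier: str) -> Tuple[str, str]:
    """Return (source, normalized_id) for a DOI, arXiv ID or PubMed ID."""
    text = identifier.strip()
    # Patterns are tried in order; the first match wins.
    for source, pattern in (
        ("doi", DOI_PATTERN),
        ("arxiv", ARXIV_PATTERN),
        ("pubmed", PUBMED_PATTERN),
    ):
        match = pattern.match(text)
        if match:
            return source, match.group(1)
    raise ValueError(f"unrecognized identifier: {identifier!r}")


for example in ["1707.07183", "DOI: 10.1142/S1005386718000305", "https://arxiv.org/abs/2204.04420"]:
    print(classify_identifier(example))
```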
Command-line Usage
After installation, one can use bib-lookup from the command line:
bib-lookup 10.1109/CVPR.2016.90 10.23919/cinc53138.2021.9662801 --ignore-fields url doi -i path/to/input.txt -o path/to/output.bib
View current version:
bib-lookup --version
View current configuration:
bib-lookup --config show
Remove current configuration:
bib-lookup --config reset
Set specific configuration:
bib-lookup --config "timeout=2.0;print_result=true;ignore_fields=['url','pdf']"
or from a JSON file or a YAML file:
bib-lookup --config /path/to/config.json
bib-lookup --config /path/to/config.yaml
Note that unrecognized fields will be ignored and warning messages will be printed. The following table lists all the available configuration options:
| Option | Type | Default | Description |
|---|---|---|---|
| align | str | middle | Alignment of the bib item. |
| email | str | None | Email address to be used in the request. |
| ignore_fields | list | ['url', 'pdf'] | Fields to be ignored in the output. |
| ignore_errors | bool | False | Whether to ignore errors. |
| timeout | float | 6.0 | Timeout in seconds for each request. |
| arxiv2doi | bool | True | Whether to convert arXiv ID to DOI. |
| format | str | bibtex | Output format. |
| style | str | apa | Citation style. Valid only when format is text. |
| verbose | int | 0 | Verbosity level. |
| print_result | bool | False | Whether to print the result. |
| ordering | list | ['title', 'author', 'journal', 'booktitle'] | Ordering of the fields. |
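The README does not spell out the layout of the file passed to bib-lookup --config, so the following is a sketch under an assumption: a flat JSON mapping from the option names in the table to their values. The helpers write_config and to_inline_config are hypothetical; the second one renders the same options in the semicolon-separated inline form shown above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Option names come from the configuration table; the flat JSON layout is an assumption.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "timeout": 2.0,
    "print_result": True,
    "ignore_fields": ["url", "pdf"],
}


def write_config(path: Path, options: Dict[str, Any]) -> Path:
    """Write options to a JSON file, e.g. for `bib-lookup --config <path>`."""
    path.write_text(json.dumps(options, indent=2), encoding="utf-8")
    return path


def _format_value(value: Any) -> str:
    # Booleans are lower-cased and lists are written without spaces, matching the
    # inline example "timeout=2.0;print_result=true;ignore_fields=['url','pdf']".
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, list):
        return "[" + ",".join(repr(item) for item in value) + "]"
    return str(value)


def to_inline_config(options: Dict[str, Any]) -> str:
    """Render options as a semicolon-separated "key=value" string."""
    return ";".join(f"{key}={_format_value(value)}" for key, value in options.items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_config(Path(tmp) / "config.json", EXAMPLE_CONFIG)
        print(config_path.read_text(encoding="utf-8"))
    print(to_inline_config(EXAMPLE_CONFIG))
```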
Output (Append) to a .bib File
Each bib item that is successfully found is cached. One can call the save method to write cached bib items to a .bib file in append mode; saved items are removed from the cache.
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl(["10.1109/CVPR.2016.90", "10.23919/cinc53138.2021.9662801", "DOI: 10.1142/S1005386718000305"]);
>>> len(bl)
3
>>> bl[0]
'10.1109/CVPR.2016.90'
>>> bl.save([0, 2], "path/to/some/file.bib")  # save the bib items corresponding to "10.1109/CVPR.2016.90" and "DOI: 10.1142/S1005386718000305"
>>> len(bl)
1
>>> bl.pop(0)  # remove the bib item corresponding to "10.23919/cinc53138.2021.9662801", equivalent to `bl.pop("10.23919/cinc53138.2021.9662801")`
>>> len(bl)
0
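The append-mode behaviour, together with the "skip items already in the output file" check marked as done in the TODO list, can be sketched with the standard library alone. This is not bib_lookup's save implementation: append_bib_items and existing_labels are hypothetical helpers, and duplicates are detected only by citation label.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterable, List, Set

# Captures the citation label, e.g. "He_2016" in "@inproceedings{He_2016,".
LABEL_PATTERN = re.compile(r"@\s*\w+\s*\{\s*([^,\s]+)\s*,")


def existing_labels(path: Path) -> Set[str]:
    """Labels of the bib items already stored in `path` (empty if it does not exist)."""
    if not path.exists():
        return set()
    return set(LABEL_PATTERN.findall(path.read_text(encoding="utf-8")))


def append_bib_items(path: Path, items: Iterable[str]) -> List[str]:
    """Append bib items to `path`, skipping items whose label is already present.

    Returns the labels of the items that were written.
    """
    seen = existing_labels(path)
    written: List[str] = []
    with path.open("a", encoding="utf-8") as f:  # append mode, never truncates the file
        for item in items:
            match = LABEL_PATTERN.search(item)
            if match is None or match.group(1) in seen:
                continue  # skip unlabeled items and labels that already exist
            f.write(item.strip() + "\n\n")
            seen.add(match.group(1))
            written.append(match.group(1))
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bib_path = Path(tmp) / "refs.bib"
        entry = "@article{Krizhevsky_2017,\n  title = {...}\n}"
        print(append_bib_items(bib_path, [entry]))  # ['Krizhevsky_2017']
        print(append_bib_items(bib_path, [entry]))  # [] (duplicate skipped)
```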
arXiv to DOI
Since February 17, 2022, new arXiv articles are automatically assigned DOIs (assignment to older articles is in progress). If one prefers citing the DOI over the arXiv ID, then use
>>> from bib_lookup import BibLookup
>>> bl = BibLookup(arxiv2doi=True) # the default for `arxiv2doi` is False
>>> print(bl("https://arxiv.org/abs/2204.04420"))
@misc{https://doi.org/10.48550/arxiv.2204.04420,
author = {Hao, Wen and Jingsu, Kang},
title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
doi = {10.48550/ARXIV.2204.04420},
keywords = {Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
whereas with bl = BibLookup(), one would get
@article{hao2022_2204.04420v1,
author = {Wen Hao and Kang Jingsu},
title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
journal = {arXiv preprint arXiv:2204.04420v1},
year = {2022},
month = {4}
}
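The DOIs assigned to arXiv articles use the prefix 10.48550 followed by arXiv.<ID>, as in the doi field of the first output above. A minimal sketch of that mapping for new-style IDs could look as follows; arxiv_to_doi is a hypothetical helper, not part of bib_lookup, and old-style IDs such as hep-th/9901001 are not handled.

```python
import re

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN), optionally followed by a version.
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(?:v\d+)?")


def arxiv_to_doi(arxiv: str) -> str:
    """Map an arXiv ID or abs/pdf URL to a DOI of the form 10.48550/arXiv.<ID>."""
    match = ARXIV_ID.search(arxiv)
    if match is None:
        raise ValueError(f"no new-style arXiv ID found in {arxiv!r}")
    return f"10.48550/arXiv.{match.group(1)}"  # the version suffix is dropped


print(arxiv_to_doi("https://arxiv.org/abs/2204.04420"))
print("https://doi.org/" + arxiv_to_doi("1707.07183v2"))
```

DOIs are case-insensitive, so the upper-case ARXIV in the lookup result refers to the same identifier.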
Bib Items Checking
One can use BibLookup to check the validity (required fields, duplicate labels, etc.) of the bib items in a .bib file. The following example uses a .bib file containing invalid and duplicate bib items.
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl.check_bib_file("./test/invalid_items.bib")
Bib item "He_2016"
starting from line 3 is not valid.
Bib item of entry type "inproceedings" should have the following fields:
['author', 'title', 'booktitle', 'year']
Bib item "Wen_2018"
starting from line 16 is not valid.
Bib item of entry type "article" should have the following fields:
['author', 'title', 'journal', 'year']
Bib items "He_2016" starting from line 3
and "He_2016" starting from line 45 is duplicate.
[3, 16, 45]
or from the command line:
bib-lookup -c ./test/invalid_items.bib
bib-lookup --ignore-fields url doi -i ./test/sample_input.txt -o ./tmp/a.bib -c true
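The kind of check performed above (required fields per entry type, duplicate labels) can be approximated in a few lines. The sketch below is a simplified, hypothetical re-implementation, not bib_lookup's parser: it assumes one field per line, knows only the required fields for article and inproceedings listed in the output above, and returns the starting lines of offending items in the same spirit as the [3, 16, 45] result.

```python
import re
from typing import Dict, List, Set, Tuple

# Required fields for two entry types, taken from the check_bib_file output above;
# a complete checker would cover every BibTeX entry type.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
}
ENTRY_START = re.compile(r"^\s*@\s*(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"^\s*(\w+)\s*=")


def parse_entries(text: str) -> List[Tuple[int, str, str, Set[str]]]:
    """Split text into (start_line, entry_type, label, field_names) tuples."""
    entries: List[Tuple[int, str, str, Set[str]]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        start = ENTRY_START.match(line)
        if start:
            entries.append((lineno, start.group(1).lower(), start.group(2), set()))
            continue
        field = FIELD_NAME.match(line)
        if field and entries:
            entries[-1][3].add(field.group(1).lower())  # attach to the current entry
    return entries


def check_bib_text(text: str) -> List[int]:
    """Return sorted start lines of entries that lack required fields or reuse a label."""
    problems: Set[int] = set()
    first_seen: Dict[str, int] = {}
    for lineno, entry_type, label, fields in parse_entries(text):
        missing = [f for f in REQUIRED_FIELDS.get(entry_type, []) if f not in fields]
        if missing:
            problems.add(lineno)
        if label in first_seen:
            problems.update({first_seen[label], lineno})  # report both duplicates
        else:
            first_seen[label] = lineno
    return sorted(problems)
```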
Simplify a .bib File
To get a clean .bib file without uncited bib items, one can use the static method simplify_bib_file to generate a new .bib file that contains only the cited bib items from an old .bib file.
>>> from bib_lookup import BibLookup
>>> new_bib_file_path = BibLookup.simplify_bib_file("path/to/tex/source/file", "path/to/old/bib/file")
>>> # or use the following if one has multiple source files
>>> new_bib_file_path = BibLookup.simplify_bib_file(list_of_tex_source_files_or_folders, "path/to/old/bib/file")
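Conceptually, simplifying a .bib file means collecting the citation keys used in the TeX sources and keeping only the bib items with those labels. The sketch below shows that idea on in-memory strings; cited_keys, split_entries and simplify_bib_text are hypothetical names, the regular expression covers only a handful of citation commands (\cite, \citep, \citet, \parencite, \textcite, \autocite), and bib_lookup's own simplify_bib_file may work differently.

```python
import re
from typing import Iterable, List, Set

# Matches e.g. \cite{a}, \citep[see][p. 3]{a, b}, \parencite*{c}; not every command.
CITE_PATTERN = re.compile(
    r"\\(?:cite[pt]?|parencite|textcite|autocite)\*?(?:\[[^\]]*\])*\{([^}]*)\}"
)
ENTRY_START = re.compile(r"@\s*\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(tex_sources: Iterable[str]) -> Set[str]:
    """Collect all citation keys used in the given TeX source strings."""
    keys: Set[str] = set()
    for source in tex_sources:
        for group in CITE_PATTERN.findall(source):
            keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


def split_entries(bib_text: str) -> List[str]:
    """Split .bib text into entries at every line that starts with '@'."""
    chunks = re.split(r"(?m)^(?=\s*@)", bib_text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def simplify_bib_text(tex_sources: Iterable[str], bib_text: str) -> str:
    """Keep only the bib entries whose label is cited in the TeX sources."""
    keys = cited_keys(tex_sources)
    kept = []
    for entry in split_entries(bib_text):
        match = ENTRY_START.match(entry)
        if match and match.group(1) in keys:
            kept.append(entry)
    return "\n\n".join(kept) + ("\n" if kept else "")
```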
CitationMixin class
One can inherit from the CitationMixin class to give any class the get_citation method; the only requirement is that the class provides a doi attribute (self.doi). For example:
from bib_lookup import CitationMixin

class SomeClass(CitationMixin):
    doi = "10.23919/cinc53138.2021.9662801"  # can also be a list
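To illustrate the mixin pattern itself, here is a self-contained, hypothetical sketch in which the lookup function is passed in instead of performing a network request. DoiCitationMixin, the lookup parameter and fake_lookup are assumptions made for this example; the real CitationMixin.get_citation queries the online services and its signature may differ.

```python
from typing import Callable, List, Optional, Union


class DoiCitationMixin:
    """Hypothetical mixin: subclasses only need to define a `doi` attribute."""

    doi: Optional[Union[str, List[str]]] = None

    def get_citation(self, lookup: Callable[[str], str]) -> str:
        """Look up every DOI attached to the class and join the results."""
        if not self.doi:
            raise AttributeError(f"{type(self).__name__} does not define `doi`")
        dois = [self.doi] if isinstance(self.doi, str) else list(self.doi)
        return "\n\n".join(lookup(doi) for doi in dois)


class SomeModel(DoiCitationMixin):
    doi = ["10.23919/cinc53138.2021.9662801", "10.1109/CVPR.2016.90"]  # a list of DOIs


def fake_lookup(doi: str) -> str:
    # Stand-in for a network request; returns a placeholder bib entry.
    return f"@misc{{{doi},\n  doi = {{{doi}}}\n}}"


print(SomeModel().get_citation(fake_lookup))
```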
TODO
- (✔) add CLI support;
- use eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi for PubMed, as in [3];
- try using the Google Scholar API described in [4] (unfortunately, [4] is a paid service);
- use Flask to write a simple browser-based UI;
- (✔) check whether the bib item already exists in the output file, and skip saving it if so;
- since arXiv articles are now automatically assigned DOIs (ref. this blog), consider converting arXiv identifiers to DOI identifiers and requesting from DOI. Currently, the request results differ; at least the entry type changes from article to misc;
- make the __call__ method asynchronous using asyncio and aiohttp or httpx.
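For the last TODO item, one possible shape of an asynchronous batch lookup is sketched below using only asyncio: lookups run concurrently via asyncio.gather, bounded by a semaphore. fetch_bib and lookup_many are hypothetical names; fetch_bib merely sleeps, whereas a real version would issue the HTTP request with aiohttp or httpx.

```python
import asyncio
from typing import Dict, List


async def fetch_bib(identifier: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for an HTTP request to DOI/arXiv/PubMed services."""
    await asyncio.sleep(delay)  # simulate network latency
    return f"bib entry for {identifier}"


async def lookup_many(identifiers: List[str], max_concurrency: int = 5) -> Dict[str, str]:
    """Look up all identifiers concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(identifier: str) -> str:
        async with semaphore:  # limit simultaneous requests to the remote services
            return await fetch_bib(identifier)

    # gather preserves input order, so results can be zipped back to identifiers.
    results = await asyncio.gather(*(bounded(i) for i in identifiers))
    return dict(zip(identifiers, results))


if __name__ == "__main__":
    print(asyncio.run(lookup_many(["10.1109/CVPR.2016.90", "1707.07183"])))
```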
WARNING
Many journals have specific requirements for Bib entries; for example, the title and/or journal (and/or booktitle) should be capitalized. This cannot be done automatically, since
- some abbreviations in the title should be entirely in upper case, for example
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- some should be entirely in lower case, for example
mixup: Beyond Empirical Risk Minimization
- and some others should be in mixed case, for example
KeMRE: Knowledge-enhanced Medical Relation Extraction for Chinese Medicine Instructions
Such cases (which are rare) should be corrected manually by the user if necessary; remember to enclose the corrected fields in double curly braces so that BibTeX styles do not change their capitalization.
For example, the lookup result for the AlexNet paper is
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("https://doi.org/10.1145/3065386"))
@article{Krizhevsky_2017,
author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
title = {{ImageNet} classification with deep convolutional neural networks},
journal = {Communications of the {ACM}},
doi = {10.1145/3065386},
year = {2017},
month = {5},
publisher = {Association for Computing Machinery ({ACM})},
volume = {60},
number = {6},
pages = {84--90}
}
The title in this result should be adjusted to
@article{Krizhevsky_2017,
author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
title = {{ImageNet Classification with Deep Convolutional Neural Networks}},
journal = {Communications of the {ACM}},
doi = {10.1145/3065386},
year = {2017},
month = {5},
publisher = {Association for Computing Machinery ({ACM})},
volume = {60},
number = {6},
pages = {84--90}
}
A more severe example that needs manual correction is as follows:
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("10.1093/acprof:oso/9780195058239.001.0001"))
@book{Malmivuo_1995,
author = {Jaakko Malmivuo and Robert Plonsey},
title = {{BioelectromagnetismPrinciples} and Applications of Bioelectric and Biomagnetic Fields},
doi = {10.1093/acprof:oso/9780195058239.001.0001},
year = {1995},
month = {10},
publisher = {Oxford University Press}
}
Adjust it to
@book{Malmivuo_1995,
author = {Jaakko Malmivuo and Robert Plonsey},
title = {{Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields}},
doi = {10.1093/acprof:oso/9780195058239.001.0001},
year = {1995},
month = {10},
publisher = {Oxford University Press}
}
This shows that the data in the DOI database is NOT always correct.
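Once the user has fixed the capitalization by hand, wrapping the corrected value in double curly braces can be scripted. The helper below is a hypothetical sketch (protect_field is not part of bib_lookup); it assumes one field per line, as in the lookup results above, and deliberately does not try to guess the correct capitalization.

```python
import re


def protect_field(entry: str, field: str, corrected_value: str) -> str:
    """Replace `field` in a one-field-per-line BibTeX entry with {{corrected_value}}."""
    # e.g. "  title = {...}," -> group(1) = "  title = ", group(2) = ","
    pattern = re.compile(
        rf"^(\s*{re.escape(field)}\s*=\s*)\{{.*\}}(,?)[ \t]*$", re.MULTILINE
    )
    protected = "{{" + corrected_value + "}}"
    # A function replacement avoids backslash escapes in the corrected value.
    new_entry, count = pattern.subn(
        lambda m: m.group(1) + protected + m.group(2), entry, count=1
    )
    if count == 0:
        raise KeyError(f"field {field!r} not found")
    return new_entry


entry = (
    "@article{Krizhevsky_2017,\n"
    "  title = {{ImageNet} classification with deep convolutional neural networks},\n"
    "  journal = {Communications of the {ACM}},\n"
    "  year = {2017}\n"
    "}"
)
print(protect_field(entry, "title", "ImageNet Classification with Deep Convolutional Neural Networks"))
```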
Biblatex Cheatsheet
This file, downloaded from [6], gives a comprehensive overview of bib entries.
Citation
@misc{https://doi.org/10.5281/zenodo.6435017,
author = {WEN, Hao},
title = {bib\_lookup: A Useful Tool for Uooking Up Bib Entries},
doi = {10.5281/ZENODO.6435017},
url = {https://zenodo.org/record/6435017},
publisher = {Zenodo},
year = {2022},
copyright = {MIT License}
}
The above citation can be obtained via
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("DOI: 10.5281/zenodo.6435017"))
References
[1] https://github.com/davidagraf/doi2bib2
[2] https://arxiv.org/help/api
[3] https://github.com/mfcovington/pubmed-lookup/
[4] https://serpapi.com/google-scholar-cite-api
[5] https://www.bibtex.com/
[6] http://tug.ctan.org/info/biblatex-cheatsheet/biblatex-cheatsheet.pdf