Comment by mks_shuffle

12 hours ago

While this is certainly a welcome step, I hope more work is done on the underlying problem: it is still not easy to create correct BibTeX entries for cited papers. The references in any given paper can come from a wide range of journals with various publishers, conferences, and preprint servers, and the same paper can be available from multiple sources with differing details, e.g. arXiv and the conference website. Tools like Zotero have made it significantly easier to extract citations from publication webpages, but I still find issues with the extracted BibTeX. Author names and titles are usually extracted correctly, but I still have to check manually that details like the publication venue, year, volume, page numbers, and URL were extracted correctly and also render correctly in LaTeX. On top of that, different publications use different citation styles. Without an easy, unified way to get consistent citation data, people may unfortunately take shortcuts with AI-generated citation data. I am not sure whether the hallucinated citations are being generated in the main manuscript or in a separate BibTeX file, so my understanding may be a bit off.

Fun fact: if an article has a DOI, you can just use curl to get a BibTeX entry. An example using one of my articles:

  $ curl -L "https://doi.org/10.47397/tb/43-1/tb133chernoff-widows" -H 'Accept: application/x-bibtex'
  @article{Chernoff_2022, title={Automatically removing widows and orphans with <tt>lua-widow-control</tt>}, volume={43}, ISSN={0896-3207}, url={http://dx.doi.org/10.47397/tb/43-1/tb133chernoff-widows}, DOI={10.47397/tb/43-1/tb133chernoff-widows}, number={1}, journal={TUGboat}, publisher={TeX Users Group}, author={Chernoff, Max}, year={2022}, pages={28–39} }

This is the exact same method that Zotero uses internally, so it will never give you better results than Zotero, but I still find it kinda neat.
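If you want to do this for a whole reference list, the same curl call can be wrapped in a small loop. Below is a minimal sketch, assuming the DOI resolver honours the `Accept: application/x-bibtex` header as in the example above; the script name `doi2bib.sh` and the functions `doi_url` and `fetch_bibtex` are hypothetical, and the entries it returns still need the same manual checking as anything Zotero gives you.

```shell
#!/bin/sh
# doi2bib.sh (hypothetical helper, not part of curl or Zotero):
# fetch a BibTeX entry for each DOI on the command line via DOI
# content negotiation and print the entries to stdout.

# Build the resolver URL, stripping an optional "https://doi.org/",
# "http://dx.doi.org/" or "doi:" prefix from the argument.
doi_url() {
  _d=$1
  _d=${_d#https://doi.org/}
  _d=${_d#http://dx.doi.org/}
  _d=${_d#doi:}
  printf 'https://doi.org/%s\n' "$_d"
}

# Fetch one entry. -L follows the resolver's redirect, -f makes curl
# exit non-zero on HTTP errors (e.g. an unknown DOI), -sS stays quiet
# except for error messages.
fetch_bibtex() {
  curl -sSfL -H 'Accept: application/x-bibtex' "$(doi_url "$1")"
}

for doi in "$@"; do
  if fetch_bibtex "$doi"; then
    printf '\n\n'   # separate entries with a blank line
  else
    printf 'warning: could not fetch BibTeX for %s\n' "$doi" >&2
  fi
done
```

Usage would look like `sh doi2bib.sh 10.47397/tb/43-1/tb133chernoff-widows >> references.bib`. Failed lookups go to stderr, so they don't end up in the .bib file.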