Check out Pandoc Scholar #32

Open
dhimmel opened this issue Jul 19, 2017 · 16 comments

Member

dhimmel commented Jul 19, 2017

Described in Formatting Open Science: agilely creating multiple document formats for academic manuscripts with Pandoc Scholar:

In this article we demonstrate the feasibility of writing scientific manuscripts in plain markdown (MD) text files, which can be easily converted into common publication formats, such as PDF, HTML or EPUB, using Pandoc. The simple syntax of Markdown assures the long-term readability of raw files and the development of software and workflows. We show the implementation of typical elements of scientific manuscripts—formulas, tables, code blocks and citations—and present tools for editing, collaborative writing and version control. We give an example on how to prepare a manuscript with distinct output formats, a DOCX file for submission to a journal, and a LATEX/PDF version for deposition as a PeerJ preprint. Further, we implemented new features for supporting ‘semantic web’ applications, such as the ‘journal article tag suite’—JATS, and the ‘citation typing ontology’—CiTO standard.

The GitHub repo for this project is pandoc-scholar/pandoc-scholar. Created by @tarleb.

Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.

Member Author

dhimmel commented Jul 19, 2017

Also worth checking out gh-publisher -- use case at jakevdp/multiband_LS.

tarleb commented Jul 20, 2017

Hi @dhimmel, thank you for checking out pandoc-scholar!
Your project looks interesting. Is it going to be a long-term effort? If so, I'd advise not to build it on the current pandoc-scholar, but to use the upcoming pandoc version 2 (unofficial nightly builds). The reason for this is that we are going to integrate Lua deeper into pandoc and make more internals accessible to Lua programs. Pandoc-scholar includes some hacks and a complex Makefile-based build system, which mostly won't be necessary with the new pandoc version. As an additional advantage, it will become possible to use pandoc as a Lua interpreter. So instead of requiring users to have bash and Python installed (which is a pain for Windows users), it will be possible to use just pandoc and its integrated Lua interpreter.
Please let me know if that's an option for you and I'll happily help with the pandoc side of things.

Member Author

dhimmel commented Jul 20, 2017

Is it going to be a long-term effort?

Yes.

use the upcoming pandoc version 2

When do you think release will be? Would you recommend using the nightly builds in production?

I'd advise not to build it on the current pandoc-scholar

We're not. Just using pandoc-scholar as a reference. This repository does a few things that are beyond the scope of pandoc-scholar:

  1. Automatic generation of reference metadata as JSON CSL Items.
  2. Use of continuous integration to rebuild and deploy the manuscript upon any changes.
  3. A templating framework that enables dynamically inserting data (in progress).
  4. Timestamping the manuscript using the Bitcoin blockchain during deployments.
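
To make item 1 concrete, here is a minimal sketch of what a single CSL JSON item looks like when assembled in Python. This is not the project's actual implementation: the function name `make_csl_item`, its parameters, and the fixed `article-journal` type are hypothetical choices for illustration, and the values in the usage lines are placeholders.

```python
import json
from datetime import date


def make_csl_item(citation_id, title, authors, issued, doi=None, url=None):
    """Build a minimal CSL JSON item as a plain dict (hypothetical helper).

    `authors` is a list of (family, given) tuples; `issued` is a datetime.date.
    """
    item = {
        "id": citation_id,
        # Assumption: every reference is treated as a journal article.
        "type": "article-journal",
        "title": title,
        "author": [{"family": family, "given": given} for family, given in authors],
        # CSL JSON stores dates as nested lists of [year, month, day].
        "issued": {"date-parts": [[issued.year, issued.month, issued.day]]},
    }
    # Optional identifiers are only added when supplied.
    if doi:
        item["DOI"] = doi
    if url:
        item["URL"] = url
    return item


# Placeholder values, not a real reference.
example = make_csl_item(
    "doe2017", "Example title", [("Doe", "Jane")], date(2017, 7, 19),
    doi="10.1234/example",
)
# A bibliography file is a JSON list of such items.
bibliography_json = json.dumps([example], indent=2)
```

A list of such items, serialized to a file, is the kind of input that can be handed to pandoc's citation processing as a bibliography.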

instead of requiring users to have bash and Python installed (which is a pain for Windows users), it will be possible to use just pandoc and its integrated Lua interpreter

I'm not sure this is the way we want to go. First, most of the project developers are familiar with Python but not Lua. Also, all of our current infrastructure (see items above) is written in Python. We use conda to manage the environment, so we don't anticipate major OS compatibility issues... but you make a good point that our use of shell scripts will likely cause some issues on Windows.

I'll happily help with the pandoc side of things

We're happy to modernize if it fits within the project goals. Based on the above discussion, what do you recommend? It may also help to see the system in use at greenelab/deep-review or greenelab/scihub-manuscript.

Member

agitter commented Jul 20, 2017

We use conda to manage the environment, so we don't anticipate major OS compatibility issues...

Note that even with conda, the current build process only works in Linux due to wkhtmltopdf (see greenelab/deep-review#545). However, because that is all done with continuous integration I don't think that is a major limitation.

Member Author

dhimmel commented Jul 20, 2017

Note that even with conda, the current build process only works in Linux due to wkhtmltopdf

@agitter we're getting wkhtmltopdf from the bioconda channel. We could always submit a PR to add Windows and OS X builds, so this is just a temporary limitation.

tarleb commented Jul 20, 2017

The way you describe it, I agree with you and think that you're making the right technological choices. I guess I misunderstood some details. I was basing pandoc-scholar on Python at first, but had to switch due to our portability requirements. Since that's a non-issue for you, Python is an excellent choice IMHO.
Personally, I'd be using the current pandoc 1.19.2 unless I required features not present in that version. The command line interface won't be changing much, and there is no release timeline for pandoc 2 yet.

Off-topic side note: you might be able to skip the sed command that removes the authors and date h2 by just specifying author-meta and date-meta in the YAML file.
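
A rough sketch of the idea, assuming the default HTML template behaves as I remember it: `author` and `date` are rendered as visible headings in the title block, whereas `author-meta` and `date-meta` only fill the `<meta>` tags in the document head. The helper name `metadata_block` and the example values are hypothetical.

```python
import json


def metadata_block(title, authors, date_string):
    """Return a pandoc YAML metadata block that sets author-meta and date-meta.

    The plain `author` and `date` fields are deliberately left out, so nothing
    needs to be stripped from the rendered title block afterwards.
    """
    # json.dumps yields double-quoted strings, which are also valid YAML scalars.
    lines = ["---", f"title: {json.dumps(title)}", "author-meta:"]
    lines += [f"  - {json.dumps(name)}" for name in authors]
    lines.append(f"date-meta: {json.dumps(date_string)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
block = metadata_block("Example title", ["Jane Doe", "John Roe"], "2017-07-20")
```

The resulting string can be written to the top of the markdown source or to a separate metadata file passed to pandoc alongside the manuscript.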

tarleb commented Jul 20, 2017

You might also be interested in panflute, an excellent library allowing simple modifications of the pandoc document AST.
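
For a feel of what "modifying the AST" means, here is a standard-library-only sketch that walks the JSON produced by `pandoc -t json` and upper-cases every `Str` element. It does not use panflute's API; panflute wraps this kind of traversal in typed element classes. The function names `walk`, `upper_str`, and `run_as_filter` are hypothetical.

```python
import json
import sys


def walk(node, action):
    """Apply `action` to every element dict (one with a "t" key) in a pandoc JSON AST.

    Children are visited before their parent, and a new structure is returned
    rather than modifying the input in place.
    """
    if isinstance(node, list):
        return [walk(child, action) for child in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, action) for key, value in node.items()}
        return action(rebuilt) if "t" in rebuilt else rebuilt
    return node  # strings, numbers, booleans, None


def upper_str(element):
    """Example action: upper-case the text of Str elements, leave others alone."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return element


def run_as_filter():
    """Read a JSON AST on stdin and write the transformed AST to stdout."""
    json.dump(walk(json.load(sys.stdin), upper_str), sys.stdout)
```

Saved as a script that calls `run_as_filter()`, this could sit between `pandoc -t json` and `pandoc -f json` in a pipeline, or be passed to pandoc's `--filter` option.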

dhimmel added a commit to dhimmel/manubot-rootstock that referenced this issue Jul 20, 2017
dhimmel added a commit that referenced this issue Jul 21, 2017
Refs #32 (comment)

Also ignore manuscript.docx output
dhimmel added a commit that referenced this issue Jul 21, 2017
This build is based on
293050c.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/manubot-rootstock/builds/256046149
https://travis-ci.org/greenelab/manubot-rootstock/jobs/256046150

[ci skip]

The full commit message that triggered this build is copied below:

Use author-meta / date-meta to remove sed (#37)

Refs #32 (comment)

Also ignore manuscript.docx output
Member

agitter commented Jul 30, 2017

Texture may also be relevant. The repository is https://github.com/substance/texture.

Member Author

dhimmel commented Jul 30, 2017

Texture may also be relevant

I played around with the demo editor, which was slick although some features haven't been fully implemented yet. One note from this document:

At this initial stage, Texture is being developed to be used by a production team seeking to take the author’s final version of a manuscript and produce production quality JATS for publishing purposes.

Therefore, one route where manubot could work with Texture is if we exported JATS XML. Then a journal may be able to use Texture to refine the manubot produced manuscript. Or we potentially could use the article viewer (Lens Viewer) to display our manuscripts.

In the meantime, I don't think there is a ton of overlap between our project and Texture.

Member

agitter commented May 27, 2019

Therefore, one route where manubot could work with Texture is if we exported JATS XML.

A recent eLife Labs post provides some updates. One relevant part:

We will endeavour to accept submissions of reproducible manuscripts in the form of DAR files by the end of 2019.

DAR files are apparently based on JATS. This is not immediately applicable to Manubot but is worth monitoring. eLife will be a leading journal when it comes to accepting submissions in newer formats.

Member Author

dhimmel commented May 31, 2019

My understanding is that DAR stores the manuscript as JATS, while allowing for the inclusion of other assets like figures, data, and code. For Manubot manuscripts, creating a DAR archive with a JATS manuscript and figures would probably be sufficient. Something to keep in mind when we resume work on #82.

I am less convinced that all data and code should be bundled with manuscripts. I think this breaks down with complex studies whose code and data span many repositories. Therefore, I think it makes sense to initially focus on creating bare-bones DARs that would allow lossless submission of manuscripts to eLife (i.e. no manual formatting or styling steps required).

Member

agitter commented May 31, 2019

I am less convinced that all data and code should be bundled with manuscripts. I think this breaks down with complex studies whose code and data span many repositories. Therefore, I think it makes sense to initially focus on creating bare-bones DARs that would allow lossless submission of manuscripts to eLife

This was my thinking as well

jcolomb commented Jan 16, 2020

Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.

Maybe the integration with jatsxml they are working on ;)

Member Author

dhimmel commented Jan 17, 2020

Maybe the integration with jatsxml they are working on ;)

I updated #82 and will see if Pandoc produces reasonable JATS from our markdown. Are you specifically referring to the jats-cite.lua and jats-fixes.lua filters as well as the pandoc-scholar.jats template? These could be useful (especially the filters). Possibly they will make it into core pandoc... and they're using pandoc-scholar to prototype.

jcolomb commented Jan 17, 2020

I was referring to a Twitter discussion with @tarleb; it seems they are working on that ;)

tarleb commented Jan 17, 2020

Yes, we might merge some of this back into pandoc/pandoc-citeproc, but it might take a while. The filter is a workaround for some shortcomings of the current implementation, but a proper fix would require bigger changes in pandoc-citeproc.

I'll happily keep you updated on our progress.

ploegieku added a commit to ploegieku/2023-functional-homology-paper that referenced this issue Aug 6, 2024