Plotting manuscript progression for methods manuscript #952
Initial prototype is working 👍 Thanks for the point in the right direction, @agitter, those JSON suggestions make it WAY easier than what I was imagining (which involved a lot of regex). This is obviously EXTREMELY ROUGH and needs to be visually cleaned up in basically every way, but it is a graph of the data!
That's amazing! We should be able to flip the order of the dates and show fewer x-axis ticks (e.g. monthly) without too much trouble. Can we account for the big spikes in word count? My first guess is that the initial big spike was adding the reviews as an appendix. Then the other sharp increase and decrease could be when you duplicated text to convert from a single paper to multiple papers, but I don't know whether the timing matches. Nevertheless, having the data plotted is very cool and helps make the point that a git-managed manuscript enables lots of inspection and analysis that is impossible with a typical writing process. Maybe not "impossible" with LaTeX, but it would be painful to analyze every commit without having these stats ready to go in JSON files.
Yes! The first one is when we added the appendix and the second one is likely when we accidentally duplicated the appendix 😆 Unfortunately this makes it super clear that it sat there duplicated for a long time before anyone noticed! I think most of the text I duplicated for the manuscript splitting process is still duplicated (since I use blame pretty heavily while adding the attributions of text that I moved between documents!)
For the ACM-BCB submission we could plot the following manuscript statistics over time:
All of these are available in the files variables.json and references.json in the output branch of the repo. Some quick Python experimentation shows how to access these values and the corresponding date:
I don't know the most efficient way to get these for every commit in the output branch. However
dumps a list of all commits to a text file that we could iterate over.
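As a rough illustration of reading the date and a statistic such as word count out of variables.json, here is a minimal sketch. The key names (`manubot`, `manuscript_stats`, `word_count`, `date`) are assumptions about the file layout and are not confirmed anywhere in this thread, and `extract_stats` is a hypothetical helper name, so check them against a real variables.json before relying on this.

```python
import json


def extract_stats(variables_text: str) -> dict:
    """Return the date and word count from the text of a variables.json file.

    Assumption: the values live either at the top level of the JSON object
    or under a "manubot" key; missing values come back as None.
    """
    data = json.loads(variables_text)
    # Fall back to the top-level object when there is no "manubot" key.
    scope = data.get("manubot", data)
    stats = scope.get("manuscript_stats", {})
    return {
        "date": scope.get("date"),
        "word_count": stats.get("word_count"),
    }
```

Returning None for missing keys rather than raising should make it easier to run over old commits whose files may predate some of the statistics.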
Pseudocode for an algorithm could look like:
Doing this with a Python script would be messy due to the subprocess calls to issue git commands, but it's possible. I don't know the GitPython package well enough to do it that way. For example
will check out a specific commit from the output branch.
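One possible shape for the subprocess-based approach is sketched below. It is not the pseudocode from this issue: instead of checking out each commit, it swaps in `git show <commit>:<path>` to read the file contents directly, which leaves the working tree untouched. The function names (`git`, `list_commits`, `read_at_commit`, `history`) are hypothetical, and the branch name `output` is taken from the description above.

```python
import json
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple


def git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def list_commits(repo: str, branch: str = "output") -> List[str]:
    """Return the commit hashes on `branch`, oldest first."""
    return git(repo, "log", "--reverse", "--format=%H", branch).split()


def read_at_commit(repo: str, commit: str, path: str) -> Optional[str]:
    """Return the contents of `path` at `commit`, or None if git cannot show it."""
    try:
        return git(repo, "show", f"{commit}:{path}")
    except subprocess.CalledProcessError:
        return None


def history(
    commits: Iterable[str],
    read_file: Callable[[str], Optional[str]],
) -> List[Tuple[str, dict]]:
    """Pair each commit with its parsed JSON, skipping commits without the file."""
    rows = []
    for commit in commits:
        text = read_file(commit)
        if text is not None:
            rows.append((commit, json.loads(text)))
    return rows


# Example usage (assumes "." is a clone that has the output branch locally):
# rows = history(
#     list_commits("."),
#     lambda commit: read_at_commit(".", commit, "variables.json"),
# )
```

Keeping `history` independent of git makes it easy to try out on a handful of commits, or with a fake reader, before running it over the whole branch.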