Releases or build support #8

Closed
joshfinley opened this issue Aug 15, 2023 · 4 comments

Comments

@joshfinley

First of all, this is an amazing project you have put together. You’ve taken a simple static site generator and transformed it into a powerful tool for organizing complex written content and rendering it elegantly.

Because of this, I would love to use this as a platform for my own writing. Judging by the number of forks of this project, I believe others would as well.

That being said, would it be possible for a release version or some sort of build support and usage documentation to be provided?

I understand that gwern.net is a bespoke tool and was never intended for this purpose, but it would be amazing if other people could use it.

@gwern
Owner

gwern commented Sep 27, 2023

This is a WONTFIX because we don't see a release, or widespread use, as either particularly feasible or desirable at this point. The site+codebase is best seen as a prototype and demo to inspire other cleaner implementations.

Gwern.net is a testbed, showcase, and highly-opinionated personal wiki—but it is not finished and the backend is now an atrocious pile of hacks pushing the static site paradigm & Pandoc far beyond where they should be. I am somewhat optimistic that the overall design is stabilizing now that we have good client-side transclusions, but the backend needs a complete rewrite to a real database + dynamic site (ie. a regular wiki or CMS) and a non-Pandoc (and perhaps non-Markdown) language. No one should be trying to use the current backend (including me). The final output is pretty good, but the sausage factory would not pass an FDA inspection.

The JS frontend is reasonably high-quality due to Said Achmiz continually refactoring it (although we continue to work our way through feedback to finetune the experience and deal with endless occasional bugs), and parts of it can be profitably deployed elsewhere, but only parts, because it needs an extensive backend.
And the backend has many major issues.
Problems include:

- persistent issues with HTML↔Markdown↔AST conversion due to Pandoc limitations;
- unscalable approaches like editing a large YAML file to create annotations;
- site compilation now requiring 10+ hours on a beefy workstation due to all the slowdowns (especially from the insanely slow vector search of rp-tree);
- a link metadata schema that desperately needs a rewrite to drop the DOI field, add a creation date, and add a more flexible [(String,String)] association list (to store miscellaneous metadata like the DOI & affiliation);
- rewriting everything from String to Text (constant overhead in source code, development, and runtime)...

All that aside, while I've made some progress on factoring out 'configuration' data, there are still a lot of Gwern.net-hardwired assumptions that any would-be user would keep running into.
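
To make the proposed link metadata change concrete, here is a minimal sketch of a record that drops the dedicated DOI field, adds a creation date, and keeps an open-ended list of key-value pairs. It is written in Python purely for illustration (the actual backend is Haskell), and every name in it (`LinkMetadata`, `misc`, `migrate`, the dictionary keys) is hypothetical rather than taken from the Gwern.net codebase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkMetadata:
    """Hypothetical link-metadata record; field names are illustrative only."""
    url: str
    title: str = ""
    author: str = ""
    date: str = ""      # publication date, e.g. "2023-08-15"
    created: str = ""   # date the entry was added (the proposed new field)
    abstract: str = ""
    # Open-ended key-value pairs, the analogue of a [(String, String)] list.
    misc: list[tuple[str, str]] = field(default_factory=list)

    def lookup(self, key: str) -> Optional[str]:
        """Return the first value stored under key, or None if absent."""
        for k, v in self.misc:
            if k == key:
                return v
        return None


def migrate(old: dict, created: str) -> LinkMetadata:
    """Convert an old-style record with a dedicated 'doi' key to the new shape."""
    misc = [("doi", old["doi"])] if old.get("doi") else []
    return LinkMetadata(
        url=old["url"],
        title=old.get("title", ""),
        author=old.get("author", ""),
        date=old.get("date", ""),
        created=created,
        abstract=old.get("abstract", ""),
        misc=misc,
    )
```

With this shape, a rarely used field such as an affiliation becomes one more pair in `misc` instead of a schema change.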

Indeed, I'm not sure the current backend even can be shipped. Haskell versioning aside (I'm on some old GHC, I think), there are at least two private forks I've had to make that I can recall: removing line IDs from Pandoc skylighting in code blocks, because it has no available override or way to configure it, and adding symlink support to Hakyll for copying files. With 100GB+ of files, it would take a long time to unnecessarily copy everything into the 'compiled' site, and I don't have the disk space now to do that even if I wanted to wait.
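
The symlink idea can be sketched independently of Hakyll: instead of copying each static file into the output tree, create a link that points back at the original. The following is a rough Python illustration under that assumption, not the forked Hakyll code; `link_tree` and its arguments are hypothetical names.

```python
import os
from pathlib import Path


def link_tree(source: Path, output: Path) -> int:
    """Mirror source's directory layout under output, symlinking each file.

    Returns the number of symlinks created. Existing entries are left alone,
    so the function can be re-run after new files are added.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        target_dir = output / Path(dirpath).relative_to(source)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            link = target_dir / name
            if link.exists() or link.is_symlink():
                continue
            # Link to the absolute path of the original file.
            os.symlink((Path(dirpath) / name).resolve(), link)
            created += 1
    return created
```

Only directory entries are written, so the cost scales with the number of files rather than their total size. The sketch assumes `output` is not located inside `source`.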

Had I known it would come to this, I probably would've never tried to do it with Pandoc+Hakyll, but of course, there was no way to know any of that without trying and engaging in constant iteration, so, here we are.

The current 'plan', such as it is, is to just keep going and incrementally patch issues until everything stabilizes and hopefully we get clarity on what the right architecture will be for a proper rewrite. I've had some thoughts about whether it could be done in org-mode, and I've been outlining some ideas about how to redesign personal wikis + text editors from the ground up for the new neural net age which might supersede Gwern.net entirely at some point.

@joshfinley
Author

joshfinley commented Sep 28, 2023

Hello,

Thank you for sharing such a comprehensive look into the backend complexities of your project. I appreciate the candor; it’s certainly not uncommon to see an ambitious endeavor spiral into a tangle of technical challenges.

That being said, your site has several standout features. From design elements like a minimalist aesthetic and marginal footnotes, to functional components like archiving and automated link / PLOS/PMCID abstract extraction—these are features that I and many other content creators would find beneficial for producing long-form content in a static blog format.

Given the intricacies you've pointed out, it got me thinking: could some of the simpler features be distilled into a more straightforward static site project? Perhaps a Hugo theme or an extension for another static site generator? It's a thought I'm throwing into the wind, but I feel that some of the non-backend-heavy features could find life in other projects. This is something I may explore if I can find the time.

Thanks again for your insights, and best of luck with the future of Gwern.net and your ongoing exploration of new technologies.

@gwern
Owner

gwern commented Sep 29, 2023

The sidenotes/margin-notes I believe can be reused more or less as-is. The sidenotes JS should be standalone, and the margin notes are nothing but a simple <span class="marginnote">margin note</span> wrapper that anyone can write in a Markdown file (or using the Pandoc span syntax, [margin note]{.marginnote}) without further ado.
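
For readers not using Pandoc, the equivalence between the two forms above can be illustrated with a small rewrite step. This is only a sketch in Python: Pandoc performs the conversion itself, the helper name `expand_marginnotes` is hypothetical, and the pattern deliberately handles only the simple case with no nested brackets.

```python
import re

# Matches the bracketed-span form [text]{.marginnote}, with no nested brackets.
MARGINNOTE = re.compile(r"\[([^\[\]]+)\]\{\.marginnote\}")


def expand_marginnotes(markdown: str) -> str:
    """Rewrite [text]{.marginnote} into the equivalent raw HTML span.

    The note text is passed through unchanged; any inline markup inside it
    is left for the Markdown processor to handle.
    """
    return MARGINNOTE.sub(
        lambda m: f'<span class="marginnote">{m.group(1)}</span>',
        markdown,
    )
```

Running this as a preprocessing pass would let a generator without span attributes accept the shorter syntax while emitting the same HTML wrapper.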

> functional components like archiving and automated link / PLOS/PMCID abstract extraction

These could probably be split out of the backend relatively independently, but they are in the uncanny valley of being mostly glue code/special-cases/'schlep'. That makes it tricky to make them useful in general without a lot of overhead or architecture astronauting.

For example, the archiving code right now is much more specialized than, say, ArchiveBox, because the important parts are the logic of the whitelist and manual review to ensure quality and bookkeeping of what's been downloaded where.
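
The bookkeeping described here might look roughly like the following ledger. This is a guess at the shape of the problem, not the real archiving code: it assumes the whitelist is a set of domains that are never mirrored locally, and the names `ArchiveLedger`, `skip_domains`, `pending`, and `approve` are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class ArchiveLedger:
    """Hypothetical record of which URLs are skipped, pending, or archived."""
    skip_domains: set[str]  # assumed whitelist: hosts that are never mirrored
    archived: dict[str, str] = field(default_factory=dict)  # URL -> reviewed local path
    pending: dict[str, str] = field(default_factory=dict)   # URL -> path awaiting review

    def status(self, url: str) -> str:
        """Classify a URL as 'skip', 'archived', 'pending-review', or 'todo'."""
        host = urlsplit(url).hostname or ""
        if host in self.skip_domains:
            return "skip"
        if url in self.archived:
            return "archived"
        if url in self.pending:
            return "pending-review"
        return "todo"

    def record_download(self, url: str, path: str) -> None:
        """Note a fresh snapshot; it is not served until a human approves it."""
        self.pending[url] = path

    def approve(self, url: str) -> None:
        """Manual review passed: promote the snapshot to the archived set."""
        self.archived[url] = self.pending.pop(url)

    def resolve(self, url: str) -> str:
        """Return the reviewed local copy if one exists, else the original URL."""
        return self.archived.get(url, url)
```

Even in this toy form, most of the value sits in the review step and the ledger rather than in the download itself.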

And the PLOS/PMC code mostly outsources the work to a suite of R libraries, which I regret to say are bitrotting and are probably going to break outright in a few years as they are now unmaintained, so there's not all that much value to splitting them out. (EDIT: they didn't even make it through 2024 before all breaking.)

> Given the intricacies you've pointed out, it got me thinking: could some of the simpler features be distilled into a more straightforward static site project?

It's hard to say. Most of the features build on each other. If you have references and footnotes, you want to be able to quickly view them; hence the whole popup system to begin with; if you have many references, then to be useful they need to not be asymptotically approaching 100% dead links everywhere, hence the archive system; with many different reference sources, it helps to annotate them by domain/filetype (if nothing else, to warn readers about PDFs, but the easiest way to do that is to just archive each URL and see if it turned out to be a PDF or not, as URLs are often misleading about the final filetype); if you have copies of references then you can view them in popups, but the experience of squinting at a PDF inside a popup is not great, especially when the abstract is often typeset even smaller, so you want to display the abstract at full size, so you get the annotation system, and of course you don't want to write them by hand, so you automate sources like Arxiv or Wikipedia; the more annotations you have, the more useful it becomes to collect them under tags, as otherwise you find yourself building up long lists of citations by hand and engaging in lots of copy-paste, so you need tag-directories and a tag metadata system; if they are hyperlinked to each other in addition to the tags, then the reverse citations become important, so you need bidirectional backlinks; and so on and so forth. If you try to stop partway, it's obviously bad. Like, you could have popups & annotations but only for within-website essays and Wikipedia and Arxiv, but then that's problematic for users because they have to learn that a very small subset of links will pop up and have annotations, and then why not all the others...?
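
Two links in that chain are small enough to sketch in isolation: deriving backlinks by inverting the forward-link map, and deciding whether an archived file is a PDF from its content rather than its URL. The Python below is an illustration under those assumptions, not the site's implementation; `backlinks` and `looks_like_pdf` are hypothetical names.

```python
from collections import defaultdict


def backlinks(forward: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a page -> outgoing-links map into target -> sorted linking pages."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for page, targets in forward.items():
        for target in targets:
            if target != page:  # ignore self-links
                reverse[target].add(page)
    return {target: sorted(pages) for target, pages in reverse.items()}


def looks_like_pdf(first_bytes: bytes) -> bool:
    """Check the downloaded content itself, since the URL may mislead.

    PDF files begin with the marker "%PDF-".
    """
    return first_bytes.startswith(b"%PDF-")
```

Both functions depend on data produced by earlier stages (the full link graph, the archived file), which is the coupling that makes stopping partway unattractive.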

You can borrow the theme on its own, but to be honest, I consider that to be one of the least important parts. There are lots of minimalist themes and there's no arguing taste, so it's not really important to make a 'Gwern.net template' which has a bunch of black-and-white and boxes and a dropcap or SVG flourish here or there. (If anything, I think people should design & customize their own theme to express themselves rather than just copying my theme.)

@gwern
Owner

gwern commented Aug 19, 2024

Update on the refactoring: as of August 2024, the YAML files have been replaced with a lighter-weight novel 'GTX' format which has many fewer papercuts; the link metadata schema has been fixed; most of the hardwired configuration information has been split out to Config.* and tests added to keep them sane; the skylighting fork is no longer an issue (I apparently replaced it with a sed rewrite at some point...?), although the Hakyll symlink compiler still is an issue; I was forced to upgrade to a recent GHC by my workstation motherboard/CPU frying itself, so that bitrot is less of an issue; but the long-dreamt-of String->Text migration still remains a fantasy. Overall, the backend is still a dumpster fire, but at least it's no longer quite so raging.
