This is an archive of all the posts in the r/rational subreddit in plain-text org-mode.
I personally use it to do fast, offline full-text searches on the whole subreddit.
Reddit does not map cleanly to org-mode, so I am open to ideas on changing the template used to create the org-mode files.
GitHub renders org-mode headings as HTML headers, which makes these files unreadable there. Use an org-mode viewer to read the files, or just open them as plain text.
Online full-text search via Sourcegraph
Sourcegraph’s search engine is optimized for searching code, so it is not ideal for our purposes, but it is still much better than Reddit’s own search.
Here is Sourcegraph’s query syntax. The important points are that it supports regular expressions and that it matches the words in the order given, unless you use boolean operators such as japanese AND horror.
Note that the link above searches the indices directory, where each file contains a single comment. This is usually what you want; its only drawback is that it is tedious to find the comments surrounding a result. To search per submission instead of per comment, use this link, which searches the posts directory.
For offline search, install ugrep (Genivia/ugrep on GitHub), e.g. with Homebrew:
brew install ugrep
Now paste this function into your shell:
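# Recursive boolean search with 3 lines of context, paged through less
# (-R makes less render ugrep's colors instead of showing raw escape codes).
ugc () {
  ugrep --heading --color=always --pretty --context=3 --recursive \
    --bool --smart-case '--sort=best' --no-confirm --perl-regexp --hidden \
    '--binary-files=without-match' "$@" | less -R -n
}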
Now you can do:
git clone --recursive https://github.com/NightMachinary/r_rational
cd r_rational/posts
ugc 'japanese horror'
ugrep also supports an interactive, incremental search mode. The following function uses zsh-specific array syntax:
function ugci {
  # The last argument is the search pattern; any earlier ones are extra ugrep options.
  local r="${@[-1]}" opts=("${@[1,-2]}")
  ugrep --heading --color=always --pretty --context=3 --recursive \
    --bool --smart-case '--sort=best' --no-confirm --perl-regexp --hidden \
    '--binary-files=without-match' "$opts[@]" --query=1 --regexp="$r"
}
ugci 'japanese horror'
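If ugrep is not installed, a much slower fallback can be sketched with find and awk. The helper name and_search is hypothetical and not part of this repo; it only does case-insensitive substring matching on single lines (no regular expressions, smart case, context lines, or sorting), so treat it as a rough stand-in for ugc rather than an equivalent.

```shell
# Hypothetical fallback, not part of this repo: print "path:line:text" for
# every line under the current directory that contains all of the given words
# (case-insensitive substring match, any order). The .git directory is skipped.
and_search() {
  [ "$#" -gt 0 ] || return 2
  find . -name .git -prune -o -type f -exec awk -v words="$*" '
    BEGIN {
      # Split the query on whitespace and lowercase it once.
      n = split(tolower(words), w, " ")
    }
    {
      line = tolower($0)
      # Skip this line as soon as one of the words is missing.
      for (i = 1; i <= n; i++)
        if (index(line, w[i]) == 0) next
      print FILENAME ":" FNR ":" $0
    }' {} +
}
```

Usage, from inside r_rational/posts: and_search japanese horror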
The indices directory saves each comment to its own file, which is very inefficient on modern OSes with a block size of 4KB: even a one-line comment typically occupies a whole 4KB block on disk. If you don’t use these files, deleting them will shrink this repo dramatically (as of this writing, the posts directory is only 163MB). You can also delete the .git directory, but then you would lose access to git features such as pulling new updates.
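The block-size overhead can be estimated with a simple lower bound: each non-empty file uses at least one block, so the file count times the block size is the minimum space the files occupy. The sketch below uses a hypothetical helper name and assumes a 4096-byte block by default (check your filesystem’s actual block size); filesystems that store tiny files inside the inode, such as btrfs or ext4 with inline_data, can beat this bound.

```shell
# Hypothetical helper: min_block_usage DIR [BLOCK_BYTES]
# Prints how many non-empty files are under DIR and a lower bound on the disk
# space they occupy, assuming each takes at least one block (default 4096).
min_block_usage() {
  local dir="$1" block="${2:-4096}" count
  # -size +0 keeps only files with at least one byte; empty files use no block.
  count=$(find "$dir" -type f -size +0 | wc -l)
  # $((count)) also strips the padding some wc implementations add.
  echo "$((count)) files, at least $((count * block / 1024)) KiB"
}
```

Running min_block_usage indices from the repo root gives a figure that can be compared with the output of du -sh indices.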
The easiest way to anonymize the archive is to delete the authors’ names from the data using a search-and-replace tool such as ms-jpq/sad:
fd . | sad '\s*:author:.*' ''
fd . | sad '\bu/\S+' 'u/redacted'
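After running the replacements, a quick check can confirm that nothing was missed. This is a minimal sketch with a hypothetical helper name; it assumes usernames consist of letters, digits, underscores and hyphens, and that the placeholder is u/redacted as in the commands above.

```shell
# Hypothetical helper: check_redacted DIR
# Lists leftovers the two replacements were meant to remove and prints
# "clean" (exit status 0) only when none are found.
check_redacted() {
  local dir="$1" authors mentions
  # Files that still contain an org-mode ":author:" property.
  # "|| true" because grep exits with status 1 when nothing matches.
  authors=$(grep -rl -- ':author:' "$dir" || true)
  # u/<name> mentions at the start of a line or after a non-word character;
  # -o prints only the match, -h drops file names. Then drop the placeholder.
  mentions=$(grep -rhoE -- '(^|[^[:alnum:]_])u/[[:alnum:]_-]+' "$dir" \
    | grep -v 'u/redacted$' || true)
  if [ -n "$authors" ]; then printf 'author property in:\n%s\n' "$authors"; fi
  if [ -n "$mentions" ]; then printf 'unredacted mentions:\n%s\n' "$mentions"; fi
  if [ -z "$authors$mentions" ]; then echo clean; else return 1; fi
}
```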
This repo was created using this script, which needs some refactoring to be decoupled from my environment.
I plan to keep the repo up-to-date as new posts are added to the subreddit.