Pre-commit hook for mirroring Word (docx) files into plain text files (using Pandoc).
This pre-commit hook is for organizations that manage Word (.docx) documents with Git and GitHub. Whenever a Word document is committed or updated in a Git repository, the hook also creates a plain text version of it. Git and GitHub cannot show meaningful line-by-line diffs of binary .docx files, so this plain text mirror makes changes reviewable in GitHub pull requests.
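Conceptually, the mirroring step amounts to running Pandoc's docx reader with its plain text writer. The following is a minimal Python sketch under that assumption. It is not the hook's actual implementation, and the function names `pandoc_command` and `mirror_docx` are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pandoc_command(docx_path: Path, txt_path: Path) -> list[str]:
    """Build a Pandoc command line that converts a .docx file to plain text."""
    return [
        "pandoc",
        "--from", "docx",           # read Word's Office Open XML format
        "--to", "plain",            # write Pandoc's plain text output
        "--output", str(txt_path),  # destination mirror file
        str(docx_path),             # source Word document
    ]


def mirror_docx(docx_path: Path, txt_path: Path) -> None:
    """Run Pandoc (which must be on PATH) to write the plain text mirror."""
    subprocess.run(pandoc_command(docx_path, txt_path), check=True)
```

For example, `mirror_docx(Path("document.docx"), Path("document.txt"))` would write `document.txt` next to the Word file, raising `subprocess.CalledProcessError` if Pandoc fails.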
At the root of your document's Git repository, add a file named `.pre-commit-config.yaml` with the following contents:

```yaml
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
```
Next, you'll need to install pre-commit (if you haven't already):
```shell
pip install -U pre-commit
```
Initialize the pre-commit hooks in the repository itself:
```shell
pre-commit install
```
If the repository has an existing Word document, it is a good idea to create the mirrored plain text file now:
```shell
pre-commit run --all-files
```
Commit the generated plain text (`.txt`) file.
If you are contributing to a repository that uses `pre-commit-docx-plain`, you also need to install pre-commit itself and install the pre-commit hooks in your local clone of the repository:

```shell
pip install -U pre-commit
pre-commit install
```
Now, when you update and commit changes to the Word file in your repository, pre-commit runs the `pre-commit-docx-plain` hook, which generates a new or updated plain text mirror of the file. Because the hook changed a file, this first commit attempt does not go through.

Use `git add` to stage the plain text file and run `git commit` again. On this second try, the plain text mirror file should be in sync with the Word file, and the commit can go ahead.
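pre-commit marks a hook as failed when it modifies any files or exits with a non-zero status, which is why the first commit attempt stops. The minimal Python sketch below illustrates the regenerate-and-compare logic under the assumption that the hook only rewrites a mirror whose content differs; it also returns a non-zero status when it rewrites a file, so it would fail even when run outside pre-commit. The names `sync_mirror`, `main`, and `convert` (a stand-in for the Pandoc step) are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def sync_mirror(txt_path: Path, new_text: str) -> bool:
    """Write new_text to txt_path if it differs; return True if the file changed."""
    if txt_path.exists() and txt_path.read_text(encoding="utf-8") == new_text:
        return False  # mirror already in sync; leave the file untouched
    txt_path.write_text(new_text, encoding="utf-8")
    return True


def main(docx_paths: list[Path], convert: Callable[[Path], str]) -> int:
    """Return 1 (failing the commit) if any mirror was created or updated."""
    changed = False
    for docx_path in docx_paths:
        txt_path = docx_path.with_suffix(".txt")
        # Call sync_mirror first so every file is processed, not just the first.
        changed = sync_mirror(txt_path, convert(docx_path)) or changed
    return 1 if changed else 0
```

In a real hook, `convert` would run Pandoc, and `main` would receive the .docx file names that pre-commit passes to the hook as arguments.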
You can run `pre-commit-docx-plain` in GitHub Actions to ensure that the plain text mirror file is always up to date.
If the repository does not already have a GitHub Actions workflow, create a file at `.github/workflows/ci.yaml` with the following contents:
```yaml
name: CI

'on':
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2

      - name: Install pandoc
        run: brew install pandoc

      - name: Run pre-commit hooks
        uses: pre-commit/action@<version>
```
This workflow reports a failed check if the plain text mirror file is out of date with the Word file in the repository, as might happen if a contributor did not install pre-commit locally.
To spare contributors from installing pre-commit locally, you can instead configure the GitHub Actions workflow to generate, commit, and push updates to the plain text mirror automatically.
The `.github/workflows/ci.yaml` file for this setup:
```yaml
name: CI

'on':
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v2

      - name: Install pandoc
        run: brew install pandoc

      - name: Run pre-commit hooks
        uses: pre-commit/action@<version>
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
```
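Note that this workflow is only suitable for private repositories. In workflows triggered by pull requests from public forks, the `GITHUB_TOKEN` secret has read-only permissions, so the workflow cannot push the updated mirror.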
When using this workflow, contributors need to either pull the plain text file update into their local branch or be prepared to force push (`git push --force`), because their branch is "behind" the GitHub origin.
This pre-commit hook works out of the box, but it also supports some customization.
By default, if the Word file is named `document.docx`, the plain text mirror file is named `document.txt`.
However, you can customize the suffix of the file name with the `--suffix` command-line option:
```yaml
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
        args:
          - "--suffix"
          - ".extracted.txt"
```
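The naming rule can be sketched as replacing the `.docx` extension with the configured suffix. This assumes the hook simply swaps the extension, which matches the `document.docx` to `document.txt` default; `mirror_path` is a hypothetical helper, not part of the hook's documented interface.

```python
from pathlib import Path


def mirror_path(docx_path: Path, suffix: str = ".txt") -> Path:
    """Return the mirror file path: the .docx file's stem plus the suffix."""
    # Path.stem drops only the final extension, so "document.docx" -> "document".
    return docx_path.with_name(docx_path.stem + suffix)
```

With the configuration above, `mirror_path(Path("docs/document.docx"), ".extracted.txt")` gives `docs/document.extracted.txt`, keeping the mirror in the same directory as its source.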
You can add a header to the plain text file's content with the `--header` command-line option. This is useful for explaining that the file is autogenerated:
```yaml
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
        args:
          - "--header"
          - "THIS FILE IS AUTOGENERATED"
```
You can also insert the name of the source .docx file using Python format string syntax and the `docx` template variable:
```yaml
repos:
  - repo: https://github.com/jsickcodes/pre-commit-docx-plain
    rev: 0.3.0
    hooks:
      - id: docxplain
        args:
          - "--header"
          - "This file is autogenerated from {docx}. Do not edit."
```
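Because the header is a Python format string, filling in `{docx}` works like `str.format`. The sketch below assumes the hook passes the source file name as the `docx` keyword and places the header before the converted text; `render_header` and `add_header` are hypothetical names, and the blank separator line is an illustrative choice rather than documented behavior.

```python
def render_header(template: str, docx_name: str) -> str:
    """Fill the {docx} placeholder in a --header template."""
    # Templates without {docx} pass through unchanged; unused keywords are ignored.
    return template.format(docx=docx_name)


def add_header(header: str, body: str) -> str:
    """Prepend the header, separated from the body by a blank line."""
    return f"{header}\n\n{body}"


header = render_header("This file is autogenerated from {docx}. Do not edit.", "document.docx")
print(header)  # This file is autogenerated from document.docx. Do not edit.
```

If the hook does use `str.format`, any literal braces in a header must be doubled (`{{` and `}}`), otherwise formatting raises an error.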
To make a release, start from a pull request that does the following:

- Update the changelog.
- Update the version numbers in the `.pre-commit-config.yaml` code samples in the README.
- Update the version in `setup.cfg`.

Next, merge the PR into the `main` branch once checks pass.

Finally, create a release from the `main` branch using the GitHub Release UI. The tag name should be the semantic version set in the first step.
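Because the version number appears in several places, a small script can confirm that they agree before tagging. The following is a hypothetical helper, not part of the repository; it assumes `setup.cfg` stores the version as `version = X.Y.Z` under `[metadata]` and that the README samples use `rev: X.Y.Z` lines.

```python
from __future__ import annotations

import configparser
import re


def setup_cfg_version(setup_cfg_text: str) -> str:
    """Read the [metadata] version from the contents of setup.cfg."""
    parser = configparser.ConfigParser(interpolation=None)  # treat % literally
    parser.read_string(setup_cfg_text)
    return parser["metadata"]["version"]


def readme_revs(readme_text: str) -> set[str]:
    """Collect every `rev:` value used in the README's config samples."""
    return set(re.findall(r"^\s*rev:\s*(\S+)", readme_text, flags=re.MULTILINE))


def versions_agree(setup_cfg_text: str, readme_text: str) -> bool:
    """True if the README has rev values and all of them match setup.cfg."""
    return readme_revs(readme_text) == {setup_cfg_version(setup_cfg_text)}
```

For example, `versions_agree(Path("setup.cfg").read_text(), Path("README.md").read_text())` (with `pathlib.Path`, and assuming the README file is named `README.md`) returns `False` if any code sample still shows the previous version.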
pre-commit-docx-plain is developed and maintained by J.Sick Codes Inc.