markdown2dita is a lightweight tool written in pure Python to convert content written in markdown to dita-ot.
It relies heavily on the mistune library (which is fantastic to work with!) to parse the markdown.
markdown2dita should work with Python 2 versions 2.6+ and Python 3 versions 3.3+.
markdown2dita can be installed via the pip package manager:
pip install markdown2dita
markdown2dita can be used locally as follows:
- Clone the repository:
git clone https://github.com/mattcarabine/markdown2dita.git
- Install the requirements:
pip install -r requirements.txt
markdown2dita is designed to be used either as a Python package or as a simple command line tool.
The CLI tool takes markdown as input from either a file or stdin and outputs the equivalent dita to either a file or stdout.
usage: markdown2dita [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]

markdown2dita - a markdown to dita-ot CLI conversion tool.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        input markdown file to be converted.If omitted, input
                        is taken from stdin.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output file for the converted dita content.If omitted,
                        output is sent to stdout.
The example below demonstrates how to convert a given input file, test.md:
markdown2dita -i test.md
The example below demonstrates how to use markdown2dita to convert markdown provided via stdin to dita:
echo '**My** `markdown` *string*' | markdown2dita
The below example demonstrates how to send the dita output to a given file:
markdown2dita -o test.dita
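Since both flags are optional, they can be combined, and the CLI can also be driven from a Python script. The following is a minimal sketch, assuming markdown2dita is installed and available on the PATH. build_command and convert_with_cli are hypothetical helper names, not part of the package, and only the -i and -o flags from the usage text above are used:

```python
import subprocess


def build_command(input_file=None, output_file=None):
    """Build the argument list for the markdown2dita CLI.

    Hypothetical helper: it only assembles the documented -i / -o flags.
    """
    command = ["markdown2dita"]
    if input_file is not None:
        command += ["-i", str(input_file)]
    if output_file is not None:
        command += ["-o", str(output_file)]
    return command


def convert_with_cli(markdown_text=None, input_file=None, output_file=None):
    """Run the CLI and return whatever it printed to stdout.

    Assumes the markdown2dita executable is on the PATH (Python 3.7+ for
    capture_output). Pass markdown_text to feed stdin when no input file
    is given.
    """
    result = subprocess.run(
        build_command(input_file, output_file),
        input=markdown_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return result.stdout
```

For example, convert_with_cli(input_file='test.md', output_file='test.dita') runs the equivalent of markdown2dita -i test.md -o test.dita.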
markdown2dita attempts to be API-compatible with mistune as much as possible, meaning that code written for mistune can use markdown2dita as a drop-in replacement to generate dita rather than html.
A simple API that renders markdown formatted text as dita:
import markdown2dita
markdown2dita.markdown('I am using **markdown2dita markdown converter**')
If you care about performance, it is better to re-use the Markdown instance:
import markdown2dita
markdown = markdown2dita.Markdown()
markdown('I am using **markdown2dita markdown converter**')
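As a rough illustration of that re-use, the sketch below converts several strings with a single Markdown instance. convert_all is a hypothetical helper name, not part of the package. The converter is passed in as a parameter so the function can be tried with any callable, and it falls back to markdown2dita.Markdown() (assumed to be installed) when none is given:

```python
def convert_all(texts, converter=None):
    """Convert each markdown string in `texts` using one shared converter.

    Hypothetical helper. If no converter is supplied, a single
    markdown2dita.Markdown() instance is created once and re-used
    for every string, rather than being rebuilt per call.
    """
    if converter is None:
        import markdown2dita  # assumption: the package is installed

        converter = markdown2dita.Markdown()
    return [converter(text) for text in texts]
```

Calling convert_all(['# One', 'Some **bold** text']) would then build the instance once and return one dita string per input.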
Much like mistune, you can do things like override the renderer. However, I would not recommend doing so, as this tool simply aims to be a dita converter.
If you wish to alter the output of the markdown conversion, I recommend checking out mistune itself.
Certain elements from markdown do not exist in dita. Below is a list of them, along with an explanation of how the converter handles each one:
- block quote (>): dita-ot does not have the capability for block quotes, so the converter outputs them as codeblocks instead, which creates a similar visual effect.
- strikethrough (~strikethrough~): dita-ot does not have the capability to display strikethroughs, so the converter includes the struck-through text as plain text.
- inline html: dita-ot does not really support inline html, so the converter passes the html through as-is.
- footnotes: footnotes are completely ignored by the converter.
- hrule: dita-ot does not support horizontal rules, so the converter ignores any in the input text.
- headings: dita-ot only supports a single level of heading. Therefore each section is split on H2 and above (where the heading is the section title). This can be configured by passing the option title_level to the markdown initializer. For example, to set all headings H4 and above to be the section titles:
import markdown2dita
markdown = markdown2dita.Markdown(title_level=4)
markdown(text)
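To make the effect of title_level concrete, here is a small standalone sketch that lists which headings in a document would count as section titles for a given level. It is an illustration only, not the converter's implementation: section_titles is a hypothetical function, the default of 2 is inferred from the "H2 and above" behaviour described above, and it only recognises #-style headings.

```python
import re

# Matches "#"-style headings such as "## Install" (1 to 6 leading hashes).
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*#*[ \t]*$")


def section_titles(markdown_text, title_level=2):
    """Return the text of every heading at `title_level` or above.

    Hypothetical illustration: "above" means a smaller number of hashes,
    so title_level=2 keeps H1 and H2 and skips H3 to H6.
    """
    titles = []
    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match and len(match.group(1)) <= title_level:
            titles.append(match.group(2))
    return titles


text = "# Guide\n\nintro\n\n## Install\n\n### From source\n\n## Usage\n"
print(section_titles(text))                 # headings up to H2
print(section_titles(text, title_level=4))  # headings up to H4
```

With the default level the H3 heading "From source" is not treated as a section title; with title_level=4 it is.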
If you find any issues not described above or have any feedback then please raise a GitHub issue and I will take a look!
- add tests: the tool has been extensively tested by running known markdown texts through it and visually inspecting the output. It would be great if we could find a better way to validate that the generated dita is correct (perhaps converting from md -> html -> dita?)
- add ability to automatically parse metadata: currently the page type is set to concept and the title/short description are not filled in. It would be good to do something like jekyll does and automatically parse the top block for metadata, then use that to fill in the other parts of the dita output.
- add batch processing to CLI tool: currently you have to convert each file one command at a time, which may be arduous for large migrations where a whole directory structure is filled with markdown files to be converted. The CLI tool should be able to handle this when you give it a directory (possibly with a --recursive option?)
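Until batch processing exists in the CLI, something similar can be approximated from Python. The sketch below is one possible approach, not a feature of the tool: convert_tree and its parameters are hypothetical names, the .md and .dita file extensions are assumptions, and markdown2dita is assumed to be installed when no converter is passed in.

```python
from pathlib import Path


def convert_tree(source_dir, output_dir, converter=None, recursive=True):
    """Convert every .md file under source_dir into a .dita file in output_dir.

    Hypothetical helper. The directory layout of source_dir is mirrored in
    output_dir. Returns the list of files written.
    """
    if converter is None:
        import markdown2dita  # assumption: the package is installed

        converter = markdown2dita.Markdown()  # one instance for all files

    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    pattern = "**/*.md" if recursive else "*.md"

    written = []
    for md_path in sorted(source_dir.glob(pattern)):
        # Keep the relative path, swapping the extension for .dita.
        target = (output_dir / md_path.relative_to(source_dir)).with_suffix(".dita")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            converter(md_path.read_text(encoding="utf-8")), encoding="utf-8"
        )
        written.append(target)
    return written
```

For example, convert_tree('docs', 'dita_out') would walk docs recursively, while recursive=False limits the conversion to files directly inside it.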