NEI plaintext notebook syntax

Goals

NEI always tries to satisfy these three goals:

Handle any existing ipynb file correctly.
Present notebooks in the editor buffer as valid Python syntax.
Minimize the diffs applied to opened ipynb files. If you open an ipynb file in NEI and then save it, there should be no diffs. In general, the diffs you see in an ipynb file should directly correspond only to the edits you made in the NEI buffer.

As vanilla Python has no concept of code or markdown cells, some new conventions are required to demarcate these boundaries. Two conventions give NEI hints about cell boundaries:

Code cells

# In[ ]
<code cell>

Here all code below the prompt line is part of that code cell until a markdown cell is detected or another code prompt is used.

Markdown cells

"""
<markdown cell>
""" #:md:

A markdown cell is an unindented triple quoted string where the triple quotes start on a new line. The top triple quote must have no trailing content, while the bottom boundary must be followed by the comment #:md: (after a single space) and no other trailing characters.

A triple quoted string is an NEI markdown cell only if its bottom boundary carries the #:md: marker; otherwise it is a normal Python triple quoted string. NEI locates a markdown cell by starting from such an annotated line and looking backwards for the first top """ boundary (again, on a new line with no trailing content).
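The backwards search described above can be sketched as follows. This is a minimal illustration and not NEI's actual implementation: the function name find_markdown_cells and the choice to stop the search at the previous markdown cell are assumptions made for the example.

```python
MD_OPEN = '"""'          # top boundary: bare triple quote, no trailing content
MD_CLOSE = '""" #:md:'   # bottom boundary: triple quote, one space, marker


def find_markdown_cells(buffer_text):
    """Return (start, end) line indices of markdown cells (hypothetical helper)."""
    lines = buffer_text.split("\n")
    cells = []
    floor = -1  # assumption: never search back past the previous markdown cell
    for end, line in enumerate(lines):
        if line != MD_CLOSE:
            continue
        # Look backwards from the annotated line for the first top boundary.
        for start in range(end - 1, floor, -1):
            if lines[start] == MD_OPEN:
                cells.append((start, end))
                floor = end
                break
    return cells


example = "\n".join([
    '"""',
    "# Title",
    '""" #:md:',
    'doc = """',
    "a normal Python string, not a markdown cell",
    '"""',
])
print(find_markdown_cells(example))
```

Only the string closed by the annotated line is reported; the unmarked triple quoted string remains ordinary Python.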

Both these conventions are valid Python. If these hints are missing from a .py file loaded in NEI, the file can still be loaded as a notebook but information about cell boundaries will be missing.
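To illustrate that both conventions are valid Python, the sketch below parses and runs a small buffer with the standard ast module. The buffer contents and the variable names are made up for the example.

```python
import ast
import math

# A hypothetical NEI buffer with two code cells and one markdown cell.
BUFFER_TEXT = '''# In[ ]
import math
radius = 2.0

"""
# Area of a circle
The next cell computes the area.
""" #:md:

# In[ ]
area = math.pi * radius ** 2
'''

# The prompts are comments and the markdown cell is a bare string expression,
# so the whole buffer parses and executes as ordinary Python.
tree = ast.parse(BUFFER_TEXT)
namespace = {}
exec(compile(tree, "<nei-buffer>", "exec"), namespace)
print([type(node).__name__ for node in tree.body])
```

The markdown cell shows up in the syntax tree as an expression statement holding a string constant, which Python evaluates and discards.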

This scheme is sufficient for most notebooks but the mapping from JSON ipynb to a Python plaintext file cannot be perfect. There are two corner cases that need to be highlighted: escaping of triple quotes in markdown cells (next section) and how the NEI annotations are handled when present within the markdown and code cells of an ipynb file.

Escaping quotes

SUMMARY: When a markdown block is viewed in the editor buffer, NEI escapes any triple double quotes (i.e. """) by converting them to double quotes separated by zero width spaces (equivalent to the Python string '\"\u200b\"\u200b\"'). The escaped quotes are only present in the editor buffer, not in the markdown rendering or in the saved ipynb file (they are converted back to regular triple quotes before saving). When you type triple quotes inside a markdown cell in the editor buffer, you can escape them in the normal way (i.e. using \"\"\") to preserve valid Python syntax, or in the same way NEI does (which looks better) using nei-insert-escaped-triple-quotes or the menu (NEI -> Buffer -> Insert escaped triple quotes).

Details and rationale

This approach places no restrictions on the strings you can use inside markdown cells. The only potential problem arises if you want to preserve zero width space separated quotes in the saved markdown source. This is unlikely, as there is no reason to use them in the classic notebook, where they look and behave identically to normal triple quotes.
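The escaping round trip, and the one lossy case mentioned above, can be sketched as below. The function names escape_for_buffer and unescape_for_save are hypothetical; only the zero width space convention is taken from the text.

```python
ZWSP = "\u200b"                             # zero width space
TRIPLE = '"""'                              # triple quotes as stored in the ipynb
ESCAPED = '"' + ZWSP + '"' + ZWSP + '"'     # form shown in the editor buffer


def escape_for_buffer(markdown_source):
    """Escape triple quotes for display in the buffer (hypothetical helper)."""
    return markdown_source.replace(TRIPLE, ESCAPED)


def unescape_for_save(buffer_markdown):
    """Restore regular triple quotes before saving (hypothetical helper)."""
    return buffer_markdown.replace(ESCAPED, TRIPLE)


source = 'Use ' + TRIPLE + 'docstrings' + TRIPLE + ' liberally.'
shown = escape_for_buffer(source)
print(TRIPLE in shown)                      # no raw triple quotes in the buffer
print(unescape_for_save(shown) == source)   # the round trip is lossless

# The corner case: a source that already contains the escaped form
# comes back with plain triple quotes.
print(unescape_for_save(escape_for_buffer(ESCAPED)) == TRIPLE)
```

Because the buffer text never contains a raw triple quote inside a markdown cell, the cell body cannot terminate the enclosing Python string early.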

NEI annotations inside markdown/code cells

Given the quoting scheme described above, a markdown cell can contain any syntax at all. It is therefore possible to document (and nest) NEI syntax in a markdown cell without worrying about problems parsing the file.

Code cells, however, follow a different rule: they can contain any Python, and NEI syntax is itself expressed as Python. This means that if NEI syntax appears in a code cell (where it has no effect on the result of the code, due to the choice of syntax), this nesting will not be represented when the contents of that code cell are presented in the editor.

In other words, while NEI syntax can be nested in markdown cells, it is flattened if present in a code cell (e.g. if you copy/paste from a NEI buffer into a code cell in the classic notebook environment). In this situation, if you open such a notebook in NEI and save it to disk, you will see a diff, as the annotations in the code cell will be transformed into cell boundaries.
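The flattening can be illustrated with a toy round trip that handles code prompts only. The helpers render_code_cells and split_code_cells are hypothetical stand-ins for what NEI does when loading and saving, not its real code.

```python
PROMPT = "# In[ ]"


def render_code_cells(cells):
    """Write each code cell source under a prompt line (hypothetical helper)."""
    return "\n".join(PROMPT + "\n" + source for source in cells)


def split_code_cells(buffer_text):
    """Split buffer text into code cells at every prompt line (hypothetical helper)."""
    cells, current = [], None
    for line in buffer_text.split("\n"):
        if line == PROMPT:
            if current is not None:
                cells.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        cells.append("\n".join(current))
    return cells


# One notebook cell whose source already contains a pasted prompt annotation.
original_cells = ["a = 1\n# In[ ]\nb = 2"]
reloaded_cells = split_code_cells(render_code_cells(original_cells))
print(len(original_cells), len(reloaded_cells))
print(reloaded_cells)
```

The single cell comes back as two cells, which is the diff you would see after opening and saving such a notebook.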

This isn't a problem when working with notebooks in the wild, as these boundary annotations are very unlikely to be encountered unless NEI is already in use. It is, however, something to be aware of if you are copy/pasting from a NEI buffer into other notebook environments.

Offer function to copy from buffer while stripping annotations?

Dead code regions

Unlike regular notebooks which only have markdown and code cells, NEI buffers actually have three types of content: markdown cells, Python inside code cells and Python outside of code cells.

For instance, a Python file without any prompt of the form # In[ ] has not started a code cell at any point. If there are no markdown boundary annotations either, the file contains no cells at all, which means typical Python files correspond to empty notebooks.

There are two types of region which represent Python code outside of code cells (and which therefore do not appear in notebooks):

Python code before the first code cell annotation # In[ ].
Python code after a markdown cell terminates but before either another markdown cell or a code prompt annotation. For example:

"""
<Markdown>
""" #:md:

<Python code not in a code cell>

"""
<More markdown>
""" #:md:

or

"""
<Markdown>
""" #:md:

<Python code not in a code cell>

# In[ ]
<Python code in a code cell>

Neither of these conditions will occur when loading an ipynb file, but when typing in a buffer it is easy to end up with this type of code. In some situations these areas can be useful, e.g. as a scratch space or for documenting a notebook in a way that doesn't appear in the notebook itself (such as shebang lines or elisp settings at the top of the file).

Because these regions are not included when the notebook is saved, it is recommended that you either check the HTML view generated with nbconvert or keep an eye on the rendered view of the notebook in the browser.
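As a rough way to see which parts of a buffer would not reach the notebook, the sketch below labels each region as markdown, code or dead. The function classify_regions and the label names are assumptions made for illustration; NEI's own parser may differ.

```python
PROMPT = "# In[ ]"
MD_OPEN = '"""'
MD_CLOSE = '""" #:md:'


def classify_regions(buffer_text):
    """Return (kind, text) pairs, kind in {'markdown', 'code', 'dead'} (hypothetical)."""
    lines = buffer_text.split("\n")

    # Pass 1: find markdown cells by looking back from each annotated line.
    md_end_for_start, floor = {}, -1
    for end, line in enumerate(lines):
        if line != MD_CLOSE:
            continue
        for start in range(end - 1, floor, -1):
            if lines[start] == MD_OPEN:
                md_end_for_start[start] = end
                floor = end
                break

    regions = []

    def flush(kind, body):
        text = "\n".join(body).strip("\n")
        # Empty dead regions are dropped; code cells are kept even when empty.
        if kind != "dead" or text:
            regions.append((kind, text))

    # Pass 2: everything outside a markdown cell is dead until a prompt appears.
    kind, body, i = "dead", [], 0
    while i < len(lines):
        if i in md_end_for_start:
            flush(kind, body)
            end = md_end_for_start[i]
            regions.append(("markdown", "\n".join(lines[i + 1:end])))
            kind, body, i = "dead", [], end + 1
        elif lines[i] == PROMPT:
            flush(kind, body)
            kind, body, i = "code", [], i + 1
        else:
            body.append(lines[i])
            i += 1
    flush(kind, body)
    return regions


example = "\n".join([
    "#!/usr/bin/env python",
    PROMPT, "x = 1",
    MD_OPEN, "Notes", MD_CLOSE,
    "y = 2",
    PROMPT, "z = 3",
])
for kind, text in classify_regions(example):
    print(kind, repr(text))
```

Regions labelled dead correspond to the two cases listed above: the shebang line before the first prompt, and the code that follows the markdown cell before the next prompt.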

Offer policies here when saving? Ignore, warn, disallow?
