"The Origin of ProseMirror"

✍🏼 Written on Nov 5, 2021   
🖥  Note: The original author wrote this article on July 7, 2015, when they decided to create a rich text framework. Two years later, ProseMirror 1.0 was released. This translation leans toward a liberal interpretation.

Sometimes I lie in bed at night, feverishly searching for new ways to take on more responsibilities with meager income. Then it hits me—I should start another open-source project!

Well, the above scenario isn’t entirely true, but the outcome is the same: I keep building complex, high-difficulty code and then abandoning it. In reality, the usual process is that I first come up with some technical concept, investigate it, and find that no one has done it yet. Finally, driven by curiosity and a desire for self-validation, I decide to see if I can do it.

This mechanism has birthed my latest “misadventure” (though I don’t plan to abandon it): ProseMirror, a browser-based rich-text editor. I open-sourced it via crowdfunding and considered how to sustain post-release maintenance.

An Editor?

Didn’t I just say I’d tackle things “no one has done before”? Aren’t there already at least a hundred browser-based rich-text editors?

Yes, yes, and yes. But none of the existing projects take what I consider the ideal approach. Many are deeply rooted in the old paradigm of relying on contentEditable elements and then trying to untangle the resulting mess. This leaves us with almost no control over what users and browsers do to our documents.

What do we need control for? First, a rich-text editor should make it easier to keep documents in a reasonable state. If the document is only modified by your code, you can define those modifications to preserve the invariants you want to maintain, and you can ensure the same behavior across different browsers.

More importantly, it allows you to represent these modifications in a more abstract way—not just as state changes (“something changed here, so here’s a new document”). Abstractly representing modifications is especially helpful for collaborative editing—effectively merging conflicting changes from multiple users by accurately capturing the intent behind the edits.

The Basic Implementation

ProseMirror does create a contentEditable element to display its document. This gives us all the logic related to focus and cursor movement for free and makes it easier to support screen readers and bidirectional text.

Any actual modifications to the document are captured by handling appropriate browser events and translated into our own representation of those changes. For relatively modern browsers, most types of modifications are easy to abstractly describe. We can handle key events to capture typed text, backspace, enter, etc. We can handle clipboard events to make copy, cut, and paste work properly. Drag-and-drop is also implemented via events. Even IME input triggers relatively usable composition events.

Unfortunately, in some cases, browsers don’t fire events that describe user intent; you only get the aftermath as an input event. For example, this happens when correcting spelling from the context menu or when inputting special characters via key combinations (e.g., “Multi + e + =” on Linux to type “€”). Luckily, so far, all such cases I’ve encountered involve simple, character-level input. We can inspect the DOM, compare it to our document representation, and infer the intended changes.
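To make the "compare and infer" step concrete, here is a minimal sketch. It assumes the change is a single contiguous replacement inside one block of text, and it finds that replacement by trimming the common prefix and suffix of the old and new text. The names `TextChange` and `inferTextChange` are hypothetical and not part of ProseMirror's API.

```typescript
// Hypothetical helper for illustration; not ProseMirror's actual code.
interface TextChange {
  from: number;   // offset where the old and new text start to differ
  to: number;     // end of the replaced range in the old text
  insert: string; // text that now stands in place of oldText.slice(from, to)
}

function inferTextChange(oldText: string, newText: string): TextChange | null {
  if (oldText === newText) return null;
  // Skip the common prefix.
  let start = 0;
  const maxStart = Math.min(oldText.length, newText.length);
  while (start < maxStart && oldText[start] === newText[start]) start++;
  // Skip the common suffix, taking care not to overlap the prefix.
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start && oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }
  return { from: start, to: oldEnd, insert: newText.slice(start, newEnd) };
}

// A spell-check correction picked from the context menu:
const correction = inferTextChange("teh cat", "the cat");
// correction is { from: 1, to: 3, insert: "he" }
```

The result is a description of the edit rather than a new copy of the text, which is the kind of abstract change the editor wants to record.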

When a modification occurs, the editor’s document representation updates, and the display (the DOM elements on the screen) refreshes to reflect the new document. By using a persistent data structure for the document—where modifications create a new document object without altering the old one—we can employ a very fast diffing algorithm to update only the necessary parts of the DOM. This is somewhat similar to what React and its derivatives do, except ProseMirror uses its own document representation rather than a generic DOM-like data structure.
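The idea can be sketched as follows, under the assumption that the document is a tree of immutable nodes. An update copies only the nodes on the path to the change and reuses everything else, so the diff can skip any subtree whose object identity is unchanged. `DocNode`, `replaceChild`, and `changedPaths` are hypothetical names for illustration, not ProseMirror's real classes.

```typescript
// Hypothetical immutable document node; ProseMirror's real classes differ.
interface DocNode {
  readonly type: string;
  readonly text?: string;
  readonly children: readonly DocNode[];
}

// Returns a new parent. The old parent and all untouched children are reused as-is.
function replaceChild(parent: DocNode, index: number, child: DocNode): DocNode {
  const children = parent.children.slice();
  children[index] = child;
  return { ...parent, children };
}

// Collects the paths of the subtrees that differ and would need to be redrawn.
function changedPaths(oldNode: DocNode, newNode: DocNode, path: number[] = []): number[][] {
  if (oldNode === newNode) return []; // shared subtree, nothing changed
  const sameShape =
    oldNode.type === newNode.type &&
    oldNode.text === newNode.text &&
    oldNode.children.length === newNode.children.length;
  if (!sameShape) return [path];
  const result: number[][] = [];
  for (let i = 0; i < newNode.children.length; i++) {
    result.push(...changedPaths(oldNode.children[i], newNode.children[i], [...path, i]));
  }
  return result;
}

const para = (text: string): DocNode => ({ type: "paragraph", text, children: [] });
const oldDoc: DocNode = { type: "doc", children: [para("One"), para("Two"), para("Three")] };
const newDoc = replaceChild(oldDoc, 1, para("Two!"));
const dirty = changedPaths(oldDoc, newDoc); // [[1]]: only the second paragraph differs
```

Because unchanged paragraphs are the very same objects in both versions, the comparison for them is a single identity check, however long they are.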

The Editor’s Document

This document representation isn’t HTML. It is a more “semantic” representation of the document: a tree-like data structure describing text in terms of paragraphs, headings, lists, emphasis, links, etc. It can be rendered into a DOM tree, serialized as Markdown, or converted to any other format that can express its encoded concepts.

The outer layer of this representation, which handles paragraphs, headings, lists, etc., is structurally similar to the DOM—it consists of nodes with child nodes. The content of paragraph nodes (and other block-level elements like headings) is represented as a flat sequence of inline elements, each with a set of associated styles. This is better than a full tree structure like the DOM. It makes it easier to enforce invariants (e.g., preventing text from being wrapped in emphasis tags twice) and allows us to represent positions within paragraphs as simple character offsets, which are easier to reason about than positions in a tree.
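A small sketch may help show why the flat form makes invariants easy. It assumes each inline span carries a duplicate-free list of style names, so applying emphasis to text that is already emphasized changes nothing, and ranges are plain character offsets. The `Span` type and `addStyle` function are hypothetical, chosen only for this illustration.

```typescript
// Hypothetical inline model for illustration; not ProseMirror's actual types.
type Style = "em" | "strong" | "code";

interface Span {
  text: string;
  styles: readonly Style[]; // kept free of duplicates
}

// Adds a style to the character range [from, to) of a paragraph's flat content.
function addStyle(spans: readonly Span[], from: number, to: number, style: Style): Span[] {
  const out: Span[] = [];
  let pos = 0;
  for (const span of spans) {
    const end = pos + span.text.length;
    const a = Math.max(from, pos);
    const b = Math.min(to, end);
    if (a >= b) {
      out.push(span); // span lies entirely outside the range
    } else {
      if (a > pos) out.push({ text: span.text.slice(0, a - pos), styles: span.styles });
      // Adding a style that is already present leaves the style list unchanged.
      const styles = span.styles.indexOf(style) >= 0 ? span.styles : [...span.styles, style];
      out.push({ text: span.text.slice(a - pos, b - pos), styles });
      if (b < end) out.push({ text: span.text.slice(b - pos), styles: span.styles });
    }
    pos = end;
  }
  return out;
}

const before: Span[] = [
  { text: "Hello ", styles: [] },
  { text: "world", styles: ["em"] },
];
// Emphasize "lo wo"; the part of "world" that is already emphasized is not wrapped twice.
const after = addStyle(before, 3, 8, "em");
```

This sketch does not merge neighbouring spans that end up with identical styles; a fuller version would probably normalize them.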

Outside paragraphs, we must use a tree structure. Thus, a position in the document is represented by a path—a sequence of integers indicating child indices at each level of the tree—and an offset within the node at the end of that path. This is how cursor positions are tracked and how the locations of modifications are recorded.
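As a rough illustration of such a position, the sketch below resolves a path to the node it points at and then applies the offset. The `TreeNode` and `Pos` shapes, the node type names, and the `nodeAtPath` function are assumptions made for this example, not the library's API.

```typescript
// Hypothetical types for illustration; not ProseMirror's actual API.
interface TreeNode {
  type: string;
  text?: string;         // only textblocks such as paragraphs carry text here
  children?: TreeNode[]; // only container nodes have children
}

interface Pos {
  path: number[]; // child index to follow at each level of the tree
  offset: number; // character offset inside the node the path ends at
}

function nodeAtPath(root: TreeNode, path: readonly number[]): TreeNode {
  let node = root;
  for (const index of path) {
    const child = node.children ? node.children[index] : undefined;
    if (!child) throw new RangeError("Invalid path index " + index + " in " + node.type);
    node = child;
  }
  return node;
}

const sampleDoc: TreeNode = {
  type: "doc",
  children: [
    { type: "heading", text: "Shopping" },
    {
      type: "bullet_list",
      children: [
        { type: "list_item", children: [{ type: "paragraph", text: "Apples" }] },
        { type: "list_item", children: [{ type: "paragraph", text: "Pears" }] },
      ],
    },
  ],
};

// A cursor after "Pea" in the second list item: doc > list > item 1 > paragraph 0.
const cursor: Pos = { path: [1, 1, 0], offset: 3 };
const target = nodeAtPath(sampleDoc, cursor.path);
const textBeforeCursor = (target.text || "").slice(0, cursor.offset); // "Pea"
```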

ProseMirror’s current document model mirrors Markdown’s, perfectly supporting everything expressible in that format. In the future, you’ll be able to extend and customize the document model for specific editor instances.

The User Interface

Currently, there are two styles of user interfaces for editors on the market. One is the classic toolbar at the top, while the other displays tooltips above your selection for inline styling, along with a menu button on the right side of the currently selected paragraph for block-level operations. I prefer the latter because it stays out of the way when unused (leaving your document completely unobstructed), though I suspect many users might favor the familiarity of the toolbar.

All these user interfaces are implemented as modules separate from the editor’s core, and other UI styles can also be built on top of the same API.

Key bindings are configurable as well, following CodeMirror’s pattern. Functions bound to keys can be invoked as “commands” or executed from scripts via the execCommand method.

Finally, there’s a module called inputrules that can be used to specify what should happen when text matching a given pattern is entered. It can handle things like “smart quotes” or automatically creating a list when you type “1.” and press space.
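To give a feel for the idea, here is a minimal sketch of pattern-based rules for smart quotes. It assumes a rule is a regular expression matched against the text just before the cursor plus a function that produces the replacement. The `InputRule` shape and `applyInputRules` function are hypothetical and are not the module's actual interface.

```typescript
// Hypothetical rule shape for illustration; the real inputrules module differs.
interface InputRule {
  pattern: RegExp;                         // ends with $ so it matches right before the cursor
  replace(match: RegExpExecArray): string; // replacement for the matched text
}

const smartQuoteRules: InputRule[] = [
  // A straight quote at the start of the text or after whitespace opens a quotation.
  { pattern: /(^|\s)"$/, replace: (m) => (m[1] || "") + "\u201C" },
  // Any other straight quote closes one.
  { pattern: /"$/, replace: () => "\u201D" },
];

// Applies the first matching rule to the text before the cursor.
function applyInputRules(textBeforeCursor: string, rules: InputRule[]): string {
  for (const rule of rules) {
    const match = rule.pattern.exec(textBeforeCursor);
    if (match) return textBeforeCursor.slice(0, match.index) + rule.replace(match);
  }
  return textBeforeCursor;
}

const opened = applyInputRules('say "', smartQuoteRules);        // 'say “'
const closed = applyInputRules('say \u201Chi"', smartQuoteRules); // 'say “hi”'
```

A rule that turns “1.” followed by a space into a list would need to change the block structure rather than just the text, which this string-only sketch does not attempt.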

Collaborative Editing

I previously mentioned collaboration. A significant portion of this project’s effort went into enabling real-time collaborative editing. I wrote another blog post (translated into [Chinese](/tech/Collaborative-Editing-in-ProseMirror.html)) detailing the technical aspects, but the general idea is this:

When a document is modified, it creates a new version along with a position mapping that translates positions from the old document to the new one. For example, this is used to move the cursor in response to edits.
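A minimal sketch of such a mapping follows. For brevity it assumes positions are flat character offsets rather than the path-plus-offset positions described earlier, and it models a modification as replacing one range with content of a given length. The `Replace` shape and `mapPos` function are hypothetical.

```typescript
// Hypothetical change description; positions are flat offsets to keep the sketch short.
interface Replace {
  from: number;           // start of the replaced range in the old document
  to: number;             // end of the replaced range in the old document
  insertedLength: number; // length of the content that replaced it
}

// Translates a position in the old document to the matching position in the new one.
function mapPos(pos: number, change: Replace): number {
  if (pos < change.from) return pos; // before the change: unaffected
  if (pos >= change.to) {
    // At or after the end of the replaced range: shift by the size difference.
    return pos + change.insertedLength - (change.to - change.from);
  }
  // Inside the replaced range: the old spot is gone, so land after the new content.
  return change.from + change.insertedLength;
}

// Someone inserts three characters at offset 2 while the cursor sits at offset 10.
const cursorAfterInsert = mapPos(10, { from: 2, to: 2, insertedLength: 3 }); // 13
// Someone deletes offsets 3..8 while the cursor sits inside that range.
const cursorAfterDelete = mapPos(5, { from: 3, to: 8, insertedLength: 0 }); // 3
```

Which side a position sticks to when it sits exactly at the edge of a change is a design choice; this sketch moves it after the inserted content.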

The ability to map positions makes it possible to “rebase” changes on top of other modifications by mapping where they should apply. There’s more to it, and I had to rewrite the system several times before getting it right, but I’m confident the final code behaves as expected.

In a collaborative scenario, when a client makes changes, they are buffered locally before being sent to the server. If another client submits its changes before ours arrive, the server responds with, “No, apply these changes first.” Our client then accepts those changes, rebases its own modifications on top of them, and tries again. Once changes are accepted, they are broadcast to all other clients to ensure everyone stays in sync.
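The exchange can be sketched roughly as follows. The sketch assumes the document is a plain string, every change is a text insertion, and the server only accepts changes based on its latest version. The `Authority` class, `Insert` type, and `rebaseInsert` function are hypothetical, and broadcasting to other clients is left out.

```typescript
// Hypothetical, heavily simplified model; not ProseMirror's actual collaboration code.
interface Insert {
  pos: number;
  text: string;
}

function applyInsert(doc: string, change: Insert): string {
  return doc.slice(0, change.pos) + change.text + doc.slice(change.pos);
}

// Shifts a local insertion so it still lands at the intended place after a remote one.
function rebaseInsert(local: Insert, remote: Insert): Insert {
  return remote.pos <= local.pos
    ? { pos: local.pos + remote.text.length, text: local.text }
    : local;
}

class Authority {
  private log: Insert[] = [];

  // Accepts changes only when they are based on the latest version.
  // Otherwise it returns the changes the client has not seen yet.
  submit(baseVersion: number, changes: Insert[]): { accepted: boolean; missed: Insert[] } {
    if (baseVersion !== this.log.length) {
      return { accepted: false, missed: this.log.slice(baseVersion) };
    }
    this.log.push(...changes);
    return { accepted: true, missed: [] };
  }

  document(): string {
    let doc = "";
    for (const change of this.log) doc = applyInsert(doc, change);
    return doc;
  }
}

const server = new Authority();
server.submit(0, [{ pos: 0, text: "Hello" }]); // version 1: "Hello"
// Two clients both start from version 1.
server.submit(1, [{ pos: 0, text: ">> " }]);   // client A gets there first
let mine: Insert = { pos: 5, text: " world" }; // client B's buffered change
const reply = server.submit(1, [mine]);        // rejected: B is one version behind
for (const remote of reply.missed) mine = rebaseInsert(mine, remote);
const retry = server.submit(1 + reply.missed.length, [mine]);
const finalDoc = server.document();            // ">> Hello world"
```

Rebasing several buffered local changes over several remote ones takes more care than this one-change case, since later local changes also have to be mapped through the earlier ones.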

Target Users

Who is ProseMirror for?

  • On one hand, websites that currently accept Markdown or similar formats might want to provide a more beginner-friendly interface for less technical users, then simply convert the results back to Markdown.

  • On the other hand, sites that have traditionally offered rich-text input but want tighter control over output might switch to ProseMirror, since enforcing constraints directly in the editing experience is far better than cleaning up messy HTML and hoping for the best.

  • Finally, there are companies that want to support collaborative editing while moving users from Google Docs to their own platform for document creation.

Sound interesting? Check out how the crowdfunding campaign for this open-source project is going.

Originally published at: "The Origin of ProseMirror" - Xheldon Blog