"Cooperative Editing Implementation in ProseMirror"

❗️ Note: it has been days since this article was written, please be aware of its timeliness

🔗 Source： https://marijnhaverbeke.nl/blog/collaborative-editing.html

🖥 Note：If I don’t understand something in translation, I’ll just post the original text and translate it later when I figure it out, to avoid misleading others and becoming a laughingstock.

This blog post describes the collaborative editing technology used in ProseMirror]. For an introduction to ProseMirror, you can check here.

The Problem of Collaborative Editing

A real-time collaborative editing system means multiple people may edit the same document simultaneously. The system ensures the document remains synchronized—changes made by one user are sent to others and displayed in their documents.

Since transmitting these changes over any network takes time, the complexity of such systems lies in how they handle concurrent updates. One solution is to allow users to lock the document (or part of it), preventing others from making changes at the same time. However, this forces users to think about locks and wait when they don’t hold them (i.e., while others are editing). We don’t want that.

If we allow concurrent updates, we encounter a scenario where users A and B modify the document simultaneously without being aware of each other’s changes, requiring them to negotiate how to update the document. Their actions might not interfere—e.g., editing different parts—or they might conflict—e.g., attempting to modify the same word.

Operational Transformation (OT)

There’s a lot of research on this problem. I must admit that despite reading many papers, I don’t fully grasp the field, and if you spot any misunderstandings or missing references, I’d be happy if you emailed me about it.

Much of the research actually focuses on distributed systems, where nodes exchange messages without a central authority. The classic solution is called “Operational Transformation,” a distributed algorithm defining a way to describe changes with two properties:

You can transform changes relative to others. For example, if user A inserts an “O” at position 1 and user B inserts a “T” at position 10, A can transform B’s change relative to their own, inserting “T” at position 11, because an extra character was added before B’s offset.
Regardless of the order concurrent changes are applied, everyone ends up with the same document. This allows A to transform B’s changes relative to their own, and B to do the same for A’s changes, ensuring both users don’t end up with divergent documents.

An OT system applies local changes immediately to the local document, then broadcasts them to other users, who transform and apply the changes. To accurately determine which local changes remote ones should be transformed against, the system must also include a representation of the document state when broadcasting changes.

This process sounds simple but is a nightmare to implement. Once you support multiple trivial changes (e.g., “insert” and “delete”), ensuring the same document results from any order of operations becomes extremely difficult.

Joseph Gentle, an engineer who worked on Google Wave, said…

Unfortunately, implementing OT sucks. There are a million algorithms with different trade-offs, mostly locked up in academic papers. Getting it right is very hard and time-consuming.

Centralization

The complexity of OT largely stems from its need to distribute changes. Distributed systems have great practical and strategic properties and are often fun to develop.

However, you can significantly reduce complexity by introducing a “central authority for processing changes.” Honestly, I’m puzzled that Google used OT in Google Docs (a centralized system).

ProseMirror’s algorithm is centralized, with a single authority (to which all users connect) deciding the order of changes. This makes implementing a collaborative editing system relatively easy and understandable.

I don’t think this “centralized” nature is a major obstacle to running OT in a distributed way. You could use consensus algorithms like [Raft](https://en.wikipedia.org/wiki/Raft_(computer_science) to elect an arbiter instead of relying on a central service. (Note: I haven’t actually tried this.)

ProseMirror’s Collaborative Algorithm

Like OT, ProseMirror uses a vocabulary of changes (e.g., delete, insert) and transforms them against each other. Unlike OT, however, it doesn’t guarantee that applying changes in different orders will produce the same document.

By using a centralized service, it can even be made easy for all clients to apply changes in the same order. You can employ a mechanism similar to that used in code version control systems. When a client has a change, they attempt to push it to the server. If the server determines that the change is based on the latest version, the change is accepted. If not, the client must first pull changes from other clients, then rebase their own changes before retrying the push to the server.

Unlike Git, the document’s history in this model is linear, and a given version of the document can be represented simply using an integer.

Also differing from Git, all clients continuously pull (or push and listen for) new changes to the document and track the server’s state as quickly as the network allows.

The only challenging part is rebasing others’ changes onto your own. This is very similar to the transformations performed by OT. However, it is done with the client’s own changes rather than remote server changes.

Position Mapping

While OT transforms changes relative to others’ changes, ProseMirror uses a derived data structure called a position map to handle this transformation. Whenever you apply a change to a document, you obtain a new document and the aforementioned map, which can be used to translate positions from the old document to their corresponding positions in the new document. The most notable use of the map is to adjust cursor positions so they remain at the same “conceptual” location—if a character is inserted before the cursor, the cursor should move forward (to the right) by one position along with the surrounding text.

The transformation of changes is entirely based on position mapping. This is actually quite good, as it means we don’t need to write transformation code specific to change types. Each change has one to three position-related pieces of information, represented as from, to, and at. When transforming a change relative to another given change, these positions are mapped through the other change’s position map.

For example, if a character is inserted at position 5, then when transforming the change “delete positions 10 to 14” relative to this insertion, it would become “delete positions 11 to 15.”

The positions of each change only make sense when the initial document version they were applied to is the same. A position map defines the mapping between positions in two document versions before and after a change. To apply a change to a different version, it must be mapped step by step through the changes between its own version and the target version.

(For simplicity, examples will use integers to represent positions. Actual positions in ProseMirror consist of an integer offset within a paragraph plus the path of that paragraph in the document tree.)

Rebasing Positions

Things get interesting when a client has multiple local changes that haven’t been pushed to the remote server. If changes from others arrive at this point, all local uncommitted changes need to be transformed based on these changes. Suppose we have local changes L1 and L2 and want to rebase them onto remote changes R1 and R2, where L1 and R1 are based on the same document version.

First, we apply R1 and R2 to our original document version (the client must track the document version they are currently displaying—including un-pushed changes—and the original version that doesn’t yet include these changes). This operation creates two maps, mR1 and mR2.

Translator’s note: The following paragraph is somewhat difficult to understand and may require visualization. In simple terms, L2 is a modification of the document based on the position mapping (position adjustments) made by L1. To correctly rebase L2, we must first map (adjust) the positions affected by L1 and all changes before it (i.e., R1 and R2). This ensures L2 obtains the correct position mapping and can then modify the document at the correct positions.

We can straightforwardly map L1 forward to L1*, where L1* is the version of L1 mapped through mR1 and mR2. However, L2 is based on the document modified by L1 from the initial version, so for L2 (note: here, L1 can be simply mapped as described earlier, but handling L2 requires the following steps), we must first map backward L2 through mL1 (the map created by applying L1, i.e., the inverse operation of mL1—rolling back history). At this point, the document is the same as when R1 began, so we can map L2 through mR1 and mR2, and finally through mL1*—the map produced by simply applying L1*. Now we have L2*, which can be applied to the document after L1* has been applied. Voilà, we’ve rebased two changes onto two others.

Translator’s Note: In the following passage, “inserting two characters at position 5” corresponds to L1 above, while “inserting between two characters (position 6)” corresponds to L2 above. Therefore, when rebasing L2, it is necessary to first roll back L2’s insertion position to the initial document via mL1, but at this point, the document does not have a mappable position for L2 because the position does not yet exist.

Mapping deletion operations and backward-mapping (historical rollback—Translator’s Note) insertion operations lose information. If you insert two characters at position 5, and then another person inserts at position 6 (between the previously inserted characters), backward-mapping (historical rollback, as mentioned earlier for L2—Translator’s Note) and then forward-mapping through the initial insertion operation will place you (after inserting these two characters) either before or after the inserted characters, because the position between these two characters cannot be represented in a document that does not yet have them (how to represent the exact position of a character that is about to be inserted but has not yet been inserted?—Translator’s Note).

Translator’s Note: The following passage is the core of this article. By providing additional inverse mapping information during mapping, when backward-mapping (historical rollback) encounters a position that needs to be mapped to a non-existent location, the system first restores the content of that position (using the mirror of the mapping) and then performs the mapping. After the preceding mappings are completed, this mapping is skipped directly (because it has already been completed during the content restoration). Note that the mirror of the mapping must maintain the same size as the mapping content (obviously, otherwise the position calculation would be incorrect).

To fix this issue, the ProseMirror collaborative system uses mapping pipelines, which are not just a series of mappings but also store information about which mappings are mirrors of each other. When a position passes through these pipelines and encounters a mapping that deletes the content around that position, the system scans the pipeline forward (i.e., toward earlier mappings that have not yet encountered this position—Translator’s Note) to find the mirror of that mapping. If such a mapping is found, we will forward-map (i.e., normal position adjustment—Translator’s Note) to its position and use the relative offset of this position in the deleted content to restore the position in the inserted content of the mapping. The mirror of a deletion operation’s mapping must maintain the same content size as the mirror of an insertion operation’s mapping.

Direction of Mapping

Whenever content is inserted, this explicit insertion point can be mapped to two different positions (both of which are meaningful): before the inserted content or after it. Sometimes the former is appropriate, and sometimes the latter is. ProseMirror’s position mapping system allows developers to choose their preferred direction.

Translator’s Note: The following passage explains that if the document is abc, and the letter x is inserted after a and before b, then the changes from and to are both 1; after the content is inserted, from forward-maps (leftward) and remains 1 unchanged, while to backward-maps (rightward) and becomes 2.

This is also why a change-related position contains several different pieces of position information. If a change has position information such as from and to, such as deleting or styling a part of the document, and there is content before or after this position, that content should not be included in this change (this content only needs to have its position mapped after the change occurs—Translator’s Note). Therefore, the from position should forward-map (leftward), while the to position should backward-map (rightward).

When a change is mapped through a mapping that completely contains it—for example, inserting a character at position 5 and then mapping this position through a change created by deleting positions 2 to 10—the entire operation of inserting a character at position 5 will simply be discarded because its context no longer exists.

Types of Changes

An atomic change in ProseMirror is called a step. Some changes that appear as a single action to the user are actually decomposed into several steps. For example, if you select text and press the Enter key, the editor will generate a delete step to remove the selected text, followed by a split step to divide the current paragraph.

Here are the types of steps that exist in ProseMirror:

addStyle and removeStyle add or remove inline styles from the document.
split splits a node into two. For instance, it can be used to divide a paragraph when the user presses the enter key. It only requires a single at position.
join merges two adjacent nodes. This operation is only valid for nodes containing content of the same type. It requires from and to positions to specify the end of the first node and the start of the second node (this ensures the correct nodes are merged. If content is inserted between the nodes during this process, the merge is ignored).
ancestor modifies a node’s type and adds or removes its ancestors. It can be used to wrap a list or convert a paragraph into a heading. It requires from and to positions to mark the node’s start and end.
replace replaces a portion of the document with zero or more nodes, optionally stitching compatible nodes at the cut edges. Its from and to positions define the range to be deleted, while at specifies where the new nodes should be inserted.

Among these types, the last one is the most complex. My initial impulse was to split it into separate removal and insertion steps. However, since the position mapping for replacement steps requires treating the step as atomic (positions must be derived from all replacement content), I achieved better results by treating it as a single step.

Intent of Operations

A crucial property of real-time collaborative editing systems is that they must strive to preserve the intent of changes. Since merging changes happens automatically without user intervention, it can be frustrating when your edits are rebased into something unintended.

I designed these step types and their rebasing behavior to minimize odd outcomes. Most of the time, changes don’t overlap and require no interaction. But when they do overlap, we must ensure the merged result makes sense.

Sometimes, changes must simply be discarded. If you type into a paragraph that another user deletes before your changes reach the server, your input loses its context, and inserting it would create a meaningless fragment.

If you attempt to merge two lists, but someone inserts a paragraph between them, the operation becomes impossible (you can’t merge non-adjacent nodes), so your change is discarded.

In other cases, modified changes remain meaningful. If you bold characters at positions 5–10 while another user inserts a character at position 7, the result is bolding positions 5–11.

Finally, some changes can overlap without conflict. For example, if you turn a word into a hyperlink while another user bolds it, both changes are applied.

Offline Editing

For real-time collaboration, silently modifying or dropping changes isn’t a major issue—feedback is nearly instant. You see the paragraph you’re editing disappear, so you know your changes are lost.

But for offline editing (where you edit without connecting to a central server) or branching workflows—where you make extensive edits and later merge them into a document others have modified in the meantime (regardless of how much they’ve added, deleted, or rearranged)—this model (and OT) falls short. It might silently discard large amounts of editing work (if the context is gone) or produce bizarre text combinations when users edit the same sentence differently.

In such cases, a diff-based approach may be better. Automatic merging might not be possible—conflicts must be identified and resolved by the user, as Git does.

Undo History

How should undo history work in collaborative systems? The widely accepted answer is that it should never use a single, shared history. Undoing should reverse your last edit, not the document’s last edit.

This means simple rollback to a prior state won’t work. Reverting your changes (while others have also edited) creates a new state not present in the document’s history.

To support this, I defined reversible changes (sequences of steps) where inversion produces a new step that counteracts the original.

ProseMirror’s undo history accumulates inverse steps and tracks their position mappings to the current document version—essential for reverse-mapping to the latest state.

However, the downside is that if a user makes changes and then becomes idle, while others modify the document during this period, the positional mappings for converting this user’s changes to the current version of the document will accumulate indefinitely. To address this issue, the history is periodically compressed by forward-mapping the reversed changes so they can once again start editing from the current document. This discards the intermediate positional mappings.

- EOF -

Originally published at: "Cooperative Editing Implementation in ProseMirror" - Xheldon Blog

Writing Stats2026

Popular Tags