DonaldxDonald

Evolution of Web Rich Text Editors

This article was written in 2022 and is republished here.

What is Rich Text?#

Rich Text refers to text that is formatted with common formatting options (such as bold and italic) as opposed to Plain Text.

Web + Rich Text Editor#

It is well known that the Web is currently the most universal platform: whether on a computer, mobile phone, game console, car, or Kindle, a website can be opened wherever there is a browser. Supported by technologies and tools such as WebAssembly and Electron, heavyweight applications that do complex work with web technology (Figma, VS Code) keep emerging. Tempted by "write once, run anywhere," many traditional desktop applications are gradually expanding onto the Web, and rich text editors are among them.

Rich Text Editor = Pitfall?#

In the web front-end industry, rich text editors are a notorious pitfall. The strength of web development is that it is cross-platform, but the problems come from the same place: cross-platform means the code can run on many platforms, and how it behaves on each of them is where the pitfalls lie.

A web editor must first handle focus, cursor and selection, the undo stack, and parsing content pasted from external sources, and then deal with the differences between browsers (Chrome, Firefox, Safari, etc.)... Once basic English input works, users of Chinese and other languages that rely on an input method ask for IME composition input... Once IME input is solved, users of RTL languages (Hebrew, Arabic) arrive... Then mobile users... Then collaborative editing users...

To summarize with a quote from Zhihu:

The contradiction between backward productivity and people's growing demands.

Backward Productivity:

  1. Slow advancement of web-related standards
  2. Different implementations of the same operations or scenarios by browser vendors lead to compatibility issues
  3. Using HTML DOM to describe rich text content has too many uncontrollable situations

Growing Demands:

  1. Uncertain interaction intentions, such as pressing the Delete key, where different focus positions require different considerations
  2. Diversity of content input, such as typing, pasting, dragging, etc., each of which is quite complex to handle
  3. A large number of browser default behaviors that need to be intercepted, blocked, and proxied to ensure data integrity and correctness
  4. Users have increasingly high requirements for editors, such as: merging cells, multi-level nested lists, collaborative editing, version comparison, paragraph annotations; everyone considers these basic needs, but the technical difficulties involved exceed everyone's imagination.

Overview of Editor Technology Stages#

Stage L0

  Description:
  1. Strongly relies on browser DOM APIs (contenteditable, document.execCommand)
  2. The view is the data

  Typical products: UEditor, TinyMCE, CKEditor 1 ~ 4

Stage L1

  Description:
  1. Still based on contenteditable
  2. Abandons document.execCommand for content operations and implements them itself
  3. Has an abstract data model to describe the content and state of the rich text editor

  Typical products: Quill, Slate, CKEditor 5, Draft.js, ProseMirror, wangEditor v5

Stage L2

  Description:
  1. Abandons contenteditable and implements editing itself
  2. Abandons document.execCommand for content operations and implements them itself
  3. Implements its own layout engine

  Typical products: Google Docs

L0#

Editors in the L0 stage rely mainly on the browser's native contenteditable editing and use the document.execCommand API to perform operations such as bolding, inserting links, copying, and pasting.
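
As a rough illustration of how thin an L0 toolbar can be, the sketch below wraps two execCommand calls. The CommandTarget interface and the function names are assumptions made for this example so that it also runs outside a browser; in a real page the global document would be passed in. Note that document.execCommand is marked as deprecated in current browsers' documentation.

```typescript
// Minimal structural type (an assumption for this sketch) so the code
// type-checks without a browser; the global `document` satisfies it.
interface CommandTarget {
  execCommand(commandId: string, showUI?: boolean, value?: string): boolean;
}

// An L0 toolbar button is little more than a call to execCommand:
// the browser decides how the DOM inside the editable element changes.
function bold(target: CommandTarget): boolean {
  return target.execCommand("bold");
}

function createLink(target: CommandTarget, url: string): boolean {
  return target.execCommand("createLink", false, url);
}

// Browser usage, assuming markup like <div contenteditable="true"></div>
// and a selection inside it:
//   bold(document);
//   createLink(document, "https://example.com/");
```

Because the browser performs the actual DOM change, the resulting markup can differ between browsers, which is the first disadvantage listed for this stage.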

Advantages:

  1. Low technical threshold. As long as the above two APIs are used, the webpage can have editing capabilities.
  2. Based on the browser's native editing capabilities, input is very smooth.
  3. No troublesome IME composition input issues.

Disadvantages:

  1. The same operation may have different implementations across different browsers.
  2. Outputting rich text content as HTML is not conducive to data management.
  3. Extending complex rich text is very difficult.
  4. There is no way to implement collaborative editing.

L1#

Currently, most editor frameworks are in the L1 stage, with representative examples being Quill, Slate, ProseMirror, and Draft.js. They mainly have two obvious characteristics:

  1. Still rely on the contenteditable API for content editing, but no longer rely on the document.execCommand API for content operations, implementing it themselves.
  2. Have an abstract data model to describe the content and state of the rich text editor.

2012 - Quill#

Quill is an API-driven rich text editor framework that provides an out-of-the-box editing experience.

Quill's author, Jason Chen, is of Chinese descent. Quill began as a side project: Jason had founded a company to build a collaborative editor similar to Google Docs, and he wrote Quill for his own use there.

Quill abstracts the operations of modifying the DOM Tree and data, so that in actual use, we do not need to manipulate the DOM directly but instead operate through Quill's API, with the corresponding relationships as follows:

Editor Document ====> Parchment

DOM Node ====> Blot

With this layer of abstraction, direct operations on the DOM become operations on Blots, and those operations are expressed as a Delta. Delta discards the hierarchy of the DOM node tree, so the tags and node relationships that wrap the text are not visible at all; only a flat array, ops, remains.

Data Model:

{
  "ops": [
    {
      "attributes": {
        "bold": true
      },
      "insert": "Check"
    },
    {
      "insert": " "
    },
    {
      "attributes": {
        "link": "https://donaldxdonald.xyz/"
      },
      "insert": "this"
    },
    {
      "insert": " out ~"
    }
  ]
}

Delta's flat structure is designed around the OT (Operational Transformation) model used in collaborative editing, so Quill was built for collaborative editing from the start. The benefit of flattening is better performance; the downside is that complex nested content is hard to represent.
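
To make the flat model more concrete, here is a minimal sketch of applying a change (retain / delete / insert) to a document made of insert ops like the sample above. It is a simplified illustration written for this article, not Quill's implementation: the type names and applyChange are hypothetical, retain ops carry no attributes here, and attribute equality is checked with JSON.stringify.

```typescript
// Simplified op shapes modelled on the Delta sample above (hypothetical names).
type Attributes = Record<string, unknown>;
type InsertOp = { insert: string; attributes?: Attributes };
type ChangeOp = InsertOp | { retain: number } | { delete: number };

function applyChange(doc: InsertOp[], change: ChangeOp[]): InsertOp[] {
  // Expand the document to one entry per character so positions are plain indexes.
  const chars: InsertOp[] = [];
  for (const op of doc) {
    for (const ch of op.insert.split("")) {
      chars.push({ insert: ch, attributes: op.attributes });
    }
  }

  const result: InsertOp[] = [];
  let index = 0; // read position in the original document
  for (const op of change) {
    if ("retain" in op) {
      result.push(...chars.slice(index, index + op.retain)); // keep as-is
      index += op.retain;
    } else if ("delete" in op) {
      index += op.delete; // skip the deleted characters
    } else {
      result.push(op); // new content
    }
  }
  result.push(...chars.slice(index)); // untouched tail of the document

  // Merge neighbours with identical attributes back into runs.
  const merged: InsertOp[] = [];
  for (const op of result) {
    const last = merged[merged.length - 1];
    if (last && JSON.stringify(last.attributes) === JSON.stringify(op.attributes)) {
      last.insert += op.insert;
    } else {
      merged.push({ ...op });
    }
  }
  return merged;
}

const doc: InsertOp[] = [
  { insert: "Check", attributes: { bold: true } },
  { insert: " " },
  { insert: "this", attributes: { link: "https://donaldxdonald.xyz/" } },
  { insert: " out ~" },
];

// Keep "Check ", replace "this" with "that", keep the rest.
const change: ChangeOp[] = [
  { retain: 6 },
  { delete: 4 },
  { insert: "that", attributes: { link: "https://donaldxdonald.xyz/" } },
];

const updated = applyChange(doc, change);
```

Because both the document and the change are flat lists addressed by position, two concurrent changes can be compared index by index, which is what makes this shape convenient for OT.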

Quill's characteristics are:

  1. Relies on the browser's native editing capability contentEditable (L1)
  2. Introduces a layer of abstract data structure to describe content and behavior
  3. Good support for collaborative editing
  4. Output structure can be either a string or Delta (JSON), but Delta as a data model has low readability

2015 - ProseMirror#

Marijn Haverbeke is the author of the CodeMirror editor and the acorn parser. CodeMirror is used in the developer tools built into Chrome and Firefox, and acorn is the parser from which Babel's own parser was originally derived. To generate more income, Marijn started a new project, ProseMirror.

Marijn felt that none of the open-source editors available at the time took what he considered the ideal approach. Many still followed the old paradigm of relying entirely on contentEditable, which limits the developer's control over the document because users and browsers can modify the content freely. ProseMirror nevertheless still uses contentEditable for editing, because implementing selection and cursor logic from scratch would be too troublesome.

ProseMirror has a schema: once the schema is defined, ProseMirror can derive the parser for you. The framework specifies which properties and methods a new Node type needs, such as nodeFromJSON for building a node from its JSON representation and toDOM for describing how structured data is converted to DOM (somewhat similar to JSX). ProseMirror manages the mapping between JSON data and DOM in this intermediate layer.

Data Model:

{
  "type": "paragraph",
  "content": [
    {
      "type": "text",
      "marks": [
        {
          "type": "strong"
        }
      ],
      "text": "Check"
    },
    {
      "type": "text",
      "text": " "
    },
    {
      "type": "text",
      "marks": [
        {
          "type": "link",
          "attrs": {
            "href": "https://donaldxdonald.xyz/",
            "title": ""
          }
        }
      ],
      "text": "this"
    },
    {
      "type": "text",
      "text": " out ~"
    }
  ]
}
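
The idea of a schema that knows how each node type is rendered can be sketched as follows. This is a simplified, hypothetical stand-in, not ProseMirror's API: nodeTags, markRenderers, and render are names invented for this example, and output is an HTML string, whereas ProseMirror's real toDOM returns a DOM output spec.

```typescript
type MarkAttrs = { [key: string]: string | undefined };
type Mark = { type: string; attrs?: MarkAttrs };
type PMNode = { type: string; text?: string; marks?: Mark[]; content?: PMNode[] };

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical mini "schema": every allowed node or mark type declares
// how it is turned into markup. Unknown types are rejected.
const nodeTags: Record<string, string | undefined> = { paragraph: "p" };
const markRenderers: Record<string, ((inner: string, attrs: MarkAttrs) => string) | undefined> = {
  strong: (inner) => `<strong>${inner}</strong>`,
  link: (inner, attrs) => `<a href="${escapeHtml(attrs.href ?? "")}">${inner}</a>`,
};

function render(node: PMNode): string {
  if (node.type === "text") {
    let html = escapeHtml(node.text ?? "");
    for (const mark of node.marks ?? []) {
      const wrap = markRenderers[mark.type];
      if (!wrap) throw new Error(`Unknown mark type: ${mark.type}`);
      html = wrap(html, mark.attrs ?? {});
    }
    return html;
  }
  const tag = nodeTags[node.type];
  if (!tag) throw new Error(`Unknown node type: ${node.type}`);
  // Recurse into children: the document is a tree, not a flat list.
  return `<${tag}>${(node.content ?? []).map(render).join("")}</${tag}>`;
}

const pmDoc: PMNode = {
  type: "paragraph",
  content: [
    { type: "text", marks: [{ type: "strong" }], text: "Check" },
    { type: "text", text: " " },
    {
      type: "text",
      marks: [{ type: "link", attrs: { href: "https://donaldxdonald.xyz/", title: "" } }],
      text: "this",
    },
    { type: "text", text: " out ~" },
  ],
};

const html = render(pmDoc);
```

Rejecting unknown node and mark types is the part that mirrors the schema constraint: content that the schema does not describe cannot enter the document.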

ProseMirror's characteristics are:

  1. Relies on the browser's native editing capability contentEditable (L1)
  2. More abstract JSON document model. ProseMirror only defines a configurable model framework, and the specific structure can be customized during actual development.
  3. Nested tree structure. Can support complex structured content.
  4. Good support for collaborative editing. From its inception, ProseMirror has focused on supporting collaborative editing.
  5. After version 1.0, it introduced immutable data, providing a complete data flow for the editor's data processing, making it stable and controllable.

2016.02 - Draft.js#

At that time, Facebook (now Meta) open-sourced Draft.js. Because both come from the same company, Draft.js renders its view layer with React. It was the first case of combining React with an editor, and React's popularity let users build on Draft.js quickly.

Draft.js not only looks like React but is also deeply influenced by it internally: EditorState and ContentState manage state in a Redux-like way, and the data layer uses Immutable.js. Ordinary JS object properties can be reassigned freely, that is, they are mutable. Immutable data types do not allow in-place assignment; every modification made through the Immutable API produces a new reference.
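
The difference between mutating in place and producing a new reference can be shown with plain frozen objects. This is only a sketch of the idea; it uses neither Immutable.js nor the real Draft.js types, and SimpleBlock, SimpleContent, and setBlockText are hypothetical names.

```typescript
type SimpleBlock = { readonly text: string };
type SimpleContent = { readonly blocks: readonly SimpleBlock[] };

// Returns a new state object; the previous state is never modified.
function setBlockText(state: SimpleContent, index: number, text: string): SimpleContent {
  return {
    blocks: state.blocks.map((block, i) => (i === index ? { text } : block)),
  };
}

const beforeEdit: SimpleContent = Object.freeze({
  blocks: Object.freeze([{ text: "Hello" }, { text: "World" }]),
});
const afterEdit = setBlockText(beforeEdit, 1, "Draft");

// beforeEdit still holds "World", afterEdit holds "Draft", and the unchanged
// first block is shared between both states rather than copied.
```

Since every edit yields a new state while the old one stays intact, keeping earlier states around for an undo stack or for cheap change detection becomes straightforward.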

Draft.js's characteristics are:

  1. Relies on the browser's native editing capability contentEditable (L1)
  2. Uses React to implement the view layer
  3. Separates content storage and rendering logic
  4. Uses immutable data
  5. Although it also abstracts a JSON-based data model, its support for nested data is somewhat weak

2016.06 - Slate#

At this time, many editor frameworks were already in circulation, but Ian Storm Taylor, while developing his own CMS product, still felt that there was no good editor available. He believed that these editors were fine for simple products, but developing large applications like Medium, Google Docs, and Dropbox Paper was too difficult, leading to the creation of Slate.

Slate is also an editor framework rather than an out-of-the-box editor tool. As a younger framework, Slate incorporates many advantages from its predecessors, referencing immutable data and plugin mechanisms from Draft.js, and borrowing nested data structures and schema constraints from ProseMirror. By integrating many core features of editor frameworks, along with an advanced framework concept and the author's pursuit of architecture (as of 2022, it is still in beta and has not reached 1.0), Slate remains quite popular in the community.

Data Model:

{
  "object": "block",
  "type": "paragraph",
  "nodes": [
    {
      "object": "text",
      "text": "This is editable "
    },
    {
      "object": "text",
      "text": "rich",
      "marks": [{ "type": "bold" }]
    },
    {
      "object": "text",
      "text": " text, "
    },
    {
      "object": "text",
      "text": "much",
      "marks": [{ "type": "italic" }]
    },
    {
      "object": "text",
      "text": " better than a "
    },
    {
      "object": "text",
      "text": "<textarea>",
      "marks": [{ "type": "code" }]
    },
    {
      "object": "text",
      "text": "!"
    }
  ]
}

At this point, Slate's characteristics are:

  1. Relies on the browser's native editing capability contentEditable (L1)
  2. Uses React to implement the view layer
  3. Supports nested JSON data structures
  4. Immutable data
  5. Plugin mechanism as the core
  6. Schema for constraining data

2019 - Slate 0.50+#

Slate underwent a major update in its architecture, with the author stating, "The entire framework was reconsidered from the ground up." The main updates include:

  1. Separating the underlying logic into Slate Core, decoupling it from the view layer
  2. Rewriting in TypeScript
  3. Simplifying the plugin mechanism, with plugins no longer coupled with rendering logic
  4. Replacing Immutable.js with simple JSON objects
  5. Streamlining its own concepts and some commands, renaming them to Transforms

Data Model:

{
  "type": "paragraph",
  "children": [
    { "text": "This is editable " },
    { "text": "rich", "bold": true },
    { "text": " text, " },
    { "text": "much", "italic": true },
    { "text": " better than a " },
    { "text": "<textarea>", "code": true },
    { "text": "!" }
  ]
}
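
Comparing the two Slate data models shown in this article, the newer one drops the "object" discriminator, renames "nodes" to "children", and turns marks into plain boolean properties on the text leaf. The sketch below converts the older shape to the newer one. It is derived only from the two JSON samples here and is not an official migration tool; all type names and convert are hypothetical.

```typescript
// Older shape (as in the earlier sample).
type LegacyMark = { type: string };
type LegacyText = { object: "text"; text: string; marks?: LegacyMark[] };
type LegacyBlock = { object: "block"; type: string; nodes: Array<LegacyBlock | LegacyText> };

// Newer shape (as in the 0.50+ sample): plain JSON, marks as properties.
type SlateText = { text: string; [mark: string]: unknown };
type SlateElement = { type: string; children: Array<SlateElement | SlateText> };

function convert(node: LegacyBlock | LegacyText): SlateElement | SlateText {
  if (node.object === "text") {
    const leaf: SlateText = { text: node.text };
    for (const mark of node.marks ?? []) {
      leaf[mark.type] = true; // { type: "bold" } becomes bold: true
    }
    return leaf;
  }
  return { type: node.type, children: node.nodes.map(convert) };
}

const legacy: LegacyBlock = {
  object: "block",
  type: "paragraph",
  nodes: [
    { object: "text", text: "This is editable " },
    { object: "text", text: "rich", marks: [{ type: "bold" }] },
    { object: "text", text: " text!" },
  ],
};

const converted = convert(legacy);
```

The whole transformation is a small pure function over plain objects, which reflects the "pure functions + interfaces" direction of the rewritten architecture.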

At this point, Slate's characteristics are:

  1. Relies on the browser's native editing capability contentEditable (L1)
  2. Very concise support for nested data models
  3. Overall architecture adopts pure functions + interfaces, making both the thought process and code very simple
  4. The plugin mechanism supports the development of powerful features
  5. The overall design philosophy is very similar to the DOM

2022 - Lexical#

Draft.js did a lot of dirty work for browser compatibility that is now largely unnecessary, and many developers complained that it was hard to use. To improve the developer experience, Meta open-sourced a new editor framework, Lexical, intended to replace Draft.js.

So far it does not seem to introduce any particularly innovative concepts; it mainly absorbs the strengths of other editor frameworks:

  1. Relies on the browser's native editing capability contentEditable (L1)
  2. Retains some concepts from Draft.js (EditorState)
  3. Not bound to React, allowing various frameworks to implement the view layer
  4. The entire framework is quite lightweight, with almost no other dependencies

L2#

As mentioned earlier, the browser's contenteditable makes it possible to develop an editor quickly, but it brings too many compatibility issues. Although most editor frameworks are still at L1, as early as 2010 the well-funded Google had already begun to move away from the browser's contenteditable. The newer version of Google Docs renders on Canvas and has its own cursor system, text layout system, and font parsing, so in addition to its own abstract data structure, even the editing operations are implemented by Google itself. The result is a consistent editing and rendering experience across browsers, which makes collaborative editing a natural outcome.

Of course, the core technology of Google Docs is not open source; work this commercially valuable stays firmly in Google's own hands. Chinese products such as Tencent Docs and WPS can also be considered L2: they do not necessarily use Canvas, but they share the idea of implementing editing and layout themselves and abandoning the browser's contenteditable.
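
To give a sense of what "implementing layout yourself" means at the smallest scale, here is a greedy line-breaking sketch. It is an assumption-laden toy, not how Google Docs or any other product works: breakLines and the Measure type are hypothetical, words are split on single spaces, and real engines must also handle CJK line breaking, bidirectional text, hyphenation, and mixed fonts.

```typescript
// Returns the rendered width of a string; supplied by the caller.
type Measure = (text: string) => number;

// Greedy line breaking: keep adding words to the current line until the
// next word would exceed maxWidth, then start a new line.
function breakLines(text: string, maxWidth: number, measure: Measure): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(" ")) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (current !== "" && measure(candidate) > maxWidth) {
      lines.push(current);
      current = word; // a single over-long word still gets its own line
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Stand-in measurement for this example: every character is 10 units wide.
const monospace: Measure = (text) => text.length * 10;
const wrapped = breakLines("the quick brown fox", 100, monospace);

// In a browser, measurement could come from a 2D canvas context instead:
//   const ctx = canvas.getContext("2d")!;
//   const measure: Measure = (t) => ctx.measureText(t).width;
```

Once line positions come from the editor's own code rather than from the browser's layout, cursor placement, selection rectangles, and hit testing also have to be computed by the editor, which is why this stage requires so much engineering investment.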

Conclusion#

contenteditable is terrible, but editors have already minimized their reliance on it. The more serious problem is the chaotic ecosystem formed by the combination of operating systems, browsers, and input methods: an editor cannot control it, yet products are expected to thrive in it. That is why web rich text editors are called one of the pitfalls of front-end development.

  1. Why is it said that rich text editors are pitfalls? - Zhihu
  2. What proportion of front-end engineers can independently develop a text editor suitable for commercial websites within a reasonable time? - Zhihu
  3. The evolution of open-source rich text editor technology (2020 1024)
  4. The dilemma and breakthrough of ContentEditable
  5. Slate.js - A revolutionary rich text editing framework - Juejin
  6. Jason Chen: Building Editors in the Browser | JSConf.ar 2014
  7. Marijn Haverbeke: Salvaging contentEditable: Building a Robust WYSIWYG Editor | JSConf EU 2015
  8. ProseMirror
  9. Facebook open sources Lexical, an extensible text editor library - Hacker News
  10. Why is ContentEditable so terrible - OSCHINA - Chinese Open Source Technology Exchange Community
  11. How do you view Google Docs migrating from HTML to Canvas-based rendering? - Zhihu
  12. Discussing the dilemmas of web rich text editors based on popular editor architectures