Add Inline Styles to Text with Markdown Syntax in ProseMirror

Posted on Mon, Aug 16, 2021 🌿 Growing Article Engineering ProseMirror RegExp

In ProseMirror, you can write InputRules, which trigger certain actions when a given pattern of text is typed in the editor and matched by a RegExp.

There are already many examples of InputRules in prosemirror-example-setup that help you create different types of block nodes. So, in this article, we talk about writing ones for creating inline styles like **bold** and *italic*.

Explanation

Naive

To match **bold**, you write:

/**(.+)**$/

But this does not work if you also need to match *italic*, let’s say, with:

/*(.+)*$/

Why? Because when you type **bold*, it triggers the match of /*(.+)*$/, so the text becomes *bold, and you have no chance of typing the final *.

However, if there’re no overlapping cases, like `code`, this naive version works fine.

Robust

To match **bold**, you write:

/(?<=[^*\n]|^)\*\*([^*\n]+)\*\*$/

The first part (?<=[^*\n]|^) uses a Positive Lookbehind operator (?<=...) to tell the regex engine to match something, but not to add it to the match (so when you want to match x**abc**, you don't get the "x" in the result). [^*\n] says to match a character that is not * or \n. |^ says to also match the beginning of a line.

The second part is trivial. \*\* matches two *s, to find the first two characters of **bold**. \ escapes * to treat it as a character.

We can now go back to the first part. The reason of the first part is to ensure that there’re no *s before ** that we want to match, so we can match exactly two *s. If there’re three, we don’t match, so other rules that match three works.

The third part ([^*\n]+) is to capture the text between the pair of **. + says that it’ll match one or more characters. [^*\n] excludes *, since * would suggest the end of the pair, and \n, since we’re matching inline styles, so no newline!

The final part \*\*$ ensures the end of the pair.

In conclusion, there’re four parts in this regex structure — Guard + Match the Start + Wrapping Text + Ensure the End. Now we can write more regexes based on this structure.

Examples & Test Cases

Bold and Italic with Triple Stars

Regex:

/(?<=[^*\n]|^)\*\*\*([^*\n]+)\*\*\*$/

Match:

***123***
5***123***
***1***

No match:

****123***
***123**
**123***
***123

Bold with Double Stars

Regex:

/(?<=[^*\n]|^)\*\*([^*\n]+)\*\*$/

Match:

**123**
5**123**
**1**

No match:

***123**
**123*
*123**
**123

Bold with Double Underscores

Regex:

/(?<=[^_\n]|^)__([^_\n]+)__$/

Match:

__123__
5__123__
__1__

No match:

____
__123_
_123_
__123

Italic with Single Star

Regex:

/(?<=[^*\n]|^)\*([^*\n]+)\*$/

Match:

*123*
5*123*
*1*

No Match:

**123**
**123*
*123**
*123

Italic with Single Underscore

Regex:

/(?<=[^_\n]|^)_([^_\n]+)_$/

Match:

_123_
5_123_
_1_

No match:

__
__123_
__123__
_123