The Intl.Segmenter object is now part of Baseline

You can now use Intl.Segmenter for locale-sensitive text segmentation to split a string into words, sentences, or graphemes.

Browser Support

  • Chrome: 87.
  • Edge: 87.
  • Firefox: 125.
  • Safari: 14.1.

Many non-Latin languages, such as Chinese and Japanese, don't use spaces to separate words. As a result, using the JavaScript split() method to split text into words on whitespace returns incorrect results for these languages.
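As a rough illustration of the problem, the following sketch splits an English sentence and a Japanese sentence on whitespace. Both sample strings and the variable names are assumptions chosen for this example.

```javascript
// Sample strings chosen for illustration only.
const english = 'I am a cat. I have no name yet.';
const japanese = '吾輩は猫である。名前はまだない。';

// Splitting on whitespace works reasonably well for English...
const englishWords = english.split(/\s+/);
console.log(englishWords.length); // 9

// ...but the Japanese text contains no whitespace, so the whole
// string comes back as a single "word".
const japaneseWords = japanese.split(/\s+/);
console.log(japaneseWords.length); // 1
```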

When creating a new Intl.Segmenter object with the Intl.Segmenter() constructor, pass in a locale and an options object. The options include granularity, which can be "grapheme", "word", or "sentence". The following example creates a new Intl.Segmenter object for Japanese, splitting on words.

const segmenter = new Intl.Segmenter('ja-JP', { granularity: 'word' });

Calling the segment() method on an Intl.Segmenter object with a string of text returns an iterable:

// Sample Japanese text, used here for illustration.
const str = '吾輩は猫である。名前はまだない。';
const segments = segmenter.segment(str);
console.table(Array.from(segments));
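With "word" granularity, each item in the iterable is an object with segment, index, input, and isWordLike properties. Here is a minimal sketch of collecting only the word-like segments, skipping punctuation and whitespace. The sample text and the names text, wordSegmenter, and words are assumptions for this example, and the exact word boundaries depend on the segmentation data the browser ships.

```javascript
// Sample text chosen for illustration only.
const text = '吾輩は猫である。名前はまだない。';
const wordSegmenter = new Intl.Segmenter('ja-JP', { granularity: 'word' });

const words = [];
for (const { segment, index, isWordLike } of wordSegmenter.segment(text)) {
  // isWordLike is false for punctuation such as '。' and for whitespace.
  if (isWordLike) {
    words.push({ segment, index });
  }
}

// Each entry holds a word and its offset within the original string.
console.log(words);
```

Because index is an offset into the original string, the segments can be mapped back to the source text, for example to highlight individual words.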

Read "Using the Intl.Segmenter API" on the Polypane blog for an excellent tutorial on how to use this feature.

"International Text Segmentation with Intl.Segmenter in JavaScript" has more examples, including how to use Intl.Segmenter with emoji.
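As a small taste of the emoji case, the sketch below uses "grapheme" granularity to count user-perceived characters. It is not taken from either article. The family emoji and the names family, graphemeSegmenter, and graphemes are assumptions for this example.

```javascript
// A family emoji: four emoji joined by zero-width joiners (U+200D).
// Written with escapes so the invisible joiners survive copy and paste.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = Array.from(graphemeSegmenter.segment(family), (s) => s.segment);

// String length counts UTF-16 code units, not visible characters.
console.log(family.length); // 11

// The segmenter treats the whole sequence as one grapheme.
console.log(graphemes.length); // 1
```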