BudouX Java Module

BudouX is a standalone, small, and language-neutral phrase segmenter tool that provides beautiful and legible line breaks.

For more details about the project, please refer to the project README.

Demo

https://google.github.io/budoux

Usage

Simple usage

You can get a list of phrases by feeding a sentence to the parser. The easiest way to get a parser is to load the default parser for the language you need.

import com.google.budoux.Parser;

public class App
{
    public static void main( String[] args )
    {
        Parser parser = Parser.loadDefaultJapaneseParser();
        System.out.println(parser.parse("今日は良い天気ですね。"));
        // [今日は, 良い, 天気ですね。]
    }
}
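The returned phrases can be handled like any other list of strings. As a minimal sketch, the hypothetical helper `joinWithSeparator` below (not part of BudouX) joins phrases with a zero-width space so that a renderer can break lines only at phrase boundaries. The phrase list is hard-coded from the example output above instead of being produced by the parser, so the snippet runs without the BudouX dependency.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseJoinExample {
    // Zero-width space (U+200B), which marks a permitted line-break point.
    static final String ZWSP = String.valueOf((char) 0x200B);

    // Hypothetical helper (not part of BudouX): joins phrases with a separator.
    static String joinWithSeparator(List<String> phrases, String separator) {
        return String.join(separator, phrases);
    }

    public static void main(String[] args) {
        // Hard-coded stand-in for parser.parse("今日は良い天気ですね。").
        List<String> phrases = Arrays.asList("今日は", "良い", "天気ですね。");
        String joined = joinWithSeparator(phrases, ZWSP);
        // One separator is added per phrase boundary, so the joined text is
        // two characters longer than the original sentence.
        System.out.println(joined.length());
    }
}
```

In real use, the hard-coded list would be replaced by the result of `parser.parse(...)`.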

Supported languages and their default parsers

  • Japanese: Parser.loadDefaultJapaneseParser()
  • Simplified Chinese: Parser.loadDefaultSimplifiedChineseParser()
  • Traditional Chinese: Parser.loadDefaultTraditionalChineseParser()
  • Thai: Parser.loadDefaultThaiParser()

Working with HTML

If you want to use the result on a website, you can use the translateHTMLString method to get an HTML string that wraps phrases with non-breaking markup and separates them with zero-width spaces (U+200B).

System.out.println(parser.translateHTMLString("今日は<strong>良い天気</strong>ですね。"));
//<span style="word-break: keep-all; overflow-wrap: anywhere;">今日は<strong>\u200b良い\u200b天気</strong>ですね。</span>

Please note that separators are written as \u200b in the example above for illustrative purposes. In the actual output they are zero-width space characters and therefore invisible.
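Because the separators are invisible, it can help to make them visible when logging or debugging. The sketch below assumes two hypothetical helpers, `revealZwsp` and `countZwsp`, neither of which is part of BudouX. The input string is built by hand to mimic the inner part of the output shown above, so the snippet does not call the library.

```java
public class ZwspDebugExample {
    // Zero-width space, U+200B.
    static final String ZWSP = String.valueOf((char) 0x200B);

    // Hypothetical helper (not part of BudouX): replaces each zero-width
    // space with a printable escape sequence so it shows up in logs.
    static String revealZwsp(String text) {
        return text.replace(ZWSP, "\\u200b");
    }

    // Hypothetical helper (not part of BudouX): counts zero-width spaces.
    static long countZwsp(String text) {
        return text.chars().filter(c -> c == 0x200B).count();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for part of a translateHTMLString result.
        String html = "今日は<strong>" + ZWSP + "良い" + ZWSP + "天気</strong>ですね。";
        System.out.println(countZwsp(html));
        System.out.println(revealZwsp(html));
    }
}
```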

Caveat

BudouX supports HTML inputs and outputs HTML strings with markup applied to wrap phrases, but it's not meant to be used as an HTML sanitizer. BudouX doesn't sanitize any inputs. Malicious HTML inputs yield malicious HTML outputs. Please use it with an appropriate sanitizer library if you don't trust the input.
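One narrow case can be handled without a full sanitizer: untrusted input that is meant to be plain text with no markup at all. In that case, escaping HTML-significant characters before building the HTML string is a common precaution. The sketch below uses a hypothetical `escapeHtml` helper, which is not part of BudouX. It is not a substitute for a sanitizer library when the untrusted input must keep some of its tags.

```java
public class EscapeExample {
    // Hypothetical helper (not part of BudouX): escapes the characters that
    // are significant in HTML text and attribute values.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Untrusted plain text that happens to contain markup.
        String untrusted = "<script>alert('x')</script>";
        // After escaping, the markup is inert text rather than live HTML.
        System.out.println(escapeHtml(untrusted));
    }
}
```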

Author

Shuhei Iitsuka

Disclaimer

This is not an officially supported Google product.