Next Unicode Code Point

C++ code to find the next Unicode code point in UTF-8 and UTF-16 encoded strings.

Associated blog post: https://giodicanio.com/2025/11/03/finding-the-next-unicode-code-point-in-strings-utf-8-vs-utf-16/

The NextCodePoint.hpp header declares the public interfaces of two functions: NextCodePointUtf8 and NextCodePointUtf16. As their names suggest, they find the next code point in UTF-8 and UTF-16 encoded strings, respectively. Think of them as the "Unicode evolution" of incrementing a character position index (index++, or pch++ with pointers) when iterating through the characters of a pure ASCII string:

std::string name = "Connie";

for (size_t index = 0; index < name.size(); index++) {
    std::cout << name[index];
}
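
To see why a plain index++ is not enough once the text leaves the ASCII range, here is a minimal sketch that counts code points in a UTF-8 string by skipping continuation bytes (those of the form 10xxxxxx). The helper name CountCodePointsUtf8 is hypothetical and is not part of NextCodePoint.hpp; it also assumes the input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of NextCodePoint.hpp).
// Counts code points in a string assumed to be well-formed UTF-8:
// every byte that is NOT a continuation byte (10xxxxxx) starts a new code point.
constexpr std::size_t CountCodePointsUtf8(std::string_view text) {
    std::size_t count = 0;
    for (const char ch : text) {
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "Connie" the byte count and the code point count are both 6, so index++ works. For "Caf\xC3\xA9" (Café, with é encoded as the two bytes C3 A9) the string holds 5 bytes but only 4 code points, so stepping one byte at a time would land in the middle of a character.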

The function declarations are the following:

// Returns the next Unicode code point and number of bytes consumed.
// Throws std::out_of_range if index is out of bounds or string ends prematurely.
// Throws std::invalid_argument on invalid UTF-8 sequence.
[[nodiscard]] std::pair<char32_t, size_t> NextCodePointUtf8(const std::string& str, size_t index);


// Returns the next Unicode code point and the number of UTF-16 code units consumed.
// Throws std::out_of_range if index is out of bounds or string ends prematurely.
// Throws std::invalid_argument on invalid UTF-16 sequence.
[[nodiscard]] std::pair<char32_t, size_t> NextCodePointUtf16(const std::wstring& input, size_t index);
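
As an illustration of what a function with this contract has to do for UTF-8, here is a minimal sketch. It is not the repository's implementation: the name DecodeUtf8Sketch is hypothetical, it takes a std::string_view instead of a const std::string& so that it can be evaluated at compile time, and the exact set of validity checks is an assumption based on the general UTF-8 rules (lead byte pattern, continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <utility>

// Hypothetical sketch; NOT the repository's NextCodePointUtf8.
// Returns the code point starting at str[index] and the number of bytes it occupies.
constexpr std::pair<char32_t, std::size_t> DecodeUtf8Sketch(std::string_view str,
                                                            std::size_t index) {
    if (index >= str.size()) {
        throw std::out_of_range("index is out of bounds");
    }

    const auto lead = static_cast<unsigned char>(str[index]);
    if (lead < 0x80) {
        // 0xxxxxxx: plain ASCII, one byte.
        return {static_cast<char32_t>(lead), std::size_t{1}};
    }

    std::size_t length = 0;
    char32_t codePoint = 0;
    if ((lead & 0xE0) == 0xC0) {          // 110xxxxx: 2-byte sequence
        length = 2;
        codePoint = static_cast<char32_t>(lead & 0x1F);
    } else if ((lead & 0xF0) == 0xE0) {   // 1110xxxx: 3-byte sequence
        length = 3;
        codePoint = static_cast<char32_t>(lead & 0x0F);
    } else if ((lead & 0xF8) == 0xF0) {   // 11110xxx: 4-byte sequence
        length = 4;
        codePoint = static_cast<char32_t>(lead & 0x07);
    } else {
        throw std::invalid_argument("invalid UTF-8 lead byte");
    }

    if (length > str.size() - index) {
        throw std::out_of_range("string ends prematurely");
    }

    // Each continuation byte (10xxxxxx) contributes 6 payload bits.
    for (std::size_t i = 1; i < length; ++i) {
        const auto next = static_cast<unsigned char>(str[index + i]);
        if ((next & 0xC0) != 0x80) {
            throw std::invalid_argument("invalid UTF-8 continuation byte");
        }
        codePoint = (codePoint << 6) | static_cast<char32_t>(next & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values above U+10FFFF.
    constexpr char32_t minForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    if (codePoint < minForLength[length] || codePoint > 0x10FFFF ||
        (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
        throw std::invalid_argument("invalid UTF-8 sequence");
    }

    return {codePoint, length};
}
```

The second element of the returned pair is what replaces index++: the caller advances the index by that many bytes to reach the start of the next code point.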

These functions can be used like this:

std::wstring text = L"A\xD834\xDD1E!"; // A + U+1D11E (𝄞) + ! (assumes 16-bit wchar_t, as on Windows)

size_t index = 0;
while (index < text.size()) {
    auto [codepoint, units] = NextCodePointUtf16(text, index);
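    // Cast to an integer type: streaming a char32_t to std::cout is ill-formed since C++20.
    std::cout << "Codepoint: U+" << std::hex << static_cast<unsigned long>(codepoint)
              << std::dec << " (" << units << " units)\n";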
    index += units;
}
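
The loop above reports two units for U+1D11E because code points above U+FFFF are stored in UTF-16 as a surrogate pair. The following sketch shows the arithmetic involved. It is not the repository's implementation: the name DecodeUtf16Sketch is hypothetical, and it takes a std::u16string_view rather than a std::wstring so that the code units are 16 bits wide on every platform.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <utility>

// Hypothetical sketch; NOT the repository's NextCodePointUtf16.
// Returns the code point starting at input[index] and the number of
// UTF-16 code units it occupies (1, or 2 for a surrogate pair).
constexpr std::pair<char32_t, std::size_t> DecodeUtf16Sketch(std::u16string_view input,
                                                             std::size_t index) {
    if (index >= input.size()) {
        throw std::out_of_range("index is out of bounds");
    }

    const char32_t first = input[index];
    if (first < 0xD800 || first > 0xDFFF) {
        // Not a surrogate: the code unit is the code point itself.
        return {first, std::size_t{1}};
    }
    if (first >= 0xDC00) {
        throw std::invalid_argument("unexpected low surrogate");
    }
    if (index + 1 >= input.size()) {
        throw std::out_of_range("string ends after a high surrogate");
    }

    const char32_t second = input[index + 1];
    if (second < 0xDC00 || second > 0xDFFF) {
        throw std::invalid_argument("high surrogate not followed by a low surrogate");
    }

    // The high surrogate carries the top 10 bits, the low surrogate the bottom 10 bits,
    // and the result is offset by 0x10000.
    const char32_t codePoint = 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    return {codePoint, std::size_t{2}};
}
```

For the pair D834 DD1E this gives 0x10000 + (0x34 << 10) + 0x11E = 0x10000 + 0xD000 + 0x11E = 0x1D11E, which matches the musical G clef used in the example text.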
