The Linux command strings looks for ASCII strings in a binary file. Are there any command-line tools that show UTF-8 strings in a binary file?

1 Answer

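The strings command supports the --encoding (-e) option, which accepts s (7-bit, the default), S (8-bit), b and l (16-bit big- and little-endian), and B and L (32-bit big- and little-endian). Check the man page for details.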

However, I failed to extract UTF-8 strings with any of these option values. I'm currently searching their mailing list and will update this answer if I find more help.
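If strings itself won't cooperate, a short script can do the extraction. Below is a minimal Python sketch (the function name utf8_strings and the minimum length of 4, chosen to mirror strings' default, are my own assumptions, not part of any existing tool). It decodes the whole file as UTF-8 with Python's surrogateescape error handler, so every byte that is not part of a valid UTF-8 sequence becomes a lone surrogate, which is never printable. Runs of printable characters between those separators are reported.

```python
import sys


def utf8_strings(data: bytes, min_len: int = 4):
    """Yield runs of at least min_len printable characters that decode as valid UTF-8."""
    # surrogateescape maps each byte that is not part of a valid UTF-8 sequence
    # to a lone surrogate (U+DC80..U+DCFF). Surrogates are not printable, so they
    # split runs just like NUL and other control characters do.
    text = data.decode("utf-8", errors="surrogateescape")
    run = []
    for ch in text:
        if ch.isprintable():
            run.append(ch)
        else:
            if len(run) >= min_len:
                yield "".join(run)
            run = []
    # Flush a run that reaches the end of the data.
    if len(run) >= min_len:
        yield "".join(run)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Reads the whole file into memory; fine for typical binaries.
    with open(sys.argv[1], "rb") as f:
        for s in utf8_strings(f.read()):
            print(s)
```

Run it as python3 utf8_strings.py some.bin (both file names are hypothetical). Random binary data can contain byte pairs that happen to form valid UTF-8, so expect some short false positives, just as with plain strings.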

11 Comments

UTF-8 characters have a variable byte width, which doesn't fit the fixed-width pattern matching that strings does.
On my Debian 9 system, strings -e S works with UTF-8 (strings version 2.28, LANG=de_CH.UTF-8).
@12431234123412341234123 Thanks for the comment! I'll test it later and update the answer if I can reproduce it.
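A likely explanation for why -e S "works": as I understand binutils' strings, the 8-bit mode differs from the default 7-bit mode mainly in also counting every byte >= 0x80 as part of a string. UTF-8 multibyte sequences then pass through untouched, and a UTF-8 terminal decodes them. The Python sketch below is my own approximation of that rule (the name strings_8bit and the exact byte class are assumptions, not taken from the binutils source). It also shows the catch: invalid high bytes get through just as readily.

```python
import re

# Assumed approximation of `strings -e S` (not the binutils source): a "string"
# is a run of at least 4 bytes (strings' default minimum) that are tab,
# printable ASCII (0x20..0x7E), or any byte >= 0x80.
EIGHT_BIT_RUN = re.compile(rb"[\t\x20-\x7e\x80-\xff]{4,}")


def strings_8bit(data: bytes):
    """Return the byte runs that the 8-bit rule above would report."""
    return EIGHT_BIT_RUN.findall(data)


sample = b"\x00caf\xc3\xa9\x00\xff\xfe\xfdabc\x01"
for run in strings_8bit(sample):
    print(run)  # raw bytes; decoding is left to the reader (or the terminal)
```

The first run decodes cleanly as "café", but the second (b'\xff\xfe\xfdabc') is reported too even though it is not valid UTF-8, which is exactly the weakness the variable-width comment points at.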
@hek2mgl: Indeed. ASCII covers the 128 characters that need only 7 bits to encode. Coincidentally, these are exactly the 128 characters that UTF-8 encodes with one byte, because a set most significant bit is UTF-8's way of telling that a given byte is part of a multibyte sequence.
Can't make much sense of that last comment. The characters which UTF-8 encodes in one byte are code points U+0000 through U+007F. These are identical to the same range (0x00 through 0x7F) of ASCII-7, which incidentally is all of ASCII-7. For anything beyond that, like "Ä", UTF-8 uses two or more bytes (whereas ISO/IEC 8859 / "extended ASCII" uses the range 0x80..0xFF to encode additional characters in a collection of one-byte encodings). "UTF-8 characters which are encoded in one byte, which is more than ASCII" do not exist.
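To make the byte-width point concrete, here is a minimal Python sketch that classifies each byte by its high bits according to the UTF-8 bit patterns (utf8_byte_kind is a hypothetical helper name, not a library function). Every byte below 0x80 is a complete ASCII character, while every byte of a multibyte sequence such as "Ä" (C3 84) has the most significant bit set.

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one byte by its high bits, following the UTF-8 bit patterns."""
    if b < 0x80:
        return "single-byte (ASCII)"      # 0xxxxxxx: U+0000..U+007F, same as ASCII
    if (b & 0xC0) == 0x80:
        return "continuation"             # 10xxxxxx
    if (b & 0xE0) == 0xC0:
        return "lead of 2-byte sequence"  # 110xxxxx
    if (b & 0xF0) == 0xE0:
        return "lead of 3-byte sequence"  # 1110xxxx
    if (b & 0xF8) == 0xF0:
        return "lead of 4-byte sequence"  # 11110xxx
    return "never valid in UTF-8"         # 11111xxx

# Note: this checks bit patterns only; a few lead bytes (0xC0, 0xC1, 0xF5..0xF7)
# are still never valid in well-formed UTF-8.

for ch in "AÄ€😀":
    encoded = ch.encode("utf-8")
    kinds = [f"{b:02X}={utf8_byte_kind(b)}" for b in encoded]
    print(f"U+{ord(ch):04X}: {kinds}")
```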