onDataString is easy to use but is unfortunately a subtle foot-gun.
This function calls the underlying Node API, which returns a Buffer, and then calls Buffer.toString on the result.
But if a character spans two data events, each Buffer.toString call decodes its chunk in isolation and replaces the incomplete trailing/leading code units with the replacement character (U+FFFD).
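To make the failure concrete, here is a minimal sketch against the Node API directly (TypeScript rather than PureScript, so it does not call onDataString itself). It assumes a two-byte UTF-8 character whose bytes arrive in separate chunks; the variable names are illustrative only. For comparison it also runs the same chunks through Node's StringDecoder, which buffers incomplete sequences between writes.

```typescript
import { Buffer } from "node:buffer";
import { StringDecoder } from "node:string_decoder";

// "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
const whole = Buffer.from("é", "utf8");

// Simulate the character being split across two `data` events.
const firstChunk = whole.subarray(0, 1); // 0xC3
const secondChunk = whole.subarray(1); // 0xA9

// Decoding each chunk on its own: neither byte is valid UTF-8 in
// isolation, so each one becomes the replacement character.
const naive = firstChunk.toString("utf8") + secondChunk.toString("utf8");

// StringDecoder holds back the incomplete byte from the first write
// and emits the full character once the second byte arrives.
const decoder = new StringDecoder("utf8");
const streamed =
  decoder.write(firstChunk) + decoder.write(secondChunk) + decoder.end();

console.log(JSON.stringify(naive)); // two U+FFFD characters
console.log(JSON.stringify(streamed)); // "é"
```

Whether a wrapper should use a stateful decoder like this, or simply steer users away from per-chunk decoding, is a design choice for the library; the sketch only shows where the data loss comes from.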
As it stands, onDataString should be documented with a clear warning: it is not suitable for general-purpose use, only for streams that are guaranteed either to contain only single-byte encoded characters or to have a known short length.
The comment on setEncoding unfortunately recommends this:
"Where possible, you should try to use onDataString instead of this function."
If it recommended onData instead, I think it would be clearer to the user that something fishy is going on.