Stdin (standard input) for interactive external console programs is not set to the console's character encoding in the console/terminal host #10907
Description
Note: #10824 fixed the issue for stdout (standard output).
When invoking external programs, the encoding of their standard input stream should match that of the hosting console, as reflected in [Console]::InputEncoding - just like the standard output encoding is set to [Console]::OutputEncoding (which was always the case in Windows PowerShell, but for PS Core was only recently fixed in #10824).
Note:
- This is for the case where stdin is not redirected, where the external program reads input interactively typed by the user; by contrast, when PowerShell pipes input to an external program, the encoding specified in preference variable $OutputEncoding is used.
- The problem also exists in Windows PowerShell.
Steps to reproduce
Run the following on Windows, where [Console]::InputEncoding is (still) the OEM code page's encoding by default.
Install dotnet-script for dynamic execution of C# code via an external program.
Run the following and type or paste character ü when prompted:
# Make (BOM-less) UTF8 the input encoding.
[Console]::InputEncoding = [Text.Utf8Encoding]::new($false)
# Execute dotnet-script with ad hoc code that reads bytes
# from stdin.
dotnet-script eval ((@'
Console.WriteLine("Enter single character 'ü' and press ENTER");
byte[] buf = new byte[2];
using (Stream inStream = Console.OpenStandardInput())
{
    inStream.Read(buf, 0, buf.Length);
}
Console.WriteLine("First 2 bytes read (should be 0xC3 0xBC for 'ü'):");
foreach (byte b in buf) {
    Console.WriteLine("0x" + b.ToString("X"));
}
'@
) -replace '"', '\"') | Tee-Object -Variable output
$output[-2..-1] | Should -Be '0xC3', '0xBC'
Expected behavior
The test should pass.
That is, the raw bytes read from stdin should reflect the UTF-8 encoding assigned to [Console]::InputEncoding, which for ü (LATIN SMALL LETTER U WITH DIAERESIS, U+00FC) is the byte sequence 0xC3 0xBC.
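For reference, the expected byte values can be checked independently of PowerShell. The sketch below uses Python purely as a neutral calculator and is not part of the repro. The helper name `encoded_bytes` is made up for illustration, and code page 437 is only an assumed example of an OEM code page (the actual one depends on the system locale).

```python
def encoded_bytes(text, encoding):
    """Return the bytes of `text` in `encoding`, formatted like the repro's output."""
    return ["0x" + format(b, "X") for b in text.encode(encoding)]


U_UMLAUT = "\u00fc"  # ü, LATIN SMALL LETTER U WITH DIAERESIS

# UTF-8: the two bytes the test expects to read from stdin.
print(encoded_bytes(U_UMLAUT, "utf-8"))  # ['0xC3', '0xBC']

# CP437 (assumed OEM code page): a single byte.
print(encoded_bytes(U_UMLAUT, "cp437"))  # ['0x81']
```

Under these assumptions, a program reading ü followed by ENTER would be expected to see 0xC3 0xBC with UTF-8 in effect, or 0x81 followed by 0xD if the OEM code page were still in effect.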
Actual behavior
The test fails:
Expected @('0xC3', '0xBC'), but got @('0x0', '0xD').
Curiously, a NUL byte was read instead of the typed character (0xD is just the CR char).
This shows that changing [Console]::InputEncoding had some effect, but not the desired one.
On macOS, changing [Console]::InputEncoding is effectively ignored.
Environment data
PowerShell Core 7.0.0-preview.5