
Stdin (standard input) for interactive external console programs is not set to the console's character encoding in the console/terminal host #10907

@mklement0

Description

Note: #10824 fixed the issue for stdout (standard output).

When PowerShell invokes an external program, the encoding of that program's standard input stream should match that of the hosting console, as reflected in [Console]::InputEncoding, just as the standard output encoding is set to [Console]::OutputEncoding (which was always the case in Windows PowerShell, but was fixed for PS Core only recently, in #10824).

Note:

  • This is for the case where stdin is not redirected, where the external program reads input interactively typed by the user; by contrast, when PowerShell pipes input to an external program, the encoding specified in preference variable $OutputEncoding is used.

  • The problem also exists in Windows PowerShell.
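To illustrate the piped case mentioned in the first bullet, here is a minimal sketch in Python (not PowerShell; it is used only for illustration and is not part of the original report). It shows that a child process reading redirected stdin receives exactly the bytes the parent chose when encoding the text; in PowerShell, that choice is governed by $OutputEncoding. The helper name `child_sees` is hypothetical, and CP437 is only an assumed example of an OEM code page.

```python
import subprocess
import sys

# Child program: dump the raw stdin bytes as hex, without decoding them.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.buffer.read().hex())"


def child_sees(text, encoding):
    """Pipe `text`, encoded with `encoding`, to a child process and
    return the hex representation of the bytes the child read."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text.encode(encoding),  # the parent decides the byte encoding
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


# The same character arrives as different bytes depending on the parent's choice.
print(child_sees("\u00fc", "utf-8"))
print(child_sees("\u00fc", "cp437"))  # assumption: CP437 as the OEM code page
```

The interactive case this issue is about differs in that no parent process encodes the text; the console host itself must deliver the typed characters in the configured input encoding.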

Steps to reproduce

Perform the following steps on Windows, where [Console]::InputEncoding (still) defaults to the encoding of the system's OEM code page.

Install dotnet-script, which allows dynamic execution of C# code via an external program.

Run the following code, and type or paste the character ü when prompted:

# Make (BOM-less) UTF8 the input encoding.
[Console]::InputEncoding = [Text.Utf8Encoding]::new($false)

# Execute dotnet-script with ad hoc code that reads bytes
# from stdin.
dotnet-script eval ((@'
  Console.WriteLine("Enter single character 'ü' and press ENTER"); 
  byte[] buf = new byte[2];
  using (Stream inStream = Console.OpenStandardInput())
  {
    inStream.Read(buf, 0, buf.Length);
  }
  Console.WriteLine("First 2 bytes read (should be 0xC3 0xBC for 'ü'):");
  foreach (byte b in buf) { 
    Console.WriteLine("0x" + b.ToString("X"));
  }
'@
) -replace '"', '\"') | Tee-Object -Variable output

$output[-2..-1] | Should -Be '0xC3', '0xBC'

Expected behavior

The test should pass.

That is, the raw bytes read from stdin should reflect the UTF-8 encoding assigned to [Console]::InputEncoding, which is the byte sequence 0xC3 0xBC for ü (LATIN SMALL LETTER U WITH DIAERESIS, U+00FC).
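As a cross-check of the expected byte values, the following minimal Python sketch (Python is used only for illustration; it is not part of the original repro) encodes ü as UTF-8 and, for contrast, in an OEM code page. CP437 is an assumption here; the actual OEM code page depends on the system locale. The helper name `hex_bytes` is hypothetical.

```python
# Cross-check of the byte values discussed above.
CHAR = "\u00fc"  # ü, LATIN SMALL LETTER U WITH DIAERESIS


def hex_bytes(text, encoding):
    """Encode `text` and format each byte like the repro's output (e.g. '0xC3')."""
    return [f"0x{b:X}" for b in text.encode(encoding)]


utf8_bytes = hex_bytes(CHAR, "utf-8")
oem_bytes = hex_bytes(CHAR, "cp437")  # assumption: CP437 as the OEM code page

print(utf8_bytes)  # ['0xC3', '0xBC']
print(oem_bytes)   # ['0x81']
```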

Actual behavior

The test fails:

Expected @('0xC3', '0xBC'), but got @('0x0', '0xD').

Curiously, a NUL byte was read instead of the typed character (0xD is just the CR char).

This shows that changing [Console]::InputEncoding had some effect, but not the desired one.

On macOS, changing [Console]::InputEncoding is effectively ignored.

Environment data

PowerShell Core 7.0.0-preview.5


Labels

Issue-Question, Resolution-No Activity
