Prerequisites
Bug Report
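Bug Type
scan_string API fails on Windows when the input contains non-ASCII characters (e.g. accented letters)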
Description
PyMarkdownApi().scan_string() raises a Configuration Error on Windows when the input string contains non-ASCII characters such as à, è, ò, etc.
The root cause is an encoding mismatch in file_scan_helper.py. The method __scan_from_stdin (line 154) writes the string to a temporary file using tempfile.NamedTemporaryFile("wt", delete=False) without specifying an encoding. On Windows, this defaults to the system locale encoding (typically cp1252). The file is then read back by FileSourceProvider.__init__ in general/source_providers.py (line 73) with explicit encoding="utf-8", causing a UnicodeDecodeError.
For example, the character à is encoded as the single byte 0xE0 in cp1252, but in UTF-8 the byte 0xE0 is the lead byte of a 3-byte sequence. The bytes that follow it in the file are not valid continuation bytes, so the read fails with invalid continuation byte.
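The byte-level mismatch can be shown without PyMarkdown at all. The snippet below is only an illustration using Python's built-in codecs; the helper name try_decode_utf8 is made up for this example and is not part of the library.

```python
# Illustration of the cp1252 / UTF-8 mismatch described above.
# try_decode_utf8 is a hypothetical helper, not a PyMarkdown function.

text = "à"
cp1252_bytes = text.encode("cp1252")  # what a cp1252 locale writes: b"\xe0"
utf8_bytes = text.encode("utf-8")     # what a UTF-8 reader expects: b"\xc3\xa0"


def try_decode_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, returning the error reason on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


# 0xE0 followed by a space (0x20): the space is not a continuation byte.
result = try_decode_utf8("à è".encode("cp1252"))
print(cp1252_bytes, utf8_bytes, result)
```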
Specifics
What operating system and version are you running into this behavior on?
Windows 11 Pro 10.0.26200
What version are you seeing this behavior in? (Run pip list or pipenv run pip list and look for the entry beside pymarkdownlnt.)
0.9.36
Are there any extra steps that need to be taken before executing the application?
None. The issue is triggered simply by running on a Windows system with a non-UTF-8 default locale (which is the default for most Windows installations).
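To check whether a given machine is affected, the default text encoding can be inspected with the standard library. This is a small diagnostic sketch, not something PyMarkdown provides; the variable names are chosen for this example only.

```python
import locale
import sys

# Encoding used for text-mode files when no encoding= argument is passed.
# On many Windows installations this reports a legacy code page such as cp1252.
default_text_encoding = locale.getpreferredencoding(False)

# True when Python's UTF-8 mode is enabled (for example via PYTHONUTF8=1),
# in which case the default text encoding is UTF-8 regardless of locale.
utf8_mode = bool(sys.flags.utf8_mode)

print(f"default text encoding: {default_text_encoding}")
print(f"UTF-8 mode enabled: {utf8_mode}")
```

If the first line prints something other than a UTF-8 variant, the mismatch described above should apply. Enabling Python's UTF-8 mode may avoid the error, but I have not verified that against PyMarkdown.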
What is the command line you invoke to get this behavior?
from pymarkdown.api import PyMarkdownApi
result = PyMarkdownApi().scan_string("Testo con caratteri accentati: à è ò ù")
Are you using a configuration file? Either on the command line or one of the implicit configuration files? If so, attach that file to this issue.
None. Default configuration, no custom config file.
What Markdown document causes this behavior to manifest? Attach that file to this issue.
Not applicable (using scan_string API). The equivalent content is:
Testo con caratteri accentati: à è ò ù
Actual Behavior
WARNING:pymarkdown.main:Configuration Error: 'utf-8' codec can't decode byte 0xe0 in position 45: invalid continuation byte
Expected Behavior
scan_string should correctly handle any valid Python string containing non-ASCII/Unicode characters. Since FileSourceProvider reads with encoding="utf-8", the temporary file should also be written with encoding="utf-8" to ensure consistency across all platforms.
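A minimal sketch of the suggested change, outside the PyMarkdown code base: the temporary file is written with an explicit encoding="utf-8" and read back the same way. The helper write_temp_utf8 is hypothetical and only mirrors the NamedTemporaryFile("wt", delete=False) call quoted above; the actual fix in file_scan_helper.py may look different.

```python
import os
import tempfile


def write_temp_utf8(text: str) -> str:
    """Hypothetical helper: write text to a temp file as UTF-8, return its path."""
    with tempfile.NamedTemporaryFile(
        "wt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        return handle.name


sample = "Testo con caratteri accentati: à è ò ù"
path = write_temp_utf8(sample)
try:
    # Read back with the same explicit encoding the reader side uses.
    with open(path, "rt", encoding="utf-8") as source:
        round_tripped = source.read()
finally:
    os.remove(path)  # delete=False means cleanup is manual

print(round_tripped == sample)
```

With the encoding stated on both the write and the read, the result no longer depends on the system locale.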