Roland's homepage

My random knot in the Web

Python & standard output redirection on ms-windows

Redirecting the standard output of a Python script on ms-windows can cause strange crashes because of encoding differences.

The problem

Recently, I saw the following traceback when running the lamprop console application on ms-windows.

U:\>lamprop.py foo.lam >foo.txt
Traceback (most recent call last):
  File "C:\_LocalData\Python3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\_LocalData\Python3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "__main__.py", line 97, in <module>
  File "__main__.py", line 93, in main
  File "C:\_LocalData\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03bd' in position 0: character maps to <undefined>

(The character \u03bd is the Greek letter ν.)

Since Python 3.6, utf-8 is used as the encoding for sys.stdout on ms-windows, as long as the output is not redirected. From the documentation:

On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding.

Suppose you have the following program:

import sys
print(sys.stdout.encoding)

Normally, when you call this program from cmd.exe, it will report utf-8. But if you redirect its output to a file, that file will contain cp1252, which is the system locale encoding in this case! So if you try to print a character that cannot be encoded in that code page, the error shown above occurs.
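The failure can be reproduced without any redirection by asking the codecs directly. The following is a minimal sketch; the helper name can_encode is made up for this illustration and is not part of lamprop.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # The codec has no mapping for at least one character.
        return False
    return True


nu = "\u03bd"  # the Greek letter ν from the traceback
print("cp1252:", can_encode(nu, "cp1252"))
print("utf-8: ", can_encode(nu, "utf-8"))
```

The cp1252 code page has no slot for ν, so the first check fails, while utf-8 can encode any Unicode character.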

The solution

In the console version of lamprop, I added the following code to the program initialization.

if os.name == "nt":
    sys.stdout.reconfigure(encoding="utf-8")

This forces the output to utf-8 even when it is redirected.
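To see the effect of reconfigure without touching the real sys.stdout, the redirected stream can be imitated with an io.TextIOWrapper around an in-memory buffer. This is only a sketch under that assumption; the function write_nu is a hypothetical helper for this illustration. Note that reconfigure is available on text streams from Python 3.7 onward.

```python
import io


def write_nu(encoding, reconfigure=False):
    """Write the letter ν to an in-memory text stream, return the raw bytes.

    The stream stands in for a redirected sys.stdout that was opened
    with the given encoding.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    if reconfigure:
        # Same call as used on sys.stdout in the fix above.
        stream.reconfigure(encoding="utf-8")
    stream.write("\u03bd")
    stream.flush()
    return raw.getvalue()


try:
    write_nu("cp1252")
except UnicodeEncodeError as err:
    print("without reconfigure:", err)

print("with reconfigure:", write_nu("cp1252", reconfigure=True))
```

Without the reconfigure call, the write fails in the same way as the traceback above; with it, the two utf-8 bytes for ν end up in the buffer.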


For comments, please send me an e-mail.

