Bug report
Resolution 1 (outdated): #139482 (comment).
Resolution 2 (merged): #139482 (comment).
Bug description:
Problem
Summary
The os.environ.clear() method currently has O(N²) time complexity. This is because it inherits the generic MutableMapping.clear() implementation, which repeatedly calls popitem(). Each popitem() call invokes __iter__(), and for _Environ, __iter__() creates a full snapshot of all environment keys every time. As a result, clearing large environments is extremely slow (e.g., 23 seconds to clear 100,000 variables).
Details
os.environ is an instance of _Environ, which subclasses MutableMapping. _Environ implements the required abstract methods (__getitem__, __setitem__, __delitem__, __iter__, __len__), but inherits generic implementations for methods like clear, pop, popitem, update, and setdefault.
The inherited clear() method is implemented as follows:
class MutableMapping(Mapping):
...
def popitem(self):
try:
key = next(iter(self))
except StopIteration:
raise KeyError from None
value = self[key]
del self[key]
return key, value
def clear(self):
'D.clear() -> None. Remove all items from D.'
try:
while True:
self.popitem()
except KeyError:
pass
...
For every item deleted, clear() calls iter(self) inside popitem().
This becomes a performance issue for _Environ because _Environ.__iter__() creates a snapshot of all keys:
class _Environ(MutableMapping):
...
def __iter__(self):
# list() from dict object is an atomic operation
keys = list(self._data)
for key in keys:
yield self.decodekey(key)
This leads to quadratic time complexity for os.environ.clear().
Performance Impact
This problem is especially evident on large environments.
On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables.
A more extreme example: 100K variables take 23s to clear.
Environments with thousands of variables are rare, but do exist.
In [3]: for i in range(100000):
...: os.environ[str(i)] = str(i)
...:
In [4]: %time os.environ.clear()
CPU times: user 23.3 s, sys: 94.3 ms, total: 23.4 s
Wall time: 23.4 s
Suggested solution
Implement a custom _Environ.clear() that avoids repeated snapshots (for every key):
def clear(self):
while self._data:
for key in list(self._data):
try:
del self[key]
except KeyError:
pass
This implementation makes use of the atomicity of list(dict()), which is important in multithreaded environments and is the reason we call list(self._data) inside __iter__(self).
Further improvement on Linux/FreeBSD could be achieved by using clearenv() which is part of the standard C library there.
Other considerations
More generally, we could consider challenging the existing design decisions:
- Should
__iter__ make a snapshot of all environment keys?
- Should
MutableMapping.clear() rely on popitem() in a loop, assuming __iter__ is cheap?
CPython versions tested on:
3.12, CPython main branch
Operating systems tested on:
macOS
Linked PRs
Bug report
Resolution 1 (outdated): #139482 (comment).
Resolution 2 (merged): #139482 (comment).
Bug description:
Problem
Summary
The
os.environ.clear()method currently has O(N²) time complexity. This is because it inherits the genericMutableMapping.clear()implementation, which repeatedly callspopitem(). Eachpopitem()call invokes__iter__(), and for_Environ,__iter__()creates a full snapshot of all environment keys every time. As a result, clearing large environments is extremely slow (e.g., 23 seconds to clear 100,000 variables).Details
os.environis an instance of_Environ, which subclassesMutableMapping._Environimplements the required abstract methods (__getitem__,__setitem__,__delitem__,__iter__,__len__), but inherits generic implementations for methods likeclear,pop,popitem,update, andsetdefault.The inherited
clear()method is implemented as follows:For every item deleted,
clear()callsiter(self)insidepopitem().This becomes a performance issue for
_Environbecause_Environ.__iter__()creates a snapshot of all keys:This leads to quadratic time complexity for
os.environ.clear().Performance Impact
This problem is especially evident on large environments.
On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables.
A more extreme example: 100K variables take 23s to clear.
Environments with thousands of variables are rare, but do exist.
Suggested solution
Implement a custom
_Environ.clear()that avoids repeated snapshots (for every key):This implementation makes use of the atomicity of
list(dict()), which is important in multithreaded environments and is the reason we calllist(self._data)inside__iter__(self).Further improvement on Linux/FreeBSD could be achieved by using
clearenv()which is part of the standard C library there.Other considerations
More generally, we could consider challenging the existing design decisions:
__iter__make a snapshot of all environment keys?MutableMapping.clear()rely onpopitem()in a loop, assuming__iter__is cheap?CPython versions tested on:
3.12, CPython main branch
Operating systems tested on:
macOS
Linked PRs
os.environ.clearby 2x #139483posix._clearenv()function #139965