ReSA Eval crashed #1725

@MengAiDev

Description

Describe the bug
ReSA

The problem arises when using:
eval_math_local.sh

When I run eval_math_local.sh, it crashes at startup with an import error (ModuleNotFoundError: No module named 'imp').
Console Output:

/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pyramid/path.py:2: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Traceback (most recent call last):
  File "/workspace/unilm/ReSA/llm/eval.py", line 11, in <module>
    from eval_math import evaluate as evaluate_math
  File "/workspace/unilm/ReSA/llm/eval_math.py", line 5, in <module>
    from arch.model import create_kv_cache
  File "/workspace/unilm/ReSA/llm/arch/model.py", line 8, in <module>
    from apex.normalization.fused_layer_norm import fused_rms_norm_affine
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/apex/__init__.py", line 3, in <module>
    from apex.i18n import MessageFactory
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/apex/i18n.py", line 1, in <module>
    from pyramid.i18n import TranslationStringFactory
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pyramid/i18n.py", line 20, in <module>
    from pyramid.threadlocal import get_current_registry
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pyramid/threadlocal.py", line 3, in <module>
    from pyramid.registry import global_registry
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pyramid/registry.py", line 12, in <module>
    from pyramid.path import CALLER_PACKAGE, caller_package
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pyramid/path.py", line 4, in <module>
    import imp
ModuleNotFoundError: No module named 'imp'
E0708 07:17:39.972000 23464 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 23546) of binary: /home/gitpod/.pyenv/versions/3.12.11/bin/python3
Traceback (most recent call last):
  File "/workspace/.pyenv_mirror/user/3.12.11/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/torch/distributed/run.py", line 892, in main
    run(args)
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/torch/distributed/run.py", line 883, in run
    elastic_launch(
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
eval.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-07-08_07:17:39
  host      : localhost
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 23546)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Pip List:

Package                Version     Editable project location
---------------------- ----------- ----------------------------------------
absl-py                2.3.1
accelerate             1.8.1
aiohappyeyeballs       2.6.1
aiohttp                3.12.13
aiosignal              1.4.0
antlr4-python3-runtime 4.11.1
anykeystore            0.2
apex                   0.9.10.dev0
chardet                5.2.0
click                  8.2.1
colorama               0.4.6
cryptacular            1.6.2
DataProperty           1.1.0
datasets               3.6.0
dill                   0.3.8
einops                 0.8.1
evaluate               0.4.4
frozenlist             1.7.0
fsspec                 2025.3.0
greenlet               3.2.3
hf-xet                 1.1.5
huggingface-hub        0.33.2
hupper                 1.12.1
joblib                 1.5.1
jsonlines              4.0.0
latex2sympy2           1.9.0       /tmp/Qwen2.5-Math/evaluation/latex2sympy
lm_eval                0.4.9
lxml                   6.0.0
mbstrdecoder           1.1.4
mpmath                 1.3.0
multidict              6.6.3
multiprocess           0.70.16
networkx               3.3
nltk                   3.9.1
numexpr                2.11.0
numpy                  2.3.1
oauthlib               3.3.1
pandas                 2.3.1
PasteDeploy            3.1.0
pathvalidate           3.3.1
pbkdf2                 1.3
Pebble                 5.1.1
peft                   0.16.0
pillow                 11.0.0
plaster                1.1.2
plaster-pastedeploy    1.0.1
portalocker            3.2.0
propcache              0.3.2
pyarrow                20.0.0
pybind11               2.13.6
pyramid                1.10.7
pyramid-mailer         0.15.1
pytablewriter          1.2.1
python3-openid         3.2.0
pytz                   2025.2
regex                  2024.11.6
repoze.sendmail        4.4.1
requests-oauthlib      2.0.0
rouge_score            0.1.2
sacrebleu              2.5.1
safetensors            0.5.3
scikit-learn           1.7.0
scipy                  1.16.0
setuptools             80.9.0
SQLAlchemy             2.0.41
sqlitedict             2.1.0
sympy                  1.13.3
tabledata              1.3.4
tabulate               0.9.0
tcolorpy               0.1.7
threadpoolctl          3.6.0
timeout-decorator      0.5.0
tokenizers             0.21.2
torch                  2.7.1+cpu
torchaudio             2.7.1+cpu
torchvision            0.22.1+cpu
tqdm                   4.67.1
tqdm-multiprocess      0.0.11
transaction            5.0
transformers           4.53.1
translationstring      1.4
typepy                 1.3.4
tzdata                 2025.2
velruse                1.1.1
venusian               3.1.1
WebOb                  1.8.9
word2number            1.1
WTForms                3.2.1
wtforms-recaptcha      0.3.2
xxhash                 3.5.0
yarl                   1.20.1
zope.deprecation       5.1
zope.interface         7.2
zope.sqlalchemy        3.1
zstandard              0.23.0
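
Possible cause (a guess from the output above, not confirmed): the traceback shows `apex/__init__.py` importing `pyramid`, and the list contains pyramid, velruse, cryptacular and similar packages. That suggests the `apex` installed here is the Pyramid toolkit published on PyPI under that name, not NVIDIA Apex, which is what provides `fused_rms_norm_affine`. The `imp` module that pyramid 1.10.7 imports was removed from the standard library in Python 3.12. Below is a minimal sketch for checking which `apex` is installed. The helper names are hypothetical, and it assumes the PyPI package declares pyramid as a dependency in its metadata.

```python
from importlib import metadata


def depends_on_pyramid(requirements):
    """Return True if any requirement string names the 'pyramid' package itself."""
    for req in requirements or []:
        # Requirement strings look like "pyramid>=1.5" or "foo; extra == 'x'".
        name = req.split(";")[0].strip()
        for sep in ("<", ">", "=", "!", "~", "[", "(", " "):
            name = name.split(sep)[0]
        if name.lower() == "pyramid":
            return True
    return False


def check_installed_apex():
    """Describe the installed 'apex' distribution (hypothetical helper)."""
    try:
        dist = metadata.distribution("apex")
    except metadata.PackageNotFoundError:
        return "apex is not installed"
    if depends_on_pyramid(dist.requires):
        return f"apex {dist.version} looks like the Pyramid toolkit, not NVIDIA Apex"
    return f"apex {dist.version} does not declare pyramid as a dependency"


if __name__ == "__main__":
    print(check_installed_apex())
```

If the check reports the Pyramid toolkit, uninstalling that package and installing NVIDIA Apex from its own repository would be the next thing to try. Note that its fused kernels may not be usable with the CPU-only PyTorch build listed above.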

Expected behavior
The script should run the math evaluation and print its results.

  • Platform:
  • Python version: 3.12
  • PyTorch version (GPU?): CPU only
  • OS: Linux mengaidev-unilm-vkk0hwuiiok 6.1.139-0601139-generic #202505202314 SMP PREEMPT_DYNAMIC Tue May 20 23:54:01 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
