
2026 03 19 alo5 #181

Open
MingxuanLiu147 wants to merge 8 commits into locuslab:main from MingxuanLiu147:2026-03-19-alo5

Conversation

@MingxuanLiu147

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 483337d0d5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/trainer/edit/mend.py
Comment on lines +277 to +280
# Reshape the delta and apply it
weight = proj.weight
delta_reshaped = delta[: weight.numel()].reshape(weight.shape)
weight.add_(delta_reshaped * self.edit_lr)

P1 Badge Emit a full-size weight update before reshaping delta

_init_edit_network() produces a vector of length hidden_size * 2, but here that vector is reshaped into proj.weight.shape, which is much larger for every real transformer MLP projection (for example, Llama/Qwen down_proj). In those cases delta[:weight.numel()].reshape(weight.shape) throws at runtime, so MENDEditor.edit() cannot complete on actual checkpoints and will immediately flip success to false on the first edit.

Useful? React with 👍 / 👎.
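One way to reconcile the size mismatch flagged above is to keep the edit network's compact output and expand it into a full-size update via a rank-1 outer product, sized to the actual projection shape. The sketch below is a hypothetical fix, not the PR's code: it assumes the editor emits one vector per weight dimension (`out_features + in_features` values), so the outer product always matches `weight.shape` even for rectangular MLP projections like `down_proj`.

```python
import torch

def apply_rank_one_edit(weight: torch.Tensor, delta: torch.Tensor, edit_lr: float) -> None:
    """Apply a rank-1 update u v^T built from a compact delta vector.

    Hypothetical sketch: assumes `delta` has out_features + in_features
    entries, one factor per weight dimension, instead of a full
    weight-sized tensor.
    """
    out_f, in_f = weight.shape
    assert delta.numel() == out_f + in_f, "delta must cover both dimensions"
    u, v = delta[:out_f], delta[out_f:]
    # torch.outer(u, v) has shape (out_f, in_f) == weight.shape,
    # so this never throws, regardless of how rectangular the projection is.
    weight.add_(torch.outer(u, v), alpha=edit_lr)

# Example: a down_proj-like rectangular weight (hidden=4, intermediate=11)
w = torch.zeros(4, 11)
apply_rank_one_edit(w, torch.ones(4 + 11), edit_lr=0.5)
```

A rank-1 update is deliberately cheap; if more expressive edits are needed, the same pattern extends to rank-r factors without ever materializing a `weight.numel()`-sized delta vector.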

Comment on lines +17 to +23
# LoRA parameters
method_args:
  r: 8
  lora_alpha: 16
  target_modules:
    - q_proj
    - v_proj

P2 Badge Nest inject experiment method_args under trainer

This override block sits at the config root, but load_trainer() only reads cfg.trainer.method_args (src/trainer/__init__.py:90-92). As a result, running experiment=inject/alpaca/default silently ignores the advertised LoRA overrides here and falls back to the defaults from configs/trainer/inject/LoRA.yaml; the same pattern also affects the sibling adalora and dora experiment templates.

Useful? React with 👍 / 👎.
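Following the reviewer's point that `load_trainer()` only reads `cfg.trainer.method_args`, nesting the override under a `trainer` key would make Hydra actually apply it. A hedged sketch of the corrected experiment template (exact package directive and key layout depend on the repo's Hydra schema):

```yaml
# Hypothetical corrected layout for the inject experiment override.
# @package _global_
trainer:
  # LoRA parameters, now under trainer.method_args where load_trainer() looks
  method_args:
    r: 8
    lora_alpha: 16
    target_modules:
      - q_proj
      - v_proj
```

The same relocation would fix the sibling adalora and dora templates the comment mentions, since they share the root-level pattern.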

Comment thread configs/eval/edit.yaml
Comment on lines +3 to +7
defaults:
  - edit_metrics/reliability
  - edit_metrics/generalization
  - edit_metrics/locality
  - edit_metrics/portability

P1 Badge Provide runtime inputs in the default edit eval config

This suite only pulls in the metric nodes, but src/eval.py populates edit_data and original_model only when the config defines edit_data(_path) and original_model_path. With the current eval=edit defaults, the evaluators run against an empty edit set, so reliability/generalization/portability all collapse to 0 while locality returns the optimistic fallback 1.0 from src/evals/edit.py:439-443; python src/eval.py eval=edit ... therefore writes misleading scores unless the user manually adds extra overrides.

Useful? React with 👍 / 👎.
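Per the comment, `src/eval.py` only populates the edit set and original model when the config defines them, so declaring those keys in the default suite (and marking them as required) would surface the problem at launch instead of silently scoring an empty edit set. A hypothetical sketch, assuming the key names `edit_data_path` and `original_model_path` from the review:

```yaml
# configs/eval/edit.yaml (hypothetical sketch)
defaults:
  - edit_metrics/reliability
  - edit_metrics/generalization
  - edit_metrics/locality
  - edit_metrics/portability

# Runtime inputs the edit evaluators need. Hydra's MISSING marker (???)
# makes the run fail fast unless the user supplies an override, e.g.
#   python src/eval.py eval=edit edit_data_path=... original_model_path=...
edit_data_path: ???
original_model_path: ???
```

With `???` in place, a bare `python src/eval.py eval=edit ...` errors out immediately rather than writing the misleading 0 / 1.0 scores described above.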
