This repository contains the codebase for the paper Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners.
We study latent reasoning in large reasoning models (LRMs) across 11 languages, using truncation-based probing, logit lens analysis, and representation-level similarity analysis to understand how predictions emerge internally before answers are explicitly produced in the reasoning traces.
The codebase supports four core components of the paper:
- Reasoning trace generation across languages
- Truncation-based latent reasoning evaluation
- Internal dynamics analysis, including logit lens and hidden-state similarity
- Memorization vs. latent reasoning tests via counterfactual edits
The overall workflow mirrors the experimental pipeline in the paper.
## Repository Structure

```
.
├── README.md
├── analysis
│   ├── calculate_trunc_acc.py
│   ├── compute_cosine_sim.py
│   ├── compute_all_pairs_cosine_similarity.py
│   ├── create_counterfactuals.py
│   ├── run_memorization_counterfactual_vllm.py
│   ├── run_counterfactual_solvability_gemini.py
│   ├── memorization_test_helper.py
│   └── helper.py
├── instructions.json
├── hack_prefix.json
├── helper.py
├── run.py
├── run.sh
├── run_truncation.py
├── run_truncations.sh
├── run_logitlens_dynamics.py
├── run_logitlens_dynamics.sh
└── run_save_hidden_states.py
```
## Reasoning Trace Generation

- `run.sh`: Generates full chain-of-thought reasoning traces for each language.
- `run.py`: Main driver script for multilingual reasoning generation and evaluation.
## Truncation-Based Latent Reasoning Evaluation

These scripts probe when correct predictions emerge as reasoning traces are progressively truncated. This corresponds to the latent prediction formation analysis in the paper.

- `run_truncation.py`: Evaluates model accuracy given partial reasoning traces.
- `run_truncations.sh`: Runs truncation experiments across languages, models, and truncation ratios.
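The truncation protocol can be sketched as follows. This is a toy illustration, not the repo's code: the helper names (`truncate_trace`, `truncation_accuracy`) and the `predict` callback are invented stand-ins for what `run_truncation.py` does with the actual model.

```python
def truncate_trace(trace: str, ratio: float) -> str:
    """Keep the first `ratio` fraction of the trace, by whitespace tokens."""
    tokens = trace.split()
    keep = int(len(tokens) * ratio)
    return " ".join(tokens[:keep])

def truncation_accuracy(traces, answers, ratio, predict):
    """Accuracy when a model answers from only a truncated trace prefix.

    `predict` stands in for a model call mapping a partial trace to an
    answer string; the real pipeline queries the LRM instead.
    """
    correct = sum(
        predict(truncate_trace(trace, ratio)) == gold
        for trace, gold in zip(traces, answers)
    )
    return correct / len(traces)

# Toy "model" that answers correctly once it has seen the second step.
traces = ["step1 step2 step3 therefore 4"] * 2
answers = ["4"] * 2
toy_predict = lambda prefix: "4" if "step2" in prefix else "?"
print(truncation_accuracy(traces, answers, 0.5, toy_predict))   # → 1.0
print(truncation_accuracy(traces, answers, 0.25, toy_predict))  # → 0.0
```

Sweeping `ratio` over a grid then shows at which prefix length the correct answer first becomes recoverable.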
## Internal Prediction Dynamics (Logit Lens)

- `run_logitlens_dynamics.py`, `run_logitlens_dynamics.sh`: Track how the model's predicted answer evolves across layers and reasoning steps.
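As a rough sketch of the logit-lens idea in plain numpy (the arrays below are random stand-ins: in the actual analysis the per-layer hidden states come from the LRM and `W_U` is its unembedding matrix):

```python
import numpy as np

# Random stand-ins for real model quantities.
rng = np.random.default_rng(0)
n_layers, d_model, vocab = 4, 8, 10
hidden_per_layer = rng.normal(size=(n_layers, d_model))  # last-token states
W_U = rng.normal(size=(d_model, vocab))                  # unembedding matrix

for layer, h in enumerate(hidden_per_layer):
    logits = h @ W_U                     # decode the layer directly
    top_token = int(np.argmax(logits))   # that layer's "current prediction"
    print(f"layer {layer}: argmax token id = {top_token}")
```

Tracking the argmax token per layer (and per reasoning step) reveals when the final answer first dominates the intermediate representations.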
## Hidden-State Representation Analysis

Used for cross-lingual similarity and alignment analysis.

- `run_save_hidden_states.py`: Saves the hidden state of the last token at each reasoning step for a fixed set of truncation ratios.
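A toy sketch of the kind of storage this implies. Everything here is invented for illustration: `last_token_state` fakes the forward pass that the real script performs with the model, and the `.npz` layout is an assumption, not the repo's actual format.

```python
import numpy as np

def last_token_state(prefix_tokens, d_model=8):
    """Fake forward pass: the real script runs the LRM and reads the
    last token's hidden state; here we return a deterministic vector."""
    rng = np.random.default_rng(len(prefix_tokens))
    return rng.normal(size=d_model)

def save_states(trace, ratios, path="hidden_states.npz"):
    """Store one last-token hidden state per truncation ratio."""
    tokens = trace.split()
    states = {
        f"ratio_{r}": last_token_state(tokens[: max(1, int(len(tokens) * r))])
        for r in ratios
    }
    np.savez(path, **states)
    return states

states = save_states("step1 step2 step3 answer", ratios=(0.25, 0.5, 1.0))
print(sorted(states))  # one entry per truncation ratio
```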
## Analysis Scripts (`analysis/`)

This directory contains scripts for post-hoc analysis and robustness checks:

- `calculate_trunc_acc.py`: Computes truncation-based accuracy metrics.
- `compute_cosine_sim.py`, `compute_all_pairs_cosine_similarity.py`: Representation similarity analysis across languages and layers.
- `create_counterfactuals.py`: Generates controlled counterfactual edits for memorization vs. reasoning tests.
- `run_memorization_counterfactual_vllm.py`: Generates LRM responses on the edited counterfactuals.
- `run_counterfactual_solvability_gemini.py`: Counterfactual solvability and memorization diagnostics.
- `memorization_test_helper.py`, `helper.py`: Shared analysis utilities.
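To illustrate the kind of similarity computed here, a minimal sketch: the arrays below are random stand-ins for per-layer last-token states of one question posed in two languages, and `cosine_sim` is an invented helper, not the repo's implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fake per-layer states (n_layers, d_model) for one question in two
# languages; the second is a small perturbation of the first, so the
# layers should come out nearly aligned.
rng = np.random.default_rng(0)
states_en = rng.normal(size=(4, 8))
states_de = states_en + 0.1 * rng.normal(size=(4, 8))

per_layer = [cosine_sim(e, d) for e, d in zip(states_en, states_de)]
print([round(s, 3) for s in per_layer])  # high similarity at every layer
```

Comparing such per-layer curves across language pairs shows where in the network representations converge or diverge.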
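And a toy illustration of the counterfactual-edit idea behind the memorization test: swap the numbers in a problem so a memorized surface-form answer no longer fits while the required reasoning stays the same. The regex-based `edit_numbers` is invented here; `create_counterfactuals.py` will differ.

```python
import re

def edit_numbers(problem: str, mapping: dict) -> str:
    """Replace each number with its counterfactual counterpart."""
    return re.sub(r"\d+", lambda m: mapping.get(m.group(), m.group()), problem)

original = "Tom has 3 apples and buys 5 more. How many apples does he have?"
print(edit_numbers(original, {"3": "4", "5": "7"}))
# → Tom has 4 apples and buys 7 more. How many apples does he have?
```

A model that still answers the original value on the edited problem is relying on memorization rather than latent reasoning.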
## Quick Start

```bash
# 1. Generate multilingual reasoning traces
bash run.sh

# 2. Run truncation-based latent reasoning evaluation
bash run_truncations.sh

# 3. Analyze internal prediction dynamics (logit lens)
bash run_logitlens_dynamics.sh

# 4. Save hidden states for representation similarity analysis
python run_save_hidden_states.py
```

## Citation

If you use this code, please cite:
```bibtex
@misc{liu2026largereasoning,
      title={Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners},
      author={Yihong Liu and Raoyuan Zhao and Hinrich Schütze and Michael A. Hedderich},
      year={2026},
      eprint={2601.02996},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.02996},
}
```

We use the prompt templates and the prompt-hacking prefixes from the following papers; please also consider citing:
```bibtex
@misc{zhao2025comprehensive,
      title={A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages},
      author={Raoyuan Zhao and Yihong Liu and Hinrich Schütze and Michael A. Hedderich},
      year={2025},
      eprint={2510.09555},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.09555},
}

@inproceedings{qi-etal-2025-models,
    title = "When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy",
    author = "Qi, Jirui and
      Chen, Shan and
      Xiong, Zidi and
      Fern{\'a}ndez, Raquel and
      Bitterman, Danielle and
      Bisazza, Arianna",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.1103/",
    doi = "10.18653/v1/2025.findings-emnlp.1103",
    pages = "20279--20296"
}
```