Understanding when and why agents scheme

Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad and David Lindner

21 Mar 2026 20:33 UTC

8 points

0 comments · 4 min read · LW link

China Derangement Syndrome

Arjun Panickssery

21 Mar 2026 19:19 UTC

23 points

1 comment · 3 min read · LW link

(arjunpanickssery.substack.com)

Grounding Coding Agents via Dixit

qbolec

21 Mar 2026 11:01 UTC

14 points

0 comments · 10 min read · LW link

The Future of Aligning Deep Learning systems will probably look like “training on interp”

williawa

20 Mar 2026 23:06 UTC

16 points

2 comments · 4 min read · LW link

Untrusted monitoring: extra bits

Morgan S

20 Mar 2026 21:32 UTC

7 points

0 comments · 15 min read · LW link

Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines

Francisco Ferreira da Silva and StefanHex

20 Mar 2026 21:09 UTC

34 points

1 comment · 6 min read · LW link

Confusion around the term reward hacking

ariana_azarbal

20 Mar 2026 16:13 UTC

38 points

5 comments · 5 min read · LW link

The Distaff Texts

Tomás B.

20 Mar 2026 15:05 UTC

60 points

3 comments · 14 min read · LW link

It’s a Good Thing to Respond to Internet Trolls

Bowl of Cereal

20 Mar 2026 14:22 UTC

−8 points

4 comments · 2 min read · LW link

Untrusted Monitoring is Default; Trusted Monitoring is not

J Bostock

20 Mar 2026 14:10 UTC

22 points

0 comments · 4 min read · LW link

Against Messianic AI: Why Optimizing the Environment Doesn’t Optimize the Agent

Nathan Heath

20 Mar 2026 12:40 UTC

1 point

0 comments · 3 min read · LW link

Hundred ways a superintelligence could kill you (non-serious exercise)

samuelshadrach

20 Mar 2026 8:58 UTC

3 points

1 comment · 6 min read · LW link

(samuelshadrach.com)

Internet anonymity without Tor

samuelshadrach

20 Mar 2026 8:52 UTC

1 point

0 comments · 3 min read · LW link

(samuelshadrach.com)

No, You Don’t Need Self-Locating Evidence.

Ape in the coat

20 Mar 2026 5:38 UTC

7 points

4 comments · 5 min read · LW link

(substack.com)

The Low Hanging Fruit of AI Self Improvement

HunterJay

20 Mar 2026 4:09 UTC

1 point

0 comments · 5 min read · LW link

Nullius in Verba: 3rd party evidence for Nectome’s Brain Preservation

Aurelia

20 Mar 2026 3:19 UTC

109 points

17 comments · 12 min read · LW link

Does Hebrew Have Verbs?

Benquo

20 Mar 2026 3:04 UTC

27 points

4 comments · 6 min read · LW link

Positive-sum interactions between players with linear utility in resources

Cleo Nardo

20 Mar 2026 0:42 UTC

10 points

0 comments · 2 min read · LW link

No, we haven’t uploaded a fly yet

Ariel Zeleznikow-Johnston

19 Mar 2026 23:43 UTC

167 points

7 comments · 8 min read · LW link

(open.substack.com)

The Case for Low-Competence ASI Failure Scenarios

Ihor Kendiukhov

19 Mar 2026 23:10 UTC

101 points

6 comments · 6 min read · LW link