Understanding when and why agents scheme
Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad and David Lindner
21 Mar 2026 20:33 UTC
8
points
0
comments
4
min read
LW
link
China Derangement Syndrome
Arjun Panickssery
21 Mar 2026 19:19 UTC
23
points
1
comment
3
min read
LW
link
(arjunpanickssery.substack.com)
Grounding Coding Agents via Dixit
qbolec
21 Mar 2026 11:01 UTC
14
points
0
comments
10
min read
LW
link
The Future of Aligning Deep Learning systems will probably look like “training on interp”
williawa
20 Mar 2026 23:06 UTC
16
points
2
comments
4
min read
LW
link
Untrusted monitoring: extra bits
Morgan S
20 Mar 2026 21:32 UTC
7
points
0
comments
15
min read
LW
link
Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines
Francisco Ferreira da Silva and StefanHex
20 Mar 2026 21:09 UTC
34
points
1
comment
6
min read
LW
link
Confusion around the term reward hacking
ariana_azarbal
20 Mar 2026 16:13 UTC
38
points
5
comments
5
min read
LW
link
The Distaff Texts
Tomás B.
20 Mar 2026 15:05 UTC
60
points
3
comments
14
min read
LW
link
It’s a Good Thing to Respond to Internet Trolls
Bowl of Cereal
20 Mar 2026 14:22 UTC
−8
points
4
comments
2
min read
LW
link
Untrusted Monitoring is Default; Trusted Monitoring is not
J Bostock
20 Mar 2026 14:10 UTC
22
points
0
comments
4
min read
LW
link
Against Messianic AI: Why Optimizing the Environment Doesn’t Optimize the Agent
Nathan Heath
20 Mar 2026 12:40 UTC
1
point
0
comments
3
min read
LW
link
Hundred ways a superintelligence could kill you (non-serious exercise)
samuelshadrach
20 Mar 2026 8:58 UTC
3
points
1
comment
6
min read
LW
link
(samuelshadrach.com)
Internet anonymity without Tor
samuelshadrach
20 Mar 2026 8:52 UTC
1
point
0
comments
3
min read
LW
link
(samuelshadrach.com)
No, You Don’t Need Self-Locating Evidence.
Ape in the coat
20 Mar 2026 5:38 UTC
7
points
4
comments
5
min read
LW
link
(substack.com)
The Low Hanging Fruit of AI Self Improvement
HunterJay
20 Mar 2026 4:09 UTC
1
point
0
comments
5
min read
LW
link
Nullius in Verba: 3rd party evidence for Nectome’s Brain Preservation
Aurelia
20 Mar 2026 3:19 UTC
109
points
17
comments
12
min read
LW
link
Does Hebrew Have Verbs?
Benquo
20 Mar 2026 3:04 UTC
27
points
4
comments
6
min read
LW
link
Positive-sum interactions between players with linear utility in resources
Cleo Nardo
20 Mar 2026 0:42 UTC
10
points
0
comments
2
min read
LW
link
No, we haven’t uploaded a fly yet
Ariel Zeleznikow-Johnston
19 Mar 2026 23:43 UTC
167
points
7
comments
8
min read
LW
link
(open.substack.com)
The Case for Low-Competence ASI Failure Scenarios
Ihor Kendiukhov
19 Mar 2026 23:10 UTC
101
points
6
comments
6
min read
LW
link