Understanding when and why agents scheme

21 Mar 2026 20:33 UTC
8 points
0 comments · 4 min read · LW link

China Derangement Syndrome

Arjun Panickssery · 21 Mar 2026 19:19 UTC
23 points
1 comment · 3 min read · LW link
(arjunpanickssery.substack.com)

Grounding Coding Agents via Dixit

qbolec · 21 Mar 2026 11:01 UTC
14 points
0 comments · 10 min read · LW link

The Future of Aligning Deep Learning systems will probably look like “training on interp”

williawa · 20 Mar 2026 23:06 UTC
16 points
2 comments · 4 min read · LW link

Untrusted monitoring: extra bits

Morgan S · 20 Mar 2026 21:32 UTC
7 points
0 comments · 15 min read · LW link

Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines

20 Mar 2026 21:09 UTC
34 points
1 comment · 6 min read · LW link

Confusion around the term reward hacking

ariana_azarbal · 20 Mar 2026 16:13 UTC
38 points
5 comments · 5 min read · LW link

The Distaff Texts

Tomás B. · 20 Mar 2026 15:05 UTC
60 points
3 comments · 14 min read · LW link

It’s a Good Thing to Respond to Internet Trolls

Bowl of Cereal · 20 Mar 2026 14:22 UTC
−8 points
4 comments · 2 min read · LW link

Untrusted Monitoring is Default; Trusted Monitoring is not

J Bostock · 20 Mar 2026 14:10 UTC
22 points
0 comments · 4 min read · LW link

Against Messianic AI: Why Optimizing the Environment Doesn’t Optimize the Agent

Nathan Heath · 20 Mar 2026 12:40 UTC
1 point
0 comments · 3 min read · LW link

Hundred ways a superintelligence could kill you (non-serious exercise)

samuelshadrach · 20 Mar 2026 8:58 UTC
3 points
1 comment · 6 min read · LW link
(samuelshadrach.com)

Internet anonymity without Tor

samuelshadrach · 20 Mar 2026 8:52 UTC
1 point
0 comments · 3 min read · LW link
(samuelshadrach.com)

No, You Don’t Need Self-Locating Evidence.

Ape in the coat · 20 Mar 2026 5:38 UTC
7 points
4 comments · 5 min read · LW link
(substack.com)

The Low Hanging Fruit of AI Self Improvement

HunterJay · 20 Mar 2026 4:09 UTC
1 point
0 comments · 5 min read · LW link

Nullius in Verba: 3rd party evidence for Nectome’s Brain Preservation

Aurelia · 20 Mar 2026 3:19 UTC
109 points
17 comments · 12 min read · LW link

Does Hebrew Have Verbs?

Benquo · 20 Mar 2026 3:04 UTC
27 points
4 comments · 6 min read · LW link

Positive-sum interactions between players with linear utility in resources

Cleo Nardo · 20 Mar 2026 0:42 UTC
10 points
0 comments · 2 min read · LW link

No, we haven’t uploaded a fly yet

Ariel Zeleznikow-Johnston · 19 Mar 2026 23:43 UTC
167 points
7 comments · 8 min read · LW link
(open.substack.com)

The Case for Low-Competence ASI Failure Scenarios

Ihor Kendiukhov · 19 Mar 2026 23:10 UTC
101 points
6 comments · 6 min read · LW link