Fun Theory is the study of questions such as "How much fun is there in the universe?",
"Will we ever run out of fun?", "Are we having fun yet?" and "Could we be having
more fun?". It's relevant to designing utopias and AIs, among other things.

Fun Theory

Customize

Quick Takes

ryan_greenblatt6h*7023

Vivek S, TsviBT, and 1 more

I think it's both true that LessWrong (LW) has a bunch of issues and that it would be better if much more discourse happened there rather than on X/Twitter. Some claims that all seem true to me despite being in tension: * Many people thinking about AGI/ASI/related post on X but not on LW * This is partially due to LW being scary/aversive * X is more insane and adversarial than LW * The quality of discourse on LW is way better than on X * The LW scene has specific unreasonable biases, is somewhat tribal, and is somewhat of an echo chamber, but on net LW seems way more reasonable than all of the AI parts of X * It would be good if way more of the important discussion about AI happened on LW or at least happened in a forum less messed up than X (or X got less messed up) * Epistemics of AI company employees seem pretty important and are quite bad, with some of the more important influences being internal chatter (with strong biases, influence from leadership, and echo chambers) or X for some of the companies * Public discourse/argumentation seems like it could improve epistemics, especially of AI company employees * LW is often hostile and aversive to AI company employees and some of this seems unhealthy/bad (as in, indicates insufficient decoupling/soldier-mindset/motivated-reasoning and corresponds to people piling on in unhelpful ways). But a bunch of this is a natural consequence of reasonable views of people on LW and the behavior of AI companies and employees at AI companies * E.g., it would be better if people on LW applied something more like typical researcher norms to research outputs. [1] (E.g., for many types of issues, it's good to first email the author with the issue and let them comment or say they are going to fix it before posting publicly. [2] ) And it would be good if people on LW tried to avoid their criticism being unnecessarily rude (though not necessarily less hostile). I also wish LW had more (measured, thoughtful) praise of good

ryan_greenblatt7h176

Adam Scholl, Spective

People seem to use the terms 'slop' or 'AI sloppiness' to refer to a wide variety of different issues with pretty different causes, consequences, and solutions. It seems good to typically be more precise, at least in long-form discussion. These all differ: * AIs have poor capabilties at conceptual / very hard-to-check tasks * AIs aren't trying on conceptual / very hard-to-check tasks but could do better * AIs "intentionally" produce outputs that are low-quality but appear good * AIs try to make their (low-quality) outputs look good * Many people specifically like AI outputs with a low-quality vibe so the internet fills up with this * AIs make it cheap to produce (mediocre) output so it will floods many places

Charbel-Raphaël6h110

StanislavKrym

4 years of AI safety: what I got wrong I've spent the last 4 years working on AI safety. On paper, it's gone well. Here's what actually happened. 1. I became what I wanted to prevent At some point, I looked up and realized I had almost become a paper-clipper optimizing for one objective. Working 80-hour weeks. No real breaks. Telling myself the stakes justify it. Sacrificing jazz improvisation on the piano for one more strategic doc, and realizing one day that fingers had forgotten how to play. Yes, the compounding effect of going faster is real - but I think there is a difference between going faster and going further. The first reason is that preserving slack is vital in the long run, as Richard Hamming says: "I notice that if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don't quite know what problems are worth working on; all the hard work you do is sort of tangential in importance." The second reason is more personal. One of my friends at the time advised me to slow down. In the beginning I considered him quite lazy. But in fact he was right about something I couldn't see at the time: I forgot why I cared in the first place. My father is an activist. He fights for causes that don't resonate with me. There's a growing gap between us. But every few weeks, I call him, and I stay on the line even when the conversation goes nowhere. If I can't even preserve a connection with my own father, what business do I have claiming I'm working to save humanity? I'd love to say I've completely fixed this, but unlearning is not an open problem just in AI. 2. I didn't think much about the actual risk I was giving a talk at a workshop in Paris. Risk models in the first half, interpretability research in the second. Someone raised their hand and asked: "I don't understand, what's the point of doing this?" I froze. I didn't have a real answer besides "interpreta

davekasten10h150

One thing I find myself doing increasingly in the AI policy community is connecting people to offices in DC. This is good, but I worry that it leads us to have path-dependency on who people like me know and the logic that we have about where to start, rather than prospecting for new offices that we don't know as well. It's plausible that everyone doing this should run rand(535) against a list of all the Senators and Representatives a few times a month and go meet a truly random office and see what happens.

Vladimir_Nesov2d703

Since GPT-5.5 turns out to be the same thing as Spud, there likely isn't currently a Mythos-class model at OpenAI. The price for GPT-5.5 ($5/$30 per 1M input/output tokens) is about the same as the price for Opus 4.7 ($5/$25), so they are likely in the same weight class. Given the jagged parity in performance between the smaller GPT-5.4 and the bigger Opus 4.6 (with different reasoning token budgets), the current OpenAI recipe might make GPT-5.5 or its upcoming successors stronger than Mythos in some ways, but as a pretrain Mythos probably holds more potential by the end of the year (and might be stronger for now, if GPT-5.5 didn't yet go all out on RLVR scaling). At the same time, OpenAI might still train its own Mythos-scale model by the end of the year, so that it's ready to go once the GB300 NVL72 buildout can support it at a more reasonable price (such as $10/$50), and they can release it as a replacement for the GPT-5.5 series then (which might be at GPT-5.7 or something at that point), the way GPT-5.5 is replacing GPT-5.4 despite the doubling of API price. The introduction of the mini and nano models for GPT-5.4 illustrates how OpenAI probably intends to frame the introduction of ever more expensive models, with users who need the models to remain cheap having the option to switch to a smaller model variant after a version change, while a model with a given size branding (normal/mini/nano) can get more expensive at some version changes.

J Bostock13h*132

Marcus Williams

Scheming Defences Aren't Just for Schemers Both Anthropic and OpenAI have put in place (or are thinking about putting in place) some kind of monitoring for internal coding agents. Neither company has set up their monitoring to be scheming-proof. When I've spoken to people working on this, they (correctly, IMO) argue that current models aren't likely to be scheming. (In particular, OpenAI appears to be thinking about EDIT: using [poor word choice on my part, they are thinking about better methods] untrusted-only monitoring without much collusion busting, where one instance of a model audits the outputs of another instance of itself.) But, there are other weird things a model might do that don't qualify as "scheming" which you might still want to defend against! For example, reward hacking/fudging critical safety infrastructure is extremely bad, yet well within the remit of bad behaviour we might expect. Would this cause a model to get caught? I expect so, but who knows! One vector for monitoring failure might be a kind of persuasion; if a model wants to reward hack, it might justify to itself---using some kind of motivated reasoning---why the hack is actually the correct action to take. Another copy of that model, reading the chain-of-thought, might then agree with the first one! (this is extremely common in motivated reasoning amongst humans as well) This would kinda fall under the broader umbrella of collusion in untrusted monitoring, and would definitely be mitigated by certain signal jamming techniques (though not all). The above example is illustrative of the more general point that you can't get safety by ruling out a single worst-case scenario (scheming) then only mitigating for one other specific scenario (whatever they are planning for here, I think mostly model mistakes and low-stakes deception and occasional destruction) because there are other scenarios you haven't thought of!

ryan_greenblatt3d10377

It's good that Anthropic's system cards contain a lot of useful information on misalignment and risk. It's also good they are putting out detailed sabotage risk reports that articulate their views. [1] I appreciate the hard work of many employees going into these reports. I wish other companies released similarly informative reports. [2] 1. I'm not necessarily claiming I agree with their risk reports. For instance, I have at least some moderately important disagreements with the Mythos Preview sabotage risk report update. But having lots of detail, including sufficient detail that I can get a decent sense of where I disagree with the analysis, is praiseworthy. ↩︎ 2. I'm not claiming that this is the most leveraged thing for safety-motivated employees at other companies to work on, but I do think it would be good for AI companies to do better (without trading off against other safety efforts). ↩︎

Your Feed

ryan_greenblatt6h*7023

Vivek S, TsviBT, and 1 more

ryan_greenblatt7h176

Adam Scholl, Spective

Charbel-Raphaël6h110

StanislavKrym

davekasten10h150

Vladimir_Nesov2d703

J Bostock13h*132

Marcus Williams

ryan_greenblatt3d10377