Andrew Critch (https://acritch.com), PhD, UC Berkeley

Who to contact about x-risk
https://acritch.com/who-to-contact-about-x-risk/ (Tue, 02 Mar 2021)

I feel bad that a lot of people feel like they have no one to call about their concerns about existential risk, or adjacent topics that seem very important to them because of relevance to existential risk. I feel especially bad about this when folks contact me about such topics and I don’t have time to give a good response. This post is meant to encourage behavior that can gradually shift the world in a more positive direction for addressing people’s worries about x-risk.

I love my local fire department. There were serious wildfires in my area over the past year, and at times I felt worried about them. A few times, I called my local fire department for updates on the situation. They answered the phone, kindly responded to my questions about the fires, and told me how to sign up for more frequent updates. They tested their update system, and I received the tests. Conditional on the wildfires, I was quite happy about their responsiveness to my need for more information. Almost no one died of wildfires in California last year.

Who can you call if you’re worried about existential risk (x-risk), or global catastrophic risks more broadly? A lot of people have contacted me about this topic because it is my area of professional focus, particularly as it pertains to artificial intelligence. I feel bad that I can’t be as responsive to them as my local fire department has been to me. If people are worried about x-risk, there should be someone they can call to get more information about it. My local fire department is the de facto best source of information about local fires, and is recognized as such, but currently there is no globally recognized best source of information about x-risk (although some institutions are doing quite well in this regard, in my opinion; see below). So who should you be contacting if you’re worried about x-risk? Here’s what I suggest:

1) Therapists, for advice on managing your priorities or feelings. If you’re having trouble concentrating on other important things in your life (sleep, food, family, friends, work) because you can’t stop thinking about x-risk, please see a therapist. Seeing a therapist is also a good idea if you’re just worried and wish you could manage the anxiety better. You might feel that therapists don’t know anything about x-risk, but if you see 3-5 different therapists and pick the one you like best, you will probably find one that can help you manage your anxiety and focus your thinking about x-risk in ways that are non-destructive to your health and lifestyle. If you feel your therapist doesn’t understand you, tell them it bothers you that they don’t seem to understand you and you want to spend more of your time with them resolving that. If you feel your therapist doesn’t understand existential risk because it’s too abstract or intellectual, look for a therapist with a PhD, who is therefore more likely to be open to academic conversations.

It’s important to learn to manage one’s fears, anxieties, and frustrations around this topic before attempting to engage with experts on it; otherwise the conversation will probably be unproductive.

2) Academics, for expert information. If you need information directly from experts, you can try contacting personnel at research institutions who think about existential risk, such as:

  • The Center for the Study of Existential Risk (Cambridge)
  • The Future of Humanity Institute (Oxford)
  • The Stanford Existential Risk Initiative (Stanford)

However, these folks are extremely busy, and probably won’t have time to respond to most people’s questions. In that case, I suggest contacting your local university instead. Even if you don’t get a good answer, you create evidence that people would like to be able to contact their local university with questions about x-risk, which over time can help create jobs for people who want to think and communicate about x-risk professionally. If that fails, you can also try:

3) Government representatives, for basic information, or as expert proxies. If you’re having a hard time reaching experts, or just want basic information about how governments and companies manage x-risk, I suggest you contact local representatives of your municipal or state government. They will not know as much about the topic as you’d like, so you should ask them to gather more information and get back to you about it. They might have better luck getting a conversation with experts than you will, and after they do that, they might be slightly better at answering future questions about the topic. In other words, you’ll have helped the government focus a bit more of its attention on x-risk, and thereby helped them to become slightly more informed about the topic.

Much of the government is responsive rather than proactive in nature, such that it can mostly only pay attention to topics if people pressure or expect it to. If we never ask, our governments will never learn to answer.

A note on contacting the government: Some folks I know have expressed that it would be bad for governments to get too interested in existential risk, because then the issue will become politicized in a way that damages discourse about it. I think there is some truth to this concern; however, it is a higher-order effect, and is therefore less important than the more basic task of helping the government gradually become more responsible and responsive to the topic. I think the kind of gradual force that’s created by contacting local representatives in a democracy creates a net-positive effect for the issue at hand, even if a certain amount of political machination inevitably ends up emerging around the topic.

Effective funerals: buy biographies instead of expensive burials, and maybe cemeteries can become libraries
https://acritch.com/effective-funerals/ (Sat, 03 Aug 2019)

Cemeteries and funerals are beautiful, because they tell a story of the past that we care about. They’re also somewhat expensive: families routinely spend on the order of $10k on funeral and burial rites for their loved ones. There are people whose entire job is the preparation of bodies for funeral rites. Can we tell the story of the past better, but for the same cost?

I believe we can. If your loved one is close to death or has recently died, instead of planning for an expensive burial funeral, you might consider planning for the cheapest possible disposal of their earthly remains, and using the excess money to hire a biographer. The biographer can talk to your loved one’s family, and even your loved one directly if they haven’t yet passed, and write down people’s most treasured or meaningful memories about them. Your children, grandchildren, and great-grandchildren could have much more than a tombstone to remember them by.

If more people adopted this tradition, cemeteries could become libraries where we keep tomes of stories about our lost loved ones, both bitter and sweet. When we bring flowers to the cemetery, we could leave them next to a book containing their life story. We could re-read their memories, and perhaps even take some time to read through the memories of other people we don’t know, and develop a feeling of what it was like to be them. Probably some people would take an interest in reading the stories even of strangers. Perhaps these “cemetery historians” would even bond when they meet at the cemetery, and recommend their favorite stories to each other. Together, we’d have a culture more capable of preserving and cherishing the memories of the people we’ve lost.

Sure, the biographies wouldn’t always be perfectly accurate, and perhaps far from it. Some biographers might offer to exaggerate in order to create more favorable stories. But we’d all know that to be the case, and we might even become more aware of the inconsistencies between our stories of the past if we could read many different accounts of what happened. And I know I’d spend more time enjoying the beautiful landscape of cemeteries if there were also books there to read about the people who were buried there.

How could we transition to such a culture?

  1. Writers: advertise yourselves as end-of-life biographers. If you’re a writer willing to write stories about people around the ends of their lives, tell people you’re willing to do it, and name your price. Start a website. Pioneer a culture. Try partnering with a local funeral home to make it easier for grieving families to find you and get your help.
  2. Funeral homes: partner with biographers. Offer to connect biographers with grieving families in exchange for a percentage of the fee paid for their services.
  3. Families with dying loved ones: make a social media post seeking a biographer. Make your desire to preserve the memories of your loved one clear and visible to people you know. Decide on an amount you’re willing to pay, or a range of amounts (say, ā€œbetween \$1000 and \$3000ā€), and let people know that you’re interested in their help writing up a biography for your loved one. Let them know it doesn’t have to be perfect, and that something is better than nothing (if that’s how you feel).

I realize there would be lots of challenging questions and priorities for the biographers and families to sort out. But that’s why it’s a job. Funeral directors get used to dealing with grieving families, and learn to accommodate their preferences as best they can. I believe end-of-life biographers could learn to do the same. And I wager that, in 50 years’ time, if we’re all still around to read the stories they write, we’ll be glad of their work.

Make Gmail or Inbox open ā€œmailto:ā€ links in Chrome
https://acritch.com/make-gmail-or-inbox-open-mailto-links-in-chrome/ (Fri, 04 Aug 2017)
Associate your academic email address with a Google account
https://acritch.com/google/ (Wed, 02 Aug 2017)

If I’ve sent you a link to this blog post, it’s probably because your .edu email address is not already associated with a Google account, and I got a notification about that when sharing a doc or calendar item with you. To fix this problem permanently, open a browser logged into a Gmail account (create a new one if you don’t want to use your personal one), and go to:
https://myaccount.google.com/alternateemail

From there, you can add email addresses that will actually work for receiving things like Google Doc invitations and Google Calendar invitations. This is somewhat new, and different from just setting up a ā€œsend mail asā€ setting in Gmail, because it applies to all Google services at once.

Give it a try, and save us both a bunch of future hassle 🙂

Deserving Trust, II: It’s not about reputation
https://acritch.com/deserving-trust-2/ (Sat, 20 May 2017)

Summary: a less mathematical account of what I mean by ā€œdeserving trustā€.

When I was a child, my father made me promises. Of the promises he made, he managed to keep 100% of them. Not 90%, but 100%. He would say things like ā€œAndrew, I’ll take you to play in the sand pit tomorrow, even if you forget to bug me about itā€, and then he would. This often saved him from being continually pestered by me to keep his word, because I knew I could trust him.

Around 1999 (tagged in my memory as ā€œage 13ā€), I came to be aware of this property of my father in a very salient way, and decided I wanted to be like that, too. When I’d tell someone they could count on me, if I said ā€œI promiseā€, then I wanted to know for myself that they could really count on me. I wanted to know I deserved their trust before I asked for it. At the time, I couldn’t recall breaking any explicit promises, and I decided to start keeping a careful track from then on to make sure I didn’t break any promises thereafter.

About a year later, around 2000, I got really wrapped up in thinking about what I wanted from life, in full generality… I’d seen friends undergo drastic changes in their world views, like deconverting from Christianity, and becoming deeply confused about what they wanted when they realized that everything they previously wanted was expressed in terms of a ā€œGodā€ concept that no longer had a referent for them. I wanted to be able to express what I wanted in more stable, invariant terms, so I chose my own sensory experiences as the base language — something I believed more than anything else to be an invariant feature of my existence — and then began trying to express all my values to myself in terms of those features. Values like ā€œI enjoy the taste of strawberryā€ or ā€œit feels good to think about mathā€ or ā€œthe split second when I’m airborne at the apex of a parabolic arc feels awesome.ā€ Life became about maximizing the integral of the intrinsic rewardingness of my experiences over the rest of time, which I called my ā€œintensity integralā€, because it roughly corresponded to having intense/enriching intellectual and emotional experiences. (Nowadays, I’d say my life back then felt like an MDP, and I’d call the intensity function my “reward function”. I’ll keep these anachronistic comments in parentheses, though, to ensure a more accurate representation of the language and ideas I was using at the time.)

So by 2001, I had decided to make the experience-optimization thing a basic policy; that as a matter of principle, I should be maximizing the integral of some function of my sensory experiences over time, a function that did not depend too much on references to concepts in the external world like “God” or “other people”. I knew I had to get along with others, of course, but I figured that was easier to explain as something instrumental to future opportunities and experiences, and not something I valued intrinsically. It seemed conceptually simpler to try to explain interactions with others as part of a strategy than as part of my intrinsic goals.

But by 2005, I had fallen deeply in love with someone who I felt understood me pretty well, and I began to feel differently about this self-centered experience-optimization thing. It started to seem like I cared also about *her* experiences, even if I didn’t observe them myself, even indirectly. I ran through many dozens of thought experiments over a period of months, checking that I couldn’t find some conceptually simpler explanation, until I eventually felt I had to accept this about myself. Even if it had no strategic means of ever paying off in enjoyable experiences for *me*, I still wanted *her* to have enjoyable experiences, full stop.

Around the same time, something even more striking to me happened: I realized I also cared about other things she cared about, aside from her experiences. In other words, the scope of what I cared about started expanding outward to things that were not observable by either of us, even indirectly. I wanted the little stack of rocks that I built for her while walking alone on the beach one day to stay standing, even though I never expected her to see it, because I knew she would like it if she could. (In contemporary terms, my life started to feel more like a POMDP, so much so that, by the time I first encountered the definition of an MDP around 2011, it felt like a deeply misguided concept that reminded me of my teenage self-model, and I didn’t spend much time studying it.)

At this point, depending whether you want to consider “me” to be my whole brain or just the conscious, self-reporting part, I either realized I’d been over-fitting my self-model, or underwent “value drift”. When I introspected on how I felt about this other person, and what was driving this change in what I cared about (I did that a lot), it felt like I wanted to deserve her trust, the same way I wanted to keep promises, the way my father did. Even when she wasn’t looking and would never know about it, I wanted to do things that she would want me to do, to produce effects that, even if neither of us would ever observe them, would be the effects she wanted to happen in the world. This was pretty incongruous with my model of myself at the time, so, as I pretty much always do when something important seems incongruous, I entered a period of deep reflection.

Before long, I noticed a similarity between my situation and Newcomb’s problem, and I recalled other people describing similar experiences. I wanted to deserve her trust in the same way I wanted to be a one-boxer on Newcomb: even when the predictor-genie in the experiment isn’t looking anymore, you have to one-box today so that the genie will have trusted you yesterday and placed $1,000,000 in the first box to begin with. (For technical details, see this 2010 post I made to LessWrong, and another I made earlier this year on the same topic.)

Basically, I noticed that the assumptions of classical game theory that give rise to homo-economicus behavior were just false when a person can really get to know you and understand you, because that enables them to imagine scenarios about you before you actually get into those scenarios. In other words, your reputation literally precedes you. Your personality governs not only what you do in reality, but also — to a weaker, noisier, but non-zero extent — what you do in the imaginations and instinctive impulses of other people who knew you yesterday.

So, when I find myself entrusted by someone who isn’t looking anymore, or who can’t otherwise punish me for defecting, I still ask myself, ā€œIf this were happening in their imagination, before they decided to trust me, would they have given me their trust in the first place?ā€

This is what I call deserving trust. This framework fits both with my felt-sense of wanting to deserve trust, and with my normative understanding of decision theory. It’s not about having a reputation of being trustworthy. It’s about doing today what people and institutions who might now be unable to observe or punish you would want you to do when they made the decision to trust you.

As I’ve been finding myself expanding my ring of coworkers and collaborators wider and wider, I’ve been wanting to make these distinctions more and more explicit and understandable. The closer I come to working with a new colleague, the more I want them to help me help us be trustworthy. I feel a strong desire for them to know I’m not just trying to preserve our reputation, but that I actually want us, as a group, to be trustworthy, which usually involves making some extra effort. I feel this desire because, unless I make this distinction explicit, people often respond with something like ā€œYeah, we don’t want to be perceived as [blah blah blah]ā€, which leaves me feeling disappointed and like I haven’t really communicated how I want us to work with each other and the outside world. Optimizing perceptions today just isn’t good enough for deserving trust yesterday.

As a researcher, I’ve seen academics do some fairly back-stabby things to each other by now, and while blatant examples of it are rare, there are still commonplace things like writing disingenuous grant applications that I find fairly unacceptable for me personally, and I want my closest colleagues to really understand that I don’t want us to operate that way.

I’m trying to do work that has some fairly broad-sweeping consequences, and I want to know, for myself, that we’re operating in a way that is deserving of the implicit trust of the societies and institutions that have already empowered us to have those consequences.

FAQ
https://acritch.com/faq/ (Tue, 07 Feb 2017)

I get a lot of email, and unfortunately, template email responses are not yet integrated into the mobile version of Google Inbox. So, until then, please forgive me if I send you this page as a response! Hopefully it is better than no response at all.

Thanks for being understanding.

Q: What should I learn as a (college student | grad student | postdoc | autodidact) to get me up to speed on AI / AI safety research so I can start contributing?

A: I previously organized a team of researchers to develop and maintain http://humancompatible.ai/bibliography to answer this question. Good luck!

Q: Can you give me non-AI-safety-specific advice on how to learn math/CS/neuroscience/finance?

A: http://acritch.com/leveraging-academia
http://acritch.com/deliberate-grad-school

Q: Should I get / how can I get more involved with the Center for Human Compatible AI at UC Berkeley?

A: http://humancompatible.ai/get-involved
http://acritch.com/ai-berkeley/

Q: I just sent you an instant message about something academic / work-related; why did you send me this FAQ?

A: Please use email for work-related stuff! I process email in batches, and prefer to avoid getting too many interruptions from instant messaging about work.

Deserving Trust / Grokking Newcomb’s Problem
https://acritch.com/deserving-trust/ (Thu, 02 Feb 2017)

Summary: This is a tutorial on how to properly acknowledge that your decision heuristics are not local to your own brain, and that as a result, it is sometimes normatively rational for you to act in ways that are deserving of trust, for no reason other than to have deserved that trust in the past.

Related posts: I wrote about this 6 years ago on LessWrong (“Newcomb’s problem happened to me”), and last year Paul Christiano also gave numerous consequentialist considerations in favor of integrity (“Integrity for consequentialists”) that included this one. But since I think now is an especially important time for members of society to continue honoring agreements and mutual trust, I’m giving this another go. I was somewhat obsessed with Newcomb’s problem in high school, and have been milking insights from it ever since. I really think folks would do well to actually grok it fully.


You know that icky feeling you get when you realize you almost just fell prey to the sunk cost fallacy, and are now embarrassed at yourself for trying to fix the past by sabotaging the present? Let’s call this instinct “don’t sabotage the present for the past”. It’s generally very useful.

However, sometimes the usually-helpful “don’t sabotage the present for the past” instinct can also lead people to betray one another when there will be no reputational costs for doing so. I claim that not only is this immoral, but even more fundamentally, it is sometimes a logical fallacy. Specifically, whenever someone reasons about you and decides to trust you, you wind up in a fuzzy version of Newcomb’s problem where it may be rational for you to behave somewhat as though your present actions are feeding into their past reasoning process. This seems like a weird claim to make, but that’s exactly why I’m writing this post.


Introducing Newcomb’s problem

Let’s start by analyzing Newcomb’s original problem, because it’s an extreme case of “influencing the past”. Being an extreme case makes the original Newcomb easier to understand in technical terms than its fuzzier, real-life variants, which we’ll analyze later.

In Newcomb’s problem, you have a choice between taking either

  1. box A (“one-boxing”)
  2. box A and B together (“two-boxing”)

Box A contains either \$0 or \$1,000,000, and box B definitely contains \$1,000. So far, you should clearly take both boxes, because A + B > A, no matter what A is. But there’s a catch: yesterday, Newcomb scanned your brain and predicted what you’d do in this scenario. If he (yesterday) predicted you’d take only box A (today), he (yesterday) placed \$1,000,000 in box A; otherwise he placed \$0 in box A.

In this scenario, the people who one-box get \$1,000,000 (rather than \$0), and people who two-box get \$1,000 (rather than \$1,001,000). So a one-boxing strategy makes more money than a two-boxing strategy, and it’s therefore better. But there is a tempting argument in favor of two-boxing, namely: no matter what A is, A+B > A. This makes it ā€œobviousā€ that you should take both boxes, which we know is ā€œwrongā€ because that strategy earns less money.
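
To make the payoff logic concrete, here is a minimal sketch in Python of the payoffs just described, assuming (as in the setup) that Newcomb’s prediction matches your actual strategy; the names are purely illustrative:

    BOX_B = 1_000  # box B always contains $1,000

    def box_a_contents(predicted_strategy):
        # Newcomb put $1,000,000 in box A only if he predicted one-boxing.
        return 1_000_000 if predicted_strategy == "one-box" else 0

    def payoff(strategy, predicted_strategy):
        a = box_a_contents(predicted_strategy)
        return a if strategy == "one-box" else a + BOX_B

    # With an accurate predictor, the prediction equals your actual strategy:
    print(payoff("one-box", "one-box"))  # 1000000
    print(payoff("two-box", "two-box"))  # 1000

The two-boxing argument compares the two strategies for a fixed prediction p, and correctly finds that two-boxing is \$1,000 better in both cases; the catch, explored below, is that p is not actually fixed independently of your strategy.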

So what’s wrong with the two-boxing argument? It seems like you should be treating A as a variable under your control, but now that you’re standing in front of the boxes and Newcomb has left the room, A is a fixed constant. Does it make sense to be trying to ā€œcontrolā€ it to make it \$1,000,000 instead of \$0? What’s going on?

Debugging two-boxing behavior

I claim that most people who think they know what’s wrong with the two-boxing argument are really just ignoring the two-boxing argument in favor of the one-boxing argument. They (and perhaps you) just redirect mental attention to the one-boxing argument and feel like “That one. That’s the right way.”, instead of nerding out about what exactly is wrong with the two-boxing argument. Figuring out what exactly is wrong with the argument will help you generalize to more scenarios and is much more useful than merely choosing “one-box” and your favorite argument for it.

To say this another way: yes, it’s clear that one-boxing is a better strategy, but knowing that two-boxing is wrong is not the same as knowing what’s wrong with the argument for two-boxing. Knowing that the argument leads to a wrong conclusion is not the same as knowing where the fallacy is, just like knowing your program doesn’t run is not the same as knowing where the bug is. And finding the bug is key to getting better performance in the future!

Moreover, I claim that understanding what’s wrong with the two-boxing argument, at a deep, intuitive/emotional level, is key to understanding how not to be tempted by anti-sunk-cost-fallacy heuristics to violate your integrity. You can progress from resisting the temptation to be untrustworthy to not being tempted at all, by realizing that sometimes, being untrustworthy is a logically incoherent strategy.

Transparent Newcomb

To make the two-boxing argument more potent, let’s now imagine the boxes are transparent. Say they’re made of glass. A strategy for this game consists of two components:

  1. What you’ll do if you see \$1,000,000 in box A
  2. What you’ll do if you see \$0 in box A
(Technicality: to ensure his predictions remain accurate, if Newcomb predicts you’ll attempt to defy his predictions (say, by two-boxing in case 1, or one-boxing in case 2), then he makes sure not to give you that opportunity, perhaps by not offering you the game at all.)

Now, if you walk up to the boxes and see \$0 in Box A, it feels really weird to treat that number as a variable under your control. Zero is a constant! If this happens, you should just take both boxes and collect your \$1,000, right? And if you walk up and see A=\$1,000,000, then you might as well take A+B and collect your \$1,001,000, right?

Well, if you’re the kind of person who two-boxes when you see A=\$1,000,000, that scenario will never happen to you, so your payoff is bounded at \$1,000. You have to be the kind of person who can turn down the extra \$1,000 in order to get offered the \$1,000,000 in the first place. This is kind of like being ā€œtrustworthyā€, insofar as you model Newcomb’s hopes that you won’t defy his predictions as ā€œtrustā€.

Moreover, to ensure Newcomb definitely sets you up with \$1,000,000 and not \$0 in box A, you have to be the kind of person who would one-box anyway even if you see A=\$0. That way, when Newcomb imagines you in that scenario, he learns that if he places \$0 in box A (an implicit prediction that you will two-box), then you will defy his prediction and one-box instead. This ensures that the only consistent thing for Newcomb to imagine (and remain an accurate predictor) is you one-boxing. This is kind of like being ā€œtrustworthyā€ even in scenarios where someone didn’t trust you; it means that you would defy Newcomb’s ā€œmistrustā€ by being trustworthy anyway. Since Newcomb is aiming to be a good predictor, this ensures that he will ā€œtrustā€ you.

Probabilistic Transparent Newcomb

Some people feel like Newcomb being a 100% accurate predictor of them is robbing them of free will, making the problem unfair. They feel like as long as Newcomb has only a 99%-or-less chance of predicting them correctly, they should assume they’re in that 1% scenario when they happen upon the \$1,000,000 box, and just go ahead and grab the extra \$1,000 in Box B.

This is a mistake. For, consider a further variant of the Transparent Newcomb problem where you’re uncertain about how good a predictor Newcomb really is. Say that Newcomb makes perfect predictions only 10% of the time (independently of what you do), and the other 90% of the time his predictions are random and uncorrelated with you. Well, 10% of \$1,000,000 is still much more than \$1,000, so one-boxing is still the right strategy. That is, if you have a 10% chance of ending up in a perfect Transparent Newcomb scenario, you still want to be a one-boxer. In particular, uncertainty about Newcomb’s predictive power does not eliminate the need to sabotage the present (leave \$1,000 on the table) in favor of the past (to have been offered the \$1,000,000 option). What’s going on?
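
To spell out that arithmetic, here is a small sketch of the expected payoffs under the stated assumptions, taking a ā€œrandom predictionā€ to mean box A is equally likely to contain \$1,000,000 or \$0 (a simplification that ignores conditioning on what you see through the glass):

    P_PERFECT = 0.10
    MILLION, THOUSAND = 1_000_000, 1_000

    # One-boxing: $1,000,000 in the perfect branch; $500,000 on average in the random branch.
    ev_one_box = P_PERFECT * MILLION + (1 - P_PERFECT) * 0.5 * MILLION

    # Two-boxing: $0 + $1,000 in the perfect branch; $500,000 + $1,000 on average otherwise.
    ev_two_box = P_PERFECT * (0 + THOUSAND) + (1 - P_PERFECT) * (0.5 * MILLION + THOUSAND)

    print(round(ev_one_box))  # 550000
    print(round(ev_two_box))  # 451000

The guaranteed extra \$1,000 from box B is swamped by the roughly \$100,000 of expected value at stake in the perfect-prediction branch, which is the ā€œ10% of \$1,000,000 is still much more than \$1,000ā€ point above.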

What exactly is wrong with the two-boxing argument?

I’m just gonna spoil it for you now, so stop here if you want to think more. The logical fallacy in the argument for two-boxing is the part where you think you’re walking up to the boxes for the first time in real-life. If Newcomb is so good at predicting you, when you see the \$1,000,000 in box A, you might be a simulation in Newcomb’s imagination, being run yesterday to decide how much money goes in the box! In that case, you should clearly one-box (if you care about how much money real-you gets, which you should, because people whose copies care about each other perform better than people who don’t). More interestingly, if you see A=0, then simulation-you should still one-box, to ensure that scenario doesn’t play out, and that Newcomb will end up offering you the \$1,000,000.

Note that I’m not just saying “it’s better to one-box on Newcomb”; I’m saying it is a logical fallacy to be certain you’re not in a simulation when you make the decision. Let’s examine this claim more critically.

First, perhaps Newcomb doesn’t simulate a fully conscious version of you when he prepares for the scenario. And, since you can tell that you’re conscious, it’s a bit weird to say “I can’t tell if I’m in Newcomb’s imagination/simulation”.

However, if you know already that Newcomb is somehow able to predict you, and you know (or believe) that he does this using a non-conscious imagined version of you, it is a fallacy for you to think that you can base your decision on whether you’re conscious. Apparently, whatever decision procedure you use, if it starts with ā€œIf I’m conscious, do X, else do Yā€ then Newcomb must always be running the ā€œIf I’m consciousā€ branch of your strategy, otherwise his predictions would be incorrect.

In other words, even if some part of your mind “really knows” that it’s conscious, the part of your mind that decides what to do on Transparent Newcomb apparently doesn’t “really know” that you’re conscious, in the sense that Newcomb is able to trick that decision algorithm by running it in his head with the “If I’m conscious” boolean set to “true”.

In this case, even if some part of your mind can in some sense legitimately detect your consciousness (I won’t get into arguments about whether that’s possible here), it is a fallacy for your decision procedure to act like *it* can tell that you’re conscious, once you know (or suspect) that Newcomb is going around making perfect predictions of you with non-conscious imagined versions of your mind.

Grokking the uncertainty about where your decision-algorithm is

In order to more fully acknowledge this realization, back in high school I spent a bunch of time vividly imagining myself in Newcomb-like hypotheticals, and vividly imagining the uncertainty of not knowing whether I’m the real me or Newcomb’s imagined version of me. After a while, it just felt automatic to be uncertain in that way, and the temptation to two-box went away.

That is, not only does the argument for two-boxing no longer make emotional sense to me, and not only am I able to pinpoint the exact line of the argument where it fails, but my pinpointing of the error itself happens automatically and intuitively, as a result of a clearer visualization of how the scenario is working. When the two-boxing argument gets to saying ā€œSince you’re staring right through the glass at the contents of Box A and already know its fixed value, you should optimize your payoff by adding \$1,000 to it.ā€, my gut goes … Wrong! Because Newcomb can predict my decisions, I (or my decision-algorithm) do not know if I’m staring at the real Box A, or the one in Newcomb’s imagination yesterday! (If you want to nerd out about exactly how sure I should be that I’m real, the answer is 50%, because of a super cool result in this blog post by Jessica Taylor and Ryan Carey, echoing an equivalent result of Piccione and Rubinstein from 1997.)

This is what it’s like to let decision theory all the way into your soul 😉

People who trust you are like Probabilistic Transparent Newcomb problems

Now, here’s the fun part where you realize you’re secretly in Newcomb-like problems all the time. When someone looks you in the eyes, talks to you, and gives you a great opportunity (like the \$1,000,000) with the hopes that you won’t exploit them (take the extra \$1,000)… well, before they offered you that opportunity, they had some chance of understanding and predicting whether you would be trustworthy to them. That means you’re in a probabilistic transparent Newcomb problem! On top of caring about that person intrinsically and valuing their happiness and your relationship with them, you should be extra loyal to them because of the value of the great opportunity they gave you based on trust. The strength of this consideration depends both on the value of the opportunity the person gave you, and how well you think they understood your trustworthiness when they did.

When I say you “should” be extra loyal based on this consideration, this isn’t just a moral “should”. It’s just a logical, you’re-missing-something-if-you-don’t-realize-this kind of “should”. The kind that maybe ought to make you feel icky and embarrassed of yourself for missing out on, if you’re the kind of person who feels icky about that kinda thing.

So, if you ever find yourself feeling “Whoah, why am I being loyal to this person if I’ve already reaped the benefits of their trust? I know I’m supposed to be moral, but isn’t this the sunk cost fallacy?” Then, even if for some reason you don’t like that person anymore (hopefully a rare scenario), and even if for some reason you think there won’t be any reputational cost of letting them down (hopefully also rare), if you think they gave you a great opportunity based on some minimal degree of understanding of your personality, you should still think, “Wait a minute, I’m about to commit the my-decision-algorithm-thinks-it-knows-where-it-is fallacy. The part of me that’s making this decision right now might actually be operating in the imagination of the person who I think already trusted me, but is currently using this imagined scenario to determine whether I get to reap the benefits of their trust for real. Maybe this isn’t a sunk cost fallacy after all!”

Other important arguments for integrity

Deserving trust is just one aspect of what you might call “moral” integrity that I think follows from avoiding certain logical fallacies; check out Paul’s post on “Integrity for consequentialists” for a shallower-but-broader overview of some more such considerations that all add up to what I consider a pretty strong force in favor of basic moral behavior, even in the absence of friendship and reputation.

In other words: geez, be a decent person already :p

Start following conservative media, and remember how agreements between people and states actually work
https://acritch.com/conservative-media/ (Mon, 30 Jan 2017)

Dear liberal American friends: please pair readings of liberal media with viewings of Fox News or other conservative media on the same topics. This will take work. They will say things you disagree with, using words you are unfamiliar with. You’ll have to stop scrolling down on Facebook and actively google phrases like ā€œTrump executive order to protect America.ā€ That may sound hard, but the integrity of your country depends on you doing it.

You’ve probably heard about the President’s executive order restricting immigration from seven countries, which led to the mistreatment of legal visa holders and permanent residents of the United States in airports. You probably also understand that there is a huge difference between ruling out new visas from those countries, and dishonoring existing ones. The latter is breaking a promise. Dishonoring agreements like that makes you untrustworthy, and that is very bad for cooperation. Right?

Well, hear this. The electoral college is an agreement. Collectives of human beings agreed with one another to form states that would represent and govern them, which in turn agreed to form a country with a particular means of representation. Today, as a consequence of that agreement, when you move to (or remain in) many urban areas, you are relinquishing some degree of your control over the governance of the United States, and deferring a disproportionate share of that control to your fellow Americans in rural areas. That is a choice you make, albeit confounded by many other factors like where you were born and where your friends live. That is simply how the prevailing agreement works.

Now, here’s what concerns me: I hear a lot of arguments saying that green cards should be honored because they’re agreements that people base their lives on, but I don’t hear many arguments that the electoral college should be honored. Be it acknowledged, to honor the electoral college may have undesirable consequences. But so also might the honoring of green cards. Even if you don’t believe that, some do, and in principle it is a valid consideration.

Sure, you might want to argue for a nation-wide agreement to dissolve the electoral college system. That would not be in violation of existing promises and commitments between people and states. But to argue that the electoral college be disregarded while it still exists is a very different move. I don’t think it’s very reasonable to say “f*** the electoral college; Clinton got the popular vote; Trump is not my President”, and in the next breath say “green cards are sacrosanct agreements between people and nations that must be honored”, except of course to signal civil unrest. So, as long as you possess the intelligence to do so, please try to keep your potentially-incoherent signaling behaviors (“Trump is not my President, but protect green cards”) mentally walled off from your intellectual arguments about how the world works and how agreements function.

In particular, here I am drawing your attention to a very important agreement that you may not like. I claim that, if you live in America, you are in an agreement with rural and conservative Americans whereby they now control the country that you live in. If you are part of a different demographic than they, it is in your demographic’s interest to understand the views and needs of the ruling demographic. You may appeal to them, but you may not make demands that their representatives consider unreasonable, except for those demands to be rejected. You may be okay with making demands that are rejected, for important signalling effects like holding open the Overton window and reminding people how you think the world really ought to be if we could all just agree on it. But meanwhile, your demographic must also seek out creative solutions to satisfy its own needs that are acceptable to the ruling demographic, or else the changes your demographic seeks are much less likely to happen. This may seem unfair to you, but it is implicit in the agreement of the electoral college until it is legitimately dissolved, just as a green card entitles a foreign national to permanent residence in the United States until it is legitimately dissolved.

This means that, if you’re not conservative, your demographic needs to understand conservatives now in order to thrive, and consuming conservative media is a good way to do that. Understand what conservative leaders want and represent. Understand what their constituents need and demand. And then, using some of your most creative thinking and problem solving, start thinking about how to give them what they want in a manner that you find maximally agreeable.

If you’re a liberal, you probably recoil at the idea of deploying your creative problem-solving skills to “prevent terrorism” when you know how few deaths are incurred from terrorism each year, and how they are sensationalized by the media in a cycle that only further incentivizes terrorism. But no matter how obvious it may seem to you that terrorist prevention measures are a mistake, you have not yet transferred this view to people who remain afraid of terrorism and wish to support leaders in favor of drastic immigration bans. If you do not share those fears, you must choose between appeasing those fears by your means or theirs, as long as those fears remain in power.

You may wish to somehow change who is in power, but that too will require cooperating with people you disagree with on what feel like very fundamental facts and values. You are stuck on this planet with other humans, who believe things you don’t, and you have to understand those beliefs to engage with them fruitfully.

So, please, stop scrolling down reflexively on Facebook, and go read Fox news. Stop googling “court order to stay Muslim ban” and search instead for “Trump’s executive order to protect nation”. In doing this, do not lose sight of your own views and logic; those are perhaps even more precious when you are not in power. But do not remain so blind to the views and needs of your fellow Americans and their representatives that their only recourse is to forcibly overpower you.

Time to spend more than 0.00001% of world GDP on human-level AI alignment
https://acritch.com/gdp-on-hlai-alignment/ (Tue, 10 Jan 2017)

From an outside view, looking in at the Earth, if you noticed that human beings were about to replace themselves as the most intelligent agents on the planet, would you think it unreasonable if 1% of their effort were being spent explicitly reasoning about that transition? How about 0.1%?

Well, currently, world GDP is around \$75 trillion, and in total, our species is spending around \$9MM/year on alignment research in preparation for human-level AI (HLAI). That’s \$5MM on technical research distributed across 24 projects with a median annual budget of \$100k, and \$4MM on related efforts, like recruitment and qualitative studies such as this blog post, distributed across 20 projects with a median annual budget of \$57k. (I computed these numbers by tallying spending from a database I borrowed from Sebastian Farquhar at the Global Priorities Project, which uses a much more liberal definition of ā€œalignment researchā€ than I do.) I predict spending will at least roughly double in the next 1-2 years, and frankly, am underwhelmed…
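
As a back-of-envelope check on the figure in the title, here is the arithmetic with the rounded numbers above (both inputs are approximate):

    world_gdp = 75e12        # ~$75 trillion
    alignment_spend = 9e6    # ~$9MM per year

    fraction = alignment_spend / world_gdp
    print(f"{fraction * 100:.6f}%")  # 0.000012%, i.e. on the order of 0.00001% of world GDP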

There are good signs, of course. For example, the MIT Media Lab and the Berkman Klein Center for Internet and Society at Harvard University will serve as the founding anchor institutions for a new initiative aimed at bridging the gap between the humanities, the social sciences, and computing by addressing the global challenges of artificial intelligence (AI) from a multidisciplinary perspective.

This is not HLAI alignment research, but it’s an awesome step in the right direction. *But* it’s nowhere near what’s needed. Why?

The lag time from major successes in Deep Learning to generally-socially-concerned research funding like that of the Media Lab / Klein Center has been several years, depending on how you count. We need that reaction time to get shorter and shorter until our civilization becomes proactive, so that our civilizational capability to align and control superintelligent machines exists before the machines themselves do. I suspect this might require spending more than 0.00001% of world GDP on alignment research for human-level and super-human AI.

Granted, the transition to spending a reasonable level of species-scale effort on this problem is not trivial, and I’m not saying the solution is to undirectedly throw money at it. But I am saying that we are nowhere near done with the on-ramp to acting even remotely sanely, at a societal scale, in anticipation of HLAI. And as long as we’re not there, the people who know this need to keep saying it.

Considerations against pledging donations for the rest of your life
https://acritch.com/pledging/ (Wed, 07 Dec 2016)

I think donating to charity is great, especially if you make more than \$100k per year, placing you well past the threshold where your well-being depends heavily on income (somewhere around \$70k, depending on who does the analysis). I’ve been in that boat before, and donated more than 100% of my disposable income to charity. However, I was also particularly well-positioned to know where money should go at that time, which made donating particularly worth doing. I haven’t made any kind of official pledge to always donate money, because I take pledges/promises very seriously, and for me personally, taking such a pledge seems like a bad idea, even accounting for its signalling value. I’m writing this blog post mainly as a way to reduce the social pressure on folks who earn less than \$100k per year to produce donations, while at the same time encouraging folks who earn more to consider donating more seriously.

Note that I currently work for a charitable institution that I believe is extremely important. So, having been both a benefactor and beneficiary of donations, I hope I may come across as being honest when I say “donating to charity is great.”

Note also that I believe I’m in a somewhat rare situation relative to humans-in-general, but not necessarily a rare situation among folks who are likely to read my blog, who tend to have interests in rationality, effective altruism, existential risk, and other intellectual correlates thereof. Basically, depending on how much information I expect you to actively obtain about the world relative to the size of your donations or other efforts, I may or may not like the idea of you pledging to always donate 10% of your income. Here’s my very rough breakdown of why:

Consider future variance in whether you should donate.

If you either (1) make less than \$100k/year, or (2) might be willing to make less than that at some future time in order to work directly on something the world needs you to do (besides giving), I would not be surprised to find myself recommending against you pledging to always donate 10% of your income every year.

Moreover, if you currently spend more than 100 hours per year investigating what the world-at-large needs, I would not be that surprised if in some years you were able to find opportunities to spend \$10k-worth-of-effort (per year on average, rather than every year) that were more effective than giving \$10k/year. Just from eyeballing people I know, I think a person who spends that much time analyzing the world (especially one who is likely to come across this post) can be quite a valuable resource, and I expect high initial marginal returns to their own direct efforts to improve themselves and the world.

Example: during my PhD, I spent a considerable fraction of my time on creating a non-profit called the Center for Applied Rationality. I was earning very little money at that time, and donating 10% of it would have been a poor choice. It would have greatly reduced my personal flexibility to spend money on getting things done (saving time by taking taxis, not worrying about the cost of meals when I was in flow working with a group that couldn’t relocate to obtain cheaper food options without breaking productivity, etc.). I think the value of my contribution to CFAR during those years greatly exceeds \$4,000 in charitable donations, which is what 10% of my income over two years would have amounted to. In fact, I would guess that it exceeds \$40,000, so even if I thought things were only 10% likely to turn out as well as they did, not donating in those years was a good idea.
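
As a rough sketch of the expected-value comparison implied here (the dollar figures are my estimates from the paragraph above, not audited numbers):

    forgone_donations = 4_000         # ~10% of my income over those two years
    estimated_value_of_work = 40_000  # my guess at the value of the CFAR contribution
    p_success = 0.10                  # even granting only a 10% chance of things going that well

    print(round(p_success * estimated_value_of_work))  # 4000 -- already matches the forgone donations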

In other years when I made much more money, I’ve chosen to donate 100% of my disposable income. You might want to do that sometimes, too, and I would highly recommend considering it, especially if you’re spending a lot of your time investigating where that money should go. But I still might recommend against you pledging to keep donating, unless you expect to stop investigating the world as much as you currently do and will therefore be less likely to discover things in the future that should change your plans for years-at-a-time.

Sometimes you should trust your own future judgement.

You might think that you should just defer all your decisions about where money or effort should go to the investigations of a larger group like GiveWell, OPP, GPP, or GWWC, who spend more time on investigation than you. Such a position favors donating as a way of impacting the world, because your impact gets multiplied by the value of someone else’s investigation. This is a highly tenable position, but I believe it becomes less tenable as the ratio of [value of time you spend investigating cause prioritization] to [value of money or effort you spend on your top cause] increases. E.g., if you’ve spent 100 hours this year identifying and analyzing arguments about what the world needs most, I would not be surprised if you could find a way to spend \$10k worth of money or effort on some important and neglected cause that was more valuable than donating to something with more mainstream support.

On the other hand, it would take more convincing for me to think it was also worth you spending \$1mm worth of money or effort on that cause, since that would represent a larger inefficiency in the charity market that should have been easier for others to identify, and someone spending \$1mm has plenty of incentive to have investigated (or hired investigation) for more than 100 hours. That would be a case where I think it makes more sense to depend on (or even better: pay for your own!) more centralized analysis of what’s needed.

Expecting variance + respecting your judgement = not pledging

The combined effects of

  • expecting variance in whether you should donate, and
  • respecting your own judgement for donations valued comparably to the time you spend investigating,
lead me to recommend that some folks not pledge to always donate 10% of their income. If you expect low variance and/or a low time-ratio-spent-investigating relative to the examples I’ve given, I’m less likely to discourage you from taking such a pledge, because it helps you signal to the world that donating to charity is extremely important.

Having said that: you can donate lots of money without ever pledging to do so for the rest of your life, and if you can afford it, I totally think you should do it 🙂