Topic: The Necro Wars (Read 356031 times)

silverspawn · « **Reply #2100 on:** December 11, 2021, 05:58:39 pm »

Like, the fact that the goal is about text completion doesn't do the work for you. You still need an elaborate physics model to answer physics questions, and knowledge of Lord of the Rings to answer Lord of the Rings questions, and so on. This is why language modeling is so powerful. Being a good language model requires modeling everything because language encompasses everything.

Awaclus · « **Reply #2101 on:** December 11, 2021, 06:09:59 pm »

Quote from: silverspawn on December 11, 2021, 05:55:30 pm

GPT-3 is always trying to predict the next token in the sequence because that's the training objective. But -- clearly -- predicting the correct answer requires knowing the correct answer.

Sure, but predicting what a human would say does not require knowing the correct answer, and might not output the correct answer.
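A toy way to see this point, as a rough sketch: a model that only counts which word most often follows another will reproduce whatever people usually say, right or wrong. The corpus and the `predict_next` helper are made up for illustration, and GPT-3 is of course far more sophisticated than a bigram counter.

```python
from collections import Counter, defaultdict

# Hypothetical training text: most "speakers" repeat a common misconception.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

# Count how often each word follows each preceding word (a bigram model).
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1


def predict_next(prev_word):
    """Return the most frequent continuation seen after prev_word."""
    return follow_counts[prev_word].most_common(1)[0][0]


# The model outputs what the majority wrote, not what is true.
print(predict_next("is"))
```

Whether a much larger model escapes this by modeling the better-informed speakers is exactly what the two sides here disagree about.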

silverspawn · « **Reply #2102 on:** December 11, 2021, 06:34:12 pm »

Well it requires modeling it as well as the smartest human. You can set up a conversation between yourself and von Neumann. And GPT-3 is, in fact, smarter if you use someone like that than if you use a random one.

But GPT-3 was only an example, anyway. The thing that kills everyone isn't going to be a language model.

Awaclus · « **Reply #2103 on:** December 11, 2021, 07:01:49 pm »

Quote from: silverspawn on December 11, 2021, 06:34:12 pm

Well it requires modeling it as well as the smartest human. You can set up a conversation between yourself and von Neumann. And GPT-3 is, in fact, smarter if you use someone like that than if you use a random one.

Sure, but GPT-3 doesn't model it as well as the smartest human, or at all. When you ask it bizarre enough questions, it becomes distinguishable from humans.

faust · « **Reply #2104 on:** December 12, 2021, 03:56:02 am »

Quote from: silverspawn on December 11, 2021, 01:05:47 pm

This may sound like a cop out, but my honest answer is "probably exactly like GPT-3 figured out that buying bitcoin 10 years ago means you're rich now". I suspect that this is the same process that makes me reason about how to prove \sqrt{2} is irrational.

and we have no idea how that works because interpretability isn't there.

It does sound like a cop out. That's fair though - if I wanted to explain how human intelligence is different from deep learning algorithms I probably couldn't come up with a more satisfying answer either.

I will say though that there are approaches to AI that are different from deep learning, e.g. automated reasoning. I think to obtain something resembling general intelligence one would need to combine such approaches.

Awaclus · « **Reply #2105 on:** December 12, 2021, 04:44:34 am »

Quote from: faust on December 12, 2021, 03:56:02 am

It does sound like a cop out. That's fair though - if I wanted to explain how human intelligence is different from deep learning algorithms I probably couldn't come up with a more satisfying answer either.

Well, one clear difference is that we know exactly how deep learning algorithms work, while we only really have an incredibly vague idea how human intelligence, or non-artificial intelligence in general works.

silverspawn · « **Reply #2106 on:** December 12, 2021, 06:48:50 am »

We know how deep learning algorithms work, but not how the models that they find work. Crucial difference.

Quote from: faust on December 12, 2021, 03:56:02 am

I will say though that there are approaches to AI that are different from deep learning, e.g. automated reasoning. I think to obtain something resembling general intelligence one would need to combine such approaches.

Yeah, and I know that several smart people think AGI will not just be trained by deep learning. Also, deep learning hasn't even been around for that long and it's entirely conceivable that we will have a new paradigm completely take over.

From a safety perspective, it seems difficult to imagine anything worse and harder to interpret than deep learning, so to the extent that AGI will be made in other ways, that's probably a good thing.

Awaclus · « **Reply #2107 on:** December 12, 2021, 07:29:57 am »

Quote from: silverspawn on December 12, 2021, 06:48:50 am

We know how deep learning algorithms work, but not how the models that they find work. Crucial difference.

In what sense do we not know how the models work?

silverspawn · « **Reply #2108 on:** December 12, 2021, 07:43:41 am »

with a neural network, you have a giant vector of rational numbers coming in, then you iterate many linear and non-linear transformations (as many as there are layers), and in the end, your result pops out (e.g. in the form of a probability distribution over words). It's only a slight exaggeration to say that we don't understand anything about how they work except for thi.
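As a minimal sketch of that description, here is a two-layer forward pass in plain Python. The weights, the three-word vocabulary, and the helper names are all invented for illustration; a real language model has vastly more layers and parameters.

```python
import math


def affine(weights, bias, v):
    """Compute W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Non-linear step: clamp negative values to zero."""
    return [max(0.0, x) for x in v]


def softmax(v):
    """Turn raw scores into a probability distribution."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for a tiny 3 -> 2 -> 3 network.
layers = [
    ([[0.5, -1.0, 0.25], [1.5, 0.5, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
]
vocab = ["cat", "duck", "tree"]  # made-up output vocabulary


def forward(x):
    """Alternate affine and non-linear steps, then normalize with softmax."""
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = affine(weights, bias, h)
        if i < len(layers) - 1:  # no ReLU after the final layer
            h = relu(h)
    return softmax(h)


probs = forward([1.0, 2.0, 3.0])
for word, p in zip(vocab, probs):
    print(word, round(p, 3))
```

Every step here is fully specified, which is the sense in which the algorithm is understood; what the intermediate values in `h` mean is the separate question.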

silverspawn · « **Reply #2109 on:** December 12, 2021, 07:44:00 am »

*this

silverspawn · « **Reply #2110 on:** December 12, 2021, 04:29:17 pm »

me somehow getting an advantage in an endgame without really understanding why:

"haah yes this is nice"

me somehow getting a disadvantage in an endgame while strongly suspecting my opponent did nothing to plan for this:

"you {lots of swear words mainly complaining about luck}"

I actually swear a lot in thought, just never in person and rarely in writing

Awaclus · « **Reply #2111 on:** December 13, 2021, 02:22:08 am »

Quote from: silverspawn on December 12, 2021, 07:43:41 am

with a neural network, you have a giant vector of rational numbers coming in, then you iterate many linear and non-linear transformations (as many as there are layers), and in the end, your result pops out (e.g. in the form of a probability distribution over words). It's only a slight exaggeration to say that we don't understand anything about how they work except for thi.

What is there to understand besides this? It's probably too complex to try to wrap your mind around how exactly the training data leads to the specific weights between nodes and how that, then, produces the ability to correctly predict words, but that's just because there are too many variables, not because it's fundamentally doing something we don't understand.
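For the "training data leads to the specific weights" part, a one-weight sketch of gradient descent shows the mechanism in miniature. The data, learning rate, and step count are arbitrary choices for illustration, not anything from a real training run.

```python
# Hypothetical training data generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def loss_gradient(w):
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


w = 0.0  # start from an uninformed weight
learning_rate = 0.05
for step in range(200):
    # Nudge the weight a little in the direction that reduces the error.
    w -= learning_rate * loss_gradient(w)

print(w)  # ends up very close to 2.0
```

With one weight the result is trivially readable. With millions of weights the same update rule applies, but nobody can read the outcome off directly.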

silverspawn · « **Reply #2112 on:** December 13, 2021, 01:48:54 pm »

Quote from: Awaclus on December 13, 2021, 02:22:08 am

Quote from: silverspawn on December 12, 2021, 07:43:41 am
with a neural network, you have a giant vector of rational numbers coming in, then you iterate many linear and non-linear transformations (as many as there are layers), and in the end, your result pops out (e.g. in the form of a probability distribution over words). It's only a slight exaggeration to say that we don't understand anything about how they work except for thi.

What is there to understand besides this?

Well, suppose you have an image classifier that's supposed to recognize animals. On one image that shows a duck, it outputs "cat". In this case, you might want to know why it misclassified the image. Even if it classifies the duck correctly, you might want to know what parts of the image were important. You might also want to know what the numbers in the network mean. In case of image classification, you generally put in 3 rational numbers per pixel of the image (for Red, Blue, and Green values), and the network iteratively transforms those into less granular numbers with more channels. E.g., with ResNet you start from 112 x 112 x 3 input values (3 because red/green/blue, 112x112 because that's how many pixels the image has), and at the final hidden layer, it stores 7x7x512 numbers instead. What do these numbers mean? (Choosing this example because that was exactly the question my paper looked at.)
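The shape arithmetic can be checked with a short sketch. The kernel, stride, and padding values below are assumptions based on the standard ResNet-18 layout, not something stated in the post. Under those assumptions a 7x7x512 final layer corresponds to a 224 x 224 input, and 112 x 112 is what you get after the first convolution, so the input size quoted above may refer to a different setup.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels) for the downsampling steps
# of a standard ResNet-18-style network (assumed layout). The first residual
# stage keeps the resolution, so it is omitted here.
STAGES = [
    ("conv1", 7, 2, 3, 64),
    ("maxpool", 3, 2, 1, 64),
    ("layer2", 3, 2, 1, 128),
    ("layer3", 3, 2, 1, 256),
    ("layer4", 3, 2, 1, 512),
]


def final_shape(input_size):
    """Return (height, width, channels) at the final hidden layer."""
    size, channels = input_size, 3
    for _name, kernel, stride, padding, out_channels in STAGES:
        size = conv_out(size, kernel, stride, padding)
        channels = out_channels
    return (size, size, channels)


print(final_shape(224))  # (7, 7, 512)
```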

For a different example, say you have GPT-3 and you prompt it with a dialogue where one person asks another a question and the second person gives the wrong answer. You'd like to know whether this happened because GPT-3 didn't know the answer or whether it knew but thought the person who gave the answer didn't know.

And for the hypothetical future scenario of an AGI, we'd like to know what it's trying to do. If it behaves nicely, is it because it's aligned with our interests or because it wants X which humans don't want and recognizes that hiding what it wants is useful to make us not update its code?

What you said is literally true; there is no one thing that we don't understand. Right now it's just affine linear transformations and sigmoid or relu functions. But the fact that we hypothetically understand every one step doesn't change that we have no idea what practically goes on inside the network.

Awaclus · « **Reply #2113 on:** December 13, 2021, 03:06:33 pm »

Quote from: silverspawn on December 13, 2021, 01:48:54 pm

Well, suppose you have an image classifier that's supposed to recognize animals. On one image that shows a duck, it outputs "cat". In this case, you might want to know why it misclassified the image. Even if it classifies the duck correctly, you might want to know what parts of the image were important. You might also want to know what the numbers in the network mean. In case of image classification, you generally put in 3 rational numbers per pixel of the image (for Red, Blue, and Green values), and the network iteratively transforms those into less granular numbers with more channels. E.g., with ResNet you start from 112 x 112 x 3 input values (3 because red/green/blue, 112x112 because that's how many pixels the image has), and at the final hidden layer, it stores 7x7x512 numbers instead. What do these numbers mean? (Choosing this example because that was exactly the question my paper looked at.)

For a different example, say you have GPT-3 and you prompt it with a dialogue where one person asks another a question and the second person gives the wrong answer. You'd like to know whether this happened because GPT-3 didn't know the answer or whether it knew but thought the person who gave the answer didn't know.

And for the hypothetical future scenario of an AGI, we'd like to know what it's trying to do. If it behaves nicely, is it because it's aligned with our interests or because it wants X which humans don't want and recognizes that hiding what it wants is useful to make us not update its code?

What you said is literally true; there is no one thing that we don't understand. Right now it's just affine linear transformations and sigmoid or relu functions. But the fact that we hypothetically understand every one step doesn't change that we have no idea what practically goes on inside the network.

What's going on inside the hidden layers is incomprehensible to humans because it is complete nonsense that just coincidentally happens to produce the correct result most of the time (except it's not really a coincidence because the system has been rigged to make it happen). If you want to know why the image recognition fails, or why it works in a specific case, or why GPT-3 does what it does, the answer is not something that would make sense in humanly understandable concepts if only we could figure out how to find out what's really going on behind all the numbers; the answer is simply that this particular bunch of rigged nonsense happens to produce this result.

silverspawn · « **Reply #2114 on:** December 13, 2021, 03:22:38 pm »

I mean, if it produces the correct result, it can't literally be nonsense. In the case of ResNet, the output is computed entirely based on the 7x7x512 representation of the final hidden layer (though of course that layer is computed based on the previous stuff), so all information that went into the network's decision is present there. All 512 filters in the hidden layer mean something.

You've acknowledged that it's not a coincidence, given that it was nudged by gradient descent. I don't think I get in what sense you still think it's kind of nonsense.

Anyway, the claim that it's not at all human-understandable is empirically untrue. (Although I'd like to point out that, if it were true, this would be even more reason to expect doom from AGI.) My paper is based on the observation that the 512 filters at the end of ResNet do, in fact, activate for human-understandable concepts. The algorithmic contribution was to find clever ways to connect human-made annotations to approximate what the neuron is doing.
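One common way to score how well a filter lines up with a human annotation is intersection over union (IoU) between two binary masks. This is only a generic sketch of that idea, not the method from the paper, which isn't spelled out here; both 4x4 masks are invented.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary grids."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 0.0


# Hypothetical mask of cells where a filter is active.
neuron_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical human annotation of where the concept appears.
annotation_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

score = iou(neuron_mask, annotation_mask)
print(round(score, 3))
```

A score near 1 means the filter fires almost exactly where the annotated concept is, and a score near 0 means the two barely overlap.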

Here's an example:

[pic]

This is from one of the 512 filters and one image, where I've thresholded the numbers, i.e., all of the 7x7 cells where the number is above a certain value are highlighted.

Now, this example is cherry-picked and most of them aren't as crisp. But nonetheless, this neuron is clearly reacting to the tree house. This is a human-understandable concept.
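The thresholding step described above can be sketched like this. The 7x7 activation values and the cutoff of 1.0 are made up; in practice the numbers would come from one filter of the final hidden layer for one image.

```python
def threshold_map(activation, cutoff):
    """Return a binary grid marking cells whose activation exceeds cutoff."""
    return [[1 if value > cutoff else 0 for value in row] for row in activation]


# Hypothetical 7x7 activations of a single filter for a single image.
activation = [
    [0.1, 0.0, 0.2, 0.1, 0.0, 0.0, 0.1],
    [0.0, 0.3, 0.4, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.5, 2.1, 1.8, 0.3, 0.1, 0.0],
    [0.0, 0.4, 2.4, 2.9, 0.6, 0.0, 0.1],
    [0.0, 0.2, 1.7, 2.2, 0.4, 0.1, 0.0],
    [0.1, 0.0, 0.3, 0.5, 0.2, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.1, 0.0, 0.0, 0.1],
]

mask = threshold_map(activation, cutoff=1.0)

# Print the highlighted cells as a small text picture.
for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

Scaled up to the image size and overlaid on the picture, the highlighted cells show which region the filter responded to.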

There is also work that goes into the loss function and edits it to encourage the filters to be more human-understandable.

silverspawn · « **Reply #2115 on:** December 13, 2021, 03:24:06 pm »

More abstractly, to classify images, you need an ontology that compresses high-level concepts into lower-level concepts. We use such an ontology, and it's probably not optimal, but it's also not arbitrary. The vast majority of ways that you could compress pixels into high-level concepts are way way way worse than what we are doing. So if you apply enough optimization pressure to a network to force it to compress low level concepts effectively, it will end up with something that is at least a little similar to what we are doing. And to the extent that it's different, we may still understand how it's different.

silverspawn · « **Reply #2116 on:** December 13, 2021, 03:31:07 pm »

Quote from: silverspawn on December 13, 2021, 03:24:06 pm

More abstractly, to classify images, you need an ontology that compresses high-level concepts into lower-level concepts.

Blplpp, I mean the opposite of course; you compress low-level concepts like pixels into high-level concepts like tree houses.

Actually, there's evidence from neuroscience that the brain's visual cortex also does this. We have these neurons at several layers, and the activations on the first layers are very local and the ones at the later layers aren't. The analogy isn't perfect but closer than I used to think.
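On the network side, the "local early, less local later" pattern follows from how receptive fields grow with depth. Here is a small sketch using the usual receptive-field recurrence; the stack of 3x3 layers is hypothetical and not any particular architecture.

```python
def receptive_fields(layers):
    """Given (kernel, stride) per layer, return the receptive field size
    (in input pixels, along one axis) of a unit after each layer."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack of 3x3 convolutions, some with stride 2.
layers = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
print(receptive_fields(layers))  # [3, 5, 9, 13, 21]
```

A unit in the first layer only sees a 3-pixel-wide patch, while a unit five layers in already sees 21 pixels, so later units can respond to larger structures.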

Awaclus · « **Reply #2117 on:** December 13, 2021, 04:39:23 pm »

Quote from: silverspawn on December 13, 2021, 03:22:38 pm

I mean, if it produces the correct result, it can't literally be nonsense. In the case of ResNet, the output is computed entirely based on the 7x7x512 representation of the final hidden layer (though of course that layer is computed based on the previous stuff), so all information that went into the network's decision is present there. All 512 filters in the hidden layer mean something.

You've acknowledged that it's not a coincidence, given that it was nudged by gradient descent. I don't think I get in what sense you still think it's kind of nonsense.

Anyway, the claim that it's not at all human-understandable is empirically untrue. (Although I'd like to point out that, if it were true, this would be even more reason to expect doom from AGI.) My paper is based on the observation that the 512 filters at the end of ResNet do, in fact, activate for human-understandable concepts. The algorithmic contribution was to find clever ways to connect human-made annotations to approximate what the neuron is doing.

Here's an example:

[pic]

This is from one of the 512 filters and one image, where I've thresholded the numbers, i.e., all of the 7x7 cells where the number is above a certain value are highlighted.

Now, this example is cherry-picked and most of them aren't as crisp. But nonetheless, this neuron is clearly reacting to the tree house. This is a human-understandable concept.

There is also work that goes into the loss function and edits it to encourage the filters to be more human-understandable.

Just because it produces the correct result doesn't mean it can't be nonsense. To use a very high-level example, there was the neural network that was supposed to identify skin cancer based on a photo and it turned out to identify rulers that tended to be present in pics of skin cancer, which produced the correct result with their data set but was nonsense.

In principle, the final hidden layer, which is computed based on previous stuff, is not fundamentally different from the output, which is also computed based on previous stuff. It's just a step earlier in the chain. Given that we expect the output to make sense to humans, it's not that surprising the step immediately before also contains some information that makes some kind of sense to humans (at least given the fact that we know what it's supposed to be doing). It would be surprising if all the previous layers also made sense.

silverspawn · « **Reply #2118 on:** December 13, 2021, 05:12:17 pm »

Quote from: Awaclus on December 13, 2021, 04:39:23 pm

Just because it produces the correct result doesn't mean it can't be nonsense. To use a very high-level example, there was the neural network that was supposed to identify skin cancer based on a photo and it turned out to identify rulers that tended to be present in pics of skin cancer, which produced the correct result with their data set but was nonsense.

Ok, but this is a case where the network specifically focused on an aspect in which the training set was unrepresentative. What about all of the cases where it doesn't do that, and instead makes its decision based on genuine aspects of the training data?

silverspawn · « **Reply #2119 on:** December 13, 2021, 05:14:36 pm »

(I also wouldn't call this nonsense. It found a totally legitimate way to classify the training data. It's just that the training data had a quirk that's not present in the real world. That's not the network's fault.)
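The ruler story can be mimicked with a toy "learner" that just picks whichever single feature best predicts the label on the training set. All the data below is fabricated for illustration, and the feature names are hypothetical; the real system was a neural network, not a one-feature rule.

```python
# Each example: (has_ruler, irregular_border, is_cancer). All data is made up.
train = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1),
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0),
]
# A second set where rulers no longer track the diagnosis.
shifted = [
    (0, 1, 1), (0, 1, 1), (1, 0, 0), (0, 1, 1),
    (1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 0, 0),
]

FEATURES = {"has_ruler": 0, "irregular_border": 1}


def accuracy(feature_index, data):
    """Accuracy of the rule 'predict cancer exactly when this feature is 1'."""
    correct = sum(1 for row in data if row[feature_index] == row[2])
    return correct / len(data)


def best_feature(data):
    """Pick the feature whose rule scores highest on the given data."""
    return max(FEATURES, key=lambda name: accuracy(FEATURES[name], data))


chosen = best_feature(train)
print(chosen)
print(accuracy(FEATURES[chosen], train))
print(accuracy(FEATURES[chosen], shifted))
```

On the training set the ruler rule is a perfectly good classifier, and it only falls apart once the quirk stops holding.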

Awaclus · « **Reply #2120 on:** December 13, 2021, 05:45:40 pm »

It's super difficult for training data to be completely representative, so that's a big part of the point, but also there are quirks that are present in the real world (arguably this is one, since people with real skin cancers really do have rulers nearby while having a picture taken more often than people with harmless skin conditions). In some other cases, identifying such a quirk could be almost indistinguishable from what humans do, and yet it would be completely different.

Awaclus · « **Reply #2121 on:** December 13, 2021, 05:48:14 pm »

And something like that could occur at a much earlier level, in which case it would be hard to find words to describe what it's doing or what it should be doing.

silverspawn · « **Reply #2122 on:** December 13, 2021, 06:34:11 pm »

I don't think I understand your position. It sounds like you're saying the networks are nonsensical insofar as they reason from things we don't want them to reason from, but there are undoubtedly examples where this isn't happening, otherwise deep learning wouldn't be so successful in practice.

Also, this entire class of behavior is a reason why you want interpretability. You want to know what the network is looking at.

Awaclus · « **Reply #2123 on:** December 14, 2021, 01:04:42 am »

Quote from: silverspawn on December 13, 2021, 06:34:11 pm

I don't think I understand your position. It sounds like you're saying the networks are nonsensical insofar as they reason from things we don't want them to reason from, but there are undoubtedly examples where this isn't happening, otherwise deep learning wouldn't be so successful in practice.

Also, this entire class of behavior is a reason why you want interpretability. You want to know what the network is looking at.

Humans also reason from things we don't want humans to reason from. There are tons of cases where nonsensical reasoning is almost as good and way easier to accomplish through evolution than correct reasoning.

And the fact that ANNs can detect patterns that appear incomprehensible to humans is not only a flaw but also a strength. For example, if the point of an algorithm was to detect whether someone is having suicidal thoughts based on their social media posts so that the site could show them helpful links, it wouldn't matter if it detected the equivalent of the ruler thing as long as the results were accurate.

silverspawn · « **Reply #2124 on:** December 14, 2021, 04:31:17 am »

Does any of that imply that interpretability tools are impossible or not useful?
