Dominion Strategy Forum


Author Topic: Sample Uniformly from ~Normal Distribution  (Read 1359 times)


GeoLib

  • Jester
  • *****
  • Offline Offline
  • Posts: 965
  • Respect: +1263
    • View Profile
Sample Uniformly from ~Normal Distribution
« on: April 04, 2015, 01:02:57 am »
0

So I'm trying to do this thing for my research and struggling with finding the best way to do it. But I thought maybe some of the people here might know of a good way (especially since we seem to have so many statisticians).

So I have some data that is approximately normally distributed with respect to a variable x. I would like a sample of that data that is approximately uniformly distributed with respect to x over the range from the minimum to the maximum. Specifically, I want to sample about a third of this data without replacement. I tried binning, but in order to have enough samples in each bin, I can only use 3 bins, which is not ideal. Is there a good way to do this? Obviously, since I want to draw without replacement, I can't really get a uniform distribution, since the ends will quickly be depleted, but something "sort of uniform" would be good. Any thoughts?
Logged
"All advice is awful"
 —Count Grishnakh

blueblimp

  • Margrave
  • *****
  • Offline Offline
  • Posts: 2818
  • Respect: +1527
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #1 on: April 04, 2015, 02:47:52 am »
+2

So I have some data that is approximately normally distributed with respect to a variable x.
I don't understand what this means. Specifically, I'm not familiar with the "distributed with respect to a variable" terminology. Could you be a bit more specific about the motivating problem maybe?
Logged

Titandrake

  • Mountebank
  • *****
  • Offline Offline
  • Posts: 2163
  • Respect: +2739
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #2 on: April 04, 2015, 04:27:39 am »
+1

If I'm understanding it right...you have a list of values in your variable x, and want to pick a third of it without replacement? (I'm assuming you have a list because it otherwise doesn't make much sense to sample without replacement.)

Assuming that's true, you could look into the Knuth (Fisher-Yates) shuffle: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle. Assuming you can modify the given list, you can run the permutation generation but stop after picking the first third of the elements.

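Code:
import random

def sample_without_replacement(a, k):
    """Partial Fisher-Yates shuffle: return k samples from list a (reorders a in place)."""
    n = len(a)
    for i in range(k):                # while you want more samples
        r = random.randrange(i, n)    # random index from i to n-1
        a[i], a[r] = a[r], a[i]       # swap elements a[i] and a[r]
    return a[:k]                      # the first k slots are the samples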

Depending on the language you're using, there's probably a library that does this for you as well. I know Python in particular has random.shuffle to get a random permutation of a list, so you can do that then chop off the first third as your samples.
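As a minimal sketch of the library route in Python (assuming the values sit in a plain list; the function name `sample_third` and the `seed` parameter are made up for illustration): `random.sample` draws without replacement directly, so the shuffle-then-slice step is not needed. Note that this gives a uniform random subset of the list, so the subset keeps roughly the same shape as the original data.

```python
import random

def sample_third(values, seed=None):
    """Return about a third of `values`, drawn uniformly without replacement."""
    rng = random.Random(seed)      # seeded generator so runs can be reproduced
    k = len(values) // 3           # "about a third" of the data
    return rng.sample(values, k)   # k distinct positions, no repeats

values = list(range(30))
subset = sample_third(values, seed=0)
print(len(subset))
```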
Logged
I have a blog! It's called Sorta Insightful. Check it out?

DStu

  • Margrave
  • *****
  • Offline Offline
  • Posts: 2627
  • Respect: +1488
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #3 on: April 04, 2015, 04:57:24 am »
0

So I have some data that is approximately normally distributed with respect to a variable x.
I don't understand what this means. Specifically, I'm not familiar with the "distributed with respect to a variable" terminology. Could you be a bit more specific about the motivating problem maybe?

I would guess he means he has data whose distribution is the distribution of the variable x.

But in this case I'm a bit sceptical that what you want is to draw without replacement. This is of course only guessing without knowing the problem.
Logged

GeoLib

  • Jester
  • *****
  • Offline Offline
  • Posts: 965
  • Respect: +1263
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #4 on: April 04, 2015, 11:21:49 am »
0

So obviously my explanation was unclear.

I have a bunch of chemical compounds with a specific real-valued property x that I'm trying to predict. The set of compounds is approximately normally distributed in this variable x (not sure I'm saying this right: if I were to bin them and plot frequency vs. x, I would get something that looks approximately like a normal distribution). I want to select a test set of these compounds, but I would like it to be approximately uniformly distributed in this variable (so if I plotted frequency vs. x of the test set, it would be approximately uniform over the range of possible values). This isn't actually possible to do, since I want a third of the compounds and there aren't enough at the ends, but something closer would be good.

So I think Titandrake's solution gives me a uniform random sample of the list, which is what I was doing previously.

The reason for wanting a different set is that the compounds we're actually interested in are the ones at the extreme values, so we want to make sure that when we say our model is effective, it isn't just good at predicting things with middling values because those are the bulk of the test set.

Does that make sense?

The solution I came up with last night was to generate a uniform random variable and select the compound with the x value closest to that value, which I think is probably fine. If someone has a better idea though, that would also be good since it's somewhat awkward as is.
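A rough sketch of that nearest-value idea in Python, under some assumptions: the x values are in a list of floats, each selected compound is removed so it cannot be picked twice, and the function name `nearest_to_uniform_targets` is invented here for illustration. It is one possible reading of the method, not necessarily the exact code used.

```python
import bisect
import random

def nearest_to_uniform_targets(xs, k, seed=None):
    """Pick k values from xs, each the closest remaining value to a uniform draw on [min, max]."""
    if k > len(xs):
        raise ValueError("cannot pick more values than are available")
    rng = random.Random(seed)
    remaining = sorted(xs)                  # sorted so bisect can find neighbours
    lo, hi = remaining[0], remaining[-1]    # range fixed from the full data set
    chosen = []
    for _ in range(k):
        target = rng.uniform(lo, hi)
        pos = bisect.bisect_left(remaining, target)
        # The closest remaining value is next to the insertion point, on one side or the other.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(remaining)]
        best = min(candidates, key=lambda i: abs(remaining[i] - target))
        chosen.append(remaining.pop(best))  # remove it: sampling without replacement
    return chosen
```

Because the tails run out of compounds first, later draws near the ends fall back to the nearest value still available, so the result is only "sort of uniform", as expected.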
Logged
"All advice is awful"
 —Count Grishnakh

DStu

  • Margrave
  • *****
  • Offline Offline
  • Posts: 2627
  • Respect: +1488
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #5 on: April 04, 2015, 04:10:18 pm »
0

What you can do if you want to be exact is rejection sampling: http://en.wikipedia.org/wiki/Rejection_sampling. But it might be inefficient as hell, also because there are not enough values at the edges.

RS is a technique to sample from a target distribution, starting from another one which is easy to sample, given that you know the density ratio between the two and that this ratio has an upper bound.
In your case, you can easily sample from your distribution, which is approximately Gaussian, by the technique Titandrake already mentioned: just draw the sample at a random index.
Your target distribution is uniform.
You also know the (approximate) density of your data: phi(x) := 1/sqrt(2 pi sigma²) exp(-(x - mu)²/(2 sigma²)), where mu and sigma are the mean and standard deviation of your data. The density ratio between uniform and Gaussian is therefore proportional to 1/phi(x). On a bounded interval, it is also bounded (but the constant might be bad, which would make the algorithm inefficient). Call the constant M: it is the maximum of 1/phi(x) on the interval, attained at the end farthest from the mean.

So what you do to draw one sample:

1.) Draw a sample z uniformly at random from your data.
2.) Accept z with probability 1/(M phi(z)). Otherwise, go to 1.)

To draw N/3 samples, you do this N/3 times.
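A hedged Python sketch of this recipe, using the standard rejection rule for a uniform target: a data point z drawn at random is accepted with probability proportional to 1/phi(z), scaled so that the largest acceptance probability is 1. The assumptions are mine: the data is a list of floats with non-zero spread, mu and sigma are estimated from the data, accepted points are removed to get sampling without replacement, and `rejection_sample_uniform` and `max_tries` are illustrative names.

```python
import math
import random
import statistics

def rejection_sample_uniform(xs, k, seed=None, max_tries=1_000_000):
    """Draw up to k values from xs so that they are spread roughly uniformly in x."""
    rng = random.Random(seed)
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)   # assumed > 0

    def phi(x):
        # Gaussian density up to a constant factor; the factor cancels in the ratio below.
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

    pool = list(xs)
    phi_min = min(phi(x) for x in pool)   # plays the role of 1/M
    chosen, tries = [], 0
    while len(chosen) < k and pool and tries < max_tries:
        tries += 1
        i = rng.randrange(len(pool))      # step 1: random element of the data
        z = pool[i]
        if rng.random() < phi_min / phi(z):   # step 2: accept with probability 1/(M phi(z))
            pool[i] = pool[-1]            # remove z from the pool in O(1)
            pool.pop()
            chosen.append(z)
    return chosen
```

The `max_tries` cap is only there so the loop cannot run indefinitely when the acceptance rate is very low, which is the inefficiency warned about above.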
Logged

Witherweaver

  • Adventurer
  • ******
  • Offline Offline
  • Posts: 6476
  • Shuffle iT Username: Witherweaver
  • Respect: +7845
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #6 on: April 04, 2015, 04:34:39 pm »
0

Have you looked into pseudorandom or quasirandom sequences?  The idea is that they're low discrepancy, so you better span the space.  I think that's what you're trying to do with the bins.
Logged

GeoLib

  • Jester
  • *****
  • Offline Offline
  • Posts: 965
  • Respect: +1263
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #7 on: April 04, 2015, 04:51:45 pm »
0

What you can do if you want to be exact is http://en.wikipedia.org/wiki/Rejection_sampling. But might be inefficient as hell, also because there are not enough values at the edge.


Hmmm... This does look like approximately what I want, though I think the method I came up with is actually easier to implement. Thank you though!

Have you looked into pseudorandom or quasirandom sequences?  The idea is that they're low discrepancy, so you better span the space.  I think that's what you're trying to do with the bins.

Well, Python's random module uses pseudorandom numbers, but I think quasirandom is actually what you're talking about with low discrepancy. That might be worth checking out.
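For reference, a small sketch of what a quasirandom (low-discrepancy) sequence looks like in plain Python, using the van der Corput sequence in base 2. The function names are made up for illustration, and using these values as targets in place of uniform random draws in the nearest-value method from earlier in the thread is only one possible way to apply the idea.

```python
def van_der_corput(n, base=2):
    """Return the n-th term (n >= 1) of the van der Corput sequence, a value in [0, 1)."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)   # peel off the lowest digit of n in the given base
        denom *= base
        q += digit / denom           # mirror that digit behind the radix point
    return q

def quasirandom_targets(lo, hi, k):
    """Return k low-discrepancy target values spread over [lo, hi)."""
    return [lo + (hi - lo) * van_der_corput(i) for i in range(1, k + 1)]

print(quasirandom_targets(0.0, 8.0, 3))  # [4.0, 2.0, 6.0]
```

Each new term lands in the largest gap left by the earlier ones, which is why such sequences cover a range more evenly than independent random draws.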


I've gotten a sample with x approximately uniformly distributed from the method I mentioned. I'm grumpy though because it means our model can't train on anything very far from the mean because they're all in the test set >:(

Thank you to everyone who made a suggestion
Logged
"All advice is awful"
 —Count Grishnakh

DStu

  • Margrave
  • *****
  • Offline Offline
  • Posts: 2627
  • Respect: +1488
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #8 on: April 04, 2015, 04:54:11 pm »
0

Hmmm... This does look like approximately what I want, though I think the method I came up with is actually easier to implement. Thank you though!

It might even be that your method does the right thing.  I'm not in the mood to calculate that though...
Logged

Witherweaver

  • Adventurer
  • ******
  • Offline Offline
  • Posts: 6476
  • Shuffle iT Username: Witherweaver
  • Respect: +7845
    • View Profile
Re: Sample Uniformly from ~Normal Distribution
« Reply #9 on: April 04, 2015, 05:11:28 pm »
0

Yeah, I mean quasi.
Logged