If you can match means and variances (that is, match the first and second moments), you're doing much better. If you can match more moments, the distributions are closer still.
Basically,
E(X) = E(Y)
is weak, while
E(X^n) = E(Y^n) for all n = 1, 2, ..., N
for a decent-sized N is much better.
So, I've never taken a real statistics class, only "AP" Statistics. Even there we talked about how two distributions can have different spreads, i.e. different variances. I suppose this is the second moment, and I looked up how it's defined: Var(x) = E((E(x) - x)^2), which certainly makes sense. This doesn't look like what you wrote here, but I like the idea of comparing the expectation of a power of the variable with the same power of the expectation. So I thought, based on your post and before I saw the definition above, that maybe Var(x) = E(x^2) - E(x)^2, since that is never negative and seems to measure spread. But now I notice:
E((E(x) - x)^2) = E(E(x)^2 - 2x E(x) + x^2) = E(x)^2 - E(2x E(x)) + E(x^2) = E(x^2) - 2 E(x) E(x) + E(x)^2 = E(x^2) - E(x)^2
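A quick numeric sanity check of that identity, as a minimal sketch: the three-point distribution below is made up purely for illustration, and `expect` is a hypothetical helper, not anything from a library. Exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical discrete distribution (an assumption for illustration only):
# x takes the values 0, 1, 4 with probabilities 1/2, 1/4, 1/4.
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def expect(f):
    """Return E(f(x)) for the discrete distribution above."""
    return sum(p * f(v) for v, p in zip(values, probs))

mean = expect(lambda v: v)

# Definition of variance: E((E(x) - x)^2)
var_definition = expect(lambda v: (mean - v) ** 2)

# Shortcut form: E(x^2) - E(x)^2
var_shortcut = expect(lambda v: v ** 2) - mean ** 2

print(mean, var_definition, var_shortcut)  # 5/4 43/16 43/16
```

Both forms give 43/16 here, which is what the algebra says should happen for any distribution with finite second moment.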
I think that's pretty neat. Can we define the n-th moment to be E(x^n) - E(x)^n? Wait, but that doesn't work for n = 1 at all. So then how would we define it analogously to the E((E(x) - x)^2) definition of variance? What even is the third moment?
Maybe it's something like E(Var(x) - (E(x) - x)^2), but wait, that's zero, so on a whim, maybe we square the inner bit? Then it's:
E((Var(x) - (E(x) - x)^2)^2) = E(Var(x)^2 - 2 Var(x) (E(x) - x)^2 + (E(x) - x)^4) = E((E(x) - x)^4) - Var(x)^2 = E((E(x)^2 - 2x E(x) + x^2)^2) - Var(x)^2
Hmmm, that doesn't really work. Okay, maybe it's E((E(x) - x)^3) = E(E(x)^3 - 3x E(x)^2 + 3x^2 E(x) - x^3), which is E(x)^3 - 3 E(x)^2 E(x) + 3 E(x) E(x^2) - E(x^3) = E(x)^3 - E(x^3) + 3 E(x) Var(x)
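The same kind of check works for the cubic expansion, E((E(x) - x)^3) = E(x)^3 - E(x^3) + 3 E(x) Var(x). This is only a sketch on one made-up distribution (the values, probabilities, and the `expect` helper are all assumptions for illustration), so it confirms the algebra on an example rather than proving it.

```python
from fractions import Fraction

def expect(f, values, probs):
    """Return E(f(x)) for a discrete distribution given as values and probabilities."""
    return sum(p * f(v) for v, p in zip(values, probs))

# Hypothetical distribution, chosen only for illustration.
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

m = expect(lambda v: v, values, probs)               # E(x)
var = expect(lambda v: (m - v) ** 2, values, probs)  # Var(x)

# Left side: E((E(x) - x)^3), computed directly.
lhs = expect(lambda v: (m - v) ** 3, values, probs)

# Right side: E(x)^3 - E(x^3) + 3 E(x) Var(x).
rhs = m ** 3 - expect(lambda v: v ** 3, values, probs) + 3 * m * var

print(lhs, rhs)  # -135/32 -135/32
```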
I like this because if X and Y have the same mean and variance, then the only term that can differ is E(x^n). So is the n-th moment given by E((E(x) - x)^n)? I still don't have an intuitive feel for what it should actually say about the distribution in general, and also this does not work for n = 1.
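EDIT: I looked it up, and it looks like E((x - E(x))^n) is called the n-th central moment, which is the same as my E((E(x) - x)^n) up to a sign flip for odd n. Neat. The plain E(x^n) from the quoted post is the n-th raw moment, and the first central moment is always zero, which is why n = 1 didn't work. I also learned just now that the third central moment measures skewness, which actually makes sense: its sign gives the direction of the skew, with positive usually meaning a longer right tail (using the x - E(x) ordering). Well, that was a fun distraction.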
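To see the skew direction show up in the sign, here is a small sketch that computes E((x - E(x))^n) for a made-up distribution with a long right tail and for its mirror image. The `central_moment` helper and both distributions are assumptions for illustration. Note that it uses the x - E(x) ordering; with E(x) - x the odd-order results would just flip sign.

```python
from fractions import Fraction

def central_moment(values, probs, n):
    """Return E((x - E(x))^n) for a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** n for v, p in zip(values, probs))

# Hypothetical distributions: one with a long right tail, and its mirror image.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
right_tail = [0, 1, 4]
left_tail = [-v for v in right_tail]

for name, vals in [("right tail", right_tail), ("left tail", left_tail)]:
    moments = [central_moment(vals, probs, n) for n in (1, 2, 3)]
    print(name, moments)
```

In both cases the first entry is 0 and the second (the variance) is the same; only the third changes sign between the two distributions.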