# curse of dimensionality

## Definitions

## Etymologies

## Examples

• Arises in problems that map an input to an output, for instance, classification problems in supervised learning or statistics. (Everything2.com)

• Boy, those folks over at Everything2.com are just sharp as tacks, aren't they? That explanation or definition or observation or gloss, or whatever it was meant to be, is about as useful as a chocolate teapot.

In statistics, the curse of dimensionality just refers to the fact that the sample size needed to address a given type of problem satisfactorily increases exponentially with the number of variables under study. For example, if the response you are interested in depends only on a single variable X, which can take values from 0 to 10, you might take 10 samples equally spaced along that range and feel that you had covered it reasonably well. Add another possible variable Y, which can also take values from 0 to 10, and sampling the (X,Y) range with the same fidelity requires 10 × 10 = 100 samples. In general, n variables under study require 10^n samples.
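To make the arithmetic concrete, here is a minimal Python sketch that builds the evenly spaced grid described above and counts its points. The helper name `grid_samples` and its defaults (10 points per axis over 0 to 10) are illustrative assumptions, not taken from any library.

```python
from itertools import product


def grid_samples(n_vars, points_per_axis=10, low=0.0, high=10.0):
    """Every point of an evenly spaced grid over [low, high]^n_vars (illustrative helper)."""
    # Equally spaced values along one axis, both endpoints included.
    axis = [low + (high - low) * i / (points_per_axis - 1) for i in range(points_per_axis)]
    # The full grid is the Cartesian product of that axis with itself n_vars times.
    return list(product(axis, repeat=n_vars))


for n in range(1, 5):
    # Same fidelity along each axis, but each new variable multiplies the total by 10.
    print(n, len(grid_samples(n)))
```

The loop prints 10, 100, 1000 and 10000 points for one to four variables: the per-axis coverage never changes, only the number of axes does.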

This becomes a real problem in, e.g., microarray studies, where the RNA expression levels of up to 8,000 genes can be measured at a time. Trying to find where the action is in that 8,000-dimensional space of genes can be tricky.

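Mathematically, this results from the exponential dependence of hypercube volume on dimension (a cube of side 10 in n dimensions contains 10^n unit cells, even though the unit hypercube itself always has volume 1), and the resulting difficulty of studying all regions of the space adequately.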
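The volume argument can be checked numerically. The sketch below counts the cells of fixed edge needed to tile a hypercube, then inverts the question: how many grid points per axis a fixed number of samples buys as the dimension grows. The helpers `cells_to_cover` and `points_per_axis`, and the budget of 10,000 samples, are hypothetical choices made only for illustration.

```python
def cells_to_cover(n_dims, cells_per_axis=10):
    """Number of equal sub-cubes needed to tile a hypercube, cells_per_axis along each edge."""
    # The unit hypercube [0, 1]^n has volume 1 in every dimension; what explodes
    # is how many cells of fixed edge 1/cells_per_axis it takes to fill it.
    return cells_per_axis ** n_dims


def points_per_axis(budget, n_dims):
    """Grid resolution per axis that a fixed total sample budget buys in n_dims dimensions."""
    # A full grid with k points per axis uses k**n_dims samples, so k = budget**(1/n_dims).
    return budget ** (1.0 / n_dims)


BUDGET = 10_000  # hypothetical sample budget, chosen only for illustration

for n in (1, 2, 3, 4, 10):
    print(n, cells_to_cover(n), round(points_per_axis(BUDGET, n), 2))

# With 8,000 variables, the same budget buys barely more than one point per axis.
print(round(points_per_axis(BUDGET, 8000), 5))
```

For two variables the budget still gives 100 points per axis; for 8,000 variables, as in the microarray case, it gives just over one.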

And if I have allowed a note of detectable pique to creep into this commentary, it's because I have never read quite such an idiotic characterization of statistical reasoning before in my life ("problems that map an input to an output" seems to cover more or less any cognitive process), and my professional sensibilities are (slightly) offended.

