
First, my background is not in math.

My objective is to find the value that occurs most frequently in a data sample OR the value that is most likely.

Let's say my sample is [1,5,6,6,7,10]. Finding the mode for this sample is simple (the mode is 6).

But if I change the sample to [1,5,6,7,10], I don't know how to find the mode. The result I want is 6, since 6 is the most probable value.
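For illustration, here is a minimal Python sketch of the counting definition of the mode (the helper name `modes` is hypothetical; Python 3.8+ also ships `statistics.multimode`, which behaves the same way). It shows why the second sample is a problem: when every value occurs exactly once, every value ties for "most frequent".

```python
from collections import Counter


def modes(sample):
    """Return every value that occurs most often in the sample."""
    counts = Counter(sample)          # value -> number of occurrences
    top = max(counts.values())        # highest count seen
    return [value for value, count in counts.items() if count == top]


print(modes([1, 5, 6, 6, 7, 10]))  # [6]
print(modes([1, 5, 6, 7, 10]))     # [1, 5, 6, 7, 10]: every value ties
```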

The problem is, I don't even know what to google (I tried for hours), and even when I do find something that MAY be the answer (kernel density estimation, continuous probability distribution), I don't understand what the hell they're talking about.
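As a rough illustration of what kernel density estimation does here: place a smooth bump (a Gaussian kernel) on every data point, add the bumps up into a density curve, and take the highest point of that curve as the estimated mode. The sketch below is only one way to do this, under stated assumptions: the function name `kde_mode` is hypothetical, the kernel is Gaussian, the default bandwidth is the common rule of thumb h = 1.06 · s · n^(-1/5) (which assumes roughly bell-shaped data), and the peak is found by a simple grid search between the minimum and maximum.

```python
import math
import statistics


def kde_mode(data, bandwidth=None, grid_points=1000):
    """Estimate the mode of continuous data as the peak of a Gaussian KDE.

    Assumptions: Gaussian kernel; if no bandwidth is given, the rule of
    thumb h = 1.06 * s * n**(-1/5) is used (s = sample standard deviation).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (are all values equal?)")

    def density(x):
        # Sum of Gaussian bumps centred on each data point. Constant factors
        # are dropped because they do not move the location of the peak.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    # For a Gaussian KDE the peak lies between min(data) and max(data),
    # so it is enough to search that interval on an evenly spaced grid.
    lo, hi = min(data), max(data)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    return max(grid, key=density)


print(kde_mode([1, 5, 6, 7, 10]))
```

On [1,5,6,7,10] this returns a value slightly above 6 rather than exactly 6, because the outlier 10 is closer to the cluster than 1 is. The estimate also depends on the bandwidth: a smaller `bandwidth` follows the individual points more closely, a larger one smooths them into a single broad hump.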

The actual data set consists of hundreds of values (floats) saved in Excel. I would appreciate it if someone could demonstrate this in Excel.

  • Do you have to find the mode? There are other averages that are significantly easier to compute when you have float data, for example the mean or the median. (2011-11-18)
  • If your samples come from what you believe is a continuous distribution, then it is almost certain that the "hundreds of data (in floats)" are all distinct numbers (as in your second example), and the data sample has no mode. You could try sorting and binning the data, say into 20 bins of equal width between the min and the max (e.g. ask Excel to make a histogram of the sample values), and finding the bin with the largest number of samples. The center point of that bin is an estimate of the mode. (2011-11-18)
  • The mean and median are totally unsuitable for my data. The problem with a frequency histogram is that it's hard to find the optimal bin width. I'm writing a program, so it's critical for me to have this feature working on its own. Doesn't anybody know the solution? Does anybody know where I can ask questions? Thanks (2011-11-20)
  • Not sure you fully grasped the content of @Dilip's answer, so let me repeat it: the data samples you are considering will have **NO MODE** whatsoever. It is not that people do not know the solution; people **know** that there is **no solution** (and if you ask the same question elsewhere, every correct answer you get will state the same thing). (2011-11-27)
  • I don't have an Excel solution for you. This is a near duplicate of http://stats.stackexchange.com/questions/19952/computing-the-mode-of-data-sampled-from-a-continuous-distribution, except that you are asking for an Excel method. The key fact here is that you are trying to estimate the density of your data along whatever your dimension is. (2012-04-07)
  • While what Didier and Dilip say is almost surely true, especially since you're saying that you are using (only) hundreds of float values, you *should* be able to use some sort of kernel density estimation. However, most kernel density estimators require some assumption about your distribution. (2012-04-07)
