
I am trying to understand entropy. From what I know, we can compute the entropy of a single random variable, say $X$.

What I don't understand is how to calculate the entropy of an $m \times n$ matrix. I thought that if the columns are attributes and the rows are objects, we could sum the entropies of the individual columns to get the total entropy (provided the attributes are independent). I have a couple of questions:

  1. Is my understanding right in the case of independent attributes?
  2. What if the attributes are dependent? What happens to the entropy? Is this where conditional entropy comes in?

Thanks

2 Answers


First of all, keep in mind that there are actually several definitions of entropy. The most common one is the Shannon information entropy,
$H(X) = -\sum_{i=1}^{n}p_i\log p_i,$ where $p_i$ is the probability of seeing the $i$th possible outcome of $X$.
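The formula above can be sketched directly in Python (a minimal illustration; the function name `shannon_entropy` is my own, and the log base 2 gives entropy in bits):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(X) = -sum(p_i * log2(p_i)), in bits.
    Terms with p_i == 0 contribute nothing (0 * log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin has one bit of entropy:
print(shannon_entropy([0.5, 0.5]))  # → 1.0
# A certain outcome carries no information:
print(shannon_entropy([1.0]))
```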

Therefore

  1. Entropy is defined in terms of the probability distribution of the random variable $X$.
  2. Entropy does not care about correlation or independence, because only the probability distribution matters.
  3. Yes, there is a notion of conditional entropy; see the Wikipedia page for details.

I am not sure in what context you want to find the entropy of a matrix, but in image processing, images are represented by matrices. The way we measure the entropy of an image is to:

  1. Find the distribution of pixel intensities (i.e. the distribution of element values)
  2. Compute entropy using the information entropy formula
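The two steps above can be sketched with NumPy (a minimal illustration for integer-valued matrices; the function name `image_entropy` is my own):

```python
import numpy as np

def image_entropy(img):
    """Entropy (in bits) of a 2-D array of pixel intensities."""
    # Step 1: empirical distribution of element values.
    _, counts = np.unique(img, return_counts=True)
    p = counts / counts.sum()
    # Step 2: Shannon entropy of that distribution.
    return float(-np.sum(p * np.log2(p)))

img = np.array([[0, 0, 255, 255],
                [0, 0, 255, 255]])  # two equally likely intensities
print(image_entropy(img))  # → 1.0
```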
  • Does anyone have a reference, or can someone sketch out how the Shannon entropy formula can be computed from a pixel intensity distribution? (2013-09-25)

You may be interested in the von Neumann entropy of a matrix, which applies the Shannon entropy formula to the eigenvalues. That is, for $A = P \begin{bmatrix}\lambda_1 \\ & \lambda_2 \\ && \ddots \\ &&& \lambda_n \end{bmatrix} P^{-1}$ with positive $\lambda_i$, the entropy is $H(A):=-\sum_i \lambda_i \log \lambda_i.$
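A quick numerical sketch of this definition, assuming a symmetric positive semidefinite input (the function name `von_neumann_entropy` and the zero-eigenvalue cutoff `eps` are my own choices):

```python
import numpy as np

def von_neumann_entropy(A, eps=1e-12):
    """-sum(lambda_i * log(lambda_i)) over the eigenvalues of A,
    assuming A is symmetric positive semidefinite."""
    lam = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
    lam = lam[lam > eps]         # by convention, 0 * log 0 := 0
    return float(-np.sum(lam * np.log(lam)))

# Maximally mixed 2-state density matrix: entropy log(2).
rho = np.eye(2) / 2
print(von_neumann_entropy(rho))  # → log 2 ≈ 0.6931
```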

For more on the definition of the von Neumann entropy you might look at the Wikipedia page, and for how to maximize it numerically you could look at my answer on this Computer Science Stack Exchange thread.

For rectangular matrices, you could extend the definition by replacing the eigenvalues with the singular values from the SVD, though it's not clear what this would mean.
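One way to make that extension concrete is to normalize the singular values so they sum to 1 and take the Shannon entropy of the result (a sketch only; the normalization choice and the name `singular_value_entropy` are my own assumptions, not a standard definition):

```python
import numpy as np

def singular_value_entropy(M):
    """Entropy of the normalized singular value spectrum of any matrix M."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()     # normalize so the values form a distribution
    p = p[p > 1e-12]    # drop (numerically) zero singular values
    return float(-np.sum(p * np.log(p)))

print(singular_value_entropy(np.eye(3)))  # → log 3 ≈ 1.0986
```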