3

Let $\mathcal{P}_i$ be the set of probability density functions to which $f_i$ belongs $(i=0,1)$. Furthermore, assume that $$L(y)=\frac{f_1(y)}{f_0(y)}$$ is an increasing function for any chosen $f_0$ and $f_1$. Let the support of the densities be a compact subset of the reals, denoted by $\mathbb{K}$.

For a given threshold $\tau\in\mathbb{K}$, one can calculate the probability of false alarm and the probability of missed detection as follows:

$$P_F(\tau)=\int_\tau^{\infty}f_0(y)dy$$

$$P_M(\tau)=\int_{-\infty}^\tau f_1(y)dy$$

ROC:=$(P_F(\tau),P_M(\tau))$ forms a curve in $[0,1]^2$ which is convex.

(ROC, for those who don't know, stands for Receiver Operating Characteristic.)
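The convexity claim can be checked directly from the definitions above. This is only a sketch, and it assumes that $f_0$ and $f_1$ are continuous and that $f_0(\tau)>0$ on the range of thresholds considered. Differentiating with respect to $\tau$ gives

$$\frac{dP_F}{d\tau}=-f_0(\tau),\qquad \frac{dP_M}{d\tau}=f_1(\tau),\qquad\text{so}\qquad \frac{dP_M}{dP_F}=-\frac{f_1(\tau)}{f_0(\tau)}=-L(\tau).$$

As $P_F$ increases, $\tau$ decreases, so $L(\tau)$ decreases because $L$ is increasing. The slope $-L(\tau)$ is therefore nondecreasing in $P_F$, which is the convexity of the curve.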

Here is an example:

$$f_0(y)=\frac{1}{\sqrt{2\pi\sigma_0^2}}e^{\frac{-\left(y-\mu_0\right)^2}{2\sigma_0^2}}$$

$$f_1(y)=\frac{1}{\sqrt{2\pi\sigma_1^2}}e^{\frac{-\left(y-\mu_1\right)^2}{2\sigma_1^2}}$$

with $\sigma_0=\sigma_1=1$ and $\mu_0=0$ and $\mu_1=1$. Then we have the following figure for $(P_F(\tau),P_M(\tau))$ when $\tau$ is changed from $-\infty$ to $\infty$, ($\mathbb{K}=\mathbb{R}$).

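[Figure: the ROC curve $(P_F(\tau),P_M(\tau))$ of the example (blue), the butterfly region (red lines), the black line through $\theta$, and the green sector]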

As is well known, and as can be seen from the figure, the blue curve is convex.

For any chosen pair of densities $(f_0,f_1)\in \mathcal{P}_0\times \mathcal{P}_1$, the ROC curve (the blue one) $(P_F(\tau),P_M(\tau))$ for $\tau\in (-\infty,\infty)$ will lie in the butterfly bounded by the red lines in the figure, assuming that the point $\theta=P_F=P_M$ is common to all densities in $\mathcal{P}_0\times\mathcal{P}_1$ (in the figure, $\theta \approx 0.3$).
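The example can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the function names and the threshold grid are illustrative choices, not part of the question. It evaluates $(P_F(\tau),P_M(\tau))$ for the Gaussian pair above, checks that the chord slopes are nondecreasing in $P_F$ (a discrete convexity check), and evaluates the curve at $\tau=0.5$, where $P_F=P_M$ by the symmetry of the two densities.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def roc_point(tau, mu0=0.0, mu1=1.0, sigma=1.0):
    """Return (P_F, P_M) for threshold tau in the Gaussian example."""
    p_f = 1.0 - std_normal_cdf((tau - mu0) / sigma)  # integral of f_0 over [tau, inf)
    p_m = std_normal_cdf((tau - mu1) / sigma)        # integral of f_1 over (-inf, tau]
    return p_f, p_m

def is_convex(points, tol=1e-12):
    """Check that chord slopes are nondecreasing when points are sorted by P_F."""
    pts = sorted(points)
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(pts, pts[1:]) if x2 > x1]
    return all(s2 >= s1 - tol for s1, s2 in zip(slopes, slopes[1:]))

# Illustrative threshold grid from -4 to 5 in steps of 0.1 (an assumption).
taus = [-4.0 + 0.1 * k for k in range(91)]
roc = [roc_point(t) for t in taus]

print(is_convex(roc))   # discrete convexity check on the sampled curve
print(roc_point(0.5))   # the point where P_F = P_M, i.e. theta for this pair
```

The second printed pair has both coordinates near $0.31$, consistent with the $\theta\approx 0.3$ read off the figure.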

Question:

Assume that all densities $(f_0,f_1)\in\mathcal{P}_0\times\mathcal{P}_1$ are known to have a particular $\theta$ in their ROC. In other words, let $\mathcal{P}_0\times\mathcal{P}_1$ contain only the pairs of densities that have $\theta$ in their ROC, and furthermore let one choose any pair of densities from $\mathcal{P}_0\times\mathcal{P}_1$ with equal probability.

What is the probability that a single point of the ROC that we obtain by this selection will lie in the green sector?

Once again, the green sector is the intersection of the butterfly with the area under the line which passes through $\theta$, and $f_1/f_0$ is increasing as defined before. One can assume any $\mathbb{K}$, for example $\mathbb{K}=[0,1]$ or $\mathbb{K}=\mathbb{R}$.
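As a concrete instance with compact support, here is a worked example on $\mathbb{K}=[0,1]$. The pair $f_0(y)=1$, $f_1(y)=2y$ is a hypothetical choice made only for illustration; it satisfies the assumption because $L(y)=2y$ is increasing. With the integrals restricted to $\mathbb{K}$,

$$P_F(\tau)=\int_\tau^{1}1\,dy=1-\tau,\qquad P_M(\tau)=\int_{0}^{\tau}2y\,dy=\tau^2,\qquad\text{so}\qquad P_M=(1-P_F)^2,$$

which is convex. The common point $P_F=P_M$ solves $\tau^2=1-\tau$, giving

$$\tau=\frac{\sqrt{5}-1}{2},\qquad \theta=\frac{3-\sqrt{5}}{2}\approx 0.382.$$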

  • 0
    @Andres Caicedo Are you sure that this question doesn't have anything to do with set theory?2012-11-05
  • 0
    Yes. Completely sure. In fact, I would suggest to remove "infinities" from the title.2012-11-05
  • 0
    Is the black line tangent to your ROC? Also as an unrelated aside, this would be a fairly poor ROC. You'd be much better off with $1-$ your classifier! Such a curve would be concave upward, so I'm not sure if the question would be relevant or different in such a case.2012-11-05
  • 0
    @Andres Caicedo Then can you give me a hint on how to deal with my sets with infinitely many elements? Some have even more, and I am interested in how many more.2012-11-05
  • 0
    I'd actually add the tags [tag:signal-processing] and possibly [tag:machine-learning] to this to attract the attention of folks who work with this more often.2012-11-05
  • 0
    @EdGorcenski Signal processing guys will not be able to deal with this problem, as the difficulty lies in the infinite sets. There are uncountably many pairs of densities mapping to one curve in the given red butterfly. The ones touching the green area are also infinitely many. The question is the ratio. I think some people who know probability theory and set theory quite well can solve it.2012-11-05
  • 0
    @EdGorcenski The black line passes through the point $\theta$ and intersects the $x$ and $y$ axes at the same value. On the black line, $P_F=P_M$. This line is independent of $f_0$ and $f_1$. But you are right, it is tangent.2012-11-05
  • 0
    The line $y=x$ also defines an infinite set. I guess signal processing guys can't handle that, either?2012-11-05
  • 0
    @EdGorcenski OK, I have added the tag as you wish. I don't want to underestimate any group, as I am from signal processing myself and have already discussed the problem with some friends.2012-11-05

1 Answer

1

If the black line is tangent, and the blue curve is convex, then there is only a single point of the blue curve contained in the green area. This is because the green area is defined by the tangent line, and convexity guarantees that the blue curve will not intersect the black line, and hence the green region, at any other point.

If you're looking for the probability that a single realization will land in this region, simply compute the area of the region. The ROC curve defines the "dividing line" of classification; however, the unit square is still your global probability space.

If you want to know the probability that a different ROC curve intersects this green region, then you can employ a few different conditions. First, assume that any other ROC curve is convex and continuous. Then, the curve defined by $$ R = \left\{ \left(P_F(\tau),P_M(\tau)\right) \right\}$$ is continuous and monotonic and maps from $[0,1]$ to $[0,1]$.

Therefore, this curve has a fixed point in $[0,1]$, namely the point where

$$P_F(\tau) = P_M(\tau).$$

These fixed points lie on the line $y=x$. Obviously, any monotonic curve whose fixed point is $x' > \theta$ will not intersect the green region.

Conversely, any curve with a fixed point $x' \le \theta$ will pass through the green triangle.

Therefore, your probability is $P(x' \le \theta)$. Taking $x'$ to be uniformly distributed on $[0,1]$, the probability in question is exactly $\theta$, which agrees with my previous assessment.
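The fixed point used in this argument can be located numerically. The sketch below is an illustration, not part of the original argument: it uses bisection on $g(\tau)=P_M(\tau)-P_F(\tau)$, which is nondecreasing in $\tau$, for a unit-variance Gaussian pair with means $0$ and $1$. The function names, the bracket $[-10,10]$, and the reference value `theta` are assumptions. It only evaluates the criterion $x'\le\theta$ for one curve and says nothing about how $x'$ is distributed.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_false_alarm(tau, mu0=0.0):
    """P_F(tau): mass of f_0 above the threshold (unit variance assumed)."""
    return 1.0 - std_normal_cdf(tau - mu0)

def p_miss(tau, mu1=1.0):
    """P_M(tau): mass of f_1 below the threshold (unit variance assumed)."""
    return std_normal_cdf(tau - mu1)

def fixed_point(p_f, p_m, lo=-10.0, hi=10.0, iters=200):
    """Bisection on g(tau) = P_M(tau) - P_F(tau), nondecreasing in tau.

    Returns (tau_star, x_prime) with P_F(tau_star) = P_M(tau_star) = x_prime,
    up to floating-point precision.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_m(mid) - p_f(mid) < 0.0:
            lo = mid   # still to the left of the crossing
        else:
            hi = mid   # at or to the right of the crossing
    tau_star = 0.5 * (lo + hi)
    return tau_star, p_f(tau_star)

tau_star, x_prime = fixed_point(p_false_alarm, p_miss)
theta = 0.3  # hypothetical reference value, chosen only for illustration
print(tau_star, x_prime, x_prime <= theta)
```

Any other pair of monotone functions `p_f` and `p_m` can be passed to `fixed_point`, provided the bracket contains the crossing.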

  • 0
    You are not right. You can draw infinitely many ROC curves intersecting the black line. Of course, I am not talking about the point $\theta=P_M=P_F$, as it is **always** and trivially in the green region.2012-11-05
  • 0
    I have no idea in what context you're using ROC. Are you talking about drawing more blue curves that possibly intersect the black line? Are you talking about using a receiver with this characteristic and wanting to find the probability of landing in this region?2012-11-05
  • 0
    I calculated the area of the green region and divided it by the whole area of the butterfly. This gives exactly $\theta$. However, I don't think that this is the solution. That's the reason why I posted the question here.2012-11-05
  • 0
    Yes. Suppose you are given all probability densities which have $P_F=P_M=0.3$; then it is clear that we have infinitely many of them. We put them in a box and pick one pair with equal probability. Then let's draw the ROC curve for the chosen pair. We will get a convex curve in the butterfly (it **can or cannot** intersect the black line). The question is: what is the probability that it will intersect? Again, excluding the point $\theta$.2012-11-05
  • 0
    Ok, I think I understand what you're asking. I am editing my answer.2012-11-05
  • 0
    @SeyhmusGüngören I have updated the answer. The probability you are looking for is indeed $\theta$.2012-11-05
  • 0
    I agree, provided you can prove that $P(x^{'}\leq\theta)$ is uniform. My problem is actually here. If I know that the occurrences of all points in the butterfly are uniform, and if the question is to pick a point uniformly, it is okay. But first, the curve is a specific curve which is convex. Next, I am not able to prove that if we pick the densities uniformly at random, then the points on the ROC are also passed through **uniformly**. This is not clear to me...2012-11-05
  • 0
    The solution to your question relies on the distribution of the fixed point. This fixed point is distributed over $[0,1]$. This fixed point is dependent on the characteristics of $P_M(\tau)$ and $P_F(\tau)$. Specifically, the probabilities are defined using cumulative distribution functions of arbitrary distributions; however, the CDF is defined on $[0,1]$, and so we may use an **inverse sample transform** to a uniform distribution to treat the resulting fixed point as a uniformly distributed random variable.2012-11-05
  • 0
    Yes, it depends on how many densities from the defined set of densities which pass through $\theta$ pass through, say, $(0.2,0.4)$ compared to, say, $(0.2,0.45)$. I only know that the density which I choose is chosen uniformly. But I don't know how many of them eventually pass through, for example, $(0.2,0.45)$. I mean, I can choose the densities from the set uniformly, but it might be the case that they pass more often through the point $(0.2,0.45)$ than through $(0.2,0.40)$. Assume all these are in the butterfly.2012-11-05
  • 0
    Why? Because there are infinitely many densities which can pass through $(0.2,0.40)$ as well as $(0.2,0.45)$. Can you prove that both infinities have the same cardinality, especially given that the curve structure is convex?2012-11-05
  • 0
    It has nothing to do with cardinality or infinities. Any ROC curve will pass through the butterfly if and only if its fixed point is less than $\theta$. This is guaranteed by continuity and monotonicity of the ROC curve. It doesn't matter how many densities pass through $(.2,.4)$ or whatever. All that matters is that the fixed point is less than $\theta$. By talking about infinities and cardinalities you are greatly and unnecessarily over-complicating the problem. Every ROC curve has a fixed point. We need only focus on the distribution of that fixed point.2012-11-05
  • 0
    Yes, we are somehow calculating the probability of a probability. $P_F$ is a probability and we need to check its occurrence. The problem doesn't directly say that $P_F$ is uniformly distributed. When we calculate the probability of an event, we only check its occurrence. We randomly choose a density, and this choice is uniform. To relate this to $P_F$, we need to see that out of N random choices of pairs of densities, K have a point in the green region. And this ratio is $\theta$. I don't want to overcomplicate the problem. The occurrence of $P_F$ depends on the number of pairs of densities.2012-11-05
  • 0
    "The occurrence of $P_F$ depends on the number of pairs of densities." No, it doesn't. $P_F$ is a cumulative distribution function evaluated at a point, returning a value in the interval $[0,1]$. The problem as posed involves computing the probability of this value being less than a certain threshold. We then treat $P_F$ as a random variable by using an inverse sample transform, regardless of its density. I am voting now to close this question, as it has promoted unnecessary discussion.2012-11-05
  • 0
    @SeyhmusGüngören This analysis is with respect to some pre-determined value of $\theta$; if you wish to let $\theta$ vary randomly as well, then you can choose to compute the joint probability with any arbitrary distribution on $\theta$. However, the probability of a randomly chosen ROC falling within the butterfly region **with respect to some pre-determined ROC** can be taken to be uniform. The problem then reduces to computing the joint distribution of fixed points. The specific value is unanswerable, as you must provide further information to determine that value.2012-11-05