I am trying to show the following result.
Let $X_1, \ldots, X_n$ be independent random variables with the common density $f$ and distribution function $F$. If $X$ is the smallest and $Y$ the largest among them, then for $y > x$ the joint density of the pair $(X, Y)$ is given by $n(n-1)f(x)f(y)[F(y)-F(x)]^{n-2}$.
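A quick simulation can sanity-check the claimed formula before proving it. The sketch below assumes the special case where the $X_k$ are Uniform$(0,1)$, so that $f = 1$ and $F(t) = t$ on $(0,1)$. The helper names and the test rectangle $[0.1, 0.3] \times [0.6, 0.9]$ are arbitrary illustrative choices. The code compares two quantities: the integral of $n(n-1)f(x)f(y)[F(y)-F(x)]^{n-2}$ over the rectangle, and the observed frequency with which (min, max) lands in it.

```python
import random


def joint_density_uniform(x, y, n):
    """Claimed density n(n-1) f(x) f(y) [F(y) - F(x)]^(n-2) of (min, max)
    for n iid Uniform(0, 1) variables, where f = 1 and F(t) = t on (0, 1)."""
    if not (0.0 < x < y < 1.0):
        return 0.0
    return n * (n - 1) * (y - x) ** (n - 2)


def integrate_density(a1, a2, b1, b2, n, steps=400):
    """Midpoint-rule integral of the claimed density over [a1, a2] x [b1, b2]."""
    hx = (a2 - a1) / steps
    hy = (b2 - b1) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * hx
        for j in range(steps):
            y = b1 + (j + 0.5) * hy
            total += joint_density_uniform(x, y, n)
    return total * hx * hy


def simulate(a1, a2, b1, b2, n, trials=100_000, seed=0):
    """Fraction of simulated samples whose (min, max) lands in the rectangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        lo, hi = min(sample), max(sample)
        if a1 <= lo <= a2 and b1 <= hi <= b2:
            hits += 1
    return hits / trials


n = 5
exact = integrate_density(0.1, 0.3, 0.6, 0.9, n)
mc = simulate(0.1, 0.3, 0.6, 0.9, n)
print(f"integral of claimed density: {exact:.4f}")
print(f"simulated frequency:         {mc:.4f}")
```

With the fixed seed, the two printed numbers should agree to roughly two decimal places. For this particular rectangle, integrating $n(n-1)(y-x)^{n-2}$ in closed form gives $0.8^5 + 0.3^5 - 0.6^5 - 0.5^5 \approx 0.2211$.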
Some thoughts towards a partial solution
Attempt 1:
Given that they all share the same density, the joint density can be calculated as $f_{X,Y}(x,y) = f_{Y\mid X}(y\mid x)\, f(x) = f_{X\mid Y}(x\mid y)\, f(y)$.
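For reference, the factorization above needs the marginal density of the minimum, which is not $f$ itself. Assuming $F$ is differentiable with $F' = f$, note that $X > x$ holds exactly when every $X_k > x$. This gives

$$
P(X > x) = [1 - F(x)]^n
\quad\Longrightarrow\quad
f_X(x) = n f(x)\,[1 - F(x)]^{n-1},
\qquad
f_{X,Y}(x,y) = f_{Y\mid X}(y\mid x)\, f_X(x),
$$

and similarly $P(Y \le y) = F(y)^n$, so $f_Y(y) = n f(y)\,F(y)^{n-1}$.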
So we can choose which variable is the minimum in $n$ ways, and, having fixed it, we can choose which of the remaining variables is the maximum in $C_{1}^{n-1} = n-1$ ways. This explains the $n(n-1)f(x)f(y)$ part. However, I am unsure why the difference of the distribution functions, $F(y)-F(x)$, appears raised to the power $n-2$. I know there are still $n-2$ variables to account for and that they are being integrated out, so we should have $n-2$ factors. But why the difference?
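The missing factor comes from what the remaining $n-2$ variables must do. If the minimum sits at $x$ and the maximum at $y$, every other $X_k$ must fall between them, and each does so independently with probability $F(y) - F(x)$. Heuristically, with infinitesimal $dx, dy$:

$$
P(X \in dx,\ Y \in dy) \approx \underbrace{n(n-1)}_{\text{which index is min, which is max}}\; f(x)\,dx\; f(y)\,dy\; \underbrace{[F(y)-F(x)]^{n-2}}_{\text{other } n-2 \text{ lie in } (x,\,y]}.
$$

A more careful route uses the event that all $n$ variables lie in $(x, y]$, assuming $F$ is differentiable with $F' = f$. Since $P(X \le x,\ Y \le y) = F(y)^n - P(X > x,\ Y \le y)$ and the first term has zero mixed partial derivative, we get

$$
P(X > x,\ Y \le y) = [F(y)-F(x)]^n \quad (x < y),
\qquad
f_{X,Y}(x,y) = -\frac{\partial^2}{\partial x\,\partial y}\,[F(y)-F(x)]^n = n(n-1)\, f(x)\, f(y)\,[F(y)-F(x)]^{n-2}.
$$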
Attempt 2: When the $X_k$ are uniformly distributed on $(0,1)$, the sample space corresponding to $X_1, \ldots, X_n$ is the $n$-dimensional hypercube $\Gamma$ defined by $0 \le x_k \le 1$, and probabilities equal $n$-dimensional volume. Write $X_{(1)} \le \cdots \le X_{(n)}$ for the ordered values. The natural sample space with the $X_{(k)}$ as coordinate variables is then the subset $\Omega$ of $\Gamma$ containing all points with $x_1 \le \cdots \le x_n$. The hypercube contains $n!$ congruent replicas of $\Omega$, and in each one the ordered $n$-tuple $(X_{(1)}, \ldots, X_{(n)})$ coincides with a fixed permutation of $X_1, \ldots, X_n$.
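In the uniform case the hypercube picture does lead to the result. On $\Omega$ the ordered sample has density $n!$, since each of the $n!$ replicas carries probability $1/n!$. Fix $x_1 = x$ and $x_n = y$. The $n-2$ middle coordinates then range over the region $x < x_2 < \cdots < x_{n-1} < y$, which has volume $(y-x)^{n-2}/(n-2)!$. Integrating them out gives

$$
f_{X,Y}(x,y) = n!\,\frac{(y-x)^{n-2}}{(n-2)!} = n(n-1)\,(y-x)^{n-2}, \qquad 0 \le x < y \le 1.
$$

For a general $F$, assume $F$ is continuous and strictly increasing with $F' = f$. Then each $F(X_k)$ is uniform on $(0,1)$, and $F$ preserves which variable is smallest and largest. So the substitution $u = F(x)$, $v = F(y)$ applies, and its Jacobian contributes the factor $f(x)f(y)$. This turns the uniform result into $n(n-1)f(x)f(y)[F(y)-F(x)]^{n-2}$.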
I am not sure I am getting anywhere with these thoughts. Any help would be much appreciated.