
Suppose I have a square, let's say the sides have length 1. I will then partition the square into $N^2$ sub-squares, where $N \in \mathbb{N}$ and the sub-squares are all the same size. Now, suppose we place two points $A$ and $B$ randomly within the large square a distance of $D$ apart from each other, and then draw the line segment $AB$. The question is, what is the expected number of small squares that $AB$ will pass through?

I have no idea how to solve this analytically, and was thinking of attempting to get a sense using simulation, but was curious what I could learn about it here. (This isn't a homework question; it's related to a problem I'm working on in economics research.)
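Since simulation came up, here is a minimal Python sketch of one possible setup. The selection scheme is an assumption made for concreteness ($A$ uniform in the square, direction uniform, resampling until $B$ also lands inside), and the cell count uses the fact that a segment in general position crosses one new cell per grid line crossed:

```python
import math
import random

def expected_cells(N, D, trials=20000, seed=0):
    """Monte Carlo estimate of the expected number of sub-squares
    (out of N x N) crossed by a segment of length D in the unit square.

    Sampling scheme (an assumption -- one of several possible choices):
    A uniform in the square, direction uniform on [0, 2*pi),
    resampling until B also lands inside the square."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while True:
            ax, ay = rng.random(), rng.random()
            t = rng.uniform(0.0, 2.0 * math.pi)
            bx, by = ax + D * math.cos(t), ay + D * math.sin(t)
            if 0.0 <= bx <= 1.0 and 0.0 <= by <= 1.0:
                break
        # For a segment in general position, cells crossed
        # = |row difference| + |column difference| + 1.
        total += (abs(int(ax * N) - int(bx * N))
                  + abs(int(ay * N) - int(by * N)) + 1)
    return total / trials
```

Note that the resampling step restricts the direction near the boundary, so for large $D$ the answer depends on exactly how "randomly a distance $D$ apart" is made precise.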

  • Please edit the question to clarify it. Also, you might have to mention how exactly you are selecting the points, as the method of selection could potentially change the answer. (2012-03-23)

4 Answers

3

I broadly agree with @Byron Schmuland's answer, which seems correct, but I have some additional suggestions.

For simplicity, I take the side of a small square to be $1$, so that $D$ is measured in units of small-square sides.

If the question is to place $A$ and $B$ "randomly" a given distance $D$ apart, this may fall foul of Bertrand's paradox: this notion of randomness is not, by itself, defined precisely enough.

However, it seems to make sense to place A first, suggesting we should place B on an arc with centre A and radius D. The arc will be restricted depending on how close it is to the edge of the square.

Let's imagine that we choose the point A, and then the angle $\theta$ is chosen randomly. Each direction $\theta$ is equally likely (by assumption here and by symmetry of the square).

Then the number of squares passed through will be $D|\sin(\theta)|+D|\cos(\theta)|+1$, from Byron's count: using a right-angled triangle, the segment's horizontal and vertical projections are $D|\cos(\theta)|$ and $D|\sin(\theta)|$, and these give the column and row differences.

$E[D|\sin(\theta)|+D|\cos(\theta)|+1] = D \cdot E[|\sin(\theta)|+|\cos(\theta)|] + 1 = \frac{4D}{\pi} + 1$
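The $4/\pi$ factor is easy to verify numerically; for instance, a quick Riemann-sum check in Python:

```python
import math

# Riemann sum for E[|sin t| + |cos t|] with t uniform on [0, 2*pi);
# the claimed value is 4/pi ~ 1.2732.
n = 200_000
mean_abs = sum(
    abs(math.sin(2 * math.pi * k / n)) + abs(math.cos(2 * math.pi * k / n))
    for k in range(n)
) / n
```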

Or, taking the side of a little square to be $1/N$ (as in the question as stated), this becomes $\frac{4ND}{\pi} + 1$.

In the case where $N = 2$ ($2$ little squares per side) and $D = 0.5$ ($D$ is half the length of a long side), we expect to pass through about $2.27$ squares. That seems right: sometimes the segment passes through only $1$ square, but more often $3$.

I imagine this will break down badly when $D$ becomes large compared to the side of the big square (roughly $D > 1$?), since the segment then no longer has the option of lying near-horizontal or near-vertical.

  • Thanks :) I think the question would benefit a lot from giving the hinted-at economics context. That would make the question a lot clearer. (2012-04-01)
12

Imagine the subdivided square as an $N\times N$ chessboard of subsquares. Every point on the board belongs to one subsquare with coordinates $(r,c)$, where $r$ (the row index) ranges from $1$ to $N$, and similarly for $c$ (the column index).

Now the number of subsquares that the line segment $\overline{AB}$ between two uniformly selected points $A$ and $B$ touches equals $d(A,B)=|A(r)-B(r)|+|A(c)-B(c)|+1$, where $A(r)$ denotes the row index of $A$, and similarly for the other notation. The points $A$ and $B$ are almost surely in "general position" because of the randomness: the segment doesn't hit any grid corners, so it always passes through this maximum number of subsquares.

Since the random variables $A(r),B(r),A(c),B(c)$ are independent and uniformly distributed on $\{1,\dots, N\}$, it is not hard to calculate that the expected number is $\mathbb{E}(d(A,B))=1+{2(N^2-1)\over3N}.$
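This closed form can be double-checked by brute-force enumeration over the $N^2$ equally likely pairs of row indices (the column indices are i.i.d. copies); a small Python sketch:

```python
from fractions import Fraction

def expected_touched(N):
    """Exact E[|A(r)-B(r)| + |A(c)-B(c)| + 1] by enumerating the N^2
    equally likely pairs of row indices (column indices are i.i.d.)."""
    e_abs = Fraction(
        sum(abs(i - j) for i in range(1, N + 1) for j in range(1, N + 1)),
        N * N,
    )
    return 1 + 2 * e_abs

# Matches 1 + 2*(N^2 - 1)/(3*N) for every N checked.
for N in range(1, 30):
    assert expected_touched(N) == 1 + Fraction(2 * (N * N - 1), 3 * N)
```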

  • @itzy Yes, I could only solve the simplest case of this problem. (2012-03-23)
4

Suppose that you divide your square into $N^2$ sub-squares, where $N\in\mathbb{N}$, and that $A=(x_{1},y_{1})$ and $B=(x_{2},y_{2})$, where $x_{1},x_{2},y_{1}$ and $y_{2}$ are real numbers in $[0,1]$. Then the segment $AB$ must pass through at least $N|y_{2}-y_{1}|$ squares in the $y$ (or vertical) direction and at least $N|x_{2}-x_{1}|$ squares in the $x$ (or horizontal) direction. So we would expect the segment $AB$ to pass through $N|x_{2}-x_{1}|+ N|y_{2}-y_{1}|$ squares.
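Since $E|x_1-x_2| = 1/3$ for independent uniforms, this count has expectation $2N/3$. Comparing it numerically with the exact grid count $1+2(N^2-1)/(3N)$ from the chessboard answer shows it runs short by roughly $1$; a Python sketch (seed and sample size are arbitrary):

```python
import random

# Monte Carlo mean of N|x1 - x2| + N|y1 - y2|, versus the exact
# grid count from the chessboard answer, 1 + 2*(N^2 - 1)/(3*N).
rng = random.Random(1)
N, trials = 10, 200_000
est = sum(
    N * abs(rng.random() - rng.random()) + N * abs(rng.random() - rng.random())
    for _ in range(trials)
) / trials
exact = 1 + 2 * (N * N - 1) / (3 * N)
# est is close to 2*N/3, and exact - est is close to 1 for large N.
```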

  • It ends up that this answer is an undercount by 1, because we should include the squares at the end and beginning of each direction (contrary to usual counting). The $x$ and $y$ directions are essentially independent, because you can either pass _up_ into the next square or _left_ into the next square, so there isn't really a risk of double-counting. However, we are assuming that the line doesn't pass exactly through a corner, which would complicate things a bit. Which is fine, because it almost surely _won't_ pass through a corner. (2012-03-31)
4

The number of cells touched by the line is (following Byron's answer) $N (|X_1-X_2| + |Y_1-Y_2|) + 1$. Calling $X =|X_1 - X_2|$ and $Y = |Y_1 - Y_2|$ and leaving aside the factor $N$, this is, for large $N$, approximately $d_M=X+Y$, the Manhattan distance.

Hence, I'll attack the following problem: we throw two points at random on the unit square, and we want to compute $E(d_M \mid d)$,

where $d=\sqrt{X^2+Y^2}$ is the Euclidean distance.

The expected number of touched cells, in this approximation, would be $N E(d_M | d) + 1$.

It's not difficult to see that $X$ (the absolute value of the difference of two uniform random variables) has a triangular density: $f_X(x)=2(1-x)$. The same holds for $Y$, and the two are independent, hence $f_{X,Y}(x,y)=4(1-x)(1-y)$ on the unit square.
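As a quick sanity check of the triangular density: $2(1-x)$ gives $P(X \le 1/2) = \int_0^{1/2} 2(1-x)\,dx = 3/4$, which simulation confirms (a Python sketch, arbitrary seed):

```python
import random

# Sanity check: if X = |U1 - U2| has density 2(1 - x) on [0, 1],
# then P(X <= 1/2) = integral of 2(1 - x) from 0 to 1/2 = 3/4.
rng = random.Random(2)
n = 100_000
hits = sum(abs(rng.random() - rng.random()) <= 0.5 for _ in range(n))
frac = hits / n
```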

UPDATE: Below I found a much simpler solution

--------------- begin ignore -----

Conditioning on a fixed value of $d$ implies that we must restrict to an arc, the intersection of $x^2+y^2=d^2$ and the unit square.

Let's assume first that $d<1$

Note that $d\le d_M \le \sqrt{2} d$

To compute $P(d_M \le \xi \mid d)$, we must integrate over two pieces of the arc beginning on the axes (because of symmetry, we compute just one and multiply by two; the blue arcs in the figure). The first limit point is given by $x_1=\frac{\xi - \sqrt{2 d^2-\xi^2}}{2}$.

(figure: the arc $x^2+y^2=d^2$ inside the unit square, with the two blue sub-arcs where $d_M \le \xi$)

Further, to change the integration over the arc into an integral over $x$, we note that on the arc $ds=\sqrt{1+\frac{x^2}{y^2}}\,dx = \frac{d}{y}\,dx$. So, leaving out the constants (which cancel in the division), the cumulative distribution function (conditioned on $d$) is given by

$F_{d_M|d}(\xi;d)=P(d_M \le \xi | d) = \frac{2 \int_{0}^{x_1}\left( 1-x\right) \,\left( \frac{1}{\sqrt{{d}^{2}-{x}^{2}}}-1\right) dx}{\int_{0}^{d}\left( 1-x\right) \,\left( \frac{1}{\sqrt{{d}^{2}-{x}^{2}}}-1\right) dx}$

Finally, we must integrate to get the expected value

$E(d_M |d) = d+ \int_{d}^{\sqrt{2} d} \left[1- F_{d_M|d}(\xi;d) \right] \; d\xi $

This is all... but it seems pretty complicated to get in closed form. Let's see Maxima:

    assume(x1 > 0); assume(d > 0); assume(d > x1);
    integrate((1-x)*(1/sqrt(d^2 - x^2) - 1), x, 0, x1);

$\frac{2\,\mathrm{asin}\left( \frac{x1}{d}\right) +2\,\sqrt{{d}^{2}-{x1}^{2}}+{x1}^{2}-2\,x1-2\,d}{2}$

    ix2(x1, d) := (2*asin(x1/d) + 2*sqrt(d^2 - x1^2) + x1^2 - 2*x1 - 2*d)/2;
    x1(dm, d) := (dm - sqrt(2*d^2 - dm^2))/2;
    assume(dm > d); assume(dm < sqrt(2)*d);
    /* the post is truncated here; the conditional CDF,
       reconstructed from the formula above: */
    fdist(dm, d) := 2*ix2(x1(dm, d), d) / ix2(d, d);

This gets a little messy. Let's try at least some numerical values:

    dd: 0.8;
    dd + quad_qags(1 - fdist(dm, dd), dm, dd, sqrt(2)*dd);

1.01798

Let's simulate to check: Matlab/Octave:

    N = 300000;
    t = rand(N, 4);
    xy = [abs(t(:,1)-t(:,3)), abs(t(:,2)-t(:,4))];
    t = [];
    d2 = xy(:,1).^2 + xy(:,2).^2;
    d = sqrt(d2);
    dm = xy(:,1) + xy(:,2);
    step = 0.02;
    dround = round(d/step)*step;
    % size(dround(dround == 0.8))
    mean(dm(dround == 0.8))

    >>> ans = 1.0174

Not bad, I'd say. Some other values:

      d    Maxima     Octave (simulation)
     0.9   1.15847    1.1569
     0.8   1.01798    1.0174
     0.7   0.88740    0.8863
     0.6   0.75983    0.7579
     0.5   0.63328    0.6331
     0.4   0.50698    0.5063
     0.3   0.38062    0.3808
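For reference, the Octave experiment ports directly to Python (seed, sample size, and bin width are arbitrary choices):

```python
import math
import random

# Same experiment as the Octave snippet: throw pairs of uniform
# points, keep those whose euclidean distance falls in a narrow
# bin around 0.8, and average the Manhattan distance.
rng = random.Random(3)
n, target, half_bin = 300_000, 0.8, 0.01
acc, cnt = 0.0, 0
for _ in range(n):
    x = abs(rng.random() - rng.random())
    y = abs(rng.random() - rng.random())
    if abs(math.hypot(x, y) - target) <= half_bin:
        acc += x + y  # Manhattan distance of this pair
        cnt += 1
mean_dm = acc / cnt
```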

The case $d>1$ should be solved in a similar way.

--------------- end ignore -----

A much simpler solution:

Let's change to polar coordinates: $x = r \cos(t)$, $y = r \sin(t)$. The joint density is

$f_{r,t}(r,t) = 4 r (1- r \cos(t))(1- r \sin(t))$

in the domain $0 \le r \le \sqrt{2}$, with $0 \le t \le \pi/2$ for $r\le 1$ and $\arccos(1/r) \le t \le \arcsin(1/r)$ for $r > 1$.

The conditional density is then

$f_{t|r}(t|r) = \frac{1}{g(r)} (1- r \cos(t))(1- r \sin(t))$

where $g(r)$ is the normalization constant (independent of $t$).

Assuming first that $r<1$, then

$g(r) = \int_0^{\pi/2} (1- r \cos(t))(1- r \sin(t)) dt = \frac{{r}^{2}}{2}-2\,r+\frac{\pi }{2}$

Further, $d_M= x + y= r (\cos(t)+\sin(t))$, so the conditional expectation is given by

$E[d_M | r] = r \frac{1}{g(r)} \int_0^{\pi/2} (1- r \cos(t))(1- r \sin(t)) (\cos(t)+\sin(t)) dt$

which gives

$ E[d_M | r] = r \frac{4\,{r}^{2}-3\,\left( \pi +2\right) \,r+12}{3\,{r}^{2}-12\,r+3\,\pi }$
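This closed form reproduces the Maxima/Octave table above; a quick Python check (the function name is just illustrative):

```python
import math

def e_dm_given_r(r):
    """Closed-form E[d_M | r] for 0 < r < 1."""
    return r * (4 * r * r - 3 * (math.pi + 2) * r + 12) / (
        3 * r * r - 12 * r + 3 * math.pi
    )
```

For instance, it gives $\approx 1.01798$ at $r = 0.8$, matching the numeric table, and $E[d_M \mid r]/r \to 4/\pi$ as $r \to 0$.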

For $r>1$ it's more complicated. The result is

$ E[d_M | r] = r \frac{\sqrt{{r}^{2}-1}\,\left( 4\,{r}^{2}+8\right) +\left( 6\,\mathrm{asin}\left( \frac{1}{r}\right) -6\,\mathrm{acos}\left( \frac{1}{r}\right) -6\right) \,{r}^{2}-4}{3\,{r}^{3}-12\,r\,\sqrt{{r}^{2}-1}-\left( 6\,\mathrm{asin}\left( \frac{1}{r}\right) -6\,\mathrm{acos}\left( \frac{1}{r}\right) -6\right) \,r} $
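As a consistency check, the two branches agree at $r=1$ (both give $\approx 1.354$); a Python sketch with illustrative names:

```python
import math

def e_lo(r):
    # E[d_M | r] branch for r <= 1
    return r * (4 * r * r - 3 * (math.pi + 2) * r + 12) / (
        3 * r * r - 12 * r + 3 * math.pi
    )

def e_hi(r):
    # E[d_M | r] branch for 1 <= r < sqrt(2)
    s = math.sqrt(r * r - 1)
    a = 6 * math.asin(1 / r) - 6 * math.acos(1 / r) - 6
    num = s * (4 * r * r + 8) + a * r * r - 4
    den = 3 * r ** 3 - 12 * r * s - a * r
    return r * num / den
```

Note that numerator and denominator of the $r>1$ branch both vanish as $r \to \sqrt{2}$ (the arc degenerates to the far corner), so the expression is numerically delicate near that endpoint.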

The figure shows $E[d_M \mid r] / r$, i.e. the factor by which the expected Manhattan distance exceeds the Euclidean distance (we already knew this must lie in the range $[1,\sqrt{2}]$).

(figure: plot of $E[d_M \mid r]/r$ as a function of $r$)

(BTW: sorry if the formatting is not optimal, but I got sick of Chrome crashing while I was editing, many times, sometimes losing my changes - am I the only one?)

Added: notice that for small distances ($r \to 0$) the expected number of squares comes out to $\frac{4}{\pi} r N + 1$, which agrees with Ronald's answer.