I am trying to implement James McKee's speed-up of Fermat's factoring algorithm described at http://www.ams.org/journals/mcom/1999-68-228/S0025-5718-99-01133-3/home.html. The algorithm factors semi-primes in time O(n^{1/4}) as follows:
Find the factors of composite N that has no factors less than 2N^{1/4} (use trial division to assure this is so). Define b=ceil(sqrt(N)) and Q(x,y)=(x+by)^2-Ny^2. 1. Compute modular square root. Choose a prime p greater than 2N^{1/4} and compute the solutions to Q(x,1)==0 (mod p^2). There are at most two solutions which we denote x_0 and x_1. 2. Find square. For x_i in [x_0,x_1]: Set x=x_i and y=1. While Q(x,y) is not a square, set r=ceil(p^2/x), x=xr-p^2 and y=r. Abort the loop, go back to Step 1 and choose a different prime when y exceeds a given threshold y_{max} in the order of N^{1/4}. If Q(x,y) is a square, compute gcd(x+by-sqrt(Q(x,y)), N). If the gcd is between 1 and N it is a non-trivial (not necessarily prime) factor of N; return it and quit. Otherwise, go back to Step 1 and choose a different prime. Abort the algorithm when p reaches a given bound.
I am having trouble with the step "compute the solutions to Q(x,1)==0 (mod p^2)." I need to find the square root of N modulo p^2, but the Tonelli-Shanks algorithm requires the modulus to be an odd prime, so it fails.
I know it is possible to make that calculation. When trying to factor 13290059=3119*4261, I calculate N^{1/4}=60, the smallest prime greater than 2N^{1/4}=127, and the square roots of 13290059 mod 127^2 are 2020 and 14109, which I found at Wolfram|Alpha using PowModList[13290059,1/2,16129].
Is there any way to find the modular square root with a non-prime modulus short of trying every integer from 0 to p^2-1?