
I was wondering whether anyone knows about the mathematics involved in CAPTCHA solvers. I'm not interested in spamming; I just think it would be interesting to see how letters can be classified. I'm certain homology is used to detect the holes in a letter. I also figure that a line counts as connected only if at least a certain number of pixels are next to each other.
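To make the "holes via homology" idea concrete: for a binary image, $b_0$ (connected pieces of ink) and $b_1$ (holes) can be computed with plain flood fill. The sketch below is illustrative only, not how any particular CAPTCHA solver works. It assumes a letter is given as a list of strings with `#` for ink, and the function names are hypothetical. It uses the standard digital-topology convention of 8-connectivity for ink paired with 4-connectivity for background, so a diagonal chain of pixels cannot both close and fail to close a loop. The `min_ink` parameter is a loose stand-in for the "at least a certain number of pixels" rule from the question.

```python
from collections import deque

# 4- and 8-neighbourhoods on the pixel grid.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def component_sizes(grid, value, neighbours):
    """Sizes of the connected components of cells equal to `value` (BFS flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def betti_numbers(image, min_ink=1):
    """Return (b0, b1) of a binary image given as strings, '#' = ink.

    b0 counts 8-connected ink blobs with at least `min_ink` pixels (smaller
    specks are treated as noise); b1 counts the 4-connected background
    regions other than the outside, i.e. the holes.
    """
    width = max((len(row) for row in image), default=0)
    blank = [0] * (width + 2)
    # Pad with a one-pixel background frame so the outside is a single region.
    grid = [blank] + [[0] + [1 if ch == '#' else 0 for ch in row.ljust(width)] + [0]
                      for row in image] + [blank]
    b0 = sum(1 for s in component_sizes(grid, 1, N8) if s >= min_ink)
    b1 = len(component_sizes(grid, 0, N4)) - 1
    return b0, b1
```

For example, a 5×5 "O" (`[" ### ", "#   #", "#   #", "#   #", " ### "]`) gives `(1, 1)`, and a "B" drawn the same way gives `(1, 2)`. Real CAPTCHAs add blur, noise and broken strokes, so these invariants alone would be fragile.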

  • You may be better off asking on another forum. I get the impression [computer vision](http://en.wikipedia.org/wiki/Computer_vision) uses general machine learning techniques more than algebraic topology, really... (2012-08-20)
  • Hmm... maybe some neural network thingie, like the ones that read zip codes off envelopes? (2012-08-20)
  • I agree with Zhen Lin. Notice that CAPTCHAs tend to be blurred and contain a lot of noise and chopped-up letters. I *really* think this belongs on cs.stackexchange.com (rather than here, anyhow). I know next to nothing about algebraic topology, but given the multitude of possible fonts and distortions, I think it would be near-impossible to develop an algebraic tool to crack any real CAPTCHA. (2012-08-20)


Gunnar Carlsson gave a talk many years ago about using algebraic topology to recognize handwriting, specifically handwriting in one of the Indian languages. I'm not sure that anything on this was published, though. My recollection is that they used the notion of "soft tangent bundles" to distinguish two letters that had been indistinguishable by other methods. This was presented as a refinement of the idea of using local homology groups (i.e. considering the homology of deleted neighborhoods of points on the letter).
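As a rough illustration of the local-homology idea (not the method from the talk, which isn't described here), one can look at a small square ring around a pixel, a discrete stand-in for the deleted neighborhood, and count how many separate pieces of ink cross it: 1 means an endpoint, 2 an ordinary point on a stroke, 3 or more a junction. The sketch assumes one-pixel-wide strokes (thick strokes would need thinning first), a letter given as strings with `#` for ink, and hypothetical function names.

```python
def ring_offsets(r):
    """Offsets of the square ring at Chebyshev distance r, in clockwise order."""
    top = [(-r, dx) for dx in range(-r, r)]
    right = [(dy, r) for dy in range(-r, r)]
    bottom = [(r, dx) for dx in range(r, -r, -1)]
    left = [(dy, -r) for dy in range(r, -r, -1)]
    return top + right + bottom + left


def branch_count(image, row, col, r=1):
    """Number of ink components met by the ring of radius r around (row, col).

    Counts runs of consecutive ink cells along the cyclic ring; pixels outside
    the image count as background. A fully inked ring is reported as one piece.
    """
    def ink(y, x):
        return 0 <= y < len(image) and 0 <= x < len(image[y]) and image[y][x] == '#'

    ring = [ink(row + dy, col + dx) for dy, dx in ring_offsets(r)]
    if all(ring):
        return 1
    # A run starts wherever an ink cell follows a background cell (cyclically).
    return sum(1 for i in range(len(ring)) if ring[i] and not ring[i - 1])


def classify_point(image, row, col, r=1):
    """Label a pixel by how many branches leave its (discrete) deleted neighborhood."""
    n = branch_count(image, row, col, r)
    return {0: "isolated", 1: "endpoint", 2: "curve"}.get(n, "junction")
```

On a thin "T", the pixel where the stem meets the bar is classified as a `"junction"`, the bottom of the stem and the ends of the bar as `"endpoint"`, and pixels in between as `"curve"`. Summarizing a letter by its counts of endpoints and junctions already separates many letters that have the same number of holes.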

Googling the phrase "soft tangent bundle" didn't give any hits, so maybe I should say something about what it means. My memory is a bit fuzzy on this, but I think the idea is to view the letter as a curve (or a collection of curves) and to consider vectors that are "almost" tangent to it, meaning that the angle they form with the actual tangent is small.

Unfortunately, I have no recollection of what sort of geometry this detected (so as to distinguish the letters) and I don't seem to have notes from the talk...
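One possible reading of "almost tangent", offered only as a sketch and not as the construction in the papers referenced below: estimate the unoriented tangent direction at a sample point from the principal axis of its nearby points, then accept any vector whose direction is within a tolerance `eps` of it. The sampling radius, `eps`, and all function names are assumptions for illustration.

```python
import math


def tangent_angle(points, index, radius):
    """Estimate the unoriented tangent direction (radians, in [0, pi)) at
    points[index] from the principal axis of its neighbours within `radius`."""
    px, py = points[index]
    nbrs = [(x, y) for x, y in points if math.hypot(x - px, y - py) <= radius]
    mx = sum(x for x, _ in nbrs) / len(nbrs)
    my = sum(y for _, y in nbrs) / len(nbrs)
    sxx = sum((x - mx) ** 2 for x, _ in nbrs)
    syy = sum((y - my) ** 2 for _, y in nbrs)
    sxy = sum((x - mx) * (y - my) for x, y in nbrs)
    # Angle of the major axis of the 2x2 covariance matrix.
    return (0.5 * math.atan2(2 * sxy, sxx - syy)) % math.pi


def angle_gap(a, b):
    """Distance between two unoriented directions, in [0, pi/2]."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def is_almost_tangent(vector, points, index, radius, eps):
    """True if `vector` makes an angle smaller than `eps` with the estimated tangent."""
    vx, vy = vector
    direction = math.atan2(vy, vx) % math.pi
    return angle_gap(direction, tangent_angle(points, index, radius)) < eps
```

For 64 evenly spaced points on the unit circle, the estimated tangent at (1, 0) is vertical, so with `radius=0.3` and `eps=0.1` the vector (0.05, 1) counts as almost tangent while (1, 0) does not. Collecting the pairs (point, direction) that pass this test gives a "thickened" version of the tangent directions along the letter.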

  • Gunnar gave me the references. There are four papers by G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas, all available here: http://comptop.stanford.edu/preprints/ (2012-08-21)
  • I haven't looked carefully at the references above to see whether my description is accurate. However, the phrase "soft tangent bundle" is definitely not used (they do talk about a "tangent complex", though). The above post is community wiki, so anyone who looks at the references above should feel free to correct my vague memory of this topic. I may eventually do so myself. (2012-08-21)