CAPTCHA solving, Tianhui Cai, Period 3


1 CAPTCHA solving
Tianhui Cai, Period 3

2 CAPTCHAs
- Completely Automated Public Turing tests to tell Computers and Humans Apart
- Is the user a human or a machine?
- Prevents spam on registration pages
- Audio and visual
- Visual – contains noise, distortions
  - rotation
  - translation
  - scaling
  - noise
  - warp

3 Goal
- Solve a CAPTCHA, pretend to be a human
- Read the image – figure out what it says
- This has been done before.
- Show weaknesses of visual CAPTCHAs

4 Procedure
- Acquire image (from internet)
- Remove background clutter
- Segmentation (separating letters)
- Generate training/testing data set
- Letter identification (next section)

5 Procedure – cont'd
- Train on image data
- Test
- Review/Analyze

6 Implementation
- Java / Ruby
- Acquire images – captchas.net
  - formula to get actual text from image
- Remove background clutter – median filter, etc.
- Segmentation – flood fill
- Letter identification – neural network
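The median filter step above can be sketched as follows. This is a minimal illustration and not the project's actual code: it assumes the image has already been read into a hypothetical `int[][]` grayscale array (0 = black, 255 = white) and uses a 3x3 window, a size the slides do not specify.

```java
import java.util.Arrays;

class MedianFilterSketch {
    // Returns a new image in which every pixel is the median of its 3x3 neighborhood.
    // Isolated specks (salt-and-pepper noise) disappear, while solid strokes survive.
    static int[][] medianFilter(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        // Clamp coordinates so border pixels reuse their nearest neighbors.
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        window[n++] = gray[yy][xx];
                    }
                }
                Arrays.sort(window);
                out[y][x] = window[4]; // middle of the 9 sorted values
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] img = new int[5][5];
        for (int[] row : img) Arrays.fill(row, 255);
        img[2][2] = 0; // a single dark speck on a white background
        System.out.println(medianFilter(img)[2][2]);
    }
}
```

A larger window removes bigger specks but also erodes thin letter strokes, so the window size is a trade-off to tune per CAPTCHA style.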

7 First quarter
- Three-layer backpropagation neural network
- Neural network – good for classification
  - Used often for image recognition
- Artificial neurons
  - convert input to output
- Backpropagation
  - used to let the neural network learn
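To make the three-layer idea concrete, here is a small sketch of an input/hidden/output network trained by backpropagation. The class name `TinyNet`, the sigmoid activation, the squared-error measure, and the weight initialization are all assumptions chosen for illustration; the slides do not say which choices the project made.

```java
import java.util.Random;

class TinyNet {
    final int nIn, nHid, nOut;
    final double[][] w1; // [nHid][nIn + 1], last column holds the bias
    final double[][] w2; // [nOut][nHid + 1], last column holds the bias
    final double[] hidden, output;

    TinyNet(int nIn, int nHid, int nOut, long seed) {
        this.nIn = nIn; this.nHid = nHid; this.nOut = nOut;
        Random rng = new Random(seed);
        w1 = new double[nHid][nIn + 1];
        w2 = new double[nOut][nHid + 1];
        // Small random starting weights in [-0.5, 0.5).
        for (double[] row : w1) for (int i = 0; i < row.length; i++) row[i] = rng.nextDouble() - 0.5;
        for (double[] row : w2) for (int i = 0; i < row.length; i++) row[i] = rng.nextDouble() - 0.5;
        hidden = new double[nHid];
        output = new double[nOut];
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    // Each artificial neuron converts a weighted sum of its inputs into an output in (0, 1).
    // The returned array is reused between calls.
    double[] forward(double[] x) {
        for (int j = 0; j < nHid; j++) {
            double z = w1[j][nIn];
            for (int i = 0; i < nIn; i++) z += w1[j][i] * x[i];
            hidden[j] = sigmoid(z);
        }
        for (int k = 0; k < nOut; k++) {
            double z = w2[k][nHid];
            for (int j = 0; j < nHid; j++) z += w2[k][j] * hidden[j];
            output[k] = sigmoid(z);
        }
        return output;
    }

    // One backpropagation step on a single example; returns the squared error before the update.
    double train(double[] x, double[] target, double rate) {
        forward(x);
        double err = 0;
        double[] dOut = new double[nOut];
        for (int k = 0; k < nOut; k++) {
            double e = target[k] - output[k];
            err += e * e;
            dOut[k] = e * output[k] * (1 - output[k]);
        }
        // Propagate the output deltas back to the hidden layer using the old weights.
        double[] dHid = new double[nHid];
        for (int j = 0; j < nHid; j++) {
            double s = 0;
            for (int k = 0; k < nOut; k++) s += dOut[k] * w2[k][j];
            dHid[j] = s * hidden[j] * (1 - hidden[j]);
        }
        for (int k = 0; k < nOut; k++) {
            for (int j = 0; j < nHid; j++) w2[k][j] += rate * dOut[k] * hidden[j];
            w2[k][nHid] += rate * dOut[k];
        }
        for (int j = 0; j < nHid; j++) {
            for (int i = 0; i < nIn; i++) w1[j][i] += rate * dHid[j] * x[i];
            w1[j][nIn] += rate * dHid[j];
        }
        return err;
    }

    public static void main(String[] args) {
        TinyNet net = new TinyNet(2, 3, 2, 42L);
        double[][] xs = {{1, 0}, {0, 1}};
        for (int epoch = 0; epoch < 2000; epoch++)
            for (double[] x : xs) net.train(x, x, 0.5); // toy task: reproduce the input
        System.out.println(java.util.Arrays.toString(net.forward(xs[0])));
    }
}
```

For letter recognition, the input vector would hold the pixel values of one segmented letter and the output layer would have one neuron per possible character, with the largest output taken as the prediction.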

8 Second quarter
- Image processing – Java ImageIO
- Noise removal
- Segmentation
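Flood-fill segmentation, the technique named on the implementation slide, might look like the sketch below. It assumes the cleaned image has been thresholded into a hypothetical `boolean[][]` array where `true` marks ink, and it treats each 4-connected blob as one letter. The names and the bounding-box return format are illustrative, not taken from the project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class FloodFillSegmenter {
    // Returns one bounding box {minX, minY, maxX, maxY} per 4-connected blob of ink,
    // ordered left to right so the boxes follow reading order.
    static List<int[]> segment(boolean[][] ink) {
        int h = ink.length, w = ink[0].length;
        boolean[][] seen = new boolean[h][w];
        List<int[]> boxes = new ArrayList<>();
        for (int y0 = 0; y0 < h; y0++) {
            for (int x0 = 0; x0 < w; x0++) {
                if (!ink[y0][x0] || seen[y0][x0]) continue;
                // Start a new fill from this unvisited ink pixel.
                int[] box = {x0, y0, x0, y0};
                ArrayDeque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[]{x0, y0});
                seen[y0][x0] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    int x = p[0], y = p[1];
                    box[0] = Math.min(box[0], x);
                    box[1] = Math.min(box[1], y);
                    box[2] = Math.max(box[2], x);
                    box[3] = Math.max(box[3], y);
                    int[][] nbrs = {{x + 1, y}, {x - 1, y}, {x, y + 1}, {x, y - 1}};
                    for (int[] n : nbrs) {
                        int nx = n[0], ny = n[1];
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h && ink[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            stack.push(n);
                        }
                    }
                }
                boxes.add(box);
            }
        }
        boxes.sort((a, b) -> Integer.compare(a[0], b[0]));
        return boxes;
    }

    public static void main(String[] args) {
        boolean[][] ink = new boolean[3][7];
        ink[0][0] = ink[1][0] = ink[1][1] = true; // first blob
        ink[1][4] = ink[2][4] = ink[2][5] = true; // second blob
        System.out.println(segment(ink).size() + " blobs found");
    }
}
```

An explicit stack is used instead of recursion so that large blobs cannot overflow the call stack. Letters that touch each other form a single blob, so this approach returns them as one box.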

9 Third quarter
- Neural network – added save / load
  - saved into a text file
  - a neural network can be trained multiple times
- Downloaded necessary images (Ruby)
  - captchas.net
  - filename is what the image says
- Analyzed image outputs from images
- Cropped and centered segmented letters
  - uniform letter size
  - centered around bounding box
  - uniformity is good for training
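The crop-and-center step could be sketched as below. The method name, the `{minX, minY, maxX, maxY}` box format, and the canvas dimensions are assumptions for illustration; the slides only say that letters were cropped to a uniform size and centered around their bounding box.

```java
class LetterNormalizer {
    // Copies the pixels inside box {minX, minY, maxX, maxY} onto the middle of a blank
    // canvasW x canvasH canvas, so every letter ends up the same size and centered.
    static boolean[][] cropAndCenter(boolean[][] ink, int[] box, int canvasW, int canvasH) {
        int bw = box[2] - box[0] + 1;
        int bh = box[3] - box[1] + 1;
        if (bw > canvasW || bh > canvasH) {
            throw new IllegalArgumentException("letter is larger than the canvas");
        }
        int offX = (canvasW - bw) / 2;
        int offY = (canvasH - bh) / 2;
        boolean[][] out = new boolean[canvasH][canvasW];
        for (int y = 0; y < bh; y++) {
            for (int x = 0; x < bw; x++) {
                out[offY + y][offX + x] = ink[box[1] + y][box[0] + x];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[][] ink = new boolean[4][4];
        ink[3][3] = true; // a one-pixel "letter" in the corner
        boolean[][] centered = cropAndCenter(ink, new int[]{3, 3, 3, 3}, 5, 5);
        System.out.println(centered[2][2]);
    }
}
```

Note that this copies everything inside the bounding box, so ink from a neighboring letter that overlaps the box would be copied too.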

10 Fourth quarter
- Train and test
- Problem: it didn't work
  - resized images to 11x9, and then it worked
- Gather data / analyze
  - learning rate
  - number of training iterations
  - Best around 92% success rate (when training data and testing data are separate)
  - Higher when testing data is part of training data
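One way to shrink a letter image to a fixed small size before feeding it to the network is sketched here. It assumes "11x9" means 11 pixels wide by 9 tall, which the slide does not state, and it uses box averaging (the fraction of ink in each cell), which is only one of several possible resizing methods.

```java
class Downsampler {
    // Shrinks a binary image to outW x outH and flattens it row by row.
    // Each output value is the fraction of ink pixels in the source cell it covers.
    static double[] toInputVector(boolean[][] ink, int outW, int outH) {
        int h = ink.length, w = ink[0].length;
        double[] v = new double[outW * outH];
        for (int oy = 0; oy < outH; oy++) {
            int y0 = oy * h / outH;
            int y1 = Math.max(y0 + 1, (oy + 1) * h / outH); // cover at least one source row
            for (int ox = 0; ox < outW; ox++) {
                int x0 = ox * w / outW;
                int x1 = Math.max(x0 + 1, (ox + 1) * w / outW); // cover at least one source column
                int count = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        if (ink[y][x]) count++;
                    }
                }
                v[oy * outW + ox] = (double) count / ((y1 - y0) * (x1 - x0));
            }
        }
        return v;
    }

    public static void main(String[] args) {
        boolean[][] letter = new boolean[18][22]; // 22 wide, 18 tall
        for (boolean[] row : letter) java.util.Arrays.fill(row, 0, 11, true); // left half inked
        System.out.println(toInputVector(letter, 11, 9).length + " network inputs");
    }
}
```

A smaller input means far fewer weights to train, which is one plausible reason the resize helped, although the slides do not explain why training failed before it.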

11 Future?
- Segment letters that stick together (flood fill won't work there) using vertical divides
- Shape context
- Conquer more complicated distortions and noise
- Reduce blobbiness / better noise removal
- Neural network optimizations (structure, number of nodes)
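As a rough illustration of what "vertical divides" could mean, the sketch below counts ink per column and cuts wherever a column has very little ink. This is only one possible reading of the idea, not something the project implemented. The `maxGapInk` threshold is a hypothetical parameter.

```java
import java.util.ArrayList;
import java.util.List;

class VerticalSplitter {
    // Returns {startX, endX} (inclusive) for each run of adjacent columns whose
    // ink count is greater than maxGapInk. Columns at or below the threshold act as cuts.
    static List<int[]> split(boolean[][] ink, int maxGapInk) {
        int h = ink.length, w = ink[0].length;
        List<int[]> runs = new ArrayList<>();
        int start = -1;
        // Iterate one column past the end so a run touching the right edge is closed.
        for (int x = 0; x <= w; x++) {
            int count = 0;
            if (x < w) {
                for (int y = 0; y < h; y++) if (ink[y][x]) count++;
            }
            boolean isLetterColumn = x < w && count > maxGapInk;
            if (isLetterColumn && start < 0) {
                start = x;
            } else if (!isLetterColumn && start >= 0) {
                runs.add(new int[]{start, x - 1});
                start = -1;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        boolean[][] ink = new boolean[3][7];
        for (int y = 0; y < 3; y++) {
            for (int x = 0; x < 7; x++) ink[y][x] = (x != 3);
        }
        ink[1][3] = true; // a one-pixel bridge joining two letters
        System.out.println(split(ink, 1).size() + " pieces");
    }
}
```

With a threshold of 0 this only cuts at completely empty columns, which is no better than flood fill for touching letters. A positive threshold cuts through thin bridges but can also split letters with narrow strokes.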

