Spatial Business Detection and Recognition from Images Alexander Darino Weeks 10 & 11.

STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”
Multiresolution-based potential character detection
Character/layout geometry and color properties analysis
Local affine rectification
Refined detection

One font per classifier, a-z A-Z
Generate alphabet templates
Resize and center templates; divide into grid (7x7)
Apply several 2D Gabor filters to each grid patch
– Different orientations, frequencies, variances
– For each pixel, yields the real/imaginary components of the transformation
Feed data into Linear Discriminant Analysis
– Reduces features and forms the classifier at the same time

2D Gabor Filter
Kernel: a Gaussian envelope multiplied by a sine wave; the filter is applied by convolving the image with this kernel
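To make the kernel concrete, here is a minimal pure-Python sketch of a complex 2D Gabor kernel: an isotropic Gaussian envelope multiplied by a complex sinusoid. The function names and the parameters `size`, `sigma`, `wavelength`, and `theta` are illustrative assumptions; the slides do not specify the filter bank or its settings.

```python
import math


def gabor_kernel(size, sigma, wavelength, theta):
    """Return (real, imag) parts of a size x size Gabor kernel (size must be odd).

    All parameter names are hypothetical; they are not taken from the slides.
    """
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            # Project the pixel onto the wave direction given by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            # Gaussian envelope centered on the kernel.
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            phase = 2.0 * math.pi * xr / wavelength
            real_row.append(envelope * math.cos(phase))
            imag_row.append(envelope * math.sin(phase))
        real.append(real_row)
        imag.append(imag_row)
    return real, imag


def filter_response(patch, kernel):
    """Sum of elementwise products of a patch and a kernel of the same size.

    This is the correlation value at the patch center; a true convolution
    would flip the kernel first, which only changes the sign of the
    imaginary part for this kernel.
    """
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))
```

Calling `gabor_kernel` with several values of `theta`, `wavelength`, and `sigma` would give one kernel per orientation, frequency, and variance, and each kernel contributes a real and an imaginary response per pixel.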

Training Process

Character Determination
Each grid patch has its own LDA classifier; the classifier returns a vector of probabilities, one per symbol
To classify the overall character, recursively consider all 9-neighborhoods (3x3 blocks of patches) and multiply the corresponding probabilities together
When only one grid patch remains, the highest probability wins
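The recursive combination can be sketched as follows, under one reading of the slide: each pass replaces every interior patch with the elementwise product of the probability vectors in its 3x3 neighborhood, so a 7x7 grid shrinks to 5x5, then 3x3, then 1x1. The function name and the renormalization step (added here only to avoid numeric underflow) are assumptions, not details from the presentation.

```python
def combine_patches(grid):
    """Collapse an N x N grid (N odd) of probability vectors into one vector.

    grid[r][c] is a list of per-symbol probabilities for one grid patch.
    """
    while len(grid) > 1:
        n = len(grid)
        k = len(grid[0][0])
        smaller = []
        for r in range(1, n - 1):
            row = []
            for c in range(1, n - 1):
                prod = [1.0] * k
                # Multiply corresponding probabilities over the 3x3 block.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        cell = grid[r + dr][c + dc]
                        prod = [p * q for p, q in zip(prod, cell)]
                # Renormalize (an assumption) so values stay in a safe range.
                total = sum(prod) or 1.0
                row.append([p / total for p in prod])
            smaller.append(row)
        grid = smaller
    return grid[0][0]


def classify(grid, symbols):
    """Return the symbol with the highest combined probability."""
    combined = combine_patches(grid)
    best_index = max(range(len(combined)), key=lambda i: combined[i])
    return symbols[best_index]
```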

Recognition Process
Color Properties Analysis: choose the channel with the highest confidence of best distinguishing foreground from background
Binarization Threshold (50% of Otsu’s Method)
Intermediate Representation: trim, resize, and center the binary image
Perform OCR on variations of the intermediate representation: stretched, eroded (Gaussian-based), dilated
Aggregate and return votes
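The binarization step can be illustrated with a small pure-Python sketch of Otsu's method followed by a scaled threshold. Reading "50% of Otsu's Method" as "half of the Otsu threshold value" is an assumption, as are the function names, the `scale` parameter, and treating brighter pixels as foreground.

```python
def otsu_threshold(pixels):
    """Otsu's threshold for a flat list of 8-bit gray levels (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, scale=0.5):
    """Mark pixels above scale * Otsu threshold as 1, all others as 0."""
    t = scale * otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]
```

With `scale=0.5` the cut-off sits below the plain Otsu value, so mid-gray pixels that Otsu would assign to the darker class are kept in the brighter one.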

Recognition Process Example: “G” using Trebuchet-MS Classifier Query Character (Actual Size) Intermediate Representation (Actual Size)

abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

Recognition Process Example: “G” using Trebuchet-MS Classifier
Identified character for each of the 15 variations (variation images omitted):
g, s, G, g, g, B, G, G, B, B, B, G, G, B, a

Recognition Process Example: “G” using Trebuchet-MS Classifier
Final Results:
– B: 5/15
– G: 5/15
– g: 3/15
– a: 1/15
– s: 1/15
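The vote aggregation for this example can be reproduced with a short sketch. The vote list is the sequence of identified characters from the example slides; the function names `tally_votes` and `leaders` are hypothetical.

```python
from collections import Counter

# Identified characters for the 15 variations of the query "G".
VOTES = ["g", "s", "G", "g", "g", "B", "G", "G",
         "B", "B", "B", "G", "G", "B", "a"]


def tally_votes(votes):
    """Return (symbol, count, fraction) tuples, most frequent first."""
    counts = Counter(votes)
    total = len(votes)
    return [(sym, n, n / total) for sym, n in counts.most_common()]


def leaders(votes):
    """Return every symbol tied for the highest vote count (votes non-empty)."""
    counts = Counter(votes)
    top = max(counts.values())
    return {sym for sym, n in counts.items() if n == top}
```

Here `leaders(VOTES)` contains both `B` and `G`, which is the tie shown in the final results.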

“GEORGE” (Trebuchet-MS) Votes: E: 14/15 t: 1/15

“GEORGE” (Trebuchet-MS) Votes: j: 13/15 i: 2/15
‘j’ is the default when unable to decide
Should invert during preprocessing

“GEORGE” (Trebuchet-MS) Votes: j: 13/15 i: 1/15 M: 1/15
‘j’ is the default when unable to decide
Should invert during preprocessing

“GEORGE” (Trebuchet-MS) Votes: B: 5/15 G: 5/15 g: 3/15 a: 1/15 s: 1/15

“GEORGE” (Trebuchet-MS) Votes: j: 12/15 Y: 2/15 X: 1/15
‘j’ is the default when unable to decide
Should invert during preprocessing or training

Note on the “Inversion Problem”
Easy to fix; common problem in OCR systems
Will likely detect and correct during the preprocessing stage as opposed to training
– More training data: slower, less reliable
– Preprocessing: like trying many different lenses at the eye doctor and taking your best guess with each lens

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) Votes: B: 9/15 j: 3/15 H: 2/15 F: 1/15

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) Votes: A: 9/15 j: 5/15 n: 1/15

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) Votes: K: 12/15 j: 2/15 H: 1/15

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) Votes: E: 5/15 j: 3/15 L: 3/15 r: 2/15 F: 2/15

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) Votes: p: 12/15 j: 3/15 PR

“BAKERY” (Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) Votes: Y: 12/15 j: 3/15

“UNIVERSITY” (Used: Times New Roman) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

“UNIVERSITY” (Used: Times New Roman) Votes: U: 8/15 C: 3/15 j: 2/15 s: 1/15 O: 1/15

“UNIVERSITY” (Used: Times New Roman) Votes: N: 12/15 j: 3/15

“UNIVERSITY” (Used: Times New Roman) Votes: l(‘el’): 9/15 I(‘eye’): 6/15

“UNIVERSITY” (Used: Times New Roman) Votes: v: 9/15 j: 3/15 V: 3/15

“UNIVERSITY” (Used: Times New Roman) Votes: F: 9/15 L: 5/15 l (‘el’): 1/15

“UNIVERSITY” (Used: Times New Roman) Votes: G: 9/15 j: 6/15

“UNIVERSITY” (Used: Times New Roman) Votes: j: 12/15 x: 2/15 w: 1/15

“UNIVERSITY” (Used: Times New Roman) Votes: j: 5/15 C: 4/15 O: 4/15 x: 2/15

“UNIVERSITY” (Used: Times New Roman) Votes: T: 9/15 l: 3/15 i: 1/15 j: 1/15 L: 1/15

“UNIVERSITY” (Used: Times New Roman) Votes: Y: 10/15 j: 3/15 i: 2/15

Evaluation
Biggest weaknesses are in the preprocessing stage
– OCR is sensitive to thresholding/color inversion
– Occasionally color modeling chooses a bad channel to use for OCR
– Happens more often on low-resolution images
Works surprisingly well for low-resolution images
Font does not need to be exact, but proportions need to be roughly the same

How do I use this information?

The Big Picture
[Diagram labels: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; STR; Detected Text; Business Name Matching; Business Identification; Business Spatial Detection]

Old Approach
Form words from highest-voted characters
Compare to lexicon using Levenshtein distance
Use existing ranking system afterwards
BOKFRY > BAKERY (L-DIST = 2)
GFQRGF > GEORGE (L-DIST = 3)
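The lexicon comparison can be sketched with a standard dynamic-programming Levenshtein distance. The helper `closest_word` and the three-word lexicon used to exercise it are illustrative assumptions; the comparison here is case-sensitive.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_word(word, lexicon):
    """Return the lexicon entry with the smallest distance to word."""
    return min(lexicon, key=lambda entry: levenshtein(word, entry))
```

The two slide examples check out: `levenshtein("BOKFRY", "BAKERY")` is 2 and `levenshtein("GFQRGF", "GEORGE")` is 3.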

New Approach (Lexicon-assisted STR)
Minimize Levenshtein distance with the best combination of voted characters (one candidate per position)
Use existing ranking system afterwards
B O K F P Y
G U H E R I   >>> BAKERY (L-DIST = 0)
J A j L I l
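One way to sketch the lexicon-assisted search is a brute-force pass over every combination of per-position candidates, keeping the lexicon entry with the smallest Levenshtein distance. The candidate lists are read column-wise from the slide; the function name `best_match`, the sample lexicon, and the exhaustive search strategy are assumptions, and the number of combinations grows exponentially with word length.

```python
from itertools import product


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_match(candidates, lexicon):
    """Return (distance, lexicon_entry, candidate_word) with minimal distance.

    candidates is a list of strings, one per character position, holding the
    voted letters for that position.
    """
    best = None
    for letters in product(*candidates):
        word = "".join(letters)
        for entry in lexicon:
            d = levenshtein(word, entry)
            if best is None or d < best[0]:
                best = (d, entry, word)
    return best


# Voted characters per position, read column-wise from the slide.
CANDIDATES = ["BGJ", "OUA", "KHj", "FEL", "PRI", "YIl"]
```

With a lexicon containing BAKERY, `best_match(CANDIDATES, lexicon)` finds the combination B-A-K-E-R-Y at distance 0, whereas the top-voted string BOKFPY alone would not match exactly.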

The End Result
Bruegger's Bagels
Category: Bagels
Address: Market Sq Pittsburgh, PA
Phone: (412)
Rating: Not Rated

Next Steps
Fix STR Preprocessing
– Bug in Color Modeling code found online
– Inversion determination
– Multiple thresholds
Word matching: generate templates of words/logos instead of letters
Text detector: fix character/word fragmentation by reading papers that address the issue

Next Steps
Test more images; fix problems as they arise
Ideas to consider:
– Feed grid-patch probability vectors into an SVM instead of “smoothing”
– Generate “disambiguation classifiers” to differentiate:
   Between top contending votes. Remember how ‘G’ and ‘B’ got confused? Dynamically create a classifier to tell them apart
   Between commonly confused letters, e.g. E/F, l/i/j, o/c, etc.
– Don’t consider statistically insignificant confidences

Next Steps
Text Detection
– Look into after more work has been done on STR
– Need to address issues: intracharacter segmentation, intercharacter segmentation, word segmentation
– Needed to make the STR system automated like before

Thank You