1 Image Databases
• In conventional relational databases, the user types in a query and obtains an answer in response
• Image databases are different
• A police officer may issue a query: "Retrieve all pictures from the image database that are 'similar' to this person, and give the identities of the people."
• This query is fundamentally different from ordinary queries for two reasons:
  1. The query includes a picture as part of the query
  2. The query asks about similar pictures and therefore uses a notion of "imprecise match"

2 Top-k Computation
• Assume that the image database contains only 5 image objects O0, O1, O2, O3 and O4
• Given a query image Q, which are the top-2 images in the image database?

3 Top-k Computation (continued)

  id   a1  a2  a3  a4  a5   Total Score
  O3   99  90  75  74  67   405
  O1   66  91  92  56  58   363
  O4   70  67  44  07  19   207
  O0   63  61  56  01  35   216

• Top-2: the two images that best match the query image are O3 with score 405 and O1 with score 363.
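The ranking on this slide can be reproduced with a short script. The score table is taken from the slide; the helper name `top_k` and the dictionary layout are illustrative, not part of the slides:

```python
# Each object has five match scores a1..a5 against the query image;
# the objects are ranked by their total score.
scores = {
    "O3": [99, 90, 75, 74, 67],
    "O1": [66, 91, 92, 56, 58],
    "O4": [70, 67, 44, 7, 19],
    "O0": [63, 61, 56, 1, 35],
}

def top_k(score_table, k):
    """Return the k object ids with the highest total score, best first."""
    totals = {oid: sum(vals) for oid, vals in score_table.items()}
    return sorted(totals, key=totals.get, reverse=True)[:k]

print(top_k(scores, 2))  # ['O3', 'O1'] with totals 405 and 363
```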

4 Raw images
• The content of an image consists of all "interesting" objects in that image
• Each object is characterized by
  – a shape descriptor: describes the shape/location of the region within which the object is located inside the image
  – a property descriptor: describes the properties of individual pixels (e.g., RGB values of the pixel, RGB values aggregated over a group of pixels, grayscale levels)
• A property consists of
  – a property name, e.g., red, green, blue, texture
  – a property domain: the range of values that the property can assume, e.g., {0, 1, …, 7}

5 Images
• Every image is associated with a pair of positive integers (m, n), called the grid resolution, which divides the image into (m × n) cells of equal size (called the image grid)
• Each cell consists of a collection of pixels
• A cell property is a triple (Name, Values, Method)
• Examples
  – (bwcolor, {b, w}, bwalgo), where the possible values are b (black) and w (white), and bwalgo is an algorithm that takes a cell as input and returns either black or white by somehow combining the black/white levels of the pixels in the cell
  – (graylevel, [0, 1], grayalgo), where the possible values are real numbers in the interval [0, 1]

6 Objects and Image Databases
• Example for a black-and-white image: (bwcolor, {b, w}, bwalgo)
• In general, software engineers who are constructing an image database application must first decide which cell properties are of interest and then create the methods that determine those properties
• An object shape is any set of points P = {P_1, …, P_n} such that for each point P_i with 1 ≤ i < n, the point P_{i+1} is a neighbor of P_i
• If P_i = (X_i, Y_i), then P_{i+1} must be one of:
  (X_i + 1, Y_i), (X_i − 1, Y_i), (X_i, Y_i + 1), (X_i, Y_i − 1),
  (X_i + 1, Y_i + 1), (X_i + 1, Y_i − 1), (X_i − 1, Y_i + 1), (X_i − 1, Y_i − 1)
  i.e., P_{i+1} is one of the eight neighbors of P_i
• We will only consider objects that have a rectangular shape in this course
• A rectangle is an object shape P such that there exist integers XLB, XUB, YLB, YUB with P = {(x, y) | XLB ≤ x ≤ XUB and YLB ≤ y ≤ YUB}
• An image database (IDB) consists of a triple (GI, Prop, Rec)
  – GI is a set of gridded images of the form (Image, m, n)
  – Prop is a set of cell properties
  – Rec is a mapping that associates with each image a set of rectangles denoting objects
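The definitions on this slide can be sketched directly: an object shape is a sequence of grid points in which consecutive points are 8-neighbors, and a rectangle is the set of points between its X/Y bounds. The function names here are illustrative, not from the slides:

```python
def is_neighbor(p, q):
    """True if q is one of the eight neighbors of p on the grid."""
    (x1, y1), (x2, y2) = p, q
    return max(abs(x1 - x2), abs(y1 - y2)) == 1

def is_object_shape(points):
    """True if every consecutive pair of points in the sequence are neighbors."""
    return all(is_neighbor(points[i], points[i + 1])
               for i in range(len(points) - 1))

def rectangle(xlb, xub, ylb, yub):
    """The set of grid points (x, y) with XLB <= x <= XUB and YLB <= y <= YUB."""
    return {(x, y) for x in range(xlb, xub + 1) for y in range(ylb, yub + 1)}

print(is_object_shape([(1, 1), (2, 2), (2, 3)]))   # True: consecutive 8-neighbors
print(len(rectangle(5, 15, 20, 30)))               # 121: an 11 x 11 block of cells
```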

7 Image Database
• An image database is a triple (GI, Prop, Rec)
  – GI is a set of gridded images (Image, m, n)
  – Prop is a set of cell properties
  – Rec is a mapping that associates with each image a set of rectangles denoting objects (in fact, the regions do not necessarily have to be rectangles)

8 Problems with image databases
• Images are often very large
  – it is infeasible to explicitly store the properties on a pixel-by-pixel basis
  – this has led to a family of image "compression" techniques that attempt to compress the image into one containing fewer pixels
• There is a need to determine the "features" of the image (compressed or raw)
  – done by "segmentation": breaking up the image into a set of homogeneous rectangular regions called segments
• There is a need to support "match" operations that compare either a whole image or a segmented image against another

9 Compressed images
• Consider a two-dimensional image I consisting of (p1 × p2) pixels; typically I(x, y) is a number denoting one or more attributes of the pixel
• The creation of the compressed representation cr(I) of image I consists of two parts:
  1. Size selection: the size (h1 × h2) of the compressed image is selected by the image database designer; the larger the size, the greater the fidelity of the representation
     – however, as the size increases, so does the complexity of creating an index for manipulating such representations and of searching that index

10 Compressed images (continued)
  2. Transform selection:
     – the user must select a transformation that is capable of converting the image into a compressed representation
     – i.e., a transformation that, given the image I and any pair of numbers 1 ≤ i ≤ h1 and 1 ≤ j ≤ h2, returns the value of the compressed (h1 × h2) representation at (i, j)
• We will now briefly discuss two of the better-known transformations
  1. The Discrete Fourier Transform (DFT)

     DFT(x, y) = (1 / (p1 · p2)) · Σ_{a=0}^{p1−1} Σ_{b=0}^{p2−1} I(a, b) · e^{−j(2πxa/p1 + 2πyb/p2)}

     – the DFT has many nice properties, e.g., it is possible to get the original image I back from its DFT representation
  2. The Discrete Cosine Transform (DCT)
     – the DCT can be computed quickly
     – the DCT has minor advantages over the DFT from the latter point of view, and by a simple analysis the two are approximately equal in the time taken for computation
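The DFT formula above and its invertibility property can be sketched in a few lines of pure Python. This follows the slide's normalization (the 1/(p1·p2) factor on the forward transform); production code would use an FFT library rather than this O(n^4) double loop:

```python
import cmath

def dft2(img):
    """Forward 2-D DFT with the 1/(p1*p2) factor, as in the slide's formula."""
    p1, p2 = len(img), len(img[0])
    return [[sum(img[a][b] * cmath.exp(-2j * cmath.pi * (x * a / p1 + y * b / p2))
                 for a in range(p1) for b in range(p2)) / (p1 * p2)
             for y in range(p2)] for x in range(p1)]

def idft2(F):
    """Inverse 2-D DFT: recovers the original image from its DFT representation."""
    p1, p2 = len(F), len(F[0])
    return [[sum(F[x][y] * cmath.exp(2j * cmath.pi * (x * a / p1 + y * b / p2))
                 for x in range(p1) for y in range(p2))
             for b in range(p2)] for a in range(p1)]

image = [[1, 2], [3, 4]]
restored = idft2(dft2(image))
print([[round(v.real, 6) for v in row] for row in restored])  # [[1.0, 2.0], [3.0, 4.0]]
```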

11 Image Compression
• Lossy compression
  – an image may contain details that the human eye cannot recognize: get rid of those details
  – DCT (Discrete Cosine Transform)
  – DFT (Discrete Fourier Transform)
  – DWT (Discrete Wavelet Transform)
  – these convert images from the time (spatial) domain to the frequency domain, then get rid of the frequencies that do not carry information
• Transforms
  – DCT and DFT are similar concepts: from the time domain to the frequency domain
  – given a signal of length n, these transforms return a sequence of n frequencies:
    Sample1, Sample2, …, SampleN  transforms to  Freq1, Freq2, …, FreqN

12 Why do we use transforms
• Noise removal is easier in the frequency domain
• Various filters are easier to implement in the frequency domain
• Compression (a transform gathers similar values together)

13 Desirable Properties of Transforms
• DFT
  – Invertibility: it is possible to get back the original image I from its DFT representation (useful for decompression)
  – Note: practical implementations often combine the DFT with other, non-invertible operations, and thus sacrifice invertibility
  – Distance preservation: the DFT preserves Euclidean distance
    – this is important in image matching applications, where we often use distance measures to represent similarity levels
• DCT
  – the DCT preserves all of the above
  – a given signal can be represented with fewer frequencies
• DWT
  – the DFT and DCT have no temporal locality: a change in one single part of the data changes all frequencies
  – wavelets introduce locality

14 14 Distance preservation

15 15 Distance preservation

16 Fractal Compression
• Transform-based approaches exploit the fact that human visual perception differs across frequencies
• What else can we use for compression?
• Self-similarity
  – we can find self-similarities in a given image and describe the image in terms of these similarities

17 17 Fractal Compression

18 Image Processing: Segmentation
• Segmentation is the process of taking an image as input and cutting it up into disjoint homogeneous regions
• Connected region: a region R is connected if its cells can be listed as C1, …, Cn such that the Euclidean distance between Ci and Ci+1 is 1 for all i < n
• Example (on a 4 × 4 grid containing regions R1, R2 and R3)
  – R1, R2 and R3 are each connected
  – R1 ∪ R2 is connected
  – R2 ∪ R3 is connected
  – R1 ∪ R2 ∪ R3 is connected
  – R1 ∪ R3 is not connected, because the Euclidean distance between (2,3) and (3,4) is √2 > 1
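The connectivity test on this slide can be sketched as follows: cells at Euclidean distance 1 are exactly the 4-adjacent cells, so a region is connected when every cell can be reached from any other through 4-adjacent steps. This checks reachability rather than the slide's literal cell ordering, and the region coordinates are hypothetical stand-ins for R1 and R3:

```python
from collections import deque

def is_connected(region):
    """BFS over 4-adjacent cells; True if every cell is reachable from any other."""
    region = set(region)
    if not region:
        return True
    start = next(iter(region))
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in region and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == region

r1 = {(1, 3), (1, 4), (2, 3), (2, 4)}   # hypothetical cells for R1
r3 = {(3, 1), (3, 2), (4, 1), (4, 2)}   # hypothetical cells for R3, diagonal from R1
print(is_connected(r1))       # True
print(is_connected(r1 | r3))  # False: the closest cells are only diagonal neighbors
```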

19 Measuring Homogeneity
• Homogeneity predicate: a function H that takes any connected region as input and returns either true or false
• Example 1:
  – suppose σ is some real number between 0 and 1
  – H_bw^σ can be defined so that H_bw^σ(R) is true iff at least (100 · σ)% of the cells in R have the same color

  Region   # black cells   # white cells
  R1       800             200
  R2       900             100
  R3       100             900

  Region   H_bw^0.8   H_bw^0.89   H_bw^0.92
  R1       true       false       false
  R2       true       true        false
  R3       true       true        false
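The predicate H_bw^σ above can be sketched directly: it holds when at least (100·σ)% of the region's cells share one color. The cell counts are the ones from the slide's table; the function name is illustrative:

```python
def h_bw(black, white, sigma):
    """True iff the majority color covers at least a sigma fraction of the cells."""
    total = black + white
    return max(black, white) / total >= sigma

regions = {"R1": (800, 200), "R2": (900, 100), "R3": (100, 900)}
for name, (b, w) in regions.items():
    print(name, [h_bw(b, w, s) for s in (0.8, 0.89, 0.92)])
# R1 [True, False, False]
# R2 [True, True, False]
# R3 [True, True, False]
```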

20 Measuring Homogeneity (continued)
• Example 2:
  – suppose each cell has a real value between 0 and 1, called its bw-level
  – suppose f assigns a value between 0 and 1 to each cell
  – assume σ is the noise factor and δ is a threshold
  – H_{σ,f,δ}(R) is true iff the fraction of cells (x, y) in R with |bwlevel(x, y) − f(x, y)| ≤ σ is at least δ

21 Segmentation
• Given an image I with (m × n) cells, a segmentation of I with respect to a homogeneity predicate H is a set of connected regions R1, …, Rk such that
  – Ri ∩ Rj = ∅ for all 1 ≤ i ≠ j ≤ k
  – I = R1 ∪ … ∪ Rk
  – H(Ri) = true for all 1 ≤ i ≤ k
  – for all distinct i, j with 1 ≤ i, j ≤ k such that Ri ∪ Rj is a connected region, it is the case that H(Ri ∪ Rj) = false

22 An Example of Segmentation

  Row/Col   1      2      3      4
  1         0.1    0.25   0.5    0.5
  2         0.05   0.30   0.6    0.6
  3         0.35   0.30   0.55   0.8
  4         0.6    0.63   0.85   0.90

• For H_{dyn,0.05}, segmenting the above (4 × 4) image yields the following segmentation:
  – R1 = {(1,1), (1,2)}
  – R2 = {(1,3), (2,1), (2,2), (2,3)}
  – R3 = {(3,1), (3,2), (3,3), (4,1), (4,2)}
  – R4 = {(3,4), (4,3), (4,4)}
  – R5 = {(1,4), (2,4)}

23 Segmentation Algorithm
• Split:
  – if the whole image is homogeneous, we are done
  – otherwise, split the image into two parts and recursively repeat this process until we find a set of regions R1, …, Rn such that each region is homogeneous
• Merge:
  – check whether any of the Ri's can be merged together
  – at the end of this step, we obtain a valid segmentation R1, …, Rk
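The split-and-merge steps above can be sketched compactly on a grid of bw-levels. The homogeneity predicate used here (max − min of the region's values within a tolerance) is an illustrative stand-in, not the H_dyn predicate from the slides; splits are simple halves along the longer axis, and the merge step omits the adjacency check for brevity:

```python
def homogeneous(cells, img, tol=0.1):
    vals = [img[r][c] for r, c in cells]
    return max(vals) - min(vals) <= tol

def split(cells, img):
    """Recursively halve a region until every piece is homogeneous."""
    if homogeneous(cells, img) or len(cells) == 1:
        return [cells]
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    if len(rows) >= len(cols):                 # split along the longer axis
        mid = rows[len(rows) // 2]
        a = [c for c in cells if c[0] < mid]
        b = [c for c in cells if c[0] >= mid]
    else:
        mid = cols[len(cols) // 2]
        a = [c for c in cells if c[1] < mid]
        b = [c for c in cells if c[1] >= mid]
    return split(a, img) + split(b, img)

def merge(regions, img):
    """Greedily merge region pairs whose union is still homogeneous."""
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if homogeneous(regions[i] + regions[j], img):
                    regions[i] = regions[i] + regions[j]
                    del regions[j]
                    merged = True
                    break
            if merged:
                break
    return regions

img = [[0.1, 0.1, 0.9, 0.9]] * 4             # left half dark, right half bright
cells = [(r, c) for r in range(4) for c in range(4)]
segs = merge(split(cells, img), img)
print(len(segs))  # 2: the two homogeneous halves
```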

24 24 Similarity Based Retrieval

25 25 Similarity Based Retrieval

26 Similarity Based Retrieval
• The metric approach:
  – uses a distance measure d that can compare two images
  – the smaller the distance, the more similar the images are
  – i.e., given an input image I, find the "nearest neighbor" of I in the image archive
• The transformation approach:
  – the metric approach assumes that the notion of similarity is fixed
  – the transformation approach instead computes the cost of transforming one image into another, based on user-specified cost functions that may vary from one query to another

27 The Metric Approach
• We define a distance function d on a k-dimensional space (k = n + 2)
• The distance function satisfies the following properties:
  – d(x, y) = d(y, x)
  – d(x, y) ≤ d(x, z) + d(z, y)
  – d(x, x) = 0
• Example: let each image object consist of (256 × 256) cells with 3 attributes (red, green, blue), each of which assumes a value from {0, …, 7}

  d1(o1, o2) = √( Σ_{i=1}^{256} Σ_{j=1}^{256} (diff_r[i,j] + diff_g[i,j] + diff_b[i,j]) )

  where diff_r[i,j] = (o1[i,j].red − o2[i,j].red)^2
        diff_g[i,j] = (o1[i,j].green − o2[i,j].green)^2
        diff_b[i,j] = (o1[i,j].blue − o2[i,j].blue)^2

• Such computations can be cumbersome (65,536 expressions are computed inside the sum)
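The distance function d1 above can be sketched for tiny images so that it runs quickly; each pixel is an (r, g, b) triple with values in {0, …, 7}, and the function name follows the slide:

```python
import math

def d1(o1, o2):
    """Square root of the summed squared per-channel pixel differences."""
    total = 0
    for row1, row2 in zip(o1, o2):
        for (r1, g1, b1), (r2, g2, b2) in zip(row1, row2):
            total += (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2
    return math.sqrt(total)

a = [[(0, 0, 0), (7, 7, 7)]]
b = [[(0, 0, 0), (7, 7, 7)]]
c = [[(1, 0, 0), (7, 7, 7)]]
print(d1(a, b))  # 0.0: identical images
print(d1(a, c))  # 1.0: one channel differs by 1
```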

28 The Metric Approach (continued)
• How can this massive similarity computation be avoided?
• Through feature extraction!
  – use a good feature extraction function fe to map objects into single points in an s-dimensional space, where s is typically quite small compared to (n + 2)
• This leads to two reductions:
  – an object o is originally a set of points in an (n + 2)-dimensional space; in contrast, fe(o) is a single point
  – fe(o) is a point in an s-dimensional space where s << (n + 2)
• The feature extraction mapping must preserve the distance relationships of the original space
• Pipeline: (n+2)-dimensional space → s-dimensional space → indexing algorithm → index → object repository (the index could be a quadtree or an R-tree for s-dimensional data)

29 Searching
• Finding the best matches:
  – find the nearest neighbors of fe(o) in the tree, using a nearest neighbor search technique
• Finding sufficiently similar objects:
  – execute a range query in the tree with center fe(o) and radius r
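Both search modes above can be sketched over extracted feature points. A linear scan stands in for the tree index (quadtree or R-tree), and the Euclidean metric and the sample feature points are illustrative:

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(points, query):
    """Best match: the feature point closest to the query point."""
    return min(points, key=lambda p: dist(p, query))

def range_query(points, center, radius):
    """Sufficiently similar objects: all points within the given radius."""
    return [p for p in points if dist(p, center) <= radius]

features = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(nearest_neighbor(features, (0.9, 0.9)))   # (1.0, 1.0)
print(range_query(features, (0.0, 0.0), 2.0))   # [(0.0, 0.0), (1.0, 1.0)]
```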

30 The Transformation Approach
• The main principle: the level of dissimilarity between o1 and o2 is proportional to the cost of transforming o1 into o2, or vice versa
• Transformation operators:
  – translation
  – rotation
  – scaling (uniform and non-uniform)
  – excision
• A transformation of o into o' is a sequence of transformation operations (to1, to2, …, tor) such that
  – to1(o) = o1
  – to2(o1) = o2
  – …
  – tor(o_{r−1}) = o'
• Cost of the transformation sequence: cost(TS) = Σ_i cost(to_i)

31 31 Example

32 32 Example

33 33 Example

34 Transformation vs. Metric
• Advantages of the transformation model:
  – the user can set up his own notion of similarity by specifying certain transformation operators
  – the user may associate a cost function with each transformation operator
• Advantages of the metric model:
  – by forcing the user to use only one similarity metric, the system can facilitate the indexing of data so as to optimize nearest neighbor search

35 Image Databases
• We have studied techniques that take an image as input and return a compressed version of the image as output, using transformations such as the DCT or DFT
• We have also seen how we may segment an image
• An important question still remains to be answered: how do we determine whether the content of a segment is similar to another image, or to a set of images that we have?
  – this question is posed by queries such as: "here is a picture of a person; can you tell me who it is?"
• There have been two broad approaches to similarity-based retrieval of images:
  – Metric approach: assumes there is a distance metric d that can compare any two image objects; the closer two objects are in distance, the more similar they are considered to be; the problem of similarity-based retrieval may then be stated as: given an input image I, find the nearest neighbor of I in the image archive; this is by far the most widely followed approach in the database world, but it assumes that in any given application only one notion of similarity is used to index the data
  – Transformation approach: here the user can specify what they consider to be similar, rather than the system forcing a single notion of similarity upon them

36 The Metric Approach
• Suppose we consider a set Obj of objects having pixel properties P1, …, Pn
• Each object o may be viewed as a set S(o) of (n + 2)-tuples of the form (XCoord, YCoord, V1, …, Vn)
  – where Vi denotes the value of the property Pi associated with the pixel at coordinates (x, y)
• S(o) contains w · h of these (n + 2)-tuples, where w is the width of the rectangle associated with o and h is its height
• In the metric-based approach, dissimilarity is captured by a distance function d
• A function d from X × X to the unit interval [0, 1] is said to be a distance function if it satisfies the following axioms for all x, y, z ∈ X:
  – d(x, y) = d(y, x)
  – d(x, y) ≤ d(x, z) + d(z, y)
  – d(x, x) = 0
• Let d_obj be a distance function on the space of all objects in our domain; as we have seen above, an object may be viewed as a set of points in a k-dimensional space for k = n + 2, which may make the computation of d_obj a rather complex process
• For example, suppose our set Obj consists of (256 × 256) images having three attributes (red, green, blue), each of which assumes a value from the set {0, …, 7}
• Suppose we define the distance function d1 between two images as follows:

37 The Metric Approach (continued)

  d1(o1, o2) = √( Σ_{i=1}^{256} Σ_{j=1}^{256} (diff_r[i,j] + diff_g[i,j] + diff_b[i,j]) )

  where diff_r[i,j] = (o1[i,j].red − o2[i,j].red)^2
        diff_g[i,j] = (o1[i,j].green − o2[i,j].green)^2
        diff_b[i,j] = (o1[i,j].blue − o2[i,j].blue)^2

• Obviously, computing this is a cumbersome process, because the double summation leads to 65,536 expressions being computed inside the sum
• In order to solve this problem, some researchers have suggested an important alternative technique
• This technique depends on the use of a feature extraction function fe, which maps objects to single points in an s-dimensional space, where s is typically quite small compared to (n + 2); two reductions are thus achieved:
  1. Recall that an object o is a set of points in an (n + 2)-dimensional space; in contrast, fe(o) is a single point
  2. fe(o) is a point in an s-dimensional space where s << (n + 2)
• However, the mapping should intuitively preserve distance: if o1, o2, o3 are objects such that d(o1, o2) ≤ d(o1, o3), then d'(fe(o1), fe(o2)) ≤ d'(fe(o1), fe(o3)), where d is a metric distance on the original (n + 2)-dimensional space and d' is a metric distance on the new s-dimensional space

38 The Metric Approach (continued)
• Pipeline: (n+2)-dimensional space → s-dimensional space → indexing algorithm → index → object repository
• In other words, the feature extraction map should preserve the distance relationships of the original space
• Given a query object o, we convert it to fe(o) and then try to find a point p in the s-dimensional space such that d'(p, fe(o)) is as small as possible

39 Image Databases
• In general, there are many ways of representing an image database
• When retrieving images, the characteristics of an image depend critically upon factors such as lighting conditions, camera position, the position of the object, and so on

40 Representing Image Databases with relations
• Any IDB = (GI, Prop, Rec) may be represented using the relational model as follows:
  1. Create a relation called Images having the scheme (Image, ObjId, XLB, XUB, YLB, YUB), where
     – Image is the name of the image file
     – ObjId is the name of the object in the image
     – XLB, XUB, YLB, YUB describe the rectangle in question
  2. For each property P in Prop, create a relation R_P having the scheme (Image, XLB, XUB, YLB, YUB, Value), where
     – Image is the name of the image file
     – XLB, XUB, YLB, YUB describe the rectangular cell in the image
     – Value specifies the value of the property P
• In general, the properties of an image are of three types:
  1. Pixel-level properties: properties such as the red-green-blue colors of individual pixels
  2. Object/region-level properties: some of the objects in an image may have properties of their own, such as the name of a person, age, etc.
  3. Image-level properties: properties of the image as a whole, such as when and where the image was captured
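The relational representation above can be sketched with SQLite. The Images table and its columns follow the slide; the Name relation (a hypothetical object/region-level property), the sample rows, and the file name are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Images
                (Image TEXT, ObjId TEXT, XLB INT, XUB INT, YLB INT, YUB INT)""")
# One relation per property in Prop; here a hypothetical "name" property
# attached to object regions.
conn.execute("CREATE TABLE Name (ObjId TEXT, Value TEXT)")

conn.execute("INSERT INTO Images VALUES ('pic1.gif', 'O1', 5, 15, 20, 30)")
conn.execute("INSERT INTO Name VALUES ('O1', 'Jim')")

rows = conn.execute("""SELECT I.Image FROM Images I, Name N
                       WHERE I.ObjId = N.ObjId AND N.Value = 'Jim'""").fetchall()
print(rows)  # [('pic1.gif',)]
```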

41 Querying the relational representation of an Image Database
• If we have stored the image data in relational form, SQL can be used to query these relations directly; the query "find all images containing Jim" may be written as

  SELECT image
  FROM   Images I, Name N
  WHERE  I.objid = N.objid AND N.name = "Jim"

• Extracting the content of an image is done using image-processing algorithms, which are usually only partially accurate; we are therefore forced to assume that these relations contain probabilistic attributes as well
• Each relation R_P associated with object/region-level properties, as well as image-level properties, has just two attributes: an object id and a value for the property
The need for probabilities
• An image-processing program that attempts to identify the people occurring in surveillance images may return multiple possible answers, with different probabilities of a match
• Thus, an image object identified as "Jim" has a probability associated with the match

42 The need for probabilities (continued)

  Image  ObjId  Name  Prob
  Pic1   O1     Jim   0.8
  Pic1   O1     Dave  0.2
  Pic1   O2     John  0.75
  Pic1   O2     Ken   0.15
  Pic2   O3     John  1
  Pic3   O4     Jim   1
  Pic4   O5     Bill  1
  Pic5   O6     Dave  1
  Pic6   O7     Ken   0.7
  Pic6   O7     John  0.3
  Pic6   O8     Bill  0.6
  Pic6   O8     Dave  0.2
  Pic6   O8     Jim   0.10
  Pic7   O9     Ken   1

• The first tuple in this relation may be read as "the probability that 'Jim' is the name attribute of O1 is 0.8", and so on
• Under this reading of probabilistic tuples, things get a trifle more complex when we consider a query of the form "find all images containing both Jim and Ken"
  – the image Pic1 is a candidate because it has two objects, O1 and O2: there is an 80% probability that O1 is Jim and a 15% probability that O2 is Ken
  – in the same way, Pic6 is a candidate too
• Under the given reading, we are faced with the following questions:
  1. What is the probability that pic1.gif contains both Jim Hatch and Ken Yip? Is the answer the product of the two probabilities, i.e., (0.8 × 0.15) = 0.12?
  2. What is the probability that pic6.gif contains both Jim Hatch and Ken Yip? Is the answer (0.7 × 0.1) = 0.07?

43 The need for probabilities (continued)
• Suppose the image Pic8.gif contains two objects O10 and O11, and suppose our table above is expanded by the insertion of the following new tuples:

  Image  ObjId  Name  Prob
  Pic8   O10    Ken   0.5
  Pic8   O10    Jim   0.4
  Pic8   O11    Jim   0.8
  Pic8   O11    John  0.1

• There are four possibilities:
  – Possibility 1: O10 is Ken and O11 is Jim (probability P1)
  – Possibility 2: O10 is Ken and O11 is not Jim (probability P2)
  – Possibility 3: O10 is not Ken and O11 is Jim (probability P3)
  – Possibility 4: O10 is not Ken and O11 is not Jim (probability P4)
• The probability that Pic8.gif is an answer to our query is P1, and the possibilities are constrained as follows:
  – P1 + P2 = 0.5, since O10 is Ken exactly in possibilities 1 and 2
  – P3 + P4 = 0.5, since O10 is someone other than Ken exactly in possibilities 3 and 4
  – P1 + P3 = 0.8, since O11 is Jim exactly in possibilities 1 and 3
  – P2 + P4 = 0.2, since O11 is someone other than Jim exactly in possibilities 2 and 4
  – and P1 + P2 + P3 + P4 = 1
• In order to determine the probability that Pic8.gif contains both Ken and Jim, we must solve the above linear equations for P1
• In this case, P1 could be as low as 0.3, as high as 0.5, or anywhere in between
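The bounds on P1 above follow from the constraints by elimination: writing P1 = t forces P2 = 0.5 − t, P3 = 0.8 − t and P4 = t − 0.3, and requiring every Pi ≥ 0 yields the feasible range for t. This general form (the function name is illustrative) can be sketched as:

```python
def p1_bounds(p_ken, p_jim):
    """Feasible [low, high] for P1 given P(O10 = Ken) and P(O11 = Jim)."""
    lo = max(0.0, p_ken + p_jim - 1.0)   # combining all the Pi >= 0 constraints
    hi = min(p_ken, p_jim)
    return lo, hi

lo, hi = p1_bounds(0.5, 0.8)
print(round(lo, 2), round(hi, 2))  # 0.3 0.5
```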

44 The need for Probability Intervals
• Unfortunately, this result, though correct, poses a fundamental new problem
• In the relational model of data, the answer to any query may be stored as a relation itself
• In the case of relations with probabilistic attributes, we would like the results of queries to themselves be probabilistic relations
• The result of our query, however, left us with probability intervals, such as the interval [0.3, 0.5]
• If we wish to store information of the form "object o occurs in image I with probability p", we must replace the point probability with an interval [l, u]
• When an image-processing program identifies an object o in image I as X with probability p, then taking into account a margin of error ε, we end up with the interval probability [p − ε, p + ε]
• In the above example, if we assume the error is ±3%, we obtain this table:

  Image  ObjId  Name  LowerProb  UpperProb
  Pic1   O1     Jim   0.77       0.83
  Pic1   O1     Dave  0.17       0.23
  Pic1   O2     John  0.72       0.78
  Pic1   O2     Ken   0.12       0.18
  Pic2   O3     John  0.97       1.00
  Pic3   O4     Jim   0.97       1.00
  Pic4   O5     Bill  0.97       1.00
  Pic5   O6     Dave  0.97       1.00
  Pic6   O7     Ken   0.67       0.73
  Pic6   O7     John  0.27       0.33
  Pic6   O8     Bill  0.57       0.63
  Pic6   O8     Dave  0.17       0.23
  Pic6   O8     Jim   0.07       0.13
  Pic7   O9     Ken   0.97       1.00
  Pic8   O10    Ken   0.47       0.53
  Pic8   O10    Jim   0.37       0.43
  Pic8   O11    Jim   0.77       0.83
  Pic8   O11    John  0.07       0.13

45 The need for Probability Intervals (continued)
• Let us now return to the query "find all images that contain both Ken and Jim"
• Re-examining the image Pic8.gif, the probability that this image contains both Ken and Jim is constrained by:
  – 0.47 ≤ P1 + P2 ≤ 0.53
  – 0.47 ≤ P3 + P4 ≤ 0.53
  – 0.77 ≤ P1 + P3 ≤ 0.83
  – 0.17 ≤ P2 + P4 ≤ 0.23
  – P1 + P2 + P3 + P4 = 1
• The third constraint is derived from our knowledge that object O11 is Jim with probability between 0.77 and 0.83; as in the case of point probabilities, there are two possibilities (1 and 3) in which object O11 is in fact Jim, so P1 + P3 must lie within the 77–83% interval
• The fourth constraint is derived from our knowledge that object O11 is someone other than Jim with probability between 17% and 23%, so P2 + P4 must lie within that interval
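The interval constraints above can be solved by brute force: enumerate P1, P2 and P3 in whole percentage points, derive P4 from the requirement that the four probabilities sum to 1, keep the feasible combinations, and report the resulting range of P1. (A linear program would do the same job more efficiently; the function name is illustrative.)

```python
def p1_range():
    """Min and max feasible P1 under the interval constraints for Pic8.gif."""
    feasible = []
    for p1 in range(101):                      # probabilities in whole percent
        for p2 in range(101 - p1):
            for p3 in range(101 - p1 - p2):
                p4 = 100 - p1 - p2 - p3        # the four must sum to 1
                if (47 <= p1 + p2 <= 53 and 47 <= p3 + p4 <= 53 and
                        77 <= p1 + p3 <= 83 and 17 <= p2 + p4 <= 23):
                    feasible.append(p1)
    return min(feasible) / 100, max(feasible) / 100

print(p1_range())  # (0.24, 0.53)
```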

46 Representing Image Databases with R-trees
• One way to represent an image database with an R-tree is the following:
  1. Create a relation called OccursIn with two attributes (ImageId, ObjId), specifying which objects appear in which images
  2. Create one R-tree that stores all the rectangles; if the same rectangle (XLB=5, XUB=15, YLB=20, YUB=30) appears in two images, then we have an overflow list associated with that node in the R-tree
  3. Each rectangle has an associated set of fields that specify the object/region-level properties of that rectangle; these fields contain information about the content of the rectangle

  ImageId  ObjId
  Pic1     O1
  Pic1     O2
  Pic2     O3
  Pic3     O4
  Pic4     O5
  Pic5     O6
  Pic6     O7
  Pic6     O8
  Pic7     O9

• The nodes in the R-tree representation associated with this database may have the following structure:

  facenode = record
    Rec1, Rec2, Rec3 : rectangle
    P1, P2, P3       : nodetype
  rectangle = record
    XLB, XUB, YLB, YUB : integer
    Objlist            : objnode
  objnode = record
    ObjId   : string
    ImageId : string
    Data    : infotype
  infotype = record
    ObjectName   : string
    Day, Mth, Yr : integer
    Place        : string
    Lp, Up       : real
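Step 2 above can be sketched with a dictionary standing in for the R-tree itself: rectangles are keyed by their bounds, with an "overflow list" of (image, object) pairs when the same rectangle occurs in several images. The rectangle bounds follow the slide; the sample pairs are illustrative:

```python
from collections import defaultdict

occurs_in = [("Pic1", "O1"), ("Pic1", "O2"), ("Pic2", "O3")]   # the OccursIn relation

rect_store = defaultdict(list)   # (XLB, XUB, YLB, YUB) -> overflow list of (image, obj)
rect_store[(5, 15, 20, 30)].append(("Pic1", "O1"))
rect_store[(5, 15, 20, 30)].append(("Pic2", "O3"))   # same rectangle, second image
rect_store[(0, 10, 0, 10)].append(("Pic1", "O2"))

print(len(rect_store[(5, 15, 20, 30)]))  # 2: the overflow list holds two entries
```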

47 Representing Image Databases with R-trees (continued)
• In the definition of objnode above, Lp and Up denote the lower and upper bounds of the probability
• Let us consider a really simple image database consisting of just the photographs Pic1, Pic2 and Pic3
• Suppose these photos contain the objects O1, O2, O3 and O4, with the probability intervals specified before
• The R-tree is constructed by the following steps:
  1. Get rectangles: first, we construct a small table describing all the rectangles that occur and the images they occur in; note that even though these rectangles were obtained from different images, we are superimposing all of them within a single image
  2. Create R-tree: we then create an R-tree representing the above rectangles; at this stage, the object nodes in the R-tree may still not be filled in completely
  3. Flesh out objects: we then flesh out and appropriately fill in the fields of the various objects

