Doc 6.5 p.1  Original Algorithm Overview
[Flow diagram. Processing steps, in order:]
1. Feature Selection (image 1L)
2. Stereo Matching (image 1R)
3. Feature Tracking (images 2L, 2R)
4. Rigidity Test #1
5. Least Squares Fit
6. ML Refinement
7. Model Refinement
8. Rigidity Test #2
Quantities passed between steps: left-image feature coordinates and covariances (q_Li, Σ_Li); right-image coordinates, 3D points, and covariances (q_Ri, Σ_Ri, Q_i, Σ_vi); the same quantities for frame i+1 (q_Li+1, Σ_Li+1, q_Ri+1, Σ_Ri+1, Q_i+1, Σ_vi+1); the initial motion estimate (R_0,i+1, T_0,i+1, Θ_0,i+1); the refined motion estimate and its covariance (Σ_M, R_n,i+1, T_n,i+1, Θ_n,i+1); and the model points P_i.

Doc 6.5 p.2  Algorithm Details
Based on Yang's code, with notes from Larry's thesis and Clark's paper.

Doc 6.5 p.3  Yang's VisOdom Main.cc
main() reads args, inits memory management, calls doit(), and frees resources. Reading args appears to set flags recording which args were given.
doit() is the main function:
– Instantiates two VisOdoms (front & rear)
– Init()s them, ReadLogFile()s if filenames are given
– Sets fusion_flag if a rear log file is given
– Reads camera models for the front cams (and rear cams if fusion_flag)
– GetNumPics(), TurnonVoStatusFlag(), then subtracts attitude[0][0] from all attitude[][0] in front, and fills dpos[][0] with the derivative of position[][0] in front. Yang says to get rid of that last part.
– If fusion_flag, copies attitude[][0] and position[][0] from front to rear
– One huge if/else block; see the next slide for the (similar) contents of each branch
– WriteEstimatedMotion() to "motionest.txt"

Doc 6.5 p.4  Huge if/else block
Structure:
– Do each bullet below for front, then repeat for rear if (fusion_flag)
– The "Copy…" bullet and the one after it happen as a block, but MotionEst… is only tested once
For each pic i:
– Read left & right images into memory & set CurImgIndex to that memory bank
– If (first pic): InitPyramids, GeneratePyramidsMatch (init & generate pyramids for left & right images), TransferCameras, FeaturesSelection, StereoMatch
– Else (other pics): TransferCameras, GeneratePyramidsTrack, FeaturesTrack, GeneratePyramidsMatch, StereoMatch, RigidityTest; if (!MotionEstimation[Fusion]) CopyInitMotion2EstMotion (no clear purpose except recording); then FeaturesSelection, StereoMatch (add features for next time)
StereoMatch == { FeaturesStereoMatch, FeaturesRayGaps, FeaturesCov }
(A control-flow sketch of this loop follows below.)
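
To make the control flow above easier to follow, here is a minimal C++ sketch of the per-frame driver loop. The method names are the ones on the slide; the VisOdom stub, the Frame type, and all signatures are assumptions made for illustration, not Yang's actual interfaces.

    #include <cstddef>
    #include <vector>

    struct Frame { /* left/right image handles would live here */ };

    struct VisOdom {
        // Stubs standing in for the member functions named on the slide.
        void LoadImages(const Frame&) {}       // read L/R images, set CurImgIndex
        void InitPyramids() {}
        void GeneratePyramidsMatch() {}
        void GeneratePyramidsTrack() {}
        void TransferCameras() {}
        void FeaturesSelection() {}
        void FeaturesTrack() {}
        void StereoMatch() {}  // = FeaturesStereoMatch + FeaturesRayGaps + FeaturesCov
        void RigidityTest() {}
        bool MotionEstimation() { return true; }
        void CopyInitMotion2EstMotion() {}
    };

    void processSequence(VisOdom& vo, const std::vector<Frame>& pics) {
        for (std::size_t i = 0; i < pics.size(); ++i) {
            vo.LoadImages(pics[i]);
            if (i == 0) {
                vo.InitPyramids();
                vo.GeneratePyramidsMatch();        // pyramids for left & right images
                vo.TransferCameras();
                vo.FeaturesSelection();
                vo.StereoMatch();
            } else {
                vo.TransferCameras();
                vo.GeneratePyramidsTrack();
                vo.FeaturesTrack();
                vo.GeneratePyramidsMatch();
                vo.StereoMatch();
                vo.RigidityTest();
                if (!vo.MotionEstimation())
                    vo.CopyInitMotion2EstMotion(); // recording only; purpose unclear
                vo.FeaturesSelection();            // add features for the next frame
                vo.StereoMatch();
            }
        }
    }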

Doc 6.5 p.5  TransferCameras
Record rover pose (estMotion) for the current image:
– In frame 0, copy from the logRecord file, which gives rover position and attitude in world coordinates.
– In later frames, use the previous image's estMotion (as refined by other functions) plus the change in position according to the logRecord file. This is more accurate than reading the log file entry itself, which has accumulated error. Do not update the estMotion attitude at this point. (Sketch below.)
Record camera pose (srcLeftImage[CurImgIndex]->cam):
– If either camera is NULL for the current frame, initialize it by copying the raw camera, which is the camera (fixed) in rover coords.
– In frame 0, set the cameras to estMotion plus leftrawcam or rightrawcam to get each camera's initial pose in world coords.
– In later frames, set the camera at estMotion plus the offset from rover to camera, rotated by the logRecord-estimated rotation.
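
A minimal sketch of the pose-propagation idea above: the rover's estimated position is carried forward from the previous refined estimate plus the change reported by the log, rather than the absolute log entry, so accumulated log error is not re-introduced. The types and field names here are illustrative assumptions, not the real estMotion/logRecord structures.

    struct Vec3 { double x, y, z; };

    static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

    struct Pose { Vec3 position; Vec3 attitude; };

    // Frame 0: estMotion is copied straight from the log record.
    Pose initEstMotion(const Pose& logRecord0) { return logRecord0; }

    // Frame i > 0: previous refined estimate plus the log-reported position delta.
    Pose propagateEstMotion(const Pose& prevEstMotion,
                            const Pose& logPrev, const Pose& logCur) {
        Pose out = prevEstMotion;
        out.position = add(prevEstMotion.position,
                           sub(logCur.position, logPrev.position));
        // Attitude is deliberately not updated at this point (per the slide).
        return out;
    }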

Doc 6.5 p.6  1. Feature Selection
Basic (Larry) algorithm – build a list of 50 features:
– Divide the image into a 10x10 grid of cells.
– For each cell without a feature from the previous frame, evaluate the interest operator across the cell, choose the best pixel, and add it to a list.
– Sort the list of newly identified features by interest, and add the best to the list of previous-frame features (initially empty) to get 50 features. (Sketch below.)
– Output: left-image feature coords & uncertainty.
Notes:
– Tracking features across multiple frames lets you improve the 3D point model.
– The algorithm works poorly if features are collinear or all far away.
– Choose stereo-trackable features, because stereo error hurts more than tracking error, so horizontal trackability matters much more than vertical trackability.
– Goal: good features, well distributed.
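
A minimal sketch of this grid-based selection (Larry's version, not Yang's). The interest operator is passed in as a callback; the "best = smallest positive score" convention follows the Int_forstner notes two slides below, and the grid/threshold details are assumptions.

    #include <algorithm>
    #include <functional>
    #include <vector>

    struct Feature { int x, y; double score; };

    std::vector<Feature> selectFeatures(
        int width, int height, int gridCols, int gridRows, int maxFeatures,
        const std::vector<Feature>& retained,                // from previous frame
        const std::function<double(int, int)>& interest)     // interest at (x, y)
    {
        const int cellW = width / gridCols, cellH = height / gridRows;
        std::vector<Feature> candidates;
        for (int gy = 0; gy < gridRows; ++gy) {
            for (int gx = 0; gx < gridCols; ++gx) {
                // Skip cells that already hold a feature kept from the last frame.
                bool occupied = false;
                for (const Feature& f : retained)
                    if (f.x / cellW == gx && f.y / cellH == gy) { occupied = true; break; }
                if (occupied) continue;
                // Pick the best (smallest positive) interest value in the cell.
                Feature best{-1, -1, 1e30};
                for (int y = gy * cellH; y < (gy + 1) * cellH; ++y)
                    for (int x = gx * cellW; x < (gx + 1) * cellW; ++x) {
                        double s = interest(x, y);
                        if (s > 0.0 && s < best.score) best = {x, y, s};
                    }
                if (best.x >= 0) candidates.push_back(best);
            }
        }
        // Keep the strongest new candidates until the list reaches maxFeatures.
        std::sort(candidates.begin(), candidates.end(),
                  [](const Feature& a, const Feature& b) { return a.score < b.score; });
        std::vector<Feature> out = retained;
        for (const Feature& c : candidates) {
            if ((int)out.size() >= maxFeatures) break;
            out.push_back(c);
        }
        return out;
    }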

Doc 6.5 p.7  FeaturesSelection()
GenerateInterestImage() on the left image via Int_forstner().
SelectMinima(): pass in the border width; divide the remaining image into 10x10 cells, rounding up; list the minimum nonzero value in each full cell; sort the list; add to the existing feature list, up to numfeatures (which is set to 200 in init), excluding features too close to existing features.
– Question: why round up to find the grid, if you don't look for features in the final row and column?
Set the resulting feature list as the left image's feature list.
Do not set Σ_L – it will be set by stereo matching.

Doc 6.5 p.8  Int_forstner()
Find gradients: (pixel[i-1]/2 – pixel[i+1]/2).
For each pixel in range:
– Sum gx·gx, gx·gy, gy·gy across a window around the pixel.
– Put them into a matrix, then find & record the largest eigenvalue.
– The code does an inversion, a scaling by the determinant, and a scaling by 4 because the derivatives weren't done right, but these all factor out – that code should be eliminated.
The smallest positive answer is the best feature. (Sketch below.)
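
A sketch of the interest operator as described above: central-difference gradients, gradient-product sums over a window, then a score from the 2x2 moment matrix. Since the slide says the extra inversion/scaling factors out, the score here is taken as trace/det (the sum of the inverse eigenvalues), so that, as on the slide, the smallest positive value marks the best feature. The exact scalar Yang records may differ; this is an assumption.

    #include <vector>

    // img is row-major, width*height, 8-bit grey; (cx, cy) is the pixel of interest.
    double interestScore(const std::vector<unsigned char>& img, int width, int height,
                         int cx, int cy, int halfWin)
    {
        double sxx = 0.0, sxy = 0.0, syy = 0.0;
        for (int y = cy - halfWin; y <= cy + halfWin; ++y) {
            for (int x = cx - halfWin; x <= cx + halfWin; ++x) {
                if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return -1.0;
                // Central differences (the sign convention does not matter here,
                // since only products of gradients are summed).
                double gx = (img[y * width + (x + 1)] - img[y * width + (x - 1)]) / 2.0;
                double gy = (img[(y + 1) * width + x] - img[(y - 1) * width + x]) / 2.0;
                sxx += gx * gx; sxy += gx * gy; syy += gy * gy;
            }
        }
        double det   = sxx * syy - sxy * sxy;   // product of the two eigenvalues
        double trace = sxx + syy;               // sum of the two eigenvalues
        if (det <= 0.0) return -1.0;            // degenerate gradient structure: reject
        return trace / det;                     // small positive value = strong corner
    }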

Doc 6.5 p.9  Feature Selection Issues and Improvements
The algorithm works poorly if features are collinear or all far away.
– Consider a filter to detect such problems.
Investigate whether 50 features are as good as 200.
– Yang says you lose around 50% each frame, but maybe you keep the top 50 in general.
– If you can use fewer features, consider Larry's original 10x10 grid, which has fewer cells to search.
Retain features between frames. Yang does not. If you retain features, do not apply the interest operator or select minima for cells with a retained feature, and modify how and how many features you add to the feature list. Some sorting would be involved to easily tell which cells to search.
If you retain features across frames, you could implement step 7 to improve the model. Each feature would store its model point P or the address of P, so that reordering the feature list does not disturb the feature-to-model-point map.
If you retain features, consider allowing a range in the number of features, for instance 40-60.
– If you have at least 40, skip feature selection. Otherwise fill up to 60.
– Use the time you save for more iterations of step 6.
Consider using stereo to detect & reject features on occlusion contours, because they are unstable.
If you use a 10x10 grid and you determine that part of the image is low texture, consider resizing the grid to ignore the low-texture regions while maintaining 10x10.
Consider weighting the horizontal gradient more heavily in the interest operator, because horizontal tracking matters more (stereo is more important than 2D tracking).

Doc 6.5 p.10  2. Stereo Matching
Basic (Larry) algorithm:
– Correlate in a pyramid. Olson uses (16x, 4x, 1x); Yang uses (2x, 1x). If depth is limited, limit the disparities.
– Use the epipolar line to constrain the search window. If no match is found near the epipolar line, reject.
– Threshold the residue (at each pyramid level).
– Triangulate to get depth at the features. (Simplified sketch below.)
– It is not obvious whether subpixel disparity is intended.
– Data near the image edges are bad.
– Outputs: right-image coords & deviation; 3D coords and deviation.
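
For intuition only: with rectified cameras (purely horizontal epipolar lines), the triangulation mentioned above reduces to depth = focal length x baseline / disparity. The actual code triangulates general rays (see FeaturesRayGaps on p.17); this simplified form is an illustration, not what VisOdom does.

    // focalPx in pixels, baselineM in metres, disparityPx in pixels.
    double depthFromDisparity(double focalPx, double baselineM, double disparityPx) {
        if (disparityPx <= 0.0) return -1.0;   // at infinity or behind: no depth
        return focalPx * baselineM / disparityPx;
    }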

Doc 6.5 p.11  Yang Stereo Matching
GeneratePyramidsMatch – copies the left-image pyramid from the 2nd pyramid into the 1st pyramid, and makes a new pyramid for the right image in the 2nd pyramid.
– It would be faster to swap pointers rather than copy.
FeaturesStereoMatch() – uses the pyramids & epipolar search windows to find each feature in the right image and its 2D covariance matrix, which is taken to be the same in both images.
FeaturesRayGaps() – verifies that the epipolar lines mostly cross at the feature location.
FeaturesCov() – fills in each feature's cov3d.

Doc 6.5 p.12  FeaturesStereoMatch()
– Find min & max disparity based on min & max range.
– For each feature, Pose2EpipolarLine gives the epipolar line in the right image.
– Say (xl, yl) are the feature coords in the left image. Make a box in the right image using columns x = xl - mindisp … xl - maxdisp, and rows from 5 pixels above the highest value of the epipolar line in those columns to 5 pixels below the lowest value. (If the epipolar line is horizontal, there is probably a bug in the window dimensions.) (Sketch below.)
– Call MatchOneAffineFeature() to match a feature from the 1st pyramid to the image in the 2nd pyramid, using the above search range. This finds the coords in the second image and cov_stereo, which serves as both Σ_L⁻¹ and Σ_R⁻¹ for the feature.
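
A sketch of the search-window construction just described. The epipolar line is taken here as y = a·x + b in right-image coordinates; the real code gets it from Pose2EpipolarLine, so that parametrisation and the Window type are assumptions. Disparity d maps left column xl to right column xl - d.

    #include <algorithm>
    #include <cmath>

    struct Window { int x0, x1, y0, y1; };   // inclusive pixel bounds

    Window stereoSearchWindow(double xl, double a, double b,
                              double minDisp, double maxDisp, int margin = 5)
    {
        const double xLeft  = xl - maxDisp;        // smallest right-image column
        const double xRight = xl - minDisp;        // largest right-image column
        // Evaluate the epipolar line at both ends of the column range.
        const double yA = a * xLeft + b, yB = a * xRight + b;
        Window w;
        w.x0 = (int)std::floor(xLeft);
        w.x1 = (int)std::ceil(xRight);
        w.y0 = (int)std::floor(std::min(yA, yB)) - margin;  // 5 px above the line's highest point
        w.y1 = (int)std::ceil(std::max(yA, yB)) + margin;   // 5 px below its lowest point
        return w;
    }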

Doc 6.5 p.13  MatchOneAffineFeature()
– Shrink the search window dims by a factor of 2 for each pyramid level, but keep them at least 4x4.
– For each pyramid level:
  Get img2 from the parameter pyramid and img1 (the template) from this pyramid. The feature is the target feature's coords at this resolution.
  Call nav_corimg(). Pass a winsize window of img1 around the integer feature location, and a (winsize + searchsize) window of img2 around the feature, where searchsize was determined from the epipolar line. This finds the location and inverse covariance of the correlation peak. Add the img1 feature's non-integer offset – HACK.
  If the result was low confidence, return an error. Otherwise change the search window to 2 × the recovered move (the actual move at the next pyramid level) ± 5 pixels. (Sketch below.)
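
A skeleton of this coarse-to-fine search. The correlate callback stands in for nav_corimg() at one pyramid level; types, the window bookkeeping, and the confidence handling are illustrative assumptions, not the real MatchOneAffineFeature() interface.

    #include <algorithm>
    #include <functional>

    struct MatchResult { double dx, dy; double confidence; };

    // correlate(level, cx, cy, halfX, halfY): correlate at this pyramid level in a
    // window of half-size (halfX, halfY) centred on the predicted offset (cx, cy);
    // returns the recovered offset and a confidence value.
    bool trackCoarseToFine(int numLevels, int searchHalfX, int searchHalfY,
                           const std::function<MatchResult(int, double, double, int, int)>& correlate,
                           double minConfidence, double& outDx, double& outDy)
    {
        double cx = 0.0, cy = 0.0;   // predicted offset, expressed at the current level
        for (int level = numLevels - 1; level >= 0; --level) {
            // The initial window shrinks by 2 per pyramid level (never below 4);
            // after the first correlation it is just +/- 5 px around the prediction.
            int hx = (level == numLevels - 1) ? std::max(4, searchHalfX >> level) : 5;
            int hy = (level == numLevels - 1) ? std::max(4, searchHalfY >> level) : 5;
            MatchResult m = correlate(level, cx, cy, hx, hy);
            if (m.confidence < minConfidence) return false;  // low confidence: reject feature
            outDx = m.dx;
            outDy = m.dy;
            cx = 2.0 * m.dx;   // the recovered move doubles when stepping to the
            cy = 2.0 * m.dy;   // next, finer pyramid level
        }
        return true;
    }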

Doc 6.5 p.14  nav_corimg() p.1
Well documented, in corr_image.c.
Pseudo-normalized correlation of template img1 across window img2:
– 2·Σ(i1·i2) / (Σ(i1·i1) + Σ(i2·i2)), where i1, i2 are pixels in the normalized windows.
– To avoid two-step normalizing, you need Σu1, Σu2, Σu1², Σu2², and Σ(u1·u2), where u1, u2 are the unnormalized images.
– Dan doubled the speed by calculating i1·i2, not i1+i2, and by using int instead of double.
– Returns the correlation peak and covariance. (Sketch of the correlation measure below.)
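
A direct (unoptimised) form of the pseudo-normalised correlation quoted above, 2·Σ(i1·i2) / (Σi1² + Σi2²). The slide does not spell out the normalisation; mean removal per window is assumed here. The real nav_corimg() computes the same quantity from running sums (next slide); this sketch is just the definition.

    #include <vector>

    double pseudoNormalizedCorrelation(const std::vector<double>& w1,
                                       const std::vector<double>& w2)
    {
        const std::size_t n = w1.size();
        if (n == 0 || w2.size() != n) return 0.0;
        // Remove the mean of each window (the assumed "normalization").
        double m1 = 0.0, m2 = 0.0;
        for (std::size_t k = 0; k < n; ++k) { m1 += w1[k]; m2 += w2[k]; }
        m1 /= n; m2 /= n;
        double s12 = 0.0, s11 = 0.0, s22 = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            const double i1 = w1[k] - m1, i2 = w2[k] - m2;
            s12 += i1 * i2; s11 += i1 * i1; s22 += i2 * i2;
        }
        const double denom = s11 + s22;
        return denom > 0.0 ? 2.0 * s12 / denom : 0.0;
    }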

Doc 6.5 p.15  nav_corimg() p.2
Find Σ(u1) and Σ(u1²) before the main loop.
Calculate Σ(u2) and Σ(u2²) for each column in the top swath.
For each row swath:
– Calculate ΣΣu2, ΣΣu2², and the correlation for the leftmost position.
– March ΣΣu2 and ΣΣu2² forward and correlate the remaining positions.
– March all Σu2 and Σu2² down one swath.
Fit a 2D quadratic to the 3x3 around the best correlation score:
– Return an error if a neighbor of the "best score" ties or is better, i.e. if the peak is more of a ridge.
– Trust their equations for fitting 9 points to a biquadratic.
– Solve for the subpixel offset of the peak, add it to the best-score pixel coords, and interpolate the correlation score there – trust their equations.
– Generate the "covariance vector". If the quadratic is Ax² + By² + Cxy + …, then Σ_L = [2A C; C 2B]. Calculate Σ_L⁻¹ and store the xx, xy, and yy terms. Assume that Σ_L⁻¹ and Σ_R⁻¹ have the same value. (Sketch of the fit below.)
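
A sketch of the subpixel step described above: least-squares fit of f(x,y) = A·x² + B·y² + C·x·y + D·x + E·y + F to the 3x3 correlation scores around the best pixel (x, y in {-1, 0, 1}), then solve grad f = 0 for the subpixel offset and form the 2x2 matrix [2A C; C 2B] from which the slide's covariance terms come. The coefficient formulas below are a reconstruction of the standard normal equations for this 3x3 stencil, not the equations in corr_image.c.

    #include <cmath>

    struct SubpixelPeak { double dx, dy; double a, b, c; bool ok; };

    // s[j][i] = correlation score at (x = i-1, y = j-1) around the best pixel.
    SubpixelPeak fitPeak3x3(const double s[3][3])
    {
        double sf = 0, sxf = 0, syf = 0, sxxf = 0, syyf = 0, sxyf = 0;
        for (int j = 0; j < 3; ++j)
            for (int i = 0; i < 3; ++i) {
                const double x = i - 1, y = j - 1, f = s[j][i];
                sf += f; sxf += x * f; syf += y * f;
                sxxf += x * x * f; syyf += y * y * f; sxyf += x * y * f;
            }
        // Least-squares coefficients for the 3x3 stencil.
        const double A = sxxf / 2.0 - sf / 3.0;
        const double B = syyf / 2.0 - sf / 3.0;
        const double C = sxyf / 4.0;
        const double D = sxf / 6.0;
        const double E = syf / 6.0;
        const double det = 4.0 * A * B - C * C;        // determinant of [2A C; C 2B]
        SubpixelPeak p{0.0, 0.0, A, B, C, false};
        if (std::fabs(det) < 1e-12) return p;          // ridge / degenerate peak: reject
        p.dx = (C * E - 2.0 * B * D) / det;            // solution of grad f = 0
        p.dy = (C * D - 2.0 * A * E) / det;
        p.ok = std::fabs(p.dx) <= 1.0 && std::fabs(p.dy) <= 1.0;  // must stay inside the 3x3
        return p;
    }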

Doc 6.5 p.16  FeaturesRayGaps() – new
Does the old FeaturesRayGaps except that it records the camera matrix.
– Bug: if dotp == 0, m2 should be –dotbv2.
Then does the same as the old FeaturesCov, except with different equations for H0:
– Larry's document defines H0 as dproj(P)/dP at P = P0, where proj(P) is the projection of P, and then Σ⁻¹ = H0ᵀ · Σvq⁻¹ · H0.
– Yang instead uses P′ = dP/dproj(P) [= H0⁻¹], and thus Σ = H0⁻¹ · Σvq · H0⁻ᵀ = P′ · Σvq · P′ᵀ. This assumes the final parameter to Image2DToRay3D is d3Dpos/d2Dpos for points on the ray.
– The new equations find and use Σ, not Σ⁻¹. nav_corimg() and the motion estimator must be fixed to conform.
– Yang's code refills m1, m2 halfway through. That is an error, but it has negligible effect.

Doc 6.5 p.17  FeaturesRayGaps() – old
For each good feature in the new image:
– Convert the feature location in each image into a 3D ray (direction, pinhole location, camera covariance).
– Find the point on each ray at closest approach.
– Project both points into both images.
– If the Manhattan distance between the projections of the closest-approach points in either image exceeds a threshold, reject the feature.
– Else retain the feature in 3D at the average of the two closest-approach points. (Sketch below.)
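
A sketch of the two-ray step above: given the left and right viewing rays (origin plus unit direction), find the points of closest approach and return their midpoint as the 3D feature position; the gap between the two points is what the reprojection/Manhattan-distance test thresholds. This is plain textbook skew-line geometry, not the VisOdom source.

    #include <cmath>

    struct V3 { double x, y, z; };
    static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static V3 add(V3 a, V3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    static V3 scale(V3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
    static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // o1,d1 = left ray; o2,d2 = right ray (d1, d2 unit length).
    // Returns false if the rays are (nearly) parallel.
    bool triangulateRays(V3 o1, V3 d1, V3 o2, V3 d2, V3& midpoint, double& gap)
    {
        const V3 r = sub(o2, o1);
        const double d12 = dot(d1, d2);
        const double denom = 1.0 - d12 * d12;
        if (denom < 1e-12) return false;            // parallel rays: no closest point
        const double t1 = (dot(r, d1) - d12 * dot(r, d2)) / denom;
        const double t2 = (d12 * dot(r, d1) - dot(r, d2)) / denom;
        const V3 p1 = add(o1, scale(d1, t1));       // closest point on the left ray
        const V3 p2 = add(o2, scale(d2, t2));       // closest point on the right ray
        const V3 g = sub(p1, p2);
        gap = std::sqrt(dot(g, g));                 // ray "gap" at closest approach
        midpoint = scale(add(p1, p2), 0.5);         // retained 3D position (the average)
        return true;
    }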

Doc 6.5 p.18  FeaturesCov() – old
Follows Larry thesis A.2 (pp. 143-4).
For each good feature in the new left image:
– Ip1, pos3d, pos1, ray1 belong to the left-image feature; Ip2, pos2, ray2 to the right-image feature.
– The cov_stereo[3] array gives the xx, xy, and yy terms of Σ_L⁻¹ or Σ_R⁻¹ (both have the same values).
– Generate Σvq⁻¹ from Σ_L⁻¹ and Σ_R⁻¹ as described in the thesis.
– Assume the final parameter to Image2DToRay3D is the transpose of [sx 0 cx; 0 sy cy]·R, where s and c refer to the world-to-pixel scale and the image-center pixel, and R is the camera rotation matrix. That matrix converts world coords into orthographic-projection coords, and you just divide by Z to get screen coords.
– Fill H0 (4x3), which the code calls Ht. It is dprojection(P)/dP. I cannot verify the equations, and now Yang has changed them.
– cov3d = H · Σvq⁻¹ · Ht, the inverse of the covariance matrix.

Doc 6.5 p.19  Correlation Issues
MatchOneAffineFeature() is just translation, not affine.
– Would affine give better stereo results?
– Would KL be better than fitting a parabola for nav_corimg()'s subpixel interpolation?
Tracking also uses MatchOneAffineFeature.
– Tracking then does a homography transform for refinement.
– Perhaps the additional refinement, and even the subpixel interpolation, in MatchOneAffineFeature is wasted there – so perhaps the two uses should be separated, with those steps used only in stereo.

Doc 6.5 p.20  3. Feature Tracking
Original (Larry) algorithm:
– Correlate to find the 1st-left-image features in the 2nd-left-image.
– Use the 3D world model and external-source motion estimates to predict the search window size and position. (Yang uses odometry, no model.)
– Use stereo matching to find the 2nd-right-image features.
– Threshold on residue.

Doc 6.5 p.21  Yang Feature Tracking
GeneratePyramidsTrack() – generate a new 2nd pyramid (for the 2nd left image).
FeaturesTrack() into the new left image:
– For each good pixel in the 1st left image that projects onto the 2nd left image:
  Define the feature window around 0 + the expected 2D motion, based on the feature's 3D position and the expected camera motion. (Sketch below.)
  MatchOneAffineFeature() to get the new 2D position.
  If the new feature position is good: computeLocalHomography() to improve the 2D estimate; if the residue is low, use the improved estimate.
Repeat stereo matching into the new right image.
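
A minimal sketch of the "expected 2D motion" prediction above: move the feature's 3D position by the expected camera motion (from odometry / the log) and project it to get the centre of the tracking search window. The real code goes through its full camera model (the A, H, V, O, C vectors mentioned on p.29); the pinhole projection and parameter names here are simplified stand-ins.

    #include <array>

    struct Vec3 { double x, y, z; };

    // R is row-major 3x3; (R, T) map old-camera coordinates to new-camera coordinates.
    Vec3 transformPoint(const std::array<double, 9>& R, const Vec3& T, const Vec3& p) {
        return { R[0]*p.x + R[1]*p.y + R[2]*p.z + T.x,
                 R[3]*p.x + R[4]*p.y + R[5]*p.z + T.y,
                 R[6]*p.x + R[7]*p.y + R[8]*p.z + T.z };
    }

    // Returns false if the point lands behind the new camera.
    bool predictFeature(const std::array<double, 9>& R, const Vec3& T, const Vec3& P,
                        double fx, double fy, double cx, double cy,
                        double& u, double& v)
    {
        const Vec3 q = transformPoint(R, T, P);
        if (q.z <= 0.0) return false;
        u = fx * q.x / q.z + cx;   // predicted column in the new left image
        v = fy * q.y / q.z + cy;   // predicted row
        return true;
    }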

Doc 6.5 p.22  computeLocalHomography()
Put windows from (-4,-4) to (7,7) about the old and estimated-new feature locations on their images.
In reading order, MatchOneAffineFeature() to track pixels of these windows until you find 2 whose correlation > 0.8. Use those 2 plus the main feature's locations to compute the homography coefficients.
For each pyramid level:
– For each pixel in the window around the old position: apply the homography to find the equivalent pixel at the new position; accumulate statistics on old-image vs. new-transformed-image pixel intensities.
– If that correlation is high, init the final 2 homography coefficients.
– mrq minimize, probably to improve the coefficients.
Use the homography to update the new pose & covariance.

Doc 6.5 p.23  Feature Tracking Issues and Possible Improvements
Why do we need the pyramid if we have correlation?
– Consider correlation vs. pyramid.
– Do correlation and pyramid only if fine tracking fails.
Consider the following order of events:
– Use any external data to estimate rotation; modulate the correlation/pyramid size by the credibility of the external data.
– Use vision to refine the roll estimate.
– Begin tracking with features high in the image (far away). Use correlation and/or pyramid. Refine the pitch & yaw estimate. Use the rotation estimate and 3D model to predict the locations of nearer features. Use a smaller/no search window and/or pyramid as accuracy improves. If there are no distant features, choose a large one high in the image (per Clark's paper).
Consider an affine tracker instead of homography – faster?
Is a 3-point homography credible? Perhaps the 3 points are just for initialization?

Doc 6.5 p.24  4. Rigidity #1
Original (Larry) algorithm:
– Require that the change in distance between 3D feature coords be below a threshold.
– Reject the worst offending features & recalculate.
RigidityTest() – one big loop:
– For each pair of features, VSrigidity_ai() does Larry thesis section 5.2.1 up through calculating a_i, the change in distance between the features, normalized for the uncertainty in their measurements. Sum a_i into A[] for each feature, and track which feature has the highest A[].
– If the highest A[] exceeds a threshold, remove the associated feature; else break from the loop. (Sketch below.)
Issues and possible improvements:
– Perhaps watch the evolution of an offender.
– Perhaps un-reject points that shape up.
– Perhaps predict & re-seek offending points.
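
A skeleton of the remove-worst-and-repeat loop above. pairScore(i, j) stands in for VSrigidity_ai() (the uncertainty-normalised change in inter-feature distance, Larry thesis 5.2.1); the threshold semantics follow the slide, everything else is an illustrative assumption.

    #include <functional>
    #include <vector>

    // good[k] marks features that survive the test; modified in place.
    void rigidityTest(std::vector<bool>& good,
                      const std::function<double(int, int)>& pairScore,
                      double threshold)
    {
        const int n = (int)good.size();
        for (;;) {
            std::vector<double> A(n, 0.0);                 // accumulated a_i per feature
            for (int i = 0; i < n; ++i) {
                if (!good[i]) continue;
                for (int j = i + 1; j < n; ++j) {
                    if (!good[j]) continue;
                    const double a = pairScore(i, j);      // distance-change score a_i
                    A[i] += a;
                    A[j] += a;
                }
            }
            int worst = -1;
            for (int i = 0; i < n; ++i)
                if (good[i] && (worst < 0 || A[i] > A[worst])) worst = i;
            if (worst < 0 || A[worst] <= threshold) break; // everything consistent: done
            good[worst] = false;                           // drop the worst offender, repeat
        }
    }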

Doc 6.5 p.25  5-6. Motion estimation
5. Least Squares Fit – initial motion estimate:
– Could skip this step if you have odometry.
– Err = Σ_i ( ||Q_i+1 – R·P_i – T||² / (det(Σv_i) + det(Σv_i+1)) )
– P_i is the model point, based on observations Q_0 … Q_i.
6. ML Refinement – iterative Maximum Likelihood:
– Err = Σ (eᵀ·W·e), where e = Q_i+1 – R·P_i – T and W = Σv⁻¹, the inverse covariance of the noise in e.
– Linearize about Θ_n, T_n and solve for Θ_n+1, T_n+1. Equations on pp. 23, 150.
– Also gives Σ_M, the covariance (confidence) in the final Θ, T. Confidence is higher in closer points.
– Apparently critical to good results.
Presumably matching 3D to 3D is faster or more accurate than matching the 3D model to 2D images, Kalman style.

Doc 6.5 p.26  Yang steps 5-6
MotionEstimation():
– Init temp cameras, but provide an irrelevant pose.
– Make sure we have enough points.
– Schoneman_least_middian_square() – weighted least-squares solution for R and T describing the motion of points Q_i+1 in the current frame relative to their counterparts P_i in the previous frame.
– Step 6: the big iteration.
– Move the cameras of the current image by the inverse of the world motion (the cams of the current image are in world coords).
– ComputePose(): member estMotion == param estPose = the new cameras' attitude and position relative to frame 0.
– Find the covariance (Σ_M, aka estMotion.covariance) by eqn. B.11.

Doc 6.5 p.27  Schoneman_least_middian_square()
Weighted least-squares solution for R and T describing the motion of points Q_i+1 in the current frame and their counterparts P_i in the previous frame – not the motion of the camera. Returns R and T in params.
The derivation in Larry's thesis, sec. B.1, says:
– weight points by w = 1 / (det(Σv_i) + det(Σv_i+1))
– find E = Σ(w·Q_i+1·P_iᵀ) – (Σ w·Q_i+1)(Σ w·P_i)ᵀ / (Σ w), then take the SVD E = U·S·Vᵀ,
– then R = U·Vᵀ and T = ((Σ w·Q_i+1) – R·(Σ w·P_i)) / (Σ w). (Sketch below.)
Yang follows this except that w = 1 / (|Q_i+1| + |P_i|):
– Suppose stereo is much worse than 2D tracking, so det(Σv_i) is dominated by the variance in the forward direction (perpendicular to the image plane); we could use that instead of det(Σv_i).
– Define FW as the forward distance from the baseline to the feature.
– Further suppose that Σv_i = J·Jᵀ, which seems to (incorrectly) assume the 2D feature covariance = I.
– From there, comments in the code show that the forward variance ∝ FW⁴, where all features share the same constant of proportionality, which we can drop from our equations.
– Further suppose that the feature 3D coordinates reference an origin on the baseline, the feature is far away, and the field of view is small, so the feature is at roughly (FW, 0, 0) – then we can use |Q|⁴ instead of FW⁴ in the weight.
– Finally, assume that |Q|² (the standard deviation) or even |Q| is a reasonable substitute for the variance.
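
A sketch of the weighted least-squares fit described above (Larry thesis B.1): build the weighted cross-covariance E from point pairs (P_i, Q_i+1), take its SVD, and read off R = U·Vᵀ and T. Eigen is used here for the SVD; the reflection guard (det(R) < 0) is a common addition that the slide does not mention.

    #include <Eigen/Dense>
    #include <vector>

    void fitMotion(const std::vector<Eigen::Vector3d>& P,   // previous-frame points
                   const std::vector<Eigen::Vector3d>& Q,   // current-frame points
                   const std::vector<double>& w,            // e.g. 1/(|Q|+|P|), per the slide
                   Eigen::Matrix3d& R, Eigen::Vector3d& T)
    {
        double sw = 0.0;
        Eigen::Vector3d swQ = Eigen::Vector3d::Zero(), swP = Eigen::Vector3d::Zero();
        Eigen::Matrix3d swQP = Eigen::Matrix3d::Zero();
        for (std::size_t k = 0; k < P.size(); ++k) {
            sw   += w[k];
            swQ  += w[k] * Q[k];
            swP  += w[k] * P[k];
            swQP += w[k] * Q[k] * P[k].transpose();
        }
        // Weighted cross-covariance and its SVD.
        const Eigen::Matrix3d E = swQP - swQ * swP.transpose() / sw;
        Eigen::JacobiSVD<Eigen::Matrix3d> svd(E, Eigen::ComputeFullU | Eigen::ComputeFullV);
        R = svd.matrixU() * svd.matrixV().transpose();
        if (R.determinant() < 0.0) {               // guard against returning a reflection
            Eigen::Matrix3d V = svd.matrixV();
            V.col(2) *= -1.0;
            R = svd.matrixU() * V.transpose();
        }
        T = (swQ - R * swP) / sw;                  // weighted-centroid relation for T
    }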

Doc 6.5 p.28  Big Iteration
R (aka R_0) and Θ (aka Θ_0) = the rotation from before the iteration, or from the previous iteration.
For each good feature Q in the old left image:
– Use Larry thesis eqn B.6 (3rd eqn) and pp. 152-153 to get Jj.
– Use Larry thesis eqn B.7 (3rd eqn) to get Qj from the previous and current feature.pos3d. (You would probably use Pj, not Qpj, for the previous feature position.)
– See Larry thesis p. 23 to get Wj (the inverse covariance of the Q noise) from Σ_pj and Σ_cj (the previous and current feature.cov3d).
– Sum the 6 terms shown in B.8.
Use eqns B.9 to find V1 (Θ_hat) and V2 (T_hat).
Accept Θ = Θ_hat, T = T_hat, and make R from Θ.
Loop until out of iterations (return error) or until the change in Θ (radians) plus the fractional change in T falls below a threshold (continue).

Doc 6.5 p.29  Aux functions
TransformStereoCamerasRot() takes the new left-cam position & this frame's rotation; rotates A, H, V, O and copies R for both cameras; assigns the left C; and updates the right C by rotating about the baseline.
ComputePose():
– Takes the raw (frame 0) cams and the current cams, and generates the relative rotation and translation since frame 0.
– Each frame's coordinate system is left/right along the camera baseline, forward perpendicular to that on the plane containing the baseline and the left camera's A vector, and centered on the left camera's C vector.
MotionEstimationFusion() – makes one set of features from the front and rear images, then does the same as MotionEstimation(), and moves both sets of cameras afterwards.

Doc 6.5 p.30  7-8. Motion estimation
7. Model Refinement:
– The equations on p. 26 improve the estimate of the model points P_i.
– In practice, this does not change R, T.
8. Rigidity Constraint #2:
– For each point: Err = Q – R·P – T as before; σ = a diagonal element of Σ_v, minus a function of Σ_M (see p. 159). The point is bad if err > K·σ for some K, say 3.
– Reject the worst offender, then return to some earlier step, perhaps 5 or 6.
Yang does not track points (P), so he does not do these steps.

Doc 6.5 p.31  Motion Estimation Issues and Possible Improvements
Make a durable list of 3D feature positions ("P"):
– Init from the recovered 3D locations of new features.
– Implement thesis step 7 to update the 3D locations using each new frame's estimate.
– Modify the earlier steps to use this list rather than the previous frame's 3D estimate.
Implement step 8 (the second rigidity test).

Doc 6.5 p.32  Things to improve
Need an image sequence to test on:
– Mast cam, approach – or carry a cam in a rover-like pattern.
– To test 1-pixel target recovery.
– So we can compare visodom, 2D-only, ICP-only, etc.
– Perhaps two image sets, with and without an LED through a pinhole, so we can see the actual pixel and compare with our results.

