Presentation on theme: "We not only see but we look, we not only touch we feel, JJ.Gibson"— Presentation transcript:
1 Active Perception
"We not only see but we look, we not only touch we feel." J. J. Gibson
2 Active Perception vs. Active Sensing
WHAT IS ACTIVE SENSING?
In the robotics and computer vision literature, the term “active sensor” generally refers to a sensor that transmits (generally electromagnetic radiation, e.g., radar, sonar, ultrasound, microwaves and collimated light) into the environment and receives and measures the reflected signals. We believe that the use of active sensors is not a necessary condition on active sensing, and that sensing can be performed with passive sensors (that only receive, and do not emit, information), employed actively.
3 Active Sensing
Hence the problem of Active Sensing can be stated as a problem of controlling strategies applied to the data acquisition process which will depend on the current state of the data interpretation and the goal or the task of the process. The question may be asked, “Is Active Sensing only an application of Control Theory?” Our answer is: “No, at least not in its simple version.” Here is why:
4 Active Perception
1) The feedback is performed not only on sensory data but on complex processed sensory data, i.e., various extracted features, including relational features.
2) The feedback is dependent on a priori knowledge and models that are a mixture of numeric/parametric and symbolic information.
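The two points above can be illustrated with a minimal sketch in Python. It is not taken from the presentation: the simulated sensor, the `exposure` parameter, the contrast feature and the goal threshold are all hypothetical stand-ins. The loop feeds back on a processed feature (contrast) rather than on raw pixel values, and the stopping rule comes from a task-level goal.

```python
from dataclasses import dataclass


@dataclass
class SensorState:
    exposure: float  # hypothetical acquisition parameter under feedback control


def acquire(state, scene_brightness):
    """Simulated passive sensor: scaled intensities clipped to [0, 1]."""
    return [min(1.0, max(0.0, b * state.exposure)) for b in scene_brightness]


def extract_contrast(pixels):
    """Processed feature: the spread of intensities, not the raw data."""
    return max(pixels) - min(pixels)


def active_sensing_loop(scene, goal_contrast=0.5, max_steps=20):
    """Adjust the acquisition parameter until the feature meets the goal."""
    state = SensorState(exposure=0.1)
    feature = 0.0
    for step in range(max_steps):
        feature = extract_contrast(acquire(state, scene))
        if feature >= goal_contrast:   # goal/task decides when to stop
            return state, feature, step
        state.exposure *= 1.5          # feedback acts on data acquisition
    return state, feature, max_steps


state, feature, steps = active_sensing_loop([0.2, 0.4, 0.6])
print(f"exposure={state.exposure:.3f} contrast={feature:.3f} steps={steps}")
```

A passive sensor is used on purpose here: what makes the sensing active is the control of acquisition, not the emission of energy.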
5 Active Perception turned into an engineering agenda
The implications of the active sensing/perception approach are the following:
1) The necessity of models of sensors. This is to say, first, the model of the physics of sensors as well as the noise of the sensors. Second, the model of the signal processing and data reduction mechanisms that are applied on the measured data. These processes produce parameters with a definite range of expected values plus some measure of uncertainties. These models shall be called Local Models.
6 Engineering agenda, cont.
2) The system (which mirrors the theory) is modular as dictated by good computer science practices and interactive, that is, it acquires data as needed. In order to be able to make predictions on the whole outcome, we need, in addition to models of each module (as described in 1) above), models for the whole process, including feedback. We shall refer to these as Global Models.
3) Explicit specification of the initial and final state/goal.
If the Active Vision theory is a theory, what is its predictive power? There are two components to our theory, each with certain predictions:
7 Active Vision theory
1) Local models. At each processing level, local models are characterized by certain internal parameters. One example of a local model is a region growing algorithm whose internal parameters are the local similarity and the size of the local neighborhood. Another example is an edge detection algorithm whose parameter is the width of the band-pass filter in which one is detecting the edge effect. These parameters predict a) the definite range of plausible values, and b) the noise and uncertainty which will determine the expected resolution, sensitivity and robustness of the output results from each module.
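One way to make a local model concrete is a small record that stores a module's internal parameters together with the range and uncertainty it predicts for its output. The sketch below is only an illustration under assumptions: the class name, the field names, the example numbers and the k-sigma plausibility check are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalModel:
    name: str
    params: dict            # internal parameters, e.g. filter width
    expected_range: tuple   # (low, high) plausible output values
    sigma: float            # uncertainty of the output

    def is_plausible(self, value, k=3.0):
        """Accept values inside the expected range widened by k * sigma."""
        low, high = self.expected_range
        return low - k * self.sigma <= value <= high + k * self.sigma


# Hypothetical edge detector whose parameter is the band-pass filter width.
edge_model = LocalModel(
    name="edge_detector",
    params={"filter_width": 5},
    expected_range=(0.0, 255.0),
    sigma=2.0,
)

print(edge_model.is_plausible(100.0))  # inside the predicted range
print(edge_model.is_plausible(300.0))  # outside even after widening
```

A module output that fails the check is a signal to the rest of the system that the parameters, or the data acquisition, need to be revisited.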
8 Active Vision, cont.
2) Global models characterize the overall performance and make predictions on how the individual modules will interact, which in turn will determine how intermediate results are combined. The global models also embody the global external parameters, the initial and final global state of the system. The basic assumption of the Active Vision approach is the inclusion of feedback into the system and gathering data as needed. The global model represents all the explicit feedback connections, parameters, and the optimization criteria which guide the process.
9 Control Strategies
Three distinct control stages proceed in sequence: initialization, processing in midterm, completion of the task. Strategies are divided with respect to the tradeoff between how much data measurement the system acquires (data driven, bottom-up) and how much a priori or acquired knowledge the system uses at a given stage (knowledge driven, top-down). Of course, there is also the strategy which combines the two.
10 Bottom up and Top down process
To eliminate possible ambiguities with the terms bottom-up and top-down, we define them here. Bottom-up (data driven), in this discussion, is defined as a control strategy where no concrete semantic, context dependent model is available, as opposed to the top-down strategy where such knowledge is available.
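The definition above reduces to a simple rule: the strategy at a given stage is top-down exactly when a semantic, context dependent model is available. A minimal sketch, assuming hypothetical names (`Strategy`, `choose_strategy`, `plan`) and a plain dictionary as the stand-in for a semantic model:

```python
from enum import Enum


class Strategy(Enum):
    BOTTOM_UP = "data driven"
    TOP_DOWN = "knowledge driven"


# The three control stages named in the presentation.
STAGES = ("initialization", "processing", "completion")


def choose_strategy(semantic_model):
    """Top-down only if a semantic, context dependent model is available."""
    return Strategy.TOP_DOWN if semantic_model is not None else Strategy.BOTTOM_UP


def plan(models_by_stage):
    """Pick a strategy per stage from whatever knowledge is available then."""
    return {stage: choose_strategy(models_by_stage.get(stage)) for stage in STAGES}


# Hypothetical run: knowledge about the object only exists by the middle stage.
for stage, strategy in plan({"processing": {"object": "chair"}}).items():
    print(stage, "->", strategy.value)
```

A combined strategy would mix both sources of evidence within one stage; this sketch only covers the two pure cases.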
11 GOALS/TASKS
Different tasks will determine the design of the system, i.e. the architecture. Consider the following tasks:
Manipulation
Mobility
Communication and interaction of machine to machine, or people to people via digital media, or people to machine.
12 Goal/Task
Geographically distributed communication and interaction using multimedia (vision primarily) over the Internet.
We are concerned primarily with unspoken communication: gestures and body motion.
Examples are: coordinated movement such as dance, physical exercises, training of manual skills, remote guidance of physical activities.
13 Note
Recognition and learning will play a role in all the tasks.
14 Environments/context
Serves as a constraint in the design. We shall consider only the constraints relevant to the visual task that serves to accomplish the physical activity. For example: in the manipulation task, the size of the object will determine not only the data acquisition strategy but also the design of the vision system (choice of field of view, focal length, illumination, and spatial resolution). Think of moving furniture vs. picking up a coin.
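The furniture vs. coin contrast can be put into numbers with the pinhole camera approximation (valid when the working distance is much larger than the focal length). The sketch below is illustrative only: the 6.4 mm sensor width, the 640 pixel row and the scene dimensions are assumed values, not figures from the presentation.

```python
import math


def focal_length_mm(sensor_width_mm, distance_mm, field_width_mm):
    """Pinhole approximation: f = sensor_width * distance / field_width."""
    return sensor_width_mm * distance_mm / field_width_mm


def field_of_view_deg(distance_mm, field_width_mm):
    """Horizontal angle needed to cover field_width at the given distance."""
    return math.degrees(2.0 * math.atan(field_width_mm / (2.0 * distance_mm)))


def resolution_mm_per_pixel(field_width_mm, pixels_across):
    """Size of the scene patch that falls on one pixel."""
    return field_width_mm / pixels_across


SENSOR_WIDTH_MM = 6.4   # assumed sensor width
PIXELS_ACROSS = 640     # assumed pixels per row

# Assumed scenes: (name, working distance in mm, field width in mm).
for name, distance, width in [("coin", 500.0, 50.0), ("furniture", 4000.0, 2500.0)]:
    f = focal_length_mm(SENSOR_WIDTH_MM, distance, width)
    fov = field_of_view_deg(distance, width)
    res = resolution_mm_per_pixel(width, PIXELS_ACROSS)
    print(f"{name}: f={f:.1f} mm, fov={fov:.1f} deg, {res:.2f} mm/pixel")
```

Under these assumptions the coin calls for a long focal length and a narrow field of view, while the furniture calls for the opposite, which is why one fixed optical design rarely serves both tasks.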
15 Environment/context
Another example: Mobility. There is a difference if the mobility is on the ground, or in the air looking down or up. The position and orientation of the observer will determine the interpretation of the signal. Furthermore, there is a difference between outdoor and indoor environments. Varied visibility conditions will influence the design and the architecture.
16 Environment/context
For distributed communication and interaction, the environment will depend on the application: it could be a digitized environment of the place where the participants are, or it could be a virtual environment. For example, one can put people into a historical environment (Rome, Pompeii, etc.).
17 Active Vision System for 3D object recognition
Table 1 below outlines the multilayered structure of an Active Vision system, with the final goal of 3-D object/shape recognition. The layers are enumerated 0, 1, 2, ... with respect to the goal (intermediate results) and feedback parameters. Note that the first three levels correspond to monocular processing only. Naturally the menu of extracted features from monocular images is far from exhaustive. Levels 3-5 are based on binocular images. It is only the last level that is concerned with semantic interpretation.
18 Table
Level | Feedback parameters | Goal / stopping conditions
0. Control of the physical device | directly measured: current lighting system, open/close aperture | grossly focused scene, camera adjusted aperture
1. Control of the physical device | directly measured: focus, zoom; computed: contrast, distance from focus | focused on one object
2. Control of low level vision modules | computed only: threshold of the width of filters | 2D segmentation, max. # of edges/regions
19 Table cont.
Level | Feedback parameters | Goal / stopping conditions
3. Control of binocular system (hardware, software) | directly measured: vergence angle; computed: range of admissible depth values | depth map
4. Control of intermediate geometric vision module | computed only: threshold of similarity between surfaces | segmentation
5. Control of several views integration process | computed: the position, rotation of different views | 3D object description
6. Control of semantic interpretation | | recognition of 3D objects/scene
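Two of the feedback loops in the table can be sketched in a few lines of Python. Level 1 adjusts focus using computed contrast, and level 3 relates the measured vergence angle to a range of admissible depths. Everything here is an assumption made for illustration: the contrast function is simulated, the hill-climbing search is a generic technique rather than the method used by the authors, and the depth formula assumes symmetric fixation on a point straight ahead of a head with baseline b, so that Z = (b/2) / tan(theta/2).

```python
import math


def simulated_contrast(focus_position, sharpest=37):
    """Hypothetical contrast measure, peaking when the lens is at `sharpest`."""
    return 1.0 / (1.0 + (focus_position - sharpest) ** 2)


def autofocus(measure, start=0, step=8, lo=0, hi=100):
    """Level 1 sketch: hill-climb the focus setting on computed contrast."""
    pos, best = start, measure(start)
    while step >= 1:
        moved = False
        for candidate in (pos - step, pos + step):
            if lo <= candidate <= hi:
                value = measure(candidate)
                if value > best:
                    pos, best, moved = candidate, value, True
        if not moved:
            step //= 2   # refine the search once neither neighbor improves
    return pos, best


def fixation_depth(baseline_m, vergence_rad):
    """Level 3 sketch: depth of a symmetrically fixated point."""
    return (baseline_m / 2.0) / math.tan(vergence_rad / 2.0)


def admissible_depths(baseline_m, min_vergence_rad, max_vergence_rad):
    """Depth interval reachable between two vergence limits (near, far)."""
    return (fixation_depth(baseline_m, max_vergence_rad),
            fixation_depth(baseline_m, min_vergence_rad))


print(autofocus(simulated_contrast))
print(admissible_depths(0.2, math.radians(5.0), math.radians(30.0)))
```

The search assumes a single contrast peak; a real lens and scene can produce several local maxima, in which case a coarse sweep before the refinement would be needed.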
20 Comments
Several comments are in order:
1) Although we have presented the levels in a sequential order, we do not believe that this is the only way information flows through the system. The only significance of the order of levels is that the lower levels are somewhat more basic and necessary for the higher levels to function.
2) In fact, the choice of the level at which one accesses the system very much depends on the given task and/or the goal.
21 Active Visual Observer
Several groups around the world have built binocular active vision systems that can attend to and fixate a moving target. We will review two such systems, one built at the UPenn GRASP Laboratory and the other at KTH (Royal Institute of Technology) in Stockholm, Sweden.
24 PennEyes
PennEyes is a head-in-hand system with a binocular camera platform mounted on a 6 DOF robotic arm. Although physically limited to the reach of the arm, the functionality of the head is extended through the use of motorized optics (10x zoom). The architecture is configured to rely minimally on external systems and .
25 Design considerations
Mechanical: The precision positioning was afforded by the PUMA arm. However, the binocular camera platform needed to weigh in the range of 2.5 kg.
Optics: The use of motorized lenses (zoom, focus and aperture) offered increased functionality.
Electronics: This was the most critical element in the design. A MIMD DSP organization was chosen as the best tradeoff between performance, extensibility and ease of integration.
27 Tracking Performance
The two robots afforded objective measures of tracking performance with a precision target. A three-dimensional path with known precision can be repeatedly generated, allowing the comparison of different visual servoing algorithms.
29 BiSight head
Has independent pan axes with a peak tracking performance of 1000 deg/s and 12,000 deg/s². The concern here is how well the calibration can be maintained after repeated exposure to acceleration and vibration. Another problem occurred with zoom adjustment: the focal length also changed. The binocular camera platform has 4 optical (zoom and focus) and 2 mechanical (pan) degrees of freedom.
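To get a feel for the quoted peak figures, the following sketch estimates the shortest time for a pan movement of a given angle. It assumes an idealized profile with constant acceleration and deceleration at the peak value and a velocity capped at the peak value. That profile is an assumption for illustration, not a description of the actual BiSight controller.

```python
import math

PEAK_VELOCITY_DEG_S = 1000.0     # quoted peak pan velocity
PEAK_ACCEL_DEG_S2 = 12000.0      # quoted peak pan acceleration


def min_move_time_s(angle_deg, v_max=PEAK_VELOCITY_DEG_S, a_max=PEAK_ACCEL_DEG_S2):
    """Shortest move time under a constant-acceleration, capped-velocity profile."""
    # Angle covered by accelerating to v_max and braking straight away.
    ramp_angle = v_max ** 2 / a_max
    if angle_deg <= ramp_angle:
        # Triangular profile: accelerate for half the angle, brake for the rest.
        return 2.0 * math.sqrt(angle_deg / a_max)
    # Trapezoidal profile: ramp up, cruise at v_max, ramp down.
    return 2.0 * v_max / a_max + (angle_deg - ramp_angle) / v_max


print(f"time to reach peak velocity: {PEAK_VELOCITY_DEG_S / PEAK_ACCEL_DEG_S2:.3f} s")
print(f"30 deg move: {min_move_time_s(30.0):.3f} s")
print(f"120 deg move: {min_move_time_s(120.0):.3f} s")
```

Under these assumptions the head needs about 83 ms just to reach peak velocity, so short gaze shifts are limited by acceleration rather than by the 1000 deg/s figure.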
30 C40 Architecture
Beyond the basic computing power of the individual C40s, the performance of the network is enhanced by the ability to interconnect the modules with a fair degree of flexibility as well as the ability to store an appreciable amount of information. The former is made possible by up to six comports on each module and the latter by several Mbytes of local storage.
32 Critical Issues
The performance of any modularly structured active vision system depends critically on a few recurring issues. They involve the coordination of processes running on different subsystems, the management of large data streams, processing and transmission delays, and the control of systems operating at different rates.
33 Synchronization
The three major components of this modular active vision system are independent entities that work at their own pace. The lack of a common time base makes synchronizing the components a difficult task. In some cases, an external signal can be used to synchronize independent hardware components. In this system, the C40 network, the digitizers and the graphics module are slaved to the vertical sync of the genlocked cameras.
34 Other considerations
Bandwidth: large data streams.
System integration. If data throughput becomes the bottleneck, then some new data compression algorithms must be invoked.
Latency. Delays between the acquisition of a frame and the motor response to it are an inevitable problem of active vision systems. Delays make the control more difficult because they can cause instabilities.
Multi-rate control. Active vision systems suggest by their very nature a hierarchical approach to control.
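The latency point can be demonstrated with a toy simulation. The sketch below assumes a very simple proportional tracker that corrects its pointing error once per frame using an observation that is `delay` frames old. The gain of 0.8 and the delays are illustrative choices, not measurements from any of the systems described here.

```python
def simulate_tracking(gain, delay, steps=200, initial_error=1.0):
    """Error sequence for e[k+1] = e[k] - gain * e[k - delay]."""
    # Before the run starts, every stale observation equals the initial error.
    errors = [initial_error] * (delay + 1)
    for _ in range(steps):
        observed = errors[-1 - delay]        # frame acquired `delay` steps ago
        errors.append(errors[-1] - gain * observed)
    return errors


def peak_recent(errors, window=20):
    """Largest error magnitude over the last `window` steps."""
    return max(abs(e) for e in errors[-window:])


for delay in (0, 1, 3):
    peak = peak_recent(simulate_tracking(gain=0.8, delay=delay))
    print(f"delay={delay} frames -> recent peak error {peak:.3g}")
```

With the same gain, the loop converges when the observation is fresh and diverges once it is three frames old, which is why delayed loops have to lower the gain or predict the target motion.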
35 Control
If the visual and mechanical control rates are one or more orders of magnitude apart, the mechanical control loops are essentially independent of the visual control loop.
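A small simulation shows why the rate separation decouples the loops. In this sketch a slow visual loop updates a position setpoint and a fast servo loop tracks it. The rates (50 Hz and 1000 Hz), the gains and the first-order servo model are hypothetical values chosen for illustration.

```python
def run_multirate(target=10.0, visual_hz=50, servo_hz=1000,
                  duration_s=0.5, servo_gain=0.2, visual_gain=0.5):
    """Simulate a slow visual loop commanding a fast mechanical loop."""
    ticks_per_frame = servo_hz // visual_hz
    position = 0.0
    setpoint = 0.0
    log = []
    for tick in range(int(duration_s * servo_hz)):
        if tick % ticks_per_frame == 0:
            # Slow loop: once per frame, move the setpoint toward the target.
            setpoint = position + visual_gain * (target - position)
        # Fast loop: every tick, the servo closes part of the remaining gap.
        position += servo_gain * (setpoint - position)
        log.append(position)
    return log


log = run_multirate()
print(f"final position: {log[-1]:.4f}")
```

Because the servo settles almost completely within one visual frame (20 servo ticks here), the visual loop can treat the mechanics as an ideal positioner, which is the independence stated on the slide.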