
1 Dynamic Background Learning through Deep Auto-encoder Networks. Pei Xu 1, Mao Ye 1, Xue Li 2, Qihe Liu 1, Yi Yang 2 and Jian Ding 3. 1. University of Electronic Science and Technology of China; 2. The University of Queensland; 3. Tencent Group. Sorry for the no-show, caused by a visa delay.

2 Previous Works on Dynamic Background Learning:
Mixture of Gaussians [Wren et al. 2002]
Hidden Markov Model [Rittscher et al. 2000]
1-SVM [Cheng et al. 2009]
DECOLOR [Zhou et al. 2013]

3 Existing Problems: 1. Many previous works need clean background images (without foregrounds) to train the classifier. 2. To extract a clean background, some works impose assumptions on the background images (e.g., that they are linearly correlated).

4 Preliminaries on Auto-encoder Networks. In our work, we use the deep auto-encoder network proposed by Bengio et al. (2007) as the building block. 1. Encoding. In the encoding stage, the input data $x$ is encoded by a function defined as $h_1 = \mathrm{sigm}(W_1 x + b_1)$, where $W_1$ is a weight matrix, $b_1$ is a hidden bias vector, and $\mathrm{sigm}(z) = 1/(1+\exp(-z))$ is the sigmoid function.

5 Then, $h_1$ as the input is in turn encoded by another function, written as $h_2 = \mathrm{sigm}(W_2 h_1 + b_2)$, where $W_2$ is a weight matrix and $b_2$ is a bias vector.

6 2. Decoding. In the decoding stage, $h_2$ is the input of the function $h_3 = \mathrm{sigm}(W_2^{\top} h_2 + b_3)$, where $b_3$ is a bias vector (the decoder reuses the tied weights $W_2^{\top}$).

7 Then the reconstructed output is computed by the decoding function $\hat{x} = \mathrm{sigm}(W_1^{\top} h_3 + b_4)$, where $b_4$ is a bias vector. The parameters ($W_1$, $W_2$ and $b_1, \ldots, b_4$) are learned by minimizing the cross-entropy function, written as $L(x, \hat{x}) = -\sum_i \big[ x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i) \big]$.
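To make the preliminaries concrete, here is a minimal NumPy sketch of this encode/decode pass and the cross-entropy loss. The tied decoder weights ($W_2^{\top}$, $W_1^{\top}$) and the layer sizes in the usage lines are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigm(z):
    """Sigmoid: sigm(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2, b3, b4):
    """Two-hidden-layer auto-encoder pass on a flattened frame x (length N)."""
    h1 = sigm(W1 @ x + b1)        # first encoding
    h2 = sigm(W2 @ h1 + b2)       # second encoding
    h3 = sigm(W2.T @ h2 + b3)     # first decoding (tied weights, assumed)
    x_hat = sigm(W1.T @ h3 + b4)  # reconstructed output
    return x_hat

def cross_entropy(x, x_hat, tiny=1e-8):
    """L(x, x_hat) = -sum_i [x_i log x̂_i + (1 - x_i) log(1 - x̂_i)]."""
    return -np.sum(x * np.log(x_hat + tiny) + (1 - x) * np.log(1 - x_hat + tiny))

# Usage with made-up sizes: N = 64 pixels, 32 and 16 hidden units.
rng = np.random.default_rng(0)
x = rng.random(64)  # one frame, normalized to [0, 1]
W1, b1 = rng.normal(size=(32, 64)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(16, 32)) * 0.1, np.zeros(16)
b3, b4 = np.zeros(32), np.zeros(64)
print(cross_entropy(x, forward(x, W1, b1, W2, b2, b3, b4)))
```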

8 Proposed Method. 1. Dynamic Background Modeling:
a. A deep auto-encoder network is used to extract background images from video frames.
b. A separation function is defined to formulate the background images.
c. Another deep auto-encoder network is used to learn the ‘clean’ dynamic background.

9 Inspired by the denoising auto-encoder (DAE) [Vincent et al., 2008], we view the dynamic background as ‘clean’ data and the foreground as ‘noise’. A DAE, however, needs ‘clean’ data to which noise is added so that it can learn the noise distribution.

10 Unfortunately, in real-world applications such as traffic monitoring systems, clean background images cannot be obtained. But do we really need ‘clean’ data to train an auto-encoder network?

11 Firstly, we use a deep auto-encoder network (named the Background Extraction Network, BEN) to extract a background image from the input video frames. Its cost function combines the cross-entropy term with additional background items (given in Eq. (1) below), where the vector $B_0$ represents the extracted background image and $\varepsilon$ is the tolerance value vector of $B_0$.

12 Background items: the first item forces the reconstructed frames to approach a background image $B_0$; the second, a regularization item, controls the solution range of $\varepsilon$. The basic observation of our work: in video sequences, each pixel belongs to the background most of the time.

13 Background items: to be resilient to large variance tolerances, this item divides the approximation error at the $i$th pixel by the tolerance parameter $\varepsilon_i$. How do we train the parameters of the Background Extraction Network?

14 The cost function of the Background Extraction Network is

$\mathcal{L}_{BEN}(\theta) = \sum_{j=1}^{D} L(x^j, \hat{x}^j) + \lambda \sum_{i=1}^{N} \Big( \sum_{j=1}^{D} \frac{|\hat{x}_i^j - B_{0,i}|}{\varepsilon_i} + D \log \varepsilon_i \Big). \quad (1)$

The parameters $\theta$ contain the network weights and biases ($W_1$, $W_2$, $b_1, \ldots, b_4$), $B_0$, and $\varepsilon$, where $\varepsilon_i > 0$.
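For concreteness, a sketch of computing this cost as reconstructed above; the weighting $\lambda$ and the exact form of the $\varepsilon$ regularizer are assumptions made here for illustration.

```python
import numpy as np

def ben_cost(X, X_hat, B0, eps_vec, lam=1.0, tiny=1e-8):
    """X, X_hat: (D, N) arrays of frames and reconstructions; B0, eps_vec: (N,).
    Cross-entropy term plus the two background items of Eq. (1); `lam` is an
    assumed weighting, not from the slides."""
    D = X.shape[0]
    ce = -np.sum(X * np.log(X_hat + tiny) + (1 - X) * np.log(1 - X_hat + tiny))
    fit = np.sum(np.abs(X_hat - B0) / eps_vec)  # pulls reconstructions toward B0
    reg = D * np.sum(np.log(eps_vec))           # keeps eps from growing freely
    return ce + lam * (fit + reg)
```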

15 (1) The update of the network weights and biases is by gradient descent, $\theta \leftarrow \theta - \eta \, \partial \mathcal{L}_{BEN} / \partial \theta$, where $\eta$ is the learning rate, and the derivative of the background item is written as follows.

16 There is an absolute value in the second item, so we adopt a sign function to roughly compute its derivative: $\partial |\hat{x}_i^j - B_{0,i}| / \partial \hat{x}_i^j = \mathrm{sign}(\hat{x}_i^j - B_{0,i})$, where $\mathrm{sign}(a) = 1$ if $a > 0$, $\mathrm{sign}(a) = 0$ if $a = 0$, and $\mathrm{sign}(a) = -1$ if $a < 0$.

17 (2) The update of $B_0$ is the optimization problem $\min_{B_0} \sum_{j=1}^{D} \sum_{i=1}^{N} |\hat{x}_i^j - B_{0,i}| / \varepsilon_i$. According to previous works on $\ell_1$-norm optimization, the optimal $B_{0,i}$ is the median of $\{\hat{x}_i^1, \ldots, \hat{x}_i^D\}$, for each $i = 1, \ldots, N$.

18 (3) The update of $\varepsilon$ is the optimization problem $\min_{\varepsilon} \sum_{i=1}^{N} \big( \sum_{j=1}^{D} |\hat{x}_i^j - B_{0,i}| / \varepsilon_i + D \log \varepsilon_i \big)$. Optimizing $\varepsilon$ amounts to minimizing this logarithmic form; setting its derivative with respect to $\varepsilon_i$ to zero, it follows that:

19 The optimal $\varepsilon_i$ is $\varepsilon_i = \frac{1}{D} \sum_{j=1}^{D} |\hat{x}_i^j - B_{0,i}|$.
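The two closed-form updates above are easy to state in code; a minimal sketch, assuming the reconstructions are stacked as a (D, N) array:

```python
import numpy as np

def update_B0(X_hat):
    """l1-optimal background: per-pixel median of the D reconstructions."""
    return np.median(X_hat, axis=0)

def update_eps(X_hat, B0):
    """Zeroing the derivative gives the per-pixel mean absolute deviation."""
    return np.mean(np.abs(X_hat - B0), axis=0)
```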

20 After the training of the Background Extraction Network (BEN) is finished, for the video frames $x^j$ ($j = 1, 2, \ldots, D$) we obtain a clean, static background $B_0$ and the tolerance measure $\varepsilon$ of background variations. The reconstructed output is not exactly the background image, though the deep auto-encoder network BEN can remove some foregrounds in some sense.

21 So we adopt a separation function to clean the output further: $B_i^j = \hat{x}_i^j$ if $|\hat{x}_i^j - B_{0,i}| \le \varepsilon_i$, and $B_i^j = B_{0,i}$ otherwise, where $B^j$ ($j = 1, \ldots, D$) are the cleaned background images.

22 If $|\hat{x}_i^j - B_{0,i}| \le \varepsilon_i$, then the $i$th pixel of the $j$th background image, $B_i^j$, equals $\hat{x}_i^j$; otherwise, it equals $B_{0,i}$. For the $D$ input video frames, we obtain the clean (in some sense) background image set $\{B^j\}_{j=1}^{D}$.
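A short sketch of this separation function; the (D, N) array layout for `X_hat` is an assumption carried over from the earlier sketches:

```python
import numpy as np

def separate(X_hat, B0, eps_vec):
    """Keep the reconstructed pixel where it stays within tolerance of B0;
    otherwise fall back to the static background value."""
    within = np.abs(X_hat - B0) <= eps_vec  # broadcasts over the D frames
    return np.where(within, X_hat, B0)      # (D, N) cleaned backgrounds
```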

23 2. Dynamic Background Learning. Another deep auto-encoder network (named the Background Learning Network, BLN) is used to further learn the dynamic background model.

24 The clean background images are used as the input data to train the parameters of the BLN. The cost function of the Background Learning Network is:
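The transcript drops the formula here; assuming it is the plain reconstruction cross-entropy applied to the cleaned backgrounds (consistent with the preliminaries, but an assumption), a sketch:

```python
import numpy as np

def bln_cost(B, B_hat, tiny=1e-8):
    """B, B_hat: (D, N) cleaned backgrounds and their BLN reconstructions.
    Summed reconstruction cross-entropy, with no extra background items."""
    return -np.sum(B * np.log(B_hat + tiny) + (1 - B) * np.log(1 - B_hat + tiny))
```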

25 Online Learning. In the previous section, just $D$ frames are used to train the dynamic background model. The number of samples is limited, which may produce an overfitting problem. To incorporate more data, we propose an online learning method. Our aim is to find the weight vectors whose effect on the cost function is low.

26 Firstly, the weight matrix $W$ is rewritten as $W = [w_1, w_2, \ldots, w_M]$, where each $w_k$ is an $N$-dimensional vector and $M$ is the number of higher-layer nodes.

27 Then, let $\tilde{w}_k$ denote a disturbance of $w_k$. We have $\tilde{w}_k = w_k + \Delta w_k$. And then,

28 using Taylor’s theorem and ignoring the third-order term, we get $\mathcal{L}(w_k + \Delta w_k) \approx \mathcal{L}(w_k) + (\partial \mathcal{L} / \partial w_k)^{\top} \Delta w_k + \frac{1}{2} \Delta w_k^{\top} H \Delta w_k$, where $H$ is the Hessian matrix of $\mathcal{L}$. We thus obtain:

29 For a two-hidden-layer auto-encoder network, the optimization problem is to solve for the disturbances of the columns of $W_1$ and $W_2$ that least increase the cost, where $W_1$ and $W_2$ are the weights of the two hidden layers, and $e_j$ and $e_k$ are the $j$th and $k$th columns of the identity matrix, respectively.

30 We sort the resulting values for $W_1$ and $W_2$, respectively. Each vector $w_j$ whose effect on the cost function falls below a threshold is substituted by a randomly chosen vector satisfying the same constraint, where the threshold is an artificial parameter.
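A rough sketch of this substitution step under a diagonal-Hessian assumption; the scoring rule, the threshold `tau`, and the re-initialization scale are all assumptions, since the slide’s exact criterion is not recoverable from the transcript:

```python
import numpy as np

def substitute_low_effect_columns(W, H_diag, tau, rng=None):
    """W: (N, M) weights with columns w_k; H_diag: (N,) Hessian diagonal.
    Columns whose estimated effect on the cost is below tau are replaced."""
    rng = rng or np.random.default_rng(0)
    # Second-order Taylor estimate of the cost change from perturbing away w_k:
    # 0.5 * w_k^T H w_k, with H approximated by its diagonal.
    scores = 0.5 * np.einsum('nm,n->m', W ** 2, H_diag)
    W_new = W.copy()
    for k in np.flatnonzero(scores < tau):
        W_new[:, k] = rng.normal(scale=0.01, size=W.shape[0])  # random restart
    return W_new
```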

31 Experimental Results. We use six publicly available video data sets, Jug, Railway, Lights, Trees, Water-Surface, and Fountain, to evaluate the performance.

32 1. Parameter Setting. TPR vs. the tolerance parameter on the six data sets. The different values of the parameter provide different tolerances of the dynamic background.

33 We compute the TPR on the six data sets with different values of this parameter. In our discussion below, we choose the value corresponding to the highest TPR on each data set: 0.5, 0.4, 0.4, 0.5, 0.6, and 0.4 for Jug, Lights, Fountain, Railway, Water-Surface, and Trees, respectively.

34 2. Comparisons to Previous Works Comparisons of ROC Curves

35 Table 1: Comparisons of F-measure on Fountain, Water-Surface, Trees, and Lights. Table 2: Comparisons of F-measure on Jug and Railway.

36 Comparisons of foreground extraction


38 Online Learning Strategy Comparison.


40 Thank you! Feel free to contact us: cvlab.uestc@gmail.com

