Wednesday, March 23, 2016

[ammai2016] Nonlinear Dimensionality Reduction by Locally Linear Embedding

Date: March 24th, 2016

Title: Nonlinear Dimensionality Reduction by Locally Linear Embedding

Authors: Sam T. Roweis and Lawrence K. Saul


Novelties:

1. Introduces LLE (locally linear embedding), an unsupervised, neighborhood-preserving embedding method.

Contributions:

1. The method is straightforward: the only free parameter is the number of neighbors, K.
2. LLE does not have to be rerun when more output dimensions are requested: the extra coordinates come from additional eigenvectors of the same eigenproblem.

Technical Summary:

LLE is a method that maps high-dimensional inputs into a low-dimensional space while preserving neighborhoods.
The geometric intuition is that each data point and its neighbors lie on a locally linear patch of the manifold, so each data point can be reconstructed from its neighbors.
The reconstruction error adds up the squared distances between all the data points and their reconstructions, that is:

ε(W) = Σ_i || X_i − Σ_j W_ij X_j ||²

W is the matrix of computed weights; W_ij is the contribution of the jth data point to the reconstruction of the ith.
That is, if the jth data point is not a neighbor of the ith, W_ij = 0, and the weights for each point are normalized so that Σ_j W_ij = 1.
Each X_i is then mapped to a low-dimensional vector Y_i by minimizing, with the weights W fixed, the cost function:

Φ(Y) = Σ_i || Y_i − Σ_j W_ij Y_j ||²
The step can be illustrated in the following figure:
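A minimal numpy sketch of the two LLE steps (solving for the reconstruction weights W, then taking bottom eigenvectors of (I − W)^T (I − W) as the embedding). The function name, the regularization constant, and the brute-force neighbor search are my own choices, not from the paper:

```python
import numpy as np

def lle(X, K=5, d=2, reg=1e-3):
    """Minimal LLE sketch: X is (n, D); returns an (n, d) embedding."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # K nearest neighbors of X_i (excluding itself), by brute force
        dists = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dists)[1:K + 1]
        # Solve for the weights that best reconstruct X_i from its neighbors
        Z = X[nbrs] - X[i]                   # neighbors shifted to the origin
        C = Z @ Z.T                          # local Gram matrix (K x K)
        C += reg * np.trace(C) * np.eye(K)   # regularize for stability
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs] = w / w.sum()             # enforce the sum-to-one constraint
    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W)
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                  # skip the constant eigenvector
```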

Wednesday, March 16, 2016

[ammai2016] Online Dictionary Learning for Sparse Coding

Date: March 17th, 2016

Title: Online Dictionary Learning for Sparse Coding

Authors: Julien Mairal et al.


Novelties:

1. Introduces an online dictionary learning method that scales to large datasets.

Contributions:

There are three contributions:
1. They cast the dictionary learning problem as the optimization of a smooth nonconvex objective function over a convex set.
2. They propose an iterative online algorithm to solve it.
3. Experiments show the algorithm is faster than the state of the art.

Technical Summary:

The online dictionary learning algorithm:
At step t, the dictionary is updated by minimizing a quadratic surrogate function:

ĝ_t(D) = (1/t) Σ_{i=1}^{t} ( ½ || x_i − D α_i ||² + λ || α_i ||₁ )

where each sparse code α_i was computed at step i with the then-current dictionary. This function aggregates the past information and upper-bounds the empirical cost function.
Since the surrogate changes little between consecutive iterations, D_t can be obtained efficiently by using D_{t−1} as a warm restart.
They then introduce some practical improvements to this algorithm.
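A rough numpy sketch of the online loop described above, with the statistics A = Σ α α^T and B = Σ x α^T carrying the past information. I substitute a simple ISTA loop for the LARS-based lasso solver the paper uses, so this illustrates the structure rather than reproducing the paper's implementation:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(D, x, lam, n_iter=100):
    """ISTA stand-in for the paper's LARS-based lasso solver."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - (D.T @ (D @ a - x)) / L, lam / L)
    return a

def online_dict_learning(X, k=8, lam=0.1, n_epochs=1):
    """One-sample-at-a-time dictionary update (sketch of the online loop)."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
    A = np.zeros((k, k))                     # accumulated a a^T (past information)
    B = np.zeros((m, k))                     # accumulated x a^T
    for _ in range(n_epochs):
        for t in range(n):
            x = X[:, t]
            a = sparse_code(D, x, lam)
            A += np.outer(a, a)
            B += np.outer(x, a)
            # Block-coordinate descent on the columns of D, warm-started at D
            for j in range(k):
                if A[j, j] > 1e-10:
                    u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
                    D[:, j] = u / max(np.linalg.norm(u), 1.0)  # keep ||d_j|| <= 1
    return D
```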

Experiments:

The data are image patches from the Berkeley segmentation dataset: 1,000,000 patches for training and 250,000 for testing.
The top plots compare the online algorithm with batch settings; the bottom plots compare their method with stochastic gradient (SG) approaches.

[ammai2016] Iterative Quantization: A Procrustean Approach to Learning Binary Codes

Date: March 17th, 2016

Title: Iterative Quantization: A Procrustean Approach to Learning Binary Codes

Authors: Yunchao Gong and Svetlana Lazebnik


Novelties:

1. Introduces a way to rotate the data so that binary quantization preserves the locality structure.

Contributions:

This paper introduces a better way to learn binary codes, called ITQ (iterative quantization).
To judge whether a code is good, it considers three constraints: the length of the code, the Hamming distance between similar images, and the efficiency of computing the code.
ITQ can be applied on top of unsupervised embeddings (PCA) or supervised embeddings (CCA); that is, the method works with any projection.

Technical Summary:

For the unsupervised setting, they first reduce the dimensionality with PCA after zero-centering the data.

If we simply binarize the PCA-aligned data, the result might look like (a), which splits a cluster into two different parts. ITQ first rotates the data, as in (c), to put the dots of the same cluster into the same code.
That is, we should find a rotation with a smaller quantization loss:

Q(B, R) = || B − VR ||²_F

B is the target binary code matrix, containing n codes of length c; V is the projected data; and R starts as a randomly initialized c × c orthogonal matrix.
Then two steps are repeated cyclically: "Fix R and update B" and "Fix B and update R."

1. Fix R and update B:
This step minimizes Q(B, R) with R fixed, which is equivalent to maximizing:

Σ_{i,j} B_ij Ṽ_ij

where Ṽ denotes VR. This gives B = sgn(VR).

2. Fix B and update R:
With B fixed, minimizing Q(B, R) is a classic orthogonal Procrustes problem: compute the SVD B^T V = S Ω Ŝ^T and set R = Ŝ S^T.
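A small numpy sketch of the two alternating ITQ steps, assuming V is already zero-centered and PCA-projected (the function name and iteration count are my own choices):

```python
import numpy as np

def itq(V, n_iter=50, seed=0):
    """V: (n, c) zero-centered, PCA-projected data.
    Returns binary codes B in {-1, +1} and the learned rotation R."""
    c = V.shape[1]
    rng = np.random.default_rng(seed)
    # Random orthogonal initialization of R (QR of a random Gaussian matrix)
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))
    for _ in range(n_iter):
        B = np.sign(V @ R)                 # step 1: fix R, update B = sgn(VR)
        # step 2: fix B, update R via orthogonal Procrustes (SVD of B^T V)
        S, _, St = np.linalg.svd(B.T @ V)
        R = St.T @ S.T                     # R = S_hat S^T with B^T V = S Ω S_hat^T
    return np.where(V @ R >= 0, 1.0, -1.0), R
```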

Experiments:


The unsupervised datasets are subsets of the Tiny Images dataset. The first is a version of the CIFAR dataset; the second is a larger subset. The images in these datasets are 32 × 32 pixels.
They evaluate by nearest-neighbor search, using the average precision of the top 500 ranked images for each query.
Comparing mAP with the other methods, PCA-ITQ performs very well.

They then run ITQ with CCA and show that CCA-ITQ on a cleanly labeled dataset beats the baseline, and CCA-ITQ on a noisily labeled dataset is far better than PCA-ITQ.


Wednesday, March 9, 2016

[ammai2016] Aggregating local descriptors into a compact image representation

Date: March 9th, 2016
Title: Aggregating local descriptors into a compact image representation
Authors: Herve Jegou et al.


Novelties:

1. Uses a new, sparser, and more structured representation for aggregated descriptors.
2. Splits the large vector into pieces to improve performance.

Contributions:

This paper introduces a better way to approach similar-image search.
To judge whether a method is good, it considers three constraints: the search accuracy, the efficiency, and the memory usage.

Technical Summary:

1. Image vector representation
The method is related to the Fisher kernel approach; it proposes a vector representation called VLAD, the abbreviation of "vector of locally aggregated descriptors."
For a codebook C = {c_1, ..., c_k} of k visual words (obtained as in BOF) and local descriptors x of d dimensions, the method represents an image in D = k × d dimensions. The components of the VLAD vector v are:

v_{i,j} = Σ_{x : NN(x) = c_i} (x_j − c_{i,j})

where i = 1...k and j = 1...d, and NN(x) means the nearest visual word of x. The vector v is then L2-normalized.
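A small numpy sketch of building one VLAD vector from the formula above, with a brute-force nearest-word assignment (names are my own):

```python
import numpy as np

def vlad(descriptors, centroids):
    """descriptors: (n, d) local descriptors of one image;
    centroids: (k, d) visual words. Returns the (k*d,) VLAD vector."""
    k, d = centroids.shape
    v = np.zeros((k, d))
    # Assign each descriptor to its nearest visual word, i.e. NN(x)
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    nn = np.argmin(dists, axis=1)
    # Accumulate the residuals x - c_i per visual word
    for x, i in zip(descriptors, nn):
        v[i] += x - centroids[i]
    v = v.ravel()                          # D = k * d dimensional vector
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v     # L2-normalize
```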

The figure shows images and the corresponding VLAD descriptors with k = 16; blue lines mean positive values, while red ones mean negative values.
We can observe that the descriptors are sparse and structured, so they can be encoded compactly.

2. From vectors to codes
There are two steps:
1) use a projection to reduce the dimensions (PCA);
2) use a quantizer to index the vectors.

Since the database vectors are quantized, the search returns approximate nearest neighbors; the paper uses asymmetric distance computation (ADC):

NN_a(x) = a-argmin_i || x − q(y_i) ||²

where a means we want the a-th nearest neighbor, q() is the quantization function, x is the query vector, and the y_i are the database vectors.
Note that x is not quantized in this formula, so there is no approximation error on the query side.
A single quantizer with enough centroids would require very large codes, so each database vector y is split into m subvectors, each quantized by its own sub-quantizer (a product quantizer):

q(y) = ( q¹(y¹), ..., q^m(y^m) )

The squared distance in ADC then decomposes over the subvectors:

|| x − q(y) ||² = Σ_{j=1}^{m} || x^j − q^j(y^j) ||²

so the squared sub-distances can be precomputed once per query and stored in look-up tables, which enhances performance.
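A toy numpy sketch of ADC search with a product quantizer, assuming the database codes and the per-subvector codebooks have already been built (all names here are illustrative, not from the paper):

```python
import numpy as np

def adc_search(x, codes, codebooks):
    """Asymmetric distance computation sketch.
    x: (D,) unquantized query vector;
    codes: (N, m) centroid indices for the N database vectors;
    codebooks: (m, ks, D//m) centroids of the m sub-quantizers."""
    m = codes.shape[1]
    sub = np.split(x, m)                   # query subvectors x^1 ... x^m
    # Look-up tables: squared distance from each query subvector to every centroid
    tables = np.array([((codebooks[j] - sub[j]) ** 2).sum(axis=1)
                       for j in range(m)])           # shape (m, ks)
    # Distance to each database vector is a sum of m table look-ups
    dists = tables[np.arange(m), codes].sum(axis=1)  # shape (N,)
    return np.argsort(dists)               # the a-th entry is the a-th approx. NN
```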

Experiments:

Mainly on three datasets:
1. The INRIA Holidays dataset, 1,491 holiday images.
2. The UKB, 2250 images.
3. 10M images from Flickr.
The first two datasets show that VLAD performs better than the Fisher kernel representation at every dimensionality; the last dataset shows that VLAD is significantly better on a large scale as well.