Faiss教程翻译

教程

更快地搜索

为了加快搜索，将数据集合拆分成多个段。我们在多维空间定义了Voronoi单元(cell),数据集的每个向量都会落入一个单元中。检索时找query落入的单元，query向量与单元中的向量比较，从而得到近邻向量。Faiss中IndexIVFFlat类就是使用了对应的实现。
IndexIVFFlat

需要进行训练的一种index。使用IndexIVFFlat还需要传入其它的index,即量化器（quantizer,通常使用IndexFlatL2），化器将向量映射到Voronoi单元，每个单元定义一个质心（centroids）。传入的index会根据质心（centroids）找出query向量落在的单元，再计算近邻的向量。

search时需要指定用于执行近邻搜索的单元格数量（nprobe），搜索的耗时会随着nprobe增长而增长。但是随着nprobe增长，搜索结果也会越逼近穷尽搜索的结果。

Python：

nlist = 100
k = 4
quantizer = faiss.IndexFlatL2(d)  # the other index
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)
       # here we specify METRIC_L2, by default it performs inner-product search
assert not index.is_trained
index.train(xb)
assert index.is_trained

index.add(xb)                  # add may be a bit slower as well
D, I = index.search(xq, k)     # actual search
print(I[-5:])                  # neighbors of the 5 last queries
index.nprobe = 10              # default nprobe is 1, try a few more
D, I = index.search(xq, k)
print(I[-5:])                  # neighbors of the 5 last queries

C++:

int nlist = 100;
    int k = 4;
    faiss::IndexFlatL2 quantizer(d);       // the other index
    faiss::IndexIVFFlat index(&quantizer, d, nlist, faiss::METRIC_L2);
    // here we specify METRIC_L2, by default it performs inner-product search
    assert(!index.is_trained);
    index.train(nb, xb);
    assert(index.is_trained);
    index.add(nb, xb);
    {       // search xq
        long *I = new long[k * nq];
        float *D = new float[k * nq];
        index.search(nq, xq, k, D, I);
        printf("I=\n");                    // print neighbors of 5 last queries
        ...
        index.nprobe = 10;                 // default nprobe is 1, try a few more
        index.search(nq, xq, k, D, I);
        printf("I=\n");
        ...
    }

低内存占用

IndexFlatL2 和 IndexIVFFlat两个index都存储了完整的向量。而当向量规模非常大的时候，Faiss基于点积量化（product quantizers）进行有损压缩存储向量。

这些向量依然存储在Voronoi单元中，但是他们的大小减少到了可配置的字节数。

由于向量并没有精确的存储，所以search返回的距离也是近似值。

Python:

nlist = 100
m = 8                             # number of bytes per vector
k = 4
quantizer = faiss.IndexFlatL2(d)  # this remains the same
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)
                                    # 8 specifies that each sub-vector is encoded as 8 bits
index.train(xb)
index.add(xb)
D, I = index.search(xb[:5], k) # sanity check
print(I)
print(D)
index.nprobe = 10              # make comparable with experiment above
D, I = index.search(xq, k)     # search
print(I[-5:])

C++

int nlist = 100;
    int k = 4;
    int m = 8;                             // bytes per vector
    faiss::IndexFlatL2 quantizer(d);       // the other index
    faiss::IndexIVFPQ index(&quantizer, d, nlist, m, 8);

    index.train(nb, xb);
    index.add(nb, xb);
    {       // sanity check
        ...
        index.search(5, xb, k, D, I);
        printf("I=\n");
        ...
        printf("D=\n");
        ...
    }
    {       // search xq
        ...
        index.nprobe = 10;
        index.search(nq, xq, k, D, I);
        printf("I=\n");
        ...
    }

结果：

[[   0  608  220  228]
 [   1 1063  277  617]
 [   2   46  114  304]
 [   3  791  527  316]
 [   4  159  288  393]]

[[ 1.40704751  6.19361687  6.34912491  6.35771513]
 [ 1.49901485  5.66632462  5.94188499  6.29570007]
 [ 1.63260388  6.04126883  6.18447495  6.26815748]
 [ 1.5356375   6.33165455  6.64519501  6.86594009]
 [ 1.46203303  6.5022912   6.62621975  6.63154221]]

观察我们这好到的最近邻向量，但是检索向量与其自身的距离居然不是0，这是因为有损压缩造成的。

基础

Faiss基石: 聚类, PCA, 量化

Faiss基于一些基础算法进行了高效的实现，如： k-means、PCA、PQ

Clustering

Faiss提供的一种高效的k-means实现：

ncentroids = 1024
niter = 20
verbose = True
d = x.shape[1]
kmeans = faiss.Kmeans(d, ncentroids, niter, verbose)
kmeans.train(x)

聚类结果的质心（centroids）在kmeans.centroids中。

Assignment

当k-means完成训练后，给定一组向量，可以使用下面的函数计算其对应质心：

D, I = kmeans.index.search(x, 1)

x中的每行对应的最近质心将存储在I中返回，D中是其对应的距离。

相反的，如果想去找出给定质心，从x中找出最近的15个向量，则必须创建新的index:

index = faiss.IndexFlatL2 (d)
index.add (x)
D, I = index.search (kmeans.centroids, 15)

PCA

将40维降为10维：

# random training data 
mt = np.random.rand(1000, 40).astype('float32')
mat = faiss.PCAMatrix (40, 10)
mat.train(mt)
assert mat.is_trained
tr = mat.apply_py(mt)
# print this to show that the magnitude of tr's columns is decreasing
print (tr ** 2).sum(0)

PQ encoding / decoding

ProductQuantizer对象可以用于对codes进行编码和解码。

d = 32  # data dimension
cs = 4  # code size (bytes)
# train set 
nt = 10000
xt = np.random.rand(nt, d).astype('float32')
# dataset to encode (could be same as train)
n = 20000
x = np.random.rand(n, d).astype('float32')

pq = faiss.ProductQuantizer(d, cs, 8)
pq.train(xt)

# encode 
codes = pq.compute_codes(x)

# decode
x2 = pq.decode(codes)

# compute reconstruction error
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()

经过encode之后，x就被映射到codes,codes的大小为20000*4。

如何选择一个Index

是否需要精确的结果?

Faiss中只有IndexFlatL2和IndexFlatIP能够保证精确的结果，也就是这两个index可以为为其它index的提供一个baseline。这两个index也不会对向量进行压缩，也不支持添加时携带ID（add_with_ids），只能顺序添加。

这两个都支持在GPU.

内存对你而言是否是一个问题？

切记在Faiss中，所有的Index都是存储在内存中的。在考虑内存开销的基础上，需要兼顾准确率和速度。

如果内存足够或者数据量比较小，那么HNSWx是一个比较好的选择，HNSWx是一个既精准又快速的index，内存开销以及精确度与每个向量的连接数（number of links per vector）有关。HNSWx也只支持顺序添加（不支持add_with_ids）。

使用...,Flat方式，先对数据集进行聚类，聚类完后，Flat只是对向量分桶（bucket），所以这种方式存储大小和原始数据一致，准确率和速度需要通过nprobe 参数权衡。

如果存储所有向量代价非常高，可以进行两个操作：

使用PCA降维至x
使用标准量化（scalar quantization）将每个向量的component 量化至1个字节。

因此每个向量只需要x 字节进行存储。

当然如果精准性对你来说非常非常重要，使用"OPQx_y,...,PQx"方式，PQx使用点积量化进行向量压缩，输出x字节的的codes，通常x设置为<=64，对于较大的codes，标准量化（SQ）通常也是精准和快速的。OPQ是向量的线性变换，使得更容易进行压缩。y是维度。

y是x的倍数
y<=d
y <= 4*x

数据量有多大？

这个问题用于补充上面的聚类（即上面的...）选项，数据集聚类至不同的桶，在搜索时只遍历部分的桶（nprobe buckets）

如果小于1M：...,IVFx，...

如果介于1M-10M：...,IVF65536_HNSW32,...

如果介于10M-100M：IVF262144_HNSW32,...
如果结局100M-1B：IVF1048576_HNSW32,...

Faiss indexes

基础Indexes

单元探测方法（Cell-probe methods）

例如kmeans这些采用分区技术提升处理速度，但是不保证一定可以找出最近邻。相应的这些方法有时候会被称作单元探测。

我们使用基于多元探测（Multi-probing）的分区方法（best-bin KD树的变种）.

特征空间被分成ncells个单元
数据集中的向量通过hash函数分配至这些单元中（如果kmeans，那就分配质离它最近的质心），并存储在由ncells个倒排链组成的倒排文件中。
检索式，选择nprobe个倒排链，query向量与这些倒排链上的向量逐一计算。

这样的话，只有一小部分数据集需要和query向量比较，这部分数据大概是nprobe/ncells，但是这实际预估少了，因为每个倒排链长度并不相等。如果query向量的最近邻向量所在单元没有被召回，那就意味着失败。

在C++中，相关的index是IndexIVFFlat，该类的构造函数，需要传入一个index作为参数，这个index用于倒排链的分配。query也是用该index进行检索，最后返回倒排链上被访问向量的ID。

使用flat index作为粗粒度量化器的单元探测方法

使用Flat Index作粗量化器时，IndexIVF的训练方法，将质心赋给flat index。查询时需要指定nprobe（有助于在速度和准确率之间权衡）。

根据经验，n表示要索引的数量，如何给定质心的数量，这得衡量分配到质心的代价，包括解析倒排链时执行精确距离计算的向量数量（kprobe / ncells * n * c），常数c用于协调倒排链数据分布不均匀的情况，那么可以使用ncentroids = C * sqrt (n)表示质心的数量，通常c=10

与LSH的关系

大部分流行的单元探测方法的可能都是以Locality Sensitive Hashing为原型（E2LSH），然而该方法和它的衍生方法有两个缺点：

需要很多hash函数（等于partitions数量）才能获得比较好的结果，这会导致额外的内存开销
The hash function are not adapted to the input data. This is good for proofs but leads to suboptimal choice results in practice

Binary codes

在C++中，LSH index定义如下：

IndexLSH * index = new faiss::IndexLSH (d, nbits);

d是输入向量的维度，nbits为每个向量存储用的字节数。

在Python中，LSH index的创建和搜索如下：

n_bits = 2 * d
lsh = faiss.IndexLSH (d, n_bits)
lsh.train (x_train)
lsh.add (x_base)
D, I = lsh.search (x_query, k)

NOTE: The algorithm is not vanilla-LSH, but a better choice. Instead of set of orthogonal projectors is used if n_bits <= d, or a tight frame if n_bits > d.

基于量化的方式

在C++中，基于点积量化的index都有PQ关键词标识，例如，所有的基于点积量化的index都被描述成以下这样：

#include <faiss/IndexPQ.h>
  #include <faiss/IndexIVFPQ.h>

  // Define a product quantizer for vectors of dimensionality d=128,
  // with 8 bits per subquantizer and M=16 distinct subquantizer
  size_t d = 128;
  int M = 16;
  int nbits = 8;
  faiss:IndexPQ * index_pq = new faiss::IndexPQ (d, M, nbits);

  // Define an index using both PQ and an inverted file with nlists to avoid exhaustive search
  // The index 'quantizer' must be already declared
  faiss::IndexIVFPQ * ivfpq = new faiss::IndexIVFPQ (quantizer, d, nlists, M, nbits);

  // Same but with another level of refinement
  faiss::IndexIVFPQR * ivfpqr = new faiss::IndexIVFPQR (quantizer, d, nclust, M, nbits, M_refine, nbits);

在Python中定义如下：

m = 16                                   # number of subquantizers
n_bits = 8                               # bits allocated per subquantizer
pq = faiss.IndexPQ (d, m, n_bits)        # Create the index
pq.train (x_train)                       # Training
pq.add (x_base)                          # Populate the index
D, I = pq.search (x_query, k)            # Perform a search

在这个例子中，m为16（d必须为m的倍数），即16个子量化器，那向量将被分成16份，对应16个code book ,code size为16，每个code设定占用8 bits，也就对应2^8=256个质心。

使用PQ策略的倒排文件

IndexIVFPQ可能是大规模搜索最有用的索引结构：

coarse_quantizer = faiss.IndexFlatL2 (d)
index = faiss.IndexIVFPQ (coarse_quantizer, d,
                          ncentroids, code_size, 8)
index.nprobe = 5

有关质心数量（ncentroids）的设置可以参考IndexIVFFlat 小节，code_size通常设置为2的多次方，但是值介于4-64之间，和IndexPQ一样，d必须是m的倍数。

Composite Indexes

使用PQ作为粗量化器的单元探测（Cell probe）方法

Faiss中相应的粗量化器（coarse quantizer）是MultiIndexQuantizer，MultiIndexQuantizer相对于低精度高效率的index是比较有竞争力的，这个index特殊之处在于没有向量被添加到该index中。

nbits_mi = 12  # c
M_mi = 2       # m
coarse_quantizer_mi = faiss.MultiIndexQuantizer(d, M_mi, nbits_mi)
ncentroids_mi = 2 ** (M_mi * nbits_mi)

index = faiss.IndexIVFFlat(coarse_quantizer_mi, d, ncentroids_mi)
index.nprobe = 2048
index.quantizer_trains_alone = True

关于ncentroids_mi说几点，不考虑段的情况，如果每个向量需要nbits_mi个bits进行保存，那nbits_mi位可以表示 $2^{nbits_mi}$ 信息，也就是 $2^{nbits_mi}$ 个质心，那向量再分为M_mi个段的话，每个段中的向量需要nbits_mi个bits，那么一个完整向量存储需要 $M_mi * nbits_mi$ 位，那么对应的质心数量为 $2^{M_mi *nbits_mi}$

Pre-filtering PQ codes with polysemous codes

使用汉明距离进行比较，比使用点积量化有6倍的速度提升。通过对汉明距离设置一个阈值，许多代价比较高的PQ code比较可以被阻止。

在IndexPQ上启用：

index = faiss.IndexPQ (d, 16, 8)
# before training
index.do_polysemous_training = true
index.train (...)

# before searching
index.search_type = faiss.IndexPQ.ST_polysemous
index.polysemous_ht = 54    # the Hamming threshold
index.search (...)

在IndexIVFPQ上启用：

index = faiss.IndexIVFPQ (coarse_quantizer, d, 16, 8)
# before training
index. do_polysemous_training = true
index.train (...)

# before searching
index.polysemous_ht = 54 # the Hamming threshold
index.search (...)

设置一个合理的阈值，需要注意：

阈值必须介于0到每个code需要的bit数（该例子中为16*8）之间，并且code符合二项分布
阈值设置每个code占用位数的1/2，也就是能够节省code的1/2计算，但通常这还不足够，还需要更小一点。

Pre and post processing

ID映射

默认情况下，Faiss为添加的向量分配连续ID，下面会介绍如何改变这种ID分配规则。

一些Index类实现了add_with_ids方法，在添加向量的同时还可以指定64位的ID。在搜索时，将返回存储的ID，而不是原始向量。

IndexIDMap

这个Index封装了另外一个Index，并在搜索和索引时提供ID的转换。

index = faiss.IndexFlatL2(xb.shape[1]) 
ids = np.arange(xb.shape[0])
index.add_with_ids(xb, ids)  # this will crash, because IndexFlatL2 does not support add_with_ids
index2 = faiss.IndexIDMap(index)
index2.add_with_ids(xb, ids) # works, the vectors are stored in the underlying index

IndexFlatL2是不支持add_with_ids的，也就是对于IndexFlatL2而言，使用的还是顺序的ID，所以必须借助IndexIDMap进行ID转换，IndexIDMap在索引时为其维护了一个ID映射表。

IndexIVF

因为倒排的原因，所以IndexIVF和它的子类，肯定是需要存储ID的，所以提供了add_with_ids方法，我们不需要借助IndexIDMap。

Pre-transforming the data

在索引之前进行数据转换处理通常是很有用的，Transform类继承VectorTransform，支持将d_in维输入的向量转换为d_out维的输出向量。

RandomRotationMatrix，IndexPQ或者IndexLSH中索引前，重新平衡向量的components
RemapDimensionsTransform，减少或扩张向量大小
PCAMatrix，减少维度
OPQMatrix，对输入向量做旋转，使得更适合PQ编码。参考：Optimized product quantization, Ge et al., CVPR'13

Transformation可以从一组向量进行训练。Index也可以包装在IndexPreTransform中，这样映射更加透明，并且训练也可以与index训练集成。

示例

使用IndexPreTransform封装PCA和以个index进行降维

# the IndexIVFPQ will be in 256D not 2048
  coarse_quantizer = faiss.IndexFlatL2 (256)
  sub_index = faiss.IndexIVFPQ (coarse_quantizer, 256, ncoarse, 16, 8)
  # PCA 2048->256
  # also does a random rotation after the reduction (the 4th argument)
  pca_matrix = faiss.PCAMatrix (2048, 256, 0, True) 

  #- the wrapping index
  index = faiss.IndexPreTransform (pca_matrix, sub_index)

  # will also train the PCA
  index.train(...)
  # PCA will be applied prior to addition
  index.add(...)

例子中输入向量维度为2048，我们想将其降维到256，因为transform是工作在index之前的，所以IndexFlatL2和IndexIVFPQ定义是的输入维度是256。

因为PQ量化后，再一次对信息作了衰减，所以最终一个向量变成了16*8bit=16bytes

增加向量维度

有些时候我们会通过在向量中插入0值来扩充向量的维度，使之维度是给定数值的倍数。

例如上面的例子中，原本的维度不能被PQ的子量化器数量整除，所以通过RemapDimensionsTransform将维度扩充到M的倍数。

  # input is in dimension d, but we want a multiple of M
  d2 = int((d + M - 1) / M) * M
  remapper = faiss.RemapDimensionsTransform (d, d2, true)
  # the index in d2 dimensions  
  index_pq = faiss.IndexPQ(d2, M, 8)  
  
  # the index that will be used for add and search 
  index = faiss.IndexPreTransform (remapper, index_pq)

重新排序搜索结果（精排）

Faiss的IndexPQ，在search时计算是经过PQ code后的距离，而非与原向量的精确距离，而在某些场景中，我们还是需要精确计算与查询向量的距离，从而精排这些结果的。

下面这个例子中我们通过IndexPQ获取4*10个近邻向量，然后对这40个向量精确计算距离，最后取top 10.

q = faiss.IndexPQ (d, M, nbits_per_index)
rq = faiss.IndexRefineFlat (q)
rq.train (xt)
rq.add (xb)
rq.k_factor = 4
D, I = rq:search (xq, 10)

注意：IndexRefineFlat能够计算与原向量的距离，是建立在其存储了原始向量的基础上的。

从多个index合并结果

当向量集分散在多个index中时，可以通过IndexShards分发query，然后合并结果。

Index IO，Index Factory, Cloning以及超参数调整

Faiss的index通常是复合index，对于单个index类型很难进行操作，同时因为有很多参数，所以给定场景下经常很难找出最优的组合。因此Faiss提供了高阶接口批量操纵index以及自动探测参数空间。

I/O

write_index(index, "large.index"): 将指定index写入文件
Index * index = read_index("large.index"): 从文件读入index

深度拷贝

Index* index2 = clone_index(index) 返回index的深度拷贝
Index *index_cpu_to_gpu = index_cpu_to_gpu(resource, dev_no, index)
Index *index_gpu_to_cpu = index_gpu_to_cpu(index)
index_cpu_to_gpu_multiple，使用IndexShards或者IndexProxy将索引拷贝至多个GPU

Index Factory

index_factory通过指定的字符串参数创建一个复合的Faiss Index，给定的字符串最多包含3个部分：

预处理组件
1. PCA，例如PCA64指的是使用PCA将维度降为64（实现类为PCAMatrix）,PCAR64意味着先进性PCA降为，紧接着做随机旋转。
2. OPQ，例如OPQ16
倒排文件
1. IVF4096意味着使用IndexFlatL2构建大小为4096的倒排文件
2. IMI2x8意味着每个向量占用2*8 bits，使用MultiIndexQuantizer量化器构建大小为 $2^{2x8}$ 的倒排文件。
3. 如果不计划使用倒排文件，但是需要add_with_ids，我们可以设置IDMap创建IndexIDMap
精细化组件
1. Flat存储完整的向量，实现类有IndexFlat 和IndexIVFFlat
2. PQ16使用16bytes的PQ编码，实现类有IndexPQ 或者IndexIVFPQ
3. PQ8+16，实现类见IndexIVFPQR

示例：

index = index_factory(128, "PCA80,Flat")# 创建一个128维索引，使用PCA将其降为80，之后在使用详尽搜索。
index = index_factory(128, "OPQ16_64,IMI2x8,PQ8+16")

自学习运行时参数

Index的参数分为两类：

build-time，即构建index时的参数
run-time，只能在search前调整的参数

这一小节说的自主学习特指runtime时的参数，有哪些参数呢？

key	Index class	runtime parameter	备注
IVF, IMI2x	IndexIVF*	nprobe	衡量速度和精度的主要参数
IMI2x*	IndexIVF	max_codes	对IMI有用，通常拥有不平衡的倒排链
PQ*	IndexIVFPQ, IndexPQ	ht	汉明距离阈值
PQ+	IndexIVFPQR	k_factor	决定取多少结果进行精确验证

自主学习目的就是为了找到速度和精度的最优点。

AutoTuneCriterion

AutoTuneCriterion对象包含搜索的真实结果和搜索结果的评估，它返回0-1之间的一个性能评估值，目前有基于1-recall和R-recall实现。

OperatingPoints

该对象将所有操作点（性能, 搜索时间, 参数集ID）存储为元祖，并且选择一个最优的操作点。

ParameterSpace

该对象扫描给定index的所有可调参数，并根据每个参数所有可能取值构建table。

也就是说，参数空间会随参数数量呈现指数级增长。

详细的自动学习参数的demo可参考：demo_auto_tune.py

Index上的特殊操作

从index中还原向量（reconstruct或者reconstruct_n）

d = 64
nb = 1000
nt = 1500
np.random.seed(123)
xb = np.random.random(size=(nb, d)).astype('float32')
xt = np.random.random(size=(nt, d)).astype('float32')

index = faiss.index_factory(d, "IVF64,Flat")
index.train(xt)
index.add(xb)

index.make_direct_map()
recons_before = np.vstack([index.reconstruct(i) for i in range(nb)])

支持IndexFlat, IndexIVFFlat , IndexIVFPQ , IndexPreTransform，对于前三个需要先调用make_direct_map

从index中删除元素

index.remove_ids(faiss.IDSelectorRange(int(nb / 2), nb))

支持：IndexFlat, IndexIVFFlat, IndexIVFPQ, IDMap

范围搜索

返回query向量和指定半径内的所有向量。

支持：IndexFlat, IndexIVFFlat

Spliting and merging indexes

merge_from，拷贝另外一个index到当前，并且在运行中解决分配。
copy_subset_to 拷贝一部分code到另外一个index。

这些方法主要用于大索引场景下，所以只有IndexIVF的子类实现了这些方法。

Advanced topics

Faiss的代码结构

编译

Faiss通过Makefile编译接口，同时支持设置不同变量标志，如BLAS库，具体可以参考INSTALL中如何设置标志。

依赖

Faiss唯一的硬依赖就是BLAS/Lapack，它基于英特尔的MKL进行开发，但是所有带有fortran规范的接口实现都应该可以工作。因为所有实现都从32位想64位升级，所以所有的integer类型在编译时都应该指定-DFINTEGER=int 或者-DFINTEGER=long标志。

Faiss CPU的代码规范

没有public/private

Faiss中所有的C++对象结构，没有public/private概念，所有的字段都可以直接访问。

对象所有权

大部分情况下，Faiss对象都是可拷贝的。有一些例外的地方，对象A包含一个对象B的指针，在这种情况下，在A中有一个boolean类型的标志own_fields，这个标识用于指明当对象A被删除时对象B是否要被删除，构造时通常我们将该变量值设为false，即不是B的拥有者，当然如果在调用代码中失去了对B的引用，这可能会导致内存泄漏，所以如果我们想A在销毁时B跟着销毁，需要将其设为True，对于以下类的构造我们要注意：

Class	field
IndexIVF	quantizer
IndexPreTransform	chain
IndexIDMap	index
IndexRefineFlat	base_index

以IndexIVFPQ对象构造为例：

  faiss::Index *index; 
  if (...) { // on output of this block, index is the only object that needs to be tracked
     faiss::IndexFlatL2 * coarseQuantizer = new  faiss::IndexFlatL2(targetDim);
     // you need to use a pointer here
     faiss::IndexIVFPQ *index1 = new faiss::IndexIVFPQ(
          coarseQuantizer, targetDim, numCentroids, numQuantizers, 8);
     index1->own_fields = true; // coarse quantizer is deallocated by index destructor
     index1->nprobe = 5; 
     index = index1;
  }

上例子中IndexIVFPQ构造需要指定一个量化器，我们希望IndexIVFPQ销毁时，coarseQuantizer也被销毁。

和GPU相关的，都不做翻译。

线程和异步调用

线程安全

Faiss CPU对于并发检索和其它不更改index的操作都是线程安全的，多线程改变index的函数需要实现互斥。

内部线程

Faiss本身有多种不同的内部线程，对于CPU的Faiss，index上3中基本操作（train，add，search）都是多线程的。线程是通过OpenMP，以及多线程的BLAS实现。我们可以通过环境变量OMP_NUM_THREADS或者调用omp_set_num_threads(10)方法。

如果查询或添加单个向量，Faiss是不会使用多线程的。

查询的性能

最好的提升查询QPS是进行批量提交。如果只是一个向量查询，那将只会在调用线程中执行。

内部线程的性能（OpenML）

调整OpenML线程数量对性能提升有时候并不是特别显著，在一些机器架构中，将线程数量设置的比内核数量稍微小点，有时候执行效率会更高。例如在Intel E5-2680 v2中，将线程数设置为20而不是默认的40，效果会更好。

如果使用OpenBLAS、BLAS实现，将线程策略设置为PASSIVE会更好：

export OMP_WAIT_POLICY=PASSIVE

异步搜索

对于包含以下一些计算的search操作，并行计算会更好：

单线程计算
等IO
GPU计算

对于Faiss CPU，和其他多线程计算（如其它searcher）并行化并没有什么帮助，因为这样反而会导致太多的线程从而降低正题性能。

所以在Faiss中，可能多个用户的线程的search操作会被放入队列，然后聚合打包集中处理。

殷亚云的博客

教程