nogilnick.
About
Blog
Plots















The Jaccard Kernel and its Implied Feature Space

Sat, 08 Oct 2022

Data Science, Machine Learning, Mathematics

Kernel methods leverage the kernel trick to implicitly perform inner products in often high and even infinite dimensional feature spaces. For instance, the radial basis function (RBF) kernel can be shown to map to an infinite-dimensional feature space. In general, if a similarity function can be shown to satisfy Mercer's Condition, one may operate on the finite-dimensional Gram matrix induced by the function, while receiving the benefit of mathematical guarantees about the implied transformation.

The Jaccard index on sets can also be shown to satisfy Mercer's condition by reasoning about the structure of any Gram matrix produced using the similarity measure. This post explores the implicit transformation produced by the Jaccard kernel.

Let \(\mathbf{x}, \mathbf{y} \in \{0, 1\}^k\) be k-dimensional vectors of indicator variables corresponding to sets \(S_\mathbf{x}\) and \(S_\mathbf{y}\), where \(\mathbf{x}_i = 1\) when \(e_i \in S_\mathbf{x}\) and \(e_i\) is the i-th element of the set in some arbitrary and fixed order. Next, let \(\mathbf{W}\) be a diagonal matrix with non-negative item weights along the diagonal. It is assumed that \(\sum_i^{k}{\mathbf{W}_{ii}}=1\). Now, the weighted Jaccard Index may be defined as

\[\displaylines{J(\mathbf{x}, \mathbf{y}; \mathbf{W}) = (\mathbf{x}^T\mathbf{W}\mathbf{y})/(\mathbf{x}^T\mathbf{W}\mathbf{1}+\mathbf{1}^T\mathbf{W}\mathbf{y}-\mathbf{x}^T\mathbf{W}\mathbf{y})\\ \\ =(\mathbf{x}^T\mathbf{W}\mathbf{y})/(\mathbf{x}^T\mathbf{W}\mathbf{1}+\mathbf{1}^T\mathbf{W}\mathbf{y}-\mathbf{x}^T\mathbf{W}\mathbf{y}-1+1)\\ \\ =(\mathbf{x}^T\mathbf{W}\mathbf{y})/(1 + \mathbf{x}^T\mathbf{W}\mathbf{1}+\mathbf{1}^T\mathbf{W}\mathbf{y}-\mathbf{x}^T\mathbf{W}\mathbf{y}-\mathbf{1}^T\mathbf{W}\mathbf{1})\\ \\ =(\mathbf{x}^T\mathbf{W}\mathbf{y})/(1 + (\mathbf{1}-\mathbf{x})^T\mathbf{W}(\mathbf{y}-\mathbf{1}))\\ \\ =(\mathbf{x}^T\mathbf{W}\mathbf{y})/(1 - (\mathbf{x}-\mathbf{1})^T\mathbf{W}(\mathbf{y}-\mathbf{1}))\\ }\ ,\]

where \(\mathbf{1}\) is a vector of ones and the fact that \(\mathbf{1}^T\mathbf{W}\mathbf{1}=1\) is used in the third step. Now, notice that \(J(\mathbf{x}, \mathbf{y}; \mathbf{W})\) is of the form of the infinite geometric series:

\[\displaylines{\frac{a}{1-r}=\sum\limits_{j=0}^{\infty}{ar^j}}\ .\]

Namely, since they are both scalars, \(a=\mathbf{x}^T\mathbf{W}\mathbf{y}\) and \(r = (\mathbf{x}-\mathbf{1})^T\mathbf{W}(\mathbf{y}-\mathbf{1})\). Further, given the earlier definitions, note that \(0 \leq r \leq 1\).

Let \(\mathbf{\tilde{x}}=\mathbf{W}^{1/2}(\mathbf{x}-\mathbf{1})\), \(\mathbf{\tilde{y}}=\mathbf{W}^{1/2}(\mathbf{y}-\mathbf{1})\) and \(\mathbf{w} = diag(\mathbf{W})\). Substituting into the formula and expanding via the multinomial theorem produces

\[\displaylines{J(\mathbf{x}, \mathbf{y}; \mathbf{W}) = \mathbf{x}^T\mathbf{W}\mathbf{y}\sum\limits_{j=0}^{\infty}{[(\mathbf{x}-\mathbf{1})^T\mathbf{W}(\mathbf{y}-\mathbf{1})]^j} = \mathbf{x}^T\mathbf{W}\mathbf{y}\sum\limits_{j=0}^{\infty}{[\mathbf{\tilde{x}}^T\mathbf{\tilde{y}}]^j}\\ \\ = \mathbf{x}^T\mathbf{W}\mathbf{y}\sum\limits_{j=0}^{\infty}{[\sum\limits_{n_1 + \ldots + n_k = j}^{}{(\mathbf{\tilde{x}}^{n_1}_1 \cdots \mathbf{\tilde{x}}^{n_k}_k)(\mathbf{\tilde{y}}^{n_1}_1 \cdots \mathbf{\tilde{y}}^{n_k}_k)}]} \\ = (\sqrt{\mathbf{w}_1}\mathbf{x}_1\sqrt{\mathbf{w}_1}\mathbf{y}_1 + \ldots + \sqrt{\mathbf{w}_k}\mathbf{x}_k\sqrt{\mathbf{w}_k}\mathbf{y}_k) \sum\limits_{j=0}^{\infty}{[\sum\limits_{n_1 + \ldots + n_k = j}^{}{(\mathbf{\tilde{x}}^{n_1}_1 \cdots \mathbf{\tilde{x}}^{n_k}_k)(\mathbf{\tilde{y}}^{n_1}_1 \cdots \mathbf{\tilde{y}}^{n_k}_k)}]}\\ \\ = \sum\limits_{j=0}^{\infty}{[\sum\limits_{n_1 + \ldots + n_k = j}^{}{\sqrt{\mathbf{w}_1}\mathbf{x}_1(\mathbf{\tilde{x}}^{n_1}_1 \cdots \mathbf{\tilde{x}}^{n_k}_k)(\mathbf{\tilde{y}}^{n_1}_1 \cdots \mathbf{\tilde{y}}^{n_k}_k) \sqrt{\mathbf{w}_1}\mathbf{y}_1}]} \\ + \sum\limits_{j=0}^{\infty}{[\sum\limits_{n_1 + \ldots + n_k = j}^{}{\sqrt{\mathbf{w}_2}\mathbf{x}_2(\mathbf{\tilde{x}}^{n_1}_1 \cdots \mathbf{\tilde{x}}^{n_k}_k)(\mathbf{\tilde{y}}^{n_1}_1 \cdots \mathbf{\tilde{y}}^{n_k}_k) \sqrt{\mathbf{w}_2}\mathbf{y}_2}]} \\\\ + \ldots \\\\ + \sum\limits_{j=0}^{\infty}{[\sum\limits_{n_1 + \ldots + n_k = j}^{}{\sqrt{\mathbf{w}_k}\mathbf{x}_k(\mathbf{\tilde{x}}^{n_1}_1 \cdots \mathbf{\tilde{x}}^{n_k}_k)(\mathbf{\tilde{y}}^{n_1}_1 \cdots \mathbf{\tilde{y}}^{n_k}_k) \sqrt{\mathbf{w}_k}\mathbf{y}_k}]}\\ }\ ,\]

where \(\mathbf{w}_i\) is the i-th item weight and the multinomial theorem is employed in the second line.

Now, the entire right-hand of the equation is a sum of products involving symmetrically defined expressions of \(\mathbf{x}\) and \(\mathbf{y}\). By separating the terms involving \(\mathbf{x}\) from the terms involving \(\mathbf{y}\), the implied transformation becomes apparent. Name the transformation \(\varphi\) may be defined as

\[\displaylines{\varphi(\mathbf{x})=(c_{1j}\sqrt{\mathbf{w}_1}\mathbf{x}_1[\sqrt{\mathbf{w}_1}(\mathbf{x}_1 - 1)]^{n_1} \cdots [\sqrt{\mathbf{w}_k}(\mathbf{x}_k - 1)]^{n_k},\\ \ldots, c_{2j}\sqrt{\mathbf{w}_2}\mathbf{x}_2[\sqrt{\mathbf{w}_1}(\mathbf{x}_1 - 1)]^{n_1} \cdots [\sqrt{\mathbf{w}_k}(\mathbf{x}_k - 1)]^{n_k},\\ \ldots, c_{kj}\sqrt{\mathbf{w}_k}\mathbf{x}_k[\sqrt{\mathbf{w}_1}(\mathbf{x}_1 - 1)]^{n_1} \cdots [\sqrt{\mathbf{w}_k}(\mathbf{x}_k - 1)]^{n_k})^T}\ ,\]

where the \(c_{ij}\) terms are the appropriate constants determined as above by the multinomial theorem. In the above, the infinite product of the j-th power of the term involving the complement intersection is repeated k times: once for each element of the original vector. In this way, it can be seen that the implicit mapping corresponds to an infinite-dimensional transformation.

To better understand the behavior of this infinite-dimensional transformation, consider the expansion of the geometric series formula

\[\displaylines{\sum\limits_{j=0}^{\infty}{ar^j}=a(r^0+r^1+r^2+\ldots)}\ .\]

In this formula, a corresponds to the weighted proportion of elements in the intersection and r corresponds to the weighted proportion of elements in the intersection of the set complements. In normal situations, the contributions of the higher powers of r become vanishingly small and the sum rapidly converges to a finite number.

However, as r approaches 1, a approaches 0. In some sense, the two terms counteract each-other, keeping the ultimate result within the range [0, 1]. In the extreme case of two empty sets, note that \(r=1\) and \(a=0\) giving rise to the indeterminate form of \(0 * \infty\). This form corresponds to the \(0/0\) resulting from the typical formula.