Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso)


The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this project, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems.

Main Idea

HSIC Lasso is formulated as the following optimization problem:

\(\min_{\alpha_1,\ldots,\alpha_d} \frac{1}{2}\|\bar{\bf L} - \sum_{k = 1}^d \alpha_k \bar{\bf K}^{(k)}\|^2_{F} + \lambda \sum_{k = 1}^d |\alpha_k| \quad \text{s.t.} \quad \alpha_1,\ldots,\alpha_d \geq 0\)

where \(\|\cdot\|_F\) is the Frobenius norm, \(\bar{\bf K}^{(k)}\) is the centered Gram matrix computed from the \(k\)-th feature, \(\bar{\bf L}\) is the centered Gram matrix computed from the output \(y\), and \(\lambda \geq 0\) is the regularization parameter.
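
To see how this objective relates to kernel-based independence measures, it may help to expand the squared Frobenius term. The sketch below assumes the usual centering \(\bar{\bf K} = {\bf \Gamma}{\bf K}{\bf \Gamma}\) with \({\bf \Gamma} = {\bf I}_n - \frac{1}{n}{\bf 1}_n{\bf 1}_n^\top\) for \(n\) samples (the symbols \({\bf \Gamma}\) and \(n\) are introduced here for illustration and are not defined on this page). Since all matrices involved are symmetric, \(\|{\bf A}\|_F^2 = \mathrm{tr}({\bf A}{\bf A})\), which gives

\(\frac{1}{2}\|\bar{\bf L} - \sum_{k = 1}^d \alpha_k \bar{\bf K}^{(k)}\|^2_{F} = \frac{1}{2}\mathrm{tr}(\bar{\bf L}\bar{\bf L}) - \sum_{k = 1}^d \alpha_k \, \mathrm{tr}(\bar{\bf K}^{(k)}\bar{\bf L}) + \frac{1}{2}\sum_{k,l = 1}^d \alpha_k \alpha_l \, \mathrm{tr}(\bar{\bf K}^{(k)}\bar{\bf K}^{(l)})\)

Up to a normalization constant, \(\mathrm{tr}(\bar{\bf K}^{(k)}\bar{\bf L})\) is the empirical HSIC between the \(k\)-th feature and the output, and \(\mathrm{tr}(\bar{\bf K}^{(k)}\bar{\bf K}^{(l)})\) is the empirical HSIC between features \(k\) and \(l\). Under this reading, the second term favors features that are strongly dependent on the output, the third term discourages selecting features that are dependent on each other, and the first term is constant in \(\alpha\).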

To compute the solutions of HSIC Lasso, we use the dual augmented Lagrangian (DAL) package.
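
For readers who want to experiment without DAL, the following is a minimal sketch of one alternative way to minimize the same objective: cyclic coordinate descent on the non-negative Lasso. It is not DAL and not the released MATLAB code. The function names `frob_inner` and `nonneg_lasso_cd` are hypothetical, and the centered Gram matrices are assumed to be precomputed and passed as nested lists.

```python
def frob_inner(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def nonneg_lasso_cd(K_bars, L_bar, lam, n_iter=200, tol=1e-10):
    """Minimize 0.5 * ||L_bar - sum_k alpha_k K_bars[k]||_F^2 + lam * sum_k alpha_k
    subject to alpha_k >= 0, by cyclic coordinate descent (illustrative only)."""
    d = len(K_bars)
    # Pairwise inner products between feature Gram matrices, and with the output.
    G = [[frob_inner(K_bars[k], K_bars[l]) for l in range(d)] for k in range(d)]
    c = [frob_inner(K_bars[k], L_bar) for k in range(d)]
    alpha = [0.0] * d
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(d):
            if G[k][k] == 0.0:
                continue  # e.g. a constant feature: its centered Gram matrix is zero
            # Correlation of feature k with the residual that excludes feature k.
            rho = c[k] - sum(G[k][j] * alpha[j] for j in range(d) if j != k)
            # Closed-form minimizer in alpha_k, clipped at zero.
            new_value = max(0.0, (rho - lam) / G[k][k])
            max_change = max(max_change, abs(new_value - alpha[k]))
            alpha[k] = new_value
        if max_change < tol:
            break
    return alpha


# Toy example: the "output" matrix is exactly twice the first feature matrix.
K1 = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
K2 = [[0.0, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
L_bar = [[2.0 * v for v in row] for row in K1]
print(nonneg_lasso_cd([K1, K2], L_bar, lam=0.0))  # → [2.0, 0.0]
```

Increasing `lam` shrinks the coefficients toward zero; features whose coefficient stays at zero are the ones not selected.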


Features

  • Can select nonlinearly related features.

  • Highly scalable w.r.t. the number of features.

  • Convex optimization.
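
The first point can be illustrated with a small sketch. It compares the Pearson correlation with an empirical HSIC estimate on data where the output depends on the input only nonlinearly. The Gaussian kernel, the bandwidth of 0.5, the \(1/n^2\) normalization, and the function names are assumptions made for this example and are not taken from the released code.

```python
import math


def centered_gram(values, sigma):
    """Gaussian Gram matrix of a 1-D sample, centered as H K H with H = I - (1/n) 1 1^T."""
    n = len(values)
    K = [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values] for a in values]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # For a symmetric K, (H K H)_ij = K_ij - mean_i - mean_j + overall mean.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic(x, y, sigma_x=0.5, sigma_y=0.5):
    """Empirical HSIC estimate tr(K_bar L_bar) / n^2 (one common normalization)."""
    n = len(x)
    K_bar = centered_gram(x, sigma_x)
    L_bar = centered_gram(y, sigma_y)
    # Both matrices are symmetric, so the trace equals the elementwise sum of products.
    return sum(K_bar[i][j] * L_bar[i][j] for i in range(n) for j in range(n)) / n ** 2


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


x = [i / 10.0 for i in range(-10, 11)]  # symmetric grid on [-1, 1]
y = [v ** 2 for v in x]                 # output depends on x only nonlinearly

print("Pearson correlation:", pearson(x, y))  # essentially zero by symmetry
print("Empirical HSIC:     ", hsic(x, y))     # clearly positive
```

A selection rule based on linear dependency would see no signal in this feature, whereas a kernel-based dependence measure does.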



Usage

  • Download the source code.

  • For the memory-efficient implementation, download Eigen and place it in the same folder as HSICLasso. Then compile the cpp files with mex.

  • Run the script (demo_HSICLasso.m).


I am grateful to Prof. Masashi Sugiyama and Dr. Leonid Sigal for their support in developing this software.


I am happy to receive any kind of feedback. E-mail: \(\texttt{makoto.yamada@riken.jp}\)