Metric Learning
(Thanks to John Blitzer, who gave me this cake for my 30th birthday.)
One of the fundamental questions of machine learning is how to compare examples. If an algorithm could perfectly determine whether two examples were semantically similar or dissimilar, most subsequent machine learning tasks would become trivial. For example, in classification settings, one would need only one labeled example per class and could then, at test time, assign every similar example the same class label. An analogous reduction would apply to regression if a continuous estimate of the degree of similarity were available. It is not surprising that many popular machine learning algorithms, such as Support Vector Machines, Gaussian Processes, kernel regression, k-means, or k-nearest neighbors (kNN), fundamentally rely on a representation of the input data for which a reliable, although not perfect, measure of dissimilarity is known.
A common choice of dissimilarity measure is an uninformed norm, like the Euclidean distance. Here it is assumed that the features are represented in a Euclidean space in which similar inputs are close and dissimilar inputs are far apart. Although the Euclidean distance is convenient and intuitive, it ignores the fact that the semantic meaning of “similarity” is inherently task- and data-dependent. To illustrate this point, imagine two researchers who use the same data set of written documents for clustering. The first is interested in clustering the articles by author, whereas the second wants to cluster by topic. Given the nature of their respective tasks, the two should use very different metrics to measure document similarity, even if the underlying features are computed through similar means (e.g., bag-of-words or tf-idf). Often, domain experts adjust the feature representations by hand, but this is clearly not a robust approach. It is therefore desirable to learn the metric (or data representation) explicitly for each specific application.
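The two-researcher example can be made concrete with a small sketch. It assumes a metric of the form ||L(x - y)||, i.e. Euclidean distance after a linear map L, which is one common parameterization and not the only one. The two-feature documents, the matrices `L_author` and `L_topic`, and the helper `nearest` are hypothetical and hand-picked for illustration; in metric learning, L would be learned from data rather than set by hand.

```python
import numpy as np

def euclidean(x, y):
    """Plain Euclidean distance, blind to the task."""
    return float(np.linalg.norm(x - y))

def linear_metric(x, y, L):
    """Euclidean distance after mapping the inputs with L: ||L(x - y)||."""
    return float(np.linalg.norm(L @ (x - y)))

def nearest(query, docs, dist):
    """Return the name of the document closest to the query under dist."""
    return min(docs, key=lambda name: dist(query, docs[name]))

# Hypothetical toy features: axis 0 ~ writing style, axis 1 ~ topic.
query = np.array([0.0, 0.0])
docs = {
    "doc_a": np.array([0.5, 3.0]),  # similar style, different topic
    "doc_b": np.array([3.0, 0.5]),  # different style, similar topic
}

# Hand-picked maps (assumptions): each one down-weights the axis
# that is irrelevant to the respective task.
L_author = np.diag([1.0, 0.1])
L_topic = np.diag([0.1, 1.0])

# Under the Euclidean distance both documents are equally far from the query.
d_a = euclidean(query, docs["doc_a"])
d_b = euclidean(query, docs["doc_b"])

# The task-specific metrics disagree about which document is the neighbor.
nearest_author = nearest(query, docs, lambda x, y: linear_metric(x, y, L_author))
nearest_topic = nearest(query, docs, lambda x, y: linear_metric(x, y, L_topic))

print(d_a, d_b)
print(nearest_author, nearest_topic)
```

The same feature vectors yield different nearest neighbors depending on the task, which is why a single uninformed norm cannot serve both researchers.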
Relevant Publications:
[Project] [PDF] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi. BERTScore: Evaluating Text Generation with BERT, International Conference on Learning Representations (ICLR) 2020, Addis Ababa, Ethiopia, April 26-30 
[PDF] Gao Huang, Chuan Guo, Matt Kusner, Yu Sun, Fei Sha and Kilian Q. Weinberger. Supervised Word Mover's Distance. Neural Information Processing Systems (NIPS), Curran Associates, Barcelona, Dec. 2016, in press…
[PDF] Matt J. Kusner, Yu Sun, Nicholas I. Kolkin, Kilian Q. Weinberger. From Word Embeddings to Document Distances. International Conference on Machine Learning (ICML), Lille, France, pp. 957–966, 2015.
[PDF][CODE][BIBTEX] Matt Kusner, Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal. Stochastic Neighbor Compression. International Conference on Machine Learning (ICML), Beijing, China, 2014.
[Preprint][CODE][BIBTEX] Dor Kedem, Stephen Tyree, Kilian Q. Weinberger, Fei Sha, Gert Lanckriet. Nonlinear Metric Learning. In Proceedings of Advances in Neural Information Processing Systems 25 (NIPS-25) (in press).
[PDF][CODE] Laurens J.P. van der Maaten and Kilian Q. Weinberger. Stochastic Triplet Embedding. To appear in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 
[PDF][CODE] Zhixiang (Eddie) Xu, Minmin Chen, Kilian Q. Weinberger, Fei Sha. From sBoW to dCoT: Marginalized Encoders for Text Representation. Proc. of 21st ACM Conf. of Information and Knowledge Management (CIKM), Hawaii, 2012 (In press.)
[PDF][CODE][BIBTEX] Minmin Chen, Zhixiang (Eddie) Xu, Kilian Q. Weinberger, Fei Sha. Marginalized Stacked Denoising Autoencoders for Domain Adaptation. Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, Scotland, Omnipress, pages 767-774, 2012.
[PDF][CODE][BIBTEX] Shibin Parameswaran and Kilian Q. Weinberger. Large Margin Multi-Task Metric Learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta (eds.), Advances in Neural Information Processing Systems 23 (NIPS), pages 1867-1875, 2010.
[PDF][CODE][BIBTEX] K. Q. Weinberger, F. Sha, and L. K. Saul. Convex optimizations for distance metric learning and pattern classification. IEEE Signal Processing Magazine 27(3): 146-158, 2010.  
[PDF][CODE][BIBTEX] K. Q. Weinberger, L. K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research (JMLR), 2009.
[PDF][BIBTEX] B. Bai, J. Weston, D. Grangier, R. Collobert, O. Chapelle, K. Q. Weinberger. Supervised Semantic Indexing. The 18th ACM Conference on Information and Knowledge Management (CIKM), 2009.
[PDF][BIBTEX] K. Q. Weinberger and O. Chapelle (2008). Large Margin Taxonomy Embedding with an Application to Document Categorization. Advances in Neural Information Processing Systems 21 (NIPS), 2009, pages 1737-1744.
[PDF][BIBTEX] Malcolm Slaney, K. Q. Weinberger, William White. Learning a Metric for Music Similarity. International Symposium on Music Information Retrieval (ISMIR), September 2008.
[PDF] [TALK][BIBTEX] K. Q. Weinberger and L. K. Saul (2008). Fast solvers and efficient implementations for distance metric learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. 
[BIBTEX] K. Q. Weinberger. Metric Learning with Convex Optimization. PhD Thesis, University of Pennsylvania. PhD committee: Lawrence K. Saul (chair), Fernando C. N. Pereira, Daniel D. Lee, Gert Lanckriet, Ben Taskar.
[PDF][BIBTEX] K. Q. Weinberger, G. Tesauro (2007). Metric learning for kernel regression. In Proceedings of the Eleventh International Workshop on Artificial Intelligence and Statistics (AISTATS-07), Puerto Rico.  
[PDF][CODE][BIBTEX] K. Q. Weinberger, J. Blitzer, and L. K. Saul (2006). Distance Metric Learning for Large Margin Nearest Neighbor Classification. In Y. Weiss, B. Schoelkopf, and J. Platt (eds.), Advances in Neural Information Processing Systems 18 (NIPS). MIT Press: Cambridge, MA.