Yann LeCun publications

Affiliation: Unknown

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then …

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. These models deliver impressive accuracy but each image evaluation requires millions of floating point operations, making their deployment on smartphones and Internet-scale clusters problematic. The computation is dominated by the convolution operations …

Fast Approximation of Rotations and Hessians matrices
A new method to represent and approximate rotation matrices is introduced. The method represents approximations of a rotation matrix $Q$ with linearithmic complexity, i.e. with $\frac{1}{2}n\lg(n)$ rotations over pairs of coordinates, arranged in an FFT-like fashion. The approximation is "learned" using gradient descent. It allows to represent symmetric matrices $H$ …

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships …

Computing the Stereo Matching Cost with a Convolutional Neural Network
We present a method for extracting depth information from a rectified image pair. We train a convolutional neural network to predict how well two image patches match and use it to compute the stereo matching cost. The cost is refined by cross-based cost aggregation and semiglobal matching, followed by a …

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, that extends the FLIC dataset with additional motion features. We apply our …

Differentially- and non-differentially-private random decision trees
We consider supervised learning with random decision trees, where the tree construction is completely random. The method is popularly used and works well in practice despite the simplicity of the setting, but its statistical mechanism is not yet well-understood. In this paper we provide strong theoretical guarantees regarding learning with …

Efficient Object Localization Using Convolutional Networks
Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling and sub-sampling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which …

The Loss Surfaces of Multilayer Networks
We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the …

Unsupervised Learning of Spatiotemporally Coherent Metrics
Current state-of-the-art classification and detection algorithms rely on supervised training. In this work we study unsupervised feature learning in the context of temporally coherent video data. We focus on feature learning from unlabeled video data, using the assumption that adjacent video frames contain semantically similar information. This assumption is exploited …