In e-learning, knowing how concentrated and engaged a learner is, which we refer to as the learner’s efficiency, is essential for providing adaptive and flexible materials, timely suggestions, and similar support that can lead to efficient learning. In this work, we explore predicting learners’ efficiency in a realistic configuration that uses a webcam or a laptop PC’s built-in camera. Specifically, we first provide a feasible definition of learners’ efficiency and then, based on this definition, predict a learner’s efficiency from facial behavior using various convolutional neural networks. We discuss the results using different evaluation metrics.