Commercially available light field cameras have difficulty in capturing 5D (4D + time) light field videos. They can only capture still light field images or are excessively expensive for normal users to capture the light field video. To tackle this problem, we propose a deep learning-based method for synthesizing a light field video from a monocular video. We propose a new synthetic light field video dataset that renders photorealistic scenes using Unreal Engine because no light field video dataset is available. The proposed deep learning framework synthesizes the light field video with a full set (9x9) of sub-aperture images from a normal monocular video. The proposed network consists of three sub-networks, namely, feature extraction, 5D light field video synthesis, and temporal consistency refinement. Experimental results show that our model can successfully synthesize the light field video for synthetic and real scenes and outperforms the previous frame-by-frame method quantitatively and qualitatively.