Depth video enhancement is an essential preprocessing step for various 3D applications. Despite extensive studies of spatial enhancement, effective temporal enhancement that both strengthens temporal consistency and keeps correct depth variation needs further research. In this paper, we propose a novel method to enhance the depth video by blending raw depth frame with the estimated intrinsic static structure, which defines static structure of captured scene and is estimated iteratively by a probabilistic generative model with sequentially incoming depth frames. Our experimental results show that the proposed method is effective both in static and dynamic scene and is compatible with various kinds of depth videos. We will demonstrate that superior performance can be achieved in comparison with existing temporal enhancement approaches.