As videos become increasingly common on the internet, understanding their content is essential for recognizing important human actions or highlights. Moreover, videos paired with text that describes their key points have encouraged research on video grounding (Gao et al., 2017). Video grounding is an important task with applications such as video surveillance. It aims to find a grounding location, i.e., a video segment semantically corresponding to a query sentence, in a long, untrimmed video. Recently, weakly supervised video grounding (Zheng et al., 2022) has drawn increasing attention because it incurs little annotation cost: the ground-truth grounding location is not available for training, and only matched pairs of video and query sentence are provided. In this paper, we propose Trainable Positional Embedding (TPE)-based contrastive proposal learning for weakly supervised video grounding. The previous contrastive proposal learning method (Zheng et al., 2022) leverages several Gaussian masks as positive proposals for finding grounding locations. However, it uses a predefined sinusoidal positional embedding, which is suboptimal because the fixed embedding cannot adapt to the varying importance of word positions in the query sentence. To address this problem, we leverage a trainable positional embedding for contrastive proposal learning. We verify through quantitative experiments that the proposed method improves performance, outperforming previous state-of-the-art methods.
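To make the distinction concrete, the sketch below contrasts a fixed sinusoidal positional embedding with a trainable one applied to query word features. This is a minimal PyTorch sketch under our assumptions: class names such as `TrainablePositionalEmbedding`, the initialization scheme, and the tensor shapes are illustrative, not taken from the cited methods.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEmbedding(nn.Module):
    """Fixed sinusoidal embedding; contains no learned parameters."""
    def __init__(self, max_len: int, dim: int):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)  # a buffer: never updated by gradients

    def forward(self, word_feats: torch.Tensor) -> torch.Tensor:
        # word_feats: (batch, seq_len, dim)
        return word_feats + self.pe[: word_feats.size(1)]

class TrainablePositionalEmbedding(nn.Module):
    """Learned per-position vectors, updated end-to-end by the training loss."""
    def __init__(self, max_len: int, dim: int):
        super().__init__()
        self.pe = nn.Parameter(torch.zeros(max_len, dim))  # a parameter: receives gradients
        nn.init.normal_(self.pe, std=0.02)  # illustrative initialization

    def forward(self, word_feats: torch.Tensor) -> torch.Tensor:
        # word_feats: (batch, seq_len, dim)
        return word_feats + self.pe[: word_feats.size(1)]

# Usage: add positional information to query word features before the
# cross-modal encoder used for proposal learning (shapes are illustrative).
words = torch.randn(2, 12, 256)             # (batch, num_words, dim)
tpe = TrainablePositionalEmbedding(32, 256)
query = tpe(words)                          # positions adapt during training
```

Because the trainable table is an `nn.Parameter`, gradients from the training loss update the per-position vectors, allowing the model to emphasize informative word positions, whereas the sinusoidal buffer stays fixed throughout training.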