I am a final-year PhD student at UNC Chapel Hill, working on vision and language. My advisors are Tamara L. Berg and Mohit Bansal.

I earned my bachelor's degree in computer science from Yingcai Honors College, University of Electronic Science and Technology of China (UESTC) in 2017. I worked at Nanyang Technological University with Sinno Jialin Pan and at the University of Manitoba with Yang Wang.

Email: jielei [at] cs.unc.edu
Office: SN 260, 201 S. Columbia St. Chapel Hill, NC 27599-3175


Publications & Preprints

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei, Tamara L. Berg, Mohit Bansal
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan*, Jie Lei*, Thomas Wolf, Mohit Bansal
arXiv 2021 [PDF] [Code]
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li*, Jie Lei*, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara L. Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
NeurIPS 2021 - Datasets and Benchmarks Track [PDF] [Code] [Leaderboard & Challenge]
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu
ICCV 2021 Oral [PDF] [Dataset]
mTVR: Multilingual Moment Retrieval in Videos
Jie Lei, Tamara L. Berg, Mohit Bansal
ACL 2021 [PDF] [Code]
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal
ICML 2021 [PDF] [Code]
Improved Pre-Training from Noisy Instructional Videos via Dense Captions and Entropy Minimization
Zineng Tang*, Jie Lei*, Mohit Bansal
NAACL 2021 [PDF] [Code]
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei*, Linjie Li*, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu
CVPR 2021 Best Student Paper Honorable Mention Oral [PDF] [Code]
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
EMNLP 2020 [PDF] [VLEP Dataset]
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
TVQA: Localized, Compositional Video Question Answering
Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg
EMNLP 2018 Oral [PDF] [Slides] [Dataset] [Code]
Weakly Supervised Image Classification with Coarse and Fine Labels
Jie Lei, Zhenyu Guo, Yang Wang
CRV 2017 [PDF] [Code]


AnimeGAN: Create Anime Face using Generative Adversarial Networks
Jie Lei
A simple GAN model that automatically generates anime girl faces.