A curated list of resources on visual tokenizers (primarily for visual generation).
We welcome your contributions! If you find any mistakes or omissions, please let us know.
Contact: Jialong Wu
- ✨ [SimVQ] Zhu, Y., Li, B., Xin, Y., & Xu, L. Addressing Representation Collapse in Vector Quantized Models with One Linear Layer. arXiv, 2024.
- [BSQ] Zhao, Y., Xiong, Y., & Krähenbühl, P. Image and Video Tokenization with Binary Spherical Quantization. arXiv:2406.07548, 2024.
- ✨ [FSQ] Mentzer, F., Minnen, D., Agustsson, E., & Tschannen, M. Finite Scalar Quantization: VQ-VAE Made Simple. arXiv:2309.15505, 2023.
- ✨ [LFQ] Yu, L., Lezama, J., Gundavarapu, N. B., Versari, L., Sohn, K., Minnen, D., Cheng, Y., Gupta, A., Gu, X., Hauptmann, A. G., Gong, B., Yang, M.-H., Essa, I., Ross, D. A., & Jiang, L. Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation. arXiv:2310.05737, 2023.
- [RQ] Lee, D., Kim, C., Kim, S., Cho, M., & Han, W.-S. Autoregressive Image Generation using Residual Quantization. CVPR, 2022.
- ✨ [VQ] van den Oord, A., Vinyals, O., & Kavukcuoglu, K. Neural Discrete Representation Learning. NeurIPS, 2017.
- Luo, Z., Shi, F., Ge, Y., Yang, Y., Wang, L., & Shan, Y. Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation. arXiv, 2024.
- Tang, A., He, T., Guo, J., Cheng, X., Song, L., & Bian, J. VidTok: A Versatile and Open-Source Video Tokenizer. arXiv, 2024.
- Wang, X., Zhang, X., Luo, Z., Sun, Q., Cui, Y., Wang, J., Zhang, F., Wang, Y., Li, Z., Yu, Q., Zhao, Y., Ao, Y., Min, X., Li, T., Wu, B., Zhao, B., Zhang, B., Wang, L., Liu, G., … Wang, Z. Emu3: Next-Token Prediction is All You Need. arXiv, 2024.
- Weber, M., Yu, L., Yu, Q., Deng, X., Shen, X., Cremers, D., & Chen, L.-C. MaskBit: Embedding-free Image Generation via Bit Tokens. TMLR, 2024.
- NVIDIA/Cosmos-Tokenizer
- hpcaitech/Open-Sora
- PKU-YuanGroup/Open-Sora-Plan
- THUDM/CogVideo
- CompVis/taming-transformers
- openai/consistencydecoder
- lucidrains/vector-quantize-pytorch
- Chan, D. M., Corona, R., Park, J., Cho, C. J., Bai, Y., & Darrell, T. Analyzing The Language of Visual Tokens. arXiv, 2024.
Give it a star 🌟 if you find this project useful.