zhousheng97/README.md

Hi there 👋

  • I’m Sheng.
  • My focus is multimodal learning, especially VQA, and I’m currently exploring multimodal LLMs.
  • 💬 As an ENFJ-A, I thrive on meaningful collaboration and communication.
  • 📫 You can reach me at [email protected]. Let’s connect!

Pinned

  1. EgoTextVQA (Public)

    [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

    Python · 47 stars · 1 fork

  2. ViTXT-GQA (Public)

    [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering

    Python · 17 stars

  3. GPIN (Public)

    [ACM TOMM'24] Graph Pooling Inference Network for Text-based VQA

    Python · 3 stars

  4. SSGN (Public)

    [IEEE TIP'23] Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA

    Python · 4 stars