Ali Asgarov

I'm a first-year Ph.D. student in the Department of Computer Science at Virginia Tech, Blacksburg, VA, United States. I work in computer vision and natural language processing, with a focus on multimodal learning and vision-language models. Before joining Virginia Tech, I completed an M.Sc. in Computer Science at George Washington University.

At Virginia Tech, I am working under the guidance of Dr. Chris Thomas on advancing video-language understanding with cross-modal reasoning. In the past, I have been fortunate to collaborate with Dr. Rebecca Hwa at George Washington University and Dr. Samir Rustamov on cross-modal information retrieval across image, text, video, and audio.

I am associated with the Sanghani Center for Artificial Intelligence and Data Analytics.

Email / CV / Scholar / Twitter / GitHub / LinkedIn


Research


My research interests include:

  • Reasoning in vision-language models.
  • Cross-modal retrieval across images, text, video, and audio.
  • Structured information extraction from multimodal data.
  • Knowledge representation for multimodal reasoning.

News


Dec. '24

One paper is under review for CVPR 2025.

Oct. '24

Our paper ENTER was accepted as a spotlight paper at the Multimodal Algorithmic Reasoning Workshop, NeurIPS 2024.

Aug. '24

Started Ph.D. program in Computer Science at Virginia Tech.

Jul. '24

One paper is under review for COLING 2025.

Apr. '24

Received Ph.D. offer from Virginia Tech in Computer Science.

Dec. '23

Graduated from George Washington University with a Master's degree in Computer Science.

Publications


ENTER: Event Based Interpretable Reasoning for VideoQA.
Hammad Ayyubi, Junzhang Liu, Ali Asgarov, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Chia-Wei Tang, Zhecan James Wang, Hani Alomari, Md. Atabuzzaman, Xudong Lin, Naveen Reddy, Shih-Fu Chang, Chris Thomas

We introduce ENTER, an interpretable system for video question answering that uses event-graph representations to integrate visual data with transparent, structured reasoning, achieving high performance and enhanced explainability on complex, long-range questions.

This paper was accepted as a spotlight paper at the Multimodal Algorithmic Reasoning Workshop, NeurIPS 2024.

Please see my Google Scholar page for an up-to-date list.

Education


Virginia Tech
08.2024 - Present
Ph.D. in Computer Science
GPA: 4.0 / 4.0
Advisor: Dr. Chris Thomas
George Washington University
08.2022 - 12.2023
M.Sc. in Computer Science
GPA: 3.76 / 4.00
Advisors: Dr. Rebecca Hwa & Dr. Samir Rustamov

Honors & Awards


  • 2022 - State Program on Education of Azerbaijani Youth Abroad Scholarship
  • 2022 - Three-time 1st Place, National AI Competition in ML Applications (2020-2022)
  • 2020 - Ranked among the top 15 teams at the III World Robot Olympiad in Gyor, Hungary
  • 2019 - Gold Medal at the III World Robot Olympiad in Azerbaijan
  • 2018 - Presidential Scholarship for Exceptional Academic Performance
  • 2017 - Scored 690 out of 700 on the University Entrance Exam
  • 2017 - Graduated from high school with a Gold Medal, placing in the top 0.1% of 90,000 graduating students

Teaching


  • CS3114: Data Structures & Algorithms, Spring 2025, Virginia Tech
  • CS5644: Machine Learning with Big Data, Fall 2024, Virginia Tech
