SHI Collaboration Profiles

Profile pages for Sustainable Horizons Institute SRP 2025-2026 Students and Faculty


Sourajit Saha

he/him/his

University of Maryland, Baltimore County

Computer Science and Electrical Engineering

Biography

I am a Ph.D. student in Computer Science at the University of Maryland, Baltimore County (UMBC), where I work under the guidance of Dr. Tejas Gokhale in the Cognitive Vision Group. I received my Master's in Computer Science from UMBC and my Bachelor of Science from BRAC University in Bangladesh. My research focuses on advancing the capabilities of computer vision systems by addressing challenges in interactive video retrieval, visual reasoning, and video understanding. In interactive video retrieval, search, and understanding, I explore methods that integrate vision-language models (VLMs), scene-graph reasoning, and dialogue-driven interaction while lessening the burden of human annotation. My goal is to make video retrieval more interactive and semantically aligned with human intent. My work on visual reasoning investigates spatial understanding, counterfactual inference, and visual editing, with the broader aim of improving the interpretability and adaptability of AI models. This line of research seeks to push models beyond pattern recognition toward deeper semantic understanding. At the Summer Camp for Applied Language Exploration (SCALE) 2024 at Johns Hopkins University, I also worked on semantic video frame sampling and caption-based video event localization to enhance event retrieval in multilingual video.

Academic Information

Status: PhD Student

Year in Program: 5th

Major/Specialty: Computer Science (Computer Vision, Machine Learning)

Degrees:
2021–Ongoing: PhD, Computer Science, University of Maryland, Baltimore County. Advisor: Tejas Gokhale.
2021–2023: Master of Science, Computer Science, University of Maryland, Baltimore County. Academic Supervisors: Tim Oates, David Chapman.
2013–2017: Bachelor of Science, Computer Science, BRAC University. Advisor: Suraiya Tairin.

Research Areas

Computer Science; Machine Learning/AI; other

Research Interests

My research lies at the intersection of computer vision, multimodal reasoning, and video understanding, with a focus on developing interactive and human-centered video retrieval systems. As a Ph.D. student at UMBC in the Cognitive Vision Group under Dr. Tejas Gokhale, I explore methods that combine vision-language models, scene-graph reasoning, and dialogue-driven interaction to reduce annotation burden and make retrieval more semantically aligned with human intent. In interactive video retrieval, I am interested in designing systems that allow users to refine search through natural, context-aware interaction. In visual reasoning, my work investigates spatial understanding, counterfactual inference, and visual editing, with the broader goal of pushing models beyond pattern recognition toward deeper semantic understanding. These directions aim to improve both the interpretability and adaptability of AI systems. At Summer Camp for Applied Language Exploration (SCALE) 2024, I contributed to semantic video frame sampling and caption-based event localization to enhance event retrieval in multilingual videos. This experience sharpened my focus on bridging retrieval and reasoning for diverse, real-world applications. Moving forward, my dissertation will expand on these foundations, with an emphasis on interactive, scalable, and explainable systems. Ultimately, I aim to advance multimodal AI that is interpretable, inclusive, and impactful across disciplines.

Topical Areas

Applied Computer Science; Artificial Intelligence and Intelligent Systems; Computer Science; Electrical, Electronic, and Information Engineering; Informatics, Analytics and Information Science; Visualization and Human-Computer Systems

Relevant Coursework

Computer Vision, Image Processing, Data Visualization, Natural Language Processing, Machine Learning, Pattern Recognition, Artificial Intelligence, Advanced Artificial Intelligence, Optimization Algorithms, Design and Analysis of Algorithms, Advanced Computer Architecture

Publications & Research Projects

Saha, Shaswati, Sourajit Saha, Manas Gaur, and Tejas Gokhale. "Side Effects of Erasing Concepts from Diffusion Models." arXiv preprint arXiv:2508.15124 (2025), EMNLP 2025 Findings.
Saha, Sourajit, and Tejas Gokhale. "Improving shift invariance in convolutional neural networks with translation invariant polyphase sampling." In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 620-629. IEEE, 2025.
Saha, Sourajit, Shaswati Saha, Md Osman Gani, Tim Oates, and David Chapman. "RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget (Student Abstract)." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 13, pp. 16314-16315. 2023.
Saha, Sourajit, and Yaacov Yesha. "Pairwise meta learning pipeline: classifying COVID-19 abnormalities on chest radiographs." In SPIE Medical Imaging 2022: Computer-Aided Diagnosis, Proceedings Volume PC12033, PC1203302. 2022.
Kamran, Sharif Amit, Sourajit Saha, Ali Shihab Sabbir, and Alireza Tavakkoli. "Optic-net: A novel convolutional neural network for diagnosis of retinal diseases from optical tomography images." In 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 964-971. IEEE, 2019.

Faculty Mentor

Tejas Gokhale