Eagle SPUR Award winner.
Computer Vision and Multimodal AI Researcher
Ashim Dahal
I'm an undergraduate researcher at The University of Southern Mississippi, working with Dr. Nick Rahimi in the Cyber Innovations Lab on computer vision, 3D/4D vision, and multimodal systems. I have also collaborated with Bikramjit Banerjee and Rabab Abdelfattah.
I like building systems that make visual reasoning cheaper, clearer, and easier to inspect: video QA, Gaussian Splatting, segmentation, image-caption evaluation, and model analysis.
Outside research, I play bansuri, read literature, organize campus developer events, and occasionally write essays fast enough to win trouble.
Recent Dispatches
News
Selected updates in life and research, newest first.
Presenting one workshop paper at CVPR.
Funded for a Google I/O event.
Won Best Paper on Computational Approach at the DCUR Undergraduate Symposium.
US Semi-Finalist, Hult Prize.
Finalist, International Researcher of the Year, USM.
Awarded a $5,500 DCUR summer research grant for Gaussian Splatting.
Drafted objectives for a $51,000 NASA EPSCoR-funded project led by Dr. Rahimi.
Became Lead Organizer of Google Developer Groups (GDG) On Campus at USM.
Became Research Liaison for the School of Computing Sciences and Computer Engineering Student Ambassadors.
Received a $500 Checkpoint award to build an XR application for dyslexia.
Won Best Local Project and Global Nomination at NASA Space Apps Challenge.
Selected Publications
Research
Selected work, ordered newest first.
POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency
Making video question answering feasible on large-context scenes with minimal input tokens.
Adaptive Anchor Policies for Efficient 4D Gaussian Streaming
We introduce an RL-based policy that selects anchor budgets and informative Gaussians for efficient 4D Gaussian streaming, improving the quality-runtime trade-off.
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
A short paper on how CLIP representations shift under different augmentation strengths and types.
Multi-Lingual Cyber Threat Detection in Tweets/X Using ML, DL, and LLM: A Comparative Analysis
This paper proposes a new multilingual dataset and methodology for detecting cyber threats spread through posts on X.
Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery
A comparison of ViTs and CNNs on iSAID segmentation, including a loss function that helps a smaller CNN compete with a much larger ViT.
Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation
A robust framework to evaluate image-text pairs under perceptual, semantic, pragmatic, and distributional alignment.
EEG-to-Text Translation: A Model for Deciphering Human Brain Activity
We propose a new model, R1 Translator, which aims to improve the performance of EEG-to-text decoding.
Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification
A study of convolutional Kolmogorov-Arnold networks across ImageNet, MNIST, and tabular classification settings.
Analysis of Zero Day Attack Detection Using MLP and XAI
Analyzing zero-day cyber attacks with an MLP and SHAP, using a weighted loss to keep the model from overfitting to the majority class.
Predicting Handwritten Devanagari Characters using modified-LeNet Model Architecture
We fine-tuned a LeNet-style CNN to perform OCR on handwritten Devanagari characters, which are more structurally complex than Roman letters.
Would you own a ROBOT?: A detailed research on public response to the nooks and crannies of owning a robot.
A survey of 300+ individuals about privacy, labor, mainstream adoption, and the features people expect before robots become household companions.
Effectiveness of Native Language for Conversational Bots
We created Jelly, the first Romanized Nepali chatbot using BlenderBot, and studied the efficacy of native language in mental health conversational systems.
Do you "Go big or go home" with Neural Networks?
Using a self-curated chicken recipe dataset, I trained a GRU-based generator and studied how different preprocessing choices changed the generated recipes.