Projects | Susim Mukul Roy

MagFormer: Parameter-Efficient Modular Approach for Person Re-Identification

Thu, 01 May 2025 00:00:00 +0000

Summary

MagFormer, a modular transformer model, improves person re-identification by integrating image-to-image interactions during training to reduce noisy or unstable representations.
It introduces three components: (1) MALAA, which approximates dense relations using magnitude-aware landmarks; (2) RNS, which sparsifies attention so that it focuses on contextually relevant samples; and (3) DiffAttn, which cancels residual noise to boost identity consistency.
MagFormer is scalable, interpretable, and consistently outperforms baselines.

A Sequential Memory Preserving Approach for Few-Shot Image Classification

Sat, 25 Nov 2023 00:00:00 +0000

Summary

We model the meta-training set as the combination of all the individual task-specific training sets instead of a multi-task setting.
We capture the cross-domain connections stored in the feature extractor using our Memory Augmented Propagation network, which stores information from the previous layers of our backbone.
We apply self-attention to the output feature map of each backbone layer in a hierarchical manner to find co-dependencies.
We test on the popular CIFAR-FS and miniImageNet datasets and find that our results are on par with, and sometimes better than, convolution-based SOTA approaches such as MetaOptNet.

Training with Continuous Sensor System Parameters and Irregular Data

Tue, 01 Aug 2023 00:00:00 +0000

Summary

Modelling noise for CT simulations (e.g. Poisson noise) by considering spatially non-uniform intensity distributions in limited-dosage environments (ULDCT).
Creating a pattern so that the intensity distribution becomes learnable.
Applying an appropriate reconstruction technique that handles artifacts and likely noise patterns and recovers the original image, scoring highly on metrics such as SSIM.

Visual Motion Analysis from Images and Videos

Tue, 01 Aug 2023 00:00:00 +0000

Summary

The main objective was to develop software that finds the optimal time to spray fungicides to prevent the spread of FHB disease among common vegetation.
Collected and annotated wheat and canola flower data using SCVAT, stored in PASCAL VOC or YOLO format.
Improved the YOLOv8 and EfficientDet (PyTorch version) algorithms on the created dataset by tuning hyperparameters for the former and incorporating an attention module (e.g. CBAM) in the latter.
Deployed the trained models on the demo website for use by farmers and others involved in FHB prevention.

Deep Video Summarization

Sat, 20 May 2023 00:00:00 +0000

Summary

Given an input video, we find the most informative slides and summarize the content of the video in natural language.
We first encode the images using the CLIP model and then pass them through a U-Net-inspired transformer encoder-decoder architecture with skip connections to score each frame.
Finally, we convert the frame-level scores to shot-level scores and use dynamic programming (0/1 knapsack) to decide which shots to pick as keyshots.
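
The keyshot selection step can be sketched as a standard 0/1 knapsack: each shot has a score and a length, and the summary has a total length budget. This is a minimal illustration only; the function name `select_keyshots`, the sample scores and the budget are assumptions, not taken from the project.

```python
def select_keyshots(scores, lengths, budget):
    """Pick shots that maximize total score with total length <= budget."""
    n = len(scores)
    # best[i][c] = best total score using the first i shots within capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip shot i-1
            if lengths[i - 1] <= c:
                take = best[i - 1][c - lengths[i - 1]] + scores[i - 1]
                if take > best[i][c]:
                    best[i][c] = take  # option 2: take shot i-1
    # Backtrack through the table to recover which shots were chosen
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


# Hypothetical shot-level scores and shot lengths (in frames)
shot_scores = [0.9, 0.2, 0.6, 0.5]
shot_lengths = [40, 30, 25, 35]
keyshots = select_keyshots(shot_scores, shot_lengths, budget=70)
```

With these sample values, shots 0 and 2 fit within the 70-frame budget and give the highest combined score.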

Deep Q Learning

Tue, 25 Apr 2023 00:00:00 +0000

Summary

The objective was to train an RL agent to play the World’s Hardest Game, which essentially means reaching a goal point among arbitrarily moving obstacles.
Created a simple MLP that maps an n-dimensional state to an m-dimensional output over the actions the RL agent can take.
Created a custom reward function that penalizes the agent for being idle or getting blocked by an obstacle and rewards it for reaching the goal.
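
A reward function of the kind described could look like the sketch below. It assumes a 2D position as the state; the function name, the thresholds, the reward magnitudes and the small per-step cost are all hypothetical and not the values used in the project.

```python
import math


def reward(prev_pos, pos, goal, hit_obstacle, goal_radius=1.0, idle_eps=1e-3):
    """Hypothetical shaped reward: reach the goal, avoid obstacles, keep moving."""
    if math.dist(pos, goal) <= goal_radius:
        return 100.0  # reached the goal
    if hit_obstacle:
        return -50.0  # collided with or got blocked by an obstacle
    if math.dist(prev_pos, pos) < idle_eps:
        return -1.0  # idle: no meaningful movement this step
    return -0.01  # assumed small step cost to favour short paths
```

Checking the goal first means a step that ends on the goal is always rewarded, even if an obstacle is touched at the same moment; that ordering is a design choice of this sketch.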

Federated Learning

Thu, 20 Apr 2023 00:00:00 +0000

Summary

- Implemented the FedAvg algorithm from scratch on three popular datasets, namely MNIST, Coloured-MNIST and SVHN.
- Compared the federated results with those of the same models trained and tested in a non-federated manner.
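
The aggregation step at the heart of FedAvg is a weighted average of client parameters, with weights proportional to each client's local dataset size. The sketch below shows only that step on flat parameter lists; the function name and the toy values are illustrative, and local training and client sampling are omitted.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for j in range(dim):
            global_weights[j] += weights[j] * size / total
    return global_weights


# Two toy clients; the second holds three times as much data as the first
new_global = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```

In a full training loop the server would send `new_global` back to the clients for the next round.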

Feedback Approach to Foster Motion Information in FPAR

Sun, 02 Apr 2023 00:00:00 +0000

Summary

Improved the existing SparNet architecture by incorporating a feedback mechanism that embeds the finer information from the later layers of ResNet into its earlier layers.
The Motion Prediction Block took the knowledge from the Action Recognition Block and fed it back into that block's first layer, taking care to match the dimensions.
Deployed the model on a webpage using the Flask framework, where a user can input a video or an image and get back a heat map indicating the location of the action.

Multimodal Art Database

Sun, 30 Oct 2022 00:00:00 +0000

Summary

- A user inputs an artistic image, and our software detects the top-k artists who could possibly have made the painting and stores them in a local MariaDB database.
- To find the similarity between the input painting and the database, we first fine-tuned the CLIP model on our art database and then computed the cosine similarity between the stored embeddings and the input feature embedding.
- The most likely images are stored in normalized SQL tables, with foreign keys linking the image ID table and the artist ID table.
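
The retrieval step can be sketched as a cosine-similarity ranking over stored embeddings. This is a simplified illustration: it assumes one embedding per artist held in a plain dictionary rather than per-image rows in MariaDB, and the names and vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_artists(query, stored, k):
    """stored maps artist name -> embedding; return the k most similar artists."""
    ranked = sorted(
        stored,
        key=lambda name: cosine_similarity(query, stored[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2D embeddings standing in for CLIP image features
stored_embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
best_two = top_k_artists([1.0, 0.1], stored_embeddings, k=2)
```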

UAV Guided UGV Movement

Sun, 20 Feb 2022 00:00:00 +0000

Summary

- An autonomous drone had to navigate a hilly environment and store its path while moving towards the goal point, after which a UGV would follow the path made by the UAV and reach the endpoint.
- Used OpenCV and segmentation techniques along with ROS to segment the road from its surroundings using depth images.
- Analyzed the drone movement using Gazebo and RViz in order to accurately find an optimal path for the UGV.
- Sent messages to the UGV using ROSBags for its movement after the UAV arrived at the goal.

International Micromouse Challenge

Tue, 30 Nov 2021 00:00:00 +0000

Summary

- The objective was to make the micromouse autonomously reach a goal point after navigating through a maze.
- Implemented the wall-following algorithm so that the micromouse keeps following a single wall using ROS.
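
The wall-following idea can be illustrated on a grid maze with the right-hand rule: at every step, try to turn right, then go straight, then turn left, then turn back. This is a simulation sketch only; the project used ROS and a real robot, and the grid encoding and function name here are assumptions.

```python
# Directions in clockwise order: up, right, down, left (row, col deltas)
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def follow_right_wall(grid, start, goal, heading=1, max_steps=1000):
    """Right-hand rule on a grid where '#' is a wall. Returns the path or None."""

    def is_free(r, c):
        return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#"

    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            return path
        # Prefer turning right, then straight, then left, then back
        for turn in (1, 0, -1, 2):
            d = (heading + turn) % 4
            nxt = (pos[0] + DIRS[d][0], pos[1] + DIRS[d][1])
            if is_free(*nxt):
                heading, pos = d, nxt
                path.append(pos)
                break
        else:
            return None  # boxed in on all four sides
    return path if pos == goal else None


maze = [
    "#####",
    "#S..#",
    "#.#.#",
    "#..G#",
    "#####",
]
route = follow_right_wall(maze, start=(1, 1), goal=(3, 3))
```

Wall following is only guaranteed to reach the goal when the goal is connected to the wall being followed, which is why the step limit is there as a safeguard.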

Playlist Converter

Fri, 20 Aug 2021 00:00:00 +0000

Summary

Created a website with VueJS and tailwindcss where a user can input an Apple playlist, and a Spotify playlist is created in their account, private or public as they choose.
Used an Apple token to get the songs from the user’s playlist, then sent GET requests to Spotify to fetch the songs and finally a POST request to create the playlist.
To pick each song on Spotify, we add the first result that appears when searching for the corresponding name from the Apple playlist.
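
The first-result matching rule can be sketched independently of either service. Here `search` is a hypothetical callable standing in for the Spotify search request (track name in, list of candidate track IDs out); no real API call is shown, and the catalogue below is made up.

```python
def match_tracks(apple_track_names, search):
    """For each Apple track name, keep the first result returned by `search`."""
    matched, missing = [], []
    for name in apple_track_names:
        results = search(name)
        if results:
            matched.append(results[0])  # first hit wins
        else:
            missing.append(name)  # nothing found; report it back to the user
    return matched, missing


# Fake catalogue used in place of a real search request
fake_catalogue = {"Song A": ["id_a1", "id_a2"], "Song B": ["id_b1"]}
matched_ids, not_found = match_tracks(
    ["Song A", "Song B", "Song C"],
    search=lambda name: fake_catalogue.get(name, []),
)
```

Taking the first hit is simple but can pick a cover or a remix when names collide; comparing artist names as well would be one way to tighten it.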

Face Recognition using Clustering Algorithms

Thu, 27 May 2021 00:00:00 +0000

Summary

- Deployed a model on the web where a user can input an image and get back the images from a database that look most similar to their face.
- Using ResNet features, with PCA for dimensionality reduction, together with clustering algorithms, found the faces most similar to the user's.

Real Time Face Mask Detection

Tue, 20 Apr 2021 00:00:00 +0000

Summary

Trained VGG-19 on a publicly available Kaggle dataset of masked/non-masked faces and obtained an accuracy of 99.8%.
Used Haar cascades to detect the face in a real-time video and then applied our model to check whether the user is wearing a mask.

Autonomous Drone Obstacle Avoidance

Tue, 30 Mar 2021 00:00:00 +0000

Summary

- An autonomous drone had to navigate varying environments, starting with take-off, then detecting an ArUco marker and landing on it.
- Worked on detection of the ArUco marker and on the drone's take-off and landing using ArduPilot.
- Worked on converting point cloud data to laser scans for swift movement of the drone.
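
Converting a point cloud to a laser scan usually means keeping a thin horizontal slice of the cloud and recording the nearest point in each angular bin. The sketch below illustrates that idea in plain Python; it is not the ROS implementation used in the project, and the function name, parameter names and defaults are assumptions.

```python
import math


def pointcloud_to_laserscan(points, num_bins=360, min_height=-0.1,
                            max_height=0.1, max_range=10.0):
    """Collapse 3D points (x, y, z) into a 2D scan: nearest range per angular bin."""
    ranges = [math.inf] * num_bins
    bin_width = 2 * math.pi / num_bins
    for x, y, z in points:
        if not (min_height <= z <= max_height):
            continue  # keep only a horizontal slice of the cloud
        r = math.hypot(x, y)
        if r == 0.0 or r > max_range:
            continue  # ignore the sensor origin and far-away points
        angle = math.atan2(y, x) % (2 * math.pi)  # map to [0, 2*pi)
        idx = min(int(angle / bin_width), num_bins - 1)
        ranges[idx] = min(ranges[idx], r)
    return ranges


# Toy cloud: two points ahead, one to the upper left, one above the slice
cloud = [(1.0, 0.1, 0.0), (2.0, 0.1, 0.0), (-1.0, 1.0, 0.0), (1.0, 1.0, 5.0)]
scan = pointcloud_to_laserscan(cloud, num_bins=4)
```

Bins with no point stay at infinity, which a planner can treat as free space up to the sensor's maximum range.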

Pokemon Dashboard

Wed, 27 Apr 2016 00:00:00 +0000

Summary

A single-page dashboard that shows the features of different Pokémon and compares them with each other through multiple interactive and informative visualizations.
Used interactive graphics such as bubble plots, radar plots and parallel graph plots to show distinctions between Pokémon.