Prateek Pani - Portfolio

Machine Learning Developer

Alumnus @IIIT-Hyderabad, Machine Learning Lab.

Skilled in Multi-modal AI Systems, GenAI, Computer Vision, Language Models and Model Optimization, with a strong commitment to applying engineering best practices in both research and industry environments.

Research focus: Domain Adaptation and Batch Active Learning for Computer Vision problems.

I started my career as a backend developer at Qualcomm before transitioning to a career focused on Deep Learning.

Interned at Nykaa in winter 2023-24, where I worked on multi-modal AI, utilizing text-image pairs from Nykaa's catalog to enhance the Shop-The-Look recommendations using CLIP and SegFormer.

Developed a novel, device-agnostic image classifier for childhood retinal disorders, based on Unsupervised Domain Adaptation, with diagnostic accuracy comparable to that of experienced clinicians. This work was recognized with multiple national and international awards and was funded by the UK Medical Research Council's Confidence in Concept award.

Master's thesis: "Application of Domain Adaptation and Active Learning in Ophthalmology and Quality Testing of Food Grains," with real-world implementations in medical labs and agricultural settings.

Feel free to reach out about AI-related projects.

Paligemma-MultiModal-System

Contrastive Vision Model
Gemma Language Model
Linear Projector

PaliGemma Multi-Modal System: a comprehensive from-scratch implementation of a multi-modal (vision-language) architecture.
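The Linear Projector listed above is what joins the vision model to the language model. The following is a minimal sketch, assuming toy dimensions and hand-picked weights (`linear` and `build_prefix` are hypothetical helper names, not taken from the project): each vision-encoder patch embedding is mapped into the language model's embedding space, and the resulting image tokens are placed ahead of the text token embeddings.

```python
# Hypothetical sketch of a PaliGemma-style prefix: a linear projector maps each
# vision-encoder patch embedding into the language model's embedding space,
# and the projected image tokens are placed before the text token embeddings.
# Dimensions and weights are toy values chosen for illustration only.


def linear(x, weight, bias):
    """Apply y = W x + b to one vector x (weight has shape out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_prefix(image_patches, text_embeddings, weight, bias):
    """Project every image patch, then prepend the results to the text embeddings."""
    image_tokens = [linear(p, weight, bias) for p in image_patches]
    return image_tokens + text_embeddings


# Toy example: 2 image patches of dim 3, projected to a text embedding dim of 2.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
b = [0.5, -0.5]
patches = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
text = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

sequence = build_prefix(patches, text, W, b)
print(len(sequence), sequence[0])  # 5 [4.5, 1.5]
```

The combined sequence (image tokens first, then text) is what the language model would consume; a real implementation would batch this with tensors rather than Python lists.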

Sensor Fusion and 3D Object Detection

Familiarity with Point Cloud data and 3D Object Detection, including Sensor Fusion Techniques.


Quantisation, Compression and Knowledge Distillation on Semantic Segmentation

Real-time deployment solutions that optimise the model's inference time and size.
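Quantisation is one of the main levers for reducing model size and inference cost. As a hedged illustration (a generic technique, not the project's own code), here is symmetric per-tensor int8 quantisation: a single scale maps the largest weight magnitude to 127, values are rounded to integers, and dequantisation multiplies back by the scale. The helper names are hypothetical.

```python
# Hypothetical sketch of symmetric per-tensor int8 quantisation.


def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)  # [50, -127, 0, 100]
# Rounding to the nearest step keeps each error within half a step.
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2)
```

Storing `q` as 8-bit integers plus one float scale cuts weight memory roughly fourfold versus float32; per-channel scales are a common refinement when weight ranges differ across channels.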


Semantic Segmentation (coding models from scratch): SegFormer

Transformer-based models for vision.

Visualising Attention Maps
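Visualising an attention map amounts to taking one query's attention weights over all image patches and reshaping them back into the patch grid. A minimal sketch, assuming plain scaled dot-product attention on a toy 2x2 grid (it omits SegFormer's sequence-reduction step, and `attention_map` is a hypothetical helper name):

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(query, keys, height, width):
    """Attention weights of one query over all patch keys, reshaped to a 2D grid."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Patches are assumed to be stored in row-major order.
    return [weights[r * width:(r + 1) * width] for r in range(height)]


# Toy 2x2 patch grid with 2-dimensional keys; the query aligns best with the last patch.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [2.0, 2.0]]
grid = attention_map([1.0, 1.0], keys, height=2, width=2)
for row in grid:
    print([round(w, 3) for w in row])
```

In practice the grid is upsampled to the input resolution and overlaid on the image as a heatmap.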

Implementing and documenting the components of DeepSeek

Multi-Latent Attention
Multi-token Prediction
Mixture-Of-Experts
Rotary Positional Embeddings (RoPE)
Quantisation

DeepSeek Components
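Of the DeepSeek components listed above, rotary positional encodings (RoPE) are compact enough to sketch in a few lines. This is a generic, minimal illustration of the standard RoPE formulation (adjacent dimension pairs, base 10000), assumed rather than taken from the project; DeepSeek's Multi-Latent Attention combines RoPE with its compressed keys in a more involved way that this sketch does not model.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each (even, odd) dimension pair by a position-dependent angle.

    Assumes len(vec) is even. Pair j uses frequency base ** (-2j / d).
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.6, 0.9]
# The query-key score depends only on the relative offset (3 in both cases).
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 13), rope(k, 10))
print(abs(s1 - s2) < 1e-9)
```

Because each pair is rotated rather than shifted, vector norms are preserved and attention scores encode relative rather than absolute position.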

Custom GPU kernels in Triton, with benchmarking

Custom softmax with shared memory
Fast Matrix Multiplication from scratch
Dropout kernel with random mask generation
LayerNorm implementation and fusion
Efficient attention kernel with block-wise processing
End-to-end fused cross-entropy (CE) loss with memory profiling

GPU Kernel Programming with Triton
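The block-wise attention kernel listed above depends on computing a softmax without holding an entire row at once. The sketch below is a plain-Python reference of the standard online-softmax recurrence used by FlashAttention-style kernels; it is not Triton code and not the project's kernel. It keeps a running maximum and a rescaled running sum per block, and the result is checked against an ordinary softmax.

```python
import math


def online_softmax(scores, block_size):
    """Softmax computed block by block with a running max and running sum."""
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the previous sum to the new maximum, then add this block's terms.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(s - new_max) for s in block
        )
        running_max = new_max
    # Final pass: normalise every score with the accumulated statistics.
    return [math.exp(s - running_max) / running_sum for s in scores]


def reference_softmax(scores):
    """Ordinary two-pass softmax for comparison."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


row = [2.0, -1.0, 0.5, 3.0, 10.0, -4.0, 7.5]
blocked = online_softmax(row, block_size=3)
print(max(abs(a - b) for a, b in zip(blocked, reference_softmax(row))) < 1e-12)
```

In a GPU kernel each block would be a tile loaded into on-chip memory, and the same rescaling is applied to the partial attention output so the full score matrix never has to be materialised.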

Distributed LLM Fine-Tuning

DeepSpeed
Distributed Data Parallel (DDP)
Fully Sharded Data Parallel (FSDP)
LoRA (Low-Rank Adaptation)
Quantization

Fine-tuning large language models often exceeds the compute and VRAM of a single GPU. This project addresses these challenges by demonstrating various distributed training paradigms and VRAM optimization strategies.
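LoRA, one of the techniques listed above, keeps the pretrained weight frozen and learns only a low-rank update. A minimal sketch of the forward pass under the standard formulation y = Wx + (alpha / r) * B(Ax), with toy matrices chosen for illustration (the helper names are hypothetical); zero-initialising B means training starts from exactly the base model's behaviour.

```python
# Hypothetical sketch of a LoRA-adapted linear layer on plain Python lists.


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r = 4, 3, 2
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen base weight
A = [[0.01 * (i - j) for j in range(d_in)] for i in range(r)]     # r x d_in (toy values; usually random)
B = [[0.0] * r for _ in range(d_out)]                              # d_out x r, zero-initialised
x = [1.0, 2.0, 3.0, 4.0]

# With B at zero, the adapted layer starts identical to the base layer.
print(lora_forward(x, W, A, B, alpha=16, r=r) == matvec(W, x))

# Trainable parameters for one hypothetical 4096 x 4096 projection at rank 8.
full_params = 4096 * 4096
lora_params = 8 * (4096 + 4096)
print(full_params, lora_params)  # 16777216 65536
```

The parameter count shows why LoRA pairs well with DDP, FSDP and quantisation: optimizer state is only kept for the small A and B matrices.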

Fine-tuned an open-source LLM using GRPO with reward models tailored for math reasoning and structured answer generation.

Match Format Exactly
Match Format Approx
Check Answers
Check Numbers

Qwen3-4B GRPO Fine-Tuning for Math Reasoning
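GRPO scores several sampled completions for the same prompt and normalises each reward against its own group, so no separate value model is needed. The sketch below is a hedged illustration, not the project's code: `match_format_exactly` and `check_numbers` are hypothetical re-implementations inspired by the reward names listed above, and the `<reasoning>`/`<answer>` tags, the reward values (3.0 and 1.5) and the use of the population standard deviation are all assumptions (implementations differ on that last point).

```python
import re
import statistics


def match_format_exactly(completion):
    """Hypothetical reward: 3.0 if the completion follows the exact tag layout."""
    pattern = r"^<reasoning>.+?</reasoning>\s*<answer>.+?</answer>$"
    return 3.0 if re.match(pattern, completion.strip(), flags=re.DOTALL) else 0.0


def check_numbers(completion, target):
    """Hypothetical reward: 1.5 if the number inside <answer> equals the target."""
    m = re.search(r"<answer>\s*(-?\d+(?:\.\d+)?)\s*</answer>", completion)
    return 1.5 if m and float(m.group(1)) == float(target) else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise each reward by its group's mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Three sampled completions for the prompt "What is 6 * 7?".
completions = [
    "<reasoning>6 * 7 = 42</reasoning>\n<answer>42</answer>",
    "<reasoning>guess</reasoning>\n<answer>41</answer>",
    "The answer is 42",
]
rewards = [match_format_exactly(c) + check_numbers(c, 42) for c in completions]
print(rewards)  # [4.5, 3.0, 0.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Completions scoring above the group average receive positive advantages and are reinforced; the policy update then weights each completion's token log-probabilities by its advantage.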