microDINOv3

A from-scratch implementation of the DINOv3 self-supervised Vision Transformer training pipeline, written in pure Python with no external ML framework dependencies.

Built with: Pure Python, Computer Vision

GitHub: RyanKim17920

  • Implements the complete student-teacher setup described in the original DINOv3 paper, with the teacher updated as an exponential moving average (EMA) of the student
  • Supports the multi-crop augmentation strategy and the centering mechanism
  • Zero external ML framework dependencies: the entire training system is written in pure Python
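To make the mechanisms listed above more concrete, here is a minimal pure-Python sketch of an EMA teacher update, teacher-output centering, and a multi-crop cross-entropy loss. It is not taken from the repository. The function names (ema_update, update_center, dino_loss, multicrop_loss) are hypothetical, parameters and outputs are assumed to be flat lists of floats, and the temperatures and momenta are assumed DINO-style defaults. Centering here follows the simple EMA-of-batch-mean scheme from the original DINO recipe. That is an assumption: the project's own centering mechanism may differ. Only loss values are computed; gradients are not shown.

```python
import math


def ema_update(teacher, student, momentum=0.996):
    # Hypothetical helper: teacher <- m * teacher + (1 - m) * student, per parameter.
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def update_center(center, teacher_batch, momentum=0.9):
    # Hypothetical helper: EMA of the per-dimension batch mean of teacher outputs
    # (DINO-style centering, assumed here).
    n = len(teacher_batch)
    batch_mean = [sum(row[j] for row in teacher_batch) / n for j in range(len(center))]
    return [momentum * c + (1.0 - momentum) * b for c, b in zip(center, batch_mean)]


def log_softmax(logits, temperature):
    # Numerically stable log-softmax with temperature scaling.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    # Teacher output is centered, then sharpened with a low temperature;
    # the loss is the cross-entropy H(p_teacher, p_student).
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = [math.exp(v) for v in log_softmax(centered, t_teacher)]
    log_p_student = log_softmax(student_logits, t_student)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def multicrop_loss(student_outs, teacher_outs, center):
    # Assumed layout: teacher_outs holds the global crops only; student_outs holds
    # the same global crops first, followed by the local crops.
    # Pairs where student and teacher saw the same view are skipped.
    total, pairs = 0.0, 0
    for ti, t_out in enumerate(teacher_outs):
        for si, s_out in enumerate(student_outs):
            if si == ti:
                continue
            total += dino_loss(s_out, t_out, center)
            pairs += 1
    return total / pairs
```

In a training loop built this way, the student would be updated by gradient descent on multicrop_loss, and then ema_update and update_center would each be applied once per step; the teacher never receives gradients directly.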

February – April 2026