GSoC_2026
Example use of computer vision:
Parent of this page · Last year's idea page · IDEAS LIST below
| Contributor | Title | Mentors | Notes |
|---|---|---|---|
| Date (2026) | Description | Comment |
|---|---|---|
| Jan 19 | Organization Applications Open | 👍 |
| Feb 3 | Org Application Deadline | 👍 |
| Feb 19 | List of Accepted Mentoring Organizations Published | |
| Mar 16 | Contributor Proposals Open | |
| Mar 31 | Contributor Proposal Deadline | |
| Apr 21 | Contributor Ranking Deadline | |
| Apr 30 | Accepted Projects Announced | |
| May 1-24 | Bonding | |
| May 25 | Coding Starts | |
| Jul 6-10 | Midterm Evals | |
| Aug 24-31 | Final Evals | |
| Nov 9 | Extended Mentor Evals | |
GSoC Timeline · UTC time · UTC time converter
- GSoC Home Page
- GSoC OpenCV Project Page Feb 18
- Project Discussion List · Mentor List
- OpenCV Home Site · OpenCV Wiki · OpenCV Forum Q&A · Developer meeting notes
- How to do a pull request/How to Contribute Code
- Source Code: GitHub/opencv and GitHub/opencv_contrib
- IRC Channel: #opencv Libera.chat, Slack: https://open-cv.slack.com
- TBD: OpenCV GSoC Dashboard
You are allowed to use the AI coding tool of your choice. You must submit the results as pull requests and must be able to answer questions and fix issues that reviewers raise on the pull request. You are expected to respond to feedback or fixes within a day of any issue being posted on your submission.
- End-to-End AI Feature Extraction and LightGlue Matching Pipeline
- Advanced Vision Language Model (VLM) Inference via ONNX Runtime GenAI
- OpenCV Documentation Redesign and Modernization (PyTorch Style)
- Comprehensive Validation and Sample Generation for Modern DNN Models
- Efficient Image Augmentation Module for OpenCV
- Lightweight 3D Viewer + PLY/Cloud I/O for SLAM Debugging
- Automated Code Hygiene & Security Hardening for OpenCV
- Non-CPU Hardware Abstraction Layer (HAL) plugins for OpenCV 5
- ONNX QDQ Inference Support in OpenCV DNN
- Efficient large and multi-channel images support
- Chessboard Charuco board and circles grids detectors tuning
- AI feature detectors
- LightGlue Matcher
- Basic SLAM
- Multi-camera calibration part 4
- Multi-camera calibration test part 2
- Multi-camera calibration toolbox
- Quantized models for OpenCV Model Zoo
- LLM Tokenizers part 2
- Better VLM support in OpenCV
- Dynamic CUDA support in DNN
- RISC-V Optimizations
- Computational photography algorithms for better image quality
- Integrate JuMarker ArUco into OpenCV
- QR/Barcode/ArUco detector
- Synchronized multi-camera video recorder
- Improve OpenCV security
- Bubbaloop simulation integration and improvements (MuJoCo)
- Bubbaloop real-robot integration
- GPU acceleration for kornia-tensor and kornia-imgproc
- Expand kornia-vlm with ONNX and TensorRT backends
- kornia-SLAM baseline Visual-Inertial Odometry VIO
All work is in C++ unless otherwise noted.
Ideas: back to Index
-
- Description: Modern computer vision relies heavily on deep learning-based feature detection, description, and matching. This project aims to build a complete, state-of-the-art neural feature matching pipeline natively inside OpenCV. The first phase involves incorporating modern, highly efficient AI feature detectors and descriptors (such as ALIKED, XFeat, and the open version of SuperPoint) into the OpenCV Model Zoo. This requires converting them into ONNX format so they can run efficiently via OpenCV's DNN module. Once the extractors are integrated, the second phase involves building a matching framework using LightGlue—a state-of-the-art deep matcher. The goal is to connect the newly integrated models (specifically ALIKED, but extensible to the others) with LightGlue to achieve subpixel-accurate feature matching between image pairs directly within OpenCV's ecosystem.
-
Expected Outcomes:
- Model Porting: Successfully export ALIKED, XFeat, and open SuperPoint to ONNX formats and integrate them into the OpenCV Model Zoo.
- LightGlue Integration: Develop a LightGlue matching module that ingests the features/descriptors generated by the DNN module (e.g., ALIKED) to perform highly accurate feature matching.
- Subpixel Accuracy: Ensure the matching pipeline supports subpixel-accurate coordinate predictions.
- Tests & Documentation: Write comprehensive unit tests and user documentation for both the AI feature detectors and the LightGlue matcher.
- Demo & Showcase: Create a polished C++/Python demo application and a YouTube video demonstrating the end-to-end pipeline (from image input to matched keypoint visualization) working efficiently.
- Skills Required: Strong C++ + Python, Deep Learning/ONNX model conversion, familiarity with the OpenCV DNN module, and computer vision (feature detection/matching).
- Possible Mentor: Gary Bradski, Abhishek Gola, Gursimar Singh
- Difficulty: Hard 350 hrs
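As a baseline for what a LightGlue module would improve on, the classical matching step can be sketched as mutual-nearest-neighbour filtering over the descriptors the DNN module would produce. This is a minimal NumPy sketch; `mutual_nn_match` is a hypothetical helper, not an existing OpenCV API:

```python
import numpy as np

def mutual_nn_match(desc_a, desc_b):
    """Match two sets of L2-normalized descriptors by mutual nearest neighbour."""
    sim = desc_a @ desc_b.T                  # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)               # best b for each a
    nn_ba = sim.argmax(axis=0)               # best a for each b
    idx_a = np.arange(desc_a.shape[0])
    keep = nn_ba[nn_ab] == idx_a             # keep only mutual agreements
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)
```

LightGlue replaces this hard argmax with a learned, context-aware assignment, which is where the accuracy gain over brute-force matching comes from.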
-
- Description: As Vision Language Models (VLMs) like PaliGemma, LLaVA, and Florence-2 become standard tools in computer vision, running them efficiently requires specialized inference techniques that go beyond standard forward passes. Autoregressive text generation requires complex state management and text processing. This project aims to drastically improve OpenCV's VLM capabilities by integrating the onnxruntime-genai library. The goal is to build a high-performance backend that natively supports essential generative features: robust text tokenization (handling modern formats like BPE and WordPiece directly from tokenizer.json), KV-cache management (retaining key and value states to exponentially speed up token-by-token generation), and LoRA (Low-Rank Adaptation) adapters (allowing users to hot-swap fine-tuned behaviors without reloading massive base models into memory).
-
Expected Outcomes:
- onnxruntime-genai Integration: Create a bridge between OpenCV and the ONNX Runtime GenAI C++ API to handle the generation loop for multimodal models.
- Demo & Validation: Create an end-to-end C++ and Python tutorial demonstrating a VLM answering questions about an image.
- Skills Required: Mastery of C++ and Python, deep understanding of LLM/Transformer architectures (specifically autoregressive generation), familiarity with ONNX, and memory management optimization.
- Possible Mentor: Vadim Pisarevsky, Abhishek Gola
- Difficulty: Medium 175 hrs
- Assigned Student: Ayush Kumar (Bigvision)
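To illustrate why KV-cache management matters for autoregressive generation, here is a toy sketch of the decode loop: key/value projections for past tokens are appended to a cache, so each step only computes projections for the single new token. All names (`attend`, `generate`, `step_fn`) are illustrative and not part of the onnxruntime-genai API:

```python
import numpy as np

def attend(q, K, V):
    # single-head scaled dot-product attention over the cached keys/values
    w = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(w - w.max())
    w /= w.sum()
    return w @ V

def generate(step_fn, first_token, n_steps, d=8):
    """Toy greedy loop: step_fn maps a token id to (q, k, v) projections."""
    K = np.zeros((0, d)); V = np.zeros((0, d))
    tok = first_token
    out = [tok]
    for _ in range(n_steps):
        q, k, v = step_fn(tok)        # projections for the new token only
        K = np.vstack([K, k])         # append to cache instead of recomputing
        V = np.vstack([V, v])
        ctx = attend(q, K, V)
        tok = int(ctx.argmax())       # stand-in for logits -> greedy pick
        out.append(tok)
    return out, K.shape[0]
```

Without the cache, step t would recompute projections for all t previous tokens, which is what makes naive generation quadratic.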
-
- Description: As computer vision and AI ecosystems evolve, developer expectations for documentation have shifted toward the clean, highly readable, and interactive standards set by frameworks like PyTorch and Hugging Face. OpenCV’s current documentation, heavily reliant on traditional Doxygen HTML outputs, contains a wealth of information but can be difficult to navigate, especially for Python users and newcomers. This project aims to completely redesign and modernize the OpenCV documentation pipeline. This overhaul will unify the C++ and Python API references, present tutorials in a clean notebook/gallery format, and dramatically improve searchability and mobile responsiveness.
- Possible Mentor: Shiqi Yu
- Difficulty: Medium 175 hrs
-
- Description: The OpenCV DNN module is undergoing rapid evolution, adding support for new operations, and precision formats (like ONNX QDQ and INT8). To guarantee the robustness of the new OpenCV DNN engine, we need to systematically validate it against today's most popular and complex neural networks (e.g., YOLOv8/v9/v10, RT-DETR, Segment Anything (SAM), modern pose estimation, and facial recognition models). This project focuses on building out a robust testing pipeline to verify accuracy and performance parity with native frameworks (like PyTorch or ONNXRuntime). More importantly, the contributor will identify unsupported layers or bugs, create reproducible bug reports and fix them, and build clean, well-documented C++ and Python sample applications that demonstrate how to easily run these modern networks using pure OpenCV.
- Skills Required: Strong proficiency in Python and C++, deep understanding of neural network architectures and ONNX graph structures, experience with software testing methodologies, and an eye for writing clean, exemplary code.
- Mentor: Abhishek Gola, Vadim Pisarevsky
- Assigned Student: Naresh Raja (Bigvision)
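A parity check against a reference framework typically reduces to comparing output tensors within a tolerance. A minimal sketch (the helper name and report format are made up for illustration):

```python
import numpy as np

def parity_report(ref, out, atol=1e-4):
    """Compare a DNN-module output against a reference framework's output."""
    ref, out = np.asarray(ref, np.float64), np.asarray(out, np.float64)
    assert ref.shape == out.shape, f"shape mismatch {ref.shape} vs {out.shape}"
    diff = np.abs(ref - out)
    return {"max_abs": float(diff.max()),
            "mean_abs": float(diff.mean()),
            "pass": bool(diff.max() <= atol)}
```

In practice the tolerance must be chosen per model and precision mode (FP32 vs FP16 vs INT8), which is itself part of the validation work.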
-
- Description: OpenCV’s own roadmap discussions explicitly call out image augmentation as a key modern use-case of imgproc, and point to an existing draft augmentation framework in opencv_contrib https://github.com/opencv/opencv/issues/25012 as a starting point. That draft is the (still-open) GSoC’22 PR https://github.com/opencv/opencv_contrib/pull/3335, which already implements several augmentation ops and reports performance comparisons vs torchvision.transforms, including support for tasks that require transforming labels (e.g., detection/segmentation). This project would turn the draft into a production-grade contrib module (and optionally add a couple of missing low-level kernels in core/imgproc if needed for speed/batching), with a stable API, tests, and reproducible benchmarks.
-
Expected Outcomes:
- New/updated contrib module (e.g., modules/aug) providing common geometric + photometric transforms, composition/policies, deterministic RNG seeding
- Task-aware transforms (image + boxes/masks) for detection/segmentation
- Batch-friendly API (aligning with OpenCV 5.x direction for batch processing where possible) https://github.com/opencv/opencv/issues/25012
- Python bindings + samples (classification + detection-style pipelines)
- Benchmark scripts comparing throughput/latency vs common Python stacks (e.g., torchvision), plus quality sanity checks (pixel-level invariants / IoU preservation for boxes, etc.)
- If required: targeted core/imgproc patches for missing primitives or performance-critical batching hooks (not refactors—new capability).
- Skills Required: Strong C++ + Python, SIMD/parallelization basics, image processing, dataset annotation handling (bboxes/masks), benchmarking & test design.
- Mentor: Gursimar Singh, Alexander Smorkalov, Abhishek Gola
- Difficulty: Medium–Hard 350 hrs
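For a flavor of what "task-aware" means above: a geometric transform must update labels alongside pixels. A minimal NumPy sketch of a horizontal flip that also remaps [x1, y1, x2, y2] boxes (the function name is hypothetical, not the draft module's API):

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Horizontally flip an HxWxC image and its [x1, y1, x2, y2] boxes."""
    h, w = img.shape[:2]
    flipped = img[:, ::-1].copy()
    boxes = np.asarray(boxes, np.float32).copy()
    x1 = boxes[:, 0].copy()
    boxes[:, 0] = w - boxes[:, 2]   # new x1 = W - old x2
    boxes[:, 2] = w - x1            # new x2 = W - old x1
    return flipped, boxes
```

The production module would generalize this pattern (one image op, one matching label op) across rotations, crops, and mask targets.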
-
- Description: OpenCV already has the contrib viz module (VTK-based) with cloud/mesh I/O (PLY/XYZ/OBJ, etc.) https://docs.opencv.org/4.x/d1/d19/group__viz.html. Separately, OpenCV introduced a lightweight OpenGL-based 3D viewer inside highgui (cv::viz3d) specifically to avoid the heavy VTK dependency, and it supports rendering point clouds, triangle meshes, and camera trajectories. Meanwhile, the SLAM work-in-progress PR adds a compact SLAM module with a simple visualizer/top-down trajectory and trajectory saving https://github.com/opencv/opencv_contrib/pull/4043. This project would connect these pieces by adding practical SLAM-oriented 3D visualization + lightweight file I/O on top of viz3d, so users can inspect maps/trajectories without VTK and with minimal friction. Additional reference: https://github.com/opencv/opencv/pull/20371
-
Expected Outcomes:
- Stream-in camera poses/keyframes and draw frustums + trajectory
- Stream-in map points with coloring (age/track count/depth) and simple filtering (subsample, voxel grid)
- Export “current map” to PLY (and optionally load a PLY to view) using a minimal dependency implementation
- A reference opencv_contrib/modules/slam sample app: run VO/SLAM → live 3D viewer + saved trajectory/map artifacts
- Tests for I/O correctness (PLY read/write roundtrip; numeric tolerances), and performance notes for large point clouds
- Documentation that clarifies when to use viz (VTK) vs viz3d (lightweight)
- Skills Required: C++ (OpenGL-ish familiarity helpful), 3D math (poses/frames), point cloud basics, file formats (PLY), test/benchmark discipline.
- Mentor: Abhishek Gola, Gary Bradski
- Difficulty: Medium 175 hrs
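The "minimal dependency" PLY export mentioned above can be as small as a plain ASCII writer. A pure-Python sketch (not the module's actual API; binary PLY and normals would come later):

```python
def write_ply_ascii(path, points, colors=None):
    """Minimal dependency-free ASCII PLY point-cloud writer.
    points: iterable of (x, y, z); colors: optional matching (r, g, b) bytes."""
    points = list(points)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        if colors is not None:
            f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for i, (x, y, z) in enumerate(points):
            row = f"{x} {y} {z}"
            if colors is not None:
                r, g, b = colors[i]
                row += f" {r} {g} {b}"
            f.write(row + "\n")
```

The roundtrip test in the expected outcomes would pair this with a reader and check numeric tolerances on large clouds.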
-
- Description: Add mandatory PR gating for new static-analysis and memory/UB regressions by integrating Clang-Tidy and sanitizer builds into GitHub Actions. The key challenge is “legacy noise”: implement a baseline mechanism that reports only new issues (e.g., diff-based linting + targeted suppressions), while enabling Address/UB sanitization runs without being derailed by third-party code.
-
Expected Outcomes:
- A repo-root .clang-tidy tuned for OpenCV (focus on bugprone-*, performance-*, modernize-*)
- CMake toggles and local tooling support
- CI “Linter” job that flags clang-tidy regressions on changed lines
- CI “Sanitizer Build” job using AddressSanitizer + UndefinedBehaviorSanitizer
- A short developer guide for local usage and suppressions
- Skills Required: C++, CMake, Python, CI/YAML, LLVM/Clang tooling, familiarity with GoogleTest.
- Mentor: Alexander Smorkalov, Gursimar Singh
- Difficulty: Medium 175 hrs
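A hypothetical starting point for the repo-root .clang-tidy described above; the exact check list and suppressions would be tuned with maintainers against the legacy-noise baseline:

```yaml
# Sketch only; not an agreed-upon OpenCV configuration.
Checks: >
  -*,
  bugprone-*,
  performance-*,
  modernize-use-nullptr,
  modernize-use-override
WarningsAsErrors: 'bugprone-*'
HeaderFilterRegex: '^(modules|include)/.*'
```

Keeping `WarningsAsErrors` to a narrow subset is what lets the CI job gate only new regressions instead of failing on the whole codebase.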
-
- Description: OpenCV 5 plans a non-CPU HAL to stop the “backend explosion” by extending UMat and adding a general dispatcher that can offload whole ops (e.g., gemm, resize) to a backend if available. The design is tracked in https://github.com/opencv/opencv/issues/25025.
-
Expected Outcomes:
- Draft non-CPU HAL ABI + headers
- Core infrastructure for device-aware UMat allocation and backend selection
- Dispatcher integration for ≥3 ops (gemm, resize, one filter)
- Reference backend plugin with tests
- Developer documentation
- Skills Required: Advanced C++, ABI design, dynamic linking, CMake, performance engineering, accelerator runtimes.
- Mentor: Alexander Smorkalov, Vadim Pisarevsky, Gursimar Singh, Abhishek Gola
- Difficulty: Medium–Hard 350 hrs
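The dispatcher idea can be sketched in a few lines: backends register the ops they can offload, and the dispatcher falls back to the CPU path when none match. This is a toy Python sketch of the design, not the proposed C++ ABI:

```python
# Registry of backends; each exposes the ops it can offload.
_backends = []

def register_backend(name, ops):
    """ops maps op name -> callable; a real HAL plugin would also negotiate
    device memory and an ABI version here."""
    _backends.append((name, ops))

def dispatch(op, *args):
    for name, ops in _backends:
        fn = ops.get(op)
        if fn is not None:
            return name, fn(*args)       # first capable backend wins
    return "cpu", _cpu_fallback[op](*args)

# Built-in CPU reference path, always available.
_cpu_fallback = {"scale": lambda x, s: [v * s for v in x]}
```

The real design (issue 25025) additionally has to handle device-aware UMat allocation and data movement, which this sketch elides.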
-
- Description: Many quantized ONNX models today are exported in tensor-oriented QDQ form using QuantizeLinear / DequantizeLinear nodes. OpenCV DNN has partial INT8 support, but QDQ graphs are not yet consistently importable or runnable end-to-end. This project adds native QDQ-model inference to OpenCV DNN 5.x.
-
Expected Outcomes:
- Importer support for QuantizeLinear / DequantizeLinear across opsets
- QDQ execution path that avoids unnecessary FP32 fallbacks
- INT8 performance enablement using OpenCV Universal Intrinsics
- Tests using QDQ ONNX models and reproducible benchmarks
- Skills Required: Strong C++, ONNX graphs, quantization fundamentals, SIMD basics, Python tooling.
- Mentor: Vadim Pisarevsky, Abhishek Gola, Gursimar Singh
- Difficulty: Medium–Hard 175 hrs
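The node semantics the importer must honor are simple to state. A NumPy sketch of the ONNX QuantizeLinear/DequantizeLinear definitions for the per-tensor uint8 case (ONNX rounds half to even, which `np.rint` matches):

```python
import numpy as np

def quantize_linear(x, scale, zero_point):
    """ONNX QuantizeLinear: saturate(round(x / scale) + zero_point) to uint8."""
    q = np.rint(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize_linear(q, scale, zero_point):
    """ONNX DequantizeLinear: (q - zero_point) * scale."""
    return (q.astype(np.int32) - zero_point) * scale
```

The execution-path work is then about fusing Q/DQ pairs around supported ops so the graph runs in INT8 instead of bouncing through FP32 at every node.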
-
- Description: Current OpenCV accuracy and performance tests usually cover images up to Full HD and, in rare cases, 4K. We'd like to extend accuracy and performance test coverage with 8K cases (and maybe beyond) and tune function performance for large-scale inputs. There is also growing demand for efficient multi-channel image support (more than 4 channels) and other options associated with large images, e.g. large-kernel filtering.
-
Expected Outcomes:
- Add performance and accuracy tests for FullHD images.
- Add performance and accuracy tests for 4K images.
- Add performance and accuracy tests for 8K images.
- Tune function performance for large-scale inputs
- Add efficient >4 channel image support
- Skills Required: Strong C++, SIMD basics, Python tooling.
- Mentor: Alexander Smorkalov
- Difficulty: Medium–Hard 175 hrs
-
- Description: Camera calibration quality depends heavily on calibration pattern detection. The goal of the project is to assess the current boards' detection quality with a synthetic benchmark and fix the gaps found. An additional useful outcome of the project is a set of recommendations for practical calibration.
-
Expected Outcomes:
- Create simulations of various calibration boards and camera types (pinhole, wide-angle, 180° fisheye, various distortion types)
- Assess the performance of these boards for different camera types and produce a chart of boards vs. camera types with results
- Create a tutorial documenting recommendations on what to use in different situations
- Skills Required: Python tooling, knowledge of calibration, ability to use 3D simulators.
- Mentor: Jean-Yves Bouguet
- Difficulty: Medium 175 hrs
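The synthetic benchmark needs a camera simulator; at its core that is pinhole projection plus a distortion model. A minimal NumPy sketch with two radial terms (a small subset of OpenCV's full distortion vector, and a hypothetical helper rather than an OpenCV call):

```python
import numpy as np

def project_points(pts_3d, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project Nx3 camera-frame points with a 2-term radial distortion model."""
    x = pts_3d[:, 0] / pts_3d[:, 2]
    y = pts_3d[:, 1] / pts_3d[:, 2]
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2     # radial distortion factor
    u = fx * x * d + cx
    v = fy * y * d + cy
    return np.stack([u, v], axis=1)
```

Rendering boards through this model at varying poses, noise levels, and distortion strengths is what lets the benchmark score each detector fairly.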
-
- Description: We'd like to incorporate modern computer vision feature detectors/descriptors into the OpenCV Model Zoo. This means getting them into an ONNX form that OpenCV's DNN module can run. We'd like to have ALIKED, XFeat, DeDoDe v2, and SuperPoint (open version, for comparison) in the OpenCV Model Zoo.
-
Expected Outcomes:
- Get the models in the above priority order into OpenCV's model zoo
- Create examples of how to use each one using a live camera feed or a video
- Demonstrate your above examples in YouTube tutorials
-
Resources:
- Original SuperPoint GitHub — CANNOT USE / NON-COMMERCIAL LICENSE
- XFeat feature detector GitHub — the fast descriptor, better than SuperPoint
- DeDoDe feature detector GitHub — the most accurate feature detector
- OpenCV Model Zoo
- Skills Required: Python, OpenCV, use of DNN library and experience in dealing with model structure for converting the model formats to ONNX
- Mentor: Gary Bradski and Gholamreza Amayeh
- Difficulty:
- Duration: 350 hrs because 3 models and testing
-
- Description: Add the LightGlue feature matcher to OpenCV (joining the BFMatcher and the FLANN matcher) so that, together with the above-mentioned deep learning-based features, we will have a much better image alignment pipeline.
-
Expected Outcomes:
- Add LightGlue as a new feature matcher
- Create an example of subpixel accurate feature matching between pairs of images
- Create test code, documentation and a video of it working
-
Resources:
- LightGlue Code that works with ALIKED https://github.com/cvg/LightGlue
- LightGlue Paper
- Skills Required: Python, Computer vision AI model training, pytorch
- Mentor: Gary Bradski, Gursimar Singh
- Difficulty: Medium
- Duration: 175 hrs
-
- Description: Using feature detectors/descriptors such as ALIKED with the LightGlue matcher, create a SLAM framework for OpenCV (with help from an expert mentor)
-
Expected Outcomes:
- Collect video sequences to be tracked, e.g. with LightGlue+ALIKED above, or use public SLAM datasets
- Develop a SLAM algorithm
- Include unit test code and data
- Create a demo code and video of how to use it
- Resources:
- Skills Required: Python, Ceres, understanding of SLAM
- Mentor: Reza Amayeh
- Difficulty: Hard
- Duration: 350 hrs
-
- Description: During GSoC 2022-2025 a new cool multi-camera calibration algorithm was created and gradually improved: https://github.com/opencv/opencv/pull/24052. This year we would like to continue this work with more test cases, tune the accuracy, and continue building a high-level, user-friendly tool (based on the script from the tutorial) to perform multi-camera calibration. If this is completed before the program is over, then we'll move on to leveraging the IMU or marker-free calibration.
-
Expected Outcomes:
- A series of patches with more unit tests and bug fixes for the multi-camera calibration algorithm
- New/improved documentation on how to calibrate cameras
- A short YouTube video showing off how to use the calibration routines
- Skills Required: Mastery of C++ and Python, mathematical knowledge of camera calibration, ability to code up mathematical models
- Difficulty: Medium-Difficult
- Possible Mentors: Maksym Ivashechkin, Alexander Smorkalov
- Duration: 175 hours
-
- Description: We are looking for a student to curate best-in-class calibration data, collect calibration data with various OpenCV fiducials, and graphically produce calibration board and camera model data via scripts. Simultaneously, begin writing comprehensive test scripts for all the existing calibration functions. While doing this, improve the calibration documentation where necessary. From this, derive the expected accuracy of fiducial types for various camera types.
-
Expected Outcomes:
- Curate camera calibration data from public datasets.
- Collect calibration data for various fiducials and camera types.
- Graphically create camera calibration data with ready-to-go scripts
- Write test functions for the OpenCV Calibration pipeline
- New/improved documentation on how to calibrate cameras as needed.
- Statistical analysis of the performance (accuracy and variance) of OpenCV fiducials, algorithms and camera types.
- A YouTube video describing and demonstrating the OpenCV calibration tests.
- Resources: OpenCV Fiducial Markers, OpenCV Calibration Functions, OpenCV Camera Calibration Tutorial 1, OpenCV Camera Calibration Tutorial 2
- Skills Required: Mastery of C++ and Python, mathematical knowledge of camera calibration, ability to code up mathematical models
- Difficulty: Medium
- Possible Mentors: Jean-Yves Bouguet, Alexander Smorkalov
- Duration: 175 hours
-
- Description: Build a higher-level, user-friendly tool (based on the script from the calibration tutorial) to perform multi-camera calibration. This should allow easy multi-camera calibration with multiple ChArUco patterns and possibly other calibration fiducial patterns. The tool will use Monte Carlo sampling to determine parameter stability, allow easy switching of camera models, and output the camera calibration parameters, the fiducial patterns' poses in space, and the extrinsic location of each camera relative to the others.
-
Expected Outcomes:
- A tool with a convenient API that is more or less comparable to and compatible with the Kalibr tool (https://github.com/ethz-asl/kalibr)
- New/improved documentation on how to calibrate cameras
- A YouTube video demonstrating how to use the toolbox
- Skills Required: Python, mathematical knowledge of camera calibration, ability to code up mathematical models
- Difficulty: Medium-Difficult
- Possible Mentors: Jean-Yves Bouguet, Gary Bradski
- Duration: 175 hours
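The Monte Carlo stability idea above can be sketched generically: perturb the observations with measurement-level noise, re-run the estimator, and report the per-parameter spread. A toy NumPy sketch (the helper is illustrative and is exercised below with a line fit rather than a full camera model):

```python
import numpy as np

def monte_carlo_std(estimate_fn, observations, noise_sigma, n_trials=200, seed=0):
    """Re-run an estimator on noise-perturbed observations and report the
    per-parameter standard deviation as a stability measure."""
    rng = np.random.default_rng(seed)
    params = [estimate_fn(observations + rng.normal(0, noise_sigma, observations.shape))
              for _ in range(n_trials)]
    return np.std(np.stack(params), axis=0)
```

For the calibration toolbox, `estimate_fn` would be the full multi-camera calibration run and the reported spread would flag unstable intrinsics or extrinsics.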
-
- Description: Many modern CPUs, GPUs and specialized NPUs include special instructions and hardware blocks for accelerated inference, especially INT8 inference. INT8 models are not just ~4x smaller than the FP32 originals; inference speed also increases significantly (by 2x-4x or more). The number of quantized models steadily increases, and we would like to quantize more computer vision models. In some tough cases it could be a partial quantization where, for example, just the model backbone is quantized. We would be interested in adding INT8 models for object detection, optical flow, pose estimation, text detection and recognition, etc. to our model zoo (https://github.com/opencv/opencv_zoo).
-
Expected Outcomes:
- Series of patches to OpenCV Zoo and maybe to OpenCV DNN (when OpenCV DNN misses 8-bit flavors of certain operations) to add the corresponding models.
- If quantization is performed by student during the project, we will request the corresponding scripts to perform the quantization
- Benchmark results to prove the quality of the quantized models along with the corresponding scripts so that we can reproduce it.
- Skills Required: very good ML engineering skills, good Python programming skills, familiarity with model quantization algorithms and model quality assessment approaches
- Possible Mentors: Wu Jia, Vadim Pisarevsky
- Difficulty: Medium
- Duration: 175 hours, depending on the particular model.
-
-
Description: Last year we started adding tokenizers to the OpenCV DNN module (#27534). Unfortunately, that work is still unfinished (as of Feb 2026), but it will likely be finalized and merged soon. In any case, the patch only supports the GPT-2 tokenizer. We are interested in expanding this functionality to support many more kinds of tokenizers. The tokenizer class should be able to read tokenizer.json for popular HF models and tokenize an input text file based on this description. What we are interested in:
- kinds of tokenization other than BPE: WordPiece, Unigram.
- customizable normalization and regex-based pre-tokenization
- optional post-processing after tokenization (https://hexdocs.pm/tokenizers/Tokenizers.PostProcessor.html)
- performance testing and optimizations
- greatly extended set of tests and samples to showcase and test the expanded functionality.
- Expected Outcomes: A set of patches to implement the above-mentioned functionality.
- Resources:
- Skills Required: C++, Python, LLMs/Transformers
- Mentor: Gursimar Singh, Vadim Pisarevsky
- Difficulty: Hard
- Duration: 350 hrs
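As an example of the non-BPE schemes mentioned above, WordPiece is a greedy longest-match-first algorithm. A minimal pure-Python sketch (toy vocabulary; real tokenizer.json files add normalization and pre-tokenization on top):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece, as used by BERT-style models.
    Non-initial subwords carry the '##' continuation prefix."""
    tokens, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                cur = piece               # longest matching piece found
                break
            end -= 1
        if cur is None:
            return [unk]                  # no piece matches: unknown word
        tokens.append(cur)
        start = end
    return tokens
```

Unigram tokenization is different in kind (it maximizes a probabilistic score over segmentations rather than greedily matching), which is why the class design needs to abstract over both.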
-
-
Description: Vision Language Models (VLMs) are now considered a silver bullet (well, considering their weight, rather a platinum bullet) that solves many computer vision problems without coding, just by providing a proper query along with a picture of interest to the VLM. We already have partial support for the Paligemma model (#27988, #27238, #27449, ...). We are now (Feb 2026) adding support for KV-cache. We would be interested in continuing to improve support for such models in OpenCV. In particular, we are interested in:
- adding support for tokenizers to handle Paligemma and similar VLM models (see the idea above as well).
- accelerating attention and MLP parts of transformer blocks by using reduced accuracy (bf16, int8, ..., int4 for weights maybe?) and tensor blocks.
- supporting more variations of attention.
- Expected Outcomes: A set of patches to implement all or some of the above-mentioned functionality, together with tests.
- Resources:
- Skills Required: C++, LLMs/Transformers
- Mentor: Gursimar Singh, Vadim Pisarevsky
- Difficulty: Hard
- Duration: 350 hrs
-
- Description: The OpenCV DNN module includes several backends for efficient inference on various platforms. Some of the backends are heavy and bring in a lot of dependencies, so it makes sense to make them dynamic. Recently we did this with the OpenVINO backend: https://github.com/opencv/opencv/pull/21745. The goal of this project is to make the CUDA backend of OpenCV DNN dynamic as well. Once it's implemented, we can have a single set of OpenCV binaries and then add the necessary plugin (also in binary form) to accelerate inference on NVIDIA GPUs without recompiling OpenCV.
-
Expected Outcomes:
- A series of patches for the dnn (and maybe core) module to build the OpenCV DNN CUDA plugin as a separate binary that can be used by OpenCV DNN. OpenCV itself should not have any dependency on the CUDA SDK or runtime; the plugin should encapsulate it. In this case it is fine if user-supplied tensors (cv::Mat) are automatically uploaded to GPU memory by the engine (cv::dnn::Net) before inference and the output tensors are downloaded from GPU memory afterwards.
- Resources:
- Skills Required: Mastery of and experience coding in C++; good practical experience with CUDA. Familiarity with deep learning is desirable but not necessary, since the project is mostly about software engineering rather than ML algorithms or their optimization.
- Possible Mentors: Alexander Smorkalov
- Difficulty: Hard
- Duration: 350 hours
-
- Description: RISC-V is one of the main target platforms for OpenCV. Over the past several years we brought in RISC-V optimizations based on the RISC-V Vector extension by adding another backend to OpenCV's scalable universal intrinsics. We refactored a lot of code in OpenCV to make the vectorized loops compatible with the RISC-V backend and more or less efficient. Still, we see a lot of gaps, and the performance of certain functions can be further improved. For some critical functions, like convolution in deep learning, it may make sense to implement custom loops using native RVV intrinsics instead of OpenCV's scalable universal intrinsics. This is what we invite you to do.
-
Expected Outcomes:
- A series of patches for core, imgproc, video and dnn modules to bring improved loops that use OpenCV scalable universal intrinsics or native RVV intrinsics to improve the performance. In the first case the optimizations should not degrade performance on other major platforms like x86-64 or ARMv8 with NEON.
- Resources:
- Skills Required: Mastery of and experience coding in C++; strong skills in optimizing code using SIMD.
- Possible Mentors: Mingjie Xing, Maxim Shabunin
- Difficulty: Hard
- Duration: 350 hours
-
-
Description: Improving image quality is an important task which is still not well covered in OpenCV. We already have "non-local means" (photo module) and BM3D (opencv_contrib) denoising algorithms, simple white balance algorithms (opencv_contrib), a very simple exposure correction function (equalizeHist in imgproc, grayscale images only), and a function for distortion correction (undistort in imgproc); that's it. The following could be useful to have:
- more efficient/better-quality denoising algorithms
- vignetting correction
- chromatic aberration correction
- smarter white balance algorithms
- exposure correction for color images
- superresolution for still images and video
- deblurring
- color enhancement, defogging
- etc.
Note that:
- this idea is not about special effects, 'beautification', etc.; it is about improving pure technical image quality
- the idea is quite big; applicants may, and probably should, propose implementing a subset of the above items, and can also add something on top (as long as the first note above is taken into account).
-
Expected Outcomes:
- Several patches to opencv_photo module and/or opencv_contrib repo that add the new functionality, tests, samples etc.
-
Resources:
-
Skills Required: C++, Python
-
Mentor(s): Gursimar Singh, Vadim Pisarevsky.
-
Difficulty: Hard
-
Duration: 175 hours
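As an example of the scope, one of the simplest "smarter white balance" baselines is the gray-world algorithm. A NumPy sketch (assumes an 8-bit HxWxC image; a contribution would go well beyond this):

```python
import numpy as np

def gray_world_wb(img):
    """Gray-world white balance: scale each channel so its mean matches
    the global mean (assumes the scene averages to neutral gray)."""
    img = img.astype(np.float64)
    means = img.reshape(-1, img.shape[-1]).mean(axis=0)   # per-channel means
    gains = means.mean() / means                          # equalizing gains
    return np.clip(img * gains, 0, 255).astype(np.uint8)
```

The gray-world assumption fails on scenes dominated by one color, which is exactly the kind of gap a "smarter" learning-based or illuminant-estimation method would close.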
-
-
- Description: Fiducial markers such as QR codes, ArUco, and AprilTag have become very popular tools for labeling and camera positioning. They are robust and easy to detect, even on devices with low computing power. However, their industrial appearance deters their use in scenarios where an attractive, visually appealing look is required. In these cases it would be preferable to use customized markers showing, for instance, a company logo. This work proposes a novel method to design, detect, and track customizable fiducial markers. It allows creating marker templates with few restrictions on their design, e.g., a company logo or a picture can be used. The designer indicates positions in the template where bits will encode a unique identifier for each marker. The method then automatically creates a dictionary of markers, all following the same design but each with a unique identifier.
-
Expected Outcomes:
- Integrate JuMarker ArUco into OpenCV with a simple API, similar to the ArUco API currently in OpenCV.
- Detailed documentation for the JuMarker ArUco API in OpenCV
- A nice demo showing how to create a JuMarker and detect it.
- Resources: JuMarker ArUco
- Skills Required: C++, Python.
- Mentor: Rafael Muñoz Salinas, Shiqi Yu
- Difficulty: Hard
- Duration: 350 hours
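The dictionary-generation step described above can be sketched as stamping id bits into designer-chosen template cells while screening candidates by Hamming distance. A toy NumPy sketch (not the JuMarker implementation; real systems also account for marker rotations and detectability constraints):

```python
import numpy as np

def make_dictionary(template, bit_positions, n_markers, min_hamming=2):
    """Stamp binary ids into designer-chosen cells of a template image.
    Ids are kept only if they stay min_hamming bits apart from all prior ids."""
    n_bits = len(bit_positions)
    chosen, markers = [], []
    for cand in range(2 ** n_bits):
        bits = [(cand >> i) & 1 for i in range(n_bits)]
        if all(sum(b != c for b, c in zip(bits, prev)) >= min_hamming
               for prev in chosen):
            chosen.append(bits)
            m = template.copy()
            for (r, c), b in zip(bit_positions, bits):
                m[r, c] = 255 * b        # encode one bit per chosen cell
            markers.append(m)
        if len(markers) == n_markers:
            break
    return chosen, markers
```

The minimum Hamming distance is what gives the dictionary its error-detection margin at decode time.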
-
- Description: QR codes, barcodes and ArUco markers are all popular codes in computer vision applications. OpenCV now supports all of them and can detect and decode them, but it could still use a better detector and decoder. If possible, one efficient deep detector for all kinds of codes would simplify usage; if a single efficient deep detector cannot be achieved, several deep models are also acceptable.
-
Expected Outcomes:
- Train one deep detector for QR codes, barcodes, and ArUco markers, or train three separate deep detectors, one per code type.
- The trained model(s) should be easy to integrate with the current QR/barcode/ArUco algorithms in OpenCV.
- A nice demo showing how to use the algorithm.
- A detailed report demonstrating whether the trained detector(s) outperform the current solution in OpenCV.
- Resources:
- Skills Required: C++, Python, and experience on object detection.
- Mentor: Shiqi Yu
- Difficulty: Hard
- Duration: 350 hours
-
- Description: Multi-camera calibration and multi-view scenarios require synchronous recording from multiple cameras. The project is to tune cv::VideoCapture and/or cv::VideoWriter and implement a sample for timestamped video recording from several cameras.
-
Expected Outcomes:
- Synchronized video recording sample for several cameras: V4L2, possibly RTSP
- Resources: Overview
- Skills Required: C++
- Possible Mentors: Alexander S.
- Difficulty: Easy-Medium
- Duration: 175 hours
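The core of the synchronization problem is pairing frames from independent streams by timestamp. Below is a minimal sketch of such a pairing helper; `sync_pairs` is a hypothetical name, not an OpenCV API, and real capture code would obtain per-frame timestamps via `cap.get(cv2.CAP_PROP_POS_MSEC)` after calling `cap.grab()` on each camera:

```python
def sync_pairs(ts_a, ts_b, tol_ms=10.0):
    """Pair frame timestamps from two cameras when they differ by <= tol_ms.

    ts_a, ts_b: sorted lists of per-frame timestamps in milliseconds.
    Returns a list of (index_a, index_b) pairs; unmatched frames are dropped.
    """
    pairs, j = [], 0
    for i, ta in enumerate(ts_a):
        # Advance j while the next timestamp in ts_b is closer to ta.
        while j + 1 < len(ts_b) and abs(ts_b[j + 1] - ta) < abs(ts_b[j] - ta):
            j += 1
        if j < len(ts_b) and abs(ts_b[j] - ta) <= tol_ms:
            pairs.append((i, j))
            j += 1  # each frame of camera B is used at most once
    return pairs

# Camera B started ~3 ms late and dropped its third frame.
print(sync_pairs([0, 33, 66, 100], [3, 36, 103]))
```

The grab-all-then-retrieve-all pattern minimizes the skew between cameras; the pairing step then discards frames (like the dropped one above) that have no close-enough partner.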
-
- Description: OpenCV can be better integrated with https://oss-fuzz.com/ to get better fuzzing, mostly by moving the tests to the OpenCV repository and writing more tests (especially for imgcodecs and videoio). Additionally, sandboxing could be integrated into OpenCV to make sure invalid inputs are safely discarded by integrating sandbox2.
-
Expected Outcomes:
- Several patches that improve the oss-fuzz integration.
- Several patches that add an optional build of OpenCV with sandbox2 integrated for some functions.
-
Resources:
- sandbox2 documentation https://developers.google.com/code-sandboxing/sandbox2
- fuzztest documentation https://github.com/google/fuzztest
- ongoing PR for fuzztest integration: https://github.com/opencv/opencv/pull/24193
- Skills Required: C++
- Mentor: Vincent Rabaud
- Difficulty: Hard
- Duration: 350 hours
-
- Description: Improve the Bubbaloop simulation stack by integrating or enhancing a MuJoCo-based workflow. Focus on stable simulation bindings, reproducible training/evaluation loops, and better tooling for dataset generation and benchmarking.
-
Expected Outcomes:
- MuJoCo integration with example tasks
- Simulation-focused training/evaluation scripts
- Documentation + demo video
- Resources:
- Skills Required: Experience with robotics simulation (specifically MuJoCo), reinforcement learning (RL), and Python/Rust integration.
- Mentor: Edgar Riba, Jian Shi, Christie Purackal
- Difficulty: Medium–Hard
- Duration: 350 hours
-
- Description: Connect the Bubbaloop learning framework to real robot hardware. Focus on hardware abstraction, safety/recovery procedures, and a minimal real-world demo that mirrors a simulation task.
-
Expected Outcomes:
- Hardware integration layer with a small example robot
- Real-world demo aligned with a simulation scenario
- Documentation + setup guide
- Resources:
- Skills Required: Hands-on experience with robotics hardware, hardware abstraction layers, Rust, and safety-critical system design.
- Mentor: Edgar Riba, Jian Shi, Christie Purackal, Miquel Farré
- Difficulty: Medium–Hard
- Duration: 350 hours
-
- Description: Implement GPU kernels for core tensor ops and image processing transforms in Rust. Focus on frequently used operations (resize, warp_affine/warp_perspective, color transforms, distance transform) and ensure API parity with existing CPU implementations. Use CubeCL (multi-platform GPU compute in Rust) and/or native CUDA interop as appropriate.
-
Expected Outcomes:
- Minimal GPU backend for kornia-tensor ops used by kornia-imgproc
- GPU implementations for a handful of high-impact transforms
- Benchmarks vs CPU and basic docs/examples
- Resources:
- Skills Required: Mastery of Rust, experience with GPU programming (CUDA, WGSL, or CubeCL), and knowledge of image processing kernels.
- Mentor: Edgar Riba, Jian Shi, Christie Purackal
- Difficulty: Hard
- Duration: 350 hours
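To make "API parity with existing CPU implementations" concrete, here is the nearest-neighbor semantics of warp_affine that any GPU kernel must reproduce, sketched in NumPy for illustration (kornia-rs's actual function signatures may differ):

```python
import numpy as np

def warp_affine_nearest(img, m, out_h, out_w, fill=0):
    """Inverse-map each output pixel through the 2x3 affine matrix m.

    m maps source -> destination; we invert it so every output pixel
    samples exactly one source pixel (nearest-neighbor).
    """
    inv = np.linalg.inv(np.vstack([m, [0.0, 0.0, 1.0]]))[:2]
    out = np.full((out_h, out_w), fill, dtype=img.dtype)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    src = inv @ np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    ok = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out[ys.ravel()[ok], xs.ravel()[ok]] = img[sy[ok], sx[ok]]
    return out

# Identity transform must return the input unchanged.
img = np.arange(12, dtype=np.uint8).reshape(3, 4)
ident = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(np.array_equal(warp_affine_nearest(img, ident, 3, 4), img))
```

The inverse-mapping structure is also why these transforms parallelize so well on GPUs: every output pixel is computed independently, so one thread per output pixel is a natural kernel layout.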
-
- Description: Bring more vision-language models from kornia (PyTorch) into kornia-rs and deliver optimized inference via ONNX Runtime with TensorRT where available. Focus on model portability and clear runtime selection (CPU/CUDA/TensorRT), with an emphasis on embedded targets (e.g., Jetson).
-
Expected Outcomes:
- 1–2 VLM models integrated into kornia-vlm
- ONNX Runtime backend + optional TensorRT path
- Benchmarks and a small demo
- Resources:
- Skills Required: Proficiency in Rust and Python, experience with ONNX Runtime or TensorRT, and understanding of transformer-based vision models.
- Mentor: Edgar Riba, Miquel Farré
- Difficulty: Medium–Hard
- Duration: 350 hours
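Runtime selection in ONNX Runtime boils down to an ordered execution-provider list; the real provider names are "TensorrtExecutionProvider", "CUDAExecutionProvider", and "CPUExecutionProvider". A minimal sketch of the selection logic (the `pick_providers` helper is hypothetical, shown here as pure Python so it is testable without ONNX Runtime installed):

```python
# Real ONNX Runtime execution-provider names, in preference order.
PREFERENCE = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
              "CPUExecutionProvider"]

def pick_providers(available, prefer=PREFERENCE):
    """Return the preferred providers that are actually available,
    always keeping CPU as a final fallback."""
    chosen = [p for p in prefer if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# On a Jetson with TensorRT available, a session would be built roughly as:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx",
#       providers=pick_providers(ort.get_available_providers()))
print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

Keeping the CPU provider as the tail of the list is what makes the same binary run on a desktop, a CUDA box, or a Jetson without per-target branches.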
-
- Description: Formalize and extend our kornia-SLAM crate within kornia-rs and implement a baseline visual-inertial odometry pipeline. Extend the crate to support stereo VIO, add GPU support, optimize existing kernels, and extend functionality toward LiDAR SLAM techniques. Emphasis on a modular API that can later host learning-based modules.
-
Expected Outcomes:
- kornia-slam crate integrated into workspace
- Baseline VIO pipeline + dataset evaluation
- Benchmarks for CPU/GPU performance and accuracy
- Documentation + example CLI
- Resources:
- Skills Required: Strong background in 3D computer vision, experience with SLAM/VIO algorithms, Rust, and C++.
- Mentor: Christie Purackal, Dmytro Mishkin, Edgar Riba
- Difficulty: Hard
- Duration: 350 hours
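For the "dataset evaluation" deliverable, a standard accuracy metric is the absolute trajectory error (ATE) between estimated and ground-truth positions. A minimal NumPy sketch, assuming the two trajectories are already time-aligned (a full evaluation would first align them with an SE(3)/Umeyama fit):

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE of per-pose position error between two aligned Nx3 trajectories."""
    err = np.linalg.norm(np.asarray(est, float) - np.asarray(gt, float), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

gt = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
est = [[0, 0.1, 0], [1, 0.1, 0], [2, 0.1, 0]]  # constant 0.1 m lateral drift
print(round(ate_rmse(est, gt), 3))
```

Reporting ATE (and relative pose error) on public VIO datasets is what makes the CPU/GPU benchmarks comparable with other SLAM systems.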
1. #### _IDEA:_ Your title here
* ***Description:*** 3-7 sentences describing the task
* ***Expected Outcomes:***
* < Short bullet list describing what is to be accomplished >
* <i.e. create a new module called "bla bla">
* < Has method to accomplish X >
* <...>
* ***Resources:***
* [For example a paper citation](https://arxiv.org/pdf/1802.08091.pdf)
* [For example an existing feature request](https://github.com/opencv/opencv/issues/11013)
* [Possibly an existing related module](https://github.com/opencv/opencv_contrib/tree/master/modules/optflow) that includes some new optical flow algorithms.
* ***Skills Required:*** < for example mastery plus experience coding in C++, college course work in vision that covers optical flow, python. Best if you have also worked with deep neural networks. >
* ***Mentor:*** < your name goes here >
* ***Difficulty:*** <Easy, Medium, Hard>
* ***Duration:*** <175 <normal> 350 <extended>>
-
- Use the OpenCV How to Contribute and see Line Descriptor as a high quality finished example.
- Add unit tests described here, see also the Line Descriptor example
- Add a tutorial, and sample code
- see the Line Descriptor tutorial and how they look on the web.
- See the Line Descriptor samples
- Make a short video showing off your algorithm and post it to YouTube. Here's an example.
-
The process at Google is described on the GSoC home page
- Contributors will be paid only if:
-
Phase 1:
- You must generate a pull request
- That builds
- Has at least stubbed out (place holder functions such as just displaying an image) functionality
- With OpenCV appropriate Doxygen documentation (example tutorial)
- Explains what the function or net is and what it is used for
- Has at least stubbed out unit test
- Has a stubbed out example/tutorial of use that builds
- See the contribution guide
- and the coding style guide
- the line_descriptor module is a good example of a contribution
-
Phase 2:
- You must generate a pull request
- That builds
- Has all or most of the planned functionality (and is still usable without the missing parts)
- With OpenCV appropriate Doxygen documentation
- Explains what the function or net is and what it is used for
- Has some unit tests
- Has a tutorial/sample of how to use the function or net and why you'd want to use it.
- Optionally, but highly desirable: create a short (30 sec to 1 min) video (preferably on YouTube, but any format) that demonstrates your project. We will use it to create the final video:
-
Extended period:
- TBD
-
Phase 1:
- Contact us, preferably in February or early March, on the opencv-gsoc googlegroups mailing list above and ask to be a mentor (or we will ask you in some known cases)
- If we accept you, we will post a request from the Google Summer of Code OpenCV project site asking you to join.
- You must accept the request and you are a mentor!
- You will also need to get on:
- You then:
- Look through the ideas above, choose one you'd like to mentor or create your own and post it for discussion on the mentor list.
- Go to the opencv-gsoc googlegroups mailing list above and look through the project proposals and discussions. Discuss the ideas you've chosen.
- Find likely contributors, ask them to apply to your project(s)
- You will get a list of contributors who have applied to your project. Go through them and select a contributor; rejecting them all if none suits, joining to co-mentor, or sitting this year out are all acceptable outcomes.
- Make sure your contributors officially apply through the Google Summer of Code site prior to the deadline, as indicated by the Contributor Application Period in the timeline
- Then, when we get a slot allocation from Google, the administrators "spend" the slots in order of priority, influenced by whether each topic has a capable mentor.
- Contributors must finally actually accept to do that project (some sign up for multiple organizations and then choose)
- Get to work!
If you are accepted as a mentor and you find a suitable contributor and we give you a slot and the contributor signs up for it, then you are an actual mentor! Otherwise you are not a mentor and have no other obligations.
- Thank you for trying.
- You may contact other mentors and co-mentor a project.
You get paid a modest stipend over the summer to mentor, typically $500 minus an org fee of 10%.
Several mentors donate their salary, earning ever better positions in heaven when that comes.
Ankit Sachan
Anatoliy Talamanov
Clément Pinard
Davis King
Dmitry Kurtaev
Dmitry Matveev
Edgar Riba
Gholamreza Amayeh
Grace Vesom
Jiri Hörner
João Cartucho
Justin Shenk
Michael Tetelman
Ningxin Hu
Rafael Muñoz Salinas
Rostislav Vasilikhin
Satya Mallick
Stefano Fabri
Steven Puttemans
Sunita Nayak
Vikas Gupta
Vincent Rabaud
Vitaly Tuzov
Vladimir Tyan
Yida Wang
Jia Wu
Yuantao Feng
Zihao Mu
Gary Bradski
Vadim Pisarevsky
Shiqi Yu
© Copyright 2019-2025, OpenCV team
