Sonic Foundation

SONiC Foundation Welcomes Upscale AI as a Premier Member to Advance Open AI Networking Infrastructure

Sunny Schatz — Tue, 24 Feb 2026 02:45:31 +0000

Upscale AI will help shape the AI fabric roadmap and take on key leadership positions

SAN FRANCISCO — February 24, 2026 — The Software for Open Networking in the Cloud (SONiC) Foundation, an open source network operating system (NOS) hosted under the Linux Foundation, today announced that Upscale AI has upgraded its membership to Premier to help shape the future of AI networking infrastructure. A pure-play AI networking company, Upscale AI optimizes infrastructure from end-to-end, including silicon, systems, and the software substrate. Upscale AI’s platform is built on SONiC and other open standards.

“Upscale AI’s Premier membership reflects the growing role SONiC plays as the horizontal software layer for next-generation AI networking,” said Arpit Joshipura, general manager, Networking, Edge & IoT at the Linux Foundation. “We’re excited to deepen this collaboration as the industry scales AI fabrics with openness, performance, and resilience.”

Upscale AI has been heavily involved in the SONiC ecosystem, optimizing SONiC for large-scale AI clusters and advancing congestion control and reliability for deterministic workloads. The company was also one of the top contributors at the 2025 SONiC Hackathon, delivering advanced debugging and log correlation.

“Looking ahead, Upscale AI will deploy SONiC across our full product portfolio,” said Barun Kar, CEO of Upscale AI. “SONiC is a core pillar of our software strategy. We are building AI-native reliability, validation, and lifecycle security frameworks on SONiC while advancing reference architectures and operational best practices with the broader SONiC ecosystem. This commitment strengthens software integrity, accelerates innovation, and ensures production-grade deployments at scale.”

As a Premier member of the SONiC Foundation, Upscale AI will help shape the AI networking roadmap. Additionally, Upscale AI’s executives will assume leadership positions across the SONiC Foundation, with Aravind Srikumar on the Governing Board, Deepti Chandra on the Outreach Committee, and Santhosh K Thodupunoori on the Technical Steering Committee (TSC).

Upscale AI joins existing SONiC Foundation Premier member organizations, including Alibaba, Arista Networks, Broadcom, Cisco, Dell Technologies, Google, Intel, Marvell, Microsoft, Nexthop AI, Nokia, and Nvidia. To learn more about the SONiC Foundation and how to get involved, visit www.sonicfoundation.dev.

SONiC Mentorship Spotlight: Aditi Reddy on Advancing Disaggregated SONiC-VPP Architecture in KNE

Sunny Schatz — Mon, 09 Feb 2026 05:00:40 +0000

Through guided, hands-on project work, the SONiC Mentorship Program helps new contributors turn technical challenges into real-world open source impact. In this spotlight, we speak with Aditi Reddy, a graduate student at North Carolina State University, about her work on building and validating a disaggregated SONiC-VPP architecture inside a Kubernetes-based network emulation environment.

About the Mentee

My name is Aditi Reddy, a graduate student in computer science at North Carolina State University. I joined the SONiC Mentorship Program because I wanted hands-on exposure to real network operating systems and the chance to understand how modern switching platforms are built and deployed.

I am interested in modular, cloud-native networking systems, and this program was a great way to explore technologies like SONiC, VPP, and Kubernetes in a practical setting. Working directly with open-source infrastructure gave me a clearer view of how control-plane and dataplane components interact in real-world environments.

Q: What project did you work on, and why is it important to SONiC?

The integration of the VPP (Vector Packet Processing) dataplane with SONiC has traditionally followed a monolithic model, where VPP runs as a tightly coupled process inside SONiC containers. While this design works for a single virtual switch, it limits scalability and makes it difficult to build large or distributed test environments. With the Alpine–Lucius pipeline and the KNE (Kubernetes Network Emulation) cluster, there is now an opportunity to replace this monolithic model with a more modern, containerized architecture.

This project aims to separate the VPP dataplane from the SONiC control plane, running VPP as an independent Kubernetes pod inside the KNE environment while still being programmed by SONiC. Using Kubernetes orchestration, the control plane can instantiate, configure, and manage multiple VPP instances dynamically, enabling horizontal scaling and supporting more complex topologies. This disaggregated architecture also improves flexibility—allowing operators to isolate dataplane failures, allocate resources more efficiently, and manage VPP and SONiC lifecycles independently—while still preserving the operational advantages of a SONiC-based system.

Q: What were your main technical contributions?

My work centered on bringing up the disaggregated SONiC-VPP architecture inside the KNE environment and validating dataplane communication between SONiC-VPP and SONiC-Alpine. Since the upstream SONiC-VPP Docker image was broken, I first rebuilt a functional image locally using Docker, Make, and the SONiC build system. This involved debugging multistage Dockerfiles, fixing missing dependencies, and resolving build errors related to supervisor scripts and VPP startup components. Once the image was stable, I loaded it into the Kind cluster and integrated it into the existing KNE topology using protocol buffer–based topology files.

Throughout this process, I worked extensively with Kubernetes, kubectl and containerd to manage pods, load images, and inspect network interfaces inside the cluster. I also used tools

like tcpdump, iproute2 (ip link/route), and temporary debug containers to trace packets, verify interface mappings, and understand how KNE wires veth pairs between pods.

After configuring the interfaces on both SONiC-VPP and SONiC-Alpine, I successfully achieved end-to-end ping connectivity between the two pods—a key milestone proving that VPP’s dataplane path and the SONiC control-plane logic were operating correctly inside the emulated Kubernetes environment.

Q: What challenges did you face, and what did you learn?

A large part of the project involved troubleshooting and understanding how the different layers of the SONiC-VPP and KNE ecosystem interact. One of the earliest challenges was dealing with the broken SONiC-VPP Docker build, which failed due to missing base images, outdated slave images, and supervisor-script issues. Fixing these required digging through the SONiC build system, patching multistage Dockerfiles, and resolving dependency mismatches that prevented VPP from starting correctly.

The next set of issues appeared when integrating the image into Kind and KNE, where interface wiring behaves very differently from a traditional VM setup. VPP initially could not detect or bind to Kubernetes-provided interfaces, which led me to learn how Kubernetes veth pairs are created, how pods inherit network namespaces, and how those namespaces are connected through meshnet. I also had to understand how SONiC’s control-plane interfaces map onto VPP’s dataplane interfaces, and how KNE injects these links during pod creation.

There were also challenges around cluster accessibility, image loading, and pod startup failures caused by leftover cluster state. Despite these hurdles, each issue gave me a deeper understanding of the full pipeline and ultimately led to successful SONiC-VPP SONiC-Alpine communication inside KNE.

Q: What impact has this mentorship had, and what are your next steps?

With basic connectivity between SONiC-VPP and SONiC-Alpine now established inside KNE, the next phase of the project is to fully separate VPP into its own standalone dataplane container and integrate it cleanly with the SONiC-Alpine control plane. This includes creating a dedicated VPP pod template, defining stable interface mappings, and ensuring that SONiC can program VPP dynamically through standard mechanisms. Another focus area will be improving automation around topology creation—ideally allowing multiple VPP instances to be orchestrated and scaled horizontally to support larger testbeds. Finally, I plan to verify more complex traffic flows, extend support for additional interfaces, and refine the configuration so that the disaggregated SONiC-VPP architecture becomes fully reproducible and stable within modern Kubernetes environments.

Q: Is there anyone you’d like to acknowledge?

This project has been a valuable learning experience and a great introduction to working with real open-source networking systems. I’m grateful to have had the opportunity to explore SONiC, VPP, and Kubernetes in a hands-on setting, and to contribute to an architecture that moves SONiC toward a more flexible, containerized future.

I would like to thank my mentors, Brian O’Connor and Arpitha Raghunandan, for their continuous guidance, technical direction, and support throughout the project. I’m also thankful to Sreemoolanathan Iyer and Sonika Jindal from the SONiC community, whose work on SONiC Alpine and willingness to answer questions helped me overcome several critical issues. A special thanks to Kalyan Pullela, another mentee working under the same mentorship track— many of the challenges we faced were solved collaboratively through discussion and shared debugging.

Finally, I want to thank the LFX Mentorship Program and the broader SONiC community for providing this opportunity and creating such a welcoming environment for contributors.

Get Involved

Interested in contributing to SONiC? Join the community and get involved through wiki, mailing lists, and working groups.

SONiC Mentorship Spotlight: Shale Lucas on Improving Team Daemon Performance with a Netlink Proxy

Sunny Schatz — Fri, 06 Feb 2026 05:00:45 +0000

The SONiC Mentorship Program brings contributors and mentors together to work on practical technical challenges and strengthen the open networking ecosystem. In this spotlight, we speak with Shale Lucas, a student at The City College of New York, about his work on improving team daemon performance by implementing a proxy between the Linux kernel and userspace.

About the Mentee

My name is Shale and I attend The City College of New York. My current majors are Computer Science and Pure Math. I was inspired to apply to the Linux Foundation as a whole because I wanted to get experience with open source development.

I saw the SONiC organization as an opportunity to learn more about programs involving computer networking and low level systems. Being able to learn more about Linux by working on a part of a NOS (Network Operating System) let me gain experience working on daemons and my own proxy server.

A simple diagram of the original setup with teamd processes and the kernel.

Q: What project did you work on, and why is it important to SONiC?

Team daemons are background programs that monitor and control network “teams” or “bonds,” which are groups of network connections combined to work as one for better speed or reliability. These daemons, often called teamd, use information from the system to keep each network team running smoothly and to react to changing conditions, like switching to a backup link/device if one fails.

When a system has many (~240) of these network teams (also called Link Aggregation Groups, or LAGs), problems can happen if the team daemons can’t keep up with the system’s updates. The Linux kernel sends out netlink messages (which report on the status of network devices) very quickly, faster than the daemons can read them. When the daemons fall behind, they might miss changes in the network, leading to trouble like ports/links/devices rapidly going up and down, also known as port flapping. The focus of the project aimed to reduce system resource usage by implementing a proxy between the kernel and userspace.

Q: What were your main technical contributions?

My mentor helped me improve the way messages move from the Linux kernel to userspace by guiding me in creating a proxy server. Instead of having each network team daemon manage its own netlink socket to receive network status updates directly from the kernel, I changed the setup so each daemon used a Unix Domain Socket instead. Now, there is just one netlink socket handled by the proxy server, which reads all the messages from the kernel and then sends the relevant ones to each team daemon through their Unix Domain Sockets.

To test this new setup, you wrote bash scripts that would repeatedly flap (bring up and down) network ports, causing the kernel to generate lots of netlink messages. For the proxy server, I used the C programming language, containerlab for network simulation, standard Linux utilities, and bash for scripting and automation.

A simple diagram of the proxy, kernel and teamd processes.

Q: What challenges did you face, and what did you learn?

The biggest challenge I faced was “getting up to speed”. Learning about a lot of the commonly used vocabulary in this environment was the first step. Once that was over, I was able to start writing simple programs to log diagnostics about network devices. Throughout the course of the program I learned more about containerization technologies like containerlab and docker. Additionally, I learned about lib-nl (lib-netlink) and its suite of libraries. I was able to improve my bash scripting skills, C skills and my use of epoll.

Q: What impact has this mentorship had, and what are your next steps?

My work serves as a starting point for other daemons to also improve their performance and reduce their resource utilization. I want to continue to contribute to open source tools in the SONiC environment and ultimately keep working on the proxy server in the future.

Q: Is there anyone you’d like to acknowledge?

Special thanks to Shenglin Zhu, Evan Harrison, the SONiC organization and The Linux Foundation for this opportunity.

Get Involved

Interested in contributing to SONiC? Join the community and get involved through wiki, mailing lists, and working groups.

SONiC Mentorship Spotlight: Gaurav Nagesh on Enhancing Redis Access Efficiency and Robustness in SONiC

Sunny Schatz — Wed, 04 Feb 2026 05:00:30 +0000

Through hands-on collaboration with experienced mentors, the SONiC Mentorship Program empowers contributors to address real-world challenges and advance open networking technologies.

In this spotlight, we speak with Gaurav Nagesh, a graduate student from Illinois Tech, about his work on improving Redis performance and stability across SONiC platform daemons, helping make the system more efficient, resilient, and scalable under load.

About the Mentee

I am Gaurav Nagesh, a graduate student from Illinois Tech with a master’s degree in CS and a specialization in Distributed Systems. One of the biggest motive for applying to the SONiC Mentorship Program was seeing how SONiC is actually used in production by so many companies in massive data centers. The scale at which it runs, the adoption rate, the way the community actively contributes to it and its acceptance in the industry even beyond hyperscalers, these things stood out to me immediately.

Reading that SONiC would be like the “Linux of NOS” according to a Gartner report, made it even more exciting. Moreover, getting an opportunity to be mentored by experienced industry engineers and learn directly from people who actively work on SONiC felt like a huge bonus. Finally, the idea that I could be part of a project that powers so many real-world data centers and make an impact, felt like a rare opportunity which could not be missed.

Another reason this project caught my attention is because it aligned perfectly with my background. I’ve worked on benchmarking and profiling systems to improve performance, so the nature of this task felt like a great fit. On top of that, my brief experience with SDN controllers and programmable networks made sure I had a solid foundation in open-source networking to understand SONiC better and dive deep into the project.

Q: What project did you work on, and why is it important to SONiC?

The project I worked on was titled “Enhancing Redis Access Efficiency and Robustness in SONiC.”

When we talk about Redis performance optimization in SONiC, there are usually three major areas to look at: optimizing the application code and how it interacts with Redis, improving the Redis client library itself, or tuning the Redis server for the specific workload. In this project, my focus was on the first part, optimizing things from the application-code perspective and improving how the daemons talk to Redis.

The main idea of the project was to analyze how SONiC’s critical services and platform daemons interact with Redis, understand where unnecessary Redis traffic was coming from, and improve both the efficiency and stability of these interactions.

This meant profiling the current Redis access patterns, setting up baseline load tests, and benchmarking behavior across different hardware platforms to see how Redis performed under varying workloads.

Overall, the project aimed to reduce redundant operations, avoid long-latency interactions, and make sure the system remained responsive even under load.

The main goals were:

Analyze the existing Redis access patterns across SONiC’s critical services and platform daemons
Identify and eliminate unnecessary Redis operations, especially repeated GET/SET patterns
Improve responsiveness under load by reducing long-latency Redis interactions
Benchmark and compare key metrics before and after optimization
Add stronger exception-handling paths to prevent critical daemons from crashing

Q: What were your main technical contributions?

During the mentorship, I focused on one of the key subsystems in SONiC — the sonic-platform-daemons — and more specifically on some of its most critical processes: pcied, psud, thermalctld, ledd, and parts of the Redis client library. These components involve both Python and C++ code, so the work spanned across both languages.

The first step was to understand how each of these daemons was interacting with Redis. I approached this in two ways. One was profiling the code using tools like py-spy to generate flamegraphs and identify hot paths—especially around Redis operations. The second approach was tracing every Redis operation end-to-end: which Redis table was being accessed, what command was executed, how long it took, what data was written or read, and the size of the payload. With this, I was able to collect key metrics such as total Redis operations performed, latency and throughput per table and per operation, time-series latency graphs, and CPU/memory usage.

To make this tracing possible, I wrote a custom tool, a Python module that automatically intercepts Redis operations from the target daemons. It works across multi-process and multi-threaded programs, requires almost no code changes, and can be toggled using an environment variable in the supervisord configuration file.

After analyzing all this data, I identified major areas of improvement and implemented them. Key contributions included:

Conditional writes with change detection: The daemons were updating all attributes every cycle, even when nothing changed. I added logic to track previous states and write to Redis only when values actually changed (e.g., model number, serial number, revision, thresholds, LED states, etc.). This cuts down a large amount of unnecessary Redis traffic.
Better Redis connection management: Many daemons were opening separate Redis connections for each table they wrote to. Since most of them are single-threaded and operations happen sequentially, this was wasteful. I consolidated this into a single connection reused for all writes across different tables on the same logic database (Eg STATE_DB, which is db6 in Redis)
Batching Redis operations using Pipeline: Instead of making individual set() calls for every attribute, I modified the logic to batch them using Redis Pipeline. This significantly improves throughput and reduces network overhead.
Misc bug fixes: Along the way, I also made small bug fixes in swss-mgmt and platform-daemons wherever I found issues while testing and reading the codebase which have already been merged.

After implementing the changes and benchmarking it to get the metrics, we see a significant improvement in the throughput and the over amount of time taken to execute the operations reduces. The performance improvement values may vary slightly in practice due to the dynamic nature of the daemons and external factors like current system load on Redis, background processes, hardware conditions and system resource availability. So, a certain variance is expected in these results. Accounting for the variance from multiple benchmarking runs, the graph below shows the range of performance improvement when compared to baseline across three daemons: pcied, thermalctld and psud.

Here are the draft pull requests implementing the changes:

pcied: Redis Performance Improvements
psud: Redis Performance Improvements
thermalctld: Redis Performance Improvements

Q: What challenges did you face, and what did you learn?

One of the main challenges I faced during the project was that most of my work targeted the pmon container, and many of the platform daemons inside it are platform-dependent. Because of this, developing and testing everything on the virtual testbed wasn’t always straightforward. Still, I tried to be as careful and disciplined as possible while writing and testing my changes in the virtual environment. For the actual benchmarking and platform-specific validation, my mentor helped by running the tests on physical hardware, which made the entire process much smoother.

On the technical side, I got to learn a lot of interesting things about SONiC’s architecture. For example, SONiC uses a microservice-style design, but unlike typical microservices where you run one service/process per container, here a single container runs multiple services/processes, all managed by supervisord. Another interesting detail was understanding how Redis is deployed: on single-ASIC devices, Redis runs in the host’s network namespace, but on multi-ASIC devices, you also have separate Redis instances running in different network namespaces—one per ASIC. These kinds of unique architectural decisions and the reasoning behind them were really insightful to learn.

Moreover, I also got the chance to collaborate and present my work to a small SONiC team at Microsoft, and I received some great feedback that helped me refine the final changes.

I also spent a lot of time exploring different SONiC repositories to understand how everything fits together and how the whole system operates end-to-end. Reading open-source code written by engineers from top companies gave me a good sense of how things are structured in large real-world systems, why certain design decisions were made, and what alternative approaches could have been. It helped me spot patterns, best practices, and common styles used across the project. It also broadened my understanding of SONiC from an end-user perspective—its features, use cases, and how the various pieces interact.

Q: What impact has this mentorship had, and what are your next steps?

The main impact of this project is that it brings an overall improvement in Redis performance within the platform subsystem and adds more stability to how these daemons interact with the database. It essentially makes Redis access lighter, more predictable and more resilient under load. Along with that, the work also contributed to reducing unnecessary pressure on system resources. Some of the key improvements include:

Reduced unnecessary memory allocations by re-using objects and avoiding duplicated or redundant resources.
Increased throughput, Reduced I/O and network overhead by:
- Reusing the same Redis connection for pipeline operations instead of creating separate connections for each table
- Batching operations using Redis Pipeline instead of sending individual commands
Reduced Redis traffic (effectively increasing available bandwidth for other traffic in the system) by using conditional writes—only updating Redis when a change in table fields is actually detected.

Looking forward, as I mentioned earlier, while working across different components I’ve already been able to catch and fix a few important bugs and I plan to continue doing that as I explore more of SONiC. I want to extend this optimization work to other daemons across different subsystems, understand more internals and keep improving things wherever I can. My goal is to stay actively involved and grow as a contributor, and I genuinely hope to be a long-term contributor to SONiC, continuing to learn, experiment, and build as the project evolves.

Q: Is there anyone you’d like to acknowledge?

First and foremost, I want to thank my mentor, Vasundhara Volam. None of this would have been possible without her. I’m extremely grateful for the opportunity she gave me and for all the time she took out of her busy schedule to mentor me. Ma’am has been incredibly understanding, flexible, supportive and very patient throughout the entire journey. She guided me with a structured approach, helped me learn things the right way. I genuinely couldn’t have asked for a better mentor.

I am also thankful to the extended team members, Rita Hui and Judy Joseph. Rita was very supportive and encouraging, and I really appreciate her consideration for enabling this opportunity within her team. Judy provided valuable evaluation, guidance and quick review cycles, which helped me move forward efficiently.

A big thank you to the LFX team — especially Evan Harrison and Tracey Li. They oversaw and managed the entire program end to end, ensured the onboarding and logistics were smooth and were extremely patient throughout the process. I had very long email chains with them because of uncertainties around my visa situation and they were always understanding, calm and positive. A special mention to Sriji Ammanath from the LFX HR team for kindly accommodating multiple requests due to my university policy constraints.

Thanks as well to the other mentors in this program. I appreciate the time and effort they put into supporting their mentees and sharing their experience throughout the program. I also want to thank the reviewers and project maintainers who took the time to review my PRs, provide feedback, and guide the merge process. Their input helped me refine my contributions and understand SONiC’s workflows better.

Get Involved

Interested in contributing to SONiC? Join the community and get involved through wiki, mailing lists, and working groups.

SONiC Mentorship Spotlight: Prathyusha Bathula on Expanding Community Testbeds with cSONiC

Sunny Schatz — Mon, 02 Feb 2026 05:00:00 +0000

Through the SONiC Mentorship Program, contributors collaborate with experienced mentors to solve real-world technical challenges and help advance the open networking ecosystem.

In this spotlight, we speak with Prathyusha Bathula about her work on adding cSONiC support to the SONiC community testbed, helping reduce external dependencies and simplify testbed deployment for contributors.

About the Mentee

My name is Prathyusha Bathula, and I graduated with a Master’s in Information Systems Technology at the University of North Texas. I applied to the SONiC Mentorship Program because I want to learn from experienced professionals to strengthen my technical and professional skills, and also about the chance to contribute to a real community-driven SONiC project. I’m interested in cloud-native networking, network automation, distributed systems, and how open-source NOS platforms like SONiC integrate with cloud infrastructure.

Q: What project did you work on, and why is it important to SONiC?

My project, “Adding cSONiC support in the SONiC community testbed topology,” aims to replace the current vendor NOS (for example, cEOS) used in the community testbed with cSONiC.

The goal is to eliminate external dependencies and simplify the setup and operation of the community testbed for contributors. This project focuses on configuring cSONiC as neighbor devices, enabling features like warm reboot, LACP lag extension, and MACsec, and ultimately upstreaming the full solution to the SONiC project.

Q: What were your main technical contributions?

During the mentorship, I successfully deployed cSONiC as neighbor devices in the SONiC community testbed and established BGP session connectivity between the DUT and the cSONiC neighbors.

My work involved configuring testbed topology, integrating new network setup steps, and validating interfaces and routing behavior. I primarily used Python and Ansible for automation and configuration tasks and collaborated closely with my mentor, Dawei Huang, throughout the development and troubleshooting process. Here is the pull request.

Q: What challenges did you face, and what did you learn?

During the project, I faced several challenges, especially around connectivity issues between the DUT and cSONiC, debugging OVS bridge mappings, and aligning the topology with the community testbed workflow.

These challenges helped me strengthen my troubleshooting abilities, gain a deeper understanding of SONiC’s testbed internals, and improve my Ansible automation and Python development skills. I also gained valuable experience in the open-source contribution process.

Q: What impact has this mentorship had, and what are your next steps?

This work helps the SONiC community by replacing the dependency on vendor NOS (cEOS) with cSONiC for testbed setup, making it easier for people to create and use community testbeds.

I successfully established connectivity between cSONiC and DUT (Device Under Test), which is the foundation for future testing capabilities. This groundwork opens the path for continued development of enabling warm reboot, LACP lag extension, and MACsec.

I plan to continue contributing to complete these features and help the community adopt cSONiC-based testbeds. This experience has strengthened my skills in network automation and open-source collaboration, which I’ll apply in my future career in software-defined networking.

Q: Is there anyone you’d like to acknowledge?

I would like to express my sincere gratitude to my mentors, Dawei Huang and Vaibhav Hemant Dixit, for their guidance and support throughout this internship. Their expertise and patience were invaluable in helping me navigate the complexities of the SONiC testbed infrastructure.

I also want to thank the SONiC community members who provided feedback and assistance during the development process. This project would not have been possible without the collaborative and welcoming environment of the SONiC open-source community.

Get Involved

Interested in contributing to SONiC? Join the community and get involved through wiki, mailing lists, and working groups.

SONiC Mentorship Spotlight: Meghana Ambalathingal on Improving Test Flakiness and Runtime in sonic-swss

Sunny Schatz — Fri, 30 Jan 2026 05:00:32 +0000

The SONiC Mentorship Program brings together new SONiC contributors and mentors to work hands-on on real-world technical challenges and strengthen the open networking community. In this spotlight, we speak with Meghana Ambalathingal, a recent UC Berkeley graduate, about her work on improving test speed, reliability, and determinism in the sonic-swss repository by converting Python-based tests into fast, in-process C++ unit tests.

About the Mentee

Hi, I’m Meghana! I recently graduated from UC Berkeley with a major in Computer Science. I joined the SONiC Mentorship Program to contribute to a real open-source networking platform and to learn how large systems are tested at scale.

Q: What project did you work on, and why is it important to SONiC?

Topic: Converting selected Python tests (pytest + DVS) into C++ unit tests using GoogleTest/GoogleMock inside the sonic-swss repo.

Goal: Make tests faster, more deterministic, and less dependent on external services when we’re validating orchestration logic.

Problem: Full DVS/ASIC_DB tests are excellent for end-to-end coverage, but they can be slow and occasionally flaky. Many behaviors can be verified in-process if we replace external dependencies with mocks and assert on public state (tables, counters, mock SAI calls).

Q: What were your main technical contributions?

I translated targeted pytests into GoogleTest/GoogleMock suites that:

Seed minimal DB state (e.g., ports, interfaces, neighbors)
Drive specific orch code paths by enqueueing inputs and running doTask()/timers directly
Assert on outcomes in SONiC tables and on calls captured by a mock SAI layer

Example Merge Request

What I did:
Added a single test, RouteOrch_AddRemoveIPv4_And_DefaultRoute_State, that covers adding/removing 2.2.2.0/24 and the default route. I wrote small helpers to read the route state from STATE_DB and to wait for ok/na.

Why:
It mirrors the original pytest intent but fits a mock unit-test harness that pre-seeds ports, interfaces, neighbors in SetUp(). It stays fast and deterministic by verifying behavior via SAI mocks and DB checks instead of relying on DVS/ASIC_DB.

How verified:
The test runs in milliseconds and asserts that RouteOrch writes the expected fields and that the mock SAI saw the expected add/remove calls.

Technical Stack:

GoogleTest/GoogleMock, SONiC table helpers for APPL_DB / STATE_DB / config, and small SAI mock hooks. Timers/consumers are ticked directly (no sleeps) to avoid flakes.

Q: Why did you choose to use mocks?

Speed: Everything runs inside the process, no containers or external daemons to spin up.
Determinism: Fewer timing races; results are repeatable.
Isolation: We test the orchestration logic itself and check what it outputs (DB rows, SAI calls) without depending on a full system.

Q: What challenges did you face, and what did you learn?

Onboarding to C++ after Python

Most of my testing background was Python. Moving to C++/GoogleTest meant learning the C++ build flow (headers vs. sources), handling link errors, and being careful with types and const-correctness. Treating warnings as hints and iterating in small steps helped a lot.

Understanding sonic-swss architecture

To mock well, I first mapped which orch was under test (e.g., RouteOrch), which tables it read/wrote (CONFIG_DB → APPL_DB → STATE_DB), and what SAI actions should fire (add/remove route). Then I mocked only the boundary I needed and asserted on observable outcomes.

Translating pytests to GTest

Pytests often validated outcomes indirectly via DVS/ASIC_DB. I rewrote those checks as: enqueue inputs → call doTask()/tick timers → assert on table fields and recorded SAI calls. For unordered field/value checks, I compared sets so tests wouldn’t fail on key ordering.

Debugging in a VM using build logs

Interactive debugging in the VM was tough. I leaned heavily on build logs (compiler errors, link failures, test output) to iterate: fix one error, rebuild a narrow target, re-run, repeat. This forced a disciplined loop of minimal changes and fast feedback that closely matched CI conditions.

Q: What impact has this mentorship had, and what are your next steps?

Today:
The converted tests run much faster and are more reliable, giving maintainers quick feedback on orchestration behavior.

For contributors:
It’s easier to iterate, write a small unit test, run it in milliseconds, and get a clear failure pointing to a missing field or unexpected call.

Next:

Apply the same pattern to more orch modules
Factor out small, reusable helpers (state waits, enqueue ops, SAI expectations)
Keep a healthy split: fast unit tests plus targeted integration/system tests for balanced coverage

Q: Is there anyone you’d like to acknowledge?

Huge thanks to my mentors Prabhat Aravind and Prince Sunny for guidance throughout this project. I learned a ton about practical C++, the sonic-swss testing model, and how to write fast, stable tests. I’m excited to keep contributing!

Get Involved

SONiC Mentorship Spotlight: Kalyan Pullela on Advancing Dataplane Testing with SONiC-Alpine

Sunny Schatz — Thu, 29 Jan 2026 23:16:24 +0000

As part of the SONiC Mentorship Program, contributors work closely with experienced mentors to tackle real-world technical challenges and strengthen the open networking ecosystem.

In this spotlight, we speak with Kalyan Pullela, a first-year PhD student at Oklahoma State University, about his work on integrating SONiC-alpine dataplane testing into SONiC’s Azure Pipelines CI flow. His journey through the program helped shape both his technical expertise and his academic research direction.

About the Mentee

Kalyan Pullela is a first-year PhD student at Oklahoma State University whose research focuses on resilient network systems. During the SONiC Mentorship Program, he worked under the guidance of Brian O’Connor and Arpitha Raghunandan on the “Basic Azure Pipeline with Alpine and KNE” project, where he focused on strengthening SONiC’s dataplane testing and continuous integration workflows.

Q: Can you introduce yourself and share your background?

My name is Kalyan Pullela, and I am a first-year PhD student at Oklahoma State University. I applied to the SONiC Mentorship Program because I wanted to gain a deeper understanding of the software stack on a switch. I had prior experience with P4, and SONiC felt like the natural next step.

As the mentorship progressed, it evolved from simply learning SONiC into a much broader journey. Working with SONiC and its ecosystem helped me discover a concrete research direction in resilient network systems, which I am now pursuing as part of my PhD.

Q: What project did you work on, and why is it important to SONiC?

My mentorship project, “Basic Azure Pipeline with Alpine and KNE,” focused on integrating SONiC-alpine dataplane testing into SONiC’s Azure Pipelines CI flow.

SONiC utilizes Azure Pipelines, but current checks only validate the control plane. This project made dataplane testing a first-class citizen by provisioning a VM in the pipeline, deploying Kubernetes Network Emulation (KNE) and SONiC-alpine, and then running automated dataplane test suites.

This work aims to:

Improve test coverage beyond the control plane
Catch dataplane regressions earlier
Provide a path for contributors to add tests and interpret results

In the longer term, it lays the groundwork for using Alpine-based dataplane tests as pre-submit gates across the SONiC ecosystem, thereby improving overall quality.

Q: What were your main technical contributions?

Core contributions:

Successfully brought up SONiC-alpine outside of Google’s environment
Executed SpyTest-based test cases using Alpine and KNE
Ensured the setup aligned with SONiC community workflows and expectations

This involved using tools such as Alpine, KNE, kubectl, Wireshark, and Docker, as well as modifying Python files on virtual devices to ensure tests ran consistently. I spent significant time debugging issues in containers, topologies, and configurations, often working alongside my mentors and their teammates to resolve problems.

I also presented progress to the SONiC Virtual Data Plane (VDP) working group, which provided valuable feedback from others working on SONiC-alpine and related efforts.

In parallel, I participated in the SONiC hackathon, where I extended Alpine to solve problems that the wider community faces. That work evolved into a separate project, which ultimately led to me receiving the “Rising Star Award” at OCP. I pushed the project to the open-source SONiC-alpine repository, an essential milestone in my open-source journey.

Q: What challenges did you face, and what did you learn?

Like any other project, solving one problem or bottleneck allowed me to progress to the next challenge, as explained below.

Initially, I did not have a server capable of comfortably running SONiC-alpine. After resolving that and bringing up Alpine, I moved on to running SpyTest-based suites, which exposed issues related to containers, architecture, and assumptions built into the broader test framework.

Devices had to be configured in a precise way so that tests could run end-to-end. Multiple containers needed to be healthy, connections to the external traffic generator had to match the topology, and the layout of front-panel ports had to align with what the tests expected.

Getting tests to run all the way through required extensive debugging. Files were missing, ill-formatted files, or were generated in the wrong location. Fixing these issues revealed the next hurdle: many ports were not operational even though they appeared configured.

With guidance from my mentors, I learned how lanes map to front-panel ports and how Lucius (the dataplane) interprets them. Correcting that mapping finally gave us an end-to-end run, though not all tests passed yet.

The hardest remaining problem was enabling packets to traverse from one device to another. Solving this required careful, step-by-step debugging, including validating the configuration, tracing paths, and confirming behavior using Wireshark. That process uncovered the root causes and led to a working solution that allowed connectivity beyond a single device.

From this experience, I learned to approach complex problems methodically, rather than getting discouraged when things go wrong. Technically, I gained a deeper understanding of SONiC’s architecture, its component interactions, and its application in real-world networks. I also became more comfortable with Docker, KNE, and the scripts and configuration files that tie the system together, and I saw firsthand the level of testing and robustness required before a system can be considered production-ready.

Q: What impact has this mentorship had, and what are your next steps?

I plan to continue working in the SONiC-alpine space. Building on the hackathon project, I aim to upstream additional minor changes that make it easier for newcomers to set up Alpine and start testing quickly.

Beyond Alpine, I intend to explore other SONiC variants such as SONiC-VPP and SONiC-DASH. This aligns directly with my research on scale-up and scale-out networks, where SONiC will be a central platform. The mentorship has helped influence my PhD roadmap, and I plan to build and research on top of this work, leading to publications.

Q: Is there anyone you’d like to acknowledge?

I would like to genuinely thank my mentors, Brian O’Connor and Arpitha Raghunandan, for taking the time out of their busy schedules to mentor, guide, and answer my questions. I could not have done this work without their support and help.

I would also like to thank Sonika Jindal and Sreemoolanathan Iyer for their help throughout this journey. Although they were not listed as official mentors, they contributed significantly and were always patient and generous with their time and expertise.

Finally, I would like to thank the SONiC community for creating this opportunity and for being so welcoming. I hope SONiC establishes a pipeline of students with the current batch, helping the next batch while continuing to intern, solving the next set of architectural challenges.

Get Involved

Interested in contributing to SONiC? Join the community and get involved through wiki, mailing lists, and working groups.

Nokia Joins SONiC Foundation as Premier Member, Strengthening Open Networking Innovation for AI-Scale Infrastructure

Sunny Schatz — Mon, 08 Dec 2025 05:00:18 +0000

Nokia’s Premier membership deepens its long-standing contributions and reinforces SONiC’s momentum as the leading open source NOS for AI and cloud-scale networking.
Mirza Arifovic, R&D Lead at Nokia, joins the SONiC Governing Board to shape strategic mission
Nokia will expand upstream collaboration and help advance high-capacity, AI-optimized data center fabrics.

San Francisco — December 8, 2025 — The Software for Open Networking in the Cloud (SONiC) Foundation, an open source network operating system (NOS) hosted under the Linux Foundation, today announced that Nokia has joined as a Premier member. The advancement reflects Nokia’s long-standing contributions to the SONiC community and reinforces its commitment to open, scalable, and AI-ready data center networking. Additionally, Nokia’s Mirza Arifovic, R&D Lead, joins the SONiC Governing Board to collaboratively advance the project’s strategic mission.

Nokia has been a leading SONiC contributor since 2019, ranking among the top five organizations for contributions, and delivering key innovations such as chassis and multi-ASIC architecture implementations, Switch Abstraction Interface (SAI) contributions, ARM architecture enablement and small-footprint optimizations. Nokia plans to deepen its engagement and expand participation across working groups. “We are delighted to welcome Nokia as a Premier member of the SONiC Foundation,” said Arpit Joshipura, general manager, Networking, Edge and IoT at the Linux Foundation. “Nokia has played a critical role in advancing SONiC from its early days to today’s AI-scale deployments. Their leadership in high-performance hardware, expertise in software development and global-scale network engineering strengthens the community and accelerates the adoption of open source NOS across hyperscale, enterprise, and telecom markets.”

“For the past five years, Nokia has been a proactive and committed member of the SONiC community, delivering key innovations that are now deployed at global scale in data center networks. Joining the SONiC Foundation as a Premier member builds on this proven commitment, allowing us to accelerate open source collaboration and combine the community’s efforts with our high-performance hardware and modern automation solutions to power the next generation of cloud and AI infrastructure,” said Rudy Hoebeke, Vice President, Software Product Management, Nokia IP Networks Business Division.

The addition of Nokia to SONiC’s Premier member roster further solidifies its position as a leading open source NOS for AI, cloud, and large-scale networking, supported by a global ecosystem that relies on SONiC to deploy flexible, vendor-agnostic networks with stronger automation, reliability, and community-driven innovation. Nokia joins existing SONiC Foundation Premier member organizations, including Alibaba, Arista Networks, Broadcom, Cisco, Dell Technologies, Google, Intel, Marvell, Microsoft, Nexthop AI, and Nvidia. To learn more about the SONiC Foundation and how to get involved, visit www.sonicfoundation.dev

About the Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, ONAP, OpenChain, OpenSSF, OpenStack, PyTorch, RISC-V, SPDX, Zephyr, and more. The Linux Foundation is focused on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

For a list of trademarks of The Linux Foundation, please see its trademark usage page: linuxfoundation.org/trademark-usage. Linux is a registered trademark of Linus Torvalds.

####

Media Contact:

Sunny Cai

[email protected]

The Linux Foundation

2025 SONiC Hackathon Most Innovative Award Spotlight: Data Path Diagnostics on SONiC

Sunny Schatz — Wed, 03 Dec 2025 03:21:08 +0000

At the 2025 SONiC Hackathon, innovation meant tackling challenges that are rarely seen, and even harder to diagnose. The Most Innovative Award was presented to Sridhar Talari, Anand Mehra, and Xiaohu Huang from Cisco for their work on a key feature: data path diagnostics on SONiC.

Silent packet corruption and packet loss are rare, but when they occur, they create significant disruption and are extremely difficult to isolate. This Cisco team developed a proactive, intelligent solution designed to detect these errors early, locate them precisely, and help operators keep SONiC environments healthy and predictable.

The Problem

Instances have been observed where packet data corruption and packet loss occurs along the forwarding path without being detected by hardware. Undetected errors related to corruption may result in packets being dropped at later stages or incorrect data being delivered to the end user. Although the likelihood of these incidents is relatively very low, their occurrence can lead to significant disruptions and are very difficult to diagnose. It is therefore essential to implement proactive and accurate detection mechanisms for such corruption.

The Hackathon Solution

This feature periodically injects specially crafted packets into the forwarding path and subsequently receives them. It then compares the content of the transmitted packets with that of the received packets to ensure the integrity of the data path.

Feature implementation should satisfy the below requirements:

It should cover all the blocks in the data path.
It should cover all software paths taken by packet from CPU to ASIC and vice versa
Along with detection of packet corruption and loss, it should be able to pinpoint the exact location at a granular level. (i.e. corruption caused by Lookup, nexthop resolution, encapsulation, packet buffer etc.)
Feature should be enabled by default and should run in background
Feature should raise syslog when error is detected
Feature should allow user to configure interval, packet size, packet content and burst size of packets used for error detection
Additional CPU usage should be minimum.
No impact to any existing host path applications
These packets should not be forwarded
Detection time should be as fast as possible
False alarms should not happen.
Should be light weight and latency sensitive
Should be able to run the test even if all interfaces are down
CLI should be available to display packets sent, packets received correctly, packets received out of order, packets corrupted, location of corruption, and additional details as part of operational data.

The method for building a packet that traverses all hardware blocks in the data path and returns to the feature depends on the platform. Therefore, the team chose to add this feature to the SAI code and made provisions for its configuration and operational controls available through the SAI API, ensuring a standardized and hardware-independent format. User will configure the feature via SONIC CLI, which subsequently calls the SAI API to disable/enable the feature and configure it. Operational CLI commands are planned to display the results. FlexCounter mechanism available in SONiC will be enhanced to export the statistics from SAI layer.

Impacts and Benefits

This feature catches data path problems leading to packet drops, packet corruption, and total HW failure proactively in a faster and least intrusive manner. It saves a lot of time and effort spent debugging the network for these silent errors. It reduces network downtime. Feature attempts to identify the exact problem location which can be used to selectively disable certain sections of data path instead of turning it off completely reducing the scope of impact. In a modular system it helps to replace only the impacted parts instead of the entire chassis.

Next Steps:

The team is working to upstream the below implementations:

YANG model for configuration and operational data needed for this feature.
SAI API for implementing the feature.
Flexcounter enhancements to collect and display operational data

There are instances where packet corruption happens only with traffic bursts at a higher rate, and it will be helpful to detect these scenarios. Such tests cannot be run on production devices as it impacts control plane traffic. They can be run during maintenance window after costing the device from the network.

To address corruption issues with bursts, the team is working to enhance the data path diagnostics tool to handle traffic at very high rate (20K packets per second instead of one packet per second or so) in offline mode.

Their goal is to provide a modular framework which is easily extensible to address all the current data path errors and any new ones identified in the future.

2025 SONiC Hackathon Most Wanted by Devs Award Spotlight: SONiC COWBOYS

Sunny Schatz — Mon, 24 Nov 2025 06:00:33 +0000

Among 31 teams from 12 organizations competing in this year’s SONiC Hackathon, one project earned exceptional enthusiasm from the developer community. The Most Wanted by Devs Award was awarded to the SONiC COWBOYS team from Alibaba Cloud’s Network R&D Department: Jiayi Hu (team lead), Yubin Li, Fengsheng Yang, and Shijie Yang. Their submission, COWBOYS, is a highly flexible, parallelized virtual testing framework that dramatically accelerates SONiC development workflows.

Introduction

SONiC COWBOYS is a highly integrated and efficient virtual framework that operates 24/7. It consists primarily of three components: BMS (Bare Metal Server), VendorSIM, and Jenkins. Traditionally, running a full test suite on a single T0 testbed using modules, fibers, fanout, servers, and switches takes more than 30 hours. However, COWBOYS can simultaneously execute up to eight testbeds per BMS, significantly shortening test time. It can also simulate a wide range of testbed topologies and run test cases under different environments, including large-scale and production scenarios.

The team’s solution was selected by SONiC developers across the community, earning the Most Wanted by Devs Award.

The Problem They Solved

First, let’s address the problems the team was facing. In today’s development and testing environment, there are several challenges. The two major problems are as follows:

1. Lack of resources.
Single-machine functional testing requires a significant amount of physical equipment. For example, building a standard ASW topology from sonic-mgmt requires modules, fibers, fanout devices, servers, switches, and more.

2. Lack of flexibility.
Current tests cannot easily change the test topology at any time. Because of the resource limitations above, teams must design a “perfect” topology that can support many test requirements with minimal changes. In reality, it is difficult to find a single topology that fits all situations.

There are also some minor problems, such as difficulty debugging in a terminal, long execution time (a single full test round can take more than 30 hours), and heavy migration effort from existing tests.

Their Hackathon Solution

To address the problems mentioned above, the team designed and built COWBOYS as the solution.

COWBOYS is made up of two main components: vSONiC and Jenkins.

vSONiC is a virtual switch running on BMS that can simulate physical devices almost 100 percent accurately, which means the team no longer needs a large amount of physical hardware. It also supports easy migration of existing test cases from physical environments to virtual ones.

Jenkins provides a platform that allows the team to quickly launch different devices and topologies. It also enables test cases to be split into multiple groups that run in parallel on different vSONiCs on the BMS, greatly reducing overall test time.

Taking Broadcom chip devices as an example, the team used BCMSIM as the ASIC model and used real AliNOS to drive it. This allows vSONiC to simulate almost the entire pipeline and all SAI/SDK logic. They also designed a framework to ensure that multiple vSONiCs can communicate with each other.

Impact & Benefits

With the basic vSONiC test framework in place, the team can build their own topologies using vSONiC and run any test they need.

They arranged all management cases and topologies for ASW/PSW/DSW and divided them into separate jobs to ensure that a single device model can complete all of its cases within four hours. Jenkins can now launch all parallel jobs with one click and run them non-stop, 24 hours a day.

In theory, the total testing time for all topologies (ASW + PSW + DSW) on a single device has been reduced from 30 hours to just 3 hours, a 90% improvement.

Lessons Learned

The team noted that it is important to focus on general, architecturally reusable efforts that improve efficiency. This mindset reflects the true spirit of open-source communities and aligns with their original motivation for building infrastructure.

Next Steps

The team hopes to custom-create large-scale simulations to test capabilities, while also simulating online production environments to reproduce and locate issues.
They hope to collaborate with the community to promote vSONiC testing with more chip emulators.
They hope to continue improving the visualization of test results.