Asterfusion Data Technologies | https://cloudswit.ch | Single Chip Cloud Fabric, RoCE for AIGC, DPU for Security and Storage, SONiC on PoE Access

GPU Backend Fabric Design Guide for AI Compute Network | https://cloudswit.ch/whitepapers/gpu-backend-fabric-design-guide/ | Sat, 21 Feb 2026 16:33:53 +0000

GPU Backend Fabric Design Guide for AI Compute Network

Preface

AI clusters involve three types of networks: Frontend Fabric, GPU Backend Fabric, and Storage Backend Fabric.

  • Frontend Fabric: used to connect to the Internet or storage systems for loading training data.
  • GPU Backend Fabric (Compute Network) : supports GPU-to-GPU communication, provides lossless connectivity and enables cluster scaling. It is the core carrier for training data interaction between GPU nodes.
  • Storage Backend Fabric: handles massive data storage, retrieval, and management between GPUs and high-performance storage.

This guide focuses on the design of 400G AI intelligent computing GPU backend networks at different scales. Using Asterfusion high-density 400G/800G data center switches as the hardware carrier, the solution implements Clos topologies based on Rail-only and Rail-optimized architectures to provide standardized deployment guidance.

Target Audience

Intended for solution planners, designers, and on-site implementation engineers who are familiar with:

  • Asterfusion data center switches
  • RoCE, PFC, ECN, and related technologies

1. Overview

The rapid evolution of AI/ML (Artificial Intelligence/Machine Learning) applications has driven a continuous surge in demand for large-scale clusters. AI training is a network-intensive workload in which GPU nodes exchange massive volumes of gradient data and model parameters at high frequency. This drives the need for a network infrastructure defined by high bandwidth, low latency, and strong interference resistance.

Traditional general-purpose data center networks struggle to adapt to the traffic characteristics of AI training, which are dominated by “elephant flows” and low entropy. This often leads to bandwidth bottlenecks, transmission congestion, and latency jitter, failing to meet the rigorous requirements of AI training. As the “communication backbone” of the AI cluster, the backend network directly determines the efficiency of GPU compute release. Therefore, an efficient cluster networking solution is urgently needed to satisfy low-latency, high-throughput inter-node communication.

2. AI GPU Backend Network Architecture

2.1 Rail-Only Architecture

Leaf nodes connected to GPUs with the same index across different servers are defined as a Rail plane. That is, Rail N achieves interconnection for all #N GPUs via the N-th Leaf switch. As shown in the figure below, the GPUs on each server are numbered 0–7, corresponding to Rail 1–Rail 8. Intra-rail transmission occurs when the source and destination GPUs’ corresponding NICs are connected to the same Leaf switch. LLM (Large Language Model) training optimizes traffic distribution through hybrid parallelism strategies (Data, Tensor, and Pipeline parallelism), concentrating most traffic within nodes and within the same rail.

The Rail-only architecture adopts a single-tier network design, physically partitioning the entire cluster network into 8 independent rails. Communication between GPUs of different nodes is intra-rail, achieving “single-hop” connectivity.
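As a rough illustration of this single-hop property, the following Python sketch classifies the path between two GPUs in a Rail-only fabric. The function names and the 8-GPUs-per-server assumption are ours, not from the guide:

```python
# Path classification in a Rail-only fabric (illustrative sketch).
# Assumes 8 GPUs/NICs per server and "GPU i attaches to Leaf i" wiring.

GPUS_PER_SERVER = 8

def rail_of(gpu_index: int) -> int:
    """Rail (and Leaf switch) serving a given GPU index."""
    return gpu_index % GPUS_PER_SERVER

def hops(server_a: int, gpu_a: int, server_b: int, gpu_b: int) -> str:
    """Classify the path between two GPUs in a Rail-only fabric."""
    if server_a == server_b:
        return "intra-server (NVLink/NVSwitch, no Ethernet hop)"
    if rail_of(gpu_a) == rail_of(gpu_b):
        return "intra-rail (single Leaf hop)"
    # Different rails: data first moves to a same-rail GPU over NVLink,
    # then takes the single Leaf hop on that rail.
    return "inter-rail (NVLink to same-rail GPU first, then single Leaf hop)"
```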

Figure-1-gpu-backend-fabric-design-Rail-only Architecture

Compared to traditional Clos architectures, the Rail-only design eliminates the Spine layer. By reducing network tiers, it saves on the number of switches and optical modules, thereby reducing hardware costs. It is a cost-effective, high-performance architecture tailored for AI large model training in small-scale compute clusters.

2.2 Rail-Optimized Architecture

Building on the Rail concept, a basic building block consisting of a set of Rails is regarded as a Group, which includes several Leaf switches and GPU servers. As the cluster scale increases, expansion is achieved by horizontally stacking multiple Groups.

The compute network can be visualized as a railway system: compute nodes are “stations” loaded with computing power; Rails are “exclusive rail lines” connecting the same-numbered GPUs at each station to ensure high-speed direct access; and Groups are “standard platform” units integrating multiple tracks and their supporting switches. Through this modular stacking, an intelligent computing center can scale horizontally like building blocks, ensuring both ultra-fast intra-rail communication and efficient interconnection for 10,000-GPU clusters.

Figure-2-gpu-backend-fabric-design-Rail-optimized Architecture

As shown above, the key design of the Rail-optimized architecture is to connect the same-indexed NICs of every server to the same Leaf switch, ensuring that multi-node GPU communication completes in the fewest possible hops. In this design, communication between GPU nodes can utilize internal NVSwitch[1] paths, requiring only one network hop to reach the destination without crossing multiple switches, thus avoiding additional latency. The details are as follows:

  1. Intra-server: 8 GPUs connect to the NVSwitch via the NVLink bus, achieving low-latency intra-server communication and reducing Scale-Out network transmission pressure.
  2. Server-to-Leaf: All servers follow a uniform cabling rule: NICs are connected to multiple Leaf switches according to the “NIC1-Leaf1, NIC2-Leaf2…” pattern.
  3. Network Layer: Leaf and Spine switches are fully meshed in a 2-tier Clos architecture.
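The uniform cabling rule above can be sketched as a small plan generator; the `Leaf1…LeafN` naming and the `cabling_plan` helper are illustrative, not part of the guide:

```python
# Illustrative cabling-plan generator for the "NIC N -> Leaf N" rule
# in one Rail-optimized Group.

def cabling_plan(num_servers: int, nics_per_server: int = 8):
    """Return a (server, nic) -> leaf assignment for one Group."""
    return {(s, n): f"Leaf{n + 1}"
            for s in range(num_servers)
            for n in range(nics_per_server)}

plan = cabling_plan(4)
# Every same-indexed NIC across servers lands on the same Leaf:
assert {plan[(s, 0)] for s in range(4)} == {"Leaf1"}
```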

A key design factor in multi-stage Clos architectures is the Oversubscription Ratio. This is the ratio of total downlink bandwidth (Leaf nodes to GPU servers) to total uplink bandwidth (Leaf nodes to Spine nodes), as shown below. If the ratio is greater than 1:1, the fabric may lack sufficient capacity to handle inter-GPU traffic when downlink traffic reaches line rate, potentially causing congestion or packet loss.

Figure-3-gpu-backend-fabric-design-Oversubscription Ratio in Rail-optimized Architecture

In short, a smaller oversubscription ratio leads to non-blocking communication but higher costs, while a larger ratio reduces costs but increases congestion risk. In high-performance AI networks, a 1:1 non-blocking design is generally recommended.
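A minimal helper for this check, using example port counts for a hypothetical 64-port 400G Leaf with half its ports facing GPUs and half facing Spines:

```python
# Oversubscription ratio at a Leaf: downlink bandwidth over uplink bandwidth.
# A value of 1.0 means a non-blocking (1:1) design.

def oversubscription(downlink_gbps: float, uplink_gbps: float) -> float:
    """Downlink-to-uplink bandwidth ratio at a Leaf (1.0 = non-blocking)."""
    return downlink_gbps / uplink_gbps

# 32 x 400G down, 32 x 400G up -> 1:1 non-blocking:
assert oversubscription(32 * 400, 32 * 400) == 1.0
```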

2.3 Traffic Path Analysis

The intra-server and intra-rail communication paths are similar for both architectures. Taking the Rail-optimized architecture as an example, the following analyzes inter-GPU communication paths in different scenarios:

  • Intra-server Communication

Intra-server communication is completed via NVSwitch without passing through the external network.

Figure-4-gpu-backend-fabric-design-Intra-server Communication

  • Intra-rail Communication

Intra-rail communication is forwarded through a single Leaf switch.

Figure-5-gpu-backend-fabric-design-Intra-rail Communication

  • Inter-rail (without PXN) and Cross-group Communication

Inter-rail communication is routed through the Spine layer. Similarly, inter-group communication traverses the Spine fabric to reach its destination.

Figure-6-gpu-backend-fabric-design-Inter-rail (without PXN) and Inter-group Communication

  • Inter-rail (with PXN) Communication

With PXN[2] technology, transmission is completed in a single hop without crossing the Spine.

Figure-7-gpu-backend-fabric-design-Inter-rail (with PXN) Communication

3. Technologies Supporting Lossless Networking

3.1 DCQCN Technology

RDMA (Remote Direct Memory Access) is widely used in HPC, AI training, and storage. Originally implemented on InfiniBand, it evolved into iWARP and RoCE (RDMA over Converged Ethernet) for Ethernet transport.

RoCEv2 utilizes UDP for transport, which necessitates end-to-end congestion control via PFC (Priority Flow Control) and ECN (Explicit Congestion Notification) to guarantee lossless performance. A PFC-only strategy risks unnecessary head-of-line blocking by halting traffic too aggressively, while a standalone ECN approach may suffer from reaction-time latency, potentially leading to buffer overflows and packet loss. Consequently, a unified congestion control strategy is required to balance responsiveness with stability.

DCQCN (Data Center Quantized Congestion Notification) serves as a hybrid congestion control algorithm designed to balance throughput and latency. It triggers ECN in the early stage of congestion to proactively throttle the NIC’s transmission rate. Should congestion intensify, PFC acts as a fail-safe to prevent buffer overflows by exerting backpressure hop-by-hop.

The DCQCN operational logic follows a structured hierarchy:

  1. ECN First (Proactive Intervention): As egress queues begin to accumulate and breach WRED thresholds, the switch marks packets (CE bits). Upon receiving these marked packets, the destination node generates CNPs (Congestion Notification Packets) directed back to the sender, which then smoothly scales down its injection rate to alleviate pressure without halting traffic.
  2. PFC Second (Reactive Safeguard): If congestion persists and buffer occupancy hits the xOFF threshold, the switch issues a PAUSE frame upstream. This temporarily halts transmission for the affected queue, ensuring zero packet loss.
  3. Flow Recovery: Once buffer levels recede below the xON threshold, a RESUME frame is sent to notify the upstream device to resume the transmission.
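The three-step hierarchy can be modeled as a toy decision function; the threshold values below are arbitrary illustrations, not vendor defaults:

```python
# Toy model of the DCQCN ECN-first / PFC-second hierarchy described above.
# Queue-depth thresholds (in KB) are invented for illustration.

WRED_MIN, XOFF, XON = 100, 400, 200

def switch_action(queue_kb: int, paused: bool) -> str:
    """What an egress queue does at a given buffer occupancy."""
    if paused and queue_kb < XON:
        return "send RESUME"        # step 3: flow recovery
    if queue_kb >= XOFF:
        return "send PFC PAUSE"     # step 2: reactive safeguard
    if queue_kb >= WRED_MIN:
        return "mark ECN (CE)"      # step 1: proactive intervention
    return "forward normally"
```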

To streamline the complexities of lossless Ethernet, Asterfusion has introduced the Easy RoCE capability in AsterNOS. This feature automates optimized parameter generation and abstracts intricate configurations into business-level operations, significantly enhancing cluster maintainability.

3.2 Load Balancing Technology

ECMP (Equal-Cost Multi-Path) per-flow load balancing is the most widely used routing strategy in data center networks. It assigns packets to several paths by hashing fields, such as the IP 5-tuple. This approach is known as static load balancing.

However, per-flow hashing struggles with uniform distribution when traffic lacks entropy. The impact is severe during “elephant flows”, which overwhelm specific member links and trigger packet loss.

AI workloads further challenge this model. Deep learning relies on collective communication (e.g., All-Reduce, All-Gather, and Broadcast) that generates massive, bursty traffic reaching Terabits per second (Tbps). These operations are subject to the “straggler effect” — where congestion on a single link bottlenecks the entire training job. This makes traditional ECMP unfit for RoCEv2-based AI backend fabrics.
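A small experiment illustrates the imbalance: with only two low-entropy elephant flows, a per-flow hash can use at most two of eight links no matter how large the flows are. The flow tuples below are invented, and the MD5-based hash stands in for whatever hash a real ASIC uses:

```python
# Per-flow ECMP sketch: a 5-tuple hash pins each flow to one link,
# so few flows means few links used, regardless of flow size.
import hashlib

def ecmp_link(five_tuple: tuple, num_links: int) -> int:
    """Static per-flow path choice: hash of the 5-tuple, modulo link count."""
    digest = hashlib.md5(repr(five_tuple).encode()).digest()
    return digest[0] % num_links

# Two RoCEv2-style elephant flows (UDP, dst port 4791), differing only
# in source port:
flows = [("10.0.0.1", "10.0.1.1", 17, 4791, p) for p in (1000, 1001)]
links = {ecmp_link(f, 8) for f in flows}
assert len(links) <= 2   # at most 2 of the 8 member links carry all traffic
```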

To address this, the following solutions are introduced:

3.2.1 Adaptive Routing and Switching (ARS)

ARS is a flowlet-based load balancing technology. Leveraging hardware ALB (Auto-Load-Balancing)[3] capabilities, ARS achieves near per-packet equilibrium while mitigating packet reordering. The technology partitions a flow into a series of flowlets based on gap time. By sensing real-time link quality—such as bandwidth utilization and queue depth—ARS dynamically assigns flowlets to the most idle paths, maximizing overall fabric throughput.
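The gap-time partitioning can be sketched as follows; the 50 µs threshold and the packet timestamps are illustrative values, not ARS defaults:

```python
# Flowlet partitioning sketch: a new flowlet starts whenever the
# inter-packet gap exceeds a threshold, so each flowlet can be routed
# independently without risking reordering inside the flow.

def split_flowlets(timestamps_us, gap_us=50):
    """Group packet timestamps (microseconds) into flowlets."""
    flowlets, current = [], [timestamps_us[0]]
    for prev, now in zip(timestamps_us, timestamps_us[1:]):
        if now - prev > gap_us:
            flowlets.append(current)
            current = []
        current.append(now)
    flowlets.append(current)
    return flowlets

# A burst, a pause, then another burst -> two flowlets:
assert split_flowlets([0, 10, 20, 100, 110]) == [[0, 10, 20], [100, 110]]
```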

3.2.2 Intelligent Routing

Intelligent routing provides both dynamic and static mechanisms.

  • Dynamic Intelligent Routing: This strategy evaluates path quality based on bandwidth usage, queue occupancy, and forwarding latency. Bandwidth and queue statistics are pulled from hardware registers at millisecond-precision, while latency is monitored via INT (In-band Network Telemetry) at nanosecond-resolution. Switches exchange this real-time telemetry via BGP extensions and utilize dynamic WCMP (Weighted Cost Multipath) to steer traffic toward the optimal path, proactively eliminating bottlenecks.
  • Static Intelligent Routing: Designed for scenarios requiring high path stability, this method uses PBR (Policy-Based Routing) to enforce deterministic forwarding. By binding specific GPU traffic to dedicated physical paths (Leaf-to-Spine), it ensures a strict 1:1 non-blocking oversubscription for fixed traffic models.
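The dynamic WCMP selection above can be sketched as weight-proportional path picking; the telemetry-derived weights and spine names here are invented for illustration:

```python
# Weighted path selection sketch for dynamic WCMP: weights derived from
# measured link quality steer proportionally more traffic onto
# less-congested paths.

def wcmp_pick(paths, key):
    """Pick a path index with probability proportional to its weight."""
    total = sum(w for _, w in paths)
    point = key % total
    for i, (_, w) in enumerate(paths):
        point -= w
        if point < 0:
            return i
    return len(paths) - 1

# Telemetry says path 0 has 3x the spare capacity of path 1:
paths = [("spine1", 3), ("spine2", 1)]
picks = [wcmp_pick(paths, k) for k in range(100)]
assert picks.count(0) == 75 and picks.count(1) == 25
```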

3.2.3 Packet Spraying

Packet Spraying[4] is a per-packet load balancing technique that distributes packets uniformly across all available member links to prevent single-path congestion. It supports two primary algorithms:

  • Random: Disperses packets across members using a randomized distribution.
  • Round Robin: Sequences packets across members in a cyclic, equal-weight manner.

While packet spraying theoretically maximizes network utilization, it introduces the challenge of packet reordering due to varying link latencies. Thus, this technology requires robust hardware support, specifically high-performance NICs capable of sophisticated out-of-order reassembly at the endpoint.
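A round-robin spraying sketch shows the per-link packet assignment that later forces sequence-based reassembly at the receiving NIC; link names are illustrative:

```python
# Per-packet round-robin spraying: sequence numbers are assigned to
# member links in a cyclic, equal-weight manner. Because links may have
# different latencies, the receiver must reassemble by sequence number.
from itertools import cycle

def spray(num_packets, links):
    """Assign packet sequence numbers to links round-robin."""
    rr = cycle(links)
    return [(seq, next(rr)) for seq in range(num_packets)]

assignment = spray(6, ["link0", "link1", "link2"])
# link0 carries packets 0 and 3, link1 carries 1 and 4, and so on:
assert [l for _, l in assignment] == ["link0", "link1", "link2"] * 2
```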

4. Building A 400Gbps GPU Backend Fabric for AI Compute Network

Based on hardware cost and scalability, the following design recommendations are provided:

Table 1: Solution Design by GPU Cluster Scale

| GPU Cluster Scale | Design Recommendation |
| --- | --- |
| 32–256 GPUs | CX732Q-N as Leaf nodes in a single-tier Clos Rail-only architecture; supports up to 256 GPUs. |
| 256–1024 GPUs | CX864E-N as Leaf nodes in a single-tier Clos Rail-only architecture; supports up to 1024 GPUs. |
| 1024–2048 GPUs | CX732Q-N as Leaf nodes and CX732Q-N or CX864E-N as Spine nodes in a 2-tier Clos Rail-optimized architecture; supports up to 2048 GPUs. At least 2 Spine nodes are recommended for redundancy. |
| 2048–8192 GPUs | CX864E-N as both Leaf and Spine nodes in a 2-tier Clos Rail-optimized architecture; supports up to 8192 GPUs. |

4.1 Small-Scale Cluster Design

4.1.1 Standardized Networking Solution

Figure-8-gpu-backend-fabric-design-Standardized 400G AI Backend Network for Small-Scale Clusters

The figure above illustrates a Rail-only architecture for a 400G AI backend network consisting of 32 compute nodes (256 GPUs) with 8 CX732Q-N switches deployed as Leaf nodes. The key design principles are as follows:

  • Each GPU connects to a dedicated NIC; NICs follow the “NIC N to Leaf N” rule. Independent subnets per Rail.
  • Single-tier Clos architecture. 
  • Easy RoCE enabled on Leaf switches.

4.1.2 Hardware Selection

For small-scale 400Gbps RoCEv2 fabrics, Asterfusion CX864E-N or CX732Q-N switches are recommended. Taking the NVIDIA DGX H100 server (equipped with 8 GPUs) as a baseline, the maximum capacity for different models is summarized below:

Table 2: Max Capacity per Model (Rail-only Architecture)

| Model | Max GPUs per Switch | Max GPUs (8 Switches) | Max Servers (8 Switches) |
| --- | --- | --- | --- |
| CX732Q-N | 32 | 256 | 32 |
| CX864E-N | 128 | 1024 | 128 |

Note: CX864E-N provides 64 x 800G ports, which can be split into 128 x 400G ports.

Example: Building a 512-GPU Cluster.

To build a cluster with 64 H100 servers (512 GPUs) using CX864E-N as Leaf nodes:

  • Number of Leaf Nodes Required = 512 / 128 = 4 
  • Scalability Limit (Leafs) = 8 (matching the 8 GPUs per server)
  • Scalability Limit (GPUs) = 8 * 128 = 1024

Node Requirements and Scalability Summary:

  • Number of Leaf Nodes = Total GPUs / Max GPUs per switch.
  • Maximum Scalability (Leafs) = Number of GPUs per server. 
  • Maximum Scalability (Total GPUs) = GPUs per server * Max GPUs per switch.
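The summary above can be expressed as a small sizing helper; the port counts used in the checks follow the switch models in Table 2:

```python
# Rail-only sizing sketch following the summary formulas above.

def rail_only(total_gpus, gpus_per_server, ports_per_leaf):
    """Return (leafs needed, max leafs, max GPUs) for a Rail-only fabric."""
    leafs_needed = -(-total_gpus // ports_per_leaf)   # ceiling division
    max_leafs = gpus_per_server                       # one Leaf per rail
    max_gpus = gpus_per_server * ports_per_leaf
    return leafs_needed, max_leafs, max_gpus

# The 512-GPU CX864E-N example from the text: 4 Leafs, 8-Leaf / 1024-GPU ceiling.
assert rail_only(512, 8, 128) == (4, 8, 1024)
```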

4.2 Medium-to-Large Scale Cluster Design

4.2.1 Standardized Networking Solution

Figure-9-gpu-backend-fabric-design-Standardized 400G AI Backend Network for Medium-to-Large Clusters

The figure above depicts a Rail-optimized architecture for 128 compute nodes (1024 GPUs). It employs 24 CX864E-N switches (8 Spines, 16 Leafs) organized into two Groups, with 8 Leaf nodes per Group. Key design principles include:

  • Each GPU connects to a dedicated NIC; NICs follow the “NIC N to Leaf N” rule. Independent subnets per Rail.
  • 2-Tier Clos Fabric: Leaf and Spine switches are fully meshed. Leveraging IPv6 Link-Local, unnumbered BGP neighbors are established to exchange Rail subnet routes, eliminating the need for IP planning on interconnect interfaces.
  • 1:1 Oversubscription: To ensure non-blocking transport, the oversubscription ratio on Leaf switches is strictly maintained at 1:1.
  • Unified Lossless Fabric: Easy RoCE and advanced load balancing features are enabled on both Leaf and Spine nodes.

4.2.2 Hardware Selection

For these fabrics, we recommend CX864E-N and CX732Q-N due to their ultra-low latency. The CX864E-N offers end-to-end latency as low as 560ns, while the CX732Q-N reaches 500ns. This ensures intra-rail latency remains around 600ns and Inter-rail (3-hop) latency stays under 2μs.

In a Rail-optimized design, the number of Leaf nodes per Group matches the number of GPUs per server (Rails). For H100 servers (8 GPUs), each Group contains 8 Leaf nodes. To maintain a 1:1 oversubscription, half of the Leaf’s ports connect to GPUs and half to Spines.

Table 3: Maximum Capacity per Group (Rail-optimized Architecture)

| Leaf Model | Available 400G Ports | Max GPUs / Servers per Group |
| --- | --- | --- |
| CX732Q-N | 32 | 128 / 16 |
| CX864E-N | 128 | 512 / 64 |

Spine Node Calculation: The number of Spine nodes is determined by the port density (radix) of the Leaf nodes. If Leaf and Spine switches provide M and N ports respectively, the required number of Spines = (Total Leafs * M / 2) / N. If Leaf and Spine use identical models, the Spine count = Total Leafs / 2.

Example: Building a 4096-GPU Cluster

To build a cluster with 512 H100 servers (totaling 4096 GPUs) using CX864E-N for both Leaf and Spine layers, the calculation is as follows:

  • Leaf nodes per Group = 8
  • Max servers per Group = 128 / 2 = 64
  • Max GPUs per Group = 64 * 8 = 512
  • Number of Groups required = 4096 / 512 = 8
  • Total Leaf count = 8 (per Group) * 8 (Groups) = 64 Nodes
  • Total Spine count = 64 (Leafs) / 2 = 32 Nodes

Scalability Limits (CX864E-N as Spine/Leaf): When designing a compute network, scalability is limited by the Spine switch radix. For the CX864E-N (128 x 400G ports), the theoretical maximum scale is:

  • Max Groups Supported: 128 (Spine ports) / 8 (Leafs per Group) = 16.
  • Max Servers: 16 * 64 = 1024.
  • Max GPUs: 16 * 512 = 8192.

The following tables detail the node configuration requirements for deploying backend networks of varying GPU scales using the CX864E-N and CX732Q-N in Rail-optimized architecture:

Table 4: Node Requirements for CX864E-N

| Total GPUs / Servers | Leaf Nodes | Spine Nodes | 400G Links (per Leaf-Spine) |
| --- | --- | --- | --- |
| 256 / 32 | 4 | 2 | 32 |
| 512 / 64 | 8 | 4 | 16 |
| 1024 / 128 | 16 | 8 | 8 |
| 2048 / 256 | 32 | 16 | 4 |
| 4096 / 512 | 64 | 32 | 2 |
| 8192 / 1024 | 128 | 64 | 1 |

Table 5: Node Requirements for CX732Q-N

| Total GPUs / Servers | Leaf Nodes | Spine Nodes | 400G Links (per Leaf-Spine) |
| --- | --- | --- | --- |
| 128 / 16 | 8 | 4 | 4 |
| 256 / 32 | 16 | 8 | 2 |
| 512 / 64 | 32 | 16 | 1 |

Node Requirements Summary:

For a given cluster size, the required number of components is determined as follows:

  • Leaf Nodes per Group = Number of GPUs per server.
  • Max Servers per Group = Available Leaf ports / 2 (based on 1:1 oversubscription).
  • Max GPUs per Group = Max servers per Group * GPUs per server.
  • Total Number of Groups = Total target GPUs / Max GPUs per Group.
  • Total Leaf Count = Leaf nodes per Group * Total number of Groups.
  • Total Spine Count = (Total Leaf count * M / 2) / N, where M is the port count of the Leaf switch and N is the port count of the Spine switch.

Maximum Scalability Limits Summary:

The ultimate scale of a 2-tier Clos network is physically constrained by the Spine switch radix (port count):

  • Max Supportable Groups = Spine available ports / Leaf nodes per Group.
  • Max Supportable Servers = Max supportable Groups * Max servers per Group.
  • Max Supportable GPUs = Max supportable Groups * Max GPUs per Group.
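These formulas can be collected into one sizing sketch (1:1 oversubscription assumed; M is the Leaf port count and N the Spine port count, as above):

```python
# Rail-optimized sizing sketch implementing the summary formulas above.

def rail_optimized(total_gpus, gpus_per_server, leaf_ports, spine_ports):
    """Return (groups, total leafs, total spines, max supportable GPUs)."""
    leafs_per_group = gpus_per_server
    servers_per_group = leaf_ports // 2          # half the ports face GPUs
    gpus_per_group = servers_per_group * gpus_per_server
    groups = -(-total_gpus // gpus_per_group)    # ceiling division
    total_leafs = leafs_per_group * groups
    total_spines = (total_leafs * leaf_ports // 2) // spine_ports
    max_groups = spine_ports // leafs_per_group  # Spine radix limit
    return groups, total_leafs, total_spines, max_groups * gpus_per_group

# The 4096-GPU CX864E-N example: 8 Groups, 64 Leafs, 32 Spines, 8192-GPU ceiling.
assert rail_optimized(4096, 8, 128, 128) == (8, 64, 32, 8192)
```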

Configuration guides for small-scale and medium-to-large-scale AI Compute Backend Networks are available as companion documents, which contain the detailed configuration commands.

5. Conclusion

By leveraging Rail-only and Rail-optimized architectures, this solution minimizes communication hops between GPUs, significantly accelerating All-to-All collective performance and reducing overall training cycles. This design provides a robust and scalable framework for AI compute fabrics of any magnitude. For detailed deployment cases and configuration specifics, please refer to our Best Practices documentation.

[1] NVSwitch: A high-speed switching chip by NVIDIA designed for Scale-Up fabrics. It enables multi-GPU communication at maximum NVLink speeds within a single node.
[2] PXN (PCIe x NVLink): A pivotal NCCL technology that allows a GPU to aggregate data via NVLink to a peer GPU directly connected to a NIC. This data is then dispatched via PCIe, significantly enhancing the efficiency of cross-node collective communication.
[3] Supported by CX864E-N.
[4] Supported by CX864E-N.


2025Q4 AsterNOS-VPP for Router Release Note Version V6.1R0102P02 | https://cloudswit.ch/sonic-software-updates/asternos-vpp-v61-r0102p02/ | Fri, 13 Feb 2026 06:27:09 +0000

Enterprise SONiC Distribution (AsterNOS-VPP) for Router Release Note Version V6.1R0102P02

Date: February 9, 2026

Modify Remarks: AsterNOS-VPP_V6.1_R0102P02 released.

Target Audience

This manual is primarily intended for the following engineers.

  • Software Developers
  • Software Testers
  • Customer Site Implementers

1. Introduction

The release version is AsterNOS-VPP_V6.1_R0102P02.

AsterNOS-VPP_V6.1_R0102P02.bin for ET2508-4S4M8, ET3608-2P2S, ET3616-4P4S.
MD5: 9b3f6fc0d2825001caf0bcf271ed7c23

AsterNOS-VPP_V6.1-R0102P02_x86.img.gz  for x86_VM
MD5: 9f96cf4ba5fb1928155d0236df0cc9d1

2. Update Records

2.1 New Features

  • Support for ET3600 hardware platform
  • Support for MPLS protocol
  • Support for IP multicast functionality
  • Support for HQoS (Hierarchical Quality of Service)
  • Support for PPPoE Server functionality
  • Support for IPSec offload functionality
  • Support for RIP protocol
  • Support for MAP-CE (Mapping of Address and Port – Customer Edge) functionality
  • Support for HASH
  • Support for ACL Table priority

2.2 Bug Fixes and Optimizations

[BGP] Issue where BGP connections established using MD5 authentication failed to take effect
[Performance] Packet loss observed during extended periods of high‑load performance testing

3. List of Features

The supported features are grouped by function (Level 1 / Level 2) as follows:

  • Interface: Port Speed; Port configuration; PoE Interface; Interface Statistics; Module information acquisition; Interface Bandwidth Utilization Alert; Port batch configuration; Loopback Interface; LAN Interface; *WAN Interface
  • Layer 2 Forwarding: MAC; VLAN; QinQ; LAG; STP/MSTP; LLDP; *MVRP; Port isolation
  • IP Unicast Routing: Static routing; VRF; Policy Based Routing; BGP / MP-BGP; OSPF v2/v3; *RIP v1/v2; Routing policy; ECMP/UCMP
  • IP Service: L3 interface; ARP/NDP; ARP/ND to host; DHCPv4 Server; DHCPv4 Relay; DHCPv6 Server; DHCPv6 Relay; *DNS; *NAT; *MAP-E; *MAP-T
  • IP Multicast: IGMP Snooping; MLD Snooping; PIM
  • MPLS Service: Dynamic LSP; Static LSP; Enable/Disable on physical port; L2VPN; L3VPN
  • Tunnel: VXLAN
  • Security: BUM packet policy based on interface; Storm suppression based on interface; System user access control policy; DHCP v4/v6 Snooping; ND Snooping; ND policy; DAI (Dynamic ARP Inspection); IPSG v4/v6; ACL; *SPI (Stateful Packet Inspection); *Unicast Reverse Path Forwarding (uRPF); *PPPoE Client; *PPPoE Server; *IPsec VPN; *WireGuard security VPN
  • High Availability: MC-LAG; VRRP; BFD; Monitor Link; SLA; Routing Track; Hash
  • QoS/HQoS: Priority mapping; Queue scheduling; Rate limiting, supporting 1r3c (RFC 2697) / 2r3c (RFC 2698/4115); Shaping; Flow classification; QoS status display; *HQoS
  • Reporting and Monitoring: SPAN/ERSPAN; SNMP; NetFlow/IPFIX; Prometheus Exporter
  • User Access and Authentication: Dot1x; Portal
  • Device Management: System user/privilege management; Login method; Management; Troubleshooting information; ZTP (Zero Touch Provisioning); Host name configuration; System time management; Configuration management; License management; FTP/TFTP; Device status summary; Critical Resource Monitoring (CRM); Exception alarm; NTP Client; Log management; Diagnostic tools


APNOS for AP Release Note | https://cloudswit.ch/sonic-software-updates/apnos-version-r008/ | Mon, 02 Feb 2026 10:21:21 +0000

APNOS for AP Release Note Version R008

Date: January 21, 2026

Modify Remarks: APNOS version-R008 released

1 Preface

The purpose of this document is to provide important information about the released software version, including but not limited to the following information: running platform, important components, main features, key updates.

Target Audience

This manual is primarily intended for the following engineers.

  • Software Developers
  • Software Testers
  • Customer Site Implementers

2 Image Information

| Type | MD5 | Image |
| --- | --- | --- |
| AP6020 | 38fa440f4d7635b0b8b3506fb1d6554e | AsterAPNOS-AP6020-V4.2R008.tar |
| AP6020F | 0809c33508770d89aef0df89a076e809 | ASterAPNOS-AP6020F-V4.2R008.tar |
| AP6020W | 7107465c7aadd6800d4da39c72fb10c7 | ASterAPNOS-AP6020W-V4.2R008.tar |
| AP6031 | 68b8fa04dbb1ed650a25069de0aa1e29 | ASterAPNOS-AP6031-V4.2R008.tar |
| AP6240 | 2062d7296beb80c95c27cf7b8140d154 | ASterAPNOS-AP6240-V4.2R008.tar |
| AP6241E | 623b26fdea0accf26f629517dec15b7b | ASterAPNOS-AP6241E-V4.2R008.tar |
| AP7330 | 3439592eb1a78413c52f73d584526d66 | ASterAPNOS-AP7330-V4.2R008.tar |
| AP7360 | 85297b4f5eecc462334048512f1eed20 | ASterAPNOS-AP7360-V4.2R008.tar |
| AP7341E | 1790cbeeeb0b36aed28377f7e742cf04 | ASterAPNOS-AP7341E-V4.2R008.tar |
| AP6020WF | 8b27ff2652379cdc57ed27e88c64fb33 | ASterAPNOS-AP6020WF-V4.2R008.tar |

3 Dependent Components

| Component | Version |
| --- | --- |
| OpenWiFi | 4.2.0 |
| OpenWrt | 23.05 |
| Kernel | 5.4 & 6.1 |

4 List of Features

The supported features are grouped by level as follows:

  • Wireless features: OFDMA (Orthogonal Frequency Division Multiple Access); MU-MIMO (Multi-User, Multiple-Input, Multiple-Output); BSS Coloring; multicast/broadcast-to-unicast conversion
  • SSID: Chinese SSID; Hidden SSID
  • Roaming: 802.11k; 802.11v; 802.11r; Sticky Client Steering
  • Isolation: VLAN-based isolation; isolation between VAPs; isolation based on SSID
  • Wi-Fi 7: 802.11be; Multi-RU; 4096-QAM modulation
  • Network features: DHCP Client; VLAN (tagged/untagged); VXLAN (establish VXLAN tunnels in centralized gateway scenarios); ACL (MAC address-based whitelist and blacklist); LLDP (Link Layer Discovery Protocol); DHCP Server; LAN port (assigning different VLANs); Fiber module monitoring
  • Radio frequency management: Dynamic Frequency Selection (DFS); Auto Channel Selection (ACS) with automatic channel scanning and adjustment; manual RF disable; maximum connected clients: 100+; maximum SSIDs per radio: 8
  • Security features: WPA-PSK; WPA2-PSK; WPA-PSK + RADIUS (MAC); WPA-PSK/WPA2-PSK Personal Mixed; WPA-Enterprise (RADIUS); WPA2-Enterprise EAP-TLS; WPA-Enterprise-Mixed; SAE (Simultaneous Authentication of Equals); WPA2/WPA3 Transitional; WPA3-Enterprise EAP-TLS; WPA3-Enterprise-Mixed; Vendor-based Access Control
  • Portal authentication: built-in/local portal authentication; external portal authentication; portal + RADIUS; Portal 2.0
  • AAA: RADIUS-authenticated SSID; RADIUS MAC-Auth (MAB); Dynamic VLAN; Accounting; Escape
  • Management: time settings (NTP, time zone); metric collection (device status information, client information, health data reporting); device reboot; device upgrade (patch); restore factory settings; remote terminal (rtty); remote commands/scripts; Telemetry (wifi-frame, dhcp, state); ZTP
  • Maintenance: SSH; wireline & wireless tracing (PCAP cloud remote troubleshooting: capture packets remotely through the cloud management platform); log
  • System administration: station lifecycle information reporting; syslog client
  • Station key events: wifi event; dhcp event; reject by ACL or RADIUS; association greater than max number; reject for low wifi signal; reject by Vendor Access List
  • Inspection: association list; wifi radio bandwidth usage
  • Warning: device; service
  • GPS: GPS data reporting
  • Tech support
  • QoS: rate limit (bandwidth throttling based on SSID); priority (DSCP mapping based on port, DSCP mapping based on DNS); WMM (Wi-Fi Multimedia)
  • License: license control; license management for radio frequency
  • OpenWiFi sync: synced with latest V4.2

OpenWiFi Controller Version V1.0R009 Update in 2025 Q4 | https://cloudswit.ch/sonic-software-updates/openwifi-controller-v1-0r009/ | Fri, 30 Jan 2026 10:51:03 +0000

Campus OpenWiFi Controller Release Note Version V1.0R009

Date: January 14, 2026

Modify Remarks: Campus Controller –V1.0R009 released.

Target Audience

This manual is primarily intended for the following engineers.

  • Software Developers
  • Software Testers
  • Customer Site Implementers

1 Description

The release version is V1.0R009, the specific information is as follows:

  • Image:
    • controller_V1.0_R09.bin, md5sum:ef8507accedb4b54c5a2b77536945671
    • controller_V1.0_R09_arm.bin, md5sum:8e159d5a1d9c253f98d1eefaafd82ff2
    • CX-M-SW-2025.12.25-00229.bin, md5sum:90f02f52f8572021f8141dcb88548cd0

uCentral Client for switch

  • Supporting user manual
    • User Manual-Campus Controller Usage Guide-en-v9.0

2 Update Records

2.1 New Feature

  • Network and Topology
    • Support open cloud interconnection scenarios
  • OLT Stick/ONU Stick management
    • Support status management
    • Support configuration deployment/work mode switching/blacklist management
    • Support O&M alarms
  • Operations and Visualization
    • Support Dashboard monitoring data indicators by client dimension/operations dimension
    • Support AP alarms/inspection/patch management
    • Support AP RF signal shutdown
    • Support device diagnostic operation commands
  • Client Management
    • Support full life cycle management of wireless clients

2.2 Improvements and Enhancements

  • WEBUI Page Optimization
    • Optimize the Dashboard monitoring metrics to jump to device/client details
    • Optimize OAuth 2.0 login process
    • Optimize switch device interface panel display
    • Optimize styles of some chart components
  • Upgrade and Installation
    • Support controller/switch/AP file upgrade via URL download
    • Support uploading controller image files/switch version/AP version files with real-time progress bar display

2.3 Bugs

  • Operations and Maintenance
    • After sorting the alert list, paging does not work and the default sorting method is restored
    • After sorting the terminal list, paging does not work and the default sorting method is restored
  • Others
    • Corrected internationalization information for some entries
    • Fixed occasional login failure issue for OAuth 2.0 users

2.4 Document Changes

2.4.1 User Guide

  • Added configuration instructions for new features

3 Upgrade Considerations

  • The controller adapts to switch version AsterNOS-V5.2R013 and later; other switch versions need to update the uCentral client. Refer to Chapter Three for the corresponding uCentral client version.
  • A controller deployed on the cloud needs port access policies added. The following uses an Alibaba Cloud deployment as an example:
| Term | Direction | Policy | IP Protocol | Port Range | Priority | SIP Address | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| intranet | ingress | Accept | UDP | 64218/64218 | 1 | 0.0.0.0/0 | olt stick |
| intranet | ingress | Accept | TCP | 16013/16013 | 1 | 0.0.0.0/0 | owupgrade |
| intranet | ingress | Accept | TCP | 16012/16012 | 1 | 0.0.0.0/0 | owmgmt |
| intranet | ingress | Accept | TCP | 16011/16011 | 1 | 0.0.0.0/0 | owm |
| intranet | ingress | Accept | TCP | 16008/16008 | 1 | 0.0.0.0/0 | ownacm |
| intranet | ingress | Accept | TCP | 16006/16006 | 1 | 0.0.0.0/0 | owsub |
| intranet | ingress | Accept | TCP | 15002/15002 | 1 | 0.0.0.0/0 | owgw |
| intranet | ingress | Accept | TCP | 16002/16003 | 1 | 0.0.0.0/0 | owgw |
| intranet | ingress | Accept | TCP | 16004/16004 | 1 | 0.0.0.0/0 | owfms |
| intranet | ingress | Accept | TCP | 16009/16009 | 1 | 0.0.0.0/0 | owanalytics |
| intranet | ingress | Accept | TCP | 16005/16005 | 1 | 0.0.0.0/0 | owprov |
| intranet | ingress | Accept | TCP | 16001/16001 | 1 | 0.0.0.0/0 | owsec |
| intranet | ingress | Accept | TCP | 5912/5913 | 1 | 0.0.0.0/0 | owgw |
| intranet | ingress | Accept | TCP | 8088/8088 | 1 | 0.0.0.0/0 | port-server |
| intranet | ingress | Accept | TCP | 443/443 | 100 | 0.0.0.0/0 | port-server |


AsterNOS Data Center-Fast CNP Congestion Notification Technology White Paper | https://cloudswit.ch/whitepapers/fast-cnp-congestion-notification-technology/ | Fri, 23 Jan 2026 08:29:22 +0000

Data Center Fast CNP Congestion Notification Technology White Paper

1 Background

In data center and high-performance network environments, network congestion is a critical issue affecting data transmission efficiency and service quality.
As shown in the figure below, traditional congestion control mechanisms require network devices to detect congestion, mark the ECN field in packets, and forward them to the traffic receiver. After receiving the marked packets, the receiver sends a CNP (Congestion Notification Packet)[1] to the traffic sender through upper-layer protocols such as RoCEv2, and the sender reduces its transmission rate upon receiving the CNP. This prolonged feedback path can delay congestion notification by up to half an RTT (Round-Trip Time)[2], preventing sending servers from throttling their traffic in time. Buffer occupancy in forwarding devices then keeps rising, deepening the congestion and potentially triggering network-wide traffic suspension through PFC flow control.


Figure 1: Traditional CNP Congestion Feedback Path

To address the slow feedback issue in traditional congestion control mechanisms, the industry has introduced Fast CNP (Fast Congestion Notification Packet) technology. By optimizing congestion marking and feedback paths, this technology significantly improves the real-time responsiveness and effectiveness of network congestion control, making it a core technology for modern data center network optimization.


Figure 2: Fast CNP Congestion Feedback Path

2 Operating Principles

2.1 Basic Concepts

Fast CNP technology incorporates the following concepts:

Table 1: Fast CNP Related Terms and Definitions

| Term | Definition |
|---|---|
| Flow | A group of packets sharing common attributes (typically the IP 5-tuple) |
| Flow Table | A collection of entries recording sender and receiver IP addresses and QP numbers extracted from packets |
| Session | A network communication connection established over the RoCEv2 protocol for data exchange |

2.2 Flow Table Maintenance

Fast CNP technology actively learns information from packets passing through the device and establishes relevant flow tables on switches, thereby obtaining information about traffic senders and receivers. When congestion occurs, the switch directly constructs corresponding CNPs for flows on the congested path and sends them to senders to reduce the transmission rate of relevant flows, achieving rapid congestion feedback. Flow entries in the flow table support aging mechanisms based on either entry capacity or time. When disconnect request packets are detected, corresponding flow entries are removed from the flow table.

2.2.1 Flow Table Establishment


Figure 3: RoCEv2 Session Establishment Process

As shown above, when a sender and receiver interact through the RoCEv2 protocol, session establishment is completed through a four-way CM message exchange.

By capturing CM interaction messages, the switch can extract key information including source/destination IP, source/destination QP, and determine whether the corresponding RoCEv2 session has been successfully established. When a session is successfully established, the corresponding flow entry is added to the flow table.
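
As a rough illustration of this learning step (the function and entry layout below are hypothetical, not AsterNOS code), the switch can be thought of as installing a flow entry keyed by the learned IP addresses and the receiver's QP:

```python
import time
from dataclasses import dataclass

@dataclass
class FlowEntry:
    src_ip: str         # traffic sender, extracted from CM messages
    dst_ip: str         # traffic receiver
    src_qp: int         # sender QP (later used as the destination QP of a CNP)
    dst_qp: int         # receiver QP
    last_active: float  # refreshed while the session carries traffic

# Hypothetical flow table keyed by (src_ip, dst_ip, dst_qp).
flow_table: dict[tuple, FlowEntry] = {}

def on_session_established(src_ip: str, dst_ip: str,
                           src_qp: int, dst_qp: int) -> None:
    """Install a flow entry once the four-way CM exchange confirms the session."""
    key = (src_ip, dst_ip, dst_qp)
    flow_table[key] = FlowEntry(src_ip, dst_ip, src_qp, dst_qp, time.time())
```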

2.2.2 Flow Table Updates

Figure 4: RoCEv2 Data Interaction Process

After connection establishment, the sender and receiver complete data exchange within the session through RC Send/Write/Read and RC ACK messages.

RC Send/Write/Read messages contain only the receiver’s QP number, while RC ACK messages contain only the sender’s QP number. During data exchange, the switch continuously captures Send and ACK messages in the RoCEv2 flow, extracting source/destination IPs and destination QP numbers. It then queries the flow table and updates the flow expiration time, ensuring that active entries in the flow table do not age out while the switch is carrying service traffic.
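
Modeling the flow table simply as a map from flow key to last-active timestamp (a hypothetical sketch, not the switch implementation), the refresh step looks like:

```python
def refresh_flow(flow_table: dict, key: tuple, now: float) -> None:
    """On capturing an RC Send/Write/Read or RC ACK message, update the flow's
    expiration time so that active entries never age out under live traffic."""
    if key in flow_table:       # only refresh entries learned via CM capture
        flow_table[key] = now
```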

2.2.3 Flow Table Aging


Figure 5: RoCEv2 Session Disconnection Process

After data interaction is complete, the sender and receiver complete session disconnection through a two-way CM exchange.

By capturing RoCEv2 CM messages, the switch can extract source/destination IPs and destination QP numbers, and determine whether the corresponding RoCEv2 session has been disconnected. When a session is disconnected, the flow table is queried and the corresponding flow entry is removed, achieving a session-state-based flow table aging mechanism.

Additionally, aging mechanisms based on entry capacity or time are supported. If the number of flow entries in the flow table reaches the user-configured flow table size, newly added flow entries will replace the least active flow entries in the table, preventing entry resource overflow. When a RoCEv2 session has no data exchange for an extended period and the idle time exceeds the user-configured threshold, the switch considers the session expired and removes the corresponding flow entry from the flow table.
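
Both auxiliary mechanisms can be sketched with the same timestamp-map model (the capacity and threshold values below are illustrative; on the switch they are user-configured):

```python
MAX_ENTRIES = 4      # illustrative flow table capacity
IDLE_TIMEOUT = 30.0  # illustrative idle threshold, in seconds

flow_table: dict[tuple, float] = {}  # flow key -> last-active timestamp

def add_flow(key: tuple, now: float) -> None:
    """Capacity-based aging: a new entry replaces the least active one when full."""
    if key not in flow_table and len(flow_table) >= MAX_ENTRIES:
        least_active = min(flow_table, key=flow_table.get)
        del flow_table[least_active]
    flow_table[key] = now

def age_idle_flows(now: float) -> None:
    """Time-based aging: remove sessions idle longer than the threshold."""
    for key in [k for k, t in flow_table.items() if now - t > IDLE_TIMEOUT]:
        del flow_table[key]
```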

2.3 Congestion Feedback

2.3.1 Congestion Detection

Through forwarding delay monitoring technology, the switch can capture packets whose forwarding delay exceeds the user-configured threshold and record their forwarding delay. Since forwarding delay is strongly correlated with queue depth, the switch linearly converts the recorded delay value to queue depth and compares it with the real-time available buffer of the queue to confirm whether congestion has occurred. If congestion is detected, a CNP is constructed to notify the sender to reduce its transmission rate.
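
The delay-to-depth conversion is linear because a FIFO queue drains at the port rate, so queue depth ≈ forwarding delay × drain rate. A minimal sketch, assuming a 400 Gb/s port (the real threshold logic is device-specific):

```python
PORT_RATE_BYTES_PER_US = 50_000  # 400 Gb/s = 50,000 bytes per microsecond

def is_congested(forwarding_delay_us: float,
                 available_buffer_bytes: int) -> bool:
    """Linearly convert the measured forwarding delay to an inferred queue
    depth and compare it with the queue's real-time available buffer."""
    inferred_depth_bytes = forwarding_delay_us * PORT_RATE_BYTES_PER_US
    return inferred_depth_bytes >= available_buffer_bytes
```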

2.3.2 Congestion Notification

When congestion is confirmed on a particular path, the switch constructs a corresponding CNP for flows on that path. A properly formatted CNP that can be accepted and processed by NICs must contain the following information:

  • Source MAC address / Destination MAC address
  • Source IP address / Destination IP address / IP-DSCP value
  • Destination UDP port
  • Opcode, Destination QP number

The source MAC address/destination MAC address, source IP address/destination IP address, and destination QP number can be obtained by querying the flow table. The destination UDP port and CNP Opcode can be determined through RoCEv2 protocol specifications. The IP-DSCP value is associated with endpoint NIC configuration and is typically manually configured by users.

After constructing the appropriate CNP, the switch directly sends it to the sender, enabling timely rate reduction.
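
Putting the pieces together (a hypothetical sketch — the field names are invented, though the RoCEv2 UDP destination port 4791 and the CNP opcode 0x81 come from the RoCEv2 specification):

```python
ROCEV2_UDP_DPORT = 4791  # fixed by the RoCEv2 specification
CNP_OPCODE = 0x81        # CNP opcode defined by the RoCEv2 specification

def build_cnp(flow: dict, cnp_dscp: int) -> dict:
    """Assemble the header fields of a CNP addressed to the flow's sender.
    MAC/IP/QP fields come from the flow table; the DSCP value is
    user-configured to match the endpoint NIC."""
    return {
        "src_mac": flow["dst_mac"],  # reversed: the CNP travels toward the sender
        "dst_mac": flow["src_mac"],
        "src_ip": flow["dst_ip"],
        "dst_ip": flow["src_ip"],
        "dscp": cnp_dscp,
        "udp_dport": ROCEV2_UDP_DPORT,
        "opcode": CNP_OPCODE,
        "dst_qp": flow["src_qp"],    # the sender's QP, learned from CM capture
    }
```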

3 Typical Application Scenarios

3.1 High-Bandwidth RoCEv2 Networks in Data Centers


Figure 6: Traditional CNP Feedback Path in High-Bandwidth RoCEv2 Data Center Networks

In high-bandwidth networks, due to multiple links and flows transmitting simultaneously, link bandwidth growth often far exceeds forwarding device buffer capacity growth.

Traditional congestion feedback mechanisms—switch ECN marking plus endpoint device CNP feedback—have relatively long paths. As shown in Figure 6, when multiple servers in POD#1 interact with Server65 in POD#2 and congestion occurs at Leaf1, the switch marks congested flow packets with ECN. These ECN-marked packets flow through Spine1 and Leaf9 before reaching Server65. During this process, since Server65 has not yet sent CNP notifications to reduce the transmission rate of multiple servers in POD#1, congestion at Leaf1 will further intensify, potentially causing buffer overflow and triggering PFC flow control.

Figure 7: Fast CNP Feedback Path in High-Bandwidth RoCEv2 Data Center Networks

As shown in Figure 7, after enabling Fast CNP functionality on Leaf1, when congestion occurs at Leaf1, the switch directly sends CNPs to multiple servers in POD#1, effectively shortening the CNP feedback path. This allows senders to reduce transmission rates in time before forwarding device buffers overflow, significantly reducing PFC trigger probability and improving overall network bandwidth utilization while ensuring network-wide traffic stability.


[1] CNP (Congestion Notification Packet): A protocol control packet sent by forwarding devices or receivers to notify senders to reduce their transmission rate.
[2] RTT (Round-Trip Time): The total time required for a data packet to travel from sender to receiver and back to sender, serving as a key metric for measuring network latency.


]]>
AsterNOS for Campus Release Note Version V5.2R015 https://cloudswit.ch/sonic-software-updates/asternos-v52r015/ Fri, 23 Jan 2026 02:36:09 +0000 https://cloudswit.ch/?p=22838

Enterprise SONiC Distribution (AsterNOS) for Campus Release Note Version V5.2R015

Date: December 31, 2025

Modify Remarks: AsterNOS V5.2R015 released

1 Preface

The purpose of this document is to provide important information about the released software version, including but not limited to: the running platform, important components, main features, and key updates.

Target Audience

This manual is primarily intended for the following engineers.

  • Software Developers
  • Software Testers
  • Customer Site Implementers

2 Description

AsterNOS is a SONiC-based network operating system. The release version is AsterNOS-V5.2R015; the specific information is as follows:

  • AsterNOS-V5.2R015.bin
    • Md5sum: f2eb04a664cdf9ee5336bbbe748160e4
    • Supported models: All device models of CX102S, CX104S, CX204Y, CX206Y, CX202P, CX204P, CX206P series
  • AsterNOS-V5.2R015-x86.bin
    • Md5sum: 97686f7f07293dae8cac7ad54ee46be7
    • Supported models: CX306P, CX308P, CX532P series
  • Supporting user manual
    • AsterNOS-Command_Line_Manual-en-v5.2.15
    • AsterNOS-Configuration_Guide-en-v5.2.15
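
To confirm an image downloaded correctly, its MD5 digest should match the value listed above — for example, with a short Python helper (the local filenames are assumed to match the release names):

```python
import hashlib

# Digests published in this release note.
EXPECTED_MD5 = {
    "AsterNOS-V5.2R015.bin": "f2eb04a664cdf9ee5336bbbe748160e4",
    "AsterNOS-V5.2R015-x86.bin": "97686f7f07293dae8cac7ad54ee46be7",
}

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()

def verify_image(filename: str) -> bool:
    """Compare a downloaded image file against its published digest."""
    with open(filename, "rb") as f:
        return md5_hex(f.read()) == EXPECTED_MD5[filename]
```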

3 Dependent Components

| Component | Version |
|---|---|
| Linux kernel | 5.10.34 |
| SAI | 1.6.3 |
| FRR | 8.2.2 |
| lldpd | 1.0.5 |
| libteam | 1.29 |
| snmpd | 5.7.3 |
| redis | 5.0.3 |
| isc-dhcp | 4.3.5 |
| radvd | 2.17 |
| mstp | 0.0.9 |
| parpd | 1.0.0 |
| ndppd | 1.0.0 |
| docker-engine | 18.09.5 |

4 Update Records

4.1 New Feature & Enhance

  • Supported the dynamic RP function for PIM
  • Supported GNSS-based time synchronization in PTP
  • Supported the global ACL mode
  • Supported ACL filtering on the DSCP field
  • Supported dynamic VLAN authorization via VLAN Pool and VLAN Name
  • Supported discontinuous IP address allocation ranges on the DHCP Server
  • Supported the DHCPv6 Relay carrying the client MAC address through Option 79
  • Supported tunnel route leaking between VRFs
  • Supported VLAN-based packet statistics
  • Supported switching among STP/RSTP/MSTP modes for Spanning Tree Protocol
  • Supported viewing the differences between running-config and startup-config
  • Supported new switch models:
    • CX306P-48Y
    • CX204P-16Y
    • CX104S-8MT24GT
  • Enhanced QoS Policer specifications
  • Disabled rarely used containers by default
  • Optimized system startup time in pure Layer 2 scenarios
  • Enhanced dot1x authentication, allowing dynamic VLANs carried in challenge messages to be authorized to users
  • Optimized the dynamic IP acquisition process for interfaces: IP addresses are no longer granted to VLAN members/LAG members

4.2 Major Bug Fix

  • BGP: An abnormal “no extended-nexthop” configuration appeared in the BGP view.
  • Klish: The “show running-config” command unexpectedly lost some content.
  • Tunnel Route: When the next hop of a tunnel route was ECMP, the route failed to be installed in the chip.
  • Dot1x: During dynamic VLAN authorization, a string-valued VLAN ID caused the system to restart.

4.3 Document Changes

4.3.1 Command Line

  • Chapter 1.2 Added new view-related CLI commands
  • Chapter 1.4 Added image partition management CLI command: set-default for setting the default boot image
  • Chapter 1.5 Added enhanced configuration management commands
  • Chapter 2.1 Added license and docker management commands
  • Chapter 2.6 Added system operation and maintenance (O&M) commands
  • Chapter 2.7 Added a new subchapter for Controller Configuration
  • Chapter 3.1.8 Added descriptions and CLI commands for viewing and managing DPU expansion ports
  • Chapter 4.4 Added VLAN-related CLI commands
  • Chapter 4.7 Updated STP-related show commands
  • Chapter 5.4.1 Added support for Option 79 configuration commands
  • Chapter 5.5.9 Updated the DHCP Server address-pool configuration
  • Chapter 7.4 Added CLI commands for dynamic RP, Auto-RP configuration
  • Chapter 8.1 Added global ACL binding configuration commands
  • Chapter 14.1.3 Added CLI commands to display MACsec status and statistics

4.3.2 User Guide

  • Chapter 6.5.2.9 Added dynamic RP-BSR configuration
  • Chapter 6.5.3.3 Added dynamic RP configuration
  • Chapter 6.5.3.4 Added source RP configuration
  • Chapter 6.5.6 Added Dynamic RP configuration examples
  • Chapter 9.4.6.5 Added GNSS as the PTP clock source configuration example

5 List of Features

Features
Level 1
Level 2
CX102S, CX104S Series
CX204Y, CX206Y Series
CX202P, CX206P Series
CX308P, CX532P Series
Interface and Port
Interface Speed
10M


Interface and Port
Interface Speed
100M


Interface and Port
Interface Speed
1G

Interface and Port
Interface Speed
2.5G


Interface and Port
Interface Speed
10G
Interface and Port
Interface Speed
25G

Interface and Port
Interface Speed
40G


Interface and Port
Interface Speed
100G


Interface and Port
100G Interface Breakout
4x1G


Interface and Port
100G Interface Breakout
4x10G


Interface and Port
100G Interface Breakout
4x25G


Interface and Port
Interface MTU
Physical Interface MTU
Interface and Port
Interface MTU
Virtual Interface MTU
Interface and Port
Interface MTU
MAX Jumbo Frame Size (Default 9k)
Interface and Port
Interface Startup Delay

Interface and Port
Link Flapping

Interface and Port
Interface Statistics
Packets/Bytes;Speed; Error/Drop Packets
Interface and Port
Buffer
Buffer Management per Interface
Interface and Port
Buffer
Buffer Management per System
Interface and Port
Optical Module
Reading Information (Power Meter,Vendor,Model)
Interface and Port
PoE
LLDP Negotiates PoE Delivery Parameters


Interface and Port
PoE
Delayed Power Supply


Interface and Port
PoE
PD Alive Check


Interface and Port
PoE
POE Supply Status Diagnosis


Interface and Port
Loopback
Multiple Loopback
Interface and Port
Loopback
Loopback as Syslog Source
Interface and Port
Loopback
Loopback as FTP Source
Interface and Port
Loopback
Loopback as TFTP/FTP Source
Interface and Port
Loopback
Loopback as NTP Source
Interface and Port
Storm Suppression
Broadcast Suppression
Interface and Port
Storm Suppression
Unknown Unicast/Multicast Suppression
Interface and Port
Storm Suppression
Known Multicast Suppression
L2 Switching
MAC
Static Configuration
L2 Switching
MAC
Dynamic Learning
L2 Switching
MAC
MAC Aging
L2 Switching
MAC
MAC Address Move
L2 Switching
MAC
MAC Flapping Detection
L2 Switching
MAC
MAC Limit
L2 Switching
VLAN
VLAN Trunk Mode
L2 Switching
VLAN
VLAN Access Mode
L2 Switching
VLAN
Strategy for BUM Packets
L2 Switching
Batch VLAN
Batch VLAN Creation
L2 Switching
QinQ
Basic QinQ
L2 Switching
QinQ
Flexible QinQ
L2 Switching
Link Aggregation
Static Link Aggregation
L2 Switching
Link Aggregation
LACP Mode
L2 Switching
Port Isolation Group
Layer 2 Port Isolation
L2 Switching
MSTP
STP / RSTP / MSTP
L2 Switching
MSTP
Set BPDU Packet Interval
L2 Switching
MSTP
Set Interface State Delay Switching Time
L2 Switching
MSTP
Set Maximum Aging Time of BPDU
L2 Switching
MSTP
Set Instance Priority
L2 Switching
MSTP
Set Interface Priority
L2 Switching
MSTP
Ignore STP Results based on VLAN
L2 Switching
MSTP
BPDU Filter
L2 Switching
MSTP
BPDU Guard
L2 Switching
Loopback Detection
Strict Mode
L2 Switching
Loopback Detection
Loose Mode
L2 Switching
Loopback Detection
Loopback Action: Warning or Shutdown Interface
L2 Switching
Hash
Load Balance Hash Key (src-dst-ip, src-dst-mac, src-dst-ip-port, src-dst-mac-ip, src-dst-mac-ip-port)
L2 Switching
LLDP

L3 Switching
ARP/NDP
Static ARP/NDP
L3 Switching
ARP/NDP
Dynamic ARP/NDP
L3 Switching
ARP/NDP
ARP/NDP Aging and Update
L3 Switching
ARP/NDP
ARP/NDP Proxy
L3 Switching
ARP/NDP
ARP/NDP to Host Route
L3 Switching
Basic Route
Static Route
L3 Switching
Basic Route
ECMP
L3 Switching
BGP
IBGP
L3 Switching
BGP
EBGP
L3 Switching
BGP
MP-BGP
L3 Switching
BGP
Peer Group
L3 Switching
BGP
Route Redistribution
L3 Switching
BGP
Route Aggregation
L3 Switching
BGP
Route Reflector
L3 Switching
BGP
AS Dot Notation
L3 Switching
BGP
Graceful Restart
L3 Switching
OSPF v2
Instance: Single or Multiple
L3 Switching
OSPF v2
Stub Area
L3 Switching
OSPF v2
NSSA
L3 Switching
OSPF v2
Route Redistribution
L3 Switching
OSPF v2
MD5 Authentication
L3 Switching
OSPF v3
Instance: Single
L3 Switching
OSPF v3
Stub Area
L3 Switching
OSPF v3
NSSA
L3 Switching
OSPF v3
Route Redistribution
L3 Switching
Route policy
Route Map
L3 Switching
Route policy
IP Prefix List
L3 Switching
Policy Route
ECMP
L3 Switching
Policy Route
Master-Backup
L3 Switching
DHCPv4 Relay
Multiple DHCP Server
L3 Switching
DHCPv4 Relay
Agent IP
L3 Switching
DHCPv4 Relay
Option 82
L3 Switching
DHCPv6 Relay
Multiple DHCP Server
L3 Switching
DHCPv6 Relay
Agent IP
L3 Switching
DHCPv4 Server
Fixed Allocation by MAC+IP
L3 Switching
DHCPv4 Server
Dynamic Allocation by Address Pool
L3 Switching
DHCPv4 Server
Setting Renewal Period
L3 Switching
DHCPv4 Server
DHCP Failover
L3 Switching
DHCPv6 Server
Dynamic Allocation by Address Pool
L3 Switching
DHCPv6 Server
Setting Renewal Period
L3 Switching
DHCPv4 Client
-
L3 Switching
IPv6 Router Advertisement
Set M/0/A/L Flag
L3 Switching
IPv6 Router Advertisement
Set Prefix
L3 Switching
IPv6 Router Advertisement
Set Route Information
L3 Switching
IPv6 Router Advertisement
Set DNS
L3 Switching
IPv6 Router Advertisement
Set MTU
L3 Switching
MAC trigger
-
L3 Switching
VRF
Max VRF Instance
L3 Switching
VRF
ARP/Route Isolation
L3 Switching
VRF
Bind L3 Port to VRF
Multicast
IGMP Snooping
v1/v2/v3
Multicast
IGMP Snooping
Static Table Entry
Multicast
IGMP Snooping
Dynamic Table Entry
Multicast
IGMP Snooping
IGMP Snooping Querier
Multicast
IGMP Snooping
IGMP Snooping Proxy
Multicast
MLD Snooping
v1/v2
Multicast
MLD Snooping
Static Table Entry
Multicast
MLD Snooping
Dynamic Table Entry
Multicast
MLD Snooping
MLD Snooping Querier
Multicast
MLD Snooping
MLD Snooping Proxy
Multicast
Multicast VLAN
-
Multicast
PIMv4
PIM-SM
Multicast
PIMv4
Dynamic RP
Security
ACL
L3
Security
ACL
IACL/EACL
Security
ACL
ACL for Management
Security
TACACS+
Authentication & Authorization
Security
RADIUS
Authentication & Authorization
Security
DHCPv4/DHCPv6 Snooping
Snooping Entry Learning
Security
DHCPv4/DHCPv6 Snooping
Snooping Entry Aging
Security
DHCPv4/DHCPv6 Snooping
Snooping Entry Synchronization
Security
DHCPv4/DHCPv6 Snooping
Snooping Trust Interface
Security
ND Snooping
Snooping Entry Learning
Security
ND Snooping
Snooping Entry Aging
Security
ND Snooping
Snooping Entry Synchronization
Security
ND protection
SMAC Conformance Check

Security
ND protection
RA Guard

Security
ND protection
SAVI

Security
DAI (Dynamic ARP Inspection)
Activate based-on VLAN

Security
DAI (Dynamic ARP Inspection)
Setting Trusted Interface

Security
IPSGv4/IPSGv6 (IP Source Guard)
Activate based-on VLAN

Security
IPSGv4/IPSGv6 (IP Source Guard)
Setting Trusted Interface

Security
802.1x
Restrict VLAN

Security
802.1x
Guest VLAN

Security
802.1x
MAC Address-based 802.1x Authentication

Security
802.1x
802.1x Authentication based on Ethernet Port

Security
802.1x
Dynamic Authorization

Security
802.1x
Escape Mode

Security
Portal Authentication
Guest VLAN

Security
Portal Authentication


Security
Portal Authentication
MAC Address-based 802.1x Authentication

Security
Portal Authentication
Dynamic Authorization

Security
Portal Authentication
Escape Mode

Security
MACSec
GCM-AES-128 / 256
Only support on ASICs that integrated with MACSec chip
Security
MACSec
GCM-AES-XPN-128 / 256
Security
MACSec
Replay Protection
Security
COPP
Setting the rate of packets forwarded to CPU
Security
COPP
Setting actions for packets forwarded to CPU
QoS
Interface-based Priority Mapping
Dot1p to TC
QoS
Interface-based Priority Mapping
DSCP to TC
QoS
Interface-based Priority Mapping
TC to Queue
QoS
Traffic Policing
Filter
QoS
Traffic Policing
Remark/Drop/Forward
QoS
Traffic Shaping
Port based
QoS
Traffic Shaping
Queue based
QoS
Queue Scheduling
PQ
QoS
Queue Scheduling
DWRR
QoS
Queue Scheduling
PQ+DWRR
Reliability
Track
Track for Static Route
Reliability
Monitor Link
-
Reliability
BFD
BFD for OSPF
Reliability
BFD
BFD for BGP
Reliability
BFD
BFD for Static Route
Reliability
MC-LAG
MAC Table Synchronization
Reliability
MC-LAG
ARP/ND Table Synchronization
Reliability
MC-LAG
Peer Link
Reliability
MC-LAG
DAD Detection
Reliability
VRRPv2/v3
Setting Priority
Reliability
VRRPv2/v3
Setting Advertisement Message Interval
Reliability
VRRPv2/v3
Enabling Preemptive Mode
Reliability
VRRPv2/v3
Periodic sending of free ARP
Reliability
MAC-Scan
Scanning based-on IP Address Ranges
Reliability
MAC-Scan
Scanning based on DHCP Snooping Entry
Reliability
MAC-Scan
Scanning based on Static Snooping Entry
Reliability
System Robust
Docker automatic Recovery
Reliability
System Robust
Memory Detection for Key Processes
Network Management & Monitor
Management
SSH
Network Management & Monitor
Management
Telnet
Network Management & Monitor
Management
Console
Network Management & Monitor
SNMP
v1/v2/v3
Network Management & Monitor
Syslog
Rsyslog
Network Management & Monitor
Local User Management
-
Network Management & Monitor
System Information
-
Network Management & Monitor
NTP
-
Network Management & Monitor
PTP
1588v2
Network Management & Monitor
PTP
Smpte-2059-2
Network Management & Monitor
PTP
Aes67
Network Management & Monitor
SyncE

Network Management & Monitor
sFlow
Setting the Sampling Rate
Network Management & Monitor
sFlow
Setting the Sampling Direction
Network Management & Monitor
Mirror
SPAN
Network Management & Monitor
Mirror
RSPAN
Network Management & Monitor
Mirror
ERSPAN
Network Management & Monitor
ZTP
System Upgrade
Network Management & Monitor
ZTP
Load the Configuration
Network Management & Monitor
Cluster
Manage Devices in a Clustered Manner
Virtualization
VXLAN
v4-v4 / v4-v6
Virtualization
VXLAN
VTEP Encap/Decap
Virtualization
VXLAN
L2 Forwarding
Virtualization
VXLAN
VXLAN Mapping (VLAN-VNI(1:1), VRF-VNI)
Virtualization
VXLAN
L3 Gateway (Distributed Gateway, Centralized Gateway)
Virtualization
VXLAN
MP-BGP EVPN (Type 1/2/3/4/5)
Virtualization
VXLAN
EVPN Multihoming
Virtualization
VXLAN
Cross-connect
Virtualization
VXLAN
Multicast Mode with (S,G) per VNI
Virtualization
VXLAN
ARP suppression
Virtualization
VXLAN
Tunnel Auto Establish/Tear Down
Virtualization
GRE
GRE Tunnel Establish

Virtualization
GRE
v4-v4 / v6-v6 / v4-v6 / v6-v4


]]>
AsterNOS for Data Center Release Note Version 3.1 R0408P04 https://cloudswit.ch/sonic-software-updates/asternos-v3-1-r0408p04/ Fri, 16 Jan 2026 07:26:27 +0000 https://cloudswit.ch/?p=22708

AsterNOS for Data Center Release Note Version 3.1

Date: January 15, 2026

Modify Remarks: AsterNOS_V3.1_R0408P04 released.

Target Audience

This manual is primarily intended for the following engineers.

  • Software Developers
  • Software Testers
  • Customer Site Implementers

1 Introduction

The release version is AsterNOS_V3.1_R0408P04.
AsterNOS_V3.1_R0408P04-FL.bin for CX308P-48Y-N-V2, CX532P-N-V2 and CX732Q-N-V2.
md5: 9befab9853fd2abdcb54d09163787365
sha1: e15f7fe1ec9d281fdb9c3cc98ce54325f2a9297f

AsterNOS_V3.1_R0408P04.bin for other models.
md5: 6884ff123732de608b4310838d17dff5
sha1: 16bbf16b1ea95d7ed7467a78809db70f8f1f1240

2 List of Features

Features
Level 1
Level 2
Interfaces


Ethernet Port
1G[1]
10G[2]
25G
40G[3]
100G
200G[4]
400G[5]
800G[6]
Breakout[7]
Logical Interfaces
Ethernet port based L3 Interface
Port-Channel port based L3 Interface
SVI
Sub-interface
Loopback
Interface management
Port management
Statistics
MTU
Jumbo Frame
Optical module
CMIS Diagnostic
Presence
Reading info
L2 Switching



MAC
Static MAC configuration
Dynamic learning
MAC address move
MAC Flapping detection
MAC blackhole
MAC flushing
MAC filtering by source
VLAN
VLAN management
VLAN member mode: Access/Trunk
VLAN member type
BUM forwarding control
L2PT
Port-Channel
Port-Channel Mode: Static/LACP
LACP Parameter
Load balance mode: Static hash/ Eligible Load Balance
Load balance hash key
Hash configuration
LLDP
Working mode
LLDP Neighbor Information
STP
STP mode: MSTP
STP Parameter
Edge-port
BPDU protection
L3 Switching










IP Address
IPv4 address
IPv6 address
Secondary IP
ARP
Static ARP
Dynamic ARP
ARP aging and update
Gratuitous ARP
ARP proxy
ARP moving
ARP-to-host-routing
NDP
ND
SLAAC
NDP proxy
ND-to-host-routing
Basic routing
IPv4 static routing
IPv6 static routing
Default routing
IPv4 routing with IPv6 nexthops
Loopback Packet Control
PBR
IPv4 Policy Based Routing
IPv6 Policy Based Routing
Bind Port Type
Nexthop action
ECMP
Group member type
Load balance hash key
Hash configuration
Load balance mode: Static hash/ Eligible Load Balance
BGP
IBGP
EBGP
Peer Group
Peer Type
Route Reflection
AS-Path replace
Route redistribution
Graceful restart
MP-BGP
OSPF
OSPF Version
Network type
Instance
Area
Authentication
Route redistribution
Graceful restart
IS-IS
-
Routing Policy
Prefix Lists
Route Map
VRF
Loopback interface assignment
Inter-VRF route leaking
Management VRF
ping/ssh to VRF
DHCP
DHCPv4 server
DHCPv6 server
DHCPv4 relay
DHCPv6 relay
DHCP relay over VXLAN
Virtualization and tunnel
VXLAN
VTEP[8]
VXLAN mapping
L2 forwarding
ARP/ND suppression
VXLAN maintenance
VXLAN multicast underlay
VXLAN cross connect
BGP-EVPN
Route type
Tunnel auto establish/tear down
Anycast gateway
L3 Gateway type
Symmetry IRB
Routing dynamic population
VM migration
Inter-VRF Local Route Leaking
Multi-homing
DCI
VLAN hand-off


QoS and DCB





Classification & Scheduling
Classification
Queue scheduling
Traffic shaping
Bandwidth limiting
WRED
Queue statistics
Rewrite
Matching with ACL
Mark action
DCB
ECN
PFC
PFC Watchdog
DCBX
RoCE
RoCEv2
Easy RoCE
Load Balance
Adaptive Routing and Switching [9]
Packet Spray[10]
Security 
CoPP
Bandwidth limit for CPU port
CoPP Configuration
Storm Suppression
Suppression type
Control mode: Value-based
ACL
Match field
ACL action
ACL type
Time-ranged ACL
Control-Plane ACL
AAA
TACACS+
Radius
Port Isolation[11]
Working mode: L2
Interface type: Ethernet port
Service Operation and Reliability

Software Architecture
Apps in container
Configuration database
Warm restart
MC-LAG
Ethernet-based MC-LAG
MC-LAG peer gateway
Consistent check
Secondary ICCP Session
L3 Forwarding
Unique IP
Routing protocol: OSPF/BGP over MC-LAG
MC-LAG with EVPN
Loopback detection
BFD
BFD Mode
BFD for routing protocol
BFD acceleration[12]
SLA
Echo mode
User defined
TRACK with static route
Monitoring Link
Monitoring group
Monitoring configuration
VRRP
VRRPv2
VRRPv3
Visibility and Monitoring 
SNMP
SNMP v2
SNMP v3
SNMP Trap
Network Quality Analysis
Port Mirroring
sFlow
gRPC
In-Network-Telemetry
AsterNOS exporter
Visibility template
System info
Device monitoring
Interface
VLAN
ACL
BGP
MC-LAG
EVPN VXLAN
RoCE
AIDC Intelligent Routing[13] 

Static routing
VRF assignment
Path assignment
Failure recovery
Configuration templates
Dynamic routing
Path Quality Measurement
Path Quality Advertisement
Dynamic path selection
ECMP for multi-tenant
Adaptive Multipath Load Balancing
Multicast 

Multicast Route
IPv4 static multicast routing
multicast route counter
multicast route based policer
multicast route type
IGMP
IGMP snooping
Management
Device Management
User interface
NOS Maintenance
License
Device Information
System Management
Login & MOTD
User management
Feature Management
System configuration
System time
Syslog
Critical Resource Monitoring
NTP
PTP
DevOps
ZTP
Ansible
FTP
TFTP
SCP
Toolkit

Note:
[1] The 25GE interfaces of CX308P-48Y-N-V2 support setting the rate to 1G.
[2] The 25GE interfaces of CX308P-48Y-N-V2 support setting the rate to 10G.
[3] The 100GE interfaces of all series products support setting the rate to 40G.
[4] CX664D-N supports 200GE interfaces, which can be set to 100G/40G.
[5] CX732Q-N and CX732Q-N-V2 support 400GE interfaces, which can be set to 200G/100G/40G.
[6] CX864E-N supports 800GE interfaces, which can be set to 400G/200G/100G/50G/25G.
[7] The breakout modes supported by interfaces of different speeds are as follows: 100GE interfaces support splitting into 4x25G[10G]; 200GE interfaces support splitting into 2x100G[50G], 4x50G, or 4x25G; 400GE interfaces of CX732Q-N support splitting into 4x100G, 2x200G[100G], or 4x25G[10G], while 400GE interfaces of CX732Q-N-V2 support splitting into 4x25G[10G]; 800GE interfaces support splitting into 2x400G[200G] or 4x200G[100G].
[8] Only CX308P-48Y-N-V2, CX532P-N-V2 and CX732Q-N-V2 support VXLAN Multi VTEP.
[9] This feature is only supported on CX864E-N.
[10] This feature is only supported on CX864E-N.
[11] Port isolation is supported on CX308P-48Y-N-V2, CX532P-N-V2, and CX732Q-N-V2.
[12] Hardware BFD is supported on CX308P-48Y-N-V2, CX532P-N-V2, and CX732Q-N-V2.
[13] CX308P-48Y-N-V2, CX532P-N-V2 and CX732Q-N-V2 do not support intelligent routing.

3 Update Records

3.1 New Features

[BGP] Support for BGP AS-Notation.
[SLA] IP SLA supports jitter metric calculation and display.
[Exporter] Support for ACL configuration and statistics counters, as well as VLAN statistics counters.
[VXLAN] CX308P-48Y-N-V2, CX532P-N-V2 and CX732Q-N-V2 support VXLAN Cross-Connect and L2PT.
[VXLAN] Support for BUM traffic replication via VXLAN multicast tunnels.
[AIDC] The dynamic intelligent routing scheme supports VLAN interfaces.

3.2 Major Bug Fixes and Optimizations

[Easy RoCE] Support for specifying lossless queues.
[PTP] Support for configuring the minor_version field in PTP messages.
[ARS] ARS supports displaying the bound target Nexthop Group members.
[Easy RoCE] A prompt is added when an interface undergoes split or rate changes while an existing RoCE template is applied.
[AIDC] The “show instance” command displays configuration check results.
[MAC] Added a restriction that prevents removal from a VLAN after disabling MAC learning on an interface.
[LLDP] Capped the displayed TTL value at 65535 to prevent overflow.
[SLA] Fixed an issue where SLA couldn’t specify a LAG sub-interface as the src_port.
[AIDC] Fixed a BGP configuration failure when multiple router BGP instances with different ASNs exist.
[DHCP] Fixed an issue where relay instances with the same name across different VRFs were mistakenly treated as the same relay instance.
[BGP] Fixed an issue where BGP route advertisements failed when multiple unnumbered BGP neighbors had the same link-local address.
[Interface] Fixed an issue where the interfaces of NT devices incorrectly counted received ARP request, IPv6 NS, and RA messages in RX DRP statistics.


]]>
IPT In-band Path Telemetry Technology Whitepaper https://cloudswit.ch/whitepapers/ipt-in-band-path-telemetry/ Mon, 22 Dec 2025 11:23:36 +0000 https://cloudswit.ch/?p=22376

IPT (In-band Path Telemetry) Technology Whitepaper: Network Quality Monitoring for AI Data Centers

1 Overview

With the rapid development of high-performance applications such as AI large model training and distributed computing, AI computing networks face increasingly stringent requirements for end-to-end path quality monitoring. Traditional network monitoring technologies (such as SNMP) are limited by their “pull-only” collection mode and insufficient granularity, making them inadequate for monitoring micro-burst anomalies in network-wide path latency and packet loss.

INT (In-band Network Telemetry) represents the next generation of network quality analysis technology. Through active “push-mode” data collection by network devices, INT achieves millisecond-level data acquisition and precisely captures network anomalies. INT technology encompasses three solutions: BDC (Buffer Drop Capture), HDC (High Delay Capture), and IPT (In-band Path Telemetry). BDC and HDC solutions have been introduced in previous whitepapers and will not be elaborated upon here.

IPT is one of the standard solutions within INT technology. By replicating packets from specific traffic flows and carrying path statistics information, IPT enables precise end-to-end path quality monitoring. IPT technology configures ingress nodes, egress nodes, and transit nodes within a telemetry domain, utilizing an 8-byte Probe Marker to uniquely identify the telemetry domain. It generates probe packets along the original path, collects statistics from each node, and ultimately encapsulates the data for transmission to a collector, providing network operations with multi-dimensional analysis capabilities for network-wide path quality.

The following table compares the three solutions across different dimensions:

Solution | BDC | HDC | IPT
Trigger Condition | Queue buffer overflow causing packet drop | Queue forwarding latency reaches the configured threshold | None
Telemetry Information | Queue occupancy status | Forwarding latency | Queue depth and forwarding latency
Sampling Mechanism | Probabilistic capture, micro-burst capture | Probabilistic capture, micro-burst capture | Probabilistic capture
Focus Scenario | Buffer packet drop capture and reporting | High latency anomaly diagnosis in lossless networks | Problem localization in large-scale networks, full-path quality monitoring

Table 1: INT Technology Solution Comparison

1.1 Functional Scenarios

IPT is well-suited for end-to-end path monitoring scenarios in AI computing networks, particularly playing a critical role in the following areas:

  • Network-wide Path Quality Analysis: By collecting latency, queue status, and other information from each node, IPT identifies performance bottlenecks along the path.
  • Dynamic Path Optimization: Combined with path quality data, IPT assists in adjusting intelligent routing strategies to improve data transmission efficiency.
  • Rapid Fault Troubleshooting: Through node information carried in probe packets, IPT quickly pinpoints anomalous nodes or links.

1.2 Basic Concepts

1.2.1 IPT Packet Format
As illustrated in Figure 1, an IPT packet consists of multiple header layers, including outer L2/L3 encapsulation, GRE header, IPT Shim header, Probe Marker, and per-node statistics information fields.

IPT-Packet-Format

Figure 1: IPT Packet Format

  • L2/IPv4
    Users specify the outer encapsulation Layer 2 and IPv4 packet headers in the IPT configuration.
  • GRE Header
    Figure 2 shows the GRE Header packet format, with Table 2 containing descriptions of each field.
    Figure 2: GRE Header

GRE-Header

Field | Length (bits) | Description
C | 1 | Flag indicating whether Checksum is present
Reserved | 12 | Reserved bits
Version | 3 | Version information
Protocol Type | 16 | IPT Shim Header protocol type

Table 2: IPT GRE Header Information
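As a quick illustration of the layout in Table 2, the sketch below parses the 4-byte GRE header from raw bytes. It assumes network byte order with fields packed MSB-first, which matches the field order given above; the function name is ours, not part of any Asterfusion API.

```python
import struct

def parse_gre_header(data: bytes) -> dict:
    """Parse the 4-byte GRE header described in Table 2.

    Assumed bit layout (MSB first, network byte order):
    C (1) | Reserved (12) | Version (3) | Protocol Type (16).
    """
    if len(data) < 4:
        raise ValueError("GRE header requires 4 bytes")
    first16, proto = struct.unpack("!HH", data[:4])
    return {
        "checksum_present": bool(first16 >> 15),  # top bit: C flag
        "reserved": (first16 >> 3) & 0xFFF,       # next 12 bits
        "version": first16 & 0x7,                 # low 3 bits
        "protocol_type": proto,                   # 16-bit protocol type
    }
```

The specific Protocol Type value used for the IPT Shim Header is device-defined and is not assumed here.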

IPT Shim Header
Figure 3 shows the IPT Shim Header packet format, with Table 3 containing descriptions of each field.

IPT-Shim-Header

Figure 3: IPT Shim Header

Field | Length (bits) | Description
Next Header | 8 | Indicates the next packet header. For Ethernet II, the value is 3.
Length | 4 | Shim Header length (in 4-byte units). For IPT, this value is 4 (i.e., 4×4 = 16 bytes).
Switch ID | 16 | Identifies the Switch ID of the egress node device
Extension Header | 6 | Type of extension header. For IPT, this value is 7.

Table 3: IPT Shim Header Information

IPT Probe Marker
The IPT Probe Marker is a 64-bit user-specified value used to identify IPT packets. The most significant 2 bytes of the IPT Probe Marker must be 0.

Field | Length (bits) | Description
Probe Marker | 64 | A 64-bit user-specified value used to uniquely identify the telemetry domain.

Table 4: IPT Probe Marker Information
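Since the top 2 bytes of the Probe Marker must be 0, only the low 48 bits are user-assignable. The hedged sketch below (function names are illustrative, not an Asterfusion API) encodes a marker value and matches it against an incoming field:

```python
import struct

PROBE_MARKER_MAX = (1 << 48) - 1  # top 2 of the 8 bytes must be zero

def encode_probe_marker(value: int) -> bytes:
    """Encode a user-chosen Probe Marker as 8 bytes in network byte order.

    Per the constraint above, the usable value range is 0 .. 2**48 - 1.
    """
    if not (0 <= value <= PROBE_MARKER_MAX):
        raise ValueError("Probe Marker must fit in the low 48 bits")
    return struct.pack("!Q", value)

def is_probe_marker(field: bytes, marker: bytes) -> bool:
    """Check whether an 8-byte field matches the configured marker."""
    return len(field) >= 8 and field[:8] == marker
```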

IPT Base Header
Following the IPT Probe Marker is the IPT Base Header (4 bytes), which is used to identify the version and hop count. Figure 4 shows the IPT Base Header packet format, with Table 5 containing descriptions of each field.

Figure 4: IPT Base Header

Field | Length (bits) | Description
Version | 5 | Version of the IPT Base Header
Hop Count | 8 | Number of hops for IPT node information

Table 5: IPT Base Header Information

IPT Hop Information
Between each switching node in the telemetry domain (including ingress and egress nodes), per-hop statistics information is inserted into the transmitted IPT packet. Figure 5 shows the packet format for per-hop information, with Table 6 listing descriptions of each field.

Figure 5: IPT Hop Information Header

Field | Length (bits) | Description
Switch ID | 16 | Switch ID of the node device corresponding to this hop information
Dev Class | 6 | Unique encoding used to identify the device, used for decoding information in the packet
Queue Size Info* | 20 | Information about queue occupancy size
Dinfo 2* | 4 | Egress queue information for the IPT packet forwarded from this hop node
Dinfo 1* | 12 | Egress interface information for the IPT packet forwarded from this hop node
Egress Timestamp Info* | 20 | Timestamp information for the IPT packet forwarded from this hop node
Sinfo* | 12 | Ingress interface information for the IPT packet entering this hop node
Ingress Timestamp Info* | 20 | Timestamp information for the IPT packet entering this hop node

*Note: Decoding the corresponding real values from raw data depends on Dev Class.

Table 6: IPT Hop Information
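The field widths in Table 6 sum to 110 bits, so how the record is padded and decoded depends on Dev Class. Under the assumption of a contiguous MSB-first bit stream (an assumption of this sketch, not a statement of the wire format), a generic bit extractor can pull out the raw field values:

```python
# Field widths from Table 6, in order of appearance. Alignment/padding is
# device-specific (Dev Class); this sketch assumes MSB-first packing.
IPT_HOP_FIELDS = [
    ("switch_id", 16),
    ("dev_class", 6),
    ("queue_size_info", 20),
    ("dinfo2", 4),
    ("dinfo1", 12),
    ("egress_timestamp_info", 20),
    ("sinfo", 12),
    ("ingress_timestamp_info", 20),
]

def parse_hop_info(record: bytes) -> dict:
    """Extract the raw Table 6 fields from one hop-information record."""
    total_bits = sum(width for _, width in IPT_HOP_FIELDS)  # 110 bits
    if len(record) * 8 < total_bits:
        raise ValueError("record too short")
    value = int.from_bytes(record, "big")
    shift = len(record) * 8
    out = {}
    for name, width in IPT_HOP_FIELDS:
        shift -= width                               # consume top bits first
        out[name] = (value >> shift) & ((1 << width) - 1)
    return out
```

As the note under Table 6 says, converting these raw values into real queue sizes and timestamps still requires the Dev Class decoding rules.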

2 Working Principles

2.1 Workflow

IPT-Workflow-Diagram

Figure 6: IPT Workflow Diagram

Figure 6 illustrates the overall workflow of IPT: the ingress node generates probe packets, transit nodes collect information, and the egress node encapsulates and sends packets, achieving end-to-end path information collection. Probe packets are clones of original packets (with payload truncated), transmitted along the same path as the original packets, with statistics information inserted at each node, and ultimately sent to a user-configured collector.

2.2 Process Breakdown

2.2.1 Ingress Node

After enabling the IPT function, the ingress node identifies specific traffic flows through two methods: sampling or configuring DSCP to specify queues. It replicates the original packet and truncates the payload, inserting the Probe Marker, Base Header, and ingress node statistics information after the first sixteen bytes of UDP or TCP, generating a probe packet that is forwarded along the original packet’s forwarding path.

2.2.2 Transit Node

The transit node identifies probe packets carrying the Probe Marker, collects local node statistics information, and inserts it after the probe packet’s Base Header. The modified probe packet is then forwarded along the original packet’s forwarding path.

2.2.3 Egress Node

The egress node identifies probe packets carrying the Probe Marker, collects local node statistics information and inserts it after the probe packet’s Base Header, then adds outer encapsulation. Based on the outer MAC and IP addresses, it performs a forwarding table lookup and forwards the probe packet to the user-configured collector.

3 Typical Application Scenarios

3.1 End-to-End Path Optimization for AI Computing Networks

In a large model training scenario involving a GPU cluster with over a thousand cards, the cluster relies on a high-performance network to achieve inter-node data synchronization (such as All Reduce operations). Path quality directly impacts training efficiency. IPT technology can optimize path performance in the following areas:

1. End-to-End Path Latency Monitoring

As shown in Figure 7, during the training process, gradient data must be forwarded through multiple Leaf/Spine switches. IPT collects forwarding latency from each node through probe packets and, combined with the total latency from ingress to egress, pinpoints high-latency nodes (such as a Spine switch with abnormally elevated forwarding latency). This assists in adjusting traffic forwarding paths to avoid overall training efficiency degradation caused by single-node delays.


Figure 7: Identifying High-Latency Nodes

2. Dynamic Queue State Awareness

As shown in Figure 8, when multiple GPU servers send data through the same switch port, the egress queue may experience congestion due to traffic surges. IPT probe packets carry information such as queue occupancy size and QP (Queue Pair). Operations personnel can quickly identify congested queues and adjust buffer allocation strategies (such as increasing burst traffic handling capacity) to ensure data synchronization stability.


Figure 8: Multiple GPU Servers Sending Data Through the Same Switch Port


]]>
AsterNOS Data Center – INT Technology: BDC and HDC White Paper https://cloudswit.ch/whitepapers/int-technology-bdc-hdc/ Wed, 17 Dec 2025 08:35:14 +0000 https://cloudswit.ch/?p=22306

INT Technology: Buffer Drop Capture (BDC) and High Delay Capture (HDC)

1 Overview

With the rapid development of high-performance applications such as AI large model training and distributed computing, AI computing networks face increasingly stringent requirements for real-time performance and stability. Issues such as network latency and buffer overflow directly impact training efficiency and model accuracy. Traditional monitoring technologies (e.g., SNMP) struggle to meet complex scenario demands due to limitations such as low collection precision and insufficient granularity.

INT (In-band Network Telemetry), as a next-generation network quality analysis technology, achieves millisecond-level data collection through an active device “push” mode, accurately capturing network microbursts and anomalies. Among these, BDC (Buffer Drop Capture) and HDC (High Delay Capture) are core sub-solutions of INT technology, focusing on buffer packet loss and high latency monitoring, respectively.

  • BDC enables users to record information about data packets dropped due to buffer capacity limitations.
  • HDC allows users to record information about data packets experiencing high latency caused by queue congestion within devices.

Both BDC and HDC support sampling mechanisms such as probabilistic capture and microburst capture.

1.1 Functional Scenarios

BDC and HDC are essential technologies for data center network and AI computing network operations and troubleshooting.

Through BDC technology, when a data packet is dropped due to buffer capacity limitations, the switching device captures the first 150 bytes of the dropped packet and appends metadata, then sends it as a BDC packet to a remote collector or the local switch CPU.

Through HDC technology, the switching device captures all queue-congested packets exceeding the user-defined latency threshold, packages the first 150 bytes of the original packet along with metadata into an HDC packet, and sends it to a remote collector or the local switch CPU, while the original packet continues normal transmission.

1.2 Fundamental Concepts

1.2.1 BDC Packet Format

    BDC-Packet-Format

    Figure 1: BDC Packet Format

    • L2/IPv4
      Users specify the outer Layer 2 and IPv4 headers in the BDC configuration.
    • GRE Header
      Figure 2 shows the GRE Header packet format, with Table 1 describing each field.

    Figure 2: GRE Header

    Field | Length (bits) | Description
    C | 1 | Checksum present flag
    Reserved | 12 | Reserved bits
    Version | 3 | Version information
    Protocol Type | 16 | BDC protocol type

    Table 1: BDC GRE Header Information

    • BDC Shim Header
      Figure 3 shows the BDC Shim Header packet format, with Table 2 describing each field.

    Figure 3: BDC Shim Header

    Field | Length (bits) | Description
    Next Header | 8 | Indicates the next header. For Ethernet II, the value is 3.
    Length | 4 | Shim Header length in 4-byte units. For BDC, this value is 7 (i.e., 7×4 = 28 bytes).
    Switch ID | 16 | Identifies the device's Switch ID
    Extension Header | 6 | Type of extension header. For BDC, this value is 6.
    Sinfo* | 12 | Information about the port through which the packet entered the device.
    Dinfo* | 14 | Information about the destination port and queue where the packet was dropped.
    Dev Class | 6 | Unique device identifier encoding used to decode packet information.
    Queue Size Info* | 12 | Information about queue size.

    Table 2: BDC Shim Header Information

    *Note: Decoding actual values from raw data depends on Dev Class.

    1.2.2 HDC Packet Format


    Figure 4: HDC Packet Format

    • L2/IPv4
      Users specify the outer Layer 2 and IPv4 headers in the HDC configuration.
    • GRE Header

    Figure 5: GRE Header

    Field | Length (bits) | Description
    C | 1 | Checksum present flag
    Reserved | 12 | Reserved bits
    Version | 3 | Version information
    Protocol Type | 16 | HDC protocol type

    Table 3: HDC GRE Header Information

    • HDC Shim Header
      Figure 6 shows the HDC Shim Header packet format, with Table 4 describing each field.

    HDC-Shim-Header

    Figure 6: HDC Shim Header

    Field | Length (bits) | Description
    Next Header | 8 | Indicates the next header. For Ethernet II, the value is 3.
    Length | 4 | Shim Header length in 4-byte units. For HDC, this value is 7 (i.e., 7×4 = 28 bytes).
    Switch ID | 16 | Identifies the device's Switch ID
    Extension Header | 6 | Type of extension header. For HDC, this value is 5.
    Sinfo* | 12 | Information about the port through which the packet entered the device.
    Dinfo* | 14 | Information about the destination port and queue that triggered the high-latency capture.
    Dev Class | 6 | Unique device identifier encoding used to decode packet information.
    Queue Size Info* | 12 | Information about queue size.

    Table 4: HDC Shim Header Information

    *Note: Decoding actual values from raw data depends on Dev Class.

    2 Operating Principles

    BDC-and-HDC-Operating-Principle-Diagram

    Figure 7: BDC and HDC Operating Principle Diagram

    2.1 Configuration Distribution

    The green line segments in Figure 7 represent the BDC/HDC configuration distribution flow. First, BDC and HDC configuration information is distributed via CLI or APP to the Control APP in the AsterNOS system’s Telemetry container. It is then written to the syncd process SDK by the Innovium Shell in the Syncd container, passed through ioctl to the Innovium Driver, and finally reaches the ASIC via the PCIe channel where it takes effect.

    2.2 Feature Triggering

    BDC: When a packet is dropped due to buffer capacity limitations, the BDC feature triggers. The switching device adds an outer encapsulation to the first 150 bytes of the original packet and inserts a BDC header, sending it out as a BDC packet.

    HDC: When a packet is forwarded through the switching device and the latency caused by queue congestion exceeds the user-defined threshold, the HDC feature triggers. The ASIC clones the first 150 bytes of the original packet, adds an outer encapsulation and inserts an HDC header, sending it out as an HDC packet, while the original packet continues normal transmission.
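    The HDC trigger can be modeled in software as a per-packet latency check. The sketch below is a simplified model under our own naming (`HdcRecord`, `hdc_check` are illustrative); on real hardware the ASIC does this inline and forwards the original packet unchanged either way.

```python
from dataclasses import dataclass

HDC_SNAP_LEN = 150  # the device captures the first 150 bytes

@dataclass
class HdcRecord:
    switch_id: int
    queue: int
    latency_ns: int
    snippet: bytes  # first 150 bytes of the original packet

def hdc_check(switch_id, queue, enqueue_ns, dequeue_ns, packet, threshold_ns):
    """Return an HdcRecord if queue latency exceeds the user threshold.

    Software model of the trigger only: the metadata fields mirror the
    HDC Shim Header conceptually, not its exact bit layout.
    """
    latency = dequeue_ns - enqueue_ns
    if latency <= threshold_ns:
        return None  # no capture; packet forwarded normally
    return HdcRecord(switch_id, queue, latency, bytes(packet[:HDC_SNAP_LEN]))
```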

    2.3 Packet Collection

    Both BDC and HDC support two collection modes: remote collection and local collection, as shown by the yellow line segments in Figure 7.

    • Remote Collection
      Using the outer packet header information (L2/IPv4) configured by the user, the ASIC looks up forwarding entries to forward Telemetry Packets to the corresponding egress interface, ultimately reaching the server specified by the destination IP through network forwarding.
    • Local Collection
      After BDC and HDC are triggered, the ASIC transmits packets to the control plane CPU via the PCIe channel. The Telemetry APP in the control plane Telemetry container reads packet information through a Unix Domain Socket.

    3 Typical Application Scenarios

    3.1 AI Computing Network Operations and Troubleshooting

    In an AI training center, a GPU cluster exceeding one thousand cards is deployed for large-scale distributed model training. Multiple GPU servers within the cluster are interconnected via CX-N series switches, relying on high-performance networks to achieve inter-node data synchronization (such as All Reduce operations). At this point, low latency and zero packet loss characteristics are critical to training efficiency. BDC and HDC can play important roles in the following scenarios:

    3.1.1 BDC: Buffer Overflow Alerting and Optimization

    Figure 8: Multiple Servers Transmitting Traffic Simultaneously Through the Same Switch Port

    During training, GPU servers transmit gradient data to switches at high frequency via the RoCEv2 protocol. When multiple servers simultaneously send data through the same switch port, the egress queue buffer may approach overflow due to instantaneous traffic spikes. At this point, BDC monitors the buffer size, QP, and other information of that port queue in real-time. If buffer overflow is detected, it immediately triggers an alert and records critical data such as node ID and queue number. Network operations personnel can quickly identify the problematic queue through the AsterNOS Exporter + Grafana visualization monitoring platform (as shown in Figures 9 and 10), and prevent significant data loss leading to All Reduce synchronization delays by adjusting buffer allocation strategies (such as increasing burst traffic handling capacity for that queue).

    Figure 9: BDC Packet Information Example

    Figure 10: HDC/BDC Traffic Statistics Example

    3.1.2 HDC: Rapid High-Latency Node Localization

    During a training session, the AI platform reported abnormally slow model convergence. HDC monitored switch forwarding latency and, combined with node ID information, identified elevated forwarding latency at a Leaf switch. Operations personnel quickly located the problematic node through the AsterNOS Exporter + Grafana visualization monitoring platform (as shown in Figures 11 and 12), and subsequently adjusted traffic forwarding paths. Additionally, HDC supports full-queue configuration, ensuring all queues that might impact training are covered, preventing overall performance degradation due to single-queue latency.

    Figure 11: HDC Packet Information Example

    Figure 12: HDC/BDC Traffic Statistics Example


    ]]>
    Understanding the NETCONF XML Management Protocol https://cloudswit.ch/whitepapers/netconf-xml-management-protocol/ Fri, 05 Dec 2025 11:34:34 +0000 https://cloudswit.ch/?p=22137

    NETCONF XML Management Protocol

    1 Overview

    Traditional network device configuration management methods, such as CLI (Command Line Interface) and SNMP (Simple Network Management Protocol), have various limitations in handling complex network environments. To address these challenges, the IETF standardized NETCONF (Network Configuration Protocol) based on RFC 3535.

    NETCONF is a network configuration management protocol defined by IETF RFC standards, designed to replace SNMP. It provides a standardized mechanism for installing, manipulating, and deleting network device configurations, offering more powerful capabilities and flexibility than traditional methods. NETCONF uses XML-based data encoding and supports structured configuration operations, making network management more efficient and reliable.

    The core advantages of NETCONF include: clear separation between configuration data and operational state data, support for transaction-based configuration operations with rollback capabilities, use of XML as the data encoding format, and support for YANG (Yet Another Next Generation) data modeling language. YANG models define the structure and constraints of NETCONF protocol data, providing a standardized way to describe network device configurations and state information.

    Additionally, NETCONF supports event notification mechanisms defined in RFC 5277, allowing network devices to actively push status change events to management systems, enabling real-time network monitoring and management.

    This white paper provides a detailed introduction to the working principles, protocol structure, and typical application scenarios of NETCONF, helping readers comprehensively understand and apply this important network management technology.

    2 Working Principles

    2.1 Basic Concepts

    NETCONF uses XML-based encoding, with four basic components: tags, elements, content, and attributes.

    Key Terminology:

    • Client: Initiates operation requests to the Server, or subscribes to messages and receives notifications from the Server. This may be a script or part of a network management application.
    • Server: Executes the operation requests initiated by the Client or sends notifications to the Client. This is typically a network device.
    • Capability: Optional features that supplement the basic NETCONF specification. The protocol defines various standard capabilities, and the table below lists some common ones:

    Table 1: NETCONF Capability List

    Capability Name | Description | Reference
    Writable-Running Capability | Supports direct modification of <running> configuration | RFC 6241
    Candidate Configuration Capability | Supports <candidate> configuration and commit operations | RFC 6241
    Distinct Startup Capability | Supports separate <startup> configuration | RFC 6241
    Rollback-on-Error Capability | Supports automatic rollback on <edit-config> errors | RFC 6241
    URL Capability | Supports specifying configuration sources via URL in edit-config operations | RFC 6241
    With-defaults Capability | Supports querying and handling default values | RFC 6243
    Notification Capability | Supports event notification push mechanism | RFC 5277
    YANG Library Capability | Supports querying device-supported YANG models | RFC 8525

    The Server may or may not support these Capabilities, and the Client can query the Server’s supported Capabilities before initiating business requests.

    • Session: The Client and Server exchange messages via a secure, connection-oriented session.
    • Message: An XML document defined by the protocol standard and transmitted within a Session.
    • RPC (Remote Procedure Call): In NETCONF, an RPC is realized by exchanging the <rpc> request and <rpc-reply> response messages.
    • Configuration Data: Data required to adjust the device state from its initial condition to the desired operational state. The core characteristic is that this data is writable.
    • State Data: Data on the device that is readable but not configurable, such as read-only system status information and statistics.
    • Datastore: A conceptual repository for storing and accessing information. The specific carrier may be a file, database, memory, or a combination thereof.
    • Configuration Datastore: A datastore that holds a complete set of configuration data. The protocol defines multiple Configuration Datastores and allows devices to support them to varying extents, including but not limited to:
      • Running Configuration Datastore (<running>): The datastore that holds the device’s currently active and complete configuration. <running> always exists. If <running> is writable, the device must declare support for the Writable-Running Capability.
      • Startup Configuration Datastore (<startup>): Stores the configuration that will be loaded when the device boots. Some devices may integrate <startup> with <running>. If a device has an independent <startup>, it must declare support for the Distinct Startup Capability.
      • Candidate Configuration Datastore (<candidate>): An optional configuration datastore where the stored configuration data does not affect the device’s current running state and can be committed to <running>. If a device supports <candidate>, it must declare support for the Candidate Configuration Capability.

    Corresponding to the Configuration Datastores is the Operational State Datastore (abbreviated as <operational>), which stores the device’s complete run-time state data.
    RFC 8342 provides a complete description of this, and the relationship between these datastores is shown in the figure below:

    Configuration-Datastore-Relationships

    2.2 Workflow

    NETCONF follows a client-server architecture with a typical workflow as illustrated below:

    NETCONF-Protocol-Workflow

    3 Protocol Analysis

    The NETCONF protocol is logically divided into four distinct layers: Content Layer, Operations Layer, Message Layer, and Secure Transport Layer.

    NETCONF-Protocol-Stack

    Data above the Secure Transport Layer is XML-encoded, which is why many NETCONF elements are concisely referred to by their XML start tags.
    The Content Layer refers to the specific data and models manipulated by the Operations Layer. This content is outside the scope of the NETCONF protocol standard; readers should consult relevant standards, particularly RFC 7950, for the YANG data modeling language.
    YANG (Yet Another Next Generation) is a standardized modeling language used to define network device data and operations.

    • Function: It provides a strict schema (data hierarchy, constraints, and operations) for a network device’s configuration and state data.
    • Significance: It is the core foundation for achieving network configuration automation and programmability.
    • Encoding: YANG models are machine-readable and human-friendly, and the resulting data can be encoded in XML or JSON.

    3.1 Secure Transport Layer

    The NETCONF protocol does not strictly mandate a specific transport layer protocol between the Client and the Server; any transport protocol meeting the following requirements may be used:

    • Mandatory authentication, data integrity verification, confidentiality, and replay attack protection mechanisms.
    • Mandatory authentication mechanisms supporting both automated management tools and interactive CLI-like tools.

    Currently, NETCONF primarily uses SSH (Secure Shell) and TLS (Transport Layer Security) as transport protocols, as defined in RFC 6242 and RFC 7589.

    NETCONF over SSH is the most commonly used transport method.

    3.2 Message Layer

    The Message Layer defines fundamental XML elements used as message envelopes, including the message boundaries: <hello>, <rpc>, and <rpc-reply>, as well as the specially purposed <ok> and <rpc-error> elements.
    <hello>: A special message used by the Client and Server, exchanged immediately after connection establishment, to negotiate and exchange capabilities (e.g., supported NETCONF version). The Server’s <hello> must include the unique identifier session-id assigned to the session, which the Client can subsequently use for <lock> and <kill-session> operations. Conversely, the Client’s <hello> must not contain a session-id.
    <rpc>: The Client’s request message sent to the Server, used to encapsulate a specific operation command, including the operation name and its parameters. This element must carry an attribute named message-id, which is a message identifier generated by the Client used to correlate the Server’s reply with the request.
    <rpc-reply>: The Server’s response to an <rpc> request. It must also include the message-id attribute, and its value must match the message-id of the corresponding <rpc>. Furthermore, the <rpc-reply> must contain all attributes and attribute values present in the corresponding <rpc>, regardless of whether they are relevant to the reply.

    Depending on the request and the processing result, the <rpc-reply> may encapsulate three different types of content:

    • Specific Data Elements (<data>): When the operation is successful and the operation returns data (e.g., a retrieval operation).
    • The <ok> Element: When the operation is successful but does not return data (e.g., a configuration modification).
    • The <rpc-error> Element: When the operation fails or encounters an exception.

    When the server encounters an exceptional condition while processing a request, an <rpc-error> element MUST be returned. Some servers may support detecting and reporting multiple exceptions, which are organized into several <rpc-error> elements; these elements are order-independent.
    An <rpc-error> element MAY contain the following sub-elements:

    • error-type: The layer where the error occurred, an enumerated value, which MUST be one of: transport, rpc, protocol, or message.
    • error-tag: A concise string identifying the error type, specifically defined by the protocol.
    • error-severity: A string identifying the error level, which MUST be error or warning.
    • error-app-tag: An OPTIONAL string identifying data-model-specific or implementation-specific errors.
    • error-path: An OPTIONAL XPath expression identifying the data node that triggered the error.
    • error-message: An OPTIONAL human-readable error description.
    • error-info: An OPTIONAL element providing additional, more detailed error information, which MAY contain server application-defined structures.
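    The message-layer conventions above can be exercised with nothing but the standard library: build an <rpc> carrying a message-id, then pull the sub-elements out of any <rpc-error> in the reply. The helper names below are ours; the namespace URI is the standard NETCONF base namespace from RFC 6241.

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"

def make_get_config_rpc(message_id: str, source: str = "running") -> str:
    """Build a minimal <rpc> envelope carrying a <get-config> request."""
    rpc = ET.Element(f"{{{NC}}}rpc", {"message-id": message_id})
    gc = ET.SubElement(rpc, f"{{{NC}}}get-config")
    src = ET.SubElement(gc, f"{{{NC}}}source")
    ET.SubElement(src, f"{{{NC}}}{source}")  # e.g. <running/>
    return ET.tostring(rpc, encoding="unicode")

def parse_rpc_errors(reply_xml: str) -> list:
    """Collect the sub-elements of each <rpc-error> in an <rpc-reply>."""
    root = ET.fromstring(reply_xml)
    return [
        {child.tag.split("}")[-1]: (child.text or "").strip()
         for child in err}
        for err in root.findall(f"{{{NC}}}rpc-error")
    ]
```

    A client would match the reply to the request by comparing the message-id attribute on both envelopes, exactly as the text above requires.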

    3.3 Operations Layer

    NETCONF defines a set of fundamental operations used for managing device configurations and querying device state. There are also additional operations defined as capabilities; device support for these extra operations MUST be determined during the capability exchange phase.
    The following sections introduce the basic NETCONF operations, categorized by: Data Operations, Datastore Operations, Session Operations, and Extension Operations.

    3.3.1 <get-config>
    The <get-config> operation is used to retrieve configuration data from a specified configuration datastore.
    <get-config> supports two input parameters:
    source: Specifies the name of the configuration datastore, e.g., running.
    filter: An OPTIONAL parameter used to specify the configuration data filtering criteria. If this parameter is not provided, all configuration data from the entire configuration datastore MUST be returned. It has an OPTIONAL attribute, type, used to specify the filter type, which defaults to subtree.
    If the request is successful, the server returns an <rpc-reply> containing a <data> element. The specific configuration data is encapsulated within the <data> element.

    3.3.1.1 Subtree Filter
    A subtree filter is a data filtering mechanism used for both the <get-config> and <get> operations. It allows the client to provide an XML snippet that describes the filtering criteria, instructing the server to return only the data that satisfies the criteria, rather than returning all configuration and state data from the entire datastore.
    Specifically, a subtree filter appears as elements inside the <filter> element, and it is either empty or a data tree enclosed by a <top> element. An empty filter indicates that no data is selected, and the server SHOULD return an empty result (i.e., an empty <data> element). The <top> element is a virtual root node of the data model and does not exist in the actual data model.

    The subtree nodes within the <filter> can be categorized as follows:

    • Containment Node: A non-leaf node in the <filter> subtree that expresses the path leading to a leaf node.
    • Selection Node: An empty leaf node in the <filter> subtree, expressing the meaning of “select this node and all child nodes.” For instance, an empty <top> element is a selection node, which means “return all data,” and has the same effect as providing no filtering criteria.
    • Content Match Node: A non-empty leaf node in the <filter> subtree, expressing the meaning of “select this node and its siblings with the same value,” which implements exact matching.

    These node categories can be combined to form complex filtering criteria.

    Additionally, the subtree filter also defines rules for namespace selection (precise matching on namespaces) and attribute match. These can be combined with the aforementioned node types to enhance the filtering conditions. The latter (attribute match) has almost no application scenario when a YANG model is used as the data model, as the XML encoding of the YANG model uses element names and content to convey data information, and elements can only use XML standard attributes such as namespaces.
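    The three node categories can be combined in one small example. The element names (`top`, `interface`, `name`, `mtu`) are purely illustrative, not taken from a real YANG model; the sketch only shows the shape of a <filter type="subtree"> element.

```python
import xml.etree.ElementTree as ET

def subtree_filter(criteria_xml: str) -> ET.Element:
    """Wrap an XML snippet in a <filter type="subtree"> element.

    An empty snippet yields an empty filter, which per the text above
    means "select nothing".
    """
    flt = ET.Element("filter", {"type": "subtree"})
    if criteria_xml.strip():
        flt.append(ET.fromstring(criteria_xml))
    return flt

# <top> and <interface> are containment nodes, <name>eth0</name> is a
# content match node, and the empty <mtu/> is a selection node.
example = subtree_filter(
    "<top><interface><name>eth0</name><mtu/></interface></top>"
)
```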

    3.3.3 <edit-config>
    The <edit-config> operation is used to load the specified configuration data into a designated configuration datastore.
    <edit-config> supports the following input parameters:

    • config: The configuration data to be loaded, formatted as a <config> element containing the specific configuration data. If the server supports the URL Capability, a <url> element MAY be used instead of the <config> element. Elements within the <config> subtree MAY include an operation attribute to explicitly define the specific operational semantics. The possible values for the operation attribute are:
      • merge: Merges the configuration data. This is the default value for the operation attribute.
      • replace: Replaces the configuration data. If the configuration already exists, it is overwritten; if it does not exist, it is created.
      • create: Creates the configuration data. If the configuration already exists, an <rpc-error> with an error-tag of data-exists MUST be returned.
      • delete: Deletes the configuration data. If the configuration does not exist, an <rpc-error> with an error-tag of data-missing MUST be returned.
      • remove: Deletes the configuration data. If the configuration does not exist, the operation is silently ignored.
    • target: The name of the configuration datastore to be modified, such as running.
    • default-operation: Sets the default operation for the entire <edit-config> request. When a sub-element in the <config> does not explicitly specify an operation attribute, the server follows the behavior corresponding to this parameter’s value. The possible values are:
      • merge: As above, the configuration data is merged into the existing configuration. This is the default value for default-operation.
      • replace: As above, the configuration data replaces the existing configuration.
      • none: Does not affect configuration data already existing in the target datastore, unless the given configuration data explicitly specifies an operation attribute. Unlike merge, if the configuration data does not exist, the server SHOULD return an <rpc-error> with an error-tag of data-missing. This mode allows the server to avoid creating parent elements for the elements being deleted when processing a delete operation.
    • error-option: Specifies the server’s strategy when encountering an error during request processing. The possible values are:
      • stop-on-error: Terminates the request processing upon encountering the first error. This is the default value for error-option.
      • continue-on-error: Logs the error and continues processing when an error is encountered, potentially resulting in multiple <rpc-error> elements being returned eventually.
      • rollback-on-error: Terminates processing upon encountering an error and reverts any configurations that were successfully applied within this <edit-config> operation. This enables the server to treat the configuration changes within <edit-config> as a transaction, ensuring they are either all applied or none are applied. This requires the server to support the Rollback-on-Error Capability.
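Putting these parameters together, the following is a sketch of an <edit-config> request that mixes the default merge behavior with an explicit per-element delete. The interfaces model, namespace, and interface names are hypothetical placeholders, not a real device model.

```xml
<rpc message-id="102" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target>
      <running/>
    </target>
    <default-operation>merge</default-operation>
    <error-option>stop-on-error</error-option>
    <config>
      <interfaces xmlns="urn:example:interfaces">
        <!-- merge (the default): update the MTU, creating the leaf if absent -->
        <interface>
          <name>eth-0-1</name>
          <mtu>9000</mtu>
        </interface>
        <!-- explicit delete: returns data-missing if eth-0-2 does not exist -->
        <interface xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
                   nc:operation="delete">
          <name>eth-0-2</name>
        </interface>
      </interfaces>
    </config>
  </edit-config>
</rpc>
```

Note that the operation attribute is qualified with the NETCONF base namespace (`nc:operation`), while the configuration elements themselves stay in the data model's namespace.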

    3.3.4 <copy-config>
    The <copy-config> operation is used to create or replace a configuration datastore using another configuration datastore or a complete configuration dataset.
    <copy-config> has two MANDATORY input parameters:

    • target: The name of the configuration datastore to be created or overwritten.
    • source: The name of the configuration datastore acting as the source, or a <config> element containing the complete configuration data.

    If the server supports the URL Capability, the content of both target and source MAY be a <url> element.

    It should be noted that even if the server supports the Writable-Running Capability, it does not necessarily support using running as the value for target. Similarly, even if the server supports the URL Capability, it does not necessarily support using <url> elements for both source and target, i.e., remote-to-remote copy is not guaranteed. If the values of source and target are the same, the server SHOULD return an <rpc-error> with an error-tag of invalid-value.
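A common use of <copy-config> is saving the current configuration so that it survives a reboot. The sketch below copies running to startup; it assumes the server advertises the :startup capability.

```xml
<rpc message-id="103" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <copy-config>
    <!-- target is created or completely overwritten -->
    <target>
      <startup/>
    </target>
    <source>
      <running/>
    </source>
  </copy-config>
</rpc>
```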

    3.3.5 <delete-config>
    The <delete-config> operation is used to delete a specified configuration datastore. The <running> datastore MUST NOT be deleted.
    <delete-config> has one MANDATORY input parameter, target, which is the name of the configuration datastore to be deleted. If the server supports the URL Capability, the target MAY contain a <url> element.

    3.3.6 <lock> and <unlock>
    The <lock> operation is used to lock a configuration datastore. A configuration datastore locked by one NETCONF session cannot be modified by other NETCONF sessions or by other management plane interfaces (such as SNMP or the CLI). This ensures that the locking session can modify the configuration without concerns about race conditions or data conflicts. These locks are expected to be short-lived.
    <lock> has one MANDATORY input parameter, target, used to specify the name of the configuration datastore to be locked.
    The <lock> request fails if the lock on the target configuration datastore has already been acquired by another NETCONF session or another management plane interface.
    The server MUST release any locks held by a session if the session closes for any reason, whether due to <close-session>, <kill-session>, a transport layer error, a timeout, or detection of abnormal peer behavior.
    The client MAY also explicitly release a lock via the <unlock> request. The parameters for <unlock> are the same as for <lock>. The <unlock> request fails if the lock does not exist, or if the session holding the lock is not the current session.
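A sketch of a <lock> request, together with the error reply a server sends when another session already holds the lock. The message-id and the blocking session-id are illustrative values.

```xml
<rpc message-id="105" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <lock>
    <target>
      <running/>
    </target>
  </lock>
</rpc>

<!-- If another session already holds the lock, the server replies: -->
<rpc-reply message-id="105" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <rpc-error>
    <error-type>protocol</error-type>
    <error-tag>lock-denied</error-tag>
    <error-severity>error</error-severity>
    <error-info>
      <!-- the session currently holding the lock -->
      <session-id>454</session-id>
    </error-info>
  </rpc-error>
</rpc-reply>
```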

    3.3.7 <close-session>
    The <close-session> operation is used to gracefully terminate a session.
    When the server receives a <close-session> request, it MUST ignore all subsequent requests received for that session, release all locks and resources associated with the session, and then close the connection.

    3.3.8 <kill-session>
    The <kill-session> operation is used to forcibly terminate a session.
    When the server receives a <kill-session> request, it MUST immediately terminate the request currently being processed by that session, release associated locks and resources, and close the connection.
    <kill-session> has one MANDATORY input parameter, session-id, used to specify the session to be forcibly terminated. The session-id is obtained from the <hello> message sent by the server at the beginning of the session establishment. This operation MUST NOT be used to terminate the current session. If the session-id is equal to the session-id of the current session, the server SHOULD return an invalid-value error.
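A minimal <kill-session> request is sketched below; the session-id of the session to terminate (4 here, an illustrative value) must have been learned beforehand, for example from the server's <hello> exchange with that session or from the server's session state data.

```xml
<rpc message-id="106" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <kill-session>
    <!-- must not be the session-id of the current session -->
    <session-id>4</session-id>
  </kill-session>
</rpc>
```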

    4 Typical Application Scenarios

    4.1 Centralized Automated Management of Data Center Network Configuration

    As a standardized, model-driven network management protocol, NETCONF provides a unified, precise, and convenient method for the bulk management of network devices. The following illustrates a typical out-of-band (OOB) management network topology used for managing data center equipment.

    [Figure: Typical Data Center Network and Management Network Topology]

    A management server is deployed within the OOB management network, enabling network communication with all switches. This setup allows for centralized management of all switches, including configuration changes and state checks.

    Switches in a data center may originate from multiple vendors. NETCONF and YANG offer a relatively unified operational interface to manage cross-vendor equipment, which simplifies operations and maintenance (O&M) to a certain extent. Although the specific YANG models supported by different vendors may vary, the management application can be built on the same foundation, executing the same operations (e.g., <edit-config>) across all devices. The only difference lies in the specific operational data, which may require network administrators to write different configuration data templates for various vendor devices. Crucially, the syntax of the configuration data remains consistent, as it is always the XML encoding corresponding to the YANG model.
    The precision of device management through NETCONF is embodied in the YANG model. The YANG model allows for the modeling of every configuration node and state node, enabling the client to precisely modify and query specific nodes.
    Furthermore, NETCONF supports batch configuration with a single request. For any given device, the management server can encapsulate the required configuration changes within one or several <edit-config> requests. Specifically, the Asterfusion data center series switches support the Rollback-on-Error Capability, which guarantees the atomicity of a single <edit-config> operation—meaning the configuration changes within an <edit-config> either all take effect or none take effect. This feature prevents the device from entering an undesirable intermediate state upon encountering an error, thereby simplifying subsequent retry attempts.
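An atomic batch change of this kind can be sketched as an <edit-config> with error-option set to rollback-on-error. If any of the changes fails, the server reverts the ones already applied. The interfaces model and namespace below are hypothetical placeholders, and the request assumes the server advertises both the :writable-running and :rollback-on-error capabilities.

```xml
<rpc message-id="107" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target>
      <running/>
    </target>
    <!-- either all of the changes below take effect, or none do -->
    <error-option>rollback-on-error</error-option>
    <config>
      <interfaces xmlns="urn:example:interfaces">
        <interface>
          <name>eth-0-1</name>
          <mtu>9000</mtu>
        </interface>
        <interface>
          <name>eth-0-2</name>
          <mtu>9000</mtu>
        </interface>
      </interfaces>
    </config>
  </edit-config>
</rpc>
```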

    5 Appendix

    5.1 Protocol Standards Documentation

    The table below lists several current, valid, and relatively important RFC documents that constitute the NETCONF protocol standard:

    | Category | RFC ID | Title |
    |---|---|---|
    | Core Protocol | RFC 6241 | Network Configuration Protocol (NETCONF) |
    | Data Architecture | RFC 8342 | Network Management Datastore Architecture (NMDA) |
    | Architectural Guidance | RFC 6244 | An Architecture for Network Management Using NETCONF and YANG |
    | Extension Capabilities | RFC 6243 | With-defaults Capability for NETCONF |
    | Extension Capabilities | RFC 5277 | NETCONF Event Notifications |
    | Extension Capabilities | RFC 6470 | Network Configuration Protocol (NETCONF) Base Notifications |
    | Extension Capabilities | RFC 8639 | Subscription to YANG Notifications |
    | Extension Capabilities | RFC 8640 | Dynamic Subscription to YANG Events and Datastores over NETCONF |
    | Extension Capabilities | RFC 8641 | Subscription to YANG Notifications for Datastore Updates |
    | Extension Capabilities | RFC 8525 | YANG Library |
    | Extension Capabilities | RFC 8526 | NETCONF Extensions to Support the Network Management Datastore Architecture |
    | Security Access Control | RFC 8341 | Network Configuration Access Control Model (NACM) |
    | Secure Transport | RFC 6242 | Using the NETCONF Protocol over Secure Shell (SSH) |
    | Secure Transport | RFC 7589 | Using the NETCONF Protocol over Transport Layer Security (TLS) with Mutual X.509 Authentication |

