# CloudEvents Reading Performance Optimization

## Rationale

After implementing the initial plan `0015-cloud-events-serialization`, the current CloudEvent JSON reading implementation incurs unnecessary allocations when deserializing the `data` property. In `CloudEventEnvelopeJsonReader.ReadEnvelope`, when the `data` property is encountered, the implementation:

1. Parses the data subtree into a `JsonDocument`
2. Serializes it back to a `byte[]` via `JsonSerializer.SerializeToUtf8Bytes`
3. Stores this copy in `CloudEventEnvelopePayload.DataBytes`
4. Later deserializes from `DataBytes` to the actual payload type

This creates three allocations (the `JsonDocument`, its internal buffers, and the `byte[]` copy) that can be eliminated. Since the caller provides a `ReadOnlyMemory<byte>` containing the original JSON, we can track byte positions and use a slice of the original buffer instead of copying.

## Acceptance Criteria

- [x] `CloudEventEnvelopePayload` stores position-based tracking (`DataStart`, `DataLength`) instead of `byte[]` for the data segment
- [x] No `JsonDocument` is allocated when parsing the CloudEvent envelope
- [x] No intermediate `byte[]` copy is created for the data payload
- [x] The original buffer slice is used directly when deserializing the data payload
- [x] All existing CloudEvent reading functionality remains intact
- [x] Automated tests are written or updated to verify the new implementation
- [x] BenchmarkDotNet benchmarks are added to `./benchmarks/Benchmarks/` to measure the allocation reduction

## Technical Details

### Current Allocation Flow

```
ReadOnlyMemory<byte> cloudEvent (original buffer)
    │
    ▼
JsonSerializer.Deserialize<CloudEventEnvelopePayload>(cloudEvent.Span, options)
    │
    ▼ (inside CloudEventEnvelopePayloadJsonConverter.Read)
    │
CloudEventEnvelopeJsonReader.ReadEnvelope(ref Utf8JsonReader reader)
    │
    └─ When hitting "data" property:
         │
         ├─ JsonDocument.ParseValue(ref reader)      ← Allocation #1 (document + internal buffers)
         │
         └─ JsonSerializer.SerializeToUtf8Bytes()    ← Allocation #2 (byte[] copy)
              │
              ▼
         byte[] dataBytes stored in payload
              │
              ▼
Later: JsonSerializer.Deserialize<T>(dataBytes)      ← Re-parsing the same content
```

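For reference, the current flow can be reproduced in isolation with `System.Text.Json` alone. The sketch below is a simplified stand-in, not the library's actual `ReadEnvelope`. The names `BaselineDataExtractor` and `ExtractDataCopy` are hypothetical, and the sketch only scans top-level properties for `data`.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the current (copying) behavior of ReadEnvelope.
public static class BaselineDataExtractor
{
    // Returns a re-serialized copy of the top-level "data" value, or null if absent.
    public static byte[]? ExtractDataCopy(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected a JSON object.");
        }

        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (reader.ValueTextEquals("data"))
            {
                // Allocation #1: ParseValue advances from the property name onto
                // the value and builds a JsonDocument over it.
                using JsonDocument document = JsonDocument.ParseValue(ref reader);

                // Allocation #2: a new byte[] holding a re-serialized copy.
                return JsonSerializer.SerializeToUtf8Bytes(document.RootElement);
            }

            // Not "data": skip this property's value (objects and arrays included).
            reader.Skip();
        }

        return null;
    }
}
```

The returned array is independent of the input buffer. That independence is exactly what the position-based design gives up in exchange for avoiding both allocations.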
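### Zero-Copy Architecture (Position-Based)

The key insight is that `Utf8JsonReader.BytesConsumed` tracks the current byte position within the input buffer. By recording positions before and after skipping the `data` value, we can compute a slice of the original buffer **after** deserialization completes. These offsets are relative to the buffer the reader was created over. They map onto `cloudEvent` because the envelope is deserialized from `cloudEvent.Span` in a single call. A reader over a partial buffer, for example when deserializing from a `Stream`, would report offsets into that buffer instead.

This approach maintains full System.Text.Json (STJ) converter extensibility: users can still provide custom `JsonConverter` implementations.
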
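Here is a standalone sketch of the position-based technique, scanning only top-level properties. `DataLocator` and `Locate` are hypothetical names. The point is that only `Read`, `Skip`, and `BytesConsumed` are involved, and the result is a pair of offsets into the caller's buffer rather than a copy.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper illustrating position tracking with BytesConsumed + Skip.
public static class DataLocator
{
    public static (bool HasData, bool IsDataNull, int Start, int Length) Locate(
        ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected a JSON object.");
        }

        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (!reader.ValueTextEquals("data"))
            {
                reader.Skip(); // skip the value of any other property
                continue;
            }

            // Offset just past the "data" property name token.
            int start = (int)reader.BytesConsumed;
            reader.Read(); // move onto the first token of the value
            if (reader.TokenType == JsonTokenType.Null)
            {
                return (true, true, 0, 0);
            }

            reader.Skip(); // jump to the matching end of an object/array; no-op for primitives
            return (true, false, start, (int)reader.BytesConsumed - start);
        }

        return (false, false, 0, 0);
    }
}
```

Because the offsets point into the caller's buffer, the data segment is simply `buffer.AsMemory(start, length)`. Nothing is copied unless the segment is later deserialized.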
#### 1. Modify `CloudEventEnvelopePayload`

Change the data storage from `byte[]?` to position tracking:

```csharp
public readonly struct CloudEventEnvelopePayload
{
    // Remove: public byte[]? DataBytes { get; }
    // Add:
    public int DataStart { get; }   // Byte offset where the data value begins
    public int DataLength { get; }  // Length of the data value in bytes

    // HasData and IsDataNull remain for semantic clarity
}
```

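The positions are only meaningful together with the buffer they were measured against. One hypothetical way to make that pairing explicit is a small value type that validates the range before slicing. `DataRange` and `Resolve` are illustrative names, not part of the planned API.

```csharp
using System;

// Hypothetical value type pairing tracked offsets with a bounds-checked slice.
public readonly struct DataRange
{
    public DataRange(int start, int length)
    {
        if (start < 0 || length < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(start), "Offsets must be non-negative.");
        }

        Start = start;
        Length = length;
    }

    public int Start { get; }
    public int Length { get; }
    public bool IsEmpty => Length == 0;

    // Returns a view over the original buffer; no bytes are copied.
    public ReadOnlyMemory<byte> Resolve(ReadOnlyMemory<byte> original)
    {
        if ((long)Start + Length > original.Length)
        {
            throw new ArgumentException("Range does not fit the supplied buffer.", nameof(original));
        }

        return IsEmpty ? ReadOnlyMemory<byte>.Empty : original.Slice(Start, Length);
    }
}
```

Keeping `DataStart` and `DataLength` as plain integers, as planned, leaves this validation to the caller. `ReadOnlyMemory<byte>.Slice` throws `ArgumentOutOfRangeException` for an invalid range.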
#### 2. Modify `CloudEventEnvelopeJsonReader.ReadEnvelope`

Update the existing method to track byte positions using `BytesConsumed` instead of copying bytes:

```csharp
public static CloudEventEnvelopePayload ReadEnvelope(ref Utf8JsonReader reader)
{
    // ... existing envelope attribute parsing ...

    int dataStart = 0;
    int dataLength = 0;
    var hasData = false;
    var isDataNull = false;

    // Inside the existing property loop, after the branches for the other attributes:
    else if (reader.ValueTextEquals("data"))
    {
        // The reader is on the "data" property name. BytesConsumed already
        // includes the ':' separator, so the value (possibly preceded by
        // whitespace) starts at this offset.
        int positionBeforeDataValue = (int)reader.BytesConsumed;

        // Advance onto the first token of the data value
        if (!reader.Read())
        {
            throw new JsonException("Unexpected end of JSON while reading data.");
        }

        hasData = true;
        if (reader.TokenType == JsonTokenType.Null)
        {
            isDataNull = true;
            // dataStart and dataLength remain 0
        }
        else
        {
            // Skip the entire data subtree without parsing it into a JsonDocument.
            // For objects and arrays this moves to the matching end token;
            // for primitives the current token already covers the whole value.
            reader.Skip();

            int positionAfterDataValue = (int)reader.BytesConsumed;

            dataStart = positionBeforeDataValue;
            dataLength = positionAfterDataValue - positionBeforeDataValue;
        }
    }

    // ... rest of parsing ...

    return new CloudEventEnvelopePayload(
        type!,
        source!,
        id!,
        subject,
        time,
        dataContentType,
        dataSchema,
        extensionAttributes,
        hasData,
        isDataNull,
        dataStart,
        dataLength
    );
}
```

**Important:** `Utf8JsonReader.BytesConsumed` returns the number of bytes consumed *up to and including* the current token. For a property name, that count includes the `:` separator that follows it. We capture the position after reading the property name (before the value), then again after `Skip()` to get the complete range. The `(int)` casts are safe because the input comes from a `ReadOnlyMemory<byte>`, whose length is itself an `int`.

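The following sketch makes these offsets observable. `OffsetProbe` is a hypothetical name. The sketch also reads `Utf8JsonReader.TokenStartIndex` once the reader is on the value. That property gives the index of the value's first byte with leading whitespace skipped, so it is an alternative start offset if a whitespace-free slice is preferred.

```csharp
using System;
using System.Text.Json;

// Hypothetical probe exposing the offsets used by the position-based approach.
public static class OffsetProbe
{
    // For the first top-level "data" property, returns:
    //   AfterName  - BytesConsumed while the reader is on the property name
    //   ValueStart - TokenStartIndex once the reader is on the value token
    //   AfterValue - BytesConsumed after Skip()
    public static (long AfterName, long ValueStart, long AfterValue) Probe(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        reader.Read(); // StartObject
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (!reader.ValueTextEquals("data"))
            {
                reader.Skip();
                continue;
            }

            long afterName = reader.BytesConsumed;
            reader.Read();
            long valueStart = reader.TokenStartIndex;
            reader.Skip();
            return (afterName, valueStart, reader.BytesConsumed);
        }

        throw new JsonException("No data property.");
    }
}
```

For `{"data":  [1, 2]}` the range `[AfterName, AfterValue)` includes the two spaces before `[`, while `[ValueStart, AfterValue)` does not. Both ranges deserialize to the same value.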
#### 3. Update Extension Methods

Modify `ReadOnlyMemoryCloudEventExtensions` to slice the original buffer after deserialization:

```csharp
public static CloudEventEnvelope ReadResultWithCloudEventEnvelope(
    this ReadOnlyMemory<byte> cloudEvent,
    LightResultsCloudEventReadOptions? options = null)
{
    var readOptions = options ?? LightResultsCloudEventReadOptions.Default;

    // Deserialize through STJ (maintains converter extensibility)
    var parsedEnvelope = JsonSerializer.Deserialize<CloudEventEnvelopePayload>(
        cloudEvent.Span,
        readOptions.SerializerOptions
    );

    var isFailure = DetermineIsFailure(parsedEnvelope, readOptions);

    // Slice the original buffer using the tracked positions (zero copy)
    var dataSegment = parsedEnvelope.HasData && !parsedEnvelope.IsDataNull
        ? cloudEvent.Slice(parsedEnvelope.DataStart, parsedEnvelope.DataLength)
        : ReadOnlyMemory<byte>.Empty;

    var result = ParseResultPayload(dataSegment, isFailure, readOptions);

    // ... rest unchanged ...
}
```

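Slicing after `JsonSerializer.Deserialize` works because STJ drives the converter over `cloudEvent.Span` itself. The self-contained miniature below illustrates steps 2 and 3 together. Everything in it is hypothetical (`EnvelopeStub`, `EnvelopeStubConverter`, `EnvelopePipeline`, `OrderPlaced`). A converter registered through `JsonSerializerOptions.Converters` records the data offsets. The caller then slices the original `ReadOnlyMemory<byte>` and deserializes the slice into the payload type.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, trimmed-down envelope: only the fields needed for slicing.
public readonly record struct EnvelopeStub(string? Type, bool HasData, bool IsDataNull, int DataStart, int DataLength);

public sealed record OrderPlaced(string OrderId, decimal Total);

public sealed class EnvelopeStubConverter : JsonConverter<EnvelopeStub>
{
    public override EnvelopeStub Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        string? type = null;
        bool hasData = false, isNull = false;
        int start = 0, length = 0;

        // The reader starts on StartObject and must finish on the matching EndObject.
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (reader.ValueTextEquals("type"))
            {
                reader.Read();
                type = reader.GetString();
            }
            else if (reader.ValueTextEquals("data"))
            {
                int before = (int)reader.BytesConsumed; // offset into the caller's buffer
                reader.Read();
                hasData = true;
                if (reader.TokenType == JsonTokenType.Null)
                {
                    isNull = true;
                }
                else
                {
                    reader.Skip();
                    start = before;
                    length = (int)reader.BytesConsumed - before;
                }
            }
            else
            {
                reader.Skip();
            }
        }

        return new EnvelopeStub(type, hasData, isNull, start, length);
    }

    public override void Write(Utf8JsonWriter writer, EnvelopeStub value, JsonSerializerOptions options) =>
        throw new NotSupportedException();
}

public static class EnvelopePipeline
{
    public static T? ReadData<T>(ReadOnlyMemory<byte> cloudEvent, JsonSerializerOptions options)
    {
        // The converter runs over cloudEvent.Span, so its offsets index into cloudEvent.
        EnvelopeStub envelope = JsonSerializer.Deserialize<EnvelopeStub>(cloudEvent.Span, options);

        if (!envelope.HasData || envelope.IsDataNull)
        {
            return default;
        }

        ReadOnlyMemory<byte> data = cloudEvent.Slice(envelope.DataStart, envelope.DataLength); // no copy
        return JsonSerializer.Deserialize<T>(data.Span, options);
    }
}
```

Usage, assuming web-style property naming:

```csharp
var options = new JsonSerializerOptions(JsonSerializerDefaults.Web) { Converters = { new EnvelopeStubConverter() } };
OrderPlaced? order = EnvelopePipeline.ReadData<OrderPlaced>(cloudEventBytes, options);
```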
#### 4. Update Payload Parsing Methods

Change the signature to accept the `ReadOnlyMemory<byte>` data segment instead of extracting the data from the payload:

```csharp
private static Result ParseResultPayload(
    ReadOnlyMemory<byte> dataSegment,
    bool isFailure,
    LightResultsCloudEventReadOptions options)
{
    if (dataSegment.IsEmpty)
    {
        if (isFailure)
        {
            throw new JsonException(
                "CloudEvent failure payloads for non-generic Result must contain non-null data."
            );
        }
        return Result.Ok();
    }

    if (isFailure)
    {
        var failurePayload = JsonSerializer.Deserialize<CloudEventFailurePayload>(
            dataSegment.Span,
            options.SerializerOptions
        );
        return Result.Fail(failurePayload.Errors, failurePayload.Metadata);
    }

    // ... etc ...
}
```

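The early `IsEmpty` check in step 4 matters because `JsonSerializer.Deserialize` throws a `JsonException` when its input contains no JSON tokens. An absent or `null` `data` value therefore has to be handled before the slice reaches the serializer. The generic helper below captures the same pattern. `SegmentParser.TryParse` is a hypothetical name.

```csharp
using System;
using System.Text.Json;

public static class SegmentParser
{
    // Returns false for an empty segment (absent or null "data") instead of
    // letting JsonSerializer throw on input that contains no JSON tokens.
    public static bool TryParse<T>(ReadOnlyMemory<byte> dataSegment, JsonSerializerOptions? options, out T? value)
    {
        if (dataSegment.IsEmpty)
        {
            value = default;
            return false;
        }

        // Deserialize straight from the slice of the original buffer; no copy is made.
        value = JsonSerializer.Deserialize<T>(dataSegment.Span, options);
        return true;
    }
}
```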
#### 5. Converter Extensibility Preserved

The `CloudEventEnvelopePayloadJsonConverter` continues to work through STJ's normal deserialization pipeline. Users can:
- Register custom converters for envelope parsing
- Override behavior via `JsonSerializerOptions`
- Extend without modifying core library code

The optimization does not change how the converter is invoked; it only changes what the converter stores (positions instead of a copied `byte[]`). A custom converter that replaces it must populate `DataStart` and `DataLength` the same way, from the reader STJ passes in. Otherwise the extension methods cannot locate the data segment.

### Benchmark Design

Create `CloudEventReadingBenchmarks.cs` in `./benchmarks/Benchmarks/` with:

1. **Baseline:** Current implementation (`JsonDocument` + `SerializeToUtf8Bytes` copy)
2. **Optimized:** Position-based approach (`BytesConsumed` + `Skip` + slice)
3. **Test cases:**
   - Small data payload (~100 bytes)
   - Medium data payload (~1 KB)
   - Large data payload (~10 KB)
   - Success vs. failure payloads

BenchmarkDotNet reports execution time by default; annotate the benchmark class with `[MemoryDiagnoser]` so that allocations are reported as well.

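One possible shape for `CloudEventReadingBenchmarks.cs` is sketched below, assuming the BenchmarkDotNet NuGet package is referenced. To stay self-contained, it inlines two simplified stand-ins for the data-extraction step (top-level scan only) instead of calling the library. A real benchmark would call the old and new reading paths end to end and add a success/failure dimension, for example via a second `[Params]` property.

```csharp
using System;
using System.Text;
using System.Text.Json;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class CloudEventReadingBenchmarks
{
    // Approximate size of the "data" value in bytes (small, medium, large).
    [Params(100, 1_000, 10_000)]
    public int DataSize { get; set; }

    private byte[] _cloudEvent = Array.Empty<byte>();

    [GlobalSetup]
    public void Setup()
    {
        string text = new string('x', DataSize);
        _cloudEvent = Encoding.UTF8.GetBytes(
            "{\"specversion\":\"1.0\",\"type\":\"t\",\"source\":\"s\",\"id\":\"1\"," +
            "\"data\":{\"text\":\"" + text + "\"}}");
    }

    // Stand-in for the current path: JsonDocument + re-serialized byte[] copy.
    [Benchmark(Baseline = true)]
    public int CopyViaJsonDocument()
    {
        var reader = new Utf8JsonReader(_cloudEvent);
        reader.Read(); // StartObject
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (!reader.ValueTextEquals("data")) { reader.Skip(); continue; }
            using JsonDocument doc = JsonDocument.ParseValue(ref reader);
            byte[] copy = JsonSerializer.SerializeToUtf8Bytes(doc.RootElement);
            return copy.Length;
        }
        return 0;
    }

    // Stand-in for the optimized path: BytesConsumed + Skip + slice.
    [Benchmark]
    public int SliceViaPositions()
    {
        var reader = new Utf8JsonReader(_cloudEvent);
        reader.Read(); // StartObject
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (!reader.ValueTextEquals("data")) { reader.Skip(); continue; }
            int start = (int)reader.BytesConsumed;
            reader.Read();
            reader.Skip();
            ReadOnlyMemory<byte> slice = _cloudEvent.AsMemory(start, (int)reader.BytesConsumed - start);
            return slice.Length;
        }
        return 0;
    }
}
```

Both methods return the data length so the JIT cannot discard the work. Run the benchmark from a console entry point with `BenchmarkRunner.Run<CloudEventReadingBenchmarks>();` (namespace `BenchmarkDotNet.Running`) in a Release build.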
### Edge Cases

- **Empty data segment:** `DataStart = 0` and `DataLength = 0` when `IsDataNull` is true or `HasData` is false
- **Nested complex data:** `reader.Skip()` correctly handles any valid JSON subtree
- **Unicode and escapes:** The slice captures the raw UTF-8 bytes, including escape sequences such as `\u00e9`, and `JsonSerializer` unescapes them when it deserializes the slice
- **Whitespace:** The recorded start position lies just after the property name token, so the slice may begin with whitespace that preceded the value; this is fine because JSON allows leading whitespace and the slice is re-parsed by STJ

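These edge cases can be exercised with a short, self-contained sketch. It locates the top-level `data` value with the same `BytesConsumed`/`Skip` technique and then deserializes the slice. `EdgeCaseCheck.ReadData` is a hypothetical name, and it returns `default` for both absent and `null` data.

```csharp
using System;
using System.Text.Json;

public static class EdgeCaseCheck
{
    public static T? ReadData<T>(byte[] cloudEvent)
    {
        var reader = new Utf8JsonReader(cloudEvent);
        reader.Read(); // StartObject
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (!reader.ValueTextEquals("data"))
            {
                reader.Skip();
                continue;
            }

            int start = (int)reader.BytesConsumed;
            reader.Read();
            if (reader.TokenType == JsonTokenType.Null)
            {
                return default; // IsDataNull: nothing to slice
            }

            reader.Skip(); // handles arbitrarily nested objects and arrays
            var slice = new ReadOnlyMemory<byte>(cloudEvent, start, (int)reader.BytesConsumed - start);
            // The slice keeps raw escapes and any leading whitespace; STJ handles both.
            return JsonSerializer.Deserialize<T>(slice.Span);
        }

        return default; // HasData == false
    }
}
```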
### Breaking Changes

Replacing `byte[]? DataBytes` with `int DataStart` and `int DataLength` is a breaking change to the public API. Because the library has not been published yet, no consumers are affected.