Oversampling Mitigation#366
Conversation
|
This change has been tested in a Lambda environment using the following Lambda function: |
| """ | ||
| self._check_ended() | ||
| subsegment.parent_id = self.id | ||
| subsegment.sampled = self.sampled if subsegment.sampled == True else False |
There was a problem hiding this comment.
Why are we doing parent based sampling if the subsegment itself is sampled? Do we need this line at all?
There was a problem hiding this comment.
This is specifically for the case when a subsegment is added to an unsampled subsegment. We have to ensure that once an unsampled subsegment is created, all of its child subsegments are also unsampled.
So for example, it is possible to do this:
segment = Segment('parent_segment', 'id', 'id', True)
subsegment = Subsegment('subsegment1', 'local', segment)
subsegment.sampled = False
segment.add_subsegment(subsegment)
subsegment2 = Subsegment('subsegment2', 'local', segment)
subsegment.add_subsegment(subsegment2)
In this case, it would be illegal for subsegment2 to be sampled, so it has to take the decision of its direct parent subsegment - subsegment1- which is unsampled.
There was a problem hiding this comment.
Does it make more sense to move this assignment to the "begin_subsegment" function in recorder.py and just base it off of the parent there? Then we can just base it off of the entity here "current_entity = self.get_trace_entity()". That entity would be the direct parent no?
There was a problem hiding this comment.
Having to check two levels like that is a code smell to me.
There was a problem hiding this comment.
There are two different ways to create subsegments, which is why this had to be done on two different levels - @srprash please correct me if I'm missing something
| :param bool sampling: sampling decision for the subsegment being created, | ||
| defaults to True |
There was a problem hiding this comment.
This sampling parameter is not the same in meaning with the sampled flag on an entity.
sampling=True here means that the subsegment will inherit the sampling decision of its parent entity, and could either be sampled or not.
sampling=False would mean that the subsegment will not inherit the sampling decision and will be unsampled.
I hope I was able to explain the difference. Please reword the description. :)
| # if segment is already close, we check if we can send entire segment | ||
| # otherwise we check if we need to stream some subsegments | ||
| if self.current_segment().ready_to_send(): | ||
| if self.current_segment().ready_to_send(): |
There was a problem hiding this comment.
nit: unnecessary white space.
| def isSampled(sqs_message): | ||
| attributes = sqs_message['attributes'] | ||
|
|
||
| if 'AWSTraceHeader' not in attributes: |
There was a problem hiding this comment.
I suggest making AWSTraceHeader into a constant and reuse it in both the places in this method.
| header = entity.get_origin_trace_header() | ||
| data = header.data if header else None | ||
|
|
||
There was a problem hiding this comment.
nit: unnecessary white spaces
|
Can we also get a unit test for the emitter? |
|
@carolabadeer I know we decided on this approach in our conversation. But now when I think of it again, I agree with @atshaw43 that we should keep the SDKs consistent. There could be confusion among customers why the a |
| if not segment.sampled: | ||
| current_entity = self.get_trace_entity() | ||
|
|
||
| if not current_entity.sampled or not sampling: |
There was a problem hiding this comment.
Would looking at just current_entity.sampled be enough here?
There was a problem hiding this comment.
I don't think so. If the current entity (segment or subsegment) is sampled and a new unsampled subsegment is being added, it is necessary to check the sampling flag being set on the newly created subsegment.
|
Updated lambda handler function with the new |
|
|
||
| if not current_entity.sampled: | ||
| subsegment = DummySubsegment(segment, name) | ||
| subsegment.sampled = False |
There was a problem hiding this comment.
This code is redundant. A DummySubsegment is already unsampled.
https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/core/models/dummy_entities.py#L96
|
|
||
| return subsegment | ||
|
|
||
| def begin_subsegment_without_sampling(self, name): |
There was a problem hiding this comment.
There's lot of duplicate code here with begin_subsegment. Can you move the common code to another function and reuse?
| return None | ||
|
|
||
| subsegment = DummySubsegment(segment, name) | ||
| subsegment.sampled = False |
There was a problem hiding this comment.
Need not set sampled = False on a DummySubsegment.
| ''' | ||
| Helper method to begin_subsegment and begin_subsegment_without_sampling | ||
| ''' | ||
| result = True |
There was a problem hiding this comment.
I feel this is still problematic. I'm not sure I get the point of using result and returning it.
Also, result seem to be of variable types. Here it is a boolean but later it can be a Subsegment. This can easily confusion and bugs.
I believe there is a better way to leverage this helper method than it's callers having to check its return types and values.
_begin_subsegment_helper can accept a parameter as flag for sampling or not. The callers can simply call this method with the flag and get back a Subsegment depending on the flag's value. How does that sound?
The method definition could look like:
def _begin_subsegment_helper(name, namespace='local', beginWithoutSampling=False)

Issue #, if available: N/A
Description of changes: Implemented oversampling mitigation to support creating subsegments without sampling (control sampling decisions at the subsegment level) and an SQS message helper class.
Note: Since it is possible to modify the
sampledflag of a subsegment without the need to create a new API, this is the design implemented in the changes below. For instance, to create a new unsampled subsegment and add it to a Lambda facade segment:Similarly, the
begin_subsegmentAPI has been modified to include a defaultsamplingparameter which defaults toTrue, and can be set toFalsewhen creating a new unsampled subsegment.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.