Add benchmarks for end-to-end metrics SDK usage #7768
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##            main   #7768   +/-   ##
=====================================
  Coverage   86.1%   86.1%
=====================================
  Files        302     302
  Lines      22046   22046
=====================================
+ Hits       18992   18993      +1
+ Misses      2674    2673      -1
  Partials     380     380
```
I added more benchmark cases to try to make it comprehensive, but the results are now unreadable. I'll probably get rid of the varying cardinality, since it doesn't impact any of the results, only do 1 and 10 attributes, and drop the no-op results (since soon we can just use the Enabled() method anyway).
I've updated the description to explain the scenarios and why they were chosen, and removed some of the unnecessary permutations of the benchmark. @MrAlias, you had some issues with the framing of the "Dynamic" benchmark case, which I tried to address above. I did implement varying degrees of cardinality for the test, but it did not impact the results at all, so I removed it.
I would probably rather have this just in the metrics SDK package. @MrAlias, are you OK with that? Otherwise I can keep it in a new module.
```go
attrsSlice := attrPool.Get().(*[]attribute.KeyValue)
defer func() {
	*attrsSlice = (*attrsSlice)[:0] // Reset.
	attrPool.Put(attrsSlice)
}()
appendAttributes(attrsLen, attrsSlice)
addOpt := addOptPool.Get().(*[]metric.AddOption)
defer func() {
	*addOpt = (*addOpt)[:0]
	addOptPool.Put(addOpt)
}()
counter.Add(ctx, 1, metric.WithAttributes(*attrsSlice...))
```
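For context, here is a minimal sketch of the pool setup the snippet above appears to rely on. The declarations of attrPool, addOptPool, and appendAttributes are reconstructed for illustration and may not match the PR's actual code.

```go
package benchmarks

import (
	"strconv"
	"sync"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

var (
	// attrPool reuses attribute slices across measurements to avoid
	// per-call allocations, per the contributor performance guide.
	attrPool = sync.Pool{
		New: func() any {
			s := make([]attribute.KeyValue, 0, 10)
			return &s
		},
	}
	// addOptPool reuses AddOption slices in the same way.
	addOptPool = sync.Pool{
		New: func() any {
			o := make([]metric.AddOption, 0, 1)
			return &o
		},
	}
)

// appendAttributes fills the pooled slice with n illustrative key-value pairs.
func appendAttributes(n int, attrs *[]attribute.KeyValue) {
	for i := 0; i < n; i++ {
		*attrs = append(*attrs, attribute.Int("key"+strconv.Itoa(i), i))
	}
}
```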
Similar to what was mentioned in this thread, this seems to be biasing results. #7790 is using these results as a comparison, but is using an internal map to further optimize its results. A similar approach could be taken, as pointed out here #7770 (comment).
If we are going to use these benchmarks to make comparisons, they need to be done in an apples-to-apples way.
#7790 does use an additional sync.Map for the PoC, but that was just to avoid a larger refactor that would re-use the existing internal storage of the SDK. The bound instrument API can be implemented without using an additional sync.Map. If that is the blocker, I can do the refactor (it is just a day or two of work I'd rather not do if it doesn't matter).
The real question is whether we are going to recommend using a sync.Map for performance-sensitive users. Today, a user would need to:
- Copy our hashing algorithm for []attribute.KeyValue
- Implement (or copy) a sync.Map that limits cardinality, and (potentially) includes a TTL.
So I don't think it is viable today. Or at least it isn't viable enough to be in our contributor guidelines. We could add libraries and functions to address those downsides. Maybe attribute.NewSetWithCaching(kvs ...attribute.KeyValue) would be ergonomic? But I consider that an alternative proposal, not the current state. It has other downsides, like incurring memory usage when no SDK is involved.
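To make that burden concrete, here is a rough sketch of the kind of cardinality-limited cache a user would have to hand-roll today. setCache, newSetCache, and the string key are illustrative placeholders, not an existing or proposed API.

```go
package main

import (
	"fmt"
	"sync"

	"go.opentelemetry.io/otel/attribute"
)

// setCache memoizes attribute.Set construction up to a fixed number of
// entries. A production version would need a stable hash for the key,
// eviction, and possibly a TTL, which is exactly the work users must
// copy or reinvent today.
type setCache struct {
	mu         sync.Mutex
	sets       map[string]attribute.Set
	maxEntries int
}

func newSetCache(max int) *setCache {
	return &setCache{sets: make(map[string]attribute.Set), maxEntries: max}
}

func (c *setCache) get(kvs ...attribute.KeyValue) attribute.Set {
	key := fmt.Sprint(kvs) // Placeholder key; order-sensitive and not a real hash.
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.sets[key]; ok {
		return s
	}
	s := attribute.NewSet(kvs...)
	if len(c.sets) < c.maxEntries {
		c.sets[key] = s // Bound cardinality by refusing new entries past the cap.
	}
	return s
}
```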
I'm not sure what to say. I was told to have performance conversations here instead of #7790. Now I'm being told this should not be compared to the performance expectation of future API changes.
I'm fine not talking about future API changes here. The problem is that I just don't know where to have this conversation.
I would propose that we:
- Agree that this PR (or something similar) represents the current state of the performance of the API + SDK, and merge it.
- Co-write a proposal and/or PoC for how we could make a cache-based approach viable.
- Debate the two proposals.
There are drawbacks to a cache-based approach, but I can't argue against a proposal that doesn't exist. I can compare prototypes against the current state of the API + SDK, which I've tried to do fairly. But if we are serious about a cache-based alternative, we should work through the design before we assume it is a viable solution to the performance problem.
I've documented what I think the best cache-based approaches are, and why they are not an acceptable solution to the stated problem in #7743 (comment)
Objective
Part of #7743. I need a benchmark that can demonstrate the performance of using our API, SDK, and attributes packages together when following our performance guide. https://github.com/open-telemetry/opentelemetry-go/blob/main/CONTRIBUTING.md#attribute-and-option-allocation-management.
I settled on benchmarking three scenarios: "Precomputed", "Dynamic", and "Naive".
In the "Precomputed" scenario, it is assumed that the attribute set being measured against is known ahead of time, and that the instrumentation author can enumerate all possible sets, and precompute whatever they want, and keep references to it.
In the "Dynamic" scenario, it is assumed that the attribute set being measured against is not known ahead of time, and that it is not feasible to enumerate all possible attribute sets ahead of time. However, this scenario still assumes bounded cardinality, as writing metrics with an unbounded cardinality is not the intended use of the API. I had originally written these benchmarks with varying overall cardinality, but the cardinality does not impact the test results, as long as it is reasonable and bounded (e.g. < 100,000).
In the "Naive" scenario, it is assumed the user uses the API in the simplest, most ergonomic way. This is an attempt to measure the "default" experience of our API + SDK that users get when they use it.
I also found that relative benchmark results did not change when different levels of parallelism are used, so all benchmark results are single-threaded.
Code location
@MrAlias IIRC you have pushed back against including benchmarks like these in the metrics SDK package, so I've put them in an internal directory. LMK if I should move them. I just want them somewhere in this repo so it is easy to evaluate changes to the attributes package that involve the metrics SDK.
Results
Observations