Skip to content

Made flag for debugging build time of user created widgets#100926

Merged
gaaclarke merged 9 commits intoflutter:masterfrom
gaaclarke:log-local-creations
Apr 5, 2022
Merged

Made flag for debugging build time of user created widgets#100926
gaaclarke merged 9 commits intoflutter:masterfrom
gaaclarke:log-local-creations

Conversation

@gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented Mar 28, 2022

Turning on widget creation tracking for profile builds allows us to do performance evaluation techniques that wouldn't scale across all widgets. In this case we are using it to track build time for widgets on the CPU but it will also allow us to start tracking raster performance of widgets (cc @iskakaushik)

Added overhead (profile builds):

  • memory
    • ~50 bytes per widget for the construction location (I don't believe we are taking advantage of const to reduce this)
    • Memoization of _isLocalCreationLocation, ~5 bytes per source file used in the app that constructs widgets.
  • cpu
    • An extra timeline event for each user created widget's build
    • Extra parameter for each widget constructor call
    • String comparisons against construction file uri (memoized with HashMap) for build of each widget

fixes #100928

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I signed the CLA.
  • I listed at least one issue that this PR fixes in the description above.
  • I updated/added relevant documentation (doc comments with ///).
  • I added new tests to check the change I am making, or this PR is test-exempt.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@flutter-dashboard flutter-dashboard bot added framework flutter/packages/flutter repository. See also f: labels. tool Affects the "flutter" command-line tool. See also t: labels. labels Mar 28, 2022
@gaaclarke gaaclarke force-pushed the log-local-creations branch 5 times, most recently from 943dc64 to d9196f9 Compare March 28, 2022 22:56
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faster would be an identity hash from
Map<Uri, bool>
That will be significantly faster and as these objects are all const, you can trust identity hashes to be safe to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require changing the kernel transform to write Uri's into the _Location fields. We wouldn't want to parse the Uri in order to look up the cached result. That would potentially wipe out our wins from the identityHash. Since this is a complicated change, the kernel transform doesn't have any tests, and the performance written this way isn't bad, I'll leave that as a future optimization.

dart-lang/sdk#48694

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple notes:
there are multiple tests in widget_inspector_test.dart that E2E test the kernel transformer. Adding some more tests there would be good. I'll add one to ensure the _Location fields are still const. If they are not, that is a serious bug.
Actually I was confused. The URI class isn't const so that wouldn't work as the _Location objects need to be const.

@gaaclarke
Copy link
Member Author

I measured the BUILD times of the Flutter Gallery app to make sure we aren't introducing a lot of overhead by turning on track-widget-creation on in profile builds. The Flutter Gallery has 3 BUILD's before the first LAYOUT. Here are the results:

type 1st BUILD 2nd BUILD 3rd BUILD
w/ track-widget-creation 6.3 ms 1.8 ms 2.1 ms
w/o track-widget-creation 6ms 1.8 ms 2.8 ms

This was tested on an iOS device. At first blush there doesn't appear to be a significant change in BUILD times as a result of tracking widget creation.

@gaaclarke
Copy link
Member Author

gaaclarke commented Mar 29, 2022

I ran a performance test with debugProfileBuildsEnabledUserWidgets turn off an on as it is currently written:

debugProfileBuildsEnabledUserWidgets 1st BUILD 2nd BUILD 3rd BUILD
off 6.3 ms 1.8 ms 2.1 ms
on 6.5 ms 1.5 ms 2.7 ms

(Tested with the flutter gallery on an iOS device)

@gaaclarke
Copy link
Member Author

I switched to a HashMap since that should technically be faster and take up less memory. With the same gallery test I got 6.5ms 2.1ms 2.3ms.

@gaaclarke gaaclarke force-pushed the log-local-creations branch from d9196f9 to 4f940c7 Compare March 29, 2022 18:48
@flutter-dashboard flutter-dashboard bot added the c: contributor-productivity Team-specific productivity, code health, technical debt. label Mar 29, 2022
@gaaclarke gaaclarke force-pushed the log-local-creations branch 2 times, most recently from 36b9514 to 193c845 Compare March 29, 2022 22:08
@gaaclarke gaaclarke marked this pull request as ready for review March 29, 2022 22:09
@gaaclarke gaaclarke changed the title Made debugProfileBuildsEnabled filter on user created widgets Made flag for debugging build time of user created widgets Mar 29, 2022
@gaaclarke gaaclarke requested a review from Hixie March 29, 2022 22:10
@gaaclarke
Copy link
Member Author

@Hixie I chose you to review this since you've done work on the Timeline events and we talked about this feature in the past.

@gaaclarke gaaclarke force-pushed the log-local-creations branch 2 times, most recently from 3d4f903 to a59bf80 Compare March 29, 2022 23:03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what performance impact does this have?

it changes the behaviour of constant widgets, which seems significant, for one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not land this until we've answered this question. Ideally, we'd know how much overhead just this line adds, compared to the rest of this patch, as measured on a low end Android device with a large application like customer: money.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it changes the behaviour of constant widgets, which seems significant, for one

Good point. As far as the Gallery is concerned, the change was negligible in local testing. I'm open to suggestions to answering this question more definitively. I can write a microbenchmark but what would be a reasonable way to exercise this? A widget tree of 1000 const Widgets?

If this is a problem we can tweak track-widget-creation to store its data outside of the Widget.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to const behavior would be much smaller if you stored a single bool with each widget indicating whether it is created from the local object or not. That way the only changes in const behavior would be cases where the same exact const widget instance was created by two packages one that happened to be local and one that didn't.

You can't track widget creation data outside the widgets and have it work for const widgets. You could track it outside the widgets for non-const widgets but the performance impact would be significantly higher due to the extra expando/hashmap lookup.

@gaaclarke
Copy link
Member Author

gaaclarke commented Apr 1, 2022

I implemented the approximation and set the sampling rate to 2,000 hz. Then I ran the stocks microbenchmark with the flag on comparing it with baseline dart:

test time
exact thread time 969.4 µs
approximate thread time at 2000 hz 893.4 µs

Before the change we were at 831.9 µs, so we just added 61 µs, would this be acceptable to you @dnfield? @bkonyi are you open to approximation if I can give you some data about the error?

Here is the patch to dart:

diff --git a/runtime/vm/os_macos.cc b/runtime/vm/os_macos.cc
index eed635df0d2..5a4c52dbe79 100644
--- a/runtime/vm/os_macos.cc
+++ b/runtime/vm/os_macos.cc
@@ -105,7 +105,14 @@ int64_t OS::GetCurrentMonotonicMicros() {
   return GetCurrentMonotonicTicks() / kNanosecondsPerMicrosecond;
 }
 
-int64_t OS::GetCurrentThreadCPUMicros() {
+namespace {
+struct ThreadTimeInfo {
+  int64_t threshold = 0;
+  int64_t last_thread_time = 0;
+  int64_t last_clock_time = 0;
+};
+
+int64_t ExpensiveGetCurrentThreadCPUMicros() {
   mach_msg_type_number_t count = THREAD_BASIC_INFO_COUNT;
   thread_basic_info_data_t info_data;
   thread_basic_info_t info = &info_data;
@@ -121,6 +128,28 @@ int64_t OS::GetCurrentThreadCPUMicros() {
   return thread_cpu_micros;
 }
 
+int64_t approximate(const ThreadTimeInfo* info, int64_t clock_time) {
+  return (clock_time - info->last_clock_time) + info->last_thread_time;
+}
+
+thread_local ThreadTimeInfo* s_info = nullptr;
+} // namespace
+
+int64_t OS::GetCurrentThreadCPUMicros() {
+  int64_t clock_time = GetCurrentTimeMicros();
+  if (s_info == nullptr) {
+    s_info = new ThreadTimeInfo();
+  }
+  if (clock_time > s_info->threshold) {
+    s_info->last_clock_time = clock_time;
+    s_info->last_thread_time = ExpensiveGetCurrentThreadCPUMicros();
+    s_info->threshold = clock_time + 500;
+    return s_info->last_thread_time;
+  } else {
+    return approximate(s_info, clock_time);
+  }
+}
+
 int64_t OS::GetCurrentThreadCPUMicrosForTimeline() {
   return OS::GetCurrentThreadCPUMicros();
 }

@gaaclarke
Copy link
Member Author

@dnfield nevermind the approximation question, let's land this turned off and I can follow up later about turning it on with improvements to Timeline in a different PR.

@bkonyi
Copy link
Contributor

bkonyi commented Apr 4, 2022

I implemented the approximation and set the sampling rate to 2,000 hz. Then I ran the stocks microbenchmark with the flag on comparing it with baseline dart:

test time
exact thread time 969.4 µs
approximate thread time at 2000 hz 893.4 µs
Before the change we were at 831.9 µs, so we just added 61 µs, would this be acceptable to you @dnfield? @bkonyi are you open to approximation if I can give you some data about the error?

Here is the patch to dart:
...

@rmacnak-google might have a better way of doing this that doesn't involve approximating thread time and is quite a bit faster (quick results show ~3x improvement on some simple C programs when using the current GetCurrentThreadCPUMicros() implementation and one that uses clock_gettime_nsec_np(CLOCK_THREAD_CPUTIME_ID) / 1000 instead). I'll follow up later whether or not this approach pans out.

@gaaclarke gaaclarke force-pushed the log-local-creations branch from c5290c9 to 4dc7094 Compare April 4, 2022 23:00
Copy link
Contributor

@jacob314 jacob314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@gaaclarke
Copy link
Member Author

I just tested build_bench.dart with this flag on with the 2 new patches we've done to make Timeline events faster.
flag off: 789.5
flag on: 930.6

It didn't lower the percentage of overhead. It just dropped the overall numbers. I'm going to walk away from this now that the low hanging fruit has been explored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c: contributor-productivity Team-specific productivity, code health, technical debt. framework flutter/packages/flutter repository. See also f: labels. tool Affects the "flutter" command-line tool. See also t: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make debugProfileBuildsEnabled only work for user widgets

5 participants