Using V8’s sample-based profiler
V8 has built-in sample-based profiling. Profiling is turned off by default, but can be enabled via the --prof command-line option. Build the d8 shell following the instructions at Building with GN.
Command line #
To start profiling, use the
--prof option. When profiling, V8 generates a
v8.log file which contains profiling data.
Windows:
build\Release\d8 --prof script.js
Other platforms (replace ia32 with x64 if you want to profile the x64 build):
out/ia32.release/d8 --prof script.js
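If you just want something to profile, a minimal CPU-bound workload (entirely hypothetical; substitute your own code) might serve as script.js:

```javascript
// script.js — a hypothetical CPU-bound workload so the profiler
// has something meaningful to sample.
function sumOfSquares(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    // Deliberately call into Math so library ticks show up in v8.log.
    total += Math.sqrt(i) * Math.sqrt(i);
  }
  return total;
}

console.log(sumOfSquares(1e7));
```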
Process the generated output #
Log file processing is done by JS scripts run in the d8 shell. For this to work, a
d8 binary (or symlink, or
d8.exe on Windows) must be in the root of your V8 checkout, or in the path specified by the environment variable
D8_PATH. Note: this binary is only used to process the log, not for the actual profiling, so it doesn’t matter which version it is.
Make sure the d8 used for analysis was not built with is_component_build.
Web UI for --prof #
Preprocess the log with
--preprocess (to resolve C++ symbols, etc).
$V8_PATH/tools/linux-tick-processor --preprocess > v8.json
Open tools/profview/index.html in your browser and select the
v8.json file there.
Snapshot-based VM build and builtins reporting #
When a snapshot-based VM build is being used, code objects from a snapshot that don’t correspond to functions are reported with generic names like “A builtin from the snapshot”, because their real names are not stored in the snapshot. To see the names, take the following steps:

- The --log-snapshot-positions flag must be passed to the VM (along with --prof); this way, for deserialized objects, (memory address, snapshot offset) pairs are emitted into the profiler log.
- The --snapshot-log=<log file from mksnapshot> flag must be passed to the tick processor script. A log file from the mksnapshot program (a snapshot log) contains address-offset pairs for serialized objects, along with their names; using the snapshot log, names can be mapped onto deserialized objects during profiler log processing. The snapshot log file is called snapshot.log and resides alongside V8’s compiled files.
An example of usage:
out/ia32.release/d8 --prof --log-snapshot-positions script.js
tools/linux-tick-processor --snapshot-log=out/ia32.release/obj.target/v8_snapshot/geni/snapshot.log v8.log
Example output #
Timeline plot #
The timeline plot visualizes where V8 is spending time. This can be used to find bottlenecks and spot things that are unexpected (for example, too much time spent in the garbage collector). Data for the plot are gathered by both sampling and instrumentation. Linux with gnuplot 4.6 is required.
To create a timeline plot, run V8 as described above, with the option
--log-timer-events in addition to --prof:
out/ia32.release/d8 --prof --log-timer-events script.js
The output is then passed to a plot script, similar to the tick-processor:
tools/plot-timer-events v8.log
This creates timer-events.png in the working directory, which can be opened with most image viewers.
Since recording log output comes with a certain performance overhead, the script attempts to correct this using a distortion factor. If not specified, the script tries to determine it automatically. You can, however, also specify the distortion factor manually:
tools/plot-timer-events --distortion=4500 v8.log
You can also manually specify a certain range for which to create the plot or statistical profile, expressed in milliseconds:
tools/plot-timer-events --distortion=4500 --range=1000,2000 v8.log
tools/linux-tick-processor --distortion=4500 --range=1000,2000 v8.log
HTML version #
Profiling web applications #
Today’s highly optimized virtual machines can run web apps at blazing speed. But one shouldn’t rely only on them to achieve great performance: a carefully optimized algorithm or a less expensive function can often reach many-fold speed improvements on all browsers. Chrome DevTools’ CPU Profiler helps you analyze your code’s bottlenecks. But sometimes, you need to go deeper and more granular: this is where V8’s internal profiler comes in handy.
Let’s use that profiler to examine the Mandelbrot explorer demo that Microsoft released together with IE10. After the demo release, V8 fixed a bug that slowed down the computation unnecessarily (hence the poor performance of Chrome in the demo’s blog post) and further optimized the engine, implementing a faster
exp() approximation than what the standard system libraries provide. Following these changes, the demo ran 8× faster than previously measured in Chrome.
But what if you want the code to run faster on all browsers? You should first understand what keeps your CPU busy. Run Chrome (on Windows and Linux Canary) with the following command-line switches, which cause it to output profiler tick information (in the
v8.log file) for the URL you specify, which in our case was a local version of the Mandelbrot demo without web workers:
./chrome --js-flags='--prof' --no-sandbox 'http://localhost:8080/'
When preparing the test case, make sure it begins its work immediately upon load, and close Chrome when the computation is done (hit Alt+F4), so that you only have the ticks you care about in the log file. Also note that web workers aren’t yet profiled correctly with this technique.
Then, process the
v8.log file with the
tick-processor script that ships with V8 (or the new practical web version):
tools/linux-tick-processor v8.log
Here’s an interesting snippet of the processed output that should catch your attention:
Statistical profiling result from null, (14306 ticks, 0 unaccounted, 0 excluded).
 [Shared libraries]:
   ticks  total  nonlib   name
    6326   44.2%    0.0%  /lib/x86_64-linux-gnu/libm-2.15.so
    3258   22.8%    0.0%  /.../chrome/src/out/Release/lib/libv8.so
    1411    9.9%    0.0%  /lib/x86_64-linux-gnu/libpthread-2.15.so
      27    0.2%    0.0%  /.../chrome/src/out/Release/lib/libwebkit.so
The top section shows that V8 is spending more time inside an OS-specific system library than in its own code. Let’s look at what’s responsible for it by examining the “bottom up” output section, where you can read indented lines as “was called by” (and lines starting with a
* mean that the function has been optimized by TurboFan):
[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 2.0% are not shown.

   ticks parent  name
    6326   44.2%  /lib/x86_64-linux-gnu/libm-2.15.so
    6325  100.0%    LazyCompile: *exp native math.js:91
    6314   99.8%      LazyCompile: *calculateMandelbrot http://localhost:8080/Demo.js:215
More than 44% of the total time is spent executing the
exp() function inside a system library! Adding some overhead for calling system libraries, that means about two thirds of the overall time are spent evaluating Math.exp(). Looking at the demo’s source code,
exp() is used solely to produce a smooth grayscale palette. There are countless ways to produce a smooth grayscale palette, but let’s suppose you really, really like exponential gradients. Here is where algorithmic optimization comes into play.
You’ll notice that
exp() is called with an argument in the range
-4 < x < 0, so we can safely replace it with its Taylor approximation for that range, which delivers the same smooth gradient with only a multiplication and a couple of divisions:
exp(x) ≈ 1 / ( 1 - x + x * x / 2) for -4 < x < 0
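To sanity-check the substitution, here’s a small sketch (the name fastExp is ours, not from the demo) comparing the approximation against Math.exp over the relevant range:

```javascript
// Rational approximation of exp(x) for -4 < x < 0, from the formula above.
function fastExp(x) {
  return 1 / (1 - x + x * x / 2);
}

// Compare against Math.exp across the target range: the approximation
// stays monotonic and close enough to preserve a smooth gradient.
let maxAbsError = 0;
for (let x = -4; x < 0; x += 0.01) {
  maxAbsError = Math.max(maxAbsError, Math.abs(fastExp(x) - Math.exp(x)));
}
console.log(maxAbsError.toFixed(3)); // worst-case absolute error over the range
```

For palette generation the absolute error matters less than smoothness, which this form preserves since its denominator is strictly positive and decreasing on the interval.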
Tweaking the algorithm this way boosts performance by an extra 30% compared to the latest Canary, and by 5× compared to the system-library-based
Math.exp() on Chrome Canary.
This example shows how V8’s internal profiler can help you go deeper into understanding your code bottlenecks, and that a smarter algorithm can push performance even further.
To find out more about how to benchmark in a way that represents today’s complex and demanding web applications, read How V8 measures real-world performance.