{"id":15980,"date":"2021-03-14T08:31:29","date_gmt":"2021-03-14T07:31:29","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/linux-perf-top-basics-understand-the\/"},"modified":"2021-03-14T08:31:29","modified_gmt":"2021-03-14T07:31:29","slug":"linux-perf-top-basics-understand-the","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/linux-perf-top-basics-understand-the\/","title":{"rendered":"Linux perf-top basics: understand the %"},"content":{"rendered":"<h2>By Franck Pachot<\/h2>\n<p>The Linux kernel has powerful instrumentation that is easy to access. When you want to drill down into your program&#8217;s functions to understand their CPU usage, &#8220;perf&#8221; is the easiest tool. It can attach to processes, sample CPU cycles, resolve the symbol names, and even capture the call stack, then display a histogram of sample counts. This provides an easy profiling tool to understand in which functions your program spends its CPU time, so that you know where an improvement can optimize the overall resource usage. But I see that people can be intimidated by this kind of tool and don&#8217;t know how to interpret the percentages.<\/p>\n<p>The best way to confirm your understanding is to run a small example where you know the behavior, and look at the numbers the tool provides.<\/p>\n<pre><code>\nint f1(int n) {\n  int i; for (i = 0; i &lt;= 5; i++) {} return i;\n}\nint f2(int n) {\n  int i; for (i = 0; i &lt;= 5; i++) {} return i;\n}\nint main() {\n  int i; for (i = 0;; i++) {\n    (i % 3 == 0) ? f1(i) : f2(i);\n  }\n}\n<\/code><\/pre>\n<p>I have two functions, f1() and f2(), which do exactly the same thing; I just want them to have different names. 
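<\/p>\n<p>A note on building the example: it must be compiled without optimization, otherwise the compiler eliminates the empty loops and there is nothing left to sample. A minimal sketch, assuming gcc and the listing saved as test.c (the file name is my choice here, it is not part of the commands below):<\/p>\n<pre><code>\n# -O0 keeps the empty loops in the binary so that perf has cycles to attribute\ngcc -O0 -o a.out test.c\n# run it in the background so that perf top can attach to it\n.\/a.out &amp;\n<\/code><\/pre>\n<p>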
And main() loops forever, calling f1() one third of the time and f2() two thirds of the time, thanks to the (i % 3 == 0) condition.<\/p>\n<pre><code>\nperf top -F100 -d10 -K -p$(pgrep -d, a.out)\n<\/code><\/pre>\n<p>You can simply run `perf top`, but here I explicitly reduced the sampling frequency to 100 Hz (safer, to reduce the overhead of sampling) and set the display refresh to 10 seconds, because that makes it easy to take consistent screenshots. I measure only my program by getting the PID list through `pgrep -d, a.out`, and I display only the userspace symbols by hiding the kernel ones (-K). There are many possibilities to target specific processes in `perf` as well as in `pgrep`, but that&#8217;s not the goal of this post; the man pages are clear about that.<\/p>\n<pre><code>\nSamples: 4K of event 'cycles', 100 Hz, Event count (approx.): 69818773752 lost: 0\/0 drop: 0\/0\nOverhead  Share  Symbol\n  59.86%  a.out  [.] f2\n  31.10%  a.out  [.] f1\n   9.04%  a.out  [.] main\n<\/code><\/pre>\n<p>The &#8220;Shared Object&#8221; column tells where the symbol comes from: my program (a.out), a library it calls, or the kernel. The symbols are the C functions. The &#8220;Overhead&#8221; percentages sum to 100%, and this is why it is important to filter what you sample: each value is then easy to read as a ratio of the total samples for this target.<\/p>\n<p>This is the default display, showing the symbols but not the call stack; the percentages are the time spent in each function. My program spends only 9.04% of its CPU time in main() because very little work is done there: when main() calls a function, the samples are accounted to that function, not to main(). Of the remaining time (about 91%), one third is spent in f1() and two thirds in f2().<\/p>\n<p>This may already be sufficient to know where to investigate for optimizing the code. 
For example, if you divide by two the work done in f2(), you know that you will reduce your program&#8217;s CPU usage by about 30% (the math: 50% of 59.86%).<\/p>\n<h3>call-graph<\/h3>\n<p>Reducing the resources used by a function is one possibility. But the best optimization would be to avoid calling it so often. And I have no information about that yet: nothing here tells me that f1() is called from main(). It could have been called from f2(), or from both. When troubleshooting a program execution, knowing the function is not enough; we need to see the whole call stack.<\/p>\n<pre><code>\nperf top -F100 -d10 -p$(pgrep a.out) -K -g\n<\/code><\/pre>\n<p>I&#8217;ve added -g here to record the call stack (not only which function we are in when sampling, but also where the call comes from). There are different modes that you can choose with --call-graph, but I&#8217;m using the default here.<\/p>\n<pre><code>\nSamples: 3K of event 'cycles', 100 Hz, Event count (approx.): 37649486188 lost: 0\/0 drop: 0\/0\n  Children      Self  Shared Objec  Symbol\n+   95.61%     8.19%  a.out         [.] main\n+   60.64%    60.64%  a.out         [.] f2\n+   31.17%    31.17%  a.out         [.] f1\n+   15.55%     0.00%  libc-2.17.so  [.] __libc_start_main\n<\/code><\/pre>\n<p>Here the &#8220;Self&#8221; column is similar to what I had without &#8220;-g&#8221;: it is the percentage of CPU cycles spent in each function&#8217;s own instructions. The &#8220;Children&#8221; column adds the time spent in all functions called below it: not only the immediate children, but all descendants. For leaves of the call graph, functions that call nothing else, Self and Children are equal. But for main() it adds the time spent in f1()&lt;-main() and f2()&lt;-main(). You read the first line as: 95.61% of the time is spent under the call to main(), and only 8.19% is on main()&#8217;s own instructions, because it calls other functions most of the time. 
Note that you can add up &#8220;Self&#8221; to cover 100%, but in &#8220;Children&#8221; the same samples are accounted on many lines. The idea is to see at the top the fragment of the call stack that accounts for the most samples.<\/p>\n<p>There&#8217;s a &#8220;+&#8221; where you can drill down. I do it for all of them here, but you would typically expand only the one you want to analyze deeper:<\/p>\n<pre><code>\n-   95.61%     8.19%  a.out         [.] main     \n   - 87.42% main                                 \n        57.68% f2                                \n        29.74% f1                                \n   - 8.19% __libc_start_main                     \n        main                                     \n-   60.64%    60.64%  a.out         [.] f2       \n   - 60.64% __libc_start_main                    \n      - 57.68% main                              \n           f2                                    \n        2.97% f2                                 \n-   31.17%    31.17%  a.out         [.] f1       \n   - 31.17% __libc_start_main                    \n      - 29.74% main                              \n           f1                                    \n        1.42% f1                                 \n-   15.55%     0.00%  libc-2.17.so  [.] __libc_start_main \n   - __libc_start_main                                    \n      - 14.63% main                                       \n           8.96% f2                                       \n           4.58% f1                      \n<\/code><\/pre>\n<p>Drilling down the first line, I can see that the difference between 95.61% (the call to main()) and 8.19% (main()&#8217;s own instructions) is in the calls to f2() and f1(). You now also understand why I&#8217;ve set --delay to 10 seconds: to be able to drill down on the same numbers before a refresh, in order to make this clearer in the blog post. 
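<\/p>\n<p>A quick consistency check, using the figures from the drill-down above: main()&#8217;s &#8220;Children&#8221; value is its own &#8220;Self&#8221; plus the samples of everything it calls.<\/p>\n<pre><code>\n95.61% (Children of main)\n  = 8.19% (Self of main) + 57.68% (f2 called from main) + 29.74% (f1 called from main)\n<\/code><\/pre>\n<p>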
With 100 Hz sampling, the numbers change slightly between refreshes.<\/p>\n<h3>callee\/caller<\/h3>\n<p>In order to investigate functions that are called from many places, I replace the work done in f2() by a call to f1():<\/p>\n<pre><code>\nint f1(int n) {\n  int i; for (i = 0; i &lt;= 5; i++) {} return i;\n}\nint f2(int n) {\n  return f1(n);\n}\nint main() {\n  int i; for (i = 0;; i++) {\n    (i % 3 == 0) ? f1(i) : f2(i);\n  }\n}\n<\/code><\/pre>\n<p>Now, in addition to being called directly from main() one third of the time, f1() is also called by f2(), which is where most of the time attributed to f2() goes.<\/p>\n<pre><code>\nSamples: 255  of event 'cycles', 100 Hz, Event count (approx.): 5934954123 lost: 0\/0 drop: 0\/0\nOverhead  Share  Symbol\n  83.59%  a.out  [.] f1\n  12.81%  a.out  [.] main\n   3.60%  a.out  [.] f2\n<\/code><\/pre>\n<p>Without the call graph, the time spent in each function&#8217;s own instructions is now mostly in f1(), because f1() is reached from both branches of the condition. The part accounted to f2() is minimal.<\/p>\n<pre><code>\nperf record -F100 -g --delay=5 -v -p $(pgrep -d, a.out) -a sleep 30\nperf report | cat\n<\/code><\/pre>\n<p>Rather than looking at it live, I record the samples (mentioning the PID, and running a `sleep 30` so that the recording lasts 30 seconds).<\/p>\n<pre><code>\nSamples: 764  of event 'cycles', Event count (approx.): 18048581591\n  Children      Self  Command  Shared Object      Symbol\n+  100.00%     0.00%  a.out    libc-2.17.so       [.] __libc_start_main\n+   99.47%    10.87%  a.out    a.out              [.] main\n-   86.22%    85.68%  a.out    a.out              [.] f1\n   - 85.68% __libc_start_main\n      - main\n         - 57.78% f2\n              f1\n           27.90% f1\n   - 0.54% f1\n      + 0.54% apic_timer_interrupt\n-   60.82%     2.77%  a.out    a.out              [.] 
f2\n   - 58.05% f2\n        f1\n   - 2.77% __libc_start_main\n      - 2.24% main\n           f2\n        0.53% f2\n+    0.68%     0.00%  a.out    [kernel.kallsyms]  [k] apic_timer_interrupt\n+    0.68%     0.00%  a.out    [kernel.kallsyms]  [k] smp_apic_timer_interrupt\n<\/code><\/pre>\n<p>If we look at the f1() detail, we can see 27.90% with f1()&lt;-main(), which is the one third of calls coming directly from main(), and 57.78% with f1()&lt;-f2()&lt;-main() for the two thirds going through the conditional branch. With, of course, some time in main() itself (10.87%) and in f2() itself (2.77%).<\/p>\n<p>This is the caller breakdown, which can also be displayed with `perf report --call-graph ,,,,caller`.<br \/>\nWe see that f1() is on CPU 57.78% of the time through f1()&lt;-f2()&lt;-main() and 27.90% directly through f1()&lt;-main().<\/p>\n<p>We can also display it with a breakdown on the callee side, by changing the call-graph order with `perf report --call-graph ,,,,callee`:<\/p>\n<pre><code>\nSamples: 764  of event 'cycles', Event count (approx.): 18048581591\n  Children      Self  Command  Shared Object      Symbol\n-  100.00%     0.00%  a.out    libc-2.17.so       [.] __libc_start_main\n     __libc_start_main\n-   99.47%    10.87%  a.out    a.out              [.] main\n     main\n     __libc_start_main\n-   86.22%    85.68%  a.out    a.out              [.] f1\n   - f1\n      - 58.05% f2\n           main\n           __libc_start_main\n      - 28.17% main\n           __libc_start_main\n-   60.82%     2.77%  a.out    a.out              [.] 
f2\n   - f2\n      - 60.29% main\n           __libc_start_main\n        0.53% __libc_start_main\n+    0.68%     0.00%  a.out    [kernel.kallsyms]  [k] apic_timer_interrupt\n+    0.68%     0.00%  a.out    [kernel.kallsyms]  [k] smp_apic_timer_interrupt\n<\/code><\/pre>\n<p>This shows that, among the 86.22% of samples in f1(), 58.05% come from f2() and 28.17% from main().<\/p>\n<h3>folded<\/h3>\n<p>With long call stacks, it may be easier to read them folded (and this is the representation used by Brendan Gregg&#8217;s <a href=\"https:\/\/github.com\/brendangregg\/FlameGraph\" target=\"_blank\" rel=\"noopener\">Flame Graphs<\/a>):<\/p>\n<pre><code>\nperf report --call-graph ,,,,caller -g folded --stdio\n\n# Children      Self  Command  Shared Object      Symbol\n# ........  ........  .......  .................  ..............................................\n#\n   100.00%     0.00%  a.out    libc-2.17.so       [.] __libc_start_main\n57.78% __libc_start_main;main;f2;f1\n27.90% __libc_start_main;main;f1\n10.87% __libc_start_main;main\n2.24% __libc_start_main;main;f2\n0.53% __libc_start_main;f2\n    99.47%    10.87%  a.out    a.out              [.] main\n57.78% main;f2;f1\n27.90% main;f1\n10.87% __libc_start_main;main\n2.24% main;f2\n    86.22%    85.68%  a.out    a.out              [.] f1\n57.78% __libc_start_main;main;f2;f1\n27.90% __libc_start_main;main;f1\n    60.82%     2.77%  a.out    a.out              [.] f2\n57.78% f2;f1\n2.24% __libc_start_main;main;f2\n0.53% __libc_start_main;f2\n<\/code><\/pre>\n<p>These are the numbers of samples in each call stack. For example, just looking at the top, I can see that 57.78% is in main()-&gt;f2()-&gt;f1(). So if I can reduce the number of calls from f2() to f1(), I know that I can address a large part of the response time and CPU resource, even without optimizing f1() itself. 
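<\/p>\n<p>As a side note, a similar folded representation is what the Flame Graph scripts linked above are built from. A typical pipeline over a recorded perf.data, assuming the FlameGraph repository is cloned in the current directory, looks like this:<\/p>\n<pre><code>\n# expand the recorded samples, fold the stacks, and render an interactive SVG\nperf script | .\/FlameGraph\/stackcollapse-perf.pl | .\/FlameGraph\/flamegraph.pl &gt; flame.svg\n<\/code><\/pre>\n<p>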
Remember that there are two ways to improve performance: do it faster, and do it less often.<\/p>\n<pre><code>\nperf report --call-graph ,,,,callee -g folded --stdio\n\n# Children      Self  Command  Shared Object      Symbol\n# ........  ........  .......  .................  ..............................................\n#\n   100.00%     0.00%  a.out    libc-2.17.so       [.] __libc_start_main\n100.00% __libc_start_main\n    99.47%    10.87%  a.out    a.out              [.] main\n99.47% main;__libc_start_main\n    86.22%    85.68%  a.out    a.out              [.] f1\n58.05% f1;f2;main;__libc_start_main\n28.17% f1;main;__libc_start_main\n    60.82%     2.77%  a.out    a.out              [.] f2\n60.29% f2;main;__libc_start_main\n0.53% f2;__libc_start_main\n     0.68%     0.00%  a.out    [kernel.kallsyms]  [k] apic_timer_interrupt\n     0.68%     0.00%  a.out    [kernel.kallsyms]  [k] smp_apic_timer_interrupt\n<\/code><\/pre>\n<p>When folding in callee order, the focus is on the function itself. Here I can quickly see that f1() is the hotspot, through the f1()&lt;-f2()&lt;-main() and f1()&lt;-main() call chains.<\/p>\n<h3>filter<\/h3>\n<p>Between caller and callee, when you have a very large call stack, with functions called from many places and others calling many functions, it may be difficult to zoom in on the point you want to investigate. In this case I do it in two steps. First, a simple `sudo perf top` to see which functions are at the top of the CPU usage:<\/p>\n<pre><code>   PerfTop:     435 irqs\/sec  kernel: 6.0%  exact:  0.0% lost: 0\/0 drop: 0\/617 [1000Hz cycles],  (all, 2 CPUs)\n---------------------------------------------------------------------------------------------------------------------------------------------\n\n    15.41%  libsnappy.so.1.3.0                      [.] snappy::RawUncompress\n     4.75%  librocksdb.so                           [.] 
rocksdb::(anonymous namespace)::BytewiseComparatorImpl::Compare\n     4.19%  librocksdb.so                           [.] rocksdb::BlockIter::BinarySeek\n     2.32%  librocksdb.so                           [.] rocksdb::MemTable::KeyComparator::operator()\n     2.14%  librocksdb.so                           [.] rocksdb::BlockIter::ParseNextKey\n     2.12%  librocksdb.so                           [.] rocksdb::InternalKeyComparator::Compare<\/code><\/pre>\n<p>This is the &#8220;Self&#8221; value: 15% of the samples system-wide are in the snappy::RawUncompress C++ method from libsnappy.so.1.3.0.<\/p>\n<pre><code>\nsudo perf record -F99 -g --call-graph fp --delay=5 -v -p $(pgrep -d, yb-tserver) -a sleep 10\nsudo perf report --call-graph ,,,,callee --symbol-filter=RawUncompress\n<\/code><\/pre>\n<p>Here I record more precisely the processes I&#8217;m analyzing, and filter the report on &#8220;RawUncompress&#8221;.<\/p>\n<p><a href=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2022\/04\/Screenshot-2021-03-13-234231.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-48404\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2022\/04\/Screenshot-2021-03-13-234231.jpg\" alt=\"\" width=\"2106\" height=\"1424\" \/><\/a><\/p>\n<p>Here I have the full call stack, from the start of the thread down to the snappy::RawUncompress callee, and I know that this code path accounts for 21.35% of the samples in the processes I recorded. This is an example of quick profiling. It will not solve your issue, but it helps you know where to look in order to reduce the CPU usage. In performance troubleshooting, when you feel like you are trying to find a needle in a haystack, start by finding the haystack, and that&#8217;s where you need profiling. The event sampling approach is less intrusive than attaching a debugger. 
And samples are often sufficient to find the hotspot.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Franck Pachot. Linux kernel has a powerful instrumentation that can be accessed easily. When you want to drill down into your program functions to understand their CPU usage, &#8220;perf&#8221; is the easiest. It can attach to the processes, sample the CPU cycles, get the symbol name, or even the call stack. And display [&hellip;]<\/p>\n","protected":false},"author":28,"featured_media":15981,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[955],"tags":[73,2292,2293],"type_dbi":[],"class_list":["post-15980","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud","tag-linux","tag-perf","tag-perf-top"],"acf":[]}