...making Linux just a little more fun!
Linux (and other Unices) has lots of nifty small utilities which can be combined to do interesting things. There is a certain joy in creating such software or using it to tweak your programs. In this series we shall look at some of these tools that are useful for a programmer. These tools will help you code better and make your life easier.
After we have designed and coded a piece of software comes the stage of optimizing the program. Before we talk about profiling and optimization in general, I would like to draw your attention to two quotes regarding optimization.
There are mainly two types of profiling information we can get: the flat profile and the call graph.
The source code has to be compiled with the -pg option (and also with -g if you want line-by-line profiling). If the makefile contains only a few compilation commands, you can append these options to each of them. If it contains many, you can instead define or redefine the CFLAGS/CXXFLAGS variable in the makefile so that the options are added to every compilation command. I will demonstrate the use of gprof using the GNU make utility.
Unpack the gzipped tarball

$ tar zxf make-3.80.tar.gz
$ cd make-3.80

Run the configure script to create the makefiles

$ ./configure
[configure output snipped]

Edit the CFLAGS parameter in the generated makefile to remove the optimization flags and add -pg. The GCC optimization flags are removed because compiler optimization can sometimes cause problems while profiling. In particular, if you are doing line-by-line profiling, certain lines may be removed when the source code is optimized.
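The CFLAGS edit described above can also be scripted rather than done by hand. The following is only a minimal sketch: the three-line makefile it creates is a made-up stand-in for the one that configure generates, and the sed expression assumes that CFLAGS is defined on a single line.

```shell
# Sketch: strip -O* flags from the CFLAGS line of a makefile and add -pg.
# The makefile written here is a hypothetical stand-in for the generated one.
workdir=$(mktemp -d)
cat > "$workdir/Makefile" <<'EOF'
CC = gcc
CFLAGS = -g -O2
LDFLAGS =
EOF

# Keep a copy of the original so the change can be undone.
cp "$workdir/Makefile" "$workdir/Makefile.orig"

# On the CFLAGS line only:
#   1. delete every token that starts with -O (optimization level)
#   2. append -pg at the end of the line
sed -e '/^CFLAGS[[:space:]]*=/{
s/[[:space:]]-O[^[:space:]]*//g
s/[[:space:]]*$/ -pg/
}' "$workdir/Makefile.orig" > "$workdir/Makefile"

# Show the rewritten line.
grep '^CFLAGS' "$workdir/Makefile"
```

For the sample makefile this leaves the line as `CFLAGS = -g -pg`. A real generated makefile may set optimization flags in other variables as well, so it is worth checking the result before building.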
Build the source code

$ make
[build output snipped]

We can use this make to build other software such as Apache, lynx and cvs. We build Apache using this make as an example. When we untar, configure and run make on the Apache source, a file called gmon.out containing profiling information is generated. You may observe that make runs slower than expected because it is logging the profile data. An important thing to remember while collecting profile data is that we have to run the program with the inputs we normally give it and let it exit when it is done. This way we simulate a real-world scenario while collecting the data.
In the last step we got a binary output file called "gmon.out". Unfortunately, there is currently no way to specify the name of the profiling data file. This "gmon.out" file can be interpreted by gprof to generate human-readable output. The syntax is:
gprof options [Executable file [profile data files ... ] ] [ > human-readable-output-file]

$ gprof make gmon.out > profile-make-with-Apache.txt

You can find the whole file here
A section of the flat profile is shown below -
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 33.33      0.01     0.01      207     0.05     0.05  file_hash_2
 33.33      0.02     0.01       38     0.26     0.26  new_pattern_rule
 33.33      0.03     0.01        6     1.67     2.81  pattern_search
  0.00      0.03     0.00     2881     0.00     0.00  hash_find_slot
  0.00      0.03     0.00     2529     0.00     0.00  xmalloc
  0.00      0.03     0.00     1327     0.00     0.00  hash_find_item
  0.00      0.03     0.00     1015     0.00     0.00  directory_hash_cmp
  0.00      0.03     0.00      963     0.00     0.00  find_char_unquote
  0.00      0.03     0.00      881     0.00     0.00  file_hash_1
  0.00      0.03     0.00      870     0.00     0.00  variable_buffer_output

From the above data we can draw the following conclusions:
This is, however, too little data to draw reliable conclusions from. So this specially compiled make was also used to build lynx, cvs, make and patch. The gmon.out file produced by each build was renamed, and the combined profile was generated from all of them using the following command.
$ gprof make gmon-*.out > overall-profile.txt

This file can be found here
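Each profiled run writes its data to gmon.out in the current directory and overwrites whatever was there, so the file has to be moved aside after every build. The renaming step can be sketched as below. This is an illustration only: `profiled_make` is a hypothetical stand-in that just creates a dummy gmon.out, where a real session would run the -pg build of make inside each source tree.

```shell
# Sketch: keep one profile per build by renaming gmon.out after each run.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

profiled_make() {
    # Hypothetical stand-in for running the profiled make on a project;
    # it only drops a placeholder gmon.out into the current directory.
    printf 'placeholder profile for %s\n' "$1" > gmon.out
}

for project in apache lynx cvs make patch; do
    profiled_make "$project"
    # The next run would overwrite gmon.out, so move it aside first.
    mv gmon.out "gmon-$project.out"
done

# These are the files that the gmon-*.out pattern would pick up.
ls gmon-*.out
```

With real profile files in place, the gprof command shown above sums them into a single report.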
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 18.18      0.06     0.06    23480     0.00     0.00  find_char_unquote
 12.12      0.10     0.04      120     0.33     0.73  pattern_search
  9.09      0.13     0.03     5120     0.01     0.01  collapse_continuations
  9.09      0.16     0.03      148     0.20     0.88  update_file_1
  9.09      0.19     0.03       37     0.81     4.76  eval
  6.06      0.21     0.02    12484     0.00     0.00  file_hash_1
  6.06      0.23     0.02     6596     0.00     0.00  get_next_mword
  3.03      0.24     0.01    29981     0.00     0.00  hash_find_slot
  3.03      0.25     0.01    14769     0.00     0.00  next_token
  3.03      0.26     0.01     5800     0.00     0.00  variable_expand_string

As we can see, the picture has changed a bit from the make profile we got from compiling Apache.
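The flat profile is sorted by self time, but it can be just as instructive to re-sort it by call count, since a function that is cheap per call may still matter if it is called tens of thousands of times. The sketch below does this with awk and sort on a few rows copied from the table above. It assumes every row has all seven columns; gprof leaves the call columns blank for functions it has no call counts for, and such rows are simply skipped here.

```shell
# Sketch: rank flat-profile rows by number of calls instead of self time.
# The rows are copied from the combined profile; a real report would be
# read from a file such as overall-profile.txt instead.
profile=$(mktemp)
cat > "$profile" <<'EOF'
 18.18      0.06     0.06    23480     0.00     0.00  find_char_unquote
 12.12      0.10     0.04      120     0.33     0.73  pattern_search
  9.09      0.13     0.03     5120     0.01     0.01  collapse_continuations
  9.09      0.19     0.03       37     0.81     4.76  eval
  3.03      0.24     0.01    29981     0.00     0.00  hash_find_slot
EOF

# Columns: %time, cumulative s, self s, calls, self ms/call, total ms/call, name.
# Print "calls total-ms/call name" for complete rows, then sort numerically.
ranked=$(awk 'NF == 7 { printf "%s %s %s\n", $4, $6, $7 }' "$profile" \
    | sort -rn | head -n 3)
printf '%s\n' "$ranked"
```

On these rows the most frequently called function is hash_find_slot, even though it sits near the bottom of the table when ranked by self time.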
Let us now have a look at a snippet of the call graph profile from compiling Apache.
index % time    self  children    called     name
-----------------------------------------------
                                   6             eval_makefile [49]
[25]     3.7    0.00    0.00       6         eval [25]
                0.00    0.00     219/219         try_variable_definition [28]
                0.00    0.00      48/48          record_files [40]
                0.00    0.00     122/314         variable_expand_string [59]
                0.00    0.00       5/314         allocated_variable_expand_for_file [85]
                0.00    0.00     490/490         readline [76]
                0.00    0.00     403/403         collapse_continuations [79]
                0.00    0.00     355/355         remove_comments [80]
                0.00    0.00     321/963         find_char_unquote [66]
                0.00    0.00     170/170         get_next_mword [88]
                0.00    0.00     101/111         parse_file_seq [93]
                0.00    0.00     101/111         multi_glob [92]
                0.00    0.00      48/767         next_token [70]
                0.00    0.00      19/870         variable_buffer_output [68]
                0.00    0.00      13/2529        xmalloc [64]
                0.00    0.00       2/25          xrealloc [99]
                                   5             eval_makefile [49]
-----------------------------------------------

We can make the following observations from the snippet above:
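As an aside on reading the "called" column: for each function called by eval, the entry a/b means that a of the b recorded calls to that function came from eval. A rough sketch that turns these entries into percentages is given below. It uses four child rows copied from the snippet and assumes each row has the layout "self children called name [index]".

```shell
# Sketch: compute what share of each child's calls originate in eval.
# The rows are copied from the call graph snippet for eval [25].
callgraph=$(mktemp)
cat > "$callgraph" <<'EOF'
0.00 0.00 219/219 try_variable_definition [28]
0.00 0.00 122/314 variable_expand_string [59]
0.00 0.00 321/963 find_char_unquote [66]
0.00 0.00 13/2529 xmalloc [64]
EOF

shares=$(awk '{
    # n[1] = calls made from eval, n[2] = calls from all callers
    split($3, n, "/")
    printf "%s %d/%d %.1f%%\n", $4, n[1], n[2], 100 * n[1] / n[2]
}' "$callgraph")
printf '%s\n' "$shares"
```

For these rows, eval accounts for all calls to try_variable_definition, about a third of the calls to find_char_unquote, and well under one percent of the calls to xmalloc.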
Using gprof you can also get an annotated source listing and line-by-line profiling. These are useful once you have identified the sections of code that need to be optimized, as they help you drill down into the source code to find inefficiencies. Line-by-line profiling, together with the flat profile, can be used to check which code paths are traversed most frequently. The annotated source listing can be used to drill down within the functions themselves to the level of basic blocks (loops and branching statements), to find out which loops are executed most and which branches are taken most frequently. This is useful in fine-tuning the code for optimum performance. There are some other options which are not covered here; refer to the info documentation of gprof for more details. There is also a KDE front end for gprof called kprof. See the reference section for the URL.
Profiling tools such as gprof can be a big help in optimizing programs. Profiling is one of the first steps in manually optimizing a program: it tells you where the bottlenecks are so that you can remove them.
Vinayak Hegde is currently working for Akamai Technologies Inc. He
first stumbled upon Linux in 1997 and has never looked back since. He
is interested in large-scale computer networks, distributed computing
systems and programming languages. In his non-existent free time he
likes trekking, listening to music and reading books. He also
maintains an intermittently updated blog.