Timing a Fortran Program
In many, if not most, cases, the time it takes to execute a program is not important. But there are exceptions, such as compute-intensive applications like weather prediction and simulation. Also, in some applications, response to a human must be timely, so even a few extra seconds can be significant.
Thus, in most cases, the time spent writing a program, and especially debugging and modifying it, is far more important than the time it takes to run it. This implies that programs should be written in a way that is clear, conforms well to the algorithms being implemented, and is easy to maintain.
To optimize a program, the first task is to figure out where the execution time is being spent; it is wasteful to tinker with parts of a program that do not contribute significantly to the execution time. In order to have something to experiment with, we consider the following program, some of which was extracted from a real program, but which does nothing useful by itself. This program, in the file loops.f90, is included in the Examples folder of the Fortran Tools distribution, and this topic is discussed more fully in the Fortran Tools Manual.
module data
   implicit none
   integer, parameter :: N_ROWS = 150
   integer, parameter :: N_COLS = 250
   integer, parameter :: ABCD_SIZE = 1000
   integer, dimension(:,:), allocatable :: MAT
   real, dimension(:,:), allocatable :: A, B, C, D
   integer, dimension(:), allocatable :: temp
end module data

module setup
   use random
   use data
   implicit none
   private
   public :: setup_data
contains
   subroutine setup_data()
      integer :: alloc_stat
      allocate(MAT(N_ROWS, N_COLS), &
               stat=alloc_stat)
      if (alloc_stat > 0) then
         print *, "Allocation of MAT failed"
         stop
      end if
      allocate(A(ABCD_SIZE, ABCD_SIZE), &
               B(ABCD_SIZE, ABCD_SIZE), &
               C(ABCD_SIZE, ABCD_SIZE), &
               D(ABCD_SIZE, ABCD_SIZE), &
               stat=alloc_stat)
      if (alloc_stat > 0) then
         print *, &
            "Allocation of A, B, C, or D failed"
         stop
      end if
      call random_int(MAT, 0, ABCD_SIZE)
      call random_number(B)
      call random_number(D)
      call mix_mat()
   end subroutine setup_data

   ! Rearrange MAT just for something to do
   subroutine mix_mat()
      integer, parameter :: mixes = 1000000
      integer :: i, ir, jr
      do i = 1, mixes
         call random_int(ir, 1, N_ROWS)
         call random_int(jr, 1, N_ROWS)
         temp = mat(ir, :)
         mat(ir, :) = mat(jr, :)
         mat(jr, :) = temp
      end do
   end subroutine mix_mat
end module setup

module compute
   use data
   implicit none
   private
   public :: do_loops
contains
   subroutine do_loops()
      integer :: I, J, K, L, II, JJ
      A = 0
      C = 0
      DO 379 I=1,N_ROWS
      DO 379 J=1,N_ROWS
      DO 378 K=1,N_COLS
      DO 378 L=1,N_COLS
         II=MAT(I,K)
         JJ=MAT(J,L)
         IF(II .EQ. 0 .OR. JJ .EQ. 0)GO TO 378
         A(II,JJ)=A(II,JJ)+B(I,J)
         C(II,JJ)=C(II,JJ)+D(I,J)
  378 CONTINUE
  379 CONTINUE
   end subroutine do_loops
end module compute

program loops
   use setup
   use compute
   use data, only: A, C
   implicit none
   real :: start_time, stop_time
   call cpu_time(start_time)
   call setup_data()
   call cpu_time(stop_time)
   print *, "Setup time:", &
      stop_time - start_time, "seconds"
   call cpu_time(start_time)
   call do_loops()
   call cpu_time(stop_time)
   print *, "Original loop time:", &
      stop_time - start_time, "seconds"
   print *, sum(A), sum(C)
end program loops
This program uses a generic subroutine named random_int in the module random, which is also contained in the file loops.f90. The subroutine uses the intrinsic subroutine random_number to fill the first argument with pseudo-random integers between the second and third arguments.
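The module random itself is not listed in this article. As a rough illustration only, a generic random_int with the behavior described above might look like the following sketch. The module name random_sketch, the specific procedure names, and the scaling formula are assumptions made for this example; the version in loops.f90 may be written differently.

```fortran
! Hypothetical sketch of a generic random_int; the actual module
! random in loops.f90 may differ.
module random_sketch
   implicit none
   private
   public :: random_int

   ! One generic name covering a scalar and a rank-2 array version
   interface random_int
      module procedure random_int_0d, random_int_2d
   end interface random_int

contains

   ! Scalar version: n is set to a pseudo-random integer in low..high
   subroutine random_int_0d(n, low, high)
      integer, intent(out) :: n
      integer, intent(in) :: low, high
      real :: r
      call random_number(r)   ! 0.0 <= r < 1.0
      n = low + int((high - low + 1) * r)
      n = min(n, high)        ! guard against rounding up to high + 1
   end subroutine random_int_0d

   ! Rank-2 version: every element of n is set independently
   subroutine random_int_2d(n, low, high)
      integer, dimension(:,:), intent(out) :: n
      integer, intent(in) :: low, high
      real, dimension(size(n, 1), size(n, 2)) :: r
      call random_number(r)
      n = low + int((high - low + 1) * r)
      n = min(n, high)
   end subroutine random_int_2d

end module random_sketch

program try_random_int
   use random_sketch
   implicit none
   integer :: k
   integer, dimension(3, 4) :: m
   call random_int(k, 1, 6)   ! resolves to the scalar version
   call random_int(m, 0, 9)   ! resolves to the rank-2 version
   print *, k
   print *, m
end program try_random_int
```

The generic interface is what lets the same name be called with the whole array MAT in setup_data and with the scalars ir and jr in mix_mat.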
Timing a Program
In order to time portions of the program, the intrinsic subroutine cpu_time may be used. It returns the processor time in seconds. The point from which this time is measured is processor-dependent (often the beginning of execution of the program), so only the difference between two calls is meaningful. Thus, to time a portion of a program, record the CPU time just before and just after the portion to be timed. The execution time of that portion is then the difference between the two times.
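The before-and-after pattern can be tried in isolation with a small stand-alone program such as the following sketch. The program name, the loop, and the iteration count are made up for illustration and have nothing to do with loops.f90. The check for a negative value reflects the rule that cpu_time returns a negative number when the processor cannot supply a meaningful time.

```fortran
! Minimal sketch: time one section of code with cpu_time.
program time_section
   implicit none
   integer, parameter :: n = 20000000   ! arbitrary amount of work
   real :: t_start, t_stop
   real :: total
   integer :: i

   call cpu_time(t_start)               ! time just before the section
   total = 0.0
   do i = 1, n
      total = total + sqrt(real(i))
   end do
   call cpu_time(t_stop)                ! time just after the section

   if (t_start < 0.0) then
      ! A negative value means no meaningful time is available
      print *, "cpu_time is not available on this processor"
   else
      print *, "Loop time:", t_stop - t_start, "seconds"
   end if

   ! Print the result so the loop cannot be optimized away
   print *, total
end program time_section
```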
Note: If a program is run on multiple processors using, for example, coarrays, OpenMP, or MPI, cpu_time returns the total time for all of the processors. For this situation, the intrinsic subroutine system_clock is better, as it returns elapsed time, which is probably a better indicator of the performance of the program. system_clock is discussed in a future blog about OpenMP.
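As a brief preview, the following sketch reports both CPU time and elapsed time for the same section of code. It is only an illustration: the program name and the loop are invented for this example, and the use of 64-bit counts through the int64 kind from iso_fortran_env is a choice made here, not something taken from loops.f90.

```fortran
! Minimal sketch: compare CPU time with elapsed (wall-clock) time.
program wall_clock
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   integer, parameter :: n = 20000000   ! arbitrary amount of work
   integer(int64) :: count_start, count_stop, count_rate
   real :: cpu_start, cpu_stop
   real :: total
   integer :: i

   ! count_rate is the number of clock counts per second
   call system_clock(count_start, count_rate)
   call cpu_time(cpu_start)

   total = 0.0
   do i = 1, n
      total = total + sqrt(real(i))
   end do

   call cpu_time(cpu_stop)
   call system_clock(count_stop)

   print *, "CPU time:    ", cpu_stop - cpu_start, "seconds"
   if (count_rate > 0) then
      print *, "Elapsed time:", &
         real(count_stop - count_start) / real(count_rate), "seconds"
   else
      ! A rate of zero means the processor has no clock
      print *, "system_clock is not available on this processor"
   end if

   ! Print the result so the loop cannot be optimized away
   print *, total
end program wall_clock
```

For a serial program on an otherwise idle machine the two figures should be close. With several threads or images doing the work, the CPU time can be several times the elapsed time.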
This scheme is used to time the portion of the program used to set up some values in the arrays and also to time the execution of the quadruple loops.
The result of running this program is as follows.
Setup time: 3.86882401 seconds
Original loop time: 88.6865692 seconds
699112512. 702719552.
The program prints sum(A) and sum(C) because some compilers could optimize away the entire quadruple loops if the results computed in the loops were never used.
These results indicate that most of the time is spent executing the part of the program with the loops, so we may concentrate on improving that part of the program. In the next blog, we will use profiling to refine our knowledge of where the compute time is being spent. But first, we can improve the execution time of the code simply by specifying optimization during compilation.
Suppose the 88-second execution time is not acceptable. Trying different compiler options is probably the simplest way to improve the running time. The main gfortran optimization option is -O; higher levels such as -O2 and -O3 also are available.
To compile the program with optimization, use the command
gfortran -O3 loops.f90
Running the program produced by this compilation produces
Setup time: 3.57242203 seconds
Original loop time: 30.5449944 seconds
699112512. 702719552.
This is quite a bit better, but there is still room for improvement.
This optimization also may be accomplished by setting the appropriate compiler option in Code::Blocks.