Exercise 2.2. How long does it take to count to 1 billion (ignoring
overflow)? Determine the amount of time it takes the program
    int i, j, k, count = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                count++;
to complete in your programming environment, for N = 10, 100, and
1000. If your compiler has optimization features that are supposed to
make programs more efficient, check whether or not they do so for this
program.
Here is my program, which is stored in /home/libs/dataStr/Ex2_2/main.cc:
// Sedgewick, Exercise 2.2
// Programmer: Brian Howard
// Date: September 1, 2002
#include <iostream>
#include <ctime>
using namespace std;
// Here is a general-purpose timing function.
//
// It takes a pointer to a function which will be run repeatedly
// in order to obtain an accurate estimate of its running time; by
// running for at least 1000 clock ticks or 10 seconds (whichever is
// longer), we should get three digits of accuracy. Choosing 10 seconds
// means that even timing a function that only takes 10 clock cycles,
// on a 4 GHz processor, we will not be able to overflow a 32-bit count.
//
// It also takes a pointer to another function which embodies all of
// the computation we _don't_ want to time; this will include the function
// call and timing loop overhead.
//
// Returns the running time in seconds.
double time_function(void (*f)(), void (*g)()) {
    // First, we'll figure out how many clock ticks to run;
    // use the longer of 1000 ticks and 10 seconds
    clock_t num_ticks = 10 * CLOCKS_PER_SEC;
    if (num_ticks < 1000) num_ticks = 1000;
    // We count how many times the function can be called in at least
    // num_ticks clock ticks (guaranteed to run at least once)
    unsigned long count = 0, n;
    clock_t end_time = clock() + num_ticks;
    do {
        f();
        ++count;
    } while (clock() < end_time);
    // Run it again count times and time it, without all the extra calls to clock()
    clock_t start = clock();
    for (n = 0; n < count; ++n) f();
    clock_t ticks = clock() - start;
    // Now run the overhead function the same number of times and time it
    clock_t start2 = clock();
    for (n = 0; n < count; ++n) g();
    clock_t ticks2 = clock() - start2;
    // Compute the number of seconds each call took (less overhead) and return
    return static_cast<double>(ticks - ticks2) / CLOCKS_PER_SEC / count;
}
// This is the function we want to test for the exercise, for N = 10
void test10() {
    int i, j, k, count = 0;
    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++)
            for (k = 0; k < 10; k++)
                count++;
}
// This is the function we want to test for the exercise, for N = 100
void test100() {
    int i, j, k, count = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 100; j++)
            for (k = 0; k < 100; k++)
                count++;
}
// This is the function we want to test for the exercise, for N = 1000
void test1000() {
    int i, j, k, count = 0;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 1000; j++)
            for (k = 0; k < 1000; k++)
                count++;
}
// Here is a dummy version so we can subtract the loop and function call overhead
void dummy() {
    int i, j, k, count = 0;
}
// Here is the driver
int main() {
    double test_time10, test_time100, test_time1000;
    test_time10 = time_function(test10, dummy);
    cout << "Time taken for N = 10 is " << test_time10 << endl;
    test_time100 = time_function(test100, dummy);
    cout << "Time taken for N = 100 is " << test_time100 << endl;
    test_time1000 = time_function(test1000, dummy);
    cout << "Time taken for N = 1000 is " << test_time1000 << endl;
    return 0;
}
Running this for the given values of N on Jupiter (which has a 1.4 GHz Pentium 4 processor; run cat /proc/cpuinfo for details) produces the following output:
Time taken for N = 10 is 5.45081e-06
Time taken for N = 100 is 0.00455509
Time taken for N = 1000 is 4.39
Turning on optimization with the -O2 flag to g++ produces the following:
Time taken for N = 10 is 1.80492e-06
Time taken for N = 100 is 0.0012358
Time taken for N = 1000 is 1.092
Using Borland 5.02 on the PC in my office, which has a 2 GHz Pentium 4 running Windows XP, produces the following (after correcting for the fact that Borland 5.02 prefers old-style headers, such as time.h):
Time taken for N = 10 is 1.25869e-06
Time taken for N = 100 is 0.000900523
Time taken for N = 1000 is 0.696867
Essentially the same numbers were obtained after turning on optimization
for speed, as well as after disabling all optimizations; one possibility
is that I wasn't changing the optimization mode correctly (since I'm not
very familiar with the Borland compiler).
Visual C++ 6.0 on the PC in my office produces the following:
Time taken for N = 10 is 2.87745e-006
Time taken for N = 100 is 0.00247712
Time taken for N = 1000 is 2.2624
This is running in Debug mode, with no optimizations at all.
Three years ago I ran essentially this same program under Visual C++ 6.0 on
a 350 MHz Pentium 2 running Windows 95, which produced the following output:
Time taken for N = 10 is 2.23923e-005
Time taken for N = 100 is 0.0158786
Time taken for N = 1000 is 15.33
Switching from Debug mode (optimization turned off) to Release mode
(optimization for fastest speed) in Visual C++ produced the following:
Time taken for N = 10 is 0
Time taken for N = 100 is 0
Time taken for N = 1000 is 0
These indicate that the for loops took essentially no time at all (beyond overhead) for any value of N. My guess is that the optimizer figured out that the loops were not doing anything observable (count is never read), so it eliminated them entirely.
Note that in each case, the time taken to count to one billion (N = 1000) was roughly 1000 times the time to count to one million (N = 100), which was in turn roughly 1000 times the time to count to one thousand (N = 10). This is what we expect from an algorithm whose running time is proportional to the final count, N^3, even though the constant factors differ widely from machine to machine.