Exercise 2.2. How long does it take to count to 1 billion (ignoring
overflow)? Determine the amount of time it takes the program
    int i, j, k, count = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                count++;
to complete in your programming environment, for N = 10, 100, and
1000. If your compiler has optimization features that are supposed to
make programs more efficient, check whether or not they do so for this
program.
Here is my program, which is stored in /home/libs/dataStr/Ex2_2/main.cc:
// Sedgewick, Exercise 2.2
// Programmer: Brian Howard
// Date: September 1, 2002
#include <iostream>
#include <ctime>
using namespace std;
// Here is a general-purpose timing function.
//
// It takes a pointer to a function which will be run repeatedly
// in order to obtain an accurate estimate of its running time; by
// running for at least 1000 clock ticks or 10 seconds (whichever is
// longer), we should get three digits of accuracy. Choosing 10 seconds
// means that even timing a function that only takes 10 clock cycles,
// on a 4 GHz processor, we will not be able to overflow a 32-bit count.
//
// It also takes a pointer to another function which embodies all of
// the computation we _don't_ want to time; this will include the function
// call and timing loop overhead.
//
// Returns the running time in seconds.
double time_function(void (*f)(), void (*g)()) {
    // First, we'll figure out how many clock ticks to run;
    // use the longer of 1000 ticks and 10 seconds
    clock_t num_ticks = 10 * CLOCKS_PER_SEC;
    if (num_ticks < 1000) num_ticks = 1000;
    // We count how many times the function can be called in at least
    // num_ticks clock ticks (guaranteed to run at least once)
    unsigned long count = 0, n;
    clock_t end_time = clock() + num_ticks;
    do {
        f();
        ++count;
    } while (clock() < end_time);
    // Run it again count times and time it, without all the extra calls to clock()
    clock_t start = clock();
    for (n = 0; n < count; ++n) f();
    clock_t ticks = clock() - start;
    // Now run the overhead function the same number of times and time it
    clock_t start2 = clock();
    for (n = 0; n < count; ++n) g();
    clock_t ticks2 = clock() - start2;
    // Compute the number of seconds each call took (less overhead) and return
    return static_cast<double>(ticks - ticks2) / CLOCKS_PER_SEC / count;
}
// This is the function we want to test for the exercise, for N = 10
void test10() {
    int i, j, k, count = 0;
    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++)
            for (k = 0; k < 10; k++)
                count++;
}
// This is the function we want to test for the exercise, for N = 100
void test100() {
    int i, j, k, count = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 100; j++)
            for (k = 0; k < 100; k++)
                count++;
}
// This is the function we want to test for the exercise, for N = 1000
void test1000() {
    int i, j, k, count = 0;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 1000; j++)
            for (k = 0; k < 1000; k++)
                count++;
}
// Here is a dummy version so we can subtract the loop and function call overhead
void dummy() {
    int i, j, k, count = 0;
}
// Here is the driver
int main() {
    double test_time10, test_time100, test_time1000;
    test_time10 = time_function(test10, dummy);
    cout << "Time taken for N = 10 is " << test_time10 << endl;
    test_time100 = time_function(test100, dummy);
    cout << "Time taken for N = 100 is " << test_time100 << endl;
    test_time1000 = time_function(test1000, dummy);
    cout << "Time taken for N = 1000 is " << test_time1000 << endl;
    return 0;
}
Running this for the given values of N on Jupiter (which has a 1.4 GHz Pentium 4 processor; run cat /proc/cpuinfo for details) produces the following output:
Time taken for N = 10 is 5.45081e-06
Time taken for N = 100 is 0.00455509
Time taken for N = 1000 is 4.39
Turning on optimization with the -O2 flag to g++ produces the following:
Time taken for N = 10 is 1.80492e-06
Time taken for N = 100 is 0.0012358
Time taken for N = 1000 is 1.092
Using Borland 5.02 on the PC in my office, which has a 2 GHz Pentium 4 running Windows XP, produces the following (after correcting for the fact that Borland 5.02 prefers old-style headers, such as time.h):
Time taken for N = 10 is 1.25869e-06
Time taken for N = 100 is 0.000900523
Time taken for N = 1000 is 0.696867
Essentially the same numbers were obtained after turning on optimization
for speed, as well as after disabling all optimizations; one possibility
is that I wasn't changing the optimization mode correctly (since I'm not
very familiar with the Borland compiler).
Visual C++ 6.0 on the PC in my office produces the following:
Time taken for N = 10 is 2.87745e-006
Time taken for N = 100 is 0.00247712
Time taken for N = 1000 is 2.2624
This is running in Debug mode, with no optimizations at all.
Three years ago I ran essentially this same program under Visual C++ 6.0 on
a 350 MHz Pentium 2 running Windows 95, which produced the following output:
Time taken for N = 10 is 2.23923e-005
Time taken for N = 100 is 0.0158786
Time taken for N = 1000 is 15.33
Switching from Debug mode (optimization turned off) to Release mode
(optimization for fastest speed) in Visual C++ produced the following:
Time taken for N = 10 is 0
Time taken for N = 100 is 0
Time taken for N = 1000 is 0
These indicate that the for loops took essentially no time at all (beyond overhead) for any value of N. My guess is that the optimizer figured out that the loops were not doing anything observable (count is never read), so it eliminated them entirely.
Note that in each case, the time taken to count to one billion (N = 1000) was roughly 1000 times the time to count to one million (N = 100), which was in turn roughly 1000 times the time to count to one thousand (N = 10). This is what we expect from an algorithm whose running time is proportional to the final count, N^3, even though the constant factors differ widely from machine to machine.