Tuesday, 27 November 2012

A Look at Recursion

Copyright 2012 by Shawn H Corey. Some rights reserved.
Licence under CC BY-SA 3.0

At first, recursion seems to be magic. Break the problem into smaller tasks, call the same subroutine to process those sub-tasks, and voilĂ , problem solved. In truth, it's not quite so simple but all recursion has four steps. Master those steps and you can do recursion with ease.

The Four Steps of Recursion

  1. The first step in any recursion subroutine is to handle the degenerate cases. These are the cases of the simplest problems. If the input set has zero or one element, figuring out what the subroutine has to do is often straight forward. The tricky part, if there is one, happens when a complex data structure must be returned. At this stage, what the data structure is may not be clear and its implementation must wait until later, after the rest of the subroutine is written.

  2. The second step is to break the problem set down into smaller subsets. The simplest way to do this is to reduce the problem by one. Just remove and save one element from the given set.

    Another technique is to divide the given set into two (or more) equal or nearly equal subsets. There are several things to watch out for. The first is to ensure that every element gets placed in one subset. If not, the element will be dropped and the results inaccurate. The second is to ensure that each element gets placed in only one subset. If not, it is duplicated and, again, the results will be inaccurate. The third is to ensure no subset gets all the elements. This often happens when there are only two elements, both get placed in one subset and none in the other; thus creating an infinite loop. The fourth is to ensure that all subsets have at least one element. If not, the subroutine is wasting time and if the results for an empty set is not null, the results will be inaccurate. The fifth is to ensure that no extra null elements are added. This will waste time and often produces inaccurate results.

  3. The third step is to call the same subroutine for each of the subsets and store the partial results.

  4. The final step is to merge the saved element with the partial results or to merge all of the partial results.

Number Sorting

To demonstrate recursion, I have chosen a simple number sort. Here it is with a discussion below.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # --------------------------------------
    #       Name: first_ge
    #      Usage: $index = first_ge( $n, @numbers );
    #    Purpose: Return the index of the first number in the list greater then or
    #             equal to the given number.
    # Parameters:       $n -- given number
    #             @numbers -- ascending sorted list of numbers
    #    Returns:   $index -- first index or just beyond end of list
    #
    sub first_ge {
      my $n       = shift @_;
      my @numbers = @_;

      # set to just beyond end of list
      my $index = @numbers;

      # scan the list
      for my $i ( 0 .. $#numbers ){

        # As $i increases, exit at the first number greater than or equal to $n
        if( $numbers[$i] >= $n ){
          $index = $i;
          last;
        }

      } # end scan

      return $index;
    }

    # --------------------------------------
    #       Name: nsort
    #      Usage: @numbers = nsort( @numbers );
    #    Purpose: Sort a list of numbers using recursion
    # Parameters: @numbers -- unsorted
    #    Returns: @numbers -- ascending sorted
    #
    sub nsort {
      my @numbers = @_;

      # step 1: check degenerate case(s)
      if( @numbers <= 1 ){
        return @numbers;
      }

      # step 2: reduce the problem by one
      my $n = shift @numbers;

      # step 3: use recursion to get the partial results
      @numbers = nsort( @numbers );

      # step 4: combine partial results to get full results
      my $index = first_ge( $n, @numbers );
      splice( @numbers, $index, 0, $n );

      return @numbers;
    }

    # --------------------------------------
    # Main

    # create a list
    my @numbers = ();

    # fill list with random numbers
    for my $count ( 1 .. ( $ARGV[0] || 2 )){
      push @numbers, int( rand( 100 ));
    }

    # before and after images
    print "before: @numbers\n";
    @numbers = nsort( @numbers );
    print "after:  @numbers\n";

first_ge

This subroutine is used to determine where to insert the number in a sorted array. It is used in the 4th step of the recursion. It is a simple linear search. It returns the index of the number that is greater than or equal to the given number. Or it returns one more than the last index of the array if all number in the array are less than the given number. This allows the index to be used directly in a splice to insert the given number into the correct place in the array.

nsort

This subroutine is a numeric, recursive sort. It follows the four steps of recursion.

  1. If the size of the array is one (or zero), simply return it. Obviously, an array with one element is sorted.

  2. Remove and save a number from the array.

  3. Sort the remaining numbers in the array by recursively calling itself.

  4. Determine where the saved number is to be inserted in the sorted array by calling first_ge. Then splice it in.

Main

Finally, the main part of the script creates a random list of numbers for demonstration.

Discussion

This algorithm is not the faster. The first_ge subroutine uses a linear search, so its execution time is O(n). The nsort subroutine is also O(n) since it removes only one element at a time. This means the total execution time is O(n²).

If the linear search in first_ge is replaced with a binary search, O(log₂(n)), then the execution time is O(n·log₂(n)). It is left as an exercise to the reader to do so.

Another Number Sort

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # --------------------------------------
    #       Name: nsort
    #      Usage: @numbers = nsort( @numbers );
    #    Purpose: Sort a list of numbers using recursion
    # Parameters: @numbers -- unsorted
    #    Returns: @numbers -- ascending sorted
    #
    sub nsort {
      my @numbers = @_;

      # step 1: check degenerate case(s)
      if( @numbers <= 1 ){
        return @numbers;
      }

      # step 2: reduce the problem by split the given set into two
      my $midpt = int( @numbers / 2 );

      # step 3: use recursion to get the partial results
      my @partial1 = nsort( @numbers[ 0 .. $midpt-1 ] );
      my @partial2 = nsort( @numbers[ $midpt .. $#numbers ] );

      # step 4: combine partial results to get full results
      # use a merge sort to combine the partial results
      @numbers = ();
      while( @partial1 && @partial2 ){
        if( $partial1[0] <= $partial2[0] ){
          push @numbers, shift @partial1;
        }else{
          push @numbers, shift @partial2;
        }
      }

      # add any remaining
      # at least one of the lists is empty
      push @numbers, @partial1, @partial2;

      return @numbers;
    }

    # --------------------------------------
    # Main

    # create a list
    my @numbers = ();

    # fill list with random numbers
    for my $count ( 1 .. ( $ARGV[0] || 2 )){
      push @numbers, int( rand( 100 ));
    }

    # before and after images
    print "before: @numbers\n";
    @numbers = nsort( @numbers );
    print "after:  @numbers\n";

nsort

This script does not need a helper function, so nsort is the only subroutine.

  1. Again, the first step is to handle the degenerate cases. If the array has one (or zero) elements, just return it.

  2. The second step is to find the midpoint index for the array. Since the number of elements must be 2 or greater, the minimum midpoint must be 1. The midpoint is used to divide the array into two parts of equal size or of sizes that differ by 1.

  3. The third step is to call itself with each partial array and store the results. Care must be taken to ensure:

    1. There is at least one element in each sub-array.

    2. No element is partitioned into more than one sub-array.

    This is done by slicing the array using the midpoint index.

  4. Finally, the two partial results are combined using a merge sort.

    The merge sort works by comparing the two first elements of the partials arrays. It shift the smaller (or equal) of the two off its array and pushing it onto the end of the main array. It keeps this up until at least one of the partial arrays is empty.

    When the loop is done, it executes this command:

          push @numbers, @partial1, @partial2;
    
    1. If the second partial array is empty, the command pushed what's left in the first. Pushing an empty list onto an array does not change the array.

    2. If the first partial array is empty, the command pushed what's left in the second. Pushing an empty list onto an array does not change the array.

Main

Finally, the main part of the script creates a random list of numbers for demonstration.

Discussion

The number of times nsort is called is O(n). For example, if there are 8 elements in the array, it is called once form the main line. The array is divided into two of 4 elements and called twice. The arrays of 4 are divided into arrays of 2 and the subroutine is called 4 times. Finally, these arrays are divided again and subroutine called 8 times. Total: 1 + 2 + 4 + 8 = 15 or 2n - 1 or O(n).

Now, looking at the merge. The last time the subroutine is called is with arrays of 1, so no merge. Before that, merge created 4 arrays of 2 elements. Before that, it created 2 arrays of 4 elements. And finally, it created an array of 8 elements. Total time merge scanned the elements: 8 times per level and there were 3 levels.

This makes the total execution time for this algorithm as O(n·log₂(n)).

Conclusion

So, the magic recursion can be achieve following 4 steps.

  1. Process the degenerate case(s).

  2. Reduce the problem.

  3. Get the partial results

  4. Combine the partial results to get full results.

3 comments:

  1. after reading the announcement on blogs.perl.org I really expected this article here to say something like "I've posted an article on recursion on blog.perl.org" :-)

    domm

    ReplyDelete
  2. In your first example, step 4 could be replaced by:

    @numbers = map {
    (defined $n and $_ >= $n) ? do { my $tmp = $n; undef($n); ($tmp, $_) } : $_
    } @numbers;
    push @numbers, $n if defined $n;

    ... eliminating the need for first_ge.

    ReplyDelete
    Replies
    1. TIMTOWTDI—There Is More Than One Way To Do It ☺. I simply chose a more verbose method.

      Delete