How to Solve it by Computer #9

Algorithm 5.2: Given a randomly ordered set of n integers, sort them into non-descending order using the selection sort.

So implementing selection sort. Should be easy, let’s see.

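(defn selection-sort
  "Does selection sort on a collection of numbers and returns
   them as a list in non-descending order"
  [numvector]
  (when (seq numvector)
    (let [minimum (apply min numvector)
          ;; remove only the first occurrence of the minimum,
          ;; so that duplicates are kept
          [before [_ & after]] (split-with #(not= % minimum) numvector)]
      (cons minimum (selection-sort (concat before after))))))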

Easy enough. Not quite as nice as writing it in Haskell, but okay. The idea is to take the smallest element, put it in front, and repeat on the remaining elements until the list is empty. This differs a bit from the ‘traditional’ implementation, which swaps the minimum with the element at the lowest unsorted index, but the idea is the same.
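For comparison, here is a minimal sketch of that ‘traditional’ swap-based variant on a vector. The name selection-sort-swap is made up for this example and doesn't come from the book.

```clojure
;; Sketch only: `selection-sort-swap` is a hypothetical name.
(defn selection-sort-swap
  "Sorts a vector by repeatedly swapping the smallest remaining
   element into the lowest unsorted index."
  [coll]
  (let [n (count coll)]
    (loop [v (vec coll)
           i 0]
      (if (>= i (dec n))
        v
        ;; index of the smallest element in the unsorted part v[i..n)
        (let [min-idx (apply min-key #(nth v %) (range i n))]
          ;; swap v[i] and v[min-idx]
          (recur (assoc v i (nth v min-idx) min-idx (nth v i))
                 (inc i)))))))

(selection-sort-swap [3 1 2 1])
;; => [1 1 2 3]
```

Since Clojure vectors are immutable, each "swap" builds a new vector, so this only mirrors the shape of the in-place algorithm.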

Algorithm 5.6: Sorting by Partitioning. Implement quicksort.

I love quicksort. It’s such an easy and efficient algorithm.

(defn qsort
  "A classical implementation of quicksort"
  [[pivot & unsorted]]
  (when pivot
    (let [smaller #(< % pivot)]
      (lazy-cat
        (qsort (filter smaller unsorted))
        [pivot]
        (qsort (remove smaller unsorted))))))

Look at that beautiful algorithm. The idea is that you take a pivot, often the first element, and build a new list out of the elements smaller than the pivot, the pivot itself, and the elements greater than or equal to the pivot. Then you just apply quicksort to the two sublists. That’s it.
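To make the partitioning concrete, here is a single step written out on its own. This is just a sketch, and partition-step is a name invented for the illustration.

```clojure
;; Sketch only: `partition-step` is a hypothetical helper that shows
;; what one level of the recursion splits the input into.
(defn partition-step
  [[pivot & unsorted]]
  (let [smaller #(< % pivot)]
    {:smaller (filter smaller unsorted)    ; elements below the pivot
     :pivot   pivot
     :rest    (remove smaller unsorted)})) ; elements >= the pivot

(partition-step [5 3 8 5 1])
;; => {:smaller (3 1), :pivot 5, :rest (8 5)}
```

Note that the second 5 lands in :rest, so duplicates of the pivot are kept.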

Algorithm 6.1: Given a set of lines of text of arbitrary length, reformat the text so that no lines of more than n characters are printed. In each output line the maximum number of words that occupy less than n characters should be printed and no word should extend across two lines. Paragraphs should also remain indented.

Good old textwrap. Let’s take a look at an example.

This is a sentence which is quite long

Imagine that we want to have n = 10. In this case the output is

    This is a
    sentence 
    which is
    quite long

We don’t want to break words, which becomes a problem when a single word is longer than n.

(defn wrap-word
  "Returns the sentence as character sequences with maximum length of n.
   Beware: Doesn't cut words, i.e. the longest word must not be
    longer than n, otherwise the loop never terminates"
  [sentence n]
  (let [split-sentence (interleave (clojure.string/split sentence #" ") (repeat " "))]
    (loop [rest-sentence split-sentence
           result nil
           tmp-sentence nil
           current-length 0]
      (if (seq rest-sentence)
        (let [current-word (first rest-sentence)
              word-length (count current-word)]
          (if (<= (+ current-length word-length) n)  
            (recur (next rest-sentence) result (concat tmp-sentence current-word) (+ current-length word-length))
            (recur rest-sentence (conj result tmp-sentence) nil 0)))
        (conj result tmp-sentence)))))

I’m quite tired right now, but here’s a short explanation before going to bed. The function starts by splitting the string into words and interleaving them with single spaces, e.g. “ABC DEF” -> “ABC” “ ” “DEF” “ ”. Afterwards, it does one of two things for each piece:

a) If the temporary sentence plus the next piece is at most n characters long -> append the piece
b) Otherwise -> add the temporary sentence to the output and start a new, empty one

Here’s the example from above:

user=> (map #(apply str %) (wrap-word "This is a sentence which is quite long" 10))
(" " "quite long" "which is " "sentence " "This is a ")

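The result comes out newest line first, because conj onto nil builds a list. Read from back to front, and ignoring the leftover " " entry, it forms: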

    This is a
    sentence 
    which is
    quite long

Exactly like above.
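wrap-word hands back its lines newest first and as character sequences. As a rough sketch of how the same greedy idea could return plain strings in reading order, something like the following should do. The name wrap-lines is made up, paragraph indentation is not handled, and a word longer than n simply gets a line of its own.

```clojure
(require '[clojure.string :as str])

;; Sketch only: `wrap-lines` is a hypothetical name.
(defn wrap-lines
  "Greedily packs the words of sentence into lines of at most n
   characters and returns the lines in reading order."
  [sentence n]
  (let [words (remove empty? (str/split sentence #"\s+"))]
    (if (empty? words)
      []
      (loop [lines []
             line (first words)
             remaining (next words)]
        (if-let [word (first remaining)]
          ;; the extra 1 accounts for the space between line and word
          (if (<= (+ (count line) 1 (count word)) n)
            (recur lines (str line " " word) (next remaining))
            (recur (conj lines line) word (next remaining)))
          (conj lines line))))))

(wrap-lines "This is a sentence which is quite long" 10)
;; => ["This is a" "sentence" "which is" "quite long"]
```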

How to Solve it by Computer #8

Algorithm 4.7: Given a set of n distinct numbers, find the length of the longest monotone increasing subsequence.

Example: [2 9 4 7 3 11 8] -> [2 4 7 11] with length 4

Quite an interesting problem. I thought about it and tried different approaches. Here’s the problem I had, or rather what I neglected: I thought the problem required looking at all subsets. The idea of the algorithm in the book is to find the longest sequence ending at a given element and then compare the lengths when appending the next element.

However, I thought that it wouldn’t work in a case like this:

[2 3 55 4 5 56 57 58 59]

Start with index = 2:
[2 3]
-> largest subset: [2 3]

index = 3:
[2 3 55]
-> largest subset: [2 3 55]

index = 4:
[2 3 55 4]
-> largest subset: [2 3 55] OR
                   [2 3  4]

index = 5:
[2 3 55 4 5]
-> largest subset: [2 3 4 5]

I’m still wondering whether there’s a case in which we throw away a sequence which could work out as the longest one. Sadly, the author didn’t discuss the problem.

I see that there’s an inherent substructure. If we find the longest sequence in the last X numbers, we can lengthen it by prepending more stuff. Hm, thinking about it, that’s a good idea.

Let’s reverse the list:

[59 58 57 56 5 4 55 3 2]

Strangely, this does make more sense. Although, it’s probably equivalent. OK, new code.

We want a function which takes a list and returns a list with the largest subsets. More interesting however are the intermediate steps. We want it to give various things:

a) the current longest subsets
b) the next number

And it should return the longest subsets.

Let’s code. This took a while :D. So, we start with get-lengthy-subsets which returns a list of the largest subsets for each ‘level’.

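;; get-increasing-coll is defined further down
(declare get-increasing-coll)

(defn get-lengthy-subsets
  "Returns a set of vectors with every increasing subsequence
   found, the longest ones included"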
  [collection]
  (loop [longest-subsets #{[]}
         coll collection]
    (if (seq coll)
      (recur
        (into longest-subsets (map #(get-increasing-coll % (first coll)) longest-subsets))
        (next coll))
      longest-subsets)))

Basically very straightforward. We start with a set containing just one empty vector. Then, for each vector in the set, we check whether appending the next number keeps it strictly increasing. We use the function get-increasing-coll for that. If it does, the function returns a new vector with the number added; otherwise it returns the vector unchanged. Because into keeps the old vectors as well, the set ends up holding every increasing subsequence, so it can grow very quickly with the input size.

(defn increasing?
  "Checks if a number is bigger than the last
   element of a collection"
  [coll number]
  (or (empty? coll) (> number (last coll))))


(defn get-increasing-coll
  "Returns a new collection if the number is bigger
   than the last element"
  [coll number]
  (if (increasing? coll number)
    (conj coll number)
    coll))

This function uses increasing?, which checks that the number is bigger than the last element and also works if we start with an empty vector.

The last function, which picks the longest subset, is easy:

(defn get-longest-subset
  "Returns the longest monotone increasing subset"
  [collection]
  (last (sort-by
          count
          (get-lengthy-subsets collection))))

We just sort by the length and select the last one. Done. Finally :D
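If only the length is needed, the "longest sequence for a given end" idea can be sketched without keeping any subsets around: for every position, remember the length of the longest increasing subsequence that ends there. This is my reading of that idea, not the book's code, and lis-length is an invented name.

```clojure
;; Sketch only: `lis-length` is a hypothetical name.
(defn lis-length
  "Length of the longest strictly increasing subsequence of coll."
  [coll]
  (let [xs (vec coll)
        ;; best[i] = length of the longest increasing subsequence
        ;; that ends exactly at xs[i]
        best (reduce
               (fn [best i]
                 (let [candidates (for [j (range i)
                                        :when (< (xs j) (xs i))]
                                    (best j))]
                   (conj best (inc (apply max 0 candidates)))))
               []
               (range (count xs)))]
    (apply max 0 best)))

(lis-length [2 9 4 7 3 11 8])
;; => 4
(lis-length [2 3 55 4 5 56 57 58 59])
;; => 8
```

Nothing gets thrown away here: a shorter sequence such as [2 3 4] stays available through its end position, which is why the 55 in the second example does no harm.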

How to Solve it by Computer #7

Algorithm 3.6: Use the linear congruential method to generate a uniform set of pseudo-random numbers.

This is great and something I had never done before, like most of the stuff here. The method we’re using is the linear congruential method, which should produce uniformly distributed numbers.

The basic idea is to use the following formula to create the next pseudo-random number:

$x_{n+1} = (a x_n + b) \bmod m$ for $n \ge 0$

However, there are specific criteria for each parameter.

Parameter $x_0$: $0 \le x_0 < m$
Parameter m: m >= length of the sequence required; (ax+b) mod m is an integer
Parameter a:
  if m is a power of 2 then a mod 8 = 5; if m is a power of 10 then a mod 200 = 21
  sqrt(m) < a < m - sqrt(m)
  (a-1) should be a multiple of every prime dividing m
  if m is a multiple of 4 then (a-1) should be a multiple of 4
Parameter b: b should be odd and not a multiple of 5

These are a lot of requirements and the easiest thing to do is use existing values. Wikipedia provides common parameters.

The ones provided by the book are:

m = 4096
b = 853
a = 109
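As a quick sanity check, the book's values can be run against the criteria listed above. The helper below is only a sketch: lcg-params-ok? is a made-up name, and it covers just the case where m is a power of 2.

```clojure
;; Sketch only: `lcg-params-ok?` is a hypothetical helper and assumes
;; that m is a power of 2.
(defn lcg-params-ok?
  [a b m]
  (let [root (Math/sqrt m)]
    (and (= 5 (mod a 8))            ; m is a power of 2
         (< root a (- m root))      ; sqrt(m) < a < m - sqrt(m)
         (zero? (mod (dec a) 2))    ; 2 is the only prime dividing m
         (zero? (mod (dec a) 4))    ; m is a multiple of 4
         (odd? b)
         (pos? (mod b 5)))))        ; b is not a multiple of 5

(lcg-params-ok? 109 853 4096)
;; => true
```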

Here’s the code:

(defn gen-pseudo-random
  "Generates a sequence of random integers
   less than 4096."
  [seed]
  {:pre [((complement neg?) seed), (< seed 4096)]}
  (let [a 109
        b 853
        m 4096
        random-number (mod (+ (* a seed) b) m)]
    (lazy-seq
      (cons random-number (gen-pseudo-random random-number)))))

Super straightforward. Let’s see how well it works. We want to compare the average.

Problem 3.6.1: Confirm that the algorithm repeats after generating m random numbers. Compute the mean value and variance for the set of m pseudo-numbers.

user=> (last (take 4096 (gen-pseudo-random 40)))
40
user=> (last (take 4096 (gen-pseudo-random 2345)))
2345

Seems to work.
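Getting the seed back after 4096 steps shows that the sequence repeats, but not that it takes the full 4096 steps to do so. One way to check that as well is to count the distinct values. The sketch below uses its own one-step function, lcg-step, a name invented here, with the book's parameters hard-coded.

```clojure
;; Sketch only: `lcg-step` is a hypothetical name for a single step of
;; the generator with a = 109, b = 853, m = 4096.
(defn lcg-step [x]
  (mod (+ (* 109 x) 853) 4096))

;; The first 4096 values after the seed are all distinct ...
(count (set (take 4096 (rest (iterate lcg-step 40)))))
;; => 4096

;; ... and the value after that repeats the first one.
(= (nth (iterate lcg-step 40) 1)
   (nth (iterate lcg-step 40) 4097))
;; => true
```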

user=> (def pseudo-set (take 4096 (gen-pseudo-random 222)))
#'user/pseudo-set
user=> (average pseudo-set)
4095/2
user=> (variance pseudo-set)
1398101.25

Looks fine. Let’s see how much the average varies from seed to seed.

user=> (dotimes [n 20] (println "Seed:" n "Average:" (average (take 4096 (gen-pseudo-random n)))))
Seed: 0 Average: 4095/2
Seed: 1 Average: 4095/2
Seed: 2 Average: 4095/2
Seed: 3 Average: 4095/2
Seed: 4 Average: 4095/2
Seed: 5 Average: 4095/2
Seed: 6 Average: 4095/2
Seed: 7 Average: 4095/2
Seed: 8 Average: 4095/2
Seed: 9 Average: 4095/2
Seed: 10 Average: 4095/2
Seed: 11 Average: 4095/2
Seed: 12 Average: 4095/2
Seed: 13 Average: 4095/2
Seed: 14 Average: 4095/2
Seed: 15 Average: 4095/2
Seed: 16 Average: 4095/2
Seed: 17 Average: 4095/2
Seed: 18 Average: 4095/2
Seed: 19 Average: 4095/2

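Yup, no variation at all. That’s expected: the generator repeats only after m = 4096 numbers, so every run of 4096 values contains each number from 0 to 4095 exactly once, whatever the seed. By the way, here are the functions for calculating the average and variance: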

(defn average
  "Calculates the average of a collection"
  [coll]
  (/ (reduce + coll) (count coll)))


(defn variance
  "Calculates the variance of a collection"
  [coll]
  (* (/ 1 (count coll))
     (let [my-average (average coll)]
       (reduce +
               (map #(Math/pow (- % my-average) 2) coll)))))

Pretty standard.
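The numbers from the REPL can also be cross-checked without running the generator. For the integers 0 to m-1, each taken once, the mean is (m-1)/2 and the variance is (m^2-1)/12. A small sketch, with uniform-stats being a name made up for this check:

```clojure
;; Sketch only: `uniform-stats` is a hypothetical helper.
;; For the integers 0..m-1, each taken exactly once:
;;   mean     = (m - 1) / 2
;;   variance = (m^2 - 1) / 12
(defn uniform-stats [m]
  {:mean     (/ (dec m) 2)
   :variance (double (/ (dec (* m m)) 12))})

(uniform-stats 4096)
;; => {:mean 4095/2, :variance 1398101.25}
```

Both values match what average and variance returned for pseudo-set.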

How to Solve it by Computer #6

Problem 2.7.1: Design an algorithm that counts the number of digits in an integer.

Again, not a super interesting problem but one which we can solve for free. And I love free stuff.

user=> (count (get-digits 2345))
4
user=> (count (get-digits 890))
3

So is the next one.

Problem 2.7.2: Design an algorithm to sum the digits in an integer.

user=> (reduce + (get-digits 2345))
14
user=> (reduce + (get-digits 890))
17

And this one is good too.

Problem 2.7.3: Design an algorithm that reads in a set of n single digits and converts them into a single decimal integer. For example, the algorithm should convert the set of 5 digits {2,7,4,9,3} to 27493.

user=> (reduce + (map-indexed get-multiplied (reverse [2 7 4 9 3])))
27493

And that’s why it’s cool to make small complete functions without side-effects. Reusability, baby.
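get-digits and get-multiplied come from an earlier post and aren't shown here. The definitions below are assumptions, written only so that the REPL lines above can be reproduced; the originals may well look different.

```clojure
;; Assumed definitions: guesses that match the REPL output above,
;; not the code from the earlier post.
(defn get-digits
  "Digits of a non-negative integer, most significant first."
  [n]
  (if (< n 10)
    [n]
    (conj (get-digits (quot n 10)) (rem n 10))))

(defn get-multiplied
  "Weights a digit by its decimal position: digit * 10^index."
  [index digit]
  (reduce * digit (repeat index 10)))

(reduce + (get-digits 890))
;; => 17
(reduce + (map-indexed get-multiplied (reverse [2 7 4 9 3])))
;; => 27493
```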

Problem 3.4.2: It is possible to implement a sieving algorithm which crosses out each composite (non-prime) number exactly once rather than a number of times. […] Try to implement this algorithm aka. Sieve of Eratosthenes.

The Wikipedia page has a cool animation of this algorithm. It’s a great starting point.

Let’s make this interactive. I start with a list of all integers from 2 to 40:

user=> (def numbers (take 39 (iterate inc 2)))
#'user/numbers
user=> numbers
(2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40)

Now, imagine we know already that 2 is a prime number.

user=> (def primes [2])
#'user/primes
user=> primes
[2]

What we would do now is remove every number which is a multiple of 2, i.e. every number x with x mod 2 = 0.

user=> (remove #(= 0 (rem % (first primes))) numbers)
(3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39)

Neat. Now we just need to find the next prime. We know that the first number left in our list is a prime, in this case 3. And then we start again.

Let’s write a function:

(defn prime-sieve
  "Generates all primes up to n using the Sieve of
   Eratosthenes"
  [n]
  {:pre [(> n 1)]}
  (loop [primes nil
         numbers (take (dec n) (iterate inc 2))]
    (if (seq numbers)
        (recur (conj primes (first numbers))
               (remove #(= 0 (rem % (first numbers))) numbers))
        primes)))

I was surprised how small and easy the function is. Let’s go through it. We start by creating a list of numbers, like before. It starts with 2 and ends with n. We also have an empty collection primes in which we put our primes. We know that the first element of numbers is always a prime, so we can put it in there. The next step is to remove all multiples of that prime from the list. Then we loop again with recur. This runs until there’s no number left in numbers, and the function returns our by now big list of primes. Since conj adds to the front of a list, the primes come out in descending order.

user=> (prime-sieve 20)
(19 17 13 11 7 5 3 2)
user=> (prime-sieve 60)
(59 53 47 43 41 37 31 29 23 19 17 13 11 7 5 3 2)

Neat.
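One caveat: prime-sieve tests every remaining number with rem, which is closer to repeated trial division than to crossing out multiples. Here is a rough sketch of the crossing-out version, where the multiples of each prime are generated directly and collected in a set. sieve-primes is an invented name, and this is the plain sieve, so a number like 12 still gets crossed out more than once.

```clojure
;; Sketch only: `sieve-primes` is a hypothetical name.
(defn sieve-primes
  "Primes up to n in ascending order, found by crossing out the
   multiples of each prime instead of testing remainders."
  [n]
  (loop [p 2
         composite #{}]
    (cond
      ;; every composite <= n has a prime factor p with p*p <= n
      (> (* p p) n) (vec (remove composite (range 2 (inc n))))
      ;; p was crossed out already, so it is not a prime
      (composite p) (recur (inc p) composite)
      ;; p is prime: cross out p*p, p*p + p, ... up to n
      :else (recur (inc p)
                   (into composite (range (* p p) (inc n) p))))))

(sieve-primes 20)
;; => [2 3 5 7 11 13 17 19]
```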