The next major step is to transform existing rules to CNF. We have to do two things:
A) Transform the terminals to new non-terminals
B) Shorten the rule
A) seems easier, so let’s do that first.
Examples: xA, By
In this case, we have to create a new variable – for the sake of simplicity its name will be the name of the terminal in uppercase – and replace it.
(transform-terminals "A -> xA") -> [{:A "XA"}, {:X "x"}]
Before that I need a function which decomposes the string.
(decompose-rule "A -> xA") -> {:A "xA"}
Here’s the code and test: (I edited the input later to directly take a decomposed map)
(defn decompose-rule "Takes a rule in the format A -> B and returns a hash-map containing the decomposed rule." [string] (let [[kw content] (clojure.string/split string #" -> ")] (hash-map (keyword kw) content))) (deftest test-decompose (is (= (decompose-rule "A -> xA") {:A "xA"})) (is (= (decompose-rule "B -> y") {:B "y"})) (is (= (decompose-rule "C -> yBz") {:C "yBz"})))
Ok, now we can concentrate on the right-side of the rules. Want we want the function from before:
(transform-terminals "A -> xA") -> [{:A "XA"}, {:X "x"}]
The idea is pretty basic. We extract each terminal, i.e. lower case, and create a new rule for that, afterwards we take the initial rule and uppercase everything.
(defn transform-terminals "Takes a decomposed map and generates new non-terminals for each terminal" [decomposed] (let [[name rule] (first decomposed)] (conj (map #(hash-map (keyword (clojure.string/upper-case %)) (str %)) (filter #(Character/isLowerCase %) rule)) (hash-map name (clojure.string/upper-case rule))))) (deftest test-transform (is (= (transform-terminals {:A "xA"}) [{:A "XA"} {:X "x"}])) (is (= (transform-terminals {:B "xAy"}) [{:B "XAY"} {:X "x"} {:Y "y"}])))
Now we have a list of rules which may be too long. Let’s take care of that.
For the sake of simplicity I will use variables A to H for existing rules. And everything higher than that for new rules which are needed for shortening the existing one.
Let’s say we have the following transformed rules:
{:A "CHC"} {:B "DABA"}
What we want is:
{:A "CI"} {:I "HC"} {:B "DJ"} {:J "AK"} {:K "BA"}
Here’s the function to do that:
(defn shorten-rule "Takes a transformed rule and returns new ones" [hmap start-index] (let [[rname rule] (first hmap) length (count rule) num-sequence (iterate inc (int start-index)) extra-indexes (apply str (map char (take (- length 2) num-sequence))) end-string (apply str (drop (- length 2) rule))] (loop [rule-string (str (apply str (interleave rule extra-indexes)) end-string) index (name rname) result nil] (if (seq rule-string) (recur (drop 2 rule-string) (second rule-string) (conj result (hash-map (keyword (str index)) (apply str (take 2 rule-string))))) result)))) (deftest test-shorten (is (= (shorten-rule {:A "CHC"} \I) [{:I "HC"} {:A "CI"}])) (is (= (shorten-rule {:B "ABCD"} \L) [{:M "CD"} {:L "BM"} {:B "AL"}])) (is (= (shorten-rule {:C "BABA"} \O) [{:P "BA"} {:O "AP"} {:C "BO"}])))
It works. So, now we need to put everything together.
Continue tomorrow