CIT 591 Notes on the Lojban assignment
Fall 2013, David Matuszek

Useful methods

The methods required by the assignment all work with Strings. These notes don't change that requirement. However, in many cases it's more convenient to work with "words," as in an Array or List of Strings. Fortunately, Scala provides methods that make it easy to convert between these two representations.

scala> "This is a string.".split(" ")
res0: Array[String] = Array(This, is, a, string.)

scala> val s = "This string has   lots of spaces, \t\t tabs, and \nnewlines"
s: String =
This string has   lots of spaces,                tabs, and
newlines

scala> s.split("\\W+")
res1: Array[String] = Array(This, string, has, lots, of, spaces, tabs, and, newlines)

scala> List("These", "are", "some", "words").mkString(" ")
res3: String = These are some words

scala> List("These", "are", "some", "words").mkString("[before] ", " ", ".")
res4: String = [before] These are some words.

Another useful method is contains:

scala> List("one", "two", "three").contains("two")
res5: Boolean = true

Lojban definitions

Predstring is a PRED or a Predstring PRED
This says that a Predstring can be a single PRED; or, it can be anything you already know to be a Predstring, followed by another PRED. Translated into simpler English, a Predstring is just one or more consecutive PREDs.

To recognize a Predstring: Recognize one or more consecutive PREDs.

To generate a Predstring: There are various ways to do this. One that I particularly like is to generate one PRED, then repeatedly, with 50% probability, generate another PRED. This can generate any number of PREDs, but the probability of getting a really long sequence gets vanishingly small.

Preds is a Predstring or a Preds A Predstring
Translated, this means: Any number of Predstrings, separated by As.

To recognize a Preds: Recognize one Predstring, then so long as it's followed by an A, recognize another.
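
The two recognition rules above can be sketched on a list of words. This is only a sketch: isPred, isA, and the two word sets are hypothetical stand-ins (the assignment has its own rules for what counts as a PRED or an A), and isPredstring and isPreds are names made up for this example.

```scala
// Hypothetical sample vocabulary, standing in for the assignment's real rules.
val samplePreds = Set("sruda", "pakde", "rodmu")
val sampleAs = Set("a", "e")

def isPred(word: String): Boolean = samplePreds.contains(word)
def isA(word: String): Boolean = sampleAs.contains(word)

// Predstring: one or more consecutive PREDs, and nothing else.
def isPredstring(words: List[String]): Boolean =
  words.nonEmpty && words.forall(isPred)

// Preds: one Predstring, then zero or more repetitions of (A Predstring).
def isPreds(words: List[String]): Boolean = {
  val (first, rest) = words.span(isPred)      // longest leading run of PREDs
  if (first.isEmpty) false                    // must begin with a Predstring
  else rest match {
    case Nil => true                          // nothing left over
    case a :: more if isA(a) => isPreds(more) // an A must be followed by more Preds
    case _ => false                           // some other word follows
  }
}
```

For example, isPreds(List("sruda", "e", "pakde", "rodmu")) is true, while isPreds(List("sruda", "e")) is false because nothing follows the A.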

To generate a Preds: You can randomly choose to generate a small number of Predstrings (say, up to 3), but this is again a case where the 50% idea (or any other probability) works well.

Generating forms

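Whenever you want to generate an arbitrary number of things (for example, a Predstring consists of one or more PREDs), the following approach works well: generate one thing; then, with some probability p, generate another, and keep repeating that random choice until it says to stop.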

It turns out that, if p = 1/2, then the average number of things generated is two. This procedure could generate a huge number of things--and you could win the lottery--but it's really not worth considering.
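
Here is one way the "generate one, then maybe another" idea might look in Scala. The helper name oneOrMore, the sample PRED words, and the seed are all made up for this sketch. The number of things generated averages 1/(1 - p), which is where the two comes from when p = 1/2.

```scala
import scala.util.Random

// Generate one thing, then keep generating another with probability p.
// p must be less than 1, or the loop might never stop.
def oneOrMore[T](p: Double, rng: Random)(gen: () => T): List[T] = {
  require(p >= 0.0 && p < 1.0, "p must be in [0, 1)")
  var things = List(gen())                 // always at least one
  while (rng.nextDouble() < p) things = gen() :: things
  things.reverse                           // restore generation order
}

// Hypothetical sample vocabulary; the real PRED words come from the assignment.
val samplePreds = Vector("sruda", "pakde", "rodmu")
val rng = new Random(591)                  // arbitrary seed, for repeatable runs

def randomPred(): String = samplePreds(rng.nextInt(samplePreds.length))

// A Predstring: one or more PREDs with one space between words.
def randomPredstring(): String =
  oneOrMore(0.5, rng)(() => randomPred()).mkString(" ")
```

The same helper should work for Preds: generate one or more Predstrings and put an A between each pair.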

Note: When you generate a sentence to display to the user, put one space between words. When you are trying to recognize what the user types in (over which you have little control!), you should allow any whitespace--that is, not reject a sentence just because they started with a space, or used a tab, or something like that.
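
A small sketch of both halves of that note follows. The names toWords and toSentence are invented here. Trimming first matters because splitting a string that begins with whitespace leaves an empty string at the front of the result.

```scala
// Split user input into words, tolerating leading, trailing,
// and repeated spaces, tabs, and newlines.
def toWords(input: String): List[String] =
  input.trim.split("\\s+").toList.filter(_.nonEmpty)

// Join words for display, with exactly one space between them.
def toSentence(words: List[String]): String = words.mkString(" ")
```

With these, toSentence(toWords("  la \t sruda ")) gives back the tidy string "la sruda", and an all-blank input gives an empty list rather than a list holding one empty word.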

Recognizing forms

There are a lot of ways you might recognize Lojban sentences, but here are two that should work.

Recognizing from the front

Always work from the beginning of the string, and recognize as much as you can, starting with the top-level element (Sentence). For example: To recognize la sruda ba pakde rodmu as a Sentence:
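
One possible shape for front-to-back recognition is sketched below. Each function takes the words not yet used, recognizes one form at the front, and hands back whatever is left over (or None if the form isn't there). This covers only the forms that show up in the examples in these notes. The rules "Predname is LA Predstring" and "Predclaim is Predname BA Preds" are read off the replacement examples rather than taken from the assignment, and the word sets are hypothetical, so check everything against the real grammar.

```scala
// Hypothetical sample vocabulary, standing in for the assignment's real rules.
val samplePreds = Set("sruda", "pakde", "rodmu")
def isPred(w: String): Boolean = samplePreds.contains(w)
def isA(w: String): Boolean = Set("a", "e").contains(w)

// Predstring: consume the longest run of PREDs at the front.
def predstring(ws: List[String]): Option[List[String]] = {
  val (run, rest) = ws.span(isPred)
  if (run.isEmpty) None else Some(rest)
}

// Preds: a Predstring, and so long as an A follows, another Preds.
def preds(ws: List[String]): Option[List[String]] =
  predstring(ws).flatMap {
    case a :: more if isA(a) => preds(more)
    case rest => Some(rest)
  }

// Predname (assumed rule): LA followed by a Predstring.
def predname(ws: List[String]): Option[List[String]] = ws match {
  case "la" :: rest => predstring(rest)
  case _ => None
}

// Predclaim (assumed rule): Predname BA Preds.
def predclaim(ws: List[String]): Option[List[String]] =
  predname(ws).flatMap {
    case "ba" :: rest => preds(rest)
    case _ => None
  }

// A Sentence here is a Predclaim that uses up every word.
def isSentence(input: String): Boolean =
  predclaim(input.trim.split("\\s+").toList) == Some(Nil)
```

With this sketch, isSentence("la sruda ba pakde rodmu") is true: predname uses up la sruda, predclaim sees ba, and preds uses up pakde rodmu, leaving nothing.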

Replacements

Replace the "words" in the sentence with their "parts of speech". Repeat with higher-level concepts. For example,

la sruda e pakde
LA PRED A PRED
LA Predstring A PRED
Predname A PRED → stuck!

la sruda ba pakde rodmu
LA PRED BA PRED PRED
LA Predstring BA Predstring PRED
Predname BA Predstring
Predname BA Preds
Predclaim
Sentence

You do have to be careful with the ordering in this approach. If you replace PRED PRED with PRED Predstring instead of Predstring PRED (for example), you can get stuck and fail to recognize a valid sentence.
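
One way to keep the ordering under control is to hold the rules in a list, always apply the first rule that matches, and start over from the top of the list after every replacement. The sketch below does that for just the forms used in the examples above. The tag function, the rule list, and the names applyRule and reduce are all invented for this sketch, and the rules are read off the examples rather than taken from the full assignment grammar.

```scala
// Hypothetical tagging: only these words from the examples are known.
def tag(word: String): String = word match {
  case "la" => "LA"
  case "ba" => "BA"
  case "a" | "e" => "A"
  case "sruda" | "pakde" | "rodmu" => "PRED"
  case _ => "?"                          // unknown word; no rule will match it
}

// Rewrite rules in priority order: an earlier rule always wins.
val rules: List[(List[String], String)] = List(
  List("Predstring", "PRED") -> "Predstring",
  List("PRED") -> "Predstring",
  List("LA", "Predstring") -> "Predname",
  List("Preds", "A", "Predstring") -> "Preds",
  List("Predstring") -> "Preds",
  List("Predname", "BA", "Preds") -> "Predclaim",
  List("Predclaim") -> "Sentence"
)

// Apply one rule at its leftmost match, if it matches anywhere.
def applyRule(tokens: List[String], rule: (List[String], String)): Option[List[String]] = {
  val (pattern, replacement) = rule
  val i = tokens.indexOfSlice(pattern)
  if (i < 0) None
  else Some(tokens.patch(i, List(replacement), pattern.length))
}

// Apply the first matching rule, then start over; stop when no rule matches.
def reduce(tokens: List[String]): List[String] =
  rules.iterator.map(applyRule(tokens, _)).collectFirst { case Some(next) => next } match {
    case Some(next) => reduce(next)
    case None => tokens
  }

def isSentence(input: String): Boolean =
  reduce(input.trim.split("\\s+").toList.map(tag)) == List("Sentence")
```

Here la sruda ba pakde rodmu reduces all the way to Sentence, while la sruda e pakde gets stuck at Predname A Preds and is rejected. Putting "Predstring PRED" ahead of the plain "PRED" rule is what keeps a run of PREDs from turning into two separate Predstrings.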