CIS 5520: Advanced Programming

Fall 2024

Problem - XML Transformation

> module Play where 
> -- see http://www.seas.upenn.edu/~cis5520/current/hw/hw02/Play.html
> import XMLTypes ( xml2string, ElementName, SimpleXML(..) )
> import Midsummer ( play )
> import Test.HUnit ( assertFailure, Test(TestCase) )

For this XML transformation problem, you are allowed to import modules from the Haskell base library.

WARNING: this problem requires some design as well as implementation! Make sure that you read all of the instructions (from this point to the end of the module) before starting to code.

This problem involves transforming XML documents. To keep things simple, we will not deal with the full generality of XML, or with issues of parsing. Instead, we will represent XML documents as values of the following simplified type:

data SimpleXML
    = PCDATA  String
    | Element ElementName [SimpleXML]

type ElementName = String

That is, a SimpleXML value is either a PCDATA ("parsed character data") node containing a String, corresponding to a leaf, or else an Element node containing a tag and a list of sub-nodes, corresponding to a branch with arbitrarily many children.
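The type has exactly two constructors, so most functions over SimpleXML are a two-case recursion. Here is a small sketch. It copies the type locally and adds a `deriving` clause so values can be compared and printed; the deriving clause is an assumption, not necessarily what the provided XMLTypes module declares. The hypothetical helper `textContent` collects all of the character data in a document, left to right:

```haskell
-- Local copy of the type from the text; the deriving clause is added here.
data SimpleXML
  = PCDATA String
  | Element ElementName [SimpleXML]
  deriving (Eq, Show)

type ElementName = String

-- | Hypothetical helper: concatenate all character data in document order.
-- A leaf contributes its string; a branch contributes its children's text.
textContent :: SimpleXML -> String
textContent (PCDATA s) = s
textContent (Element _ children) = concatMap textContent children
```

For example, `textContent (Element "p" [PCDATA "Hi", Element "b" [PCDATA "!"]])` evaluates to `"Hi!"`.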

For example, this XML snippet

<body>
  <p>Hello!</p>
  <p>Bye!</p>
</body>

is represented by this Haskell value

Element "body" [
    Element "p" [PCDATA "Hello!"],
    Element "p" [PCDATA "Bye!"  ] ]
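To see how the example value corresponds to XML text, consider a hypothetical pretty-printer `render`. It writes each element as an opening tag, its children, and a closing tag, and adds no whitespace of its own. It is only a stand-in for illustration: the provided `xml2string` may format things differently. Writing childless elements as self-closing tags (like `<br/>`) is an assumption chosen to match the HTML shown later.

```haskell
-- Local copy of the type from the text; the deriving clause is added here.
data SimpleXML
  = PCDATA String
  | Element ElementName [SimpleXML]
  deriving (Eq, Show)

type ElementName = String

-- | Hypothetical renderer (not the course's xml2string); adds no whitespace.
render :: SimpleXML -> String
render (PCDATA s) = s
render (Element name []) = "<" ++ name ++ "/>"   -- childless: self-closing tag
render (Element name children) =
  "<" ++ name ++ ">" ++ concatMap render children ++ "</" ++ name ++ ">"

-- | The example value from the text.
exampleBody :: SimpleXML
exampleBody =
  Element "body"
    [ Element "p" [PCDATA "Hello!"]
    , Element "p" [PCDATA "Bye!"]
    ]
```

Here `render exampleBody` evaluates to `"<body><p>Hello!</p><p>Bye!</p></body>"`. The line breaks and indentation in the XML snippet above are only for readability and are not part of the Haskell value.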

The goal of this exercise is to write a transformation function, 'formatPlay', that takes a play in an XML format specific to plays and converts it to HTML (which is also an XML format). See below for more information about what we expect from your solution, in terms of both behavior and design.

> 
> formatPlay :: SimpleXML -> SimpleXML
> formatPlay = error "implement formatPlay"

The input format is demonstrated by the sample play, the value play imported above from the Midsummer module.

The sample play has the following structure (as it would be written in standard XML syntax):

 <PLAY>
   <TITLE>TITLE OF THE PLAY</TITLE>
   <PERSONAE>
     <PERSONA> PERSON1 </PERSONA>
     <PERSONA> PERSON2 </PERSONA>
     ... -- MORE PERSONAE
   </PERSONAE>
   <ACT>
     <TITLE>TITLE OF FIRST ACT</TITLE>
     <SCENE>
       <TITLE>TITLE OF FIRST SCENE</TITLE>
       <SPEECH>
         <SPEAKER> PERSON1 </SPEAKER>
         <LINE>LINE1</LINE>
         <LINE>LINE2</LINE>
         ... -- MORE LINES
       </SPEECH>
       ... -- MORE SPEECHES
     </SCENE>
     ... -- MORE SCENES
   </ACT>
   ... -- MORE ACTS
 </PLAY>
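Written as a SimpleXML value, a play in this format is just nested Element nodes whose tags come from a small, fixed vocabulary. The sketch below encodes a hypothetical one-scene play. Its title, persona, and line are invented for illustration and are not taken from the sample. It also defines a hypothetical helper `childrenNamed`, which picks out the immediate children that carry a given tag:

```haskell
-- Local copy of the type from the text; the deriving clause is added here.
data SimpleXML
  = PCDATA String
  | Element ElementName [SimpleXML]
  deriving (Eq, Show)

type ElementName = String

-- | A made-up miniature play following the PLAY/PERSONAE/ACT/SCENE layout.
miniPlay :: SimpleXML
miniPlay =
  Element "PLAY"
    [ Element "TITLE" [PCDATA "A Tiny Play"]
    , Element "PERSONAE" [Element "PERSONA" [PCDATA "ALICE"]]
    , Element "ACT"
        [ Element "TITLE" [PCDATA "ACT I"]
        , Element "SCENE"
            [ Element "TITLE" [PCDATA "SCENE I"]
            , Element "SPEECH"
                [ Element "SPEAKER" [PCDATA "ALICE"]
                , Element "LINE" [PCDATA "Hello."]
                ]
            ]
        ]
    ]

-- | Hypothetical helper: the immediate children that are elements with the
-- given tag. PCDATA children are skipped, and a PCDATA leaf has no children.
childrenNamed :: ElementName -> SimpleXML -> [SimpleXML]
childrenNamed tag (Element _ children) =
  [c | c@(Element name _) <- children, name == tag]
childrenNamed _ (PCDATA _) = []
```

Only direct children are inspected. For example, `childrenNamed "SCENE" miniPlay` is empty, because the scene is nested inside the act.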

The output format is demonstrated by the file sample.html, which contains a very basic HTML rendition of the same information as the sample play. You may want to look at it in your favorite browser. The HTML in sample.html has the following structure (with whitespace added for readability). Note that each <br/> tag below should be represented as a br element with no children, i.e. Element "br" [].

  <html>
    <body>
      <h1>TITLE OF THE PLAY</h1>
      <h2>Dramatis Personae</h2>
      PERSON1<br/>
      PERSON2<br/>
      ...
      <h2>TITLE OF THE FIRST ACT</h2>
      <h3>TITLE OF THE FIRST SCENE</h3>
      <b>PERSON1</b><br/>
      LINE1<br/>
      LINE2<br/>
      ...
      <b>PERSON2</b><br/>
      LINE1<br/>
      LINE2<br/>
      ...

      <h3>TITLE OF THE SECOND SCENE</h3>
      <b>PERSON3</b><br/>
      LINE1<br/>
      LINE2<br/>
      ...
    </body>
  </html>
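In SimpleXML terms, each line of the output above is a short run of sibling nodes inside the body element:

- bare text such as PERSON1 is a PCDATA node;
- <b>PERSON1</b> is an Element with a single child;
- <br/> is Element "br" [].

Here is a minimal sketch of how persona and speaker lines could be built, using the hypothetical helper names `br`, `line`, and `speakerLine`:

```haskell
-- Local copy of the type from the text; the deriving clause is added here.
data SimpleXML
  = PCDATA String
  | Element ElementName [SimpleXML]
  deriving (Eq, Show)

type ElementName = String

-- | An HTML line break: an element with no children.
br :: SimpleXML
br = Element "br" []

-- | Hypothetical helper: follow some content with a break, as in "PERSON1<br/>".
line :: SimpleXML -> [SimpleXML]
line x = [x, br]

-- | Hypothetical helper: a bold name plus a break, as in "<b>PERSON1</b><br/>".
speakerLine :: String -> [SimpleXML]
speakerLine name = line (Element "b" [PCDATA name])
```

For instance, `concatMap (line . PCDATA) ["PERSON1", "PERSON2"]` gives `[PCDATA "PERSON1", br, PCDATA "PERSON2", br]`. Returning lists rather than single nodes lets such fragments be concatenated directly into the children of body.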

Your version of formatPlay should add no whitespace beyond what already appears in the textual data of the original XML.

The test below uses your function to generate a file, dat/dream.html, from the sample play. To receive any credit for this problem, the contents of this file after your program runs must be identical, character for character, to dat/sample.html.

Your solution only needs to work for input in the same format as the sample play. You do not need to handle formatting errors in this assignment; we will only test your code on valid input.

> -- | Find the first point where two lists differ and return 
> -- the remaining elements in the two lists.
> firstDiff :: Eq a => [a] -> [a] -> Maybe ([a], [a])
> firstDiff [] [] = Nothing
> firstDiff (c : cs) (d : ds)
>     | c == d = firstDiff cs ds
>     | otherwise = Just (c : cs, d : ds)
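> firstDiff cs ds = Just (cs, ds)
>
> -- | Test the two files character by character, to determine whether
> -- they match. On a mismatch, report up to 20 characters from each file,
> -- starting at the first difference.
> testResults :: String -> String -> IO ()
> testResults file1 file2 = do
>     f1 <- readFile file1
>     f2 <- readFile file2
>     case firstDiff f1 f2 of
>         Nothing -> return ()
>         Just (cs, ds) -> assertFailure msg where
>             msg = "Results differ: '" ++ take 20 cs ++ "' vs '" ++ take 20 ds ++ "'"
>
> -- | Render the sample play with formatPlay, write it to dat/dream.html,
> -- and compare the result with dat/sample.html.
> testXML :: Test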
> testXML = TestCase $ do
>     writeFile "dat/dream.html" (xml2string (formatPlay play))
>     testResults "dat/dream.html" "dat/sample.html"
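The failure message in testResults comes from firstDiff. It returns the unmatched suffixes of both lists starting at the first mismatch, or Nothing when the lists are equal. When one input is a prefix of the other, the suffix for the shorter one is empty. Here is a standalone copy of the function for experimenting in GHCi, with a few results worked out by hand:

```haskell
-- | Copy of firstDiff from the module above.
firstDiff :: Eq a => [a] -> [a] -> Maybe ([a], [a])
firstDiff [] [] = Nothing
firstDiff (c : cs) (d : ds)
  | c == d = firstDiff cs ds
  | otherwise = Just (c : cs, d : ds)
firstDiff cs ds = Just (cs, ds)

-- Expected results:
--   firstDiff "hello" "help" == Just ("lo", "p")
--   firstDiff "abc"   "abc"  == Nothing
--   firstDiff "ab"    "abc"  == Just ("", "c")
```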

Important: The purpose of this assignment is not just to “get the job done”, i.e., to produce the right HTML. A more important goal is to think about good ways to do this job, and jobs like it.

To this end, your solution should be organized into two parts:

  1. a collection of generic XML-transformation functions that have nothing to do with plays, plus

  2. a short piece of code (a single function definition or a collection of short functions) that uses the generic functions to do the particular job of transforming a play into HTML.
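Here is one illustration of this split, and only one of many possible designs. A generic function does a tree-wide job, while the play-specific knowledge lives in plain data passed to it. The hypothetical `renameWith` renames elements according to a lookup table and leaves unlisted tags and all character data untouched. It knows nothing about plays; the table is the only place where tag names appear:

```haskell
-- Local copy of the type from the text; the deriving clause is added here.
data SimpleXML
  = PCDATA String
  | Element ElementName [SimpleXML]
  deriving (Eq, Show)

type ElementName = String

-- | Hypothetical generic helper: rename every element whose tag appears in
-- the table, recursively. Unlisted tags are kept; PCDATA is unchanged.
renameWith :: [(ElementName, ElementName)] -> SimpleXML -> SimpleXML
renameWith _ (PCDATA s) = PCDATA s
renameWith table (Element name children) =
  Element (maybe name id (lookup name table)) (map (renameWith table) children)
```

Renaming alone cannot produce the required HTML. The target output flattens the play's nesting and adds text such as "Dramatis Personae". So treat this as a hint about the style of decomposition (a higher-order traversal parameterized by data), not as part of a solution.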

Obviously, there are many ways to do the first part. The main challenge of the assignment is to find a clean design that matches the needs of the second part. You will be graded not only on correctness (producing the required output), but also on the elegance of your solution and the clarity and readability of your code and documentation. As always, style most definitely counts.

It is strongly recommended that you rewrite this part of the assignment a couple of times: get something working, then step back and see if there is anything you can abstract out or generalize, rewrite it, then leave it alone for a few hours or overnight and rewrite it again. Try to use some of the higher-order programming techniques we’ve been discussing in class.

> -----------------------------------------------------------------------------

Describe how you and your partner worked together on this assignment. Who did what? What parts did you complete separately and what parts did you complete together? Were your contributions even?

> answer2 :: String
> answer2 = undefined