Hands-on 3: Transformations. From TAGML to HTML

This notebook provides two examples of the transformation from TAGML to HTML. It assumes that you have already completed the steps in the previous notebook.

That means that Alexandria is running on your local machine, that your local Alexandria repository contains at least one TAGML file, and one or more views.

If this is not the case, go back to the previous notebook and make sure to follow all steps in that notebook.

Introduction

If you have an existing editorial workflow, you're likely to use X-technology tools like XSLT or XQuery for transformation, publication, visualisation or analysis.

You can export your TAGML to a number of existing formats (XML, SVG, PNG and dot) so that you can continue using the publication framework of your choice. As a result, it is easy to integrate Alexandria into an existing workflow.

Before you start, let's take a closer look at the implications and consequences of transforming a TAGML document to another data format.

Envision the Alexandria workflow

[Figure: Workflow of Alexandria]

Let's say that document A in your local repo contains three layers of markup. You may want to export the document as is, but keep in mind that it's rich in information and this has consequences for the export. For example: the more information document A contains, the more visually challenging an SVG file will be.

Reduction of information

Exporting a TAGML document to another format usually implies a reduction of information, because of the rich data structure of TAG and the different types of data it contains. This means you have to make a number of decisions:

  1. What are the properties, strengths, and limitations of the format you are converting to? What are the consequences for the information you are converting? For example: if you are converting to XML, your text will be transformed into a hierarchical tree structure.
  2. What do you want to do with the exported file? Is it going to be a PNG or SVG file for visualisation, an XML file for further analysis or publication, or something else? In short: what are the reasons for converting?

With these considerations in mind, you can make an informed choice about what information you want to see, what information is irrelevant and what format will best suit your purposes.

Step 1. From TAGML to XML

Go to your Jupyter Hub server and open your terminal window in a new tab.

In your terminal, navigate to the root directory in which you're running Alexandria.

Check whether you're in the right place by typing ls. If you see these notebooks and the subdirectories tagml/, views/ and sparql/ you're in the right place. If not, type pwd and compare your current directory with the root directory. You may need to type cd .. to navigate one or two levels up.

Once you're in the root/ directory in which Alexandria is running, you can type alexandria status to check the status of your document(s) and view(s).

In [ ]:
! alexandria status

You'll get an overview of

  • the active view
  • the document(s) registered in Alexandria; their source, date of creation and modification
  • the views registered in Alexandria and the layers or markup elements that each view shows

The previous notebook illustrated how to upload a document and two views to Alexandria, and how to check out a view. If you followed those steps accurately, the output of the alexandria status command will probably look as follows:

[Figure: Overview of Alexandria status command]

If another view is active, run alexandria checkout - to check out the main view:

In [ ]:
! alexandria checkout -

From the folder overview in your Jupyter Hub, you can open the file lighthouse-woolf.tagml in a new tab. This is the master file that is registered in Alexandria as lighthouse-woolf.

Note that the file contains two layers, T and D.

The T-layer represents textual markup (chapters, paragraphs, sentences) and the D-layer represents documentary markup (pages, lines). In XML, the structures of these markup elements would overlap. The layer functionality of TAG ensures that one document can contain potentially overlapping markup structures.
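
To make the notion of overlap concrete, here is a minimal Python sketch. It is not part of Alexandria; the Span type, the overlaps function and the character offsets are all hypothetical and only illustrate why a sentence from the T-layer and a line from the D-layer cannot always be nested in one tree.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A markup element as a character range (hypothetical model, not the TAG data model)."""
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a: Span, b: Span) -> bool:
    """Return True when the spans intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


# Assumed offsets: one sentence (T-layer) that continues past a line break (D-layer).
sentence = Span("s", 0, 40)
line_1 = Span("l", 0, 25)
line_2 = Span("l", 25, 60)

print(overlaps(sentence, line_1))  # line_1 lies entirely inside the sentence
print(overlaps(sentence, line_2))  # the sentence ends in the middle of line_2
```

A pair of spans for which overlaps returns True cannot both be elements of the same XML tree, which is why the export below starts from a view that shows a single layer.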

We are now going to export the TAGML document lighthouse-woolf to XML. Since the XML data structure supports one hierarchy only, we will check out only the markup contained in the T-layer. This means that the resulting XML file will have a hierarchical structure of chapters, paragraphs and sentences.

Switch to the tab with your terminal and run alexandria checkout view-T-layer:

In [ ]:
! alexandria checkout view-T-layer

Go back to the tab with your text editor in which the file lighthouse-woolf.tagml is opened. Refresh the page to see the result of your checkout.

Check that the lighthouse-woolf document now contains the following markup elements only:

  • excerpt
  • title
  • chapter
  • head
  • p
  • s

It no longer contains overlapping structures. This makes it easier to export it to an XML format.

Go back to the tab with your terminal window. Run the command alexandria export-xml lighthouse-woolf -o lighthouse-woolf.xml. With this command, you ask Alexandria to export the document lighthouse-woolf to an XML file called lighthouse-woolf.xml.

In [ ]:
! alexandria export-xml lighthouse-woolf -o lighthouse-woolf.xml
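
Outside a notebook, the same export could be started from a Python script. The sketch below is only an illustration: the helper names export_xml_command and run_if_available are hypothetical, and the only Alexandria arguments used are the ones shown in the cell above.

```python
import shutil
import subprocess
from typing import List, Optional


def export_xml_command(document: str, output: str) -> List[str]:
    """Build the argument list for the export command shown in this notebook."""
    return ["alexandria", "export-xml", document, "-o", output]


def run_if_available(command: List[str]) -> Optional[int]:
    """Run the command and return its exit code, or None if the program is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode


cmd = export_xml_command("lighthouse-woolf", "lighthouse-woolf.xml")
print(" ".join(cmd))
print(run_if_available(cmd))  # None when alexandria is not installed
```

Passing the command as a list avoids shell quoting problems if a document name ever contains spaces.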

Go back to your folder overview and refresh the window.

There is now a new XML document called lighthouse-woolf.xml. Open the document in a new tab. It should look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
    <excerpt work="To The Lighthouse" author="Virginia Woolf" source="http://www.woolfonline.com">
    <title>TIME PASSES</title>
    <chapter n="1">
    <head>I</head>
    <p n="1">
        <s n="1">It grew darker.</s>
        <s n="2">Clouds covered the moon; in the early hours of the morning a thin rain drummed on the roof, and starlight and moonlight and all light on sky and earth was quenched.</s>

        <!-- more text here -->

    </p>
    </chapter>
    </excerpt>
</xml>
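
If you want to verify the export programmatically, a short Python check like the following may help. It is a sketch using only the standard library; the SAMPLE string is an abridged, hypothetical stand-in for lighthouse-woolf.xml, and count_elements is an illustrative helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abridged, hypothetical stand-in for the exported file.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<xml>
    <excerpt work="To The Lighthouse" author="Virginia Woolf">
        <title>TIME PASSES</title>
        <chapter n="1">
            <head>I</head>
            <p n="1">
                <s n="1">It grew darker.</s>
            </p>
        </chapter>
    </excerpt>
</xml>
"""


def count_elements(xml_bytes: bytes) -> Counter:
    """Parse the XML and count how often each element name occurs.

    ET.fromstring raises ET.ParseError when the input is not well-formed.
    """
    root = ET.fromstring(xml_bytes)
    return Counter(element.tag for element in root.iter())


counts = count_elements(SAMPLE.encode("utf-8"))
for tag, number in sorted(counts.items()):
    print(tag, number)
```

To check the real export instead of the sample, read the file as bytes, for example with pathlib.Path("lighthouse-woolf.xml").read_bytes(), and pass the result to count_elements.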

Step 2. From XML to HTML

The output file lighthouse-woolf.xml can be processed with existing, XML-based analysis tools.

By way of illustration, this section demonstrates how the XML file lighthouse-woolf.xml can be transformed to simple HTML output using an XSLT file.

Remember the workflow of Alexandria? In the case of an export, it would look like this:

[Figure: Workflow of Alexandria]

After checking out the document with the view view-T-layer, we exported it to the XML file lighthouse-woolf.xml. We will now convert this XML file to HTML with XSLT.

In your folder overview, you will see a file called lighthouse.xsl. You can open the file in a new tab to inspect it. It's a basic XSLT file that you can adjust or customize if you like.

For the purpose of the workshop, we wrote a simple Python script called transform_xml.py that calls the file lighthouse.xsl and applies the XSL transformation to the XML file you just created.

In [ ]:
# Apply the XSLT file to the XML file and store the result in lighthouse.html
! transform-xml lighthouse-woolf.xml lighthouse.xsl lighthouse.html

Refresh your folder overview again. It should now contain the file lighthouse.html. If you right-click the file, you can open it in a new browser window to check it out.
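
To give an idea of what such a transformation does, here is a minimal sketch that maps the exported elements to HTML. Note the assumptions: this is not the workshop script and it does not run XSLT, because the Python standard library has no XSLT processor. Instead it walks the tree by hand with ElementTree. The TAG_MAP choices and the to_html function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from exported element names to HTML element names.
TAG_MAP = {"title": "h1", "head": "h2", "p": "p", "s": "span"}


def to_html(node: ET.Element) -> str:
    """Render an element recursively, keeping text and tails in document order."""
    inner = escape(node.text or "")
    for child in node:
        inner += to_html(child) + escape(child.tail or "")
    html_tag = TAG_MAP.get(node.tag)
    # Elements without a mapping (xml, excerpt, chapter) contribute only their content.
    return f"<{html_tag}>{inner}</{html_tag}>" if html_tag else inner


# Abridged, hypothetical stand-in for lighthouse-woolf.xml.
SAMPLE = (
    '<xml><excerpt><title>TIME PASSES</title><chapter n="1"><head>I</head>'
    '<p n="1"><s n="1">It grew darker.</s></p></chapter></excerpt></xml>'
)

body = to_html(ET.fromstring(SAMPLE))
print(f"<html><body>{body}</body></html>")
```

An XSLT stylesheet expresses the same kind of mapping declaratively, with one template per element, which is usually easier to maintain than hand-written traversal code once the mapping grows.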