RST Annotation
UAM CorpusTool supports the annotation of texts in terms of Rhetorical
Structure Theory (RST) (Mann & Thompson 1987).
NOTE: This functionality is still rudimentary and will be improved in later releases.
Getting Started
It is assumed you are familiar with how to use UAM CorpusTool.
If not, read through the manual/tutorial.
- Create an RST Layer
First, add a new layer, give it a name (e.g., "RST"), specify it as
"Rhetorical Structure", and then select a mode of automatic segmentation.
Paragraph segmentation should work for any language. Sentence segmentation will work for any
language where a "." terminates a sentence (you may need to correct the mistakes it makes).
- Load in texts: You can either load in new texts or import
texts annotated with RSTTool:
- Importing RSTTool texts: you can load in texts already annotated with
RSTTool (they must be saved in .rs3 format). Under the File menu, select "Import Files",
select "RSTTool", and specify either "Single File" or "Folder of Files". Then select the
file or folder you want imported. On the next pane, select the layer you created in the previous
step. On the pane after that, specify where you want your file(s) stored within your project folder
(a subcorpus name: all texts are assigned to a subcorpus).
- Importing new texts: press the "Extend Corpus" button on the Project window.
See the manual for more details. Then press the "Incorporate" button to make them available for annotation.
- Annotating a text: To open a text for annotation, click on the button next to the file
that is labelled with the name of your RST layer.
A window should open, showing your text.
- Segmenting the text: click at the point in the text where you want to segment.
If you get it wrong, the only way to correct the segmentation at present is to click on the Actions
menu in the Toolbar and select "Show Text". This switches to the usual CorpusTool interface,
where you can change segment boundaries. This will be fixed in later versions.
- Structuring the text: Each segment in the text tree has a blue dot.
Drag the blue dot of one node onto the blue dot of another, dragging from the satellite to the
nucleus, and you will be prompted for the relation type between them.
- Inserting other types of elements: RSTTool supported three types of elements
in addition to rhetorical relations:
- Multinuclear nodes: groupings of text segments in lists, joint, sequence, etc.
(all elements have equal status).
- Schema: basically constituency structure (each element links to a parent node by a named relation, e.g.,
Orientation, Complication, Resolution).
- Span: just a line over a node and all of its satellites, used mainly for graphical
niceness.
To insert any of these nodes over a text node, right click on the blue dot of a node
(its 'handle') and select one of the options from the presented list.
- Unlinking a subtree or node: right click on a node and select "Unlink Node"
to unlink it from its nucleus, multinuclear complex or schema.
- Assigning features to segments: Move between segments with the < and > buttons,
and code as you do with non-RST interfaces (see "Text Annotation" in the menu on the left).
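
To make the linking and unlinking steps above more concrete, the following is a minimal Python sketch of how an RST tree could be modelled. It is an illustration only and does not describe how UAM CorpusTool or RSTTool store their data: the names Node, link and unlink, the field names, and the example sentences are all hypothetical. Dragging a satellite onto a nucleus corresponds to link(), and "Unlink Node" corresponds to unlink().

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical RST tree node: a text segment or an inserted element."""
    node_id: int
    kind: str = "segment"            # "segment", "multinuclear", "schema" or "span"
    text: str = ""                   # empty for inserted (non-text) elements
    relation: Optional[str] = None   # relation linking this node to its parent
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def unlink(node: Node) -> None:
    """Detach a node (and the subtree below it) from its parent, if it has one."""
    if node.parent is not None:
        node.parent.children.remove(node)
    node.parent = None
    node.relation = None


def link(child: Node, parent: Node, relation: str) -> None:
    """Attach child to parent under a named relation, e.g. satellite to nucleus."""
    # Refuse links that would make a node its own ancestor.
    ancestor: Optional[Node] = parent
    while ancestor is not None:
        if ancestor is child:
            raise ValueError("linking would create a cycle")
        ancestor = ancestor.parent
    unlink(child)  # a node has at most one parent
    child.parent = parent
    child.relation = relation
    parent.children.append(child)


# Two segments: the satellite is dragged onto the nucleus.
nucleus = Node(1, text="The tool supports RST annotation.")
satellite = Node(2, text="It will improve in later releases.")
link(satellite, nucleus, "elaboration")

# A multinuclear node groups segments of equal status.
sequence = Node(3, kind="multinuclear")
link(nucleus, sequence, "sequence")
link(Node(4, text="Then annotate the text."), sequence, "sequence")

# Equivalent of choosing "Unlink Node" on the satellite.
unlink(satellite)
```

In this sketch a satellite, a member of a multinuclear node and a schema element are all represented the same way (a child with a named relation to its parent); only the kind of the parent differs. That is a simplification chosen for brevity.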