Text Visualisation (the Explore Window)
The Explore Window offers several alternatives for visualising patterns in text or corpora.
Source of Data
At the top of the pane are the following widgets; which of them appear depends on whether you select "Text" or "Unit":
- Text or Unit: select "Text" to choose a single document to visualise.
Select "Unit" to specify a search query; the texts or codings covered
by the resulting subcorpus will be visualised. (Note: the "Heat" and "Text Flow" visualisations
are only available when visualising single texts.)
- Subcorpus: if you selected "Text", use this popup to choose
one of your subcorpora (the folders of your corpus).
- File: if you selected "Text", use this popup to choose
a file within the selected subcorpus.
- Unit selection: if you selected "Unit", a unit specification
widget is displayed, as used in the Search and Statistics panes. Use this
to specify the subsection of your corpus to be used as the basis of visualisation.
Aspect of Interest
Select the aspect of your data that you want to visualise:
- Lexical patterns:
  - Word Counts: the words in your designated data are counted, and
    visualisations are based on these simple frequencies.
  - Keywords: the words in your designated data are counted, and their frequencies
    relative to frequencies in a reference corpus are used as the basis of visualisation.
    (See "Keyword Calculation" below.)
  - Subjectivity (English only): words in your data are tagged as strongly, neutrally
    or weakly subjective, and as positive or negative; these tags are used as the basis of visualisation.
  - Phrases (ngrams, lexical bundles): visualise frequently occurring phrases in your designated data.
- Feature Patterns: with this option selected, you will be asked to select a unit
  of interest (e.g., "clause") to see how frequent or key the features below "clause"
  are in the designated corpus you are visualising.
  - Feature Counts: the features in the sub-corpus are counted, and visualisations are based on these simple counts.
  - Key Features: the counts of features in the sub-corpus are compared to those in a reference corpus,
    and the relative difference between them is used in visualisation.
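The lexical counts described above can be illustrated with a short Python sketch. It is not UAMCT's code: the regular-expression tokeniser is a naive assumption, the function names are hypothetical, and keyness here is computed with Kilgarriff's "simple maths" ratio of per-million frequencies plus a smoothing constant, which may differ from the method described under "Keyword Calculation".

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokeniser (an assumption): lowercase runs of ASCII letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_counts(texts: Iterable[str]) -> Counter:
    """Simple frequencies, as in the "Word Counts" aspect."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def ngram_counts(texts: Iterable[str], n: int = 2) -> Counter:
    """Counts of contiguous n-word sequences (phrases), never crossing text boundaries."""
    counts: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def keywords(target: Counter, reference: Counter, smoothing: float = 1.0) -> List[Tuple[str, float]]:
    """Rank target words by (fpm_target + smoothing) / (fpm_reference + smoothing),
    where fpm is frequency per million tokens ("simple maths" keyness)."""
    target_total = sum(target.values())
    reference_total = sum(reference.values())
    scores = {}
    for word, count in target.items():
        fpm_target = count / target_total * 1_000_000
        fpm_reference = (reference.get(word, 0) / reference_total * 1_000_000
                         if reference_total else 0.0)
        scores[word] = (fpm_target + smoothing) / (fpm_reference + smoothing)
    # Highest keyness first; ties keep their first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage: designated data compared with a small reference corpus.
designated = ["the cat sat on the mat", "the cat ran"]
reference = ["the dog sat on the log"]
target_counts = word_counts(designated)
print(target_counts.most_common(2))
print(ngram_counts(designated, 2).most_common(1))
print(keywords(target_counts, word_counts(reference))[:3])
```

Words that are equally frequent in both corpora score close to 1, words over-represented in the designated data score above 1, and the smoothing constant keeps words absent from the reference corpus from causing a division by zero.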
Visualisation Modes
You can choose from various visualisations. Not all visualisations are available for all data aspects.
- Table: data is displayed in a simple table, providing organised access to precise information,
but the patterns in the data are not immediately obvious to the eye.
- Cloud: shows which words or features are most important in a single text or a sub-corpus, compared to
some reference corpus (cf. Wordle, http://www.wordle.net/).
- Text: the data is displayed within the text of the file by changing the colour and font size of the text.
- Heat diagram: allows the user to see how a single variable changes over time, or through a text.
For example, for subjectivity, it shows how subjectivity changes at each point in the text.
- Text Flow: allows you to see how the feature selections in a system change throughout a text.
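The Cloud and Heat diagram modes can be sketched in Python as follows. This is an illustration only: the tiny polarity lexicon, the linear font-size scaling, and the division of the text into equal consecutive segments are all assumptions, not UAMCT's implementation.

```python
from typing import Dict, List

# Hypothetical toy polarity lexicon: +1 positive, -1 negative (NOT UAMCT's subjectivity lexicon).
POLARITY: Dict[str, int] = {"good": 1, "great": 1, "happy": 1, "bad": -1, "awful": -1, "sad": -1}


def cloud_font_sizes(scores: Dict[str, float], min_size: int = 10, max_size: int = 40) -> Dict[str, int]:
    """Cloud: map each item's score (count or keyness) linearly onto a font-size range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    sizes = {}
    for item, score in scores.items():
        # If every score is equal, draw everything at the largest size.
        fraction = (score - low) / span if span else 1.0
        sizes[item] = round(min_size + fraction * (max_size - min_size))
    return sizes


def heat_profile(tokens: List[str], n_segments: int) -> List[float]:
    """Heat diagram: mean polarity in each of n_segments consecutive, roughly equal parts of the text."""
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    scores = [POLARITY.get(token.lower(), 0) for token in tokens]
    profile = []
    for i in range(n_segments):
        start = i * len(scores) // n_segments
        end = (i + 1) * len(scores) // n_segments
        segment = scores[start:end]
        profile.append(sum(segment) / len(segment) if segment else 0.0)
    return profile


print(cloud_font_sizes({"cat": 5.0, "mat": 2.0, "the": 1.0}))
print(heat_profile("a good and great day then a bad and awful night".split(), 2))
```

Each value in the heat profile would be drawn as one coloured cell, so a text that starts positively and ends negatively shifts from one end of the colour scale to the other.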
Reference Corpora
When you select "Keywords" or "Key Features", the keyness (specialness to the designated corpus)
needs to be derived by comparing counts in the designated corpus to some reference corpus.
Currently three options are provided:
- Everything else in project: the part of the corpus not included
in the designated corpus will be used as the reference corpus. For instance,
if you have a corpus with each document tagged as various text types, and
specify the source of data as "editorial", then the reference corpus will be
all the documents not tagged as "editorial".
- Specific Subset of Corpus: you can use a unit specifier to
specify which units of the corpus are to be used as the reference corpus. For instance,
if you have a corpus with each document tagged as various text types, and
specify the source of data as "editorial", then you might specify the reference
corpus as "short stories".
- Other UAMCT project: you will be prompted to select the .ctpr file
of a different project, and the data in that project will be used as the reference
corpus. Note: if you are visualising feature data, the other project must
have the same features as the current one.
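The three reference-corpus options amount to choosing a second set of documents to count. The Python sketch below models a project as a list of hypothetical `Document` records carrying text-type tags; these names are illustrative and do not reflect UAMCT's internal data structures or the .ctpr file format. The feature check mirrors the requirement that another project must have the same features as the current one.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    # Hypothetical record: a file name plus the text types it is tagged with.
    name: str
    text_types: Set[str] = field(default_factory=set)


def select_by_type(docs: List[Document], text_type: str) -> List[Document]:
    """Documents tagged with the given text type (a stand-in for a unit specification)."""
    return [doc for doc in docs if text_type in doc.text_types]


def everything_else(docs: List[Document], designated: List[Document]) -> List[Document]:
    """'Everything else in project': all documents not in the designated corpus."""
    designated_names = {doc.name for doc in designated}  # assumes file names are unique
    return [doc for doc in docs if doc.name not in designated_names]


def check_same_features(current: Set[str], other: Set[str]) -> None:
    """For feature data, the other project must have the same features as the current one."""
    if current != other:
        differing = sorted(current.symmetric_difference(other))
        raise ValueError(f"projects differ in features: {differing}")


project = [
    Document("ed1.txt", {"editorial"}),
    Document("story1.txt", {"short stories"}),
    Document("news1.txt", {"news"}),
]
designated = select_by_type(project, "editorial")
print([d.name for d in everything_else(project, designated)])     # ['story1.txt', 'news1.txt']
print([d.name for d in select_by_type(project, "short stories")])  # ['story1.txt']
```

The first option always uses the complement of the designated data, so the two corpora never overlap; with the second option they can overlap if a unit specification matches documents in both.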