The Statistics Window
Types of Study
The Statistics Window allows you to extract patterns from your corpus.
It allows three basic types of study:
- Describe a Dataset: shows means and counts
for the corpus as a whole, or a specified sub-corpus.
- Compare Two Datasets: produces the statistics
for each specified sub-corpus, and shows which of the differences
between the two sets are statistically significant.
- Compare Multiple Files: shows descriptive statistics for
each of the individual files in your corpus (it is often useful to export
this data for import into a more general statistics package).
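As a rough illustration of what a multi-file study yields, the sketch below computes counts and a mean for each file and writes them out as CSV text, a format most statistics packages can import. The file names, segment lengths and column names are invented for the example; they are assumptions, not the tool's actual export format.

```python
import csv
import io
from statistics import mean

# Hypothetical corpus: file name -> number of words in each segment.
corpus = {
    "text1.txt": [12, 8, 15],
    "text2.txt": [20, 10],
}

def describe_files(files):
    """Return one row of descriptive statistics per file."""
    rows = []
    for name, segment_lengths in sorted(files.items()):
        rows.append({
            "file": name,
            "segments": len(segment_lengths),
            "words": sum(segment_lengths),
            "words_per_segment": round(mean(segment_lengths), 2),
        })
    return rows

def to_csv(rows):
    """Serialise the rows as CSV text (column names are assumptions)."""
    columns = ["file", "segments", "words", "words_per_segment"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(describe_files(corpus)))
```

Each row corresponds to one file, so the resulting table can be loaded directly as a data frame in a general statistics package.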
Aspect of Interest
UAM CorpusTool currently offers two kinds of statistics:
- General Text Statistics: offers general statistics of the
corpus, such as the total number of segments and the number of words per segment.
For English text, it also displays lexical density, pronominal usage, etc.
- Feature Statistics: displays the frequency of usage of the features
with which you have tagged your corpus, as a count and as a mean. When comparing two
datasets, the level of significance is also shown, using both a t-test and a chi-squared test
(see below).
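To give a feel for what a significance level means when two datasets are compared, here is a minimal sketch of a textbook Pearson chi-squared test on a 2x2 table (feature present vs. absent, in each of two sets). It is not taken from UAM CorpusTool: the function name, the absence of a continuity correction and the example counts are all assumptions made for illustration, and the t-test is not shown.

```python
import math

def chi_squared_2x2(count1, total1, count2, total2):
    """Pearson chi-squared test (no continuity correction).

    count1 of total1 units in set 1 carry the feature,
    count2 of total2 units in set 2 carry it.
    Assumes the feature is neither absent from nor present in all units overall.
    """
    observed = [
        [count1, total1 - count1],
        [count2, total2 - count2],
    ]
    grand_total = total1 + total2
    row_totals = [total1, total2]
    col_totals = [count1 + count2, grand_total - count1 - count2]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 is a squared standard normal value,
    # so the p-value can be obtained from the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the feature tags 30 of 100 units in one set, 15 of 100 in the other.
chi2, p = chi_squared_2x2(30, 100, 15, 100)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value suggests that the difference in feature frequency between the two sets is unlikely to be due to chance alone.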
To Perform a Study
- Select the type of study you want to perform (descriptive, comparative or multi-file).
- Select the aspect of interest (general text or features).
- Specify the unit of interest: On the next row of the interface, there is a selector labelled "Unit:".
Use this to choose the kind of unit to be studied. See the section on "Corpus Search" for details on how to use this selector.
- (if comparative) Specify Sets: If you are comparing two datasets, you also need to specify which sub-corpora to compare. On the line
below the Unit selector, there are two selectors, one labelled "Set1" and the other labelled "Set2". Use these
selectors to specify subsets of the data. Most typically, the "Unit" of interest will be a smaller unit, such as a clause
or NP, while Set1 and Set2 will be specified at the level of a larger unit, for instance by features assigned at the whole-document
level (e.g., English vs. Spanish, formal vs. informal, biology vs. chemistry). The software will then contrast
those units of interest which occur within the documents specified by Set1 and Set2.
- Press the "Show" button: the results will be displayed in the window.
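The relationship between the unit of interest and the two sets can be sketched as follows. This is only a conceptual model under stated assumptions: the Segment class, the document tags and the feature names are hypothetical and do not reflect how UAM CorpusTool stores its data.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    document: str                  # the document the segment occurs in
    unit: str                      # e.g. "clause" or "np"
    features: set = field(default_factory=set)

# Hypothetical document-level tags and segment annotations.
document_features = {
    "doc_a": {"english", "formal"},
    "doc_b": {"spanish", "informal"},
}
segments = [
    Segment("doc_a", "clause", {"material"}),
    Segment("doc_a", "clause", {"mental"}),
    Segment("doc_a", "np", {"definite"}),
    Segment("doc_b", "clause", {"material"}),
]

def select(all_segments, unit, set_feature):
    """Units of the given type inside documents tagged with set_feature."""
    return [s for s in all_segments
            if s.unit == unit and set_feature in document_features[s.document]]

def proportion(selected, feature):
    """Share of the selected units that carry the feature."""
    if not selected:
        return 0.0
    return sum(feature in s.features for s in selected) / len(selected)

# Unit: clause; Set1: English documents; Set2: Spanish documents.
set1 = select(segments, "clause", "english")
set2 = select(segments, "clause", "spanish")
print(len(set1), proportion(set1, "material"))
print(len(set2), proportion(set2, "material"))
```

Here the sets are chosen by document-level features, while the statistics are computed over the smaller units (clauses) found inside those documents.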
Interpreting the Results
[TO BE WRITTEN]
Other Views of the Results
[TO BE WRITTEN]
Saving the Results
[TO BE WRITTEN]