The Statistics Window

Types of Study

The Statistics Window allows you to extract patterns from your corpus. It supports three basic types of study:
  1. Describe a Dataset: shows means and counts for the corpus as a whole, or a specified sub-corpus.
  2. Compare Two Datasets: produces the statistics of each specified sub-corpus, and shows which of the differences between the two sets are statistically significant.
  3. Compare Multiple Files: shows descriptive statistics for each of the individual files in your corpus (it is often useful to export this data for import into a more general statistics package).

Aspect of Interest

UAM CorpusTool currently offers two kinds of statistics:
  1. General Text Statistics: offers general statistics of the corpus, such as the total number of segments and the number of words per segment. For English text, it also displays lexical density, pronominal usage, etc.
  2. Feature Statistics: displays the frequency of usage of the features with which you have tagged your corpus, as a count and as a mean. When comparing two datasets, the level of significance is also shown, using both a t-test and a chi-squared test (see below).
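
To give a feel for what a chi-squared comparison of feature counts involves, here is a minimal Python sketch. It is not UAM CorpusTool's own code: the function name chi_squared_2x2 and the counts are hypothetical, and it assumes a plain Pearson chi-squared test on a 2x2 table without continuity correction, which may differ from what the tool reports.

```python
import math

def chi_squared_2x2(count1, total1, count2, total2):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the two datasets; columns are units with and without the feature.
    Returns (chi2 statistic, p-value) for 1 degree of freedom.
    Assumes every expected cell count is non-zero.
    """
    observed = [
        [count1, total1 - count1],
        [count2, total2 - count2],
    ]
    grand_total = total1 + total2
    row_totals = [total1, total2]
    col_totals = [count1 + count2, grand_total - (count1 + count2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the feature occurs in 30 of 100 units in the first
# dataset and in 50 of 100 units in the second.
chi2, p = chi_squared_2x2(30, 100, 50, 100)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value suggests that the difference in the feature's frequency between the two datasets is unlikely to be due to chance alone.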

To Perform a Study

  1. Select the type of study you want to perform (descriptive, comparative or multi-file).
  2. Select the aspect of interest (general text or features).
  3. Specify the unit of interest: On the next row of the interface, there is a selector labelled "Unit:". Use this to specify the unit of interest. See the section on "Corpus Search" for details on how to use this selector.
  4. (If comparative) Specify sets: If you are comparing two datasets, you also need to specify which sub-corpora to use. On the line below the Unit selector, there are two selectors, one labelled "Set1" and the other labelled "Set2". Use these selectors to specify subsets of the data. Typically, the "Unit" of interest is a smaller unit, such as a clause or NP, while Set1 and Set2 are specified at the level of a larger unit, for instance by features at the whole-document level (e.g., English vs. Spanish, formal vs. informal, biology vs. chemistry). The software then contrasts those units of interest which occur within the documents specified by Set1 and Set2.
  5. Press the "Show" button: the results will be displayed in the window.
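
The logic of a comparative study (units of interest contrasted across two document-level sets) can be sketched in Python as follows. This is only an illustration: the data structures and the names document_features, segments, units_in_set and feature_share are hypothetical and do not reflect how UAM CorpusTool stores or queries annotations.

```python
# Hypothetical in-memory stand-in for an annotated corpus; the names and
# structure below are illustrative assumptions, not UAM CorpusTool's format.
document_features = {
    "a.txt": {"english"},
    "b.txt": {"english"},
    "c.txt": {"spanish"},
}

# Each annotated segment records its file, its unit type, and its features.
segments = [
    {"file": "a.txt", "unit": "clause", "features": {"declarative"}},
    {"file": "a.txt", "unit": "clause", "features": {"interrogative"}},
    {"file": "b.txt", "unit": "clause", "features": {"declarative"}},
    {"file": "b.txt", "unit": "np", "features": {"definite"}},
    {"file": "c.txt", "unit": "clause", "features": {"declarative"}},
    {"file": "c.txt", "unit": "clause", "features": {"imperative"}},
]

def units_in_set(segments, document_features, unit, set_feature):
    """Return segments of the given unit type inside documents tagged with set_feature."""
    return [
        s for s in segments
        if s["unit"] == unit and set_feature in document_features[s["file"]]
    ]

def feature_share(selected, feature):
    """Proportion of the selected units that carry the given feature."""
    if not selected:
        return 0.0
    return sum(feature in s["features"] for s in selected) / len(selected)

# Unit: clause; Set1: documents tagged "english"; Set2: documents tagged "spanish".
set1 = units_in_set(segments, document_features, "clause", "english")
set2 = units_in_set(segments, document_features, "clause", "spanish")

for name, selected in (("Set1", set1), ("Set2", set2)):
    share = feature_share(selected, "declarative")
    print(f"{name}: {len(selected)} clauses, declarative share = {share:.2f}")
```

Here the unit of interest (the clause) is counted only within the documents picked out by each set, which mirrors the division of labour between the "Unit" selector and the two set selectors.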

Interpreting the Results

[TO BE WRITTEN]

Other Views of the Results

[TO BE WRITTEN]

Saving the Results

[TO BE WRITTEN]