Date: created 1 April 2004
updated Sunday, 17 October 2004
Copyright © 2004 Jason H. Stover. All rights reserved. Permission to copy, store, & view this document unmodified & in its entirety is granted.
The question of when
has a density is
related to the field of Bernoulli convolutions [6]. In this
area, the questions usually relate to the singularity or
nonsingularity of the distribution of a geometric series of the form
where
and the
are independent identically distributed on
or
. Most of the literature in this field
describes conditions for which
has a density and what types of
sets support its distribution function. Unlike the problems previously
treated in this field, the
here are
dependent and take values on a finite set of order
with
probabilities
. For independent
, Garsia
[2] proved a condition
under which
must have a singular distribution function, and his
theorem is extended with minor modifications in section
5 to include a case in which the
's are
stationary and ergodic. Hill and Blanco [3], and
Sugiyama and Huzii [9] showed that for independent
's
there are some values of
for which
has a continuous
density given by polynomial splines. In section 4, we
will see our data match these results though they are dependent.
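To make the notion of a Bernoulli convolution concrete, the following minimal sketch simulates the classical independent case: a geometric series whose terms carry independent random signs. The function names, the parameter names `beta` and `n_terms`, and the truncation of the series after `n_terms` terms are assumptions made for illustration; none of them come from the original analysis, and the sketch does not treat the dependent sequences studied in this paper.

```python
import random

def bernoulli_convolution_sample(beta, n_terms=40, rng=random):
    """Return one draw of sum_{n=1}^{n_terms} beta**n * X_n with X_n = +/-1 i.i.d."""
    total = 0.0
    weight = beta
    for _ in range(n_terms):
        sign = 1 if rng.random() < 0.5 else -1
        total += weight * sign
        weight *= beta
    return total

def fraction_near_zero(beta, half_width, n_samples=20000, seed=0):
    """Estimate P(|Y| < half_width) for the truncated series Y."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_samples)
        if abs(bernoulli_convolution_sample(beta, rng=rng)) < half_width
    )
    return hits / n_samples
```

With `beta = 1/2` the sum is uniformly distributed on [-1, 1], so it has a density. With `beta = 1/3` the sum is supported on a Cantor-type set that excludes the interval (-1/6, 1/6), so `fraction_near_zero(1/3, 0.16)` finds no samples near zero: a simple picture of a singular distribution.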
Let
be the distribution function of
, and
that of
. To maximize the relation of
with the original text, it is
desirable to alter as little text as possible while attempting to
create a nonsingular
. To achieve a non-singular distribution
function for the data shown here, some rare words must be
replaced. This will obviously remove some relevant semantic content
from the text. It may be possible to create a variant of
which alters the text little or not at all, and still gives a
nonsingular
. This improvement is suggested in the final
section. The suggestion stems from the fact that, from the perspective
of the linguistic community, the presentation of the data in this
paper would be considered naive, failing to account for such features
as parts of speech, tenses of verbs, plurality of nouns, and
identification of proper nouns. I hope the novelty of the statistical
approach compensates for this shortcoming.
Section three defines
by permuting, replacing and matching words in
concordances. Section four examines the behavior of
for the
words ``fruit'' and ``door'' in 158 novels. Section five presents a
generalization of Garsia's theorem which partly explains the behavior
seen in the data.
In this procedure, the rarest words must be replaced to reveal a non-degenerate probability distribution of the distances. If we do not replace these words, the distances between phrases will be either very large or very small with probability one. Such dichotomous distances, which correspond to phrases that either match exactly or almost nowhere, cannot tell us about the range of similarity between phrases, so replacement of rare words is necessary. There is a danger of removing too many words, and if this happens, the probability distribution again becomes singular, with most of its mass around 0. We will see that we can replace sufficiently many rare words to give a continuous distribution without replacing enough words to remove all content of the original phrase.
Before defining the distance between phrases, we shall see an example
of the KWIC concordances on which
is based. In this method,
phrases with identical central words are aligned to compare the uses
of the central word. The following example shows concordances for
the word ``fruit,'' taken from the novels A Christmas Carol,
A Portrait of the Artist as a Young Man, A Study in
Scarlet, American Notes and An Old-Fashioned Girl:
| ...sausages, oysters, pies, puddings, | fruit | and punch all vanished... |
| ...as easily as a | fruit | is divested of its... |
| ...to eat of the | fruit | of the forbidden tree.... |
| ...ate of the forbidden | fruit | they would become as... |
| ...or rust stains or | fruit | stains or what are... |
| ...time corrupts the whole | fruit | Will you come with... |
| ...savoury cold meats, and | fruit, | and wine, we started... |
| ...islands where every known | fruit | vegetable and flower is... |
| ...the green and purple | fruit | lay all about us... |
| ...we never saw the | fruit | that Nelly didn't look... |
The concordances above reveal much about the meaning of the word ``fruit.'' We can see this word is surrounded by words related to food, eating, or plants, staining and bright colors. From this, one who did not know what ``fruit'' means could surmise that a fruit is a food produced by a plant. It may be green or purple, and may stain. One might induce from the middle phrase that the word can be used to describe metaphorically something desirable and forbidden. With knowledge of the surrounding words, one could infer a lot from examining these contexts.
Much linguistic literature suggests that humans interpret meanings of words by the contexts in which they are used ([4], [8], [5]).
If context determines the meaning of a word, then a measurement of distance between contexts of that word should have certain properties that relate to its meaning. Among these properties is the probabilistic behavior of the distance, which should give an idea of how far apart one can expect concordances to be. Moreover, if two different words typically are surrounded by different patterns of contextual words, the statistical behavior of the distances between their respective concordances should be different. Any soundly-defined measure of distance will be 0 between identical copies of a phrase, and will increase with a rise in the proportion of non-matching words between the two phrases. In addition, a measure of distance between two phrases should place more weight around the central word, since words closer to the central word are more likely to relate to its meaning.
To distinguish clearly between a lexicon and a corpus (i.e., a collection of text), define a token to be a particular occurrence of a word in the corpus. Word hereafter refers to an element of the lexicon. For example, the phrase ``the Sun and the Moon'' contains four words but five tokens. We can think of a word as a possible value in the sample space and a token as an observed value.
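The distinction between words and tokens can be checked with a few lines of code. This is a minimal sketch; the function name is illustrative, and splitting on whitespace after lowercasing is an assumption about how tokens are delimited.

```python
def count_tokens_and_words(text):
    """Return (number of tokens, number of distinct words) in a text."""
    tokens = text.lower().split()   # every occurrence counts as a token
    words = set(tokens)             # the lexicon keeps each word once
    return len(tokens), len(words)

print(count_tokens_and_words("the Sun and the Moon"))  # (5, 4)
```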
The distance between two phrases is measured as follows: First, denote a phrase by
Then replace all tokens representing ``rare'' words with a common pseudo-token to give a new sequence of words
Denote a second phrase, after replacing these same rare words, by
where both phrases are chosen so that
might be ``sausages, oysters, pies, puddings, fruit and punch all vanished,'' in which case
Let
Then shift the window forward one word, defining
Again, define
,
and
as for the
previous window, and define
Continue this process until we have a sequence
of
Then define the distance between the two phrases to be
If no words were replaced,
would be singular since so many of the tokens represent rare
words. Because of the window used, common words will often match. Most
of the non-matches are caused by the appearance of infrequently-used
words. The large number of low-probability words in a lexicon is known
as Zipf's Law [4], which states that the probability of
an appearance of a word is proportional to the reciprocal of its rank,
i.e., if the word
is the
most commonly used word,
see
. While Zipf's Law does not
perfectly describe the distributions of words [4], it is
a close enough approximation to tell us that there is a
large proportion of the lexicon whose individual members are used
rarely, but in sum these words constitute a large proportion of
tokens, thereby causing many non-matching tokens, even among
phrases with similar meaning. Dropping these words forces more
matches, reducing the distance. At the same time, we want to retain
any common words, especially context-dependent ones, since they are
likely to cause a match in semantically similar phrases.
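The effect of Zipf's Law on the lexicon can be sketched numerically. The code below builds an idealized Zipf distribution, in which the probability of the word of rank r is proportional to 1/r, and reports what fraction of the lexicon is made up of the least-used words that together account for a given fraction of tokens. The function names are illustrative, and the idealized distribution is an assumption: as noted above, real text follows Zipf's Law only approximately.

```python
def zipf_probabilities(lexicon_size):
    """Ideal Zipf probabilities, proportional to 1/rank, for ranks 1..lexicon_size."""
    weights = [1.0 / rank for rank in range(1, lexicon_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def rare_fraction_of_lexicon(probabilities, token_fraction):
    """Fraction of the lexicon formed by the least-used words whose
    combined probability does not exceed token_fraction."""
    mass = 0.0
    count = 0
    for p in sorted(probabilities):  # rarest words first
        if mass + p > token_fraction:
            break
        mass += p
        count += 1
    return count / len(probabilities)

probabilities = zipf_probabilities(16160)
print(rare_fraction_of_lexicon(probabilities, 0.25))
```

Under this idealization, a small share of the tokens is spread over a very large share of the lexicon, which is the property that makes unreplaced rare words produce so many non-matches.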
There is no rule presented here for replacing rare
words. The words chosen for replacement were chosen to give an
apparent density function for
.
The data were created from the corpora as follows. All punctuation was removed
from the novels' text. Plural forms of nouns were treated as distinct
words, as were different tenses of the same verb. The possessive
modification 's was treated as a distinct word. (For a
treatment of the question of what is or is not a word, see
[4]). All phrases
containing ``fruit'' and
``door'' were extracted, and concordances were formed with either
``fruit'' or ``door'' as the central token, surrounded by the leading
and trailing eleven tokens, i.e.
in (3.2). The
window length
was chosen to be
. All phrases were
tokens long.
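The preparation steps described above can be sketched as follows. This is not the author's code, which is written in Perl and Lisp and listed at the end of this document. The function names, the lowercasing of the text, the treatment of hyphens as separators, and the decision to skip occurrences without a full context on both sides are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Split text into tokens, dropping punctuation and treating the
    possessive 's as a token of its own."""
    # Replace every character that is not a letter, digit, apostrophe or
    # whitespace with a space (assumption: hyphens separate tokens).
    text = re.sub(r"[^a-z0-9'\s]", " ", text.lower())
    tokens = []
    for raw in text.split():
        if raw.endswith("'s") and len(raw) > 2:
            tokens.append(raw[:-2].replace("'", ""))
            tokens.append("'s")
        else:
            tokens.append(raw.replace("'", ""))
    return [t for t in tokens if t]

def concordances(tokens, center, context=11):
    """Yield (leading, central, trailing) for each occurrence of `center`
    that has `context` tokens available on both sides."""
    for i, tok in enumerate(tokens):
        if tok == center and i >= context and i + context < len(tokens):
            yield tokens[i - context:i], tok, tokens[i + 1:i + 1 + context]
```

For example, `list(concordances(tokenize("a b c fruit d e f"), "fruit", context=2))` gives one concordance with `["b", "c"]` before the central token and `["d", "e"]` after it.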
The following concordances illustrate how the definition of distance in (3.2) is used for these novels. The first excerpts show two phrases, the first from A Double-Barrel Detective Story by Mark Twain and the second from Notre-Dame de Paris by Victor Hugo.
...with a gripsack handy, with a change in it and my door ajar. For I suspected that the bird would take wing now...
... the wild boar in his lair, pressed tumultuously round the great door, disfigured now and injured by the great battering ram. But...
After replacing the rare words among all the novels, the two phrases appear this way:
...with a -1 -1 with a change in it and my door -1. For I -1 that the -1 would take -1 now....
...the -1 -1 in his -1 -1 -1 round the great door, -1 now and -1 by the great -1 -1 But...
The distance between these two phrases is about 5.08, close to the sample mean for the ``door'' phrases.
The distance between the following phrases was less than 1.8, closer to the minimum for the ``door'' phrases. The phrases were taken from Dr. Jekyll and Mr. Hyde by Robert Louis Stevenson and Mr. Sponge's Sporting Tour by Surtees.
...inseparable friends. On the 12th, and again on the 14th, the door was shut against the lawyer. 'The doctor was confined to the...
...hanging out of the windows, flirting and chatting and ogling, the door was shut, the blinds were down, the shutters closed, and...
Most of these words are replaced with -1, which causes more matches and a correspondingly smaller distance. Also notice the common phrase ``the door was shut'' in both excerpts. This explains why the ``fruit'' concordances have a smaller mean: There are more rare words and fewer repeated phrases surrounding ``fruit'' than surrounding ``door,'' causing more matches after the rare words have been dropped.
There were 655 phrases with ``fruit'' as the central token and 12647
phrases with ``door'' as the central token. Since computing all
possible pairwise distances among the ``door'' concordances would
result in 159 million values, the data were sampled to give 200326
distances computed for the phrases centered on ``door.'' All
distances centered on ``fruit'' were
computed. The coefficient
was 0.8. The ``rare'' words were
defined to be those least-used words which accounted for a fraction of
0.25 of all tokens from the 159 novels. These 0.25 of the tokens were
accounted for by about 0.97 of the 16160 distinct words represented in
the corpus.
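The definition of ``rare'' words given above can be sketched in code. The function names, the alphabetical tie-breaking among equally frequent words, and the rule that the rare words may account for at most (rather than at least) the stated fraction of tokens are assumptions; the pseudo-token -1 follows the excerpts shown earlier.

```python
from collections import Counter

def rare_words(tokens, token_fraction=0.25):
    """Least-used words whose combined count stays within
    token_fraction of all tokens."""
    counts = Counter(tokens)
    budget = token_fraction * len(tokens)
    rare, used = set(), 0
    # Visit words from least to most frequent; ties are broken alphabetically.
    for word, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        if used + n > budget:
            break
        rare.add(word)
        used += n
    return rare

def replace_rare(tokens, rare, pseudo_token="-1"):
    """Replace every token that represents a rare word with the pseudo-token."""
    return [pseudo_token if t in rare else t for t in tokens]

tokens = "the the the the a a b c".split()
print(replace_rare(tokens, rare_words(tokens)))
# ['the', 'the', 'the', 'the', 'a', 'a', '-1', '-1']
```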
In addition, to check the distribution of the distances for phrases centered on different words, the central tokens were replaced with ``fruitdoor'' for a randomly selected 16460 phrases from both ``fruit'' and ``door'' concordances. Distances between these phrases were also computed. The histograms for the three types of distances are shown in Figure 4.1.
[Figure 4.1: Histograms of the distances for the ``fruit,'' ``door,'' and ``fruitdoor'' concordances.]
The replacement of the rare words may have the following interpretation. If no
words are replaced,
will be singular, placing all its mass at high
values, since few tokens will match. If most words are replaced, the
distribution will again be singular, this time with mass close to 0,
since most tokens will match. There is a proportion of words which, if
replaced, will give a nonsingular
. There are some words authors
must use frequently (e.g., ``a,'' ``an,'' ``of,'' etc.). Other, rare
words are more topic dependent (``taste,'' ``peel,'' etc.). Some words
may depend weakly on the topic and appear frequently (e.g.,
``through'' as in through the door). Semantic information for
humans is contained both in the rare words, most of which relate to
the phrase's topic by virtue of their presence in the phrase, and the
``glue'' among those words: The common words in the phrase tell us
about the relation of the central word to other
concepts. Prepositions tell us about placement with respect to other
objects (``through the door'' or ``piece of fruit''),
specifiers tell us whether the central word is a specific instance of
an object (the door) or an unspecified member of a class of
objects (a door). Removal of the rare words is therefore
removal of important content words showing the topic at hand, leaving
words that, when matching tokens from other phrases, indicate
similarity in the relationships around the central token.
The histogram for the ``fruit'' concordances, surprisingly, looks
normally distributed. It does share several features with the normal
distribution, including approximate symmetry about the quartiles and
rate of decay in the tails.
Kolmogorov's test statistic
was 0.0085 for a test with a null hypothesis of
normally distributed
data. This value, though small, is large enough to reject the
hypothesis for such a large sample size. The deviation
from normality is due to a
slight right-skew in the data. Nonetheless, the closeness to normality raises the
question: Are there values of
,
, and the number of
words replaced that will give a normally distributed
? The answer
is not known, but likely to be negative: In [3] and
[9], it was shown that when the
's in the sum are
independent, the density of
is a polynomial spline. In
[3], this spline density resembles a
normal density for some values of
. We have no rigorous
result showing that our dependent
's give a spline density for
, so the statement that the data are never
normally distributed is a conjecture.
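The normality check described above can be sketched as follows. The code computes the largest gap between the empirical distribution function of a sample and a normal distribution function. Fitting the mean and standard deviation from the same sample is an assumption, since the text does not say how the null distribution was specified, and the function names are illustrative. The constant 1.36 is the usual large-sample 5% critical constant for a fully specified null.

```python
import math

def normal_cdf(x, mean, sd):
    """Distribution function of a normal variable with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def kolmogorov_statistic(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        f = normal_cdf(x, mean, sd)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def critical_value_5pct(n):
    """Approximate large-sample 5% critical value for a fully specified null."""
    return 1.36 / math.sqrt(n)
```

If the 655 ``fruit'' phrases give 655·654/2 = 214185 unordered distances, `critical_value_5pct(214185)` is about 0.0029, which is below the reported value of 0.0085 and so agrees with the rejection described above. This comparison is only indicative, because distances that share a phrase are not independent.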
The histograms for ``fruit'' and ``door'' have obvious differences in
center and shape. The histogram for ``door'' has a larger center and
notable right skew, whereas the histogram for ``fruit'' is more
symmetric with a smaller center. This is seen when checking any
relevant statistics: the sample mean for the ``fruit'' concordances is
4.26 while that for ``door'' is 5.08 (with samples this large, the
difference is statistically significant, so a formal test adds
little). The third central moment for ``fruit'' is
while that for ``door'' is -0.11.
The value of
has an important effect on
. If chosen too large, i.e. close to 1, then all words in a
phrase will be weighted approximately equally, and our sequence will
not depend much more on the central tokens than the outlying ones. On
the other hand, if
is
chosen too close to 0, then
will be singular,
in which case we will not see a range of values with different
probabilities.
In Section 5, the following result relating
to the match
probabilities of the tokens will be shown:
For a certain class of stationary
's which take a finite number
of values with probabilities
,
has a singular distribution function if the
's
have entropy less than
. This theorem was proved by
Garsia [2] for independent
's, and can be generalized
with a slight modification.
Let
be the distribution function of
for a
specified
. As the value of
decreases,
will move from a nonsingular distribution placing
positive probability at high values of
to a singular
distribution. This fact is partly explained by Garsia's theorem, since
for a large
,
is smaller than the entropy of
the
. On the other hand, if
is small, the entropy of
the
will fall below the bound given by Garsia's theorem, and
will be
singular. We can see
overcome the entropy
of
to give a singular distribution in the histograms shown
in Figure 4.2. So the interplay between
and the
gives us two competing features of the data: To satisfy our
notion that the tokens close to the center of the phrase are more
important,
should be small, but to give
a nonsingular
distribution,
should be large. The value of
for the
histograms in Figure 4.1 was chosen as a compromise between
these two features.
[Figure 4.2: Histograms of the distances showing a singular distribution.]
Garsia's theorem also tells us the histogram of
should become
concentrated around a few values when the
's in the sum are
concentrated on only a few of their possible values. This happens
whenever the chance of a match between phrases is either too high or
too low. So replacing rare words is necessary to increase the
entropy of the
's, thereby allowing
to have a nonsingular
distribution function. There is no known converse to Garsia's
theorem for dependent
, i.e., we cannot say with certainty that
higher entropy among
the
's will give a nonsingular distribution function for
. We
are speculating on the basis of empirical evidence.
To compare the entropy of the
's to
, the
one-step transition probability matrix for the
's was computed
for the ``fruit'' concordances and is shown in Table
4.1. There is criticism in the linguistic community
of the appropriateness of Markov chains as models for human
language [4], but they are regarded as a useful model in
some cases. Viewing our sequence as a
Markov chain, Table 4.2 shows the
estimated stationary distributions
for the
's in the ``fruit'' and ``door'' concordances.
The estimated entropies
of the
's were 2.020 and 1.93 for ``fruit'' and ``door''
respectively, both well above the lower bound of
given by Garsia's theorem.
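The computation described above, from a one-step transition matrix to a stationary distribution and its entropy, can be sketched as follows. The two-state matrix in the example is hypothetical and is not taken from Table 4.1. The sketch computes the Shannon entropy of the stationary distribution using natural logarithms; whether the reported entropies use this base, and whether they refer to the stationary distribution or to the entropy rate of the chain, are assumptions. The iteration also assumes the chain is irreducible and aperiodic.

```python
import math

def stationary_distribution(P, iterations=1000):
    """Approximate the row vector pi satisfying pi = pi P by repeated multiplication."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def entropy(probabilities):
    """Shannon entropy in nats; zero-probability states contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Hypothetical two-state transition matrix, for illustration only.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi, entropy(pi))
```

For this example the stationary distribution is (5/6, 1/6), which can be checked by hand from the balance equation 0.1·pi[0] = 0.5·pi[1].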
Table 4.2: Stationary distribution
There is a concrete relationship between the higher entropy of the
's for the ``fruit'' concordances, the greater scatter
in its histogram, and our scheme of replacing rare words. Table
4.3 shows the breakdown of replacements by
matching. The higher entropy for the terms of the ``fruit''
concordances results from there being more tokens replaced in these
phrases than in the ``door'' phrases. This causes more matches among
the
for the ``fruit'' concordances, which raises their entropy
and in turn causes a wider spread in the distribution of the
. Also notice that more matches in the ``fruit'' phrases give
them a lower mean distance than that of the ``door'' phrases.
The fact that there are more replaced values in the ``fruit'' concordances than in the ``door'' concordances tells us the word ``fruit'' is surrounded by rare words more often than is the word ``door.'' So the difference in the distribution functions is caused at least partly by a richer variety of context words for ``fruit'' than for ``door.''
Table 4.3: Proportion of matches by replacement
Do the resulting probability distributions relate to semantics in the phrases? Or have the semantic features of the phrases been erased during replacement of rare words, leaving a set of random variables that have little to do with the language? The answer to this question tells us whether the method allows us to see semantic similarity of words, or only to study an interesting, but semantically irrelevant, aspect of the randomness of language.
Though these are philosophical questions, a look at the match types suggests both are partly true. Certainly if too many words are replaced, almost all matches will occur because of the replaced words, erasing the effect of semantic similarity. But in the example, 0.111/(0.111+0.319) = 0.258 of the matches for the ``fruit'' concordances were made with non-replaced words, so roughly a quarter of the similarity between phrases is accounted for by matching among the original tokens. Though replacing rare words removes some or most original meaning, it does not remove all of it.
The third histogram in Figure 4.1 shows another interesting
feature of the
data. This histogram was created by replacing the words ``fruit'' and
``door'' with the artificial word ``fruitdoor,'' and measuring the
distances between the two types of phrases. Since the middle words of
any pair now match because of the bogus word ``fruitdoor,'' we can see
how the distance function behaves when comparing the contexts of two
different words. This method is used to test word-sense disambiguation
methods [4]. One might hope that, for distances measured
between phrases with different central tokens, the
distribution of the
would differ from a
distribution of
between phrases with identical central tokens. If they were
different, we could detect this difference by examining the
distribution function of the distances between the two types of
phrases. One manifestation of this difference in distributions we
might hope for is a higher mean of the distribution containing phrases
of mixed type. The third histogram shows a distribution that is
different from the other two, but its mean is not higher than
both. Let
be the sample mean for the ``fruit'' distances
and
be the sample mean for the ``door'' distances. The
sample mean for the ``fruitdoor'' distances was 4.91, slightly larger
than
. Nevertheless, this distribution does
differ from both the ``fruit'' and ``door'' distributions, which could
be caused by a semantic difference between the two words via their
different contextual words. There is no doubt that much of
the difference in distribution is caused by more replacements in the
``fruit'' phrases. The question of whether this indicates different
meanings between the two words depends on what we mean by ``mean,'' and
that question is still debated.
This section presents a theorem stating sufficient conditions under
which the distribution of
is
singular. The theorem was proved for independent random variables by
Garsia [2]. Let
, and let
be the
distribution function of
. Lemma 5.0.1, coupled with the
lack of assumed independence of the
's in Lemma 5.0.3,
allows us to generalize Garsia's main theorem to include a class of
's which are stationary and ergodic. Unlike the definition of
in Section 3, in this section assume
is a
one-sided sequence with
initial value
chosen from a stationary distribution.
Garsia's theorem gives a sufficient condition for the singularity of
. Unfortunately, there is no known
necessary condition for stationary
. Research toward this result lies
in the field of Bernoulli convolutions [6]. There are some known
circumstances in which
has a density when the
's are
independent [9],[3],[7], but our
's are dependent. Despite the lack
of a necessary condition,
knowing when the data cannot have a density function
is instructive when choosing the number of words to replace to produce
the
.
For completeness, the proofs of all of Garsia's original theorems which rely on Lemma 5.0.1 are reproduced here.
PROOF: First notice
Combining this, 5.1 and 5.3, we have
The following theorems are modified versions of those proved by
Garsia. Their proofs have been modified to account for our stationary
's.
Condition S. There exists a
such that for any integer
and
, there is a set of integers
such that for
some
,
On the other hand, if
slowly enough that
Suppose now condition S is satisfied and that (5.6) is
true. Then we may assume there is an integer
such
that
. If (5.7) is true,
by Lemma 5.0.1, for any
,
which implies
The next lemma does not rest on the assumption of independence, hence requires no modification to apply to our situation. See [1] for a proof.
generates a partition which is finer than the partition generated by the relation
For a given integer
, partition the indices
into two
sets
and
as follows:
is the set of all
such that
and
the complement. Let
Then
Lemma 5.0.3 then implies
forms a singular sequence, so Corollary 5.0.1 implies the singularity of
so assumption (5.13) of Theorem 5.0.1 is met. The stationarity and ergodicity of the
A higher match rate among the unaltered tokens would raise the
entropy of the
, thereby reducing the proportion of replaced
words necessary to give a nonsingular
.
A method which allows partial matching could be employed to this
end. The match rate for verbs could be increased by
allowing partial matching between two tokens if those tokens have the
same infinitive and match either tense or conjugation, e.g., ``has''
and ``had'' have the same infinitive (``to have''), but different
tenses. Also, pronouns could be divided among first, second and third
person subjective and objective cases, giving partial matches among
words such as ``they'' and ``them'' or ``her'' and ``me.''
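The partial-matching idea can be sketched as a scoring function. Everything in this sketch is hypothetical: the small lookup tables, the function name, and the partial score of 0.5 are illustrative choices, and verbs are simplified to match partially whenever they share an infinitive.

```python
# Hypothetical lookup tables, for illustration only.
VERB_INFINITIVE = {
    "has": "have", "had": "have", "have": "have",
    "is": "be", "was": "be", "were": "be",
}
PRONOUN_FEATURES = {  # token -> (person, case)
    "i": ("first", "subjective"),
    "me": ("first", "objective"),
    "she": ("third", "subjective"),
    "her": ("third", "objective"),
    "they": ("third", "subjective"),
    "them": ("third", "objective"),
}

def match_score(a, b, partial=0.5):
    """Return 1.0 for identical tokens, `partial` for a partial match, else 0.0."""
    if a == b:
        return 1.0
    # Verbs: partial match when both tokens share an infinitive.
    if a in VERB_INFINITIVE and b in VERB_INFINITIVE:
        if VERB_INFINITIVE[a] == VERB_INFINITIVE[b]:
            return partial
    # Pronouns: partial match when person or case agrees.
    if a in PRONOUN_FEATURES and b in PRONOUN_FEATURES:
        person_a, case_a = PRONOUN_FEATURES[a]
        person_b, case_b = PRONOUN_FEATURES[b]
        if person_a == person_b or case_a == case_b:
            return partial
    return 0.0
```

Under these assumptions, ``has'' and ``had'' score 0.5, as do ``they'' and ``them'' (same person) and ``her'' and ``me'' (same case), while unrelated tokens score 0.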
| A Christmas Carol | A Double-Barrel Detective Story |
| A Portrait of the Artist as a Young Man | A Sentimental Journey through France and Italy |
| A Study in Scarlet | Alice's Adventures in Wonderland |
| American Notes | An Old Fashioned Girl |
| Anna Karenina | Around the World in 80 Days |
| Barchester Towers | Barnaby Rudge |
| Billy Budd | Black Beauty |
| Bleak House | Brave New World |
| Bruno's Revenge and other Stories | Confessions of an English Opium-Eater |
| Crime and Punishment | David Copperfield |
| Dead Souls | Dombey and Son |
| Dr. Jekyll and Mr. Hyde | Dracula |
| Dubliners | Eight Cousins |
| Emma | Erewhon |
| Far from the Madding Crowd | Frankenstein |
| Good Wives | Great Expectations |
| Guide to Fiction | Gulliver's Travels |
| Hard Times | His Last Bow |
| Huckleberry Finn | Ivanhoe |
| Jane Eyre | Joseph Andrews |
| Jude the Obscure | Kidnapped |
| Kim | King Solomon's Mines |
| Lady Chatterley's Lover | Lady Susan |
| Lavengro | Little Dorrit |
| Little Women | Lord Jim |
| Lorna Doone | Madame Bovary |
| Mansfield Park | Martin Chuzzlewit |
| Martin Eden | Middlemarch |
| Mill on the Floss | Moby Dick |
| Moll Flanders | Moonfleet |
| Moonstone | Mr. Midshipman Easy |
| Mr. Sponge's Sporting Tour | Nicholas Nickleby |
| Northanger Abbey | Nostromo |
| Notre-Dame de Paris | Of Human Bondage |
| Oliver Twist | Omoo |
| Our Mutual Friend | Persuasion |
| Peter Pan | Peter Pan in Kensington Gardens |
| Phantom of the Opera | Pollyanna |
| Pride and Prejudice | Prince Otto |
| Rob Roy | Robinson Crusoe |
| Sense and Sensibility | She |
| Shirley | Silas Marner |
| Sons and Lovers | Stalky and Company |
| Stories from the Bible | Sylvie and Bruno |
| Sylvie and Bruno Concluded | Tale of Two Cities |
| Tales from Shakespeare | Tales of Mystery and Imagination |
| Tess of the d'Urbervilles | The Adventures of Sherlock Holmes |
| The Adventures of Tom Sawyer | The Age of Innocence |
| The Aspern Papers |
| The Brothers Karamazov | The Call of the Wild |
| The Castle of Otranto | The Dynamiter |
| The Expedition of Humphry Clinker | The Heart of Darkness |
| The History of Rasselas Prince of Abyssinia | The Hound of the Baskervilles |
| The Jungle Book | The Last of the Mohicans |
| The Life and Opinions of Tristram Shandy Gent | The Man Upstairs |
| The Mayor of Casterbridge | The Memoirs of Sherlock Holmes |
| The Old Curiosity Shop | The Pickwick Papers |
| The Picture of Dorian Gray | The Pilgrim's Progress |
| The Portrait of a Lady | The Prairie |
| The Prisoner of Zenda | The Rainbow |
| The Red Badge of Courage | The Scarlet Letter |
| The Scarlet Pimpernel | The Sea-Wolf |
| The Secret Agent | The Sign of Four |
| The Tenant of Wildfell Hall | The Three Musketeers |
| The Turn of the Screw | The Valley of Fear |
| The Vicar of Wakefield | The Virginian |
| The Warden | The Water Babies |
| The Way of All Flesh | The Werewolf |
| The Woman in White | Three Men in a Boat |
| Through the Looking Glass | Tom Brown's School Days |
| Tom Jones | Tommy and Co. |
| Treasure Island | Typee |
| Ulysses | Uncle Tom's Cabin |
| Under Western Eyes | Valperga |
| Vanity Fair | Vathek an Arabian Tale |
| Villette | War and Peace |
| Washington Square | Westward Ho! |
| What Katy Did Next | White Fang |
| Wives and Daughters | Women in Love |
| Wuthering Heights |
The entire source code may be downloaded as http://lisp-p.org/conc/conc.cpio.bz2 or as a tar-ball from http://lisp-p.org/conc/conc.tar.gz.
The individual source code files may be browsed at http://lisp-p.org/conc/src/. The files are:
| permissions | links | size (octets) | modification time | filename |
| -rw-r--r-- | 1 | 1778900 | Oct 3 15:24 | concordances.lisp |
| -rw-r--r-- | 1 | 157517 | Oct 3 15:24 | concordances.txt |
| -rw-r--r-- | 1 | 2287 | Oct 3 15:24 | dependence.pl |
| -rw-r--r-- | 1 | 1020 | Oct 3 15:24 | door_by_corpus.pl |
| -rw-r--r-- | 1 | 975 | Oct 3 15:24 | door_r_macro.pl |
| -rw-r--r-- | 1 | 1413 | Oct 3 15:24 | fixlisp.pl |
| -rw-r--r-- | 1 | 947 | Oct 3 15:24 | fruitdoor_r_macro.pl |
| -rw-r--r-- | 1 | 978 | Oct 3 15:24 | fruit_r_macro.pl |
| -rw-r--r-- | 1 | 3819 | Oct 3 15:24 | get-concordance.pl |
| -rw-r--r-- | 1 | 992 | Oct 3 15:24 | get-concordances.sh |
| -rw-r--r-- | 1 | 2947 | Oct 3 15:24 | get-sentences.pl |
| -rw-r--r-- | 1 | 7371 | Oct 3 15:24 | process-data.pl |
| -rw-r--r-- | 1 | 1450 | Oct 3 15:24 | p-val-hash.pl |
| -rw-r--r-- | 1 | 1121 | Oct 3 15:24 | r_macro.pl |
| -rw-r--r-- | 1 | 1046 | Oct 3 15:24 | runit.sh |
| -rw-r--r-- | 1 | 1340 | Oct 3 15:24 | sample.pl |
| -rw-r--r-- | 1 | 1190578 | Oct 3 15:24 | wordhash_pvalues.lisp |
| -rw-r--r-- | 1 | 19665 | Oct 3 15:24 | wordsense.lisp |
Gene Michael Stover 2008-04-20