Friday, July 16, 2010

Link dump: interesting natural language modules from CPAN

Algorithm::WordLevelStatistics - finds keywords in generic text. This should be a useful analysis tool for terminology research.

Lingua::Stem - finds stems for a smallish set of languages.

Lingua::StarDict::Gen - generates StarDict dictionaries. (Might be useful.)

Lingua::StarDict itself (2004) and the StarDict project at SourceForge (2007; the console version dates to 2006) - These might be dead, but they're intriguing.

Lingua::YaTeA - extracts noun phrase candidates from a corpus. Definitely to be studied. Seems to have a lot of innards.

Lingua::WordNet - pure Perl WordNet. Apparently. Needs study.

Lingua::Translate - interface to a Web-accessible machine translator, e.g. Babelfish.

Lingua::Sentence - splits text into sentences. Hello, segmentation! Thanks, CPAN!

Text::Ngrams, Text::Ngramize, Algorithm::NGram - n-gram analysis of text. Oh, and Text::WordGrams, too. And maybe Text::Positional::Ngram.
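For a rough idea of what "n-gram analysis" means here, this is a minimal sketch in Python. It does not use or mirror the API of any of the CPAN modules above; the function names (tokenize, word_ngrams, ngram_counts) and the naive regex tokenizer are my own assumptions, just for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word-like tokens (a naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=2):
    """Count how often each word n-gram occurs in the text."""
    return Counter(word_ngrams(tokenize(text), n))


counts = ngram_counts("the cat sat on the mat and the cat slept", n=2)
# Show the most frequent bigrams first.
for gram, freq in counts.most_common(3):
    print(" ".join(gram), freq)
```

The real modules presumably do a good deal more (character n-grams, positional information, and so on), so treat this only as the basic counting idea.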
