Thursday, December 26, 2013

Terminology resources

A bit of a linkdump here, first:

  • WordNet is now at 3.0 on Unix but still at 2.1 on Windows, so the database from the Linux release is probably the more useful one. Interestingly, it's also available in Prolog. The licensing is pretty open these days, which I don't think it used to be; that's welcome news.
  • Here's something called CoreLex.
  • A good overview of the OLIF format.

For termbase storage in Perl, I think I could do worse than a database schema that mirrors OLIF, at least in part. That could be part of a general set of OLIF-handling modules. OLIF is attractive because it's agnostic about how terms are conceptualized, so an OLIF-based module should be able to do something reasonable with essentially any terminological source.
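
To make the schema idea a little more concrete, here is a minimal sketch in Python with SQLite (not Perl, and not a faithful rendering of the OLIF spec). The column names are loosely borrowed from what I understand to be OLIF's key data categories (canonical form, language, part of speech, subject field, semantic reading); the table names, defaults, and helper functions are all my own assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: one row per monolingual entry, keyed on the five
# data categories, plus a link table for cross-language transfers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entry (
    id           INTEGER PRIMARY KEY,
    can_form     TEXT NOT NULL,
    language     TEXT NOT NULL,
    pt_of_speech TEXT NOT NULL,
    subj_field   TEXT NOT NULL DEFAULT 'general',
    sem_reading  TEXT NOT NULL DEFAULT '',
    UNIQUE (can_form, language, pt_of_speech, subj_field, sem_reading)
);
CREATE TABLE IF NOT EXISTS transfer (
    source_id INTEGER NOT NULL REFERENCES entry(id),
    target_id INTEGER NOT NULL REFERENCES entry(id),
    PRIMARY KEY (source_id, target_id)
);
"""


def open_termbase(path=":memory:"):
    """Open (or create) a termbase and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, can_form, language, pt_of_speech,
              subj_field="general", sem_reading=""):
    """Insert an entry if it is new and return its id either way."""
    key = (can_form, language, pt_of_speech, subj_field, sem_reading)
    conn.execute(
        "INSERT OR IGNORE INTO entry "
        "(can_form, language, pt_of_speech, subj_field, sem_reading) "
        "VALUES (?, ?, ?, ?, ?)",
        key,
    )
    row = conn.execute(
        "SELECT id FROM entry WHERE can_form = ? AND language = ? "
        "AND pt_of_speech = ? AND subj_field = ? AND sem_reading = ?",
        key,
    ).fetchone()
    return row[0]


def add_transfer(conn, source_id, target_id):
    """Record that target_id is a translation of source_id."""
    conn.execute(
        "INSERT OR IGNORE INTO transfer (source_id, target_id) VALUES (?, ?)",
        (source_id, target_id),
    )


def translations(conn, can_form, source_lang, target_lang):
    """Return target-language canonical forms linked to a source term."""
    rows = conn.execute(
        "SELECT t.can_form FROM entry s "
        "JOIN transfer x ON x.source_id = s.id "
        "JOIN entry t ON t.id = x.target_id "
        "WHERE s.can_form = ? AND s.language = ? AND t.language = ? "
        "ORDER BY t.can_form",
        (can_form, source_lang, target_lang),
    ).fetchall()
    return [r[0] for r in rows]
```

Keying entries on the whole tuple rather than on the canonical form alone means homographs in different subject fields stay separate, which seems to be the point of mirroring OLIF rather than using a flat two-column glossary.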


Saturday, May 4, 2013


The Perl module Lingua::TreeTagger provides an interface to the TreeTagger program (Win installation here). The only problem with TreeTagger is that commercial use requires a license. That, and the approach isn't good for Hungarian, but you can't have everything. Its output would probably be the best possible intermediate structure for frame-based translation, which I still think should be a valid approach.
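
As a rough illustration of that intermediate structure: with the token and lemma options switched on, TreeTagger writes one token per line as tab-separated form, part-of-speech tag, and lemma, with `<unknown>` where it has no lemma. The sketch below is plain Python that parses such output into triples; it does not call TreeTagger or Lingua::TreeTagger, and the sample lines are made up for illustration rather than real tagger output.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str   # surface form as it appeared in the text
    tag: str    # part-of-speech tag
    lemma: str  # base form, or the surface form if the tagger had none


def parse_tagged(text: str) -> List[Token]:
    """Parse tab-separated form/tag/lemma lines into Token triples."""
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            # Skip blank lines and anything that is not a 3-column row.
            continue
        form, tag, lemma = parts
        if lemma == "<unknown>":
            # Fall back to the surface form when no lemma is known.
            lemma = form
        tokens.append(Token(form, tag, lemma))
    return tokens


# Hypothetical sample in the three-column layout described above.
SAMPLE = (
    "The\tDT\tthe\n"
    "terms\tNNS\tterm\n"
    "are\tVBP\tbe\n"
    "stored\tVVN\tstore\n"
    "in\tIN\tin\n"
    "OLIF\tNP\t<unknown>\n"
    ".\tSENT\t.\n"
)

if __name__ == "__main__":
    for tok in parse_tagged(SAMPLE):
        print(tok.form, tok.tag, tok.lemma)
```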

Saturday, April 27, 2013

Word lists

I really need some kind of principled way to keep track of word lists and terminology. Ideally this would be a full-blown terminology management system with an online component and everything, but it would also serve as a source of word lists.

Here are some good places to start with word lists: