Turns out the linguistic structure of chemical names is non-trivial. Unfortunately, because the field is also quite profitable, most of the literature seems to be behind paywalls. I'm visiting Bloomington this summer and will have the opportunity to spend some time in the library, so this is one of the things I hope to make some headway on.
In the meantime, here's a paywalled article from the promisingly named Journal of Chemical Information and Modeling. It describes an early version of Name>Struct, a closed-source interpreter for chemical names that strives to understand them the way a human chemist would - that is, it attempts to model actual usage, not just reflect the official definitions of usage. Descriptive, not prescriptive, chemical linguistics.
Anyway, the folks at CambridgeSoft who make Name>Struct have also highlighted some of the pitfalls of chemical linguistics here.
Ah - silly me. A search for "Name>Struct open source" quickly returns OPSIN, an open-source name-to-structure parser that I could probably adapt pretty easily. It's here at BitBucket, and written in (shudder) Java. There's a nifty Web interface here.
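To give a flavor of what "linguistic structure" means for chemical names, here's a toy sketch in Python. It is emphatically not how OPSIN or Name>Struct work; the `STEMS` and `SUFFIXES` tables and the `name_to_smiles` function are names I made up for illustration, and the grammar only covers a handful of unbranched names. The idea is just that a systematic name splits into a stem (how many carbons) and a suffix (what's attached), much like a word splits into morphemes.

```python
# Toy sketch only: handles unbranched names like "propane" or "butanol".
# The tables and function name are illustrative assumptions, not taken
# from OPSIN or Name>Struct.

# Stem -> number of carbon atoms in the chain.
STEMS = {"meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5, "hex": 6}

# Suffix -> SMILES fragment appended to the end of the carbon chain.
SUFFIXES = {"ane": "", "anol": "O"}


def name_to_smiles(name):
    """Split a simple name into stem + suffix and build a SMILES string.

    Returns None for anything outside this tiny grammar.
    """
    name = name.strip().lower()
    # Try longer suffixes first so the most specific match wins.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            if stem in STEMS:
                return "C" * STEMS[stem] + SUFFIXES[suffix]
    return None


if __name__ == "__main__":
    for example in ["propane", "ethanol", "wood alcohol"]:
        print(example, "->", name_to_smiles(example))
```

With this sketch, "propane" comes out as `CCC` and "ethanol" as `CCO`, while "wood alcohol" returns `None`. That last case is the descriptive-versus-prescriptive problem in miniature: a chemist knows it means methanol, but no amount of stem-and-suffix parsing will get you there without a table of names as people actually use them.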