Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Status and future of Xerox and other FST tools

Presently the Giella infrastructure supports three FST technologies in parallel: Xerox, Hfst and Foma.

Each of them has its strengths and weaknesses, summarised as follows:

Xerox

The standard in all FST work.

Strengths:

Weaknesses:

Source code access

Even though the source code is not released, it is possible to get a license to the source code of the c-fsm library (documented here) by requesting a license via the XLE page. Information and relevant links can be found at the bottom of the project page.

Hfst

A source-code compatible clone of the Xerox tools, based on OpenFST, but with multiple backends (Foma, SFST, OpenFST).

Strengths:

Weaknesses:

Foma

A source-code and command-interface compatible clone of the Xerox xfst tool, developed and maintained by Måns Huldén. It is open source.

Strengths:

Weaknesses:

How to cope with this…

… i.e. the lack of a future for Xerox, the lack of twolc in Foma and the lack of speed in Hfst.
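Since three toolchains are supported in parallel, it can help to check which of them are actually installed on a given machine before deciding which one to build with. The following is a minimal sketch in Python, not part of the Giella infrastructure. It assumes the commonly used executable names (xfst, lexc and twolc for Xerox; hfst-xfst, hfst-lexc and hfst-twolc for Hfst; foma for Foma), and the function names find_toolchains and usable are hypothetical.

```python
import shutil

# Executable names per toolkit. These are assumptions based on the usual
# install names; adjust them for your own setup.
TOOLCHAINS = {
    "Xerox": ["xfst", "lexc", "twolc"],
    "Hfst": ["hfst-xfst", "hfst-lexc", "hfst-twolc"],
    "Foma": ["foma"],  # Foma has no twolc counterpart
}


def find_toolchains(which=shutil.which):
    """Return {toolkit: {tool: path or None}} for each known toolkit."""
    return {
        name: {tool: which(tool) for tool in tools}
        for name, tools in TOOLCHAINS.items()
    }


def usable(report):
    """List the toolkits for which every listed tool was found."""
    return [name for name, tools in report.items() if all(tools.values())]


if __name__ == "__main__":
    report = find_toolchains()
    for name, tools in report.items():
        missing = [tool for tool, path in tools.items() if path is None]
        status = "complete" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```

The lookup function is passed in as a parameter so that the check can be exercised without any of the tools being installed.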

Today’s dual strategy

The future

… is dependent upon

In the short run, we can continue as we do now.