Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages.


Adding a new language to the GitHub infrastructure

Language repositories reside in the GiellaLT organisation on GitHub, and new languages should be added there.

Prerequisites

You need the gut tool to add a new language in the intended way.
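Checking that gut is actually installed before you start can save a confusing failure later. The helper below is a minimal sketch in POSIX shell; the function name check_prereqs is hypothetical and not part of gut or the GiellaLT tooling. It only tests whether each tool is on PATH, not which version is installed.

```shell
# Hypothetical helper: report any required tool that is not on PATH.
# With no arguments it checks for gut only, the one prerequisite named
# on this page; other tool names can be passed as arguments.
check_prereqs() {
    if [ "$#" -eq 0 ]; then
        set -- gut
    fi
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing prerequisite: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_prereqs with no arguments returns 0 if gut is found and 1 otherwise, printing the missing tool names on stderr.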

How to add a new language

Follow these instructions.

When done, also add entries in the registry and check the README file.

To make spellers available through Divvun Manager (initially on the nightly channel only), two more steps are needed:

Result

The above steps will create a new directory for the specified language, and populate it with the required makefiles, autoconf files and template source files.
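To confirm that the new directory was populated, a small check like the following can be run. This is a sketch under assumptions: autogen.sh is named on this page, while configure.ac and Makefile.am are the conventional autotools input files and may not match the exact template layout. check_lang_dir is a hypothetical helper.

```shell
# Hypothetical helper: report any expected build file missing from a new
# language directory. ASSUMPTION: configure.ac and Makefile.am are the
# usual autotools inputs; adjust the list to the actual template files.
check_lang_dir() {
    dir=$1
    status=0
    for f in autogen.sh configure.ac Makefile.am; do
        if [ ! -e "$dir/$f" ]; then
            echo "$dir: missing $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_lang_dir lang-LANGCODE prints nothing and returns 0 when all listed files are present.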

Before you can start doing real work, you still need to do one set of preparations:

cd lang-LANGCODE
./autogen.sh
./configure

Now you can start editing the source files. Whenever you want to make sure everything compiles, run make, and run make check to ensure that all defined tests pass. Remember to update the test suites as you enhance the linguistic model!
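The one-time preparation and the repeated compile-and-test cycle can be wrapped in two small shell functions. This is a sketch assuming a POSIX shell and a directory created by the steps above; prepare_lang and build_and_check are hypothetical names, not part of the GiellaLT infrastructure.

```shell
# One-time preparation: generate the configure script and run it.
# Runs in a subshell so the caller's working directory is unchanged.
prepare_lang() {
    (cd "$1" && ./autogen.sh && ./configure)
}

# Repeat after each round of edits: compile first, and run the test
# suite only if compilation succeeded.
build_and_check() {
    (cd "$1" && make && make check)
}
```

Typical use is prepare_lang lang-LANGCODE once, followed by build_and_check lang-LANGCODE after each round of edits. Chaining with && means a failing step stops the sequence, so tests never run against a stale or broken build.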

Setting up the documentation page for the new language

The new language must also be added to the language documentation page. Here we document how to set up language documentation for new languages.

Adding a new language to the $GTBIG/prooftesting dir

The procedure is the same as above, but with a template collection added to the command:

  1. cd $GTHOME/langs
  2. $GTCORE/scripts/new-language.sh LANGCODE [TEMPLATECOLL]

where

The $GTBIG/prooftesting directory contains infrastructure for testing proofing tools for a number of languages. At present only spellers are tested, but more tool types will be added in the future. The prerequisite for testing a speller is:

The command to set up the basic testing infrastructure for a new language is exactly as above, with only one path adjustment:

  1. cd $GTBIG/prooftesting
  2. $GTCORE/scripts/new-language.sh LANGCODE
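Both variants of the new-language.sh call differ only in the directory they run in and in the optional TEMPLATECOLL argument, so they can be wrapped in one function. This is a hypothetical wrapper, not a script shipped in $GTCORE; it assumes $GTCORE is set as elsewhere in this documentation and passes TEMPLATECOLL through unchanged.

```shell
# Hypothetical wrapper around $GTCORE/scripts/new-language.sh.
# Usage: add_language TARGETDIR LANGCODE [TEMPLATECOLL]
add_language() {
    target=${1:-}        # e.g. "$GTHOME/langs" or "$GTBIG/prooftesting"
    langcode=${2:-}
    templatecoll=${3:-}  # optional template collection
    if [ -z "$target" ] || [ -z "$langcode" ]; then
        echo "usage: add_language TARGETDIR LANGCODE [TEMPLATECOLL]" >&2
        return 2
    fi
    if [ -z "${GTCORE:-}" ]; then
        echo "GTCORE must be set" >&2
        return 2
    fi
    (
        cd "$target" || exit 1
        if [ -n "$templatecoll" ]; then
            "$GTCORE/scripts/new-language.sh" "$langcode" "$templatecoll"
        else
            "$GTCORE/scripts/new-language.sh" "$langcode"
        fi
    )
}
```

For example, add_language "$GTHOME/langs" LANGCODE TEMPLATECOLL mirrors the template variant, and add_language "$GTBIG/prooftesting" LANGCODE mirrors the prooftesting steps.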

Result

A new language will be added to the testing infrastructure, ready to be populated with speller lexicons. Add new lexicons in the appropriate places (see the other languages for examples), and you should then be ready to prepare testing for your new language:

cd LANGCODE
./autogen.sh
./configure

If everything is OK, at least one of the speller tests listed at the end of the ./configure output should report yes, after which you can run make. If test data is available in $GTFREE, the tests will run, and after a while the results are written to an XML file.
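The decision described above (run make only if configure reports at least one speller test as yes) can be scripted. This is a sketch under a loud assumption: it expects the relevant configure summary lines to contain the word "speller" and end in "yes", which may not match the real output, so check ./configure for your language and adjust the pattern. run_prooftests is a hypothetical name, and since the location of the XML results is not specified here, the sketch does not try to find them.

```shell
# Hypothetical helper: prepare a prooftesting language directory and run
# make only if configure reports a usable speller test.
# ASSUMPTION: summary lines mention "speller" and end in "yes".
run_prooftests() {
    dir=$1
    (
        cd "$dir" || exit 1
        ./autogen.sh || exit 1
        out=$(./configure) || exit 1
        printf '%s\n' "$out"
        if printf '%s\n' "$out" | grep -i 'speller' | grep -q 'yes[[:space:]]*$'; then
            make
        else
            echo "no speller test enabled; not running make" >&2
            exit 1
        fi
    )
}
```

For example, run_prooftests LANGCODE from within $GTBIG/prooftesting returns non-zero without building when no speller test is enabled.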