Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Test ideas

We want to test our transducers better.

Existing tests

  1. Paradigm testing against predefined answers: yaml tests
  2. Tests written in the lexc and twolc code
  3. Testing whether we generate the lemma or not
  4. Tests using the lemma list as gold standard (do we generate the lemma)

Ideas for new tests

Elaborating the test ideas

Test for Multichar Symbols on the lower side

Now and then Multichar Symbols slip through twolc and give “words” like
Suome^Vn instead of the correct Suomeen.

How to test for this:

It should be possible to set up this test language-independently. In case we get
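
A language-independent check for leaked Multichar Symbols could be sketched roughly as follows. This is only an illustration, not an existing GiellaLT test: it assumes the generated surface forms are already available as a list of strings, and it treats any character that is not a Unicode letter or combining mark (plus a small, assumed whitelist) as suspicious. The names ALLOWED_NON_LETTERS, suspicious_chars and find_leaked_forms are hypothetical.

```python
import unicodedata

# Assumption: non-letter characters that may legitimately occur in word forms.
# This whitelist is a guess and would need adjusting (digits, for example,
# are flagged unless they are added here).
ALLOWED_NON_LETTERS = set("-' ’")


def suspicious_chars(form, allowed=ALLOWED_NON_LETTERS):
    """Return the characters in form that are neither letters, combining
    marks, nor explicitly allowed."""
    return [
        ch
        for ch in form
        if not unicodedata.category(ch).startswith(("L", "M"))
        and ch not in allowed
    ]


def find_leaked_forms(forms):
    """Return (form, offending characters) pairs for every form that
    contains at least one suspicious character."""
    report = []
    for form in forms:
        bad = suspicious_chars(form)
        if bad:
            report.append((form, bad))
    return report


if __name__ == "__main__":
    # Example forms taken from the text above.
    generated = ["Suomeen", "Suome^Vn"]
    for form, bad in find_leaked_forms(generated):
        print(f"{form}\t{''.join(bad)}")
```

Because the check relies on Unicode character categories rather than a per-language alphabet, the same script could in principle be run on the generator output of any language. A leaked symbol made up of letters only would go unnoticed, so this catches only symbols marked with characters such as ^.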

Test for phonotactically illegal strings

Example, from fkv (this must be adjusted to a script):