Language Technology at UiT The Arctic University of Norway

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


August 2006 gathering


Børre, Maaren, Sjur, Thomas, Tomi, Lene, Trond, Saara


Commented topics

Worst-case scenario: the name project fails, takes too long, etc., and we have to build separate name lexica in traditional lexc format for each language. In the best case, we get it up and running in a reasonable amount of time.

Time table

| Time | Tuesday | Wednesday | Thursday |
|---|---|---|---|
| 8:30-10:00 | A Presentation (9:00->) | T lexc2Xspell | Machine update + planning |
| | | L Consequences of eval | Aligner (Trond, Saara) |
| 10:00-10:30 | A Reports, plans | Coffee | A Coffee |
| 10:30-12:00 | a Polderland | T lexc2Xspell | - |
| | a Polderland | L Consequences of eval | - |
| 12:00-13:00 | A Lunch | A Lunch | A Lunch |
| 13:00-14:00 | A G3/Howto/m4/Wiki | A Name lexicon 1 | (exit Saara, Maaren) |
| | 13:45: Coffee | (what shall we store) | - |
| 14:00-14:30 | T Video with PL (1h) | A Coffee | A Coffee |
| | L Evaluate | | - |
| 14:30-16:00 | A *G3/Howto/m4/Wiki | T Name lexicon 2 (how) | - |
| | - | L (3part) | - |
... | preprocess --abbr=bin/abbr.txt | | ..
... | preprocess --abbr=bin/abbr.txt --corr=src/typos.txt | lookup ...
A = all
a = all - Lene
T = Saara, Sjur, Trond, Tomi, Børre
L = Lene, Maaren, Thomas

Tuesday afternoon 1

A Presentation with Polderland
T Video with PL (Saara, Sjur, Trond, Tomi, Børre, Maaren, Thomas)
L Evaluate our linguistic analysers (Lene, Maaren, Thomas)
  How good are the tools? (Maaren and Thomas explain to Lene what input she can expect from them)
  What does it take to make them better?
  Do we need tools for measurement?
  Or: Office as usual, working

Tuesday afternoon 2

Thursday morning

Machine updates:

Thursday evening

Rule for the id field in common.xml:
default: each lemma is its own id
overriding: the id differs from the lemma itself and points to another lemma

 Kautokeino .. nob, nno, eng ... id=Guovdageaidnu
 Guovdageaidnu ... sme, sma, smj ... id=Guovdageaidnu
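The id rule can be illustrated with a minimal sketch. The dictionary layout and the names `ENTRIES`, `resolve_id` and `group_by_id` are hypothetical and only stand in for whatever structure common.xml ends up having; the Guovdageaidnu entry leaves out its id here purely to exercise the default case.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for entries in common.xml.
ENTRIES = [
    # Overriding: the id points to another lemma.
    {"lemma": "Kautokeino", "langs": ["nob", "nno", "eng"], "id": "Guovdageaidnu"},
    # Default: no id given, so the lemma is its own id.
    {"lemma": "Guovdageaidnu", "langs": ["sme", "sma", "smj"]},
]


def resolve_id(entry):
    """Return the explicit id if present, otherwise the lemma itself."""
    return entry.get("id") or entry["lemma"]


def group_by_id(entries):
    """Collect all lemmas that share the same resolved id."""
    groups = defaultdict(list)
    for entry in entries:
        groups[resolve_id(entry)].append(entry["lemma"])
    return dict(groups)


if __name__ == "__main__":
    for name_id, lemmas in group_by_id(ENTRIES).items():
        print(name_id, "<-", ", ".join(lemmas))
```

With these two entries, both name forms end up under the single id Guovdageaidnu, which is what lets the language-specific files refer to the same place.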

sme.xml (id info is inherited, perhaps)
sme.lexc (pure lexc file, generated)
 Kautokeino contlex-i ;
 Guovdageaidnu contlex-j ;
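Generating the lexc entries could look roughly like the sketch below. It assumes the lemmas and their continuation lexica have already been read from sme.xml into (lemma, contlex) pairs; the names `SME_ENTRIES` and `lexc_entry_lines` are hypothetical, and `contlex-i` / `contlex-j` are the placeholders used in the notes above, not real continuation lexicon names.

```python
# Hypothetical (lemma, continuation lexicon) pairs, as they might be
# extracted from sme.xml. The contlex names are placeholders.
SME_ENTRIES = [
    ("Kautokeino", "contlex-i"),
    ("Guovdageaidnu", "contlex-j"),
]


def lexc_entry_lines(entries):
    """Render one lexc entry line per (lemma, contlex) pair."""
    return [f"{lemma} {contlex} ;" for lemma, contlex in entries]


if __name__ == "__main__":
    # Print the generated entry lines; writing them to sme.lexc is left out.
    print("\n".join(lexc_entry_lines(SME_ENTRIES)))
```

Keeping the generator this thin means all name knowledge stays in the XML files, and sme.lexc never needs to be edited by hand.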

M4 flags:




non-m4-variation (variation in the lexicon):