Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Speller meeting

What is needed for a new release:

Installer

DONE:

TODO:

POSTPONED:

PLX conversion

The latest PLX file is substantially smaller than before: only 6.5 Gb, versus 25+ Gb earlier. The change happened within the last day or two.

One omission, which cannot explain the reduced size, is that particles such as almma, bat, ges, han are missing from the latest spellers.

Remaining PLX bugs:

First PRI

Words are accepted and suggested in their compounding-form

| cmp-form-suggestions | váigas, suolo | 581,909,803,912

Discovered one case of a compound without a hyphen:

| non-hyph noun+acro cmps | muoreNRK | 805

Doesn’t recognize word

| hotealla not rec | hotealla | 1358

First and a half PRI

compounds

| prefix-compounds | not recognized: julevbargodepartemeanta | 408 | FIXED
| noun+clit not rec. | vástidanproseantage | 451
| Left cmp-tags don’t work | biilarievttijođiheaddjái | 819

Second Last PRI

Section for Tags-not-working

| doesn’t follow cmp-tags | niibbispiidni, beavddeguorra | 489,709,539 | FIXED
| doesn’t follow cmp-tags | sámedikkepresideanta | 489
| +CmpN/None in comp-sugg | guovttenuppelotsahán, ovdamearkadihti | 619,908 | FIXED
| +CmpN/None in comp-sugg | 1883-as, Juovla-CD-as | 508,717
| +Use/SpellNoSugg | alphabet gets suggested: a, đ etc. | 461

Bugs below this line can be left out of the next release if we are short on time.

Compounds

| “imposs” cmps along w num. | 0-geažideapmigárvu (geažideapmigárvu is impossible) | 536,1145
| num cmp:s on 0- | 051-nummarat | 631
| name/noun+adv cmps | Kuorak-ain, NRK-an | 642,913
| hyphened suggestions | deahtta-samus +A+Attr +Noun | 940
| non-ex. word accepted | saame | 658

Last PRI

| doesn’t understand caps | 1700-LOHKU | 647
| SgNomadj-cmps not work | láikiriehpu (suggestions: 0-láikiriehpu…) | 421 | FIXED

DONE:

TODO:

Release plan