Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

View GiellaLT on GitHub divvungiellatekno/giellalt.uit.no

Speller meeting

What is needed for a new release:

Installer

Variablar for installeraren:

Verkty for debugging:

DONE:

TODO:

POSTPONED:

PLX conversion

One ommission - that can’t explain the reduced size - is that particles like almma, bat, ges, han have been lost in the latest spellers. — FIXED in the latest svn, but no speller yet, and thus no speller test results.

Remaining PLX bugs:

First PRI

Words are accepted and suggested in their compounding-form

| cmp-form-suggestions | váigas, suolo | 581,909,803,912

Discovered one case of compounds without hyphen:

| non-hyph noun+acro cmps | muoreNRK | 805 | FIXED

REGRESSION - HYPH-SUGGESTIONS

| hyphened suggestion | guolli-busse,rátnu-biellu, stohpu-spiidni- | 397,489,721,802 etc

REGRESSION - DOESN’T RECOGNIZE SOME ACRO/PROP+NOUN, seems random

| PROP+NOUN | Oslo-biila, Helsset-vuođđuduvvan, Unicode-doarjja, SF-muorra | 397,426,611,930,931

REGRESSION

| NO STEM-VOWEL SHORTENING | rátnu-biellu, | 489

REGRESSION

| DOUBLE HYPH-SUGG | SF–muora, | 611

REGRESSION

| NO HYPH PREF+NOUN | julevua-gielas, juleva-gielas | 629

This is analysed as:

a-gielas	a-giella+N+Sg+Loc
julevu-	julevu+N+RCmpnd

a-gielas is an entry in the lexicon HyphNouns, a set of nouns requiring hyphens in front of them. Solution: tag all words in LEXICON HyphNouns as +CmpN/First.

REGRESSION

| ALPH+NOUN | u-joavku | 785,818

First and a half PRI

compounds

| DEVERBAL+noun+clit not rec. | vástidanproseantage, hárjehaddanmuorragen | 451

Second Last PRI

Section for Tags-not-working

| doesn’t follow cmp-tags | sámedikkepresideanta | 489 | +CmpN/None in comp-sugg | 1883-as, Juovla-CD-as | 508,717 | +Use/SpellNoSugg | alhpabet gets suggested: a,đ etc | 461

Bugs below this line can be left out of the next release if we are short on time.

Compounds

| imposs” cmps along w num.| 0-geažideapmigárvu (geažideapmigárvu is impossible) | 536,1145 | NEW SUGGESTION: geažideapmi-gárvu- | num cmp:s on 0- | 051-nummarat | 631 | name/noun+adv cmps | Kuorak-ain, NRK-an | 642,913 | hyphened suggestions | deahtta-samus +A+Attr +Noun | 940 | non-ex. word accepted | saame | 658

Last PRI

| doesn’t understand caps | 1700-LOHKU | 647 | Left cmp-tags don’t work | biilarievttijođiheaddjái | 819 | THESE ARE NOW RECOGNIZED, BUT DOESN’T GET SUGGESTED

DONE:

TODO:

Release plan