Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Speller meeting

What is needed for a new release:

Installer

Nothing new since the previous meeting.

There’s a separate document containing the test results for different combinations of MS Office and Windows versions.

DONE:


TODO:

POSTPONED:

PLX conversion

Nothing new since the last meeting on 14.12. Tomi has made a new speller, but the test results were identical to those of the previous version.

Remaining PLX bugs:

Second PRI

REGRESSIONS

| Bug type | Example | Bug IDs | Notes |
|---|---|---|---|
| doesn’t follow cmp-tags OR vowel-shortening | searvvepresideanta > searvepresideanta | 489 | |
| doesn’t follow cmp-tags OR vowel-shortening | sámediggepresideanta, Sámediggeáirrasin | 489, 535, 604, 639 | |
| single letter suggestions | đ | 461 | |
| alph+nouns not rec | a-muorra | 785, 818 | |
| double hyph-sugg | SF–muorra | 818 | LexC bug, cf. the entries below |

SF ACRO ;
SF- ACRO ;
SF-%012:SF-%012 ACRO ;

Fixed regressions

| Bug type | Example | Bug IDs | Status |
|---|---|---|---|
| noun+Pro Num+Noun/Prop wo hyph | máliSoussiid, guovttiolbmo | 397, 461, 642, 721, 804, 805 | FIXED |
| noun+Pro Num+Noun/Prop wo hyph | uvdnaLot, muorraNRK | 595, 649, 805 | FIXED |
| doesn’t follow cmp-tags | ránubiellu > rátnobiellu, beavddeguorra > beavdeguorra | 489, 535, 539, 604 | FIXED |
| prop+noun not rec | Finnmárku-duoddara | 611, 633 | FIXED |
| prop+noun not rec | Koskivuori-plánenreaiddut | 633 | FIXED |
| non-ex word accepted | loahpet, duvnnii, njealjat | 909, 962 | FIXED |
| non-ex word accepted | adnii | 1143 | FIXED |
| compound not recognized | maŋŋegeašgálvu, lámpočuvggodeapmi | 408, 419, 451, 489, 522, 535, 536, 541 | FIXED |
| double hyph-sugg | SF–ákkasteapmifierbmi | 536 | FIXED |
| Px-forms make comp | muorrastávrátgeavaheapmi, muorrastávrádegeavaheapmi | 786 | FIXED |
| alphabet as non-first compound part & suggested | CV-s | 913 | FIXED |
| non-words accepted | váigas | 581, 912 | FIXED |
| prop-noun cmps don’t work | Oslo-biila, Pieski-lávvu | 397, 426, 593, 609, 611, 633, 649, 802, 930 | FIXED |
| prop-prop cmps don’t work | Børde-Rene | 575, 634 | FIXED |
| prop-acr cmps don’t work | Seskarö-cd | 805 | FIXED |
| noun-hyphnoun cmps don’t work | juleva-gielas | 629 | FIXED |
| “imposs” cmps | sákkasteapmifierbmi > ásaákkasteapmifierbmi | 536 | FIXED |

TODO - ALT.1:

TODO - ALT.2:

We decided to follow Alternative 1, but regard Alternative 2 as an option after a release based on Alternative 1.

Second Last PRI

Bugs from here on can be left out of the next release if we are short on time.

Compounds

| Bug type | Example | Bug IDs |
|---|---|---|
| num cmp:s on 0- | 051-nummarat | 631 |
| non-ex. word accepted | saame | 658 |

Last PRI

| Category | | Bug type | Example | Bug IDs |
|---|---|---|---|---|
| Capitals | — | doesn’t understand caps | 1700-LOHKU | 647 |


Compound regressions

| Bug type | Example | Bug IDs | Status |
|---|---|---|---|
| “imposs” cmps along w num. | 0-geažideapmigárvu (geažideapmigárvu is impossible) | 536, 1145 | NO SUGGESTIONS - GOOD - BUT: |
| “imposs” cmps | sákkasteapmifierbmi > aseákkasteapmifierbmi etc | 536 | |

DONE:

TODO:

Release plan

The December 1 goal has passed without the targets being met. On the plus side, the number of open PLX bugs has been greatly reduced, and Tomi is squashing PLX bugs all the time. It just takes more time than anticipated.

The installer problems were not easily solved by the WiX alternative: it turned out that the Greenlandic proofing tools installer has the same problems as ours.

Release status

We now have a speller with only a few known bugs, and most of them have fixes. By tomorrow we should have a speller with only one disturbing bug (the sámediggepresideantta bug). This is our release candidate! It is good enough, and it contains a number of important linguistic updates for our users.

TODO:

  1. build a new speller with the remaining bug fixes (Tomi)
  2. tag the present state with the string Divvun2.3RC1 (Tomi)
  3. update the adjectives with the noun changes (Tomi)
  4. make new installers with latest SME speller (Sjur)
  5. test newest speller in Word (Thomas)
  6. test against gold standard corpus (Sjur)
  7. release a public release candidate tomorrow morning (Sjur)
  8. if no new serious bugs are found, release as Divvun 2.3 tomorrow afternoon (all)
    1. update list of known bugs
    2. write a short press release emphasising the linguistic updates for North Sámi, and noting that we still have certain problems with Win7/Office2010

Re-scheduling the release plan:

Next meeting

Friday 21.12 at 10.00