Meeting setup
- Date: 10.12.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:57.
Present: Børre, Ilona, Sjur, Thomas
Absent: Risten, Per-Eric, Trond, Tomi
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- fix bug 550
- finalise InDesign hyphenator
- update usage and installation documentation
- read the present documentation carefully, and see if there are missing parts
- documentation meeting on Wednesday with Sjur, Thomas
- 100 CD covers should be picked up in Tromsø
- new/updated front page (old front page to history page)
- press release
- fix bugs!
Ilona
- lexicalise
smj
missing words.
- Work with missing forms in
sme
typos.txt
- Not done yet, but working on it.
- Translation
Maaren
- lexicalise actio compounds
Per-Eric
- check some unusual words from the Olavi missing list which are still not
lexicalised
- derivations tests
- fix bugs!
Risten
- set up CD-printing printer
- test printer
- finish the cd cover and cd design
- Print 50 CDs, take them to Oslo as backup
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
(source info, used in testing or not, added to lexicon or not)
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- fix bug 550
- document Windows CD installation work-around
- finalise InDesign hyphenator
- done a lot, but no documentation, and the download package isn’t yet finished
- update usage and installation documentation
- done, not complete
- documentation meeting on Wednesday with Thomas, Børre
- analyse hyphenation test results
- done, with feedback from Polderland
- fix remaining bugs - golden master by end of this Monday
- all serious bugs fixed, some smaller ones left - GM declared on Friday
- new/updated front page (old front page to history page)
- press release
- Thomas started on this one
- fix bugs!
Thomas
sme->smj
lexicon conversion to build bilingual lexicon resources
- test hyphenation
- analyse hyphenation test results
- look at test cases still not behaving properly
- update usage and installation documentation
- done
- search Bugzilla for documentation issues in Divvun => add to docu
- done
- documentation meeting on Wednesday with Sjur, Børre
- done
- fix bugs!
Tomi
- Hunspell lexicon conversion
- fix remaining bugs - golden master by end of this Monday
- fix bugs!
Trond
sme->smj
lexicon conversion to build bilingual lexicon resources
- analyse hyphenation test results
- fix bugs!.
3. Documentation
Needs to be fixed by Wednesday.
TODO:
- fix bug 550
(Børre, Sjur)
- investigation and fix scheduled for Tuesday
4. Corpus infrastructure
Nothing.
5. Infrastructure
TODO:
- add Jabber account in iChat (all)
6. Linguistics
North Sámi
Hyphenation test results:
Correct:
--------
konseartaprográmma kon-sear-ta-pro-grám-ma
konseartaeahkedis kon-sear-ta-eah-ke-dis
Márkomeanu Már-ko-mea-nu
lávvardateahkeda láv-var-dat-eah-ke-da
konseartaeahkedis kon-sear-ta-eah-ke-dis
lávvardateahkeda láv-var-dat-eah-ke-da
servodatberošteaddji ser-vo-dat-be-roš-tead-dji
sámegillii sá-me-gil-lii
sátnegovat sát-ne-go-vat
morašluohti mo-raš-luoh-ti
Justislávdegoddi Jus-tis-láv-de-god-di
Sámedikkiin Sá-me-dik-kiin
olggosaddán olg-gos-ad-dán
luitet lui-tet
kristtalaš krist-ta-laš
orrun or-run
Lotnolasealáhusas Lot-no-las-ea-lá-hu-sas
juoiganjuristan juoi-gan-ju-ris-tan
issoras is-so-ras
suinna suin-na
dáinna dáin-na
Háliidivččen Há-lii-divč-čen
bisánivčče bi-sá-nivč-če
duostan duos-tan
Gárdegobba Gár-de-gob-ba
Čeakčačahca Čeak-ča-čah-ca
Gobba Gob-ba
Sáivagobba Sái-va-gob-ba
Missing hyph points (/):
------------------------
olgobáikkis ol-go/báik-kis
máilmmi má/ilm-mi
gaskaijabeaivváš gas-ka/i-ja-beaiv-váš
dehálaš de-há#laš
Incorrect (#):
--------------
lotnolasealáhussan lot-no-la#s/e#a/lá/hus-san
bearrašiin be#ar-ra/ši#in
PLX XFST code:
--------------
ol^go^báik^kis NIE
gas^kai^ja^beaiv^váš- NBOX
gas^kai^ja^beaiv^váš NIOE
gas^kai^ja^beaiv^váš- NIE
gas^kai^ja^beaiv^váš- GaIE
gas^kai^ja^beaiv^váš NBO
gas^kai^ja^beaiv^váš GaBO
bear^ra^šiin NIE
lot^no^la^sea^lá^hus^san NIE
TODO:
- test latest hyphenator (Sjur)
- analyse test results (Thomas, Sjur, Trond)
Lule Sámi
Hyphenation test results:
Correct:
--------
viessomguhkes vies-som-guh-kes
viessomvuohkáj vies-som-vuoh-káj
árvvovuodo árv-vo-vuo-do
väráltárbbe vä-rált-árb-be
åhpadusorganisásjåvnån åh-pa-dus-or-ga-ni-sá-sjåv-nån
häjmmadáfo häjm-ma-dá-fo
árbbedáhpe árb-be-dáh-pe
barggovuogijt barg-go-vuo-gijt
rijkadajva rij-ka-daj-va
ássjedåbdde ás-sje-dåbd-de
láhkaásadimesa láh-ka-á-sa-di-me-sa
giellalágajn giel-la-lá-gajn
sámegiellaj sá-me-giel-laj
buorrelágásj buor-re-lá-gásj
árbbedábálattjat árb-be-dá-bá-lat-tjat
láhkatæksta láh-ka-tæks-ta
guosski guoss-ki
rijkalattjat rij-ka-lat-tjat
organisásjåvnån or-ga-ni-sá-sjåv-nån
hábbmidime hább-mi-di-me
rábmakonvensjåvnå ráb-ma-kon-ven-sjåv-nå
árvvalasstet árv-va-lass-tet
ássjedåbddejuogos ás-sje-dåbd-de-juo-gos
unneplågogielajt un-nep-lå-go-gie-lajt
láhkaásadimesa láh-ka-á-sa-di-me-sa
biejvveavijsav biejv-ve-a-vij-sav
sámegiellaj sá-me-giel-laj
unneplågogielajn un-nep-lå-go-gie-lajn
ministarjuohkusis mi-nis-tar-juoh-ku-sis
guosski guoss-ki
Dussnagiehtje Duss-na-gieh-tje
Divtasvuodna Div-tas-vuod-na
Gåhpejávrre Gåh-pe-jávr-re
Gásluokta Gás-luok-ta
Helmukvuodna Hel-muk-vuod-na
Ibboluokta Ib-bo-luok-ta
Julevædno Ju-lev-æd-no
Jåhkmåhkke Jåhk-måhk-ke
Jåkmåhkke Jåk-måhk-ke
Jåhkåmåhkke Jåh-kå-måhk-ke
Missing hyph points (/):
------------------------
Jienastim#njuolgadusá Jie/nas/tim#njuol/ga/du/sá (# not removed from input)
sierralágásj sier-ra/lá/gásj
orgánajs or-gá/najs
Orgánajs Or-gá/najs
Incorrect (#):
--------------
javllamáno ja#vl-la/má/no
biejvveávijsav bie#jv/ve/á-vij/s#av
suomagiella su#o-ma-giel-la
sierraláhkáj sier-ra#láh-káj
PLX XFST code:
--------------
javl^la#má^no NIE
javl^la#má^no GaIOE
or^gá^najs NIE
suo^ma#giel^la- NIE
suo^ma#giel^la- NBOX
suo^ma#giel^la NIOE
suo^ma#giel^la NBO
TODO:
sme->smj
lexicon conversion to build bilingual lexicon resources, and
increase smj
coverage (Trond, Thomas, Svenne). Add the words.
- test hyphenation (Sjur, Thomas)
7. Name lexicon infrastructure
Delayed till Divvun2 (or after release of Divvun1).
Decisions made in Tromsø can be found in [this meeting
memo.|/admin/physical_meetings/tromso-2006-08-propnoun.html]
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
well) (the morphological section should be kept intact, in e.g.
propernoun-sme-morph.txt) (Sjur, Saara)
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
(Sjur, Tomi, Saara)
- implement data synchronisation between risten.no and
the cvs repo, and possibly other servers (ie the G5 as an alternative server
to the public risten.no - it might be faster and better suited than the
official one; also local installations could be treated the same way)
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
(e.g. @type=secondary) (Thomas, Maaren, linguists)
- merge placenames which are errouneously in different entries: e.g. Helsinki,
Helsingfors, Helsset (linguists)
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
(linguists)
Hunspell
Continuously improving.
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
Testing
Spelling Error Markup
This will wait till after the release.
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
added to lexicon or not) (Saara)
- move Steinar’s error markup in the xml files to (a copy of) the original
(Børre, Kimme)
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
TODO:
- improve hyphenation testing (Sjur)
MS Office
Windows was done = ok, Ilona has tested PowerPoint and Word on Mac, and Trond
tested Excel (you have to turn on correction, otherwise it will give you English
spell checking, it is the same for all languages). Entourage is still untested.
TODO:
- test the proofing tools with all MS Office applications (Børre, Thomas)
- all but Entourage done (Kimme, Børre)
- add Excel behaviour to the documentation (Trond, Sjur)
Open issues based on test results:
sámi-dáru - not accepted => Gen+hyph compound, is not allowed with hyphen. We can allow such compounds without too much overgeneration by adding the hyphen to the last part, ie -dáru in the PLX entry. => Bugzilla as feature request.
smj
- 419 - flagged, 1. sugg same as input => PLD bug
- 482 - fixed
- 484 - fixed
- 518 - regression - Fuoskok = pl+clitic as well as derivation = won’t fix
- stuorFuosskok, StuorFuosskok suggested, should not be (neither accepted)
- 575 - name+name = double hyphens in sugg
- 584 - input and sugg is the same
- 589 - not fixed, sent to Polderland
- Svierigadárogielan - fixed!
sme
- 397 - fixed
- 425 - roman number - will not be fixed in 1.0 release
- 431 - fixed
- 461 - ovda fixed
- 489 - fixed
- 518 - regression - plural same as derivation, won’t fix
- stuorGuovdageainnut suggested, should not be
- Stuoraguovdageainnut flagged and suggested => PLD
- 588 - regression - r. accepted as final part
- 592 - cases of caritives on -heapmi
- 397 - Guovdageaidnu-láđđi - fixed
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
- check that the
smj
R lexicon is identical to sme
(Thomas)
Almost finished, found a bug (reported to Polderland).
TODO:
- improve hyphenation testing (Sjur)
- upload (Sjur)
- documentation (Børre, Sjur)
Hyphenators
Testing! - Done, see above.
Final release
Schedule and tasks for the remaining weeks:
This week:
- Monday - fix remaining speller bugs
- Tuesday - get Polderland updates/fixes, fix bug 550, documentation fixed
- Wednesday - finish documentation, start translation
- Thursday - finish translation, spell check and QA, travel plans ready
- Friday - ready to burn/release; press release
Next week:
- Monday - regular meeting, sum up & check
- Tuesday - we go to Oslo in the morning, meeting and preparations in the
evening, our own celebration / dinner
- Wednesday:
- 10 AM: ceremony, official delivery to SD president, potentially also minister
- 11 AM: official lunch, SD invites the minister for lunch
- after lunch - finished, we go home
Hotel is booked, you have to arrange the travelling to Oslo yourself (but make
sure the travel agency sends the bill to SD).
TODO:
- set up CD-printing printer (Risten, Leif Åge)
- fix Windows CD installation bug (Sjur, Børre)
- work-around should be documented (Sjur)
- 100 covers will be picked up in Tromsø (Børre)
- Print 50 CDs, take them to Oslo as backup (Risten, Julie)
- fix remaining bugs - golden master by end of this Monday (Tomi, Sjur)
- finalise InDesign hyphenator (Sjur, Børre)
- documentation
- not yet finished - after 1.0 release
- update usage and installation documentation (Børre, Thomas, Sjur)
- search Bugzilla for documentation issues in Divvun => add to docu
(Thomas)
- read the present documentation carefully, and see if there are missing parts
=> fill them in (Børre)
- documentation meeting on Wednesday (Sjur, Børre, Thomas)
- new/updated front page (old front page to history page) (Sjur, Børre)
- press release (Sjur, Børre)
- translate all new documentation (all)
- QA all documentation (all)
- do as much hunspell as possible (Børre, Tomi)
9. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
(Sjur)
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
bug report, such that for each bug, we know exactly when it should have been
fixed, in what file(s) and what version.
83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs,
38 are other bugs), and 23 risten.no bugs
10. Next meeting, closing
The next meeting is 10.12.2007, 09:30 Norwegian time.
The meeting was closed at 10:09.
Appendix - task lists for the next week
Boerre
- fix bug 550
- finalise InDesign hyphenator
- new/updated front page (old front page to history page)
- fix bugs!
Ilona
- lexicalise
smj
missing words.
- Help Trond with the
smj
dictionary.
- Translation
Maaren
- lexicalise actio compounds
Per-Eric
- check some unusual words from the Olavi missing list which are still not
lexicalised
- derivations tests
- fix bugs!
Risten
- Print 50 CDs, take them to Oslo as backup
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
(source info, used in testing or not, added to lexicon or not)
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- document Windows CD installation work-around
- finalise InDesign hyphenator
- update usage and installation documentation
- new/updated front page (old front page to history page)
- press release
- fix bugs!
Thomas
sme->smj
lexicon conversion to build bilingual lexicon resources
- test hyphenation
- analyse hyphenation test results
- look at test cases still not behaving properly
- fix bugs!
Tomi
Trond
sme->smj
lexicon conversion to build bilingual lexicon resources
- analyse hyphenation test results
- fix bugs!.