Meeting setup
- Date: 19.11.2007
- Time: 09:30 Norwegian time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:15.
Present: Børre, Ilona, Per-Eric, Sjur, Thomas, Tomi
Absent: Risten, Trond
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar’s error markup in the xml files to (a copy of) the original
- fix Unicode bug in Hunspell conversion java code
- don’t know the reason for this one
- fix bug 550
- move Bugzilla to the G5.
- fix Windows CD installation bug
- discuss more parallel texts
- fix bugs!
Ilona
- lexicalise smj missing words
- Done.. at least most of it.
- other smj tasks, ask Thomas
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- In progress. It will be ready this week, with some strange words left. We have to make a new missing list to see which words are left.
- fix bugs!
Risten
- finish the design/text for the CD and the cover
- set up CD-printing printer
- try to burn a CD at SD
- get price and schedule for printed CD cover
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
(source info, used in testing or not, added to lexicon or not)
- add nested error markup to xml conversion
- almost finished, just some testing left
- discuss more parallel texts
Sjur
- work on the XML name editor/risten.no integration
- nothing - this will have to wait till Divvun2
- set up risten.no on the G5 again
- really tried last week, to help Trond with some work, but failed: eXist
couldn’t reliably restore files from a local backup, leaving the dictionary
and term collections incomplete and useless; and Forrest refused to change
port despite explicit requests for it, and conflicted over the port with the
existing divvun.no site running off the same computer, which made the whole
portal dysfunctional
- test new and nested error markup
- Saara tried getting it in place, but is not finished with her work, thus
nothing to test yet
- get command line hyphenator for automated testing of the hyph-lexicons
- add hyphenation testing
- improve paradigm testing
- fix bug 550
- follow-up support for the Sámi languages in OpenOffice.org
- done, they will be included in the next OOo release - 2.4
- fix Windows CD installation bug
- fix circularity issue in nonrec transducers
- Tomi did this on his own - great!
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- check for bad hyphenation
- look at test cases still not behaving properly
- paradigm testing
- fix bugs!
Tomi
- Hunspell lexicon conversion
- fix circularity issue in nonrec transducers
- fix bugs!
- other
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix hyphenation of derivations, inflections
- telephone meeting with Sjur and the Faroese group re Faroese speller
- fix circularity issue in nonrec transducers
- discuss more parallel texts
- fix bugs!
3. Documentation
Nothing new.
TODO:
4. Corpus infrastructure
Nothing.
5. Infrastructure
Bugzilla is up and running again.
TODO:
- add Jabber account in iChat (all)
6. Linguistics
North Sámi
TODO:
- fix hyphenation of derivations (Thomas, Tomi, Sjur, Trond)
- now tested, and much improved, but still needs improvements and further
investigation
- fix circularity issue (Sjur, Tomi, Trond)
Lule Sámi
TODO:
- lexicalise words from the Olavi missing list, but check against the pdf
original where in doubt (Per-Eric)
- almost finished - only parts of the letter r still missing
- sme->smj lexicon conversion to build bilingual lexicon resources, and
increase smj coverage (Trond, Thomas, Svenne)
- look at missing baseforms (Thomas)
7. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur’s task.
Decisions made in Tromsø can be found in [this meeting
memo.|/admin/physical_meetings/tromso-2006-08-propnoun.html]
TODO:
- set up Tomcat and risten.no on the G5 again (Sjur, Børre)
- install risten.no
- really tried, but got problems
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
well) (the morphological section should be kept intact, in e.g.
propernoun-sme-morph.txt) (Sjur, Saara)
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
(Sjur, Tomi, Saara)
- implement data synchronisation between risten.no and
the cvs repo, and possibly other servers (ie the G5 as an alternative server
to the public risten.no - it might be faster and better suited than the
official one; also local installations could be treated the same way)
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
(e.g. @type=secondary) (Thomas, Maaren, linguists)
- merge placenames which are erroneously in different entries: e.g. Helsinki,
Helsingfors, Helsset (linguists)
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
(linguists)
Hunspell
TODO:
- Follow-up support for the Sámi languages in OpenOffice.org (Børre, Sjur)
- done, scheduled for version 2.4
- Hunspell lexicon conversion (Tomi, Børre)
- improved: noun compilation is much improved; adjectives also convert, but with
errors; verbs are still quite rough
Testing
Spelling Error Markup
This will wait till after the release.
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
added to lexicon or not) (Saara)
- move Steinar’s error markup in the xml files to (a copy of) the original
(Børre, Kimme)
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
TODO:
- add hyphenation testing (Sjur)
- improve paradigm testing report (Sjur)
MS Office
An important aspect of this testing is to document in the user guide anything
that could be a problem for users.
TODO:
- test the proofing tools with all MS Office applications (Børre, Thomas)
PLX conversion update: we will soon get an updated speller engine with flags for
marking word-initial, word-internal and word-final positions. That will make it
possible for us to resolve some outstanding issues in the PLX conversion.
Open issues based on test results:
smj: 482 - still problematic (prefix), 484 - double hyphens suggested, 552 - now fixed!, Svierigadárogielan - still rejected (prefix)
sme: 397 - double hyphens (name+name), 419, 425 - roman number, 431 - does not accept the correct string, but does suggest the same; also hyphen-final forms are accepted, but not the same form when part of a compound, 452 - miel is a prefix, 461 - ovda accepted, almost 50 % (17) gets correct suggestion, 489, 522, 524, Guovdageainnu-láđđi not accepted.
Guovdageaidnu-láđđi nom-
Guovdageainnu-láđđi gen-
It should be Guovdageaidnu-láđđi OR Guovdageainnu láđđi. The first one is
suggested + Guovdageainnutláđđi
Harstad-biila (nom) is ok, whereas gen. Harstada-biila is not, ie the
same pattern.
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
Hyphenators
We should look into the possibility of generating pattern-based hyphenation for
OOo. It shouldn’t be too hard, or require too much work, but needs
investigation. => Divvun2.
TODO:
- get command line hyphenator (Sjur)
Release version
Schedule and tasks for the remaining weeks:
TODO:
- try to burn a CD at SD (Risten, Leif-Åge)
- done, it is working exactly as one burned on the Mac
- finish text to go on the CD cover (Risten)
- set up CD-printing printer (Risten)
- fix Windows CD installation bug (Sjur, Børre)
- get price and schedule for printed CD cover (Risten)
- fix remaining bugs - golden master by next Monday (all)
- finalise InDesign hyphenator (Sjur, Børre, Thomas)
- testing
- documentation
- installation
- update usage and installation documentation (Børre, Thomas, Sjur)
- translate all new documentation (all)
- QA all documentation (all)
- do as much Hunspell as possible (Børre, Tomi)
Actual release
December 11-13, one of these days.
Hotel rooms received for all except Ilona, will be received for her as well.
There will be a release party in the afternoon.
9. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
(Sjur)
Faroese
Speller for fao using our infrastructure and the knowledge we have.
TODO:
- set up a telephone meeting with them and Sjur (Trond)
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
bug report, such that for each bug, we know exactly when it should have been
fixed, in what file(s) and what version.
69 open Divvun/Disamb bugs (35 of these are speller-related bugs,
34 are other bugs), and 23 risten.no bugs
Dictionaries
TODO:
- eXist on G5 - done
- risten.no on G5 - homepage done, but needs port tweaking/changing
- risten.no XQuery framework - done
- XInclude conversion script - done
- homepage on the documentation pages - draft done
Parallel corpora
TODO:
- discuss more parallel texts (Børre, Saara, Trond)
SD yearly personnel seminar
6.-7. December. Sjur has discussed it with Julia, and we won’t go there.
Software updates
- Leopard, 10.5
- Ilona - Doesn’t have a proper DVD yet.
- Trond
- Per-Eric
10. Next meeting, closing
The next meeting is 26.11.2007, 09:30 Norwegian time.
The meeting was closed at 11:34.
Appendix - task lists for the next week
Børre
- move Steinar’s error markup in the xml files to (a copy of) the original
- fix bug 550
- fix Windows CD installation bug
- discuss more parallel texts
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Ilona
- lexicalise smj missing words
- other smj tasks, ask Thomas
- Buy the DVD for Leopard
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- derivations tests
- fix bugs!
Risten
- set up CD-printing printer
- get price and schedule for printed CD cover
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
(source info, used in testing or not, added to lexicon or not)
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- work on the XML name editor/risten.no integration
- set up risten.no on the G5 again
- test new and nested error markup
- improve hyphenation testing
- fix bug 550
- fix Windows CD installation bug
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- check for bad hyphenation
- look at test cases still not behaving properly
- paradigm testing
- test the proofing tools with all MS Office applications
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Tomi
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- telephone meeting with Sjur and the Faroese group re Faroese speller
- discuss more parallel texts
- fix bugs!