Meeting setup

Date: 26.11.2007
Time: 09.30 Norw. time
Place: Internet
Tools: SubEthaEdit, iChat/Skype

Agenda

Opening, agenda review
Reviewing the task list from last week
Documentation - divvun.no
Corpus gathering
Corpus infrastructure
Infrastructure
Linguistics
Name lexicon infrastructure
Spellers
Other issues
Summary, task lists
Closing

1. Opening, agenda review, participants

Opened at 10:15.

Present: Børre, Ilona, Per-Eric, Sjur, Thomas, Tomi

Absent: Risten, Trond

Agenda accepted as is.

2. Updated task status since last meeting

Børre

move Steinar’s error markup in the xml files to (a copy of) the original
- not done
fix bug 550
- not done
fix Windows CD installation bug
- not done
discuss more parallel texts
- not done
finalise InDesign hyphenator
- not done
update usage and installation documentation
- not done
fix bugs!
- not done
Other:
- Continued work on hunspell
- Updated OS X on xserve, added RAM.
- Made new logos

Ilona

other smj tasks, ask Thomas
Buy the DVD for Leopard
- Bought, and downloaded the Leopard disk image to the computer. It still has to be burnt to a DVD and installed on the computer. Will try to do it today.
other tasks:
- Done some testing in sme speller and reported it to Thomas

Maaren

lexicalise actio compounds

Per-Eric

lexicalise words from the Olavi missing list
- Done
derivations tests
- Done some
fix bugs!
- Nothing done this week

Risten

set up CD-printing printer
- on its way - ordered
get price and schedule for printed CD cover
- done

Saara

add new XSL/XML headers for proofing test docs
Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
add nested error markup to xml conversion
discuss more parallel texts

Sjur

work on the XML name editor/risten.no integration
- nothing
set up risten.no on the G5 again
- made it, although Trond reports problems
test new and nested error markup
- later
improve hyphenation testing
- nothing improved
fix bug 550
- not done
fix Windows CD installation bug
- tried, but it’s not working. Work-around should be documented
finalise InDesign hyphenator
- nothing done yet
update usage and installation documentation
- not yet
fix bugs!
other:
- several improvements to the CD creation process
- worked on the CD cover (proofreading many times)
- Faroese speller telephone meeting

Thomas

sme->smj lexicon conversion to build bilingual lexicon resources
- nothing done this week
check for bad hyphenation
- worked a little
look at test cases still not behaving properly
- worked a little
paradigm testing
- done some
test the proofing tools with all MS Office applications
- done
finalise InDesign hyphenator
- not done
update usage and installation documentation
- not done
fix bugs!
- worked

Tomi

Hunspell lexicon conversion
- not done
fix bugs!
- fixed bugs

Trond

sme->smj lexicon conversion to build bilingual lexicon resources
- Worked on this, refined, work is on schedule
telephone meeting with Sjur and the Faroese group re the Faroese speller
- Done
discuss more parallel texts
- Worked on this.
fix bugs!

3. Documentation

Nothing new.

TODO:

fix bug 550 (Børre, Sjur)

4. Corpus infrastructure

Nothing.

5. Infrastructure

TODO:

add Jabber account in iChat (all)

6. Linguistics

North Sámi

Hyphenation is better, but still contains a lot of errors. Sjur will run the latest hyphenator on our test material and discuss the test results with the rest of the group.

TODO:

test latest hyphenator (Sjur)
analyse test results (Thomas, Sjur, Trond)
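
One way to make the test results easy to discuss is to compare the hyphenator output word by word against manually hyphenated forms. The following is only a minimal sketch: the function name, the use of `-` as hyphenation mark and the list-based data format are assumptions for illustration, not the project's actual test setup. Words containing a real hyphen would need a different mark.

```python
def compare_hyphenation(expected, actual):
    """Return (word, expected, actual) triples where the hyphenation differs.

    Both arguments are lists of hyphenated forms with "-" marking the
    hyphenation points (an assumed format for this sketch).
    """
    # Index the hyphenator output by the unhyphenated word.
    actual_by_word = {form.replace("-", ""): form for form in actual}
    mismatches = []
    for form in expected:
        word = form.replace("-", "")
        got = actual_by_word.get(word)  # None if the hyphenator gave no output
        if got != form:
            mismatches.append((word, form, got))
    return mismatches
```

The returned list can be sorted or grouped before it is discussed, so that recurring error patterns stand out.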

Lule Sámi

Trond and his team have found words to be added to the smj lexicon.

cat smesmj.txt | grep -v 'prop$' | cut -f2 | lookup -flags mbTT -utf8 ~/gt/smj/bin/smj.fst | grep '\?' | l

There are 6581 words in the smesmj.txt lexicon. Disregarding the proper nouns, 1824 are not recognised by smj-norm.fst or by smj.fst. Many of these are loan words or derivations. Some examples:

čála    tjála   n
giehtačála      giehtatjála     n
vuolláičála     vuollájtjála    n <= :-)
vinjučála       vinjotjála      n
johtučála       jåhtotjála      n
čuokkisčála     tjuokkestjála   n
bajildusčála    bajeldustjála   n
mála    mála    n
tjála   čála    n
giehtatjála     giehtačála      n <=== :-((
vuollájtjála    vuolláičála     n
tjuokkestjála   čuokkisčála     n
vinjotjála      vinjučála       n
jåhtotjála      johtučála       n
leapma  liebma  n

ja/dahje        ja/dahje        +?
gobba   gobba   +?
gaiba   gaiba   +?
struhcca        struhcca        +?
fáhcca  fáhcca  +?
suorbmafáhcca   suorbmafáhcca   +?
vahca   vahca   +?
ohca    ohca    +?
juhca   juhca   +?
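
The `+?` lines above are what lookup prints for words it does not recognise. As a rough sketch, such output could be collected into a plain word list as follows. The function name is made up for this example, and the code assumes whitespace-separated fields with `+?` as the last field of an unrecognised word, as in the lines shown above.

```python
def unrecognised_words(lookup_output):
    """Collect the input words that lookup marked as unknown ("+?")."""
    unknown = []
    for line in lookup_output.splitlines():
        fields = line.split()
        # An unrecognised word is echoed back and followed by "+?".
        if len(fields) >= 2 and fields[-1] == "+?":
            unknown.append(fields[0])
    return unknown


# Sample in the shape of the lines above; the recognised line and its
# analysis string are made up for illustration.
sample = "gobba\tgobba\t+?\nmála\tmála+N+Sg+Nom\nfáhcca\tfáhcca\t+?\n"
print(unrecognised_words(sample))  # ['gobba', 'fáhcca']
```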

There seems to be a mix-up of smj and sme in the material. That has to be cleaned up.
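
Entries such as čála / tjála occur in both directions in the list above. A small sketch of how such reversed pairs could be found automatically is given below. It assumes that smesmj.txt has three whitespace-separated columns (first form, second form, part of speech), which is inferred from the `cut -f2` and `grep -v 'prop$'` steps in the command above; the function names are hypothetical.

```python
def parse_entries(text):
    """Read (first_column, second_column) pairs, skipping proper nouns."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or incomplete line
        if fields[2].endswith("prop"):
            continue  # mirrors the grep -v 'prop$' filter
        entries.append((fields[0], fields[1]))
    return entries


def reversed_pairs(entries):
    """Return pairs that also occur swapped, each reported once."""
    seen = set(entries)
    found = []
    for first, second in entries:
        if first == second:
            continue  # identical columns (e.g. mála / mála) prove nothing
        if (second, first) in seen and (second, first) not in found:
            found.append((first, second))
    return found
```

The sketch only flags candidates. Which of the two orientations is the correct one still has to be decided by a linguist.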

We have to test hyphenation for Lule Sámi as well.

TODO:

lexicalise words from the Olavi missing list, but check against the pdf original where in doubt (Per-Eric)
- done
sme->smj lexicon conversion to build bilingual lexicon resources, and increase smj coverage (Trond, Thomas, Svenne). Add the words.
test hyphenation (Sjur, Thomas)

7. Name lexicon infrastructure

Sjur got risten.no up and running on the G5. Worked only for him, though.

Decisions made in Tromsø can be found in [this meeting memo|/admin/physical_meetings/tromso-2006-08-propnoun.html].

TODO:

set up Tomcat and risten.no on the G5 again (Sjur, Børre)
1. install risten.no
  1. did it
fix bugs in lexc2xml; add comments to the log element (Saara)
finish first version of the editing (Sjur)
test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well); the morphological section should be kept intact, e.g. in propernoun-sme-morph.txt (Sjur, Saara)
convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
implement data synchronisation between risten.no and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no, since it might be faster and better suited than the official one; local installations could be treated the same way)
start to use the xml file as source file
clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
publish the name lexicon on risten.no (Sjur)
add missing parallel names for placenames (linguists)
add informative links between first names like Niillas and Nils (linguists)

8. Proofing tools

Hunspell

Continuously improving.

TODO:

Hunspell lexicon conversion (Tomi, Børre)

Testing

Spelling Error Markup

This will wait till after the release.

TODO:

Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (Saara)
move Steinar’s error markup in the xml files to (a copy of) the original (Børre, Kimme)
add nested error markup to xml conversion (Saara)
test new and nested error markup (Sjur)

Automated testing

TODO:

improve hyphenation testing (Sjur)
- not done yet

MS Office

An important aspect of this testing is to document in the user guide anything that could be a problem for users.

TODO:

test the proofing tools with all MS Office applications (Børre, Thomas)
- Thomas has tested all Windows apps - they all work fine with our tools

Lexicon conversion to the PLX format

Open issues based on test results:

smj

- 482 - still problematic (prefix)
- 484 - double hyphens suggested
- 575 - name+name = double hyphens in suggestions
- Svierigadárogielan - still rejected (prefix)

sme

- 397 - double hyphens (name+name)
- 419 - fixed
- 425 - roman number
- 431 - does not accept the correct string, but does suggest the same; also, hyphen-final forms are accepted, but not the same form when part of a compound
- 452 - fixed
- 461 - ovda accepted, almost 50 % (17) get a correct suggestion
- 489, 522 - fixed
- 524 - fixed
- Guovdageainnu-láđđi not accepted

TODO:

look at test cases still not behaving properly (Thomas, Tomi)
check that the smj R lexicon is identical to sme (Thomas)

InDesign tools

TODO:

improve hyphenation testing (Sjur)

Hyphenators

Testing!!!

Release version

Schedule and tasks for the remaining weeks:

TODO:

set up CD-printing printer (Risten, Leif Åge)
fix Windows CD installation bug (Sjur, Børre)
- put on hold - work-around should be documented
get price and schedule for printed CD cover (Risten)
- done: 3980,- + 900,- + VAT for 1000 covers.
- 8 days production time
- 100 covers will be picked up in Tromsø (Børre)
- Print 50 CDs, take them to Oslo (Risten, Julie)
- Burn the CDs in Oslo (Sjur)
fix remaining bugs - golden master by end of this Monday (all)
finalise InDesign hyphenator (Sjur, Børre, Thomas)
- testing = hyphenation testing
- documentation
- installation
update usage and installation documentation (Børre, Thomas, Sjur)
translate all new documentation (all)
QA all documentation (all)
do as much Hunspell as possible (Børre, Tomi)

Actual release

December 12 is the most likely date, before 12:00. Still to be confirmed.

There will be a release party in the afternoon.

9. Other

Corpus contracts

Delayed till after final release.

TODO:

publish corpus contracts and project infra as open-source on NoDaLi-sta (Sjur)

Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, so that for each bug we know exactly when it should have been fixed, in what file(s) and in what version.
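
To keep such notes uniform, the comment text could be produced from a small template. The following is only an illustration: the function name and the wording of the note are assumptions, not an agreed project convention, and nothing here talks to Bugzilla itself.

```python
def fix_note(bug_id, version, files):
    """Build a one-line comment recording where and when a bug was fixed."""
    return f"Bug {bug_id}: fixed in version {version}; files: {', '.join(files)}"


# Placeholder values only; real notes would carry the actual version and paths.
print(fix_note(1, "X.Y", ["a.txt", "b.txt"]))
# Bug 1: fixed in version X.Y; files: a.txt, b.txt
```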

83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs, 38 are other bugs), and 23 risten.no bugs

Software updates

Leopard, 10.5
- Ilona - Will have it tomorrow or at latest on Wednesday. Probably needs help with installing.
- Per-Eric - ready for updating tomorrow - Børre will help

10. Next meeting, closing

The next meeting is 03.12.2007, 09:30 Norwegian time.

The meeting was closed at 11:42.

Appendix - task lists for the next week

Børre

move Steinar’s error markup in the xml files to (a copy of) the original
fix bug 550
finalise InDesign hyphenator
update usage and installation documentation
fix bugs!

Ilona

lexicalise smj missing words.
Help Trond with the smj dictionary.
Install Leopard

Maaren

lexicalise actio compounds

Per-Eric

check some unusual words from the Olavi missing list which are still not lexicalised
derivations tests
Install Leopard
fix bugs!

Risten

set up CD-printing printer
test printer
finish the cd cover and cd design

Saara

add new XSL/XML headers for proofing test docs
Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
add nested error markup to xml conversion
discuss more parallel texts

Sjur

fix bug 550
document Windows CD installation work-around
finalise InDesign hyphenator
update usage and installation documentation
test latest hyphenator
analyse hyphenation test results
fix bugs!

Thomas

sme->smj lexicon conversion to build bilingual lexicon resources
test hyphenation
analyse hyphenation test results
look at test cases still not behaving properly
paradigm testing
finalise InDesign hyphenator
update usage and installation documentation
check that the smj R lexicon is identical to sme
fix bugs!

Tomi

Hunspell lexicon conversion
fix bugs!

Trond

sme->smj lexicon conversion to build bilingual lexicon resources
test hyphenation
analyse hyphenation test results
fix bugs!
