Language Technology at UiT The Arctic University of Norway

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from last week
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 09:46.

Present: Saara, Sjur, Thomas, Tomi, Trond

Absent: Børre, Maaren

Agenda accepted with additions to Other.

2. Updated task status since last meeting

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

3. Documentation

TODO:

4. Corpus gathering

Sjur talked to Pia in Alta about the sma Bible texts, and she will send us the texts she has. Their inclusion in the corpus must be approved by Bibelselskapet, which has already given its approval for smj and sme (with a bound license).

TODO:

5. Corpus infrastructure

Aligner

There are problems with the aligner. Saara has talked to Børre, who will look into it. Saara has also obtained uplug with the hunalign aligner, a command-line aligner. Earlier versions did not support an anchor list, but the latest version does. It has arrived a bit late for the Disamb project, though.

TODO:

6. Infrastructure

Xerox tools wrapped as servers

Vislcg can't be included in the server wrapping code because it does not support the interactive operation mode that the wrapper needs. Another solution needs to be found.

TODO:

7. Linguistics

Names and multilinguality

TODO:

  1. finish first version of the editing (Sjur)
  2. test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  3. generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well); the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt (Sjur, Saara)
  4. convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
  5. start to use the xml file as source file
  6. clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  7. merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
  8. publish the name lexicon on risten.no (Sjur)
  9. add missing parallel names for placenames (linguists)
  10. add informative links between first names like Niillas and Nils (linguists)
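
As an illustration of item 4 above (deriving a per-language lexicon file from common xml files), here is a minimal sketch. It assumes a made-up XML layout: the element names `names`, `entry` and `name`, the attributes `lang` and `contlex`, and the continuation lexicon names `PLACE` and `MALE` are all hypothetical and do not reflect the actual terms-sme.xml or propernoun-($lang)-lex.txt formats. Only the general idea is shown: one shared source, one derived file per language.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared source. The element names, attribute names and
# continuation lexicon names are invented for this sketch; they are NOT
# the real schema of the name lexicon files.
SAMPLE_XML = """
<names>
  <entry id="helsinki">
    <name lang="sme" contlex="PLACE">Helsset</name>
    <name lang="fin" contlex="PLACE">Helsinki</name>
    <name lang="nob" contlex="PLACE">Helsingfors</name>
  </entry>
  <entry id="nils">
    <name lang="sme" contlex="MALE">Niillas</name>
    <name lang="nob" contlex="MALE">Nils</name>
  </entry>
</names>
"""


def derive_lex_lines(xml_text, lang):
    """Return sorted 'lemma CONTLEX ;' lines for one language."""
    root = ET.fromstring(xml_text)
    lines = []
    for name in root.iter("name"):
        # Keep only the names that belong to the requested language.
        if name.get("lang") == lang:
            lines.append(f"{name.text} {name.get('contlex')} ;")
    return sorted(lines)


if __name__ == "__main__":
    # Print the derived entries for North Sámi (sme).
    for line in derive_lex_lines(SAMPLE_XML, "sme"):
        print(line)
```

Keeping all parallel names in one entry, as in this sketch, is also what would make items 7 and 9 (merging split placename entries, adding missing parallel names) a matter of editing a single file.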

North Sámi

Nothing this week.

Lule Sámi

TODO:

8. Name lexicon infrastructure

Decided in Tromsø:

Details can be found in the meeting memo (/admin/physical_meetings/tromso-2006-08-propnoun.html).

TODO:

Postponed:

9. Spellers

Polderland data generation

TODO:

Aspell

TODO when the major part of the PLX conversion is done:

Testing

TODO:

10. Other

Corpus contracts

TODO:

Bug fixing

56 open Divvun/Disamb bugs, and 23 risten.no bugs

Guess: 1/3 of the bugs are fixed already (?)

Task lists as iCal entries

TODO:

New Perl modules

The rewritten preprocessor depends on a few new Perl modules. Saara has sent installation instructions to everyone and will write some documentation on them. Børre should then augment the documentation further.

TODO:

11. Next meeting, closing

The next meeting is 18.12.2006, 09:30 Norwegian time.

The meeting was closed at 11:06.

Appendix - task lists for the next week

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond