Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from last week
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:10.

Present: Børre, Sjur, Steinar, Thomas, Tomi, Trond

Absent: Maaren, Saara

Agenda accepted as is.

2. Updated task status since last meeting

Børre

Maaren

Saara

Sjur

Steinar

Thomas

Tomi

Trond

3. Documentation

Nothing done last week.

TODO:

4. Corpus gathering

Nothing new. We need to work systematically on filling the gaps in our corpus, though not this month or next.

TODO:

5. Corpus infrastructure

Alignment

TODO:

Conversion issues

TODO:

6. Infrastructure

Nothing happened last week.

TODO:

7. Linguistics

North Sámi

Maaren is now working on lexicalising the actio compounds.

TODO:

Numbers:

TODO:

Hyphenation problem

TODO:

Lule Sámi

TODO:

8. Name lexicon infrastructure

Decisions made in Tromsø can be found in the meeting memo (/admin/physical_meetings/tromso-2006-08-propnoun.html).

Postponed:

TODO:

  1. restructure interface code for easier maintenance, coding and use
    1. well under way, still some work
  2. finish first version of the editing (Sjur)
  3. test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  4. generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well); the morphological section should be kept intact, e.g. in propernoun-sme-morph.txt (Sjur, Saara)
  5. convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
  6. start to use the xml file as source file
  7. clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  8. merge placenames that are erroneously split across different entries, e.g. Helsinki, Helsingfors, Helsset (linguists)
  9. publish the name lexicon on risten.no (Sjur)
  10. add missing parallel names for placenames (linguists)
  11. add informative links between first names like Niillas and Nils (linguists)
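
As an illustration of item 5 above (deriving a per-language lexicon file from a common XML file), here is a minimal sketch. Everything about the XML layout is an assumption made for the example: the minutes do not describe the schema of the terms files, so the element names (entry, name), the attributes (lang, cont), the continuation class labels, and the function derive_lexicon are all hypothetical. The output format is assumed to be lexc-style lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the real terms-*.xml schema is not given in the
# minutes, so element and attribute names here are invented for illustration.
SAMPLE_XML = """<terms>
  <entry id="helsinki">
    <name lang="sme" cont="PLACE">Helsset</name>
    <name lang="fin" cont="PLACE">Helsinki</name>
    <name lang="swe" cont="PLACE">Helsingfors</name>
  </entry>
  <entry id="niillas">
    <name lang="sme" cont="FIRSTNAME">Niillas</name>
    <name lang="nob" cont="FIRSTNAME">Nils</name>
  </entry>
</terms>"""


def derive_lexicon(xml_text, lang, lexicon_name="ProperNoun"):
    """Build lexc-style lexicon text for one language from the common XML.

    Each <name> element whose lang attribute matches `lang` becomes one
    line of the form 'Word CONTCLASS ;' (an assumed target format).
    """
    root = ET.fromstring(xml_text)
    lines = [f"LEXICON {lexicon_name}"]
    for entry in root.iter("entry"):
        for name in entry.iter("name"):
            if name.get("lang") == lang:
                word = (name.text or "").strip()
                cont = name.get("cont", "UNKNOWN")  # placeholder default
                lines.append(f"{word} {cont} ;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Derive the North Sámi file contents from the shared source.
    print(derive_lexicon(SAMPLE_XML, "sme"), end="")
```

Keeping all parallel names in one entry, as in this sketch, would also make the merging in item 8 a matter of editing the XML only, since every per-language file would be regenerated from it.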

9. Spellers

Polderland data generation

TODO:

  1. add smj to PLX conversion (Børre, Tomi)
  2. include numerals in the speller (Børre, Tomi)
    1. first version done, but needs more work
  3. add prefixes to the PLX (Børre, Tomi)
    1. not yet
  4. add derivations to the PLX generation (Børre, Tomi)

Aspell

TODO when the major part of the PLX conversion is done:

Testing

TODO:

Localisation

TODO:

10. Other

Corpus contracts

TODO:

Bug fixing

57 open Divvun/Disamb bugs, and 23 risten.no bugs

KUNSTI final meeting

The conference invitation can be found at http://tinyurl.com/326lfy

8.-9. February (Thursday & Friday), Oslo. Thomas could present the morphological work, if he wants to.

11. Next meeting, closing

The next meeting is 5.2.2007, 09:30 Norwegian time.

The meeting was closed at 11:01.

Appendix - task lists for the next week

Børre

Maaren

Saara

Sjur

Steinar

Thomas

Tomi

Trond