Language Technology at UiT The Arctic University of Norway

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from last week
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:12. Part two of the meeting opened at 14:45.

Present: Børre, Maaren, Sjur, Steinar, Thomas

Absent: Saara, Tomi, Trond

Agenda accepted as is.

2. Updated task status since last meeting

Børre

Maaren

Saara

Sjur

Steinar

Thomas

Tomi

Trond

3. Documentation

Børre updated the front page, and Steinar started to test the quality of our documentation.

TODO:

4. Corpus gathering

TODO:

5. Corpus infrastructure

Alignment

TODO:

Conversion issues

TODO:

6. Infrastructure

Steinar started the QA, cf. above.

TODO:

7. Linguistics

North Sámi

Maaren is now working on lexicalising the actio compounds.

TODO:

Numbers:

TODO:

Lule Sámi

Number inflection and analysis are now fixed and working, thanks to Thomas.

TODO:

8. Name lexicon infrastructure

Decisions made in Tromsø can be found in [the meeting memo|/admin/physical_meetings/tromso-2006-08-propnoun.html].

Postponed:

TODO:

  1. restructure interface code for easier maintenance, coding and use
    1. done, except a glitch in Opera (won’t fix for the time being)
  2. finish first version of the editing (Sjur)
  3. test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  4. generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well); the morphological section should be kept intact, e.g. in propernoun-sme-morph.txt (Sjur, Saara)
  5. convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
  6. start to use the xml file as source file
  7. clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  8. merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
  9. publish the name lexicon on risten.no (Sjur)
  10. add missing parallel names for placenames (linguists)
  11. add informative links between first names like Niillas and Nils (linguists)

9. Spellers

Polderland data generation

Polderland has now given instructions for how to handle genitive compounding and proper noun prefixing. We’ll need to decide how to specify nouns that should take genitive first parts when compounding.

TODO:

  1. add smj to PLX conversion (Børre, Tomi)
    1. done
  2. send smj PLX data to Polderland (Børre, Tomi)
  3. decide how to specify nouns requiring genitive first parts (Sjur, Thomas, Trond)
  4. include numerals in the speller (Børre, Tomi)
    1. first version done, but needs more work
  5. add prefixes to the PLX (Børre, Tomi)
    1. not yet done
  6. add derivations to the PLX generation (Børre, Tomi)
    1. next after numbers are fixed

OOo speller(s)

TODO when the major part of the PLX conversion is done:

Testing

TODO:

Localisation

TODO:

10. Other

Corpus contracts

TODO:

Bug fixing

59 open Divvun/Disamb bugs, and 23 risten.no bugs

Moving G5

The G5 could now be moved out of Børre’s office, to the basement where the present web server is located. That will reduce the noise level in the office quite substantially :-)

TODO:

11. Next meeting, closing

The next meeting is 12.2.2007, 09:30 Norwegian time.

The meeting was closed at 10:43.

Appendix - task lists for the next week

Børre

Maaren

Saara

Sjur

Steinar

Thomas

Tomi

Trond