Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from two weeks ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:03.

Present: Børre, Saara, Sjur, Thomas, Tomi, Trond

Absent: Maaren

Agenda accepted as is.

2. Updated task status since last meeting

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

3. Documentation

TODO:

4. Corpus gathering

Contacted Bård Eriksen regarding smj texts. He would like to help us collect texts, but needs to charge for the overhead involved.

Børre has gathered phone numbers and will phone the writers, then send letters and contracts.

Some authors run their own publishing houses; they should be contacted (Lene has an overview).

NSI: Børre to contact the IT consultant with an offer for help on relevant issues.

Authors: parallel work. Contact authors while waiting for publishers, not after.

TODO:

5. Corpus infrastructure

General

TODO:

User accounts and access

For details, see a previous meeting memo, as well as the memo from a [dedicated meeting|/infra/corpus_policy.html].

Shell access

TODO:

Some discussion of script locations: we have two different locations in CVS for “tool scripts”:

gt/src/see-tools/  - source files for tools to be installed
gt/script/emacs/   - ready-to-use scripts (but might still need installation)

Conclusion: This is fine for the time being.

Web browser access

TODO:

More texts for the graphical corpus interface:

TODO:

Aligner

TODO:

Language recognition

TODO:

6. Infrastructure

Xerox tools wrapped as servers

TODO:

Hyphenator

TODO:

Automatic Bugzilla reminder for untouched bugs

TODO:

M4

Setup and infra finished. Now we are ready to start using M4.

TODO:

7. Linguistics

Derivation and spellers like Aspell

Semantic double-tagging of names

TODO:

North Sámi

Issue: Lene has found quite a few non-words as base forms in the verb lexicon. This could be due to contaminated input when Trond added the verbs to the analyser back in 2001. We will have to check whether they are marked XXX, and go through all the XXX cases in our source files (408 sme XXX, 128 smj XXX).

This issue is linked to the question of whether our lexicon files should be proofread and marked for !SUB, outright error removal, etc. This could be a tough starter task for linguists joining the project.
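Going through the XXX cases could start from a simple listing of the marked lines. The following is a minimal sketch, not an existing project script: it assumes the marker appears as a standalone token `XXX` somewhere on the line, and the function names and the sample entries are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: the marker is the standalone token "XXX" (not part of a longer word).
MARKER = re.compile(r"\bXXX\b")


def find_marked_lines(text):
    """Return (line_number, line) pairs for every line containing the XXX marker."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER.search(line)
    ]


def count_marked_by_file(paths):
    """Map each file path to the number of XXX-marked lines it contains."""
    counts = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        counts[str(path)] = len(find_marked_lines(text))
    return counts


# Invented sample entries, only to show the behaviour of the sketch.
sample = """\
boahtit GOAHTI ;
borrat GOAHTI ; ! XXX check base form
XXXfoo GOAHTI ;
"""

for number, line in find_marked_lines(sample):
    print(number, line)
```

Run over the sme and smj source files, `count_marked_by_file` would give per-file totals that can be compared against the counts quoted above, and the line listing gives a worklist for checking the suspect base forms.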

Nothing this week, but see above re: derivations.

TODO:

Lule Sámi

TODO:

8. Name lexicon infrastructure

Decided in Tromsø:

Details can be found in [the meeting memo|/admin/physical_meetings/tromso-2006-08-propnoun.html].

TODO:

9. Tromsø meeting follow-up

TODO:

10. Other

Bug fixing

43 open Divvun/Disamb bugs (two down!), and 25 risten.no bugs

Guess: 1/3 of the bugs are fixed already (?)

Gobby

TODO:

Compilation on victorio

Hypothesis: closed-sme-lex.txt is broken.

Problem: The file compiles on the other computers, so it is not easy to see what to look for in the closed file.

TODO:

Meetings and Marratech

TODO:

Task lists as iCal entries

TODO:

11. Next meeting, closing

Next meeting 18.9.2006 at 9:30.

Closed at 11:52.

Appendix - task lists for the next week

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond