Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from two weeks ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:11.

Present: Børre, Thomas, Tomi, Trond, Saara

Absent: Maaren (sick leave), Sjur (paternity leave)

Main secretary: Trond

Agenda accepted with additions under “Other”.

2. Reviewing the task list from the last meeting

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

3. Documentation

TODO:

4. Corpus gathering

Trip to Sámi municipalities and ...

Børre is going on a trip to Karasjok, Tana, Nesseby, Utsjoki and Inari to visit the municipalities, the parliaments and the Sámi Council. In Karasjok he will also try to visit Davvi Girji.

Collecting

See a previous meeting memo for what’s to be done.

TODO: Send out the rest of the letters (Børre)

Odin

Sæth replied by e-mail: there has been no time to follow up, but they will try to include us in their plans.

Olavi Korhonen’s Lule Sámi dictionary.

TODO: Børre to contact Olavi Korhonen and Kuhmunen

KIO Grafisk and the Iđut books

TODO:

Bible texts

We will get text from Finland, but have not received any yet. We have received the Swedish text from Sweden. As for the latest HTML versions from Norway, Trond did not contact them last week.

The Swedish HTML has arrived, but without paratext. Norsk bibelselskap has not sent the corrected New Testament versions for sme, nor paratext for nno/nob.

TODO:

Min Áigi

Everything ok here.

TODO:

Kåfjord

Promised to send us texts, but nothing has arrived yet.

TODO: Børre to contact them.

Sámi Instituhtta

Audhild Schanche has signed the contract. We will have to contact them about transferring the texts.

TODO: Børre to contact them.

5. Corpus infrastructure

https://giellalt.uit.no/lang/corp/corpus-summary.html

TODO:

Changes and updates because of the Divvun public tender

User account admin and infra: see [previous memo|/admin/weekly/2006/Meeting_2006-03-06.html].

TODO: see above under Documentation.

Automatic build of the content of our corpus repo: also see [previous memo|/admin/weekly/2006/Meeting_2006-03-06.html].

TODO:

Free and non-free texts

More info in a [previous meeting memo|/admin/weekly/2006/Meeting_2006-03-13.html].

TODO:

Linking parallel files

DECISION: We’ll keep the original filename, and store linking info in the header (has to be added manually).

TODO:

More texts for the graphical corpus interface:

TODO:

Top priorities:

  1. Trond and Saara to discuss with Lars.
  2. Lars to add text to the server.
  3. Tomi to prepare for the parallel corpus.

Language recognition

TODO:

Some flag to write into the xsl file:

TODO: Saara to implement this and to write short documentation on how to enter the appropriate commands in the file-specific xsl documents.

6. Infrastructure

Aligner

Today, we have two anchor files in addition to the original one.

TODO:

Hyphenator

Trond and Thomas have been updating the proper noun file with ^ tags. We need the tag in front of compound parts beginning with a vowel or with two or more consonants. Compound parts beginning with a single consonant are handled correctly.
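The tagging rule above can be illustrated with a minimal sketch. This is not the team's actual tooling: the function names, the vowel inventory, and the example compound parts are all assumptions made for illustration, and the sketch assumes the tag is only relevant before non-initial compound parts.

```python
# Assumed vowel inventory (not taken from the memo): North Sámi vowels plus
# Scandinavian and Finnish letters that may occur in proper nouns.
VOWELS = set("aáeiouyæøåäö")


def needs_boundary_tag(part: str) -> bool:
    """Return True if a compound part should get an explicit ^ tag in front.

    Rule from the memo: tag parts that begin with a vowel or with two or
    more consonants; parts beginning with a single consonant need no tag.
    """
    letters = part.lower()
    if not letters or not letters[0].isalpha():
        return False
    if letters[0] in VOWELS:
        return True
    # The first letter is a consonant: tag only if the second is one too.
    return (
        len(letters) > 1
        and letters[1].isalpha()
        and letters[1] not in VOWELS
    )


def tag_compound(parts: list[str]) -> str:
    """Join compound parts, inserting ^ before non-initial parts that need it."""
    if not parts:
        return ""
    tagged = [parts[0]]
    for part in parts[1:]:
        prefix = "^" if needs_boundary_tag(part) else ""
        tagged.append(prefix + part)
    return "".join(tagged)


# Hypothetical compound parts, chosen only to exercise the three cases.
print(tag_compound(["Deatnu", "ávži"]))     # vowel-initial second part
print(tag_compound(["Unjárga", "skuvla"]))  # second part starts with two consonants
print(tag_compound(["Stuorra", "johka"]))   # second part starts with one consonant
```

A check like this could be used to list proper noun entries that still lack the tag, rather than to insert tags automatically, since the part boundaries themselves have to be known first.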

TODO:

7. Linguistics

General - hyphenation

See the discussion, open questions and decisions in the [previous meeting memo|/admin/weekly/2006/Meeting_2006-04-03.html].

TODO:

North Sámi

Semantic feature system

We postpone this issue until Linda has met with Eckhard in early May.

Lule Sámi

TODO:

8. Name lexicon infrastructure

TODO:

  1. refactor and prepare risten.no for multiple collections:
    1. refactor the code into more and more specific components according to our folder hierarchy (Tomi, Sjur)
      1. things are moving forward
  2. develop the needed XQueries and interface (Sjur, Tomi)
    1. developing
  3. data synchronisation between risten.no and the cvs repo (Tomi)
    1. nothing this week
  4. test and review when ready

Discussion postponed until Sjur is back.

9. Spellers

Nothing until the new proper noun lexicon is in place. We don’t have enough people to do both.

Discussion postponed until Sjur is back.

10. Other

Bug fixing

50 open Divvun/Disamb bugs, and 25 risten.no bugs

Please help Saara with bug 279. Not much help…

Saara will contact Roy on this issue.

After the corpus issues have been somewhat settled, we should do a bug barnraising, and then a new one after the name lexicon is fixed.

11. Summary, task list

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

12. Next meeting, closing

02.05.2006 09:30

Sjur is on paternity leave.

Closed at 12:04