Language Technology at UiT The Arctic University of Norway

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup


  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation -
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 10.30. Agenda accepted as is.

Present: Sjur, Thomas, Tomi, Trond, Børre

Away: Maaren

Main secretary: Sjur

2. Reviewing the task list from the last meeting

3. Documentation -

4. Corpus gathering

Børre has been in contact with Áššu, and they will provide us with both corrected and uncorrected versions of their texts each Friday.

Børre and Sjur discussed the legal infrastructure on the way back home on Friday. Among other things, they discussed the pros and cons of the Helsinki and Oslo models. The Helsinki model is outlined below (the licenses are numbered 1-3; 4 is an attachment to 3):

Samediggi   ----------   Data central
(collector)      2          (UiT)
    |        \                |
  1 |         4 \             | 3
    |             \           |
Author/                User / researcher
Publisher                 (end user)
(text owners)

The Oslo model: Compared to the above, the collector and the data central are one and the same.

Other differences:

We will go for a variant of the Helsinki model.


5. Corpus infrastructure

We did some testing in Guovdageaidnu. Using the corpus infrastructure is quite easy; some possible improvements were discussed.

Feature requests can be sent to the newsgroup. Possible changes to be evaluated:

Seen from a linguistic point of view, we need as much text as possible. For some tasks we can analyse the texts automatically, and for these we need large quantities. For other tasks we will have to analyse manually; here we will (often) have plenty of text to choose from, and we would like to analyse only the ordinary text paragraphs.

6. Linguistics

Normative questions to be clarified:

7. Term db

Bug fixing, cosmetic tuning.

8. Other issues

8.1 Cgi-bin scripts

Pekka tried to use our form-based tools, which didn’t work, and the feedback link pointed to a non-working address. It should go to […]. The part “-utf8” was removed from our scripts as part of last week’s update. We must keep an eye on that. The digraph issue is still not fixed and will be added to Bugzilla.

8.2 Trond about the faculty:

9. Summary, task list








10. Next meeting, closing

06.06.2005 10.00

Closed at 12.52