Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. risten.no
  8. Other issues
    1. Vacation planning
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 09.30. Agenda accepted as is.

Present: Maaren, Sjur, Thomas, Trond, Tomi

Absent: Børre

Main secretary: Sjur

2. Reviewing the task list from the last meeting

All:

Børre:

Maaren:

Sjur:

Thomas:

Tomi:

Trond:

3. Documentation - divvun.no

Opening - short evaluation

divvun.no was not mentioned much in Tysfjord, but the address was given out: “go there and have a look”. We delivered what we planned, with the exception of i18n in Forrest (because of Forrest itself).

What’s next

4. Corpus gathering

Anders Kintel has now finished the dictionary manuscripts. They were handed over to the Sámi Language Board and GIO by Harrieth Aira and Julie Eira (OAO) at the language board meeting in Tysfjord. The manuscripts will be checked by the September meeting.

Kåre Tjihkkom and Kaia Kalstad will now go over the Norwegian-Lule Sámi part, and the dictionaries will be up for review at the September meeting of the language council. Afterwards, we may get a copy. In fact, we could get it right away, since the Sámi-Norwegian part has already been checked. The dictionary is written in FileMaker Pro and can be exported to other structured formats. Børre will be back at some point next week and may then ask for a copy.

Thomas has talked to Hans Ragnar Mathisen about parallel placenames in his Sámi Atlas. These are names of foreign countries. Hans Ragnar was positive.

Hans Ragnar has already given us the Sámi placenames, and they are already included, but not as parallel names. Thus, if we want a correction function such as Tyskland > Duiskka, we need the parallel names.

Thomas will continue the discussion and emphasize that we would also like to have the parallel names (Tyskland as well as Duiskka).
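The minutes do not say how the Tyskland > Duiskka correction function would be implemented. As a minimal sketch, assuming the parallel names end up in a simple Norwegian-to-Sámi lookup table (the table, the function name and the structure are all hypothetical), it shows why the Sámi names alone are not enough: they only let a speller accept a word, while the Norwegian half of each pair is what makes a suggestion possible.

```python
from __future__ import annotations

# Hypothetical parallel-name table: Norwegian name -> Sámi name.
# Only the Tyskland/Duiskka pair comes from the minutes; further entries
# would have to come from the Sámi Atlas data.
PARALLEL_NAMES: dict[str, str] = {
    "Tyskland": "Duiskka",
}

# The Sámi names on their own (what we already have) can only be accepted.
SAMI_NAMES: set[str] = set(PARALLEL_NAMES.values())


def suggest_place_name(word: str) -> str | None:
    """Return a Sámi suggestion for a Norwegian place name, if one is known."""
    if word in SAMI_NAMES:
        return None  # already a correct Sámi name, nothing to suggest
    return PARALLEL_NAMES.get(word)  # None if no parallel name is known


if __name__ == "__main__":
    for w in ["Tyskland", "Duiskka"]:
        print(w, "->", suggest_place_name(w))
```

In a real speller the mapping would presumably live in the lexicon rather than in a separate table; the sketch only illustrates why both halves of each pair are needed.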

Trond will get a new version of the NT.

5. Corpus infrastructure

Where to store corpus.dtd and similar files: they are currently in gt/dtd/.

gt$ls
CVS  cwb  doc  dtd  script  sma  sme  smi  smj  smn  sms  tmp  www

The structure of gt and the handling of DTD files must be discussed and decided upon. The discussion has been moved to the newsgroup.

6. Linguistics

sme

Nothing was done last week. The missing list contains many compounds: compounding no longer works (did it work earlier?). Many of the noise words are Lule and South Sámi words. We need proper language identification as part of the corpus infrastructure, so that we can use the language extraction feature of Tomi's catxml to get only the relevant language.

Most frequent entries in the missing list (count, word):

  54 jïh
  47 åarjelsaemien
  20 Sámedigge
  20 Saemiedigkie
  20 julevsáme
  17 Sámelága
  15 finihtta
  14 mij
  14 dajvesne
  13 julevsámegiel
  12 aaj
  12 •
  11 sæjhta
  11 dejtie
  10 vuolitásahus
  10 gïelem
  10 ájggu
   9 vihkeles
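The counts above have the shape of `uniq -c` output over the missing list. The minutes do not say how the list was produced, so this is only a minimal sketch of building such a frequency list, assuming (hypothetically) that the missing list is a sequence of words, one per entry.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def frequency_list(words: Iterable[str], top: int = 20) -> list[tuple[int, str]]:
    """Count non-empty words and return (count, word) pairs, most frequent first."""
    counts = Counter(w.strip() for w in words if w.strip())
    return [(n, w) for w, n in counts.most_common(top)]


def format_like_uniq_c(pairs: list[tuple[int, str]]) -> str:
    """Right-align the counts, similar to `sort | uniq -c | sort -rn` output."""
    return "\n".join(f"{n:4d} {w}" for n, w in pairs)


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read the missing list.
    sample = ["jïh", "mij", "jïh", "aaj"]
    print(format_like_uniq_c(frequency_list(sample)))
```

Sorting by frequency puts the noise words at the top, which is how the Lule and South Sámi words stand out in the list above.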

Bugzilla: The -heddjiid issue should be looked into after the giellalávdegoddi (language committee) has had its meeting. The -eabbo vs. -eabbu issue could be looked into now, because -eabbo has to be implemented anyway.

smj

7. risten.no

Opening - short evaluation

The opening went well, with the exception of the “sticky” Lule Sámi text.

What’s next

8. Other issues

Language(s) of RSS Newsfeed

It is working (it needs a separate input module), so the only question is how it should be used. Open for discussion when Børre is back.

Vacation planning

9. Summary, task list

This week’s task list is a summary of all unfinished tasks from before the divvun.no/risten.no openings.

All:

Børre:

Maaren:

Sjur:

Thomas:

Tomi:

Trond:

10. Next meeting, closing

04.07.2005 10:00

Closed at 11.00