Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from two weeks ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:56.

Present: Børre, Saara, Sjur, Thomas, Tomi, Trond

Absent: Maaren

Main secretary: Trond

Agenda accepted with additions under “Other”.

2. Reviewing the task list from the last meeting

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

3. Documentation

Changes and updates because of the Divvun public tender

TODO:

TODO:

4. Corpus gathering

Collecting

See a previous meeting memo for what’s to be done.

TODO: Send out the rest of the letters (Børre)

Børre has sent a letter to the publishers. He has talked to Brita Kåven (she was positive) and to Bård Eriksen (Báhko), and has written to Dát. He has also talked to Min Áigi; they will have a meeting on Monday (today). Áššu will collect text. He has contacted Audhild Schanche at NSI, who would look at the contracts and said they would work out a solution. She also mentioned Dieđut.

Odin

Waiting for Sæth to discuss with colleagues how to implement the cooperation, and to get back to us.

TODO:

Olavi Korhonen’s Lule Sámi dictionary.

Korhonen and Oahpadusguovdásj share the copyright to the dictionary. They are both very positive.

KIO Grafisk and the Iđut books

TODO:

Bible texts

We will get text from Finland, and we are awaiting an answer from Sweden. As for the last HTML versions from Norway, the people there have been very busy over the last few weeks. Saara has made the paratext2xml converter.

TODO:

5. Corpus infrastructure

TODO in transferring the old gt/sme/corp files to the new corpus repo:

Further discussion about corpus analysis and computer use:

TODO dtd usage and documentation:

Correction tags?

TODO:

OPEN ISSUES:

Changes and updates because of the Divvun public tender

User account admin and infra: see [previous memo|/admin/weekly/2006/Meeting_2006-03-06.html].

TODO: see above under Documentation.

Automatic build of the content of our corpus repo: also see [previous memo|/admin/weekly/2006/Meeting_2006-03-06.html].

TODO:

Free and non-free texts

More info in a [previous meeting memo.|/admin/weekly/2006/Meeting_2006-03-13.html]

Newsgroup discussion - whether to rename gt/ to gtbound/ or not:

Saara: I was thinking of the users' own scripts (if they have any) and trying to avoid changes to the existing system. But if you prefer gtbound, I’ll make the changes right away.

Sjur: Let’s evaluate this in the next meeting. I would prefer gtbound/ over just gt/ for clarity’s sake and future newcomers, so if the workload isn’t too high, I would still like you to change it. We’ll discuss and decide on Monday.

Solution: Use a symbolic link to handle backwards compatibility.
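
A minimal sketch of how such a link could be set up, run in a scratch directory. It assumes that gt/ has already been renamed to gtbound/ and that older scripts still refer to gt/; the scratch directory and the sample file name are hypothetical and not taken from the corpus repo.

```shell
# Work in a throwaway directory so the sketch touches no real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the renamed directory, with one hypothetical file in it.
mkdir gtbound
echo "sample" > gtbound/example.txt

# The old name becomes a symbolic link to the new name, so existing
# user scripts that refer to gt/ keep working unchanged.
ln -s gtbound gt

# Reading through the old name reaches the file under the new name.
cat gt/example.txt
```

With a relative link target like this, the link stays valid as long as gt and gtbound remain in the same parent directory.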

TODO:

More texts to the graphical corpus interface:

TODO:

Text upload

The upload is working, but Børre doesn’t receive an automatic message when a new text is uploaded. Saara has made a procedure for this, but hasn’t turned it on, since she didn’t know whether the email address was working.

TODO:

6. Infrastructure

Aligner

We are working on it, there are problems, and the test files are not good enough.

TODO:

Perhaps best to have all languages in one list, and extract pairs via cut -d"/" -f1,4.
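
A minimal sketch of that extraction, assuming a list with one entry per line and the languages separated by "/" in a fixed order. The sample entries, the column order (North Sámi/Norwegian/Finnish/Lule Sámi) and the function name `extract_pair` are illustrative assumptions, not taken from the project files.

```shell
# Hypothetical multilingual list: one entry per line, fields separated by "/".
# The column order sme/nob/fin/smj is an assumption made for this sketch.
all_languages='beaivi/dag/päivä/biejvve
guolli/fisk/kala/guolle'

# Keep only fields 1 and 4 of every line, i.e. one language pair.
extract_pair() {
    cut -d"/" -f1,4
}

printf '%s\n' "$all_languages" | extract_pair
```

Other pairs would come from the same list by changing the field numbers, for example -f1,2 for the first two columns.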

7. Linguistics

North Sámi

Semantic feature system

TODO:

Further discussion and details in the [previous meeting memo.|/admin/weekly/2006/Meeting_2006-03-20.html]

TODO (Trond):

  1. Discussion testing
  2. infrastructure
  3. semiautomatic retagging

Lule Sámi

TODO:

8. Name lexicon infrastructure

Complex names

TODO:

Editing

TODO on eXist as editor:

  1. refactor and prepare risten.no for multiple collections:
    1. develop the Cocoon sitemap to delegate requests to the proper folder level, such that the most specific code is always used (Sjur)
      1. done for XQueries and XSLT; only CSS left (needs to be handled differently)
    2. refactor the code into more and more specific components according to our folder hierarchy (Tomi, Sjur)
  2. develop the needed XQueries and interface (Sjur, Tomi)
  3. data synchronisation between risten.no and the cvs repo (Tomi)
    1. nothing last week
  4. test and review when ready
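
The "most specific code is always used" rule in item 1.1 can be illustrated outside Cocoon as a plain file lookup. The sketch below is not Cocoon sitemap syntax and not the risten.no setup; the function name `most_specific`, the folder layout and the file names are all hypothetical. It only shows the idea of checking the deepest folder first and falling back to its parents.

```shell
# Print the most specific copy of a file: look in the given folder first,
# then in each parent folder, up to and including the root folder.
# Usage: most_specific ROOT RELATIVE_FOLDER FILENAME
most_specific() {
    root=$1
    folder=$2
    file=$3
    while :; do
        if [ -f "$root/$folder/$file" ]; then
            printf '%s\n' "$root/$folder/$file"
            return 0
        fi
        # Stop once the root level itself has been checked.
        [ "$folder" = "." ] && return 1
        folder=$(dirname "$folder")
    done
}

# Hypothetical layout: generic files at the root, one override further down.
site=$(mktemp -d)
mkdir -p "$site/names/places"
echo "generic" > "$site/style.xsl"
echo "generic" > "$site/query.xq"
echo "places-specific" > "$site/names/places/query.xq"

most_specific "$site" names/places query.xq    # finds the override in names/places
most_specific "$site" names/places style.xsl   # falls back to the copy at the root
```

A folder without its own copy of a file automatically inherits the one from the nearest parent, which is what lets shared code be refactored into ever more specific components.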

Data synchronisation task list/specification:

Details in the [previous meeting memo.|/admin/weekly/2006/Meeting_2006-03-20.html]

9. Spellers

Nothing until the new proper noun lexicon is in place.

10. Other

Divvun admin

The project manager would like all Divvun project members not working in the SD buildings to write down all hours worked, with a very brief description of the tasks done. The list should be sent to Sjur every week, either on Friday afternoon as you leave work or the following Monday morning.

Making such lists is necessary to be able to document to the SD administration that we are working the hours we should, in case of inquiries or newspaper stories. I trust (and know) you do, but that is not enough if somebody external doesn’t. Those working in the SD buildings are using a time clock for the same purposes, which at the same time enforces a stricter working-hour regime than what is possible with our self-reporting system.

I have been doing the same thing for myself for a long time, and the benefit is that it is easy to keep track of extra hours and “avspasering”. There are many applications out there that will help you, or you can just make a simple list on your own.

TODO:

Divvun project management while Sjur is on paternity leave

Sjur will soon go on paternity leave (expected April 6), and will most likely be away for two weeks. While he is away, somebody else needs to head the Divvun project. The basic tasks are pretty simple for the most part:

TODO:

Børre is temporary project manager :-)

Easter vacation/absences

|| Who? || When?
| Børre | from the 10th to the 12th of April
| Saara | at work as normal
| Sjur | no vacation, possibly paternity leave
| Thomas | from the 10th to the 12th of April, 3 days
| Tomi | from the 10th to the 12th of April, might be at work offline
| Trond | doesn’t know yet

Gobby

TODO:

SubEthaEdit update

SubEthaEdit (SEE) 2.3 has been released. It is now commercial only, but 2.2 is still available for free, non-commercial use. Since we already have licenses, this is a non-issue, and everyone should upgrade. The new version contains bug fixes and a few new features (don’t remember them all, mostly improvements in the UI).

Sjur: I have made a simple, but useful jspwiki mode, for syntax coloring of our meeting memos:-) It isn’t completely reliable yet, improvements welcome (I won’t do anything more on it).

Sjur: I have also made a first attempt at an XQuery mode, but that one isn’t working very well. I’ll give it to interested people, though:-)

TODO:

Bug fixing

35 open Divvun/Disamb bugs, and 25 risten.no bugs

11. Summary, task list

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

12. Next meeting, closing

03.04.2006 09:30

Closed at 12:19