Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from two weeks ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:04.

Present: Maaren, Sjur, Thomas, Tomi

Absent: Børre, Saara, Trond

Main secretary: all

Agenda accepted as is.

2. Reviewing the task list from the last meeting

Børre

On winter holiday.

Maaren

Saara

Sjur

Thomas

Tomi

Trond

On winter holiday.

3. Documentation

Changes and updates because of the Divvun public tender

4. Corpus gathering

Collecting

See a previous meeting memo for what’s to be done.

TODO: Send out the rest of the letters (Børre)

Odin

Waiting for Sæth to discuss with colleagues how to implement the cooperation, and to get back to us.

TODO:

Olavi Korhonen’s Lule Sámi dictionary.

Korhonen and Oahpadusguovdásj share the copyright to the dictionary.

KIO Grafisk and the Iđut books

We need a test file in order to find out whether file conversion from Quark to InDesign works.

TODO:

Bible texts

TODO:

5. Corpus infrastructure

We need more “version control” in the corpus work - we don’t know which version of the XSL script was used for a given file (only roughly whether it was used or not). We need to document within the XSL file which versions of all tools were used, including the version of the XSL file (common section/template) itself.
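As a starting point for discussion, the version bookkeeping could look roughly like the sketch below. It is only an illustration: the function name `provenance_record`, the field names and the use of a SHA-1 fingerprint in place of a version number for the XSL script are all assumptions, not something we have decided.

```python
import hashlib


def provenance_record(xsl_text, tool_versions):
    """Return a flat record of which script and tools produced a corpus file.

    xsl_text      -- the full text of the XSL script that was applied
    tool_versions -- mapping from tool name to version string (hypothetical
                     names; we have not fixed a list of tools to record)
    """
    # A content fingerprint identifies the exact script, even if nobody
    # remembered to bump a version number in it.
    digest = hashlib.sha1(xsl_text.encode("utf-8")).hexdigest()
    record = {"xsl_sha1": digest}
    # Sort the tool names so that two records can be compared line by line.
    for name in sorted(tool_versions):
        record["tool:" + name] = tool_versions[name]
    return record
```

Such a record could be stored next to (or inside) each converted file, so that we can later tell which files need to be reconverted after a script change.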

Transferring the old gt/sme/corp files to the new corpus repo:

TODO:

Further discussion about corpus analysis and computer use:

The new G5 is tremendously faster than cochise, so we want to use it for corpus analysis. But cochise will continue to be our main corpus repo.

New tasks:

Changes and updates because of the Divvun public tender

6. Infrastructure

We need to set up anonymous, read-only access to our cvs repo as outlined by our friend in Skolelinux. This is necessary both for the Divvun public tender and for other cooperation with external bodies (e.g. Skolelinux). This setup needs to be accompanied by documentation for access, as well as for building, prerequisites, etc.

We might even consider setting up patching routines, such that people with read-only access can send in patches for bugs and enhancements they would like to contribute.
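To make the idea of a patch concrete: a contributor without write access would send us a unified diff between the original file and their changed copy. A minimal sketch using Python's standard `difflib` module follows; the helper name `make_patch` and the `a/`, `b/` path prefixes are assumptions for illustration, and in practice a contributor would more likely produce the patch with the ordinary diff tools.

```python
import difflib


def make_patch(old_text, new_text, filename):
    """Return a unified diff that turns old_text into new_text.

    The result is an empty string when the two texts are identical.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + filename,  # label for the original version
        tofile="b/" + filename,    # label for the contributor's version
    )
    return "".join(diff)
```

The resulting text can be sent by mail and reviewed by someone with commit access before it is applied.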

Our open-source policy now demands that we really take this step, and not only talk about open source :-)

Should we set up view-cvs (web interface to cvs)?

Problems:

Howto/who:

7. Linguistics

North Sámi

We want to get the decisions from the SGL meeting in December.

SGL do not want us to send “unnecessary” things like rievan - rieban, politihkka - bolitihkka, pievlat - bievlat. And I agree with Laila: we are wasting their time. We have to write rieban, politihkka, bievlat. These were already decided in 1982-1988 and 1992-94. We have to read what Ole Henrik Magga wrote a long time ago about the decisions they (Finland, Norway and Sweden) made in 1982-1988 and 1992 (?).

We have to make a list and send it to SGL. They will probably have a meeting in May.

TODO:

TODO:

Lule Sámi

Goal for March 1st: Reach the 90 percent lexical recall limit.
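For reference, a minimal sketch of how the recall figure could be computed. It assumes that lexical recall means the share of word tokens in a test corpus that the analyser recognises; the function name `lexical_recall` and the `is_recognised` callback are hypothetical stand-ins for a call to the actual analyser.

```python
def lexical_recall(tokens, is_recognised):
    """Return the fraction of tokens that the analyser recognises.

    tokens        -- iterable of word forms from the test corpus
    is_recognised -- function returning True when a word form gets at
                     least one analysis (assumed interface)
    """
    tokens = list(tokens)
    if not tokens:
        return 0.0  # avoid division by zero on an empty corpus
    hits = sum(1 for token in tokens if is_recognised(token))
    return hits / len(tokens)
```

With this definition, the March 1st goal corresponds to `lexical_recall(...) >= 0.9` on the chosen test corpus. Counting word types instead of tokens would give a different, usually lower, figure.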

TODO:

8. Name lexicon infrastructure

Complex names

TODO:

XML format

TODO on eXist as editor:

  1. refactor and prepare risten.no for multiple collections:
    1. develop the Cocoon sitemap to delegate requests to the proper folder level, such that the most specific code is always used
    2. refactor the code into more and more specific components according to our folder hierarchy
  2. develop the needed XQueries and interface (Sjur, Tomi)
  3. data synchronisation between risten.no and the cvs repo (Tomi)
  4. test whether eXist as editor is actually working well (linguists)
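The delegation rule in item 1 (the most specific code is always used) can be sketched independently of Cocoon as a lookup that walks from the requested folder up towards the root. This is only an illustration of the rule, not a sitemap; the function name and the folder and file names in the usage example are made up.

```python
from pathlib import PurePosixPath


def resolve_most_specific(request_folder, resource, available):
    """Return the most specific existing path for resource, or None.

    request_folder -- folder the request points at, e.g. "names/sme/places"
    resource       -- file name to look for, e.g. "search.xq"
    available      -- set of relative paths that exist in the hierarchy
    """
    folder = PurePosixPath(request_folder)
    while True:
        candidate = str(folder / resource)
        if candidate in available:
            return candidate
        if folder == folder.parent:
            # We have reached the top of the hierarchy without a match.
            return None
        folder = folder.parent  # fall back to the more general level
```

For example, with `available = {"search.xq", "names/sme/search.xq"}`, a request for `search.xq` under `names/sme/places` resolves to `names/sme/search.xq`, while the same request under `names/smj` falls back to the top-level `search.xq`.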

Data synchronisation task list/specification:

9. Spellers

Nothing until the new proper noun lexicon is in place.

10. Other

SGL Seminar

SGL don’t want a seminar - at least not yet.

Bug fixing

35 open Divvun/Disamb bugs, and 24 risten.no bugs

11. Summary, task list

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

12. Next meeting, closing

13.03.2006 09:30

Closed at 12:07