Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from two weeks ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Name lexicon infrastructure
  8. Spellers
  9. Other issues
  10. Summary, task lists
  11. Closing

1. Opening, agenda review, participants

Opened at 10:36.

Present: Børre, Saara, Sjur, Thomas, Trond

Absent: Maaren, Tomi

Main secretary: Trond

Agenda accepted as is.

2. Reviewing the task list from the last meeting

Børre

Maaren

Saara

Sjur

Thomas

Tomi

On sick leave.

Trond

3. Documentation

TODO:

4. Corpus gathering

Collecting

See a previous meeting memo for what’s to be done.

TODO: Send out the rest of the letters (Børre)

Odin

Waiting for Sæth to discuss with colleagues how to implement the cooperation, and to get back to us.

TODO:

Olavi Korhonen’s Lule Sámi dictionary.

Korhonen and Oahpadusguovdásj have a shared copyright to the dictionary.

KIO Grafisk and the Iđut books

We need a test file in order to find out whether file conversion from Quark to InDesign works.

TODO:

Bible texts

TODO:

5. Corpus infrastructure

We need more “version control” in the corpus work: at present we cannot tell which version of the XSL script was used on a file, only roughly whether it was used at all. We need to document within the XSL file which versions of all tools were used, including the version of the XSL file (common section/template) itself.
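One possible way to make the XSL version traceable, sketched under the assumption that the XSL files are kept in CVS with keyword expansion enabled and contain a `$Revision$` keyword. The function name `xsl_revision` and the file names in the usage comment are hypothetical, not taken from the corpus repo.

```shell
#!/bin/sh
# Hypothetical helper: print the CVS revision number recorded in a file
# through an expanded "$Revision: x.y $" keyword.
# Assumption: keyword expansion is enabled for the file in CVS.
xsl_revision() {
    # Keep only the number from the first expanded Revision keyword found.
    sed -n 's/.*\$Revision: \([0-9][0-9.]*\) \$.*/\1/p' "$1" | head -n 1
}

# Possible use: note the script version in a conversion log.
# common.xsl and conversion.log are placeholder names.
# echo "converted with common.xsl revision $(xsl_revision common.xsl)" >> conversion.log
```

The same number could instead be written into the converted document's header by the XSL script itself; the shell version only shows that the information is recoverable from the file.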

Transferring the old gt/sme/corp files to the new corpus repo:

TODO:

Further discussion about corpus analysis and computer use:

The new G5 is tremendously faster than cochise, so we want to use it. Cochise will nevertheless remain our main corpus repo.

Secure copying:

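To get cvs ssh working without password prompting (run from your home directory):
# Generate an RSA key pair; just press Enter at every prompt
ssh-keygen -t rsa
# Make the public key readable
chmod 0644 .ssh/id_rsa.pub
# Install the public key on cochise
# (note: this replaces any existing authorized_keys2 file there)
scp .ssh/id_rsa.pub <user>@cochise.uit.no:.ssh/authorized_keys2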
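Because the scp step above replaces any authorized_keys2 file already on the server, a key installed earlier from another machine would be lost. A minimal sketch of an alternative that appends the key only if it is not already present; the function name `add_key` is hypothetical, and the remote one-liner in the closing comment assumes the same user and host as above.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized-keys file
# unless an identical line is already there.
add_key() {
    pubkey_file=$1
    auth_file=$2
    key=$(cat "$pubkey_file") || return 1
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -q: quiet, -x: match whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # Keep the file private to its owner.
    chmod 600 "$auth_file"
}

# Remote equivalent without the duplicate check:
# cat .ssh/id_rsa.pub | ssh <user>@cochise.uit.no 'cat >> .ssh/authorized_keys2'
```

Running `add_key` twice with the same key leaves a single entry, so the step is safe to repeat.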

New tasks:

6. Linguistics

North Sámi

We want to get the decisions from the SGL meeting in December.

TODO:

Lule Sámi

NFR wrote:

[The programme board] asks (...) that an alternative plan for the rest of the
project period be drawn up immediately. This plan must be based on the changed
premise that there is no access to the Lule Sámi dictionary. The programme board
wants a switch to the alternative plan from March 1st if access has not
genuinely been secured by that time.

The criterion for continuing with Lule Sámi that we set up in our plan was the following, somewhat stronger:

the criterion for whether we commit to Lule Sámi is whether we have a beta version
of the Lule Sámi transducer, with integrated lexicon, up and running on March 1st.

Points for our March 1st report:

Goal for March 1st: Reach the 90 percent lexical recall limit.

Conclusion: we have already met the basic requirements for continuing with Lule Sámi.
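For reference, the 90 percent goal can be checked mechanically. The sketch below assumes that lexical recall means the share of running tokens that receive at least one analysis, and that the analyser output has one blank-line-separated cohort per token with unknown words marked by `+?`; both the definition and the output format are assumptions, and the function name `lexical_recall` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lexical recall as a percentage.
# Assumed input: analyser output with one cohort per token, cohorts
# separated by blank lines, unknown tokens marked with "+?".
lexical_recall() {
    awk 'BEGIN { RS = "" }                      # paragraph mode: one record per cohort
         { total++                              # every cohort is one token
           if (index($0, "+?") == 0) known++ }  # no "+?" means it got an analysis
         END { if (total > 0) printf "%.1f\n", 100 * known / total }' "$1"
}

# Possible use (analysed.txt is a placeholder name):
# lexical_recall analysed.txt
```

A token with several analyses still counts once, since all its readings sit in the same cohort.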

TODO:

7. Name lexicon infrastructure

Complex names

TODO:

XML format

TODO:

  1. testing of conversion
  2. eXist as editor:
    1. develop the needed XQueries and interface (Sjur, Tomi)
    2. data synchronisation between risten.no and (Sjur, Tomi)
    3. test whether eXist as editor is actually working well (Thomas, others)

8. Spellers

Nothing while Tomi is on sick leave, and until the new proper name lexicon is in place.

9. Other

Upcoming Strategy money application deadline at Samisk senter

Possible grant proposal themes include:

SGL Seminar

Bug fixing

33 open bugs (and 24 risten.no bugs)

10. Summary, task list

Børre

Maaren

Saara

Sjur

Thomas

Tomi

Trond

11. Next meeting, closing

06.03.2006 09:30

Børre, Saara, Trond will be away.

Closed at 12:00