Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

Cf. one of the following, depending on context:

Opening, agenda review, participants

Opened at 09:58.

Present: Børre, Lene, Maaren, Per-Eric, Sjur, Thomas, Tomi, Trond

Absent: none

Agenda accepted as is.

Updated task status since last meeting

Børre

Lene

Maaren

Per-Eric

Saara

Sjur

Thomas

Tomi

Trond

Pedagogical software online

The user documentation is almost online (the technical documentation is part of our TechDoc hierarchy). What is missing is a working URL.

Børre has been working on setting up a Forrest-based site for the user front end, with adjusted CSS styling, another layout, etc. This did not quite succeed.

The next step is to establish a URL, say http://oahpa.no/ or http://oahpa.uit.no/, directing to that site.

It should be online on Feb. xx; the slick URL and professional layout should be ready by YY.

TODO:

Workshop in Tromsø, end of February

Conference in Tromsø in week 9, February 28-29, on Sámi documentation and revitalisation. The two teams should present our work and our view on the future. There is a start for it at plan/art/samdoc08/samdoc08.tex and plan/art/samdoc08/samdoc08-sem.tex.

One of the goals for the conference is to make proposals for grant support.

First draft ready in Snåsa.

TODO:

Documentation

TODO:

Corpus gathering

TODO:

Infrastructure

TODO:

Linguistics

North Sámi

Hyphenation bugs still there, now properly documented by the improved test bench.

Lule Sámi

Hyphenation: same as for sme.

TODO:

South Sámi

TODO:

Name lexicon infrastructure

The upcoming dictionaries are:

The Kven work should be visible. The smj should be reported this week; the smenob is part of the ped work and is interesting for the general audience.

Status quo: The dictionaries are shown online (http://www.divvun.no:8889/index.html?locale=no), but do not give translations. It requires the locale request parameter to work properly for most users.

Short-term goal: Have them work in the risten GUI. Long-term goal: Give them a better GUI and integrate them into the ped platform and other places.

Decisions made in Tromsø can be found in the meeting memo at /admin/physical_meetings/tromso-2006-08-propnoun.html.

TODO:

  1. fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (Sjur)
  2. fix display in column 3 (Sjur)
  3. fix bugs in lexc2xml; add comments to the log element (Saara)
  4. finish first version of the editing (Sjur)
  5. test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  6. generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well); the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt (Sjur, Saara)
  7. convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
  8. implement data synchronisation between risten.no and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no, since it might be faster and better suited than the official one; local installations could be treated the same way)
  9. start to use the xml file as source file
  10. clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  11. merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
  12. publish the name lexicon on risten.no (Sjur)
  13. add missing parallel names for placenames (linguists)
  14. add informative links between first names like Niillas and Nils (linguists)

Proofing tools

Hunspell

The %> marker does not survive into Hunspell to work as a boundary marker, despite being defined as %> for the Hunspell version.

Priority list:

  1. debug the missing > marker
  2. add smj to the soup, make sure it works roughly as well as sme
  3. fix the remaining conversion bugs for sme
  4. return to smj, and fix whatever is left to fix
  5. integrate the derivations as separate “continuation lexicons”

TODO:

Testing

Spelling Error Markup

TODO:

Automated testing

TODO:

Lexicon conversion to the PLX format

Open issues based on test results:

sme

Version: Davvisámi, version 1.0.1, 2008-01-28

smj

Version: Julevsáme, version 1.0.1, 2008-01-28

TODO:

InDesign tools

The speller works in InDesign and InCopy. It lacks user-defined lexicons, but Polderland is trying to fix this bug.

The new Sámi newspaper, Ávvir, is publishing its first edition on February 6th. We should release final InDesign tools, including final spellers, one week ahead of that, Wednesday Jan. 30th.

TODO:

Hyphenators

We need more test data to test hyphenation of different types of words. Thomas should make a file gt/sme|smj/testing/hyphenation.txt in the following format:

compoundword	com^pound#word

It should contain all possible word formation patterns and their correct hyphenation. That is, at least:
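For the test file format described above, a minimal sketch of how such a file could be read and used is given here. It is not part of the existing test bench. It assumes, based only on the single example line, that the two columns are tab-separated, that ^ marks a hyphenation point, and that # marks a compound boundary. The function names and the hyphenate callback are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: '^' is a hyphenation point and '#' a compound boundary,
# as suggested by the example line "compoundword<TAB>com^pound#word".
MARKERS = "^#"


def parse_line(line: str) -> Tuple[str, str]:
    """Split one tab-separated test line into (word, expected hyphenation)."""
    word, expected = line.rstrip("\n").split("\t")
    return word, expected


def strip_markers(hyphenated: str) -> str:
    """Remove the hyphenation markers to recover the plain word."""
    return "".join(ch for ch in hyphenated if ch not in MARKERS)


def inconsistent_entries(lines: Iterable[str]) -> List[str]:
    """Return words whose expected form does not spell the same word."""
    bad = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, expected = parse_line(line)
        if strip_markers(expected) != word:
            bad.append(word)
    return bad


def failures(
    lines: Iterable[str], hyphenate: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every word hyphenated wrongly.

    `hyphenate` is a hypothetical stand-in for the real hyphenator.
    """
    result = []
    for line in lines:
        if not line.strip():
            continue
        word, expected = parse_line(line)
        actual = hyphenate(word)
        if actual != expected:
            result.append((word, expected, actual))
    return result
```

The first check catches typos in the test file itself, where the expected hyphenation does not spell the same word as the first column. The second compares the hyphenator output with the expected form.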

TODO:

Windows installer

TODO:

Releases

TODO:

Other

South Sámi project startup meeting

We extend the meeting on our side to hold this project’s first gathering.

TODO:

Corpus contracts + open source

TODO:

Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, so that for each bug we know exactly when it should have been fixed, in which file(s), and in which version.

83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs, 38 are other bugs), and 23 risten.no bugs

Next meeting, closing

The next meeting is 4.2.2008, 09:30 Norwegian time.

The meeting was closed at 11:28.

Appendix - task lists for the next five days

Børre

Lene

Maaren

Per-Eric

Saara

Sjur

Thomas

Tomi

Trond