Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting setup

Agenda

Cf. one of the following, depending on context:

1. Opening, agenda review, participants

Opened at 09:57.

Present: Børre, Maaren, Per-Eric, Sjur, Thomas, Tomi, Trond

Absent: Risten

Agenda accepted as is.

2. Updated task status since last meeting

Børre

Maaren

Per-Eric

Saara

Sjur

Thomas

Tomi

Trond

3. Documentation

Our documentation needs a thorough clean-up and reorganisation to make it clearer and easier to use.

TODO:

4. Corpus gathering

Nothing happened last week.

We need to start gathering sma texts right away. Some sources of sma texts:

TODO:

5. Infrastructure

Forrest needs better handling of i18n, to help us get a more stable site. We should probably look at a visual make-over of our site at the same time.

We now also have the time to start to explore the new collaboration features of the Leopard server: shared project calendars, possibly a wiki, permanent chat rooms for dedicated topics.

TODO:

6. Linguistics

North Sámi

Hyphenation bugs are still there; we need an improved test bench.

Lule Sámi

Hyphenation: same as for sme.

TODO:

South Sámi

Trond and Sjur need to have a thorough look at the sma sources, and evaluate the present approach to its morphophonology.

TODO:

7. Name lexicon infrastructure

Decisions made in Tromsø can be found in [this meeting memo|/admin/physical_meetings/tromso-2006-08-propnoun.html].

TODO:

  1. fix bugs in lexc2xml; add comments to the log element (Saara)
  2. finish first version of the editing (Sjur)
  3. test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  4. generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well); the morphological section should be kept intact, e.g. in propernoun-sme-morph.txt (Sjur, Saara)
  5. convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
  6. implement data synchronisation between risten.no and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no, since it might be faster and better suited than the official one; local installations could be treated the same way)
  7. start to use the xml file as source file
  8. clean terms-sme.xml so that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  9. merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
  10. publish the name lexicon on risten.no (Sjur)
  11. add missing parallel names for placenames (linguists)
  12. add informative links between first names like Niillas and Nils (linguists)
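As a rough illustration of steps 4 and 5 (deriving the lexicon text files from a common XML source), the following sketch reads name entries from an XML string and emits one lexc-style line per name. The XML layout (`terms`, `entry`, `name`, `@lang`), the sample language tags and the continuation class `PROP-PLACEHOLDER` are assumptions made for the example; the actual schema of the terms files is not specified in these minutes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names, attributes and language tags
# are illustrative only, not the real terms-*.xml schema.
SAMPLE = """
<terms>
  <entry id="Helsinki" type="primary">
    <name lang="sme">Helsset</name>
    <name lang="nob">Helsingfors</name>
  </entry>
  <entry id="Niillas" type="primary">
    <name lang="sme">Niillas</name>
  </entry>
</terms>
"""


def derive_lex_lines(xml_text, lang, continuation="PROP-PLACEHOLDER"):
    """Return sorted lexc-style lines ("stem CONTINUATION ;") for one language.

    The continuation class name is a placeholder, not one used in the
    real propernoun lexicon files.
    """
    root = ET.fromstring(xml_text.strip())
    lines = []
    for entry in root.iter("entry"):
        for name in entry.findall("name"):
            # Keep only the names tagged with the requested language.
            if name.get("lang") == lang and name.text:
                lines.append(f"{name.text.strip()} {continuation} ;")
    return sorted(lines)


if __name__ == "__main__":
    for line in derive_lex_lines(SAMPLE, "sme"):
        print(line)
```

Keeping the XML as the single source and regenerating the text files from it means that a merged entry (step 9) only has to be corrected in one place.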

8. Proofing tools

Hunspell

Proper nouns are not yet working, and they do not contain anything to clearly identify the end of the stem. This makes it harder to generate proper Hunspell entries.

It should be quite easy to introduce suffix/affix border markers in proper nouns.
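To show why a border marker would help, here is a minimal sketch that splits a marked form at the marker and builds a Hunspell `.dic`-style entry (`stem/FLAG`). The marker character `^`, the flag `A` and the input string are assumptions for the example, not conventions already used in our lexicons.

```python
BORDER = "^"  # hypothetical stem/suffix border marker


def to_hunspell_entry(marked_form, flag):
    """Split a marked form at the border and return (dic_entry, suffix).

    dic_entry follows the Hunspell .dic convention "word/flags";
    the flag itself is a placeholder chosen by the caller.
    """
    stem, sep, suffix = marked_form.partition(BORDER)
    if not sep:
        # Without a marker the end of the stem cannot be identified.
        raise ValueError(f"no border marker in {marked_form!r}")
    return f"{stem}/{flag}", suffix


if __name__ == "__main__":
    print(to_hunspell_entry("stem^suffix", "A"))
```

Forms without a marker are rejected rather than guessed at, which mirrors the current problem: without a marked border there is no reliable way to find the stem.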

TODO:

Testing

Spelling Error Markup

This will wait till after the release.

TODO:

Automated testing

Paradigm testing is now fixed, and is working.

BUT: paradigms are not generated for smj verbs, in gt/smj/testing. Nouns, adjectives, propernouns, numbers all work fine.

TODO:

Lexicon conversion to the PLX format

Open issues based on test results:

sme

smj

20080114 version is missing proper nouns completely due to ssh error while transferring

- Áilegas ?? weren’t those cases from sme -> smj - ok

Ok, but the 575 bug was present also in the previous version

Yes, but I want to discuss that data, it is disturbing to have test cases that constantly fail (or should fail). They test whether we get double hyphens in the suggestions. We should replace the test cases with the double hyphen strings we used to get.
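A check for double hyphens in speller suggestions could be sketched roughly as follows. The function name and the sample suggestion lists are made up for the example; how suggestions are actually collected from the speller is not covered here.

```python
def double_hyphen_suggestions(suggestions):
    """Return the suggestions that contain two consecutive hyphens."""
    return [s for s in suggestions if "--" in s]


if __name__ == "__main__":
    # Made-up suggestion list, only to show the shape of the check.
    bad = double_hyphen_suggestions(["a-b", "a--b", "ab"])
    print(bad)
```

A test case would then pass when this list is empty for the suggestions returned for a given misspelling, and the double hyphen strings we used to get can serve as regression input.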

TODO:

InDesign tools

The speller was delivered last week, but without documentation. Sjur could not get it to work. Documentation should be coming today.

We also need to communicate better (or at all, that is) with Davvi Girji and Min Áigi.

TODO:

Hyphenators

Still issues to investigate.

Windows installer

New Windows installer NSIS

Benefits of using this:

Drawbacks:

TODO:

Releases

We need to bring the Changes document up to date, and prepare for a 1.0.2 release later this month.

TODO:

9. Other

South Sámi project startup meeting

We extend the meeting on our part, to have this project’s first gathering.

TODO:

Corpus contracts + open source

TODO:

Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, so that for each bug we know exactly when it should have been fixed, in which file(s) and in which version.

83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs, 38 are other bugs), and 23 risten.no bugs

10. Next meeting, closing

The next meeting is 21.1.2008, 09:30 Norwegian time.

The meeting was closed at 10:53.

Appendix - task lists for the next week

Børre

Maaren

Per-Eric

Saara

Sjur

Thomas

Tomi

Trond