Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Meeting setup

Agenda

  1. Opening, agenda review, participants
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
    1. Mouse problems
    2. Job positions
    3. Upgrade to 10.4
    4. Physical meeting?
    5. Vacation
  9. Summary, task lists
  10. Closing

Task list since last meeting:

1. Opening, agenda review, participants

Opened at 11.15 (more than 1 hour late due to an oversight by Sjur). Agenda accepted as is.

Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre

Main secretary: Trond

2. Reviewing the task list from the last meeting

3. Documentation - divvun.no

We discussed whether to set up Forrest as a standalone on the web server, or to make Forrest convert the XML. As long as there is not much load on divvun.no, it is perhaps easiest to just put up Forrest. Another possibility is to type “forrest war”, which gives .. that can be saved as … If we install a war file, we can get away with only updating testfiles. If we, on the other hand, install Forrest, we must redo the whole installation on the server whenever we update. Using war is an easier solution, since we can isolate different parts of the whole site. We should run the default Tomcat, and a forrest file manually. Textual updates should be done automatically, as a cron job. The issue should be discussed in more detail.

The task is to set up the server and to integrate it with the cvs repository.
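
As a starting point for that discussion, the automatic update could be sketched roughly as below. This is only a sketch under assumptions: it is written in Python, the function names and the checkout path are hypothetical, and the only commands used are a plain "cvs update" and the “forrest war” call mentioned above. How the resulting war file is handed to Tomcat is left open, since that was not decided.

```python
import subprocess
from pathlib import Path


def build_update_commands():
    """Return the commands for one update run, in the order they should run."""
    return [
        ["cvs", "-q", "update", "-d", "-P"],  # refresh the checkout from cvs
        ["forrest", "war"],                   # rebuild the war file from the sources
    ]


def run_update(checkout_dir, dry_run=True):
    """Run the update commands in checkout_dir, or only print them if dry_run."""
    handled = []
    for command in build_update_commands():
        if dry_run:
            print("would run:", " ".join(command), "in", checkout_dir)
        else:
            # check=True stops the run as soon as one command fails
            subprocess.run(command, cwd=Path(checkout_dir), check=True)
        handled.append(command)
    return handled
```

A cron job could then call run_update(..., dry_run=False) at a fixed interval; the dry run is the default so that the sketch can be tried without touching the server.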

Deadline for when we want to have the site up and running is the end of April.

4. Corpus gathering

Contract prototype! We have two models: Textlaboratoriet in Oslo, and the Helsinki model, presented by Kimmo Koskenniemi and run by CSC in Finland as a three-party solution: the author, the licenser (publisher) and the maintainer (in that case: CSC). The Oslo model is simpler at the outset, but perhaps harder to maintain. The Helsinki model is more complicated at the outset, but perhaps(?) easier to maintain, since the roles are defined in a more principled way. We should have the UiT and Sámediggi lawyers look at this as well.

Børre: Contact Kimmo K & Ruth V F

Deadline: The middle of May(?).

We should contact the writers’ organisations and suggest that they recommend to their members the contract we propose for the use of their texts.

5. Corpus infrastructure

Location of the charset conversion Perl script

Tomi has made a Perl script to convert problematic charsets to UTF-8. Where do we store it? Presently, we have cvs files in gt/, and we plan to keep corpus files outside the cvs. The question is then where to put corpus-related files that are under cvs: in gt/ with the other cvs files, or in the corpus directory with the other corpus files. The discussion will continue in the newsgroups.
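
For readers unfamiliar with the task, the general idea of such a conversion can be sketched as follows. This is not Tomi's Perl script; it is a minimal Python illustration, and the function name and the list of candidate encodings are assumptions. The genuinely problematic charsets in the corpus would most likely need their own mapping tables.

```python
def to_utf8(raw, candidates=("utf-8", "latin-1")):
    """Decode raw bytes with the first candidate encoding that fits.

    Returns the text re-encoded as UTF-8 together with the name of the
    encoding that was used. The candidate list is an assumption.
    """
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
        return text.encode("utf-8"), encoding
    raise ValueError("none of the candidate encodings fits the input")
```

Trying UTF-8 first leaves files that are already correct untouched; latin-1 accepts any byte sequence, so it only makes sense as the last candidate.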

6. Linguistics

Thomas: Nothing more than already reported. Work continues on verb transitivity.

Maaren: working on the missing list. The missing list contains very many misspellings. Spelling errors are very valuable to us, and in general they are hard to come by. The spelling errors in the missing list should thus be gathered, e.g. in this format:

/misspelledform/correctspelledform/frequency_of_misspelledform?/
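
A line in this format could be read with something like the sketch below. It is written in Python under two assumptions: the “?” is taken to mean that the frequency field is optional, and the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Misspelling(NamedTuple):
    misspelled: str
    correct: str
    frequency: Optional[int]  # None when no frequency was recorded


def parse_entry(line):
    """Parse one /misspelled/correct/frequency/ line into a Misspelling."""
    fields = line.strip().strip("/").split("/")
    if len(fields) not in (2, 3) or not all(fields):
        raise ValueError("not a valid misspelling entry: %r" % line)
    frequency = int(fields[2]) if len(fields) == 3 else None
    return Misspelling(fields[0], fields[1], frequency)
```

For example, parse_entry("/wrongform/rightform/12/") yields the two forms and the frequency 12, while an entry without a frequency yields None in the last field.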

There is a thread for this on the newsgroup, “Misspellings in the corpus material”.

One frequent error type is a/á confusion. We could perhaps run a special a > á precheck before the usual “edit distance 1” check.
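
The order of the two checks could look roughly like this. It is a sketch in Python, not a description of the actual speller: the function names are hypothetical, the lexicon is a plain set of word forms, only lower-case a and á are handled, and the precheck swaps in both directions, which is an assumption.

```python
from itertools import product


def a_variants(word):
    """All forms that differ from word only in a/á choices."""
    options = [("a", "á") if ch in ("a", "á") else (ch,) for ch in word]
    return {"".join(choice) for choice in product(*options)} - {word}


def edits1(word, alphabet):
    """All forms at edit distance 1 (delete, transpose, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + ch + right[1:]
                for left, right in splits if right for ch in alphabet}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    return (deletes | transposes | replaces | inserts) - {word}


def suggest(word, lexicon, alphabet):
    """Suggest corrections, trying the a/á precheck before edit distance 1."""
    if word in lexicon:
        return [word]
    precheck_hits = sorted(a_variants(word) & lexicon)
    if precheck_hits:
        return precheck_hits  # a/á fixes win over other single edits
    return sorted(edits1(word, alphabet) & lexicon)
```

With a toy lexicon of made-up strings such as {"sálá", "sale"}, the input "sala" is resolved by the precheck to "sálá", even though "sale" is also one edit away; the precheck also catches forms with several a/á errors, which lie beyond edit distance 1.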

Compilation error

See bug #69. The parser did not compile on the 17th of April. Many of the errors are of general interest, and the cvs comment will be posted to the newsgroup.

7. Term db

Deadline is end of April for the internal beta. Work is in progress, but not done.

8. Other issues

Job positions.

Mouse problems

Mainly Sjur has problems; Tomi sometimes does too.

Upgrade to 10.4

Leif Åge has ordered Tiger, and he will distribute the installation packages from Guovdageaidnu. Tiger will be released on the 29th of April.

Things that we have modified may break when we move to 10.4. We should be aware of those issues. The information from Apple is incomplete.

Physical meeting?

Week 21 (23.5.->), place to be decided at the next meeting

Vacation

To be discussed and decided at the next meeting.

9. Summary, task lists

TODO:

10. Next meeting, closing

25.04.2005 09.30

Closed at 13.00