Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Speller infrastructure
  8. Other issues
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 10:20.

Present: Tomi, Sjur, Børre, Maaren, Thomas, Trond

Main secretary: Sjur

2. Reviewing the task list from the last meeting

Børre

Sjur

Tomi

On vacation last week

Trond

Maaren

Thomas

3. Documentation - divvun.no

We are waiting for the Linux server (will solve our UTF-8 and automatic update issues).

The giellatekno site has been converted to the Forrest format and is in principle ready for publication. Trond has to read through it and do some QA/editing before it is released.

The standing invitation remains: Document what you do, when you do it.

4. Corpus gathering

The Oslo contract has been added. We now need to draft our own contract proposal. Børre, Trond and Sjur will meet on Tuesday 16.8. at 10.00 Norwegian time to work on that draft. Sjur will send out an e-mail with details before the meeting.

5. Corpus infrastructure

Do documentation.

Naming conventions and directory structure

Principle: incoming directory tree

  1. according to collector (having an incoming basket)
  2. according to..

Permanent directory tree: according to author institution, or perhaps according to text genre and thereafter author institution.

The discussion is moved to the discussion list, and possibly to a separate meeting. We will start (or rather continue) in the newsgroup.
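The two directory principles above can be made concrete with a small sketch. Nothing here was decided at the meeting: the root names `incoming` and `permanent`, the function names, and the example collector, genre and institution values are all hypothetical, and the genre-then-institution ordering is only one of the alternatives mentioned.

```python
from pathlib import PurePosixPath

# Hypothetical root directories; the real names are still under discussion.
INCOMING_ROOT = PurePosixPath("incoming")
PERMANENT_ROOT = PurePosixPath("permanent")


def incoming_path(collector: str, filename: str) -> PurePosixPath:
    """Incoming tree: each collector has their own incoming basket."""
    return INCOMING_ROOT / collector / filename


def permanent_path(genre: str, institution: str, filename: str) -> PurePosixPath:
    """Permanent tree, assuming the 'genre, then author institution' variant."""
    return PERMANENT_ROOT / genre / institution / filename


# Example with made-up values.
print(incoming_path("collector_a", "report.doc"))
print(permanent_path("admin", "institution_x", "report.doc"))
```

Keeping the two trees behind separate functions means the permanent layout can be changed (for example to institution only) without touching how collectors deliver files.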

6. Linguistics

North Sámi

Lule Sámi

When do we get the lexicon from Anders Kintel?

7. Speller infrastructure

Aspell

Write documentation here as well.

It is working, see above. It needs further optimisation before it is useful, cf. the size discussion. Size target: at least below 10 MB; wish: 6 MB or less. It started at 580 MB and is now around 380 MB, so it is shrinking.

Tomi is working on the affix file, thereby reducing the size of the speller file.

The affix file will, so to speak, recreate the inflection of our LEXC lexicon in the Aspell format (similar to MySpell?). That format could be called one-level: neither Aspell nor any of the other *spell engines has a concept of two levels with regular mappings between them.
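To illustrate why an affix file shrinks the speller, here is a minimal sketch of one-level affix expansion in the Ispell/MySpell style: each lexicon entry is a stem plus flags, and each flag names a set of suffix rules (strip, add, condition). The rule set, the flags `V` and `E`, and the English toy words are assumptions made for illustration only; this is not the actual Aspell file syntax, nor our North Sámi rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    strip: str      # characters removed from the end of the stem ("" for none)
    add: str        # characters appended after stripping
    condition: str  # regex that the stem must match for the rule to apply


# Hypothetical rule sets, keyed by a one-letter flag.
RULES = {
    "V": [
        SuffixRule("", "s", r"[^sy]$"),
        SuffixRule("", "ed", r"[^ey]$"),
        SuffixRule("", "ing", r"[^e]$"),
    ],
    "E": [
        SuffixRule("e", "ing", r"e$"),
    ],
}


def expand(entry: str, rules: dict) -> list:
    """Expand a 'stem/FLAGS' entry into the stem plus all licensed forms."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:
        for rule in rules.get(flag, []):
            if not re.search(rule.condition, stem):
                continue
            if not stem.endswith(rule.strip):
                continue
            base = stem[: len(stem) - len(rule.strip)]
            forms.append(base + rule.add)
    return forms


# Three compressed entries stand in for a longer full-form word list.
lexicon = ["walk/V", "bake/E", "the"]
full_forms = [form for entry in lexicon for form in expand(entry, RULES)]
print(len(lexicon), "entries expand to", len(full_forms), "word forms")
```

Every rule maps a surface stem directly to a surface form, which is the sense in which the format is one-level: there is no intermediate lexical level as in a two-level description.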

Useful links:

MS Office spellers

To be specified in the outsourcing tech-spec. What about direct integration with MS Office? See this page from MS.

Discussion left for the future, and when we have the offers to compare with.

OpenOffice.org

Questions:

1) Is it possible to use Aspell with OOo?

No. From the site:

Background
Lingucomponent was started by Kevin Hendricks to integrate various open source spell
checkers into the OpenOffice.org build. With a little prodding from Kevin Atkinson,
the author of Pspell, a new spell checker called MySpell was written in C++ that supported
affix compression, based on Ispell. This code has been given to Pspell and will eventually
make it into a future Pspell release after it has been integrated.

Later note: Pspell is dead and Aspell is its replacement. I think Aspell is found at
aspell.sourceforge.net. There is no OOo interface to Aspell (no way to use Aspell with OOo).
kh.

MySpell, which builds on almost any system, is used to support spell checking in
OpenOffice.org.

Links to possibly useful stuff:

2) Are the Aspell source files compatible with MySpell?

To be investigated. Goal: to simplify maintenance as much as possible, i.e. to have as few source and target formats as possible.

The Debian Finnish aspell page seems to point to source files that are common to Ispell, Aspell and MySpell. Follow the link near the bottom of the page named developer information for aspell-fi.

Hunspell (http://hunspell.sourceforge.net/) is supposed to become the new OpenOffice.org spellchecker.

8. Other

Helsinki gathering

Program draft available at /admin/physical_meetings/helsinki-2005-08.html

To bring from Kautokeino:

Are the hotel rooms reserved? Probably not. Maaren will reserve rooms (Arthur, http://www.hotelarthur.fi/english/, or Helka).

Flights should be booked individually, with the bill sent to Sámediggi.

Børre in Kautokeino

First week there about a week after the Helsinki gathering. Then every third week by default.

Technical issues

9. Summary, task list

Børre

Maaren

Sjur

Thomas

Tomi

Trond

10. Next meeting, closing

22.08.2005 10:00

Closed at 12:55