Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Meeting setup

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Speller infrastructure
  7. Other issues
  8. Summary, task lists
  9. Closing

1. Opening, agenda review, participants

Opened at 10:08.

Present: Tomi, Sjur, Børre

Absent: Maaren, Thomas, Trond

Main secretary: Børre

2. Reviewing the task list from the last meeting

Tomi

On vacation last week

Trond

Maaren

Thomas

Sjur

Børre

3. Documentation - divvun.no

Forrest WAR file crashes Tomcat

We have the same problem with risten.no. This blocks the addition of new functionality (like RSS feeds and jspwiki support), but won’t stop us from updating the content.

Steps to fix this bug (add):

  1. Take a backup of the present war file (it can be found within the Tomcat webapps/ dir)
  2. Make sure TWICE that you have that backup in a safe place
  3. Test with both Tomcat 5.0.29 and 5.5.9, to see whether the Tomcat version (or the combination of Tomcat and Forrest) makes any difference
  4. Try to identify when the bug was introduced (somewhere around the middle of June)
  5. Report the findings to the Forrest user list
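
Steps 1 and 2 above could be scripted roughly as follows. This is only a sketch: the function names `sha256_of` and `backup_war` are made up for illustration, and the actual location of the war file and of the backup directory must be filled in by whoever runs it. The copy is checked twice, first by size and then by checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_war(war_file: Path, backup_dir: Path) -> Path:
    """Copy war_file into backup_dir and verify the copy twice.

    Hypothetical helper: both paths are supplied by the caller.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / war_file.name
    shutil.copy2(war_file, target)

    # First check: the copy has the same size as the original.
    if target.stat().st_size != war_file.stat().st_size:
        raise OSError(f"size mismatch for {target}")

    # Second check: the copy has the same checksum as the original.
    if sha256_of(target) != sha256_of(war_file):
        raise OSError(f"checksum mismatch for {target}")

    return target
```

Calling `backup_war(Path(...), Path(...))` with the war file from the Tomcat webapps/ dir and a directory outside the Tomcat tree returns the path of the verified copy, or raises an error if either check fails.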

Automated update script

Set up the cron job on cochise. Permissions are now set correctly, so divvun.no can be updated manually.

4. Corpus gathering

The correct directory for the legal documents concerning the corpus is adm/legal/. The Oslo example should be added there as well. Børre will start on that today.

Then we’ll have to write our own contract, adapted to our own needs, and get it rephrased and checked by a lawyer.

THEN we can start to gather texts:-)

5. Corpus infrastructure

Write the documentation.

6. Speller infrastructure

aSpell

Is working, see above. Needs further optimisation before it is useful, cf. the size discussion. Size target: below 10 MB at the very least; wish: 6 MB or less.

Write documentation here as well.
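
The size target above could be checked automatically once the speller files are built. The following is a sketch only: the names `directory_size` and `size_verdict` are hypothetical, the directory holding the built files is left to the caller, and it assumes that the sizes in these minutes are megabytes of 1024 * 1024 bytes.

```python
from pathlib import Path

MB = 1024 * 1024          # assumption: sizes in the minutes are megabytes
HARD_LIMIT = 10 * MB      # "at least below 10"
WISH_LIMIT = 6 * MB       # "wish: 6 or less"


def directory_size(path: Path) -> int:
    """Sum the sizes in bytes of all regular files below path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def size_verdict(size_bytes: int) -> str:
    """Classify a size against the wish and the hard limit."""
    if size_bytes <= WISH_LIMIT:
        return "meets wish"
    if size_bytes < HARD_LIMIT:
        return "meets target"
    return "too large"
```

Running `size_verdict(directory_size(Path(...)))` on the directory with the built speller files gives a one-line status that could go into the weekly task review.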

MS Office spellers

To be specified in the outsourcing tech-spec. What about direct integration with MS Office? See this page from MS.

OpenOffice.org

Questions:

1) Is it possible to use Aspell with OOo?

No. From the site:

Background
Lingucomponent was started by Kevin Hendricks to integrate various open source spell
checkers into the OpenOffice.org build. With a little prodding from Kevin Atkinson,
the author of Pspell, a new spell checker called MySpell was written in C++ that supported
affix compression, based on Ispell. This code has been given to Pspell and will eventually
make it into a future Pspell release after it has been integrated.

Later note: Pspell is dead and Aspell is its replacement. I think Aspell is found at
aspell.sourceforge.net. There is no OOo interface to Aspell (no way to use Aspell with OOo).
kh.

MySpell, which builds on almost any system, is used to support spell checking in
OpenOffice.org.

Links to possibly useful stuff:

2) Are the Aspell source files compatible with MySpell?

To be investigated. Goal: to simplify maintenance as much as possible, i.e. to have as few source and target formats as possible.

7. Other

Headphones

The project will buy headphones, to help with online project meetings. Small and light!

8. Summary, task list

Børre

Sjur

Tomi

On vacation

Tasks are transferred to the following weeks until they return from vacation, to keep the tasks alive and updated. New tasks may be added:-)

Maaren

Thomas

Trond

9. Next meeting, closing

15.08.2005 10:00

Closed at 12:08