Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Helsinki Gathering 29-30. Aug. 2005

Location

The seminar room at the Dept. of Linguistics, University of Helsinki, Siltavuorenpenger/Brobergsterrassen 20 A, 2nd floor (3. etg./kerros). If you are lost, call Sjur +358-505634319, Trond +358-504676915.

Internet

Wireless Internet access will be available to everyone throughout the week.

Technical meeting 29. Aug

Program

  1. 09.00 - 09.30: Presentations
    1. 20 mins: Who we are, what we have done so far
    2. 10 mins: Our visions for the near future
  2. 09.30 - 10.15: Corpus infrastructure
    1. Directory structure:
      1. 5 mins: The problem and the goal (Tomi)
      2. 15-20 mins: Brainstorming to find a solution, 2 groups
      3. 15-20 mins: merge all suggestions, discuss, decide, write down the decision
  3. 10.15 - 10.30: Coffee break
  4. 10.30 - 11.00: Speller infrastructure
    1. What do we need? Testing of different kinds, testing documentation for ourselves and our testers, building and packaging, user evaluation and feedback infrastructure, installer apps, end-user documentation, download site, etc.
      1. 5 mins: Brainstorming, individually
      2. 20 mins: Discussion, merging suggestions and ideas
    2. 5 mins: Who is responsible for what?
  5. 11.00 - 12.00: Speller internals (Aspell, MySpell, MS Spell, etc.)
    1. Hunspell introduction/overview
    2. sme Aspell improvements (size/affix lexicon discussion)
  6. 12.00 - 12.45: Saami speech synthesis - cooperation with the University of Helsinki (notes)
  7. 13.00 - 14.00: Lunch (we’ll stay at lunch till the department meeting is over)
  8. 14.00 - 15.30: New Lexicon format
    1. 5 mins: Problems with the existing, goals for the new (Sjur)
    2. Lule Saami as Guinea Pig for XML lexicon format?
    3. 20 mins: What info do we need there? Consider features of all speller engines, as well as future needs
      1. Brainstorming, summary session
    4. 20 mins: work out a DTD (or similar)
    5. 20 mins: go over our infrastructure and identify all points affected by a new lexicon format (2 groups)
  9. 15.30 - 15.45: Coffee
  10. 15.45 - 17.00: Double session:
    1. A) Proofing tools public tender - technical specification
      1. Work out all chapters and all the info that should go in there
    2. B) Machine finetuning
      1. Make sure all machines work the way they are intended to (=the same way). List of known problems:
        1. The perl locale bug in 10.4. (bug #xxx)
        2. BCKSP in UTF-8 mode.

Deferred tasks?

Common meeting 30. Aug

Program

  1. Evaluation
    1. Our achievements
    2. Our weak points
  2. Short intro to daily/weekly routines

  3. Lexicon format
    1. Lule Saami as Guinea Pig for XML lexicon format?
  4. Corpus gathering:
    1. finalize our own internal discussion
    2. Contract discussion with HU/Kimmo. Discussion early in the morning or late in the afternoon.
  5. Linguistic issues - Northern Sámi
    1. currency sign - what is it supposed to do?
    2. linguistic brainstorming on blocking problems (3-part compounds, Oslolaš)
    3. vowel shortening in compounds
    4. numerals (get the facts right - and make a generator)
    5. indefinite pronouns
    6. facultative vowel simplification in compounds
  6. Linguistic issues - Lule Sámi
    1. numerals (get the facts right - and make a generator)
    2. Derivation
    3. Compounds
  7. Computer checks:
    1. Maaren’s computer (several tools aren’t working anymore) - but this can’t be done till she is present

Groups?

Tuesday: Lule Saami linguistics slot

Any other business: