Helsinki Gathering 29-30. Aug. 2005
Location
The seminar room at the Dept. of Linguistics, University of Helsinki,
Siltavuorenpenger/Brobergsterrassen 20 A, 2nd floor (3. etg./kerros).
If you are lost, call Sjur +358-505634319, Trond +358-504676915.
Internet
Everyone will have (wireless) Internet access throughout the week.
Technical meeting 29. Aug
- Participants: Børre, Saara, Sjur, Tomi, Trond
- Time: 9-17, lunch break approx. 12.30-14
Program
- 09.00 - 09.30: Presentations
- 20 mins: Who we are, what we have done so far
- 10 mins: Our visions for the near future
- 09.30 - 10.15: Corpus infrastructure
- Directory structure:
- 5 mins: The problem and the goal (Tomi)
- 15-20 mins: Brainstorming to find a solution, 2 groups
- 15-20 mins: Merge all suggestions, discuss, decide, write down the decision
- 10.15 - 10.30: Coffee break
- 10.30 - 11.00: Speller infra
- What do we need? Testing of different kinds, testing documentation for ourselves
and our testers, building and packaging, user evaluation and feedback infrastructure,
installer apps, end-user documentation, a download site, etc.
- 5 mins: Brainstorming, individually
- 20 mins: Discussion, merging suggestions and ideas
- 5 mins: Who is responsible for what?
- 11.00 - 12.00: Speller internals (Aspell, MySpell, MS Spell, etc.)
- Hunspell introduction/overview
- sme Aspell improvements (size/affix lexicon discussion)
- 12.00 - 12.45: Saami speech synthesis - cooperation with the University of Helsinki
Notes
- 13.00 - 14.00: Lunch (we’ll stay at lunch till the department meeting is over)
- 14.00 - 15.30: New Lexicon format
- 5 mins: Problems with the existing, goals for the new (Sjur)
- Lule Saami as guinea pig for the XML lexicon format?
- 20 mins: What information do we need there? Consider the features of all speller
engines, as well as future needs
- Brainstorming, summary session
- 20 mins: Work out a DTD (or similar)
- 20 mins: Go over our infrastructure and identify all points affected by
a new lexicon format (2 groups)
- 15.30 - 15.45: Coffee
- 15.45 - 17.00: Double session:
- A) Proofing tools public tender - technical specification
- Work out all chapters and all the info that should go in there
- B) Machine fine-tuning
- Make sure all machines work the way they are intended to (=the same way).
List of known problems:
- The Perl locale bug in 10.4 (bug #xxx)
- Backspace (BCKSP) in UTF-8 mode
Deferred tasks?
- Corpus format
- Interface for linguistic corpora
- Monolingual corpora
- Parallel corpora
- Localisation
- Bugzilla UTF-8 bug (when creating a new bug report)
Common meeting 30. Aug
- Participants: Børre, Maaren, Saara, Sjur, Thomas, Tomi, Trond
- Time: 9-17, lunch break at regular time
Program
- Evaluation
- Our achievements
- Our weak points
- Short intro to daily/weekly routines
- Lexicon format
- Lule Saami as guinea pig for the XML lexicon format?
- Corpus gathering:
- Finalize our own internal discussion
- Contract discussion with HU/Kimmo, either early in the morning
or late in the afternoon
- Linguistic issues - Northern Sámi
- currency sign - what is it supposed to do?
- linguistic brainstorming on blocking problems (3-part compounds, Oslolaš)
- vowel shortening in compounds
- numerals (get the facts right - and make a generator)
- indefinite pronouns
- facultative vowel simplification in compounds
- Linguistic issues - Lule Sámi
- numerals (get the facts right - and make a generator)
- Derivation
- Compounds
- Computer checks:
- Maaren’s computer (several tools no longer work); this has to wait
until she is present
Groups?
Tuesday: Lule Saami linguistics slot
- Lule: Thomas, Trond, Sjur
- Support: Børre, Maaren looking at practical work
- Tech: Saara, Tomi discussing programming issues
Eventually:
- Linggroup: Maaren, Thomas, Trond
- Proggroup: Børre, Saara, Sjur, Tomi