Meeting setup
- Date: 21.02.2006
- Time: 09:30 Norwegian time
- Place: Wherever we are :-)
- Tools: iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from two weeks ago
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:44.
Present: Børre, Maaren, Sjur, Thomas, Tomi
Absent: Maaren, Saara, Trond
Main secretary: Børre
Agenda shortened substantially due to time constraints.
2. Reviewing the task list from the last meeting
Børre
- send out contracts with accompanying letter
- Gather public texts, preferably also parallel ones
- Gathered some from samediggi, will add them soonish
- Continue converting text from input format to our xml
- convert nob and nno bible texts to be used as part of a parallel corpus
- review the paratext2xml converter
- convert smj NT to paratext
- not done, we don’t have the nno and nob NT paratexts
- Call Ove Sæth and Olavi Korhonen
- Correct Forrest integration for new projects and project ideas
- Move complex name lexicon issue to bugzilla
- fix bugs!
Maaren
- work with risten.no
- Not done. Worked with the top-ten missing list
- discuss with relevant people regarding seminar on proofing tools, normativity
and SGL in February/March, including place.
- waiting for an answer from SGL when and where is best to hold normativity seminar.
Saara
- continue discussion on the new lexicon format
- Move the issue “Refine language detection for Finnish” to Bugzilla
- Move the issue “Finnish the review of the hyphenation detection” to Bugzilla
- Add version information of the tools to part of the corpus infra.
- Fix the preprocess script and optimize it.
- finalize an improved working version of the CGI and command line scripts for
corpus additions
- update conversion from lexc to xml (proper names) with the latest refinements
- Try to add numeral treatment as part of the analyzer.
- Look at crontab ga/ directory issue with Trond.
- not done. I will move the gt2ga.sh script to darwin this week,
because it is way too slow on cochise. The more extensive test
routines are postponed for now.
- fix bugs!
- Changed the status of some corpus infra bugs to fixed.
Sjur
- Follow up the lawyer treatment of the contracts
- Lule Sámi twol problems, with Thomas and Trond
- project planning with Trond, continued
- Follow up on place names from Norge Digitalt
- Evaluate SFST as speller (and analyzer) lexicon
- write a background document on the corpus contracts
- public tender:
- review draft tender document from Finnut
- almost ready, to be published next Monday or Tuesday
- smj G3 issue with Thomas and Trond
- sme G3 issue with Thomas and Trond
- call EDD/Christian Emil Ore about national place name lexicon
- risten.no/proper noun lexicon development: fix bugs, continue development
- fix bugs!
Thomas
- work on North Sámi compounding and derivation
- nothing, due to sick leave
- review corpus usage documentation
- nothing, due to sick leave
- smj G3 issue with Sjur and Trond
- nothing, due to sick leave
- sme G3 issue with Sjur and Trond
- nothing, due to sick leave
Tomi
- move aspell UTF-8 suffix bug to Bugzilla
- corpus infrastructure:
- dtd location (both public and internal)
- Document aspell infrastructure: finish doc/proof/spelling/X-spell/aspell.xml
(it’s almost done, but there are a couple of loose ends)
- new proper name lexicon
- discuss the new lexicon format and other issues in the newsgroup
- Look into data synchronisation of proper nouns between risten.no and CVS
- new version of xml2lexc (based on ccat), should handle complex names correctly:
construct entries like we have now from the different parts of a complex name
entry
- fix bugs!
- Other
Trond
- Work on corpus texts with Børre.
- Contact the Finnish and Swedish Bible societies to get Bible texts.
- Look at ga/ directory issue with Saara.
- News group discussion followup.
- Do a bug report (if not done) on command-line (mis)behaviour in the Xerox tools
- Ask IT guys for an e-mail address for corpus upload script:
corpus@giellatekno.uit.no (forwarded to Børre)
- fix bugs!
3. Linguistics
North Sámi
Maaren has been adding words from the top-ten missing list.
Lule Sámi
We need to get a working version of the Lule Sámi lexicon by Monday next week.
The lexicon doesn’t need to be “complete” in any way, but should approach 6-7000
lemmas.
Børre, Thomas and Trond will convert the material to lexc, in
cooperation with Sjur.
4. Other
SGL Seminar
SGL has now been elected, with the following members:
- Rolf Olsen (Else Turi)
- Tor Magne Berg (Marit Breie Henriksen)
- Elle Marja Vars (-)
- Lena Kappfjell (Albert Jåma)
- Heidi Andersen (-)
SGL/normativity seminar:
- all members = potentially/likely all languages
- not all languages, only North Sámi
- date? As early as possible, end of February/beginning of March
- place? Maaren will investigate
- I am waiting for the answer from Laila. Place? Do you have ideas?
Karasjok, Helsinki, Kautokeino? Date: end of March?
Leaves and vacations
- Maaren will be away for three weeks from the end of March.
- Sjur is on winter holiday this week, as is Maaren from Wednesday.
- Børre will be in Karesuando in two weeks, but will be working (some, at
least:-)
5. Summary, task list
Børre
- send out contracts with accompanying letter
- Gather public texts, preferably also parallel ones
- Continue converting text from input format to our xml
- convert nob and nno bible texts to be used as part of a parallel corpus
- review the paratext2xml converter
- convert smj NT to paratext
- Call Ove Sæth and Olavi Korhonen
- Correct Forrest integration for new projects and project ideas
- Move complex name lexicon issue to bugzilla
- fix bugs!
Maaren
- work with top-ten missing list
- discuss with relevant people regarding seminar on proofing tools, normativity
and SGL in March/April, including place.
Saara
- continue discussion on the new lexicon format
- Move the issue “Refine language detection for Finnish” to Bugzilla
- Move the issue “Finnish the review of the hyphenation detection” to Bugzilla
- Add version information of the tools to part of the corpus infra.
- Fix the preprocess script and optimize it.
- finalize an improved working version of the CGI and command line scripts for
corpus additions
- update conversion from lexc to xml (proper names) with the latest refinements
- Try to add numeral treatment as part of the analyzer.
- Look at crontab ga/ directory issue with Trond.
- Create a parallel corpus of the New Testaments.
- Routine for adding new languages to the proper noun XML structure.
- fix bugs!
Sjur
- Follow up the lawyer treatment of the contracts
- Lule Sámi twol problems, with Thomas and Trond
- project planning with Trond, continued
- Follow up on place names from Norge Digitalt
- Evaluate SFST as speller (and analyzer) lexicon
- write a background document on the corpus contracts
- public tender:
- review draft tender document from Finnut
- smj G3 issue with Thomas and Trond
- sme G3 issue with Thomas and Trond
- call EDD/Christian Emil Ore about national place name lexicon
- risten.no/proper noun lexicon development: fix bugs, continue development
- fix bugs!
Thomas
- convert the Lule Sámi lexicon
- write a conversion list for the Lule Sámi proper noun continuation lexicons and define the lexicons
- work on North Sámi compounding and derivation
- review corpus usage documentation
- smj G3 issue with Sjur and Trond
- sme G3 issue with Sjur and Trond
Tomi
- move aspell UTF-8 suffix bug to Bugzilla
- corpus infrastructure:
- dtd location (both public and internal)
- Document aspell infrastructure: finish doc/proof/spelling/X-spell/aspell.xml
(it’s almost done, but there are a couple of loose ends)
- new proper name lexicon
- discuss the new lexicon format and other issues in the newsgroup
- Look into data synchronisation of proper nouns between risten.no and CVS
- new version of xml2lexc (based on ccat), should handle complex names correctly:
construct entries like we have now from the different parts of a complex name
entry
- fix bugs!
Trond
- Work on corpus texts with Børre.
- Contact the Finnish and Swedish Bible societies to get Bible texts.
- Look at ga/ directory issue with Saara.
- News group discussion followup.
- Do a bug report (if not done) on command-line (mis)behaviour in the Xerox tools
- Ask IT guys for an e-mail address for corpus upload script:
corpus@giellatekno.uit.no (forwarded to Børre)
- fix bugs!
6. Next meeting, closing
27.02.2006 09:30
Closed at 10:13