Meeting setup
- Date: 23.05.2005
- Time: 12.00 Norw. time
- Place: Wherever we are :-)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- Sjur: brief report from Joensuu
- The gathering
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 12.20. Agenda accepted as is.
Present: Sjur, Thomas, Tomi, Trond, Børre
Main secretary: Børre
2. Reviewing the task list from the last meeting
- Børre:
- Contact Univ. of Oslo on the contract issue.
- Finish the divvun e-mail alias
- Leif Åge said he would fix it, haven’t got any feedback
- Contact people about Lule Sámí texts
- Contact skolelinux and Knut Hofland about webcrawler.
- Here is Knut’s thoughts on the issue in 1999:
[http://gandalf.aksis.uib.no/knut/mons/kh-avis.htm]
- Finish the setup of divvun.no
- Haven’t had contact with Thor Øivind Johansen, Børre will try to contact him in
person today
- Look at how forrest handles i18n, try to get work done on that.
- Set up programme and other details about the gathering in Guovdageaidnu
25th-27th of May
- Contact Roy Dragseth about cvs-commit mailing
- Sent an e-mail, phoned him a few days later. He is away untill the 26th of May.
- Finish the setup of [http://giellatekno.uit.no/]
- Tomi:
- Template for manual xsl conversion script
- Script for processing the corpus for preprocessor
- Install “Tiger” (10.4), document all issues, together with steps to resolve
them
- Look at the twolc part of how to handle shortening of the middle part in
three part compounds. Discuss the data / linguistic side with the linguists.
- Look at decapitalisation of proper nouns when compounded or derived to a
general noun
- Sjur:
- Meet with Kimmo:
- Discuss text licensing, and get a copy of the Helsinki licensing model/text
Got a copy
- Bring up the kwic-snt issue with Kimmo, make it utf-8 compatible (now,
each byte counts as one, instead of each character counting as one (what is
needed is a counting mechanism that counts only bytes starting in 01 or 110,
and ignoring bytes starting in 10.)
Forgot that, kwic-snt issue still open.
- Still some work on the Termdb
- Write presentation for the Joensuu meeting
- Decide upon the public opening of divvun.no with Anne Britt
- Discuss with Thor Øivind Johansen about Forrest/divvun/Tomcat.
Tel: +47 7764 6741
- Ask Leif Åge to check the opening/forwarding of ports number needed for
iChat voice and video conferencing
- Maaren:
- continue with proper nouns, work on the Bugzilla bug list
- reserve rooms at Villmarkssenteret
- Translate divvun.no front page to Finnish
- Thomas:
- continue with verb valency
about 3000 verbs left now
- Reply to Teemu Leskinen, thanking for the material but asking that we get a
new version, now with the Finnish names coupled with the Sámi names, if at
all possible.
Teemu will send us a new version.
- Trond:
- Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc.
for the meeting.
Has worked on linguistic issues, but not on Lule Sámi. Have worked on bugs.
- All:
- Have a look at the bug list in Bugzilla (mainly Trond’s bugs, but give
me a hand, will you?) [http://giellatekno.uit.no/bugzilla/]
We are now down at 14 open bugs.
- Prepare the Guovdageaidnu meeting
- Order tickets to the Guovdageaidnu meeting
3. Documentation - divvun.no
Nothing has happened - no contact with Thor Øyvind yet.
4. Corpus gathering
Licenses received.
Received two books from the Old Testament, was promised a new version of the New
Testament.
5. Corpus infrastructure
Tomi has made templates for manual conversion scripts and …
Sjur disc with LarsN and Trond about tagged corpus format in Joensuu
6. Linguistics
See above.
7. Term db
It was opened as an internal version for beta testing at the
Sámediggi on the 19th of May.
8. Other issues
8.1 Report from Jooensuu:
- Welcome to give a system demo on the Finite State workshop in Helsinki
- Tools for automatical creation of bilingual dictionarie4s on the basis
of aligned parallel corpora.
8.2 This week + gathering
8.2.1 Gathering - Programme:
- Wednesday before lunch
- Common issues:
- Evaluate first half year
- Discuss issues/improvements
- Bugzilla methodology
- Wednesday after lunch
- Programmers’ session
- Upgrade to 10.4, fix things
- Bugzilla bugs?
- Linguist session
- Lule Sámi: Discussion on how to proceed, UTF-8 vs. Latin 1, make a task
overview for the programmers
- Linguistic issues in Northern Sámi
- Work procedures for linguistics
The missing list, the names, the loan words
- Kárás1johlas1, case agreement within numerals, Vowel
shortening of compounds with short 1st syllable (bivdo- pro bivdu-).
The linguistic facts must be clearified.
- Bugzilla bugs
- Thursday before lunch
- Common session
- xml format for the lexicon
- Lule Sámi common things
- Thursday after lunch
- Linguists’ session
- Linguistic issues cont. from Wednesday.
- Programmers’ session
- Forrest i18n
- Outsourcing: specification of the outsourced work
- Make sure video and voice conferencing is working through the
Sámediggi firewall
- Friday before lunch
- Computational linguist session (Tomi, Sjur, Trond)
- Discussing things that has come up during the week, relevant for twolc, lexc, xfst
- Session for the rest
- Ask Børre about everything you were afraid to ask about (TM)
- Friday after lunch
- Programmers
- xml format for the lexicon, the more technical details
8.2.2 Practical things:
- We will stay at Villmarkssenteret
- Sjur: 23.-27.5.
- Trond 23.-27.5.
- Tomi 23.-27.5.
- Børre: Monday - Friday (Private accomodation)
- Schedule. Wednesday, thursday, friday, full days (Trond to leave
after lunch on Friday) and Maren about 11.00.
- The meeting room has been reserved.
9. Summary, task list
TODO:
10. Next meeting, closing
30.05.2005 10.00
Closed at 13.35