Meeting setup
- Date: 18.04.2005
- Time: 11.00 Norw. time
- Place: Wherever we are :-)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review, participants
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- Mouse problems
- Job positions
- Upgrade to 10.4
- Physical meeting?
- Vacation
- Summary, task lists
- Closing
Task list since last meeting:
- Tomi: Move the corpus discussion from newsgroup to forrest
- has been doing XSLT for docbook to our xml format (docbook2sme.xsl)
- awaiting final XML format
- Thomas: continue work with verb transitivity. Contact Olavi about
his dictionary.
- Maaren: Will work with this project, starting tomorrow.
- I'll contact Trond and Thomas (telephone meeting Tuesday 12.4.)
- All: discuss our own corpus format in the newsgroup - to be continued.
- Tomi and Børre: identify the input encodings antiword can handle
- Tomi makes a perl-script, Børre helps out on that one.
- Børre: divvun.no:
- to set up the documentation and integration with CVS
This must be integrated with the systems at the UIT, and Børre will contact the
IT persons at UIT
- Coordinate with Sjur as well
- Still missing a project e-mail address. Børre contacts Leif Åge.
- Sjur, Børre: terminology database
- Børre: should make the How-To tab appear as intended:
- Børre: wiki - will follow up on the UTF-8 problem
- temporary workaround: use 8-bit encoding for the memos (=MacRoman works correctly)
- convert all memos to MacRoman
- Trond, Børre, all: fix the links
- Børre will check if there are still more link problems
- Lots of links to .txt files are broken. Trond and Børre will take a look at that
issue.
- Børre: contact Min Áigi, discussing possible practical arrangements for cooperation.
1. Opening, agenda review, participants
Opened at 11.15 (more than 1 hour late due to an oversight by Sjur). Agenda accepted as is.
Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre
Main secretary: Trond
2. Reviewing the task list from the last meeting
- Tomi: Move the corpus discussion from newsgroup to forrest
- awaiting final XML format
- continued discussion in newsgroup: Tomi and Sjur have discussed in the thread
“Corpus format”, but the discussion is still inconclusive, awaiting further
feedback.
- Thomas: continue work with verb transitivity. Contact Olavi about
his dictionary.
- has worked on the verb transitivity all week, and contacted Olavi about Lule Sámi.
Thomas explained that the Lule dictionary does not need to have complete translations in
order to be useful for us. Olavi informed us that the Lule-Swedish web dictionary will
be done in the autumn.
- Maaren: Will work with this project, starting tomorrow.
Managed to work every day(!), on the missing list.
- I'll contact Trond and Thomas (telephone meeting Tuesday 12.4.)
The meeting was held.
- All: discuss our own corpus format in the newsgroup - to be continued.
- deadline: 25.4.
So far only Tomi and Sjur have taken part in the discussion; this must get high priority this week.
Marit should also have her say.
- Tomi and Børre: identify the input encodings antiword can handle
- Tomi makes a perl-script, Børre helps out on that one.
- Done
- Børre: divvun.no:
- to set up the documentation and integration with CVS
This must be integrated with the systems at the UIT, and Børre will contact the
IT persons at UIT
- Contacted Thor Øyvind Johansen. Told about forrest, cochise. Discussed the
possibilities (cvs, forrest, ssh). He will answer later this week.
- Coordinate with Sjur as well
- Still missing a project e-mail address. Børre contacts Leif Åge.
Fixed. Sent to us all. Add Trond to the list.
- Sjur, Børre: terminology database
- Still working with it…
The base has been further developed, problems are better identified than
before, so there is progress.
- Børre: should make the How-To tab appear as intended:
- still to be done.
- It seems to be a fundamental problem for forrest, we have to return to this.
- Børre: wiki - will follow up on the UTF-8 problem
- temporary workaround: use 8-bit encoding for the memos (=MacRoman works correctly)
- convert all memos to MacRoman.
- Sjur did this.
- Trond, Børre, all: fix the links
- Børre will check if there are still more link problems
- Lots of links to .txt files are broken. Trond and Børre will take a look at that
issue.
- Nothing has been done on this issue.
- Børre: contact Min Áigi, discussing possible practical arrangements for cooperation.
- We have received a very positive letter from Min Áigi. But, to receive texts
we need a contract. I contacted Anne-Britt today, who will look for a “template”
we can use.
- Metsähallitus (Finland) is waiting for a contract as well.
- Thomas has forwarded a contract from Lantmäteriverket (Maanmittauslaitos)
in Finland.
- Trond has sent Thomas the contract between Statens Kartverk and UIT:
- Thomas has forwarded the contract to Maanmittauslaitos and they have written a licence
for us. When we have signed it we will receive the Sámi names in Finland. Trond has to
sign it.
3. Documentation - divvun.no
We discussed whether to set up forrest as a standalone on the web server, or to make forrest
convert the xml. As long as there is not much load on divvun.no, it is perhaps
easiest to just put up forrest. Another possibility is to type “forrest war”,
which gives .. that can be saved as … If we install a war-file, we can get away
with only updating testfiles. If we, on the other hand, install forrest, we must
redo the whole installation on the server when we update. Using war is an easier
solution, since we can isolate different parts of the whole site. We should run the
default Tomcat, and a forrest file manually. Textual updates should be done
automatically, as a cronjob. The issue should be discussed in more detail.
The task is to set up the server and to integrate it with the cvs repository.
Deadline for when we want to have the site up and running is the end of April.
4. Corpus gathering
Contract prototype! We have two models: Textlaboratoriet in Oslo, and the Helsinki model,
presented by Kimmo Koskenniemi, and run by CSC in Finland, as a three-party solution:
the author, the licenser (publisher) and the maintainer (in that case: CSC). The
Oslo model is simpler at the outset, but perhaps harder to maintain. The Helsinki
model is more complicated at the outset, but perhaps(?) easier to maintain, since
it is clearer and more principled. We should have the UIT and Sámediggi lawyers
look at this as well.
Børre: Contact Kimmo K & Ruth V F
Deadline: The middle of May(?).
We should contact the writers’ organisations and suggest that they recommend to
their members the contract we have proposed for
use of their texts.
5. Corpus infrastructure
Location of charset conv perl script
Tomi has made a perl script to convert problematic charsets to utf8. Where do we store it? Presently, we have cvs files in gt/, and we plan to keep corpus files outside cvs. The question is then where to keep corpus-related files that are under cvs: in gt/ with the other cvs files, or in the corpus catalogue with the other corpus files. The discussion will continue in the newsgroup.
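As an illustration of the kind of conversion discussed above, here is a minimal sketch in Python (not Tomi's perl script, which is not reproduced in this memo). The function name `to_utf8` and the default source encoding `mac_roman` are assumptions made for the example; the real script may detect or handle encodings differently.

```python
def to_utf8(raw: bytes, source_encoding: str = "mac_roman") -> bytes:
    """Return raw as UTF-8 bytes.

    Hypothetical helper: if the input already decodes as UTF-8 it is
    returned unchanged; otherwise it is assumed to be in source_encoding.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: everything that is not valid UTF-8 is in the given
        # legacy encoding. A real script would need a better detection step.
        return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    legacy = "Sámi".encode("mac_roman")
    print(to_utf8(legacy).decode("utf-8"))
```

Note that some legacy byte sequences can happen to be valid UTF-8, so a check like this is only a first approximation and does not replace knowing the source encoding of each file.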
6. Linguistics
Thomas: Nothing more than already reported. Work continues on verb transitivity.
Maaren: working on the missing list. There are very many misspellings in the missing list.
Spelling errors are very valuable for us, and in general, they are hard to get at.
The spelling errors in the missing list should thus be gathered, e.g. in this format:
/misspelledform/correctspelledform/frequency_of_misspelledform?/
There is a thread for this on the newsgroup, “Misspellings in the corpus material”.
One frequent error type is the a - á errors. We could perhaps have a special
pre-check for a>á before the usual “edit distance 1” check.
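The list format and the a>á idea can be sketched as follows, in Python. This is only an illustration under stated assumptions: the function names (`parse_entry`, `a_variants`, `precheck`) and the tiny lexicon are hypothetical, and the frequency field is treated as optional because the format above marks it with a question mark.

```python
from itertools import combinations


def parse_entry(line):
    """Parse '/misspelled/correct/frequency/' into a tuple.

    The frequency field is treated as optional (None when missing).
    """
    parts = line.strip().strip("/").split("/")
    misspelled, correct = parts[0], parts[1]
    freq = int(parts[2]) if len(parts) > 2 and parts[2] else None
    return misspelled, correct, freq


def a_variants(word):
    """Yield every form obtained by replacing one or more 'a' with 'á'."""
    positions = [i for i, ch in enumerate(word) if ch == "a"]
    for n in range(1, len(positions) + 1):
        for combo in combinations(positions, n):
            chars = list(word)
            for i in combo:
                chars[i] = "á"
            yield "".join(chars)


def precheck(word, lexicon):
    """Return the a>á variants of word that are found in the lexicon."""
    return [v for v in a_variants(word) if v in lexicon]


if __name__ == "__main__":
    lexicon = {"sámi", "mánná"}  # illustrative only, not the real lexicon
    print(parse_entry("/sami/sámi/12/"))
    print(precheck("manna", lexicon))
```

One reason to run such a pre-check before the edit distance 1 step is that a word with two a>á errors is two substitutions away from the correct form, so a distance 1 search alone would miss it.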
Compilation error
See BUG #69. The parser did not compile on the 17th of April. Many of the errors
are of general interest, and the cvs comment will be posted to the newsgroup.
7. Term db
Deadline is end of April for the internal beta. Work is in progress, but not done.
8. Other issues
Job positions.
- Sámi language technology position vacant in Tromsø (Nordlys today)
- Marit’s position soon available for others to apply for
- There will in practice be money for one more position for large parts of the
disamb project lifespan. Please suggest candidates if you know someone.
Mouse problems
Mainly Sjur has problems, Tomi sometimes
Upgrade to 10.4
Leif Åge has ordered Tiger, and he will distribute the installation packages from
Guovdageaidnu. Tiger will be released on the 29th of April.
Things that we have modified may get broken as we move to 10.4. We should be aware
of those issues. The information from Apple is incomplete.
Physical meeting?
Week 21 (23.5.->), place to be decided at the next meeting
Vacation
To be discussed and decided in the next meeting.
9. Summary, task lists
TODO:
- All:
- Follow up the corpus format discussion.
- Decide upon vacation time
- Views on physical meeting place (May meeting)
- Tomi:
- Waiting for the corpus xml-format verification, to complete xml-conversion
script(s) after that
- Børre:
- Contact Kimmo Koskenniemi and Ruth Vatvedt Fjeld about contract
- Follow up on the divvun.no issue, suggest tomcat and .war files
- Work on the termdb
- Find a solution for wiki and utf-8
- Link problems in forrest
- Contact Min Áigi, discuss technical details
- Maaren: tries to work with the missing list. Else?
- Thomas: work with verb transitivity
- Sjur:
- terminology db
- text license contract
- divvun.no
10. Next meeting, closing
25.04.2005 09.30
Closed at 13.00