Meeting setup
- Date: 11.04.2005
- Time: 14.00 Norw. time
- Place: Wherever we are :-)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review, participants
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- Summary, task lists
- Closing
Task list since last meeting:
- Tomi: to check orig. doc. integrity in the OO file format
- discussion postponed till the XML format is settled/defined
- Tomi: Move the corpus discussion from newsgroup to forrest
- Thomas: continue work with verbs / modals
- Børre: correct image linking in termdb and divvun projects
- will look into it this week
- All: discuss our own corpus format in the newsgroup - to be continued.
- Tomi and Børre: identify the input encodings antiword can handle
- to continue to look into this: it is still possible that char conv.
can be done within antiword.
- Sjur, Børre: terminology database
- Børre: divvun.no:
- to set up the documentation and integration with CVS
- Børre: should make the How-To tab appear as intended:
- Børre: add automatic version-stamp: add $Author$ to some docs, replace
e-mail with project e-mail address (Børre will ask Leif Åge).
- Børre: wiki - will follow up on the UTF-8 problem
- temporary workaround: use Latin 1-encoding for the memos
- Børre will continue to look into the chaperon tools (chaperon is used for
converting Wiki to XML)
- Trond: forrest run in cochise:
- Still awaiting latest report from Trond
- Trond, Børre, all: fix the links
- Børre will check if there are still more link problems
1. Opening, agenda review, participants
Opened at 14.08. Agenda accepted as is.
Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre
2. Reviewing the task list from the last meeting
- Tomi: to check orig. doc. integrity in the OO file format
- discussion postponed till the XML format is settled/defined
- Tomi: Move the corpus discussion from newsgroup to forrest
- has been doing XSLT for docbook to our xml format (docbook2sme.xsl)
- Thomas: continue work with verbs / modals
- Finished.
- Maaren: I have now time (at last) to work with this project, starting tomorrow.
I`ll take contact with Trond
- Børre: correct image linking in termdb and divvun projects
- All: discuss our own corpus format in the newsgroup - to be continued.
- Tomi and Børre: identify the input encodings antiword can handle
- Børre looked at it before easter. antiword assumes everything is cp1252 or Unicode.
It would be easiest to do the conversion of non-Unicode input with a perl-script
operating on the (erroneous) output of antiword. The reason for that is that
most of the encodings are “private” ones.
- Sjur, Børre: terminology database
- Børre: divvun.no:
- to set up the documentation and integration with CVS
This must be integrated with the systems at the UIT, and Børre will contact the
IT persons at UIT
- Børre: should make the How-To tab appear as intended:
- Børre: add automatic version-stamp: add $Author$ to some docs, replace
e-mail with project e-mail address (Børre will ask Leif Åge).
- The issue concerning Author tag is solved.
- Still missing a project e-mail.
- Børre: wiki - will follow up on the UTF-8 problem
- temporary workaround: use Latin 1-encoding for the memos
- Børre will continue to look into the chaperon tools (chaperon is used for
converting Wiki to XML)
- Trond: forrest run in cochise:
- Hasn’t been testing more since easter
- When forrest runs on cochise, you can start up a browser at your own local
machine with the address http://cochise.uit.no:/, and look
at the documentation that way.
- Despite the warnings, most of the documentation is generated when using forrest
to generate static HTML sites. The generated documentation is found in ./build/site/
- We can now start moving the old documentation to the attic.
- Trond, Børre, all: fix the links
- Lot’s of links to .txt files are broken. Trond and Børre takes a look at that
issue.
3. Documentation - divvun.no
Børre is going to contact the main sysadm and discuss the technical details.
Deadline: we return to it on the next meeting.
4. Corpus gathering
- Min Áigi: Svein Nordsletta at Min Áigi has been in contact with Per Edvard
Klemetsen. From the e-mail:
«Jeg hadde akkurat en telefonsamtale med Svein Nordsletta (tlf 78469709),
rådgiver i Min Aigi, som var interessert i status for korrekturprogramarbeidet.
Du bør snarest ta kontakt med han for orientering og bistand til tekster,
feilkorpus m.m. Han var meget interessert!»
Would be interesting to get the articles before they are proof-read.
And after that, the finished articles in InDesign(?) format, with hyphenation.
We need a contract. Trond contacts the people at the University in Oslo, so we have
a template. We probably have to modify it for our use cases.
We have got some texts from Marit Einejord at Romssa sáirál (hospital) and
Várdobáiki in Skánit.
Olavi Korhonen has offered us mainly Lule Sámi text, the NT, and he will also look
for other texts. His Northern Sámi texts are more scattered, but he will look at
that issue also.
Routines for corpus gathering: We need a dropbox where raw incoming text can be
stored before it is processed. We’ll return to this issue in the newsgroup.
Sámi placenames in Finland will become available as well.
5. Corpus infrastructure
- deadline for corpus format
- perl scrip for unicode conv.
6. Linguistics
Maaren will start working 2 d/w, Thomas has been working with the transitivity of
the verbs, continuing Lena’s work.
Lule Sámi: Anders has promised us the dictionary. The exact status of it is not
known. Olavi Korhonen also works on a Lule Sámi dictionary, to be finished in the
autumn.
More serious linguistic work on Lule Sámi to be started at the beginning of the summer.
7. Term db
The beta version will be done within a week :-)
8. Other issues
None.
9. Summary, task lists
TODO:
- Tomi: Move the corpus discussion from newsgroup to forrest
- has been doing XSLT for docbook to our xml format (docbook2sme.xsl)
- awaiting final XML format
- Thomas: continue work with verb transitivity. Contact Olavi about
his dictionary.
- Maaren: Will work with this project, starting tomorrow.
- I`ll take contact with Trond and Thomas (telephone meeting Tuesday 12.4.)
- All: discuss our own corpus format in the newsgroup - to be continued.
- Tomi and Børre: identify the input encodings antiword can handle
- Tomi makes a perl-script, Børre helps out on that one.
- Børre: divvun.no:
- to set up the documentation and integration with CVS
This must be integrated with the systems at the UIT, and Børre will contact the
IT persons at UIT
- Coordinate with Sjur as well
- Still missing a project e-mail address. Børre contacts Leif Åge.
- Sjur, Børre: terminology database
- Børre: should make the How-To tab appear as intended:
- Børre: wiki - will follow up on the UTF-8 problem
- temporary workaround: use 8-bit encoding for the memos (=MacRoman works correctly)
- convert all memos to MacRoman
- Trond, Børre, all: fix the links
- Børre will check if there are still more link problems
- Lot’s of links to .txt files are broken. Trond and Børre takes a look at that
issue.
- Børre: contact Min Áigi, discussing possible practical arrangements for cooperation.
10. Next meeting, closing
Next meeting: 18.04.2005 10.00
Closed at 15.34