Meeting setup
- Date: 08.02.2005
- Time: 10.00 Norw. time
- Place: Wherever we are:-)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Infrastructure:
documentation system status - still issues open?
still UTF-8 issues?
physical infrastructure - network for Thomas & Maaren?
(still big problems in the Samediggi network, although
things have improved)
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db:
grammar(s)
db editing
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review
Started 10.05
2. Infrastructure:
- documentation system status - still issues open? Børre:
- has written a how-to about installing Forrest, and started to
write the how-to about writing and using Forrest as a documentation
tool (menues and tabs)
- disamb project will get its own Forrest structure today
- divvun.no needs to be set up
- still UTF-8 issues?
- Use only .profile! (no .bashrc)
- sorting in cochise: Finnish-based now, put into .profile:
export LC_ALL=se_NO.UTF-8
(The locale will have to build it first with
this command only once, by the superuser):
localedef -i /usr/share/i18n/locales/se_NO -c -f UTF-8 se_NO.UTF-8)
To test if the se_NO.UTF-8 locale works:
cat text | LC_ALL=se_NO.UTF-8 sort | less
~/.emacs:
(setq locale-coding-system 'utf-8)
subtopic: Sámi sorting in emacs
- kwic-snt does not align correctly with UTF-8. Does this work?:
LC_ALL=se_NO.UTF-8 kwic-snt
- physical infrastructure - network for Thomas & Maaren?
(still big problems in the Samediggi network, although
things have improved)
- news reading still not working
- CVS: many generated files not ignored - to be followed
3. Corpus gathering
More addresses gathered. Asked Anne Britt to send the letter
now to the ones we have the address of. No response so far.
Finnish addresses still missing.
Thomas: called Anders Kintel, positive to share his work, but has
legal questions. Thomas has sent Børre’s letter, will call him
again.
4. Corpus infrastructure
Tomi has been reading Saara’s notes, and the CSC site, getting into
the DTD now.
Will construct a tree of the document structure.
Tiger-XML:
[http://www.ims.uni-stuttgart.de/projekte/TIGER/]
TEI:
[http://www.tei-c.org/]
Schema types:
- RelaxNG (simple format)
- XML Schemas (w3)
5. Linguistics
Working with Trond’s list and adjectives.
Normativity is still an open question in many cases. Orthographic
status can (and should) be tagged in the disambiguator output. To
be discussed.
6. Term db
- grammar(s):
- North Sámi is roughly finished
- Lule and South Sámi in the works
- db editing:
- making progress, soon to be able to edit:-)
7. Other issues
- Tromsø trip for Maaren and Thomas: they would like to go. It’s good
for the project, and they will continue the planning with Trond.
- Two students: Ilona working 15-20%, Lena (native speaker) will work
on the verb lexicons.
- Meeting memos into the documentation, will thus be public. To be
converted to wiki format
8. Summary, task lists
TODO:
- Sjur + Børre: set up divvun.no
- All: use only .profile (no .bashrc!) -> into the documentation
- Børre, Trond, all: continue to update the documentation of setting
up machines and working environment
- Tomi, Trond: to clean the documentation from references to the old 7/8-bit
system, and convert them to proper documentation for the present UTF-8
- Tomi, Børre, Trond: discuss the use of .profile and .bashrc in cochise
- Tomi, Børre, Trond: sorting in Emacs, SubEthaEdit and command line
- Trond to call Erling Paulsen (news group problem)
- Børre to call Anne Britt about corpus letter
- Tomi to lead the discussion on corpus infrastructure, DTDs vs Schemas, etc
- Sjur, Trond, Tomi, Marit: Orthographic status (misspelled) tags
in the disamb output?
- Sjur, Børre: set up a suitable format for the meeting memos for easy
publishing in Forrest.
9. Closing
Next meeting: Monday 14.2., 10 AM.
Closed at 11.30.