Divvun/Disamb Seminar in Tromsø
- project meeting
- date: 23. (from lunch) to 26. January (Friday is the travelling day).
- place: Tromsø
- Participants
- Monday, Tuesday, Wednesday: Børre, Ilona, Linda, Saara, Sjur, Tomi, Trond
- Thursday: Børre, Ilona, Linda, Sjur, Tomi, Trond
- Practical arrangements:
- Room(s): The meeting is in room 4475, with room 4402 as an alternative room.
Both rooms are on the same floor as our offices.
- We have one projector, and perhaps a possibility to hire another one.
- 4475 has internet but no whiteboard; 4402 has a whiteboard but no internet.
- TODO: Get hold of a flipover for 4475.
- TODO: Lunch only for 7 persons
- TODO: Machine for Ilona at 10.00.
- Lunch will be in Sjampanjekantina, 11.30, Monday to Thursday.
- Coffee breaks are not arranged; we may order them from the canteen or
buy coffee ourselves.
- hotel = Thon Polar Hotel, Grønnegata 45 (Saara, Ilona, Linda, Sjur)
Suggested content that’s still not scheduled:
- Programmers
- Work distribution (in bug fixing, documentation, maintenance etc.)
- Parallel corpora and the corpus infrastructure
- Discussions and clarifications; a trip to Oslo was planned, but no concrete work was done
- xml2lexc implementation plan
- Not done (much done earlier, though)
- file-specific xsl scripts for corpus conversions, the what, how and who issues
(t: Saara/Tomi; p: Trond, Børre, Sjur)
- Saara demoed her work, with eager support from Tomi and Børre, so we now have
some insight into what to do and how. Our XSL scripts cannot do much heavy
work, though; they remain on the metalevel (modifying the header rather than
the body).
- Linguists
- Two linguists were missing, so only a little linguistics was discussed.
- Get together and discuss common issues
- Approaches to working with the lexicon
- We are wiser now. (or are we?)
- Tags
- The discussion with the Saami dept. was all too short, but at least it clarified our own
thoughts. They promised to look at the tags and their usefulness. The indefinite pronoun
question remains open.
- Substandard forms
- The numeral projects
- Progress made:
- For the Arabic-numeral part, num.fst is now integrated into sme.fst. It still needs
refinement, though.
- For the linguistic part, we did some fieldwork with an innocent informant (Tomi).
- Twol issues
- Unix etc. courses.
- Held a useful 45-minute Unix course. There are still more hints to pick up, but it is
fine that they are presented in palatable portions. Conclusion: have a Unix session at the
next meeting as well, or perhaps online Unix course sessions, next week etc.
- All
- how to correctly write Forrest documentation
- site.xml (what is it, how do we add to it?)
- site-frag.xml (what’s the difference?)
- testing - how-to, improvements
- code style guidelines
Schedule
Aims:
- a mixture of training/discussions and practical work, to avoid monotony
- most training at the beginning, as preparation for later work
- The schedule never recovered from the changing number of participants (-2, +1, -1, etc.),
but we nevertheless had a good work meeting.
Monday 11.30-16
The meeting starts with lunch in Sjampanjekantina at 11.30.
Common 12-13.30 (1,5 h)
- Presentation, kick-off 15 min
- Project updates 15 min (2*7 min)
- Project milestones 15 min (2*7 min)
- Cooperation evaluation (practical/daily cooperation) 30 min
- project management
- programmer work
- linguistic work
- anything else?
- break 15 min
13.30-14.45 (split in two groups)
- Trond, Ilona, Linda : Unix
- Sjur, Tomi, Børre, Saara: XQuery intro
15-16 (all)
Software updates/maintenance:
- XXE + config updates
- OS X (all on 10.4.4)
- bash / localisation status quo
- There were so many different setups that we never reached a conclusion. Sjur has
a working setup …
- Xerox tools: We must negotiate a new version with Xerox.
- We have the full version, needed for creating the full wordlists, but only on cochise
- the new Mac is faster and has more memory
- forrest? Probably not
- SEE, Unison, CVL, Marratech
- CodeTek VirtualDesktop setup/config (?)
Tuesday 8.30-11.30
- How to use the xml corpora (Tomi, Børre, Saara => all - 1h)
- Done; Saara and Børre gave a good introduction.
(Computational) linguistics 9.40-11.30
- keepinmind
- 3-part compound treatment
- Numeral session
We split into 3 groups:
- Techni group: Børre, Sjur (updates)
- Arabic group: Saara, Linda
- Alphab group: Tomi, Trond, Ilona
- Arabic
- regx: regex for numeric expressions, dates, quantifiers, … 12. 3. 2003
- Alphabetic
- ling: what are the linguistic facts?
- comp: tags, lexc treatment
- regx: regex for numeric expressions, dates, quantifiers, … 12. 3. 2003
- Much was done, but the implementation is still half open (Arabic numerals) and fully open
(linguistic part). Saara said she would have a look at dates written with words (month names).
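As an illustration of the "regx" task for Arabic numerals, here is a minimal sketch of a regular expression for numeric dates written as day. month. year, like the "12. 3. 2003" example above. The pattern, the assumption that the space after each period is optional, and the helper name `find_dates` are hypothetical; this is not the regex used in num.fst or sme.fst.

```python
import re

# Hypothetical pattern for numeric dates "day. month. year", e.g. "12. 3. 2003".
# Assumption: the space after each period is optional ("1.12.2004" also matches).
DATE_RE = re.compile(
    r"\b(?P<day>0?[1-9]|[12][0-9]|3[01])\.\s?"   # day 1-31
    r"(?P<month>0?[1-9]|1[0-2])\.\s?"            # month 1-12
    r"(?P<year>\d{4})\b"                         # four-digit year
)

def find_dates(text):
    """Return (day, month, year) tuples for every numeric date found in text."""
    return [
        (int(m.group("day")), int(m.group("month")), int(m.group("year")))
        for m in DATE_RE.finditer(text)
    ]

print(find_dates("Meeting on 12. 3. 2003 and 1.12.2004, but not 32. 1. 2003."))
# prints [(12, 3, 2003), (1, 12, 2004)]
```

The pattern only checks the ranges 1-31 and 1-12, so impossible dates such as "31. 2. 2003" still match; ordinals ("12.") and dates with month names would need separate rules.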
Tuesday 12-16
- Programmers: Sjur, Børre, Tomi, Saara, (Irene)
- XML specification
- Irene <===
Check whether and how we cover all of the Kven needs: done
Discuss the multimedia things: done
- Linguists: Ilona, Linda, Trond.
- Note: We should consider including someone from the Saami department.
- Done. (Kjell Kemi and Steinar Nilsen)
- Tagsets and POS in Saami (Evaluating and fixing (?) the tagset both for
the parser and for the disambiguator)
- We only presented our problems and our analyser to the Saami dept staff.
- SGL decisions:
- overview, any consequences for our work?
- No input from SGL.
Wednesday 8.30-11.30
- Programmers: Sjur, Børre, Tomi, Saara
- XML specification
- Irene <===
Check whether and how we cover all of the Kven needs
Discuss the multimedia things
XQuery/eXist/proper names continued
Wednesday 12-16
- Disamb
- Reviewing issues from our first meeting:
- Schedules, internal dependencies
Thursday
Sjur and Børre in meeting - all day?
Thursday 8.30-11.30
Thursday 12-16
To be allocated
Groups:
ling: * Ilona, Linda
tech: * Saara, Tomi, Sjur, Børre, Trond
twol: * Sjur, Trond
Techtopics:
Routines for additions to the lexicon; cooperation with regard to new corpus texts coming in
- Maaren, Ilona, Børre, Trond, (Linda, Thomas)
- Ilona and Linda have discussed cooperation here.
eXist/XQuery/XSL/webapp/risten.no/proper names tasks (17,5h/2d2,5h):
- XQuery people: primary: Saara, Tomi, Sjur (=STS); optional: Børre, Trond (=BT)
- XQuery/XSLT intro (1,5h) (STS + BT)
- intro to the risten.no architecture (1,5h) (STS + BT)
- server setup
- webapp organisation:
- how the search works presently
- how the editing works presently
- track down the simultaneous update bug (3h)
- DONE (phew) and reported to the mailing list (read and enjoy)
- enhance security (1,5h) (https? Yes, but how? Password check) (STSB)
- Some links
- Howto setup ssl
- [RedHat specific, but useful anyway|http://www.tldp.org/HOWTO/SSL-RedHat-HOWTO-4.html]
- [Online password strength meter|http://www.securitystats.com/tools/password.php]. When
making passwords, we should follow the rules recommended on this page.
- Above points: Not done.
- refactor the code to split collection-specific code from general
(both edit and search) (3h+)
- make a first version of proper noun infra (6h)
- synchronisation with CVS (1,5h):
- download files
- sorting (identical and fixed sort order)
- setup access to the windows server (0,5h)
- refactor to stored procedures (everything contained within eXist)?
(discuss, 0,5h)
- will make it easier to upgrade the webapp (no need to log on to the server
machine, just use the eXist Java client)
- will make the webapp mostly self-contained
- might introduce a performance hit - needs to be checked
- XML DTD check:
- review the converted lexicons
- anything missing, or badly organized?
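For the CVS synchronisation item ("sorting (identical and fixed sort order)"), the point is that files exported from eXist should always come out in the same order, so that CVS diffs only show real changes. Below is a minimal sketch, assuming the entries can be read as hypothetical `(lemma, pos)` pairs; the real export format is not described in these notes. It uses plain Unicode code-point order after NFC normalisation, which is deterministic but is not Saami alphabetical order (e.g. "á" and "č" sort after "z").

```python
import unicodedata

# Hypothetical entry format: (lemma, pos) pairs taken from an eXist export.
def sort_key(entry):
    lemma, pos = entry
    # NFC normalisation makes precomposed and decomposed letters
    # (e.g. U+010D vs "c" + combining caron) compare as identical.
    nfc = unicodedata.normalize("NFC", lemma)
    # Primary key: case-insensitive lemma; tie-breakers: exact lemma, then POS,
    # so the output never depends on the order the entries were downloaded in.
    return (nfc.casefold(), nfc, pos)

def stable_order(entries):
    """Return the entries in one fixed, reproducible order."""
    return sorted(entries, key=sort_key)

print(stable_order([("guolli", "N"), ("beana", "N"), ("Guolli", "N"), ("áhkku", "N")]))
# prints [('beana', 'N'), ('Guolli', 'N'), ('guolli', 'N'), ('áhkku', 'N')]
```

A proper Saami collation would need an explicit alphabet table or a collation library in place of the code-point comparison; the tie-breakers would stay the same.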
Summary
We made important progress in a few, very limited areas, but there are still
open issues.
Evaluated against our expectations the week was too short.
One lesson to learn is that we should have come better prepared: we should have gone
through the issues beforehand, so that we could concentrate on the really hard
problems.
It was nice to meet face to face, both for the first time, and for the
long-time-no-see factor.