Meeting setup
- Date: 27.06.2005
- Time: 09.00 Norw. time
- Place: Wherever we are :-)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- risten.no
- Other issues
- Vacation planning
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09.30. Agenda accepted as is.
Present: Maaren, Sjur, Thomas, Trond, Tomi
Absent: Børre
Main secretary: Sjur
2. Reviewing the task list from the last meeting
All:
- The divvun.no release - until Wednesday:-)
Børre
- Coordinate the release and the tasks ahead of the release.
- Vacation thereafter.
Maaren:
- divvun.no
- in Tysfjord till Thursday, will work on the project Thursday and Friday
- the press represented by 2, Sámeradio + Nordsalten Avis
- the presentation and opening went well
- Karen Monica Paulsen presented risten.no
- … as well as Rolf Olsen
- Berit Karen Paulsen demonstrated the web site
- Lule Sámi people very happy to finally see Lule Sámi words included in the term db
- other items covered later in the meeting
Sjur:
- Author press release for divvun.no for the opening
- risten.no bugs and fixes
- Other things done:
- updated cvs repo for risten.no to the released version
- tagged cvs repos for risten.no and divvun.no with the release tag
DIVVUN-NO_1_0_RELEASE
- Wrote monthly report (see your local forrest after cvs up)
Thomas:
- translate the rest of sme version of sma document (partly translated)
- translate press release into Northern and Lule sami
- bug Kåre Tjihkkom with the last QA on smj translations
- Kåre hasn´t yet sent timetable
- Work with Lule Sami
- call Pål Hivand about when he’s going to send out the press release(s)
Tomi:
- Common Makefile for all the languages
- catxml default language issue
- three-part compounding
- haven’t succeeded to get it working
- decapitalisation of proper nouns when (if?) compounded, and when derived
(the oslolaš issue)
- corrected word2xml script
Trond
- divvun.no release
- Mostly worked on this release. Translated common documents and updated some of
the documentation.
- sma-background completion
- Translate the Hki legal text into Norwegian
- Fix the reported Lule Sámi bugs before Tysfjord.
- Only partly done. What still does not work is a certain rule that is meant
to take care of optional vowel lowering in the Px paradigm, the issue
at hand is the bahaviour of the twolc => operator.
- Apart from that I have looked at the corpus interface, and at corpus translation
routines.
- Then I had Wednesday off to sit in an exam commision.
3. Documentation - divvun.no
Opening - short evaluation
Not mentioned much in Tysfjord, but the address was given: “go there and have a look”.
We delivered what we planned, with the exception of Forrest i18n (because of Forrest).
What’s next
- cvs up in place, for nightly updates of the online documentation.
- using a special, read-only account, for security reasons (the update script
will probably have to have the password written in clear)
- Tomi will have a look, using the cgi-update scripts for the Giellatekno site
as a template, and talk to Thor Øivind on this issue.
- i18n:
4. Corpus gathering
Anders Kintel is now ready with the dictionary manuscripts, and handed it over to the
Sámi Lanuage board and GIO by Harrieth Aira and Julie Eira (OAO) at the language board
meeting in Tysfjord. The manuscripts will be checked by dthe September meeting.
Kåre Tjihkkom and Kaia Kalstad will now go over the Norwegian - Lule Sámi part, and
the dictionaries will be up to control for the September meeting of the lg council. Afterwards,
we may get a copy. Factually, we could get it already now, since the Sámi-Norwegian part is
already checked. The dictionary is written in Filemaker Pro, and can be exported to other
structured formats. Børre will be back at some point next week, and may then ask for a copy.
Thomas has talked to Hans Ragnar Mahtisen about parallel placenames in his Sámi Atlas.
These are names of foreign countries. Hans Ragnar positive.
Hans Ragnar already gave us the Sámi placenames, and they are included already,
but not as parallel names. Thus, if we want the Tyskland > Duiskka correction
function, we need the names.
Thomas will continue to discuss, and emphasize that we also would like to have the parallel
names (Tyskland as well as Duiskka).
Trond will get a new version of the NT.
5. Corpus infrastructure
Storing the corpus.dtd and similar files. Now in gt/dtd/
gt$ls
CVS cwb doc dtd script sma sme smi smj smn sms tmp www
The structure of gt and the way to handle dtd must be discussed and decided upon. The
discussion is moved to the newsgroup.
6. Linguistics
sme
Nothing done last week. The missing list contains a lot of compounds - compounding isn’t
working anymore (it worked earlier?). A lot of noise words are Lule and South Sámi words -
we need proper language identification as part of the corpus infrastructure, to make use
Tomi’s language extraction feature of catxml to only get the relevant language.
54 jïh
47 åarjelsaemien
20 Sámedigge
20 Saemiedigkie
20 julevsáme
17 Sámelága
15 finihtta
14 mij
14 dajvesne
13 julevsámegiel
12 aaj
12 •
11 sæjhta
11 dejtie
10 vuolitásahus
10 gïelem
10 ájggu
9 vihkeles
Bugzilla:
The -heddjiid issue should be looked into after the giellalávdegoddi has had their meeting.
The -eabbo vs. -eabbu issue could be looked into now, because -eabbo has to be implemented anyway.
smj
- Thomas working with adjectives.
- Trond working with bugs.
7. risten.no
Opening - short evaluation
Opening went well, with the exception of the “sticky” Lule Sámi text.
What’s next
- more bug fixing
- a couple of critical features still needs to be added
8. Other issues
It is working (needs a separate input module), so the question is only how it should be used
Open for discussion (when Børre is back).
Vacation planning
- Børre: 20.06-04.07
- Maaren: 11.07-14.08
- Thomas: 11.07-14.08
- Tomi: next week on vacation, 4.7-10.7
- Trond: 4.7-10.7 vacation. 1.8.-14.8 vacation. In between:
More or less vacation (In Finland, working whenever possible).
- Sjur: probably 25.07-08.08.
9. Summary, task list
This week’s task list is a summary of all undone tasks from before the
divvun.no/risten.no openings.
All:
- The divvun.no release - until Wednesday:-)
Børre
- Vacation
- Continue http://giellatekno.uit.no conversion to forrest …
- Contact Svenska bibelsällskapet
- Add issue to forrest issue tracker about utf-8 ihtml documents.
- Check for existing bug reports, it might already be in there
- Continue forrest i18n research
- Sometimes abides the =locale=se/smj/fi in the url (the document gets
correctly selected). The menus and tabs are translated to the language
that the browser asks for.
- add RSS newsfeed support (check with Sjur/risten.no)
- discuss with Anders Kintel about whether we could get the Sámi-Norwegian dictionary
ahead of acceptance
Maaren:
- The missing list, both the overall missing list from our xml corpus, and a file-for-file
review, in order to get different terminology
- Go through the missing list from risten.no
Sjur:
- Ask Anne-Britt to update the contract with the University (we are getting 3, not 2,
persons there from the fall)
- risten.no bugs and fixes
- Give Maaren (or: add to the gt/sme/corp corpus) a list of the risten.no words
- write the LIST option to our test tools, as requested in the discussion group
- complete the action summary after our half-year evaluation
- contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
- voice group-chat not working to Sámediggi
- Maaren has problems with SubEthaEdit (can’t connect)
- add i18n bug to Forrest issue tracker
- add i18n menu as feature request to Forrest issue tracker
Thomas:
- wait for proofreading of the last QA on smj translations
- write to Hans Ragnar Mahtisen about the parallel placenames
- work with Lule Sami adjectives
Tomi:
- Common Makefile for all the languages
- catxml default language issue
- three-part compounding
- decapitalisation of proper nouns when (if?) compounded, and when derived
(the oslolaš issue)
- cvs update script for divvun documentation
- corpus infrastructure: dtd location (both public and internal)
Trond
- Work on the bug list (Lule Sámi).
- Work on compounds (both regular and three-part, with Tomi)
- Work on the corpus interface (with Lars)
- Corpus infrastructure: dtd location
- Get the new version of the New Testament
- Get back to Xerox on the commercial license.
- Check Hans-Ragnars names.
- Give Thomas a telephone, and a crash-course in setting up phone meetings
10. Next meeting, closing
04.07.2005 10:00
Closed at 11.00