Meeting setup
- Date: 06.06.2005
- Time: 11.00 Norw. time
- Place: Wherever we are :-)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 11.00. Agenda accepted as is.
Present: Børre, Sjur, Thomas, Trond
Away: Tomi (Finland), Maaren (Iceland)
Main secretary: Trond
2. Reviewing the task list from the last meeting
Børre:
- Disamb documentation within giellatekno.uit.no (with Trond)
- The alpha version of the forrest-based giellatekno external site is up and running,
but there are still open issues:
- word-form generation not working
- no link back to Divvun project page
- Add corpus texts that are unproblematic from a juridical point of view
- Contact people about Lule Sámí texts
- Contacted Kåre Tjihkkom, he said he didn’t have any texts himself.
He asked me to contact Árran. Talked to Bård Eriksen at Árran, who was
interested in contributing with text and also would like to be a tester.
- Contact skolelinux and Knut Hofland about webcrawler.
- Had a look at [http://borel.slu.edu/crubadan/], which is what skolelinux
has been talking about. According to
[http://borel.slu.edu/crubadan/stadas.html]
- Finish the setup of divvun.no
- Contact Thor Øivind:
- Contacted him, he was going to update Tomcat (had an old version).
- Contact Roy Dragseth about cvs-commit mailing
- Sent an e-mail some time ago. He has not answered, will phone him.
- continue Forrest i18n
- Nothing done.
- Please also add an issue in Forrest’s own issue tracker (bug db) (Sjur)
- continue [http://giellatekno.uit.no], encoding problems
- Solved, using decimal codes :-( (add to Forrest issue tracker)
Maaren:
- continue missing list
- find and prepare issues for the language board meeting
- translate Helsinki contract into rough Norwegian; send it to Sjur,
then lawyer
Sjur:
- Scan and OCR the Helsinki contract; mail it to Maaren
- Done (not OCR)
- Resend to Trond (Maaren is away)
- Ask Anne-Britt to update the contract with the University (we are getting 3, not
2, persons there from the fall)
- write the LIST option to our test tools, as requested in the discussion group.
- complete the action summary after our half-year evaluation
- fix risten.no bugs
- fixed some
- added risten.no to bugzilla @ giellatekno.uit.no
- updated the version on the server
- discussed bugs in eXist, and feature requests
- contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
- voice group-chat not working to Sámediggi - should be fixed
- tried, but no info from LÅH yet
- Also done:
- Tiger-updates (a.o. Aspell), getting Sámediggi intranet access to work
Thomas:
- Find issues to giellastivra čoahkkin
- help Maaren prepare those same issues
- done, but should go to giellalávdegoddi instead of giellastivra
- Continue with the verbs
- A few hundred left, should finish tomorrow
Tomi:
- Start looking at the smj infrastructure, turn it into utf-8, look at whether updates
and improvements done for sme should be done for smj as well, in order to get a uniform
infrastructure.
- update catxml to remove sections in unwanted language(s)
- Translate the texts in the sme/corp/correct/ directory to utf-8.
- Work on the nonrec transducer: Get the nonrec files out of src/ and into int/, get the
“make nonrec” process up and running.
- Done (but headed into limits of the available xfst tools).
- Look into shortening of three-part compounds, middle part
- decapitalisation of proper nouns when (if?) compounded, and when derived
(the oslolaš issue)
Trond:
- Disamb documentation with Børre.
- Alpha version up and running.
- Systematize the linguistic issues ahead of the Language Board meeting
- Add cgi-bin errors to Bugzilla.
- continue the office disc. with the faculty
- Linguistic work.
- Work in continuos progress, worked on vocabulary, worked on disamb input from Pekka.
- computational linguistics, with Tomi.
- Not done (see Tomi’s task list above).
All:
- The www.divvun.no release.
- The Bugzilla bugs, and a quip or two?
- continue the discussion on the language policy of divvun.no in the newsgroup
- Trond has replied
- Børre agrees to the policy
- What about news? In all languages? Lule or North Sámi as default language.
- Thomas: likes it.
- Open questions about language policy:
- All = English? Yes (the amount is small). Decided.
- How much in 5 languages?
- Size of general public info: Trond: only a few paragraphs - about the project, and the
language policy of the site. More specifically, it should include:
- what the divvun project is
- why it is made, by whom, on whose decision
- what we make, (why we do not make sma)
- how we make it
- how, when, and for whom it eventually will be used, and on what terms and price
- an explanation of our lg policy, and a good set of links.
- The user documentation will be written in the target language, in Finnish and
Norwegian, and in English.
3. Documentation - divvun.no
Thor Øivind has to get Tomcat working. See our task list for other issues.
4. Corpus gathering
- Lule Sámi:
- Ådå Testamennta (Svenska bibelsällskapet). They must be contacted.
- North Sámi:
- Børre will get in touch with the Crúbadan project about the se material there,
and access to it. Cf. [http://borel.slu.edu/crubadan/index.html]
- The language consultant in Guovdageaidnu is actively working, gathering public documents from
the municipality archives.
- Otherwise Børre awaits progress on the contract work:
- Sjur will check with Maaren about the translation from Finnish to Norwegian, and forward the
translation to Trond if not done by somebody else.
5. Corpus infrastructure
Lars Nygård has made a searchable web interface as a beta version:
Samisk online korpus for
documentation.
6. Linguistics
- NRK:as
- -eaddjiid vs. –eddjiid
- -eabbo/-abbo vs. –eabbu/-abbu
- -at/-ut vs. –et/-it
- -dettiin vs. –diin/-din
- -lit vs. –šit (kondišunal)
- Western -l- forms :
soappulin vs. soppolin (soppošin vs. soappušin)
- kond alternativ form livččon vs. livččen
- kárášjoh nieida vs. kárášjohnieida
- hálede:
+Du1:e K ; =hálede
+Du1:edne K ; =hálededne
+Du1:etne K ; =háledetne
- háliidit+V+TV+Ind+Prs+Sg3
háliida
háliid
- makkárat imperativhámit galget leahkit? Ovdamearkkat bárrastávval vearbbain: ```
+Pl1:Q3t K ; ! bohtot ! West
+Pl1:Y3t K ; ! boahttot ! East
+Pl2:Q2t K ; ! bohtet ! West
+Pl2:Y3t K ; ! boahttet ! East
Bárahis stávvalat:
háliidit+V+TV+Imprt+Prs+Du1
háliideadnu
háliideahkku
háliidit+V+TV+Imprt+Prs+Pl1
háliideadnot
háliideahkkot
háliidehkot
háliidetnot
háliidit+V+TV+Imprt+Prs+Pl2
háliideahkket
háliidehket
Leahkit:
+Du1:eadnu K ;
+Du1:eahkku K ;
+Pl1:ehkot K ;
+Pl:eatnot K ;
beakkán+A+Sg+Gen
beakkána
beakkán
beakkán+A+Sg+Acc
beakkána
beakkán
```
- Jeagohin vs. Jeagoheapmi
- rieban vs. rievan
- Goallossánit: kulturlávdegoddi / kultuvrralávdegoddi (sosial-, privat-, …)
jf. *kultur, kultuvra (privatriekta)
- Loatnasánit
To be further discussed within the project, and brought forward to the normative bodies.
7. Term db
Still more bugfixing, adjustments based on user feedback.
8. Other issues
Nothing.
9. Summary, task list
All:
- The www.divvun.no release.
- The Bugzilla bugs, and a quip or two?
- continue the discussion on the language policy of divvun.no in the newsgroup, esp.
re the open questions:
- How much in 5 languages?
- Size of general public info: Trond: only a few paragraphs - about the project, and the
language policy of the site. More specifically, it should include:
- what the divvun project is
- why it is made, by whom, on whose decision
- what we make, (why we do not make sma)
- how we make it
- how, when, and for whom it eventually will be used, and on what terms and price
- an explanation of our lg policy, and a good set of links.
- should news be in all languages? What about RSS feeds from Forrest?
Børre
- Contact Kevin Scannell of an Crúbadán fame, ask if we can use his north sami material.
- Contact Roy Dragseth about cvs-mailing
- Work on divvun.no
- Continue http://giellatekno.uit.no conversion to forrest …
- Contact Svenska bibelsällskapet
- Add issue to forrest issue tracker about utf-8 ihtml documents.
- Check for existing bug reports, it might already be in there
- Continue forrest i18n research
Maaren:
- continue missing list
- find and prepare issues for the language board meeting
- translate Helsinki contract into rough Norwegian; send it to Sjur,
then lawyer
Sjur:
- Check the translation of the Helsinki contract with Maaren; forward the task
to Trond if not yet assigned
- Ask Anne-Britt to update the contract with the University (we are getting 3, not
2, persons there from the fall)
- write the LIST option to our test tools, as requested in the discussion group.
- complete the action summary after our half-year evaluation
- fix risten.no bugs
- contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
- voice group-chat not working to Sámediggi
- add i18n bug to Forrest issue tracker
- add i18n menu as feature request to Forrest issue tracker
Thomas:
- Begin to prepare issues to giellalávdegotti čoahkkin
- Present issues in dicussion forum
- Finish the work on verb valency
- Start to look at Lule Sámi
Tomi:
- Start looking at the smj infrastructure, turn it into utf-8, look at whether updates
and improvements done for sme should be done for smj as well, in order to get a uniform
infrastructure.
- update catxml to remove sections in unwanted language(s)
- Look into shortening of three-part compounds, middle part
- decapitalisation of proper nouns when (if?) compounded, and when derived
(the oslolaš issue)
Trond
- translate Helsinki contract into rough Norwegian; send it to Sjur,
then to lawyer
10. Next meeting, closing
13.06.2005 10.00
Closed at 12.45