Meeting setup
- Date: 28.1.2008
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 09:58.
Present: Børre, Lene, Maaren, Per-Eric, Sjur, Thomas, Tomi, Trond
Absent: none
Agenda accepted as is.
Updated task status since last meeting
Børre
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- Hunspell lexicon conversion
- InDesign documentation
- investigate the NSIS installer
- release InDesign tools Jan. 30.
- fix bugs!
- other:
- worked on layouts for giellatekno.uit.no, and the coming oahpa.uit.no sites.
This work will also be incorporated into the divvun.no site.
Lene
- Pedagogical project, running till 31.12.08: making grammatical games (VISL)
and interactive dialogues on internet - for pupils, students and others:
Lene, Trond, Saara
- VISL games and quizzes are almost ready (ready for trial use, some adjustments
left to do): http://beta.visl.sdu.dk
- dialogues: Made a simple technical model, beginning to write the dialogues
- held a course for teachers at one school to get feedback
- made user documentation: OAHPA!-portal
Maaren
- Put the list of possible sma corpus sources into a document
- update the Changes document
Per-Eric
- check some unusual and missing words from the last Olavi missing list
- keep in contact with Kurt Tore's family about his texts, send a new contract
- Sent a new contract; Kurt Tore's wife has a contact person, A. Kintel
- try to visit S T Sandstrøm personally as soon as possible, maybe this week
- Sent a new contract; she is now very positive about giving us all her texts,
so a personal visit is not needed
- try to find other authors who have smj texts digitally
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
(source info, used in testing or not, added to lexicon or not)
- discuss more parallel texts
Sjur
- document Windows CD installation work-around
- unless we get feedback saying otherwise, the present documentation should be
ok
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- not done, but found an i18n issue on the G5/“internal risten.no”
- set up the Leopard Server features for collaborative support
- check the present sma sources
- name db/risten.no
- identified the issue with non-working browsers - locale mismatch
- improve hyphenation testing
- done, and several issues identified
- investigate the NSIS installer
- get hotel rooms in Snåsa
- not done yet, will do today
- make a first sma project plan
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- add verb paradigm generation bug to Bugzilla
- test that hyphenation is identical in InDesign and the command line tool
- done - they seem to be identical, which means we can trust the test results
- release InDesign tools Jan. 30.
- fix bugs!
- other:
- tested the automatic language identification in Word 2007, after user
feedback that it works just fine. It works for me as well. We probably need a
FAQ to relegate such issues to, where we can say something like “IF this, TRY
that”
- hyphenation and speller testing
Thomas
- look at test cases still not behaving properly
- create hyphenation test data
- release InDesign tools Jan. 30.
- fix bugs!
Tomi
- Hunspell lexicon conversion
- document how compounding is controlled in the PLX conversion
- release InDesign tools Jan. 30.
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- Reorganise documentation (with Børre and Sjur)
- Reorganised ped doc, otherwise not
- Gather sma texts (with Børre and Sjur)
- Look at the sma source files (with Sjur)
- Name lexicon project: Test editing xml files (when they are ready for it)
- Make a first sma project plan
- Looked at it myself, not in plenary
- fix bugs!
Pedagogical software online
We now have user documentation almost online (the technical one is part of
our TechDoc hierarchy). What is missing is a working URL.
Børre has been working on setting up a Forrest based site for the user front
end, with adjusted CSS styling, another layout, etc. This did not quite succeed.
The next step is to establish a URL, say http://oahpa.no/ or
http://oahpa.uit.no/, directing to that site.
It should be online on Feb. xx; the slick URL and professional layout should be
ready by YY.
TODO:
- Setting up the user documentation with an external address, and
cross-reference via tabs to giellatekno and divvun. (?)
- get an easy-to-remember URL (UiT/IT)
- More thorough skin, layout, … (external person within the Ped team,
internal Forrest expert). This will be postponed until later.
Workshop in Tromsø, end of February
Conference in Tromsø in week 9, February 28-29, on Sámi documentation and
revitalisation. The two teams should present our work, and our view on the
future. There is a start for it at plan/art/samdoc08/samdoc08.tex and
plan/art/samdoc08/samdoc08-sem.tex.
One of the goals for the conference is to make proposals for grant support.
First draft ready in Snåsa.
TODO:
- Presentation of our work
- Basic tools (Sjur, Trond, Thomas)
- Applications (Lene, Sjur)
- Corpus infrastructure (Børre, Saara, Sjur)
- Overall infrastructure (“Makefile”) (Sjur, Tomi)
- Plans for future work (Sjur, Trond)
- Relevance for other projects
- Standard written language texts (Trond)
- Existing written dialect texts (Lene, Trond)
- Existing dialect recordings (Lene)
- Turn the text into slides (samdoc08.tex into samdoc08-sem.tex) (Trond)
Documentation
TODO:
- start to reorganise the documentation (Børre, Sjur, Trond)
Corpus gathering
TODO:
- follow-up on the smj texts from Kurt Tore (Per-Eric)
- the text discussions will go via Anders Kintel
- go visit Sigga Tuolja Sandstrøm (Per-Eric)
- no need to go there, Per-Eric called her and sent her a contract
- need to talk to a person who has scanned the texts, will get the texts from
him, he will send them to Børre
- gather sma texts (Børre, Sjur, Trond)
- Put the list of possible corpus sources into a document,
gt/doc/lang/sma/sma-corpus-plan.jspwiki (Maaren)
Infrastructure
TODO:
- add Jabber account in iChat (all)
- improve forrest stability with i18n, site look (Børre, Sjur, Tomi)
- set up the Leopard Server features for collaborative support - permanent chat
rooms, project calendar(s), wiki? (Børre, Sjur)
Linguistics
North Sámi
Hyphenation bugs still there, now properly documented by the improved test
bench.
Lule Sámi
Hyphenation: same as for sme.
TODO:
- sme->smj lexicon conversion to build bilingual lexicon resources, and
increase smj coverage (Trond, Svenne).
- Add the words when all words are ready.
South Sámi
TODO:
- check the present sources (Sjur, Trond)
Name lexicon infrastructure
The upcoming dictionaries are:
- kven: fkvnob, nobfkv
- smesmj
- (smjsme)
- smenob
The kven work should be visible. The smj work should be reported this week;
smenob is part of the ped work and is of interest to the general audience.
Status quo: The dictionaries are shown online
(http://www.divvun.no:8889/index.html?locale=no), but do not give
translations. The site requires the locale request parameter to work properly
for most users.
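As a stopgap until the i18n bug is fixed, links to the dictionary site can be generated with the locale parameter always present. The following is only a minimal sketch: the helper name `with_locale` and the default value `no` are assumptions for illustration (the value is taken from the URL above), not part of the risten.no code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_locale(url: str, locale: str = "no") -> str:
    """Return url with a 'locale' query parameter, adding it only if missing.

    Hypothetical helper: the default locale 'no' is an assumption based on
    the example URL in this memo.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    # Keep a locale the caller already chose; otherwise add the default.
    query.setdefault("locale", locale)
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_locale("http://www.divvun.no:8889/index.html"))
    # prints http://www.divvun.no:8889/index.html?locale=no
```

An explicit locale already present in the link is left untouched, so the helper can be applied to every outgoing link without overriding a user's choice.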
Short-term goal: Have them work in risten GUI.
Long-term goal: Give them a better GUI and integrate in ped platform and other
places.
Decisions made in Tromsø can be found in [this meeting
memo.|/admin/physical_meetings/tromso-2006-08-propnoun.html]
TODO:
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
request) (Sjur)
- fix display in column 3 (Sjur)
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
well) (the morphological section should be kept intact, in e.g.
propernoun-sme-morph.txt) (Sjur, Saara)
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
(Sjur, Tomi, Saara)
- implement data synchronisation between risten.no and
the cvs repo, and possibly other servers (ie the G5 as an alternative server
to the public risten.no - it might be faster and better suited than the
official one; also local installations could be treated the same way)
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
(e.g. @type=secondary) (Thomas, Maaren, linguists)
- merge placenames which are erroneously in different entries: e.g. Helsinki,
Helsingfors, Helsset (linguists)
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
(linguists)
Hunspell
The %> marker does not survive into Hunspell to work as a boundary marker,
despite being defined as %> for the Hunspell version.
Priority list:
- debug the missing > marker
- add smj to the soup, make sure it works roughly as well as sme
- fix the remaining conversion bugs for sme
- return to smj, and fix whatever is left to fix
- integrate the derivations as separate “continuation lexicons”
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
- debug %> problem (Tomi)
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
added to lexicon or not) (Saara)
- test new and nested error markup (Sjur)
Automated testing
TODO:
- improve hyphenation testing (Sjur)
- add verb paradigm generation bug to Bugzilla (Sjur)
Open issues based on test results:
sme
Version: Davvisámi, version 1.0.1, 2008-01-28
- 425 - roman number - will not be fixed in 1.0 release
- 426 - comp words from Divvun.no - guoktedássásaš accepted - still open
- 536 - speller accepts “impossible” compound-forms, geažideapmigárvu and
giddesteapmisággi accepted - FIXED
- 593 - missing words in beta2, still missing Nuppelohkái - not lexicalized
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot)
- 597 - does not recognize nubbelohki - not lexicalized
- 603 - suomabealdi, norggabealdi accepted
- 606 - speller accepts VUOHTA compound
- 611 - double hyphen sugg still accepted
- 613 - short gen. as second compound part
- 625 - word+footnote - possibly Polderland or MS, or a consequence of allowing
spell checking of words including digits
- 627 - prefix + hyphen does not get accepted
- 629 - a taking part in compounding without hyphen
- 631 - numbers starting with 0
- 633 - double hyphens accepted in Word, not by cmdline speller
- 634 - PropGen+hyph+PropGen
- 637 - nai(go) becomes -naj(go)
smj
Version: Julevsáme, version 1.0.1, 2008-01-28
- 482 - Nuorttalijguovlojn accepted again
- testcase changed, test PASSED
- 607 - acro + hyphen, NRKGA accepted - still OPEN
- 615 - actio and actor compounds - FIXED
- 616 - Bispadime-me-ráden - still OPEN
- 618 - dipht. simpl. - FIXED
- 629 - a taking part in compound - still OPEN
- 631 - number compounds starting with 0
- 634 - Prop gen + hyphen + Prop gen
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
- document how compounding is controlled in the PLX conversion (Tomi)
The speller works in InDesign and InCopy. It lacks user-defined lexicons, but
Polderland is trying to fix this bug.
The new Sámi newspaper, Ávvir, is publishing its first edition on February
6th. We should release final InDesign tools, including final spellers, one week
ahead of that, Wednesday Jan. 30th.
TODO:
- improve hyphenation testing (Sjur)
- test that the hyphenation is identical in InDesign and the command line
hyphenator (Sjur)
- test twolc # bug solution (Tomi, Trond, Sjur)
- fix double hyphen bugs (Tomi)
- new lexicons by Tuesday (Tomi)
- updated Polderland tools by Wednesday (Sjur)
- final changes and bug fixes by Thursday afternoon (Thomas, Sjur, Tomi)
- final lexicons by Friday morning (Tomi)
Hyphenators
We need more test data, to test hyphenation of different types of words.
Thomas should make a file gt/sme|smj/testing/hyphenation.txt of the format:
compoundword com^pound#word
It should contain all possible word formation patterns and their correct
hyphenation. That is, at least:
- compounds
- derivations
- names
- misspellings
- compounds with acros, numbers, names, etc.
TODO:
- create hyphenation test data (Thomas)
- done
- also done: used it in testing, bugs discovered
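A test file in the format described above could be checked against a hyphenator roughly as follows. This is a minimal sketch under stated assumptions: the memo does not define the marks, so `^` is assumed to be a hyphenation point and `#` a compound boundary; the function names and the `hyphenate` callable are hypothetical stand-ins for the real command line hyphenator.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: ^ marks a hyphenation point, # a boundary between compound parts.
BOUNDARY_MARKS = "^#"


def parse_test_line(line: str) -> Tuple[str, str]:
    """Split one 'word expected-hyphenation' line into its two fields."""
    word, expected = line.split()
    # Sanity check: removing the marks from the expected form must give the word.
    stripped = expected.translate(str.maketrans("", "", BOUNDARY_MARKS))
    if stripped != word:
        raise ValueError(f"expected form {expected!r} does not match {word!r}")
    return word, expected


def run_tests(
    lines: Iterable[str], hyphenate: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every test line that fails."""
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, expected = parse_test_line(line)
        actual = hyphenate(word)
        if actual != expected:
            failures.append((word, expected, actual))
    return failures


if __name__ == "__main__":
    sample = ["compoundword com^pound#word"]
    # Toy hyphenator that returns the word unchanged, so the test fails.
    for word, expected, actual in run_tests(sample, lambda w: w):
        print(f"{word}: expected {expected}, got {actual}")
```

The sanity check in `parse_test_line` catches typos in the test data itself, so that a failing test points at the hyphenator and not at the test file.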
Windows installer
TODO:
- investigate the NSIS installer (Børre, Sjur)
Releases
TODO:
- update the Changes document (Maaren)
- release InDesign tools Jan. 30. (Børre, Sjur, Thomas, Tomi)
- compile new lexicons (Tomi)
- test (all)
- document (Sjur)
- package and release (Sjur)
Other
South Sámi project startup meeting
- in Snåsa
- 11th - 15th of Feb, kick-off meeting Wednesday 13.
- Participants: SD (incl. Divvun), Nord-Trøndelag fylkeskommune, Snåsa kommune,
UiT, “resource persons”, south Sámi part of SGL (at least one representative)
We extend the meeting on our part, to have this project’s first gathering.
TODO:
- get hotel rooms (Sjur)
- make a first sma project plan (Sjur, Trond)
Corpus contracts + open source
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
(Sjur)
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
bug report, such that for each bug, we know exactly when it should have been
fixed, in what file(s) and what version.
There are 83 open Divvun/Disamb bugs (45 speller-related, 38 other), and 23
risten.no bugs.
Next meeting, closing
The next meeting is 4.2.2008, 09:30 Norwegian time.
The meeting was closed at 11:28.
Appendix - task lists for the next five days
Børre
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- Hunspell lexicon conversion
- InDesign documentation
- investigate the NSIS installer
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- fix bugs!
Lene
- Ped project
- work on Tromsø Sami workshop paper
Maaren
- Put the list of possible sma corpus sources into a document
- update the Changes document
Per-Eric
- check some unusual and missing words from the last Olavi missing list
- keep in contact with Kurt Tore's family about his texts.
- try to find other authors who have smj texts digitally
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
(source info, used in testing or not, added to lexicon or not)
- discuss more parallel texts
Sjur
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- check the present sma sources
- name db/risten.no
- investigate the NSIS installer
- get hotel rooms in Snåsa
- make a first sma project plan
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- updated Polderland tools by Wednesday
- final changes and bug fixes by Thursday afternoon
- fix bugs!
Thomas
- look at test cases still not behaving properly
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- final changes and bug fixes by Thursday afternoon
- fix bugs!
Tomi
- Hunspell lexicon conversion
- document how compounding is controlled in the PLX conversion
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- debug %> problem in Hunspell conversion
- fix double hyphen bugs
- new lexicons by Tuesday
- final changes and bug fixes by Thursday afternoon
- final lexicons by Friday morning
- fix bugs!
Trond
- Report the smesmj project
- Start working on the samdoc talk
- sme->smj lexicon conversion to build bilingual lexicon resources
- Reorganise documentation (with Børre and Sjur)
- Gather sma texts (with Børre and Sjur)
- Look at the sma source files (with Sjur)
- Name lexicon project: Test editing xml files (when they are ready for it)
- Make a first sma project plan
- work on Tromsø Sami workshop paper
- fix bugs!