Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Links

Initially, this is just a collection of links to useful internal and external documents:

Development process

The main highlights from the Dixite document are the following process and its related tools:

  1. Planning - Apache Maven
  2. Requirement Analysis and Specification - ArgoUML, CVS and DocBook (XML) as well as LaTeX for (version control of) documentation
  3. Design and Specification - Dixite uses ArgoUML, and Xfig (runs under X11) for higher-level graphing. I suggest we use OmniGraffle for graphing purposes.
  4. Implementation and Unit Testing - a plethora of software, depending on the tasks at hand.
  5. System Integration and Testing - JUnit and Bugzilla
  6. Delivery and Maintenance - Bugzilla

So far we have not had any real development process. We have only done parts of steps 4 and 5, with no plan, design, or specifications behind them, and the only testing has been of the inflection in the linguistic models. (We have very recently added facilities for testing two-level rules, but they are not in much use yet.)

Software we have

We are already using several of the tools listed by Dixite (CVS, Bugzilla, XML-based documentation). If we set up a development process similar to the one above and list the software we currently use for each phase, we arrive at the following:

  1. Planning – nothing really (that is, MS Word and Excel); see the list of candidates for more info
  2. Requirement Analysis and Specification – Forrest
  3. Design and Specification – Forrest
  4. Implementation and Testing –
  5. System Integration and Testing –
  6. Delivery and Maintenance – Forrest, CVS, Bugzilla

Suggested additions to our tool chest

For the missing parts I suggest we look more into the following packages:

Updated software list

After having investigated several tools and options, the following set of tools is what we will use for the main phases of our project (editors of different types, like SubEthaEdit, emacs and XXE, are not listed, just assumed):

  1. Planning – Merlin, OmniOutliner, OmniGraffle, Forrest
  2. Requirement Analysis and Specification – Forrest, Merlin, ArgoUML, OmniOG
  3. Design and Specification – Forrest, Merlin, ArgoUML, OmniOG
  4. Implementation and Testing – Xerox, home-made testing routines, Bugzilla
  5. System Integration and Testing – Bugzilla, software from integrators, home-made testing routines
  6. Delivery and Maintenance – Forrest, CVS, Bugzilla

What is still open is how to test Aspell and similar engines, as well as the hyphenation. For Aspell and other Unix spellers, a lot of the testing can be automated, and the routines need to be worked out. All testing needs to be formalised, and the results should be collected systematically, preferably automatically, and stored and processed so that we can monitor the development of the tools.
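
As a starting point for discussion, here is a minimal sketch of what such an automated speller test could look like. It is not an agreed routine. It assumes a test set of (word, is-correct) pairs, an installed aspell binary with a dictionary for the language in question, and a plain CSV file as the results store. The function names (aspell_misspelled, evaluate, append_result) and the choice of precision and recall as the measures are all hypothetical and only there for illustration.

```python
import csv
import datetime
import subprocess


def aspell_misspelled(words, lang):
    """Return the set of words that `aspell list` reports as misspelled.

    Assumption: an `aspell` binary and a dictionary for `lang` are installed.
    Aspell does its own tokenising, so words with hyphens or apostrophes
    may come back split; a real routine would have to handle that.
    """
    result = subprocess.run(
        ["aspell", "--lang=" + lang, "list"],
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


def evaluate(cases, find_misspelled):
    """Score a speller against (word, is_correct) pairs.

    `find_misspelled` takes a list of words and returns the flagged ones,
    so any engine can be plugged in, not only Aspell.
    """
    flagged = set(find_misspelled([word for word, _ in cases]))
    caught = sum(1 for w, ok in cases if not ok and w in flagged)
    false_alarms = sum(1 for w, ok in cases if ok and w in flagged)
    missed = sum(1 for w, ok in cases if not ok and w not in flagged)
    precision = caught / (caught + false_alarms) if caught + false_alarms else 0.0
    recall = caught / (caught + missed) if caught + missed else 0.0
    return {"words": len(cases), "precision": precision, "recall": recall}


def append_result(stream, result, day=None):
    """Append one dated row to an open CSV stream, so runs can be compared over time."""
    day = day or datetime.date.today()
    csv.writer(stream).writerow([
        day.isoformat(),
        result["words"],
        "%.3f" % result["precision"],
        "%.3f" % result["recall"],
    ])
```

A run against Aspell would then be something like evaluate(cases, lambda ws: aspell_misspelled(ws, "en")), followed by append_result on a log file opened in append mode. Keeping the engine behind a plain function means the same scoring and logging could be reused for other spellers, and possibly for the hyphenator with a different test set.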

UML References

UML is new to most of us, so here’s a list of references.
