Dictionary tech meeting

Agenda:

homonymy entries
- how to generate the correct paradigm
How to parametrise (what? make?) on the lg/dic level
make a transducer containing all the dictionary entries
improvement timetable

Homonymy entries

It has been solved in sme. How?

Now: adding special tags (xml node attribute) after the lemma. This attribute is added to the lexc entry, which ties the transducer to the specific lemma in the dictionary.

Systematic homonomy

Example:

Textword lohkki is ambiguous:

	Nom	Gen	Norwegian
1.	lohkki	lohki	lokk
2.	lohkki	lohkki	lesar

We want to generate both paradigms, and bind each to the correct lexical entry.

xml:

1.
<e>
  <lemma pos="n">

2.
<e>
  <lemma pos="actor">

All actors follow the same pattern. To generate the corresponding paradigms:

lohkki+n+actor+sg+n
lohkki+n+sg+n

The corresponding xml entries looks like:

1.
   <e usage="ped" src="nj">
      <lg>
         <l pos="n">lohkki</l>
         <lc>lohkit</lc>
      </lg>
      <mg>
         <tg>
            <t pos="n">lokk</t>

2.
   <e usage="ped" src="nj">
      <lg>
         <l pos="n" subclass="actor">lohkki</l>
      </lg>
      <mg>
         <tg>
            <t pos="m">leser</t>

We introduce the @subclass to denote inflectional subclasses, like proper nouns, actor inflection like the above

Random homonymy

rett, rettar %[[domstol%] (germ. a-stem )
rett, retter %[[matrett%] (germ. i-stem)

   <e usage="ped" src="nj">
      <lg>
         <l pos="n" subclass="m1">rett</l>
         <lc>lohkit</lc>
      </lg>
      <mg>
         <tg>
            <t pos="n">domstol</t>

2.
   <e usage="ped" src="nj">
      <lg>
         <l pos="n" subclass="m2">rett</l>
      </lg>
      <mg>
         <tg>
            <t pos="m">matrett</t>

Here we use the inflection class as subclacc. That should uniquely identify the correct paradigm.

Improvement timetable

Friday 20.3. Then subversion.

How to parametrise (what? make?) on the lg/dic level

Make a transducer containing all the dictionary entries

Language Technology at UiT

Page Content