Universal notation for bilingual dictionaries

word1 [tabulator] word2 [tabulator] note1 [tabulator] note2 [tabulator] translator
Example:
# This is a comment line. English-German dictionary. Encoding UTF-8.
day	Tag	n:	most common word for day in German	John Smith
The first column holds a word in the first language, and the second column holds its translation in the
second language. These two columns store the actual translation. The remaining columns describe the
translation and identify the translator who made it.
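
The column layout above can be read with a few lines of Python. This is only a minimal sketch: the names Entry and parse_line are illustrative and not part of the notation, and it assumes that comment lines start with # and that missing trailing columns count as empty.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One dictionary line: two words, two notes and the translator."""
    word1: str
    word2: str
    note1: str = ""
    note2: str = ""
    translator: str = ""


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry, or None for comment lines and blank lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least two tab-separated columns: {line!r}")
    # Assumption: pad missing columns with "" and ignore anything past the fifth.
    fields = (fields + [""] * 5)[:5]
    return Entry(*(field.strip() for field in fields))


sample = "day\tTag\tn:\tmost common word for day in German\tJohn Smith"
entry = parse_line(sample)
print(entry.word1, "->", entry.word2, "|", entry.note1)  # day -> Tag | n:
```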

The first note column contains the word class. The standard values are:
m - masculine noun
f - feminine noun
n - neuter noun
pl - plural noun
n: - noun
v: - verb
adj: - adjective
adv: - adverb
prep: - preposition
conj: - conjunction
interj: - interjection

Alternatively, if the word is not in general use, the first note gives the field in which it is commonly used. The standard values are:
[abbr.] - abbreviation        [fin.] - finance                [myt.] - mythology
[agr.] - agriculture          [geo.] - geography              [phra.] - phrase
[astr.] - astronomy           [geol.] - geology               [phy.] - physics
[aut.] - automobile industry  [hist.] - history               [polit.] - politics
[bio.] - biology              [it.] - information technology  [rel.] - religion
[bot.] - botany               [law.] - law term               [sex.] - sexual term
[chem.] - chemistry           [mat.] - mathematics            [slang.] - slang term
[chil.] - children's speech   [med.] - medicine               [sport.] - sport term
[col.] - colloquial           [mil.] - military               [tech.] - technology
[el.] - electrotechnics       [mus.] - musical term           [vulg.] - vulgar term

Alternatively, the first note can hold a special note, which must be defined in each file that uses it. Special notes are written in parentheses ().
Example:
# Comment. Special note (dv) used for derived verb
work	some word	(dv)	note for this translation	John Smith
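
The three kinds of first-note values described above (word class, field of use, special note) can be told apart mechanically. The following is a rough sketch, not a reference implementation: classify_note1 and the category labels are made-up names, and it assumes that a field note is any text in square brackets and a special note any text in parentheses.

```python
import re

# Word-class markers taken from the list of standard values.
WORD_CLASSES = {"m", "f", "n", "pl", "n:", "v:", "adj:", "adv:",
                "prep:", "conj:", "interj:"}


def classify_note1(note: str) -> str:
    """Return 'word class', 'field', 'special', 'empty' or 'unknown'."""
    note = note.strip()
    if not note:
        return "empty"
    if note in WORD_CLASSES:
        return "word class"
    # Assumption: field notes look like [med.], special notes like (dv).
    if re.fullmatch(r"\[[^\[\]()]+\]", note):
        return "field"
    if re.fullmatch(r"\([^\[\]()]+\)", note):
        return "special"
    return "unknown"


for note in ["n:", "[med.]", "(dv)", "(med. ]"]:
    print(note, "->", classify_note1(note))
```

A note with mismatched brackets such as (med. ] falls into the "unknown" category, which makes it easy to spot.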

The second note column holds any information that cannot be expressed in the first note.

The history of our universal notation for bilingual dictionaries
After months of work with hundreds of dictionaries and millions of translations, we found that a good notation is essential for a good dictionary. This notation is very simple and allows everyone to use the dictionary. Plain text with UTF-8 encoding makes the translations accessible to everyone and keeps working with the dictionary simple. You can adapt the format to fit your dictionary, but always describe the changes in the comments of that dictionary.

We had to change from the previous XDF format to the universal dictionary notation because NASA already uses an XDF format (an extension of XML).

DOs
  • Always use UTF-8 encoding. If you do not use UTF-8, specify the encoding in the comments before the translations.
  • Pack the dictionary with maximum compression as a zip or rar archive.
  • Name the dictionary file after its languages, for example English-Spanish.txt (or English-Traditional_Chinese.txt). Names like these are easy for users to work with; a name like Document1.txt is not.
  • Always specify the special notes in comments.
  • If you discover a useful change to the universal dictionary notation, let us know so that everyone can benefit from it.
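
The naming, encoding and packing advice above could be followed with Python's standard zipfile module. This is a sketch under assumptions: dictionary_filename and write_and_zip are hypothetical helper names, spaces in language names are replaced by underscores as in the English-Traditional_Chinese.txt example, and "maximum compression" is taken to mean deflate level 9.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Sequence


def dictionary_filename(first_language: str, second_language: str) -> str:
    """Build a name such as English-Traditional_Chinese.txt."""
    def clean(name: str) -> str:
        return "_".join(name.split())
    return f"{clean(first_language)}-{clean(second_language)}.txt"


def write_and_zip(directory: Path, first_language: str, second_language: str,
                  comments: Sequence[str],
                  rows: Sequence[Sequence[str]]) -> Path:
    """Write comments and tab-separated rows as UTF-8 into a zip archive."""
    name = dictionary_filename(first_language, second_language)
    lines = [f"# {comment}" for comment in comments]
    lines += ["\t".join(row) for row in rows]
    text = "\n".join(lines) + "\n"
    archive = directory / Path(name).with_suffix(".zip")
    # ZIP_DEFLATED with compresslevel=9 is the strongest deflate setting.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        zf.writestr(name, text.encode("utf-8"))
    return archive


with tempfile.TemporaryDirectory() as tmp:
    archive = write_and_zip(
        Path(tmp), "English", "German",
        ["English-German dictionary. Encoding UTF-8."],
        [("day", "Tag", "n:", "most common word for day in German", "John Smith")],
    )
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        content = zf.read(names[0]).decode("utf-8")
print(names)
```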
DON'Ts
  • Do not try to save space by writing several translations per line. It makes working with the dictionary very complicated.
  • Do not mix the bracket styles of the notes, for example (med. ] or (med:).
  • Do not use /, \, ", ', . or brackets in the translations if you can avoid them. Users expect to find dog in a dictionary, not "dog"/('s).
  • Do not fill the dictionary with long phrases or sentences. Translations for go, visit and grandmother are much more useful than a single translation for "Mary went to visit her grandmother." Sentences do not belong in the dictionary; the only multi-word entries should be phrasal verbs and idioms.
  • Do not mix notes into the translations. It is one of the worst things you can do. The notes have their own columns.
  • Do not worry about duplicate words (10 translations of the word statue repeat statue 10 times as the English word). Compression takes care of that. Saving more space by changing the format leads to a much more complicated format. With tabulators as separators the format is already very compact. The xdf files also compress very well.
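
Some of the DON'Ts above can be checked automatically before a dictionary is published. The sketch below is one possible check, not an official tool: lint_row and note_problem are illustrative names, and the set of discouraged characters is an assumption that reads "braces" as all three bracket kinds.

```python
from typing import List, Optional

# Characters the guidelines discourage inside the translation columns.
# Assumption: "braces" covers (), [] and {}.
DISCOURAGED = set("/\\\"'.()[]{}")


def note_problem(note: str) -> Optional[str]:
    """Describe a bracket problem in a first note, or return None."""
    note = note.strip()
    if not note or note[0] not in "([":
        return None  # word-class notes such as n: or m are not bracketed
    closing = ")" if note[0] == "(" else "]"
    if not note.endswith(closing):
        return "opening and closing brackets do not match"
    if note[1:-1].endswith(":"):
        return "colon-style note wrapped in brackets"
    return None


def lint_row(word1: str, word2: str, note1: str = "") -> List[str]:
    """Return a list of human-readable problems for one dictionary row."""
    problems = []
    for label, word in (("word1", word1), ("word2", word2)):
        found = sorted(set(word) & DISCOURAGED)
        if found:
            problems.append(f"{label}: discouraged characters {' '.join(found)}")
    problem = note_problem(note1)
    if problem:
        problems.append(f"note1: {problem}")
    return problems


rows = [("dog", "Hund", "n:"),
        ('"dog"/(\'s)', "Hund", "(med. ]"),
        ("flu", "Grippe", "(med:)")]
for row in rows:
    print(row, lint_row(*row))
```

The first row passes cleanly; the other two reproduce the (med. ] and (med:) examples from the list and are reported.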