Terminology Management with DITA

DITA-OT-Day 2016 - Stefan Eike

Who am I?





Stefan Eike

Work

  • Technical Writer and Information Architect
  • Open Source DITA Developer

Education

  • Technischer Redakteur B.A. (Technical Writer)
  • Medieninformatiker M.Sc. (Media Informatics) coming soon

Open Source DITA Team

How to Contribute?

  • Join us: Code, test, write, translate and design.
  • Share ideas and create visions.
  • Give feedback and write bug reports.

Semiotic Triangle



Figure of the Semiotic Triangle

Terminology

Why Terminology?

  • Enhance the understandability of your documents by avoiding synonyms.
  • Speak with one voice: harmonized terminology increases the quality of your text.
  • Lower your translation costs. Harmonized (multilingual) terms lead to more matches in your Translation Memory System (TMS).
  • Prevent internal communication conflicts in your own team and with others.
  • Explain the meaning of your product-specific vocabulary to translators.

More languages + more synonyms = More words to handle

Where should terminology be applied?

Wherever there is text, there should be terminology

  • Software
  • Marketing material
  • Technical documentation
  • Websites
  • Catalogues
  • Brochures
  • Internal documents
  • Presentations
  • and so forth

org.doctales.terminology

What's that?

  • Plugin for the DITA-OT
  • New DITA topic type for terms
  • New DITA map type for terminology maps
  • <oXygen/> XML framework for authoring
  • Transformation scenarios
    • Termchecker for DITA
    • Termchecker for XLIFF
    • TBX-Basic
    • TBX-Min
    • Termbrowser (based on <oXygen/> webhelp)
    • Termbrowser Responsive (based on <oXygen/> webhelp)

New <termentry> Topic

  • New <termentry> DITA topic type, similar to the <glossentry> topic.
  • A term is the central unit of information.
  • The new topic type makes it possible to customize the behavior of <oXygen/> to better support terminology management.

<glossentry> Topic

Built for glossaries, not for terminology databases.

  • Not meant to be multilingual.
  • No elements for terminology metadata.
  • Standardized attribute values are missing, e.g. the @value attribute of <glossStatus> has no predefined values.
<glossentry id="usbfd">
  <glossterm>USB flash drive</glossterm>
  <glossdef>A small portable drive.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossUsage>
      Do not provide in upper case (as in "USB Flash Drive") 
      because that suggests a trademark.
    </glossUsage>
    <glossAlt>
      <glossAcronym>UFD</glossAcronym>
      <glossUsage>Explain the acronym on first occurrence.</glossUsage>
    </glossAlt>
    <glossAlt id="memoryStick">
      <glossSynonym>memory stick</glossSynonym>
      <glossUsage>This is a colloquial term.</glossUsage>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>stick</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossUsage>This is too colloquial.</glossUsage>
      <glossAlternateFor href="#usbfd/memoryStick"/>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>flash</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossUsage>This short form is ambiguous.</glossUsage>
    </glossAlt>
  </glossBody>
</glossentry>
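
Because <glossStatus> has no predefined value list, a team that wants consistent status values has to define them itself. One way to do this in standard DITA is a subject scheme map. The following is a minimal sketch and not part of the plugin; the key names termStatus, preferred and admitted are assumptions chosen for illustration (only "prohibited" appears in the example above).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE subjectScheme PUBLIC "-//OASIS//DTD DITA Subject Scheme Map//EN" "subjectScheme.dtd">
<subjectScheme>
  <!-- Hypothetical set of allowed status values -->
  <subjectdef keys="termStatus">
    <subjectdef keys="preferred"/>
    <subjectdef keys="admitted"/>
    <subjectdef keys="prohibited"/>
  </subjectdef>
  <!-- Bind the value set to the @value attribute of <glossStatus> -->
  <enumerationdef>
    <elementdef name="glossStatus"/>
    <attributedef name="value"/>
    <subjectdef keyref="termStatus"/>
  </enumerationdef>
</subjectScheme>
```

Every project has to maintain such a binding on its own, which is the kind of overhead a dedicated topic type with fixed attribute values avoids.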

<termentry> Topic

  • Multiple languages in one topic
  • Compatible with DITA-OT processing
  • Total control of element names and structures
  • Easier to create XSL transformations, the Schematron Termchecker, the Termbrowser and <oXygen/> XML author frameworks that do not interfere with the standard DITA frameworks
  • Strict element and attribute structures simplify UX-focussed development
  • Content can still be shared between a <glossentry>-based glossary (e.g. for the end user) and the <termentry>-based terminology base (e.g. for internal terms), for example by reusing <ph> elements
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE termentry PUBLIC "-//DOCTALES//DTD DITA DOCTALES Termentry//EN" "termentry.dtd">
<termentry id="truck">
  <title>Truck</title>
  <definition>
    <definitionText>A truck is a motor vehicle designed to transport cargo.</definitionText>
    <definitionSource>
      <sourceReference href="https://en.wikipedia.org/wiki/Truck" 
        format="html" scope="external">
        Wikipedia
      </sourceReference>
    </definitionSource>
  </definition>
  <termBody>
    <agreedWith>
      <termCommitteeMember>Heinrich Müller</termCommitteeMember>
      <termCommitteeMember>Günther Schmidt</termCommitteeMember>
    </agreedWith>
    <fullForm language="de-DE" usage="preferred">
      <termVariant>Lastkraftwagen</termVariant>
    </fullForm>
    <acronym language="de-DE" usage="preferred">
      <termVariant>LKW</termVariant>
    </acronym>
    <fullForm language="en-US" usage="preferred">
      <termVariant>truck</termVariant>
    </fullForm>
    <fullForm language="en-GB" usage="preferred">
      <termVariant>lorry</termVariant>
    </fullForm>
    <fullForm language="en-GB" usage="notRecommended">
      <termVariant>truck</termVariant>
    </fullForm>
  </termBody>
  <relations>
    <relatedTerms>
      <relatedTerm keyref="car"/>
    </relatedTerms>
  </relations>
</termentry>
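
The <relatedTerm keyref="car"/> element above only resolves if a map defines the key car. As a minimal sketch, the keys could be declared in a standard DITA map as shown below. This uses the plain <map> and <topicref> elements, not the plugin's own terminology map type, whose element names are not shown here; the file names truck.dita and car.dita and the map title are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Vehicle terminology</title>
  <!-- Each term topic gets a key so that other terms can reference it by keyref -->
  <topicref keys="truck" href="truck.dita"/>
  <topicref keys="car" href="car.dita"/>
</map>
```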

<oXygen/> Author Framework

Why a framework?

  • Tagless XML editing: Technical Writers (mostly) prefer WYSIWYG-like editing and don't want to see XML tags.
  • UX-focussed
  • Fun

Screenshot

Termchecker

Schematron based

  • Creates Schematron Quick Fixes.
  • Prerequisite: @xml:lang of DITA Topic must match @language of <termNotation>.
<sch:pattern id="truck-d35e134">
  <sch:rule context="text()">
    <sch:report test="contains(/*/@xml:lang, 'en-GB') and contains(., 'truck')"
                role="warning"
                sqf:fix="sqfGroupTruckd35e134">The term 'truck' is not allowed.</sch:report>
    <sqf:group id="sqfGroupTruckd35e134">
      <sqf:fix id="termTruckd35e1284">
        <sqf:description>
          <sqf:title>Replace with an allowed term: 'lorry'</sqf:title>
          <sqf:p>A truck is a motor vehicle designed to transport cargo.</sqf:p>
        </sqf:description>
        <sqf:stringReplace regex="truck">lorry</sqf:stringReplace>
      </sqf:fix>
    </sqf:group>
  </sch:rule>
</sch:pattern>
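
To illustrate the prerequisite, here is a minimal sketch of a topic that the pattern above would flag. The topic id, title and sentence are made up for illustration; what matters is that @xml:lang on the root element is en-GB and that a text node contains the string "truck".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="delivery" xml:lang="en-GB">
  <title>Delivery</title>
  <body>
    <!-- Reported as a warning: 'truck' is not recommended for en-GB -->
    <p>The truck arrives at the loading bay every morning.</p>
  </body>
</topic>
```

Applying the quick fix would replace "truck" with "lorry" in the flagged text node. Note that contains() is a plain substring test, so words such as "trucks" would match as well.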

Termchecker Screencast

Demo

Installation

This chapter explains how to install the org.doctales.terminology plugin into the DITA-OT and how to integrate its framework into <oXygen/>.

Make sure you have installed:

  • DITA-OT 2.3 or higher
  • <oXygen/> 18 or higher

  1. In the DITA-OT directory, call:
    ./bin/dita -install https://github.com/doctales/org.doctales.terminology/archive/master.zip
    The plugin is installed into your DITA-OT.
  2. To integrate the <oXygen/> XML framework, open Options > Preferences in <oXygen/>.
  3. Go to Document Type Association > Locations.
  4. Add the directory of the plugin as an additional framework directory, e.g.:
    ~/DITA-OT/plugins/org.doctales.terminology
    Note: You can find detailed instructions for installation and usage in the README.md.

Goals for the future

Visualization of the semantic net

Any ideas?

Contact

Thank you