Dataset Translations
Translation files for datasets and related metadata in the SOS Data Catalog reside on the SOS machine under the /shared/sos/locale directory. There are three types of files used to translate datasets. Dataset name and description translations are defined in SOS playlists and in tab separated value (tsv) files, while major categories, subcategories, and keywords are defined in comma separated value (csv) files.
Each translation file follows the naming convention “xx_YY.zzz”, where “xx” is the ISO 639 language code, “YY” is the ISO 3166 country code, and “zzz” is the file format extension (sos, tsv, or csv). The locale information is extracted from the playlist filename, so it is critically important to use the correct language and country codes for the language being translated. All the translations for each language/country combination are defined together in the same sos, tsv, and csv files.
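As a quick check on the naming convention, the sketch below parses a filename into its language, country, and extension parts. It is only an illustration: the helper name parse_locale_filename is hypothetical, the pattern assumes two-letter language and country codes, and it also accepts the hds extension used for highlights files later in this document.

```python
import re

# "xx_YY.zzz": lowercase two-letter language code, uppercase two-letter
# country code, and a known extension. The two-letter lengths are an
# assumption made for this sketch.
LOCALE_FILE = re.compile(r"^([a-z]{2})_([A-Z]{2})\.(sos|tsv|csv|hds)$")

def parse_locale_filename(filename):
    """Return (language, country, extension), or None if the name does not match."""
    match = LOCALE_FILE.match(filename)
    return match.groups() if match else None

print(parse_locale_filename("zh_TW.tsv"))   # a name that follows the convention
print(parse_locale_filename("zh-tw.tsv"))   # wrong separator and letter case
```

A name that returns None here would most likely not be associated with the intended locale, so it is worth correcting before loading the file.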
You do not need to supply translations for the entire SOS Data Catalog at once for a locale. Any dataset information without a translation remains in English, and you can add more translations incrementally at any time.
English Language Overrides
The default locale is “en_US”, which is English in the United States. However, you are allowed to use en_US.sos, .tsv, and .csv files to make “translations.” While these aren’t technically translations, the definitions specified will override the default English information in the SOS Data Catalog, giving you the opportunity to customize the text there, if desired.
Dataset Translation Playlists
There are two options for translating the names and descriptions of datasets and their variations. The first of these is using standard SOS playlists (see the TSV Files section to use the other option). Either option works well, but playlists must be used if you have longer descriptions that require multiple paragraphs.
For translations, each dataset or variation is specified by three playlist properties: an include property with the dataset playlist path, a rename property with a translated name, and a description property with a translated description on one or more lines enclosed between {{ }} characters.
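To make the three properties concrete, here is a minimal sketch that formats one translated entry. The property names come from the text above; the helper name playlist_entry, the "key = value" spacing, and the sample path and strings are assumptions for illustration, so compare the output with a generated en_US.sos file before relying on it.

```python
def playlist_entry(include_path, name, description):
    """Format one translated dataset entry as include/rename/description lines."""
    return (
        f"include = {include_path}\n"
        f"rename = {name}\n"
        # The description may span several lines; it only has to sit
        # between the {{ and }} markers.
        "description = {{ " + description + " }}\n"
    )

# Hypothetical path and Spanish text, used only to show the layout.
print(playlist_entry(
    "/path/to/dataset/playlist.sos",
    "Temperatura del mar",
    "Primer párrafo.\n\nSegundo párrafo.",
))
```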
Translating Dataset Playlists for a New Language
- Generate playlist files from the SOS dataset catalog using translations2db --generate_playlist (see the translations2db Command Line Utility section for details)
- Copy /shared/sos/locale/generated/en_US.sos to /shared/sos/locale/xx_YY.sos, following the xx_YY.sos locale naming convention for the language and country for which you want to create a translation
- Replace the English values (to the right of the equals sign) for the rename and description keywords with translated values in a Linux text editor, such as vi or gedit. Be sure the description text is enclosed between {{ and }} characters
- Reload your dataset translations into the SOS Data Catalog using either the SOS Stream GUI or translations2db --load_playlists (see the translations2db Command Line Utility section for details)
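The steps above can be sketched as a small helper. This is an assumption-laden outline rather than an official tool: the function name start_playlist_translation is hypothetical, and it assumes translations2db is on the PATH and that the two flags named above need no further arguments (check the translations2db Command Line Utility section). By default nothing is executed and the planned steps are only returned for review.

```python
import shutil
import subprocess
from pathlib import Path

LOCALE_DIR = Path("/shared/sos/locale")

def start_playlist_translation(locale, run=False):
    """Generate the English playlist and copy it to <locale>.sos for editing."""
    source = LOCALE_DIR / "generated" / "en_US.sos"
    target = LOCALE_DIR / f"{locale}.sos"
    if run:
        # Assumes the flag works without extra arguments.
        subprocess.run(["translations2db", "--generate_playlist"], check=True)
        if target.exists():
            # Avoid overwriting translations that were already made.
            raise FileExistsError(f"{target} already exists; edit it instead")
        shutil.copyfile(source, target)
    return [
        "translations2db --generate_playlist",
        f"cp {source} {target}",
        f"(edit {target} in a text editor)",
        "translations2db --load_playlists",
    ]

for step in start_playlist_translation("fr_FR"):
    print(step)
```

The reload step is left to the operator on purpose, since it should only happen after the file has been translated.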
Editing Dataset Playlist Translations for an Existing Language
Editing a translation (or English override text) is done by simply modifying the values to the right of the equals sign of either rename or description keywords for any datasets in a translation playlist with your favorite text editor. Be sure the description text is enclosed between {{ and }} characters. Should you wish to remove a particular translation entirely, just delete the lines containing include, rename, and description for any datasets you no longer want.
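Because hand edits can easily drop a closing }}, a quick check before reloading may help. The sketch below is a hypothetical helper (check_description_braces is not part of SOS) that scans playlist text and reports {{ and }} markers that do not pair up; it assumes description blocks are never nested.

```python
import re

def check_description_braces(text):
    """Return a list of messages about unpaired {{ or }} markers."""
    problems = []
    open_line = None  # line number of the currently open "{{", if any
    for number, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"\{\{|\}\}", line):
            if token == "{{":
                if open_line is not None:
                    problems.append(
                        "line %d: {{ found inside the block opened on line %d"
                        % (number, open_line)
                    )
                open_line = number
            else:
                if open_line is None:
                    problems.append("line %d: }} has no matching {{" % number)
                open_line = None
    if open_line is not None:
        problems.append("line %d: {{ is never closed" % open_line)
    return problems

# The second description is missing its closing marker.
sample = "description = {{ ok }}\ndescription = {{ broken\nrename = x\n"
for message in check_description_braces(sample):
    print(message)
```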
TSV Files
The second option for translating the names and descriptions of datasets and their variations is with tab separated value (tsv) files. Either option works well, but tsv files have the advantage of being easily loaded into a standard spreadsheet for convenient editing. For translations, each dataset or variation is specified by four columns: A is the dataset ID number in the Data Catalog, B is the playlist file path, C is the dataset name, and D is the dataset description.
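For anyone processing these files with a script instead of a spreadsheet, the four-column layout can be read as sketched below. The field names are labels invented for this example, and the code assumes the file has no header row and no quoting; adjust it if your generated file differs.

```python
import csv
import io

# Labels for columns A-D as described above (names chosen for this sketch).
COLUMNS = ["dataset_id", "playlist_path", "name", "description"]

def read_dataset_tsv(handle):
    """Return one dict per non-empty row of a dataset tsv file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]

# Made-up row standing in for a real file opened with open(path, newline="").
sample = io.StringIO("12\t/path/to/playlist.sos\tExample Name\tExample description\n")
for row in read_dataset_tsv(sample):
    print(row["dataset_id"], row["name"])
```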
One highly efficient way to use this option is to upload the dataset tsv file you want to translate into a Google Sheet using Google Drive and then use the Google Translate formula to automatically translate all the text.
To complete the translation for this example, copy column C over column B (using Paste special > Paste values only), delete column C, then repeat these same steps for the dataset descriptions. Once the translations are all pasted as values, they may be edited by hand to fix errors and improve the quality of the translations.
The final steps are to download the sheet as a tsv file and copy it to /shared/sos/locale/zh_TW.tsv (replacing the original file there).
Reload your dataset translations into the SOS Data Catalog using translations2db --load_dataset_tsv (see the translations2db Command Line Utility section for more details).
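The Google Translate formula used in this workflow is the Google Sheets GOOGLETRANSLATE function, which takes the text, a source language code, and a target language code. As a small illustration, the sketch below builds the formula strings for a helper column; the helper name translate_formula and the choice of column C and rows 1 to 3 are assumptions. Note that Google uses a hyphenated code such as zh-TW, while the SOS filename uses an underscore (zh_TW).

```python
def translate_formula(cell, target_language, source_language="en"):
    """Build a Google Sheets formula that translates the text in `cell`."""
    return f'=GOOGLETRANSLATE({cell}, "{source_language}", "{target_language}")'

# Formulas for translating the dataset names in column C, rows 1 to 3.
for row in range(1, 4):
    print(translate_formula(f"C{row}", "zh-TW"))
```

After the formulas have produced their results, pasting them back as values (as described above) keeps the text from changing if the sheet is recalculated.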
Highlights (Spotlight) Translations
Highlight (spotlight) datasets were added in SOS 5.3 and are a small set of SOS datasets periodically selected by the NOAA SOS Team in Boulder. They provide an easy way to gradually explore the SOS Data Catalog over time and enable quick access to datasets that highlight timely events. Translations for these datasets are specified by a combination of normal dataset translations made in .sos or .tsv files and additional definitions in highlights dataset (.hds) files, which have a similar structure to a playlist. Note that highlights datasets are also associated with Highlights Categories used to classify a particular kind of highlights dataset. These are translated separately in the same csv files used for other dataset categories and keywords (see the following section).
For translations, each highlights dataset or variation is specified by several
properties: an include property with the dataset playlist path, a
TemporaryDatasetName property with a translated name (only used for temporary
datasets), and a WhyDescription property with a translated description of why
the dataset is being highlighted on one or more lines enclosed between {{ }}
characters. Note that the dataset name is not translated in the
hds file unless it is temporary (i.e., not in the SOS data catalog).
Regular dataset name translations are already made using .sos or
.tsv files (see the previous section).
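The conditional role of TemporaryDatasetName can be shown with a short sketch. The property names are taken from the text above, while the helper name highlights_entry, the "key = value" spacing, and the sample values are assumptions; compare the result with a generated en_US.hds file.

```python
def highlights_entry(include_path, why, temporary_name=None):
    """Format one highlights entry; the name line is written only for temporary datasets."""
    lines = [f"include = {include_path}"]
    if temporary_name is not None:
        lines.append(f"TemporaryDatasetName = {temporary_name}")
    lines.append("WhyDescription = {{ " + why + " }}")
    return "\n".join(lines) + "\n"

# A catalog dataset: its name is translated in the .sos or .tsv file instead.
print(highlights_entry("/path/to/playlist.sos", "Texto traducido"))
# A temporary dataset that is not in the catalog, so its name is given here.
print(highlights_entry("/path/to/temporary.sos", "Texto traducido", "Nombre temporal"))
```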
Translating Highlights (Spotlight) Datasets for a New Language
- Generate hds files from the SOS dataset catalog using translations2db --generate_highlights (see the translations2db Command Line Utility section for details)
- Copy /shared/sos/locale/generated/en_US.hds to /shared/sos/locale/xx_YY.hds, following the xx_YY.hds locale naming convention for the language and country for which you want to create a translation
- Replace the English values (to the right of the equals sign) for the TemporaryDatasetName and WhyDescription keywords with translated values in a Linux text editor, such as vi or gedit. Be sure the WhyDescription text is enclosed between {{ and }} characters
- Reload your dataset translations into the SOS Data Catalog using either the SOS Stream GUI or translations2db --load_highlights (see the translations2db Command Line Utility section for details)
Editing Highlights (Spotlight) Dataset Translations for an Existing Language
Editing a translation (or English override text) is done by simply modifying the values to the right of the equals sign of either TemporaryDatasetName or WhyDescription keywords for any datasets in an hds file with your favorite text editor. Be sure the WhyDescription text is enclosed between {{ and }} characters. Should you wish to remove a particular translation entirely, just delete the lines containing include, TemporaryDatasetName, and WhyDescription for any highlights datasets you no longer want.
Dataset Category and Keyword Translations
The SOS Data Catalog uses categories and keywords to organize and provide searching capabilities for the hundreds of datasets it holds. Each dataset has at least one “Major Category” and “Subcategory” to classify it and usually has one or more “Keywords” pertaining to its content. Each highlights (spotlight) dataset also has a “Highlights Category”. These metadata entities are localized using comma separated value (csv) files. Csv is a common import/export format for spreadsheets, such as Excel or Google Sheets. The csv files follow the naming convention xx_YY.csv, where “xx” is the ISO 639 language code and “YY” is the ISO 3166 country code. The default locale is en_US, which is English in the United States. However, if an en_US.csv file is present, it will not be imported into the SOS Data Catalog since American English values are already defined by default.
Each row in the csv file includes the type of metadata, the English text, and the translated text.
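As an illustration of that three-column layout, the sketch below loads rows into a lookup keyed by metadata type and English text. The type label "keyword" and the sample rows are made up for this example, and the code assumes there is no header row; rows that do not have exactly three columns are skipped.

```python
import csv
import io

def load_translations(handle):
    """Map (metadata type, English text) to the translated text."""
    lookup = {}
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        kind, english, translated = row
        lookup[(kind, english)] = translated
    return lookup

# Made-up rows; the quoted field shows how a comma inside the text is kept.
sample = io.StringIO('keyword,Ocean,Océano\nkeyword,"Snow, Ice",Nieve y hielo\n')
table = load_translations(sample)
print(table[("keyword", "Ocean")])
```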
Translating Categories and Keywords for a New Language
- Create an en_US.csv file (see the translations2db Command Line Utility section for details)
- Rename it to the correct locale name following the xx_YY.csv naming convention
- Load it into a spreadsheet program (or a text editor if preferred)
- Replace the last column of English text with translated values. Do not modify the text in the first two columns
- Export back to csv (if using a spreadsheet). Be sure the xx_YY.csv file is placed in the /shared/sos/locale/ directory
- Load the xx_YY.csv file into the SOS Data Catalog using either the SOS Stream GUI or translations2db command line utility
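Since a freshly created file starts with English text in its last column, one way to track progress is to list the rows where the last column still equals the English text. The helper below is a hypothetical sketch under the same assumptions about the layout (three columns, no header row); some terms, such as proper nouns, may legitimately stay the same in both columns.

```python
import csv
import io

def untranslated_rows(handle):
    """Return rows whose third column is still identical to the English text."""
    return [
        row for row in csv.reader(handle)
        if len(row) == 3 and row[1] == row[2]
    ]

# Made-up rows: the first has been translated, the second has not.
sample = io.StringIO("keyword,Ocean,Océano\nkeyword,Volcano,Volcano\n")
for row in untranslated_rows(sample):
    print(row)
```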
Editing Category and Keyword Translations for an Existing Language
- Load an existing xx_YY.csv file into a spreadsheet program (or a text editor if preferred)
- Update the last column of translated text with your edits. Do not modify the text in the first two columns
- Export back to csv (if using a spreadsheet). Be sure the xx_YY.csv file is placed in the /shared/sos/locale/ directory
- Reload the xx_YY.csv file into the SOS Data Catalog using either the SOS Stream GUI or translations2db command line utility