As mentioned in previous tips, three tools help a company maintain translation consistency at different levels: glossaries, style guides, and translation memories. Glossaries maintain consistency at the term level, translation memories maintain it at the sentence level, and style guides fill in the gaps by keeping style, tone, phrasing, and more consistent. This tip briefly defines translation memory and then points to large free translation memories available for download.
What is Translation Memory?
A translation memory is a database of parallel text (a parallel corpus). The text is broken up into translation units (TUs, usually sentences), each of which is matched with the corresponding translation unit in one or more other languages. This memory allows a translator or localization project manager to instantly “recall” translations that have already been completed, so that no sentence needs to be translated from scratch a second time. Helpful matches can be exact or partial (aka “fuzzy matches”). By providing easy access to these exact and partial sentence matches, translation memory saves both time and money. This is particularly true for revisions of previously translated files, since each new version typically reuses many sentences from earlier versions. Translation memory helps thousands of human translators around the world, and dozens of software applications include translation memory capabilities.
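The exact and fuzzy lookup described above can be sketched in a few lines of Python. This is only an illustration, not how any particular translation memory tool works: the `TM` dictionary, the `lookup` function, and the 0.75 threshold are all hypothetical, and the similarity score comes from the standard library's `difflib.SequenceMatcher` rather than from the matching algorithm of a commercial product.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source sentence -> target sentence.
TM = {
    "Click the Save button.": "Haga clic en el botón Guardar.",
    "Open the File menu.": "Abra el menú Archivo.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (match_type, source, target, score) for the best match, or None.

    The threshold is an assumed cut-off below which a fuzzy match is
    considered too weak to be useful.
    """
    # An exact match is a simple dictionary hit.
    if segment in memory:
        return ("exact", segment, memory[segment], 1.0)

    # Otherwise, score every stored source sentence and keep the closest one.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= threshold:
        return ("fuzzy", best_source, memory[best_source], best_score)
    return None


print(lookup("Click the Save button.", TM))    # exact match
print(lookup("Click the Cancel button.", TM))  # fuzzy match against the "Save" sentence
```

For a fuzzy match, the translator would typically receive the stored translation as a starting point and edit only the part that differs.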
Several free large translation memories are available for download. These downloads contain actual parallel corpora, not the translation memory tools to create or manage such corpora.
Government Translation Memory
- European Commission (millions of TUs in 22 EU languages)
- EU Constitution (thousands of TUs in 21 EU languages)
- European Parliament (millions of TUs in 11 EU languages)
- Stockholm Parallel Corpora (thousands of TUs in English, Greek, and Chinese)
Localization and Technical Translation Memory
- OpenOffice.org (tens of thousands of TUs in German, English, Spanish, French, Japanese, and Swedish)
- KDE (hundreds of thousands of TUs in 92 languages)
- PHP Manuals (thousands of TUs in 22 languages)
- European Medicines Agency (millions of TUs in 22 EU languages)
Media Translation Memory
- OpenSubtitles.org (millions of TUs in 30 languages)
- SETimes.com (millions of TUs in 9 Southeastern European languages)
Will these large translation memories include many matches for companies translating unrelated files? No. These memories are likely to be of greatest use when translating documents for the same organizations that created them. They are also of use to companies testing statistical machine translation engines, which require large parallel corpora for training.
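One rough way to check how useful a downloaded memory would be for a given file is to count how many of the file's sentences find an exact or fuzzy match in it. The sketch below assumes the source-language sentences of the memory have already been loaded into a list; the `match_report` function name and the 0.75 fuzzy threshold are illustrative assumptions, and the pairwise comparison shown here would be too slow for a memory with millions of TUs.

```python
from difflib import SequenceMatcher


def match_report(new_segments, tm_sources, fuzzy_threshold=0.75):
    """Count how many new segments have an exact, fuzzy, or no match."""
    counts = {"exact": 0, "fuzzy": 0, "none": 0}
    known = set(tm_sources)
    for segment in new_segments:
        if segment in known:
            counts["exact"] += 1
            continue
        # Best similarity against any stored source sentence (0.0 if the memory is empty).
        best = max(
            (SequenceMatcher(None, segment, source).ratio() for source in tm_sources),
            default=0.0,
        )
        counts["fuzzy" if best >= fuzzy_threshold else "none"] += 1
    return counts


# Hypothetical data: two stored sentences and three sentences from a new file.
tm_sources = ["Click the Save button.", "Open the File menu."]
new_file = [
    "Click the Save button.",
    "Click the Cancel button.",
    "Quarterly revenue rose sharply.",
]
print(match_report(new_file, tm_sources))
```

A file from an unrelated domain would be expected to land almost entirely in the "none" bucket, which is the situation described above.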
Most of the translation memories above are available at the OPUS website. Some organizations like TAUS and tool providers like Google and Wordfast also offer access to shared translation memories. Additional free and paid parallel corpora are available here, here, at the bottom of this Wikipedia article, and at TM Marketplace. Some might also reason that various localization glossaries are the equivalent of translation memories – the ambiguity of their classification comes because most translation units in software GUIs are very short.

