Multilingual Magazine’s List of Localization Terms, Definitions, Acronyms and Abbreviations


AAMT Asia-Pacific Association for Machine Translation CCJK Simplified Chinese, Traditional Chinese, Japanese and Korean
ACE automatic content enrichment CCS coded character set
ACR abstract character repertoire CDATA character data
AD audio description CE Common Era
ADR automated dialog replacement CEE Central and Eastern Europe
AM authoring memory CEF character encoding form
AMT automated machine translation CES character encoding scheme
ANSI American National Standards Institute CEO chief executive officer
APDU application protocol data unit CFO chief financial officer
API application programming interface CGI common gateway interface
ASCII American Standard Code for Information Interchange CGO chief globalization officer
ASL American Sign Language CHT Chinese for Taiwan
ASP application service provider CI community interpreting
ATA American Translators Association CIC corporate intelligence center
ATAG Authoring Tool Accessibility Guidelines CID character identifier
ATSUI Apple Type Services for Unicode Imaging CIO chief information officer
CJK Chinese, Japanese and Korean
B2B business-to-business CJKV Chinese, Japanese, Korean and Vietnamese
B2C business-to-consumer CL controlled language
BCE Before the Common Era CLA cross-lingual application
bidi bidirectional text CLAT controlled language authoring technology
BLEU Bilingual Evaluation Understudy CLC controlled language checker
BMP basic multilingual plane CLDR Common Locale Data Repository
BOM byte order mark CM content management; character map
BPO business process outsourcing CMM capability maturity model
BRIC Brazil, Russia, India and China CMS content management system
CNS Chinese National Standard
CAD computer-aided design CNT contents files
CAGR compound annual growth rate COLT connection optimized link technology
CAI computer-assisted interpretation COM component object model
CAP cultural adaptation process CP code page
CAT computer-aided/assisted translation CRM customer relationship management
CBMT context-based machine translation CRPG computer role-playing game
CBT computer-based training CS compound strings
ML markup languages SCM supply chain management
MLS multiple listing service SDK software development kit
MLV multilanguage vendor SDML signed document markup language
MMOG massively multiplayer online game SEL self-extensible language
MMORPG massively multiplayer online role-playing game SEO search engine optimization
MT machine translation SGML standard generalized markup language
MUD multiuser domain SL source language
MUI multilingual user interface SLA service level agreement
MWS multilingual workflow system SLV single language vendor
NLP natural language processing SMB small and medium-sized businesses
NLS national language support SME small and medium-sized enterprises; subject matter expert
OASIS Organization for the Advancement of Structured Information Standards SMG screen management guidelines
OAXAL OASIS Open Architecture for XML Authoring and Localization SMI structure of management information
OBJ object files SMT statistical machine translation
OCR optical character recognition SMTP simple mail transfer protocol
ODBC open database connectivity SMTS statistical machine translation software
OEM original equipment manufacturer SOA service-oriented architecture
OLG online gaming SOAP Simple Object Access Protocol
OPEX operating expenses SOP standard operating procedure
OPI over-the-phone interpretation SOV subject-object-verb
OS operating system SRX Segmentation Rules eXchange
OSS open-source software ST source text
OTA over-the-air STE Simplified Technical English
P&L profit and loss STT speech-to-text
PC personal computer; politically correct SVO subject-verb-object
PCDATA parsed character data T&D transmission and distribution
PDA personal digital assistant TBX TermBase eXchange
PDF portable document format TC Traditional Chinese; technical committee
PDI power distance index TEnT translation environment tool
PEST political, economic, sociocultural, technological TES transfer encoding syntax
PIL patient information leaflet TIF Terminology Interchange Format
PIM personal information manager TL target language
PM project manager; project management TM translation memory
PO purchase order TMF terminology markup framework
PoA plan of action TMS translation memory system
POS part of speech TMX Translation Memory eXchange
POSIX portable operating system interface TOC table of contents
PPC pay per click TR technical report
PRC People’s Republic of China TRP translation request package
Q&A questions and answers TSP translation service provider
QA quality assurance TTK Translation Toolkit format
QC quality control TTS text-to-speech
R&D research and development TU translation unit
RBMT rule-based machine translation 24/7 something that happens around the clock, seven days a week
TXML Tracker extensible Markup Language
RC resource code files UAE United Arab Emirates
RDF Resource Description Framework UCD Unicode Character Database
RES resource files UCS universal character set
RFC request for comments UI user interfaces
RFP request for proposal ULF universal learning format
RFQ request for quote UN United Nations
RLV regional language vendor UPT universal personal telecommunications
ROA return on assets URI uniform/universal resource identifier
ROI return on investment URL uniform resource locator
ROK Republic of Korea UTC coordinated universal time; Unicode Technical Committee
RONA return on net assets UTX Universal Terminology Exchange
RPG role-playing game VAR value-added reseller
RQM resource quality management VBA Visual Basic for Applications
RTF rich text format VC venture capital
RTL right-to-left VFY viscose filament yarn
RTT real-time translation VID visual interface design
SaaS software as a service VISCII Vietnamese Standard Code for Information Interchange
SBMT statistical-based machine translation VOIP voice over internet protocol
SC Simplified Chinese VPN virtual private network
SCL system control language VR virtual reality; voice recognition
W3C World Wide Web Consortium XAML Extensible Application Markup Language
WAN wide area networks XCCS Xerox Character Code Standard
WAP wireless application protocols XDR External Data Representation
WBS work breakdown structure XHTML Extensible HyperText Markup Language
WBT web-based training XLIFF XML Localization Interchange File Format
WCM web content management XML Extensible Markup Language
WIP work in progress xml:tm XML-based Text Memory
WORM write-once, read-many XSL Extensible Stylesheet Language
WSDL Web Service Description Language XSLT Extensible Stylesheet Language Transformation
WYSIWYG what you see is what you get ZWNBSP zero width no-break space



A/B testing. In the context of marketing and business intelligence, a randomized experiment with two variants, A and B, which are the control and treatment in the controlled experiment. It is a form of statistical hypothesis testing with two variants.
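
As a rough illustration of the hypothesis test behind an A/B experiment, the sketch below compares two conversion rates with a two-proportion z-test. The function name, the visitor counts and the choice of a z-test are assumptions made for this example; real experiments may call for a different test.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: variant A is the control, variant B the treatment.
z, p = two_proportion_z(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone.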

Abductive reasoning. In artificial intelligence and philosophy, reasoning based on possible or hypothesized causes or explanations. It involves inferring the best or most plausible explanation from a given set of facts or data.

Abilene Paradox. A paradox in which a group of people collectively decides on a course of action that is counter to the preferences of any of the individuals in the group. It involves a common breakdown of group communication in which each member mistakenly believes that his or her own preference is counter to the group’s and, thus, the person does not raise objections.

advanced leveraging. Within computer-aided translation tools, advanced leveraging combines statistical analysis and linguistic intelligence to create a new category of fuzzy matches that can lead to an increase in translation productivity. It features full-text indexing capabilities that allow users to search and retrieve text strings of any length, such as full and fuzzy segments, paragraphs, terms and even subsegments.

agglutination. In linguistics, combining short words or word elements into a single word in order to express a compound idea.

agile. In this context, agile methods break tasks into small iterations with minimal planning. Each iteration involves a team working through a full software development cycle, for example, which speeds up release of the product.

American National Standards Institute (ANSI). An organization of American industry groups that work with other nations to develop standards in facilitating telecommunications, character encoding and international trade.

American Sign Language (ASL). The dominant sign language of the deaf community in the United States, in the English-speaking parts of Canada and in parts of Mexico. Although the United Kingdom and the United States share English as a spoken and written language, British Sign Language is quite different from ASL and not mutually intelligible.

American Standard Code for Information Interchange (ASCII). The worldwide standard for the code numbers used by computers to represent all the uppercase and lowercase Latin letters, numbers, punctuation and other symbols.

anglophone. Someone who speaks the English language natively or by adoption. The term specifically refers to people whose cultural background is primarily associated with the English language, regardless of ethnic and geographical differences.

application programming interface (API). A software interface that enables applications to communicate with each other. An API is the set of programming language constructs or statements that can be coded in an application program to obtain the specific functions and services provided by an underlying operating system or service program.

application service provider (ASP). A service, usually a business, that provides remote access to an application program across a network protocol, typically HTTP. A common example is a website that other websites use for accepting payment by credit card as part of their online ordering systems.

Association of Southeast Asian Nations. A geopolitical and economic organization of ten countries located in Southeast Asia, which was formed in 1967 by Indonesia, Malaysia, the Philippines, Singapore and Thailand. Since then, membership has expanded to include Brunei, Burma (Myanmar), Cambodia, Laos and Vietnam.

audio description (AD). A term used to describe the descriptive narration of key visual elements in a video or multimedia product. AD makes the visual images of media accessible for people who are blind and visually impaired. The visual is made verbal. In AD, narrators typically describe actions, gestures, scene changes and other visual information. They also describe titles, speaker names and other text that may appear on the screen.

Authoring Tool Accessibility Guidelines (ATAG). Authoring tools are software and services that web developers and other “authors” use to produce web content. ATAG documents explain how to make the authoring tools themselves accessible, so that people with disabilities can create web content, and help authors create more accessible web content – specifically: enable, support and promote the production of content that conforms to Web Content Accessibility Guidelines.

automated machine translation (AMT). AMT and Caterpillar Technical English are development project collaborations between Caterpillar, Inc., and Carnegie Mellon University to further improve the creation and translation of technical documentation into three core languages: Spanish, French and German.

automatic content enrichment (ACE). A bridge between single language websites and localization, ACE technology associates English words and phrases on web pages with pop-ups containing information in a user’s native language.


back translation. The process of translating a document that has already been translated into another language back to the original language – preferably by an independent translator.

Balkans. A geopolitical and cultural region of southeastern Europe. The region takes its name from the Balkan Mountains, which run through the center of Bulgaria into eastern Serbia.

Baltic states. The Baltic states are three countries in northern Europe, all members of the European Union: Estonia, Latvia and Lithuania. After centuries of foreign domination, the Baltic countries were reestablished as independent nations in the aftermath of World War I in 1918-1920.

bidirectional (writing system). A writing system in which text is generally flush right, and most characters are written from right to left, but some text is written left to right as well. Arabic and Hebrew are the only bidirectional writing systems in current use.
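
To make the mix of directions in a bidirectional script concrete, the following sketch looks up the Unicode bidirectional class of each character in a string using Python's standard `unicodedata` module. The helper name `bidi_classes` and the sample string are assumptions for illustration only.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# "L" = left-to-right, "R" = right-to-left (e.g. Hebrew),
# "AL" = right-to-left Arabic letter, "EN" = European number.
for ch, bidi_class in bidi_classes("a\u05d0\u06281"):
    print(f"U+{ord(ch):04X} {bidi_class}")
```

Rendering software uses these classes to decide in which direction each run of text should be laid out.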

bidirectional text (bidi). A mixture of characters within a text where some are read from left to right and others from right to left. Bidirectional or bidi refers to an application that allows for this variance.

Big5. The name of the Chinese character set and encoding used extensively in Taiwan. Big5 is not a national standard, but is equivalent to the first two planes of CNS 11643-1992.

Bilingual Evaluation Understudy (BLEU). An algorithm for evaluating the quality of text that has been machine translated from one natural language to another. Quality is considered to be the correspondence between a machine’s output and that of a human. The closer that a machine translation is to a human translation, the better it is. BLEU was one of the first metrics to achieve a high correlation with human judgments of quality and remains one of the most popular. Scores are calculated for individual translated segments – generally sentences – by comparing them with a set of good quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation’s overall quality. Intelligibility or grammatical correctness is not taken into account.

bitext. A merged document composed of both source language and target language versions of a given text. Bitexts are generated by a piece of software called an alignment tool, which automatically aligns the original and translated versions of the same text.
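
The segment-level comparison described in the BLEU entry can be sketched roughly as follows. This is a deliberately simplified illustration, not the full metric: it uses a single reference, whitespace tokenization and only unigrams and bigrams, and the function names are assumptions made for this example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU-style score for one candidate against one reference."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(simple_bleu("the cat sat down", "the cat sat"))
```

Because the score only counts overlapping word sequences, a fluent but differently worded translation can score poorly, which matches the entry's note that intelligibility and grammatical correctness are not measured.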

bloggerati (sing. bloggerato). Adapted from literati, the term refers to “A-list bloggers” – popular and/or celebrity bloggers in the blogging community.

bodyshopping. The practice of using offshore resources and personnel to do small disaggregated tasks within a business environment without any broader intention to offshore an entire business function.

branding. A name, logo, slogan and/or design scheme associated with a product or service. Brand recognition and other reactions are created by the use of the product or service and through the influence of advertising, design and media commentary. A brand is a symbolic embodiment of all the information connected to the product and serves to create associations and expectations around it. A brand often includes a logo, fonts, color schemes, symbols and sound that may be developed to represent implicit values, ideas and even personality.

break-even point. The amount of sales or revenues that a company must generate in order to equal its expenses. In other words, it is the point at which the company neither makes a profit nor suffers a loss; there is no net loss or gain. Break-even analysis provides insight into whether or not revenue from a product or service has the ability to cover the costs of production of that product or service. Company executives can use this information in making a wide range of business decisions, including setting prices, preparing competitive bids and applying for loans.
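
One common way to work out a break-even point is to divide fixed costs by the contribution margin per unit (selling price minus variable cost). The sketch below assumes that simple single-product model; the function name and figures are hypothetical.

```python
def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Units that must be sold for revenue to equal total costs."""
    margin = price_per_unit - variable_cost_per_unit
    if margin <= 0:
        raise ValueError("price must exceed variable cost to ever break even")
    return fixed_costs / margin

# Hypothetical product: 50,000 in fixed costs, sold at 25 with 15 variable cost per unit.
print(break_even_units(50_000, 25.0, 15.0))
```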

BRIC. An acronym that refers to the fast growing and developing economies of Brazil, Russia, India and China.

business ethics. Examines ethical principles and moral or ethical problems that arise in a business environment. It applies to all aspects of business conduct and is relevant to the conduct of individuals and business organizations as a whole.

byte-order mark (BOM). A Unicode character that indicates the byte order of the Unicode text that follows.
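
As a small illustration, a program can inspect the first bytes of a file to see whether a BOM is present and which encoding it signals. The sketch below uses the BOM constants from Python's standard `codecs` module; the helper name `sniff_bom` is an assumption for this example.

```python
import codecs

# UTF-32 marks are checked first because the UTF-32 LE mark begins
# with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))
```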


captive center. A company-owned offshore operation. The activities are performed offshore, but they are not outsourced to another company.

cascading style sheet (CSS). An external format that determines the layout of tagged file formats such as HTML.

casual games. A category of electronic or computer games targeted at a mass audience. Casual games usually have a few simple rules and an engaging game design, thereby making it easy for a new player to begin playing the game in just minutes. Casual games require no long-term time commitment or special skills to play, and there are comparatively low production and distribution costs for the producer.

Catalan. A Romance language, the national and official language of Andorra, and a co-official language in the Spanish autonomous communities of the Balearic Islands, Catalonia and Valencia – where it is known as Valencian – and in the city of Alghero on the Italian island of Sardinia. Although with no official recognition, it is also spoken in the autonomous communities of Aragon and Murcia in Spain, and in the historic Roussillon region of southern France.

Caterpillar Technical English (CTE). Consists of a controlled vocabulary – approximately 80,000 technical terms – and all of the English grammatical structures required when writing technical documentation. CTE ensures that automated machine translation is able to translate what authors write in English.

Catch-22. A term coined by Joseph Heller in his 1961 novel Catch-22, describing a false dilemma where no real choice exists. A familiar example of this circumstance occurs in the context of job searching. In moving from school to a career, a graduate may encounter a Catch-22 where one cannot get a job without work experience, but one cannot gain experience without a job.

CE marking. The letters CE are the abbreviation of the French phrase Conformité Européenne, which literally means European conformity. CE marking on a product is a manufacturer’s declaration that the product complies with the essential requirements of the relevant European health, safety and environmental protection legislation.

Central America. The central geographic region of the Americas. It is the southernmost, isthmian portion of the North American continent, which connects with South America on the southeast. Central America has traditionally consisted of Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua and Panama.

Central and Eastern Europe (CEE). Predominantly used to describe former Communist countries in Europe after the collapse of the Iron Curtain in 1990. Later, it became an abbreviation mostly – though still not precisely defined – referring to the European countries east of Germany and south to the Balkan states. In most cases it includes Poland, Czech Republic, Slovakia, Hungary, Romania, Bulgaria, and the Baltic states of Estonia, Latvia and Lithuania. It sometimes also includes Belarus, Ukraine, Moldova and Russia.

CESU-8. Similar to UTF-8, CESU-8 is a way of representing Unicode text. CESU-8 uses six bytes for supplementary characters and is not appropriate for data interchange.
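
The six-byte representation can be illustrated with a small sketch. Python has no built-in CESU-8 codec, so the helper below (a hypothetical `cesu8_encode`) splits each supplementary character into a UTF-16 surrogate pair and encodes each surrogate as three UTF-8-style bytes; it is meant only to show the byte counts.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: supplementary characters become two 3-byte surrogates."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split the code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            # Characters in the basic multilingual plane match UTF-8 exactly.
            out += ch.encode("utf-8")
    return bytes(out)

emoji = "\U0001F600"  # a supplementary-plane character
print(len(emoji.encode("utf-8")), len(cesu8_encode(emoji)))
```

The same character takes four bytes in UTF-8 and six in CESU-8, which is one reason the two forms should not be mixed in data interchange.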

character. The smallest component of written language that has semantic value. A printed or written letter or symbol. In computing, the binary code used to represent a letter or symbol.

character identifier (CID). The key used to access outline (glyph) data in CID-keyed fonts.

character set or charset. A defined set of characters used by a specific computer system where no coded representation is assumed. The mapping of characters from a writing system into a set of binary codes such as ANSI or Unicode.

CJKV. The abbreviation for the languages Chinese, Japanese, Korean and Vietnamese.

cloud computing. A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the internet. Users need not have knowledge of, expertise in or control over the technology infrastructure in the “cloud” that supports them. The term cloud is used as a metaphor for the internet based on how the internet is depicted in computer network diagrams and is an abstraction for the complex infrastructure it conceals.

CNS. The Chinese National Standard (CNS) 11643-1992 defines a total of 48,027 characters and applies the EUC-TW (extended UNIX code-Taiwan) to one-, two- and four-byte encoding.

code page. A table that defines the numeric index (computer code point value) associated with each character in a specific set of characters. Each character in a code page has a numerical index.
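
To see that the numeric index really depends on the table, the sketch below looks up the same character in two single-byte code pages using Python's built-in codecs. The helper name `byte_for` and the choice of `cp1252` and `cp437` are assumptions for illustration.

```python
def byte_for(char: str, code_page: str) -> int:
    """Return the single-byte index that `code_page` assigns to `char`."""
    encoded = char.encode(code_page)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {code_page}")
    return encoded[0]

# The same letter sits at a different index in each table.
for code_page in ("cp1252", "cp437"):
    print(code_page, hex(byte_for("\u00e9", code_page)))

# Conversely, one byte value read through two tables yields two different characters.
print(bytes([0xE9]).decode("cp1252"), bytes([0xE9]).decode("cp437"))
```

This mismatch is why text decoded with the wrong code page shows up as garbled characters.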

code sweep. A special tool that scans program code to identify areas where character encoding will cause problems. Newer, internationalized code anticipates these problems.

Commonwealth of Independent States. A regional organization whose participating countries are former Soviet Republics formed during the breakup of the Soviet Union. The CIS is a loose association of states and is in no way comparable to a federation. Its official members are Armenia, Azerbaijan, Belarus, Kazakhstan, Kyrgyzstan, Moldova, Russia, Tajikistan and Uzbekistan. Turkmenistan and Ukraine are unofficial member states.

community interpreting. A type of interpreting service that is particularly vital in communities with large numbers of ethnic minorities, enabling those minorities to access services where, due to the language barrier, they would otherwise find it difficult. Situations where such interpreters are necessary typically include medical, educational, housing and legal areas. Community interpreters need not only to be fluent in the language they are interpreting but also to be familiar with the public services involved.

computational linguistics. The engineering of systems that process or analyze written or spoken natural language. It is concerned with the computational aspects of the human language. Its goal is to provide computers with the ability to produce and interpret human language.

computer-aided translation (CAT). Computer technology applications that assist in the act of translating text from one language to another.

computer-based training (CBT). A form of education in which the student learns by executing special training programs on a computer.

conditional text. Content within a document that is meant to appear in some renditions of the document, but not other renditions. The text is conditional in the sense that its inclusion or variation depends on which version of the document is being produced.

conference interpreting. The interpretation of a multilingual conference or meeting, either simultaneously or consecutively. International institutions such as the European Union and the United Nations hold multilingual meetings that often need to be interpreted into several foreign languages, usually done via headset by behind-the-scenes conference interpreters.

consecutive interpreting. The interpreter begins his or her interpretation of a complete message after the speaker has stopped producing the source utterance. At the time that the interpretation is rendered, the interpreter is the only person in the communication environment who is producing a message. Normally, in consecutive interpreting, the interpreter is alongside the speaker, listening and taking notes as the speech progresses. When the speaker has finished or comes to a pause, the interpreter reproduces the message in the target language, in its entirety and as though he or she were making the original speech.

content management system (CMS). A system used to store and subsequently find and retrieve large amounts of data. CMSs were not originally designed to synchronize translation and localization of content, so most have been partnered with globalization management systems.

content marketing. Any marketing that involves the creation and sharing of media and publishing content in order to acquire and retain customers from a clearly defined target audience. This information can be presented in a variety of formats, including news, video, white papers, ebooks, infographics, case studies and how-to guides. Content marketing creates interest in a product through educational, entertaining or informative material. Successful content marketing relies on providing consistent, high-quality content designed to solve people’s problems.

controlled authoring. Writing for reuse and translation. Controlled authoring is a process that integrates writing with localization so that the text can be written for reuse and at the same time written for efficient translation.

controlled languages. Subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Also, stylistic rules – such as not using certain verb tenses or the passive voice – can be created, depending upon the group or organization and its language usage goals.

controlled vocabulary. The standardization of words that may be used to search an index, abstract or information database. There is usually a published listing or thesaurus of preferred terms identifying the system’s vocabulary.

corpus (pl. corpora). A large body of natural language text used for accumulating statistics on natural language text. Corpora often include extra information such as a tag for each word indicating its part-of-speech and perhaps the parse tree for each sentence.

creole language. A stable language that originates from a mixture of various languages. The majority of creole languages are based on English, Portuguese, French, Spanish and other languages – their superstrate language – with local or immigrant languages as substrate languages. The lexicon of a creole usually consists of words clearly borrowed from a superstrate language, except for phonetic and semantic shifts; on the other hand, the grammar often has original features and may differ substantially from those of the superstrate language.

cross-reference. As a noun, an instance within a document that refers to related or synonymous information elsewhere, usually within the same work. As a verb, the action of making this connection.

crowdsourcing. The act of taking a task traditionally performed by an employee or contractor and outsourcing it to an undefined, generally large group of people, in the form of an open call. For example, the public may be invited to develop a new technology, carry out a design task, refine an algorithm or help capture, systematize or analyze large amounts of data.

Cyrillic alphabet. Actually a family of alphabets, subsets of which are used by certain East and South Slavic languages – Belarusian, Bulgarian, Macedonian, Russian, Rusyn, Serbian and Ukrainian – as well as many other languages of the former Soviet Union, Asia and Eastern Europe. With the accession of Bulgaria to the European Union (EU) on January 1, 2007, Cyrillic became the third official alphabet of the EU.


DAU/MAU. Daily active users divided by monthly active users. Measures the percentage of players that show up every day to social games. If a game’s DAU/MAU is .3, then around a third of the game’s total players are checking in at least once each day. DAU/MAU is commonly thought to show how addictive a game is.
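
The ratio itself is a single division, as the minimal sketch below shows. The function name `stickiness` and the user counts are hypothetical.

```python
def stickiness(daily_active_users, monthly_active_users):
    """DAU/MAU ratio: the share of monthly players who show up on a given day."""
    if monthly_active_users <= 0:
        raise ValueError("monthly active users must be positive")
    return daily_active_users / monthly_active_users

# Hypothetical game: 30,000 daily players out of 100,000 monthly players.
ratio = stickiness(30_000, 100_000)
print(f"DAU/MAU = {ratio:.2f}")
```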

Darwin Information Typing Architecture (DITA). An XML-based architecture for authoring, producing and delivering technical information. This architecture consists of a set of design principles for creating “information-typed” modules at a topic level and for using that content in delivery modes such as online help and product support portals on the web.

data mining. Analysis of data in a database using tools that look for trends or anomalies without knowledge of the meaning of the data. Data mining uses computational techniques from statistics and pattern recognition.

desktop publishing (DTP). Using computers to lay out text and graphics for printing in magazines, newsletters, brochures and so on. A good DTP system provides precise control over templates, styles, fonts, sizes, color, paragraph formatting, images and fitting text into irregular shapes.

diacritic. A mark or sign placed under, over or through a Latin script character that indicates a modification in the phonetic value of the character with which it is associated.
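
In Unicode, many accented Latin letters can be decomposed into a base letter plus a combining mark, which gives a simple way to see (or strip) diacritics in software. The sketch below uses Python's standard `unicodedata` module; the helper name `strip_diacritics` is an assumption, and the approach only handles marks that Unicode treats as separable.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("caf\u00e9 na\u00efve"))
# Letters whose mark runs through the character, such as U+00F8 (o with stroke),
# have no decomposition and are left unchanged.
print(strip_diacritics("\u00f8"))
```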

dialect. A variety of a language used by people from a particular geographic area. The number of speakers and the area itself can be of arbitrary size. A dialect is a complete system of verbal communication – oral or signed but not necessarily written – with its own vocabulary and/or grammar.

diaspora. A dispersion of a people from their original homeland or the dispersion of an originally homogeneous entity, such as a language or culture.

diphthong. A complex speech sound or glide that begins with one vowel sound and gradually changes to another within the same syllable, such as coin, loud and side.

disambiguation. The process of rewriting or reconstructing a sentence so that one of its possible meanings is singled out.

document type definition (DTD). States what tags and attributes are used to describe content in SGML documents, where each tag is allowed, and which tags can appear within other tags.

domain. A knowledge domain that a user is interested in or is communicating about. A group of computers or devices that share a common directory database and are administered as a unit.

dongle. A security or copy-protection device for commercial computer programs. Programs can use a dongle query at the start of a program to determine if the registration is valid and to terminate if the correct code is not present.

double-byte character set (DBCS). This term has two basic meanings. In CJK (Chinese-Japanese-Korean) computing, the term traditionally means a character set in which every graphic character not representable by an accompanying SBCS (single-byte character set) is encoded in two bytes. Han characters would generally comprise most of these two-byte characters. The term can also mean a character set in which all characters – including all control characters – are encoded in two bytes.
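
The first meaning can be observed directly by encoding a mixed string in a legacy CJK encoding. The sketch below uses Python's built-in `shift_jis` codec as one example of such an encoding; the helper name `byte_widths` is an assumption for illustration.

```python
def byte_widths(text: str, encoding: str = "shift_jis"):
    """Pair each character with the number of bytes it occupies in `encoding`."""
    return [(ch, len(ch.encode(encoding))) for ch in text]

# The ASCII letter stays in the single-byte range; the Han character takes two bytes.
for ch, width in byte_widths("A\u6f22"):
    print(f"U+{ord(ch):04X}: {width} byte(s)")
```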

double-byte languages. Languages such as Chinese, Japanese and Korean (CJK) that use twice as much memory because their characters are more complex and graphical than Roman alphabet letters. CJK languages are character-based with each character referring to an idea as opposed to a specific shape.

dubbing. In filmmaking, the process of recording or replacing voices for a motion picture. The term is most commonly used in reference to voices recorded that do not belong to the original actors and speak in a different language than the actor is speaking.

Dynamic Quality Framework (DQF). Designed by TAUS and over 50 cocreators, DQF provides a commonly agreed-upon approach to select the most appropriate translation quality evaluation models and metrics depending on specific quality requirements. The underlying process, technology and resources affect the choice of quality evaluation model. This environment is designed to ensure that members apply best practices for their MT evaluations, whether selecting a translation engine, measuring productivity or evaluating the final quality of translations.


EEA-EFTA states. The European Economic Area (EEA) comprises the member states of the European Union (EU), except Croatia, plus Iceland and Norway. It was established on 1 January 1994 following an agreement between the member states of the European Free Trade Association (EFTA) and the European Community, which later became the EU. It allows the EFTA-EEA states to participate in the EU’s internal market without being members of the EU.

e-governance. The public sector’s use of information and communication technologies with the aim of improving information and service delivery, encouraging citizen participation in the decision-making process and making government more accountable, transparent and effective.

e-government. Refers to a government’s use of information technology to exchange information and services with citizens, businesses and other arms of government. E-government may be applied by the legislature, judiciary or administration in order to improve internal efficiency, the delivery of public services or the processes of democratic governance.

e-learning. The use of internet technology for learning outside of a physical classroom.

80/20 Rule. Also known as Pareto’s Principle, the law of the vital few and the principle of factor sparsity. The rule states that for many phenomena, 80% of the consequences stem from 20% of the causes. Management thinker Joseph M. Juran suggested the principle, and it was named after the Italian economist Vilfredo Pareto, who observed that 80% of income in Italy was received by 20% of the Italian popula­tion. The assumption is that most of the results in any situation are determined by a small number of causes. This idea is often applied to data such as sales figures: “20% of clients are responsible for 80% of sales volume.” Such a statement is testable, is likely to be correct and may be helpful in decision making.

embedded media. Media that can be included in an HTML page, such as RealAudio files or GIF animations. Web browsers use multipurpose internet mail extensions (MIME types), a specification for formatting these non-ASCII messages so that they can be sent over the internet. When a browser finds a file in an HTML document with a MIME extension such as .gif, the browser knows to display that file as an image. Many email clients also support MIME.

embedded system. Hardware and software that make up a component of a larger system, often for real-time response, that is expected to function without human intervention.

encoding scheme. Rules for assigning numeric values (code points) to characters. Encoding is a method by which a character set is turned into computerized form for transmission and preservation.

endangered language. A language that is at risk of falling out of use, generally because it has few surviving speakers. If it loses all of its native speakers, it becomes an extinct language.

enterprise application interface (EAI). Created to facilitate the flow of information and to connect transactions among distributed and complex applications and business processes within enterprises.

enterprise resource planning (ERP). An amalgamation of a company’s information systems so that data from various functions such as human resources, inventories and financials are bound together and linked to customers and vendors.

escort interpreting. The interpreter accompanies a person or a delega­tion on a tour, on a visit or to a meeting or interview. These specialists interpret on a variety of subjects, both on an informal basis and on a professional level, and most of the interpretation is consecutive.

ETSI. European Telecommunications Standards Institute, one of the world’s most influential producers of telecommunications standards.

ETSI ISG LIS. An industry specification group that was formed in the spring of 2011 within ETSI to take over the Localization Industry Standards Association (LISA) standards portfolio, including related LISA intellectual property, after LISA was declared insolvent on February 28, 2011. ETSI ISG LIS now owns such standards as TBX and TMX.

European. Refers to languages such as English, French, Russian and Greek that use single-byte encoding schemes for their alphabets.

European Union (EU). An intergovernmental and supranational union of 27 democratic member states. The EU was established under that name in 1992 by the Treaty on European Union (the Maastricht Treaty).

extended UNIX code (EUC). A multibyte encoding design used to encode Japanese, Chinese, Korean and Taiwanese on UNIX systems.

Extensible Markup Language (XML). A markup language/specification pared down from SGML, an international standard for the publication and delivery of electronic information, designed especially for web documents.

Extensible Stylesheet Language (XSL). A language for expressing style sheets, controlling formatting and other output behavior.


FIGS. An acronym for the languages French, Italian, German and Spanish.

file transfer protocol (FTP). A common way to move files between host computers and sometimes personal computers.

francophone. Used to describe a French-speaking person. Geopolitically, it refers to a person who speaks French as a first language or who self-identifies with this language group. As an adjective, it means French-speaking, whether referring to individuals, groups or places.

free text. Data that is entered into a field without any formal or predefined structure other than the normal use of grammar and punctuation.

freelance translator. Also known as a freelancer, an independent translator who sells his or her services to a client on a job-to-job basis or without a long-term commitment to any one employer.

full match. A source text segment that corresponds exactly (100%) with a previously stored sentence in a translation memory tool.

fuzzy match. Refers to the situation when a phrase or sentence in a translation memory (TM) is similar (but not a 100% match) to the sentence or phrase the translator is currently working on. The TM tool calculates the degree of similarity or “fuzziness” as a percentage figure.
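
As a rough illustration of how a similarity percentage might be computed, the sketch below uses Python's standard difflib module. This is an assumption made for illustration only: commercial TM tools use their own, usually proprietary, scoring methods, and the function names and the sample memory here are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(segment: str, stored: str) -> int:
    """Return a similarity percentage (0-100) between two segments."""
    ratio = SequenceMatcher(None, segment.lower(), stored.lower()).ratio()
    return round(ratio * 100)


def best_match(segment: str, memory: dict) -> tuple:
    """Return (score, stored source, stored target) for the closest TM entry."""
    return max((fuzzy_score(segment, src), src, tgt) for src, tgt in memory.items())


# Hypothetical two-entry translation memory (English source, French target).
memory = {
    "Save the file": "Enregistrer le fichier",
    "Close the window": "Fermer la fenêtre",
}

score, source, target = best_match("Save the files", memory)
print(f"{score}% match with '{source}' -> '{target}'")
```

A score of 100 corresponds to a full match; anything lower is a fuzzy match that the translator would normally review and edit.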


gamification. The use of game design, game thinking and game mechanics to enhance nongame contexts.

GB 18030. A non-Unicode code page extending the traditional Chinese standard and containing room for 1.6 million characters. GB 18030 can include one-, two- or four-byte characters and includes support for Mongolian, Tibetan, Yi and Uyghur, as well as all previously supported Chinese scripts.

Geert Hofstede. An influential Dutch writer on the interactions between national cultures and organizational cultures, and the author of several books, including Culture’s Consequences: Comparing Values, Behaviors, Institutions and Organizations Across Nations and Cultures and Organizations: Software of the Mind, coauthored with his son Gert Jan Hofstede. Hofstede’s study demonstrates that national and regional cultural groupings affect the behavior of societies and organizations and that they are persistent across time.
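
The one-, two- and four-byte forms can be observed with the gb18030 codec that ships with Python. The sketch below is illustrative only; the three sample characters and the helper name are chosen here and are not part of the standard's text.

```python
def gb18030_length(ch: str) -> int:
    """Return the number of bytes GB 18030 uses to encode one character."""
    return len(ch.encode("gb18030"))


# Sample characters chosen for illustration.
samples = {
    "Latin letter A": "A",            # ASCII range
    "Han character U+4E2D": "\u4e2d",  # common Chinese character
    "Tibetan letter U+0F40": "\u0f40",  # outside the legacy two-byte range
}

for label, ch in samples.items():
    print(f"{label}: {gb18030_length(ch)} byte(s)")
```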

gist translation. A less-than-perfect translation performed by machine or automatic translation.

Global information management Metrics eXchange – Volume (GMX-V). A word and character count standard for electronic documents. GMX-V was developed and maintained by OSCAR (Open Standards for Container/Content Allowing Re-use), a special interest group of LISA (Localization Industry Standards Association). GMX-V, one of the tripartite series of standards from LISA, deals with electronic document metrics. GMX is made up of the following standards: GMX-V – Volume; GMX-C – Complexity; and GMX-Q – Quality.

global positioning system (GPS). The only fully functional global navigation satellite system. Utilizing a constellation of at least 24 medium earth orbit satellites that transmit precise microwave signals, the system enables a GPS receiver to determine its location, speed, direction and time. GPS is funded by and controlled by the US Department of Defense. While there are many thousands of civil users of GPS worldwide, the system was designed for and is operated by the US military.

globalization (g11n). Refers to the process that addresses business issues associated with launching a product globally, such as integrating localization throughout a company after proper internationalization and product design. In g11n, the common abbreviation for globalization, the 11 refers to the 11 letters between the g and the n.

globalization management system (GMS). Focuses on managing the translation and localization cycles and synchronizing those with source content management. Provides the capability of centralizing linguistic assets in the form of translation databases, leveraging glossaries and branding standards across global content.

glocal. Derived from the combination of the words global and local. The word refers to the creation or distribution of products or services intended for a global or transregional market, but customized to suit local language, laws and culture.

glocalization. A blending of the words globalization and localization, the term refers to the individual, group, division, unit, organization or community that is willing and able to think globally and act locally. Glo­calization emphasizes that the globalization of a product is more likely to succeed when the product or service is adapted specifically to each locality or culture in which it is marketed.

glossarization. Refers to the process of locating and translating product-specific terminology. All available materials undergo a linguistic review, then are compiled and translated to ensure consistency and fluency among different versions.

glossary. In the context of localization, a glossary is a list of source language terms paired with a list of corresponding terms in the target language.

glyph. The shape representation or pictograph of a character.

GNU. Short for GNU’s Not UNIX. GNU is a UNIX-compatible software system that is nonproprietary.

google. As a verb, refers to using the Google search engine to obtain information on the web.

gross domestic product (GDP). One of the measures of national income and output for a given country’s economy. The most common approach to measuring and quantifying GDP is the expenditure method: GDP = consumption + gross investment + government spending + (exports – imports).

gross margin. The amount of contribution to the business enterprise, after paying for direct-fixed and direct-variable unit costs, required to cover overheads (fixed commitments) and to provide a buffer for unknown items. It expresses the relationship between gross profit and sales revenue.
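
The expenditure method above is simple arithmetic, sketched here in Python. The function name and the figures are hypothetical and serve only to show how the terms combine.

```python
def gdp_expenditure(consumption, gross_investment, government_spending,
                    exports, imports):
    """GDP by the expenditure method: C + I + G + (X - M)."""
    return (consumption + gross_investment + government_spending
            + (exports - imports))


# Hypothetical figures, in billions of an unspecified currency.
gdp = gdp_expenditure(
    consumption=600,
    gross_investment=200,
    government_spending=250,
    exports=150,
    imports=180,
)
print(gdp)
```

Note that net exports (exports minus imports) are negative in this made-up case, so they reduce the total.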

guanxi. A central concept in Chinese society, describing the basic dynamic in personalized networks of influence. Guanxi is, in part, a personal connection between two people in which one is able to prevail upon another to perform a favor or service or be prevailed upon. The two people need not be of equal social status. It could also be a network of contacts, which an individual can call upon when something needs to be done and through which he or she can exert influence on behalf of another.


hangul. Invented in the fifteenth century, the native alphabet of the Korean language, as opposed to the nonalphabetic hanja system borrowed from China. Each hangul syllabic block consists of several of the 24 letters (jamo) – 14 consonants and 10 vowels.

hanja. The Korean name for Chinese characters. More specifically, it refers to those Chinese characters borrowed from Chinese and incorporated into the Korean language with Korean pronunciation.
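
The relationship between a syllabic block and its jamo can be seen through Unicode normalization, sketched below with Python's standard unicodedata module. The helper names and the sample word are chosen here for illustration.

```python
import unicodedata


def to_jamo(text: str) -> list:
    """Decompose precomposed hangul syllable blocks into conjoining jamo (NFD)."""
    return list(unicodedata.normalize("NFD", text))


def to_blocks(jamo: list) -> str:
    """Recompose a jamo sequence into syllable blocks (NFC)."""
    return unicodedata.normalize("NFC", "".join(jamo))


word = "\ud55c\uae00"  # the word "hangul" written in hangul, two syllable blocks
jamo = to_jamo(word)
for ch in jamo:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Each of the two blocks in the sample decomposes into three jamo (initial consonant, vowel, final consonant), and recomposing them returns the original word.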

hanzi. A logogram, literally meaning Han character, used in writing Chinese. These Chinese characters have also been borrowed for use in Japanese (kanji), less frequently Korean (hanja), and formerly Vietnamese (hán tự), and other languages.

hard-coding. Refers to the software development practice of embedding data directly into the source code or fixed formatting. Hard-coding requires the program’s source code to be changed any time the desired data changes, when it might be more convenient to the end user to change the detail by some means outside the program.

hashtags. A community-driven convention for adding additional context and metadata to tweets. Hashtags have the hash or pound symbol (#) preceding the tag, for example, #collegefootball, #Beatles or #oilspill. Hashtags can occur anywhere in a tweet.

hidden Markov model (HMM). A statistical technique with training algorithms that can process a large quantity of training data and can automatically train a system to recognize particular speech patterns.

hiragana. A flowing phonetic subscript of the native Japanese writing system. In hiragana, all of the sounds of the Japanese language are represented by 50 syllables.
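
A minimal sketch of the difference, in Python. The message table, function names and locale codes are hypothetical; in a real product the table would normally live in an external resource file rather than in the source code.

```python
def greet_hard_coded(name: str) -> str:
    """Hard-coded: the English text is embedded in the program logic."""
    return "Hello, " + name + "!"


# Externalized alternative: text is looked up by locale, so adding or
# changing a language means editing data, not program logic.
MESSAGES = {
    "en": "Hello, {name}!",
    "de": "Hallo, {name}!",
    "es": "¡Hola, {name}!",
}


def greet(name: str, locale: str = "en") -> str:
    """Look up the greeting for a locale, falling back to English."""
    template = MESSAGES.get(locale, MESSAGES["en"])
    return template.format(name=name)


print(greet_hard_coded("Ana"))
print(greet("Ana", "de"))
```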

Hispanic. A term that historically denoted relation to ancient Hispania (geographically coinciding with the Iberian peninsula – modern-day Spain, Portugal, Andorra and Gibraltar) and/or to its pre-Roman peoples. The term now refers to the culture and people of Spain plus the Spanish-speaking countries of the Americas.

homograph. One of two or more words that have the same spelling but differ in origin, meaning and sometimes pronunciation. An example is wind (weather) and wind (activity).

homophone. A word that has the same pronunciation as another but different meaning, derivation or spelling. Examples are there and their, foe and faux, and time and thyme.

honorific. Linguistic honorifics convey formality, social distance, politeness, humility, deference or respect through the choice of an alternate form such as an affix or change in person and number. In Japanese, for example, the system of honorifics is extensive and man­datory in many social situations.

HyperText Markup Language (HTML). A markup language that uses tags to structure text into headings, paragraphs, lists and links, and tells a web browser how to display text and images on a web page.


“I” form interpretation. Interpretation in the first person, where the interpreter acts as a neutral portal and attempts to capture the feeling and tone of whomever he or she is interpreting for.

ideographic language. A written language in which each character represents an idea, concept or other component of meaning, rather than pronunciation alone. Japanese kanji, Chinese hanzi and Korean hanja are examples of ideographic writing systems.

information retrieval. The science of searching for information in documents, searching for documents themselves, searching for metadata that describe documents or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the internet or intranets, for text, sound, images or data.

input method editor (IME). A way to input via keyboard that makes use of additional windows for character editing or selection in order to facilitate entry of alternate writing systems.

International Organization for Standardization (ISO). A network of national standards institutes from 145 countries working in partnership with international organizations, governments, industry, business and consumer representatives. ISO acts as a bridge between public and private sectors.

internationalization (i18n). Especially in a computing context, the process of generalizing a product so that it can handle multiple languages and cultural conventions – currency, number separators, dates and so on – without the need for redesign. In i18n, the common abbreviation for internationalization, the 18 refers to the 18 letters between the i and the n.

Internationalization Tag Set (ITS). A set of attributes and elements designed to provide internationalization and localization support in XML. ITS 2.0 is the current version of the standard.

internaut. A slang term for a designer, operator or technically capable professional user of the internet, someone who is ultra-familiar with the internet as an entity and with cyberspace in general. The word is a combination of internet and astronaut. Other terms roughly analogous with internaut are cybernaut and netizen, though each has its own connotation. The common thread among them, however, is an implication of experience and knowledge of the internet or cyberspace that goes beyond the casual user.

Inuktitut. The name of the varieties of the Inuit language spoken in Canada, including parts of the provinces of Newfoundland and Labrador, Quebec, to some extent in northeastern Manitoba as well as the territories of Nunavut, the Northwest Territories and traditionally on the Arctic Ocean coast of the Yukon Territory. Inuktitut is recognized as an official language in Nunavut and the Northwest Territories.
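
The letter-counting rule behind such abbreviations can be sketched in a few lines of Python. The function name is chosen here for illustration.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing to count between the first and last letter
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for term in ("internationalization", "localization", "globalization"):
    print(term, "->", numeronym(term))
```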

Irish-medium school. Gaelscoil (plural: Gaelscoileanna), or Irish-medium school, is particularly popular in primary school in Ireland. The term refers especially to Irish-medium schools outside the Irish-speaking regions. Students in the Gaelscoileanna acquire the Irish language through language immersion, though they study the standard curriculum.


Java. A programming language originally developed by Sun Micro­systems and released in 1995 as a core component of Sun’s Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to byte code that can run on any Java virtual machine regardless of computer architecture.

Java computer-assisted translation (JCAT). A Java-based translation tool that takes advantage of XML features. JCAT primarily benefits linguists.

JavaScript. An open-source scripting language for design of interactive websites. JavaScript can interact with HTML source code, enabling web developers to use dynamic content. For example, JavaScript makes it easy to respond to user-initiated events (such as form input) without having to use common gateway interface.

JavaServer Pages (JSP). JSP have dynamic scripting capability that works in tandem with HTML code, separating the page logic from the static elements – the actual design and display of the page – to help make the HTML more functional.

JIS. The acronym for the Japanese Industrial Standard, which is the Japanese equivalent of ANSI.


kana. The two Japanese syllabaries – hiragana and katakana.

kanji. The Chinese characters that are used in the modern Japanese logographic writing system along with hiragana, katakana and the Hindu-Arabic numerals. The Japanese term kanji literally means Han characters. Despite the existence of some 13,000 kanji characters, these alone do not suffice to write Japanese. Hiragana characters are also required to express grammatical inflections.

katakana. A Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin alphabet. The word katakana means fragmentary kana, as they are derived from components of more complex kanji. Katakana are characterized by short straight strokes and angular corners and are the simplest of the Japanese scripts. Katakana and hiragana both render the same syllables, but katakana is angular and used largely to spell words borrowed from other languages, while hiragana is cursive and is used more frequently to spell native Japanese words.

kernel. The central module of an operating system, it loads first and remains in memory to control memory management, disk manage­ment, and process and task management.

keyword. Any word on a web page. Keyword searching is the most common form of text search on the web. Most search engines do their text query and retrieval using keywords.


locale. An international language and geographic region that also embodies common language and cultural information. Locale differs from language in that the same language may be spoken in more than one country. Locale also refers to the features of a user’s computing environment that are dependent on geographic location, language and cultural information. A locale specifically determines conventions such as sort order rules; date, time and currency formats; keyboard layout; and other cultural conventions.

localization (l10n). The process of adapting a product or software to a specific language or culture so that it seems natural to that particular region. True localization considers language, culture, customs and the characteristics of the target locale. It frequently involves changes to the software’s writing system and may change keyboard use and fonts as well as date, time and monetary formats. In l10n, the common abbreviation for localization, the 10 refers to the ten letters between the l and the n.

Latin America. The region of the Americas where Romance languages – those derived from Latin, namely Spanish and Portuguese – are officially or primarily spoken.

Latina, Latino. The demonyms Latina (feminine) and Latino (masculine) are defined in several English language dictionaries as persons of Hispanic, especially Latin American, descent, often living in the United States. In the United States, the term is in official use in the ethnonym Hispanic or Latino, defined as “a person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race.” Neither Hispanic nor Latino refers to a race, as a person of Latino or Hispanic ethnicity can be of any race.

learning management system (LMS). Software that automates the administration of training events.

lemmatize. To sort so as to group together inflected or variant forms of the same words.

Levant. The Levant region, also known as the Eastern Mediterranean and Greater Syria, is a geographic and cultural region consisting today of Lebanon, Syria, Jordan, Israel, Palestine, Cyprus, Hatay Province and other parts of southern Turkey.

leverage/leveraging. Refers to the amount of previously translated text from an earlier release that can be reused or recycled.

lexicography. The act of compiling dictionaries.

LI18NUX2000 Global Specification. Based on specifications drawn up by several working groups within Li18nux, LI18NUX2000 Global Specification includes globalization functionality features from commercial UNIX systems as well as operating system recommendations to ease the development of internationalized application software.

ligature. Refers to a glyph that is created when two or more characters are combined to form a new, single typographical character.

lingua franca. A language that is adopted as a common language between speakers whose native languages are different.

linguist. Someone who is accomplished in languages. A student or practitioner of the subject of linguistics (the scientific study of languages and their structures).

Linux. A free open-source UNIX-type operating system that runs on a number of hardware platforms.

LISA. The Localization Industry Standards Association, declared insol­vent on February 28, 2011.

loanword. A word or phrase adopted from another language with little or no modification.

the long tail. The statistical property that a large share of the population rests within the tail of a probability distribution. In localization, it refers to the large number of languages or cultures that taken uniquely would only represent small percentages of world population. The term has gained popularity in recent times as a retailing concept describing the niche strategy of selling a large number of unique items in relatively small quantities. The term was popularized by Chris Anderson in an October 2004 Wired magazine article, in which he mentioned Amazon and Netflix as examples of businesses applying this strategy.

lossy. Describes a compression algorithm that reduces the amount of information in data, rather than just the number of bits used to represent that information.


machine-aided translation (MAT). Computer technology applications that assist in the translation of text from one spoken language to another, based on the concept of translation memory and the reuse of previously translated terms and sentences.

machine translation (MT). A technology that translates text from one human language to another, using terminology glossaries and advanced grammatical, syntactic and semantic analysis techniques.

Maghreb. Usually defined as most of the region of North Africa west of Egypt. It is partially isolated from the rest of the continent by the Atlas Mountains and the Sahara desert. Berber activists have called the region Tamazgha, meaning land of the Berbers, since the second half of the twentieth century.

markup language. A markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. Markup instructs the software displaying the text to carry out appropriate actions, but is omitted from the version of the text that is displayed to users. Some markup languages, such as HTML, have predefined presentation semantics, meaning that their specification prescribes how the structured data are to be presented; others, such as XML, do not.

massive online collaboration. Massive collaboration is a form of collective action that occurs when large numbers of people work independently on a single project, often modular in its nature. Such projects typically take place on the internet using social software and computer-supported collaboration tools that provide a potentially infinite hypertextual substrate within which the collaboration may be situated. A key aspect that distinguishes massive collaboration from other forms of large-scale collaboration is that the collaborative process is mediated by the content being created – as opposed to being mediated by direct social interaction as in other forms of collaboration.

massively multiplayer online game (MMOG). A type of computer game that enables hundreds or thousands of players to simultaneously interact in a game world to which they are connected via the internet.

massively multiplayer online role-playing game (MMORPG). A multiplayer computer role-playing game that enables thousands of players to play in an evolving virtual world at the same time over the internet.

MENA. An acronym for Middle East and North Africa. The list of countries and territories has no standard definition, and sometimes spreads as far as Malta, Azerbaijan and Somalia.

mergers and acquisitions (M&A). Refers to the aspect of corporate strategy, corporate finance and management dealing with the buying, selling and combining of different companies that can aid, finance or help a growing company in a given industry expand rapidly without having to create another business entity.

metadata. Structural metadata covers the design and specification of data structures, while descriptive metadata is about individual instances of application data, or the data content. Metadata is often described as data about data, or data about data context.

metrics. Denotes the science of measuring as applied to a specific field of study.

morpheme. The smallest linguistic unit that has semantic meaning.

morphology. The branch of grammar that studies the structure or forms of words. The main branches are inflectional morphology, derivational morphology and compounding.

multilingual. Refers to software that supports more than one language simultaneously, thereby allowing the end user to select multiple languages and formats. This software allows data containing multiple languages to be entered, processed, presented and transmitted multinationally.

multilingual workflow system (MWS). A computer program that creates an environment to support and orchestrate a range of activities that facilitate the development of multilingual products. An MWS should contain a globalization management system for managing multilingual content, along with translation memory and machine translation.

multimedia. In computing, multimedia describes a number of diverse technologies that allow visual and audio media to be combined. Entertainment, education and advertising applications, among oth­ers, use a computer to present and combine text, graphics, video, animation and sound.

multimodal. Multimodal access for a personal computer, telephone, personal digital assistant and other devices allows input via speech, keyboard, mouse, stylus and/or other methods; outputs include speech, audio and graphical displays.


n-gram. A sequence of items, such as letters or words, where n refers to the number of items in the sequence. N-gram models can be used to predict such sequences and show their probability. Some stemming techniques use the n-gram context of a word to choose the correct stem.
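
A minimal Python sketch of extracting n-grams from a sequence, for letters and for words. The function name and sample text are chosen here for illustration; counting how often each n-gram occurs is the usual first step toward estimating probabilities.

```python
from collections import Counter


def ngrams(items, n: int) -> list:
    """Return all runs of n consecutive items as tuples."""
    items = list(items)
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Letter bigrams (n = 2) of a single word.
print(ngrams("text", 2))

# Word bigrams of a short sentence, with frequency counts.
words = "the cat saw the cat".split()
counts = Counter(ngrams(words, 2))
print(counts.most_common(1))
```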

Namespaces. XML Namespaces provide a simple method for qualify­ing element and attribute names used in Extensible Markup Language (XML) documents by associating them with namespaces identified by URI references. XML Namespaces are the solution to the problem of ambiguity and name collisions.
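
The name-collision problem can be illustrated with Python's standard xml.etree.ElementTree module. In this sketch the document, the prefixes and the example.com URIs are hypothetical; the two title elements share a local name but belong to different namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "title" means a book title in one namespace
# and a personal title in the other.
DOC = """<catalog xmlns:bk="http://example.com/book"
                  xmlns:pr="http://example.com/person">
  <bk:title>Localization Basics</bk:title>
  <pr:title>Dr.</pr:title>
</catalog>"""

NS = {"bk": "http://example.com/book", "pr": "http://example.com/person"}

root = ET.fromstring(DOC)
book_title = root.find("bk:title", NS).text
person_title = root.find("pr:title", NS).text

print(book_title, "|", person_title)
# ElementTree stores each tag with its namespace URI in braces.
print([child.tag for child in root])
```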

nanosyntax. A term used to describe an approach to syntax in which syntactic trees are built up out of a large number of elements. Each morpheme may correspond to several such elements, which do not have to form a subtree.

national language support (NLS). A function that allows a software application to set the locale for the user, identify the language in which the user works, and retrieve strings – representing times, dates and other information – formatted correctly for the specified language and location. NLS also includes support for keyboard layouts and language-specific fonts.

natural language processing (NLP). A main focus of computational linguistics, the aim of NLP is to devise techniques to automatically analyze large quantities of spoken (transcribed) or written text in ways that parallel what happens when humans perform this task.

nearshoring. A form of outsourcing in which an activity – for example, business processes or software development – is relocated to locations that are, generally, cheaper and yet geographically nearer than offshore locations.

.NET. Microsoft platform for applications that work over the internet.

netizen. A blend of internet and citizen, a person actively involved in online communities. Netizens use the internet to engage in activities of the extended social groups of the web – for example, giving and receiving viewpoints, furnishing information, fostering the internet as an intellectual and social resource, and making choices for the self-assembled communities. Generally, a netizen can be any user of the worldwide, unstructured forums of the internet.

notified bodies. Organizations designated by the national govern­ments of the member states of the European Union as being compe­tent to make independent judgments about whether or not a product complies with the protection – essential safety – requirements laid down by each CE marking directive.


OASIS. Organization for Advancement of Structured Information Standards (formerly called SGML Open). An IT standardization con­sortium based in the state of Massachusetts. Its foundational sponsors include IBM and Microsoft. Localization buy-side, toolmakers and service providers are also well represented.

OAXAL. OASIS Open Architecture for XML Authoring and Localization. A technical committee encouraging the development of an open standards approach to XML authoring and localization.

offshore outsourcing (offshoring). The practice of engaging a third-party provider in another country – often on another continent or “shore” – to perform tasks or services often performed in-house.

ontology. An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

open-source software. Any computer software distributed under a license that allows users to change and/or share the software freely. End users have the right to modify and redistribute the software, as well as the right to package and sell the software.

OpenI18N certification. A certification program that uses an independent authority to verify whether a Linux distribution is adhering to the industry-developed internationalization standard.

OpenType fonts. OpenType fonts are cross-platform, self-contained files and contain advanced typographic features such as glyph substi­tution and metrics overrides.

operating system (OS). The software that drives the hardware associ­ated with a computer system.

optical character recognition (OCR). Recognition of printed or written characters by a computer. Involves computer software designed to translate images of typewritten text – usually captured by a scanner – into machine-editable text or to translate pictures of characters into a standard encoding scheme representing them in ASCII or Unicode.

original equipment manufacturer (OEM). OEMs buy computers in bulk and customize them for a particular application. OEMs then sell the customized computers under their own names. Therefore, OEMs are really the customizers and not the original manufacturers of the equipment.

OSCAR. LISA’s technical committee (special interest group) for actual standardization work. Explanation of the acronym is somewhat strained, meaning Open Standards for Container/Content Allowing Reuse. OSCAR was dissolved along with LISA in February 2011.

outsource. To hire a third-party provider to perform tasks or services often performed in-house.


Panimages. From the Greek prefix pan, meaning whole or all-inclusive, an image search engine that automatically translates a search term into about 300 other languages, suggests a few that might work and then displays images from Google and the online photo database Flickr.

parser. A computer program that takes a set of sentences as input and identifies the structure of the sentences according to a given grammar. The term parser is sometimes used generically in cases where the sentences are made up of information units of any kind.

pay per click (PPC). An advertising technique used on websites, advertising networks and search engines. With search engines, PPC advertisements are usually text ads placed near search results. When a site visitor clicks on the advertisement, the advertiser is charged a small amount.

personalization. Sometimes referred to as one-to-one marketing, personalization involves using technology to accommodate the differences among individuals. Web pages are personalized based on the characteristics – interests, social category, context and so on – of an individual. Personalization is a means of meeting the customer’s needs more effectively and efficiently, making interactions faster and easier, and, consequently, increasing customer satisfaction and the likelihood of repeat visits.

phonology. The part of linguistics that deals with systems of sounds, especially in a particular language.

pinyin. More formally Hanyu pinyin, the most commonly used Romanization system for Standard Mandarin. Hanyu is the Han (Chinese) language, and pinyin means phonetics or, more literally, spelling sound or spelled sound.

plain text. In computing, plain text makes up the contents of an ordinary sequential file readable as textual material without much processing, usually opposed to formatted text and to binary files. Plain text files can be opened, read and edited with countless generic text editors. Plain text files are almost universal in programming.

plug-ins. Software modules that add a specific feature or service to a larger system.

porteño. A common reference to the people of Buenos Aires, Argentina. In Spanish, it literally describes a person who is from a port city, and is also used as an adjective for anything related to those port cities.

pretranslation. Involves the preparation of files for translation where the existing files already contain related segments of previously translated data. Only 100% matches are replaced, with the result being a set of files containing both source and target language terminology.

project management (PM). The systematic planning, organizing and controlling of allocated resources to accomplish project cost, time and performance objectives. PM is normally reserved for focused, nonrepetitive, time-limited activities with some degree of risk.

project manager. A professional in the field of project management. He or she is responsible for the planning, execution and closing of any project. Key project management responsibilities include creating clear and attainable project objectives, building the project requirements and managing the triple constraint for projects – cost, time and scope.

prosumer. This word is becoming fairly common but can be confusing, and has two meanings. Futurist Alvin Toffler in his 1980 book The Third Wave coined the word as a blend of producer and consumer when he predicted that the role of producers and consumers would begin to blur and merge. Toffler used it to describe a possible future type of consumer who would become involved in the design and manufacture of products so that they could be made to individual specification. The second usage describes a purchaser of technical equipment who wants to obtain goods of a better quality than consumer items, but can’t afford professional items – older terms for goods of this intermediate quality are semiprofessional and industrial quality. Here, the word is a blend of professional and consumer.

pseudo-localization. Translates the code strings of a product into “pseudo-strings.” The resulting “pseudo-language” is designed to test the impact that different aspects of localization have on the product’s functionality and appearance.
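As a rough illustration of how such pseudo-strings might be produced, the Python sketch below accents letters, pads the string and wraps it in brackets. The accent map, the 30% expansion factor and the function name pseudo_localize are assumptions made for this example; real tools apply their own rules.

```python
# Hypothetical accent map (a e i o u A E I O U c n -> accented forms).
ACCENTS = str.maketrans(
    "aeiouAEIOUcn",
    "\u00e1\u00e9\u00ed\u00f3\u00fa\u00c1\u00c9\u00cd\u00d3\u00da\u00e7\u00f1",
)

def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Return an accented, padded and bracketed version of a UI string."""
    accented = text.translate(ACCENTS)
    # Pad to imitate the longer strings that translation often produces.
    padding = "~" * round(len(text) * expansion)
    # Brackets make clipped or concatenated strings easy to spot on screen.
    return f"[{accented}{padding}]"

print(pseudo_localize("Save file"))
```

A string that appears on screen without accents was probably hard-coded, and one that has lost its closing bracket was probably clipped by the layout.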

pseudo-translation. Similar to a test run that seeks to copy the translation process rather than actually produce a translation. A text string is taken and put through a translation-like process that alters it and produces a new string. The text string is frequently changed as a result of this process, so pseudo-translation is done to illustrate the potential problems that may occur when the translation is actually done.

quality assurance (QA). The activity of providing evidence needed to establish confidence among all concerned that quality-related activities are being performed effectively. All those planned or systematic actions necessary to provide adequate confidence that a product or service will satisfy given requirements for quality. QA covers all activities from design, development, production and installation to servicing and documentation.


radical. The root or base form of a word. Also, the building blocks of Chinese characters, of which the most common set contains 214 radicals. Radicals themselves are composed of strokes.

Resource Description Framework (RDF). A formal data model from the World Wide Web Consortium (W3C) for machine-understandable metadata used to provide standard descriptions of web resources.

return on investment (ROI). In finance, the ratio of money gained or lost on an investment relative to the amount of money invested. The amount of money gained or lost may be referred to as interest, profit/loss, gain/loss or net income/loss.
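The ratio can be worked through in a few lines of Python. This is only a sketch of the definition above; the function name roi and the sample figures are invented for illustration.

```python
def roi(gain_or_loss: float, amount_invested: float) -> float:
    """Ratio of money gained (positive) or lost (negative) to money invested."""
    if amount_invested == 0:
        raise ValueError("amount_invested must be non-zero")
    return gain_or_loss / amount_invested

# A gain of 250 on 1,000 invested.
print(f"{roi(250.0, 1000.0):.0%}")  # prints "25%"
```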

right-to-left languages. Languages such as Hebrew, Arabic, Urdu and Farsi are written primarily right to left. This text flow presents significant text and graphic layout implications.

romaji. The application of the Latin alphabet to write the Japanese language. Japanese who have attended elementary school since World War II have been taught to read and write romanized Japanese. Therefore, almost all Japanese are able to read and write Japanese using romaji.

romanization. In linguistics, the representation of a word or language with the Roman (Latin) alphabet, or a system for doing so, where the original word or language uses a different writing system.

rule-based machine translation (RBMT). The application of sets of linguistic rules that are defined as correspondences between the structure of the source language and that of the target language. The first stage involves analyzing the input text for morphology and syntax – and sometimes semantics – to create an internal representation. The translation is then generated from this representation using extensive lexicons with morphological, syntactic and semantic information, and large sets of rules.


Sanskrit. A historical Indo-Aryan language and the primary liturgical language of Hinduism, Jainism and Mahayana Buddhism. Currently, it is an official language of the state of Uttarakhand in northern India.

search engine. A program designed to help find information stored on a computer system such as the world wide web or a personal computer. A search engine allows a user to ask for content meeting specific criteria – typically those containing a given word, phrase or name – and retrieves a list of references that match those criteria.

search engine optimization (SEO). A set of methods aimed at improving the ranking of a website in search engine listings. SEO is primarily concerned with advancing the goals of a website by improving the number and position of its organic search results for a wide variety of relevant keywords.

Segmentation Rules eXchange (SRX). An XML-based, vendor-neutral standard for describing how translation and other language-processing tools segment text for processing. It was created to enhance the leverage of the TMX standard. It allows translation memory and other linguistic tools to describe the language-specific processes by which text is broken into segments (usually sentences or paragraphs) for further processing.
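The idea of rule-driven segmentation can be sketched in Python as an ordered list of break and no-break rules, each pairing a pattern for the text before a candidate boundary with a pattern for the text after it. The sample rules and the function name segment are assumptions for illustration; they are not taken from any published SRX file, and a real tool would read its rules from XML.

```python
import re

# (is_break, before_break, after_break), checked in order; first match wins.
# Illustrative rules only.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),   # exception: no break after abbreviations
    (True,  r"[.?!]",                 r"\s+"),  # break after sentence-final punctuation
]

def segment(text: str) -> list[str]:
    """Split text into segments using the ordered rules above."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in RULES:
            if re.search(before + r"$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides, whether it breaks or not
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

print(segment("Dr. Lee arrived. She was late! Why?"))
```

Listing the exception before the general rule is what keeps "Dr." from ending a segment. For simplicity the sketch strips surrounding white space and rescans the text at every position.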

semantic. Part of the structure of language, along with phonology, morphology, syntax and pragmatics, which involves understanding the meaning of words, sentences and texts.

Semantic Web. An extension of the world wide web that provides a common framework allowing data to be shared and reused across application, enterprise and community boundaries. It is based on Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming.

serious games. Computer and video games that are intended to not only entertain users, but have additional purposes such as education and training. They can be similar to educational games and are primarily focused on an audience outside of primary or secondary education. A serious game is usually a simulation that has the look and feel of a game, but is actually a simulation of real-world events or processes. The main goal of a serious game is usually to train or educate users, though it may have other purposes, such as marketing or advertisement, while giving them an enjoyable experience.

service-oriented architecture (SOA). A software architectural concept that defines the use of services to support the requirements of software users.

sight translation. With sight translation, the input is visual (the written word) rather than oral (the spoken word). Reading comprehension is an important element of sight translation.

Simple Object Access Protocol (SOAP). A standard for exchanging XML-based messages over a computer network, normally using HTTP.

Simplified Chinese. Refers to one of two standard Chinese character sets of printed contemporary Chinese written language, officially simplified by the government of the People’s Republic of China in an attempt to promote literacy. The characters were modified to be written with fewer strokes per character. Simplified Chinese is used in mainland China and Singapore.

simship. A term used to refer to the simultaneous shipment of software products in different languages or with other distinguishing differences in design.

simultaneous interpreting. The interpreter reformulates the message into the target language as quickly as possible while the source speaker is speaking. Normally, in simultaneous interpreting between spoken languages, the interpreter sits at a microphone in a soundproof booth, usually with a clear view of the speaker, listening through headphones to the incoming message in the source language. The interpreter then relays the message in the target language into the microphone to whoever is listening.

single-source concept. Documentation according to the single-source concept means using a common source to provide documentation in several output formats (printed manual, online help).

social games. In this context, a social network game, a type of online game distributed primarily through social networks such as Facebook. Social games are usually characterized by community, often built around the existing social network, and the ability to drop in and out of the game without ever winning or losing.

social media. Refers to the web-based and mobile technologies used to turn communication into an interactive dialogue. It builds on the ideological and technological foundations of Web 2.0, and typically allows for the creation and exchange of user-generated content. Social media can take on many different forms, including internet forums, social networking sites, blogs, microblogging, wikis and interactive visual media.

social network. An online service, platform or site that focuses on building social relations among people, who, for example, share interests or activities. A social network service essentially consists of a representation of each user (often a profile), his or her social links and a variety of additional services. Most social network services are web-based and provide means for users to interact over the internet. Facebook, LinkedIn and Foursquare are popular social networks used for different purposes.

source language (SL). A language that is to be translated into another language.

South America. A continent of the Americas, situated entirely in the Western Hemisphere and mostly in the Southern Hemisphere. It is bordered on the west by the Pacific Ocean and on the north and east by the Atlantic Ocean; North America and the Caribbean Sea lie to the northwest.

standard generalized markup language (SGML). An international standard for information exchange that prescribes a standard format for using descriptive markup within a document, defining three document layers: structure, content and style.

statistical machine translation (SMT). A machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. SMT is the translation of text from one human language to another by a computer that learned how to translate from vast amounts of translated text.

style guide. A style guide (or manual of style) is a set of standards for the writing and design of documents, either for general use or for a specific publication, organization, or field. (It is often called a style sheet, though that term has other meanings.)

A style guide establishes and enforces style to improve communication. To do that, it ensures consistency within a document and across multiple documents and enforces best practice in usage and in language composition, visual composition, orthography and typography. For academic and technical documents, a guide may also enforce the best practice in ethics (such as authorship, research ethics, and disclosure), pedagogy (such as exposition and clarity), and compliance (technical and regulatory).

Style guides are common for general and specialized use, for the general reading and writing audience, and for students and scholars of various academic disciplines, medicine, journalism, the law, government, business, and specific industries.

stemming. The process of reducing inflected words to their base or root form. There are several types of stemming algorithms of varying accuracy, but having a stemming algorithm in place can be important in linguistic information retrieval.
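A deliberately naive suffix-stripping stemmer, written in Python, shows both the idea and why accuracy varies between algorithms. The suffix list and the function name naive_stem are assumptions for this sketch; established algorithms use far more elaborate rules.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative; longer suffixes are tried first

def naive_stem(word: str) -> str:
    """Strip one common English suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("translated", "translating", "translates", "translate"):
    print(w, "->", naive_stem(w))
```

The first three words reduce to "translat" while "translate" is left unchanged, so a search that relied on this stemmer would miss one of the four forms.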

streaming. Streaming allows a computer user to see and hear an audio/video file as it is transferred. Player programs for platforms such as Windows Media, RealNetworks and QuickTime (available free) must be downloaded to decompress audio/video files for listening or viewing. Streaming video is usually sent from prerecorded video files, but can be broadcast live.

supply chain management (SCM). An electronic alternative to the traditional paper chain, enabling participating suppliers to access up-to-date company information and enabling companies to better manage and track supply and demand.

sustaining engineering. Engineering and technical support that follows release of requirements and specifications in the path to deliver an end product. Sustaining engineers are responsible for a system’s upkeep, and monitoring the data it creates.

syllabary. A table of syllables or more specifically a set of the syllabic symbols/characters in which each character represents a syllable, used in certain languages such as Japanese.

syntax. The study of the rules whereby words or other elements of sentence structure are combined to form grammatical sentences.


target language (TL). The language that a source text is being translated into.

TBCS-EUC. A triple-byte character set (TBCS) encoded according to the specification of the extended UNIX code (EUC).

TBX. TermBase eXchange standard. A standard for terminology and term exchange.

technical committee (TC). Standardization bodies usually own, create, maintain and update technical standards through purpose-specific technical committees. In organizational structures such as OASIS, Unicode and ISO, they are called technical committees, while in others such as W3C they are not. They may also be referred to as an Industry Specification Group, Working Group, Special Interest Group and so on.

telephone interpreting. The interpreter, who is usually based in a remote location, provides interpretation via telephone for two individuals who do not speak the same language. Most often, telephone interpreting is performed in the consecutive mode. This means that the interpreter listens to each utterance first and then proceeds to render it into the other language, as opposed to speaking and listening simultaneously.

terminology management. Primarily concerned with manipulating terminological resources for specific purposes – for example, establishing repositories of terminological resources for publishing dictionaries, maintaining terminology databases, ad hoc problem solving in finding multilingual equivalences in translation work or creating new terms in technical writing. Terminology management software provides the translator a means of automatically searching a given terminology database for terms appearing in a document, either by automatically displaying terms in the translation memory software interface window or through the use of hotkeys to view the entry in the terminology database.

terminology manager. A computer technology application tool that assists in the translation of text from one spoken language to another.

tidy functions. Tidy is a binding for the Tidy HTML clean and repair utility that allows a user to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.

time-to-market. The length of time it takes from a product being conceived until it is available for sale. Time-to-market is crucial in industries where products are outdated quickly.

token (tokenization). The fundamental elements making up the text of a C program. Tokens are identifiers, keywords, constants, strings, operators and other separators. White space – such as spaces, tabs, new lines and comments – is ignored except where it is necessary to separate tokens.
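A small regular-expression tokenizer in Python illustrates how the text of a C fragment breaks down into tokens. It is a sketch under simplifying assumptions: the keyword set is partial, only integer constants and /* */ comments are handled, and the names TOKEN_RE and tokenize are invented for the example.

```python
import re

KEYWORDS = {"int", "char", "void", "return", "if", "else", "while", "for"}  # partial list

TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/)            # comments are skipped like white space
  | (?P<space>\s+)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<constant>\d+)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<operator>==|!=|<=|>=|[-+*/=<>(){};,])   # operators and other separators
""", re.VERBOSE | re.DOTALL)

def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for a small subset of C."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        if kind not in ("comment", "space"):
            if kind == "identifier" and text in KEYWORDS:
                kind = "keyword"
            tokens.append((kind, text))
        pos = m.end()
    return tokens

print(tokenize("int x = 42; /* answer */ return x;"))
```

White space and the comment produce no tokens of their own; they only mark where one token ends and the next begins.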

Tracker eXtensible Markup Language (TXML). An XML-based pivot format. The translation memory environment Wordfast Pro uses TXML.

Traditional Chinese. A Chinese character set that is consistent with the original Chinese ideographic form that is several thousand years old. Today, traditional characters are used in Taiwan, Hong Kong, Macau and by some overseas Chinese communities, especially those originating from the aforementioned regions/countries or who emigrated before the widespread adoption of simplified characters in the People’s Republic of China.

translation. The process of converting all of the text or words from the source language to the target language. An understanding of the context or meaning of the source language must be established in order to convey the same message in the target language.

translation management system (TMS). Sometimes also known as a globalization management system, a TMS automates localization workflow to reduce the time and money spent on manual work. It typically includes process management technology to automate the flow of work and linguistic technology to aid the translator.

translation memory (TM). A special database that stores previously translated sentences which can then be reused, in full or in part, on a sentence-by-sentence basis. The database matches source to target language pairs.
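The lookup described here, with full and partial reuse, can be sketched in Python using the standard library's difflib for similarity scoring. The in-memory dictionary, the 0.75 threshold and the function name lookup are assumptions for illustration; commercial TM tools use their own storage and matching algorithms.

```python
from difflib import SequenceMatcher

# A toy in-memory TM: source sentence -> stored translation (sample data).
MEMORY = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Close the window.": "Fermez la fenêtre.",
}

def lookup(sentence: str, threshold: float = 0.75):
    """Return (score, stored_source, stored_target) for the best match, or None."""
    best = None
    for source, target in MEMORY.items():
        score = SequenceMatcher(None, sentence, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

print(lookup("Click the Save button."))    # identical sentence: score 1.0
print(lookup("Click the Cancel button."))  # similar sentence: partial match
print(lookup("xyz"))                       # nothing similar: None
```

A partial match returns the stored translation of the closest source sentence, which the translator then edits rather than translating from scratch.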

Translation Memory eXchange (TMX). Based on XML, an open standard that has been designed to simplify and automate the process of converting translation memories from one format to another.
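Because TMX is plain XML, a translation memory exported in this format can be read with a general-purpose XML parser. The Python sketch below parses a small hand-written sample; the sample is not a complete or validated TMX file, and the function name read_units is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Minimal hand-written sample for illustration only.
SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" o-tmf="none"
          creationtool="example" creationtoolversion="0"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="fr"><seg>Fermez la fenêtre.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_units(tmx_text: str) -> list[dict[str, str]]:
    """Return one {language: segment text} dictionary per translation unit."""
    root = ET.fromstring(tmx_text)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        for tu in root.iter("tu")
    ]

print(read_units(SAMPLE))
```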

translation memory system (TMS). A tool for computer-aided translation. The translation memory (TM) stores the original text and its human translation in manageable units. The TM system proposes the translation whenever the same or a similar unit occurs again.

translation portal. A website or service that offers a broad array of resources via the internet, thus providing a marketplace for translation agencies, freelance translators and customers to exchange services.

translation technology. Information and communication technology that executes or helps to execute the translation process, aiming at increased efficiency and speed.

translation unit (TU). A segment of a text that the translator treats as a single cognitive unit for the purposes of establishing an equivalence. The translation unit may be a single word, a phrase, one or more sentences or even a larger unit.

transliteration. To write or print a letter or word using the closest corresponding letters of a different alphabet or language. A systematic way to convert characters in one alphabet or phonetic sounds into another alphabet.

truncation. Truncating text lines in the display means leaving out any text on a line that does not fit within the right margin of the window displaying it. Also, in database searching, the addition of a symbol at the end of a word or word stem so the computer will look for all variants of the word.

TTK. Stands for Translation Toolkit. The native bilingual format for Alchemy CATALYST, which supports previous versions of Alchemy CATALYST project files.

24/7. An abbreviation for 24 hours a day, 7 days a week, including holidays and days that otherwise may alter limitations of work. In commerce and industry, 24/7 identifies a service that will be present regardless of the current time or day, as might be offered by a restaurant, gas station, manned datacenter, supermarket or help information line.

tweet. A post or status update on Twitter, a microblogging service. Tweets are text-based posts of up to 140 characters displayed on the author’s profile page.

Twitter. A social networking and microblogging service, owned and operated by Twitter, Inc., that enables its users to send and read other users’ messages, called tweets.


uncial writing. A majuscule script commonly used from the third to the eighth centuries of the Common Era by Latin and Greek scribes.

Unicode. The Unicode Worldwide Character Standard (Unicode) is a character encoding standard used to represent text for computer processing. Originally designed to support 65,000 characters, it now has encoding forms to support more than one million characters.

Unicode Consortium. Home of the Unicode Standard and Common Locale Data Repository (CLDR). Unicode’s goal is to support scripts for all languages in the world.

Unicode Localization Interoperability technical committee (ULI). The third Unicode Consortium technical committee was formed in April 2011. ULI is not chartered to create its own standards; instead, it is looking into the behaviors and profiling of standards related to localization interoperability.

Unicode TR29. The primary Unicode standard defining word and sentence boundaries. This standard is also referred to as Unicode Standard Annex #29 or UAX #29.

Unicode Transformation Format (UTF-8). An encoding form of Unicode that supports ASCII for backward compatibility and covers the characters for most languages in the world.
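The backward compatibility with ASCII can be checked directly in Python, whose built-in str.encode produces UTF-8 bytes. The helper name utf8_hex and the sample characters are choices made for this sketch.

```python
def utf8_hex(text: str) -> str:
    """Hex dump of the UTF-8 bytes for text, one byte per pair of digits."""
    return text.encode("utf-8").hex(" ")

# Latin capital A, e with acute accent, and a CJK ideograph.
for ch in ("A", "\u00e9", "\u4e2d"):
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")

# ASCII-only text yields the same bytes under ASCII and UTF-8.
print("plain text".encode("utf-8") == "plain text".encode("ascii"))
```

The ASCII letter occupies a single byte identical to its ASCII value, while the other two characters take two and three bytes respectively.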

uniform resource identifier, uniform resource locator (URI, URL). Short strings that identify resources on the web: documents, images, downloadable files, services, electronic mailboxes and other resources.

United Arab Emirates (UAE). A federation of seven emirates, each administered by a hereditary emir, situated in the southeast of the Arabian Peninsula in Southwest Asia on the Persian Gulf, bordering Oman and Saudi Arabia. The UAE consists of Abu Dhabi, Dubai, Sharjah, Ras Al Khaimah, Ajman, Umm Al Qaiwain and Fujairah. An emirate is a political territory that is ruled by a dynastic Muslim monarch styled emir.

Universal Learning Format (ULF). A modular set of XML-based formats for capturing and exchanging various types of e-learning data.

Universal Terminology eXchange (UTX). A format for user-created dictionaries with source language and target language entries. UTX is intended to absorb the differences between various formats for machine translation. UTX can be used for other purposes, especially in the domain of natural language processing.

UNIX. A multiuser, multitasking operating system. It was one of the first operating systems to be written in a higher level programming language, thus making it hardware-independent.

usability. The ease that users experience in navigating an interface, locating information and obtaining knowledge over the internet.

User Agent Accessibility Guidelines (UAAG). Provides guidelines for designing user agents that lower barriers to web accessibility for people with disabilities. User agents include browsers, media players and applications that retrieve and render web content.


variable. In computer programming, variables enable programmers to write flexible programs. Rather than entering data directly into a program, a programmer can use variables to represent the data. Then, when the program is executed, the variables are replaced with real data. This makes it possible for the same program to process different sets of data.

vector-based. Refers to software and hardware that use geometrical formulas to represent images (same as object-oriented graphics).

video game. A game that involves interaction with a user interface to generate visual feedback on a video device. The electronic systems used to play a video game are known as platforms; examples of these are personal computers and video game consoles. These platforms are broad in range, from large computers to small handheld devices.

voiceover. Refers to a production technique where a disembodied voice is broadcast live or prerecorded in radio, television, film, theater and/or presentation. The voiceover may be spoken by someone who also appears onscreen in other segments or it may be performed by a specialist voice actor.

VoiceXML. The Voice Extensible Markup Language standard enables voice input and audio output for voice response and multimodal applications.


Web Accessibility Initiative (WAI). An effort to improve the accessibility of the world wide web for people with disabilities. People with disabilities may encounter difficulties when using computers generally, but also on the web. Since people with disabilities often require nonstandard devices and browsers, making websites more accessible also benefits a wide range of user agents and devices, including mobile devices, which have limited resources. The W3C launched the Web Accessibility Initiative in 1997 with endorsement by The White House.

Web Content Accessibility Guidelines (WCAG). Part of a series of web accessibility guidelines published by the WAI. They consist of a set of guidelines for making content accessible, primarily for people with disabilities, but also for all user agents, including highly limited devices such as mobile phones. The current version, WCAG 2.0, was published in December 2008 and is also an ISO standard, ISO/IEC 40500:2012.

web hit. The counting term sometimes used to measure website traffic. The count includes every file used on a web page as a “hit” to that page. Viewing one page with six graphics would mean at least seven hits. Page views and unique visitors are more accurate measures of website traffic.

Web Ontology Language (OWL). A family of knowledge representation languages or ontology languages for authoring ontologies or knowledge bases. The languages are characterized by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL is endorsed by the World Wide Web Consortium (W3C) and has attracted academic, medical and commercial interest.

web service. A collection of protocols and standards used for exchanging data between applications or systems.

whispering interpreting. Also called chuchotage, the interpreter sits or stands next to the intended audience and interprets simultaneously in a whisper. This mode does not require any equipment. Whispered interpretation is often used in situations when the majority of a group speaks one language, and a limited number of people do not speak the source language.

Win 32/64. Refers primarily to the number of bits that can be processed or transmitted in parallel, or the number of bits used for a single element in a data format in a Windows operating system.

World Wide Web Consortium (W3C). An international community that develops and owns many standards, including XML and HTML.

Written Chinese. Written Chinese refers to the thousands of symbols or Chinese characters used to represent spoken Chinese, along with rules and conventions about how they are arranged and punctuated. Chinese characters do not constitute an alphabet or a compact syllabary. Instead, they are built up from simpler parts representing objects or abstract notions, although most characters do contain some indication of their pronunciation.


XML Localization Interchange File Format (XLIFF). An XML-based format for exchanging localization data. Standardized by OASIS in April 2002 and aimed at the localization industry, XLIFF specifies elements and attributes to aid in localization. XLIFF could be used to exchange data between companies, such as a software publisher and a localization vendor, or between localization tools, such as translation memory systems and machine translation systems.
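As an illustration of how a tool on the receiving end might inspect such a file, the Python sketch below lists the translation units that still lack a target. It assumes the XLIFF 1.2 namespace and a small hand-written sample; the file name, unit ids and the function name untranslated_ids are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents (an assumption about the version in use).
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Minimal hand-written sample for illustration only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en"
        target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="greeting">
        <source>Hello</source>
        <target>Hallo</target>
      </trans-unit>
      <trans-unit id="farewell">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def untranslated_ids(xliff_text: str) -> list[str]:
    """Return the ids of trans-unit elements that have no target element."""
    root = ET.fromstring(xliff_text)
    return [
        unit.get("id")
        for unit in root.iterfind(".//x:trans-unit", NS)
        if unit.find("x:target", NS) is None
    ]

print(untranslated_ids(SAMPLE))  # prints ['farewell']
```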

XML (eXtensible Markup Language). A markup language/specification pared down from SGML, an international standard for the publication and delivery of electronic information, designed especially for web documents.

xml:tm (XML-based Text Memory). A standard for XML to allow ease of translation of XML documents.

eXtensible HyperText Markup Language (XHTML). A family of XML markup languages that mirror or extend versions of the widely used HyperText Markup Language (HTML), the language in which web pages are written.

XSL (eXtensible Stylesheet Language). A language for expressing style sheets, controlling formatting and other output behavior.


ZWNBS. Zero width no break space (ZWNBS) is also known as the byte order mark (BOM) if used at the beginning of a Unicode file. It was originally used in the middle of Unicode files in rare instances where there was an invisible join between two characters where a line break must not occur. A new code joiner has been implemented – U+2060 WORD JOINER.
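The dual role of U+FEFF can be seen in a short Python sketch using the standard codecs module. The helper name strip_bom and the sample text are assumptions for illustration.

```python
import codecs

def strip_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte order mark if present."""
    return data.decode("utf-8-sig")

# codecs.BOM_UTF8 is U+FEFF encoded in UTF-8 (bytes EF BB BF).
with_bom = codecs.BOM_UTF8 + b"hello"

# Plain UTF-8 decoding keeps U+FEFF as the first character of the text.
print(with_bom.decode("utf-8").startswith("\ufeff"))

# The "utf-8-sig" codec treats it as a BOM and removes it.
print(strip_bom(with_bom))

# In the middle of text, U+2060 WORD JOINER now marks a no-break join.
joined = "no" + "\u2060" + "break"
print(len(joined))
```

The joined string has one more character than its visible letters, because the word joiner is invisible but still present in the data.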