Translating Narration Scripts from Japanese to English

by Tom Gally

The following is the written version of a presentation I gave at the Ninth International Japanese/English Translation Conference (IJET-9) on May 23, 1998, in Yokohama, Japan. Further information on the IJET conferences can be found at the IJET home page.

NARRATION SCRIPTS present special challenges for translators. Because they are intended to be read aloud, their style and format must make them easy to read and easy to understand when heard. And because they often accompany recorded visual images, there may be severe constraints on the length and structure of sentences.

What Are Narration Scripts?

By "narration scripts," I mean primarily scripts for the overdubbed narration of videos, films, CD-ROMs, and other multimedia presentations. Typical examples include scripts for public relations videos for companies, government agencies, and other organizations; guides to factories; descriptions of technologies or products; educational and training videos; in-flight entertainment and advertisements; television commercials; and video and film documentaries. A related category is scripts for live presentations, such as by speakers at trade shows and conferences. (In this paper, I do not discuss the translation of dubbed or subtitled dialogue for dramatic films.)

How Are Narration Scripts Different from Other Types of Translation?

In most translations, the translator can assume that the translated text will be presented to readers in written form. With narration scripts, though, the end user - the viewer of a video, the listener to a speech - does not see a written text. Instead, the written translation is converted into speech through a key intermediary: the narrator. To create good narration scripts, the translator must understand what narrators do and the restrictions under which they work.

How Narrations Are Recorded

Let's look at how a translated script is recorded for a typical Japanese public-relations video. The English translation is usually ordered after the Japanese version is finished. For the English version, only the spoken narration and the on-screen titles will be changed, because re-editing the images and musical soundtrack would be too expensive. At about the same time that the final Japanese script is sent to be translated, a recording studio is reserved and a native English-speaking narrator is hired to do the narration.

After the script has been translated and checked, the narration is recorded. The recording session may be attended not only by the narrator but also by people from the translation agency, the video production company, the advertising agency, and the company that is paying for the video. Most of these people do little during the recording; the focus of attention is on the director and the narrator.

The narrator sits in a small booth separated from the control room by a sheet of soundproof glass. In front of him (and for the sake of pronominal simplicity I will assume that the narrator is male, as perhaps seventy percent of narrations are done by men) is a small table on which he places the script, a microphone into which he speaks, a cue lamp that lights up when he is to start reading, and perhaps a television monitor so that he can view the video as he speaks. He is typically a bit nervous, as he is being paid handsomely for his time and wants to do a good job with as few mistakes as possible. While even the best narrators make mistakes - stumbling over words, dropping syllables, using the wrong intonation - all want to keep their errors to a minimum, because only flawless recordings are accepted, and repeated re-recordings of phrases or sentences are time-consuming and, if the recording session goes over the scheduled time, expensive.

The narrator must concentrate intensely, for not only must he read the words correctly and clearly; he must also say them with the intonation, speed, and vocal quality that will be the most effective and pleasing to the ear. A good narrator can choose from a large repertoire of voices and speaking styles, and any distraction - particularly infelicities in the script - can keep him from doing his best. To do a good job, the narrator needs a good script.

Spell Out Symbols and Abbreviations

Consider the following sentence that might appear in a script:
The test is run at 47°C.
In a normal written translation, this sentence would not raise an eyebrow, but in a script it presents a problem: How is the narrator supposed to read it? Is it
The test is run at 47 degrees Celsius.
The test is run at 47 degrees centigrade.
The test is run at 47 degrees.
Any of these is possible, but the narrator should not have to decide. That decision should be made by the translator. Most narrators do not have technical backgrounds, and some might even read this sentence as
The test is run at 47 degree C ("see").
Even the most mathematically adept narrator is likely to stumble over the next example:
Lake Biwa contains 27.5 × 10⁹ m³ of water.
Few people can read such figures without pausing. The sentence should be written:
Lake Biwa contains 27.5 billion cubic meters of water.
In the above examples, I have written the numbers as numerals because the narrators I have worked with do not seem to have any problem reading them. Some narrators, though, may prefer to minimize the distractions and have numbers written out:
Lake Biwa contains twenty-seven point five billion cubic meters of water.
Numbers that may be read in more than one way should always be spelled out:
This highway will be completed in twenty-ten (not 2010).
An average of fifteen hundred (not 1500) people visit this museum every day.
Abbreviations present similar problems. For example:
JAPANCO manufactures APIX, TRI, and RFLEX systems.
While JAPANCO is no doubt read Japanco, what about APIX? Is it A-pix or A-P-I-X? Find out how abbreviations are read and spell them accordingly. Hyphens and periods can help:
Japanco manufactures Apix, T.R.I., and R-Flex systems.
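A quick automated pass can help catch such items before the script reaches the narrator. The following is only a minimal sketch: the pattern list and the function name flag_reading_problems are hypothetical, chosen to cover the kinds of examples in this section, and it is no substitute for reading the script aloud.

```python
import re

# Hypothetical patterns for illustration only; extend them for real scripts.
PATTERNS = {
    "symbol": re.compile(r"[°$%×]"),
    "all-caps abbreviation": re.compile(r"\b[A-Z]{2,}\b"),
    "four-digit number": re.compile(r"\b\d{4}\b"),
}

def flag_reading_problems(line):
    """Return (label, matched text) pairs for items a narrator might have to guess at."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(line):
            hits.append((label, match.group()))
    return hits

if __name__ == "__main__":
    for sentence in [
        "The test is run at 47°C.",
        "JAPANCO manufactures APIX, TRI, and RFLEX systems.",
        "This highway will be completed in 2010.",
    ]:
        print(sentence, flag_reading_problems(sentence))
```

Each flagged item still needs a human decision: the sketch can point at "APIX," but only the client can say whether it is read as a word or letter by letter.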

Use Spoken Language

Other expressions normally used only in written texts should be avoided as well. For example:
This factory is Japan's leading producer of gadgets, widgets, etc.
While some English speakers do say "et cetera" in speech, it is rarely heard in professionally written scripts, and it should be used only sparingly to translate the ubiquitous など and ら and や of Japanese. For this example, one could write just
This factory is Japan's leading producer of gadgets and widgets.
or, in the case of a public relations video,
This factory is Japan's leading producer of gadgets, widgets, and other innovative solutions for the 21st century.
Another written expression that does not convert well into speech is the parenthesis. If read aloud as written, the following example sounds strange:
We call this the Electronic Control System (ECS).
Try one of these alternatives:
We call this the Electronic Control System, or ECS.
We call this ECS, for Electronic Control System.
We call this the Electronic Control System. ECS is...

Mark Pronunciations

Unlike translators, professional narrators generally don't live in rooms surrounded by reference books. If they don't know how to pronounce a word, many are likely to guess. The translator should anticipate which words might cause difficulties and mark the reading or pronunciation accordingly. The following is a sentence from a script I translated recently:
Our researchers have developed a new excimer laser.
When I was reading the script aloud before I turned it in to my client (a practice, by the way, that translators should follow for all spoken texts), I realized that I didn't know how to pronounce "excimer." Fortunately, the McGraw-Hill Dictionary of Scientific and Technical Terms (Fifth Edition) gives a pronunciation, so my revised script read as follows:
Our researchers have developed a new excimer* laser.
*pronounced 'EKS-suh-murr'
Narrators may not understand dictionary pronunciation symbols or the International Phonetic Alphabet, so this sort of folk phonetic script (EKS-suh-murr) is best. In recent years, the New York Times has been indicating pronunciations in this way, and their style would be a good model to follow.

Strive for Mellifluousness

Just as you should not confuse the narrator with obscure words, you should also not try to twist his tongue when he is trying to speak clearly and dramatically. The following sentence, even if an accurate translation, is not advisable for a narration:
Our concerns focus particularly on particulates pollution.
All those p's and r's and l's would throw anybody reading the sentence aloud, and the alliteration might distract the listener. Try something like this instead:
We are especially concerned about particulate emissions.

Present Information Linearly

Though the narrator is the translator's immediate audience, don't forget the listener. Some expressions may be easy to read aloud but difficult to understand. For example:
The gadget, widget, and thingamabob weigh 22, 37, and 86 grams, respectively.
If you had heard that sentence spoken aloud once, would you be able to say how much the widget weighs? Since the listener can't go back and read the sentence again, it's better to recast:
The gadget weighs 22 grams, the widget 37 grams, and the thingamabob 86 grams.
Here's another nonlinear pattern - that is, a pattern in which corresponding elements do not appear near each other - that should be avoided:
The gadget and the widget are among our most popular products. We developed the former in 1993 and the latter in 1995.
Either repeat the nouns:
The gadget and the widget are among our most popular products. We developed the gadget in 1993 and the widget in 1995.
or recast the sentence:
We developed the gadget in 1993 and the widget in 1995. They are among our most popular products.
The next example would seem to present no problem:
And the price is only $1.3 million.
Any professional narrator can read this as one point three million dollars. However, some might stumble over the words while reading because the three written elements $, 1.3, and million are not read in the order in which they are written. The translator should make things easier on the narrator and write the sentence as follows:
And the price is only 1.3 million dollars.

Keep Sentences Short and Simple

A frequent problem is sentence length. Consider the following sentence taken from a Japanese video script:
Translated as a single English sentence, it might come out as:
The internal structure of the single high-strength, galvanized steel wire, 5.23 millimeters in diameter, which supports this huge suspension bridge, exactly resembles that of bamboo fiber, and the wire is comprised of cementite and ferrite.
The narrator would need capacious lungs to get this English sentence out in a single breath, and a breath taken in the middle might be audible or disrupt the sentence's flow. But even more problematic is the sentence's complex structure. In the first clause, nearly twenty words separate the subject structure from the verb resembles, and that separating phrase includes both an appositive (5.23 millimeters in diameter) and a relative clause (which supports this huge suspension bridge). This convoluted structure would make the sentence difficult for listeners to understand. Break it up into shorter, simpler sentences:
The cable supporting this huge suspension bridge is made of individual strands of strong galvanized steel wire. Each strand is 5.23 millimeters in diameter. The wire is made of cementite and ferrite, and its internal structure is very similar to that of bamboo fiber.
Here's another example from a video describing how internal combustion engines work:
A lung-emptying, listener-confusing translation might be:
In contrast to steam engines, which burn coal or other fuel to convert water into steam and move pistons or turbines by the steam's pressure, internal combustion engines inject a mixture of air and vaporized fuel directly into a closed chamber for burning and then convert the energy of the gases' expansion into motive power.
A better rendition:
Steam engines work by burning coal or other fuel to turn water into steam. The pressure of the steam drives pistons or turbines. In the internal combustion engine, a mixture of air and vaporized fuel is injected directly into a closed chamber and burned. The energy from the expanding gases then becomes the motive power.
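One rough way to spot candidates for splitting is to count the words in each sentence of a draft. The sketch below assumes a threshold of 25 words, which is an arbitrary figure of mine and not a rule from narration practice; the function name long_sentences is likewise hypothetical.

```python
import re

MAX_WORDS = 25  # assumed threshold for illustration, not an established rule

def long_sentences(text, max_words=MAX_WORDS):
    """Return (word count, sentence) for each sentence longer than max_words."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    # Decimals such as 5.23 survive, but abbreviations like "etc." split early.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(len(s.split()), s) for s in sentences if len(s.split()) > max_words]

if __name__ == "__main__":
    draft = (
        "The internal structure of the single high-strength, galvanized steel "
        "wire, 5.23 millimeters in diameter, which supports this huge "
        "suspension bridge, exactly resembles that of bamboo fiber, and the "
        "wire is comprised of cementite and ferrite."
    )
    for count, sentence in long_sentences(draft):
        print(count, "words:", sentence)
```

A word count says nothing about structure, so a short sentence with a buried subject can still pass this check and confuse the listener.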

Relative Clauses

One common cause of awkward English in Japanese-to-English translations is the preservation of Japanese relative clauses as relative clauses in English. For example, a Japanese sentence of the structure
[long relative clause modifying the subject] + [subject] + [predicate]
comes out in English as
[subject], which [long relative clause modifying the subject], [predicate]
The Japanese sentence is easy to understand because the related elements are close to each other - the relative clause immediately precedes the subject it modifies, and the subject immediately precedes its predicate. In the English sentence, though, the long relative clause separates the subject from its predicate and makes the sentence difficult to follow. Often the best way to solve this problem - not only in narration scripts but in most types of translation - is to spin off the relative clause as a separate sentence.

Semantic Order

Another reason translations often read or sound awkward is that the semantic elements of the sentences are not arranged in the most natural order. Consider the following Japanese passage and two possible English translations, especially the sentences in italics.
A. In Japan, in Asia, throughout the world, society is changing and technology is advancing at an unprecedented pace. People rely more and more on the media to understand these changes.
B. In Japan, in Asia, throughout the world, society is changing and technology is advancing at an unprecedented pace. To understand these changes, people rely more and more on the media.
As a translation, either A or B is acceptable. However, B flows more naturally than A. The first sentence (In Japan, in Asia,...) describes changes that are taking place, so the second sentence sounds more natural if changes is mentioned before the new information (rely more and more on the media). Here's another example:
A. Japanco sponsors trade shows, seminars, and symposia both in Japan and overseas. People come into contact with the latest information at these events.
B. Japanco sponsors trade shows, seminars, and symposia both in Japan and overseas. These events bring people into contact with the latest information.
Once again, B is more natural than A. The first sentence (Japanco sponsors...) describes several events, so the second sentence flows more smoothly if events - the old information - is mentioned before the new information (People come into contact with the latest information).

In linguistics, the term for the old, previously mentioned information in a sentence is theme, while the new information is called the rheme. When we speak or write our native languages, the natural flow of sentences is to have the theme at the beginning and the rheme at the end. When translating, though, we often lose this natural flow as we try to preserve the information in the source text. As a result, translated texts often contain sentences with the theme-rheme order reversed. This is one reason why even accurate translations can read awkwardly and convey information poorly.

The Last Shall Be First

Because the new information normally comes toward the end of sentences, the end of the sentence should also be the location of the most important information. The following example comes from the very end of a PR video:
A. The 21st century will present many exciting challenges. At Japanco, we are committed to meeting those challenges.
B. The 21st century will present many exciting challenges. At Japanco, those challenges will be met.
C. The 21st century will present many exciting challenges. We are committed to meeting those challenges at Japanco.
Version A has two problems. The first is a theme-rheme problem: the theme (challenges) comes after the rheme (committed to meeting). The other is that the two sentences end with the same word. While there is no grammatical reason to avoid repeating the same word at the end of consecutive sentences, the repeated word creates an awkward, distracting rhythm, especially if the sentences are spoken and short.

Version B is better, but it ends on a weak note (will be met). What remains in a listener's ear after a sentence has been spoken is the last part of the sentence, not the first. Because this video is publicity for the company Japanco, the video's final sentence is most effective if it ends with the company's name. That is why I prefer version C.

Though perhaps counterintuitive, the lesson is simple: The most important word in a sentence is the last.


While the essence of a good narration script is the language used, the physical format of the printout can also make the narration session go more smoothly.

Use large type. Like translation, narration is a profession that can be pursued well beyond middle age, and many older narrators have trouble reading small print. Use at least 14-point type for the printed script. Some narrators prefer 18-point. Serif typefaces such as Times or Palatino are usually easier to read than sans serif faces like Helvetica.

Never allow a sentence to run over from one page to the next. It is difficult to read smoothly if one's eye must jump from the bottom of one page to the top of the next. Narrations are usually recorded in blocks of one or two pages so that the microphone will not pick up the sound of rustling paper. Some word processing programs such as Microsoft Word have a paragraph style feature to keep paragraphs together automatically. If you are using other software, then you will have to insert page breaks manually.

Do not staple the pages of the script given to the narrator. If he does read two pages together, then he will have to spread the pages flat next to each other on the table. For the same reason, print the script on only one side of the paper.

Sometimes Japanese scripts do not have numbered paragraphs (also called "cues," because a visual cue is given to the narrator at the start of each paragraph). During translation and recording, however, the translator, director, and others often have to refer back and forth between the Japanese and English versions. When given an unnumbered script, the translator should number all the Japanese cues and use those numbers in the translation.

Format the text with a hanging indent and extra space between the cues:
Steam engines work by burning coal or other fuel to turn water into steam. The pressure of the steam drives pistons or turbines.

In the internal combustion engine, a mixture of air and vaporized fuel is injected directly into a closed chamber and burned.

The energy from the expanding gases then becomes the motive power.
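If the word processor at hand cannot produce this layout easily, a plain-text version can be generated with a few lines of code. The sketch below is one possible approach, not a standard tool: the function name format_cues, the 60-character line width, and the four-space indent are all assumptions made for illustration.

```python
import textwrap

def format_cues(cues, width=60, indent=4):
    """Number each cue and wrap it with a hanging indent, one blank line between cues."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        # The cue number sits in the margin; continuation lines align with the text.
        label = f"{number}. ".ljust(indent)
        blocks.append(textwrap.fill(
            cue,
            width=width,
            initial_indent=label,
            subsequent_indent=" " * indent,
        ))
    return "\n\n".join(blocks)

if __name__ == "__main__":
    print(format_cues([
        "Steam engines work by burning coal or other fuel to turn water "
        "into steam. The pressure of the steam drives pistons or turbines.",
        "In the internal combustion engine, a mixture of air and vaporized "
        "fuel is injected directly into a closed chamber and burned.",
        "The energy from the expanding gases then becomes the motive power.",
    ]))
```

The output is ragged on the right, so it sidesteps the justification question entirely; page breaks and type size still have to be handled in whatever program prints the script.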
A coordinator at a translation agency once told me that he thought that scripts were easier to read if the text was not aligned on the right margin, because then the spaces between the words were all the same length. However, one narrator told me that he preferred justified text. I am still not sure whether justification is justified.


The most common and serious problems that arise during the recording of translated narration scripts involve timing, specifically texts that do not match the recorded images and texts that are too long or too short.

The following sentence comes from the video about internal combustion engines. While it is being read, the screen shows a series of animated computer graphics: first an engine operating, then a crankshaft turning, and finally an automobile's wheels turning. The Japanese sentence reads:
If this were a written translation, the following version would be acceptable:
The wheels of the car are turned by the crankshaft, which is driven by a repeated four-part cycle of intake, compression, expansion, and exhaust.
For the video, though, this translation is unacceptable because the words do not match the images on the screen. The narrator would be reading "wheels" while the engine's operation is shown and "intake, compression, expansion, and exhaust" when the crankshaft and wheels appear. The following translation is better:
Intake, compression, expansion, and exhaust repeat in a four-part cycle to turn the car's crankshaft and wheels.
An even more common problem is texts that are too long. A sentence that takes 10 seconds to read in Japanese may take 15 or even 20 seconds to read when translated faithfully into English. It would usually be too expensive to re-edit the images, so the text must be shortened instead.

Here is a passage that would take about 20 to 22 seconds to read:
Steam engines work by burning coal or other fuel to turn water into steam. The pressure of the steam drives pistons or turbines. In the internal combustion engine, a mixture of air and vaporized fuel is injected directly into a closed chamber and burned. The energy from the expanding gases then becomes the motive power.
If only 18 or 19 seconds were available, it could be shortened:
Steam engines work by burning fuel to turn water into steam. The steam pressure drives pistons or turbines. In the internal combustion engine, a mixture of air and vaporized fuel is injected directly into a chamber and burned. The energy from the expanding gases becomes the motive power.
Here's a 14-second version:
In steam engines, fuel is burned to produce steam that drives pistons or turbines. In the internal combustion engine, an air-fuel mixture is injected into a chamber and burned. The expanding gases provide the motive power.
If only 8 seconds were available, the passage could be shortened to this:
Steam engines burn fuel to produce steam that provides the motive power. Internal combustion engines produce that power directly.
While some information has been dropped, the key point - the difference between steam engines and internal combustion engines - is preserved.

Occasionally a translation will be too short, and the translator or narrator will be asked to add text to stretch out the narration. In public relations videos, this can sometimes be done by adding boiler-plate adjectives (high-performance, superb, top-quality), but in explanatory texts it is often difficult to add information if you are not familiar with the background and purposes of the video. When asked to lengthen a text, I usually try to contact the author or client to find out what sorts of information can or should be added.

Often narrators are asked to rewrite scripts in the studio. While the narrators can usually do a good job, sometimes problems occur. If the narrator has just received the script, for example, then he may not know which elements can be deleted and which must be preserved. Timing adjustments are best done by the translator, who has read the entire script in the original language and understands the message that is supposed to be conveyed.

Of course, translators can match text to images only if they are able to see the images. Some translation agencies and other clients may not understand how important the video is to the translation, so they may not send a copy of the video to the translator. If you are asked to translate a video script, always request a copy of the video.

If the video is still not available (perhaps it hasn't been edited yet), then all you can do is keep your translation short and try to preserve the same sequence of information as in the source text, especially if the script seems to be explaining what is happening on the screen. One rule of thumb for calculating the probable length of a narration is that a Japanese narrator usually reads 5 to 6 characters per second (more with kana, less with kanji), and that an English narrator reads about 13 to 17 letters (including spaces) per second, or an average of 4 seconds per 60-space line.
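The rule of thumb above is simple arithmetic, so it can be turned into a rough estimator. The following is a sketch only: the function names reading_time and fits are hypothetical, every character (including punctuation) is counted, and checking against the slowest reading speed is a cautious choice of mine rather than part of the rule.

```python
JAPANESE_CPS = (5, 6)    # characters per second, from the rule of thumb
ENGLISH_CPS = (13, 17)   # letters per second, spaces included

def reading_time(text, cps_range):
    """Return (shortest, longest) estimated reading time in seconds."""
    slow, fast = cps_range
    n = len(text)  # counts every character, punctuation included
    return n / fast, n / slow

def fits(text, seconds_available, cps_range=ENGLISH_CPS):
    """True if the text fits the slot even at the slowest assumed reading speed."""
    _, longest = reading_time(text, cps_range)
    return longest <= seconds_available

if __name__ == "__main__":
    cue = ("Steam engines burn fuel to produce steam that provides the motive "
           "power. Internal combustion engines produce that power directly.")
    shortest, longest = reading_time(cue, ENGLISH_CPS)
    print(f"Estimated reading time: {shortest:.1f} to {longest:.1f} seconds")
```

For a 60-character English line, the estimate spans roughly 3.5 to 4.6 seconds, which brackets the 4-second average given above. The numbers are no more precise than the rule itself, so the final check is still to read the draft aloud against the video.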

When you do have the video, watch it once or twice before translating your first draft. When that draft is finished, read it aloud while watching the video to make sure that the words match the images. In addition to timing problems, you're also likely to notice awkward phrasing, terminology problems, and singulars that should be plurals and vice versa. (When reading aloud, be careful not to speak too quickly. Narration tends to be slower than normal conversation.)

Work Flow and Money

In Japan, the usual work flow for narration scripts is longer and more complex than for other translation jobs. For a typical PR video, the work flows like this:
Client (i.e., the company paying for the video) --> Advertising agency --> Video production company --> Translation agency, talent agency, language school, event production company, etc. --> Translator, narrator
Since the translator is at the very end of this long food chain (which may also include "coordinators," that is, fixers who bring the various parties together and collect a percentage), we would naturally like to get the work from a higher level than a translation agency. That is not as easy as it sounds, though. While occasionally script work may be available from the original client or, more often, from the advertising agency, usually it is the video production company that farms out the translation and narration work. There are over 600 of these video production companies in Japan, and most of them produce only a handful of translated videos a year. It is much easier for them to ask a translation agency to provide both the translation and the narration, especially since many English-speaking narrators in Japan do not speak Japanese well enough to communicate with the usually monolingual staff of video production companies.

Despite this long food chain, experienced translators of narration scripts can usually charge rates above the market average for translation. Producing a video costs a lot of money and the script is an important element in the quality of the finished video, so there is less pressure to skimp on translation expenses than with other types of work. It helps to know that narrators are paid well for the work they do. A low-end, relatively inexperienced narrator may receive 30,000 yen for an hour or two in the studio, while a top-notch narrator whose voice and talents are in demand might be paid 100,000 yen or more.


I have been a Japanese-to-English translator since 1986, and during that time I have translated several hundred narration scripts. I still remember clearly the first time I attended the recording session for a script I had translated, because I was stunned at how different the script sounded when read by a professional narrator. The words I had written, which looked so flat and dull on the page, came to full-blooded life through the narrator's rich, nuanced delivery. Whenever I have translated scripts since, I have always pictured my first reader as not my client and not the audience but the narrator.


I would like to thank the narrators I have worked with over the years for the opportunity to observe them at work and for their comments and suggestions on my scripts. I would especially like to thank Takeo Hirose (廣瀬武夫), the president of ByWord, a Tokyo agency that specializes in narration and translation, for his many valuable comments and suggestions.