UPC-A and EAN-13 barcodes

Barcodes started appearing on retail LP sleeves in the late 1970s, but adoption was slow in the industry; it wasn't until circa 1986 that major-label releases consistently had barcodes, and there were still exceptions after that.

There are two main standard barcode formats for retail merchandise: UPC-A and EAN-13. The bars of a UPC-A barcode encode a 12-digit number; the bars of an EAN-13 barcode encode a 13-digit number.

UPC-A is in effect a subset of EAN-13: the bars of a UPC-A barcode are the same as those of an EAN-13 barcode whose leading digit is 0. Both formats nowadays share a common "EAN/UPC" definition in the GS1 standard.

General format

The actual standards are complex, but the digits represented by EAN/UPC can be thought of in this simplified way:

  • a 1-digit region code / country flag (0 or omitted for North America)
  • a 5-digit company prefix (assigned by GS1 to identify the manufacturer)
  • a 6-digit article number (assigned by the manufacturer to identify the item)
  • a 1-digit check character (a digit calculated from the others)
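
As a minimal sketch of how the check character in the last bullet is derived: the source does not spell out the rule, so the code below assumes the standard GS1 mod-10 scheme (weights 3 and 1 alternating from the rightmost data digit). The function names are illustrative only.

```python
def gs1_check_digit(data_digits: str) -> int:
    """Check digit for the data digits of a UPC-A (11 digits) or EAN-13 (12 digits).

    Assumes the standard GS1 mod-10 rule: weights alternate 3, 1, 3, ...
    starting from the rightmost data digit.
    """
    if not (data_digits.isascii() and data_digits.isdigit()):
        raise ValueError("expected digits only")
    total = 0
    for i, ch in enumerate(reversed(data_digits)):
        weight = 3 if i % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def has_valid_check_digit(code: str) -> bool:
    """True if a full 12- or 13-digit string ends in the expected check digit."""
    if len(code) not in (12, 13) or not (code.isascii() and code.isdigit()):
        return False
    return gs1_check_digit(code[:-1]) == int(code[-1])


# The 12- and 13-digit forms of the same number share one check digit,
# because the extra leading zero adds nothing to the weighted sum.
print(gs1_check_digit("07777943102"))    # data digits of 077779431021
print(gs1_check_digit("007777943102"))   # data digits of 0077779431021
```

Because the weights are counted from the right, the same function works for both lengths without any special casing.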

For the main 12 digits (the ones after the optional region code), each digit is directly represented by two bars and two spaces of variable width. The region code itself has no bars of its own; it is implied by the choice of bar/space patterns used for encoding the first six of those 12 digits. For a valid UPC-A, the implied region code must be 0. Thus, when the region code is 0, the bar/space patterns of UPC-A and EAN-13 are identical.
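
The following sketch illustrates that point under stated assumptions: it uses the commonly published EAN/UPC pattern tables (the "L", "G" and "R" sets, each digit being 7 modules wide) and the usual table that maps the implied leading digit to a mix of L and G patterns for the left half. None of these tables appear in the text above, and the names `modules`, `L_CODES`, `PARITY` and so on are made up for illustration. The encoder does not validate check digits.

```python
# L patterns for digits 0-9; "1" is a bar module, "0" is a space module.
L_CODES = ["0001101", "0011001", "0010011", "0111101", "0100011",
           "0110001", "0101111", "0111011", "0110111", "0001011"]
# R patterns are the bitwise complement of L; G patterns are R reversed.
R_CODES = ["".join("1" if b == "0" else "0" for b in p) for p in L_CODES]
G_CODES = [p[::-1] for p in R_CODES]

# Pattern set used by each of the six left-half digits,
# indexed by the implied leading digit (the region code).
PARITY = ["LLLLLL", "LLGLGG", "LLGGLG", "LLGGGL", "LGLLGG",
          "LGGLLG", "LGGGLL", "LGLGLG", "LGLGGL", "LGGLGL"]


def modules(code: str) -> str:
    """Return the bar/space modules for a 12-digit or 13-digit code."""
    if len(code) == 12:            # UPC-A: the implied leading digit is 0
        code = "0" + code
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected 12 or 13 digits")
    first, left, right = int(code[0]), code[1:7], code[7:]
    out = ["101"]                                   # start guard
    for digit, kind in zip(left, PARITY[first]):
        table = L_CODES if kind == "L" else G_CODES
        out.append(table[int(digit)])
    out.append("01010")                             # middle guard
    out.extend(R_CODES[int(d)] for d in right)
    out.append("101")                               # end guard
    return "".join(out)


# Same bars whether the number is given as 12 digits or as 13 with a leading 0.
print(modules("077779431021") == modules("0077779431021"))
```

With a non-zero leading digit, only the left half changes (some digits switch from L to G patterns); the right half and the guards stay the same, which is how a reader can infer the unprinted first digit.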

A pair of extra-long guard bars always appears in the middle and on each end of the barcode.

A human-readable interpretation (HRI) version of the barcode is often printed below the bars, sometimes with the first and/or last digit separately to one side.

The split of the 11 digits between company prefix and article number is flexible; some companies may be assigned a longer prefix, leaving fewer digits for unique article numbers.

Distinguishing UPC-A from EAN-13

Visual differences

UPC-A and EAN-13 differ in the length of the bars:

  • In UPC-A, the bars representing the first and last digits of the encoded string are extended to be the same length as the guard bars.
  • In EAN-13, only the guard bars are extended; the bars for all of the digits are the same length.

UPC-A and EAN-13 differ in how the HRI text is presented:

  • The EAN-13 HRI normally shows 13 digits, all on the same line, with the region code to the left of the first set of guard bars, then two groups of six digits in between the other guard bars.
  • The UPC-A HRI normally shows 10, 11, or 12 digits, with two groups of five between the guard bars, and with the first and last digits (if present) smaller and respectively positioned high and to the left and right of the outer guard bars.
  • The HRI text of any barcode may have dashes to match the catalog number or simplify manual entry. None of these extra characters are encoded in the bars; they are just for the benefit of humans. Although these dashes are more common in the HRI of UPC-A barcodes, their presence does not reliably indicate the type of barcode.
  • The HRI of an EAN-13 barcode sometimes ends with > as a typesetting cue to ensure a sufficient margin to the right of the bars. Although any type of barcode may have < or > on the left or right, respectively, these characters are usually not found on UPC-A because there are usually already human-readable digits in or under both margins.

Japan has its own version of EAN-13 called JAN (Japanese Article Number). The bars are the same as EAN-13, and thus encode 13 digits like normal. However, on older items, the HRI may be printed in a machine-readable manner compatible with legacy OCR (optical character recognition) devices unique to Japan. When the HRI is machine-readable, the font is special, the bars do not extend down in between any numbers, and the text is preceded by the letter T.

Scanner behaviour

A barcode reader only reads across the pattern of black and white bars. It does not read the HRI, and it does not check the length of the bars. Therefore, the scanner does not know whether it is reading UPC-A or EAN-13, unless it finds that the first six digits in the bars use one of the patterns that implies a non-zero region code. In other words, when the region code is zero, a barcode reader cannot be trusted to correctly identify the type of barcode; it could be UPC-A or EAN-13.

Any EAN scanner can read a UPC-A barcode, but an old UPC-only scanner cannot read an EAN-13 barcode at all unless the region code is zero.

Scanners manufactured after 2004 can read any UPC/EAN, and will internally interpret the bars as a 13-digit string. However, if the first digit is zero, then depending on how the reader is configured, it may report the string as an EAN with all 13 digits, or for compatibility with old UPC systems, it may drop the initial zero and report the string as a 12-digit UPC.
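
A rough sketch of that reporting behaviour, assuming a reader that always holds the decode internally as 13 digits. The function names and the `upc_compat` flag are hypothetical and do not correspond to any particular scanner's settings.

```python
def report(decoded13: str, upc_compat: bool = False) -> str:
    """Mimic how a reader might report an internally 13-digit decode."""
    if len(decoded13) != 13 or not (decoded13.isascii() and decoded13.isdigit()):
        raise ValueError("expected 13 digits")
    if upc_compat and decoded13[0] == "0":
        return decoded13[1:]       # drop the leading zero: 12-digit UPC form
    return decoded13               # otherwise report all 13 digits


def possible_types(decoded13: str) -> list[str]:
    """Barcode types consistent with the bars alone (the HRI is not consulted)."""
    if decoded13.startswith("0"):
        return ["UPC-A", "EAN-13"]     # ambiguous: the bars are identical
    return ["EAN-13"]


print(report("0077779431021"))                    # 13-digit EAN form
print(report("0077779431021", upc_compat=True))   # 12-digit UPC form
print(possible_types("0077779431021"))
```

The second function reflects why the reported string cannot settle the printed barcode type when the first digit is zero.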

Free barcode-reading apps for smartphones and tablets often just report everything as an EAN.

Examples

Here is the back cover of a UK-market CD showing an EAN-13 barcode:

Here is the back cover of the corresponding US-market CD showing a UPC-A barcode:

The bar patterns, as seen by a scanner, are exactly the same. Regardless of which item is scanned, an EAN scanner will report 13 digits (0077779431021), and a UPC scanner (or a modern scanner programmed for UPC compatibility) will report 12 digits (077779431021).

It is factual to say:

  • The US item has a "UPC" or "UPC-A" type of barcode.
  • The UK item has an "EAN" or "EAN-13" type of barcode.
  • The bars on the US item represent a 12-digit string called a "UPC" or "GTIN-12".
  • The bars on the UK item represent a 13-digit string called an "EAN" or "GTIN-13".

The GS1 standard dictates that a UPC-A barcode may be decoded as a 13-digit number by adding an implied leading zero to the GTIN-12. Because of this, and because of the way scanners and vendor databases operate, it is arguably valid to say:

  • Each item has a "UPC" which is the 12-digit interpretation of the bars.
  • Each item has an "EAN" which is the 13-digit interpretation of the bars.

This overlap and ambiguity in the terminology make it difficult to precisely describe Barcode fields on Discogs. A description of "EAN", for example, may refer to the type of barcode image, or it may refer to a given scanner's numeric interpretation of the bars. It would be confusing to refer to an "EAN" on a release which has a UPC-A type of barcode, even though the bars can be interpreted as a 13-digit EAN string. Therefore, although it is allowed to enter the 13-digit string, most users do not mention the type of barcode at all unless both types are printed on the same release (a very uncommon situation pictured below). If you do wish to mention the type of barcode, do not just rely on the barcode reader; use your eyes to confirm the actual type of barcode.

Here is an unusual example showing the back cover of a CD marketed in the US and Germany with different barcode types (and entirely different digit strings) for each region:

Add-on symbols

Sometimes, a UPC/EAN barcode is followed by an "add-on symbol": a secondary, supplemental barcode which encodes only two digits (or sometimes five), with the HRI printed above instead of below the bars. For example, the short bars on the right side of this image, with "97" above them, constitute a 2-digit add-on to the UPC-A barcode on the left:

These extra digits are for things like magazine issue or product version numbers. For example, CBS Records in the US & Canada sometimes used add-ons for reissue numbering: the first release on a particular format might have no add-on, a reissue might have an 02, the second reissue 03, and so on.

An add-on symbol is not considered part of the primary barcode. Some scanners ignore the add-on and only report what is in the main barcode.

Mythbusting

  • There is no central registry of barcodes.
  • Errors are sometimes made when printing barcodes on items. The barcode may be unscannable, may scan incorrectly, or may belong to a different item. This may be accidental, although on bootlegs it is usually intentional.
  • Sometimes the HRI does not match what is in the bars. Never transcribe the text and then describe it as "Scanned"; it is only scanned if the bars were read and decoded by a machine.
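
To check a transcribed HRI against a real scan, something like the following sketch could be used. It assumes the scan result is a plain 12- or 13-digit string and that any non-digit characters in the printed text (dashes, spaces, <, >, or a leading T) are not encoded in the bars. The helper names and the sample HRI strings are hypothetical.

```python
import re


def hri_digits(hri_text: str) -> str:
    """Keep only the ASCII digits of printed HRI text."""
    return re.sub(r"[^0-9]", "", hri_text)


def to_13(digits: str) -> str:
    """Pad a 12-digit UPC form to 13 digits; leave a 13-digit form alone."""
    if len(digits) == 12:
        return "0" + digits
    if len(digits) == 13:
        return digits
    raise ValueError("expected 12 or 13 digits")


def hri_matches_scan(hri_text: str, scanned: str) -> bool:
    """Compare printed text with a machine scan, ignoring a leading zero."""
    printed = hri_digits(hri_text)
    full = to_13(scanned)
    if len(printed) in (12, 13):
        return to_13(printed) == full
    # Some UPC-A prints show only 10 or 11 digits; the best that can be
    # checked is that they appear, in order, inside the scanned string.
    return len(printed) >= 10 and printed in full


# Made-up HRI layouts for the number used in the examples above.
print(hri_matches_scan("0 7777-94310-2 1", "0077779431021"))
print(hri_matches_scan("7777-94310-2", "077779431021"))
```

A match here only says the printed text agrees with the bars; it is still the scan, not the transcription, that justifies the label "Scanned".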

Notable commentary

UPC/EAN inventor George J. Laurer commented on his blog that from a technical capability standpoint, UPC-A was "always" meant to represent a 13-digit code. He made it capable of 12 digits, circa 1973, and expanded it in a clever way to 13 digits several years later, when he devised the EAN-13 encoding. He says that for political reasons, GS1 and its predecessors never accepted this point of view, instead promoting the idea that UPC-A contains only 10, 11, or 12 digits. To this day, GS1 requires that UPC-A only be used for 12-digit codes, and forbids UPC-A from using the bar patterns for non-zero country codes. Laurer believes this arbitrary restriction will eventually be lifted.

References