LazloCatalogSortTechnique

This page describes a possible technique for sorting catalog numbers consistently and automatically, with three examples.

Overview

This technique uses two catalog numbers per catalog number field:

      • Visible Catalog Number: This is the catalog number entered by the user and displayed on the site. It should be entered exactly as it appears on the release.
      • Shadow Catalog Number: This is a version of the visible catalog number that has been automatically transformed by a series of rules to help it sort more accurately. It is never visible to the end user, but is used to determine sort order on a label page. It is automatically updated every time the visible catalog number is changed or when the transformation ruleset changes.

Transformation Process

The shadow catalog number is generated by running the visible catalog number through a series of regular expression transformations. The examples use the Unix "sed" utility to perform these transformations.

It's possible to have more than one set of transformation rules. A very basic set of rules might be applied on all label pages, with fancier custom rules assigned to labels on an individual basis, to solve sorting problems that are specific to the catalog numbers the label uses.
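One way to picture the per-label selection is the sketch below. It is only an illustration: the directory layout, the file names and the helper name shadow_for_label are all hypothetical, and it assumes each label-specific file holds a complete rule set rather than an add-on to the basic one.

```shell
#!/bin/sh
# Hypothetical layout (not part of the proposal):
#   $RULES_DIR/basic.sed    sitewide rule set
#   $RULES_DIR/<label>.sed  optional complete rule set for a single label
RULES_DIR=${RULES_DIR:-./rules}

# shadow_for_label LABEL CATNO
# Prints the shadow catalog number, using the label's own rule file when
# one exists and falling back to the basic rule set otherwise.
shadow_for_label() {
  label=$1
  catno=$2
  rules="$RULES_DIR/$label.sed"
  [ -f "$rules" ] || rules="$RULES_DIR/basic.sed"
  printf '%s\n' "$catno" | LC_ALL=C sed -f "$rules"
}
```

A real implementation would also need to map label names to safe file names; the sketch assumes the label name can be used as-is.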

Examples

Basic Rule Set

Here's a very basic rule set that could be applied to generate shadow catalog numbers sitewide.

#!/usr/bin/sed -f
s/[a-z]/\U&/g
s/[^A-Z0-9]//g
s/\([0-9][0-9]*\)/\t\1\t/g
:a;s/\t\([0-9]\{1,15\}\)\t/\t0\1\t/;ta
s/\t//g

Here's what this gobbledygook does:

  • Transforms all lowercase letters to uppercase.
  • Strips out all characters except for letters and numbers.
  • Separates groups of numbers from groups of letters using tab characters.
  • Pads all number groups out to sixteen digits wide ("90241" becomes "0000000000090241", etc.)
  • Strips out the tabs.
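
To try the basic rule set on individual catalog numbers, the same five steps can be wrapped in a small shell function. This is a sketch that assumes GNU sed, since the rules rely on its \U and \t extensions; the function name shadow is made up for the example.

```shell
#!/bin/sh
# shadow CATNO
# Prints the shadow catalog number produced by the basic rule set.
# Assumes GNU sed (\U and \t are GNU extensions).
shadow() {
  printf '%s\n' "$1" | LC_ALL=C sed \
    -e 's/[a-z]/\U&/g' \
    -e 's/[^A-Z0-9]//g' \
    -e 's/\([0-9][0-9]*\)/\t\1\t/g' \
    -e ':a' -e 's/\t\([0-9]\{1,15\}\)\t/\t0\1\t/' -e 'ta' \
    -e 's/\t//g'
}

shadow '90241'      # prints 0000000000090241
shadow '574 847-1'  # prints 0000000005748471
shadow 'merx 310'   # prints MERX0000000000000310
```

Each run of digits is padded separately, so a number with several digit groups produces several sixteen-digit blocks in its shadow version.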

Other Examples (not planned to be implemented as of 14th Oct 2008)

Mercury Records, Basic Transformation

LazloCatalogSortExample1 is an example of how the Mercury label page 1 sorts after these rules have been used to generate "shadow" versions of the existing catalog numbers. If you look it over you'll see that the changes are subtle. Spacing and placing of dashes, for example, no longer matter: "574865-2" now comes between "574 847-1" and "574 902-2" instead of after "574 975-1"; the "422" series is now interspersed instead of split into "422"+space vs. "422"+dash, and so forth. This alone is a major improvement over the current sorting system, and would benefit the site even if it were the only change implemented: catalog numbers can be entered exactly as they appear and still sort without needing to manually add extra spaces and other display-oriented tweaks.
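
The effect on the "574" numbers mentioned above can be reproduced with a short pipeline. The sketch below is one possible arrangement, not the site's actual sorting code: it assumes GNU sed, uses a made-up function name (sort_by_shadow), and keeps the visible number in sed's hold space so that each line can be emitted as shadow, tab, visible before sorting.

```shell
#!/bin/sh
# sort_by_shadow
# Reads visible catalog numbers on stdin (one per line) and prints them
# ordered by their shadow versions. Assumes GNU sed.
sort_by_shadow() {
  LC_ALL=C sed \
    -e 'h' \
    -e 's/[a-z]/\U&/g' \
    -e 's/[^A-Z0-9]//g' \
    -e 's/\([0-9][0-9]*\)/\t\1\t/g' \
    -e ':a' -e 's/\t\([0-9]\{1,15\}\)\t/\t0\1\t/' -e 'ta' \
    -e 's/\t//g' \
    -e 'G' \
    -e 's/\n/\t/' \
  | LC_ALL=C sort | cut -f2-
}

# 'h' saves the visible number, the basic rules build the shadow number,
# 'G' appends the saved visible number, and the last rule joins the two
# with a tab so that sort orders by shadow and cut drops it again.
printf '%s\n' '574 847-1' '574 902-2' '574 975-1' '574865-2' | sort_by_shadow
```

With this input, "574865-2" is printed second, between "574 847-1" and "574 902-2", whereas a plain byte-wise sort of the visible numbers would leave it last.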

Mercury Records, Complex Transformation

But if we can create specific per-label rule sets we can get much fancier. LazloCatalogSortExample2 is an example of how the Mercury page might be sorted if we added a handful of additional rules that are specific to how the label formats its catalog numbers. The goal is to cluster related releases together even when a conventional catalog number sort can't do so. Mercury UK uses different prefixes for different formats of the same single, which will all share a common sequence number -- for example, the 7" of Electribe 101's "Tell Me When The Fever Ended" was MER 310, the CD single was MERCD 310, the 12" was MERX 310 and the remix 12" MERXR 310. By adding two rules, we get all these different formats to sort together:

s/^\(MER[A-Z]*\)\t\(.*\)$/MER\t\2\t\1/
s/^\(MRX[A-Z]*\)\t\(.*\)$/MER\t\2\t\1/

(These rules need some tweaking to handle album releases, which use the MERL/MERH prefixes, but you get the idea.) A further rule allows shortened catalog numbers to be listed in sequence with full catalog numbers. For example, Van Morrison's "No Guru, No Method, No Teacher" LP (entered as catalog number 422-830 077-1 M-1) can be listed next to the CD (entered as catalog number 830 077-2, omitting the "422" prefix) by adding one more rule:

s/\t\(8[0-9]\{6\}\)\t/\t422\1\t/

And so forth.
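
The sketch below shows one way the Mercury-specific rules might be combined with the basic rule set. The placement is an assumption: because the extra rules match tab characters, they are inserted after the step that separates number groups with tabs and before the padding step. Only the MER prefix rule and the "422" rule are included, the function name mercury_shadow is hypothetical, and GNU sed is assumed.

```shell
#!/bin/sh
# mercury_shadow CATNO
# Basic rule set plus two of the Mercury-specific rules (assumes GNU sed).
mercury_shadow() {
  printf '%s\n' "$1" | LC_ALL=C sed \
    -e 's/[a-z]/\U&/g' \
    -e 's/[^A-Z0-9]//g' \
    -e 's/\([0-9][0-9]*\)/\t\1\t/g' \
    -e 's/^\(MER[A-Z]*\)\t\(.*\)$/MER\t\2\t\1/' \
    -e 's/\t\(8[0-9]\{6\}\)\t/\t422\1\t/' \
    -e ':a' -e 's/\t\([0-9]\{1,15\}\)\t/\t0\1\t/' -e 'ta' \
    -e 's/\t//g'
}

# The format-specific prefix moves behind the sequence number, so every
# format of single 310 shares the leading "MER0000000000000310".
mercury_shadow 'MER 310'     # prints MER0000000000000310MER
mercury_shadow 'MERCD 310'   # prints MER0000000000000310MERCD
mercury_shadow 'MERXR 310'   # prints MER0000000000000310MERXR

# The shortened CD number gains the "422" prefix and lands next to the LP.
mercury_shadow '422-830 077-1 M-1'  # prints 0000004228300771M0000000000000001
mercury_shadow '830 077-2'          # prints 0000004228300772
```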

ZTT Records, Complex Transformation

As an example of how complex a label's catalog number can be and still be processed using this system, LazloCatalogSortExampleZTT shows the ZTT page 2 as it sorts using a series of rules that are specific to this label. ZTT's catalog scheme is notoriously complicated and inconsistent, but a set of rules can still be devised that allows releases to be clustered in a sensible manner.