LazloCatalogSortTechnique

This page describes a possible technique for sorting catalog numbers
consistently and automatically, with three examples.

Overview

This technique uses two catalog numbers per catalog number field:

      • Visible Catalog Number: This is the catalog number
        entered by the user and displayed on the site. It should be
        entered exactly as it appears on the release.
      • Shadow Catalog Number: This is a version of the visible
        catalog number that has been automatically transformed by a
        series of rules to help it sort more accurately. It is never
        visible to the end user, but is used to determine sort order
        on a label page. It is automatically updated every time the
        visible catalog number is changed or when the transformation
        ruleset changes.
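
The relationship between the two values can be sketched in Python as
follows. This is only an illustration: the class name
CatalogNumberField, the set_transform method, and the placeholder
simple_transform rule are assumptions made for this example, not part of
any existing implementation.

```python
import re
from typing import Callable

Transform = Callable[[str], str]


def simple_transform(visible: str) -> str:
    # Placeholder rule (an assumption for this sketch): uppercase the
    # value and drop everything except letters and digits.
    return re.sub(r"[^A-Z0-9]", "", visible.upper())


class CatalogNumberField:
    """Holds the visible value and keeps the shadow value in sync."""

    def __init__(self, visible: str, transform: Transform) -> None:
        self._transform = transform
        self.visible = visible  # goes through the setter below

    @property
    def visible(self) -> str:
        return self._visible

    @visible.setter
    def visible(self, value: str) -> None:
        # Any change to the visible number regenerates the shadow number.
        self._visible = value
        self.shadow = self._transform(value)

    def set_transform(self, transform: Transform) -> None:
        # A change of ruleset also regenerates the shadow number.
        self._transform = transform
        self.shadow = transform(self._visible)


field = CatalogNumberField("574 847-1", simple_transform)
print(field.visible, "->", field.shadow)  # 574 847-1 -> 5748471
```

The visible value is stored exactly as entered; only the shadow value is
ever rewritten.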

Transformation Process

The shadow catalog number is generated by running the visible catalog
number through a series of regular expression transformations. The
examples use the Unix "sed" utility to perform these transformations.

It's possible to have more than one set of transformation rules. A very
basic set of rules might be applied on all label pages, with fancier
custom rules assigned to labels on an individual basis, to solve sorting
problems that are specific to the catalog numbers the label uses.
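
One way to picture layered rule sets is a sitewide list of rules plus an
optional per-label list, as in the Python sketch below. Everything named
here is hypothetical: the label "Hypothetical Label", its single rule,
and the function shadow_for are invented for illustration, and running
the custom rules after the sitewide ones is an assumption (a real rule
set might need them to run at a different point).

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], str]

# Rules applied to every label (simplified stand-ins for a sitewide set).
SITEWIDE_RULES: List[Rule] = [
    str.upper,
    lambda s: re.sub(r"[^A-Z0-9]", "", s),
]

# Extra rules keyed by label name. The label and its rule are invented:
# this one drops a trailing "CD" format suffix.
PER_LABEL_RULES: Dict[str, List[Rule]] = {
    "Hypothetical Label": [lambda s: re.sub(r"CD$", "", s)],
}


def shadow_for(label: str, visible: str) -> str:
    """Apply the sitewide rules, then any rules registered for the label."""
    result = visible
    for rule in SITEWIDE_RULES + PER_LABEL_RULES.get(label, []):
        result = rule(result)
    return result


print(shadow_for("Hypothetical Label", "abc-12 cd"))  # ABC12
print(shadow_for("Some Other Label", "abc-12 cd"))    # ABC12CD
```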

Examples

Basic Rule Set

Here's a very basic rule set that could be applied to generate shadow
catalog numbers sitewide.

#!/usr/bin/sed -f
# Requires GNU sed: \U and \t are GNU extensions.
s/[a-z]/\U&/g
s/[^A-Z0-9]//g
s/\([0-9][0-9]*\)/\t\1\t/g
:a;s/\t\([0-9]\{1,15\}\)\t/\t0\1\t/;ta
s/\t//g

Here's what this gobbledygook does:

  • Transforms all lowercase letters to uppercase.
  • Strips out all characters except for letters and numbers.
  • Separates groups of numbers from groups of letters using tab
    characters.
  • Pads all number groups out to sixteen digits wide ("90241" becomes
    "0000000000090241", etc.)
  • Strips out the tabs.
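
For readers who find sed hard to follow, the same five steps can be
written out in Python. This is a rough port for illustration, not the
script itself; the function name basic_shadow is an assumption, and
zfill stands in for the sed padding loop (both leave number groups
longer than sixteen digits unchanged).

```python
import re


def basic_shadow(visible: str) -> str:
    # 1. Uppercase the lowercase letters a-z.
    s = re.sub(r"[a-z]", lambda m: m.group(0).upper(), visible)
    # 2. Strip everything except letters and digits.
    s = re.sub(r"[^A-Z0-9]", "", s)
    # 3. Wrap each group of digits in tab characters.
    s = re.sub(r"[0-9]+", lambda m: "\t" + m.group(0) + "\t", s)
    # 4. Left-pad each digit group with zeros to sixteen digits.
    s = re.sub(
        r"\t([0-9]+)\t",
        lambda m: "\t" + m.group(1).zfill(16) + "\t",
        s,
    )
    # 5. Strip the tabs again.
    return s.replace("\t", "")


print(basic_shadow("90241"))      # 0000000000090241
print(basic_shadow("574 847-1"))  # 0000000005748471
print(basic_shadow("mer 310"))    # MER0000000000000310
```

Keeping the tab-wrapping and padding as separate steps mirrors the sed
script and leaves room for extra rules to be slotted in between them.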

Other Examples (not planned to be implemented as of 14th Oct 2008)

Mercury Records, Basic Transformation

LazloCatalogSortExample1 is an example of how the Mercury label page
sorts after these rules have been used to generate "shadow" versions of
the existing catalog numbers.
If you look it over you'll see that the changes are subtle. Spacing and
placing of dashes, for example, no longer matter: "574865-2" now comes
between "574 847-1" and "574 902-2" instead of after "574 975-1"; the
"422" series is now interspersed instead of split into "422"+space vs.
"422"+dash, and so forth. This alone is a major improvement over the
current sorting system, and would benefit the site even if it were the
only change implemented: catalog numbers can be entered exactly as they
appear and still sort without needing to manually add extra spaces and
other display-oriented tweaks.
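
The effect on the "574" numbers mentioned above can be reproduced with a
short Python sketch. The shadow function is a condensed stand-in for the
basic rule set (an assumption for this example, adequate for plain ASCII
input), and the list holds only the four catalog numbers quoted in the
paragraph.

```python
import re


def shadow(visible: str) -> str:
    # Condensed basic rules: uppercase, keep only A-Z and 0-9, then
    # zero-pad every group of digits to sixteen digits.
    cleaned = re.sub(r"[^A-Z0-9]", "", visible.upper())
    return re.sub(r"[0-9]+", lambda m: m.group(0).zfill(16), cleaned)


visible_numbers = ["574 975-1", "574865-2", "574 902-2", "574 847-1"]

# Sorting on the visible text: the space sorts before any digit, so
# "574865-2" lands after every "574 ..." entry.
plain_order = sorted(visible_numbers)

# Sorting on the shadow value: spacing and dashes no longer matter.
shadow_order = sorted(visible_numbers, key=shadow)

print(plain_order)   # ['574 847-1', '574 902-2', '574 975-1', '574865-2']
print(shadow_order)  # ['574 847-1', '574865-2', '574 902-2', '574 975-1']
```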

Mercury Records, Complex Transformation

But if we can create specific per-label rule sets we can get much
fancier. LazloCatalogSortExample2
is an example of how the Mercury page might be sorted if we added a
handful of additional rules that are specific to how the label formats
its catalog numbers. The goal is to cluster related releases together
even when a conventional catalog number sort can't do so. Mercury UK
uses different prefixes for different formats of the same single, which
will all share a common sequence number -- for example, the 7" of
Electribe 101's "Tell Me When The Fever Ended" was MER 310, the CD
single was MERCD 310, the 12" was MERX 310 and the remix 12" MERXR 310.
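By adding two rules, applied while the tab separators are still in
place (that is, before the final tab-stripping step), we get all these
different formats to sort together: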

s/^\(MER[A-Z]*\)\t\(.*\)$/MER\t\2\t\1/
s/^\(MRX[A-Z]*\)\t\(.*\)$/MER\t\2\t\1/

(These rules need some tweaking to handle album releases, which use the
MERL/MERH prefixes, but you get the idea.) Further rules allow shortened
catalog numbers to be listed in sequence with full catalog numbers. For
example, Van Morrison's "No Guru, No Method, No Teacher" LP (entered as
catalog number 422-830 077-1 M-1) can be listed next to the CD (entered
as catalog number 830 077-2, omitting the "422" prefix) by adding one
more rule, which has to run before the padding step so that the number
group is still seven digits long:

s/\t\(8[0-9]\{6\}\)\t/\t422\1\t/

And so forth.
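
To see how these label-specific rules fit into the basic pipeline, here
is a Python sketch that ports the MER prefix rule and the "422" rule.
The function name mercury_shadow and the placement of each rule are
assumptions: the "422" rule is run before padding and the MER rule
after it, both while the tabs are still present. "MER 311" is a made-up
catalog number, included only to show that the 310 formats stay
together.

```python
import re


def mercury_shadow(visible: str) -> str:
    s = re.sub(r"[^A-Z0-9]", "", visible.upper())
    # Wrap each group of digits in tabs, as in the basic rule set.
    s = re.sub(r"[0-9]+", lambda m: "\t" + m.group(0) + "\t", s)
    # Label rule: restore an omitted "422" prefix on a seven-digit
    # group starting with 8 (first match only, like sed without /g).
    s = re.sub(
        r"\t(8[0-9]{6})\t",
        lambda m: "\t422" + m.group(1) + "\t",
        s,
        count=1,
    )
    # Pad each digit group to sixteen digits.
    s = re.sub(
        r"\t([0-9]+)\t",
        lambda m: "\t" + m.group(1).zfill(16) + "\t",
        s,
    )
    # Label rule: sort on "MER" plus the sequence number first, with
    # the full format prefix (MER, MERCD, MERX, ...) moved to the end.
    s = re.sub(
        r"^(MER[A-Z]*)\t(.*)$",
        lambda m: "MER\t" + m.group(2) + "\t" + m.group(1),
        s,
    )
    return s.replace("\t", "")


singles = ["MERX 310", "MER 311", "MERCD 310", "MER 310", "MERXR 310"]
print(sorted(singles, key=mercury_shadow))
# ['MER 310', 'MERCD 310', 'MERX 310', 'MERXR 310', 'MER 311']

albums = ["830 077-2", "422-830 077-1 M-1"]
print(sorted(albums, key=mercury_shadow))
# ['422-830 077-1 M-1', '830 077-2']
```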

ZTT Records, Complex Transformation

As an example of how complex a label's catalog number can be and still
be processed using this system,
LazloCatalogSortExampleZTT
shows the ZTT page as it sorts
using a series of rules that are specific to this label. ZTT's catalog
scheme is notoriously complicated and inconsistent, but a set of rules
can still be devised that allows releases to be clustered in a sensible
manner.