Music recommendations by source, not by history.

Signal is a recommendation engine that asks a different question than Spotify does. Instead of “what would this user listen to next”, it asks “what is this music actually adjacent to.” It reads production credits, label lineage, audio embeddings, tag folksonomies, and editorial writing — and returns results with the reasoning attached. Cold-start by design.

Language Python

Framework FastAPI

Deployment Railway

Data Supabase + pgvector

Auth Supabase JWT

Where the knowledge comes from.

Signal doesn’t own any music. It owns a knowledge graph assembled from four public or licensed sources, pre-computed once and kept warm in Postgres with pgvector.

01 · ~16M releases

Discogs

Monthly XML dumps ingested locally. Production credits, boutique label associations, precise style/genre tags, catalog numbers. This is where the long tail lives.

02 · ~35M recordings

MusicBrainz

Bulk export plus incremental API. Artist relationships (influences, members, collaborators) and recording-level ISRCs that anchor the rest of the graph.

03 · Tag folksonomy

Last.fm

Community-sourced genre tags and artist-similarity scores. Imperfect but cultural: captures how listeners categorize music, not how labels market it.

04 · 512-dim embeddings

Deezer → CLAP

30-second audio previews from Deezer, embedded with LAION’s CLAP model. The resulting vector space lets sonic similarity sit alongside textual similarity in the same index.
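
A nearest-neighbour lookup over those embeddings could be expressed in pgvector roughly as follows. This is a minimal sketch, not Signal's actual query: the `track_embeddings` table, its column names, and the psycopg-style named placeholders are assumptions made for illustration. The `<=>` operator is pgvector's cosine distance, so similarity is one minus that value.

```python
from typing import Sequence

def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Hypothetical schema: track_embeddings(track_id, embedding vector(512)).
NEAREST_SQL = """
SELECT track_id,
       1 - (embedding <=> %(q)s::vector) AS cosine_similarity
FROM track_embeddings
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""

def nearest_tracks_query(query_embedding: Sequence[float], k: int = 50):
    """Return the SQL text and parameters; executing them is left to the caller."""
    return NEAREST_SQL, {"q": to_vector_literal(query_embedding), "k": k}
```

The function only builds the statement, so it can be checked without a database; a caller would pass both values to its driver's `execute`.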

How a recommendation happens.

Every request to GET /recommend goes through the same five stages. Stateless, no user history, no personalization. The result is a ranked list of ten items with explanations.

Stage 1

Entity resolution

Fuzzy match the query against artists, tracks, and genres in Postgres. “Nin” resolves to Nine Inch Nails; “shoegaze” resolves to a genre tag.
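
One way a short query like "Nin" could land on the right entity is to compare it against both the full name and the name's initials. The sketch below is an assumption about how such matching might work, written with Python's standard `difflib`; the page only says the match happens in Postgres, and the toy `CATALOG`, the `threshold` value, and the initials trick are all hypothetical.

```python
from difflib import SequenceMatcher

# Toy catalogue; the (name, kind) rows are illustrative only.
CATALOG = [
    ("Nine Inch Nails", "artist"),
    ("Ministry", "artist"),
    ("Nina Simone", "artist"),
    ("shoegaze", "genre"),
    ("industrial", "genre"),
]

def initials(name):
    """First letter of each word, lowercased: 'Nine Inch Nails' -> 'nin'."""
    return "".join(word[0] for word in name.split()).lower()

def match_score(query, name):
    """Best of two string ratios: against the full name and against its initials."""
    q = query.strip().lower()
    direct = SequenceMatcher(None, q, name.lower()).ratio()
    # Initials only make sense for multi-word names.
    acronym = SequenceMatcher(None, q, initials(name)).ratio() if " " in name else 0.0
    return max(direct, acronym)

def resolve(query, catalog=CATALOG, threshold=0.6):
    """Return the best (name, kind) row, or None if nothing is close enough."""
    best = max(catalog, key=lambda row: match_score(query, row[0]))
    return best if match_score(query, best[0]) >= threshold else None
```

With this toy data, `resolve("Nin")` picks the artist row for Nine Inch Nails and `resolve("shoegaze")` picks the genre row.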

Stage 2

Seed selection

For an artist, pull the primary tracks. For a track, use the track directly. For a genre, sample tracks tagged with it. Seeds have CLAP embeddings and structured tags attached.
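
The three cases can be sketched as a single dispatch on entity type. Everything concrete here is assumed for illustration: the `TRACKS` rows, the `rank` field standing in for whatever makes a track "primary", and the default of three seeds.

```python
import random

# Toy rows; "rank" is a hypothetical stand-in for what makes a track primary.
TRACKS = [
    {"id": "t1", "artist": "a1", "tags": {"industrial"}, "rank": 1},
    {"id": "t2", "artist": "a1", "tags": {"industrial", "ambient"}, "rank": 2},
    {"id": "t3", "artist": "a2", "tags": {"shoegaze"}, "rank": 1},
    {"id": "t4", "artist": "a3", "tags": {"shoegaze"}, "rank": 1},
]

def select_seeds(kind, key, tracks=TRACKS, n=3, rng=None):
    """Pick seed tracks for a resolved entity of the given kind."""
    if kind == "track":
        # A track is its own seed.
        return [t for t in tracks if t["id"] == key]
    if kind == "artist":
        # Take the artist's top-ranked tracks.
        own = [t for t in tracks if t["artist"] == key]
        return sorted(own, key=lambda t: t["rank"])[:n]
    if kind == "genre":
        # Sample from everything carrying the tag.
        tagged = [t for t in tracks if key in t["tags"]]
        rng = rng or random.Random(0)
        return rng.sample(tagged, min(n, len(tagged)))
    raise ValueError(f"unknown entity type: {kind}")
```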

Stage 3

Candidate gathering

Three parallel paths: cosine similarity search on the audio embedding centroid; knowledge-graph traversal across artist_similarity, co-mentions, and shared labels; tag intersection on style/genre metadata.
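
To make the audio path and the merge concrete, here is a small sketch using two-dimensional vectors in place of 512-dimensional CLAP embeddings. The function names and toy data are hypothetical, the graph and tag results are passed in as plain lists, and the three paths run one after another here rather than in parallel.

```python
import math

def centroid(vectors):
    """Component-wise mean of the seed embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def audio_path(seed_vectors, index, k=2):
    """Ids of the k tracks closest to the seed centroid."""
    c = centroid(seed_vectors)
    ranked = sorted(index, key=lambda tid: cosine(c, index[tid]), reverse=True)
    return ranked[:k]

def gather(audio_ids, graph_ids, tag_ids):
    """Union the three paths, remembering which path(s) surfaced each id."""
    found = {}
    for path, ids in (("audio", audio_ids), ("graph", graph_ids), ("tags", tag_ids)):
        for tid in ids:
            found.setdefault(tid, []).append(path)
    return found

seeds = [[1.0, 0.0], [0.0, 1.0]]
index = {"t1": [1.0, 1.0], "t2": [1.0, 0.0], "t3": [-1.0, -1.0]}
candidates = gather(audio_path(seeds, index), ["t2", "t4"], ["t4"])
```

Keeping track of which path found a candidate is one plausible way to feed the explanations that are attached later.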

Stage 4

Late-fusion scoring

Each candidate is scored across seven weighted signals. No single score decides the result; the fusion does.

Stage 5

Diversity filter

Determinantal-point-process–inspired caps: no more than three from the same genre, one per artist, and two slots reserved for exploration picks that don’t dominate the centroid.
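
The caps can be approximated with a plain greedy pass over the ranked list. This sketch is not a determinantal point process, only the cap-based filtering described above. It assumes each candidate carries `artist` and `genre` fields and a hypothetical `explore` flag marking exploration picks, since the page does not say how those are identified.

```python
from collections import Counter

def diversify(ranked, limit=10, per_genre=3, per_artist=1, explore_slots=2):
    """Greedy filter over candidates already sorted by fused score."""
    picked, genres, artists = [], Counter(), Counter()

    def try_pick(c):
        # Reject anything that would break a genre or artist cap.
        if genres[c["genre"]] >= per_genre or artists[c["artist"]] >= per_artist:
            return False
        picked.append(c)
        genres[c["genre"]] += 1
        artists[c["artist"]] += 1
        return True

    # Pass 1: fill the main slots in rank order.
    for c in ranked:
        if len(picked) >= limit - explore_slots:
            break
        if not c.get("explore"):
            try_pick(c)

    # Pass 2: fill the reserved exploration slots.
    for c in ranked:
        if len(picked) >= limit:
            break
        if c.get("explore"):
            try_pick(c)
    return picked

demo = [
    {"artist": "A", "genre": "industrial"},
    {"artist": "A", "genre": "industrial"},   # second track by A: skipped
    {"artist": "B", "genre": "industrial"},
    {"artist": "C", "genre": "industrial"},
    {"artist": "D", "genre": "industrial"},   # fourth industrial pick: skipped
    {"artist": "E", "genre": "ambient", "explore": True},
    {"artist": "F", "genre": "ebm"},
]
result = diversify(demo, limit=5, explore_slots=1)
```

In this sketch exploration picks are appended after the main picks rather than interleaved by score; a real implementation might re-sort the final list.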

Calling it.

The public endpoint takes a query and an entity type. The response ships ranked recommendations with explanations attached, so a UI can show why something was picked.

client.py

import httpx

r = httpx.get(
    "https://signal.auricaudio.app/recommend",
    params={"q": "Nine Inch Nails", "type": "artist"},
    timeout=10.0,
)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = r.json()

for artist in data["artists"][:5]:
    print(f"{artist['artist_name']:30s}  "
          f"{artist['score']:.2%}  "
          f"{artist['explanation']}")

# Throbbing Gristle               82.34%  listeners also enjoy; shared label: Mute
# Skinny Puppy                    78.91%  sonically similar; shares tags: industrial, ebm
# Coil                            76.22%  listeners also enjoy; production lineage
# Ministry                        74.05%  shares tags: industrial, metal
# Einstürzende Neubauten          71.87%  sonically similar; shared label: Mute

Late-fusion scoring.

The central algorithm is a weighted sum across seven independent signals. Each signal is produced by its own subsystem (vector search, graph traversal, tag intersection, or text embeddings); the fusion decides the final rank. No single signal carries more than a quarter of the total weight, and every contribution can explain itself.

Signal	Weight	What it measures
`audio_similarity`	0.25	Cosine similarity to the seed embedding in CLAP space.
`tag_overlap`	0.15	Jaccard overlap on genre and style tags.
`artist_similarity`	0.15	Last.fm + MusicBrainz relationship score between the candidate artist and the seed artist.
`text_similarity`	0.15	Embedded artist bios and style descriptions, compared in a separate text model.
`label_affinity`	0.10	Shared boutique-label signal (Warp, Ghostly, Kranky, etc.). Labels curate a sound; that signal is rarely noisy.
`cultural_similarity`	0.10	Co-mentions in blog crawls and editorial writing, weighted by source.
`co_mention`	0.10	How often the two artists appear in the same playlists and tag clusters.

The scoring function.

Abridged but faithful: each candidate gets scored against the seed, explanations are accumulated as the signals fire, and the fused score determines rank.

score.py

WEIGHTS = {
    "audio_similarity":    0.25,
    "tag_overlap":         0.15,
    "artist_similarity":   0.15,
    "text_similarity":     0.15,
    "label_affinity":      0.10,
    "cultural_similarity": 0.10,
    "co_mention":          0.10,
}

def score_candidate(db, track_id, seed):
    scores = {}
    reasons = []

    # audio: cosine similarity in CLAP embedding space
    if seed.embedding is not None:
        emb = fetch_embedding(db, track_id)
        if emb is not None:  # the candidate may have no embedded preview
            sim = cosine(seed.embedding, emb)
            scores["audio_similarity"] = sim
            if sim > 0.7:
                reasons.append("sonically similar")

    # tags: Jaccard overlap on genre + style folksonomy
    tags = fetch_tags(db, track_id)
    if seed.tags and tags:
        common = seed.tags & tags
        scores["tag_overlap"] = len(common) / len(seed.tags | tags)
        shared = sorted(common)[:3]  # sorted so the explanation is deterministic
        if shared:
            reasons.append(f"shares tags: {', '.join(shared)}")

    # graph: artist relationship + label affinity
    candidate_artist = fetch_artist(db, track_id)
    if candidate_artist and seed.artist_id:
        scores["artist_similarity"] = artist_sim(db, seed.artist_id, candidate_artist)
        reasons.append("listeners also enjoy")
        label = shared_label(db, seed.artist_id, candidate_artist)
        if label:
            scores["label_affinity"] = 0.8
            reasons.append(f"shared label: {label}")

    total = sum(scores.get(k, 0.0) * w for k, w in WEIGHTS.items())
    return total, "; ".join(reasons)
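
A worked example of the final line may help. Signals that never fire are treated as zero, so a candidate's total is bounded by the summed weights of the signals that did fire. The sketch below reuses the published weights; the signal values in `example` are made up for illustration, and `contributions` and `fuse` are hypothetical helper names.

```python
WEIGHTS = {
    "audio_similarity":    0.25,
    "tag_overlap":         0.15,
    "artist_similarity":   0.15,
    "text_similarity":     0.15,
    "label_affinity":      0.10,
    "cultural_similarity": 0.10,
    "co_mention":          0.10,
}

def contributions(scores):
    """Weighted contribution of each signal; missing signals count as 0."""
    return {name: scores.get(name, 0.0) * w for name, w in WEIGHTS.items()}

def fuse(scores):
    """Fused score: the sum of all weighted contributions."""
    return sum(contributions(scores).values())

# Made-up signal values for one candidate (not real output).
example = {
    "audio_similarity": 0.80,
    "tag_overlap": 0.50,
    "artist_similarity": 0.60,
    "label_affinity": 0.80,
}
total = fuse(example)  # 0.20 + 0.075 + 0.09 + 0.08

# Upper bound when only these four signals are available.
ceiling = sum(WEIGHTS[name] for name in example)  # 0.25 + 0.15 + 0.15 + 0.10
```

Here the total comes to about 0.445 against a ceiling of 0.65, because the abridged function computes only four of the seven signals.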

Cold-start by design.

Most recommendation systems need a listening history, a follow graph, or at minimum a thumbs-up to get started. Signal has never needed any of those. The recommendations for the tenth visitor are as good as those for the ten-thousandth. Personalization, when and if it exists, is a layer on top, not a precondition.
