Signal is a recommendation engine that asks a different question than Spotify does. Instead of “what would this user listen to next”, it asks “what is this music actually adjacent to.” It reads production credits, label lineage, audio embeddings, tag folksonomies, and editorial writing — and returns results with the reasoning attached. Cold-start by design.
Signal doesn’t own any music. It owns a knowledge graph assembled from four public or licensed sources, pre-computed once and kept warm in Postgres with pgvector.
Monthly XML dumps ingested locally. Production credits, boutique label associations, precise style/genre tags, catalog numbers. This is where the long tail lives.
Bulk export plus incremental API. Artist relationships (influences, members, collaborators), recording-level ISRCs that anchor the rest of the graph.
Community-sourced genre tags and artist-similarity scores. Imperfect but cultural: captures how listeners categorize music, not how labels market it.
30-second audio previews from Deezer, embedded with LAION’s CLAP model. The resulting vector space lets sonic similarity sit alongside textual similarity in the same index.
Every request to GET /recommend goes through the same five stages. The pipeline is stateless: no user history, no personalization. The result is a ranked list of ten items, each with an explanation.
Fuzzy match the query against artists, tracks, and genres in Postgres. “Nin” resolves to Nine Inch Nails; “shoegaze” resolves to a genre tag.
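Signal does this matching in Postgres. The Python below is only a stand-in to illustrate the idea, not Signal's query. It assumes a hypothetical in-memory `CATALOG` and combines three checks: an exact match, an initialism match (one way a short query like "Nin" can reach Nine Inch Nails), and a `difflib` similarity ratio, with a hypothetical 0.6 cut-off.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory catalog of (entity_type, name); Signal queries Postgres instead.
CATALOG = [
    ("artist", "Nine Inch Nails"),
    ("artist", "Ministry"),
    ("genre", "shoegaze"),
    ("genre", "industrial"),
]

def initialism(name: str) -> str:
    """'Nine Inch Nails' -> 'nin'."""
    return "".join(word[0] for word in name.lower().split())

def match_score(query: str, name: str) -> float:
    """Exact match beats initialism match, which beats plain string similarity."""
    q, n = query.lower().strip(), name.lower()
    if q == n:
        return 1.0
    if q == initialism(name):
        return 0.95
    return SequenceMatcher(None, q, n).ratio()

def resolve(query: str, catalog=CATALOG, threshold: float = 0.6):
    """Return the best (entity_type, name) match, or None if nothing is close enough."""
    best = max(catalog, key=lambda entry: match_score(query, entry[1]))
    return best if match_score(query, best[1]) >= threshold else None

print(resolve("Nin"))       # ('artist', 'Nine Inch Nails')
print(resolve("shoegaze"))  # ('genre', 'shoegaze')
```

Returning the entity type alongside the name is what lets the next stage branch on artist, track, or genre.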
For an artist, pull the primary tracks. For a track, use the track directly. For a genre, sample tracks tagged with it. Seeds have CLAP embeddings and structured tags attached.
Three parallel paths: cosine similarity search on the audio embedding centroid; knowledge-graph traversal across artist_similarity, co-mentions, and shared labels; tag intersection on style/genre metadata.
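A minimal sketch of how the seed step feeds the three retrieval paths, under stated assumptions: embeddings are toy 3-D vectors, `related_tracks` is a hypothetical dict standing in for the knowledge-graph traversal, and the paths run sequentially here rather than in parallel. Candidates from all three paths are unioned and the seeds themselves are dropped.

```python
import math

def centroid(vectors):
    """Element-wise mean of the seed tracks' audio embeddings."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(seed_ids, embeddings, related_tracks, track_tags, k=2):
    """Union of three candidate paths, with the seeds themselves excluded."""
    seeds = set(seed_ids)

    # Path 1: top-k nearest neighbours of the seed centroid in embedding space.
    c = centroid([embeddings[s] for s in seed_ids])
    pool = [t for t in embeddings if t not in seeds]
    audio = set(sorted(pool, key=lambda t: cosine(c, embeddings[t]), reverse=True)[:k])

    # Path 2: graph neighbours (stand-in for similarity, co-mention and label edges).
    graph = {t for s in seed_ids for t in related_tracks.get(s, ())}

    # Path 3: any track sharing at least one tag with the seeds.
    seed_tags = set().union(*(track_tags.get(s, set()) for s in seed_ids))
    tagged = {t for t, tags in track_tags.items() if seed_tags & tags}

    return (audio | graph | tagged) - seeds

EMBEDDINGS = {
    "seed": [1.0, 0.0, 0.0], "a": [0.9, 0.1, 0.0], "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0], "d": [0.5, 0.5, 0.0], "f": [0.0, 0.2, 1.0],
}
RELATED = {"seed": ["c"]}
TAGS = {"seed": {"industrial", "ebm"}, "b": {"ebm"}, "c": {"ambient"}, "e": {"industrial"}}

print(sorted(retrieve(["seed"], EMBEDDINGS, RELATED, TAGS)))  # ['a', 'b', 'c', 'd', 'e']
```

Each path contributes candidates the others miss: `a` and `d` only via audio, `c` only via the graph, `b` and `e` only via tags. Track `f` is reached by none of them and never gets scored.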
Each candidate is scored across seven weighted signals. No single score decides the result; the fusion does.
Determinantal-point-process–inspired caps: no more than three from the same genre, one per artist, and two slots reserved for exploration picks that don’t dominate the centroid.
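A minimal greedy sketch of these caps, not Signal's implementation. It assumes candidates arrive sorted by fused score as dicts with hypothetical `artist`, `genre` and `centroid_sim` keys, and it reads "exploration pick" as a candidate whose similarity to the seed centroid falls below a hypothetical `explore_max_sim` threshold. It applies hard caps greedily rather than sampling from a determinantal point process.

```python
def diversify(ranked, n=10, genre_cap=3, explore_slots=2, explore_max_sim=0.5):
    """Greedy top-n with per-genre and per-artist caps plus reserved exploration slots.

    ranked: candidate dicts sorted by fused score, best first (hypothetical shape).
    """
    main, explore = [], []
    genre_counts, seen_artists = {}, set()
    for cand in ranked:
        if cand["artist"] in seen_artists:
            continue  # one per artist
        if genre_counts.get(cand["genre"], 0) >= genre_cap:
            continue  # no more than genre_cap per genre
        if cand["centroid_sim"] < explore_max_sim and len(explore) < explore_slots:
            explore.append(cand)  # far from the seed centroid: fills a reserved slot
        elif len(main) < n - explore_slots:
            main.append(cand)
        else:
            continue  # main slots full and this candidate can't use an exploration slot
        seen_artists.add(cand["artist"])
        genre_counts[cand["genre"]] = genre_counts.get(cand["genre"], 0) + 1
    return main + explore

ranked = [
    {"artist": "A", "genre": "industrial", "centroid_sim": 0.90},
    {"artist": "A", "genre": "industrial", "centroid_sim": 0.85},
    {"artist": "B", "genre": "industrial", "centroid_sim": 0.80},
    {"artist": "C", "genre": "industrial", "centroid_sim": 0.80},
    {"artist": "D", "genre": "industrial", "centroid_sim": 0.75},
    {"artist": "E", "genre": "ebm", "centroid_sim": 0.30},
    {"artist": "F", "genre": "noise", "centroid_sim": 0.20},
    {"artist": "G", "genre": "noise", "centroid_sim": 0.10},
]
print([c["artist"] for c in diversify(ranked)])  # ['A', 'B', 'C', 'G', 'E', 'F']
```

The second A is dropped by the per-artist cap and D by the genre cap; G lands in a main slot because both exploration slots are already taken. This sketch does not backfill unused exploration slots, so a thin candidate pool can return fewer than `n` items.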
The public endpoint takes a query and an entity type. The response ships ranked recommendations with explanations attached, so a UI can show why something was picked.
```python
import httpx

r = httpx.get(
    "https://signal.auricaudio.app/recommend",
    params={"q": "Nine Inch Nails", "type": "artist"},
)
r.raise_for_status()
data = r.json()

for artist in data["artists"][:5]:
    print(f"{artist['artist_name']:30s} "
          f"{artist['score']:.2%} "
          f"{artist['explanation']}")

# Throbbing Gristle              82.34% listeners also enjoy; shared label: Mute
# Skinny Puppy                   78.91% sonically similar; shares tags: industrial, ebm
# Coil                           76.22% listeners also enjoy; production lineage
# Ministry                       74.05% shares tags: industrial, metal
# Einstürzende Neubauten         71.87% sonically similar; shared label: Mute
```

The central algorithm is a weighted sum across seven independent signals. Each signal comes from a separate subsystem (vector search, graph traversal, tag intersection, text embeddings), and the fusion decides the final rank. No single signal dominates, and every contribution can explain itself.
| Signal | Weight | What it measures |
|---|---|---|
| audio_similarity | 0.25 | Cosine similarity to the seed embedding in CLAP space. |
| tag_overlap | 0.15 | Jaccard overlap on genre and style tags. |
| artist_similarity | 0.15 | Last.fm + MusicBrainz relationship score between the candidate artist and the seed artist. |
| text_similarity | 0.15 | Embedded artist bios and style descriptions, compared in a separate text model. |
| label_affinity | 0.10 | Shared boutique-label signal (Warp, Ghostly, Kranky, etc.). Labels curate a sound, so this signal is rarely noisy. |
| cultural_similarity | 0.10 | Co-mentions in blog crawls and editorial writing, weighted by source. |
| co_mention | 0.10 | How often the two artists appear in the same playlists and tag clusters. |
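A worked example of the fusion arithmetic, using the weights from the table. The candidate's signal values are hypothetical; a signal that did not fire for a candidate is treated as 0.

```python
WEIGHTS = {
    "audio_similarity": 0.25,
    "tag_overlap": 0.15,
    "artist_similarity": 0.15,
    "text_similarity": 0.15,
    "label_affinity": 0.10,
    "cultural_similarity": 0.10,
    "co_mention": 0.10,
}

def fuse(signals):
    """Weighted sum over all seven signals; a missing signal contributes 0."""
    return sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())

# Hypothetical candidate: strong audio match, some tag overlap, a shared label.
candidate = {"audio_similarity": 0.8, "tag_overlap": 0.5, "label_affinity": 0.8}
print(round(sum(WEIGHTS.values()), 2))  # 1.0
print(round(fuse(candidate), 3))        # 0.355  (0.25*0.8 + 0.15*0.5 + 0.10*0.8)
```

Because the weights sum to 1.0, the fused score stays in [0, 1] as long as each signal does. A candidate with evidence on only these three signals cannot exceed 0.25 + 0.15 + 0.10 = 0.5 however strong they are, which is how the fusion rewards agreement across independent signals.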
Abridged but faithful: each candidate gets scored against the seed, explanations are accumulated as the signals fire, and the fused score determines rank.
```python
import math

WEIGHTS = {
    "audio_similarity": 0.25,
    "tag_overlap": 0.15,
    "artist_similarity": 0.15,
    "text_similarity": 0.15,
    "label_affinity": 0.10,
    "cultural_similarity": 0.10,
    "co_mention": 0.10,
}

def cosine(a, b):
    """Cosine similarity (1.0 = same direction), not cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# fetch_embedding, fetch_tags, fetch_artist, artist_sim and shared_label are
# database helpers elided from this excerpt; fetch_tags returns a set of tags.

def score_candidate(db, track_id, seed):
    scores = {}
    reasons = []

    # audio: cosine similarity in CLAP embedding space
    if seed.embedding is not None:
        emb = fetch_embedding(db, track_id)
        if emb is not None:
            sim = cosine(seed.embedding, emb)
            scores["audio_similarity"] = sim
            if sim > 0.7:
                reasons.append("sonically similar")

    # tags: Jaccard overlap on the genre + style folksonomy
    tags = fetch_tags(db, track_id)
    if seed.tags and tags:
        shared = seed.tags & tags
        scores["tag_overlap"] = len(shared) / len(seed.tags | tags)
        if shared:
            reasons.append(f"shares tags: {', '.join(sorted(shared)[:3])}")

    # graph: artist relationship + label affinity
    candidate_artist = fetch_artist(db, track_id)
    if candidate_artist and seed.artist_id:
        artist_score = artist_sim(db, seed.artist_id, candidate_artist)
        scores["artist_similarity"] = artist_score
        if artist_score > 0:
            reasons.append("listeners also enjoy")
        label = shared_label(db, seed.artist_id, candidate_artist)
        if label:
            scores["label_affinity"] = 0.8
            reasons.append(f"shared label: {label}")

    # text_similarity, cultural_similarity and co_mention are omitted here;
    # any signal that never fires contributes 0 to the weighted sum.
    total = sum(scores.get(k, 0.0) * w for k, w in WEIGHTS.items())
    return total, "; ".join(reasons)
```

Most recommendation systems need a listening history, a follow graph, or at minimum a thumbs-up to get started. Signal needs none of them: the recommendations for the tenth visitor are as good as those for the ten-thousandth. Personalization, if and when it exists, is a layer on top, not a precondition.