Hearddit is a Soundcloud/Spotify/reddit bot that builds playlists from links posted on music subreddits. I built the bot because I was tired of the twenty or so tracks that internet radio’s collaborative filters had decided I liked, and I wanted an easy way to find new music using existing apps.

Here are the playlists:

Hearddit has been running for a little over one week and it’s already discovered some great stuff:

The whole thing is a short Python program, available here. It runs every couple of hours, so there will be some delay between when a link gets posted on Reddit and when it appears in a playlist.

Currently Hearddit is scraping the following subreddits (updated 2015-02-23):

Remark: Not all music hosted on Soundcloud is available on Spotify and vice versa, so the playlists will differ depending on which app you’re using.

Building Soundcloud playlists is pretty straightforward since music subreddits encourage users to “support the artists and submit their content directly. Look for the original source of content and submit that.” This results in lots of links to Soundcloud. Additionally, Soundcloud has an official Python API that’s easy to use.

To create Soundcloud playlists, Hearddit uses the Soundcloud API to resolve links found on the target subreddit, checks whether it has already created a playlist for that subreddit, and then either appends the resolved links to that playlist or creates a new one and appends to that. Obligatory code sample below.

import requests  # needed for the HTTPError handler below

def create_soundcloud_playlist_from_urls(urls, playlist_name):

    client = soundclound_login()

    #use soundcloud api to resolve links
    tracks = []
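    for url in urls:
        if 'soundcloud' in url:
            try:
                tracks.append(client.get('/resolve', url=url))
            except requests.exceptions.HTTPError:
                # handler body is missing from this listing; logging and skipping is an assumption
                logger.warning('could not resolve {0}'.format(url))
    track_ids = [x.id for x in tracks]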
    track_dicts = list(map(lambda id: dict(id=id), track_ids))

    #check if playlist already exists
    my_playlists = client.get('/me/playlists')
    old_list_urls = [p for p in my_playlists if p.fields()['title'] == playlist_name]
    if old_list_urls:
        # add tracks to playlist
        old_list_url = old_list_urls[0]
        client.put(old_list_url.uri, playlist={'tracks': track_dicts})
    else:
        # create the playlist
        client.post('/playlists', playlist={
            'title': playlist_name,
            'sharing': 'public',
            'tracks': track_dicts})

    #get the link to the list created
    my_playlists = client.get('/me/playlists')
    new_list_url = [p.fields()['permalink_url'] for p in my_playlists
                    if p.fields()['title'] == playlist_name]

    if new_list_url:
        return new_list_url[0]
    else:
        logger.warning('no soundcloud list')
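
One thing worth checking in the listing above: if the Soundcloud API treats the tracks field in a PUT as the playlist's complete track list (an assumption to verify against the Soundcloud docs), then sending only the newly resolved tracks would replace the old ones rather than append to them. A minimal sketch of merging the two lists first is below. The helper name merge_track_dicts is hypothetical and not part of Hearddit, and the numeric ids are made up for illustration.

```python
def merge_track_dicts(existing_ids, new_ids):
    """Build the [{'id': ...}, ...] payload: existing tracks first, then new ones.

    Hypothetical helper; ids that appear more than once are kept only once.
    """
    seen = set()
    merged = []
    for track_id in list(existing_ids) + list(new_ids):
        if track_id not in seen:
            seen.add(track_id)
            merged.append({'id': track_id})
    return merged


# Made-up ids: the playlist already holds 101 and 102, the subreddit yields 102 and 103.
payload = merge_track_dicts([101, 102], [102, 103])
print(payload)
```

The result would be passed as the 'tracks' value in place of track_dicts, with the existing ids read from the playlist fetched in the "check if playlist already exists" step.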

Building Spotify playlists is more complex. For one thing, people don’t link to Spotify from Reddit, so matching post titles to Spotify tracks requires Spotify’s search engine. Hearddit has a few search heuristics based on the /r/electronicmusic sidebar:

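import re

def search_spotify_for_a_title(title, sp):
    # keep only the text before the first '[' or '(' (genre tags, years, etc.)
    query = re.split(r'(\[|\()', title)[0]
    # very short queries are too ambiguous to search for
    if len(query) > 5:
        results = sp.search(q=query, type='track')
        if len(results['tracks']['items']) > 0:
            logger.info('hit! {0}'.format(query))
            return results
        else:
            logger.info('miss! {0}'.format(query))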

This gets a lot of matches right but the input queries as well as query understanding at Spotify could still use a bit of work.
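
To see what the title heuristic does to the input before it reaches Spotify, here is a small standalone sketch of just the cleaning step. The name title_to_query is hypothetical, the sample titles are invented, and the strip() call is an addition that the original function does not have.

```python
import re


def title_to_query(title, min_length=5):
    """Cut a post title at the first '[' or '(' and reject very short results.

    Hypothetical helper mirroring the split and length check in
    search_spotify_for_a_title; returns None when nothing usable is left.
    """
    query = re.split(r'[\[(]', title)[0].strip()
    return query if len(query) > min_length else None


# Invented titles in the "Artist - Track [Genre] (Year)" style.
for title in ['Artist - Track Name [Genre] (2014)',
              '[Fresh] Artist - Track Name',
              'Hi (remix)']:
    print(repr(title), '->', repr(title_to_query(title)))
```

The second title shows one weakness of the approach: a tag at the start of the title leaves an empty query, so the post is skipped even though the artist and track are right there.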

The next problem is that the Python APIs for Spotify aren’t as good as Soundcloud’s. There are two out there: pyspotify and spotipy.

Pyspotify is more popular on GitHub but it’s also harder to use. Pyspotify uses CFFI to build a wrapper around Spotify’s official C library, libspotify, and, for whatever reason, coding in pyspotify requires lots of calls to session.process_events() and *.load() or else things fail in unexpected ways.

Spotipy uses the requests library to access Spotify’s web API. Things that aren’t documented in Python are easy enough to figure out from the web API docs. Spotipy isn’t Python 3 compatible but there’s a fork which (mostly) takes care of that. Here’s the function that Hearddit uses to build Spotify playlists:

def create_spotify_playlist_from_titles(todays_titles, playlist_name):

    sp = spotify_login()

    #try to map the submission titles to spotify tracks
    search_results = [search_spotify_for_a_title(x,sp) for x in todays_titles]
    search_results = [x for x in search_results if x and (len(x['tracks']['items']) > 0)]
    hits = [x['tracks']['items'][0] for x in search_results]

    #get all my playlists, check if playlist_name already there
    new_pl = None
    for my_pl in sp.user_playlists(user=sp.me()['id'])['items']:
        logger.info('my_pl: {}'.format(my_pl['name']))
        if my_pl['name'] == playlist_name:
            logger.warning('appending to {}'.format(my_pl))
            new_pl = sp.user_playlist(user=sp.me()['id'], playlist_id=my_pl['id'])
    if not new_pl:
        logger.warning('new playlist!')
        new_pl = sp.user_playlist_create(user=sp.me()['id'],name=playlist_name,public=True)

    out_url = new_pl['external_urls']['spotify']

    #get all tracks in the playlist
    new_new_pl = sp.user_playlist(sp.me()['id'], new_pl['uri'])

    old_track_uris = set([x['track']['uri'] for x in new_new_pl['tracks']['items']])
    for s in old_track_uris:
        logger.debug('old! {}'.format(s))

    new_track_uris = [hit['uri'] for hit in hits if hit['uri'] not in old_track_uris]
    for new_track_uri in new_track_uris:
        logger.debug('new! {}'.format(new_track_uri))

    #can only insert 100 tracks at a time (Spotify API limit)
    def chunker(seq, size):
        return (seq[pos:pos + size] for pos in range(0, len(seq), size))

    logger.warning('adding {0} new tracks to {1}'.format(len(new_track_uris), new_pl['name']))
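    if len(new_track_uris) > 0:
        for sublist in chunker(new_track_uris,99):
            # the add call is missing from this listing; spotipy's user_playlist_add_tracks is assumed
            sp.user_playlist_add_tracks(sp.me()['id'], new_pl['id'], sublist)
            logger.warning('added {0} new tracks to {1}'.format(len(sublist), new_pl['name']))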

    return out_url
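
The de-duplication and batching at the end of that function can be tried without a Spotify account. The sketch below reuses chunker as written above. The wrapper new_uris_in_batches is a hypothetical name, the URIs are placeholders, and dropping repeats within the new hits is an extra step the original does not perform.

```python
def chunker(seq, size):
    # yield consecutive slices of at most `size` items
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def new_uris_in_batches(hit_uris, old_uris, batch_size=100):
    """Drop URIs already in the playlist, then split the rest into batches.

    Hypothetical helper; batch_size defaults to the 100-track limit
    mentioned in the listing above.
    """
    seen = set(old_uris)
    fresh = []
    for uri in hit_uris:
        if uri not in seen:
            seen.add(uri)  # also removes repeats within hit_uris
            fresh.append(uri)
    return list(chunker(fresh, batch_size))


# Placeholder URIs: 250 search hits, the first 10 already in the playlist.
hits = ['spotify:track:{0}'.format(i) for i in range(250)]
batches = new_uris_in_batches(hits, hits[:10])
print([len(batch) for batch in batches])
```

Each batch would then be sent in its own add-tracks request.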



23 December 2014