Exporting IMDb lists to browser bookmarks

Mar 1, 2023 · Last updated: Apr 1, 2023
Posted in weblog#tech
Categories: imdb, javascript, python

For some time now, I have been maintaining my reading list using a JSON file, having shifted away from Goodreads. Likewise, I no longer use an IMDb account to keep track of my watchlist. I have exported all the data to my browser bookmarks.

Occasionally, I come across an IMDb list that I want to include in my library, but there is no convenient means to export it directly to my bookmarks. An obvious approach is to just drag each item into the toolbar, but you don't get any metadata like the release year or rating with that. Additionally, this is not practical for longer lists. On top of that, IMDb links are riddled with tracking parameters that require manual cleanup.
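
The manual cleanup mentioned above amounts to dropping the query string from each link. A minimal sketch of that step in Python, using only the standard library; the helper name `strip_tracking` and the title ID are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the query string and fragment entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# The title ID below is a placeholder, not a real entry.
print(strip_tracking("https://www.imdb.com/title/tt0000000/?ref_=ttls_li_tt"))
# prints https://www.imdb.com/title/tt0000000/
```

This removes every query parameter rather than one known value, which is fine here because the title path alone identifies the page.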

So, I use a little bookmarklet that parses the DOM, builds the movie list as Markdown, and copies it to the clipboard. It then shows a prompt asking whether to open pandoc.org/try. If I click "OK", it opens the pandoc online playground in a new tab, where I paste the Markdown and get back HTML that can be imported into my browser.

The playground also supports a permalink whose query parameters pre-populate the input and options, so in principle there's no need to copy anything. However, URL length is limited, so this wouldn't work for large lists.
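
To make the permalink idea concrete, here is a rough sketch in Python of building such a URL and refusing when it gets too long. The `params` layout is copied from the URL the bookmarklet opens; whether the playground accepts a percent-encoded value, and the 2000-character cap, are assumptions on my part rather than documented limits. `playground_permalink` and `MAX_URL_LENGTH` are names invented for this example.

```python
import json
from typing import Optional
from urllib.parse import quote

PLAYGROUND = "https://pandoc.org/try/"
# Hypothetical, conservative cap; real limits vary by browser and server.
MAX_URL_LENGTH = 2000


def playground_permalink(markdown: str) -> Optional[str]:
    """Build a pre-populated playground URL, or None if it would be too long."""
    params = json.dumps(
        {"text": markdown, "from": "markdown", "to": "html5"},
        separators=(",", ":"),
    )
    # Percent-encode the whole JSON blob so it survives as one query value.
    url = f"{PLAYGROUND}?params={quote(params, safe='')}"
    return url if len(url) <= MAX_URL_LENGTH else None
```

A list of a few entries fits comfortably, while a long list makes the function return `None`, which is the case where copying through the clipboard is still needed.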

Here's the code:

javascript: (() => {
  let movies = '';
  let count = 0;
  document
    .querySelectorAll('.lister-item > .lister-item-content')
    .forEach((item) => {
      const anchor = item.querySelector('h3 > a');
      const title = anchor.innerText;
      /* Drop the whole query string so every tracking parameter is removed. */
      const link = anchor.getAttribute('href').split('?')[0];
      const year = item.querySelector('.lister-item-year').innerText;
      /* Unrated titles have no rating element, so fall back to a placeholder. */
      const rating =
        item.querySelector('.ipl-rating-star__rating')?.innerText ?? 'N/A';
      movies += `- [${title} ${year} (Rating: ${rating})](https://www.imdb.com${link})\n`;
      count += 1;
    });

  navigator.clipboard
    .writeText(movies)
    .then(() => {
      if (confirm(`Copied markdown with ${count} movies. Open pandoc?`)) {
        window.open(`https://pandoc.org/try/?params={"text":"","from":"markdown","to":"html5"}`);
      }
    })
    .catch((err) => alert(`Error copying text: ${err}`));
})();

Update

I ported the code to a Python script that writes the HTML directly to a file, which can then be imported into the browser:

#!/usr/bin/python3

import sys
import re
import urllib.request
from bs4 import BeautifulSoup

listurl = sys.argv[1]

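# Accept either a full list URL or a bare list ID ("ls" followed by digits).
if listurl.startswith("http"):
    # Search anywhere in the URL so a trailing slash or query string doesn't break the match.
    match = re.search(r"ls\d+", listurl)
    if match is None:
        print("Invalid argument. Please enter a valid link or list ID.")
        sys.exit(1)
    listid = match.group(0)
elif listurl.startswith("ls"):
    listid = listurl
    listurl = "https://www.imdb.com/list/" + listid
else:
    print("Invalid argument. Please enter a valid link or list ID.")
    sys.exit(1)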

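# Assumption: IMDb may reject urllib's default User-Agent, so send a browser-like one.
req = urllib.request.Request(listurl, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as res:
    soup = BeautifulSoup(res.read(), "html.parser")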

exportfile = f"export-{listid}.html"
movies = "<ul>\n"
count = 0

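for item in soup.select(".lister-item > .lister-item-content"):
    anchor = item.select_one("h3 > a")
    title = anchor.get_text(strip=True)
    # Drop the query string so tracking parameters don't end up in the bookmark.
    link = anchor["href"].split("?")[0]
    year = item.select_one(".lister-item-year").get_text(strip=True)
    # Unrated titles have no rating element, so fall back to a placeholder.
    rating_tag = item.select_one(".ipl-rating-star__rating")
    rating = rating_tag.get_text(strip=True) if rating_tag else "N/A"

    # movies += f"- [{title} {year} (Rating: {rating})](https://www.imdb.com{link})\n"
    movies += f'<li><a href="https://www.imdb.com{link}">{title} {year} (Rating: {rating})</a></li>\n'
    count += 1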

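# Write UTF-8 explicitly so non-ASCII titles don't depend on the platform default.
with open(exportfile, "w", encoding="utf-8") as f: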
    f.write(movies + "</ul>")

print(f"Exported {count} movies to {exportfile}")
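
One thing the script above leaves out is escaping: a title containing `&` or `<` is written into the HTML as-is. A small sketch of how each entry could be escaped with the standard library's `html.escape`; the helper name `bookmark_item` and the sample values are made up for illustration:

```python
from html import escape


def bookmark_item(title: str, year: str, rating: str, link: str) -> str:
    """Render one list entry as an <li> with escaped label and URL."""
    label = escape(f"{title} {year} (Rating: {rating})")
    # quote=True also escapes double quotes, which matters inside the attribute.
    href = escape(f"https://www.imdb.com{link}", quote=True)
    return f'<li><a href="{href}">{label}</a></li>'


# Placeholder values, not a real IMDb entry.
print(bookmark_item("Example & Sons", "(2000)", "7.0", "/title/tt0000000/"))
# prints <li><a href="https://www.imdb.com/title/tt0000000/">Example &amp; Sons (2000) (Rating: 7.0)</a></li>
```

Browsers tend to tolerate a bare `&` in text, so this is a precaution, not something the import is known to require.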