Small Data Driven Websites

In this house we believe:

Relying on a third party to host your data is bad because:

This note describes some examples of how I build data-driven webpages with the above in mind. I am perfectly comfortable hand-editing yaml files in VIM, which I recognise the majority of people are not. This approach probably won't work for sites which involve editors with less technical skills, at least not without training. Tom Critchlow laments the lack of solutions for small databases in this space, which may be of interest if this is your goal.

Places is a map of places around the world that I find interesting. It includes places I've visited and enjoyed, places I want to visit, and places people have suggested I visit. It very rarely gets updates, and it's only me that does these updates, so it's fine for the process to be a little manual.

The source of truth for this webpage is a map on uMap. This provides a nice UI for editing maps on top of OpenStreetMap layers, and is in fact a project of OpenStreetMap. Most people would probably just share the map directly from uMap and be done with it, but that has a number of problems.

The workflow for Places is:

  1. Make changes within uMap and export a geojson file. This geojson file is also editable in any decent GIS software; uMap is currently my favourite, and its data storage allows it to be the single source of truth for this dataset
  2. Copy that file into the git repository that powers the website
  3. Wait a couple of minutes for Netlify to publish the updated page
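Since the geojson export is just a text file, a quick sanity check before committing can catch a truncated or malformed export. A minimal sketch in Python (the embedded data and the checks here are illustrative, not the actual site's schema):

```python
import json

# A tiny stand-in for an uMap geojson export; a real check would load
# the file with open("places.geojson") instead of this inline string.
export = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
      "properties": {"name": "Berlin"}
    }
  ]
}
"""

data = json.loads(export)

# Basic shape checks: a FeatureCollection whose features all carry a
# geometry and a name property.
assert data["type"] == "FeatureCollection"
for feature in data["features"]:
    assert feature["geometry"] is not None
    assert "name" in feature["properties"]

print(f"{len(data['features'])} features OK")
```

Running something like this before step 2 means a bad export never makes it into the git history.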

There is no web framework or static site generator here because there is no real content other than the data.

We could potentially pull in the uMap data directly at render-time (assuming no CORS issues). To do so would be a mistake because it would rely on uMap remaining online and therefore be subject to bitrot.

The only dynamic functionality is the ability to submit places, which uses a Netlify contact form. Losing this functionality is not a major problem.

Countries presents a list of Travelers Century Club countries, showing which ones I have visited or transited through. There are also a few notes for some of the countries, but that is not the primary goal of the site. The data gets updated whenever I travel somewhere new, which may range from zero to five times a year.

Most people would probably present this data in the form of Google Sheets or Airtable and be done with it. Instead I:

  1. Edit yaml files - typically directly on the main branch via the GitHub webpage
  2. Wait a couple of minutes for Netlify to publish the updated page

Behind the scenes Netlify is running Hugo to build the webpage from the yaml data. I could instead have built the webpage entirely as a front-end application parsing json data, but this requires javascript to be enabled on the user's web browser.
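The yaml itself stays simple enough to edit in the GitHub web UI. A hypothetical record might look something like this (the path, field names, and values are my illustration, not the site's actual schema):

```yaml
# data/countries.yaml (illustrative path and fields)
- name: Japan
  status: visited
  notes: "A few thoughts on the trip go here."
- name: Mongolia
  status: transited
- name: Bhutan
  status: not-visited
```

Hugo's data templates can then iterate over this list to render the page, so updating the site is just a matter of changing a line or two and committing.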

This results in a glorious small static webpage presenting the data. My web design skills may be from the '90s, but the resulting page is easy to parse by humans and computers alike.

The original source data comes from the Travelers Century Club and is really horrible to parse. For a long time I maintained a Ruby gem where I had pre-parsed the data and made it available in Ruby, but that still required manual editing of the resulting yaml, and nobody else was using the gem. My list of countries is likely now out of sync with the club's current list. At some point I will do the manual work to re-align my list, but I can get away with only doing this every few years.

This example is slightly different to the others because it's a work in progress and creates multiple pages.

If relying on data from external systems is bad, then it follows that we should collate that information onto our own site. This is the opposite of the indieweb P.O.S.S.E pattern (Publish on Own Site, Syndicate Elsewhere), but acknowledges the fact that other systems are often preferable for data generation.

Currently I collate links added to Pinboard and GitHub stars into my weeknotes here. It's not truly my own site, but it is backed up. It doesn't cover everything I would like to include, but it's probably the most interesting/useful.

I do this through a very rough CLI app I've named weekly. The flow looks like:

  1. Run hugo new to create the template for this week's post
  2. Add notes on what I've been up to
  3. Run weekly pinboard to append interesting links
  4. Run weekly githubstars to append GitHub stars
  5. Git commit and push to make a backup
  6. Run hugo to publish the site
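The append steps are the only interesting part of weekly. A rough sketch of what a step like weekly pinboard could do is below: turn a list of saved links into a markdown bullet list ready to append to this week's post. The Pinboard fetch itself is omitted; the `posts` variable stands in for the JSON the API would return (the `href` and `description` field names follow Pinboard's posts format, but the helper function and sample data are my invention):

```python
def links_to_markdown(posts):
    """Render saved links as a markdown bullet list under a heading."""
    lines = ["## Links"]
    for post in posts:
        lines.append(f"- [{post['description']}]({post['href']})")
    return "\n".join(lines)

# Stand-in for the decoded Pinboard API response.
posts = [
    {"href": "https://example.com/a", "description": "An interesting read"},
    {"href": "https://example.com/b", "description": "A useful tool"},
]

print(links_to_markdown(posts))
```

Appending the returned string to the current week's markdown file is then a one-line file write, and the git commit in step 5 doubles as the backup.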

After creating this workflow I came across Katy DeCorah's Build your own metadata library, which automates a similar workflow and publishes into a single page, as well as building a local mini data warehouse that provides the data for other interesting use cases. This is a tempting alternative, but low on my priorities list right now.