Small Data-Driven Websites
In this house we believe:
- web 1.0 is good
- self-sufficient websites not depending on anything external are good
- as a result of the above, static-generated sites are good
Relying on a third-party to host your data is bad because:
- That third-party may go down permanently and lose all your data
- You typically get very little ability to customise how the data is presented
- It probably won't give you a way of allowing the entire world to submit data
- It may or may not allow you to use your own domain name (or subdomain)
- Your data may get shared with people you don't want it shared with, powering marketing and advertising databases targeted against you
This note describes some examples of how I build data-driven webpages with the above in mind. I am perfectly comfortable hand-editing yaml files in Vim, which I recognise most people are not. This approach probably won't work for sites whose editors have less technical skill, at least not without training. Tom Critchlow laments the lack of solutions for small databases in this space, which may be of interest if that is your goal.
Places
https://places.wheresalice.info is a map of places around the world that I find interesting. It includes places I've visited and enjoyed, places I want to visit, and places people have suggested I visit. It rarely gets updated, and I'm the only one making the updates, so it's fine for the process to be a little manual.
The source of truth for this webpage is a map on uMap. This provides a nice UI for editing maps on top of OpenStreetMap layers, and is in fact a project of OpenStreetMap. Most people would probably just share the map directly from uMap and be done with it, but that has a number of problems.
The workflow for Places is:
- Make changes within uMap and export a geojson file. This geojson file is also editable in any decent GIS software; uMap is currently my favourite, and it stores the data, which allows it to be the single source of truth for this dataset
- Copy that file into the git repository that powers the website (this step could be scripted, as sketched below)
- Wait a couple of minutes for Netlify to publish the updated page
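The export-and-copy steps could be scripted if they ever became a chore. Here is a rough sketch; the uMap download URL and file paths are assumptions for illustration, not the real ones:

```python
# Sketch: pull the latest geojson export from uMap into the site repo, so the
# repo copy (not uMap) is what gets published. The export URL and file paths
# below are hypothetical.
import json
import urllib.request

UMAP_EXPORT_URL = "https://umap.openstreetmap.fr/en/map/EXAMPLE_MAP_ID/download"  # hypothetical
REPO_GEOJSON = "places.geojson"  # hypothetical path inside the site repo

with urllib.request.urlopen(UMAP_EXPORT_URL, timeout=30) as resp:
    data = json.load(resp)  # fails loudly if the export isn't valid JSON

# quick sanity check before overwriting the committed copy
assert data.get("type") == "FeatureCollection", "unexpected geojson structure"

with open(REPO_GEOJSON, "w") as f:
    json.dump(data, f, indent=2)

print(f"wrote {len(data.get('features', []))} features to {REPO_GEOJSON}")
```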
There is no web framework or static site generator here because there is no real content other than the data.
We could potentially pull in the uMap data directly at render-time (assuming no CORS issues), but doing so would be a mistake: it would rely on uMap remaining online, and the page would therefore be subject to bitrot.
The only dynamic functionality is the ability to submit places, which uses a Netlify contact form. Losing this functionality would not be a major problem.
Countries
https://countries.wheresalice.info/ presents a list of Travelers Century Club countries, showing which ones I have visited or transited through. There are also a few notes for some of the countries, but that is not the primary goal of the site. The data gets updated whenever I travel somewhere new, which may range from zero to five times a year.
Most people would probably present this data in Google Sheets or Airtable and be done with it. Instead I:
- Edit yaml files - typically directly on the main branch via the GitHub webpage
- Wait a couple of minutes for Netlify to publish the updated page
Behind the scenes Netlify runs Hugo to build the webpage from the yaml data. I could instead have built the webpage entirely as a front-end application parsing json data, but that would require javascript to be enabled in the user's web browser.
This results in a glorious small static webpage presenting the data. My web design skills may be from the 90s, but the resulting page is easy for humans and computers alike to parse.
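To make the build step concrete, here is a rough Python analogue of what the Hugo data templates do: read yaml and emit plain HTML at build time, so nothing needs to run in the browser. The yaml field names are illustrative, not the site's real schema:

```python
# Rough analogue of the build step: yaml in, static HTML out.
# The field names below are made up for illustration.
import yaml  # pip install pyyaml

countries = yaml.safe_load("""
- name: Austria
  visited: true
- name: Bhutan
  visited: false
  notes: High on the wishlist
""")

items = []
for country in countries:
    status = "visited" if country.get("visited") else "not yet"
    note = f" ({country['notes']})" if country.get("notes") else ""
    items.append(f"  <li>{country['name']}: {status}{note}</li>")

print("<ul>\n" + "\n".join(items) + "\n</ul>")
```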
The original source data comes from the Travelers Century Club and is really horrible to parse. For a long time I maintained a Ruby gem with a pre-parsed copy of the data, but using it still meant manually editing the resulting yaml, and nobody else was using the gem. My list of countries is likely now out of sync with the TCC's official list. At some point I will do the manual work to re-align it, but I can get away with only doing this every few years.
Links and GitHub Repos
This example is slightly different from the others because it's a work in progress and creates multiple pages.
If relying on data from external systems is bad, then it follows that we should collate that information onto our own site. This is the opposite of the indieweb POSSE approach (Publish on your Own Site, Syndicate Elsewhere), but acknowledges that other systems are often preferable for data generation.
Currently I collate links added to Pinboard and GitHub stars into my weeknotes here on envs.net. It's not truly my own site, but it is backed up. It doesn't cover everything I would like to include, but it's probably the most interesting and useful data.
I do this through a very rough CLI app I've named weekly. The flow looks like:
- Run `hugo new` to create the template for this week's post
- Add notes on what I've been up to
- Run `weekly pinboard` to append interesting links (sketched below)
- Run `weekly githubstars` to append github stars
- Git commit and push to make a backup
- Run `hugo` to publish the site
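The weekly tool itself isn't published here, but each step is little more than one API call and a file append. A rough sketch of what the Pinboard step might look like, assuming the Pinboard v1 API and an illustrative post path:

```python
# Sketch of a "weekly pinboard"-style step: fetch recent Pinboard bookmarks
# and append them to this week's post. The post path and environment variable
# are illustrative assumptions, not the real tool's configuration.
import os
import requests

POST_PATH = "content/weeknotes/example-week.md"  # hypothetical file created by `hugo new`

resp = requests.get(
    "https://api.pinboard.in/v1/posts/recent",
    params={
        "auth_token": os.environ["PINBOARD_TOKEN"],  # "username:TOKEN" from the Pinboard settings page
        "format": "json",
        "count": 20,
    },
    timeout=30,
)
resp.raise_for_status()

with open(POST_PATH, "a") as post:
    post.write("\n## Links\n\n")
    for bookmark in resp.json()["posts"]:
        post.write(f"- [{bookmark['description']}]({bookmark['href']})\n")
```

The GitHub stars step would look much the same, reading from the GitHub starred-repositories API and appending repository names and URLs instead.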
After creating this workflow I came across Katy DeCorah's Build your own metadata library, which automates a similar workflow and publishes into a single page, as well as acting as a local mini data warehouse that can feed other interesting use-cases. This is a tempting alternative, but low on my priorities list right now.
Appendix
- https://tomcritchlow.com/2023/01/27/small-databases/ Notes on personal libraries, collections and small indexes on the web
- https://katydecorah.com/code/build-your-metadata-library/ Collating your own data
- http://discord.gfsc.studio/ Discussion on the Geeks For Social Change Discord server heavily fed into the first edition of this note. Join us if you like working on small-tech for good