Playtesting: some reflections


I’ve collated the information from the first Lovecraftesque external playtest and I thought it might be useful to discuss it here. I’m not going to talk about our game; instead I’ll be talking about the playtest in more general terms, in the hopes of deriving some general lessons about playtesting.

Recruitment

We advertised the playtest through our website, Black Armada, and through Google+, Twitter and Facebook. We put the files in a public Dropbox folder but only provided the link on request to people who expressed an interest in playtesting.

We received 31 expressions of interest. 29 of these were from people who appeared to be men, 2 from women. 6 were from people who we know quite well in real life, and another 3 from people we’ve met a few times in the flesh. The rest were from comparative strangers.

We allowed six weeks for playtesting from the day we announced it. We sent a reminder out at the midway point to anyone who we hadn’t interacted with for at least a week, and another one a few days before the deadline.


We received 6 playtest reports within the playtest period – just under a 20% response rate. All of these were submitted by men; 2 came from friends, 4 from comparative strangers. Between them we got 22 session-hours of playtesting, or 72 person-hours.
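For anyone who wants to fiddle with the numbers, here’s a quick back-of-the-envelope sketch in Python. The figures are the ones quoted above; reading person-hours divided by session-hours as average table size is my own inference, not something we measured directly.

```python
# Back-of-the-envelope playtest numbers, using the figures quoted above.
expressions_of_interest = 31
reports_received = 6
session_hours = 22
person_hours = 72

response_rate = reports_received / expressions_of_interest
# Inference: person-hours divided by session-hours gives the average
# number of people at the table per session.
avg_table_size = person_hours / session_hours

print(f"Response rate: {response_rate:.0%}")        # ~19%
print(f"Average table size: {avg_table_size:.1f}")  # ~3.3 players
```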

It seems to me that we were fairly fortunate to get as many as we did. In previous playtests using a similar method I only had a 10% response rate, from a smaller number of expressions of interest. The improved success comes, I think, from a combination of us being better connected within the indie roleplaying community than I was back then, and having a game pitch that was always likely to be a bit more popular.

Method

None of the playtesters received any guidance or clarification from us. In addition to the rulebook and some supporting materials, they were given a set of detailed questions covering 10 aspects of the game, rather bossily labelled “READ THIS FIRST”.

None of the playtests involved us, either as a participant or a witness.

Results

All six playtest reports responded fairly assiduously to the questions we asked. I wouldn’t say they were all completely comprehensive, but none of them ignored the structured questions, and all responded to most of the points we wanted covered. One came with a blow-by-blow actual play report (which was valuable well beyond what our questions elicited).

I shall now provide a breakdown of the issues identified by the playtest – whether flagged by the playtesters themselves or merely apparent from their reports. I have classified them as follows:

  • A critical issue is one which would make the game unplayable.
  • A serious issue is one which would make the game not fun or prevent the design goals of the game from being realised. (I counted a serious issue even if only one group identified it.)
  • A major issue is one which makes the game very clunky or interferes with realising the design goals of the game.
  • A minor issue is one which doesn’t interfere with the design goals or make the game very clunky, but is rather a matter of polish. Minor rules clarifications also fall into this category.

I’ve obviously had to exercise judgement as to whether an issue identified by a group is attributable to the design, and whether there’s anything that can be done in the design to ameliorate the issue. In one or two cases, because different groups reported radically different observations, I haven’t recorded an issue, but will instead watch for these recurring in the next round of playtesting.

Here’s what our groups found:

  • Critical issues – 0 (phew!)
  • Serious issues – 1
  • Major issues – 2
  • Minor issues – 16

50% of our groups caught all three major or serious issues, but 33% caught only one and 17% didn’t catch any.

A note here about consistency: not all of the issues were detected by all of our groups. Two groups (one of which played twice) did not pick up the serious issue, and the two major issues were each picked up by only three of the six groups (arguably one major was detectable in a fourth group’s report, but I think we might have dismissed it on their evidence alone, as it didn’t look that serious). More importantly, the misses were clustered: 3 groups caught all the serious and major issues, while 3 groups missed at least two of them.

I want to be clear, by the way, that I don’t consider the above to be a poor reflection on any of our groups. I suspect the ones that missed issues did so because they were more familiar with the style of game or the genre. Some of our clearest and most helpful feedback came from groups that didn’t catch a lot of the bigger issues, but did notice many smaller ones. All the feedback was immensely useful.

The above suggests to me that you want at least three groups to test a game to be reasonably confident of picking up on major and serious issues. With fewer, you might get them, or you might be unlucky. (Of course in our case, we would need four groups to guarantee catching them all.)
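To put rough numbers on that intuition: if you model each group as independently catching a given issue with probability p, the chance that all n groups miss it is (1 - p)^n. Here’s a minimal sketch, assuming p = 0.5 for major issues (generalising from each major issue being caught by three of our six groups – a big assumption from a tiny sample):

```python
# Toy model: each group independently catches a given issue with
# probability p, so n groups all miss it with probability (1 - p) ** n.
# p = 0.5 is an assumption generalised from our data (each major issue
# was caught by 3 of 6 groups); real groups are not truly independent.

def miss_probability(p: float, n: int) -> float:
    """Chance that n independent groups all miss a given issue."""
    return (1 - p) ** n

for n in range(1, 7):
    print(f"{n} group(s): {miss_probability(0.5, n):.1%} chance of missing the issue")
```

On that toy model, one or two groups leave a 50% or 25% chance of missing a given major issue, while three groups get it down to about one in eight, which matches the rule of thumb above. The four-groups-to-guarantee-it point is the worst case in our actual data, not something the model can promise.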

By the way, I haven’t analysed the minor issues, but my impression is that they were sprinkled liberally through all six groups. I doubt if there’s a single group that didn’t pick up some minor issues missed by the rest.

Conclusions

The top line conclusion is that you need to playtest, and not just with one or two groups. The comparison with the playtesting on my previous game is instructive: I only had one response, which added a little to my own efforts at playtesting. As my analysis above shows, with such a low level of response there is a high risk of failing to catch even quite serious issues, and innumerable smaller issues will have slipped the net.

Getting playtesters isn’t at all easy. I think we were fortunate this time around. Our voices carry a bit further as a result of a few years circulating in the online indie gaming community. We got support from a couple of people with a very wide reach, and although it’s hard to say how much impact this had, I would guess a lot. And our game concept was more grabby – though whether we would have been taken as seriously if we’d proposed such a concept three years ago, I can’t say.

One thing I would observe is that it’s a lot easier to make playtests happen if you offer to organise them yourself. That’s pretty obvious, but it is worth saying anyway. You can tackle the tendency for games to get cancelled by providing a venue, picking people you can rely on and, above all, not dropping out yourself. And you can make sure decent notes are taken and guarantee to take them away with you. It’s more effort, and if you want it to have the same value as an external test you’ll have to be disciplined about not facilitating the game yourself, but it dramatically increases your sample size, which reduces the chances of missing a given issue.

Josh Fox

