Saturday, November 27, 2021

Last words: a critique of ROAD and other rule changes

Even though I've retired from X-Wing, I've been clarifying my thoughts around why I dislike the Random player Order After Dials (ROAD) rules change and AMG's other stated rule changes. Such is the reality of leaving a community and a game after six years: it doesn't come easily! But hey, critics review games they don't intend to play in the future all the time. I guess I can as well.

I don't know if this is closure or the last gasp of the bargaining stage of grief. A benefit of leaving the game is I can root for the community and be happy for those who enjoy the new rules, rather than fighting to save my place in the community. This post is not consistent with that. Forgive me. You are free to close this window and leave the rest unread.

To help explain my reasoning, I'm about to introduce a "Clearly Bad Rule Change" for a different game, one which everyone would agree is bad. Before I do that, I need to explain what I am NOT doing with this example and the two reasons I'm using it!

I am NOT saying ROAD is the same as Clearly Bad Rule Change. That would be a false equivalence. I do not believe they are the same and there will be clear differences between the two.

The first reason is to filter out weak arguments in defense of ROAD. An argument that applies both to ROAD and to Clearly Bad Rule Change proves too much and can be dismissed. Put another way, this provides a standard for good arguments in defense of ROAD: they must be specific enough that they do not also defend Clearly Bad Rule Change.

Second, it's helpful to examine why Clearly Bad Rule Change is bad. It serves as a comparison and lets us consider how ROAD in X-Wing is both different and similar. Using these differences and similarities, we can consider the effects of ROAD on X-Wing and explain one reason why players have had different experiences with ROAD.

Clearly Bad Rule Change

Consider a social deduction game like Secret Hitler, Werewolf, or Resistance. Imagine a rule change where the roles are dealt after the game is over.

(If you're less familiar with social deduction games, imagine a game of Poker where all private cards are dealt after bidding ends. In the examples below, substitute the roles with "good hand" and "bad hand".)

We can agree this rule change would destroy the entire premise of those games and is clearly bad.

Again, I am not saying ROAD is exactly the same as Clearly Bad Rule Change. Let's consider the two ways I'm using this example: as a standard to evaluate arguments and as a comparison to examine the similar and different effects of ROAD.

First, let's consider arguments made in defense of ROAD that can also be made in defense of this Clearly Bad Rule Change:

"You're just afraid of change."
"No play-style/strategy/sacred cow should be protected."
"You never knew if you really read your opponent or if you just got lucky, so not much skill expression was lost."
"This change would make the game more skillful." (without explanation)
"This change just forces you to make a plan that is good both if you are the werewolf/fascists/bad guys and if you are the villagers/liberals/good guys."
"You just got used to making decisions with perfect information about your role."
"This change lets you decide whether to play aggressively assuming you have a specific role, or conservatively which would work if you had either role."

These arguments could be valid if they come with specific clarifications or examples that show why they apply to ROAD in X-Wing and not to Clearly Bad Rule Change in social deduction games. They could be some specific tactical or strategic considerations that ROAD adds or a specific scenario that illustrates them. Otherwise, in these general forms, these arguments must be insufficient or fallacious because they defend what is clearly a bad rule.

More interestingly, why is Clearly Bad Rule Change so bad? In social deduction games, all of your goals are the result of your role and the roles of people you're interacting with. Forcing players to make all of their decisions before roles are assigned means players don't have any goals to achieve when they are making their decisions, so those decisions are pointless.

To evaluate ROAD, we should ask what goals in X-Wing are independent of your player order and what goals in X-Wing rely on knowing your player order. Of course, the overall goal of X-Wing (kill enemy ships) does not depend on player order, so we'll be examining sub-goals that lead to winning the game on a smaller basis (such as round by round or ship by ship).

Unlike social deduction games, there are some sub-goals in X-Wing that do not rely on player order. No matter what the player order, you want to point your firing arc at the correct location and avoid your opponent's firing arc if possible. You also want to shoot the correct targets. These goals still exist with ROAD.

Outside of these, many sub-goals in X-Wing do rely on player order. Dialing a maneuver to arc-dodge relies on player order. Whether to position your ships to block or avoid a block relies on player order. Strategically, the optimal location to place your ships this turn to set up for future turns also relies on player order. Range control to ensure getting target locks on engagement relies on player order. Ensuring your ships either block enemy K-Turns or can clear their K-Turns next round relies on player order. The optimal spot for trailing enemy ships so you can pursue them without running into them relies on player order. Whether to spread your ships out to avoid blocks or potentially get isolated matchups relies on player order. Whether to deploy across from or far from your opponent's ships depends on player order. All of these sub-goals are lost with ROAD.

ROAD only matters for overlapping initiatives and should be evaluated on these situations. However, even when there are some overlaps, ships of different initiatives will move in a set order in relation to each other. In those cases (likely the majority of games with overlaps), the difference in initiatives can enable these sub-goals.

There is another important difference. In X-Wing with ROAD, there are often some decisions which can still be made after dials are set and player order is determined. When ships have strong repositioning abilities, especially ones before execution of maneuvers, these sub-goals can still exist with ROAD.

It's telling that from the games I've seen with ROAD, it is best when players have ships of different initiatives (e.g., both players have one ship of initiative 4, 5, and 6) and where ships have lots of options after dials are set and player order is determined (e.g., strong repositioning abilities). ROAD seems most problematic when most of the ships on the board are at the same initiative and when those ships do not have good options after dials are set (e.g., ships that just want to take a Focus token).

This is a problem that can't be addressed through list-building. This is an issue that affects the enjoyment of lists, not necessarily the strength of lists. Lists with ships of the same initiative and few repositioning abilities can still be powerful with ROAD; they just might not be fun against overlapping initiatives. It's very possible to design a game where the optimal strategy is not fun (camping in FPS games, Nantex-apocalypse in X-Wing), and that's usually a bad outcome.

One important consideration is where hidden information adds to the game. For example, revealing everyone's role at the start of a social deduction game also destroys the game.

Could something similar happen for ROAD and X-Wing, where the randomness creates new goals for players to play with? Yes, specifically in solved or degenerate matchups like ace vs. ace with overlapping initiatives where one side moves second. Perhaps there are others, but I have not seen such examples (set player order can influence joust vs. joust matchups but to a much smaller extent and allows for skill expression). We'll consider this below, but as we do, let's not forget the costs paid to achieve those benefits and whether there could be a better alternative.

What about other benefits of ROAD?

The strongest reason for ROAD is to reduce matchup variance and the number of "phantom games" that are effectively decided before ships are even deployed. Another possible reason for ROAD is to make the game more casual and broaden its appeal.

Addressing the second benefit first, ROAD only affects games with overlapping initiatives. All of the tactics and strategy would remain in games without overlapping initiatives. That makes ROAD a poor way to make the game appeal to a more casual audience.

The primary goal for ROAD is to reduce matchup variance in ace vs. ace matchups with overlapping initiatives. I agree this was a concern and ROAD accomplishes its goals in that area.

While ROAD may reduce matchup variance in ace vs. ace matchups with overlapping initiatives, it's unclear how much it reduces matchup variance in the game overall rather than shifting the "phantom game" problem into other matchups.

Specifically, it's possible that ships with strong repositioning abilities are now much weaker with ROAD against ships without such abilities of overlapping initiatives. Not only can the repositioning ships not set a dial to aggressively take advantage of their repositioning abilities for arc-dodging, they also cannot do so for blocking. For aces flying against jousters, ROAD is in some ways worse than going first the whole game.

This is best illustrated with an example. Consider Soontir Fel against Wedge Antilles. Soontir Fel costs about the same as Wedge Antilles. Soontir Fel and Wedge Antilles throw the same dice against each other (three attack, two defense). Soontir Fel has half the health of Wedge Antilles. For this to be a fair fight, Soontir Fel must get one free attack for every attack they trade or get similar value from his free token. It's unlikely that can happen consistently with ROAD. ROAD could eliminate "phantom games" in ace vs. ace matchups but create new "phantom games" in ace vs. joust matchups.
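The arithmetic behind "one free attack for every attack they trade" can be sketched in a few lines. This is a rough illustration only: it assumes both ships deal the same average damage per attack (`avg_damage` is a placeholder value), uses health totals of 3 and 6 as stand-ins for "half the health", and ignores tokens, range bonuses, and dice variance. The function name is made up for this example.

```python
import math


def free_attacks_per_traded_attack(own_health, enemy_health, avg_damage=1.0):
    """Unanswered attacks needed per traded attack for an even fight.

    Assumes both ships deal the same average damage per attack.
    """
    # Attacks each side must land to destroy the other.
    attacks_to_kill_enemy = math.ceil(enemy_health / avg_damage)
    attacks_to_kill_us = math.ceil(own_health / avg_damage)
    # We can trade only as many attacks as it takes to destroy us;
    # every remaining attack has to be one the enemy cannot answer.
    free_attacks = max(0, attacks_to_kill_enemy - attacks_to_kill_us)
    return free_attacks / attacks_to_kill_us


# Illustrative health values: the fragile ace has half the jouster's health.
soontir_needs = free_attacks_per_traded_attack(own_health=3, enemy_health=6)
```

Under these assumptions the ship with half the health needs one unanswered attack per traded attack, while two ships with equal health need none.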

One major source of matchup variance in X-Wing is facing a list of slightly higher initiative. A list of initiative 5 aces facing a list of initiative 6 aces is usually as much of a "phantom game" as two lists with overlapping initiative 5 aces. ROAD does nothing to address this.

Any attempts to balance matchups of overlapping initiatives with a points adjustment would increase matchup variance with non-overlapping initiatives. Soontir Fel is bad against Wedge Antilles because he stomps lower-initiative pilots much harder than Wedge Antilles. ROAD has no effect on Soontir Fel stomping Kylo Ren. Any reduction in Soontir Fel's points to make the Wedge Antilles matchup closer would only make the Kylo Ren matchup more lopsided.

Counterplans

Defenders of ROAD correctly say that we should not compare ROAD to a perfect world with no problems. That is true. Overall, I strongly believe the old bidding system with all its flaws was still better than ROAD. But some people found the old bidding system unacceptable, and that's reasonable.

I do not have to defend any existing rule system to argue against ROAD. I can instead propose alternatives that could also reduce matchup variance while preserving the tactical and strategic depth of X-Wing. Here are two to consider.

First, it's possible that most of the flaws of the old bidding system came from being an all-pay auction. The winning bid not only secures player order but also destroys the opponent's bid. The one-shot nature of bidding in X-Wing limits the craziness, but all-pay auctions can get pretty wild. Wild, as in paying hundreds of dollars for an ordinary $20 bill.

The advantage of bidding is that it reduces matchup variance. One solution to Soontir Fel being strong when moving last and weak when moving first is to implement a system where Soontir Fel almost always moves last, and to cost Soontir and the bid accordingly (or rather, upgrades that compete with a bid). Players will naturally bid more when their ships strongly require moving second and less when their ships are fine with moving first, and thus organically reduce matchup variance. (I had predicted that player order randomly assigned at start for the whole game without bids would increase matchup variance, and it seems to be the case.)

This breaks down with the all-pay nature of the old bidding system. A large bid to move last for Soontir might be fair against a list that bids nothing, but it's a steal if the bid only wins by one point or by a coinflip on a tie. In that case, the winner has the player order advantage while the other player has sacrificed about as much value in raw points.

It could be that changing to a winner-pays auction in X-Wing could solve most of the problems of bidding. Here's one way to implement this:

During list construction, players construct a primary list and a secondary list. The primary list is constructed as usual. The secondary list must be identical to the primary list with the following exception: any unspent points in the primary list may be spent on additional upgrades, or to exchange upgrades to upgrades with higher point costs and the exact same upgrade slot requirements.

At the start of the game, the bids of the two players' primary lists are compared. The player with the higher bid (or winner of the coin flip, if tied) plays their primary list. The other player plays their secondary list.
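The comparison step above can be sketched as a small function. This is only an illustration of the proposal as described: the player labels, the function name, and the `coin_flip_winner` parameter are invented for the example, and a "bid" here simply means the unspent points in a player's primary list.

```python
def assign_lists(bid_a, bid_b, coin_flip_winner):
    """Decide which list each player fields under the winner-pays proposal.

    bid_a, bid_b: unspent points in each player's primary list.
    coin_flip_winner: "A" or "B"; only used when the bids are tied.
    """
    if bid_a > bid_b:
        winner = "A"
    elif bid_b > bid_a:
        winner = "B"
    else:
        winner = coin_flip_winner
    # The bid winner keeps their bid (primary list); the other player
    # gets to spend their unspent points (secondary list).
    return {
        "A": "primary" if winner == "A" else "secondary",
        "B": "primary" if winner == "B" else "secondary",
    }


result = assign_lists(bid_a=10, bid_b=3, coin_flip_winner="B")
```

Only the winning bidder gives up points, which is what distinguishes this from the all-pay behavior of the old system.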

This makes bidding more risky, especially since it also applies even if the lists had no overlapping initiatives! The lower-bid player in a mirror match can offset a disadvantageous player order with additional upgrades, which can be specifically teched against matchups where they are outbid. (I'm not sure whether deficit scoring would still be appropriate with this rule-set, but if so, I would recommend that the bid is only scored if any other points are scored.)

Second, we could have alternating player order. The first player token is assigned randomly before deployment and alternates players after every round. This allows both players to get an equal number of turns moving second. It also preserves strategic and tactical depth since both players know what their player order is in the current turn and in future turns.
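A short sketch shows why this keeps player order predictable: once the initial token is assigned, the holder in any future round is a pure function of the round number. The player labels and function name are placeholders for illustration.

```python
def first_player(round_number, initial_first_player, players=("A", "B")):
    """Holder of the first player token in a given round (rounds start at 1)."""
    start = players.index(initial_first_player)
    # The token swaps sides after every round.
    return players[(start + round_number - 1) % 2]


# Player order for the first four rounds if "B" wins the initial random draw.
schedule = [first_player(r, "B") for r in range(1, 5)]
```

Over any even number of rounds, each player holds the token the same number of times.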

From testing reports, the main complaint about this rule is it might encourage passive play and discourage engagements. However, passive play is a broader problem that applies not only to player order with overlapping initiatives but also to fortressing or mobile fortressing. If that problem is solved, then the main drawback of alternating player order would also be addressed. Here is an Aggression Tiebreaker that can solve passive play while affecting a minimum of other games:

This rule introduces a Tiebreaker Token (please use any agreed-on object to represent this). The holder of the Tiebreaker Token wins the game if the game ends with neither player having scored any points, instead of triggering a Final Salvo.

At the end of each round, if the Tiebreaker Token has not been assigned and points have not been scored, either player may call for a tiebreaker check:

Each player selects one of their ships and measures from that ship to the nearest board edges.
If only one player's chosen ship is outside Range 2 of all board edges, then that player is assigned the Tiebreaker Token for this game.

Once the Tiebreaker Token has been assigned or points have been scored, no more tiebreaker checks can be made this game.
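The tiebreaker check above can be sketched as follows. This is a hedged illustration, not a finished rules implementation: distances are assumed to be measured in millimetres from each player's chosen ship to its nearest board edge, the 200 mm value for Range 2 is an assumption about the range ruler, and the function and parameter names are invented for the example.

```python
def tiebreaker_check(token_holder, points_scored, nearest_edge_mm, range_2_mm=200):
    """Return the Tiebreaker Token holder after an end-of-round check.

    token_holder: current holder ("A", "B") or None if unassigned.
    points_scored: True once either player has scored any points.
    nearest_edge_mm: dict mapping each player to the distance from their
        chosen ship to the nearest board edge.
    """
    # No checks once the token is assigned or points have been scored.
    if token_holder is not None or points_scored:
        return token_holder
    outside = [p for p, dist in nearest_edge_mm.items() if dist > range_2_mm]
    # The token is assigned only if exactly one player's ship is outside
    # Range 2 of all board edges.
    if len(outside) == 1:
        return outside[0]
    return None


holder = tiebreaker_check(None, False, {"A": 250, "B": 120})
```

If both chosen ships are outside Range 2, or neither is, the token stays unassigned and a check can be called again at the end of a later round.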

(Tip: Any ship can move beyond Range 2 of all board edges on the first round with a central deployment and a 3 straight maneuver.)

This rule makes it strictly disadvantageous to play passively from the start of the game. Once your opponent has the Tiebreaker Token, they can force you to engage on their terms. That is enough to kill fortressing and mobile fortressing since these strategies will have to end or they will automatically lose the game.

I like this rule over some competing rules because it gives agency to the player who wants to engage. A rule where both players lose if no points are scored allows a fortressing player to hold the other player hostage, forcing them to either engage into the fortress or suffer the consequences. Because every ship can contest the Tiebreaker Token on round 1, it is your own fault if you allow your opponent to have the Tiebreaker Token and then play passively afterwards. This rule also has a small footprint. You can safely ignore this if you plan to engage and score points during the game.

It's impossible to know, but I would guess that one of these rules could achieve the matchup-variance goals of ROAD without its drawbacks of reducing strategic and tactical depth. I would also guess that at least one of these solutions would be much more acceptable to the community overall compared to ROAD.

Changes to Bumping

Could the problems with ROAD be solved with other rule changes under consideration? In their stream, AMG mentioned they are considering rule changes where bumped ships may take focus tokens and ships can shoot at Range 0. Their stated reasoning for these changes was to reduce the consequences of bumping, even outside of overlapping initiatives.

While these rule changes may reduce some of the chaos of ROAD, they do nothing to restore the sub-goals removed by ROAD. Instead, they actually remove the value of blocking across the board, even when initiatives do not overlap.

These rule changes would likely increase matchup variance. One of the hardest matchups in the game is when a jousting list faces another jousting list with a slightly higher initiative. Initiative killing is extremely powerful in these matchups. Still, the disadvantaged lower-initiative player can threaten blocks and retake the advantage with skillful play. The higher-initiative player usually has to deploy in the far corner and may have to spread out to avoid blocks, limiting some of their advantage. One player is heavily favored but a game still has to be played and even the advantaged player will still need to make good decisions to win.

With these rule changes, the higher-initiative jousting list can deploy across from the lower-initiative jousting list, smash into range 0 with no consequences, and initiative-kill everything at close range. These matchups would likely become unwinnable for the lower-initiative jousting list. Worse, it takes all of the skill out of these matchups for both players.

This rule change also means that lower-initiative ships with strong repositioning abilities can no longer use them to gain an advantage by blocking. Again, that hurts the disadvantaged player and further polarizes matchups.

With just ROAD, I was planning to play casually since I can sort of pick my matchups and avoid overlapping initiatives. But these rule changes would affect all games. I know these rule changes are not finalized, but the reasons AMG stated behind these rule changes were what led me to quit the game. Based on the casual way these rule changes were discussed during the stream, I do not have any confidence that AMG understands the potential problems with these rule changes. I strongly suspect AMG has holes and/or biases in their playtesting, especially around jousting lists.

Community

Why did I spend thousands of dollars on X-Wing? I could have bought dozens of games with that money! Is any game worth spending thousands of dollars on?

The value of X-Wing was not just the game, but in the community. It provided a fun activity that got people out to the same place every week. It created interesting discussions both in-person and online. We made friendships and communities. That is worth thousands of dollars and more.

It's unclear exactly how many players favor and how many players dislike these rule changes. Surveys about ROAD show players are about evenly split, with maybe a third in favor, a third neutral, and a third opposed. I expect the numbers to become more favorable over time due to survivorship bias as players opposed to ROAD stop seeing the surveys.

This is our neighborhood dive bar turning into a health food shop. The craft beer and fried food are being replaced by smoothies and quinoa bowls. Maybe one is better than the other. But not every regular at the dive bar is going to like the health food. We're probably going to lose some people in the transition.

Is any rule change, no matter how good for some players, worth giving that much of the community a worse experience or perhaps losing that much of the community?

I do not know what the exact motivations of AMG are in making this rule change. I generally follow Hanlon's Razor. Perhaps this was AMG's best attempt to improve the game, or an attempt to change the audience for the game, or perhaps a "planned obsolescence" strategy to turn over the community for new blood. Whatever the case, X-Wing is no longer the enjoyable activity that brings me back to the same place and people. X-Wing is no longer a safe place for me to invest in a community.

I wish the community was not so divided and could have presented a united front to the developers. But that is unrealistic to expect and is now in the past.

It's impossible to tell the future, and maybe X-Wing will one day return to being a game I find enjoyable. In the meantime, it is much healthier for me to leave. I can root for my friends to keep their hobby and their community, rather than fight them for my place in it.

Good luck and have fun!

Thursday, November 4, 2021

Thank you all, and so long!

All good things come to an end, and it doesn't feel right to leave without saying goodbye. I've had a wonderful 6 years playing X-Wing. It's time for me to step off the ride. The community here is truly special, and I'm going to miss it. I wish you all the best.

Big shoutout to Miranda Ketita, the person who got me into X-Wing. Thanks for taking me under your wing and showing me the ropes.

A huge shoutout to Marc de Bruyn, my playtest partner back in London. We spent a long time theorycrafting and practicing, and you gave me tons of feedback on my model.

The London Ontario X-Wing community through the years, including Dave Roy, Ryan Ferguson, Rob, Ryan Slager, Eric Lalande, and Dave Ryersee. Justin Leonard, Andrew "Pineapple", Alex Kanski.

Team Canada for XTC: Andrew Oehler, Remi Dumais, Steve McLean, Stephen Kim, Cam Murray, Mike Massiah, and Devon Monkhouse.

Other X-Wing players of Ontario, including Andrew Durham, Brendon Osmann-Deyman, Tristan Singleton, Solon Wong, Timbo, Ryan Dwornik, Jackie Luong, Jeff Asiri, Alan Fung, Kelvin Lau, Evan Cameron.

The great people I met at Worlds. Jesper Winstrom, Rasta Maice, XY, and everyone else I played in the tournament and side events.

Special props to Jeff Bizzak. We didn't interact much outside our game where you destroyed me, but I did steal my Canadian Nats list from you :).

Community leaders. Dee and Ryan of the Fly Better Podcast, thank you for letting this rando on your podcast. Dion, Marcel, Ryan, and Will of GSP, thank you for all your contributions to the community. Members of the community I've interacted more or less with through the years: Ablazoned, GreenDragoon, Gisli, the Midwest Scrub community.

I apologize to anyone I missed.

I wish you guys all the best, and I hope we can meet again someday. Who knows what the future will bring? In the meantime, good luck and have fun!

Saturday, May 15, 2021

Can we improve the way we talk about X-Wing ships?

(I hope Betteridge's Law doesn't apply.)

This is an article where I say there's a problem and have no idea how to fix it.

The source of all of society's problems

Imagine a discussion about Punishing One Dengar. He was extremely weak at 74 points at release and is now a competitive option at 58 points. If someone asked at release why Punishing One Dengar is weak, the responses would probably have talked about his awful dial, being a clunky big-base ship, not really having a turret, no extra dice mods, dying under focus fire, etc. If someone asked why Punishing One Dengar is strong now, responses would probably have talked about Initiative 6, above-average health, having a double-tap ability, being able to learn how to fly with his clunky dial, etc.

Besides the problems with cherry-picking and confirmation bias, the funny thing is both sides are right and always have been. Punishing One Dengar has always had these features. He had all of these features when he was weak and when he was strong. What usually isn't discussed is the only thing that changed, his points cost, or how many points he is actually worth so that this can be compared to a changing points cost. A discussion of features alone isn't useful for understanding why Dengar is good or bad.

(Edit to add: there is one case where I find a discussion of features is helpful and that's when I didn't know about a certain interaction with a ship. But even then, it's often not discussed exactly how beneficial that is and how often that comes up.)

It's really hard to talk about whether a ship is good or bad. It's hard to talk about the effectiveness of ships. Even comparing two vanilla ships often requires non-simple math and running the dice calculator several times to get an accurate comparison. Conditional abilities are harder because we also need to guesstimate the chance the ability triggers and more dice probabilities will have to be calculated. Talking about features that are less numerical like arc-dodging or dials is even more difficult.

Even when we can convey how effective a ship is, whether a ship is good or bad depends crucially on its points cost. This is a problem that requires dividing two numbers that are not friendly for division, and we have to do this again every 6 months.

I noticed this most recently when talking with Raithos about Darth Vader in the TIE Defender after a test game. It was an unproductive discussion of our feelings, some head-sims that were probably in completely different places, and whether the dice or strategy in our sample size of one favored one side or the other. I don't think either of us changed our minds on the ship after our discussion.

(I remembered this after posting, so I'm editing this in now.) One option is to compare ships to other ships. This makes it easier to discuss the effectiveness of a ship compared to its point cost. The challenge with this method is that the comparison ship still has to be evaluated. For example, there was a recent post that argued the First Order Provocateur was underpowered by comparing it to a Saber Squadron TIE Interceptor. My evaluation is that the First Order Provocateur is one of the best options in the game and the Saber is bordering on being overpowered. However, this isn't too problematic if we're just concerned about finding the best ships in the game and using comparison ships that are known to be strong. Still, that limits the number of comparison ships and thus the applicability of this method.

What if we just talk about tournament results? Besides not being able to predict what we should play, I stand by my previous article on why this may not work. Imagine tournaments as shuffling a deck of cards, and we want to know whether a certain card is more likely to be on or near the top of the deck after we shuffle. We shuffle the deck and the Jack of Hearts is on top. You might have realized how many shuffles we need to figure out if the Jack of Hearts is actually more likely to be near the top of the deck or if this was complete luck. We don't have that many tournaments/shuffles.

Even with the tournaments we have, we often don't pay enough attention that the Queen of Spades was #2 or the 8 of Clubs was #7. On top of that, think of all the pilots in the game or the even vaster number of possible lists in the game, only a small portion of which are in a single tournament. A tournament doesn't even shuffle the full deck. And if there's a systematic way players of varying strengths pick lists, then even an infinite sample size would give us a biased result of how strong lists are.
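The shuffling analogy can be made concrete with a small simulation. This is a sketch of the analogy only, not a model of real tournaments: every card is equally likely to end up on top, the seed and shuffle counts are arbitrary, and the function name is made up for the example.

```python
import random
from collections import Counter


def top_card_counts(num_shuffles, seed=0, deck_size=52):
    """Count how often each card ends up on top across repeated fair shuffles."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    deck = list(range(deck_size))
    counts = Counter()
    for _ in range(num_shuffles):
        rng.shuffle(deck)
        counts[deck[0]] += 1
    return counts


few = top_card_counts(num_shuffles=10)       # a "season" of ten tournaments
many = top_card_counts(num_shuffles=20_000)  # far more data than we ever get
```

With ten shuffles, at most ten of the 52 cards ever appear on top even though all cards are identical in strength, so a single "winner" says very little. Only with thousands of shuffles do the counts start to even out.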

Obviously, this is all just a plug for my model, right? Well, sort of. For example, version 1.9 of my model rates the TIE Silencer "Avenger" at 57 points, which is roughly fair. Chris Allen, in the recent Fly Better Podcast, thinks Avenger is much better than that. We can dig into the model and see that I based Avenger's strength on his ability triggering 1 time per game on average and he gets an extra dice mod for his attack when it triggers. Chris would likely point out that I got it wrong: Avenger's usually flown in 5-ship lists so his ability would trigger more often than once a game on average, and being able to reposition makes his ability more valuable than just a dice mod. We can have a productive conversation about how often the ability triggers: I might say that sometimes Avenger will die first, sometimes the ability triggers when Avenger is stressed or can't benefit from the action, and sometimes none of your ships die during the game and you win or lose on time. But in the end, the focused area of disagreement makes it easy to change my mind or otherwise understand why we disagree. If I give Avenger 1.5 uses of his ability and value the benefit halfway between the extra token and a full initiative-7 coordinate, then I'd think Avenger is worth 62 points (+10% over his current cost of 56 points). What an improvement this is over vague talks about feelings or generic listings of features!

The model is great for talking about ship strength -- for the two people in the world who understand how it works. It's a messy and poorly-documented jumble of equations and assumptions. I often forget how some of the more obscure parts of it work. It's exponentially harder to read someone else's code and understand their logic. Otherwise, you're just taking my numbers at face value and relying on my judgement. I've spent more time systematically thinking about and evaluating X-Wing ships than most people (than everyone?), but many different people will have a better evaluation of a specific ship they're very familiar with. Anyone who thinks I got things exactly right in my model should look at how many versions there have been :).

And there are definitely ships which the model gets wrong. For example, version 1.9 of the model values Eta-2 Anakin at 45 points, for a whopping -22% difference from its current point cost of 56 points. From the games I've seen, including Paul Heaver's VASSAL League games, and the one game I've play-tested with him, I know Eta-2 Anakin is almost certainly better than that. But is Anakin average, competitive, among the best options in the game, or broken OP? I have no idea, and I probably won't know until there's more data on the ship or I figure out what my model (and by extension, what I) got wrong about how to evaluate the ship. In the meantime, I don't know how to have productive conversations about how strong this ship is.
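For readers who want to check the percentages quoted for Avenger (62 vs. 56 points) and Eta-2 Anakin (45 vs. 56 points), here is a small sketch. The post does not say how the percentages are computed, so both formulas below are assumptions: a simple relative difference and a log ratio. The function names are invented for the example.

```python
import math


def simple_pct(model_value, current_cost):
    """Relative difference between model value and cost, in percent."""
    return 100 * (model_value - current_cost) / current_cost


def log_pct(model_value, current_cost):
    """Log ratio of model value to cost, in percent (an assumed convention)."""
    return 100 * math.log(model_value / current_cost)


avenger = (simple_pct(62, 56), log_pct(62, 56))
anakin = (simple_pct(45, 56), log_pct(45, 56))
```

Rounded to whole percents, the log ratio gives +10% and -22%, matching the figures in the text, while the simple relative difference gives +11% and -20%. That suggests, without confirming, that a log-ratio convention is being used.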

Do I have any ideas how to solve this? Eh. I've written some articles in the Evaluation and Calculation series to try to explain some of the math behind evaluating ships, but that's still not easy to have a discussion about. I've yet to write about more difficult topics like arc-dodging and the articles are a bit time-consuming to write. I've thought about creating a table (actually 6 tables, one for each initiative) of fair vanilla ship point values for attack and hull combinations. That still requires some assumptions about the meta of attack and defense distributions, but it could help establish a baseline for further conversation. And I can also expose some more calculations in the model such as durability and damage output to show how I'm getting results. All of these solutions are very mathy. I'm not sure any of these would solve this broader problem, especially in more casual conversations. Hopefully some more clever people will come up with a good solution for this :).