A couple of weeks ago, we had an awesome session with Eric Ries. He came and asked us a very important question: How do you know that you’re making the product better?
There are really three answers to that question:
- your gut
- listening to users
- measuring users
And of these three, there’s really only one “right” answer, the only way to figure out whether a change is actually significant (or even moving in the right direction):

Perform tests to measure changes in user behavior.

(Disclaimer: split testing is not a substitute for decision making.)
Out of our time with Mr. Ries, we committed to following one metric that is key to our business, and to making sure every effort to improve that metric runs through a split test, so that we understand how our changes actually move that number.
Yes, we really are trying to A/B test everything. Not so we can get better conversion rates or know which colors are best, but so we can figure out what actually causes people to use our products more. In fact, I built a tool using Rails and Redis to help us measure retention across multiple cohorts during a split test.
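The core idea is easy to sketch. Something like the following captures it, with a hypothetical key schema (one Redis set per variant and signup date for the cohort, plus one set per return day for activity); the class and key names here are illustrative, not our production code:

```ruby
require 'redis'
require 'date'

# Minimal sketch of cohort retention tracking with Redis sets.
# Hypothetical schema: one set per (variant, signup date) cohort,
# plus one set per (variant, signup date, activity date) recording
# who came back on that day.
class RetentionTracker
  def initialize(redis = Redis.new)
    @redis = redis
  end

  # A new user signs up: add them to their variant's cohort for today.
  def record_signup(user_id, variant, date = Date.today)
    @redis.sadd("cohort:#{variant}:#{date}", user_id)
  end

  # The user opens the app: record the return against their signup cohort.
  def record_open(user_id, variant, signup_date, date = Date.today)
    @redis.sadd("active:#{variant}:#{signup_date}:#{date}", user_id)
  end

  # Fraction of a cohort that came back exactly `days` days after signup.
  def retention(variant, signup_date, days)
    cohort = @redis.scard("cohort:#{variant}:#{signup_date}")
    return 0.0 if cohort.zero?
    returned = @redis.scard("active:#{variant}:#{signup_date}:#{signup_date + days}")
    returned.to_f / cohort
  end
end
```

Comparing `retention("control", date, 7)` against `retention("true", date, 7)` for each signup cohort gives the per-variant deltas described below.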
For this particular split test, we launched a new version of our website at preview.astrid.com and redirected a percentage of our new and returning users there. We then measured retention behavior across the two groups after 1, 3, 7, and 14 days. (Right now it’s midnight at the 7-day boundary, which is why no one has “opened” the website in our +7 day bucket yet.) The important thing is the delta between variants: our “true” variant shows an improvement in retention, but the result is not yet statistically significant at the 95% level.
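The significance check itself is simple; a two-proportion z-test is the standard tool for a comparison like this. Here’s a hypothetical helper in plain Ruby, with made-up counts for illustration:

```ruby
# Two-proportion z-test: do variants A and B retain users at different rates?
# This helper and its numbers are illustrative, not our actual dashboard code.
def significant_at_95?(retained_a, total_a, retained_b, total_b)
  p_a = retained_a.to_f / total_a
  p_b = retained_b.to_f / total_b
  # Pooled rate under the null hypothesis that both variants retain equally.
  pool = (retained_a + retained_b).to_f / (total_a + total_b)
  se = Math.sqrt(pool * (1 - pool) * (1.0 / total_a + 1.0 / total_b))
  ((p_a - p_b) / se).abs > 1.96  # 1.96 = two-tailed z critical value at 95%
end

# e.g. 250 of 1000 "true" users vs. 220 of 1000 control users retained at +7:
significant_at_95?(250, 1000, 220, 1000)  # => false; the gap isn't big enough yet
```

The practical upshot: with a delta of only a few percentage points, you need cohorts in the thousands before this returns true, which is why we let the test keep running.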
Assuming the results stabilize at these values, this data lets us answer Mr. Ries’s question clearly: yes, our changes improved our product, in this case by around 5% for new users and 2% for existing users.
That’s awesome, and it’s results like this that push us forward toward our goal. A lot of A/B tools out there are built to test cosmetic changes: layout, text, colors, stuff like that. At a startup like ours, we have bigger things to test, and we now have a suitable A/B testing framework and a burning desire to use it.