Arc XP: Benchmarking
THE SITUATION
The SaaS division of The Washington Post (itself owned by Jeff Bezos), Arc XP is a state-of-the-art content and digital experience platform engineered to meet the demands of news, media, and enterprise companies around the world. An integrated ecosystem of cloud-based tools, Arc helps create and distribute content, drive digital commerce, and deliver multichannel websites to both internal and external audiences. Today, Arc powers more than 1,400 sites across 23 countries, reaching over 1.5 billion unique monthly visitors.
I am currently launching a Benchmarking Program for Arc XP that is focused on tracking both attitudinal and behavioral aspects of key products over time. The Benchmarking Program is part of an expansion of the UX Research practice to include quantitative methods alongside our already well-established qualitative methods.
MY ROLE
Lead all aspects of the Benchmarking Program, working across products with cross-functional teams
Teach Product Managers and their teams what benchmarking is, why it is important, and why we are including it in the UX Research program for Arc XP
Teach Product Managers how to effectively leverage our internal analytics tool, Pendo, to gain insights into their product
Conduct workshops with each cross-functional product team in order to:
Identify what to measure, when to measure, and how to measure
Ensure that the Benchmarking Program is aligned with broad company goals and product-specific KPIs
Compile and analyze collected data
Set up a process that the Product Managers and Designers can repeat on their own at an agreed-upon cadence
MY APPROACH
I put together a remote workshop that I used with each product team. The workshop structure helped us keep company-wide and product-specific KPIs centered as we decided what to benchmark and how.
Once we established a broad framework of goals and KPIs, we moved into an ideation session. As I collected our ideas on metrics, we discussed the pros and cons of each. After this we talked about potential methods for collecting data given the tools we have at Arc XP. Finally, we linked these ideas back to KPIs to focus on what we wanted to benchmark.
EXAMPLE OUTCOME: BENCHMARKING COMPOSER
The first product team I worked with was our Composer team, which owns our highest-profile product. This team is in the middle of a major product redesign, and I wanted to establish a benchmark of where we stood with our end users before the redesign.
Fortunately, just before we embarked on this redesign I had the opportunity to invite the Senior Designer from this team to pair with me on an on-site contextual inquiry with our biggest client. She travelled with me to three different client locations and spent hours sitting with me and client content creators/editors as they used Composer. She also worked closely with me to analyze and interpret the data from that research. I created journey maps after that research that helped focus attention on common client pain points. That shared, rich research was a solid foundation from which to move into benchmarking her product.
The team and I settled on a two-part approach:
Part 1: Behavioral
Use our analytics tool, Pendo, to measure completion time for our most common user task (from starting a new article to publishing it) during a given one-week period. I then created client segments, and we examined the data by client type to look for trends.
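As a concrete illustration of this behavioral measurement, the sketch below shows how completion rate and completion time by client segment could be computed from an exported event log. The file names, column names, event names, and segment mapping are all assumptions for illustration; they are not Pendo's actual export format or API.

```python
import pandas as pd

# Assumed export with one row per event:
# columns article_id, account_id, event ("story_started" / "story_published"), timestamp.
events = pd.read_csv("composer_events_week.csv", parse_dates=["timestamp"])

# Hypothetical mapping of accounts to client segments (Newspaper, Broadcast, ...).
segments = pd.read_csv("client_segments.csv")  # columns: account_id, segment

# Earliest start and publish timestamp per article.
starts = (events[events["event"] == "story_started"]
          .groupby(["article_id", "account_id"])["timestamp"].min().rename("started"))
publishes = (events[events["event"] == "story_published"]
             .groupby(["article_id", "account_id"])["timestamp"].min().rename("published"))

times = (pd.concat([starts, publishes], axis=1)
         .reset_index()
         .merge(segments, on="account_id", how="left"))

# Minutes from first start to first publish (NaN for stories never published).
times["minutes_to_publish"] = (times["published"] - times["started"]).dt.total_seconds() / 60

summary = times.groupby("segment").agg(
    started=("started", "count"),
    published=("published", "count"),
    median_minutes=("minutes_to_publish", "median"),
)
summary["completion_rate"] = summary["published"] / summary["started"]
print(summary)
```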
Part 2: Attitudinal
I encouraged the team to collect the attitudinal benchmark within the product itself. I selected the most common user task, and used the completion of that task to trigger Pendo to serve up a positive version of the System Usability Scale (SUS) survey hosted in Maze.
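For reference, here is a minimal sketch of how a single SUS response could be scored. The function is illustrative; it assumes the all-positive SUS variant mentioned above, in which every item is positively worded, so each 1-5 rating contributes (rating - 1) points before the standard 2.5 multiplier puts the score on a 0-100 scale.

```python
def sus_score(responses):
    """Score one all-positive SUS questionnaire (ten ratings on a 1-5 scale)."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 item responses")
    # All-positive variant: every item contributes (rating - 1); scale to 0-100.
    return sum(r - 1 for r in responses) * 2.5

# Example: answering 4 ("agree") to every item yields a score of 75.
print(sus_score([4] * 10))
```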
FINDINGS
We have collected our baseline SUS measurement, our attitudinal benchmark, and that result will become more interesting when we test again after the redesign. The findings from our initial behavioral analysis, however, were immediately relevant to the team.
Behavioral Findings
Finding One:
During the study week, our data showed that our users started 117,135 stories and completed 66,770 stories (roughly 57% overall)
Our newspaper clients completed about half of the articles they started during that week
Our broadcast clients completed about 75% of the stories they started
The context: This made sense to me when viewed within the richer context of information I have collected through on-site contextual inquiries with clients. I have seen firsthand that our Newspaper and Broadcast clients have very different workflows, and it was interesting to see how that difference appears at the quantitative level.
Finding Two:
Newspaper clients completed articles in about an hour and a half on average, whereas Broadcast clients averaged about 40 minutes.
The context: Again, this confirmed behaviors and workflow differences I had observed through on-site contextual inquiries. Newspaper clients tended to have longer review processes, while Broadcast clients had a number of “rinse and repeat” publications that ran parallel to the Broadcast schedule and in some cases did not require review.
Finding Three:
A significant portion of our Broadcast and Newspaper clients start and publish a new article in under 10 minutes.
The context: Our working theory is that these are examples of the "cut-and-paste workarounds" that I have seen in contextual inquiries and that we have heard about in client interviews. Clients want the ability to "just start writing" without having to complete metadata. Frustration with this friction within Composer leads them to write and edit in other tools and then simply copy and paste that text into Composer when the work is complete. Reducing this friction is a key component of the Composer redesign, and benchmarking will track whether or not we are successful.
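One way we could operationalize this workaround as a benchmark metric is the share of published articles that go from start to publish in under ten minutes, per client segment. The sketch below reuses the hypothetical `times` table from the behavioral sketch above and is illustrative only.

```python
# Share of published articles that went from start to publish in under ten
# minutes, by client segment -- a rough proxy for the copy-and-paste workaround.
published = times.dropna(subset=["minutes_to_publish"])
fast_share = (published["minutes_to_publish"] < 10).groupby(published["segment"]).mean()
print(fast_share)
```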
Finding Four:
The single most common user action after publishing an article is to “Save.”
The context: This was not a surprise; I had seen this behavior in past contextual inquiries with clients as well. Composer's error and warning messaging decreases user trust in the system and leads to "anxiety saves." We are fixing these messages in our redesign, and our goal will be to eliminate these superfluous save actions.
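If the full event stream also records actions such as saves (an assumption; the minimal log sketched earlier only includes start and publish events), the "anxiety save" pattern could be quantified by tallying the event that immediately follows each publish, as in this illustrative sketch.

```python
# Tally the event that immediately follows each publish within the same article.
events = events.sort_values(["article_id", "timestamp"])
events["next_event"] = events.groupby("article_id")["event"].shift(-1)
after_publish = events.loc[events["event"] == "story_published", "next_event"]
print(after_publish.value_counts().head())
```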
BENCHMARKING GOALS FOR THIS TEAM
Once our redesign is released, we will do another round of benchmarking, and plan to continue this at a six month cadence. My goals for this team are:
After the redesign we will track the impact, if any, on completion times and the number of stories completed for all client segments
After the redesign we want to see a reduction in the number of articles completed in under ten minutes (fewer end users cutting and pasting from other tools) across our client segments
At each six month check in, we would like to see end users’ SUS ratings for Composer improve over time
WHAT I LEARNED
When I work with teams on benchmarking their product, one of the hardest adjustments for the team is thinking long term about what they should benchmark. I have to keep reminding them that what we choose to measure must remain relevant over many years. The shift from formative types of research (with which the organization is more familiar) to summative research is also a shift in mindset.
Collecting data is less time-consuming than evaluating, parsing, and interpreting that data. Segmenting our clients by type in a way that made all stakeholders comfortable was surprisingly time-consuming. That said, moving from data to insights and hypotheses was very engaging and sparked rich discussions with each team about their product.
I strongly believe that tools like benchmarking and data analytics in general are far more powerful when paired with qualitative research. Because I had conducted extensive contextual inquiries on site with multiple clients, I could easily interpret the trends we saw in this data and help the team come up with hypotheses about their product and end-user behaviors.