Because Shopify doesn’t host third-party apps, approved apps effectively went dark after launch, leaving us blind to behavior changes, infractions, and trust-breaking issues until merchants complained. I designed a prioritization process that used indirect data tripwires to surface risky live apps early, establishing a consistent quality bar, discouraging bad actors, clarifying the scope of ecosystem-wide non-compliance, and creating a fast, repeatable way to address issues at scale.
Problem
Shopify doesn’t host its third-party apps. This meant that after apps were approved, we were blind to what they
did and how they changed over time. We relied on developers and merchants notifying us of issues that could
potentially impact thousands of users, and we were unaware of critical infractions that broke the trust of those users.
Solution
A process to surface and address live apps in priority order by using indirect data tripwires that indicate
potential infractions and broken requirements.
Impact
Developed a consistent quality standard in the ecosystem
Discouraged bad actors
Introduced a concise process to address live apps
Identified the scope of non-compliance in the ecosystem
Risky business
Shopify App Store, circa 2020
After apps were reviewed and went live on the App Store, we didn’t have a feasible way to track changes in their
code or listing details. In other words, every app you see in the App Store could potentially have been a risk to
the trust we had built with merchants.
Indirect truth
There are over five thousand live apps in the App Store, yet there are only a handful of App Store auditors. We
had no direct data to work with. Where would we even start? We needed to prioritize.
By using indirect data, we created an extensive list of tripwires based on three main pillars:
Security: Potential breach of confidential information and PII
Trust: Potential misinformation or break in expected behaviours
Risk: Extent of reach of an app when it comes to users and GMV
We designed the tripwires with a scoring system that, in combination with an assigned threshold and additional
automated checks, gave us a prioritized list of the apps we should be looking at.
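The scoring described above can be sketched in code. Everything specific here is a hypothetical illustration: the tripwire names, pillar labels, weights, and threshold are assumptions for the sketch, not Shopify’s actual signals or values.

```python
# Hypothetical sketch of a tripwire scoring pass. Tripwire names,
# weights, and the threshold are illustrative, not real values.
from dataclasses import dataclass, field

@dataclass
class Tripwire:
    name: str
    pillar: str  # "security", "trust", or "risk"
    weight: int  # how strongly this signal suggests an infraction

@dataclass
class App:
    name: str
    tripped: list = field(default_factory=list)  # names of tripped wires

TRIPWIRES = {
    "requests_pii_scopes":    Tripwire("requests_pii_scopes", "security", 5),
    "listing_claims_changed": Tripwire("listing_claims_changed", "trust", 3),
    "high_gmv_reach":         Tripwire("high_gmv_reach", "risk", 4),
}

AUDIT_THRESHOLD = 6  # assumed cutoff: apps at or above this score get audited

def score(app: App) -> int:
    """Sum the weights of every tripwire the app has tripped."""
    return sum(TRIPWIRES[t].weight for t in app.tripped if t in TRIPWIRES)

def prioritize(apps: list) -> list:
    """Return apps over the threshold, highest score first."""
    flagged = [a for a in apps if score(a) >= AUDIT_THRESHOLD]
    return sorted(flagged, key=score, reverse=True)
```

With this shape, adding a new tripwire is just a new entry in the table, which matches how the list grew out of research rather than being fixed up front.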
Historical experience
Building tripwires required extensive user research. In this case, the users were other Shopify employees
who had frontline experience with the issues we were trying to find. This research allowed us to formulate
tripwires from indirect data that could serve as potential flags for those issues.
Spreadsheet where I categorized and prioritized tripwires
Focused on the right thing
Deciding on a threshold, formulating the scoring system, and adding weight from pre-existing
automated checks allowed us to surface the most impactful apps.
Funnel flow on how we chose what apps to audit
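The funnel above can be sketched as a series of narrowing stages. The stage names, default threshold, and auditor capacity are assumptions for illustration; the real funnel combined the tripwire scores with weights from pre-existing automated checks, as described above.

```python
# Illustrative funnel: each stage narrows the pool of live apps down
# to a ranked, auditor-sized worklist. Numbers and stage names are
# assumptions for the sketch, not real figures.

def funnel(apps, tripwire_score, automated_flags, threshold=6, capacity=20):
    """Narrow all live apps down to a ranked audit worklist."""
    stages = {"all live apps": len(apps)}

    # 1. Keep only apps whose tripwire score meets the threshold.
    candidates = [a for a in apps if tripwire_score(a) >= threshold]
    stages["over threshold"] = len(candidates)

    # 2. Add weight from pre-existing automated checks, then rank.
    ranked = sorted(
        candidates,
        key=lambda a: tripwire_score(a) + automated_flags.get(a, 0),
        reverse=True,
    )

    # 3. Cap the list at auditor capacity for this audit cycle.
    worklist = ranked[:capacity]
    stages["audit worklist"] = len(worklist)
    return worklist, stages
```

Returning the per-stage counts alongside the worklist mirrors the funnel diagram: it shows how many apps survive each cut, which is useful when tuning the threshold against auditor capacity.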
Many stakeholders
This project was high stakes. It demanded communicating with app developers and potentially enforcing penalties for
breaking the terms of service. So it was natural for an extensive number of stakeholders to be involved:
Developer Relations managers, Security, Governance, App Store, Marketing, App Reviewers, etc.
Most of the initial work was meeting with stakeholders to understand their protocols and how they fit into
the process, request feedback, and achieve buy-in on the direction the project was heading.
Zooming in and out of the problem
This picture shows parts of the same flow at different levels of detail. Bringing clarity to all stakeholders was
crucial to moving the project forward. But not all stakeholders were interested in the same things, so we always made
sure to discuss the relevant part of the picture at the appropriate level of detail.
Minimum viable product
This project happened at a busy time, so we had to work within a number of difficult limitations:
It was a completely new process, so there were no previous learnings to draw on; we therefore had to design the
iteration and feedback loops up front.
The auditors were not technically specialized, so we used our specialized teams as training resources.
As a new process, we couldn’t invest heavily in new tools, so we needed to repurpose existing ones.
These limitations forced us to be resourceful and practical during the exploration phase of the project,
leading to a successful and timely build phase. With the time saved, the project proved worthy of further
investment in tooling and automation, reducing human workload and increasing the number of apps audited.
Repurposing familiar tools
With few resources, we had to reuse existing frameworks in a new context. We
repurposed the tool built for the multi-step App Submission process for our multi-step App Audit cycles.
Similarly, we repurposed the criteria-driven App Review tool to create the issues-driven App Audit screening.
Dashboard originally used for app submissions, now used for audits
App review form, repurposed to be used for audits