Because Shopify doesn’t host third-party apps, approved apps effectively went dark after launch, leaving us blind to behavior changes, infractions, and trust-breaking issues until merchants complained. I designed a prioritization process that used indirect data tripwires to surface risky live apps early, establishing a consistent quality bar, discouraging bad actors, clarifying the scope of ecosystem-wide non-compliance, and creating a fast, repeatable way to address issues at scale.
Problem
Shopify doesn’t host its third-party apps. This meant that after apps were approved, we were blind to what they
did and how they changed over time. We relied on developers and merchants notifying us of issues that could
potentially impact thousands of users, and we were unaware of critical infractions that broke the trust of those users.
Solution
A process to surface and address live apps in priority order by using indirect data tripwires that indicate
potential infractions and broken requirements.
Impact
Developed a consistent quality standard in the ecosystem
Discouraged bad actors
Introduced a concise process to address live apps
Identified the scope of non-compliance in the ecosystem
Risky business
Shopify App Store, circa 2020
After apps were reviewed and went live on the App Store, we didn’t have a feasible way to track changes in their
code or listing details. In other words, every app you see in the App Store could potentially have been a risk to
the trust we had built with merchants.
Indirect truth
There are over five thousand live apps in the App Store, yet there are only a handful of App Store auditors. We
had no direct data to work with. Where would we even start? We needed to prioritize.
By using indirect data, we created an extensive list of tripwires based on three main pillars:
Security: Potential breach of confidential information and PII
Trust: Potential misinformation or break in expected behaviours
Risk: Extent of reach of an app when it comes to users and GMV
We designed the tripwires with a scoring system that, in combination with an assigned threshold and additional
automated checks, gave us a prioritized list of the apps we should be looking at.
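The scoring described above can be sketched in code. Everything specific here is a hypothetical illustration: the tripwire names, pillar labels, weights, and threshold are assumptions for the sketch, not Shopify’s actual signals or values.

```python
# Hypothetical sketch of a tripwire scoring pass. Tripwire names,
# weights, and the threshold are illustrative, not real values.
from dataclasses import dataclass, field

@dataclass
class Tripwire:
    name: str
    pillar: str  # "security", "trust", or "risk"
    weight: int  # how strongly this signal suggests an infraction

@dataclass
class App:
    name: str
    tripped: list = field(default_factory=list)  # names of tripped wires

TRIPWIRES = {
    "requests_pii_scopes":    Tripwire("requests_pii_scopes", "security", 5),
    "listing_claims_changed": Tripwire("listing_claims_changed", "trust", 3),
    "high_gmv_reach":         Tripwire("high_gmv_reach", "risk", 4),
}

AUDIT_THRESHOLD = 6  # assumed cutoff: apps at or above this score get audited

def score(app: App) -> int:
    """Sum the weights of every tripwire the app has tripped."""
    return sum(TRIPWIRES[t].weight for t in app.tripped if t in TRIPWIRES)

def prioritize(apps: list) -> list:
    """Return apps over the threshold, highest score first."""
    flagged = [a for a in apps if score(a) >= AUDIT_THRESHOLD]
    return sorted(flagged, key=score, reverse=True)
```

With this shape, adding a new tripwire is just a new entry in the table, which matches how the list grew out of research rather than being fixed up front.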
Historical experience
Building tripwires required extensive user research. In this case, the users were other Shopify employees
who had frontline experience with the issues we were trying to find. This research allowed us to formulate
tripwires from indirect data that could serve as potential flags for those issues.
Spreadsheet where I categorized and prioritized tripwires
Focused on the right thing
Deciding on a threshold, formulating the scoring system, and adding weight from pre-existing
automated checks allowed us to surface the most impactful apps.
Funnel flow on how we chose what apps to audit
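The funnel above can be sketched as a series of narrowing stages. The stage names, default threshold, and auditor capacity are assumptions for illustration; the real funnel combined the tripwire scores with weights from pre-existing automated checks, as described above.

```python
# Illustrative funnel: each stage narrows the pool of live apps down
# to a ranked, auditor-sized worklist. Numbers and stage names are
# assumptions for the sketch, not real figures.

def funnel(apps, tripwire_score, automated_flags, threshold=6, capacity=20):
    """Narrow all live apps down to a ranked audit worklist."""
    stages = {"all live apps": len(apps)}

    # 1. Keep only apps whose tripwire score meets the threshold.
    candidates = [a for a in apps if tripwire_score(a) >= threshold]
    stages["over threshold"] = len(candidates)

    # 2. Add weight from pre-existing automated checks, then rank.
    ranked = sorted(
        candidates,
        key=lambda a: tripwire_score(a) + automated_flags.get(a, 0),
        reverse=True,
    )

    # 3. Cap the list at auditor capacity for this audit cycle.
    worklist = ranked[:capacity]
    stages["audit worklist"] = len(worklist)
    return worklist, stages
```

Returning the per-stage counts alongside the worklist mirrors the funnel diagram: it shows how many apps survive each cut, which is useful when tuning the threshold against auditor capacity.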
Many stakeholders
This project was high stakes. It demanded communicating with app developers and potentially enforcing penalties for
breaking the terms of service. So it was natural for an extensive number of stakeholders to be involved:
Developer Relations managers, Security, Governance, App Store, Marketing, App Reviewers, etc.
Most of the initial work was meeting with stakeholders to understand their protocols and how they fit into
the process, request feedback, and achieve buy-in on the direction the project was heading.
Zooming in and out of the problem
This picture shows parts of the same flow at different levels of detail. Bringing clarity to all stakeholders was
crucial to moving the project forward. But not all stakeholders were interested in the same things, so we always made
sure to discuss the relevant part of the picture at the appropriate level of detail.
Minimum viable product
This project happened at a busy time, so we had to work within a number of difficult limitations:
It was a completely new process, so there were no previous learnings to draw on; we therefore had to design the
iteration and feedback loops up front.
The auditors were not technically specialized, so we used our specialized teams as training resources.
As a new process, we couldn’t invest heavily in new tools, so we needed to repurpose existing ones.
These limitations forced us to be resourceful and practical during the exploration phase of the project,
leading to a successful and timely build phase. With the time saved, the project proved worthy of further
investment in tooling and automation, reducing human workload and increasing the number of apps audited.
Repurposing familiar tools
With few resources, we had to reuse existing frameworks in a new context. We
repurposed the tool built for the multi-step App Submission process for our multi-step App Audit cycles.
Similarly, we repurposed the criteria-driven App Review tool to create the issues-driven App Audit screening.
Dashboard originally used for app submissions, now used for audits
App review form, repurposed to be used for audits