FinTech is one of our core competencies, and QA and testing of payment and billing systems have always been at the heart of our custom-built FinTech application development process. One of our custom projects (under NDA) is a social dating platform integrated with more than 70 online payment systems in 250 countries. The smallest bug or error in payment processing can bring about dramatic consequences for users, cause transaction processing delays and platform downtime, etc., which eventually leads to overheads and financial losses.
Testing billing system integration with third-party providers the right way is crucial to the success of any FinTech development project, so I want to share some tips for effective QA of billing functionality, based on the lessons we’ve learned from our numerous FinTech projects.
This article aims to help testers, QA engineers, and PMs, whose projects are already integrated or are about to be integrated with different payment solution providers, choose the most appropriate method to test the billing functionality of their apps.
Specifics of billing functionality testing
One of the goals of any business is to generate profit. The above-mentioned social dating platform we’re building for our client features premium subscriptions and credits that work like internal currency and can be used to increase user profile ranking, send a gift to another user, etc.
A premium subscription is valid for a certain period, and gives you several options: enter invisible mode, see people who have shown interest in you, confirm the authenticity of your account and many others.
To make these paid services work, we use integration with more than 70 payment providers. The choice of provider depends on the platform, country, device, mobile operator, and other factors. Therefore, the question of testing paid services is very acute.
To begin with, let’s consider why it is necessary to approach the testing of paid services with special attention. There are two reasons.
1. Buggy billing can kill your business
The first problem is reputational. The user who has paid for the premium subscription becomes more sensitive (and less tolerant) to in-app bugs. Any negative user feedback in public space, whether it’s a user review of an application or a comment in the App Store or Google Play, leads to reputational and credibility losses and is detrimental to your business continuity.
The second problem is that once you start receiving money from the user, you become legally bound and can be prosecuted for violating the laws of particular jurisdictions and/or personal data protection regulations.
Companies typically lose money in three ways.
Let’s say the user wants to transfer credits to a friend but can’t complete the transaction because of a recurring error. They contact your support team, which starts investigating the case. A thorough investigation uncovers a severe bug in the payment system. The user requests a refund, and the company loses some money. Yet this is the least harmful way of taking a hit.
Now assume the same situation has occurred, but instead of contacting your support team, the user calls the bank or payment provider to find out why the transaction failed. The bank or provider initiates a refund. In this case, we are dealing with a chargeback. The danger to the business here is not only lost profit: after a certain number of chargebacks, the company gets penalized, and its credibility rating drops significantly.
The third way is litigation. In 2015, British Gas had to pay multi-million dollar compensation to users who were overcharged because of a mistake in its payment calculation system. The lawsuit cost the company a great deal of money and had a drastic impact on its bottom line.
Arranging proper testing of billing functionality helps avoid the above scenarios and enables companies to generate revenues and increase user loyalty without facing any bumps in the road.
2. To test billing functionality, you need tribal knowledge and strong FinTech expertise
FinTech development teams that are only beginning the integration process with payment providers often lack sufficient internal resources and capabilities to test billing features the right way. Without knowing all possible billing failure cases, they miss out on the important nuances when implementing the system’s response to the notification of payment providers.
This can lead to unpredictable consequences, from lost profits to dissatisfied users.
Billing failure cases
There are three main cases: an error, a successful payment, and a refund to the user. Each case has its details, and each case should be approached and handled differently.
Errors can be critical or non-critical. An example of a non-critical error is a failure notification from a payment provider that the user can resolve on their own (e.g., the user needs to top up the balance to make a payment).
An example of a critical error is the suspension or complete blocking of the user’s payments.
While in the first case you can repeat the transaction later after refilling the balance, in the latter case, you need to call your bank and find out why the error occurred. The user may have been suspected of being involved in fraud, which requires detailed investigation and can be time-consuming.
As you already know, there are two types of user compensation: refunds and chargebacks. Your system should react differently to each. For example, it makes sense to think about blocking some of the features of your application for the user after the chargeback, as it is one of the most prevalent fraud methods.
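To make the distinction concrete, here is a minimal sketch of how a server might react to each kind of billing event. The event names, the Account fields, and the reactions themselves are hypothetical and chosen for illustration only; they are not any provider’s API.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SUCCESS = "success"
    SOFT_ERROR = "soft_error"    # non-critical, e.g. insufficient funds
    HARD_ERROR = "hard_error"    # critical, e.g. payments blocked
    REFUND = "refund"
    CHARGEBACK = "chargeback"


@dataclass
class Account:
    """Hypothetical user account as seen by the billing code."""
    credits: int = 0
    purchases_blocked: bool = False
    retry_allowed: bool = True


def apply_event(account: Account, kind: Kind, credits: int = 0) -> Account:
    """Apply one billing event to the account and return it."""
    if kind is Kind.SUCCESS:
        account.credits += credits
    elif kind is Kind.SOFT_ERROR:
        account.retry_allowed = True       # the user can retry after topping up
    elif kind is Kind.HARD_ERROR:
        account.retry_allowed = False      # needs investigation with the bank
    elif kind is Kind.REFUND:
        account.credits = max(0, account.credits - credits)
    elif kind is Kind.CHARGEBACK:
        account.credits = max(0, account.credits - credits)
        account.purchases_blocked = True   # possible fraud: restrict features
    return account
```

The only point of the sketch is that a refund and a chargeback take different branches: both take the credits back, but only the chargeback restricts the account.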
A successful payment can pertain to both a one-off payment and a subscription.
One-off payments can be consumable or non-consumable. An example of a consumable payment is a credit used as an internal currency. An example of a non-consumable payment can be found in games. Suppose you have a character and you want to buy them a superpower that will be valid for some time. In this case, the purchase is non-consumable.
In addition to the initial purchase of a subscription, you may need to handle other options:
- subscription renewal;
- subscription cancellation;
- trial subscription;
- grace period (a period immediately after a payment deadline during which the late fee, or any other action that would have followed the missed deadline, is waived, provided that the obligation is met within that period);
- partial billing (for example, PayPal allows partial billing if there are insufficient funds in the user’s account).
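One way to keep these options under control in tests is to write the subscription down as a small state machine. The sketch below is an assumption about how such a model could look: the state names and the transition table are hypothetical, a real provider may allow more or fewer moves, and partial billing is left out for brevity.

```python
from enum import Enum


class Sub(Enum):
    TRIAL = "trial"
    ACTIVE = "active"
    GRACE = "grace"
    CANCELLED = "cancelled"
    EXPIRED = "expired"


# Hypothetical transition table, for illustration only.
ALLOWED = {
    Sub.TRIAL: {Sub.ACTIVE, Sub.CANCELLED, Sub.EXPIRED},
    # ACTIVE -> ACTIVE models a successful renewal.
    Sub.ACTIVE: {Sub.ACTIVE, Sub.GRACE, Sub.CANCELLED, Sub.EXPIRED},
    Sub.GRACE: {Sub.ACTIVE, Sub.EXPIRED},
    Sub.CANCELLED: {Sub.ACTIVE},   # the user subscribes again
    Sub.EXPIRED: {Sub.ACTIVE},
}


def transition(current: Sub, new: Sub) -> Sub:
    """Return the new state, or raise if the move is not in the table."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

With a table like this, every pair of states becomes an explicit test case: either the move is allowed, or the system must reject it.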
You also need to take into account two subscription types, which depend entirely on the payment provider: internally managed and externally managed subscriptions.
An internally managed subscription is, for example, a credit card or PayPal subscription: after the first payment you are given a token, which you then use to request subsequent payments from the provider without holding the user’s payment details.
An externally managed subscription is when the payment aggregator takes over the management of your subscriptions and sends you notifications about their current statuses.
If you fail to handle refunds properly and treat every refund as a chargeback, your brand’s credibility rating will drop dramatically, and your stats-gathering workflows can be disrupted.
Ignoring the diversity of billing cases can lead to lost profits or leave the company exposed to users who feel cheated.
So, on the one hand, bugs in payment processing should be found before the final release because they can lead to the most negative consequences. On the other hand, the situation is complicated by the fact that integration with payment providers is always like interacting with a “black box”, which adds a lot of variables to the testing process.
Technical issues in the billing functionality testing
Let’s review possible technical issues using our custom-built social dating platform’s integration with the Apple App Store as an example.
All subscriptions within App Store are externally managed, i.e., they are fully managed on the provider’s side, and our system can only request the current state or receive notification about its change.
We specifically chose this integration because it is the most complex one, and it contains all the variety of cases that can be found in the process of integrating the billing service with payment providers.
Let’s start with a one-off consumable purchase.
Step 1: The user requests to purchase the service, and the application decides that a payment should be made.
Step 2: The application transfers control of the process to the payment provider (App Store).
Step 3: The user is shown a payment form.
Step 4: The user provides the payment data.
Step 5: The provider executes the transaction and reports the result to the application by returning a receipt containing full information about the purchase (date, service, status, etc.).
Step 6: The receipt, supplemented with user data, is sent to the server for processing.
Step 7: The server processes the receipt data and generates a push notification for the application.
Step 8: The notification is shown to the user.
The problem is that steps 3, 4, and 5 are performed at the payment provider’s end and are not controlled by us, so they can vary.
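Steps 6 and 7, by contrast, run on our own server and can be tested directly. A minimal sketch of that part follows; the Receipt fields, the product catalogue, and the notification texts are hypothetical, and the duplicate check is an addition of this sketch rather than something the flow above prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Receipt:
    transaction_id: str
    product_id: str
    status: str          # "paid" or "failed" in this sketch


@dataclass
class Server:
    balances: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)
    # Hypothetical catalogue: product id -> credits granted.
    prices: dict = field(default_factory=lambda: {"credits_100": 100})

    def process(self, user_id: str, receipt: Receipt) -> str:
        """Step 6: process the receipt. Returns the push text for step 7."""
        if receipt.transaction_id in self.seen:
            return "already processed"     # do not credit the same receipt twice
        self.seen.add(receipt.transaction_id)
        if receipt.status != "paid":
            return "payment failed"
        credits = self.prices[receipt.product_id]
        self.balances[user_id] = self.balances.get(user_id, 0) + credits
        return f"{credits} credits added"
```

Because the function takes a ready-made receipt, it can be exercised with any receipt we construct ourselves, without going through steps 3 to 5.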
Buying a subscription starts the same way as making a one-off payment, but what happens afterwards is much harder to control.
Let me remind you that the Apple subscription we are using as an example is externally managed. This means that the user can manage it asynchronously after purchase: cancel it, change the validity period, or request a refund. We see this in step 9. Since the action takes place outside of our system, it is shown with a dotted line in the figure above.
In step 10, the App Store can change the subscription status: renew it, close it, or move it into the grace period.
To let us find out the subscription status, there is step 11, which is specific to aggregators such as the App Store and Google Wallet. In this step, our system sends the provider a token from the receipt received at the beginning of the subscription or after the previous renewal.
Step 12 is the provider’s response: we receive a receipt with the current subscription status. The result of this step depends on the asynchronous steps 9 and 10.
In the fall of 2018, Apple rolled out a server-to-server notification mechanism to all of its customers, which notifies them of changes to their subscriptions. This notification is shown in step 13. For most payment providers, server-to-server notification is the only mechanism available, so you can say that the Apple example covers the entire variety of cases. With other providers, step 13 replaces steps 11 and 12.
In step 14, the server generates a response for the application to change the subscription status.
Thus, we have received a full graph of the states that need to be passed to check the paid services. All black boxes that we can’t manage on our own are marked with orange in the image.
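The two ways of learning the subscription status (asking in steps 11 and 12, or being told in step 13) can be sketched as a single function. This is only an illustration under assumptions: the status strings are made up, and poll_provider is a hypothetical callable standing in for the real request to the provider.

```python
from typing import Callable, Optional, Tuple


def sync_subscription(
    stored_status: str,
    notification: Optional[str],
    poll_provider: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (latest status, whether the app must be told about a change)."""
    if notification is not None:
        latest = notification          # step 13: the provider pushed the status
    else:
        latest = poll_provider()       # steps 11-12: we ask for the status
    # Step 14 is only needed when the status actually changed.
    return latest, latest != stored_status
```

In a test, poll_provider can be any function that returns a status string, which makes it easy to replay the asynchronous outcomes of steps 9 and 10.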
Billing functionality testing methods
Uncontrollable black boxes pose severe issues to the billing functionality QA and complicate the process.
We’ve divided all methods into three categories.
A real payment is a test method that gives you a clear picture of the real-time state of integration. An error that occurs during a real payment is absolute proof of the bug.
In other respects, the real payment method is not so good. First, it is expensive, as you need to spend real money on testing. You are mistaken if you think that the whole amount will eventually return to the company: all providers charge a commission on each transaction, and the lower your credibility score, the higher the fee (it can reach 40% or more in some cases). In addition, you may lose money when testing payments in other countries because of the difference between the buying and selling rates of a currency: you buy the currency at the bank’s selling rate, and the refund is converted back at its buying rate.
Also, this method can be very time-consuming, because you will have to wait for the end of the subscription renewal period, or the end of the grace period, which can take months.
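The money side of real-payment testing comes down to simple arithmetic. The sketch below uses a deliberately simplified model and hypothetical numbers (commission and exchange rates are invented for the example): the money leaves at the bank’s selling rate, and what comes back, less the provider’s commission, is converted at the buying rate.

```python
from decimal import Decimal


def real_payment_loss(amount, fee_rate, bank_sell_rate, bank_buy_rate):
    """Loss in home currency for one test payment that is later returned.

    Simplified model: we buy `amount` of foreign currency at the bank's
    selling rate, and whatever comes back (the amount minus the provider's
    commission) is converted at the bank's buying rate.
    """
    spent = amount * bank_sell_rate
    returned = amount * (1 - fee_rate) * bank_buy_rate
    return spent - returned


# Hypothetical numbers: a 10.00 payment, 30% commission, rates 1.10 and 1.05.
loss = real_payment_loss(
    Decimal("10"), Decimal("0.30"), Decimal("1.10"), Decimal("1.05")
)
```

With these made-up numbers, 11.00 is spent and 7.35 comes back, so a single 10.00 test payment costs 3.65 in home currency.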
The sandboxes are awesome. This is essentially the same functionality that a payment provider gives us in the case of a real payment, but without spending real money. It is fully supported by the provider, which means that integration with the sandbox is cheap.
The problem of tests stretching out over time is usually solved with various tricks. For example, the App Store sandbox uses the following subscription term conversions.
Real subscription time/Apple Sandbox time ratio:
1 week/3 min
1 month/5 min
2 months/10 min
3 months/15 min
6 months/30 min
12 months/1 hour
Real subscription time/Google Wallet time ratio:
1 week/3 min
1 month/5 min
3 months/10 min
6 months/15 min
12 months/30 min
Unlike in Apple’s sandbox, in the Google Wallet sandbox you can also check trial, grace period, and other billing scenarios, using the following ratios:
Trial = 3 min
Introductory = the actual time of the current subscription
Grace period (3/7 days) = 5 min
Temporary account hold = 10 min
Pause (1/2/3 months) = 5/10/15 min accordingly
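These ratios are convenient to keep as data, so that automated tests know how long to wait. The sketch below simply copies the subscription ratios listed above into a lookup table; the function name and the provider keys are our own, not part of either sandbox.

```python
# Sandbox duration in minutes for each real subscription period,
# copied from the ratios listed above.
SANDBOX_MINUTES = {
    "apple": {
        "1 week": 3, "1 month": 5, "2 months": 10,
        "3 months": 15, "6 months": 30, "12 months": 60,
    },
    "google": {
        "1 week": 3, "1 month": 5, "3 months": 10,
        "6 months": 15, "12 months": 30,
    },
}


def sandbox_wait_minutes(provider: str, period: str, periods: int = 1) -> int:
    """Minutes a test must wait for `periods` billing periods to elapse."""
    return SANDBOX_MINUTES[provider][period] * periods
```

A lookup for a period that a sandbox does not list (for example, "2 months" for Google) raises KeyError, which is a useful early signal that the test plan asks for something the sandbox cannot simulate.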
Closing a subscription is also implemented differently: in the App Store sandbox, a subscription is closed after the fifth renewal, and in Google Wallet, it is closed from the merchant’s console or on the device from the Play Store.
The problem with sandboxes is that providers treat their quality differently. Our experience shows that of the more than 70 payment service providers integrated into our client’s platform, only two offer sandboxes with full functionality and stable operation: Adyen and PayPal. The rest have sandboxes that are either stable but reduced in functionality (like Google Wallet), or unstable and severely reduced in functionality (like App Store and Fortumo). And some providers do not have, and are not going to have, a sandbox for testing payment integrations at all.
Eliminating external dependencies
Now that you know that testing with real payments is expensive and inefficient, and that not all payment providers offer sandboxes for testing, let’s explore ways to eliminate external dependencies. There are only three of them: mocks, fakes, and stubs.
Mocks are your system’s responses to requests with predefined parameters, produced without real access to the payment provider (see Fig. 8). For example, a request to an SMS payment provider for the number +xxxx-xxx-xx-xx is intercepted before it is sent to the provider, and the system responds as if the payment had succeeded. A request for the number +xxxx-xxx-xx-xy is also intercepted, but it produces a response with the error code “Not enough money to complete the transaction”.
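A mock of this kind could be sketched as follows. The phone numbers, the response format, and the real_send callable are all placeholders invented for the example; a real SMS provider will have its own request and response shapes.

```python
# Hypothetical "magic" numbers and the canned responses they trigger.
MOCK_NUMBERS = {
    "+0000-000-00-01": {"status": "success"},
    "+0000-000-00-02": {
        "status": "error",
        "message": "Not enough money to complete the transaction",
    },
}


def send_sms_payment(number, amount, real_send, mocks_enabled):
    """Send a payment request, intercepting predefined numbers when mocking."""
    if mocks_enabled and number in MOCK_NUMBERS:
        # Intercepted: the provider is never contacted.
        return dict(MOCK_NUMBERS[number])
    return real_send(number, amount)
```

When mocks_enabled is False, the magic numbers are treated like any other number and the request goes to the real provider.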
Fakes in billing are forged notifications (as if they come from a real provider) (see image below).
Integration with each provider implies a limited set of system responses to a limited set of notifications or receipts. Based on this data, it is possible to form a set of notifications for each separate payment (with signatures and other fake security attributes), which our system will consider as real notifications from the payment provider.
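As an illustration, here is a sketch of building a fake notification that passes the same signature check as a real one. It assumes a provider that signs the notification body with HMAC-SHA256 and a shared secret; the field names, the secret, and the signing scheme are assumptions, since every provider defines its own.

```python
import hashlib
import hmac
import json

# Test-only secret; a real integration would use the provider's own key.
TEST_SECRET = b"test-only-secret"


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 signature of the notification body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def build_fake_notification(payment_id: str, status: str, secret: bytes = TEST_SECRET):
    """Build a (body, signature) pair that looks like a provider notification."""
    payload = {"payment_id": payment_id, "status": status}
    body = json.dumps(payload, sort_keys=True).encode()
    return body, sign(body, secret)


def accept_notification(body: bytes, signature: str, secret: bytes = TEST_SECRET) -> dict:
    """Entry point under test: verify the signature, then parse the body."""
    if not hmac.compare_digest(sign(body, secret), signature):
        raise ValueError("bad signature")
    return json.loads(body)
```

Because the fake goes through the real entry point, signature check included, this approach needs fewer changes to the production code than mocks or stubs do.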
Stubs in billing are redirects to a page listing the system’s possible reactions, shown instead of sending and processing the request (see Fig. 10). We offer all the reactions the payment provider could have for the current state of the payment and trigger the selected reaction instead of sending a request to a real provider or sandbox.
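The logic behind such a stub page might be sketched like this. The payment states, the reaction names, and the resulting states are hypothetical; in practice they come from the map of billing states for the specific provider.

```python
# Hypothetical map: current payment state -> reactions the tester may pick.
REACTIONS = {
    "new": ["success", "insufficient_funds", "blocked"],
    "paid": ["refund", "chargeback"],
}

# Hypothetical map: chosen reaction -> resulting payment state.
RESULT_STATE = {
    "success": "paid",
    "insufficient_funds": "new",
    "blocked": "failed",
    "refund": "refunded",
    "chargeback": "charged_back",
}


def stub_options(payment_state: str) -> list:
    """Reactions to list on the stub page for the current payment state."""
    return list(REACTIONS.get(payment_state, []))


def apply_stub_choice(payment_state: str, choice: str) -> str:
    """Apply the reaction picked by the tester instead of calling the provider."""
    if choice not in stub_options(payment_state):
        raise ValueError(f"{choice!r} is not possible in state {payment_state!r}")
    return RESULT_STATE[choice]
```

The stub page would render stub_options for the current payment and call apply_stub_choice with whatever the tester clicks.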
Although all of these methods help avoid wasting real money and time, they cannot be called cheap, because to use them you need to map out all possible billing states for each provider and keep them up to date. Also, all of these methods (except, perhaps, fakes) require significant changes to the code. Besides, as simulations of real payments, mocks, stubs, and fakes only approximate reality and don’t give you a 100% picture of errors.
Let’s get back to the process of making a one-off payment. Steps 3, 4, and 5 are key to integration: transfer of control to the payment provider, sending a request to the provider, and receiving a response. Each way of eliminating external dependencies focuses on some of these steps: with mocks, we simulate the transfer of control and the sending of the request; with stubs, we simulate only the transfer of control; and with fakes, we simulate receiving a response. All other steps are omitted.
On the one hand, omitting steps leads to risks (for example, you can miss a bug in steps not covered by tests). On the other hand, modeling each step makes the method more expensive, as it requires real changes in the system. Therefore, in practice, we use combinations of methods. For example, we combine mocks and fakes when sending a request to a certain number doesn’t trigger the system’s reaction directly but instead sends a fake notification to the notification entry point on our server. We combine stubs and fakes when a reaction is selected from a stub and a fake notification is sent as well. Of course, such implementations should be limited to the development environment and must not be allowed onto the production server.
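One simple way to keep such code out of production is an explicit environment check that defaults to "off". The sketch below assumes a hypothetical APP_ENV variable and environment names; your deployment may use a different switch.

```python
import os

# Hypothetical environment names in which test doubles may be active.
SAFE_ENVIRONMENTS = {"dev", "test"}


def doubles_enabled(env=None) -> bool:
    """True only when APP_ENV explicitly names a non-production environment."""
    env = os.environ if env is None else env
    # Anything missing or unknown is treated as production.
    return env.get("APP_ENV", "production") in SAFE_ENVIRONMENTS
```

Treating a missing variable as production means that a misconfigured server fails safe: mocks, stubs, and fakes stay disabled unless someone turns them on deliberately.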
Limitations of billing testing methods
Unfortunately, none of the methods described above is a silver bullet. So how do you know when to use which? We suggest evaluating them according to the following criteria:
Reproducibility and coverage – which method will help cover and reproduce as many cases as possible?
Possibility of end-to-end checks – what does the method do better: does it allow you to check the entire process of making a payment or test quickly only some stage of it?
Cost – estimate the full cost, including both real money expenses and the cost of writing and maintaining the code.
We’ve summed up our comparison results in the table below:
To top it all, here’s a summary of methods.
A real payment
Pros: this is the only method that allows you to test the entire billing integration process.
Cons: it covers quite a limited number of cases; an annual subscription takes a year to test; and it is very expensive, as we spend real money and can’t always qualify for a refund.
Sandboxes
Pros: it’s the cheapest method, and it can cover cases specific to a particular ecosystem, such as Apple’s or Google’s.
Cons: it does not allow full-fledged end-to-end testing, and even the code in the sandbox itself may differ from the code on the production server.
Fakes, mocks, and stubs
Pros: very flexible methods which allow for covering the whole set of cases.
Cons: due to the specifics of this method, we do not test the whole payment process. The method is not cheap: we need to write the code and keep it up to date.
Selecting a billing test method
To determine which method to use at what stage, let’s turn to the classical testing pyramid.
At the bottom of the pyramid, there is a large number of tests, which should fully cover the whole functionality of our system. These should be very small and quite cheap cases.
At the top of the pyramid, the coverage can be incomplete: it can involve expensive cases. The main goal we want to achieve here is to test the full path of our service from request to delivery to the user.
If we correlate this with the criteria for evaluating testing methods, we get the following split: for tests at the bottom of the pyramid, use fakes, stubs, and mocks; for tests at the top of the pyramid, use integration-oriented methods such as real payments and sandboxes.
- Paid services and billing functionality should be tested particularly carefully, as even the smallest bugs can lead to unexpected consequences.
- When implementing integration with a payment provider (especially when using iterative development methodologies), it is important to study and map out all possible provider states. Iterativeness can be used to complicate the system’s response to certain states, but the system itself should classify the states correctly from the very beginning.
- A payment provider is always a “black box” for us, as it is very difficult to test how it works. Don’t try to use just one method and test everything with it – it will lead to adverse consequences. It is better to use a combination of methods across QA stages to ensure maximum efficiency of your testing efforts.
- When using fakes, mocks, and stubs, it’s important to remember that these are models of real payments, and like any model, they only approximate reality and can pose certain risks. These risks need to be assessed and covered by either real payments or additional checks.