Quality Assurance
FAQ: How do you assure the quality of your categorization results?
Last updated
The quality of our categorization results is measured per category, because some categories matter more than others (for example, salary detection is much harder, but also far more important to our customers, than detecting supermarket payments).
For each category we measure the classical precision and recall KPIs and compute the F1 score, the harmonic mean of the two. Roughly speaking, "recall" measures the engine's ability to detect the desired category ("how many of the expected category occurrences are detected?"), while "precision" measures how accurate the detected occurrences are ("how many of the detected occurrences are correct?").
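Conceptually, these KPIs are computed from true positives, false positives, and false negatives. The following sketch illustrates the calculation; the counts in the example are invented for illustration, not real figures:

```python
# Illustrative sketch: precision, recall, and F1 for a single category.
def precision_recall_f1(true_positives, false_positives, false_negatives):
    # Precision: of everything we labeled with this category, how much was correct?
    precision = true_positives / (true_positives + false_positives)
    # Recall: of everything that truly belongs to this category, how much did we find?
    recall = true_positives / (true_positives + false_negatives)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 95 correct salary detections, 5 wrong ones, 10 missed.
p, r, f1 = precision_recall_f1(95, 5, 10)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.95 0.905 0.927
```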
For instance, typical F1 scores in January 2020 are shown in the table at the end of this page.
FinTecSystems has implemented a quality assurance process comprising the following nine steps.
Every customer can send direct feedback to the categorization team, which handles the request. Most requests are explanatory in nature; some are problem reports that the team addresses directly.
Our categorization team performs random-sampling checks of categorized turnovers and records the desired outcome. Several employees have to reach the same conclusion to settle a manual categorization decision (majority decision). This process reviews thousands of account turnovers every month, and the results are processed in a 30-day round-robin cycle.
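The majority-decision rule can be sketched as follows. This is a simplified illustration: the reviewer labels are hypothetical, and treating a tie as "no settled decision" (to be re-reviewed) is an assumption made for the sketch:

```python
from collections import Counter

def majority_category(labels):
    """Settle a manual categorization by majority vote among reviewers.

    Returns the winning category, or None on a tie (no majority), in which
    case the turnover would go back for another review round (assumption).
    """
    counts = Counter(labels)
    (top, top_n), *rest = counts.most_common()
    if rest and rest[0][1] == top_n:
        return None  # tie: no settled decision
    return top

print(majority_category(["E.1.1", "E.1.1", "K.2.1"]))  # → E.1.1
print(majority_category(["E.1.1", "K.2.1"]))           # → None
```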
These manual majority decisions are used to compute recall, precision and the F1 score for every category. This lets us measure the quality of our engine and steer our work.
As in our software development department, changes to the categorization logic cannot be made by a single person. A second expert has to review the changes to the categorization engine and accept them as useful and valid.
Every change to the categorization engine has to pass an automated test. If this automated test fails, the change cannot go into production.
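As an illustration only (the gate logic, engine, and test cases here are all hypothetical), such an automated gate can be as simple as running the changed engine against a regression suite of turnovers with known expected categories:

```python
# Hypothetical sketch of the automated test gate: any mismatch between the
# engine's output and the expected category blocks the change from going live.
def run_gate(engine, regression_cases):
    failures = [(text, expected, engine(text))
                for text, expected in regression_cases
                if engine(text) != expected]
    return failures  # an empty list means the change may go into production

# Invented regression cases and toy engine, for illustration only.
cases = [("GEHALT JANUAR", "E.1.1"), ("MIETE WOHNUNG", "K.2.1")]
engine = lambda t: "E.1.1" if "GEHALT" in t else "K.2.1"
print(run_gate(engine, cases))  # → [] (gate passed)
```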
Every change to the categorization engine is also checked by a team member in a before/after comparison: the "before" and "after" versions of the engine are run on a large sample of real account turnovers, and all differences are displayed for review by the expert.
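The before/after comparison can be sketched like this; the toy engines and turnover texts below are invented for illustration and are not the real categorization logic:

```python
# Illustrative sketch of the before/after check: run both engine versions on
# the same sample of turnovers and surface every categorization that changed.
def diff_engines(turnovers, engine_before, engine_after):
    changes = []
    for t in turnovers:
        before, after = engine_before(t), engine_after(t)
        if before != after:
            changes.append((t, before, after))
    return changes

# Toy engines (assumptions): the new version learns that "MIETE" means rent.
old_engine = lambda t: "E.1.1" if "GEHALT" in t else "UNKNOWN"
new_engine = lambda t: "K.2.1" if "MIETE" in t else old_engine(t)

sample = ["GEHALT JANUAR", "MIETE WOHNUNG", "REWE MARKT"]
for turnover, before, after in diff_engines(sample, old_engine, new_engine):
    print(f"{turnover}: {before} -> {after}")  # → MIETE WOHNUNG: UNKNOWN -> K.2.1
```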
Every version of the categorization engine is strictly versioned, like any other piece of software we produce. If a problem appears in a new version, we can roll back to a previous version at any time, within seconds.
The categorization team holds weekly or bi-weekly phone meetups to discuss the handling of edge and problem cases. Every edge or problem case is documented.
All manual categorization decisions by all team members are monitored so that we can detect deviations from the "decision norm". Comparing a large number of manual decisions made by several team members reveals individual misunderstandings or training needs.
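Such monitoring can be sketched as a per-reviewer deviation rate against the settled majority decision. The reviewer names, data, and the exact metric below are hypothetical illustrations:

```python
from collections import defaultdict

# Hypothetical sketch: how often does each reviewer's manual decision deviate
# from the settled majority decision on the same turnover?
def deviation_rates(decisions):
    """decisions: iterable of (reviewer, turnover_id, category, majority_category)."""
    totals = defaultdict(int)
    deviations = defaultdict(int)
    for reviewer, _, category, majority in decisions:
        totals[reviewer] += 1
        if category != majority:
            deviations[reviewer] += 1
    return {r: deviations[r] / totals[r] for r in totals}

decisions = [
    ("alice", 1, "E.1.1", "E.1.1"),
    ("alice", 2, "K.2.1", "K.2.1"),
    ("bob",   1, "E.1.1", "E.1.1"),
    ("bob",   2, "A.5.4", "K.2.1"),  # bob deviates from the norm here
]
print(deviation_rates(decisions))  # → {'alice': 0.0, 'bob': 0.5}
```

A high deviation rate for one reviewer points at an individual misunderstanding or a training need, as described above.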
| Category | Description | F1 Score |
| --- | --- | --- |
| A.5.4 | Payments to collection offices | 98.7% |
| E.1.1 | Salary | 98.9% |
| K.2.1 | Rent | 92.0% |