
Get Back to Basics with ExposeIT
Have you noticed that most articles and blogs about data and analytics are written on the assumption that every forensic practitioner already has a certain level of understanding of artificial intelligence and machine learning? We all want to benefit from AI (systems that act smart) and ML (systems that learn through experience). However, we first need to understand how these disciplines depend on data.
A good place to start is with processing clean data:

Clean data /noun/ Error-free data, or data with very few errors.
Examples of errors: spelling mistakes, and incomplete, irrelevant or redundant data.
Clean data is correctly formatted (no duplicates) and comes in complete datasets (not corrupted).
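To make the definition concrete, here is a minimal sketch of how two of these problems, duplicates and incomplete records, could be counted in a small dataset. The function name `profile_records`, the field names and the sample records are all made up for illustration, and exact matching after trimming and lower-casing is only one assumed way of defining a duplicate.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Count exact duplicates and records with missing required fields."""

    def key(record):
        # Assumption: two records are duplicates when every field matches
        # after trimming whitespace and lower-casing.
        return tuple(sorted((k, str(v).strip().lower()) for k, v in record.items()))

    counts = Counter(key(r) for r in records)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)

    def is_missing(value):
        return value is None or not str(value).strip()

    incomplete = sum(
        1 for r in records if any(is_missing(r.get(f)) for f in required_fields)
    )
    return {"total": len(records), "duplicates": duplicates, "incomplete": incomplete}


# Made-up sample data, for illustration only.
sample = [
    {"name": "Ann", "phone": "011"},
    {"name": " ann ", "phone": "011"},  # same person, different formatting
    {"name": "Bob", "phone": ""},       # required field left empty
]
print(profile_records(sample, ["name", "phone"]))
# {'total': 3, 'duplicates': 1, 'incomplete': 1}
```

Even a simple count like this gives a first indication of how clean a dataset is before any analytics are run on it.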
Do you know how your organisation compares?
Statistics on data quality highlight that making decisions on poor-quality data produces inaccurate business outcomes, an obvious finding, one would assume! A survey of 100 enterprises in the USA found that nearly 40% of their company data was inaccurate, and some best-in-class companies claimed they were able to access only 35% of newly added data.
6 Characteristics of clean data
Validity – The degree to which your data conforms to, or conflicts with, your business rules.
E.g. Duplicates in a personal address book or in a customer database.
Accuracy – True value of data.
E.g. Previously terminated accounts which return, masquerading as good accounts.
Consistency – Within the same dataset and/or across multiple datasets.
E.g. Unknown duplicated and related data within and across systems.
Relevance and timing – Collecting data should justify the effort required. Data collected too soon or too late could misrepresent a situation.
Completeness – All required data is known. All data needed to meet business information demands should be available.
Uniqueness – The level at which data is collected is important; otherwise it leads to inaccuracies.
E.g. A field or a combination of fields must be unique across a database. No two persons can have the same ID number. A South African ID number is a 13-digit number recorded in a specific sequence, namely: date of birth, gender, citizen or resident status, and a checksum digit to verify validity.
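The ID number example lends itself to a quick validity check. The sketch below rests on assumptions that go beyond the field list above: that the date of birth is stored as YYMMDD, that the four-digit gender sequence is 5000 or higher for males, that a 0 in the eleventh position indicates a citizen, and that the final digit is a Luhn check digit. These follow the commonly documented layout, but they should be confirmed against an official source before being relied on. The function names and the sample number are made up for illustration.

```python
from datetime import datetime


def luhn_valid(number: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def parse_sa_id(id_number: str):
    """Return the decoded fields of a 13-digit ID number, or None if invalid."""
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        return None
    try:
        # Assumed layout: the first six digits are the date of birth (YYMMDD).
        datetime.strptime(id_number[:6], "%y%m%d")
    except ValueError:
        return None
    if not luhn_valid(id_number):
        return None
    return {
        "birth_yymmdd": id_number[:6],
        # Assumed convention: sequence 0000-4999 female, 5000-9999 male.
        "gender": "male" if int(id_number[6:10]) >= 5000 else "female",
        # Assumed convention: 0 = citizen, anything else = resident.
        "citizen": id_number[10] == "0",
    }


# Made-up test value, not a real person's ID number.
print(parse_sa_id("8001015009087"))
print(parse_sa_id("8001015009086"))  # last digit changed, so the checksum fails
```

A check like this catches mistyped or fabricated numbers before they enter a database, but it cannot show that a number was actually issued or that it is unique; uniqueness still has to be enforced in the database itself.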
XTND’s ExposeIT fraud and crime profiling and watchlist solution enables a forensic practitioner to collect and analyse large volumes of data and generate actionable insights in near real time. A powerful combination of visual tools and automated analytics helps analysts to quickly uncover patterns, trends and connections that are hidden within large, diverse and disparate datasets.
How do we get insight into who is who, and who is related to whom?
The correct spelling or presentation of a name, whether of a person or a company, is critical to finding all the information and associations attached to that name. Take, for example, two permutations of a person’s name, address and contact number. Entity resolution software recognises when multiple observations belong to the same entity despite having been described differently, but it is often an additional expense. Without it, link analysis is unlikely to connect the two permutations and will almost certainly miss other entities that may be of value in identifying fraud or crime.
Field | Record 1 | Record 2 |
---|---|---|
Name | Leon Townsen | Lee. L Towsen |
Address | 123 Main street | 123 E Main street |
Contact number | 82.601.4223 | 2782601_4223 |
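To illustrate the idea behind entity resolution, the two records in the table can be compared after some basic normalisation. This is only a rough sketch using Python's standard library, not how ExposeIT or any commercial entity resolution product works. The helper names, the 0.7 similarity threshold and the choice to compare the last nine digits of a phone number (so that a leading country code or trunk zero is ignored) are assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def normalise_text(value: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", value.lower())
    return " ".join(cleaned.split())


def normalise_phone(value: str) -> str:
    """Keep digits only, then the last nine (assumed to drop a country code)."""
    digits = re.sub(r"\D", "", value)
    return digits[-9:]


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 on the normalised strings."""
    return SequenceMatcher(None, normalise_text(a), normalise_text(b)).ratio()


def likely_same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.7) -> bool:
    """Flag two records as a probable match (threshold is an assumption)."""
    same_phone = normalise_phone(rec_a["phone"]) == normalise_phone(rec_b["phone"])
    name_score = similarity(rec_a["name"], rec_b["name"])
    address_score = similarity(rec_a["address"], rec_b["address"])
    return same_phone and min(name_score, address_score) >= threshold


record_1 = {"name": "Leon Townsen", "address": "123 Main street", "phone": "82.601.4223"}
record_2 = {"name": "Lee. L Towsen", "address": "123 E Main street", "phone": "2782601_4223"}

print(normalise_phone(record_1["phone"]), normalise_phone(record_2["phone"]))
print(round(similarity(record_1["name"], record_2["name"]), 2))
print(likely_same_entity(record_1, record_2))
```

An exact comparison of the raw values would treat these as two unrelated people. Once the formatting noise is removed, the phone numbers are identical and the names and addresses are close enough to flag the pair for an analyst to review.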
Written by Leon Towsen, COO, XTND