Synthetic Data – What’s it all about?
Lots of organisations sell Synthetic Data on a subscription basis, but what is it, does my organisation need it, and what can be done with it?
A good way to explain Synthetic Data is with a real-world example that everyone can relate to.
Your organisation holds information. It may be about your customers: their finances, what they have bought, where they live, or even their health. Your organisation wants to put this data to work, perhaps for analysis or to test new software, and may want to pass it to a third party to do the analysis for you.
It’s likely that your customers have not given permission for this use of their information. Just as importantly, if not more so, disclosure of this data could have disastrous consequences for your organisation and your customers.
Here’s where Synthetic Data comes in. Data can be generated that is statistically similar to the source data, and it is this generated data that is passed to the third party. Confidentiality is maintained, while your organisation can still leverage the power of data science and AI.
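To make "statistically similar" a little more concrete, here is a minimal Python sketch. It is an illustration only and not a description of how any particular product works. It assumes a tiny, made-up table with hypothetical columns `country` and `annual_spend`, summarises each column (mean and standard deviation for numbers, frequencies for categories), and then draws new records from those summaries. The function names `fit_column` and `synthesize` are invented for this example.

```python
import random
import statistics
from collections import Counter


def fit_column(values):
    """Summarise one column: mean/stdev for numbers, frequencies for categories."""
    if all(isinstance(v, (int, float)) for v in values):
        return ("numeric", statistics.mean(values), statistics.stdev(values))
    counts = Counter(values)
    return ("categorical", list(counts.keys()), list(counts.values()))


def synthesize(rows, n, seed=0):
    """Draw n synthetic records, sampling each column independently."""
    rng = random.Random(seed)
    models = {c: fit_column([r[c] for r in rows]) for c in rows[0].keys()}
    synthetic = []
    for _ in range(n):
        record = {}
        for column, model in models.items():
            if model[0] == "numeric":
                # Sample from a normal distribution with the source mean and spread.
                record[column] = rng.gauss(model[1], model[2])
            else:
                # Sample a category in proportion to how often it appears in the source.
                record[column] = rng.choices(model[1], weights=model[2], k=1)[0]
        synthetic.append(record)
    return synthetic


# Hypothetical source data, invented purely for illustration.
source = [
    {"country": "DE", "annual_spend": 1200.0},
    {"country": "DE", "annual_spend": 950.0},
    {"country": "AR", "annual_spend": 400.0},
    {"country": "DE", "annual_spend": 1500.0},
    {"country": "IT", "annual_spend": 700.0},
]

synthetic = synthesize(source, n=1000, seed=42)
```

This sketch treats every column independently, so any relationship between columns (for example, spend varying by country) is lost, and a simple bell curve can produce unrealistic values such as negative spend. It also makes no confidentiality guarantee by itself. Real synthetic data generation has to address all three points.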
Not all synthetic data, though, is equal.
Tequila AI takes immense care to ensure that the synthetic data we generate for your organisation is appropriate for your needs.
For example, suppose you have a cohort of clients in one location. Our tools can generate customer names that are similar in origin to the real ones. Say a customer’s name is Marina Iantorno, her current location is Germany, and her place of birth is Argentina. It’s statistically likely that she is female and speaks Argentinian Spanish. The prevalence of the surname Iantorno is highest in Buenos Aires. Her heritage is likely Italian, and she may speak German with a level of proficiency commensurate with how long she has lived there. If she has German citizenship, her German will be at least at the level required to pass the citizenship test. In this scenario, Tequila AI’s tool would generate a name statistically consistent with those characteristics.
Tequila AI takes the same care with confidentiality. Say, for example, a client is examining healthcare data. Some conditions have such a low incidence rate that, even after the data has been synthesized, it could be possible to work out which original records it came from. In this case, Tequila AI’s tools would flag the issue and not produce data for that use case.
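The idea of flagging low-incidence values can be sketched in a few lines of Python. This is a simple "small count" check under stated assumptions, not Tequila AI’s actual tooling: the function names, the `condition` column, and the threshold `min_count=5` are all hypothetical choices for illustration.

```python
from collections import Counter


def rare_values(records, column, min_count=5):
    """Return the values in `column` that appear fewer than `min_count` times."""
    counts = Counter(r[column] for r in records)
    return {value: n for value, n in counts.items() if n < min_count}


def safe_to_synthesize(records, column, min_count=5):
    """True only if no value in `column` is rare enough to risk re-identification."""
    return not rare_values(records, column, min_count)


# Hypothetical healthcare records, invented purely for illustration.
records = [{"condition": "asthma"} for _ in range(6)]
records.append({"condition": "rare_condition"})

flagged = rare_values(records, "condition")
ok = safe_to_synthesize(records, "condition")
```

Here `flagged` contains the single rare condition and `ok` is `False`, so a pipeline built on this check would stop rather than generate data. The right threshold depends on the data and the applicable regulations.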
If you’d like to know more, Contact Us to discuss your requirements and how we can help you achieve your goals.