The majority of AI training data will be synthetic by next year – Gartner

2023-08-04

Most data used to train machine learning models will be synthetic and automatically generated, a new report from Gartner predicts. Only 1% of all AI training data was synthetic in 2021, but analysts suggest it could hit 60% by the end of 2024. Governance and vigilance about biases are essential to prevent this data suffering the same challenges as organic data, one expert told Tech Monitor.

Analysts predict more than 60% of data used to train AI models will be synthetic by the end of 2024. Photo: Yurchanka Siarhei/Shutterstock

Synthetic data is generated by AI to fill gaps in real-world information, such as medical imaging or information on specific disease patterns. In new research on trends in data science, published this week, Gartner predicts that by 2024 more than 60% of all AI model training data will be synthetic, something it says will lead to better AI systems.

This move from organic to synthetic training data is part of a wider shift towards data-centric AI, including the approaches used to produce large language and foundation models. “Solutions such as AI-specific data management, synthetic data and data labelling technologies, aim to solve many data challenges, including accessibility, volume, privacy, security, complexity and scope,” Gartner’s report says.

A recent report by GlobalData found that synthetic data start-ups were “redefining the landscape of data generation”. Describing synthetic data as the “master key to AI’s future”, Kiran Raj, practice head of disruptive tech at GlobalData, said the start-ups were breaking through the shackles of data quality and regulation. “As the demand for reliable, cost-effective, time-efficient, and privacy-preserving data continues to accelerate, startups envision a future powered by synthetic data, ushering a new era of machine learning progress,” Raj said.

Synthetic data has the potential to benefit a range of sectors. In healthcare it is already being used to augment real patient data for training doctors, improving drug discovery and optimising systems. In the financial services sector it is helping mitigate risk and detect fraud. And in retail it is improving demand forecasting, personalised marketing and fraud detection.

AI moving to the edge

The other key trends noted by Gartner include a shift towards edge processing for AI. Processing data at the point of creation will help organisations gain real-time insights and detect new patterns, according to the report. It will also make it easier to meet ever more stringent data privacy requirements. The organisation predicts more than 55% of data analysis by neural networks will occur in an edge system by 2025. 

Gartner analysts also predict there will be a greater emphasis on responsible AI. This includes ensuring the technology is used as a positive force rather than a threat to society, and that businesses make ethical choices when adopting AI that address societal value, risk, trust, accountability and transparency. These are the core requirements making up many of the AI regulations being developed around the world, including in the UK.

Organisations should adopt a “risk-proportional approach” to AI investment and deployment, the analysts warned. This includes exercising caution when applying solutions and models, and seeking assurances from vendors that they are managing their own risk and compliance obligations. This will help protect organisations from financial loss and legal action.

Some foundation model and generative AI organisations are offering degrees of indemnity from these risks. Adobe says it will cover costs associated with copyright claims arising from the use of its Firefly generative AI image model. This is because the company is confident the model is trained solely on licensed and authorised data that won’t produce copyright-suspect output.

Healthcare and disease detection

Peter Krensky, director analyst at Gartner, said: “As machine learning adoption continues to grow rapidly across industries, data is evolving from just focusing on predictive models, toward a more democratised, dynamic, and data-centric discipline. This is now also fuelled by the fervour around generative AI. While potential risks are emerging, so too are the many new capabilities and use cases for data scientists and their organisations.”

Caroline Carruthers, data expert and co-founder of global data consultancy Carruthers and Jackson, told Tech Monitor synthetic data was an invaluable tool for training AI models, particularly where large datasets weren’t available. “It’s been used most effectively in the healthcare sector, where data on rare diseases has been supplemented by synthetic data to improve modelling of treatment options,” she says.

Carruthers said that while there is “clear value to expanding limited datasets with synthetic data, there are a number of risks”, including the possibility that biases which are prominent in smaller datasets might be amplified by synthetic data that uses those datasets as a foundation. She adds: “The bottom line is that synthetic data faces the same challenges as organic data when it comes to the need for governance and being vigilant about potential biases.”
