We Need Smart Intellectual Property Laws for Artificial Intelligence

2023-08-09

Once a backwater filled with speculation, artificial intelligence is now a burning, “hair on fire” conflagration of both hopes and fears about a revolutionary technological transformation. A profound uncertainty surrounds these intelligent systems—which already surpass human capabilities in some domains—and their regulation. Making the right choices about how to protect or control the technology is the only way that hopes about the benefits of AI—for science, medicine and better lives overall—will win out over persistent apocalyptic fears.

Public introduction of AI chatbots such as OpenAI’s ChatGPT over the past year has led to outsize warnings. They range from one given by Senate Majority Leader Chuck Schumer of New York State, who said AI will “usher in dramatic changes to the workplace, the classroom, our living rooms—to virtually every corner of life,” to another asserted by Russian president Vladimir Putin, who said, “Whoever becomes the leader in this sphere will become the ruler of the world.” Such fears also include warnings of dire consequences of unconstrained AI from industry leaders.

Legislative efforts to address these issues have already begun. On June 14 the European Parliament voted to approve a new Artificial Intelligence Act, after adopting 771 amendments to a 69-page proposal by the European Commission. The act requires “generative” AI systems like ChatGPT to implement a number of safeguards and disclosures, such as on the use of a system that “deploys subliminal techniques beyond a person’s consciousness” or “exploits any of the vulnerabilities of a specific group of persons due to their age, physical or mental disability,” as well as to avoid “foreseeable risks to health, safety, fundamental rights, the environment and democracy and the rule of law.”

A pressing question worldwide is whether the data used to train AI systems requires consent from authors or performers, who are also seeking attribution and compensation for the use of their works.

Several governments have created special text and data mining exceptions to copyright law to make it easier to collect and use information for training AI. These allow some systems to train on online texts, images and other work that is owned by other people. These exceptions have recently met with opposition, particularly from copyright owners and from critics with more general objections who want to slow down or degrade the services. They add to the controversies raised by an explosion of recent reporting on AI risks—the technology’s potential to produce bias, social manipulation, losses of income and employment, disinformation and fraud, including catastrophic predictions about “the end of the human race.”

Recent U.S. copyright hearings echoed a common refrain from authors, artists and performers—that AI training data should be subject to the “three C’s” of consent, credit and compensation. Each C has its own practical challenges that run counter to the most favorable text and data mining exceptions embraced by some nations.

The national approaches to the intellectual property associated with training data are diverse and evolving. The U.S. is dealing with multiple lawsuits to determine to what extent the fair use exception to copyright applies. A 2019 European Union (E.U.) directive on copyright in the digital single market included exceptions for text and data mining, including a mandatory exception for research and cultural heritage organizations, while giving copyright owners the right to prevent the use of their works for commercial services. In 2022 the U.K. proposed a broad exception that would apply to commercial uses, though it was put on hold earlier this year. In 2021 Singapore created an exception in its copyright law for computational data analysis, which applies to text and data mining, data analytics and machine learning. Singapore’s exception requires lawful access to the data but cannot be overridden by contracts. China has issued statements suggesting it will exclude from training data “content infringing intellectual property rights.” In an April article from Stanford University’s DigiChina project, Helen Toner of Georgetown University’s Center for Security and Emerging Technology described this as “somewhat opaque, given that the copyright status of much of the data in question—typically scraped at massive scale from a wide range of online sources—is murky.” Many countries have no specific exception for text and data mining and have not yet staked out a position. Indian officials have indicated they are not prepared to regulate AI at this time, but like many other countries, India is keen to support a domestic industry.

As laws and regulations emerge, care should be exercised to avoid a one-size-fits-all approach, in which the rules that apply to recorded music or art also carry over to the scientific papers and data used for medical research and development.

Previous legislative efforts on databases illustrate the need for caution. In the 1990s proposals circulated to automatically confer rights to information extracted from databases, including statistics and other noncopyrighted elements. One example was a treaty proposed by the World Intellectual Property Organization (WIPO) in 1996. In the U.S., a diverse coalition of academics, libraries, amateur genealogists and public interest groups opposed the treaty proposal. But probably more consequential was the opposition by U.S. companies such as Bloomberg, Dun & Bradstreet and STATS that came to see the database treaty as both unnecessary and onerous because it would increase the burden of licensing the data that they needed to acquire and provide to customers and, in some cases, would create unwanted monopolies. The WIPO database treaty failed at a 1996 diplomatic conference, as did subsequent efforts to adopt a law in the U.S., but the E.U. proceeded to implement a directive on the legal protection of databases. In the decades since, the U.S. has seen a proliferation of investments in databases, and the E.U. has seen its directive weakened through court decisions. In 2005 the E.U.’s internal evaluations found that this “instrument has had no proven impact on the production of databases.”

Sheer practicality points to another caveat. The scale of data in large language models can be difficult to comprehend. The first release of Stable Diffusion, which generates images from text, required training on 2.3 billion images. GPT-2, an earlier version of the model that powers ChatGPT, was trained on 40 gigabytes of data. The subsequent version, GPT-3, was trained on 45 terabytes of data, more than 1,000 times larger. OpenAI, faced with litigation over its use of data, has not publicly disclosed the size of the dataset used to train the latest version, GPT-4. Clearing rights to copyrighted work can be difficult even for simple projects, and for very large projects or platforms, even identifying who owns the rights is nearly impossible, given the practical requirements of locating metadata and evaluating contracts between authors or performers and publishers. In science, requirements for getting consent to use copyrighted work could give publishers of scientific articles considerable leverage over which companies could use the data, even though most authors are not paid.

Differences in who owns what matter. It’s one thing for the copyright holder of a popular music recording to opt out of a database; it’s another if an important scientific paper is left out over licensing disputes. When AI is used in hospitals and in gene therapy, do you really want to exclude relevant information from the training database?

Beyond consent, the other two C’s, credit and compensation, have their own challenges, as illustrated even now by the high cost of litigation over infringements of copyright or patents. But one can also imagine datasets and uses in the arts or biomedical research where a well-managed AI program could be helpful to implement benefit sharing, such as the proposed open-source dividend for seeding successful biomedical products.

In some cases, data used to train AI can be decentralized, with a number of safeguards. They include implementing privacy protection, avoiding unwanted monopoly control and using the “dataspaces” approaches now being built for some scientific data.

All of this raises the obvious challenge to any type of IP rights assigned to training data: the rights are essentially national, while the race to develop AI services is global. AI programs can be run anywhere there is electricity and access to the Internet. You don’t need a large staff or specialized laboratories. Companies operating in countries that impose expensive or impractical obligations on the acquisition and use of data to train AI will compete against entities that operate in freer environments.

If anyone else thinks like Vladimir Putin about the future of AI, this is food for thought.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.
