According to news from this site on November 10, OpenAI issued a document announcing that it will cooperate with organizations to generate public/private data sets for training AI models. The data partnership aims to "make more... Organizations can help guide the future of AI" and "benefit from more useful models."
This site learned from the blog that OpenAI stated: “To ultimately make AI safer and benefit all mankind, we hope that AI models can deeply understand all topics, industries, cultures and languages, which needs to be as deep as possible A wide range of training data sets."
As part of the Data Partners Program, OpenAI said it will collect "large-scale" data sets that "reflect human society" and are not currently easily accessible online. While the company plans to work across multiple modalities, including images, audio and video, it's specifically seeking data that "expresses human intent" (such as long-form writing or conversation) across different languages, topics and formats.
OpenAI said it will work with organizations, using a combination of optical character recognition and automatic speech recognition tools, todigitize training data and remove sensitive or personal information if necessary.
OpenAI hopes to create two types of datasets:A public open source dataset that can be used by anyone in training AI models, and a set of private datasets for training proprietary AI models .
OpenAI says the private set is for organizations that want to keep their data private but want OpenAI’s models to better understand their domain; so far, OpenAI has worked with the Icelandic government and Miðeind ehf to improve GPT-4 The ability to speak Icelandic and working with the Free Law Project to improve its model’s understanding of legal documents.The above is the detailed content of OpenAI seeks partners to generate datasets for training AI models. For more information, please follow other related articles on the PHP Chinese website!