In a briefing on Monday, research leaders across tech, academia, and the government joined the White House to announce an open dataset full of scientific literature on the novel coronavirus. The COVID-19 Open Research Dataset, known as CORD-19, will also add relevant new research moving forward, compiling it into one centralized hub. The new dataset is machine readable, making it easily parsed for machine learning purposes—a key advantage according to researchers involved in the ambitious project.
In a press conference, U.S. CTO Michael Kratsios called the new dataset the “most extensive collection of machine readable coronavirus literature to date.” Kratsios characterized the project as a “call to action” for the AI community, which can employ machine learning techniques to surface unique insights in the body of data. To guide researchers combing through the data, the National Academies of Sciences, Engineering, and Medicine collaborated with the World Health Organization to develop “high priority” questions about the coronavirus related to genetics, incubation, treatment, symptoms and prevention.
The partnership, announced today by the White House Office of Science and Technology Policy, brings together the Chan Zuckerberg Initiative, Microsoft Research, the Allen Institute for Artificial Intelligence, the National Institutes of Health’s National Library of Medicine, Georgetown University’s Center for Security and Emerging Technology, Cold Spring Harbor Laboratory and the Kaggle AI platform, owned by Google.
The database brings together nearly 30,000 scientific articles about the virus known as SARS-CoV-2 as well as related viruses in the broader coronavirus group. Around half of those articles make the full text available. Critically, the database will include pre-publication research from resources like medRxiv and bioRxiv, open access archives for pre-print health sciences and biology research.
“Sharing vital information across scientific and medical communities is key to accelerating our ability to respond to the coronavirus pandemic,” Chan Zuckerberg Initiative Head of Science Cori Bargmann said of the project.
The Chan Zuckerberg Initiative hopes that the global machine learning community will be able to help the science community connect the dots on some of the enduring mysteries about the novel coronavirus as scientists pursue knowledge around prevention, treatment and a vaccine.
For updates to the CORD-19 dataset, the Chan Zuckerberg Initiative will track new research on a dedicated page on Meta, the research search engine the organization acquired in 2017.
The CORD-19 dataset announcement is certain to roll out more smoothly than the White House’s last attempt at a coronavirus-related partnership with the tech industry. The White House came under criticism last week for President Trump’s announcement that Google would build a dedicated website for COVID-19 screening. In fact, the site was in development by Verily, Alphabet’s life science research group, and intended to serve California residents, beginning with San Mateo and Santa Clara counties. (Alphabet is the parent company of Google.)
Read more: https://techcrunch.com/2020/03/16/coronavirus-machine-learning-cord-19-chan-zuckerberg-ostp/