Various search engines for finding scientific publications are being equipped with AI functions. The companies promise to proceed with particular care, since AI-generated misinformation could do great harm in science.
On August 1, Elsevier, a large Dutch scientific publisher, released Scopus AI to a select group of 15,000 beta users. This is a GPT-3.5-based AI search for the existing Scopus database. The final product launch is planned for early 2024.
Elsevier's promises for Scopus AI are ambitious: the tool is expected to provide researchers with a concise and reliable overview of research topics, including academic references; reduce reading time and the risk of hallucinations; offer easy navigation to further links; and enable natural language queries.
A graphical representation will soon be introduced, allowing users to dive into the relationships between different papers.
Digital Science, the company behind the popular Dimensions database search, is following suit, announcing on August 1 a closed testing period for an AI assistant that uses its own General Science BERT language model as well as OpenAI’s GPT models.
“When a user interacts with Dimensions AI Assistant, it scans 33 million articles from the Dimensions database, retrieves and then semantically ranks the top results using semantic embeddings,” explains head of innovation Martin Schmidt.
The abstracts of the top four results are then processed by an OpenAI API to generate summaries. Dimensions' own General Science BERT model then derives the answers from the ten most relevant publications.
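The retrieve-and-rank step Schmidt describes can be sketched in a few lines. This is a deliberately simplified illustration, not Dimensions' actual implementation: the toy character-count "embedding" below stands in for a trained model such as Dimensions' General Science BERT, and all function names are invented for this example.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters "embedding" used only for illustration;
    # a real system would use a trained language model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_articles(query: str, articles: list[str], top_k: int = 10) -> list[str]:
    # Semantically rank candidate articles against the query and
    # keep the top_k, as in the pipeline described above.
    q = embed(query)
    ranked = sorted(articles, key=lambda art: cosine(q, embed(art)), reverse=True)
    return ranked[:top_k]

articles = [
    "Protein folding with deep learning",
    "Quantum error correction codes",
    "Large language models for literature search",
]
top = rank_articles("language models searching literature", articles, top_k=2)
print(top[0])
```

In the real system, the abstracts of the top-ranked hits would then be passed to a summarization model (an OpenAI API in Dimensions' case) rather than printed directly.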
Clarivate and its Web of Science product have similar plans. Its assistant will be based on a large language model from AI21 Labs, possibly a derivative of Jurassic-1. The idea is to give scientists quick access to detailed and contextual information and answers, as well as personalized recommendations, Clarivate said in a press release.
Risk of AI-generated misinformation
Until now, chatbots have been considered rather unsuitable for scientific research because they can fabricate information, including sources. These fabricated sources sound plausible but cannot be found on closer inspection. The scientific language model Galactica, published by Meta, for example, was taken offline shortly after release following massive protests from the scientific community.
The aforementioned providers, Elsevier, Digital Science, and Clarivate, are therefore understandably cautious about releasing AI features directly to all users. Instead, the features will first be tested in closed betas. It is not clear from the announcements how the database operators plan to limit the spread of AI-generated misinformation.
We recognize the challenges inherent in using AI, especially for something as fundamental and impactful as research. The ethical and social implications of this technology cannot be ignored and hence we feel that an ethical and responsible approach is needed, even in the face of rapid innovation and the hype around AI.
We are committed to releasing tools where we either explicitly understand and can state their downsides or where we have a high degree of confidence and trust in their outputs. We believe this approach is crucial so we can empower users to discern when they can trust the AI and when they should exercise caution.
Daniel Hook, CEO of Digital Science
A common thesis is that AI in search engines can help not only with content generation but also with information retrieval. Google, for example, is pursuing generative AI for retrieval with its Search Generative Experience, which leans more generative than search, and Microsoft has integrated GPT-4 into Bing. Recently, the coding platform Stack Overflow announced OverflowAI, a set of new tools to help developers find very specific pieces of code.