1 Introduction

The increasing computational power and availability of big data have substantially empowered artificial intelligence (AI) in recent years [10]. As a result, AI – in its simplest sense, defined as the “ability of computers or other machines in performing activities that require human intelligence” [1] – has become a transformative power across a wide range of industrial, social, and intellectual contexts [11]. It offers vast opportunities for innovation [4, 5], engenders new solutions to complex problems [39], and makes it possible to address major societal issues, including sustainable development and environmental sustainability [33, 41, 42]. In particular, data-intensive research and applications benefit from AI because it can process and analyze massive amounts of structured and unstructured (big) data [10, 23]. Big data can also be leveraged to foster innovation and corporate social performance [6, 16]. Thus, AI and big data in concert provide unprecedented innovation potential. Chemical research and development (R&D) has derived its insights from data from the very beginning. In light of the massive amounts of (experimental) data on chemical and physical properties, chemical reactions and structures, and biological activities, the development and deployment of methods from computer science in chemistry – termed artificial intelligence even then – dates back to the 1960s [15]. The long-standing history of computational methods for analyzing chemical data and their large scale and scope make AI particularly suitable for and prevalent in chemical R&D. This is reflected in the various areas of application in which AI is utilized, such as toxicity prediction [19, 43], drug discovery and design [20, 36, 37], and life-cycle impact assessment [38]. We argue that AI in chemical R&D can substantially contribute to environmental sustainability if AI is harnessed in an ethical way.

Sustainability is a central ethical principle and objective related to the development and deployment of AI [14, 21, 41]. More generally, the increasing pervasiveness and impact of AI on the individual, economic, and societal level have sparked a debate on the ethical principles guiding AI development and use [8, 13, 14, 17, 21, 29, 31]. The rather fragmented AI ethics landscape [21] consists of recurring principles, such as transparency, beneficence, and nonmaleficence, that are high-level, normative, and deontological in nature [17, 29] and thus require translation into business practice [29, 31]. In what follows, we aim to show that the ethical principle of explicability, that is, how AI works (intelligibility) and who is responsible for the way AI works (accountability), in combination with an open research data management system, should be the focal factors accompanying AI in R&D to promote sustainability.

2 Explicability

Leveraging AI in (chemical) R&D for sustainability is at the core of the beneficence principle of AI ethics. Ethical frameworks for AI [14] increasingly advocate that AI should promote individual, social, and environmental good and well-being (i.e., beneficence), while preventing any risks and harm (i.e., nonmaleficence). Unifying these objectives to achieve a “dual advantage” for society [14] is central to the AI-for-social-good perspective [8, 13, 14, 39]. The other three ethical principles inherent in the AI-for-social-good perspective are autonomy (i.e., self-determination, balanced human and AI agency, power to decide), justice (i.e., fairness, avoidance of biases, discrimination, and inequality), and explicability (i.e., transparency, intelligibility, accountability) [14].

We attach particular importance to the explicability principle because it constitutes a pro-ethical condition for enabling or impairing judgments of beneficence, nonmaleficence, justice, and autonomy [40]. Understanding the functionalities of AI, that is, intelligibility, informs evaluations of the other principles: it allows one to comprehend whether and how AI benefits (beneficence) or harms (nonmaleficence) individuals and society in a fair and unbiased way (justice), and to decide whether or not to delegate decisions to AI systems (autonomy) [14]. Intelligibility relates to the epistemological dimension of explicability [14] and can be defined as human understanding of a model’s function without any need for explaining its internal structure or underlying data processing algorithm [3]. Although the term is often used interchangeably with concepts such as comprehensibility, interpretability, explainability, and transparency, it is considered the most appropriate conceptualization [3].

In an AI-driven R&D context, intelligibility is multidimensional, since it concerns internal stakeholders within and external stakeholders outside the research institution or company. There is no one-size-fits-all approach [20]; rather, the audience is decisive [32]. Researchers directly involved in the R&D process need an in-depth understanding of AI models, predictions, outcomes, and underlying data. Although full intelligibility of complex AI models might be hard to achieve, researchers should know and understand how AI systems reach predictions (i.e., transparency), why model predictions are acceptable, to what extent they provide new, relevant information, and how reliable they are [20]. When deciding between simpler and more complex, black-box models (i.e., at the modeling stage), researchers should be able to evaluate the respective AI models’ ability to fit data (i.e., predictive accuracy). To foster this evaluation, the number of model parameters can be reduced (i.e., sparsity), and the model's prediction-making process can be internally simulated and reasoned about (i.e., simulatability) or partitioned (i.e., modularity). At the post hoc analysis stage, fitted/trained models are assessed in terms of which relationships they have learned from data (i.e., descriptive accuracy), either at the level of a single (local) prediction or at the (global) dataset level. To this end, researchers can examine the importance of particular features or variables of the dataset for model predictions [32].
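As an illustration of the global post hoc analysis described above, the following minimal sketch estimates feature importance by permutation: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance. Permutation importance is only one of several possible techniques and is chosen here for simplicity. The function `black_box_model`, its coefficients, and the synthetic data are hypothetical stand-ins, not a model or dataset from this article.

```python
import random


def black_box_model(features):
    """Hypothetical stand-in for a trained model (illustrative coefficients only)."""
    x0, x1, x2 = features
    return 3.0 * x0 + 0.0 * x1 + 0.5 * x2


def mean_squared_error(model, rows, targets):
    """Average squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Error increase after shuffling each feature column, one column at a time."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value.
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, permuted, targets) - baseline)
    return importances


# Synthetic data: 50 samples with 3 features; targets come from the model itself.
data_rng = random.Random(42)
rows = [tuple(data_rng.random() for _ in range(3)) for _ in range(50)]
targets = [black_box_model(r) for r in rows]

importances = permutation_importance(black_box_model, rows, targets, n_features=3)
for index, value in enumerate(importances):
    print(f"feature {index}: error increase {value:.4f}")
```

In this toy setting, the feature that the stand-in model ignores receives zero importance, while the feature with the largest coefficient receives the highest. With a real model and real chemical descriptors, such scores would need to be interpreted with care, for example when features are correlated.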

However, intelligibility cannot simply be transferred from internal to external stakeholders, owing to proprietary boundaries and intellectual property right restrictions in the case of commercial product development [2, 30]. Closely related to intelligibility is accountability, the ethical dimension of explicability [14], since judgments about accountability require a certain understanding of the underlying processes of AI systems and applications (i.e., intelligibility) [7, 26, 28, 31]. Accountability can enable shared responsibility within the organization and towards external stakeholders, which is particularly relevant in the sustainability context. Accountability has two temporal perspectives. In hindsight, responsibility is ascribed when something goes wrong. More importantly, AI systems should be responsibly designed and deployed in a forward-looking way [7], which again requires intelligibility.

In the following, we present a research data management system that follows the maxims of explicability and open science in order to foster collaborative action with respect to sustainability objectives.

3 Open research data management

The research data management system illustrated in Fig. 1 consists of the AI-driven (chemical) R&D process, which is fed with relevant multifaceted experimental and secondary data. The R&D process ranges from the definition of the required properties, through the AI-based molecular design and in silico characterization of the relevant properties, to the AI-based ranking of the most promising substances and product candidates. The latter provides the basis for follow-up laboratory research, and the corresponding research findings in turn feed the experimental databases. Thus, a data life cycle with substantial epistemic and methodological advantages can emerge. That is, broader and continuously refined databases allow more comprehensive model training, which eventually increases the accuracy and validity of AI model predictions. As AI predictions are only as accurate and unbiased as the underlying data, consistent, unbiased, and curated data sets are a prerequisite for AI-driven R&D to take full effect [9, 36]. Biases can arise from misleading proxy features [3, 34], sparse (small) data [9, 34], or even researchers themselves due to personal preferences and biases and (chemical) education [3, 36]. A curated data life cycle can counter these biases.
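The data life cycle described above can be sketched as a simple loop: fit a model to the current database, rank the candidates by predicted property, send the top-ranked candidate to the laboratory, and feed the measurement back into the database. The following is a deliberately simplified sketch under strong assumptions: each substance is reduced to one numeric descriptor, the "AI model" is an ordinary least-squares line, and `lab_measurement` is a hypothetical function that stands in for a real experiment. None of these names or values come from the system shown in Fig. 1.

```python
def fit_line(data):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lab_measurement(descriptor):
    """Hypothetical laboratory experiment with a made-up ground truth."""
    return 2.0 * descriptor + 1.0


# Initial experimental database of (descriptor, measured property) pairs.
# The second entry is deliberately inaccurate to mimic a noisy early record.
database = [(0.0, 1.0), (1.0, 3.5)]
candidates = [0.5, 2.0, 3.0, 4.0]

for cycle in range(3):
    slope, intercept = fit_line(database)          # model training
    ranked = sorted(candidates,                    # in silico ranking
                    key=lambda x: slope * x + intercept,
                    reverse=True)
    best = ranked[0]
    database.append((best, lab_measurement(best))) # lab result feeds the database
    candidates.remove(best)

final_slope, final_intercept = fit_line(database)
print(f"records in database: {len(database)}, fitted slope: {final_slope:.2f}")
```

In this toy example, each cycle adds one new measurement, and the refitted slope moves closer to the made-up ground truth as the database grows. It illustrates only the feedback structure, not the curation steps that would be needed to counter the biases discussed above.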

Fig. 1 Research data management system

Moreover, sophisticated databases and self-learning AI models make it possible to investigate simultaneously a greater number and variety of research questions than resource-intensive laboratory research, which often uses rather inefficient one-parameter-at-a-time methods [36]. This is particularly important in light of the complex mixture of thousands of chemicals to which the environment and humans are exposed from multiple sources, their interactions, interdependencies, and externalities, and eventually their related risks and adverse effects for humans and the environment [12, 22]. AI-driven R&D processes have the potential to define, study, and detect substances, product candidates, and their properties that are best suited to the areas of application and the surrounding circumstances and environmental factors. In this way, environmental and social good is fostered (beneficence), while adverse environmental and societal effects are limited (nonmaleficence); that is, sustainability is pursued.

Investigating a multitude of substances and product candidates further accounts for the justice principle, which should also guide environmental sustainability [45]. The justice principle entails fairness and the prevention of unwanted/unfair biases [21, 31], sharing benefits and prosperity, and fostering solidarity [14]. In the AI-driven R&D context, justice implies that not only the research outcomes (e.g., substances or products) but also scientific data should equally benefit all stakeholders affected, within and across countries and regions. Given that developing countries are at higher risk of adverse effects of climate change [18] and detrimental impacts of chemical pollutants [12], justice becomes particularly important for R&D in the sustainability context. Therefore, countries’ environmental and social circumstances and idiosyncrasies should be taken into consideration, an objective that can be pursued and achieved more readily by means of AI-driven R&D than by conventional laboratory research.

Again, intelligible AI methods and effective research data management that transparently documents final and preliminary research findings for follow-up research as well as for subsequent life-cycle assessments, registration, and approval processes (see Fig. 1) can be cornerstones for equitable and collaborative sustainability efforts. Ultimately, AI methodologies and the respective research findings on behalf of (global) environmental sustainability should be shared with other researchers and/or made entirely publicly available. That is, open data/science approaches and policies [27, 35] should take center stage. To facilitate global data utilization, research data management systems should follow the FAIR principles, that is, Findable, Accessible, Interoperable, Reusable [44]. Of course, such a transparency and open science policy is an ideal that is limited by proprietary and professional boundaries [2, 30], (perceived) incentives for researchers [25], and governance and technical requirements [27]. However, explicability of the R&D process and data accessibility seem imperative to collaboratively and substantially increase the scale of endeavors to pursue sustainability.
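To make the FAIR principles mentioned above more tangible, the following sketch shows how a research data management system might attach a small metadata record to each dataset and flag which FAIR facets are not yet covered. The field names, the mapping of fields to facets, and all values (including the placeholder identifier and URL) are illustrative assumptions; they are not an official FAIR checklist and are not taken from the system described in this article.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    identifier: str   # Findable: persistent identifier (placeholder below)
    title: str        # Findable: descriptive metadata
    access_url: str   # Accessible: where the data can be retrieved
    data_format: str  # Interoperable: open, documented format
    license: str      # Reusable: clear usage terms


# Assumed, simplified mapping of record fields to FAIR facets.
FAIR_FACETS = {
    "identifier": "Findable",
    "title": "Findable",
    "access_url": "Accessible",
    "data_format": "Interoperable",
    "license": "Reusable",
}


def missing_facets(record):
    """Return the FAIR facets for which the record has an empty field."""
    values = asdict(record)
    return sorted({facet for name, facet in FAIR_FACETS.items() if not values[name]})


# Hypothetical record with the license still missing.
record = DatasetRecord(
    identifier="doi:10.0000/placeholder",
    title="Example solubility measurements",
    access_url="https://example.org/datasets/solubility",
    data_format="text/csv",
    license="",
)

print(json.dumps(asdict(record), indent=2))
print("facets still to address:", missing_facets(record))
```

A check of this kind covers only the presence of metadata. Whether a dataset is reusable in practice also depends on the proprietary, incentive-related, and governance constraints noted above.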

4 Conclusion

AI combined with big data offers unprecedented innovation potential. Backing (chemical) R&D with both provides substantial opportunities to investigate and identify solutions that underpin and accelerate sustainability; that is, AI can be a game changer [24]. The global inequality of environmental pollution and negative side effects requires world-spanning and collaborative sustainability efforts. Therefore, innovative AI-driven R&D processes and the respective research findings and knowledge should not be intelligible and accessible exclusively to a privileged minority or elite. Since pursuing sustainability is also a highly ethical objective, both generally [45] and with respect to the development and deployment of AI [14], AI-driven R&D for sustainability should be guided by explicability and open data/science policies. Ultimately, not only the environment constitutes a social good, but also the methods and insights aimed at protecting and sustaining it.