Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer (2024)

1^st Ahmed Abdeen HamedDept. of SSIE
Binghamton University
and
iMol Polish Academy of Sciences
Warsaw, Poland
ahamed1@binghamton.edu* 2^nd Tamer E. FandyDept. of Medical Education
Texas Tech University HSC
El Paso, United States
Tamer.fandy@ttuhsc.edu

Abstract

The objective of this research is to introduce a network specialized in predicting drugs that can be repurposed by investigating real-world evidence sources, such as clinical trials and biomedical literature. Specifically, it aims to generate drug combination therapies for complex diseases (e.g., cancer, Alzheimer’s). We present a multilayered network medicine approach, empowered by a highly configured ChatGPT prompt engineering system, which is constructed on the fly to extract drug mentions in clinical trials. Additionally, we introduce a novel algorithm that connects real-world evidence with disease-specific signaling pathways (e.g., KEGG database). This sheds light on the repurposability of drugs if they are found to bind with one or more protein constituents of a signaling pathway. To demonstrate, we instantiated the framework for breast cancer and found that, out of 46 breast cancer signaling pathways, the framework identified 38 pathways that were covered by at least two drugs. This evidence signals the potential for combining those drugs. Specifically, the most covered signaling pathway, ID hsa:2064, was covered by 108 drugs, some of which can be combined. Conversely, the signaling pathway ID hsa:1499 was covered by only two drugs, indicating a significant gap for further research. Our network medicine framework, empowered by GenAI, shows promise in identifying drug combinations with a high degree of specificity, knowing the exact signaling pathways and proteins that serve as targets. It is noteworthy that ChatGPT successfully accelerated the process of identifying drug mentions in clinical trials, though further investigations are required to determine the relationships among the drug mentions.

Index Terms:

Network Medicine, Drug Repurposing, Generative AI, LLMs, Multilayered Network, Signaling Pathways

I Introduction

I-A Highlights

1.
Presenting a multilayered network medicine approach for complex diseases, empowered by Generative AI, that produces drug repurposing combinations from real-world evidence.
2.
Introducing a highly configurable learning algorithm that harnesses ChatGPT prompt engineering, executed on the fly to analyze each data item (clinical trials) with the help of a few-shot examples.
3.
Demonstrating the full potential of the algorithm in a breast cancer case study, showing how it produces combinations with a high degree of specificity in terms of drug targets.

I-B Literature Review

Network Medicine, a well-established research area, has made significant contributions to understanding various human diseases, biology, drug treatments, and biological targets[1, 2]. In recent years, such networks have proven to be reliable platforms for drug repurposing, especially for complex diseases[3]. This was particularly evident during the Coronavirus pandemic, when various research efforts leveraged network medicine to develop viable treatments in response to the global crisis[4, 5, 6, 7, 8, 9, 10].

With the emergence of Generative AI (GenAI) and Large Language Models (LLMs), new opportunities driven by ChatGPT[11] are on the rise[12, 13, 14, 15, 16, 17]. In this work, we focus on multilayered network medicine and present some of the latest and most relevant research related to our approach.

Rocha et al. presented a multilayered personalized health platform that integrates various heterogeneous resources to provide insights for epilepsy management[18]. Their approach integrated data from electronic health records, social media platforms, and biomedical databases.

Kim et al. developed a multi-layered knowledge graph and employed a neural network approach to predict the recurrence risk of a hormone receptor related to breast cancer[19].

Xie et al. reviewed how heterogeneous multilayered knowledge graphs are used to integrate and analyze omics data produced from gene sequencing and high-throughput techniques[20].

McLean reviewed the use of multi-modal knowledge graphs as tools for drug discovery, particularly in the context of COVID-19[21].

Kim et al. presented a semantic multilayered knowledge graph for drug repurposing. They utilized the ”guilt-by-association” principle to recommend drug candidates for given diseases in the network, employing semantically guided random walks to achieve the desired results[22].

While multilayered knowledge graphs are powerful, integrating heterogeneous datasets and constructing linked graph layers are complex and challenging processes. In this paper, we aim to accelerate these processes to enable our network medicine platform to address drug repurposing questions, specifically demonstrating capabilities for breast cancer. We also explore how ChatGPT, as a GenAI tool, was used via prompt engineering to perform similar tasks.
Polak and Dane introduced a tool called ChatExtract, which performs information extraction from research papers using a set of engineered prompts. These prompts guide conversations, extract data, and enforce correctness through follow-up questions. The authors claimed that their approach ensured outstanding performance and accurate results[23].
Wang et al. examined the reliability of prompt engineering using different styles for five tasks related to the American Academy of Orthopedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Their study revealed that carefully engineered prompts could improve the accuracy of responses to professional medical questions and the information extracted[24].
Snyder et al. applied LLM prompt engineering using a few-shot learning technique for various drug discovery applications. They compared this approach with classical machine learning techniques and concluded that while classical methods are best for large and diverse datasets, LLMs can provide comparable results when analyzing small and hom*ogeneous datasets[25].

With this introduction, we propose a multilayered network medicine approach, accelerated by ChatGPT few-shot prompt engineering, that streamlines data processing and network layer construction without compromising the accuracy of results.

II Data

This research used three different type of resources:

1.
Clinical Trials as a Source of Real-World Evidence: We searched for drug combinations using the query ”drug combination” and acquired approximately 2,450 trials that mentioned treatments involving more than one drug. The trials were extracted in JSON format, with each trial stored in an independent file.
2.
KEGG Breast Cancer Signaling Pathways as Potential Targets: The signaling pathways serve two purposes: first, providing a list of relevant biological targets; second, identifying drugs that may bind with proteins in the same signaling pathway to offer combination insights.
3.
BioMedical Publications Retrieved Using PubMed APIs: To identify drug-target connections, we used the eSearch command-line utility to execute the query ”breast cancer.” This resulted in acquiring 440,000 MEDLINE records containing the PubMed ID, title, and abstract.

III Methods

The challenge of identifying drug combination opportunities for complex diseases in general and breast cancer in particular requires harvesting knowledge from multiple sources. This calls for a multifaceted framework, centered around layered network medicine, to address various tasks:

1.
Drug Combos Layer: Using manually annotated clinical trial descriptions, we engineered a ChatGPT model with few-shot learning to recognize drugs and identify keywords and symbols that indicate the combination of two or more drugs in a trial. Examples such as “combined with,” “in addition to,” and ”+” were included in the training examples. The results were pruned using the drug branch in the ChEBI ontology [26].
2.
Drug-Target Layer: The drug combinations identified in the previous layer serve as the source elements of the graph, while the targets are elements in the breast cancer KEGG signaling pathways. The links between these entities are captured based on the proximity distance in biomedical literature abstracts. Proximity is also commonly used in feature extraction from text [27], which is foundational in network medicine [28, 18, 29].
3.
Biological Target Signaling Pathways The success of this work hinges on recommending viable drug combinations. Therefore, we introduce an algorithmic approach that explores the various layers and produces recommendations with a high degree of specificity, and may be ready to be tested using in-vitro and clinical trials.

III-A Drug Combinations Layer

The description section of clinical trials is a rich source of knowledge and presents various clinical opportunities. By mining these resources, we can gain useful insights (e.g., the medical condition, treatment, indication, protocol followed, etc). To start, we extracted the description field from the JSON representation of the NCT records. We then hand-picked a few representative examples from trials that contained various drugs for different conditions, where the trials also investigated potential drug combinations. Each selected example signaled the use of combinations in a unique way. For instance, while one trial explicitly mentioned a drug to be combined using expressions like combined with,” co-administered with,” and “in addition to,” other trials used symbols such as “+” and “PLUS.”

To enable ChatGPT to recognize the drugs and combination signaling keywords, we manually annotated seven distinct examples as few-shot learning prompts for the prompt engineering algorithm. Specifically, we used different notations to distinguish drug mentions from combination signaling keywords. Drug mentions were annotated using the angle brackets “ $\langle$ ” and “ $\rangle$ ”, while combination keywords were annotated using square brackets “[” and “]”. Below are some concrete examples we used to train ChatGPT, illustrating the annotation method (refer to III-A2).

As for the actual prompt itself, it contained three types of information: (1) a system role, where the content instructed ChatGPT to adopt the role of a subject matter expert capable of recognizing drugs and combinations from clinical trials; (2) a user role, where the user instructs ChatGPT to identify drugs and combinations from unseen clinical trial descriptions by learning from the seven manually annotated examples provided; (3) an assistant role, which provides new but unannotated clinical trial descriptions. The prompt required the return of drugs and their combinations from each trial when found. It also instructed the ChatGPT engine that each drug combination to be written in a single line for easier post-processing. This was demonstrated to ChatGPT using two examples which we also injected as part of the prompt. Therefore, the prompt engineering algorithm expects various user inputs to be used in order: (a) the list of annotated examples used as few-shots for ChatGPT to learn from, (b) one clinical trial description at a time to be analyzed, (c) the ID of the clinical trial being processed, and (d) training output examples for ChatGPT to produce the output accordingly. These steps are described in detail in Algorithm 1.

1:Input: $nct\_dataset$ dataset

2:Input: [shot_0, …, shot_6] example annotations

3:Input: [comb_ex1, comb_ex2] example output

4:Output: response (ChatGPT response with detected drug combinations)

5:foreach $(nct\_desc)$ in $(nct\_dataset)$ do

6:prompt $\leftarrow$

7: """

8:You are a specialized drug annotator, detect drugs if annotated/marked using < and > and relationship if it is annotated/marked using [ and ]

10:Your task is to learn from the following few shots:

III-B Drug-Target Layer

The description sections of clinical trials are rich with drug names and their combinations. However, they often lack information about drug targets, which is a key indication for a drug to be repurposed. Drug-target interaction is essential in drug repurposing [30]. Therefore, it is crucial to explore the signaling pathways between drugs and targets to predict such repurposeable targets [31]. Since biomedical literature is rich in such knowledge [32], we employ a text-mining approach to identify drug names, drug targets, and possible interactions. Specifically, we use the Chemical Entities of Biological Interest (ChEBI) to identify drug names [26] and the proteins that constitute a cancer-type module in the KEGG signaling pathways database (e.g., breast cancer) to identify protein mentions in the abstracts [33]. While the cancer-disease pathways play a significant role in predicting repurposable combinations, we utilize them individually as independent entities. The following tasks are involved in constructing this layer:

1.
Abstract Text Processing: This includes sentence and word tokenization, and case normalization. For this task, we used the word_tokenize and sent_tokenize modules from the “nltk.tokenize” Python package.
2.
ChEBI Ontology for Drug Search: The identification of drugs requires a dictionary look-up mechanism where PubMed abstract words must be an identical match with one of the given terms.
3.
Cancer Diseases KEGG Signaling Pathways Pre-Processing: The breast cancer pathways branches are XML resources that require parsing. Starting from the root, we traverse the XML tree to search for elements of type “Gene.” We extract two bits of information: (1) the “name” of the branch, which has a unique ID, and (2) a set of proteins, known as “graphics,” which make up each pathway as part of the cancer module. The total number of unique targets found associated with breast cancer is 383 proteins. Figure 1 shows the various signaling pathways as part of the Breast Cancer KEGG module
4.
Keeping Track of Drug-Target Proximity: This task requires measuring the proximity of a drug to proteins that are part of the pathways extracted in the step above. By capturing the index of each term, we can measure the distance in token units.
5.
Constructing the Drug-Target Layer: Using the proximity as a condition to satisfy, we capture drug-target pairs that are found within the given proximity. They contribute a node of their own type (drug and protein) and a link of type proximity to the network layer. The network construction task is formally described in Algorithm 2, and is implemented using the using the “networkx” python package [34].

Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer (1)

1:Input: $\mathcal{C}$ : CHEBI drug list

2:Input: $\mathcal{P}$ : List of breast cancer proteins

3:Output: $\mathcal{G}$ : Graph of drug-target pairs

4:Initialize $\mathcal{G}\leftarrow(\mathcal{V},\mathcal{E})$ $\triangleright$ Graph with nodes $\mathcal{V}$ and edges $\mathcal{E}$

5: $\mathcal{V}\leftarrow\emptyset$ $\triangleright$ Initialize set of vertices

6: $\mathcal{E}\leftarrow\emptyset$ $\triangleright$ Initialize set of edges

7:foreach $(\delta,\alpha)$ in PubMed datasetdo

8: $\tau\leftarrow\text{tokenize}(\alpha)$ $\triangleright$ Tokenize abstract $\alpha$

9: $\text{doc\_edges}\leftarrow[]$ $\triangleright$ Initialize document edge list

10:foreach $d\in\mathcal{C}$ do

11:foreach $p\in\mathcal{P}$ do

12:if $d\in\tau$ and $p\in\tau$ then

13: $\pi\leftarrow\text{calc\_distance}(d,p)$ $\triangleright$ Calculate proximity $\pi$

14: $\epsilon\leftarrow(d,p)$ $\triangleright$ Create drug-protein link

15: $\lambda\leftarrow(\delta,\epsilon,\pi)$ $\triangleright$ Create a proximity edge

16: $\text{doc\_edges}\leftarrow\text{doc\_edges}\cup\{\lambda\}$

17: $\mathcal{V}\leftarrow\mathcal{V}\cup\{d,p\}$ $\triangleright$ Add $d$ and $p$ to vertices

18: $\mathcal{E}\leftarrow\mathcal{E}\cup\{\epsilon\}$ $\triangleright$ Add $\epsilon$ to edges

19:endif

20:endfor

21:endfor

22:endfor

23: $\mathcal{G}\leftarrow(\mathcal{V},\mathcal{E})$ $\triangleright$ Finalize graph $\mathcal{G}$

24:return $\mathcal{G}$

III-C Biological Targets Pathways Layer

Upon successfully constructing a drug combination layer from clinical trials and a drug-target layer from biomedical literature, we require additional information to determine if a drug-target link is cancer-related. This crucial information can be derived from the breast cancer KEGG pathways, conditioned on identifying a target present in the drug-target layer that also corresponds to one of the cancer biological pathways extracted from the KEGG database. The algorithmic process consists of three tasks:

1.
Starting with the drug-target layer, examine the target nodes of each edge.
2.
Verify whether the target exists in a cancer pathway. If two or more different drugs are found connected with different proteins of the the same pathway, store them as potential candidates.
3.
Search the drug combination layer to determine if the drug candidates identified in step 2 also form a combination in clinical trials. If they do, store them as evidence for repurposing; otherwise, store them as potential combinations pending further investigation.

1:Input: $\mathcal{L}_{1}$ : Drug-target network layer

2:Input: $\mathcal{L}_{2}$ : Drug combination network layer

3:Input: $\mathcal{C}$ : Cancer disease pathway layer

4:Output: $\mathcal{R}$ : Drug combinations evidence graph

5:Initialize $\Gamma\leftarrow\emptyset$ $\triangleright$ Temporary graph storage

6:Initialize $\mathcal{R}\leftarrow\emptyset$ $\triangleright$ Result graph

7:for all $e\in\mathcal{L}_{1}$ do

8:for all $c\in\mathcal{L}_{2}$ do

9:for all $d\in e$ do

10:if $d\in\mathcal{L}_{2}$ then

11: $\Gamma\leftarrow\Gamma\cup\{(d,\tau,\nu)\}$ $\triangleright$ $\nu$ : Clinical trial ID, $\tau$ : Target

12:else

13: $\Gamma\leftarrow\Gamma\cup\{(d,\tau)\}$

14:endif

15:endfor

16:endfor

17:endfor

18:for all $e\in\Gamma$ do

19:for all $\pi\in\mathcal{C}$ do

20:if $\tau\in\pi$ then

21:if $e$ has $\nu$ then

22: $\mathcal{R}\leftarrow\mathcal{R}\cup\{(d,\tau,\pi_{\text{id}},\nu)\}$ $\triangleright$ $\pi_{\text{id}}$ : Pathway ID

23:else

24: $\mathcal{R}\leftarrow\mathcal{R}\cup\{(d,\tau,\pi_{\text{id}})\}$

25:endif

26:endif

27:endfor

28:endfor

29:return $\mathcal{R}$

Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer (2)

IV Results

IV-A The Prompt-Engineering Outcome

The few-shot prompt engineering against 2496 clinical trials resulted in 3281 drugs and 4502 “co-occurrence” relationship among each pair. We found that 2754 of the co-occurrence relationships were semantically encoded as “combination therapy”, which signaled that the drugs that make up the two ends of the link are suggested as a combination. Since we aim to identify combinations from FDA-approved drugs, we validated drug members of the combinations using the OpenFDA database. The outcome of such validation confirmed that 2680 nodes identified by ChatGPT as recognizable, while the remaining 601 nodes were terms that are either not FDA-approved drugs or simply noise.

From the terms identified by ChatGPT as FDA-approved and recognizable (e.g., ’Oral anti-diabetics’, ’Antiretroviral Therapy’, ’Psychotherapy’, ’Chemotherapy treatment’, ’HIV medications’), it’s clear these are descriptive terms for drugs, but not specific for considering drug combinations. However, a small number of identified drugs, despite being part of FDA-approved terms, turned out to be false positives. ChatGPT mistakenly identified terms such as (’HIV infection’, ’Merkel cell cancer’, ’Prostate Cancer’) as drugs.

Conversely, many terms identified by ChatGPT that were not recognized by OpenFDA were indeed drugs. This discrepancy arises because OpenFDA lists only drugs that are already approved. Drugs identified by ChatGPT that follow a pattern like manufacturing initials followed by digits (e.g., AZD6738 by AstraZeneca, BMS-791325 by Bristol-Myers Squibb, JNJ-42847922 by Johnson and Johnson, TAK-491 by Takeda) are typically still under investigation and have not yet been approved for production.

Since the outcome of the prompt-engineering step forms a foundational block for subsequent steps, our focus is on removing false positives and confirming true negatives. The consideration of drugs under investigation requires careful inclusion. Table I show the general outcome of the FDA-approved.

#Nodes	# Edges	#Validated Nodes	# Validated Edges
3281	4502	2680	2754

IV-B The Outcome of Generating of Drug-Target Combinations Algorithm

The discovery process executed by Algorithm 3yielded 18,293 unique PubMed documents, covering drug-target relationships across 36-46 breast cancer signaling pathways. This coverage was explored using 5 different proximity values (10, 20, 30, 40, and 50 tokens) as they appeared in the original PubMed abstracts. The process captured each signaling pathway along with the proteins comprising it, drugs associated with those proteins as potential targets, and the proximity between each drug and target in the corresponding PubMed abstracts.

A common pattern observed was that shorter proximity values correlated with fewer occurrences between a drug and a protein within a given signaling pathway, and fewer supporting PubMed abstracts for such connections. Contrary to expectation, not all cases showed an increase in the number of proteins, as seen in pathway ID hsa:2353.

Table II presents 20 signaling pathways grouped by proximity. Each group includes the number of supporting PubMed abstracts, the count of drugs, and the mentions of proteins as potential targets. The table highlights pathways such as signaling pathway ID hsa:10000, supported by numerous drugs but covering only one protein.

Pathway	Prox-10			Prox-30			Prox-50
Pathway	No. PubMed	No. Drugs	No. Proteins	No. PubMed	No. Drugs	No. Proteins	No. PubMed	No. Drugs	No. Proteins
hsa:2099	33	25	4	79	59	4	98	69	4
hsa:1956	27	20	4	49	37	5	62	42	6
hsa:8202	1	1	1	9	6	4	17	11	6
hsa:5604	5	5	2	6	6	2	9	9	2
hsa:5594	10	9	4	37	26	5	49	35	6
hsa:182	2	2	1	6	8	2	9	12	2
hsa:4851	3	3	1	5	5	1	8	8	1
hsa:2475	17	17	1	25	24	2	27	31	2
hsa:3479	10	5	3	16	11	3	24	16	3
hsa:3480	1	1	1	6	5	2	10	8	2
hsa:2064	48	22	7	86	39	8	113	51	8
hsa:2353	8	7	3	19	14	3	24	15	3
hsa:5241	14	12	2	57	42	2	66	50	2
hsa:5290	16	11	3	35	28	4	47	33	4
hsa:10000	1	1	1	2	2	1	4	5	1
hsa:5728	6	6	3	17	16	3	30	28	3
hsa:595	2	2	1	9	10	1	14	14	2
hsa:4609	11	10	2	29	23	2	42	34	2
hsa:1019	3	3	1	8	8	1	12	13	1
hsa:5925	12	9	3	28	22	3	34	26	3

Taking a closer look to the summary of the statistics of the covered drugs and their potential target. On the one hand, the analysis shows that the means of the drug frequencies increased as the distance also increased (7, 11, 14, 17, 18), while the max number of drug found, also exhibited the same behavior, ranging from (25, 43, 59, 65, 69). On the other hand, the means of protein targets frequencies were much lower in value than drug, which ranged in values (1.9, 2.1, 2.29, 2.4, 2.4) for each of the proximity respectively. While the max frequencies were much lower than the counterparts of drugs (7, 8, 8, 8, 8). Overall, the ratios between drug frequencies vs protein frequencies in terms of mean and max are (0.27, 0.19, 0.16, 0.14, 0.13) and (0.28, 0.19, 0.14, 0.12, 0.12) respectively. Figure 3 is a heatmaps that shows the summary of this drug protein frequencies for each of the five proximity parameters.

Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer (3)

Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer (4)

As for the PubMed supporting evidence for the drug-target pairs, we found that the means of supporting number of documents are (9, 14, 17, 20, 23) while the max (48, 72, 86, 98, 113). Here, we observe that the Frequencies of PubMed supporting documents increase as the proximity distance also increases. By measuring the mean-to-maximum ratio which produces a ratio of (0.18, 0.19, 0.19, 0.20, 0.20) which indicate that means increased higher that the max as the distance increased. This is explained the PubMed dataset has a finite number of documents that support drug-targets at some point reaches its maximum and the distance becomes irrelevant.

Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer (5)

The drug combination layer that was previously produced by the ChatGPT prompt-engineering process produced a graph of 3281 nodes and 4502 edges. The breast cancer pathways is comprised of 46 pathways which were supported by 1178 drugs where at least 2 drugs supported each pathway. The execution of Algorithm 2 resulting in confirming that 980 drugs discovered from the literature had a target on the breast cancer 46 pathways, while 198 drugs had no association with any protein targets of any of the pathways. Table III captures the summary of each pathway and how many drugs are supporting each path. While hsa:2064 and hsa:2099 have the most coverage from drugs (108, and 91), pathways such as hsa:6654 and hsa:1499 has the fewest number of drugs as coverage from drugs (3 and 2). It is our intuition such the prediction of drugs combinations for the pathways that has low coverage would be most valuable as it seems that little is known about such pathways.

Path ID	Drug Count	Path ID	Drug Count
hsa:2099	91	hsa:4609	37
hsa:1956	68	hsa:1499	2
hsa:8202	10	hsa:1019	10
hsa:3265	3	hsa:5925	32
hsa:5604	10	hsa:1869	8
hsa:5594	43	hsa:3815	38
hsa:2885	5	hsa:1026	42
hsa:6654	3	hsa:2260	4
hsa:182	11	hsa:8600	9
hsa:4851	5	hsa:1950	16
hsa:2475	29	hsa:672	34
hsa:6198	4	hsa:675	26
hsa:3479	25	hsa:2246	3
hsa:3480	7	hsa:2324	14
hsa:2064	108	hsa:7157	64
hsa:2353	19	hsa:581	26
hsa:5241	66	hsa:578	13
hsa:5290	45	hsa:10000	4
hsa:5728	28	hsa:595	12

V Discussion

This paper introduces a novel Generative AI approach to prompt engineering, leveraging ChatGPT with a few-shot training set. While the final results demonstrated high accuracy, achieving satisfactory prompts required multiple iterations. The primary goal was to extract drug names and identify their potential combinations as mentioned in clinical trials. The prompt was dynamically constructed using configuration parameters: (1) the actual clinical trial descriptions, (2) examples teaching ChatGPT drug names and how combinations are indicated (e.g., by symbols like ”+”), and (3) instructions for ChatGPT to format responses in a specific manner — specifically, pairing NCT_ID with two drugs per line.

Despite learning from the provided examples, ChatGPT occasionally deviated from instructions by listing more than two drugs on a line. We also experimented with structured formats like XML and JSON, but encountered challenges with producing valid outputs consistently. Interestingly, ChatGPT successfully recognized drug names and categories, even when they were not explicitly concrete drug names.

In extracting drug-target links from biomedical abstracts, each pair was encoded with proximity distance measured in tokens. Proximity proved effective in filtering noise by reducing unnecessary candidates while enhancing coverage when information about signaling pathways, proteins, and drug targets was limited. The adjustability of proximity is crucial, as closer distances typically indicate higher relevance.

The study focused on utilizing KEGG breast cancer signaling pathways as a case study for our framework to recommend drug opportunities. However, while some signaling pathways were well-covered with over 100 candidates, others had minimal coverage, with as few as one candidate. These were not included in Table III as our framework specializes in generating drug combinations. It is important to further investigate the significance of these proteins as potential targets and their roles in cancer progression. Future enhancements may involve integrating data from other databases such as the STRING database [35] and the IntAct database [36] to explore new protein-protein interactions targeted by existing drugs, potentially offering novel treatment strategies.

VI Conclusion and Future Directions

In this paper, we introduced a multilayered network medicine framework empowered by Generative AI to expedite drug development and discovery. Our framework specializes in predicting potentially repurposable drug combinations for complex diseases in general, and we demonstrated a case study for breast cancer therapy. Utilizing literature-based drug-target co-occurrences, we algorithmically explored and identified drugs targeting multiple proteins across the 46 KEGG signaling pathways associated with breast cancer, indicating potential combinations.

This research pushed the boundaries of Generative AI by leveraging ChatGPT for prompt engineering against real-world evidence sourced from clinical trials. We developed a parameterized system capable of dynamically analyzing clinical trial data to generate prompt instances, enriched with learning from a few-shot training approach. While ChatGPT successfully recognized all drugs mentioned in clinical trials and literature, it faced challenges in accurately identifying drug combinations as instructed by the prompt. We mitigated this issue by programmatically constructing the combinations.

Looking forward, our framework aims to expand its scope to investigate drug repurposing across various cancer types. Initially, we plan to focus on prostate cancer and lung cancer, integrating relevant literature and KEGG signaling pathways specific to these diseases. Additionally, ongoing efforts will involve refining ChatGPT prompts to enhance prediction accuracy of drug combination relationships from diverse sources.

To ensure safe and effective combination therapies with minimal adverse effects, we intend to conduct further validation of drug similarities. This process will prevent overly similar drugs from being combined, thereby reducing potential toxicity and side effects.

Ultimately, the final product of this research will be the deployment of our findings in a ChatGPT-based interactive service. This service will enable users to inquire about predicted drugs for repurposing, signaling pathway support for target proteins, and other relevant data pertaining to drug combinations. To ensure traceability, explainability, and trust, to follow some of most informing policies [37], on how combinations were determined, we document the source of each combination from clinical trials and PubMed abstracts, specifying the proteins within specific signaling pathways that the drugs are likely to bind with.

VII Acknowledgments

This research is not yet funded and this work is done as foundational for future funding. The authors would like to thank Dr. Pawel Mikulski of iMol PAS and Dr. Sachin Kote of Gdansk University for the valuable discussions.

References

[1]BradleyA Maron, Lucia Altucci, Jean-Luc Balligand, Jan Baumbach, PeterFerdinandy, Sebastiano Filetti, Paolo Parini, Enrico Petrillo, EdwinKSilverman, Albert-László Barabási, etal.A global network for network medicine.NPJ systems biology and applications, 6(1):29, 2020.
[2]Albert-László Barabási, Natali Gulbahce, and Joseph Loscalzo.Network medicine: a network-based approach to human disease.Nature reviews genetics, 12(1):56–68, 2011.
[3]Sepideh Sadegh, James Skelton, Elisa Anastasi, Judith Bernett, DavidBBlumenthal, Gihanna Galindez, Marisol Salgado-Albarrán, Olga Lazareva,Keith Flanagan, Simon co*ckell, etal.Network medicine for disease module identification and drugrepurposing with the nedrex platform.Nature Communications, 12(1):6848, 2021.
[4]Sepideh Sadegh, Julian Matschinske, DavidB Blumenthal, Gihanna Galindez, TimKacprowski, Markus List, Reza Nasirigerdeh, Mhaned Oubounyt, AndreasPichlmair, TimDaniel Rose, etal.Exploring the sars-cov-2 virus-host-drug interactome for drugrepurposing.Nature communications, 11(1):3518, 2020.
[5]AhmedAbdeen Hamed, TamerE Fandy, KarolinaL Tkaczuk, Karin Verspoor, andByungSuk Lee.Covid-19 drug repurposing: A network-based framework for exploringbiomedical literature and clinical trials for possible treatments.Pharmaceutics, 14(3):567, 2022.
[6]AhmedAbdeen Hamed, Jakub Jonczyk, MohammadZaiyan Alam, Ewa Deelman, andByungSuk Lee.Mining literature-based knowledge graph for predicting combinationtherapeutics: A covid-19 use case.In 2022 IEEE International Conference on Knowledge Graph(ICKG), pages 79–86. IEEE, 2022.
[7]Deisy MorselliGysi, Ítalo DoValle, Marinka Zitnik, Asher Ameli, Xiao Gan,Onur Varol, SusanDina Ghiassian, JJPatten, RobertA Davey, Joseph Loscalzo,etal.Network medicine framework for identifying drug-repurposingopportunities for covid-19.Proceedings of the National Academy of Sciences,118(19):e2025581118, 2021.
[8]LyndseyElaine Gates and AhmedAbdeen Hamed.The anatomy of the sars-cov-2 biomedical literature: Introducing thecovidx network algorithm for drug repurposing recommendation.Journal of medical Internet research, 22(8):e21169, 2020.
[9]Yadi Zhou, Yuan Hou, Jiayu Shen, Reena Mehra, Asha Kallianpur, DanielA Culver,MichaelaU Gack, Samar Farha, Joe Zein, Suzy Comhair, etal.A network medicine approach to investigation and population-basedvalidation of disease manifestations and drug repurposing for covid-19.PLoS biology, 18(11):e3000970, 2020.
[10]Suzana deSiqueiraSantos, Mateo Torres, Diego Galeano, María delMarSánchez, Luca Cernuzzi, and Alberto Paccanaro.Machine learning and network medicine approaches for drugrepositioning for covid-19.Patterns, 3(1), 2022.
[11]Chatgpt.Online: https://chat.openai.com, 2023.Accessed August 15, 2023.
[12]Rui Xu and Zhong Wang.Chatgpt in healthcare from the perspective of digital media:Applications, opportunities and challenges.Heliyon, 2024.
[13]DanissaV Rodriguez, Katharine Lawrence, Javier Gonzalez, BeatrixBrandfield-Harvey, Lynn Xu, Sumaiya Tasneem, DefneL Levine, and Devin Mann.Leveraging generative ai tools to support the development of digitalsolutions in health care research: case study.JMIR Human Factors, 11(1):e52885, 2024.
[14]Neil Savage.Drug discovery companies are customizing chatgpt: here’s how.Nat Biotechnol, 41(5):585–586, 2023.
[15]H.Chen, S.Zhang, and L.etal. Zhang.Multi role chatgpt framework for transforming medical data analysis.Scientific Reports, 14, 2024.
[16]Ewen Callaway.‘chatgpt for crispr’creates new gene-editing tools.Nature, 629(8011):272–272, 2024.
[17]Puneet Sharma, Guangze Luo, Cindy Wang, Dara Brodsky, CamiliaR Martin, AndrewBeam, and Kristyn Beam.Assessment of the clinical knowledge of chatgpt-4 inneonatal-perinatal medicine: a comparative analysis with chatgpt-3.5.Journal of Perinatology, pages 1–2, 2024.
[18]RionBrattig Correia, JordanC Rozum, Leonard Cross, Jack Felag, MichaelGallant, Ziqi Guo, BruceW HerrII, Aehong Min, DeborahStungis Rocha, XuanWang, etal.myaura: Personalized health library for epilepsy management viaknowledge graph sparsification and visualization.arXiv preprint arXiv:2405.05229, 2024.
[19]Sangseon Lee, Joonhyeong Park, Yinhua Piao, Dohoon Lee, Danyeong Lee, and SunKim.Multi-layered knowledge graph neural network reveals pathway-levelagreement of three breast cancer multi-gene assays.Computational and Structural Biotechnology Journal,23:1715–1724, 2024.
[20]Bohyun Lee, Shuo Zhang, Aleksandar Poleksic, and Lei Xie.Heterogeneous multi-layered network model for omics data integrationand analysis.Frontiers in genetics, 10:501269, 2020.
[21]Finlay MacLean.Knowledge graphs and their applications in drug discovery.Expert opinion on drug discovery, 16(9):1057–1069, 2021.
[22]Dongmin Bang, Sangsoo Lim, Sangseon Lee, and Sun Kim.Biomedical knowledge graph learning for drug repurposing by extendingguilt-by-association to multiple layers.Nature Communications, 14(1):3570, 2023.
[23]MaciejP Polak and Dane Morgan.Extracting accurate materials data from research papers withconversational language models and prompt engineering.Nature Communications, 15(1):1569, 2024.
[24]LiWang, XiChen, XiangWen Deng, Hao Wen, MingKe You, WeiZhi Liu, QiLi, andJian Li.Prompt engineering in consistency and reliability with theevidence-based guideline for llms.npj Digital Medicine, 7(1):41, 2024.
[25]ScottH Snyder, PatriciaA Vignaux, MustafaKemal Ozalp, Jacob Gerlach, AnaCPuhl, ThomasR Lane, John Corbett, Fabio Urbina, and Sean Ekins.The goldilocks paradigm: comparing classical machine learning, largelanguage models, and few-shot learning for drug discovery applications.Communications Chemistry, 7(1):134, 2024.
[26]Duncan Hull, Zara Josephs, Gareth Owen, Steve Turner, Marcus Ennis, Nico Adams,Adriano Dekker, Paula deMatos, Janna Hastings, Christoph Steinbeck, etal.Chebi–an open-access chemistry resource for the life sciences:*facilities for on-line submission and curation.Nature Precedings, pages 1–1, 2010.
[27]AhmedAbdeen Hamed and Xindong Wu.Detection of chatgpt fake science with the xfakesci learningalgorithm.arXiv preprint arXiv:2308.11767, 2023.
[28]Xiao Gan, Zixin Shu, Xinyan Wang, Dengying Yan, Jun Li, Shany Ofaim, RékaAlbert, Xiaodong Li, Baoyan Liu, Xuezhong Zhou, etal.Network medicine framework reveals generic herb-symptom effectivenessof traditional chinese medicine.Science advances, 9(43):eadh0215, 2023.
[29]ItaloF doValle, HarveyG Roweth, MichaelW Malloy, Sofia Moco, Denis Barron,Elisabeth Battinelli, Joseph Loscalzo, and Albert-LászlóBarabási.Network medicine framework shows that proximity of polyphenol targetsand disease proteins predicts therapeutic effects of polyphenols.Nature Food, 2(3):143–155, 2021.
[30]Qing Ye, Chang-Yu Hsieh, Ziyi Yang, YuKang, Jiming Chen, Dongsheng Cao, ShiboHe, and Tingjun Hou.A unified drug–target interaction prediction framework based onknowledge graph and recommendation system.Nature communications, 12(1):6775, 2021.
[31]Nikolai Hecker, Jessica Ahmed, Joachim von Eichborn, Mathias Dunkel, KarelMacha, Andreas Eckert, MichaelK. Gilson, PhilipE. Bourne, and RobertPreissner.SuperTarget goes quantitative: update on drug–targetinteractions.Nucleic Acids Research, 40(D1):D1113–D1117, 11 2011.
[32]Lixiang Hong, Jinjian Lin, Shuya Li, Fangping Wan, Hui Yang, Tao Jiang, DanZhao, and Jianyang Zeng.A novel machine learning framework for automated biomedical relationextraction from large-scale literature repositories.Nature Machine Intelligence, 2(6):347–355, 2020.
[33]Minoru Kanehisa and Susumu Goto.Kegg: Kyoto encyclopedia of genes and genomes.Nucleic acids research, 28(1):27–30, 2000.
[34]Aric Hagberg, PieterJ. Swart, and DanielA. Schult.Exploring network structure, dynamics, and function using networkx.1 2008.
[35]Damian Szklarczyk, AnnikaL Gable, KaterinaC Nastou, David Lyon, RebeccaKirsch, Sampo Pyysalo, NadezhdaT Doncheva, Marc Legeay, Tao Fang, Peer Bork,etal.The string database in 2021: customizable protein–protein networks,and functional characterization of user-uploaded gene/measurement sets.Nucleic acids research, 49(D1):D605–D612, 2021.
[36]Henning Hermjakob, Luisa Montecchi-Palazzi, Chris Lewington, Sugath Mudali,Samuel Kerrien, Sandra Orchard, Martin Vingron, Bernd Roechert, PeterRoepstorff, Alfonso Valencia, etal.Intact: an open source molecular interaction database.Nucleic acids research, 32(suppl_1):D452–D455, 2004.
[37]AhmedAbdeen Hamed, Malgorzata Zachara-Szymanska, and Xindong Wu.Safeguarding authenticity for mitigating the harms of generative ai:Issues, research agenda, and policies for detection, fact-checking, andethical ai.iScience, 2024.