Local cancer patient cohorts that combine clinical and genomic data are typically small and heterogeneous, and the high dimensionality of molecular data makes learning complex treatment outcome mechanisms from them challenging with traditional statistical and machine learning (ML) approaches. Recent advances in artificial intelligence (AI) have transformed how we can statistically learn from heterogeneous data sources. Instead of training models on each dataset independently with conventional supervised ML, we will employ an AI foundation model approach, which leverages unsupervised and self-supervised tasks across multiple datasets to learn a shared representation, or embedding. This allows us to capture a common representation of complex treatment response mechanisms across different cancer types and treatment categories. Once this representation is learned, the model can be fine-tuned for, or directly applied to, a specific downstream task even with a limited number of samples.

To uncover novel treatment response patterns using AI on large-scale real-world data, we will gather diverse large-scale datasets that include rich genomic data modalities together with clinical information such as treatments and treatment responses. We will undertake the following steps: feature engineering based on known and established biomarker and pathway extraction; training an AI foundation model to learn cancer treatment response mechanisms through a series of pan-cancer and pan-treatment pre-training tasks; and fine-tuning and benchmarking the model on a range of cancer treatment response prediction downstream tasks. Finally, we will identify multi-modal signatures that are predictive of different treatment responses, supporting patient stratification and the prioritization of new drug targets.

By using diverse cancer data sources in a unified approach, we hope to improve performance on many counterfactual treatment response prediction tasks across multiple cancer types and treatments.
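The pre-train-then-fine-tune paradigm described above can be sketched in miniature as follows. This is an illustrative assumption, not the project's actual model or data: the two synthetic "cohorts", the PCA-based shared embedding (a linear stand-in for self-supervised pre-training), and the least-squares downstream head are all placeholders for the real foundation model pipeline.

```python
# Sketch of the foundation-model workflow: (1) learn a shared embedding from
# pooled unlabeled cohorts, (2) fine-tune a lightweight downstream head on a
# small labeled subset of one cohort. All components here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def pretrain_embedding(pooled_X, dim):
    """Learn a shared linear embedding from pooled unlabeled data via SVD
    (a simple stand-in for self-supervised pre-training)."""
    mean = pooled_X.mean(axis=0)
    _, _, Vt = np.linalg.svd(pooled_X - mean, full_matrices=False)
    return mean, Vt[:dim].T  # projection matrix: n_features x dim

def embed(X, mean, W):
    return (X - mean) @ W

# Two synthetic "cohorts" sharing latent structure (stand-ins for real
# multi-modal datasets from different cancer types).
latent = rng.normal(size=(300, 5))
mix = rng.normal(size=(5, 50))
cohort_a = latent[:200] @ mix + 0.1 * rng.normal(size=(200, 50))
cohort_b = latent[200:] @ mix + 0.1 * rng.normal(size=(100, 50))

# Pre-training: unsupervised, on all cohorts pooled together.
mean, W = pretrain_embedding(np.vstack([cohort_a, cohort_b]), dim=5)

# Fine-tuning: only 30 labeled samples from cohort B, on top of the
# frozen shared embedding.
Z = np.c_[np.ones(30), embed(cohort_b[:30], mean, W)]
y = (latent[200:230, 0] > 0).astype(float)  # synthetic "response" label
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Downstream prediction on held-out cohort-B samples.
Z_test = np.c_[np.ones(70), embed(cohort_b[30:], mean, W)]
pred = (Z_test @ beta > 0.5).astype(float)
truth = (latent[230:, 0] > 0).astype(float)
accuracy = (pred == truth).mean()
```

Because the embedding is learned from all cohorts jointly, the downstream head needs far fewer labeled samples than a model trained from scratch on cohort B alone, which is the core rationale for the foundation-model approach in small, heterogeneous cohorts.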