Click-through rate (CTR) prediction is a crucial task in recommendation systems. Several public CTR datasets have emerged, but they have certain limitations. Firstly, existing datasets cover only a single scenario with a single type of item, whereas users typically click different types of items across multiple scenarios; modeling across scenarios provides a more comprehensive understanding of users. Secondly, multi-modal features are important in multi-scenario prediction because they bridge the inconsistent ID encodings across different scenarios. However, existing datasets rely solely on ID features and lack multi-modal ones. Thirdly, the scale of existing datasets is small compared with real-world CTR prediction, limiting the evaluation of models. To overcome these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset built from industrial data at Alipay. AntM$^{2}$C offers several advantages:
1) It covers CTR data for 5 different types of items, providing insights into user preferences across advertisements, vouchers, mini-programs, contents, and videos.
2) In addition to ID-based features, AntM$^{2}$C provides 2 multi-modal features, namely raw text and image features, establishing effective connections between items with different IDs.
3) AntM$^{2}$C includes 1 billion CTR records with 200 features, covering 200 million users and 6 million items, making it currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.