pn-Summary
A well-structured summarization dataset for the Persian language!
Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization 🦁
pn-summary is a well-structured summarization dataset for the Persian language, consisting of 93,207 records. It is prepared for abstractive/extractive summarization tasks (similar to cnn_dailymail for English) and can also be used for related tasks such as Text Generation, Title Generation, and News Category Classification. We have also evaluated this dataset with the following models and techniques:
- mT5: a multilingual pretrained encoder-decoder model
- BERT2BERT: an encoder-decoder architecture that leverages ParsBERT as its encoder and decoder
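
To get a feel for the records, the dataset can be loaded with the Hugging Face datasets library. This is only a minimal sketch: the Hub id pn_summary and the field names article, summary, title, and category are assumptions; check the data description in this repo for the actual schema.

```python
from datasets import load_dataset

# Minimal loading sketch. The Hub id "pn_summary" and the field names
# below are assumptions; consult the repo's data description for the
# actual identifiers and schema.
dataset = load_dataset("pn_summary")

sample = dataset["train"][0]
print(sample["article"])   # full news article (summarization source)
print(sample["summary"])   # abstractive reference summary
print(sample["title"])     # headline, usable for title generation
print(sample["category"])  # label, usable for news category classification
```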
Follow the rest of the repo for more details.
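
Once one of the models above has been fine-tuned on pn-summary, generating a summary looks roughly like the sketch below. The checkpoint id is a placeholder and the generation settings are illustrative; the sketch shows the mT5 case, and a BERT2BERT checkpoint would be loaded analogously (e.g., via transformers' EncoderDecoderModel).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint id: replace it with the mT5 (or BERT2BERT)
# checkpoint released alongside this repo.
model_name = "your-org/pn-summary-mt5-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "..."  # a Persian news article from pn-summary

inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    summary_ids = model.generate(
        **inputs,
        max_length=128,
        num_beams=4,
        no_repeat_ngram_size=3,
    )

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```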
Paper DOI: 10.1109/CSICC52343.2021.9420563 (https://doi.org/10.1109/CSICC52343.2021.9420563)
References
2021
- Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization. In 2021 26th International Computer Conference, Computer Society of Iran (CSICC), 2021.
2020
- ParsBERT: Transformer-based Model for Persian Language Understanding. Neural Processing Letters, 2020.