Large Model Based Crossmodal Chinese Poetry Creation

Abstract
Generating Chinese poetry is a complex task with significant potential for large models. However, most current systems support only a single input modality, and their output lacks interpretability. This paper proposes a large-model-based system that supports cross-modal text and image input, provides interpretable annotations for the generated Chinese poems, and supports multiple rounds of iterative optimization. First, it analyzes images with CLIP and MiniGPT-4 and uses ERNIE-4.0 to generate descriptive text from the analysis. Then, it uses ERNIE-4.0 to generate classical Chinese poems from the input text and the descriptive text, using prompts we devised based on CRISPE. Finally, it evaluates and then optimizes the generated poems with few-shot prompts. Preliminary evaluations validate the efficacy of our poetry scoring criteria and demonstrate the superior performance of the system when text and imagery are combined as cross-modal input.
Publication
In 2024 IEEE Smart World Congress (SWC)