Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg
Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg
Philipps-Universität Marburg
Motivation – Chinese landscape paintings have visual features and stylistic characteristics that differ markedly from those of their Western counterparts. At the thematic level, they depict local traditions, culture, artistic movements and the geographic variety of the region. At the image level, they show low texture variation owing to the particular handling of brushstroke, color, tone and artistic genre. These landscapes therefore offer valuable insights into the structural and stylistic aspects of the paintings. State-of-the-art image analysis methods make it possible to find semantic correlations across huge image collections, and they often support iconographical analysis. For example, the digitization of the Dunhuang grottoes (a World Heritage Site) has produced substantial new knowledge. In art history, it has supported the categorical and semantic treatment of the objects in the grottoes along the lines of Panofsky's iconography (Wang et al., 2018). In the computational community, it has enabled the recovery of the ruined parts of the murals through automatic restoration techniques. We aim to investigate the semantic aspects of Chinese landscape paintings computationally, using conditional control over image generation.
Learning the basic structure of a painting and its use of artistic style gives conditional control over the generation of similar-looking paintings, which in turn allows a closer reading of the formation process behind those landscapes. Generative adversarial networks (GANs) (Gatys et al., 2016) are a class of deep neural networks that, given a collection of images, learn the distribution of those images, including their styles, and can then generate new images with similar styles. For example, a GAN trained on a set of Impressionist paintings will be able to generate images in the same style.
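For orientation, the adversarial game described above is usually written as the standard GAN minimax objective, in which a generator G maps random noise z to images and a discriminator D estimates the probability that an image comes from the training collection. This is the textbook formulation, shown only to make the idea concrete. The conditional variant used later (Pix2Pix) additionally gives the condition, such as a sketch, to both networks.

```latex
\min_{G}\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

At the optimum the generator's distribution matches the data distribution, which is why a GAN trained on one style produces images in that style.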
Related Work – Generative modeling has been applied to artworks for a variety of tasks, such as artwork synthesis, image editing, style transfer (Gatys et al., 2016) and image-to-image translation. Previous work on landscape generation has mainly focused on image-to-image translation, for example generating landscapes from sketches. These techniques use inputs such as ink-wash tone, brushstrokes or a sketch as the condition for artwork generation. However, popular neural style transfer methods do not work well on Chinese artworks, because these differ markedly in their depiction of texture, abstraction, structure and style. Another way of controlling generation is to use semantic maps (Liu et al., 2019), which delineate different foreground and background objects, as conditional inputs.
Research Gap – The main problem with the above methods is that generative networks often get caught in degenerate solutions (mode collapse), which makes them generate images with limited variety. Image-to-image translation networks use the image itself or its sketch as a conditional input in order to capture more diverse aspects of the paintings, but the generated images often render the finer details of foreground objects unclearly. When only sketches are used, GANs are unable to capture the color and texture distribution of the generated objects; they tend to mix colors or use colors that are inconsistent with the true ones. Semantic maps of each input image can bring color consistency to image generation and also provide control over generating specific objects or regions of interest in the final image. However, such semantic maps are difficult to obtain, since they require manual annotation.
Proposed Approach – In our work, we demonstrate generative modeling of Chinese landscapes using both sketches and semantic segmentation maps of the corresponding input paintings. Specifically, we propose a novel way of generating semantic segmentation maps without any manual annotation. Sketches and segmentation maps of the paintings are not readily available, so obtaining good-quality segmentation maps is a major challenge. Our approach is divided into three stages:
In the first stage, we generate sketches of the Chinese landscape paintings using the Holistically-Nested Edge Detection (HED) edge detector. We chose this detector because it captures the high-level shapes of image structures while also retaining low-level details. Next, we apply the watershed algorithm to segment the images and obtain weak segmentations. The algorithm does not work equally well on all images because of the variety of abstraction in the shapes and styles of the landscapes.
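HED relies on a pretrained network, so the sketch below swaps in a simple Sobel gradient magnitude as the elevation map and implements a basic marker-controlled watershed by priority flooding, using only NumPy. It illustrates the technique under these assumptions and is not the authors' pipeline. The abstract does not say how watershed markers were chosen, so the seeds in the demo are hand-placed, and the function names are hypothetical. In practice a library routine such as scikit-image's `watershed` would normally be used.

```python
import heapq

import numpy as np


def sobel_magnitude(gray):
    """Gradient magnitude from 3x3 Sobel filters (a simple stand-in for HED edges)."""
    g = np.pad(gray.astype(float), 1, mode="edge")
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Correlate the padded image with both kernels by summing shifted copies.
    for dy in range(3):
        for dx in range(3):
            patch = g[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * patch
            gy += ky[dy, dx] * patch
    return np.hypot(gx, gy)


def watershed(elevation, markers):
    """Marker-controlled watershed by priority flooding (no watershed lines).

    elevation: 2-D float array, e.g. an edge map.
    markers:   2-D int array, 0 = unlabelled, >0 = seed label.
    Every pixel connected to a seed ends up with a label.
    """
    labels = markers.copy()
    h, w = elevation.shape
    heap = []
    counter = 0  # tie-breaker so equal elevations pop in insertion order
    for y, x in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (elevation[y, x], counter, y, x))
        counter += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]  # flood from the lowest front
                heapq.heappush(heap, (elevation[ny, nx], counter, ny, nx))
                counter += 1
    return labels


# Demo: a dark left half and a bright right half, one seed in each.
gray = np.zeros((6, 8))
gray[:, 4:] = 1.0
markers = np.zeros((6, 8), dtype=int)
markers[2, 0] = 1
markers[2, 7] = 2
labels = watershed(sobel_magnitude(gray), markers)
# columns 0-3 get label 1, columns 4-7 get label 2
```

Because low-elevation pixels are flooded first, each region grows until it meets the high-gradient ridge, which is why over-abstracted brushwork with weak edges can yield the uneven weak segmentations mentioned above.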
In the second stage, we refine the weak segmentations. First, we color-code them: each region receives the average color of the corresponding overlapping region in the input landscape image. Because these maps have subdued color tones, we apply color equalization to increase their contrast. These maps, together with the corresponding sketches, are then used to train a U-Net (similar to the generator of the Pix2Pix GAN) for multi-task optimization. The landscape paintings are the inputs and the segmentation maps and sketches are the outputs, while the optimization is constrained by the sketches only. Optimizing for sketches improves the segmentation maps by aligning them with the structure of the sketches, so that the network subsequently produces better, color-consistent segmentations.
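A minimal NumPy sketch of the color-coding and equalization steps follows. It assumes the painting is an RGB float array in [0, 1] and the weak segmentation is an integer label image of the same height and width. The abstract does not specify the equalization method, so "color equalize" is interpreted here as per-channel histogram equalization. This is an assumption: equalizing only a luminance channel would preserve hue better. The function names are hypothetical, and the U-Net refinement itself is not sketched.

```python
import numpy as np


def color_code(image, labels):
    """Fill every segment with the mean RGB color of the painting pixels it covers."""
    out = np.zeros_like(image, dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = image[mask].mean(axis=0)  # (3,) mean broadcast over the region
    return out


def equalize_channel(channel, levels=256):
    """Histogram equalization of one channel with values in [0, 1]."""
    q = np.clip((channel * (levels - 1)).round().astype(int), 0, levels - 1)
    hist = np.bincount(q.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:
        return channel.astype(float)  # a flat channel has nothing to stretch
    return (cdf[q] - cdf_min) / denom  # map the CDF onto [0, 1]


def equalize(rgb):
    """Assumed per-channel equalization of a color-coded map of shape (H, W, 3)."""
    return np.stack(
        [equalize_channel(rgb[..., c]) for c in range(rgb.shape[-1])], axis=-1
    )
```

Usage: `equalize(color_code(painting, weak_labels))` yields a high-contrast, color-coded map that can serve, with the sketch, as a training target for the refinement network.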
In the third and final stage, we use the Pix2Pix GAN to generate samples of Chinese landscape paintings from input sketches and segmentation maps. A previous work (Xue, 2021) explores a similar method but uses only sketches; we additionally use high-quality segmentation maps for better generation of foreground regions. Training time is long and depends mainly on the dataset size and the input image size. Larger input images usually improve the quality of finer objects but also lengthen training.
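Pix2Pix accepts an arbitrary number of input channels, so one plausible way to feed both conditions is to concatenate them along the channel axis. The sketch below assumes this channel concatenation, a 1-channel sketch and a 3-channel color-coded map scaled to [0, 1]. The abstract does not state how the two inputs were combined or which loss weights were used. The sketch also writes out the Pix2Pix generator objective: a non-saturating adversarial term plus an L1 reconstruction term weighted by λ, with λ = 100 as used in the original Pix2Pix work. Function names are hypothetical.

```python
import numpy as np


def build_condition(sketch, seg_map):
    """Stack a sketch (H, W) or (H, W, 1) and a color-coded map (H, W, 3) into (H, W, 4)."""
    if sketch.ndim == 2:
        sketch = sketch[..., np.newaxis]
    if sketch.shape[:2] != seg_map.shape[:2]:
        raise ValueError("sketch and segmentation map must share height and width")
    return np.concatenate([sketch, seg_map], axis=-1).astype(np.float32)


def pix2pix_generator_loss(d_fake, fake, target, lam=100.0, eps=1e-8):
    """Adversarial term -E[log D(x, G(x))] plus lam * mean |y - G(x)|.

    d_fake: discriminator probabilities for (condition, generated) pairs.
    fake:   generated painting G(x); target: real painting y.
    """
    adversarial = -np.mean(np.log(np.maximum(d_fake, eps)))  # eps avoids log(0)
    l1 = np.mean(np.abs(target - fake))
    return adversarial + lam * l1
```

The L1 term keeps the output close to the real painting at the pixel level, while the adversarial term pushes for realistic texture; supplying the segmentation map alongside the sketch gives the generator explicit region and color cues that a sketch alone lacks.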
Significance of Our Approach – We show how to generate better-quality semantic segmentation maps using an HED detector, the watershed algorithm and an additional U-Net training stage that refines the weak segmentations produced by the watershed algorithm. None of these steps requires additional high-quality manual labels. Our approach can therefore be applied to any new set of images to generate better segmentation maps.
Bibliography
Gatys, L. A., Ecker, A. S. and Bethge, M. (2016). Image Style Transfer Using Convolutional Neural Networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, pp. 2414–23. doi: 10.1109/CVPR.2016.265.
Liu, X., Yin, G., Shao, J. and Wang, X. (2019). Learning to predict layout-to-image conditional convolutions for semantic image synthesis. Advances in Neural Information Processing Systems, 32.
Wang, X., Song, N., Zhang, L. and Jiang, Y. (2018). Understanding subjects contained in Dunhuang mural images for deep semantic annotation. Journal of Documentation, 74(2): 333–53. doi: 10.1108/JD-03-2017-0033.
Xue, A. (2021). End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE, pp. 3862–70. doi: 10.1109/WACV48630.2021.00391.
In review
Tokyo, Japan
July 25, 2022 - July 29, 2022
Held in Tokyo and remote (hybrid) on account of COVID-19
Conference website: https://dh2022.adho.org/
Contributors: Scott B. Weingart, James Cummings
Series: ADHO (16)
Organizers: ADHO