Litera

Computational creativity of the Midjourney neural network in a polymodal space

Zhikulina Christina Petrovna

ORCID: 0000-0003-2488-4616

Postgraduate, Department of General and Russian Linguistics, Peoples' Friendship University of Russia named after Patrice Lumumba

117198, Russia, Moscow, Miklukho-Maklaya str., 6

christina.zhikulina@gmail.com

 
Kostromina Viktoriya Vladimirovna

Master's degree, Department of General and Russian Linguistics, Peoples' Friendship University of Russia named after Patrice Lumumba

10 Miklukho-Maklaya str., building 2, Moscow, 117198, Russia

kostromina_vv@pfur.ru

DOI: 10.25136/2409-8698.2024.6.70890

EDN: COCFNP

Received: 25-05-2024

Published: 01-06-2024


Abstract: This article deals with the polymodal space in the field of computational creativity in neural networks. The object of the research is a polymodal environment that integrates a series of heterogeneous codes to express a common idea; the subject is the possibility of creating polymodal digital art using text and voice prompts in the generative network Midjourney. The aim of the study is to show that computational creativity can be detected and described from the results of iterations in the image-creation process, which in turn allows us to speak of a complex polymodal system as a separate digital category of polymodality. We used the continuous sampling method to collect linguistic units as they occurred in the analysis, and contextual analysis to systematically identify and describe the verbal and non-verbal contexts. An experiment with the generative network Midjourney was conducted to identify patterns in the creation of a graphic space through text and voice input, and the results of the iterations were then compared and contrasted with the original image. The scientific novelty lies in the absence of prior research on the polymodal space in the context of neural networks and their generative capacity. The experiment yielded the following results: the applicability of the term 'polymodality' to the generative network Midjourney and its 'digital art' is due to the presence of three channels – verbal, visual and voice; the tests showed that the network's ability to create images from a prompt is at a high level, although serious technical errors prevent users from fully reaching the desired result when generating an image; and the summarization of the data allows us to speak of features of computational creativity in generative networks.


Keywords:

artificial intelligence, computational creativity, transformational creativity, neural network, Midjourney, polymodal space, polymodal text, iteration, prompt, summarization


Introduction

According to the Enlightenment thinker Jonathan Swift, imagination is "the art of seeing things invisible to others" [20, p. 11]. The idea of searching for new motifs at the border of graphics, science, cognitive psychology and mathematics applies today to many works of art created not only with paint and canvas but also with the help of computers, algorithms, neural networks and other forms of artificial intelligence (hereinafter, AI). American researchers in the field of AI call this capacity of machines computational creativity.

The English scientist Simon Colton offers the following definition of computational creativity: "the philosophy, science and engineering of computational systems which, by taking on particular responsibilities, exhibit behaviours that unbiased observers would deem to be creative" [17]. In the field of computer modeling, Professor Margaret Boden's approach is especially influential: she distinguishes exploratory creativity, in which a new idea continues the traditions of the subject paradigm and creativity consists in exploring the boundaries, content and potential of the creative space [1, p. 116]. She also describes combinatorial creativity, the creation of something new by combining previously known ideas (associations) [15, p. 350], and transformational creativity, which includes several fundamental areas that form a space for the emergence of previously unknown ideas [16, p. 362]. Although ideas about the computational creativity of machines are not new, they are receiving real development only in our time, when the summarization of materials and information has become not only a necessity but also a problem for many areas of research.

In 1968, the British artist Harold Cohen created AARON, a computer drawing program that produced paintings on its own. More recent examples of digital art include user collaboration with DeepDream, a computer vision program created in 2015 by the Google engineer Alexander Mordvintsev. The program uses an artificial neural network that searches for and enhances patterns in images.

Neural network technologies today can be trained to classify and recognize objects from a large set of training images or from a detailed text description: they search for certain patterns and hyperbolize them, "in about the same way a person does when looking at clouds and trying to see the outlines of animals in them" [19, p. 195]. Such programs can help scientists better understand how artificial neural networks relate to the real networks of neurons in the visual cortex of the brain. Technologies of this type also help explain how the human brain searches for patterns and meaning across several components [Ibid.].

From the point of view of Yu.A. Yevgrafova, in modern linguistics the text is understood as "a single structural and semantic whole (a system of linear and nonlinear spaces), which is created by contamination of elements of all levels, functioning in a certain pragmatic context that determines its perception and understanding" [3, p. 48]. Notably, the researcher identifies "contamination of elements of all levels" as the leading factor in understanding the text. At the same time, A.P. Guseva emphasizes that, as a result of the combination of the verbal and the non-verbal, polymodal communication is not direct, since "the plane of content of the utterance, expressed by the meanings of its components, does not coincide with the final communicative meaning" [2, p. 100]. In one work, O.I. Maksimenko notes that a person receives a significant share of knowledge about the world through vision, "that is, visual signs, which include both drawings (in the general sense of the word) and printed text, carry maximum information" [6, p. 93]. In other words, creating an image with a neural network from a textual description in a polymodal environment should also involve semantic categories that require additional interpretive effort. This conclusion rests on the psychological characteristics of a human being; in the case of AI, the question of psychological perception of the world remains hypothetical.

The relevance of this work lies in the fact that the issues and problems of polymodal text have been described in linguistics for almost fifty years, whereas neural networks began to be actively used to generate a "single graphic space" [11, p. 117], in which verbal and non-verbal components are combined within one or more images, only relatively recently. Many users complain that generative networks misinterpret their requests and, as a result, generate the requested image inexactly or only partially. The algorithms for processing text input remain unexplored and undescribed, and their use remains intuitive for researchers and users alike.

The novelty is that polymodal text and its creation using AI are investigated here for the first time. Although "our society jealously protects everything related to creativity," Simon Colton and Geraint Wiggins of the computing department at Goldsmiths College believe that computational creativity (or machine creativity) is "probably the final frontier of AI" [18, p. 26]. The term computational creativity (or machine creativity) can be viewed from several angles: 1) as a field of AI development in which creativity is modeled using a computer or another device (tablet, phone); 2) as programs that expand human capabilities and serve as tools in the creative process.

The theoretical significance of the work lies in the fact that linguists gain the opportunity to explore linguistic units and phenomena and their functioning in a new format of digital text; to compare the results with data obtained in research on traditional language forms; and to contribute to a promising area of linguistic research – "analysis of the influence of speech practices in the digital environment on the language system and language components" [9, p. 192]. Moreover, since many researchers note the high frequency of polymodal texts on the Internet and the ease with which Internet platforms and programs allow visual works to be integrated and created (not only in communication but also in literature, business and other fields) [11, p. 115], polymodal texts and the polymodal type of communication open up to research from a new, unexplored side – their development in the digital environment using neural network technologies.

The practical significance of the research lies in the fact that the materials and results obtained can be used in lectures and seminars on "General Linguistics", in sections that develop both theoretical and practical skills within the discipline "Internet Linguistics". The examples and their descriptions are relevant to such topics as "The role of linguistics in the study of Internet communication", "Verbal specifics of communication on the Internet" and "Polymodal texts on the Internet" (sections "Infographics", "Situational polymodal works" and "Characteristics of the visual component of polymodal texts").

 

Research materials

The materials were minimalistic illustrations from the inserts of the popular Turkish chewing gum 'Love is...' (Fig. 1). Images of this type were not chosen by chance: small details – the characters' faces, landscapes, etc. – are not drawn in the pictures. It is also important that the verbal component (the inscription-message) on the insert contains a sentence presented in two autonomous graphic parts: "the subject of 'Love is...' and the predicative part of the utterance" [13, p. 152].

Figure 1. Examples of inserts from the Turkish chewing gum 'Love is...'

Source: Collection of inserts of the popular gum 'Love is...'. URL: https://www.liveinternet.ru/users/zimuka/post354225218/ (accessed: 01.05.2024).

 

The combination of verbal and non-verbal components in 'Love is...' allows us to analyze the results obtained from the standpoint of the polymodal text: "the connection of its components, the use/non-use of a specific semiotic code, the features of decoding the author's intention and the presence of intertext in it" [8, p. 299].

 

Results and discussion

During the experiment, about 20 images were generated and analyzed. For discussion, we chose the generation process for a single drawing, since the creation stages turned out to be long and the algorithm for assembling and processing verbal and non-verbal components with the Midjourney neural network turned out to be the same for images of any type.

The Midjourney neural network was expected to reproduce the characters from the insert using a text query, the user's voice input and its generative data-processing capabilities.

To describe the tests with the Midjourney neural network, we selected an insert with a plot in which the main characters of the mini-comic have aged (Fig. 2). Figurative signs of the Russian concept of "old age" include being "gray-haired and covered with wrinkles" [10, p. 1228]. "Gray hair" as a figurative sign of the concept of "old age" can be considered common to Russian and other cultures alike. We assume that this figurative feature becomes universal for computational creativity in machines.

In the insert (Fig. 2) we see two seated figures depicted on a white background. In the 'Love is...' mini-comics, the white background is a classic component of the images and allows the viewer to focus on the key figures and their attributes. In addition, a white background may indicate that the situation is represented outside time and space – the absence of a chronotope. The conventional non-verbal signs are a bench without a backrest and a lawn with grass and flowers. The lawn indicates the season, suggesting the specific period in which events unfold: spring, summer or early autumn. These pointers can be associated with the key verbal element "love" and also serve as a symbol of life, in connection with the verb "to live" used in the verbal component of the image. Another conventional component reveals the kinesics of the characters: a bench with a backrest would have hidden an important form of physical intimacy – the embrace.

The figures of two people sitting on a bench – male (left) and female (right) – represent an iconic component, depicting a married couple who have lived "together until old age." Gender is indicated by classical ideas about men and women: the male figure has short hair and is taller than the female; the female figure has long hair and is shorter than the figure on the left.

The couple's advanced age is conveyed by the universal figurative sign described above – gray hair (the pale hair color of both the male and the female character) – as well as by a material attribute, the cane. Notably, the gray is mixed with the general hair color: the hair is not completely gray but light gray on the "boy" and light yellow on the "girl". Both characters hold identical canes, which equalizes their positions and serves as a unifying sign of age. It is also worth noting that the male figure wears a blue jacket and the female figure a pink dress, emphasizing the characters' gender identities.

Because the couple is depicted with their backs to the reader, other visual signs of old age are not available to us. The configuration of the figures does not indicate any further signs of old age: the pair is of standard build and repeats the graphic imagery of other 'Love is...' comics, hinting at the childish or young age of the characters (as in previous series of inserts). The established classic image of the 'Love is...' comic characters does not allow us to speak of them as elderly people, but it may indicate that this appearance is a temporary role.

The slogan as a verbal component – "Love is ... to live together until old age" – actualizes the image of an elderly person but does not fix it as a permanent characteristic of the characters. In the absence of the verbal component, the characters' age would be in question, so the wording "boy and girl" is the most accurate way to describe them.

 

Figure 2. The plot from the inserts for the chewing gum 'Love is...'

Source: Pinterest, an image sharing and social media service. URL: https://www.pinterest.ca/pin/2251868556772924/ (accessed: 01.05.2024).

        

Let us consider the results in detail.

Figure 3. The result of the iteration in block No. 1 on creating images in Midjourney

Source: Telegram chatbot 'ChatGPT | Midjourney | Claude | Suno AI — GPT4Telegrambot Inc.'. Username: @GPT4Telegrambot (accessed: 12.05.2024).

 

Block 1

We see that the same user text entered in the Midjourney chatbot appears under four image variations (Fig. 3). The prompt-processing system – a prompt being a request, hint, input data or instruction – converts the same text into four different pictorial variants at once. It appears that, from the descriptive characteristics in the prompt alone, the generative network cannot always choose the exact drawing style or create a detailed image on the first attempt.

Some users believe that one detailed descriptive text is enough for a neural network to generate a simple image. However, most networks today are at a "transitional" stage of development. This means that the neural network spends considerable resources and time on classifying queries, processing input data and placing accents. For example, after a prompt is sent, Midjourney informs the user in the chat that "processing the request may take 1-3 minutes" [14]. In reality, one request can take 10-15 minutes to process. Moreover, image modeling takes place in several stages (in our study, blocks).

It is also important to note that the functional types of speech, including the description we use in a prompt, are "a thought complex expressing the connections of simultaneity, sequence or causal dependence between phenomena" [7, p. 24], which in turn shapes the creation of a polymodal space: input data (text query) + query processing (neural network) = iteration result (image). In this scheme it becomes clear that text leads the generation of images: the textual description drives the search for and selection of information and images based on verbal and non-verbal components and, most importantly, the result combines a person's thought in verbal form with an artificial "thought" in non-verbal format.
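
This scheme can be made concrete with a short sketch. The code below is a minimal, hypothetical illustration of one iteration block, not Midjourney's actual interface: Midjourney exposes no official public API, so the StubGenerator class and its generate method are illustrative stand-ins invented for this example.

# A minimal sketch of "input data (text query) + query processing (neural
# network) = iteration result (image)". StubGenerator is a hypothetical
# stand-in for the generative network, used only to illustrate the scheme.
from dataclasses import dataclass

@dataclass
class IterationResult:
    prompt: str        # verbal component: the user's textual description
    images: list[str]  # non-verbal component: identifiers of generated variants

class StubGenerator:
    """Hypothetical stand-in for the generative network."""
    def generate(self, prompt: str, n_variants: int = 4) -> list[str]:
        # A real network would return images; here we return placeholder ids.
        return [f"variant_{i}: {prompt[:30]}" for i in range(1, n_variants + 1)]

def run_block(network: StubGenerator, prompt: str) -> IterationResult:
    """One iteration block: one text query in, four pictorial variants out."""
    return IterationResult(prompt=prompt, images=network.generate(prompt))

# Block 1 of the experiment: one rough description yields four variants (Fig. 3).
block_1 = run_block(StubGenerator(), "a boy and a girl in old age are sitting on a bench")
print(len(block_1.images))  # -> 4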

In the first prompt, we deliberately entered a low-quality description of the query, since we needed to test the interaction of two channels in Midjourney: text and visual. We intentionally made speech and factual errors to mimic a prompt by an ordinary user who does not know how the neural network processes requests. Thus, in the first prompt of block 1, "a boy and a girl in old age are sitting on a bench," "sitting with their backs" and "holding a stick of pensioners." Evaluated linguistically, this text looks more like a set of words; for a generative network, however, the errors are no obstacle to creating an image. Looking at the non-verbal semantics of the images we received (Fig. 3), we can see that the neural network depicts people in old age who are closer in physique to children yet still read as old because of the gray hair color, which corresponds to the original rendering of the characters on the insert. The network also understands the phrase "sitting with their backs" literally: the characters in the drawings sit with their backs turned to the person looking at the image. Midjourney renders the "stick of pensioners" in different ways: as a staff, a stick or a cane. The description "hugging each other from behind" proved insufficient: in three images the boy hugs the girl with one arm (Fig. 3, Pictures 1, 3, 4), and in one there is no embrace at all (Fig. 3, Picture 2). In addition, in one of the images the girl and the boy switch places (left-right) on the bench (Fig. 3, Picture 3).

The prompt "green lawn with flowers" was also interpreted by the neural network in different ways. The lawn indicates the late-spring period through the profusion of flowers (Fig. 3, Picture 1); hints at midsummer (Fig. 3, Picture 2) or the end of the summer season (Fig. 3, Picture 3); or depicts autumn, since the flowers look withered and dry (Fig. 3, Picture 4). The non-verbal characterization of the lawn varies across the results, since the season cannot be deciphered in the original either.

It is likely that the first prompt lacked descriptive characteristics for the cane, the characters' posture and the position of their hands, as well as for the conventional components (the type of bench, the nature of the background), which is why the neural network offers different options in the iteration results. Since these details were missing from our description, we cannot speak of a misinterpretation of the user's request.

In the next block (Fig. 4), we clarified a number of components in the text and obtained new results.

 

Figure 4. The results of the iterations in block No. 2 on creating images in Midjourney

Source: Telegram chatbot 'ChatGPT | Midjourney | Claude | Suno AI — GPT4Telegrambot Inc.'. Username: @GPT4Telegrambot (accessed: 12.05.2024).

 

Block 2

In this block, a second prompt was entered anew in the Midjourney chatbot, with additional descriptive characteristics. One cannot fail to notice that the images in block 1 (Fig. 3) are far from the original image on the 'Love is...' insert (Fig. 2) – there is no minimalism. To the request we added that "the characters look like those in Love is... chewing gum", as well as "the general background is white" and "the picture says: 'Love is...' (above), '...live together until old age' (below)".

In the iteration results, we see that the background has turned white in two images (Fig. 4, Pictures 1 and 4), the "boy and girl" hug each other from behind in all four versions, and the inscription-message has been added everywhere. Midjourney interprets the position of the inscription (top, bottom) in its own way: 1) the subject 'Love is...' is at the top of the image and the predicative part at the bottom (Fig. 4, Picture 1); 2) the subject 'Love is...' and the predicative part are in the same place, with 'Love is...' above the rest of the text (Fig. 4, Picture 4); 3) the subject 'Love is...' and the predicative part are on the same level (parallel to each other), placed not at the top and bottom but to the right and left of each other (Fig. 4, Pictures 2 and 3). It is also important that the text is not generated in Russian: the neural network translates it into English. As the experiment showed, when placing text on images the network also produces doubled words, such as "love is...is", "together together" or "old...old".

Otherwise, after the additional description "the characters look like those in Love is... chewing gum" was introduced into the prompt, several of Midjourney's drawings moved closer to the original image on the selected insert. The appearance of the "girl" changed – her hair became short (Fig. 4, Pictures 1 and 3), although we did not adjust the long-hair characteristic in the second prompt.

The prompt "green lawn with flowers" produced the most uniform results in this series and does not point to any particular season. All components are universal in nature – just as in the original 'Love is...' insert.

Next, we followed a different algorithm: we selected one image (Fig. 4, Picture 3), since it seemed to us the closest to the picture on the chewing gum insert, and continued to create prompts with descriptions updated relative to this image.

 

Figure 5. The results of iterations in blocks No. 3 and No. 4 on creating images in Midjourney

Source: Telegram chatbot 'ChatGPT | Midjourney | Claude | Suno AI — GPT4Telegrambot Inc.'. Username: @GPT4Telegrambot (accessed: 12.05.2024).

 

Blocks 3 and 4

In these blocks, details and inaccuracies were worked out by adding a third channel: the voice prompt. By means of voice messages, we changed the description "stick of pensioners" to "cane", removed the line "the characters look like those in Love is... chewing gum" and added a description of the bench on which the characters sit. We noticed that for two blocks in a row Midjourney had depicted the bench as an object with a backrest, whereas in the original image we took as our basis the bench is drawn without one (Fig. 2).
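
For clarity, the corrections introduced in block 3 can be written out as operations on the text of the previous prompt. The sketch below is illustrative only: in the experiment the changes were dictated as voice messages, and the speech-to-text step is assumed rather than shown.

# Illustrative only: the block-3 corrections expressed as edits to the text of
# the block-2 prompt. Voice input is assumed to be transcribed before this step.
def apply_block_3_corrections(prompt: str) -> str:
    # 1. Replace the vague attribute with the precise one.
    prompt = prompt.replace("stick of pensioners", "cane")
    # 2. Remove the explanatory line that is no longer needed.
    prompt = prompt.replace("the characters look like those in Love is... chewing gum", "")
    # 3. Add the missing description of the bench.
    prompt += " The characters are sitting on a bench without a backrest."
    return " ".join(prompt.split())  # tidy up whitespace left by the removal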

In block 3, the inscription entered in Russian is preserved in the image (Fig. 5, Picture 1). The neural network again repeats doubled words ("is...is", "old old") and duplicates the same message twice (Fig. 5, Pictures 1-3). The bench retains its backrest, but for the first time both characters have canes with rounded handles in all the images.

In block 4 (Fig. 5, Pictures 2-4), we amended the voice prompt from block 3 in writing, translating the inscription-message from Russian into English. Midjourney produced a series of images close to the source we relied on, but with some inaccuracies:

1) the backrest of the bench is absent in only one image (Fig. 5, Picture 3);

2) the cane on the "boy's" side is placed at a distance from him (Fig. 5, Pictures 2, 3);

3) the inscription-message, entered in English, again contains a doubled word (Fig. 5, Picture 2);

4) the "girl's" hair is not light-yellow-gray (as in all the previous results) but simply gray;

5) the color of the "girl's" dress changes from light pink to yellow (Fig. 5, Picture 1);

6) the "girl's" hair is short, although the prompt describes it as long (in all pictures);

7) the white background is preserved in only one image (Fig. 5, Picture 4).

In our opinion, several of the images generated by Midjourney come closest to the original (Fig. 6). Blocks with defect corrections could be generated indefinitely; however, by the fourth or fifth block the generative network begins to "go in circles", changing only the colors and the positions of the characters in space.
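
The observation that generation starts to "go in circles" suggests a natural stopping rule for the refinement loop. The sketch below (reusing the hypothetical StubGenerator from the earlier example) is our assumption about how such a loop could be organized, not a description of Midjourney's internals.

# A sketch of block-by-block refinement with an early stop. The cap of five
# blocks mirrors the observation that results begin to repeat by blocks 4-5.
def refine(network, prompts: list[str], max_blocks: int = 5) -> list[list[str]]:
    """Run successive iteration blocks, each with a corrected prompt."""
    blocks: list[list[str]] = []
    previous = None
    for prompt in prompts[:max_blocks]:
        images = network.generate(prompt)
        if images == previous:  # the network has started to repeat itself
            break
        blocks.append(images)
        previous = images
    return blocks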

 

Figure 6. The final results of iterations in blocks for creating images in Midjourney

Sources: Pinterest, an image sharing and social media service. URL: https://www.pinterest.ca/pin/2251868556772924/ (accessed: 01.05.2024); Telegram chatbot 'ChatGPT | Midjourney | Claude | Suno AI — GPT4Telegrambot Inc.'. Username: @GPT4Telegrambot (accessed: 12.05.2024).

 

It can be concluded that in the joint creation of an image by a person and Midjourney, the most important element turns out to be the text – the prompt. Professional users have developed systems for composing prompt text so as to approach the desired result. The prompt proves to be a kind of art or, put differently, the key module in creating an image.

There are also errors and inaccuracies in generation that even a well-written prompt cannot correct; by the results of our research, these include the characters' posture, the choice of colors and the correct rendering of the verbal component – the inscription-message in the image.

In programming, the term iteration refers to data processing in which actions are repeated many times without recursion, that is, without a function (procedure) calling itself; a recursive program, by contrast, describes repetitive or infinite actions (computations) without explicit repetition of earlier parts of the program or the use of loops. We believe this term also suits the actions performed when generating an image with a neural network in a polymodal environment. Moreover, the step-by-step formation of a polymodal environment in the digital world can be designated as a summarization of verbal, non-verbal and vocal components. A similar system has already been described by O.Y. Kolomiytseva and A.N. Moskaleva in an article on the implementation of the category of polymodality in Instagram discourse. They note that text and image form a "new verbal-visual form" [4, p. 115], which is interesting because, taken separately, the visual and verbal components may not carry deep meanings [Ibid.]. To their system we would add the voice channel present in the Midjourney polymodal space, which justifies the use of the term polymodal in the context of digital art created with generative networks.
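
The distinction can be shown in a few lines of code. Both functions below compute the same count; the first repeats an action with an explicit loop (iteration), while the second calls itself and so describes the repetition without a loop (recursion). The example is generic and not tied to Midjourney.

# Iteration: explicit repetition in a loop, no self-calls.
def count_steps_iterative(n: int) -> int:
    total = 0
    for _ in range(n):
        total += 1
    return total

# Recursion: the function calls itself; a base case stops the self-calls.
def count_steps_recursive(n: int) -> int:
    if n == 0:
        return 0
    return 1 + count_steps_recursive(n - 1)

assert count_steps_iterative(4) == count_steps_recursive(4) == 4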

 

Conclusion

The acceptability of the term polymodal in the context of computational creativity (or machine creativity) in the digital art of neural networks follows from the multichannel nature of the image-creation process. On the one hand, verbal and non-verbal components interact through the prompt and its visualization by the generative network. On the other hand, voice input can be used to quickly correct inaccuracies in the descriptive characteristics of the prompt. In the interaction of all these parts, a polymodal environment arises in which a polymodal space is created.

During the experiment, we were able to test and describe the work of Midjourney. The verbal-visual images generated by this neural network partially coincide with the original plot depicted on the insert of the Turkish chewing gum 'Love is...'. We assume that Midjourney has been trained to draw too well, as it is programmed to work by induction rather than deduction: the neural network is trained not to minimize but to improve quality. This is borne out by the results of the iteration in block 1, where the images look detailed and comparatively realistic. Nevertheless, the plot of "a boy and a girl in old age" depicted on the 'Love is...' insert is preserved in every generated image.

The interaction of all the blocks in creating an image occurs at a high level; however, the generative network quite often processes the user's request incorrectly, and this erroneous reading of queries does not always depend on how well the prompt is written. Today we can only assume that developers will refine these capabilities of neural networks.

Thus, computational creativity (or machine creativity) is observed in Midjourney, if only because the creation blocks contain image variants that deviate noticeably from the characteristics set by the user (especially in the first prompt). Even if the neural network's new vision of an image is the result of deriving an arithmetic mean from multiple summation, this means the generative network works according to the principle Margaret Boden described as transformational creativity: the creation of previously unknown ideas on the basis of fundamental knowledge in a given area.

References
1. Belova, S.S. (2008). Creativity: Psychological and computer models. Psychology. Journal of the Higher School of Economics, 4, 112-119.
2. Guseva, A.P. (2018). Semiotically heterogeneous literary text as a semantically complex communication. Vestnik of Moscow State Linguistic University, 18(816), 98-109.
3. Evgrafova, Yu.A. (2020). Linguosemiotics of the screen: modeling reality in screen texts (based on the texts of cinema, television and the Internet): dissertation for the degree of doctor of Philology. (10.02.19). Moscow State Linguistic University. Moscow.
4. Kolomiytseva, O.Y., & Moskaleva, A.N. (2021). Ways of implementation of the category of multimodality in the English Instagram discourse. Samara University of Public Administration ‘International Market Institute’, 2, 115-125.
5. The collection of cartoon stuffers of popular gum ‘Love is…’. Retrieved from https://www.liveinternet.ru/users/zimuka/post354225218/
6. Maksimenko, O.I. (2012). Polycode vs. creolized text: terminology problems. RUDN Journal of language studies, semiotics and semantics, 2, 93-102.
7. Nechaeva, O.A. (1975). Functional-semantic types of speech (Description, Narration, Reasoning): Abstract of the dissertation for the degree of Doctor of Philology (10.02.01). Moscow Region Pedagogical Institute named after N.K. Krupskaya. Moscow.
8. Novospasskaya, N.V., & Dugalich, N.M. (2022). Terminological system of the polycode text theory. Russian Language Studies, 20(3), 298-311. Retrieved from http://doi.org/10.22363/2618-8163-2022-20-3-298-311
9. Polonskiy, A.V. (2018). Medialect: language in the media format. Issues in Journalism, Education, Linguistics, 2, 230-240. Retrieved from http://doi.org/10.18413/2075-4574-2018-37-2-230-240
10. Safaralieva, L.A., & Perfilieva, N.V. (2023). The Modelling of a Multidimensional Linguocultural Concept on the Example of the Concept СТАРОСТЬ 'SENILITY'. RUDN Journal of Language Studies, Semiotics and Semantics, 14(4), 1217-1234. Retrieved from https://doi.org/10.22363/2313-2299-2023-14-4-1217-1234 (In Russ.).
11. Sonin, A. (2005). Experimental studies of multimodal text comprehension: main directions. Voprosy Jazykoznanija, 6, 115-123.
12. An image sharing and social media service ‘Pinterest’. Retrieved from https://www.pinterest.ca/pin/2251868556772924/
13. Stepanova, I.V. (2013). Creolized text as a means of realization of the concept of love (on the material of the comics). Bulletin of Chelyabinsk State University, 24(315).
14. Telegram chat-bot ‘ChatGPT. Midjourney. Claude. Suno AI – GPT4Telegrambot Inc.’. Username: @GPT4Telegrambot
15. Boden, M.A. (1998). Creativity and artificial intelligence. Artificial Intelligence, 103, 347-356.
16. Boden, M.A. (1999). Computer models of creativity. Handbook of Creativity. R.J. Sternberg (ed.). Pp. 351-372. Cambridge University Press.
17. Colton, S. (2019). From Computational Creativity to Creative AI and Back Again. Interalia Magazine. Retrieved from https://www.interaliamag.org/articles/simon-colton/
18. Colton, S., & Wiggins, G.A. (2012). Computational creativity: The final frontier? In ECAI 2012 – 20th European Conference on Artificial Intelligence, 27-31 August 2012, Montpellier, France. Frontiers in Artificial Intelligence and Applications, Vol. 242. IOS Press. Pp. 21-26. Retrieved from https://doi.org/10.3233/978-1-61499-098-7-21
19. Pickover, Clifford A. (2021). Artificial Intelligence: An Illustrated History: From Medieval Robots to Neural Networks. Sterling Publishing Co., Inc. (USA) via Alexander Korzhenevski Agency (Russia).
20. Santini, C. (2019). Kintsugi: Finding Strength in Imperfection. Andrews McMeel Publishing LLC.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The author of the reviewed article draws attention to the computational creativity of the generative network Midjourney in a polymodal space. The subject area of the work is quite clear, and the relevance of the issue is indicated in principle; the material therefore corresponds to one of the journal's sections and is, one way or another, relevant. The formal attributes of the study are maintained, and its main positions are evident: "The idea of searching for new motifs at the border of graphics, science, cognitive psychology and mathematics applies today to many works of art created not only with paint and canvas but also with the help of computers, algorithms, neural networks and other forms of artificial intelligence (hereinafter, AI). American researchers in the field of AI call this capacity of machines computational creativity"; or "Neural network technologies today can be trained to classify and recognize objects from a large set of training images or from a detailed text description: they search for certain patterns and hyperbolize them, 'in about the same way a person does when looking at clouds and trying to see the outlines of animals in them'"; or "The relevance of this work lies in the fact that the issues and problems of polymodal text have been described in linguistics for almost fifty years, whereas neural networks began to be actively used to generate a 'single graphic space', in which verbal and non-verbal components are combined within one or more images, only relatively recently." The evaluation of neural networks is at present subjective; only evaluation options are given, without a general description. The author tries to emphasize the novelty of the research, though it remains insufficiently specific: "The novelty is that polymodal text and its creation using AI are investigated here for the first time. Although 'our society jealously protects everything related to creativity,' Simon Colton and Geraint Wiggins of the computing department at Goldsmiths College believe that computational creativity (or machine creativity) is 'probably the final frontier of AI'...". I think that a special case may be of interest, although it does not amount to a generalization: "The materials were minimalistic illustrations from the inserts of the popular Turkish chewing gum 'Love is...' (Fig. 1). Images of this type were not chosen by chance: small details – the characters' faces, landscapes, etc. – are not drawn in the pictures. It is also important that the verbal component (the inscription-message) on the insert contains a sentence presented in two autonomous graphic parts: 'the subject of "Love is..." and the predicative part of the utterance.'" The statistical and experimental data are presented correctly, with no factual violations: "About 20 images were generated and analyzed during the experiment. For discussion, we chose the generation process for a single drawing, since the creation stages turned out to be long and the algorithm for assembling and processing verbal and non-verbal components with the Midjourney neural network turned out to be the same for images of any type." The assessment of the "visual" is given objectively, and the author draws on a modern methodology of analysis: "The figures of two people sitting on a bench – male (left) and female (right) – represent an iconic component, depicting a married couple who have lived 'together until old age.'
Gender is indicated by classical ideas about men and women: the male figure has short hair and is taller than the female; the female figure has long hair and is shorter than the figure on the left." Formally significant points are described, and a proper dialogue of opinions is maintained: "We see that the same user text entered in the Midjourney chatbot appears under four image variations (Fig. 3). The prompt-processing system converts the same text into four different pictorial variants at once. It appears that, from the descriptive characteristics in the prompt alone, the generative network cannot always choose the exact drawing style or create a detailed image on the first attempt." The author's commentary is successfully introduced in a comparative mode: "In these blocks, details and inaccuracies were worked out by adding a third channel: the voice prompt. By means of voice messages, we changed the description 'stick of pensioners' to 'cane', removed the line 'the characters look like those in Love is... chewing gum' and added a description of the bench on which the characters sit. We noticed that for two blocks in a row Midjourney had depicted the bench as an object with a backrest, whereas in the original image we took as our basis the bench is drawn without one (Fig. 2)." The assessments vary across the spectrum, which is important, and the points of view are varied. The result of the work correlates with its main block; no factual violations have been identified. It is worth agreeing that "during the experiment, we were able to test and describe the work of Midjourney. The verbal-visual images generated by this neural network partially coincide with the original plot depicted on the insert of the Turkish chewing gum 'Love is...'. We assume that Midjourney has been trained to draw too well, as it is programmed to work by induction rather than deduction: the neural network is trained not to minimize but to improve quality. This is borne out by the results of the iteration in block 1, where the images look detailed and comparatively realistic. Nevertheless, the plot of 'a boy and a girl in old age' depicted on the 'Love is...' insert is preserved in every generated image." The work impresses with its non-trivial assessment of the generative network, its features and specifics. The basic requirements of the publication are met; the text does not need fundamental editing or correction. I recommend the article "Computational creativity of the Midjourney neural network in a polymodal space" for publication in the journal "Litera".