Part 2 - The Research and Development of Generative AI Models

Written By Yan Zhang
Published by Yvette Depaepe, the 17th of May 2024

 

"Machines will be capable, within twenty years, of doing any work a man can do.” ~Herbert A. Simon (1965)

In July 2023, I attended an AI research forum. An Amazon researcher introduced several AI projects currently under way at Amazon. During the event, we had lunch together. When she learned that I was also a photographer, she said to me bluntly: "Midjourney ended photography!" Blunt as it was, her statement represents the view of many professionals engaged in cutting-edge research on generative AI. In this article, writing from my dual perspective as an AI scientist and a professional photographer, I try to explore thoroughly the profound impact that generative AI is having on traditional photography, and how we, as photographers, should face this challenge.

 

“Dream seascape”. Generated on Midjourney, by Yan Zhang.

 


PART 2 - THE RESEARCH AND DEVELOPMENT OF GENERATIVE AI MODELS

Reasoning and learning are the two most important features of human intelligence, and AI research has always revolved around these two themes. Since the turn of the 21st century, AI has gradually emerged from its long winter, and machine learning research has begun to make new and critical breakthroughs.


Deep Learning and ImageNet

Deep learning based on neural networks is one of many machine learning methods. Many of its key concepts and technologies were proposed and developed in the 1990s.

But deep learning only really began to show its superiority over other machine learning methods in the first decade of this century. In 2007, Fei-Fei Li, then an assistant professor at Princeton University, began to build the ImageNet dataset based on Fellbaum's WordNet, with the support of Christiane Fellbaum, also a professor at Princeton. Over the following years, ImageNet grew to 14 million classified and annotated images, becoming the most widely used training dataset for computer vision research of its time.

It is worth mentioning that at that time, machine learning research was focused on models and algorithms, and the training datasets used by researchers were relatively small. Li was the first researcher to focus on establishing extremely large datasets.

In 2012, the AlexNet model, based on a deep convolutional neural network (deep CNN), stood out in the large-scale ImageNet image recognition competition, defeating other machine learning approaches by a significant margin. Since then, deep learning based on neural networks has become the mainstream of machine learning research and has continued to yield breakthrough results.


Generative Adversarial Networks (GANs)

In 2014, Canadian researcher Ian Goodfellow and his collaborators proposed a new neural network learning architecture, the generative adversarial network (GAN), opening up a new research direction in generative AI. The basic principles of the GAN model can be understood from the following figure.

Figure 1. The general architecture of GAN.


Suppose we want to train an AI (GAN) model that can automatically generate human face patterns. First, we need to prepare enough real human face photos (of a specific size) as a training dataset; this is x in Figure 1. Second, we need to design two neural networks, D and G, standing for discriminator and generator, respectively. Networks D and G compete in a zero-sum game: on the one hand, D continuously receives real face pictures from the training dataset and is told that these are human faces; on the other hand, G generates a pattern, sends it to D, and lets D determine whether it is a human face pattern. Initially, G only generates random, irregular patterns. Since D has received the real face information from x, it easily recognises that the pattern generated by G is not a human face.

However, since both networks continuously adjust their parameters based on the evaluation results of each training cycle, the patterns produced by the generator gradually get closer to human faces. This training process is repeated iteratively, and the generator's output approaches real face patterns ever more closely, until the discriminator can no longer tell whether a given pattern is a real face drawn from x or one generated by G.


Theoretically, under such a mutually competitive training method, G can eventually generate patterns that are essentially indistinguishable from the patterns in the training dataset.
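To make the adversarial loop above concrete, here is a minimal toy sketch, not the author's Fractal_GAN or Mountain_GAN and far simpler than any real image model: the "patterns" are just 1-D numbers drawn from a Gaussian, the generator is a linear map of noise, and the discriminator is a one-variable logistic regression, trained with hand-derived gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" training data: samples from a 1-D Gaussian with mean 4.
def sample_real(n):
    return rng.normal(4.0, 1.0, size=n)

# Generator G(z) = a*z + b maps uniform noise z to fake samples.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c) scores how "real" x looks.
w, c = 0.1, 0.0

lr = 0.01
for step in range(5000):
    z = rng.uniform(-1.0, 1.0, size=64)
    x_real = sample_real(64)
    x_fake = a * z + b

    # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: gradient ascent on log D(fake), the non-saturating loss.
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

# After training, the generator's output distribution should have drifted
# toward the real data distribution.
fake_mean = np.mean(a * rng.uniform(-1.0, 1.0, size=10000) + b)
print(f"mean of generated samples: {fake_mean:.2f} (real data mean: 4.0)")
```

Real GANs replace both one-parameter models with deep networks and train on pixels, but the alternating two-player update shown here is the same.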

In practice, however, it is still quite difficult to use GAN methods to generate realistic, very complex patterns. The most successful application is probably generating realistic human faces: https://thispersondoesnotexist.com.

The GAN method has two main limitations. First, training is unstable, which can lead to mode collapse and low-quality output. Second, because pictures are generated directly in discrete pixel space, the results can easily suffer from distortion or low quality.

Nevertheless, the GAN method quickly became the mainstream of generative AI research after it was proposed. Researchers have since developed many different types of GAN models and related key technologies, and many of these results have provided important support for research on diffusion models.

Figure 2. Generated fractal images. Generated by Fractal_GAN model developed by the author.

Figure 3. Generated mountain images. Generated by Mountain_GAN model developed by the author.


Diffusion Models - A New Approach for Generative AI

The diffusion model is a new generative AI method proposed in 2015. Its intuitive idea can be understood through the following simple physical phenomenon.

Figure 4. An intuitive explanation of the diffusion model.


Suppose we drip a drop of blue ink into a glass of water (as shown in the left picture above). Over time, the drop of ink will slowly spread, and finally dye the entire glass of water blue (as shown in the right picture above). This process is called forward diffusion. Now let's look at this diffusion process in reverse: if we know the diffusion trajectory of a drop of ink in clear water, then through reverse derivation, we can know the position and shape of the drop of ink in the clear water at the initial time. This process is called reverse diffusion.


Returning to the diffusion model: a clear image corresponds to the initial drop of blue ink in our example above. The forward diffusion of the ink corresponds to continuously adding noise to the image until the image is slowly filled with noise. The reverse diffusion of the ink corresponds to gradually removing noise from a noise-filled image, restoring it to a clear image.

Through extensive learning and training, the diffusion model learns the distribution of the noise added to an image at each step of the forward process, and can therefore run the reverse diffusion process, removing noise step by step to restore the original image. The general process is shown in the figure below.

Figure 5. Two diffusion processes in diffusion models.
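The forward (noising) half of this process is simple enough to sketch numerically. The snippet below, an illustration rather than any production model, stands a 1-D signal in for an image and repeatedly mixes in Gaussian noise under a linear noise schedule, following the standard DDPM-style update x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps; training a network to undo each step is the hard part that the diffusion model learns.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "image": a 1-D signal standing in for pixel values, normalized
# to zero mean and unit variance.
x0 = np.sin(np.linspace(0, 4 * np.pi, 256))
x0 = (x0 - x0.mean()) / x0.std()

# Linear noise schedule: small noise early, more noise later.
T = 200
betas = np.linspace(1e-4, 0.05, T)

# Forward diffusion: at each step, slightly shrink the signal and add
# fresh Gaussian noise (this keeps the overall variance near 1).
x = x0.copy()
for beta in betas:
    eps = rng.standard_normal(x.shape)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps

# After enough steps the signal is almost pure Gaussian noise: its
# correlation with the original has nearly vanished.
corr = np.corrcoef(x0, x)[0, 1]
print(f"correlation with the original after {T} steps: {corr:.3f}")
```

The reverse process runs the same chain backwards, with a trained neural network predicting the noise to subtract at each step; that learned denoiser is what the text above calls "obtaining the distribution of noise" during training.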

 

 

The DeepMountain v1.1.4.2 model, developed and trained by the author on a diffusion model architecture, can generate 512×512 high-quality, even photorealistic, mountain pictures. Under the same prompts, the mountain images generated by DeepMountain v1.1.4.2 are richer than those generated by Midjourney v5.0 and SD v1.5.


Mini AI knowledge: Herbert Simon's 1965 prediction about AI did not come true. In the fifty years that followed, people seemed to stop expecting it ever would. Starting in 2016, however, this all began to change . . .

AlphaGo Zero: AlphaGo is a Go-playing computer program developed by DeepMind, a company based in London, England (later acquired by Google as a subsidiary). Unlike traditional AI chess-playing programs, AlphaGo's search algorithm is driven by deep neural network training. In March 2016, AlphaGo defeated Korean 9-dan professional Lee Sedol 4:1; Lee remains the only human Go player to have won a game against AlphaGo.


In 2017, DeepMind launched a new version of AlphaGo called AlphaGo Zero. Compared with AlphaGo, the new version made a significant leap: the program starts with no existing human knowledge about Go, that is, zero knowledge. All its subsequent Go knowledge comes from self-learning, playing games against itself. It played 4,900,000 games against itself in three days, and after 40 days of knowledge accumulation, AlphaGo Zero's level of play surpassed all older versions of AlphaGo.


In May 2017, AlphaGo (in its Master version, a predecessor of AlphaGo Zero) played Ke Jie, the Chinese professional then ranked number one in the world, and defeated him 3:0.

At this point, it seems that AlphaGo's Go ability will never be surpassed by humans.

Thank you all for the very interesting and important discussion! True to the motto "defeat the system at its own game", I asked ChatGPT for reliable methods for photographic web portals to identify a photo as a real photo (originating from a camera, taken through a lens) that has been neither manipulated nor completely created by generative AI. Here is the response:

=== START ChatGPT Response ===

Identifying whether a photo is real (taken with a camera through a lens) or manipulated/generated by generative AI is a challenging task, but several reliable methods and technologies can help photographic web portals verify the authenticity of images. Here are some approaches:

1. Metadata Analysis
- EXIF Data: Examining the Exchangeable Image File Format (EXIF) metadata, which includes information about the camera, settings, date, and location of the photo. Authentic photos usually have consistent and plausible EXIF data.
- Tampering Signs: Checking for inconsistencies or anomalies in the EXIF data that might suggest manipulation.

2. Digital Watermarking
- Invisible Watermarks: Embedding and later detecting digital watermarks that are not visible to the naked eye but can be verified to confirm the photo's authenticity.

3. File Format Analysis
- JPEG Structures: Analyzing the JPEG compression artifacts and structure. AI-generated images might not have the same compression patterns as real photos.
- Noise Patterns: Checking for the natural noise patterns that are characteristic of specific camera sensors. Generative AI images might lack these natural patterns.

4. AI and Machine Learning Models
- Deep Learning Detection: Using deep learning models trained to differentiate between real and AI-generated images. These models can analyze textures, inconsistencies, and other subtle clues.
- Generative Adversarial Networks (GANs): Employing GAN-based detectors that are specifically trained to spot synthetic images.

5. Photo-Forensic Tools
- Forensic Software: Utilizing specialized software like Adobe's Content Authenticity Initiative or tools like FotoForensics that offer various analyses, including error level analysis (ELA), to detect manipulations.

6. Blockchain Verification
- Provenance Tracking: Using blockchain to record the history and provenance of images. Each image can be logged from the point of capture, ensuring that any subsequent manipulations are recorded and verifiable.

7. Human Review
- Expert Analysis: Involving human experts to review suspicious images. Experienced photographers and forensic analysts can often detect subtle signs of manipulation that automated tools might miss.

8. Multi-Layered Approach
- Combining Methods: Implementing a multi-layered approach that uses a combination of the above methods for more robust verification. Cross-referencing results from different techniques can significantly improve reliability.

Implementation Example. An example implementation for a photographic web portal might look like this:
1. Initial Upload Screening: run automated checks on EXIF data and file format structures; use AI models to flag potentially synthetic images.
2. Forensic Analysis: apply forensic software tools for deeper analysis of flagged images; check for digital watermarks.
3. Blockchain Logging: record the image on a blockchain for provenance tracking, if available.
4. Expert Review: have human experts review images that pass initial automated checks but still appear suspicious.

By employing a combination of these methods, photographic web portals can significantly enhance their ability to discern real photographs from manipulated or AI-generated images, ensuring the integrity of the content they host.

=== END ChatGPT Response ===

As a conclusion, it is immediately apparent how complex, challenging and probably even expensive such a task would be!
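[Editor's note: as a small illustration of the "initial upload screening" idea discussed in the comment above, the stdlib-only sketch below scans a JPEG's marker segments for an APP1 "Exif" block. The byte offsets follow the JPEG file structure; a real portal would use a full EXIF parser, and a missing EXIF block proves nothing by itself, since metadata is easily stripped or forged.]

```python
import struct

def has_exif(jpeg_bytes: bytes) -> bool:
    """Scan a JPEG's marker segments for an APP1 'Exif' block.

    This is only a cheap first screening signal: presence of EXIF data
    does not prove authenticity, and absence does not prove AI origin.
    Standalone markers (e.g. restart markers) are not handled, which is
    fine for header segments that appear before the image data.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":           # SOI marker: not a JPEG
        return False
    i = 2
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:               # lost sync: stop scanning
            break
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:                      # SOS: compressed data begins
            break
        (length,) = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])
        if marker == 0xE1 and jpeg_bytes[i + 4:i + 10] == b"Exif\x00\x00":
            return True                         # APP1 EXIF segment found
        i += 2 + length                         # skip to the next marker
    return False

# Minimal synthetic byte strings for demonstration (not real image data):
with_exif = b"\xff\xd8" + b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00" + b"\xff\xd9"
without_exif = b"\xff\xd8\xff\xd9"
print(has_exif(with_exif), has_exif(without_exif))
```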
As I am a curious but impatient person, I have now read Professor Zhang's entire article and therefore know where his train of thought is going (please follow the author link and blog). I would have appreciated it if Yan had simply explained to us in advance, with reference to his technical article on the web, which aspects of the use of AI here the users and the management of 1x.com should discuss with each other objectively, in order to agree on a course of action for the near future. The use of AI-generated and manipulated images has long gone more or less unnoticed here too, although it is currently strictly prohibited by the management.
Dear Udo, thanks for your response and for reading my entire article on my website. In my opinion, as I state in the later part of the article, traditional photography, or "pure photography", will stand, no matter how generative AI influences photography in general. But from a technical aspect, with the continuous advancement of generative AI models, very soon (if not already) we will not be able to distinguish between an AI-generated image and an image taken with a camera, unless other means are involved, such as checking RAW files. So how can we trust a photographer's work when he or she claims that the work does not use any generative AI tools, even though those tools are available in Photoshop? This is a big problem. Back to 1x: I have read that some awarded images actually contained generative AI components, but the curators did not discover this during the evaluation process. At this stage, I do not know how we can resolve this problem. Until some new technology is developed, checking RAW files is the only way to find out whether an image was created in part, or wholly, by generative AI; but obviously the curators' workload would not make this method feasible.
Dear Yan, thank you very much for your comments, which I agree with without exception. Now it's getting exciting for 1x: 1) 1x remains a "pure" photo gallery; then compliance with the rules should at least be randomly checked. 2) 1x opens up to the new technologies; then it should be recognizable for everyone whether an image is a pure photo (only basic editing), a composite (made exclusively from one's own photos), a completely AI-generated image, or a composite with AI-generated parts. I am very curious to see which path the management of 1x will take. For me, the AI-open path would be very exciting, as it would greatly expand the possibilities for realizing image concepts. Greetings, Udo
I understand that many people struggle with the technical parts of generative AI. Those not interested in this part can simply skip it. However, for readers interested in a deeper understanding of generative AI, this part provides some essential insights into why the technology can make such an impact. In Part 1, I read comments saying that if someone found out an image was made by AI and not a real photo, he or she would not buy it. Of course, this is a personal choice. But I think the point is this: as Adobe has decided to integrate more and more generative AI tools into Photoshop and its other apps, no doubt more and more photos will be processed using generative AI components already embedded in Photoshop, and you have no way to find out whether a photo you are viewing is a real photo or an AI-generated (wholly or partly) image. So how do you decide whether to buy an image (photo) you like? In the later part of this article, there is a topic I specifically discuss: ownership and content authenticity. I believe there will be a good way to resolve these issues, but from both technical and legal viewpoints it is not an easy goal to achieve. In summary, if you find this part too boring or too technical, just skip it. I think the following parts will be quite reader-friendly :) Thanks.
Dear Yan, I don't agree with part of your response. In my opinion, readers have the right to comment on your article, with positive or negative remarks or simply to open a discussion, and the author should take enough time to respond to each comment individually, not write "if it is too boring or too technical to read, just skip it". I also do not agree with your statement regarding Photoshop. Photoshop is photo-enhancing software; everybody who uses Photoshop should use only their own photos, purchased photos, or photos used with the author's knowledge. In my opinion, anyone who is not doing that is using stolen photos without the author's approval. I know that the whole photo world is slowly moving in this direction. Now I recognize that it was a very good idea to publish these articles here; they will open the eyes of 1x fellows and make them aware of this issue. Thank you for taking the time to create it, and many thanks to Yvette for publishing it. Have a nice weekend and a blessed Whit Monday.
Part 2 is also a very interesting article, but I'm not sure whether the scientific aspects of AI generators will find the right readership here in 1x Magazine. I have already commented on the topic of “photography will die because of AI” in Part 1 and had expected the author to comment on all the contributions before his next article was published. For me personally, the most important question would be how 1x will deal with images that have more or less AI-generated content in the future. I also expect clear statements on this from the management.
I have the same opinion, Udo. It is an interesting subject but, for me, a bit confusing, especially the link with 1x.
Thank you for this article, so rich in information, extremely interesting and provocative. We are witnessing a real revolution in the creation and manipulation of images and the development of a new form of expression, whose impact, in my opinion, has not yet been fully estimated in all its aspects, though its phenomenal scope is already predictable. As for photography, it will remain a form of documenting the world and expressing an artistic vision, for which there will be creators and consumers, just as there are for painting, sculpture, classical music, etc. The times we live in are challenging from every point of view, a fact that leaves its mark on those who want to express themselves artistically. Congratulations on this series of articles, and thanks for the effort, dear Yan!
An interesting article to read, but I must admit that I am completely confused. I don't understand what the author wants to tell us. Is it to be understood as an instruction on how to use AI for unrealistic photo production? If so, is 1x a proper platform for it? Unfortunately, the author is not responding to our comments, which is not very good. I agree with Collin's comment: AI will most probably have problems surviving without our photos. I wish you all a very nice weekend and a relaxing Pentecost Monday.
well said Miro
I also think that those who say photography is finished, and that we no longer need it, are wrong. Remember the 1830s, when photography was beginning: painters feared that photography would take away their art, painting. And painting still exists. Fortunately!
The technical side of AI is way over my head, but this is a fascinating article. The bit where the Amazon lady tells you photography is over? Don't they need our photos in the first place to generate from??? Loving these articles though, but I hope she is wrong.