The Technology Behind AI Image Generators

In the past few years, artificial intelligence (AI) has made great progress, and one of its new products is the AI image generator: a tool that converts an input text prompt into an image. There are many AI tools for text-to-image conversion, but the most prominent are DALL-E 2, Stable Diffusion and Midjourney.

DALL-E 2 is developed by OpenAI, the same company behind ChatGPT. It generates images from a text description. Its predecessor, the original DALL-E, was a 12-billion-parameter version of the GPT-3 Transformer model trained to interpret natural language input and generate corresponding images; DALL-E 2 instead combines CLIP embeddings with diffusion models.

DALL-E 2 consists mainly of two parts: one that converts the user’s input into a representation of an image (called the Prior), and one that converts this representation into an actual image (called the Decoder).

Both the text and the images are embedded using another network called CLIP (Contrastive Language-Image Pre-training), which was also developed by OpenAI. CLIP is a neural network that, given an image and a set of candidate captions, returns the caption that best matches the image. In that sense it works in the opposite direction to DALL-E 2: CLIP goes from images to text, while DALL-E 2 goes from text to images. The purpose of introducing CLIP is to learn the connection between the visual and textual representations of objects.
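The caption-matching idea can be illustrated with a minimal sketch. It assumes the image and each caption have already been turned into embedding vectors; the three-dimensional vectors and the helper names `cosine_similarity` and `best_caption` are made up for illustration and are not part of CLIP itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_caption(image_embedding, caption_embeddings):
    """Return the caption whose embedding is closest to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(
            image_embedding, caption_embeddings[caption]
        ),
    )

# Made-up toy embeddings; real CLIP embeddings have hundreds of dimensions
# and come from trained image and text encoders.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
    "a bowl of soup": [0.2, 0.1, 0.9],
}

print(best_caption(image_embedding, caption_embeddings))  # a photo of a dog
```

Because both kinds of input live in the same embedding space, comparing an image with text reduces to comparing two vectors.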

Training DALL-E 2 means training two models. The first is the Prior, which accepts a text caption and creates a CLIP image embedding. The second is the Decoder, which accepts a CLIP image embedding and generates an image. After training is complete, the inference process is as follows:

  • The input text is converted into a CLIP text embedding by a neural network.

  • Principal component analysis (PCA) is used to reduce the dimensionality of the embedding.

  • The Prior creates an image embedding from the text embedding.

  • In the Decoder step, a diffusion model turns the image embedding into an image.

  • The image is upscaled from 64×64 to 256×256, and finally to 1024×1024, by convolutional neural networks.
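The flow of data through these steps can be sketched as follows. This is only an outline of the order of operations under loud assumptions: every function here (`encode_text`, `prior`, `decoder`, `upsample`, `generate`) is a hypothetical stand-in that produces placeholder numbers, not a trained model, the PCA step is omitted, and nearest-neighbour enlargement takes the place of the learned upsampling networks.

```python
import random

EMBED_DIM = 8  # toy embedding size, chosen only for illustration

def encode_text(prompt):
    """Stand-in for the CLIP text encoder: a deterministic pseudo-embedding."""
    rng = random.Random(prompt)
    return [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]

def prior(text_embedding):
    """Stand-in for the Prior: maps a text embedding to an image embedding."""
    return [0.5 * value for value in text_embedding]

def decoder(image_embedding, size=64):
    """Stand-in for the diffusion Decoder: returns a size x size grid of values."""
    rng = random.Random(sum(image_embedding))
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def upsample(image, factor):
    """Nearest-neighbour enlargement; the real upsamplers are learned networks."""
    enlarged = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged

def generate(prompt):
    """Run the stages in order: text -> embeddings -> 64x64 -> 256x256 -> 1024x1024."""
    text_embedding = encode_text(prompt)
    image_embedding = prior(text_embedding)
    image_64 = decoder(image_embedding)
    image_256 = upsample(image_64, 4)
    image_1024 = upsample(image_256, 4)
    return image_64, image_256, image_1024

image_64, image_256, image_1024 = generate("a cat wearing a hat")
print(len(image_64), len(image_256), len(image_1024))  # 64 256 1024
```

The point of the sketch is the hand-off between stages: the Decoder never sees the text directly, only the image embedding produced by the Prior.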

Stable Diffusion is a text-to-image model that uses the CLIP ViT-L/14 text encoder to condition the model on text prompts. At runtime it treats image generation as a "diffusion" process: it starts from pure noise and gradually refines the image, removing noise step by step until none remains, so that the result moves steadily closer to the provided text description.

Stable Diffusion is based on the Latent Diffusion Model (LDM), a leading text-to-image synthesis technique. Before looking at how LDM works, let’s look at what a diffusion model is and why we need LDM.

A diffusion model (DM) is a generative model that takes a piece of data (such as an image) and gradually adds noise to it over time until the data can no longer be recognized. The model then tries to restore the image to its original form, and in the process it learns how to generate images or other data.
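The noising half of this process can be written down in a few lines. The sketch below assumes the commonly used formulation in which the noisy sample at step t is sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise; the linear schedule, its start and end values, the 1000 steps and all function names are assumptions chosen for illustration and do not come from the text above.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise amounts that grow linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta) over steps 0..t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

pixels = [0.2, -0.5, 0.9, 0.0]  # a made-up four-pixel "image"
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)
almost_pure_noise = add_noise(pixels, 999, alpha_bars, rng)

# The share of the original signal shrinks toward zero as t grows.
print(math.sqrt(alpha_bars[10]), math.sqrt(alpha_bars[999]))
```

Training then amounts to teaching a network to predict the noise that was added at a given step, so that the steps can be run in reverse starting from pure noise.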

The problem with DMs is that powerful ones often consume a lot of GPU resources, and inference is quite expensive because it requires many sequential evaluations. To train DMs on limited computing resources without sacrificing quality or flexibility, Stable Diffusion applies the DM in the latent space of a powerful pre-trained autoencoder.
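A quick calculation shows why working in latent space is cheaper. The figures below are assumptions that are not stated in the text above: a 512×512 RGB image, an autoencoder that downsamples each side by a factor of 8, and a latent with 4 channels, which is the configuration generally reported for Stable Diffusion v1.

```python
def element_count(height, width, channels):
    """Number of values in a height x width x channels array."""
    return height * width * channels

image_height, image_width, image_channels = 512, 512, 3
downsample_factor = 8   # assumed autoencoder downsampling per side
latent_channels = 4     # assumed number of latent channels

pixel_elements = element_count(image_height, image_width, image_channels)
latent_elements = element_count(
    image_height // downsample_factor,
    image_width // downsample_factor,
    latent_channels,
)
compression_ratio = pixel_elements / latent_elements

print(pixel_elements, latent_elements, compression_ratio)  # 786432 16384 48.0
```

Under these assumptions, every denoising step operates on 48 times fewer values than it would in pixel space, and the autoencoder’s decoder is only run once at the end to turn the final latent back into an image.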

Training the diffusion model on this representation makes it possible to reach a near-optimal balance between reducing complexity and preserving detail, and significantly improves visual fidelity. Cross-attention layers are introduced into the model architecture, which turns the diffusion model into a powerful and flexible generator for conditioning inputs such as text and enables high-resolution image synthesis in a convolutional manner.
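Cross-attention is the mechanism that lets each position in the image representation look at the text. The following is a minimal sketch of scaled dot-product cross-attention with made-up toy vectors; in a real model the queries, keys and values come from learned projections, which are omitted here, and the function names are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each query (image position) takes a weighted average of the values (text tokens)."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for query in queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy data: 2 image positions attend over 3 text tokens, all of dimension 2.
image_queries = [[1.0, 0.0], [0.0, 1.0]]
text_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = cross_attention(image_queries, text_keys, text_values)
for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

The queries come from the image side and the keys and values from the text side, which is what distinguishes cross-attention from self-attention and what allows the prompt to steer each denoising step.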

Midjourney is also an AI-driven tool that generates images from the user’s prompts. Midjourney is good at adopting established artistic styles and creating images with whatever combination of effects the user wants. It excels at atmospheric scenes, especially fantasy and science fiction, which often look like video game concept art.

DALL-E 2 is trained on millions of images, and its output is more polished, which makes it well suited to enterprise use. When a scene contains more than two characters, DALL-E 2 produces much better images than Midjourney or Stable Diffusion.

Midjourney is known for its artistic style. It uses a Discord bot to send requests to and receive results from its AI servers, and almost everything happens on Discord. The resulting images rarely look like photos; they look more like paintings.

Stable Diffusion is an open-source model that anyone can use. It has a good understanding of contemporary art imagery and can produce highly detailed artwork. However, it needs complex prompts to be spelled out carefully. Stable Diffusion is better suited to generating complex and creative illustrations, and it has some shortcomings when creating general-purpose images.

What is the difference between a steam room and a sauna?

I can never tell the difference between sweat steaming and a sauna, and to me the two seem identical. So what is the difference between them, and which is better?
First, the principle is different
Sweat steaming mainly uses the energy field formed by the negative ions, far-infrared rays and micro-currents released by tourmaline to act on the human body and stimulate the body to generate heat from within, while a sauna uses high-temperature water vapor that acts on the skin from the outside and conducts heat into the body.

Second, the temperature is different
The sweat steaming temperature is around 42-45 degrees Celsius, and the recommended session length is 40-60 minutes. People do not feel suffocated while sweating, and breathing stays smooth. The sauna temperature can reach 60-70 degrees Celsius, so a person can only stay for 5-10 minutes; staying longer causes chest tightness and shortness of breath.

Third, the perspiration effect is different
The sweat produced by sweat steaming has no odor and leaves the skin smooth, which is said to benefit the complexion, the figure and the management of illness. However, showering within 6 hours of sweat steaming is not recommended, as it reduces the health benefits of the session. Sauna sweat, by contrast, is sticky and has an odor, so a bath is needed after a sauna to remove the smell.

Fourth, the method of operation is different
Sweat steaming relies on physical factors such as far-infrared rays, negative ions and trace elements to open the pores under warm conditions, discharge toxic substances from the body and so achieve its health-care effect, while a sauna uses steam to heat the deeper layers of the skin at high temperature and then expels toxins through the pores that the heat has enlarged.