The Best Open Source AI Image Model in 2024: FLUX.1
Black Forest Labs, a company whose team contributed to the creation of the original Stable Diffusion, has released a new model called FLUX.1. It is a 12 billion parameter rectified flow transformer. Judging by discussions on Reddit and Hugging Face, the community is very excited about it. So are we: we think it is the best open-source AI image model currently available. Let’s take a look at what it offers.
3 Model Variants
FLUX.1 comes in three variants, each with a different license:
- FLUX.1[pro]: The flagship variant with the best performance. It’s closed source and available via an API: FLUX.1[pro] API.
- FLUX.1[dev]: An open-weight model distilled from FLUX.1[pro], with similar performance. It’s licensed for non-commercial use only. You can download it here: FLUX.1[dev].
- FLUX.1[schnell]: The fastest variant, and a surprisingly capable one. It’s open-source (Apache 2.0) and can be used commercially. You can download it here: FLUX.1[schnell].
This post focuses on FLUX.1[schnell] because it is open-source and can be used commercially. That said, the other two variants perform slightly better.
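If you would rather run FLUX.1[schnell] yourself, here is a minimal sketch using Hugging Face's diffusers library. It assumes `torch`, `accelerate` and a recent `diffusers` release that includes `FluxPipeline` are installed, and that the `black-forest-labs/FLUX.1-schnell` weights can be downloaded. The sampling settings mirror the example on the model card (no guidance, 4 steps, a 256-token T5 limit); the `generate` helper and the `SCHNELL_SETTINGS` name are our own, not part of diffusers.

```python
from typing import Any, Dict

# Sampling settings mirroring the FLUX.1[schnell] model card example.
# schnell was distilled to produce images in 1 to 4 steps, and the card's
# example runs it without classifier-free guidance (guidance_scale=0.0).
SCHNELL_SETTINGS: Dict[str, Any] = {
    "guidance_scale": 0.0,
    "num_inference_steps": 4,
    "max_sequence_length": 256,  # token limit passed to the T5 text encoder
}


def generate(prompt: str, seed: int = 0, out_path: str = "flux-schnell.png") -> None:
    """Hypothetical helper: generate one image with FLUX.1[schnell] and save it."""
    # Imported lazily so this module can be imported without torch/diffusers.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    # Keep idle components in system RAM to lower peak VRAM (slower, needs accelerate).
    pipe.enable_model_cpu_offload()

    image = pipe(
        prompt,
        generator=torch.Generator("cpu").manual_seed(seed),  # fixed seed for reproducibility
        **SCHNELL_SETTINGS,
    ).images[0]
    image.save(out_path)
```

Calling `generate("A cat holding a sign that says hello world")` writes the result to `flux-schnell.png`. CPU offloading trades speed for memory; on a GPU with enough VRAM you could instead move the whole pipeline over with `pipe.to("cuda")`.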
Text in Images
FLUX.1 can incorporate text into images, and it does so very well. In our tests, it generated images with text more accurately and in fewer tries than Stable Diffusion 3 Medium. Here are a couple of examples:
Text Example 1
Text Example 2
Better Prompt Adherence
FLUX.1 follows prompts exceptionally well, in large part because it pairs a CLIP text encoder with a much larger T5 text encoder. It also accepts longer prompts, which comes in handy when you are describing a complicated scene or going for a very specific style.
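One way to see why longer prompts work is to count how much of a prompt each text encoder would truncate. The sketch below assumes a 77-token context for CLIP and the `max_sequence_length=256` that the FLUX.1[schnell] model card passes to T5; both limits are assumptions of this example. Tokenizers are passed in as plain callables, and `whitespace_ids` is a hypothetical stand-in that counts words, not real tokens.

```python
from typing import Callable, Dict, List

# Assumed token limits: CLIP text encoders use a 77-token context, and the
# FLUX.1[schnell] model card passes max_sequence_length=256 to the T5 encoder.
TOKEN_LIMITS: Dict[str, int] = {"clip": 77, "t5": 256}

Tokenizer = Callable[[str], List[int]]


def dropped_tokens(prompt: str, tokenizers: Dict[str, Tokenizer]) -> Dict[str, int]:
    """For each encoder, count how many tokens of `prompt` exceed its limit."""
    report: Dict[str, int] = {}
    for name, tokenize in tokenizers.items():
        n_tokens = len(tokenize(prompt))
        report[name] = max(0, n_tokens - TOKEN_LIMITS[name])
    return report


def whitespace_ids(prompt: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one fake id per whitespace-separated word."""
    return list(range(len(prompt.split())))


long_prompt = " ".join(["detailed"] * 100)
print(dropped_tokens(long_prompt, {"clip": whitespace_ids, "t5": whitespace_ids}))
# {'clip': 23, 't5': 0}
```

With real tokenizers the counts will differ, since both tokenizers split words into subword pieces and add special tokens. Assuming the diffusers-format repository layout, they could be loaded with transformers' `AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="tokenizer")` (CLIP) and `subfolder="tokenizer_2"` (T5), then wrapped as `lambda p: tok(p).input_ids`.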
Adherence Example 1
Adherence Example 2
The Aesthetic
Some models adhere to your prompt well but produce results that are not very aesthetically pleasing. As far as we are concerned, FLUX.1 is definitely not one of them. We like its general aesthetic a lot: it handles both realistic and artistic images, and it has a good understanding of color and light. Here are a few examples:
Example 1
Example 2
Example 3
The VRAM Problem
Because of its larger text encoders and 12 billion parameters, FLUX.1 requires more VRAM than some other models (although its requirements are similar to those of Stable Diffusion 3 Medium). You will need a beefier GPU to run it: even with 8-bit quantized encoders, it barely fits on flagship consumer GPUs with 24GB of VRAM, and generating two or more 1024x1024 images at a time may still cause problems.
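A quick back-of-the-envelope calculation shows why 24GB is tight. Holding a model's weights takes roughly the parameter count times the bytes per parameter at the chosen precision. The sketch below applies that to the 12 billion parameter transformer only; it deliberately leaves out activations, the text encoders, the VAE and framework overhead, so real usage is higher.

```python
# Rough, weights-only memory estimate for the 12B-parameter FLUX.1 transformer.
# Activations, text encoders, the VAE and framework overhead are NOT included.

FLUX_TRANSFORMER_PARAMS = 12e9  # "12 billion parameter" figure from the text
GIB = 1024 ** 3

# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weights_gib(num_params: float, precision: str) -> float:
    """Memory needed just to hold `num_params` weights at `precision`, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / GIB


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weights_gib(FLUX_TRANSFORMER_PARAMS, precision):.1f} GiB")
# fp32: 44.7 GiB
# bf16: 22.4 GiB
# fp8: 11.2 GiB
```

At bf16 the transformer weights alone come to about 22.4 GiB, which leaves very little headroom on a 24GB card before anything else is loaded. That is why 8-bit quantization or offloading parts of the pipeline to system RAM makes such a difference.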
Conclusion
We like this one a lot. It has state-of-the-art prompt adherence, it can render text, and it’s open-source. We also like its general aesthetic significantly better than that of our current default model, Stable Diffusion 3 Medium. So we are making FLUX.1 the default model on Stablecog going forward.
Give It A Try
You can try FLUX.1 on Stablecog right now and see what it can do.