Freepik has released F Lite, an open-source text-to-image model with 10 billion parameters. The model is built on a diffusion transformer architecture and is designed to generate high-quality images efficiently. It was trained on 80 million copyright-safe images from Freepik’s own collection, using 64 H100 GPUs over two months. Freepik presents this as evidence that a model of this size can be trained on a mid-sized compute budget rather than on the very largest clusters.
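To put the compute figures in perspective, here is a rough back-of-the-envelope calculation. It assumes "two months" means 60 days and that all 64 GPUs ran continuously for the whole period; neither detail is stated in the announcement, so treat the result as an order-of-magnitude estimate only. The function name `gpu_hours` is purely illustrative.

```python
# Rough training budget derived from the figures quoted in the article.
NUM_GPUS = 64        # H100 GPUs, as reported
TRAINING_DAYS = 60   # assumption: "two months" taken as 60 days
HOURS_PER_DAY = 24


def gpu_hours(num_gpus: int, days: int) -> int:
    """Total GPU-hours if every GPU runs for the whole period (an assumption)."""
    return num_gpus * days * HOURS_PER_DAY


total = gpu_hours(NUM_GPUS, TRAINING_DAYS)
print(f"{total:,} GPU-hours")  # prints "92,160 GPU-hours"
```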
F Lite’s architecture builds on the diffusion transformer framework. It uses cross-attention conditioning to feed the text prompt into the image generation process, which helps the generated images follow the prompt more closely. One notable finding is that conditioning on intermediate layers of the T5-XXL text encoder, rather than relying solely on its final layer, made training more efficient. The model also incorporates residual value connections and learnable register tokens, which are intended to improve performance without a significant increase in compute.
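The idea behind learnable register tokens can be illustrated with a small, dependency-free sketch. It assumes the common design in which a few extra vectors are prepended to the image token sequence, take part in attention as scratch space, and are discarded before decoding. F Lite's actual implementation may differ, and every name here (`make_register_tokens`, `toy_transformer_block`, and so on) is hypothetical. The "transformer block" is only a stand-in that mixes each token with the sequence mean.

```python
import random


def make_register_tokens(num_registers, dim, seed=0):
    """Create register vectors; in a real model these would be learned parameters."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_registers)]


def add_registers(image_tokens, registers):
    """Prepend register tokens so later layers can attend to them."""
    return registers + image_tokens


def strip_registers(tokens, num_registers):
    """Drop the register positions; only image tokens are passed on for decoding."""
    return tokens[num_registers:]


def toy_transformer_block(tokens):
    """Stand-in for an attention block: blends every token with the sequence mean."""
    dim = len(tokens[0])
    mean = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]
    return [[0.5 * tok[i] + 0.5 * mean[i] for i in range(dim)] for tok in tokens]


image_tokens = [[float(i)] * 4 for i in range(6)]     # 6 patch tokens of width 4
registers = make_register_tokens(num_registers=2, dim=4)

tokens = add_registers(image_tokens, registers)       # sequence length 6 + 2 = 8
tokens = toy_transformer_block(tokens)
output = strip_registers(tokens, num_registers=2)     # back to 6 image tokens
print(len(tokens), len(output))                       # prints "8 6"
```

The registers lengthen the sequence only slightly (here from 6 to 8 tokens), which is why this kind of addition is usually described as cheap relative to the rest of the model.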
F Lite was trained in multiple stages. Training began on low-resolution images and then moved to higher-resolution, more complex datasets, so that the model learns overall composition first and fine detail later. To speed up training further, F Lite uses techniques such as resolution-aware timestep sampling and sequence dropout, which help the model converge faster and produce better results.
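The two training techniques named here can be sketched in a few lines. This is an illustration under stated assumptions, not Freepik's code. It assumes timesteps lie in (0, 1) with 1 meaning pure noise, and that the resolution-aware part is a timestep shift of the form t' = s·t / (1 + (s − 1)·t) with s = sqrt(num_tokens / base_tokens), a form used by some rectified-flow models. Sequence dropout is assumed to mean randomly discarding a fraction of image tokens during training. `BASE_TOKENS` and all function names are hypothetical.

```python
import math
import random

BASE_TOKENS = 256  # assumption: token count of the lowest-resolution stage


def shift_timestep(t, num_tokens, base_tokens=BASE_TOKENS):
    """Move a timestep in (0, 1) toward higher noise as the token count grows."""
    s = math.sqrt(num_tokens / base_tokens)
    return s * t / (1.0 + (s - 1.0) * t)


def sample_timestep(num_tokens, rng):
    """Draw a logit-normal timestep, then apply the resolution-dependent shift."""
    t = 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))  # sigmoid of a normal sample
    return shift_timestep(t, num_tokens)


def sequence_dropout(tokens, drop_rate, rng):
    """Keep a random subset of tokens, in original order, to shorten the sequence."""
    keep = max(1, round(len(tokens) * (1.0 - drop_rate)))
    kept_indices = sorted(rng.sample(range(len(tokens)), keep))
    return [tokens[i] for i in kept_indices]


rng = random.Random(0)
low_res = [sample_timestep(256, rng) for _ in range(10_000)]
high_res = [sample_timestep(4096, rng) for _ in range(10_000)]
# The high-resolution average is larger: more samples land at noisier timesteps.
print(sum(low_res) / len(low_res), sum(high_res) / len(high_res))

patches = list(range(16))
print(len(sequence_dropout(patches, drop_rate=0.25, rng=rng)))  # prints "12"
```

Under these assumptions, a timestep of 0.5 stays at 0.5 for 256 tokens but moves to 0.8 for 4096 tokens, so larger images spend more of their training at high noise levels. Dropping a quarter of the tokens shortens each sequence, and with it the cost of every training step.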
While F Lite generates diverse, high-quality images, it still struggles with photorealism, particularly detailed textures and anatomical accuracy. Its design and architecture show potential, however, and further training could address these weaknesses. F Lite also uses supervised fine-tuning and reinforcement learning from human feedback (RLHF), which refine the model’s outputs based on human feedback.
F Lite’s open-source release is a notable step for AI-driven image generation. It offers a scalable and efficient option for anyone who wants to create high-quality images, and it is a useful tool for researchers and developers in the field. By releasing the model as open source, Freepik hopes to encourage collaboration and innovation in the creative industries.