Quan Dao

CS PhD student


I am a first-year PhD student at Rutgers University under the supervision of Distinguished Prof. Dimitris Metaxas. My research focuses on generative models, specifically diffusion models and visual autoregressive models, with a primary emphasis on fundamental research. For diffusion models, I concentrate on developing efficient and robust training methodologies. I am also deeply interested in consistency model training and distillation, particularly in improving robust training techniques and addressing the weaknesses inherent in consistency distillation. More recently, I have been researching visual autoregressive models, focusing on their applications to downstream tasks such as editing and text-to-3D generation. Previously, I was a Research Resident under the supervision of Dr. Tuan Anh Tran at VinAI Research, Vietnam, where I spent two wonderful years. I received a bachelor's degree in computer science from Monash University in 2020.


news

Mar 25, 2025 :star: I will join Apple MLR for a research internship, working on fundamental machine learning problems
Feb 28, 2025 :zap: DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models got accepted at CVPR 2025. This paper proposes an editing technique for discrete diffusion models
Jan 22, 2025 :zap: Improved Latent Consistency Model got accepted at ICLR 2025. This paper proposes a series of novel techniques, including Cauchy loss, OT coupling, an adaptive robust scale scheduler, and a diffusion loss at early timesteps, to efficiently train latent consistency models from scratch. Our techniques bridge the performance gap between LDM and LCM training. (This is the first work to discover the instability of consistency models in latent space due to impulsive outliers.)
Dec 10, 2024 :zap: SCFlow got accepted at AAAI 2025. This is the first work to distill a flow matching model into one- and few-step generation. With SCFlow, we achieve consistent one- and few-step generation: starting from the same noise, the final generated image is identical no matter how many NFEs are used for sampling.
Sep 23, 2024 :zap: Yummy DimSUM got accepted at NeurIPS 2024. DimSUM proposes a novel hybrid transformer-Mamba architecture that enables faster training convergence of diffusion/flow matching models and achieves SoTA image generation.
Jul 21, 2024 :zap: RDUOT got accepted at ECCV 2024. This paper combines the UOT generative framework with diffusion noising to train a fast-converging and robust generative framework.
Jul 13, 2023 :zap: Anti-DreamBooth got accepted at ICCV 2023. Anti-DreamBooth adds small, imperceptible noise to your images to prevent the malicious exploitation of DreamBooth on them.
Feb 26, 2023 :zap: My first paper, WaveDiff, got accepted at CVPR 2023. WaveDiff proposes a frequency-aware UNet architecture that enables fast training convergence for the DiffusionGAN framework.

selected publications

  1. sLCT.png
    Improved Training Technique for Latent Consistency Models
    Quan Dao*, Khanh Doan*, Di Liu, Trung Le, and Dimitris Metaxas
    In International Conference on Learning Representations, 2025
  2. dimsum.png
    DiMSUM: Diffusion Mamba - A Scalable and Unified Spatial-Frequency Method for Image Generation
    Hao Phung*, Quan Dao*, Trung Dao, Hoang Phan, Dimitris Metaxas, and Anh Tran
    In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
  3. SCflow.png
    Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
    Quan Dao*, Hao Phung*, Trung Dao, Dimitris Metaxas, and Anh Tran
    In Association for the Advancement of Artificial Intelligence, 2025
  4. rduot.png
    A High-Quality Robust Diffusion Framework for Corrupted Dataset
    Quan Dao*, Binh Ta*, Tung Pham, and Anh Tran
    In European Conference on Computer Vision, 2024
  5. flow.png
    Flow Matching in Latent Space
    Quan Dao*, Hao Phung*, Binh Nguyen, and Anh Tran
    arXiv preprint arXiv:2307.08698, 2023
  6. anti.png
    Anti-DreamBooth: Protecting users from personalized text-to-image synthesis
    Thanh Van Le*, Hao Phung*, Thuan Hoang Nguyen*, Quan Dao*, Ngoc Tran, and Anh Tran
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Oct 2023
  7. single_wavelet.png
    Wavelet Diffusion Models Are Fast and Scalable Image Generators
    Hao Phung*, Quan Dao*, and Anh Tran
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023