3D style transfer refers to the artistic stylization of 3D assets based on reference style images. Recently, 3DGS-based stylization methods have drawn considerable attention, primarily due to their markedly faster training and rendering. However, a key challenge in 3D style transfer is striking a balance between the scene content and the patterns and colors of the style. Although existing methods strive for relatively balanced outcomes, their fixed-output paradigm struggles to adapt to the diverse content-style balance requirements of different users. In this work, we introduce an intensity-tunable 3D style transfer paradigm, dubbed Tune-Your-Style, which allows users to flexibly adjust the style intensity injected into the scene to match their desired content-style balance, thus enhancing the customizability of 3D style transfer. To achieve this goal, we first introduce Gaussian neurons to explicitly model the style intensity and parameterize a learnable style tuner, enabling intensity-tunable style injection. To facilitate the learning of tunable stylization, we further propose tunable stylization guidance, which obtains multi-view consistent stylized views from diffusion models through cross-view style alignment, and then employs a two-stage optimization strategy to provide stable and efficient guidance by modulating the balance between full-style guidance from the stylized views and zero-style guidance from the initial rendering. Extensive experiments demonstrate that our method not only delivers visually appealing results, but also offers flexible customizability for 3D style transfer.
Our method comprises two pivotal components, namely Intensity-tunable Style Injection (ISI) and Tunable Stylization Guidance (TSG). ISI introduces Gaussian neurons to explicitly model style intensity and parameterizes a learnable style tuner, enabling users to flexibly adjust the style intensity injected into the scene. To facilitate the learning of the style intensity and the tuner, TSG first employs a diffusion model to perform style transfer on rendered views and obtains multi-view consistent stylized results through cross-view style alignment. It then adopts a two-stage optimization strategy to achieve stable and efficient tunable stylization guidance, combining full-style guidance from the stylized results with zero-style guidance from the initial rendering.
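To make the two components concrete, the following is a minimal PyTorch sketch of how they might fit together. The class and function names (GaussianStyleTuner, tunable_guidance_loss), the RBF-style parameterization of the Gaussian neurons, the sigmoid gating, and the MSE form of the guidance terms are illustrative assumptions rather than the paper's exact formulation; the sketch only conveys the idea of intensity-gated style injection and of interpolating between full-style and zero-style targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianStyleTuner(nn.Module):
    """Illustrative style tuner: K Gaussian (RBF) neurons over the user-chosen
    intensity t in [0, 1] produce a per-channel gate for style injection."""

    def __init__(self, num_neurons: int = 8, feat_dim: int = 256):
        super().__init__()
        # Learnable centers and widths of the Gaussian neurons on the intensity axis.
        self.mu = nn.Parameter(torch.linspace(0.0, 1.0, num_neurons))
        self.log_sigma = nn.Parameter(torch.zeros(num_neurons))
        # Maps neuron activations to a per-channel style gate (assumed design choice).
        self.to_gate = nn.Linear(num_neurons, feat_dim)

    def forward(self, content_feat, style_feat, intensity):
        # content_feat, style_feat: (B, C); intensity: scalar or (B,) in [0, 1].
        t = torch.as_tensor(intensity, dtype=self.mu.dtype,
                            device=self.mu.device).view(-1, 1)          # (B, 1)
        act = torch.exp(-0.5 * ((t - self.mu) / self.log_sigma.exp()) ** 2)  # (B, K)
        gate = torch.sigmoid(self.to_gate(act))                           # (B, C)
        # Intensity-tunable style injection: gate the blend of content and style features.
        return (1.0 - gate) * content_feat + gate * style_feat


def tunable_guidance_loss(render, stylized_view, initial_view, intensity):
    """Illustrative tunable stylization guidance: modulate full-style guidance
    (towards the multi-view consistent stylized view) against zero-style
    guidance (towards the initial photorealistic rendering)."""
    full_style = F.mse_loss(render, stylized_view)   # pull towards the stylized target
    zero_style = F.mse_loss(render, initial_view)    # pull towards the initial rendering
    return intensity * full_style + (1.0 - intensity) * zero_style
```

In this sketch, the same intensity value drives both pieces: it gates how much style feature is injected and weights how strongly the rendered view is pulled towards the stylized target versus the initial rendering, which mirrors the intended behavior of ISI and TSG at a high level.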