tl;dr We combine the efficiancy of convolutional approaches with the expressivity of transformers by introducing a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results