You are reading the documentation for the in-development version of VSGAN.

Interface#

This part of the documentation covers all the interfaces of VSGAN.

The initial clip provided to the Network is the base clip used by all further calls. Each time you run a model, it is applied to the base clip, and the result then replaces the base clip.

Once you have done all the calls you wish to do on the clip, get the final clip by taking the clip property of the Network object.
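The overwrite-and-chain behaviour described above can be illustrated with a toy stand-in. `ToyNetwork` is hypothetical and is not part of VSGAN; it only mimics the documented pattern in which `load` and `apply` return the object itself and `apply` replaces the stored clip. Strings stand in for clips so the sketch runs without VapourSynth or PyTorch.

```python
class ToyNetwork:
    """Hypothetical stand-in for a VSGAN architecture class (not the real API)."""

    def __init__(self, clip: str) -> None:
        self.clip = clip   # base clip; every apply() overwrites it
        self.state = None  # currently loaded "model"

    def load(self, state: str) -> "ToyNetwork":
        # The model state can be swapped at any point between apply() calls.
        self.state = state
        return self

    def apply(self) -> "ToyNetwork":
        if self.state is None:
            raise RuntimeError("load() a model before calling apply()")
        # Run the "model" on the base clip, then overwrite the base clip.
        self.clip = f"{self.state}({self.clip})"
        return self


# Chain two models, then read the final clip from the `clip` property.
result = ToyNetwork("source").load("model_a").apply().load("model_b").apply().clip
print(result)
```

Assuming the signatures documented below, the equivalent call on a real architecture class would look like `ESRGAN(clip).load("model.pth").apply().clip`, where the model path is a placeholder.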

Architectures#

class vsgan.archs.ESRGAN(clip: vapoursynth.VideoNode, device: str | int = 'cuda')#

ESRGAN - Enhanced Super-Resolution Generative Adversarial Networks. By Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy.

Supports the following iterative architectures:

ESRGAN (old/new): https://arxiv.org/abs/1809.00219
ESRGAN+: https://arxiv.org/abs/2001.08073
Real-ESRGAN (v1 only): https://arxiv.org/abs/2107.10833

Also supports ESRGAN-based models that only differ during training, e.g., A-ESRGAN and BSRGAN.

load(state: str) → ESRGAN#

Load an ESRGAN model state file and send to the PyTorch device. The model state can be changed at any point.

Supported Model Files:

- Must be a Generator model.
- ESRGAN (old and new)
- ESRGAN+
- Real-ESRGAN (v1 and v2)
- A-ESRGAN

Parameters:

state – Path to a supported PyTorch .pth Model state file.

apply(overlap: int = 16) → ESRGAN#

Apply the model on each frame of the clip.

Overlap should generally be a multiple of 16. The larger the input resolution, the larger the overlap may need to be. Avoid using an excessively large value.

Parameters:

overlap – Amount by which to overlap each tile, so as to hide seam artefacts.
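If an overlap value is derived from something else, such as a fraction of the frame size, it may be convenient to snap it to the next multiple of 16 before passing it to `apply`. The helper below is a hypothetical convenience for illustration, not a VSGAN function.

```python
def snap_overlap(value: int, multiple: int = 16) -> int:
    """Round a positive overlap up to the nearest multiple (16 by default)."""
    if value <= 0:
        raise ValueError("overlap must be positive")
    # Ceiling division, then scale back up to a multiple.
    return -(-value // multiple) * multiple


candidates = [snap_overlap(v) for v in (1, 16, 20, 33)]
print(candidates)
```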

class vsgan.archs.EGVSR(clip: vapoursynth.VideoNode, device: str | int = 'cuda')#

EGVSR - Efficient & Generic Video Super-Resolution. By Yanpeng Cao, Chengcheng Wang, Changjun Song, Yongming Tang, and He Li. https://arxiv.org/abs/2107.05307

load(state: str, scale: int = 4, in_nc: int = 3, out_nc: int = 3, nf: int = 64, nb: int = 16, degradation: Literal['BI', 'BD'] = 'BI') → EGVSR#

Load an EGVSR model state file and send to the PyTorch device. The model state can be changed at any point.

Parameters:

state – Path to a supported PyTorch Model file.
scale – Model Scale, the resulting scale relative to the input.
in_nc – Input number of channels.
out_nc – Output number of channels.
nf – Number of filters.
nb – Number of blocks.
degradation – Upsample Function.

apply(interval: int = 5) → EGVSR#

Apply the model on each frame of the clip.

Parameters:

interval – Number of frames ahead to run inference on. Must be greater than 0.

class vsgan.archs.SwinIR(clip: vapoursynth.VideoNode, device: str | int = 'cuda')#

SwinIR - Image Restoration Using Swin Transformer. By Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. https://arxiv.org/abs/2108.10257

load(state: str, img_size: int | tuple[int, ...] = 64, patch_size: int | tuple[int, ...] = 1, qkv_bias: bool = True, qk_scale: float | None = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.1, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>, ape: bool = False, patch_norm: bool = True, use_checkpoint: bool = False) → SwinIR#

Load a SwinIR model state file and send to the PyTorch device. The model state can be changed at any point.

Parameters:

state – Path to a supported PyTorch Model file.
img_size – Input image size.
patch_size – Patch size.
qkv_bias – If True, add a learnable bias to query, key, value.
qk_scale – Override default qk scale of head_dim ** -0.5 if set.
drop_rate – Dropout rate.
attn_drop_rate – Attention dropout rate.
drop_path_rate – Stochastic depth rate.
norm_layer – Normalization layer.
ape – If True, add absolute position embedding to the patch embedding.
patch_norm – If True, add normalization after patch embedding.
use_checkpoint – Whether to use checkpointing to save memory.

apply(overlap: int = 16) → SwinIR#

Apply the model on each frame of the clip.

Overlap should generally be a multiple of 16. The larger the input resolution, the larger the overlap may need to be. Avoid using an excessively large value.

Parameters:

overlap – Amount by which to overlap each tile, so as to hide seam artefacts.

class vsgan.archs.HAT(clip: vapoursynth.VideoNode, device: str | int = 'cuda')#

Hybrid Attention Transformer - Activating More Pixels in Image Super-Resolution Transformer. By Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, and Chao Dong. https://arxiv.org/abs/2205.04437

load(state: str, img_size: int | tuple[int, int] = 64, patch_size: int | tuple[int, ...] = 1, compress_ratio: int = 3, squeeze_factor: int = 30, conv_scale: float = 0.01, overlap_ratio: float = 0.5, qkv_bias: bool = True, qk_scale: float | None = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.1, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>, ape: bool = False, patch_norm: bool = True, use_checkpoint: bool = False, img_range: float = 1.0) → HAT#

Load a HAT model state file and send to the PyTorch device. The model state can be changed at any point.

Parameters:

state – Path to a supported PyTorch Model file.
img_size (int | tuple(int)) – Input image size. Default: 64
patch_size (int | tuple(int)) – Patch size. Default: 1
qkv_bias – If True, add a learnable bias to query, key, value. Default: True
qk_scale – Override default qk scale of head_dim ** -0.5 if set. Default: None
drop_rate – Dropout rate. Default: 0
attn_drop_rate – Attention dropout rate. Default: 0
drop_path_rate – Stochastic depth rate. Default: 0.1
norm_layer – Normalization layer. Default: nn.LayerNorm.
ape – If True, add absolute position embedding to the patch embedding. Default: False
patch_norm – If True, add normalization after patch embedding. Default: True
use_checkpoint – Whether to use checkpointing to save memory. Default: False
img_range – Image range. 1. or 255.

apply(overlap: int = 16) → HAT#

Apply the model on each frame of the clip.

Overlap should generally be a multiple of 16. The larger the input resolution, the larger the overlap may need to be. Avoid using an excessively large value.

Parameters:

overlap – Amount by which to overlap each tile, so as to hide seam artefacts.

Utilities#

vsgan.utilities.frame_to_tensor(f: vapoursynth.VideoFrame, as_f16=True) → Tensor#

Convert a VapourSynth VideoFrame into a PyTorch Tensor.

Parameters:

f – VapourSynth VideoFrame from a clip.
as_f16 – Convert to float16 in the [0, 1] range.
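To make the float16 [0, 1] range concrete, the sketch below normalises a single integer sample and rounds it to half precision using only the standard library. It assumes normalisation divides by the peak value of the bit depth, which is a common convention but is not confirmed here as VSGAN's exact behaviour; `normalise_sample` and `to_float16` are hypothetical names.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def normalise_sample(sample: int, bits: int = 8) -> float:
    """Map an integer sample of the given bit depth into [0, 1] as float16."""
    peak = (1 << bits) - 1  # 255 for 8-bit, 1023 for 10-bit (assumed divisor)
    return to_float16(sample / peak)


black = normalise_sample(0)
white = normalise_sample(255)
mid_grey = normalise_sample(128)
print(black, white, mid_grey)
```

Half precision carries roughly three significant decimal digits, so mid-range values are only approximately equal to their exact quotient.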

vsgan.utilities.get_frame_plane(f: vapoursynth.VideoFrame, n: int) → memoryview#

Get a VideoFrame’s Plane data as a MemoryView or a numpy array. Supports VS API 3 and 4.

Parameters:

f – VapourSynth VideoFrame from a clip.
n – Plane number.

vsgan.utilities.join_tiles(tiles: tuple[Tensor, ...], overlap: int) → Tensor#

Join Tiled PyTorch Tensor quadrants into one large PyTorch Tensor. Expects input PyTorch Tensor’s shapes to end in HW order.

Make sure the overlap value matches the tiles as they currently are. If the tiles have been super-resolved, pass the overlap after scaling, not the value used when tiling.

Parameters:

tiles – The PyTorch Tensor tiles you wish to rejoin.
overlap – The amount of overlap currently between tiles.
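The point about the overlap value can be checked with a small round trip on nested lists. This is a sketch under stated assumptions, not VSGAN's implementation: `split4`, `upscale_nearest`, and `join4` are hypothetical helpers, each quadrant is assumed to extend `overlap` samples past the midline, and nearest-neighbour upscaling stands in for a super-resolution model.

```python
def split4(img, overlap):
    """Split a 2D list (H x W) into four overlapping quadrants: TL, TR, BL, BR."""
    h, w = len(img), len(img[0])
    top, bottom = slice(0, h // 2 + overlap), slice(h // 2 - overlap, h)
    left, right = slice(0, w // 2 + overlap), slice(w // 2 - overlap, w)
    return tuple([row[c] for row in img[r]] for r in (top, bottom) for c in (left, right))


def upscale_nearest(img, scale):
    """Stand-in for a super-resolution model: nearest-neighbour upscaling."""
    return [[v for v in row for _ in range(scale)] for row in img for _ in range(scale)]


def join4(tiles, overlap):
    """Rejoin four quadrants, dropping `overlap` samples at each inner edge.

    Requires overlap > 0, because a slice ending at -0 would be empty.
    """
    tl, tr, bl, br = tiles
    top = [a[:-overlap] + b[overlap:] for a, b in zip(tl, tr)]
    bottom = [a[:-overlap] + b[overlap:] for a, b in zip(bl, br)]
    return top[:-overlap] + bottom[overlap:]


image = [[r * 8 + c for c in range(8)] for r in range(8)]
overlap, scale = 2, 2

tiles = split4(image, overlap)
upscaled_tiles = tuple(upscale_nearest(t, scale) for t in tiles)

# The overlap grew along with the tiles, so join with overlap * scale.
joined = join4(upscaled_tiles, overlap * scale)
expected = upscale_nearest(image, scale)
print(joined == expected)
```

Joining the upscaled tiles with the original, unscaled overlap would keep too many samples at the seams and produce a result wider and taller than the upscaled image.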

vsgan.utilities.tensor_to_clip(clip: vapoursynth.VideoNode, image: Tensor) → vapoursynth.VideoNode#

Convert a PyTorch Tensor into a VapourSynth VideoNode (clip).

Expects the Tensor’s shape to be in CHW order.

Parameters:

clip – Used to inherit expected return properties only.
image – PyTorch Tensor.

vsgan.utilities.tensor_to_frame(f: vapoursynth.VideoFrame, t: Tensor) → vapoursynth.VideoFrame#

Copies each channel from a Tensor into a VapourSynth VideoFrame. Supports any depth and format, and will return in the same format.

It expects the tensor array to have the channel dimension (C) first in the shape, e.g., CHW or CWH.

Parameters:

f – VapourSynth frame to store retrieved planes.
t – PyTorch Tensor array to retrieve planes from.

vsgan.utilities.tile_tensor(t: Tensor, overlap: int = 16) → tuple[Tensor, ...]#

Tile PyTorch Tensor into 4 quadrants with an overlap between tiles. Expects input PyTorch Tensor’s shape to end in HW order.
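As a rough guide to how large each quadrant ends up, the sketch below computes tile shapes from the frame size and overlap. It assumes each quadrant covers half the frame plus `overlap` samples past the midline; that scheme and the name `quadrant_shapes` are assumptions for illustration, not taken from VSGAN's source.

```python
def quadrant_shapes(h: int, w: int, overlap: int = 16):
    """Return the (height, width) of the four quadrants: TL, TR, BL, BR."""
    top, bottom = h // 2 + overlap, h - (h // 2 - overlap)
    left, right = w // 2 + overlap, w - (w // 2 - overlap)
    return ((top, left), (top, right), (bottom, left), (bottom, right))


# A 1080p frame split with the default overlap of 16.
shapes = quadrant_shapes(1080, 1920, overlap=16)
print(shapes)
```

With odd dimensions the bottom and right quadrants come out one sample larger than the top and left ones under this scheme.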

vsgan.utilities.tile_tensor_r(t: Tensor, model: Module, overlap: int = 16, max_depth: int = None, current_depth: int = 1) → tuple[Tensor, int]#

Recursively tile a PyTorch Tensor until the device has enough VRAM. It will try to tile as little as possible, and won’t tile unless needed. Expects input PyTorch Tensor’s shape to end in HW order.
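The recursive strategy can be sketched on tile sizes alone. In the simulation below, a hypothetical pixel budget (`max_pixels`) replaces the real VRAM check, and the function returns only the depth at which a tile first fits. The name `tiling_depth` and the quadrant sizing are assumptions for illustration, not VSGAN's implementation.

```python
def tiling_depth(h, w, max_pixels, overlap=16, max_depth=None, depth=1):
    """Return the recursion depth at which an h x w tile fits the pixel budget."""
    if h * w <= max_pixels:
        return depth  # fits as-is, so no (further) tiling is needed
    if max_depth is not None and depth >= max_depth:
        raise MemoryError(f"tile of {h}x{w} is still too large at depth {depth}")
    # Each quadrant is half the tile plus the overlap, never larger than the tile.
    next_h, next_w = min(h, h // 2 + overlap), min(w, w // 2 + overlap)
    if (next_h, next_w) == (h, w):
        raise MemoryError("tiles cannot shrink any further with this overlap")
    return tiling_depth(next_h, next_w, max_pixels, overlap, max_depth, depth + 1)


no_tiling = tiling_depth(1080, 1920, max_pixels=3_000_000)
one_split = tiling_depth(1080, 1920, max_pixels=600_000)
two_splits = tiling_depth(1080, 1920, max_pixels=200_000)
print(no_tiling, one_split, two_splits)
```

Because of the overlap, tiles shrink by less than half per level and stop shrinking near twice the overlap, which is one reason an excessively large overlap is discouraged.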

vsgan.utilities.window_partition(x: Tensor, window_size: int) → Tensor#

Chunk a Tensor into batches of windows/chunks of size window_size.

Parameters:

x – Tensor in the shape (b, h, w, c)
window_size – Window size

Returns Tensor in the shape (num_windows*b, window_size, window_size, c)

vsgan.utilities.window_reverse(windows: Tensor, window_size: int, h: int, w: int) → Tensor#

Stitch back together batches of Windows/Chunks.

Parameters:

windows – Tensor in the shape (num_windows*b, window_size, window_size, c)
window_size – Window size
h – Height of image
w – Width of image

Returns stitched tensor in the shape (b, h, w, c)
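The shapes documented for `window_partition` and `window_reverse` can be illustrated with nested lists in place of tensors. The functions below are hypothetical pure-Python analogues (`partition_windows`, `reverse_windows`), assuming `h` and `w` are divisible by the window size and that windows are ordered image by image, row of windows by row of windows.

```python
def partition_windows(x, window_size):
    """Split nested lists of shape (b, h, w, c) into (num_windows*b, ws, ws, c)."""
    windows = []
    for image in x:  # iterate over the batch
        h, w = len(image), len(image[0])
        for top in range(0, h, window_size):
            for left in range(0, w, window_size):
                rows = image[top:top + window_size]
                windows.append([row[left:left + window_size] for row in rows])
    return windows


def reverse_windows(windows, window_size, h, w):
    """Stitch windows of shape (num_windows*b, ws, ws, c) back into (b, h, w, c)."""
    cols = w // window_size
    per_image = (h // window_size) * cols
    images = []
    for start in range(0, len(windows), per_image):
        image = [[None] * w for _ in range(h)]
        for i, win in enumerate(windows[start:start + per_image]):
            top, left = (i // cols) * window_size, (i % cols) * window_size
            for r, row in enumerate(win):
                image[top + r][left:left + window_size] = row
        images.append(image)
    return images


# Batch of 2 images, 4 x 4, one channel; each value encodes (batch, row, column).
x = [[[[b * 100 + r * 10 + col] for col in range(4)] for r in range(4)] for b in range(2)]
windows = partition_windows(x, 2)
restored = reverse_windows(windows, 2, h=4, w=4)
print(len(windows), restored == x)
```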