tednet - A Toolkit for Tensor Decomposition Networks

tednet is a toolkit for tensor decomposition networks, i.e., neural networks whose layers are factorized with tensor decompositions such as CANDECOMP/PARAFAC, Tucker2, Tensor Train, and Tensor Ring. To make research on such models convenient, tednet provides a set of tools for building and handling tensorial networks.

Installation

tednet can be conveniently installed with pip:

pip install tednet

To upgrade to the latest version, run

pip install tednet --upgrade

Quick Start

This section gives a brief overview of tednet as a quick start.

Operation

tednet supports a number of basic operations, which are convenient to use.

[1]:
import tednet as tdt

Create a matrix whose diagonal elements are ones

[2]:
diag_matrix = tdt.eye(5, 5)
print(diag_matrix)
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])

Convert a PyTorch tensor to a NumPy ndarray

[3]:
print(type(diag_matrix))
<class 'torch.Tensor'>
[4]:
diag_matrix = tdt.to_numpy(diag_matrix)
[5]:
print(type(diag_matrix))
<class 'numpy.ndarray'>

Convert a NumPy ndarray to a PyTorch tensor

[6]:
diag_matrix = tdt.to_tensor(diag_matrix)
[7]:
print(type(diag_matrix))
<class 'torch.Tensor'>

Tensor Decomposition Networks (Tensor Ring as an Example)

To use Tensor Ring decomposition models, simply import the tensor ring module.

[8]:
import tednet.tnn.tensor_ring as tr

Here is an example of building TR-LeNet5.

[9]:
# Define a TR-LeNet5
model = tr.TRLeNet5(10, [6, 6, 6, 6])
compression_ration is:  0.3968253968253968
compression_ration is:  14.17233560090703
compression_ration is:  241.54589371980677
compression_ration is:  2.867383512544803
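
The resulting model is an ordinary PyTorch module. As a quick sanity check, a forward pass can be run on a dummy MNIST-like batch (the input shape below is an assumption that follows the MNIST training sample later in this document):

import torch

dummy = torch.randn(16, 1, 28, 28)   # a batch of 16 single-channel 28x28 images
logits = model(dummy)                 # TR-LeNet5 outputs one score per class
print(logits.shape)                   # expected: torch.Size([16, 10])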

API

Access to classes and functions of tednet.

tednet

tednet.hard_sigmoid(tensor: torch.Tensor) → torch.Tensor

Computes element-wise hard sigmoid of x. See e.g. https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/sigm.py#L279

Parameters

tensor (torch.Tensor) – tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Returns

tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Return type

torch.Tensor
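
A minimal usage sketch (only the call pattern and the shape-preserving behavior are shown; the exact saturation values follow the hard sigmoid referenced above):

>>> import torch
>>> import tednet as tdt
>>> x = torch.linspace(-3.0, 3.0, steps=7)
>>> tdt.hard_sigmoid(x).shape
torch.Size([7])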

tednet.eye(n: int, m: int, device: torch.device = 'cpu', requires_grad: bool = False) → torch.Tensor

Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.

Parameters
  • n (int) – the number of rows

  • m (int) – the number of columns

  • device (torch.device) – the desired device of returned tensor. Default will be the CPU.

  • requires_grad (bool) – If autograd should record operations on the returned tensor. Default: False.

Returns

2-D tensor \(\in \mathbb{R}^{{n} \times {m}}\)

Return type

torch.Tensor

tednet.to_numpy(tensor: torch.Tensor) → numpy.ndarray

Convert torch.Tensor to numpy.ndarray.

Parameters

tensor (torch.Tensor) – tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Returns

arr \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Return type

numpy.ndarray

tednet.to_tensor(arr: numpy.ndarray) → torch.Tensor

Convert numpy.ndarray to torch.Tensor.

Parameters

arr (numpy.ndarray) – arr \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Returns

tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Return type

torch.Tensor

tednet.tnn

tednet.tnn.initializer

tednet.tnn.initializer.trunc_normal_init(model, mean: float = 0.0, std: float = 0.1)

Initialize network with truncated normal distribution

Parameters
  • model (Any) – a model needed to be initialized

  • mean (float) – mean of truncated normal distribution

  • std (float) – standard deviation of truncated normal distribution

tednet.tnn.initializer.normal_init(model, mean: float = 0.0, std: float = 0.1)

Initialize network with normal distribution

Parameters
  • model (Any) – a model needed to be initialized

  • mean (float) – mean of normal distribution

  • std (float) – standard deviation of normal distribution

tednet.tnn.initializer.uniform_init(model, a: float = 0.0, b: float = 1.0)

Initialize network with uniform distribution

Parameters
  • model (Any) – a model needed to be initialized

  • a (float) – the lower bound of the uniform distribution

  • b (float) – the upper bound of the uniform distribution
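
A short sketch of applying these initializers to a tednet model (the TR-LeNet5 below is only an example; any supported model can be passed, and how the initializer traverses the sub-modules is up to the implementation):

import tednet.tnn.tensor_ring as tr
import tednet.tnn.initializer as tdt_init

model = tr.TRLeNet5(10, [6, 6, 6, 6])
# re-initialize the weights with a truncated normal distribution
tdt_init.trunc_normal_init(model, mean=0.0, std=0.02)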

tednet.tnn.tn_module

class tednet.tnn.tn_module._TNBase(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias: bool = True)

Bases: torch.nn.modules.module.Module

The base class of tensor decomposition networks.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\)

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\)

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\)

  • bias (bool) – use bias or not. True to use, and False to not use

check_setting()

Check whether in_shape, out_shape, ranks are 1-D params.

abstract set_tn_type()

Set the tensor decomposition type. The types are as follows:

type    tensor decomposition
tr      Tensor Ring
tt      Tensor Train
tk2     Tucker2
cp      CANDECOMP/PARAFAC
btt     Block-Term Tucker

Examples

>>> tn_type = "tr"
>>> self.tn_info["type"] = tn_type
abstract set_nodes()

Generate tensor nodes, then add node information to self.tn_info.

Examples

>>> nodes_info = []
>>> node_info = dict(name="node1", shape=[2, 3, 4])
>>> nodes_info.append(node_info)
>>> self.tn_info["nodes"] = nodes_info
abstract set_params_info()

Record information of Parameters.

Examples

>>> self.tn_info["t_params"] = tn_parameters
>>> self.tn_info["ori_params"] = ori_parameters
>>> self.tn_info["cr"] = ori_parameters / tn_parameters
abstract tn_contract(inputs: torch.Tensor)torch.Tensor

The method to contract the inputs with the tensor nodes.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_m}}\)

Returns

tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)

Return type

torch.Tensor

abstract recover()

Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tn_module.LambdaLayer(lambd)

Bases: torch.nn.modules.module.Module

A layer consisting of a lambda function.

Parameters

lambd – a lambda function.

training: bool
_is_full_backward_hook: Optional[bool]
forward(inputs: torch.Tensor) → torch.Tensor

Forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor
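
A small usage sketch, assuming the layer simply applies the wrapped function in its forward pass (the flattening lambda below is illustrative):

>>> import torch
>>> from tednet.tnn.tn_module import LambdaLayer
>>> flatten = LambdaLayer(lambda x: x.view(x.size(0), -1))
>>> flatten(torch.randn(8, 3, 4, 4)).shape
torch.Size([8, 48])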

tednet.tnn.tn_linear

class tednet.tnn.tn_linear._TNLinear(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias=True)

Bases: tednet.tnn.tn_module._TNBase

The Tensor Decomposition Linear.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of feature out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\). The ranks of linear

  • bias (bool) – use bias of linear or not. True to use, and False to not use

forward(inputs)

Tensor linear forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

Returns

tensor \(\in \mathbb{R}^{b \times C'}\)

Return type

torch.Tensor

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]

tednet.tnn.tn_cnn

class tednet.tnn.tn_cnn._TNConvNd(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)

Bases: tednet.tnn.tn_module._TNBase

Tensor Decomposition Convolution.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of channel out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\). The ranks of the decomposition

  • kernel_size (Union[int, tuple]) – The convolutional kernel size

  • stride (int) – The length of stride

  • padding (int) – The size of padding

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

forward(inputs: torch.Tensor)

Tensor convolutional forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times H' \times W' \times C'}\)

Return type

torch.Tensor

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]

tednet.tnn.tn_rnn

class tednet.tnn.tn_rnn._TNLSTMCell(hidden_size: int, tn_block, drop_ih=0.3, drop_hh=0.35)

Bases: torch.nn.modules.module.Module

Tensor LSTMCell.

Parameters
  • hidden_size (int) – The hidden size of LSTMCell

  • tn_block – The block class of the input-to-hidden layer

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

reset_hh()

Reset parameters of hidden-to-hidden layer.

forward(inputs: torch.Tensor, state: tednet.tnn.tn_rnn.LSTMState)

Forwarding method. LSTMState = namedtuple('LSTMState', ['hx', 'cx'])

Parameters
  • inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

  • state (LSTMState) – namedtuple: [hx \(\in \mathbb{R}^{H}\), cx \(\in \mathbb{R}^{H}\)]

Returns

result: hy \(\in \mathbb{R}^{H}\), [hy \(\in \mathbb{R}^{H}\), cy \(\in \mathbb{R}^{H}\)]

Return type

torch.Tensor, [torch.Tensor, torch.Tensor]

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tn_rnn._TNLSTM(hidden_size, tn_block, drop_ih=0.3, drop_hh=0.35)

Bases: torch.nn.modules.module.Module

Tensor LSTM.

Parameters
  • hidden_size (int) – The hidden size of LSTM

  • tn_block – The block class of the input-to-hidden layer

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

forward(inputs, state)

Forwarding method. LSTMState = namedtuple('LSTMState', ['hx', 'cx'])

Parameters
  • inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{S \times b \times C}\)

  • state (LSTMState) – namedtuple: [hx \(\in \mathbb{R}^{H}\), cx \(\in \mathbb{R}^{H}\)]

Returns

tensor \(\in \mathbb{R}^{S \times b \times C'}\), LSTMState is a namedtuple: [hy \(\in \mathbb{R}^{H}\), cy \(\in \mathbb{R}^{H}\)]

Return type

torch.Tensor, LSTMState

training: bool
_is_full_backward_hook: Optional[bool]

tednet.tnn.cp

class tednet.tnn.cp.CPConv2D(c_in: int, c_out: int, rank: int, kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)

Bases: tednet.tnn.tn_cnn._TNConvNd

CANDECOMP/PARAFAC Decomposition Convolution.

Parameters
  • c_in (int) – The number of input channels

  • c_out (int) – The number of output channels

  • rank (int) – The rank of the decomposition

  • kernel_size (Union[int, tuple]) – The convolutional kernel size

  • stride (int) – The length of stride

  • padding (int) – The size of padding

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as CANDECOMP/PARAFAC decomposition type.

set_nodes()

Generate CANDECOMP/PARAFAC nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tensor Decomposition Convolution.

Parameters

inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor

forward(inputs: torch.Tensor)

Tensor convolutional forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.cp.CPLinear(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], rank: int, bias: bool = True)

Bases: tednet.tnn.tn_linear._TNLinear

The CANDECOMP/PARAFAC Decomposition Linear.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of feature out

  • rank (int) – The rank of the linear layer

  • bias (bool) – use bias of linear or not. True to use, and False to not use

set_tn_type()

Set as CANDECOMP/PARAFAC decomposition type.

set_nodes()

Generate CANDECOMP/PARAFAC nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

CANDECOMP/PARAFAC linear forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

Returns

tensor \(\in \mathbb{R}^{b \times C'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.cp.CPLeNet5(num_classes: int, rs: Union[list, numpy.ndarray])

Bases: torch.nn.modules.module.Module

LeNet-5 based on CANDECOMP/PARAFAC.

Parameters
  • num_classes (int) – The number of classes

  • rs (Union[list, numpy.ndarray]) – The ranks of network.

forward(inputs: torch.Tensor) → torch.Tensor

Forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times num\_classes}\)

Return type

torch.Tensor

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.cp.CPResNet20(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.cp.cp_resnet.CPResNet

ResNet-20 based on CANDECOMP/PARAFAC.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.cp.CPResNet32(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.cp.cp_resnet.CPResNet

ResNet-32 based on CANDECOMP/PARAFAC.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.cp.CPLSTM(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: int, drop_ih: float = 0.3, drop_hh: float = 0.35)

Bases: tednet.tnn.tn_rnn._TNLSTM

LSTM based on CANDECOMP/PARAFAC.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM

  • hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The hidden shape of LSTM

  • ranks (int) – The rank of linear

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

reset_ih()

Reset parameters of input-to-hidden layer.

training: bool
_is_full_backward_hook: Optional[bool]
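
A brief construction sketch for the CP layers in this module (the factorized shapes and ranks below are illustrative choices, not recommended settings):

import torch
import tednet.tnn.cp as cp

# CP linear: 784 = 4*7*28 features in, 256 = 4*8*8 features out, CP-rank 5
fc = cp.CPLinear([4, 7, 28], [4, 8, 8], 5)
out = fc(torch.randn(16, 784))            # expected shape: [16, 256]

# CP convolution: 16 channels in, 32 channels out, CP-rank 6, 3x3 kernel
conv = cp.CPConv2D(16, 32, 6, 3, padding=1)
feat = conv(torch.randn(16, 16, 28, 28))  # expected shape: [16, 32, 28, 28]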

tednet.tnn.tucker2

class tednet.tnn.tucker2.TK2Conv2D(c_in: int, c_out: int, ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)

Bases: tednet.tnn.tn_cnn._TNConvNd

Tucker-2 Decomposition Convolution.

Parameters
  • c_in (int) – The number of input channels

  • c_out (int) – The number of output channels

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\). The ranks of the decomposition

  • kernel_size (Union[int, tuple]) – 1-D param \(\in \mathbb{R}^m\). The convolutional kernel size

  • stride (int) – The length of stride

  • padding (int) – The size of padding

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Tucker-2 decomposition type.

set_nodes()

Generate Tucker-2 nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tucker-2 Decomposition Convolution.

Parameters

inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tucker2.TK2Linear(in_shape: Union[list, numpy.ndarray], out_size: int, ranks: Union[list, numpy.ndarray], bias: bool = True)

Bases: tednet.tnn.tn_linear._TNLinear

Tucker-2 Decomposition Linear.

input length    ranks length
1               1
3               2

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m, m \in \{1, 3\}\). The decomposition shape of feature in

  • out_size (int) – The output size of the model

  • ranks (Union[list, numpy.ndarray]) – 1-D param. The rank of the decomposition

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Tucker-2 decomposition type.

set_nodes()

Generate Tucker-2 nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tucker-2 linear forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

Returns

tensor \(\in \mathbb{R}^{b \times C'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tucker2.TK2LeNet5(num_classes: int, rs: Union[list, numpy.ndarray])

Bases: torch.nn.modules.module.Module

LeNet-5 based on the Tucker-2.

Parameters
  • num_classes (int) – The number of classes

  • rs (Union[list, numpy.ndarray]) – The ranks of network.

forward(inputs: torch.Tensor) → torch.Tensor

Forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times num\_classes}\)

Return type

torch.Tensor

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tucker2.TK2ResNet20(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.tucker2.tk2_resnet.TK2ResNet

ResNet-20 based on Tucker-2.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tucker2.TK2ResNet32(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.tucker2.tk2_resnet.TK2ResNet

ResNet-32 based on Tucker-2.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tucker2.TK2LSTM(in_shape: Union[list, numpy.ndarray], hidden_size: int, ranks: Union[list, numpy.ndarray], drop_ih: float = 0.3, drop_hh: float = 0.35)

Bases: tednet.tnn.tn_rnn._TNLSTM

LSTM based on Tucker-2.

input length    ranks length
1               1
3               2

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m, m \in \{1, 3\}\). The input shape of LSTM

  • hidden_size (int) – The hidden size of LSTM

  • ranks (Union[list, numpy.ndarray]) – 1-D param. The ranks of linear

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

reset_ih()

Reset parameters of input-to-hidden layer.

training: bool
_is_full_backward_hook: Optional[bool]
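
A brief construction sketch for the Tucker-2 layers in this module, following the in_shape/ranks length rule in the table above (all values are illustrative):

import tednet.tnn.tucker2 as tk2

# an in_shape of length 1 pairs with a single rank
fc_a = tk2.TK2Linear([784], 256, [10])
# an in_shape of length 3 pairs with two ranks
fc_b = tk2.TK2Linear([7, 16, 7], 256, [6, 6])
# Tucker-2 convolution: 16 channels in, 32 channels out, two ranks
conv = tk2.TK2Conv2D(16, 32, [6, 6], 3, padding=1)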

tednet.tnn.bt_tucker

class tednet.tnn.bt_tucker.BTTConv2D(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], block_num: int, kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)

Bases: tednet.tnn.tn_cnn._TNConvNd

Block-Term Tucker Decomposition Convolution.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+2}\). The rank of the decomposition

  • block_num (int) – The number of blocks

  • kernel_size (Union[int, tuple]) – The convolutional kernel size

  • stride (int) – The length of stride

  • padding (int) – The size of padding

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Block-Term Tucker decomposition type.

set_nodes()

Generate Block-Term Tucker nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Block-Term Tucker Decomposition Convolution.

Parameters

inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor

forward(inputs: torch.Tensor)

Block-Term Tucker convolutional forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.bt_tucker.BTTLinear(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], block_num: int, bias: bool = True)

Bases: tednet.tnn.tn_linear._TNLinear

Block-Term Tucker Decomposition Linear.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The rank of the decomposition

  • block_num (int) – The number of blocks

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Block-Term Tucker decomposition type.

set_nodes()

Generate Block-Term Tucker nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Block-Term Tucker linear forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

Returns

tensor \(\in \mathbb{R}^{b \times C'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.bt_tucker.BTTLeNet5(num_classes: int, rs: Union[list, numpy.ndarray])

Bases: torch.nn.modules.module.Module

LeNet-5 based on the Block-Term Tucker.

Parameters
  • num_classes (int) – The number of classes.

  • rs (Union[list, numpy.ndarray]) – The ranks of network.

forward(inputs: torch.Tensor) → torch.Tensor

Forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times num\_classes}\)

Return type

torch.Tensor

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.bt_tucker.BTTResNet20(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.bt_tucker.btt_resnet.BTTResNet

ResNet-20 based on Block-Term Tucker.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.bt_tucker.BTTResNet32(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.bt_tucker.btt_resnet.BTTResNet

ResNet-32 based on Block-Term Tucker.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.bt_tucker.BTTLSTM(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], block_num: int, drop_ih: float = 0.3, drop_hh: float = 0.35)

Bases: tednet.tnn.tn_rnn._TNLSTM

LSTM based on Block-Term Tucker.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM

  • hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The hidden shape of LSTM

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The ranks of linear

  • block_num (int) – The number of blocks

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

reset_ih()

Reset parameters of input-to-hidden layer.

training: bool
_is_full_backward_hook: Optional[bool]
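
A brief construction sketch for the Block-Term Tucker layers in this module (the factorized shapes, ranks, and block numbers below are illustrative):

import tednet.tnn.bt_tucker as btt

# BTT linear: 784 = 28*28 features in, 256 = 16*16 features out,
# one Tucker rank per mode (m = 2) and 2 blocks
fc = btt.BTTLinear([28, 28], [16, 16], [4, 4], block_num=2)

# BTT convolution: 16 = 4*4 channels in, 32 = 8*4 channels out,
# m + 2 = 4 ranks and 2 blocks
conv = btt.BTTConv2D([4, 4], [8, 4], [4, 4, 4, 4], 2, 3, padding=1)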

tednet.tnn.tensor_train

class tednet.tnn.tensor_train.TTConv2D(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)

Bases: tednet.tnn.tn_cnn._TNConvNd

Tensor Train Decomposition Convolution.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The rank of the decomposition

  • kernel_size (Union[int, tuple]) – The convolutional kernel size

  • stride (int) – The length of stride

  • padding (int) – The size of padding

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Tensor Train decomposition type.

set_nodes()

Generate Tensor Train nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tensor Train Decomposition Convolution.

Parameters

inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_train.TTLinear(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias: bool = True)

Bases: tednet.tnn.tn_linear._TNLinear

Tensor Train Decomposition Linear.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m-1}\). The rank of the decomposition

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Tensor Train decomposition type.

set_nodes()

Generate Tensor Train nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tensor Train linear forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

Returns

tensor \(\in \mathbb{R}^{b \times C'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_train.TTLeNet5(num_classes: int, rs: Union[list, numpy.ndarray])

Bases: torch.nn.modules.module.Module

LeNet-5 based on the Tensor Train.

Parameters
  • num_classes (int) – The number of classes.

  • rs (Union[list, numpy.ndarray]) – The ranks of network.

forward(inputs: torch.Tensor) → torch.Tensor

Forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times num\_classes}\)

Return type

torch.Tensor

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_train.TTResNet20(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.tensor_train.tt_resnet.TTResNet

ResNet-20 based on Tensor Train.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_train.TTResNet32(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.tensor_train.tt_resnet.TTResNet

ResNet-32 based on Tensor Train.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_train.TTLSTM(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], drop_ih: float = 0.3, drop_hh: float = 0.35)

Bases: tednet.tnn.tn_rnn._TNLSTM

LSTM based on Tensor Train.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM

  • hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The hidden shape of LSTM

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m-1}\). The ranks of linear

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

reset_ih()

Reset parameters of input-to-hidden layer.

training: bool
_is_full_backward_hook: Optional[bool]
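
A brief construction sketch for the Tensor Train layers in this module (shapes and ranks are illustrative; note that TTLinear expects m - 1 ranks for m input/output modes):

import tednet.tnn.tensor_train as tt

# TT linear: 784 = 7*4*28 features in, 256 = 4*8*8 features out, m - 1 = 2 ranks
fc = tt.TTLinear([7, 4, 28], [4, 8, 8], [6, 6])

# TT convolution: 16 = 4*4 channels in, 32 = 4*8 channels out
conv = tt.TTConv2D([4, 4], [4, 8], [6, 6], 3, padding=1)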

tednet.tnn.tensor_ring

class tednet.tnn.tensor_ring.TRConv2D(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)

Bases: tednet.tnn.tn_cnn._TNConvNd

Tensor Ring Decomposition Convolution.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of channel out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+n+1}\). The ranks of the decomposition

  • kernel_size (Union[int, tuple]) – The convolutional kernel size

  • stride (int) – The length of stride

  • padding (int) – The size of padding

  • bias (bool) – use bias of convolution or not. True to use, and False to not use

set_tn_type()

Set as Tensor Ring decomposition type.

set_nodes()

Generate Tensor Ring nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tensor Decomposition Convolution.

Parameters

inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

A tensor \(\in \mathbb{R}^{b \times H' \times W' \times C'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_ring.TRLinear(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias: bool = True)

Bases: tednet.tnn.tn_linear._TNLinear

The Tensor Ring Decomposition Linear.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in

  • out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of feature out

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+n}\). The ranks of linear

  • bias (bool) – use bias of linear or not. True to use, and False to not use

set_tn_type()

Set as Tensor Ring decomposition type.

set_nodes()

Generate tensor ring nodes, then add node information to self.tn_info.

set_params_info()

Record information of Parameters.

reset_parameters()

Reset parameters.

tn_contract(inputs: torch.Tensor) → torch.Tensor

Tensor Ring linear forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)

Returns

tensor \(\in \mathbb{R}^{b \times C'}\)

Return type

torch.Tensor

recover()

Todo: Use for rebuilding the original tensor.

_abc_impl = <_abc_data object>
training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_ring.TRLeNet5(num_classes: int, rs: Union[list, numpy.ndarray])

Bases: torch.nn.modules.module.Module

LeNet-5 based on Tensor Ring.

Parameters
  • num_classes (int) – The number of classes

  • rs (Union[list, numpy.ndarray]) – The ranks of network.

forward(inputs: torch.Tensor) → torch.Tensor

Forwarding method.

Parameters

inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)

Returns

tensor \(\in \mathbb{R}^{b \times num\_classes}\)

Return type

torch.Tensor

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_ring.TRResNet20(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.tensor_ring.tr_resnet.TRResNet

ResNet-20 based on Tensor Ring.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_ring.TRResNet32(rs: Union[list, numpy.ndarray], num_classes: int)

Bases: tednet.tnn.tensor_ring.tr_resnet.TRResNet

ResNet-32 based on Tensor Ring.

Parameters
  • rs (Union[list, numpy.ndarray]) – rs \(\in \mathbb{R}^{7}\). The ranks of network

  • num_classes (int) – The number of classes

training: bool
_is_full_backward_hook: Optional[bool]
class tednet.tnn.tensor_ring.TRLSTM(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], drop_ih: float = 0.25, drop_hh: float = 0.25)

Bases: tednet.tnn.tn_rnn._TNLSTM

LSTM based on Tensor Ring.

Parameters
  • in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM

  • hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The hidden shape of LSTM

  • ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+n}\). The ranks of linear

  • drop_ih (float) – The dropout rate of the input-to-hidden layer

  • drop_hh (float) – The dropout rate of the hidden-to-hidden layer

reset_ih()

Reset parameters of input-to-hidden layer.

training: bool
_is_full_backward_hook: Optional[bool]
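
A brief construction sketch for the Tensor Ring layers in this module (shapes and ranks are illustrative; TRLinear expects m + n ranks and TRConv2D expects m + n + 1 ranks):

import tednet.tnn.tensor_ring as tr

# TR linear: 784 = 28*28 features in (m = 2), 256 = 16*16 features out (n = 2)
fc = tr.TRLinear([28, 28], [16, 16], [5, 5, 5, 5])

# TR convolution: 16 = 4*4 channels in, 32 = 4*8 channels out
conv = tr.TRConv2D([4, 4], [4, 8], [5, 5, 5, 5, 5], 3, padding=1)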

Contact

This project is mainly developed by Yu Pan (iperryuu@gmail.com), Maolin Wang (morin.w98@gmail.com), and others. Feel free to contact us.

A sample for Tensorial Convolutional Neural Network

By replacing convolutional kernels with tensor cores, a tensorial CNN is constructed.

Here is a Tensor Ring example of using a TR-based model with tednet.

[1]:
from managpu import GpuManager
my_gpu = GpuManager()
my_gpu.set_by_memory(1)

import random

import tednet as tdt
import tednet.tnn.tensor_ring as tr

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torchvision import datasets, transforms
No GPU Util Limit!
Sorted by memory:
    GPU Index: 1       GPU FreeMemory: 11176 MB       GPU Util: 0%
    GPU Index: 2       GPU FreeMemory: 11176 MB       GPU Util: 0%
    GPU Index: 4       GPU FreeMemory: 11176 MB       GPU Util: 0%
    GPU Index: 0       GPU FreeMemory: 6133 MB        GPU Util: 74%
    GPU Index: 3       GPU FreeMemory: 1109 MB        GPU Util: 100%
    GPU Index: 5       GPU FreeMemory: 1109 MB        GPU Util: 100%
    GPU Index: 6       GPU FreeMemory: 1109 MB        GPU Util: 100%
    GPU Index: 7       GPU FreeMemory: 1109 MB        GPU Util: 0%
Qualified GPU Index is: [1]

Set basic environment

[2]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
seed = 233
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if use_cuda:
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = True

Set dataloader

[3]:
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=128, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=256, shuffle=True, **kwargs)

Set training and testing process

[4]:
def train(model, device, train_loader, optimizer, epoch, log_interval=200):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Begin training

[5]:
# Define a TR-LeNet5
model = tr.TRLeNet5(10, [6, 6, 6, 6])
model.to(device)
optimizer = optim.SGD(model.parameters(), lr=2e-2, momentum=0.9, weight_decay=5e-4)

for epoch in range(20):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
compression_ration is:  0.3968253968253968
compression_ration is:  14.17233560090703
compression_ration is:  241.54589371980677
compression_ration is:  2.867383512544803
Train Epoch: 0 [0/60000 (0%)]   Loss: 2.633792
Train Epoch: 0 [25600/60000 (43%)]      Loss: 0.109367
Train Epoch: 0 [51200/60000 (85%)]      Loss: 0.133933

Test set: Average loss: 0.0756, Accuracy: 9751/10000 (98%)

Train Epoch: 1 [0/60000 (0%)]   Loss: 0.074946
Train Epoch: 1 [25600/60000 (43%)]      Loss: 0.039371
Train Epoch: 1 [51200/60000 (85%)]      Loss: 0.029103

Test set: Average loss: 0.0691, Accuracy: 9782/10000 (98%)

Train Epoch: 2 [0/60000 (0%)]   Loss: 0.113578
Train Epoch: 2 [25600/60000 (43%)]      Loss: 0.099431
Train Epoch: 2 [51200/60000 (85%)]      Loss: 0.084437

Test set: Average loss: 0.0544, Accuracy: 9826/10000 (98%)

Train Epoch: 3 [0/60000 (0%)]   Loss: 0.130137
Train Epoch: 3 [25600/60000 (43%)]      Loss: 0.083295
Train Epoch: 3 [51200/60000 (85%)]      Loss: 0.021406

Test set: Average loss: 0.0608, Accuracy: 9799/10000 (98%)

Train Epoch: 4 [0/60000 (0%)]   Loss: 0.044310
Train Epoch: 4 [25600/60000 (43%)]      Loss: 0.025041
Train Epoch: 4 [51200/60000 (85%)]      Loss: 0.017827

Test set: Average loss: 0.0446, Accuracy: 9861/10000 (99%)

Train Epoch: 5 [0/60000 (0%)]   Loss: 0.035976
Train Epoch: 5 [25600/60000 (43%)]      Loss: 0.130144
Train Epoch: 5 [51200/60000 (85%)]      Loss: 0.066351

Test set: Average loss: 0.0457, Accuracy: 9854/10000 (99%)

Train Epoch: 6 [0/60000 (0%)]   Loss: 0.071825
Train Epoch: 6 [25600/60000 (43%)]      Loss: 0.031684
Train Epoch: 6 [51200/60000 (85%)]      Loss: 0.049287

Test set: Average loss: 0.0444, Accuracy: 9854/10000 (99%)

Train Epoch: 7 [0/60000 (0%)]   Loss: 0.074904
Train Epoch: 7 [25600/60000 (43%)]      Loss: 0.083052
Train Epoch: 7 [51200/60000 (85%)]      Loss: 0.021132

Test set: Average loss: 0.0397, Accuracy: 9880/10000 (99%)

Train Epoch: 8 [0/60000 (0%)]   Loss: 0.020113
Train Epoch: 8 [25600/60000 (43%)]      Loss: 0.022854
Train Epoch: 8 [51200/60000 (85%)]      Loss: 0.008770

Test set: Average loss: 0.0424, Accuracy: 9866/10000 (99%)

Train Epoch: 9 [0/60000 (0%)]   Loss: 0.007447
Train Epoch: 9 [25600/60000 (43%)]      Loss: 0.095077
Train Epoch: 9 [51200/60000 (85%)]      Loss: 0.018731

Test set: Average loss: 0.0339, Accuracy: 9896/10000 (99%)

Train Epoch: 10 [0/60000 (0%)]  Loss: 0.025279
Train Epoch: 10 [25600/60000 (43%)]     Loss: 0.038482
Train Epoch: 10 [51200/60000 (85%)]     Loss: 0.043692

Test set: Average loss: 0.0391, Accuracy: 9882/10000 (99%)

Train Epoch: 11 [0/60000 (0%)]  Loss: 0.022135
Train Epoch: 11 [25600/60000 (43%)]     Loss: 0.008357
Train Epoch: 11 [51200/60000 (85%)]     Loss: 0.031139

Test set: Average loss: 0.0380, Accuracy: 9882/10000 (99%)

Train Epoch: 12 [0/60000 (0%)]  Loss: 0.004145
Train Epoch: 12 [25600/60000 (43%)]     Loss: 0.024185
Train Epoch: 12 [51200/60000 (85%)]     Loss: 0.030595

Test set: Average loss: 0.0354, Accuracy: 9887/10000 (99%)

Train Epoch: 13 [0/60000 (0%)]  Loss: 0.013407
Train Epoch: 13 [25600/60000 (43%)]     Loss: 0.008846
Train Epoch: 13 [51200/60000 (85%)]     Loss: 0.061894

Test set: Average loss: 0.0380, Accuracy: 9867/10000 (99%)

Train Epoch: 14 [0/60000 (0%)]  Loss: 0.017808
Train Epoch: 14 [25600/60000 (43%)]     Loss: 0.002656
Train Epoch: 14 [51200/60000 (85%)]     Loss: 0.013447

Test set: Average loss: 0.0354, Accuracy: 9887/10000 (99%)

Train Epoch: 15 [0/60000 (0%)]  Loss: 0.009893
Train Epoch: 15 [25600/60000 (43%)]     Loss: 0.081577
Train Epoch: 15 [51200/60000 (85%)]     Loss: 0.018266

Test set: Average loss: 0.0326, Accuracy: 9893/10000 (99%)

Train Epoch: 16 [0/60000 (0%)]  Loss: 0.011158
Train Epoch: 16 [25600/60000 (43%)]     Loss: 0.004466
Train Epoch: 16 [51200/60000 (85%)]     Loss: 0.034247

Test set: Average loss: 0.0343, Accuracy: 9891/10000 (99%)

Train Epoch: 17 [0/60000 (0%)]  Loss: 0.030956
Train Epoch: 17 [25600/60000 (43%)]     Loss: 0.010426
Train Epoch: 17 [51200/60000 (85%)]     Loss: 0.061093

Test set: Average loss: 0.0315, Accuracy: 9897/10000 (99%)

Train Epoch: 18 [0/60000 (0%)]  Loss: 0.017390
Train Epoch: 18 [25600/60000 (43%)]     Loss: 0.023027
Train Epoch: 18 [51200/60000 (85%)]     Loss: 0.029767

Test set: Average loss: 0.0332, Accuracy: 9888/10000 (99%)

Train Epoch: 19 [0/60000 (0%)]  Loss: 0.034303
Train Epoch: 19 [25600/60000 (43%)]     Loss: 0.003748
Train Epoch: 19 [51200/60000 (85%)]     Loss: 0.026581

Test set: Average loss: 0.0307, Accuracy: 9898/10000 (99%)

A sample for Tensorial Recurrent Neural Network

By replacing the input-to-hidden layer of an RNN with tensor cores, a tensorial RNN is constructed.

Here is a Tensor Ring example of using a TR-based model with tednet.

[1]:
from managpu import GpuManager
my_gpu = GpuManager()
my_gpu.set_by_memory(1)

import random
from collections import namedtuple

import tednet as tdt
import tednet.tnn.tensor_ring as tr

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torchvision import datasets, transforms
No GPU Util Limit!
Sorted by memory:
    GPU Index: 2       GPU FreeMemory: 11176 MB       GPU Util: 0%
    GPU Index: 4       GPU FreeMemory: 11176 MB       GPU Util: 0%
    GPU Index: 1       GPU FreeMemory: 10129 MB       GPU Util: 0%
    GPU Index: 0       GPU FreeMemory: 6133 MB        GPU Util: 37%
    GPU Index: 3       GPU FreeMemory: 1109 MB        GPU Util: 94%
    GPU Index: 5       GPU FreeMemory: 1109 MB        GPU Util: 100%
    GPU Index: 6       GPU FreeMemory: 1109 MB        GPU Util: 100%
    GPU Index: 7       GPU FreeMemory: 1109 MB        GPU Util: 95%
Qualified GPU Index is: [2]

Set basic environment

[2]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
seed = 233
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if use_cuda:
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = True

LSTMState = namedtuple('LSTMState', ['hx', 'cx'])
Input_Size = np.prod([28, 28])
Hidden_Size = 256

Set dataloader

[3]:
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=128, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=256, shuffle=True, **kwargs)

Set TR-LSTM Classifier

[4]:
class ClassifierTR(nn.Module):
    def __init__(self, num_class=10):
        super(ClassifierTR, self).__init__()
        in_shape = [28, 28]
        hidden_shape = [16, 16]

        self.hidden_size = Hidden_Size

        self.lstm = tr.TRLSTM(in_shape, hidden_shape, [5, 5, 5, 5])
        self.fc = nn.Linear(self.hidden_size, num_class)

    def forward(self, x, state):
        input_shape = x.shape
        batch_size = input_shape[0]
        seq_size = input_shape[1]
        x = x.view(batch_size, seq_size, -1)
        x = x.permute(1, 0, 2)
        _, x = self.lstm(x, state)
        x = self.fc(x[0])
        return x

Set training and testing process

[5]:
def train(model, device, train_loader, optimizer, epoch, log_interval=200):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()

        batch_size = data.shape[0]
        state = LSTMState(torch.zeros(batch_size, Hidden_Size, device=device),
                              torch.zeros(batch_size, Hidden_Size, device=device))
        output = model(data, state)

        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)

            batch_size = data.shape[0]
            state = LSTMState(torch.zeros(batch_size, Hidden_Size, device=device),
                              torch.zeros(batch_size, Hidden_Size, device=device))
            output = model(data, state)

            test_loss += F.cross_entropy(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Begin training

[6]:
# Define a TR-LSTM
model = ClassifierTR()
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=2e-4, weight_decay=0.00016667)

for epoch in range(20):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
compression_ration is:  236.12235294117647
Train Epoch: 0 [0/60000 (0%)]   Loss: 2.271237
Train Epoch: 0 [25600/60000 (43%)]      Loss: 2.037606
Train Epoch: 0 [51200/60000 (85%)]      Loss: 1.804040

Test set: Average loss: 1.5393, Accuracy: 5888/10000 (59%)

Train Epoch: 1 [0/60000 (0%)]   Loss: 1.675199
Train Epoch: 1 [25600/60000 (43%)]      Loss: 1.413317
Train Epoch: 1 [51200/60000 (85%)]      Loss: 1.376522

Test set: Average loss: 1.0163, Accuracy: 6931/10000 (69%)

Train Epoch: 2 [0/60000 (0%)]   Loss: 1.204728
Train Epoch: 2 [25600/60000 (43%)]      Loss: 1.068120
Train Epoch: 2 [51200/60000 (85%)]      Loss: 1.048317

Test set: Average loss: 0.7734, Accuracy: 7470/10000 (75%)

Train Epoch: 3 [0/60000 (0%)]   Loss: 0.902623
Train Epoch: 3 [25600/60000 (43%)]      Loss: 0.709798
Train Epoch: 3 [51200/60000 (85%)]      Loss: 0.772015

Test set: Average loss: 0.6653, Accuracy: 7714/10000 (77%)

Train Epoch: 4 [0/60000 (0%)]   Loss: 0.793773
Train Epoch: 4 [25600/60000 (43%)]      Loss: 0.747470
Train Epoch: 4 [51200/60000 (85%)]      Loss: 0.739394

Test set: Average loss: 0.5988, Accuracy: 8006/10000 (80%)

Train Epoch: 5 [0/60000 (0%)]   Loss: 0.711895
Train Epoch: 5 [25600/60000 (43%)]      Loss: 0.610803
Train Epoch: 5 [51200/60000 (85%)]      Loss: 0.705731

Test set: Average loss: 0.5535, Accuracy: 8195/10000 (82%)

Train Epoch: 6 [0/60000 (0%)]   Loss: 0.803615
Train Epoch: 6 [25600/60000 (43%)]      Loss: 0.608962
Train Epoch: 6 [51200/60000 (85%)]      Loss: 0.600730

Test set: Average loss: 0.5210, Accuracy: 8317/10000 (83%)

Train Epoch: 7 [0/60000 (0%)]   Loss: 0.507197
Train Epoch: 7 [25600/60000 (43%)]      Loss: 0.634771
Train Epoch: 7 [51200/60000 (85%)]      Loss: 0.603676

Test set: Average loss: 0.4965, Accuracy: 8445/10000 (84%)

Train Epoch: 8 [0/60000 (0%)]   Loss: 0.553993
Train Epoch: 8 [25600/60000 (43%)]      Loss: 0.539877
Train Epoch: 8 [51200/60000 (85%)]      Loss: 0.589516

Test set: Average loss: 0.4719, Accuracy: 8535/10000 (85%)

Train Epoch: 9 [0/60000 (0%)]   Loss: 0.575935
Train Epoch: 9 [25600/60000 (43%)]      Loss: 0.494978
Train Epoch: 9 [51200/60000 (85%)]      Loss: 0.600699

Test set: Average loss: 0.4522, Accuracy: 8601/10000 (86%)

Train Epoch: 10 [0/60000 (0%)]  Loss: 0.425709
Train Epoch: 10 [25600/60000 (43%)]     Loss: 0.439076
Train Epoch: 10 [51200/60000 (85%)]     Loss: 0.427697

Test set: Average loss: 0.4368, Accuracy: 8677/10000 (87%)

Train Epoch: 11 [0/60000 (0%)]  Loss: 0.512469
Train Epoch: 11 [25600/60000 (43%)]     Loss: 0.499898
Train Epoch: 11 [51200/60000 (85%)]     Loss: 0.412309

Test set: Average loss: 0.4227, Accuracy: 8710/10000 (87%)

Train Epoch: 12 [0/60000 (0%)]  Loss: 0.555337
Train Epoch: 12 [25600/60000 (43%)]     Loss: 0.330346
Train Epoch: 12 [51200/60000 (85%)]     Loss: 0.340294

Test set: Average loss: 0.4089, Accuracy: 8746/10000 (87%)

Train Epoch: 13 [0/60000 (0%)]  Loss: 0.419118
Train Epoch: 13 [25600/60000 (43%)]     Loss: 0.335568
Train Epoch: 13 [51200/60000 (85%)]     Loss: 0.328040

Test set: Average loss: 0.3973, Accuracy: 8792/10000 (88%)

Train Epoch: 14 [0/60000 (0%)]  Loss: 0.384958
Train Epoch: 14 [25600/60000 (43%)]     Loss: 0.436771
Train Epoch: 14 [51200/60000 (85%)]     Loss: 0.440793

Test set: Average loss: 0.3865, Accuracy: 8819/10000 (88%)

Train Epoch: 15 [0/60000 (0%)]  Loss: 0.483415
Train Epoch: 15 [25600/60000 (43%)]     Loss: 0.395679
Train Epoch: 15 [51200/60000 (85%)]     Loss: 0.482825

Test set: Average loss: 0.3761, Accuracy: 8861/10000 (89%)

Train Epoch: 16 [0/60000 (0%)]  Loss: 0.436840
Train Epoch: 16 [25600/60000 (43%)]     Loss: 0.339861
Train Epoch: 16 [51200/60000 (85%)]     Loss: 0.366399

Test set: Average loss: 0.3689, Accuracy: 8894/10000 (89%)

Train Epoch: 17 [0/60000 (0%)]  Loss: 0.442870
Train Epoch: 17 [25600/60000 (43%)]     Loss: 0.370757
Train Epoch: 17 [51200/60000 (85%)]     Loss: 0.403360

Test set: Average loss: 0.3585, Accuracy: 8924/10000 (89%)

Train Epoch: 18 [0/60000 (0%)]  Loss: 0.346232
Train Epoch: 18 [25600/60000 (43%)]     Loss: 0.452554
Train Epoch: 18 [51200/60000 (85%)]     Loss: 0.318595

Test set: Average loss: 0.3496, Accuracy: 8960/10000 (90%)

Train Epoch: 19 [0/60000 (0%)]  Loss: 0.272001
Train Epoch: 19 [25600/60000 (43%)]     Loss: 0.430083
Train Epoch: 19 [51200/60000 (85%)]     Loss: 0.446394

Test set: Average loss: 0.3433, Accuracy: 8976/10000 (90%)

Tensors

Tensors, also known as multi-way arrays, can be viewed as a higher-order extension of vectors (i.e., 1st-order tensors) and matrices (i.e., 2nd-order tensors). Like the rows and columns of a matrix, an Nth-order tensor \({\mathcal X}\in\mathbb R^{I_1\times I_2 \ldots\times I_N}\) has N modes (also called ways, orders, or indices) whose lengths (or dimensions) are denoted by \(I_1, \ldots, I_N\), respectively. Tensors can be represented graphically by diagrams known as Tensor Networks. In the following illustration, a black node denotes a tensor and an edge connected to the node denotes a tensor mode.

_images/tensor.png

Tensor Contraction

Tensor contraction is the most typical operation for tensors; it contracts two tensors into one along the associated pairs of indices. As a result, the corresponding connected edges disappear while the dangling edges persist. The Tensor Network representation of such an operation can be illustrated as:

_images/TC.png

As shown in the above figure, contraction between a 5th-order tensor \({\mathcal A}\) and a 4th-order tensor \({\mathcal B}\) along the index pairs \((i_5,j_1)\) and \((i_3,j_2)\) yields a 5th-order tensor \({\mathcal C}\), with entries

\({\mathcal C}_{i_1,i_2,i_4,j_3,j_4}=\sum_{i_3,i_5} {\mathcal A}_{i_1,i_2,i_3,i_4,i_5} {\mathcal B}_{i_5,i_3,j_3,j_4}\).

Tensor contractions among multiple tensors can be computed by performing tensor contraction between two tensors many times. Hence, the order (or number of modes) of an entire Tensor Network is given by the number of dangling edges which are not contracted.
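
For concreteness, the contraction shown above can be transcribed directly into torch.einsum; this snippet is purely illustrative and independent of tednet:

import torch

A = torch.randn(2, 3, 4, 5, 6)   # A with modes (i1, i2, i3, i4, i5)
B = torch.randn(6, 4, 7, 8)      # B with modes (i5, i3, j3, j4)
# contract over the shared indices i5 (letter e) and i3 (letter c)
C = torch.einsum('abcde,ecfg->abdfg', A, B)
print(C.shape)                   # torch.Size([2, 3, 5, 7, 8]), i.e. modes (i1, i2, i4, j3, j4)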

Tensor decomposition is a common technique for compressing neural networks, decomposing a higher-order tensor into several lower-order tensors (usually matrices or 3rd-order tensors) that are sparsely interconnected through the tensor contraction operator. The basic tensor decompositions include CANDECOMP/PARAFAC (CP), Tucker, Block Term (BT), Tensor Train (TT), and so on. Such decomposition formats can be illustrated as the corresponding Tensor Network diagrams.

Tensorized FC Layers

By replacing Fully-Connected (FC) layers or convolution layers with tensorized layers, a large number of parameters can be removed. For example, an FC layer is formulated as \({y}= {W}{x}\) and can be illustrated as

_images/FC.png

By a simple reshaping method, we can reformulate the FC layer as

\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}{\mathcal W}_{i_1,\ldots,i_N,j_1,\ldots,j_M} ~x_{i_1,i_2,\ldots,i_N}\).
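
This reshaped form is just a contraction of the weight tensor with the input over all input modes; a minimal sketch with N = 2 input modes and M = 2 output modes:

import torch

I1, I2, J1, J2 = 4, 7, 3, 5
W = torch.randn(I1, I2, J1, J2)           # reshaped weight tensor
x = torch.randn(I1, I2)                   # reshaped input vector
Y = torch.einsum('abcd,ab->cd', W, x)     # Y_{j1,j2} = sum_{i1,i2} W_{i1,i2,j1,j2} x_{i1,i2}
print(Y.shape)                            # torch.Size([3, 5])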

Depending on how the tensor decomposition is performed, there are different tensor formats to represent FC layers. The most popular tensor decomposition formats include CP, Tucker, Block-Term Tucker, Tensor Train, and Tensor Ring.

CP Layers

The CP decomposition (also called CANDECOMP/PARAFAC decomposition) factorizes a higher-order tensor into a sum of several rank-1 tensor components. The mathematical neural network layer format utilizing CP decomposition is

\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{r=1}^Rg_{r} a^{(1)}_{i_1,r}\ldots a^{(N)}_{i_N,r}a^{(N+1)}_{j_1,r}\ldots a^{(N+M)}_{j_M,r} x_{i_1,i_2,\ldots,i_N}\).

_images/CP.png

When calculating the CP decomposition, the first issue that arises is how to determine the number of rank-1 tensor components, i.e., the CP-rank \(R\). This is actually an NP-hard problem. In practice, a numerical value is usually assumed in advance, i.e., treated as a hyperparameter, to fit various CP-based models.

Tucker Layers

Tucker decomposition factorizes a higher-order tensor into a core tensor multiplied by a corresponding factor matrix along each mode. To be more specific, the mathematical neural network layer format utilizing Tucker decomposition is

\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\cdots\sum_{r_N=1}^{R_N}g_{r_1,r_2,\ldots,r_N} a^{(1)}_{i_1,r}\ldots a^{(N)}_{i_N,r}a^{(N+1)}_{j_1,r}\ldots a^{(N+M)}_{j_M,r} x_{i_1,i_2,\ldots,i_N}\).

_images/tucker.png

Here, please note that compared with the CP-rank, \(R_1, R_2, \ldots, R_N\) could take different numerical values.

Block-Term Tucker Layers

Recently, a more generalized decomposition method called Block Term (BT) decomposition, which generalizes CP and Tucker via imposing a block diagonal constraint on the core tensor, has been proposed to make a trade-off between them. The BT decomposition aims to decompose a tensor into a sum of several Tucker decompositions with low Tucker-ranks. The mathematical neural network layer format is

\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{c=1}^{C}\sum_{r_1,\ldots,r_N=1}^{R_1,\ldots,R_N}g_{r_1,\ldots,r_N} a^{(1)}_{i_1,c,r_1}\ldots a^{(N)}_{i_N,c,r_N}a^{(N+1)}_{j_1,c,r_{N+1}}\ldots a^{(N+M)}_{j_M,c,r_{N+M}} x_{i_1,i_2,\ldots,i_N}\).

_images/BT.png

\(R_T\) denotes the Tucker-rank (which means the Tucker-rank equals \(\{R_1, ..., R_N\}\)) and \(C\) represents the CP-rank. They are together called BT-ranks.

Tensor Train (Matrix Product Operator) Layers

Matrix Tensor Train (mTT) decomposition (sometimes simply called Tensor Train), also known as Matrix Product Operator (MPO) in quantum physics, factorizes a higher-order tensor into a linear multiplication of a series of 4th-order core tensors. The mathematical neural network layer format is

\({\mathcal{Y}}_{j_1,\ldots,j_N}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{r_{1},\ldots,r_{N-1}=1}^{R_1,\ldots,R_{N-1}} g^{(1)}_{i_1,j_1,r_1} g^{(2)}_{r_1,i_2,j_2,r_2}\cdots g^{(N)}_{r_{N-1},i_N,j_N} x_{i_1,i_2,\ldots,i_N}\).

_images/TT.png

\(\{R_1,R_2,\ldots,R_{N-1}\}\) denote the TT-ranks.

Tensor Ring (Matrix Product State) Layers

Tensor Train benefits from fast convergence; however, it suffers from its two endpoints, which hinder the representation ability and the flexibility of TT-based models. Thus, to release the power of the linear architecture, researchers link the two endpoints to constitute a ring format named Tensor Ring (TR). The mathematical neural network layer format is

\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_{N}=1}^{I_1,\ldots,I_{N}}\sum_{r_{0},\ldots,r_{N+M-1}=1}^{R_0,\ldots,R_{N+M-1}} g^{(1)}_{r_0,i_1,r_1} \cdots g^{(N)}_{r_{N-1},i_N,r_N} g^{(N+1)}_{r_{N},j_1, r_{N+1}} \cdots g^{(N+M)}_{r_{N+M-1},j_M, r_{0}} x_{i_1,i_2,\ldots,i_N}\).

_images/TR.png

where \(\{R_0,R_1,\ldots,R_{N+M-1}\}\) denote the TR-ranks, each node is a 3rd-order tensor, and the ring is closed so that the last rank coincides with \(R_0\). Compared with Tensor Train, it is not necessary for TR to follow a strict order when multiplying its nodes.

Combination of Tensor Decomposition and Deep Networks

These tensorial layer formats can be directly adopted by simple neural networks and RNNs to replace huge FC layers.

This idea can also be generalized to convolution layers by representing the convolution weights in a tensor decomposition format. For example, for a convolution weight

\(\mathcal{W}\in R^{H \times W \times C_{in} \times C_{out}}\)

the CP format is:

\(\mathcal{W}_{h,w,c_{in},c_{out}} = \sum_{r=1}^Rg_{r} a^{(1)}_{h,r}a^{(2)}_{w,r}a^{(3)}_{c_{in},r}a^{(4)}_{c_{out},r}\)
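
As a small numerical sketch (independent of tednet), the full convolution weight can be rebuilt from its CP factors with torch.einsum:

import torch

H, W_k, C_in, C_out, R = 3, 3, 8, 16, 4
g  = torch.randn(R)
a1 = torch.randn(H, R)
a2 = torch.randn(W_k, R)
a3 = torch.randn(C_in, R)
a4 = torch.randn(C_out, R)
# W_{h,w,c_in,c_out} = sum_r g_r a1_{h,r} a2_{w,r} a3_{c_in,r} a4_{c_out,r}
weight = torch.einsum('r,hr,wr,ir,or->hwio', g, a1, a2, a3, a4)
print(weight.shape)   # torch.Size([3, 3, 8, 16])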

If we generalize the two channel dimensions, the weight can be represented as:

\(\mathcal{W}_{h,w,c_{in(1)}\cdots c_{in(N)},c_{out(1)}\cdots c_{out(M)}} = \sum_{r=1}^Rg_{r} a^{(1)}_{h,r}a^{(2)}_{w,r} a^{(3)}_{c_{in(1)},r}\cdots a^{(2+N)}_{c_{in(N)},r}a^{(3+N)}_{c_{out(1)},r}\cdots a^{(2+N+M)}_{c_{out(M)},r}\)

We can also represent the weight in other tensor formats.