tednet - A Toolkit for Tensor Decomposition Networks¶
tednet is a toolkit for tensor decomposition networks, i.e., neural networks whose layers are decomposed by tensor decompositions such as CANDECOMP/PARAFAC, Tucker2, Tensor Train, and Tensor Ring. To make research on such networks convenient, tednet provides tools for building and working with tensorial layers.
Installation¶
tednet can be installed conveniently with pip:
pip install tednet
To upgrade to the latest version, run
pip install tednet --upgrade
Quick Start¶
This section gives a brief overview to help you get started quickly.
Operation¶
tednet supports a number of basic operations that are convenient to use.
[1]:
import tednet as tdt
Create a matrix whose diagonal elements are ones
[2]:
diag_matrix = tdt.eye(5, 5)
print(diag_matrix)
tensor([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Convert a PyTorch tensor to a NumPy ndarray
[3]:
print(type(diag_matrix))
<class 'torch.Tensor'>
[4]:
diag_matrix = tdt.to_numpy(diag_matrix)
[5]:
print(type(diag_matrix))
<class 'numpy.ndarray'>
Convert a NumPy ndarray to a PyTorch tensor
[6]:
diag_matrix = tdt.to_tensor(diag_matrix)
[7]:
print(type(diag_matrix))
<class 'torch.Tensor'>
Tensor Decomposition Networks (Tensor Ring as an Example)¶
To use tensor ring decomposition models, simply importing the tensor ring module is enough.
[8]:
import tednet.tnn.tensor_ring as tr
Here, we show how to build a TR-LeNet5.
[9]:
# Define a TR-LeNet5
model = tr.TRLeNet5(10, [6, 6, 6, 6])
compression_ration is: 0.3968253968253968
compression_ration is: 14.17233560090703
compression_ration is: 241.54589371980677
compression_ration is: 2.867383512544803
API¶
Access to classes and functions of tednet.
tednet¶
-
tednet.
hard_sigmoid
(tensor: torch.Tensor) → torch.Tensor¶ Computes element-wise hard sigmoid of x. See e.g. https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/sigm.py#L279
- Parameters
tensor (torch.Tensor) – tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Returns
tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Return type
torch.Tensor
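For example, a minimal usage sketch (the input values here are illustrative):
>>> import torch
>>> import tednet as tdt
>>> x = torch.linspace(-4.0, 4.0, 5)
>>> y = tdt.hard_sigmoid(x)  # element-wise, output has the same shape as x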
-
tednet.
eye
(n: int, m: int, device: torch.device = 'cpu', requires_grad: bool = False) → torch.Tensor¶ Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.
- Parameters
- Returns
2-D tensor \(\in \mathbb{R}^{{n} \times {m}}\)
- Return type
torch.Tensor
-
tednet.
to_numpy
(tensor: torch.Tensor) → numpy.ndarray¶ Convert torch.Tensor to numpy.ndarray.
- Parameters
tensor (torch.Tensor) – tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Returns
arr \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Return type
numpy.ndarray
-
tednet.
to_tensor
(arr: numpy.ndarray) → torch.Tensor¶ Convert numpy.ndarray to torch.Tensor.
- Parameters
arr (numpy.ndarray) – arr \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Returns
tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Return type
torch.Tensor
tednet.tnn¶
tednet.tnn.initializer¶
-
tednet.tnn.initializer.
trunc_normal_init
(model, mean: float = 0.0, std: float = 0.1)¶ Initialize network with truncated normal distribution
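For example, a minimal sketch of initializing a tednet model (the model and its settings are illustrative):
>>> import tednet.tnn.tensor_ring as tr
>>> from tednet.tnn.initializer import trunc_normal_init
>>> model = tr.TRLeNet5(10, [6, 6, 6, 6])
>>> trunc_normal_init(model, mean=0.0, std=0.1)  # re-initialize weights with a truncated normal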
tednet.tnn.tn_module¶
-
class
tednet.tnn.tn_module.
_TNBase
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias: bool = True)¶ Bases:
torch.nn.modules.module.Module
The basis of tensor decomposition networks.
- Parameters
-
check_setting
()¶ Check whether in_shape, out_shape, ranks are 1-D params.
-
abstract
set_tn_type
()¶ Set the tensor decomposition type. The types are as follows:
type    tensor decomposition
tr      Tensor Ring
tt      Tensor Train
tk2     Tucker2
cp      CANDECOMP/PARAFAC
btt     Block-Term Tucker
Examples
>>> tn_type = "tr"
>>> self.tn_info["type"] = tn_type
-
abstract
set_nodes
()¶ Generate tensor nodes, then add node information to self.tn_info.
Examples
>>> nodes_info = []
>>> node_info = dict(name="node1", shape=[2, 3, 4])
>>> nodes_info.append(node_info)
>>> self.tn_info["nodes"] = nodes_info
-
abstract
set_params_info
()¶ Record information of Parameters.
Examples
>>> self.tn_info["t_params"] = tn_parameters
>>> self.tn_info["ori_params"] = ori_parameters
>>> self.tn_info["cr"] = ori_parameters / tn_parameters
-
abstract
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ The method to contract the inputs with the tensor nodes.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_m}}\)
- Returns
tensor \(\in \mathbb{R}^{{i_1} \times \dots \times {i_n}}\)
- Return type
torch.Tensor
-
abstract
recover
()¶ Use for rebuilding the original tensor.
-
class
tednet.tnn.tn_module.
LambdaLayer
(lambd)¶ Bases:
torch.nn.modules.module.Module
A layer consisting of a lambda function.
- Parameters
lambd – a lambda function.
-
forward
(inputs: torch.Tensor) → torch.Tensor¶ Forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
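For example, a minimal sketch of wrapping a function into a layer (the doubling function is an illustrative choice):
>>> import torch
>>> from tednet.tnn.tn_module import LambdaLayer
>>> double = LambdaLayer(lambda x: 2 * x)
>>> y = double(torch.randn(8, 3, 4, 4))  # same shape as the input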
tednet.tnn.tn_linear¶
-
class
tednet.tnn.tn_linear.
_TNLinear
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias=True)¶ Bases:
tednet.tnn.tn_module._TNBase
The Tensor Decomposition Linear.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of feature out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\). The ranks of linear
bias (bool) – use bias of linear or not. True to use, and False to not use
-
forward
(inputs)¶ Tensor linear forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C'}\)
- Return type
torch.Tensor
tednet.tnn.tn_cnn¶
-
class
tednet.tnn.tn_cnn.
_TNConvNd
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)¶ Bases:
tednet.tnn.tn_module._TNBase
Tensor Decomposition Convolution.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of channel out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\). The ranks of the decomposition
kernel_size (Union[int, tuple]) – The convolutional kernel size
stride (int) – The length of stride
padding (int) – The size of padding
bias (bool) – use bias of convolution or not. True to use, and False to not use
-
forward
(inputs: torch.Tensor)¶ Tensor convolutional forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times H' \times W' \times C'}\)
- Return type
torch.Tensor
tednet.tnn.tn_rnn¶
-
class
tednet.tnn.tn_rnn.
_TNLSTMCell
(hidden_size: int, tn_block, drop_ih=0.3, drop_hh=0.35)¶ Bases:
torch.nn.modules.module.Module
Tensor LSTMCell.
- Parameters
-
reset_hh
()¶ Reset parameters of hidden-to-hidden layer.
-
forward
(inputs: torch.Tensor, state: tednet.tnn.tn_rnn.LSTMState)¶ Forwarding method. LSTMState = namedtuple('LSTMState', ['hx', 'cx'])
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
state (LSTMState) – namedtuple: [hx \(\in \mathbb{R}^{H}\), cx \(\in \mathbb{R}^{H}\)]
- Returns
result: hy \(\in \mathbb{R}^{H}\), [hy \(\in \mathbb{R}^{H}\), cy \(\in \mathbb{R}^{H}\)]
- Return type
torch.Tensor, [torch.Tensor, torch.Tensor]
-
class
tednet.tnn.tn_rnn.
_TNLSTM
(hidden_size, tn_block, drop_ih=0.3, drop_hh=0.35)¶ Bases:
torch.nn.modules.module.Module
Tensor LSTM.
- Parameters
-
forward
(inputs, state)¶ Forwarding method. LSTMState = namedtuple('LSTMState', ['hx', 'cx'])
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{S \times b \times C}\)
state (LSTMState) – namedtuple: [hx \(\in \mathbb{R}^{H}\), cx \(\in \mathbb{R}^{H}\)]
- Returns
tensor \(\in \mathbb{R}^{S \times b \times C'}\), LSTMState is a namedtuple: [hy \(\in \mathbb{R}^{H}\), cy \(\in \mathbb{R}^{H}\)]
- Return type
torch.Tensor, LSTMState
tednet.tnn.cp¶
-
class
tednet.tnn.cp.
CPConv2D
(c_in: int, c_out: int, rank: int, kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)¶ Bases:
tednet.tnn.tn_cnn._TNConvNd
CANDECOMP/PARAFAC Decomposition Convolution.
- Parameters
c_in (int) – The decomposition shape of channel in
c_out (int) – The decomposition shape of channel out
rank (int) – The rank of the decomposition
kernel_size (Union[int, tuple]) – The convolutional kernel size
stride (int) – The length of stride
padding (int) – The size of padding
bias (bool) – use bias of convolution or not. True to use, and False to not use
-
set_tn_type
()¶ Set as CANDECOMP/PARAFAC decomposition type.
-
set_nodes
()¶ Generate CANDECOMP/PARAFAC nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tensor Decomposition Convolution.
- Parameters
inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
-
forward
(inputs: torch.Tensor)¶ Tensor convolutional forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
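A construction sketch (channel sizes, rank, and kernel size below are illustrative, not prescribed values):
>>> import torch
>>> import tednet.tnn.cp as cp
>>> conv = cp.CPConv2D(16, 32, 8, 3, padding=1)  # c_in=16, c_out=32, rank=8, 3x3 kernel
>>> y = conv(torch.randn(4, 16, 28, 28))         # output lies in R^{b x C' x H' x W'}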
-
class
tednet.tnn.cp.
CPLinear
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], rank: int, bias: bool = True)¶ Bases:
tednet.tnn.tn_linear._TNLinear
The CANDECOMP/PARAFAC Decomposition Linear.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of feature out
rank (int) – The rank of linear
bias (bool) – use bias of linear or not. True to use, and False to not use
-
set_tn_type
()¶ Set as CANDECOMP/PARAFAC decomposition type.
-
set_nodes
()¶ Generate CANDECOMP/PARAFAC nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ CANDECOMP/PARAFAC linear forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
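A construction sketch (the factorizations 256 = 4*8*8 and 512 = 8*8*8 and the rank are illustrative):
>>> import torch
>>> import tednet.tnn.cp as cp
>>> fc = cp.CPLinear([4, 8, 8], [8, 8, 8], 6)  # a 256 -> 512 linear layer with CP-rank 6
>>> y = fc(torch.randn(32, 256))               # output lies in R^{b x 512}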
-
class
tednet.tnn.cp.
CPLeNet5
(num_classes: int, rs: Union[list, numpy.ndarray])¶ Bases:
torch.nn.modules.module.Module
LeNet-5 based on CANDECOMP/PARAFAC.
- Parameters
-
forward
(inputs: torch.Tensor) → torch.Tensor¶ forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times num\_classes}\)
- Return type
torch.Tensor
-
class
tednet.tnn.cp.
CPResNet20
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.cp.cp_resnet.CPResNet
ResNet-20 based on CANDECOMP/PARAFAC.
- Parameters
-
class
tednet.tnn.cp.
CPResNet32
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.cp.cp_resnet.CPResNet
ResNet-32 based on CANDECOMP/PARAFAC.
- Parameters
-
class
tednet.tnn.cp.
CPLSTM
(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: int, drop_ih: float = 0.3, drop_hh: float = 0.35)¶ Bases:
tednet.tnn.tn_rnn._TNLSTM
LSTM based on CANDECOMP/PARAFAC.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM
hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The hidden shape of LSTM
ranks (int) – The rank of linear
drop_ih (float) – The dropout rate of the input-to-hidden layer
drop_hh (float) – The dropout rate of the hidden-to-hidden layer
-
reset_ih
()¶ Reset parameters of input-to-hidden layer.
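A construction sketch (the input/hidden factorizations and the rank are illustrative):
>>> import tednet.tnn.cp as cp
>>> lstm = cp.CPLSTM([28, 28], [16, 16], 6)  # 784-dimensional input, 256-dimensional hidden state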
tednet.tnn.tucker2¶
-
class
tednet.tnn.tucker2.
TK2Conv2D
(c_in: int, c_out: int, ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)¶ Bases:
tednet.tnn.tn_cnn._TNConvNd
Tucker-2 Decomposition Convolution.
- Parameters
c_in (int) – The decomposition shape of channel in
c_out (int) – The decomposition shape of channel out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^r\). The ranks of the decomposition
kernel_size (Union[int, tuple]) – The convolutional kernel size
stride (int) – The length of stride
padding (int) – The size of padding
bias (bool) – use bias of convolution or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Tucker-2 decomposition type.
-
set_nodes
()¶ Generate Tucker-2 nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tucker-2 Decomposition Convolution.
- Parameters
inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
-
class
tednet.tnn.tucker2.
TK2Linear
(in_shape: Union[list, numpy.ndarray], out_size: int, ranks: Union[list, numpy.ndarray], bias: bool = True)¶ Bases:
tednet.tnn.tn_linear._TNLinear
Tucker-2 Decomposition Linear.
input length    ranks length
1               1
3               2
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m, m \in \{1, 3\}\). The decomposition shape of feature in
out_size (int) – The output size of the model
ranks (Union[list, numpy.ndarray]) – 1-D param. The rank of the decomposition
bias (bool) – use bias of linear or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Tucker-2 decomposition type.
-
set_nodes
()¶ Generate Tucker-2 nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tucker-2 linear forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
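Following the table above, a sketch of the two supported cases (sizes and ranks are illustrative):
>>> import tednet.tnn.tucker2 as tk2
>>> fc1 = tk2.TK2Linear([256], 10, [6])         # input length 1, ranks length 1
>>> fc2 = tk2.TK2Linear([4, 8, 8], 10, [6, 6])  # input length 3, ranks length 2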
-
class
tednet.tnn.tucker2.
TK2LeNet5
(num_classes: int, rs: Union[list, numpy.ndarray])¶ Bases:
torch.nn.modules.module.Module
LeNet-5 based on the Tucker-2.
- Parameters
-
forward
(inputs: torch.Tensor) → torch.Tensor¶ forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times num\_classes}\)
- Return type
torch.Tensor
-
class
tednet.tnn.tucker2.
TK2ResNet20
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.tucker2.tk2_resnet.TK2ResNet
ResNet-20 based on Tucker-2.
- Parameters
-
class
tednet.tnn.tucker2.
TK2ResNet32
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.tucker2.tk2_resnet.TK2ResNet
ResNet-32 based on Tucker-2.
- Parameters
-
class
tednet.tnn.tucker2.
TK2LSTM
(in_shape: Union[list, numpy.ndarray], hidden_size: int, ranks: Union[list, numpy.ndarray], drop_ih: float = 0.3, drop_hh: float = 0.35)¶ Bases:
tednet.tnn.tn_rnn._TNLSTM
LSTM based on Tucker-2.
input length    ranks length
1               1
3               2
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m, m \in \{1, 3\}\). The input shape of LSTM
hidden_size (int) – The hidden size of LSTM
ranks (Union[list, numpy.ndarray]) – 1-D param. The ranks of linear
drop_ih (float) – The dropout rate of the input-to-hidden layer
drop_hh (float) – The dropout rate of the hidden-to-hidden layer
-
reset_ih
()¶ Reset parameters of input-to-hidden layer.
tednet.tnn.bt_tucker¶
-
class
tednet.tnn.bt_tucker.
BTTConv2D
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], block_num: int, kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)¶ Bases:
tednet.tnn.tn_cnn._TNConvNd
Block-Term Tucker Decomposition Convolution.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+2}\). The rank of the decomposition
block_num (int) – The number of blocks
kernel_size (Union[int, tuple]) – The convolutional kernel size
stride (int) – The length of stride
padding (int) – The size of padding
bias (bool) – use bias of convolution or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Block-Term Tucker decomposition type.
-
set_nodes
()¶ Generate Block-Term Tucker nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Block-Term Tucker Decomposition Convolution.
- Parameters
inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
-
forward
(inputs: torch.Tensor)¶ Block-Term Tucker convolutional forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
-
class
tednet.tnn.bt_tucker.
BTTLinear
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], block_num: int, bias: bool = True)¶ Bases:
tednet.tnn.tn_linear._TNLinear
Block-Term Tucker Decomposition Linear.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The rank of the decomposition
block_num (int) – The number of blocks
bias (bool) – use bias of linear or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Block-Term Tucker decomposition type.
-
set_nodes
()¶ Generate Block-Term Tucker nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Block-Term Tucker linear forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
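A construction sketch (the factorizations, ranks, and block number are illustrative):
>>> import tednet.tnn.bt_tucker as btt
>>> fc = btt.BTTLinear([4, 8, 8], [8, 8, 8], [2, 2, 2], 2)  # a 256 -> 512 linear layer with 2 blocks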
-
class
tednet.tnn.bt_tucker.
BTTLeNet5
(num_classes: int, rs: Union[list, numpy.ndarray])¶ Bases:
torch.nn.modules.module.Module
LeNet-5 based on the Block-Term Tucker.
- Parameters
-
forward
(inputs: torch.Tensor) → torch.Tensor¶ Forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times num\_classes}\)
- Return type
torch.Tensor
-
class
tednet.tnn.bt_tucker.
BTTResNet20
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.bt_tucker.btt_resnet.BTTResNet
ResNet-20 based on Block-Term Tucker.
- Parameters
-
class
tednet.tnn.bt_tucker.
BTTResNet32
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.bt_tucker.btt_resnet.BTTResNet
ResNet-32 based on Block-Term Tucker.
- Parameters
-
class
tednet.tnn.bt_tucker.
BTTLSTM
(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], block_num: int, drop_ih: float = 0.3, drop_hh: float = 0.35)¶ Bases:
tednet.tnn.tn_rnn._TNLSTM
LSTM based on Block-Term Tucker.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM
hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The hidden shape of LSTM
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The ranks of linear
block_num (int) – The number of blocks
drop_ih (float) – The dropout rate of the input-to-hidden layer
drop_hh (float) – The dropout rate of the hidden-to-hidden layer
-
reset_ih
()¶ Reset parameters of input-to-hidden layer.
tednet.tnn.tensor_train¶
-
class
tednet.tnn.tensor_train.
TTConv2D
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)¶ Bases:
tednet.tnn.tn_cnn._TNConvNd
Tensor Train Decomposition Convolution.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The rank of the decomposition
kernel_size (Union[int, tuple]) – The convolutional kernel size
stride (int) – The length of stride
padding (int) – The size of padding
bias (bool) – use bias of convolution or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Tensor Train decomposition type.
-
set_nodes
()¶ Generate Tensor Train nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tensor Train Decomposition Convolution.
- Parameters
inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
A tensor \(\in \mathbb{R}^{b \times C' \times H' \times W'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
-
class
tednet.tnn.tensor_train.
TTLinear
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias: bool = True)¶ Bases:
tednet.tnn.tn_linear._TNLinear
Tensor Train Decomposition Linear.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m-1}\). The rank of the decomposition
bias (bool) – use bias of linear or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Tensor Train decomposition type.
-
set_nodes
()¶ Generate Tensor Train nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tensor Train linear forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
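A construction sketch (the factorizations and ranks are illustrative; note that ranks has length m-1):
>>> import tednet.tnn.tensor_train as tt
>>> fc = tt.TTLinear([4, 8, 8], [8, 8, 8], [6, 6])  # a 256 -> 512 linear layer with TT-ranks [6, 6]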
-
class
tednet.tnn.tensor_train.
TTLeNet5
(num_classes: int, rs: Union[list, numpy.ndarray])¶ Bases:
torch.nn.modules.module.Module
LeNet-5 based on the Tensor Train.
- Parameters
-
forward
(inputs: torch.Tensor) → torch.Tensor¶ Forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times num\_classes}\)
- Return type
torch.Tensor
-
class
tednet.tnn.tensor_train.
TTResNet20
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.tensor_train.tt_resnet.TTResNet
ResNet-20 based on Tensor Train.
- Parameters
-
class
tednet.tnn.tensor_train.
TTResNet32
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.tensor_train.tt_resnet.TTResNet
ResNet-32 based on Tensor Train.
- Parameters
-
class
tednet.tnn.tensor_train.
TTLSTM
(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], drop_ih: float = 0.3, drop_hh: float = 0.35)¶ Bases:
tednet.tnn.tn_rnn._TNLSTM
LSTM based on Tensor Train.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM
hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The hidden shape of LSTM
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m-1}\). The ranks of linear
drop_ih (float) – The dropout rate of the input-to-hidden layer
drop_hh (float) – The dropout rate of the hidden-to-hidden layer
-
reset_ih
()¶ Reset parameters of input-to-hidden layer.
tednet.tnn.tensor_ring¶
-
class
tednet.tnn.tensor_ring.
TRConv2D
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], kernel_size: Union[int, tuple], stride=1, padding=0, bias=True)¶ Bases:
tednet.tnn.tn_cnn._TNConvNd
Tensor Ring Decomposition Convolution.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of channel in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of channel out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+n+1}\). The ranks of the decomposition
kernel_size (Union[int, tuple]) – The convolutional kernel size
stride (int) – The length of stride
padding (int) – The size of padding
bias (bool) – use bias of convolution or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Tensor Ring decomposition type.
-
set_nodes
()¶ Generate Tensor Ring nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tensor Decomposition Convolution.
- Parameters
inputs (torch.Tensor) – A tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
A tensor \(\in \mathbb{R}^{b \times H' \times W' \times C'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
-
class
tednet.tnn.tensor_ring.
TRLinear
(in_shape: Union[list, numpy.ndarray], out_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], bias: bool = True)¶ Bases:
tednet.tnn.tn_linear._TNLinear
The Tensor Ring Decomposition Linear.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The decomposition shape of feature in
out_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The decomposition shape of feature out
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+n}\). The ranks of linear
bias (bool) – use bias of linear or not. True to use, and False to not use
-
set_tn_type
()¶ Set as Tensor Ring decomposition type.
-
set_nodes
()¶ Generate tensor ring nodes, then add node information to self.tn_info.
-
set_params_info
()¶ Record information of Parameters.
-
reset_parameters
()¶ Reset parameters.
-
tn_contract
(inputs: torch.Tensor) → torch.Tensor¶ Tensor Ring linear forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C}\)
- Returns
tensor \(\in \mathbb{R}^{b \times C'}\)
- Return type
torch.Tensor
-
recover
()¶ Todo: Use for rebuilding the original tensor.
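A construction sketch (the factorizations and ranks are illustrative; note that ranks has length m+n):
>>> import tednet.tnn.tensor_ring as tr
>>> fc = tr.TRLinear([4, 8, 8], [8, 8, 8], [5, 5, 5, 5, 5, 5])  # a 256 -> 512 linear layer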
-
class
tednet.tnn.tensor_ring.
TRLeNet5
(num_classes: int, rs: Union[list, numpy.ndarray])¶ Bases:
torch.nn.modules.module.Module
LeNet-5 based on Tensor Ring.
- Parameters
-
forward
(inputs: torch.Tensor) → torch.Tensor¶ forwarding method.
- Parameters
inputs (torch.Tensor) – tensor \(\in \mathbb{R}^{b \times C \times H \times W}\)
- Returns
tensor \(\in \mathbb{R}^{b \times num\_classes}\)
- Return type
torch.Tensor
-
class
tednet.tnn.tensor_ring.
TRResNet20
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.tensor_ring.tr_resnet.TRResNet
ResNet-20 based on Tensor Ring.
- Parameters
-
class
tednet.tnn.tensor_ring.
TRResNet32
(rs: Union[list, numpy.ndarray], num_classes: int)¶ Bases:
tednet.tnn.tensor_ring.tr_resnet.TRResNet
ResNet-32 based on Tensor Ring.
- Parameters
-
class
tednet.tnn.tensor_ring.
TRLSTM
(in_shape: Union[list, numpy.ndarray], hidden_shape: Union[list, numpy.ndarray], ranks: Union[list, numpy.ndarray], drop_ih: float = 0.25, drop_hh: float = 0.25)¶ Bases:
tednet.tnn.tn_rnn._TNLSTM
LSTM based on Tensor Ring.
- Parameters
in_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^m\). The input shape of LSTM
hidden_shape (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^n\). The hidden shape of LSTM
ranks (Union[list, numpy.ndarray]) – 1-D param \(\in \mathbb{R}^{m+n}\). The ranks of linear
drop_ih (float) – The dropout rate of the input-to-hidden layer
drop_hh (float) – The dropout rate of the hidden-to-hidden layer
-
reset_ih
()¶ Reset parameters of input-to-hidden layer.
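A construction sketch mirroring the TR-LSTM used in the RNN example later in this document:
>>> import tednet.tnn.tensor_ring as tr
>>> lstm = tr.TRLSTM([28, 28], [16, 16], [5, 5, 5, 5])  # 784-dimensional input, 256-dimensional hidden state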
Contact¶
This project is mainly developed by Yu Pan (iperryuu@gmail.com), Maolin Wang (morin.w98@gmail.com), and others. Feel free to contact us.
A sample for Tensorial Convolutional Neural Network¶
By replacing convolutional kernels with tensor cores, a tensorial CNN is constructed. Here is a tensor ring example showing how to use a TR-based model with tednet.
[1]:
from managpu import GpuManager
my_gpu = GpuManager()
my_gpu.set_by_memory(1)
import random
import tednet as tdt
import tednet.tnn.tensor_ring as tr
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
No GPU Util Limit!
Sorted by memory:
GPU Index: 1 GPU FreeMemory: 11176 MB GPU Util: 0%
GPU Index: 2 GPU FreeMemory: 11176 MB GPU Util: 0%
GPU Index: 4 GPU FreeMemory: 11176 MB GPU Util: 0%
GPU Index: 0 GPU FreeMemory: 6133 MB GPU Util: 74%
GPU Index: 3 GPU FreeMemory: 1109 MB GPU Util: 100%
GPU Index: 5 GPU FreeMemory: 1109 MB GPU Util: 100%
GPU Index: 6 GPU FreeMemory: 1109 MB GPU Util: 100%
GPU Index: 7 GPU FreeMemory: 1109 MB GPU Util: 0%
Qualified GPU Index is: [1]
Set basic environment
[2]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
seed = 233
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if use_cuda:
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
Set dataloader
[3]:
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('./data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=128, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=256, shuffle=True, **kwargs)
Set training and testing process
[4]:
def train(model, device, train_loader, optimizer, epoch, log_interval=200):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.cross_entropy(output, target, reduction='sum').item() # sum up batch loss
pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
Begin training
[5]:
# Define a TR-LeNet5
model = tr.TRLeNet5(10, [6, 6, 6, 6])
model.to(device)
optimizer = optim.SGD(model.parameters(), lr=2e-2, momentum=0.9, weight_decay=5e-4)
for epoch in range(20):
train(model, device, train_loader, optimizer, epoch)
test(model, device, test_loader)
compression_ration is: 0.3968253968253968
compression_ration is: 14.17233560090703
compression_ration is: 241.54589371980677
compression_ration is: 2.867383512544803
Train Epoch: 0 [0/60000 (0%)] Loss: 2.633792
Train Epoch: 0 [25600/60000 (43%)] Loss: 0.109367
Train Epoch: 0 [51200/60000 (85%)] Loss: 0.133933
Test set: Average loss: 0.0756, Accuracy: 9751/10000 (98%)
Train Epoch: 1 [0/60000 (0%)] Loss: 0.074946
Train Epoch: 1 [25600/60000 (43%)] Loss: 0.039371
Train Epoch: 1 [51200/60000 (85%)] Loss: 0.029103
Test set: Average loss: 0.0691, Accuracy: 9782/10000 (98%)
Train Epoch: 2 [0/60000 (0%)] Loss: 0.113578
Train Epoch: 2 [25600/60000 (43%)] Loss: 0.099431
Train Epoch: 2 [51200/60000 (85%)] Loss: 0.084437
Test set: Average loss: 0.0544, Accuracy: 9826/10000 (98%)
Train Epoch: 3 [0/60000 (0%)] Loss: 0.130137
Train Epoch: 3 [25600/60000 (43%)] Loss: 0.083295
Train Epoch: 3 [51200/60000 (85%)] Loss: 0.021406
Test set: Average loss: 0.0608, Accuracy: 9799/10000 (98%)
Train Epoch: 4 [0/60000 (0%)] Loss: 0.044310
Train Epoch: 4 [25600/60000 (43%)] Loss: 0.025041
Train Epoch: 4 [51200/60000 (85%)] Loss: 0.017827
Test set: Average loss: 0.0446, Accuracy: 9861/10000 (99%)
Train Epoch: 5 [0/60000 (0%)] Loss: 0.035976
Train Epoch: 5 [25600/60000 (43%)] Loss: 0.130144
Train Epoch: 5 [51200/60000 (85%)] Loss: 0.066351
Test set: Average loss: 0.0457, Accuracy: 9854/10000 (99%)
Train Epoch: 6 [0/60000 (0%)] Loss: 0.071825
Train Epoch: 6 [25600/60000 (43%)] Loss: 0.031684
Train Epoch: 6 [51200/60000 (85%)] Loss: 0.049287
Test set: Average loss: 0.0444, Accuracy: 9854/10000 (99%)
Train Epoch: 7 [0/60000 (0%)] Loss: 0.074904
Train Epoch: 7 [25600/60000 (43%)] Loss: 0.083052
Train Epoch: 7 [51200/60000 (85%)] Loss: 0.021132
Test set: Average loss: 0.0397, Accuracy: 9880/10000 (99%)
Train Epoch: 8 [0/60000 (0%)] Loss: 0.020113
Train Epoch: 8 [25600/60000 (43%)] Loss: 0.022854
Train Epoch: 8 [51200/60000 (85%)] Loss: 0.008770
Test set: Average loss: 0.0424, Accuracy: 9866/10000 (99%)
Train Epoch: 9 [0/60000 (0%)] Loss: 0.007447
Train Epoch: 9 [25600/60000 (43%)] Loss: 0.095077
Train Epoch: 9 [51200/60000 (85%)] Loss: 0.018731
Test set: Average loss: 0.0339, Accuracy: 9896/10000 (99%)
Train Epoch: 10 [0/60000 (0%)] Loss: 0.025279
Train Epoch: 10 [25600/60000 (43%)] Loss: 0.038482
Train Epoch: 10 [51200/60000 (85%)] Loss: 0.043692
Test set: Average loss: 0.0391, Accuracy: 9882/10000 (99%)
Train Epoch: 11 [0/60000 (0%)] Loss: 0.022135
Train Epoch: 11 [25600/60000 (43%)] Loss: 0.008357
Train Epoch: 11 [51200/60000 (85%)] Loss: 0.031139
Test set: Average loss: 0.0380, Accuracy: 9882/10000 (99%)
Train Epoch: 12 [0/60000 (0%)] Loss: 0.004145
Train Epoch: 12 [25600/60000 (43%)] Loss: 0.024185
Train Epoch: 12 [51200/60000 (85%)] Loss: 0.030595
Test set: Average loss: 0.0354, Accuracy: 9887/10000 (99%)
Train Epoch: 13 [0/60000 (0%)] Loss: 0.013407
Train Epoch: 13 [25600/60000 (43%)] Loss: 0.008846
Train Epoch: 13 [51200/60000 (85%)] Loss: 0.061894
Test set: Average loss: 0.0380, Accuracy: 9867/10000 (99%)
Train Epoch: 14 [0/60000 (0%)] Loss: 0.017808
Train Epoch: 14 [25600/60000 (43%)] Loss: 0.002656
Train Epoch: 14 [51200/60000 (85%)] Loss: 0.013447
Test set: Average loss: 0.0354, Accuracy: 9887/10000 (99%)
Train Epoch: 15 [0/60000 (0%)] Loss: 0.009893
Train Epoch: 15 [25600/60000 (43%)] Loss: 0.081577
Train Epoch: 15 [51200/60000 (85%)] Loss: 0.018266
Test set: Average loss: 0.0326, Accuracy: 9893/10000 (99%)
Train Epoch: 16 [0/60000 (0%)] Loss: 0.011158
Train Epoch: 16 [25600/60000 (43%)] Loss: 0.004466
Train Epoch: 16 [51200/60000 (85%)] Loss: 0.034247
Test set: Average loss: 0.0343, Accuracy: 9891/10000 (99%)
Train Epoch: 17 [0/60000 (0%)] Loss: 0.030956
Train Epoch: 17 [25600/60000 (43%)] Loss: 0.010426
Train Epoch: 17 [51200/60000 (85%)] Loss: 0.061093
Test set: Average loss: 0.0315, Accuracy: 9897/10000 (99%)
Train Epoch: 18 [0/60000 (0%)] Loss: 0.017390
Train Epoch: 18 [25600/60000 (43%)] Loss: 0.023027
Train Epoch: 18 [51200/60000 (85%)] Loss: 0.029767
Test set: Average loss: 0.0332, Accuracy: 9888/10000 (99%)
Train Epoch: 19 [0/60000 (0%)] Loss: 0.034303
Train Epoch: 19 [25600/60000 (43%)] Loss: 0.003748
Train Epoch: 19 [51200/60000 (85%)] Loss: 0.026581
Test set: Average loss: 0.0307, Accuracy: 9898/10000 (99%)
A sample for Tensorial Recurrent Neural Network¶
By replacing the input-to-hidden layer of an RNN with tensor cores, a tensorial RNN is constructed. Here is a tensor ring example showing how to use a TR-based model with tednet.
[1]:
from managpu import GpuManager
my_gpu = GpuManager()
my_gpu.set_by_memory(1)
import random
from collections import namedtuple
import tednet as tdt
import tednet.tnn.tensor_ring as tr
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
No GPU Util Limit!
Sorted by memory:
GPU Index: 2 GPU FreeMemory: 11176 MB GPU Util: 0%
GPU Index: 4 GPU FreeMemory: 11176 MB GPU Util: 0%
GPU Index: 1 GPU FreeMemory: 10129 MB GPU Util: 0%
GPU Index: 0 GPU FreeMemory: 6133 MB GPU Util: 37%
GPU Index: 3 GPU FreeMemory: 1109 MB GPU Util: 94%
GPU Index: 5 GPU FreeMemory: 1109 MB GPU Util: 100%
GPU Index: 6 GPU FreeMemory: 1109 MB GPU Util: 100%
GPU Index: 7 GPU FreeMemory: 1109 MB GPU Util: 95%
Qualified GPU Index is: [2]
Set basic environment
[2]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
seed = 233
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if use_cuda:
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
LSTMState = namedtuple('LSTMState', ['hx', 'cx'])
Input_Size = np.prod([28, 28])
Hidden_Size = 256
Set dataloader
[3]:
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('./data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=128, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=256, shuffle=True, **kwargs)
Set TR-LSTM Classifier
[4]:
class ClassifierTR(nn.Module):
def __init__(self, num_class=10):
super(ClassifierTR, self).__init__()
in_shape = [28, 28]
hidden_shape = [16, 16]
self.hidden_size = Hidden_Size
self.lstm = tr.TRLSTM(in_shape, hidden_shape, [5, 5, 5, 5])
self.fc = nn.Linear(self.hidden_size, num_class)
def forward(self, x, state):
input_shape = x.shape
batch_size = input_shape[0]
seq_size = input_shape[1]
x = x.view(batch_size, seq_size, -1)
x = x.permute(1, 0, 2)
_, x = self.lstm(x, state)
x = self.fc(x[0])
return x
Set training and testing process
[5]:
def train(model, device, train_loader, optimizer, epoch, log_interval=200):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
batch_size = data.shape[0]
state = LSTMState(torch.zeros(batch_size, Hidden_Size, device=device),
torch.zeros(batch_size, Hidden_Size, device=device))
output = model(data, state)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
batch_size = data.shape[0]
state = LSTMState(torch.zeros(batch_size, Hidden_Size, device=device),
torch.zeros(batch_size, Hidden_Size, device=device))
output = model(data, state)
test_loss += F.cross_entropy(output, target, reduction='sum').item() # sum up batch loss
pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
Begin training
[6]:
# Define a TR-LSTM
model = ClassifierTR()
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=2e-4, weight_decay=0.00016667)
for epoch in range(20):
train(model, device, train_loader, optimizer, epoch)
test(model, device, test_loader)
compression_ration is: 236.12235294117647
Train Epoch: 0 [0/60000 (0%)] Loss: 2.271237
Train Epoch: 0 [25600/60000 (43%)] Loss: 2.037606
Train Epoch: 0 [51200/60000 (85%)] Loss: 1.804040
Test set: Average loss: 1.5393, Accuracy: 5888/10000 (59%)
Train Epoch: 1 [0/60000 (0%)] Loss: 1.675199
Train Epoch: 1 [25600/60000 (43%)] Loss: 1.413317
Train Epoch: 1 [51200/60000 (85%)] Loss: 1.376522
Test set: Average loss: 1.0163, Accuracy: 6931/10000 (69%)
Train Epoch: 2 [0/60000 (0%)] Loss: 1.204728
Train Epoch: 2 [25600/60000 (43%)] Loss: 1.068120
Train Epoch: 2 [51200/60000 (85%)] Loss: 1.048317
Test set: Average loss: 0.7734, Accuracy: 7470/10000 (75%)
Train Epoch: 3 [0/60000 (0%)] Loss: 0.902623
Train Epoch: 3 [25600/60000 (43%)] Loss: 0.709798
Train Epoch: 3 [51200/60000 (85%)] Loss: 0.772015
Test set: Average loss: 0.6653, Accuracy: 7714/10000 (77%)
Train Epoch: 4 [0/60000 (0%)] Loss: 0.793773
Train Epoch: 4 [25600/60000 (43%)] Loss: 0.747470
Train Epoch: 4 [51200/60000 (85%)] Loss: 0.739394
Test set: Average loss: 0.5988, Accuracy: 8006/10000 (80%)
Train Epoch: 5 [0/60000 (0%)] Loss: 0.711895
Train Epoch: 5 [25600/60000 (43%)] Loss: 0.610803
Train Epoch: 5 [51200/60000 (85%)] Loss: 0.705731
Test set: Average loss: 0.5535, Accuracy: 8195/10000 (82%)
Train Epoch: 6 [0/60000 (0%)] Loss: 0.803615
Train Epoch: 6 [25600/60000 (43%)] Loss: 0.608962
Train Epoch: 6 [51200/60000 (85%)] Loss: 0.600730
Test set: Average loss: 0.5210, Accuracy: 8317/10000 (83%)
Train Epoch: 7 [0/60000 (0%)] Loss: 0.507197
Train Epoch: 7 [25600/60000 (43%)] Loss: 0.634771
Train Epoch: 7 [51200/60000 (85%)] Loss: 0.603676
Test set: Average loss: 0.4965, Accuracy: 8445/10000 (84%)
Train Epoch: 8 [0/60000 (0%)] Loss: 0.553993
Train Epoch: 8 [25600/60000 (43%)] Loss: 0.539877
Train Epoch: 8 [51200/60000 (85%)] Loss: 0.589516
Test set: Average loss: 0.4719, Accuracy: 8535/10000 (85%)
Train Epoch: 9 [0/60000 (0%)] Loss: 0.575935
Train Epoch: 9 [25600/60000 (43%)] Loss: 0.494978
Train Epoch: 9 [51200/60000 (85%)] Loss: 0.600699
Test set: Average loss: 0.4522, Accuracy: 8601/10000 (86%)
Train Epoch: 10 [0/60000 (0%)] Loss: 0.425709
Train Epoch: 10 [25600/60000 (43%)] Loss: 0.439076
Train Epoch: 10 [51200/60000 (85%)] Loss: 0.427697
Test set: Average loss: 0.4368, Accuracy: 8677/10000 (87%)
Train Epoch: 11 [0/60000 (0%)] Loss: 0.512469
Train Epoch: 11 [25600/60000 (43%)] Loss: 0.499898
Train Epoch: 11 [51200/60000 (85%)] Loss: 0.412309
Test set: Average loss: 0.4227, Accuracy: 8710/10000 (87%)
Train Epoch: 12 [0/60000 (0%)] Loss: 0.555337
Train Epoch: 12 [25600/60000 (43%)] Loss: 0.330346
Train Epoch: 12 [51200/60000 (85%)] Loss: 0.340294
Test set: Average loss: 0.4089, Accuracy: 8746/10000 (87%)
Train Epoch: 13 [0/60000 (0%)] Loss: 0.419118
Train Epoch: 13 [25600/60000 (43%)] Loss: 0.335568
Train Epoch: 13 [51200/60000 (85%)] Loss: 0.328040
Test set: Average loss: 0.3973, Accuracy: 8792/10000 (88%)
Train Epoch: 14 [0/60000 (0%)] Loss: 0.384958
Train Epoch: 14 [25600/60000 (43%)] Loss: 0.436771
Train Epoch: 14 [51200/60000 (85%)] Loss: 0.440793
Test set: Average loss: 0.3865, Accuracy: 8819/10000 (88%)
Train Epoch: 15 [0/60000 (0%)] Loss: 0.483415
Train Epoch: 15 [25600/60000 (43%)] Loss: 0.395679
Train Epoch: 15 [51200/60000 (85%)] Loss: 0.482825
Test set: Average loss: 0.3761, Accuracy: 8861/10000 (89%)
Train Epoch: 16 [0/60000 (0%)] Loss: 0.436840
Train Epoch: 16 [25600/60000 (43%)] Loss: 0.339861
Train Epoch: 16 [51200/60000 (85%)] Loss: 0.366399
Test set: Average loss: 0.3689, Accuracy: 8894/10000 (89%)
Train Epoch: 17 [0/60000 (0%)] Loss: 0.442870
Train Epoch: 17 [25600/60000 (43%)] Loss: 0.370757
Train Epoch: 17 [51200/60000 (85%)] Loss: 0.403360
Test set: Average loss: 0.3585, Accuracy: 8924/10000 (89%)
Train Epoch: 18 [0/60000 (0%)] Loss: 0.346232
Train Epoch: 18 [25600/60000 (43%)] Loss: 0.452554
Train Epoch: 18 [51200/60000 (85%)] Loss: 0.318595
Test set: Average loss: 0.3496, Accuracy: 8960/10000 (90%)
Train Epoch: 19 [0/60000 (0%)] Loss: 0.272001
Train Epoch: 19 [25600/60000 (43%)] Loss: 0.430083
Train Epoch: 19 [51200/60000 (85%)] Loss: 0.446394
Test set: Average loss: 0.3433, Accuracy: 8976/10000 (90%)
Tensors¶
Tensors, also known as multi-way arrays, can be viewed as a higher-order extension of vectors (i.e., 1st-order tensors) and matrices (i.e., 2nd-order tensors). Like rows and columns in a matrix, an Nth-order tensor \({\mathcal X}\in\mathbb R^{I_1\times I_2 \times \ldots\times I_N}\) has N modes (also called ways, orders, or indices) whose lengths (or dimensions) are represented by \(I_1, \ldots, I_N\), respectively. Tensors can be graphically represented in diagrams known as Tensor Networks. As in the following illustration, a black node denotes a tensor and an edge connected to the node represents a tensor mode.

Tensor Contraction¶
Tensor contraction is the most typical operation for tensors, contracting two tensors into one along the associated pairs of indices. As a result, the corresponding connected edges disappear while the dangling edges persist. The Tensor Network representation of such an operation can be illustrated as:

As shown in the above figure, contraction between a 5th-order tensor \({\mathcal A}\) and a 4th-order tensor \({\mathcal B}\) along the index pairs \((i_5,j_1)\) and \((i_3,j_2)\) yields a 5th-order tensor \({\mathcal C}\), with entries
\({\mathcal C}_{i_1,i_2,i_4,j_3,j_4}=\sum_{i_3,i_5} {\mathcal A}_{i_1,i_2,i_3,i_4,i_5} {\mathcal B}_{i_5,i_3,j_3,j_4}\).
Tensor contractions among multiple tensors can be computed by performing tensor contraction between two tensors many times. Hence, the order (or number of modes) of an entire Tensor Network is given by the number of dangling edges which are not contracted.
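As a concrete sketch of the contraction above (the sizes are arbitrary), the same result can be computed with torch.einsum:
import torch

A = torch.randn(2, 3, 4, 5, 6)  # indices i1, i2, i3, i4, i5
B = torch.randn(6, 4, 7, 8)     # indices i5, i3, j3, j4
# C_{i1,i2,i4,j3,j4} = sum over i3, i5 of A_{i1,i2,i3,i4,i5} * B_{i5,i3,j3,j4}
C = torch.einsum('abcde,ecfg->abdfg', A, B)
print(C.shape)  # torch.Size([2, 3, 5, 7, 8])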
Tensor decomposition is a common technique for compressing neural networks, decomposing a higher-order tensor into several lower-order tensors (usually matrices or 3rd-order tensors) that are sparsely interconnected through the tensor contraction operator. The basic tensor decompositions include CANDECOMP/PARAFAC (CP), Tucker, Block Term (BT), Tensor Train (TT), and so on. Such decomposition formats can be illustrated as corresponding Tensor Network diagrams.
Tensorized FC Layers¶
By replacing Fully-Connected (FC) layers or convolutional layers with tensorized layers, a large number of parameters can be reduced. For example, an FC layer is formulated as \({y}= {W}{x}\) and can be illustrated as

By a simple reshaping method, we can reformulate the FC layer as
\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}{\mathcal W}_{i_1,\ldots,i_N,j_1,\ldots,j_M} ~x_{i_1,i_2,\ldots,i_N}\).
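A small numerical sketch of this reshaping (the factorizations 256 = 4*8*8 and 512 = 8*8*8 are illustrative):
import torch

W = torch.randn(512, 256)
x = torch.randn(256)
y_dense = W @ x
# Split the dimensions: 512 = 8*8*8 -> (j1, j2, j3) and 256 = 4*8*8 -> (i1, i2, i3)
W_t = W.reshape(8, 8, 8, 4, 8, 8)
x_t = x.reshape(4, 8, 8)
y_t = torch.einsum('jklabc,abc->jkl', W_t, x_t)
print(torch.allclose(y_dense.reshape(8, 8, 8), y_t, atol=1e-4))  # True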
Depending on the tensor decomposition that is used, there are different tensor formats to represent FC layers. The most popular formats include CP, Tucker, Block-Term Tucker, Tensor Train, and Tensor Ring.
CP Layers¶
The CP decomposition (also called CANDECOMP/PARAFAC decomposition) factorizes a higher-order tensor into a sum of several rank-1 tensor components. The mathematical neural network layer format of utilizing CP decomposition is
\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{r=1}^Rg_{r} a^{(1)}_{i_1,r}\ldots a^{(N)}_{i_N,r}a^{(N+1)}_{j_1,r}\ldots a^{(N+M)}_{j_M,r} x_{i_1,i_2,\ldots,i_N}\).

When calculating the CP decomposition, the first issue that arises is how to determine the number of rank-1 tensor components, i.e., the CP-rank \(R\). This is actually an NP-hard problem. In practice, a numerical value is usually assumed in advance, i.e., treated as a hyperparameter, to fit various CP-based models.
Tucker Layers¶
Tucker decomposition factorizes a higher-order tensor into a core tensor multiplied by a corresponding factor matrix along each mode. To be more specific, the mathematical neural network layer format of utilizing Tucker decomposition is
\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{r_1=1}^{R_1}\cdots\sum_{r_{N+M}=1}^{R_{N+M}}g_{r_1,r_2,\ldots,r_{N+M}} a^{(1)}_{i_1,r_1}\ldots a^{(N)}_{i_N,r_N}a^{(N+1)}_{j_1,r_{N+1}}\ldots a^{(N+M)}_{j_M,r_{N+M}} x_{i_1,i_2,\ldots,i_N}\).

Here, please note that compared with the CP-rank, \(R_1, R_2, \ldots, R_N\) could take different numerical values.
Block-Term Tucker Layers¶
Recently, a more generalized decomposition method called Block Term (BT) decomposition, which generalizes CP and Tucker via imposing a block diagonal constraint on the core tensor, has been proposed to make a trade-off between them. The BT decomposition aims to decompose a tensor into a sum of several Tucker decompositions with low Tucker-ranks. The mathematical neural network layer format is
\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{c=1}^{C}\sum_{r_1,\ldots,r_N=1}^{R_1,\ldots,R_N}g_{r_1,\ldots,r_N} a^{(1)}_{i_1,c,r_1}\ldots a^{(N)}_{i_N,c,r_N}a^{(N+1)}_{j_1,c,r_{N+1}}\ldots a^{(N+M)}_{j_M,c,r_{N+M}} x_{i_1,i_2,\ldots,i_N}\).

\(R_T\) denotes the Tucker-rank (which means the Tucker-rank equals \(\{R_1, ..., R_N\}\)) and \(C\) represents the CP-rank. They are together called BT-ranks.
Tensor Train (Matrix Product Operator) Layers¶
Matrix Tensor Train (mTT) decomposition (sometimes simply called Tensor Train), also known as Matrix Product Operator (MPO) in quantum physics, factorizes a higher-order tensor into a linear multiplication of a series of 4th-order core tensors. The mathematical neural network layer format is
\({\mathcal{Y}}_{j_1,\ldots,j_N}= \sum_{i_1,\ldots,i_N=1}^{I_1,\ldots,I_N}\sum_{r_{1},\ldots,r_{N-1}=1}^{R_1,\ldots,R_{N-1}} g^{(1)}_{i_1,j_1,r_1} g^{(2)}_{r_1,i_2,j_2,r_2}\cdots g^{(N)}_{r_{N-1},i_N,j_N} x_{i_1,i_2,\ldots,i_N}\).

\(\{R_1,R_2,\ldots,R_{N-1}\}\) denote the TT-ranks.
Tensor Ring (Matrix Product State) Layers¶
Tensor Train benefits from fast convergence; however, it suffers from its two endpoints, which hinder the representation ability and flexibility of TT-based models. Thus, to release the power of the linear architecture, researchers link the endpoints to constitute a ring format named Tensor Ring (TR). The mathematical neural network layer format is
\({\mathcal{Y}}_{j_1,\ldots,j_M}= \sum_{i_1,\ldots,i_{N}=1}^{I_1,\ldots,I_{N}}\sum_{r_{0},\ldots,r_{N+M-1}=1}^{R_0,\ldots,R_{N+M-1}} g^{(1)}_{r_0,i_1,r_1} \cdots g^{(N+1)}_{r_{N},j_1, r_{N+1}} \cdots g^{(N+M)}_{r_{N+M-1},j_M, r_{0}} x_{i_1,i_2,\ldots,i_N}\).

where \(\{R_0,R_1,\ldots,R_{N+M-1}\}\) denote the TR-ranks and each node is a 3rd-order tensor; the last core connects back to \(r_0\), which closes the ring. Compared with Tensor Train, it is not necessary for TR to follow a strict order when multiplying its nodes.
Combination of Tensor Decomposition and Deep Networks¶
These tensor layer formats can be directly adopted by simple neural networks and RNNs to replace huge FC layers.
For convolutional layers, we can also generalize this idea and represent the convolution weights in tensor decomposition format. For example, the convolution weight is:
\(\mathcal{W}\in R^{H \times W \times C_{in} \times C_{out}}\)
the CP format is:
\(\mathcal{W}_{h,w,c_{in},c_{out}} = \sum_{r=1}^Rg_{r} a^{(1)}_{h,r}a^{(2)}_{w,r}a^{(3)}_{c_{in},r}a^{(4)}_{c_{out},r}\)
If we further factorize the two channel dimensions, the weight can be represented as:
\(\mathcal{W}_{h,w,c_{in(1)}\cdots c_{in(N)},c_{out(1)}\cdots c_{out(M)}} = \sum_{r=1}^Rg_{r} a^{(1)}_{h,r}a^{(2)}_{w,r} a^{(3)}_{c_{in(1)},r}\cdots a^{(2+n)}_{c_{in(n)},r}a^{(3+N)}_{c_{out(1)},r}\cdots a^{(2+N+M)}_{c_{out(M)},r}\)
We can also represent the weight in other tensor formats.
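As a sketch, the CP format of a convolution weight can be materialized with torch.einsum (the sizes and rank are illustrative):
import torch

H, W, C_in, C_out, R = 3, 3, 16, 32, 8
g = torch.randn(R)
a1, a2 = torch.randn(H, R), torch.randn(W, R)
a3, a4 = torch.randn(C_in, R), torch.randn(C_out, R)
# W_{h,w,c_in,c_out} = sum over r of g_r * a1_{h,r} * a2_{w,r} * a3_{c_in,r} * a4_{c_out,r}
weight = torch.einsum('r,hr,wr,ir,or->hwio', g, a1, a2, a3, a4)
print(weight.shape)  # torch.Size([3, 3, 16, 32])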