
IEEE P1918.1.1

Haptic Codecs for the Tactile Internet Task Group

Proposal for Tactile Codec

TUM Vibrotactile Perceptual Codec based on DWT and SPIHT (TUM-VPC-DS)

DCN: HC_NGS_19-1-r0_Proposal_for_Tactile_Codec

Date: 2019-3-29

Abstract

This document describes a proposal for a tactile codec for the IEEE P1918.1.1 standardization

activity in response to the respective Call for Contributions. The proposed codec uses a

perceptual approach with a DWT and subsequent quantization. The quantizer is designed to be

adaptive considering a psychohaptic model. After quantization we further compress using the

SPIHT algorithm and generate the bitstream. The whole process is modular, hence the encoder

can work with any psychohaptic model. This allows for future enhancements.

Subclause 5.2.1 of the IEEE-SA Standards Board Bylaws states, "While participating in IEEE

standards development activities, all participants...shall act in accordance with all applicable laws

(nation-based and international), the IEEE Code of Ethics, and with IEEE Standards policies and

procedures."

The contributor acknowledges and accepts that this contribution is subject to:

- The IEEE Standards copyright policy as stated in the IEEE-SA Standards Board Bylaws, section 7, http://standards.ieee.org/develop/policies/bylaws/sect6-7.html#7, and the IEEE-SA Standards Board Operations Manual, section 6.1, http://standards.ieee.org/develop/policies/opman/sect6.html

- The IEEE Standards patent policy as stated in the IEEE-SA Standards Board Bylaws, section 6, http://standards.ieee.org/develop/policies/bylaws/sect6-7.html#6, and the IEEE-SA Standards Board Operations Manual, section 6.3, http://standards.ieee.org/develop/policies/opman/sect6.html

1 Technical Description

The proposed compression scheme involves various operations in the encoder as depicted in

Figure 1.

Figure 1: Encoding structure of the proposed tactile codec.

The input signal is split into blocks. Each block is first decomposed by a Discrete Wavelet

Transform (DWT) using CDF 9/7 filters. At the same time, each block is transformed by a DFT.

The obtained spectrum is passed on to a psychohaptic model that computes masking and

perception thresholds for the corresponding block. Results from the psychohaptic model are used

to adapt the quantization and allocate bits to different DWT bands. After quantizing the wavelet

coefficients we further compress them with an adaptation of the SPIHT algorithm from [1]. The

compressed bitstream is then multiplexed with side information from the quantization and stored

or transmitted. In the following, we will explain all steps in more detail.

1.1 DWT

The DWT operates on the blocks by applying the CDF 9/7 filters. These filters are chosen as they

have a symmetric impulse response, which implies linear phase. Therefore, we achieve the same

number of wavelet coefficients in each block as we have input signal values. In addition, the CDF

9/7 filters are almost orthogonal, meaning we can calculate signal energy values in the wavelet

domain with acceptable accuracy.
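The non-expansive property can be illustrated with a single decomposition level. The following is a minimal sketch, not the implementation of this proposal: it assumes the well-known lifting factorization of the CDF 9/7 filters, an even block length, and mirrored samples at the block edges. The function names are hypothetical.

```python
# Lifting coefficients of the CDF 9/7 wavelet (standard published values).
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.149604398860241


def _predict(s, d, c):
    """d[i] += c * (s[i] + s[i+1]); s is mirrored at the right block edge."""
    n = len(d)
    return [d[i] + c * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]


def _update(s, d, c):
    """s[i] += c * (d[i-1] + d[i]); d is mirrored at the left block edge."""
    return [s[i] + c * (d[max(i - 1, 0)] + d[i]) for i in range(len(s))]


def cdf97_forward(x):
    """One DWT level: returns (approximation, detail), each of length len(x) / 2."""
    if len(x) % 2:
        raise ValueError("even block length expected")
    s, d = list(x[0::2]), list(x[1::2])
    d = _predict(s, d, ALPHA)
    s = _update(s, d, BETA)
    d = _predict(s, d, GAMMA)
    s = _update(s, d, DELTA)
    return [v * K for v in s], [v / K for v in d]


def cdf97_inverse(approx, detail):
    """Undo cdf97_forward by running the lifting steps backwards."""
    s = [v / K for v in approx]
    d = [v * K for v in detail]
    s = _update(s, d, -DELTA)
    d = _predict(s, d, -GAMMA)
    s = _update(s, d, -BETA)
    d = _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x
```

A block of 16 input values yields 8 approximation and 8 detail coefficients, and `cdf97_inverse` recovers the input up to floating-point rounding. A multi-level DWT would apply `cdf97_forward` repeatedly to the approximation part.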

1.2 Psychohaptic Model

The psychohaptic model plays a crucial role in the codec as it adapts the quantizer to introduce

distortion where it is least perceivable. We start by taking the FFT of an input block and representing

it in dB. We then determine the dominant peaks in the spectrum. It has been shown in [2] that for tonal

signals we observe masking phenomena. Thus, in a more complex signal we assume that masking

will occur as well, which means that dominant peaks will increase perception thresholds around

them. To model this we use quadratic spreading functions that constitute masking thresholds for

peaks at different frequencies. Then all masking thresholds are added together with the absolute

threshold of perception by power additive combination. This yields the so-called global masking

threshold. This process is illustrated in Figure 2.
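The power additive combination can be sketched as follows. This is an illustration under stated assumptions: the shape parameters of the quadratic spreading function (`drop_db`, `width`) are hypothetical placeholders, since the text does not give the values used in the proposal, and all function names are our own.

```python
import math


def power_add_db(levels_db):
    """Combine levels given in dB by adding their powers."""
    return 10.0 * math.log10(sum(10.0 ** (v / 10.0) for v in levels_db))


def quadratic_mask_db(f, peak_f, peak_db, drop_db=12.0, width=0.002):
    """Hypothetical quadratic spreading function around one spectral peak.

    drop_db and width are placeholder values, not taken from the proposal.
    """
    return peak_db - drop_db - width * (f - peak_f) ** 2


def global_threshold_db(freqs, peaks, absolute_db):
    """Global masking threshold per frequency.

    peaks is a list of (frequency, level in dB) pairs; absolute_db holds the
    absolute threshold of perception at each entry of freqs.
    """
    out = []
    for f, abs_db in zip(freqs, absolute_db):
        levels = [quadratic_mask_db(f, pf, pdb) for pf, pdb in peaks]
        out.append(power_add_db(levels + [abs_db]))
    return out
```

Because powers are added, the global threshold is never below any individual masking threshold or the absolute threshold, and two equal contributions raise the level by about 3 dB.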

After obtaining the global masking threshold, we compute the so-called Signal-to-Mask-Ratio

(SMR) for each DWT band. That is, we divide the energy of the spectrum in each band by

the energy of the previously obtained global masking threshold in that band. The SMR values are passed on to

the quantizer together with the values of the signal energy in each band.
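A minimal sketch of the per-band SMR computation is given below. It assumes that the DWT bands map to dyadic ranges of DFT bins (approximation band first, then detail bands from coarse to fine) and that the SMR is expressed in dB; both are our assumptions, and the function names are hypothetical.

```python
import math


def dyadic_band_edges(n_bins, levels):
    """Bin ranges [lo, hi) for the approximation band and each detail band.

    Assumes n_bins is divisible by 2**levels.
    """
    first = n_bins >> levels
    edges = [(0, first)]
    lo = first
    for _ in range(levels):
        edges.append((lo, 2 * lo))
        lo *= 2
    return edges


def smr_db(spectrum_db, threshold_db, edges):
    """Signal-to-Mask-Ratio in dB for every band."""
    smr = []
    for lo, hi in edges:
        signal = sum(10.0 ** (v / 10.0) for v in spectrum_db[lo:hi])
        mask = sum(10.0 ** (v / 10.0) for v in threshold_db[lo:hi])
        smr.append(10.0 * math.log10(signal / mask))
    return smr
```

For a 512-sample block with 256 positive-frequency bins and a level-7 DWT, `dyadic_band_edges(256, 7)` gives eight bands, from bins 0 to 1 up to bins 128 to 255.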

1.3 Quantization

The quantizer is the core component of our codec. It allocates a certain bit budget to the different

DWT bands according to the psychohaptic model to reduce the rate considerably without

introducing any perceivable distortion.

To accomplish this task the quantizer takes into account the values from the psychohaptic

model. In a loop, a total budget of $n$ bits is distributed over the bands. We start with 0 bits allocated to all

bands. In every iteration we calculate the SNR in dB using the signal energy values in each band

passed over by the psychohaptic model and the noise energy introduced by the quantization. We

then calculate the so-called Mask-to-Noise-Ratio as $\mathrm{MNR} = \mathrm{SNR} - \mathrm{SMR}$. Then we

allocate one bit to the band with the lowest MNR value and repeat until all $n$ bits are allocated.
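The allocation loop can be sketched as a greedy procedure. The sketch below rests on several assumptions of ours: the Mask-to-Noise-Ratio is taken as SNR minus SMR in dB (the usual definition in perceptual coding), the signal energy is computed from the wavelet coefficients themselves, the noise energy is measured by actually quantizing each band with a deadzone quantizer, and a band whose quantization noise is already zero receives no further bits. All names are hypothetical.

```python
import math


def deadzone_quantize(w, delta):
    """Truncate the magnitude of w to a multiple of delta and keep the sign."""
    return math.copysign(math.floor(abs(w) / delta) * delta, w)


def allocate_bits(bands, smr_db, n_bits, w_max):
    """Distribute n_bits over the bands, one bit per iteration.

    bands: list of coefficient lists, smr_db: SMR per band in dB,
    w_max: maximum coefficient magnitude that defines the quantizer range.
    """
    bits = [0] * len(bands)
    for _ in range(n_bits):
        mnr = []
        for band, smr, b in zip(bands, smr_db, bits):
            delta = w_max / 2 ** b
            signal = sum(w * w for w in band)
            noise = sum((w - deadzone_quantize(w, delta)) ** 2 for w in band)
            if noise == 0.0:
                mnr.append(math.inf)  # assumption: nothing left to improve
            else:
                snr = 10.0 * math.log10(signal / noise)
                mnr.append(snr - smr)
        # The band with the lowest MNR has the most audible noise margin deficit.
        bits[mnr.index(min(mnr))] += 1
    return bits
```

With two bands of which only the first has a high SMR, the first three bits all go to that band, because its MNR stays below that of the other band until its quantization noise has dropped far enough.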

Since in general the bands will have a different number of quantizer bits, we design the quantizer

itself as an embedded deadzone quantizer adapted from [2]. We first calculate the maximum

wavelet coefficient for the current block, $w_{\max}$. This value is quantized to a fixed point number

with 3 integer bits and 4 fraction bits by a ceiling operation to receive $\hat{w}_{\max}$. The 7 bits

representing this maximum value are passed on to the bitstream encoding as side information. The

quantizer then takes the bits allocated to each band and this maximum value to determine the

quantization interval as

$$\Delta = \frac{\hat{w}_{\max}}{2^{b}},$$

where $b$ is the number of bits allocated to a particular band. The wavelet coefficients $w$ are then

quantized according to

$$\hat{w} = \operatorname{sgn}(w) \left\lfloor \frac{|w|}{\Delta} \right\rfloor \Delta.$$

Figure 2: Magnitude spectrum of an exemplary block (blue), computed masking thresholds (red), absolute threshold

of perception (green) and the resulting global masking threshold (black).

Thus, the wavelet coefficients are quantized to the original range. This formula also implies the

addition of one sign bit. After all bits have been allocated and therefore all wavelet coefficients

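have been quantized, we scale all the quantized wavelet coefficients to integers by multiplying them with $2^{b_{\max}} / \hat{w}_{\max}$, where $\hat{w}_{\max}$ is the quantized maximum coefficient and $b_{\max}$ is the largest number of bits allocated to any band.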

These quantized integer wavelet coefficients are passed on to the SPIHT algorithm.
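The two quantization steps of this section can be sketched as follows, as we read the description: the block maximum is rounded up to a fixed point value with 3 integer and 4 fraction bits, and each band is then quantized with a step size equal to this value divided by two to the power of its allocated bits. Saturating the 7-bit code at its largest value is our assumption, and the function names are hypothetical.

```python
import math


def quantize_max(w_max, int_bits=3, frac_bits=4):
    """Round w_max up to a fixed point number; returns (code, value)."""
    scale = 1 << frac_bits
    code = math.ceil(w_max * scale)
    # Assumption: clip to the largest representable 7-bit code.
    code = min(code, (1 << (int_bits + frac_bits)) - 1)
    return code, code / scale


def quantize_band(coeffs, bits, w_max_hat):
    """Embedded deadzone quantization of one band with the given bit count."""
    delta = w_max_hat / (1 << bits)
    return [math.copysign(math.floor(abs(w) / delta) * delta, w) for w in coeffs]
```

For example, a block maximum of 1.23 is stored as code 20, which represents 1.25. With 3 bits the step size is 1.25 / 8 = 0.15625, so a coefficient of 0.5 is quantized to 3 steps, that is 0.46875, and small coefficients such as 0.1 fall into the deadzone and become zero.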

1.4 SPIHT

In order to efficiently compress the quantized wavelet coefficients, we employ a 1D version of the Set

Partitioning in Hierarchical Trees (SPIHT) algorithm proposed in [1]. SPIHT is a zero-tree-based

coding method that outperforms Embedded Zero-tree Wavelet (EZW)

coding. It utilizes two types of zero trees and encodes the significant coefficients and zero trees by

successive significance and refinement passes. The details of the algorithm for coding 2D

wavelet coefficients are provided in [1] and exemplified in [3]. We adapt it to the quantized

1D wavelet coefficients by constructing the parent-child relationship in only one dimension. The

output of the SPIHT module is a bitstream that losslessly encodes the quantized 1D wavelet

coefficients.
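One possible 1D parent-child relationship is sketched below. The text does not spell out the tree structure, so this is an assumption: coefficients are stored as [approximation | coarsest detail band | ... | finest detail band], every detail coefficient at index i has the children 2i and 2i + 1, and every approximation coefficient roots one tree in the coarsest detail band. The function names are hypothetical, and the sketch covers only the tree and the zero tree test, not the full SPIHT passes.

```python
def children(i, n_coeffs, n_approx):
    """Indices of the direct children of coefficient i."""
    if i < n_approx:
        # Assumption: each approximation coefficient roots exactly one tree.
        candidates = [i + n_approx]
    else:
        candidates = [2 * i, 2 * i + 1]
    return [c for c in candidates if c < n_coeffs]


def descendants(i, n_coeffs, n_approx):
    """All coefficients below i in the tree."""
    out, stack = [], children(i, n_coeffs, n_approx)
    while stack:
        c = stack.pop()
        out.append(c)
        stack.extend(children(c, n_coeffs, n_approx))
    return out


def is_zero_tree(coeffs, i, n_approx, threshold):
    """True if every descendant of i is insignificant for this threshold."""
    return all(abs(coeffs[j]) < threshold
               for j in descendants(i, len(coeffs), n_approx))
```

For a 512-sample block with a level-7 DWT there are 4 approximation coefficients, so `children(5, 512, 4)` lies in the next finer detail band and the four trees together cover all 512 coefficients.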

1.5 Bitstream Encoding

In order for the decoder to be able to decompress the signals correctly, we need to pass some side

information in the bitstream. We therefore add a header in front of every compressed block. This

header consists of 32 bits for a 512-sample block and codes the following information:

- 14 bits: Length of the following bitstream segment belonging to one block

- 2 bits: Coding of block length chosen from 64, 128, 256, and 512.

- 6 bits: Integer number coding the maximum number of bits allocated to the DWT bands

- 3 bits: Integer coding the level of the DWT

- 7 bits: Fixed point number with 3 integer and 4 fraction bits coding the maximum wavelet

coefficient value of the current block.

For smaller block lengths the length of the header can be reduced accordingly.
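The header layout listed above can be sketched as a 32-bit word. The field widths are taken from the list; the field order (as listed, most significant bits first), the field names, and the mapping of the 2-bit code to the block lengths are our assumptions.

```python
# (name, width in bits) in the order of the list above; 14 + 2 + 6 + 3 + 7 = 32.
FIELDS = (
    ("length", 14),      # length of the following bitstream segment
    ("block_code", 2),   # assumption: 0, 1, 2, 3 stand for 64, 128, 256, 512
    ("max_bits", 6),     # maximum number of bits allocated to the DWT bands
    ("dwt_level", 3),    # level of the DWT
    ("w_max_code", 7),   # fixed point maximum coefficient (3 + 4 bits)
)


def pack_header(**values):
    """Pack the header fields into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack_header(word):
    """Inverse of pack_header; returns a dict of field values."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

The 14-bit length field can describe segments of up to 16383 units, and a DWT level of 7 as used in the evaluation is the largest value that fits into the 3-bit level field.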

1.6 Decoding

The decoder consists of four simple operations. First, the blocks are separated out of the

bitstream, followed by the inverse SPIHT algorithm. Then we dequantize and apply an inverse DWT

to obtain the reconstructed signal suitable for playback.

2 Performance Evaluation

We aim to show the performance of our compression scheme by examining its rate-distortion

behavior. We use the provided test data set consisting of 280 vibrotactile signals recorded with an

accelerometer. The test dataset contains signals of various materials for different exploration

speeds. We compress the signals using a block length of 512 samples and a DWT of level 7.

All signals are encoded, decoded and the resulting output is then compared to the original. We

vary the bit budgets of the quantizer between 8 and 112 bits to achieve different rates and therefore

quality levels. We define the compression ratio (CR) as the ratio between the original rate and the

compressed rate. Then, we compute MSE, SNR and PSNR for all 280 test signals for different CR values.

The respective scatter plots for all three metrics with averages are given in the following plots. In

blue are the scatter plots for all test signals at different rates and in red the average over all test

signals. It is clearly visible that the quality decreases with increasing compression. At a CR of 10

we have an SNR of about 10 dB and a PSNR of about 52 dB.
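The evaluation metrics can be computed along the following lines. This is a sketch under assumptions: the text does not define the peak value used for the PSNR, so the largest magnitude of the original signal is used here, and the function names are hypothetical.

```python
import math


def compression_ratio(original_bits, compressed_bits):
    """Ratio between the original rate and the compressed rate."""
    return original_bits / compressed_bits


def mse(x, y):
    """Mean squared error between original x and reconstruction y."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def snr_db(x, y):
    """Signal power over error power, in dB."""
    signal = sum(a * a for a in x) / len(x)
    return 10.0 * math.log10(signal / mse(x, y))


def psnr_db(x, y):
    """Peak power over error power, in dB."""
    peak = max(abs(a) for a in x)  # assumption: peak of the original signal
    return 10.0 * math.log10(peak * peak / mse(x, y))
```

With this choice of peak value, PSNR and SNR of one signal differ by a constant offset that depends only on the ratio of the peak power to the mean power of the original signal.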

Additionally, the results for different bit budgets are given in the following table.

Here we also computed the required runtime per block of our algorithm in MATLAB. Especially

for low rates, this time is low enough to allow for a real-time scenario. In this case we would

have to choose a significantly lower block length, since 512 samples already account for a

delay of about 180 ms. A block length of 64 samples would deliver 23 ms of delay at the cost of a

slightly worse compression performance.
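The relation between block length and buffering delay can be checked with a short calculation. The sampling rate is not stated in the text; the sketch below infers it from the statement that 512 samples correspond to about 180 ms, which is an assumption on our side.

```python
def block_delay_ms(block_length, sample_rate_hz):
    """Time needed to collect one block of samples, in milliseconds."""
    return 1000.0 * block_length / sample_rate_hz


# Assumption: sampling rate inferred from "512 samples are about 180 ms".
FS_HZ = 512 / 0.180

for n in (64, 128, 256, 512):
    print(n, "samples:", round(block_delay_ms(n, FS_HZ), 1), "ms")
```

Under this assumption the delay scales linearly with the block length, so a 64-sample block takes one eighth of the 180 ms, which agrees with the roughly 23 ms quoted above.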

To assess the behavior of our algorithm in more detail, we examine individual signals in terms of

their PSNR over CR performance. The resulting plots are given in the following figures.

Bit budget n   CR      MSE            SNR (dB)   PSNR (dB)   Runtime per block (ms)
8              54.65   1.51 × 10FG    2.56       45.12       4.3
10             41.62   1.38 × 10FG    3.20       45.75       4.2
12             32.58   1.23 × 10FG    3.81       46.36       4.7
14             26.74   1.10 × 10FG    4.44       47.00       5.4
16             22.24   9.61 × 10FH    5.02       47.58       5.9
20             15.90   6.68 × 10FH    6.24       48.80       6.9
24             11.53   4.19 × 10FH    7.78       50.34       8.5
28             8.73    2.46 × 10FH    9.56       52.11       9.7
32             6.90    1.31 × 10FH    11.50      54.06       11.2
40             4.98    4.00 × 10FI    15.12      57.67       12.9
48             3.69    1.17 × 10FI    19.22      61.78       15.0
56             2.77    3.26 × 10FJ    24.77      67.33       17.1
64             2.29    8.32 × 10FK    30.65      73.20       18.7
80             1.78    5.93 × 10FO    42.55      85.11       20.6
96             1.47    9.28 × 10FP    54.41      96.97       23.4
112            1.26    6.21 × 10FP    66.38      108.94      26.0
128            1.10    6.03 × 10FP    78.26      120.81      28.4

- 'Direct_-_1spike_Probe_-_cork_-_slower.mat' (Signal #20)
- 'Direct_-_3x1spike_Probe_-_antiVibPad_-_fast.mat' (Signal #84)
- 'Direct_-_3x1spike_Probe_-_polyesterPad_-_slow.mat' (Signal #107)
- 'Direct_-_3x3small-round_Probe_-_felt_-_fast.mat' (Signal #175)
- 'Direct_-_big-round_Probe_-_foam_-_fast.mat' (Signal #255)
- 'Direct_-_big-round_Probe_-_foam_-_medium.mat' (Signal #256)
- 'Direct_-_big-round_Probe_-_foam_-_tooSlow.mat' (Signal #258)
- 'Direct_-_finger_Probe_-_foam_-_slower.mat' (Signal #274)

We see that the quality decreases for all signals as the compression ratio increases.

Lastly, we aim to exemplify the behavior of our method with respect to the signal shape. This will help

to gain some further intuition into how perceivable the introduced distortions are. We take the first

signal from the 8 examples before ('Direct_-_1spike_Probe_-_cork_-_slower.mat') and plot the

first 200 samples together with reconstructed signals for $n = 8, 16, 32, 64$. The results are given

in Figure 3. We can see that the general structure of the signal is preserved even for very high

levels of compression ($n = 8$ is equivalent here to CR ≈ 62). At $n = 64$ the two signals are so

close that we assume that no distortions should be perceivable. To assess the codec correctly in

terms of its transparency, we need to conduct extensive experiments and develop new metrics

based on human haptic perception.

3 Conclusion

We have presented a novel method to compress and encode 1D tactile signals. On the test data set, the

codec reaches an SNR of about 10 dB and a PSNR of about 52 dB at a compression ratio of 10, and the algorithm allows for offline and online encoding with the appropriate

choice of block length. The transparency of the codec should be evaluated in terms of subjective

experiments and newly developed perceptual metrics.

The presented codec works with any choice of perceptual model, which readily allows for future

enhancements as better psychohaptic models are being developed. In addition, it can fairly easily

be extended to higher dimensional signals to allow for more points of interaction.

Figure 3: First 200 samples of the signal 'Direct_-_1spike_Probe_-_cork_-_slower.mat' for various levels of

compression determined by bit budget $n$.

4 References

[1] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning

in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, vol.

6, no. 3, pp. 243-250, June 1996.

[2] R. Chaudhari, C. Schuwerk, M. Danaei and E. Steinbach, "Perceptual and Bitrate-Scalable

Coding of Haptic Surface Texture Signals," IEEE Journal of Selected Topics in Signal

Processing, vol. 9, no. 3, pp. 462-473, November 2014.

[3] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals,

Standards, and Practice, Kluwer Academic, 2002.

Annex A: Information form for the submission of contributions

Name of Contribution: TUM Vibrotactile Perceptual Codec based on DWT and SPIHT

(TUM-VPC-DS)

Authors and Affiliation: Andreas Noll, Basak Gülecyüz, Eckehard Steinbach; Chair of

Media Technology, Technical University of Munich

Addressed Requirements and Test Conditions (see Section 4.2.1): Test condition 1: test data

traces

Summary of Proposal: The proposed codec uses a perceptual approach with a DWT and

subsequent quantization. The quantizer is designed to be adaptive considering a psychohaptic

model. After quantization we further compress using the SPIHT algorithm and generate the

bitstream. The whole process is modular, hence the encoder can work with any psychohaptic

model. This allows for future enhancements.

Comments on Relevance to CfC: Fully in line with the CfC

