Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (4): 906-918. doi: 10.1007/s11390-022-2037-5

Special Topics: Computer Graphics and Multimedia; Computer Networks and Distributed Computing


Gaze-Assisted Viewport Control for 360° Video on Smartphone

Linfeng Shen (申林峰), Student Member, IEEE, Yuchi Chen (陈煜驰), Student Member, IEEE, and Jiangchuan Liu (刘江川), Fellow, IEEE        

  1. School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, Canada
  • Received: 2021-11-20 Revised: 2022-06-15 Accepted: 2022-07-12 Online: 2022-07-25 Published: 2022-07-25
  • Contact: Lin-Feng Shen, E-mail: linfeng_shen@sfu.ca
  • About author: Linfeng Shen is currently a Ph.D. student in the School of Computing Science at Simon Fraser University, Burnaby, Canada. He received his B.Eng. degree in information security from Beijing University of Posts and Telecommunications, Beijing, in 2019, and his M.Sc. degree in computing science from Simon Fraser University, Burnaby, in 2021. His research interests include edge computing and multimedia.

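360° video has become one of the major multimedia formats in recent years. Compared with traditional video, it offers viewers a more interactive, immersive experience. Most of today's implementations rely on bulky head-mounted displays (HMDs) or require touchscreen operations for interactive display, which is not only expensive but also inconvenient for viewers. In addition, a proportion of users (roughly 10-20%) experience nausea when using an HMD. To the best of our knowledge, there is no existing research on playing 360° video on mobile devices without an HMD or touchscreen operations.
In this paper, we show that viewport control for interactive 360° video streaming can be accomplished through gaze movement detected by the front camera of today's mobile devices (e.g., smartphones). To this end, we design a lightweight real-time gaze tracking method and use its output to update the viewport of the 360° video stream.
Our solution uses only the front camera. It detects the user's face with a lightweight Haar-like cascade classifier, measures the user's distance and viewing angle relative to the screen, and then derives the position of the user's gaze point from a custom trigonometric model. We integrate it with the streaming module and apply a dynamic margin adaptation algorithm to minimize the overall energy consumption of battery-constrained mobile devices.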
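The geometry described above can be illustrated with a minimal sketch. It is not the authors' model, whose details are not given here; it assumes a pinhole camera for the distance estimate and a gaze ray projected onto the screen plane for the gaze point. The function names, the 15 cm nominal face width, and the camera offset parameter are all hypothetical, and the face bounding box is assumed to come from some detector (for example, a Haar cascade).

```python
import math


def estimate_distance_cm(face_width_px, focal_length_px, real_face_width_cm=15.0):
    """Estimate the face-to-camera distance with the pinhole relation.

    Assumption: real_face_width_cm is a nominal value, not taken from the paper.
    """
    if face_width_px <= 0:
        raise ValueError("face_width_px must be positive")
    return focal_length_px * real_face_width_cm / face_width_px


def gaze_point_on_screen(distance_cm, yaw_deg, pitch_deg, camera_offset_cm=(0.0, 0.0)):
    """Project a gaze ray onto the screen plane.

    distance_cm: perpendicular distance from the face to the screen plane.
    yaw_deg, pitch_deg: horizontal and vertical gaze angles measured from
        the screen normal (hypothetical inputs from an upstream estimator).
    camera_offset_cm: shift from the camera position to the chosen screen
        origin (hypothetical parameter).
    Returns (x, y) in centimetres on the screen plane.
    """
    x = distance_cm * math.tan(math.radians(yaw_deg)) + camera_offset_cm[0]
    y = distance_cm * math.tan(math.radians(pitch_deg)) + camera_offset_cm[1]
    return x, y


if __name__ == "__main__":
    d = estimate_distance_cm(face_width_px=300, focal_length_px=600)
    print(d, gaze_point_on_screen(d, yaw_deg=10.0, pitch_deg=-5.0))
```

Under these assumptions, a face detected at 300 px width with a 600 px focal length is estimated to be 30 cm away, and a gaze angle of zero maps to the camera position plus the offset.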
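4. Results (Result & Findings)
We first evaluate the performance of the lightweight gaze tracking method. We divide the whole screen into 9 blocks and ask viewers to focus on the center of each block for 3 to 5 seconds. The results show that most gaze points fall within the correct region, though some fall outside it. Given the rapid development of today's smartphones, especially the emergence of embedded neural computing chips, we believe high accuracy will soon be achievable on smartphones. The latency of our method is reduced from about 200 ms to 132 ms, which is acceptable in most cases. We also evaluate the performance of the dynamic margin adaptation algorithm and its contribution to energy saving. To assess the feasibility of our algorithm, we define two new evaluation metrics, named "pixel precision rate" (PPR) and "pixel efficiency" (PER). To assess our method's contribution to energy efficiency, we measure the power consumption of our method and compare it with that of other methods. The results show that our method achieves the best balance between energy efficiency and display quality.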
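The 9-block accuracy test can be scored with a short sketch like the one below. It assumes a uniform 3×3 grid over the screen in pixel coordinates and reports the fraction of gaze samples that land in the target block. The function names and the row-major block numbering are illustrative assumptions, and the sketch does not implement PPR or PER, whose definitions are not given here.

```python
def block_index(x, y, width, height, rows=3, cols=3):
    """Return the row-major index of the grid block containing (x, y).

    Returns None when the point lies outside the screen.
    """
    if not (0 <= x < width and 0 <= y < height):
        return None
    col = int(x * cols / width)
    row = int(y * rows / height)
    return row * cols + col


def hit_rate(points, target_block, width, height):
    """Fraction of gaze samples that fall inside the target block."""
    if not points:
        return 0.0
    hits = sum(1 for x, y in points if block_index(x, y, width, height) == target_block)
    return hits / len(points)


if __name__ == "__main__":
    # Hypothetical samples on a 1080x1920 portrait screen while the viewer
    # looks at the center block (index 4).
    samples = [(540, 960), (500, 900), (100, 100), (2000, 50)]
    print(hit_rate(samples, target_block=4, width=1080, height=1920))  # 0.5
```

Samples that fall off screen count as misses here, which matches the observation that some points land outside the intended region.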
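In this paper, we design a new real-time gaze tracking method and apply it to interactive 360° video streaming. This approach frees users from the inconvenience of head-mounted displays and manual operations. To address the energy consumption challenge our system faces, we design a dynamic margin adaptation algorithm to improve its energy efficiency. Experiments on smartphones demonstrate the feasibility and efficiency of our system. Although its accuracy is still low and we have to simulate some modules on a computer, we believe it opens a new direction for future applications and devices. In future work, we will continue to improve gaze tracking accuracy using pupil center estimation algorithms and explore other new approaches, such as frame encoding, to further reduce energy consumption. We will also explore advanced machine learning techniques such as model compression and work toward implementing the complete system on a smartphone.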


Abstract: 360° video has become one of the major media formats in recent years, providing viewers with an immersive experience and more interaction than traditional video. Most of today's implementations rely on bulky head-mounted displays (HMDs) or require touchscreen operations for interactive display, which are not only expensive but also inconvenient for viewers. In this paper, we demonstrate that interactive 360° video streaming can be done with hints from gaze movement detected by the front camera of today's mobile devices (e.g., a smartphone). We design a lightweight real-time gaze point tracking method for this purpose. We integrate it with the streaming module and apply a dynamic margin adaptation algorithm to minimize the overall energy consumption of battery-constrained mobile devices. Our experiments on state-of-the-art smartphones show the feasibility of our solution and its energy efficiency toward cost-effective real-time 360° video streaming.

Key words: 360° video, gaze tracking, mobile device, energy efficiency, machine learning

