Advances in telecommunication systems and innovative network solutions indicate that transmission of immersive media contents will become more widespread in the future, as bandwidth capacity continues to grow. However, since immersive acquisition involves volumes of data several orders of magnitude larger than traditional image and video contents, new algorithmic solutions are needed to efficiently reduce and transmit the data while maintaining the desired perceptual quality.
I am interested in designing and evaluating new compression solutions for immersive media that are suitable for real-time, peer-to-peer transmission systems while providing the best possible quality of experience (QoE) for the users.
In particular, considering the use case of real-time communication in XR, core system aspects such as low latency, low complexity at the encoder and decoder side, and limited power consumption should be combined with faithful self-representation in the spatial and temporal domain, saliency-driven partitioning and compression, and user-adaptive encoding and transmission.
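As a concrete illustration of saliency-driven compression, the sketch below splits a bitrate budget across content regions in proportion to their saliency scores. The function name, the floor parameter, and the example scores are all hypothetical choices for illustration, not an established scheme.

```python
def allocate_bitrate(saliency, total_bitrate, floor=0.1):
    """Split a bitrate budget across regions in proportion to saliency.

    `floor` reserves a fraction of the budget to be shared equally, so
    low-saliency regions are never starved entirely (illustrative choice).
    """
    n = len(saliency)
    base = floor * total_bitrate / n        # guaranteed share per region
    remaining = total_bitrate - base * n    # rest is split by saliency
    total_sal = sum(saliency)
    return [base + remaining * s / total_sal for s in saliency]

# Hypothetical scene: a highly salient face, a torso, and the background.
rates = allocate_bitrate([0.7, 0.2, 0.1], total_bitrate=10_000)  # kbps
```

In a real encoder, the saliency scores would come from a visual attention model and the allocation would feed per-region quantization parameters; here the point is only the budget-preserving, saliency-proportional split.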
Compression solutions for immersive media
User behavior in immersive media content consumption
Immersive media systems have brought about a paradigm shift in how contents are consumed: while traditional video is passively consumed as-is, with immersive content, users can actively decide where to look and where to direct their attention. Thus, being able to model user behavior when navigating XR scenes with 6 Degrees of Freedom (6DoF) can lead to significant reductions in network and computational resource consumption, while limiting the impact on the perceived quality.
However, modeling user behavior in 6DoF presents several challenges, as current solutions do not easily adapt to immersive scenarios where several factors can contribute to the final content that is displayed for the user. For example, if two users are looking at the same content from two different distances, they will visualize different levels of detail, which in turn significantly changes their subsequent behavior.
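The distance example above can be made concrete with a minimal level-of-detail selection rule: each doubling of the viewing distance moves the viewer one level coarser. The thresholds and the reference distance are illustrative assumptions, not values from any specific system.

```python
import math

def select_lod(distance_m, levels=4, ref_distance_m=1.0):
    """Pick a level of detail (0 = finest) from the viewing distance.

    Each doubling of the distance beyond `ref_distance_m` moves one
    level coarser; thresholds here are purely illustrative.
    """
    lod = max(0, math.ceil(math.log2(distance_m / ref_distance_m)))
    return min(lod, levels - 1)

select_lod(0.5)    # close inspection -> finest level (0)
select_lod(8.0)    # distant viewer  -> coarsest level
```

Because the displayed detail depends on where each user stands, a behavior model that predicts viewing distance also implicitly predicts which representation the user will see and react to.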
I am interested in creating models for user behavior in 6DoF that can take into account the visual saliency of the immersive content on display, as well as the similarities between users' predispositions, to effectively explain how users move and interact with immersive media.
Measuring the QoE in immersive media
The QoE of multimedia contents is defined by many factors, including perceptual audiovisual quality, user expectations towards the system, and novelty effects, making it hard to model in a comprehensive way. The introduction of more degrees of freedom, which is to be expected when experiencing immersive media, further complicates the matter.
Measuring the QoE in immersive scenarios allows us to understand which factors play a role in the human perception of multidimensional contents. The goal is to build models that can factor in the intrinsic distortions of the content under examination, placed in the larger context of its position and function in the virtual space, its importance in the task at hand, and how it interacts with the final user's previous expectations, beliefs, and experiences.
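Direct QoE measurements are typically gathered in user studies by averaging per-stimulus ratings into a Mean Opinion Score with a confidence interval. The sketch below shows that standard computation; the ratings are hypothetical example data, not results from an actual study.

```python
import statistics

def mos_with_ci(ratings, z=1.96):
    """Mean Opinion Score and 95% confidence interval for one stimulus,
    as commonly reported in subjective quality assessment."""
    n = len(ratings)
    mos = statistics.mean(ratings)
    ci = z * statistics.stdev(ratings) / n ** 0.5
    return mos, ci

# Illustrative ratings on a 1-5 Absolute Category Rating (ACR) scale.
mos, ci = mos_with_ci([4, 5, 3, 4, 4, 5, 3, 4])
```

Reporting the confidence interval alongside the MOS matters in immersive studies, where the extra degrees of freedom tend to increase inter-user variability.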
Predicting the QoE in immersive media
Direct measurements of QoE through user studies are commonly considered ground-truth information regarding the perceptual merit of distorted contents. However, they are cumbersome and expensive to execute. Thus, considerable effort has been devoted in the literature to creating algorithmic solutions that can mimic and predict users' perception.
The goal is to build QoE models that holistically consider the intrinsic distortions of the content, as well as the rendering parameters, lighting effects, and position in the virtual space. Such models will help optimize acquisition, transmission, and rendering in an end-to-end system, considering all its components and fine-tuning each part to achieve the best possible results.
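In its simplest form, such a predictive model is a regression from objective features to subjective scores. The sketch below fits a one-feature linear model mapping log-bitrate to MOS with hand-rolled least squares; the training points are invented for illustration, and a practical model would use many features and a richer learner.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical training data: log-bitrate of encoded stimuli vs. the
# MOS collected for them (illustrative values, not real measurements).
log_bitrate = [1.0, 2.0, 3.0, 4.0, 5.0]
mos_scores  = [1.8, 2.6, 3.5, 4.1, 4.7]

slope, intercept = fit_line(log_bitrate, mos_scores)

def predict_mos(x):
    """Predicted score for an unseen operating point."""
    return slope * x + intercept
```

Once validated against held-out subjective data, such a quality model can sit inside the rate-allocation loop of the end-to-end system, replacing costly user studies during optimization.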