NAB proceedings – The Future of TV – Project Fresco

Editorial

Unsurpassed technical quality

Without a shadow of a doubt, 2013 is proving to be a landmark year for Revista da SET. After all the major changes at the start of the year, with the new layout, frequency and editorial project, in this issue you receive yet another novelty: the periodic publication of the NAB Proceedings.
Revista da SET has always had a very special character: bringing together in a single publication the best market reports, case studies and product launches without losing its long-standing technical and academic relevance. It is with this in mind that, for each new edition, we carefully select the best technical articles from professionals in the broadcasting and audiovisual production sector.
With this mission in mind, of always bringing the best technical content to our readers, SET has established a partnership with the NAB (National Association of Broadcasters), the organization that represents broadcasters in the United States, for the publication of papers from its annual convention. These texts are compiled annually by the organization in a collection called the NAB Broadcast Engineering Conference Proceedings, which will now also appear, from time to time, in Revista da SET in the original English.
To inaugurate the partnership, we begin with a genuine debate on the future of television. This article, written by Cisco professionals, covers the trends in content delivery over the coming decades, in a world where the barrier between television and the internet grows ever thinner, and bandwidth becomes as important an issue as spectrum.

Enjoy the read,
Olímpio José Franco
President of SET


The Future of TV – Project Fresco

Simon Parnall, Kevin Murray and James Walker
Cisco
Staines, England


ABSTRACT
Within the next five years, advances in display technology will make science fiction reality, with screens that are unobtrusive, frameless, ultra-high definition and ambient. No longer need there be a ‘black hole’ in the corner of the living room; screens will instead blend seamlessly into the home environment. Organic LED technology needs no edges at all, and it will be possible to create tiled displays of almost any shape using low-cost standard parts. The concept of ‘immersion’, readily expressed in a simple control, really does give viewers the opportunity to enjoy programs according to their wishes at that moment, with content whose size, position and level of interactivity adaptively match the current needs of each audience. Key to the concept is a simple architecture which reacts to user input and metadata within a multiplicity of content items and streams, and display-independent metadata to support adaptive and dynamic content presentation across a wide range of domestic display environments.

INTRODUCTION
The choice of type and size of television screen for the home is so often a compromise between the extremes of an exciting viewing experience when the device is switched on and the wall or corner space occupied by a dark and dull object when the device is switched off. And, when the screen is on, the size of the picture may well be inappropriate for the type of content and engagement of the occupants of the room.
Science Fiction overcomes such concerns by assuming an invisible and scalable screen – often taking the place of the wall itself, or a window or indeed in mid-air. Science Fiction has also assumed an intelligent management of presented material, following the individual and assimilating and prioritizing a range of sources.
Today’s mobile phones make the Star Trek communicator look somewhat bulky, as advances over the years have successively removed the novelty of such a concept. In the same way, today’s screen, projection and graphics technologies are slowly and steadily bringing us closer to realizing the vision of Science Fiction. In fact, we are now very nearly at the point where key aspects of this vision can be realized and could be adopted by consumers in the not-so-distant future.
Walk into a consumer electronics exhibition today and you find many example components of this vision. There are thin-bezel screens that can be treated as tiles to create larger and larger displays, or glass screens that transparently reveal the wall behind when off. We already have sophisticated companion devices offering touch control, and each year we are seeing ever more sophisticated gesture and voice recognition.
Our role in this opportunity space will be to create the technologies that integrate such components to produce a sophisticated and intuitive user experience that matches content and mood, and which produces pictures of an appropriate size and position for each circumstance. Furthermore the presented audiovisual content will be supplemented with additional content and so-called domotic feeds (that is material concerning the home).
In this paper, believing in the inevitability of this trend in display technologies and the opportunities it creates, we set out our vision for how the television experience will evolve, present some lessons learned from the first and second phase prototype implementations that we have constructed, and discuss our approach to authoring content experiences for such a system.

VISION
Our vision of the future is of a viewing environment with large displays constructed from modular tiles: displays that are a) unobtrusive, b) frameless, c) ultra-high definition and d) ambient. They can be adapted to fill or partially fill one or more walls of a room, and will co-operate to provide an integrated experience. The opportunity is to open up possibilities way beyond the limits of today’s devices through:
• content comprising multiple visual elements that can be adapted spatially and temporally, freeing the user from choosing a single element, or the system from having to impose overlays;
• shared, co-operative usage of the displays, with connected companion devices becoming personal extensions;
• supporting connected applications and services operating in a more streamlined, integrated manner, reflecting and effecting changes in viewer engagement in TV content;
• dynamic adaptation to, and control over, the environment of the displays, adapting to the wallpaper and lighting; and
• introducing domotic content into the TV display in a sympathetic manner.

Project Fresco is an expression of this vision. Within Project Fresco we have developed and demonstrated a first phase ‘single display’ prototype at both IBC 2011 and CES 2012.

This single display was constructed from six tiles, driven from one client. Subsequently we developed a second phase ‘dual display’ prototype, demonstrated at both IBC 2012 and CES 2013. A photograph of this is shown in figure 1, showing both large displays, each constructed from multiple tiled screens, and two users, each with a companion device that may be used simultaneously to control and interact with the system.

Figure 1: Prototype ‘Dual Display’ System

IMMERSION
Many programs have a natural flow and pace – points at which the viewer or viewers are extremely immersed and engaged in the content. Examples of this may be a critical part of play in a sports game, a news story of direct relevance or a very dramatic scene in a soap. Likewise there may be times of lesser immersion or engagement. Examples of this may be waiting for players to take their positions, an uninteresting news item or a section of the soap that is recapping past happenings. In these areas of lesser immersion, the viewer’s interest may naturally be taken by other related items, such as the current scores in related games, the next news story or what is being said about the soap by their social contacts.

In Project Fresco we have introduced the concept of ‘immersion’. Immersion is key to the way that the displays are used and the way that the content is presented on them. Put simply, the more immersed in the content the viewer is, the greater the emphasis placed on the core video; the less immersed they are, the more emphasis is placed on related content, which may then be introduced. This related material could be social media, advertising, program graphics, additional material, or virtually anything.
Examples of low and high immersion are shown in figures 2 and 3 respectively, which are screen captures taken from our prototype. In figure 2, we see how the video roughly shares the display with other information, ‘call-to-action’ (that is, inviting the user to interact) and promotion graphics, and content sources. By comparison, figure 3 shows the high immersion example, where the program in figure 2 has moved on to a section of significance and dramatic tension: the related items have been removed, and the video increased in size and prominence.

Figure 2: A low immersion example

In Project Fresco, immersion is controlled in two ways – via “broadcast metadata” (as used for the examples above), which indicates the broadcaster’s expected level of immersion, and via a control on the companion device which allows the user to modify the immersion (both up and down) as they wish. Clearly other mechanisms could also be employed, such as audio or video analysis of the room and the viewers, but the prototype shows that these two simple mechanisms work very effectively.
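To make this combination concrete, a minimal sketch of how the two controls might be resolved into a single effective immersion level is shown below; the function name, value ranges and clamping behavior are our illustrative assumptions rather than the prototype’s published API.

```javascript
// Sketch: combining broadcast-signalled immersion with a user offset.
// All names and ranges here are hypothetical.

const IMMERSION_MIN = 0; // related content fully emphasized
const IMMERSION_MAX = 1; // core video only, at maximum prominence

function effectiveImmersion(broadcastLevel, userOffset) {
  // broadcastLevel: 0..1, from broadcast metadata on the content timeline
  // userOffset: -1..+1, from the companion-device slider
  return Math.min(IMMERSION_MAX,
                  Math.max(IMMERSION_MIN, broadcastLevel + userOffset));
}

// e.g. a dramatic scene signalled at 0.9, with the viewer nudging down:
effectiveImmersion(0.9, -0.3); // ≈ 0.6
```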

TECHNOLOGICAL MOTIVATORS
I. Displays
Display technology is continually improving, and the average screen size has been increasing relentlessly year by year, as evidenced by [3]. But there are two key technological changes which directly relate to our vision.
Firstly, screen bezel sizes are getting smaller. Our prototype system uses professional monitors with 5mm bezels, but LED backlit consumer displays are approaching similar, or better, bezel sizes and OLED offers the prospect of a bezel width of near zero. Even with today’s widths there is the real option of creating large ultra-high definition displays out of tiled arrays of inexpensive screens.
Secondly, whilst still in the research laboratories, transparent displays which naturally allow the underlying environment to show through are starting to emerge as niche products. These would trivially allow the blending of displays into the room environment.

Figure 3: A high immersion example

II. Video Content
We are also starting to see the first indications of the next jump in resolution beyond HD with the advent of Ultra High Definition – both in displays and in content. At the same time as this higher resolution content is arriving, the importance of lower resolution content is not diminishing, whether from archives, citizen journalists or challenging remote locations. Thus it is becoming hard to assume that any content will look acceptable on a display of any size.

III. Non video Content
Outside the display arena, we are seeing ever more related data sources, from social media through games to dedicated websites, feeds and web-service APIs. In the interconnected world, these are a crucial part of the entertainment experience, but today we are faced with the dilemma of either destroying the television experience by placing graphics over the video, or taking the viewer away from the lean back world of television into the very different and highly-interactive world of the internet.

BREAKING THE SCREEN BOUNDARIES
Today’s television makes the basic assumption that “the display is always filled”. Thus, video will fill the display, regardless of the size of display, quality of the video, or the resulting impact of an oversized face or object; and it also effectively does only one (main) thing at a time.
With larger, higher resolution displays this implicit behavior and more can be challenged. Content need no longer necessarily fill the display, and the display can simultaneously be used for many different components.
In turn, these new capabilities mean that the traditional means of laying out video and graphics can be challenged.
For instance we might:
• share the display between the content of more than one viewer, helping to make the TV a shared focal point rather than a point of contention;
• ‘unpack’ the constituent elements that are composited by a broadcaster in post-production, presenting these alongside the ‘clean’ audio-visual (AV) content, leaving it unobscured. Obvious examples include digital on-screen graphics such as tickers, banners and sports statistics. To enable this, the composited elements would need to be delivered separately alongside the clean AV and then rendered in the client;
• ‘unpack’ all of the contextual assets that are composited in the Set-Top Box (STB), such as interactive applications and multi-screen content (e.g. multi-camera sports events);
• present contextually relevant online content alongside the video, for example, relevant web content, social comments (such as twitter hash-tags for the show), relevant online video etc;
• enable navigation and discovery user interfaces to be presented alongside video, going beyond today’s ‘picture-in-guide’ presentation;
• present personal content which, whilst not directly related to the main television content, end users may still wish to see on screen. Examples would include personal social feeds, news feeds, images, discussion forums etc;
• present domotic content, such as user interfaces for in-home devices and systems, which can include video feeds from devices such as security cameras, door entry systems and baby monitors; and
• integrate visual communications, such as personal video calls, noting these may sometimes be used in a contextual way e.g. virtual shared viewing experiences between homes.

Thus, the TV experience takes advantage of the large display by continuously managing a wide range of content sources and types that are combined appropriately for presentation.

I. Real Object Size
The tradition of a television picture scaling up to fill the display means that an object is effectively displayed at an unknown size. With this assumption broken, it now seems realistic to allow an object to be displayed at its real size, regardless of the display (as displays report their size through the standard connectors). For instance, in advertising it could be interesting to show just how thin the latest phone really is, just as is possible in print media today.
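As an illustration of the calculation involved, the sketch below derives the pixel count needed to render an object at its physical size from a display’s reported dimensions; the display values and function are hypothetical, in the spirit of our HTML5-based client.

```javascript
// Sketch: rendering an object at its true physical size.
// The physical width would be reported by the display (e.g. via EDID
// over the connector); the values here are hypothetical.

const display = {
  widthPx: 3840, // horizontal resolution of the tiled display
  widthMm: 1660  // physical width reported by the display
};

function pixelsForRealSize(objectWidthMm) {
  const pxPerMm = display.widthPx / display.widthMm;
  return Math.round(objectWidthMm * pxPerMm);
}

// A phone advertised as 7.6 mm thin, shown edge-on at its real thickness:
pixelsForRealSize(7.6); // ≈ 18 px on this particular display
```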

II. Content Opportunities
In the same way that composition has always assumed a need to fill the rectangle, so has the creation of video content – which has followed the model of filling the proscenium arch of classical theatre. The proposed system can offer new opportunities to the content creator.

Figure 4: Non-Rectangular Content

One simple example of this is shown in figure 4. Here, the movie trailer is blended into the background to give the appearance that it tears its way through the wallpaper, dramatically conveying the unsettling nature of the promoted movie.
There are numerous other areas where this technique opens up new opportunities. For example:
• editing could become more subtle, with gentle fades, and several scenes could co-exist for longer and with less interference;
• content need no longer be fixed to a given size – if portrait content is provided by citizen journalists, then it can be displayed naturally in that form; and
• multiple synchronized videos could be used, in a fashion made popular in TV series such as 24, but without any requirement for their relative placement.

Implicit in this capability is the requirement to support ‘alpha plane’ style functionality that can be used both to describe arbitrary shapes and to allow for blending of the content into the background. This is, of course, not new: techniques such as luma and chroma keying are well known in the professional head-end marketplace, as well as supporting functionality in DVD and Blu-ray media. However, bringing this functionality into a traditional broadcast chain would represent a new usage.
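In an HTML5 client such as our prototype, one plausible way to approximate this alpha-plane behavior is canvas compositing. The sketch below, with a hypothetical mask asset, clips a playing video to an arbitrary shape each frame:

```javascript
// Sketch: applying an 'alpha plane' to video using canvas compositing.
// Assumes <video id="trailer"> and <canvas id="blend"> elements; the
// mask is a hypothetical image whose opaque pixels define the visible
// (e.g. torn-wallpaper) shape.

const video = document.getElementById('trailer');
const canvas = document.getElementById('blend');
const ctx = canvas.getContext('2d');

const mask = new Image();
mask.src = 'torn-wallpaper-mask.png'; // hypothetical asset

function drawFrame() {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  // Keep only the video pixels where the mask is opaque:
  ctx.globalCompositeOperation = 'destination-in';
  ctx.drawImage(mask, 0, 0, canvas.width, canvas.height);
  ctx.globalCompositeOperation = 'source-over';
  requestAnimationFrame(drawFrame);
}

video.addEventListener('play', () => requestAnimationFrame(drawFrame));
```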

A COMPANIONABLE EXPERIENCE
The growing importance of companion devices (tablets, phones, laptops etc.) to the modern TV experience cannot be overstated. Such devices permit us to construct an experience which is, at the same time, both collective (involving everyone in the room) and personal (allowing each person to interact with the various elements as they wish).
The companion device is key and integral to Project Fresco – and interactions with the companion device are directly connected with what is seen on the large display(s). This is achieved through several means:
• The companion devices are able, within constraints, to adapt the content on display, including adding or removing components or re-arranging the layout. An example of this interface is shown in the iPad screen capture of the web browser in figure 5, where, for instance, the display can be re-arranged by dragging the icons representing the parts of the content shown on the large displays.
• Interactions, such as voting or giving feedback, are performed on the companion device, but feed directly back into the large display presentation (in addition to the normal feedback one would expect).

Figure 5: A Companion Application Interface

• Control over the level of immersion. Although, as discussed earlier, a change in the level of immersion can be triggered through broadcast data and sensors in the room, the companion device is fundamentally able to control the final immersion experienced. In the prototype, as shown in figure 5, this is managed through a slider control.

This approach results in interactions with the companion device that end back at the main display(s), rather than just at the companion device itself. For example, scores from a game played by the whole family during a show could be displayed on the large display.
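A hypothetical sketch of this routing is shown below, with the companion device sending its interactions to the layout engine over a WebSocket; the endpoint and message shapes are our assumptions rather than the prototype’s actual protocol.

```javascript
// Sketch: a companion device routing interactions back to the layout
// engine over a WebSocket (endpoint and message formats hypothetical).

const session = new WebSocket('wss://layout-engine.local/session');

// Dragging a content icon on the companion re-arranges the large display:
function moveElement(elementId, targetSpan) {
  session.send(JSON.stringify({ type: 'layout.move', elementId, targetSpan }));
}

// A vote cast on the companion feeds back into the shared presentation:
function castVote(pollId, choice) {
  session.send(JSON.stringify({ type: 'interaction.vote', pollId, choice }));
}

// The immersion slider adjusts the experience seen by everyone:
function setImmersionOffset(offset) {
  session.send(JSON.stringify({ type: 'immersion.offset', value: offset }));
}
```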

A FRESCO SYSTEM ARCHITECTURE
The first-phase Project Fresco prototype drove a single tiled display. This was built using a single, six-output computer (an AMD Eyefinity graphics card in a powerful PC) with software built on standard HTML5 technologies (e.g. JavaScript and CSS transitions), with the functionality largely contained within a standard browser. This approach enabled fast and flexible development and exploration of the principles. Whilst the HTML5 toolset proved to be an excellent platform, the use of a single six-output graphics card places fundamental limits on scalability, on the number of displays that can be supported and, of course, on cost.
We have subsequently developed the architecture to support multiple large-display clients, and implemented a second-phase prototype which has two large-display clients in a single room. In doing so we have been exploring how these can be combined for the presentation of a single entertainment experience: for example, in addition to displaying further content elements, supporting a ‘watch party’ where the viewers’ couch can be ‘virtually’ extended onto the second display to give an ambient shared viewing experience with remote friends or family. We have also explored how the two displays can co-operate to support multiple simultaneous entertainment experiences (e.g. the big game and the soap).
To achieve the required flexibility in the number of displays, scalability, cost and content presentation dynamism, the architecture developments have been based around several concepts, including:
• rendering the graphics and video on more than one independent client device;
• utilizing synchronization between the rendering client devices, such as used in SAGE [1], but tailored for the specific use cases we are tackling;
• a separation of layout policy issues from rendering issues; and
• a single layout with a ‘world view’ of the entire set of displays in use.

A high-level overview of the current architecture is shown in figure 6. This shows two separate large displays, each driven by its own client, although the approach extends readily to many displays and clients. These clients interact with the layout engine and synchronization server(s) to ensure a consistent experience across the displays. In addition, the diagram shows that the audio is driven from only one display client, a deliberate choice to simplify the architecture. We anticipate that a future deployment of this architecture could put the layout engine component in the cloud, with synchronized rendering clients integrated within display tiles, as an evolution of today’s connected TVs, being both scalable and affordable.

Figure 6: A new Fresco architecture

I. Synchronization Architecture
It is important to be able to synchronize content spread between different clients. In a more traditional broadcast architecture, this would theoretically be possible using mechanisms such as the PCR values contained within a transport stream, but our approach assumes neither a direct transport stream feed to each client, nor even that the content is made available in transport streams (e.g. it could be streamed over HTTP using any one of a number of mechanisms such as HLS or Smooth Streaming).
Instead, we have chosen to synchronize to a master audio playback clock on the main audio output. Where broadcast content is being consumed, there are many techniques that can be used to match this clock to that of the live broadcast content. This master audio clock is then replicated and synchronized via the synchronization server to other clients that are involved in playing back synchronized media.
Our implementation appears to provide reliable synchronization between different clients, to a level that is acceptable for lip synchronization.
Further details of our synchronization model are given in [5].
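One standard technique for such slaving, sketched below for an HTML5 client, is to nudge a video element’s playback rate toward the replicated master clock; the `masterClock` interface is hypothetical, standing in for the position distributed by the synchronization server.

```javascript
// Sketch: slaving a client's video playback to the replicated master
// audio clock. masterClock.now() (hypothetical) returns the master
// playback position in seconds.

function trackMasterClock(videoEl, masterClock) {
  setInterval(() => {
    const error = masterClock.now() - videoEl.currentTime;
    if (Math.abs(error) > 0.5) {
      videoEl.currentTime = masterClock.now(); // hard resync on large drift
    } else {
      // A gentle, imperceptible rate nudge; converges within seconds.
      videoEl.playbackRate =
        1 + Math.max(-0.02, Math.min(0.02, error * 0.1));
    }
  }, 200);
}
```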

II. Audio Architecture
Normally, audio is decoded and presented with a simple level control. However, in our proposed system the audio architecture becomes more complex than in a traditional approach, with various audio processing operations becoming an essential part of the overall architecture.
The most obvious audio processing requirement is positioning. From the proposed layout of displays in figure 6, it is clear that the secondary display is not between the main speakers, and so any video that is presented on this display with synchronized audio needs to have this audio repositioned. This repositioning needs to be dynamic, for instance as a video is moved from the primary to the secondary display, the audio should be moved in synchronization. And, given the potential size of a display, repositioning of the audio is desirable even when the content is moved within a display. For example a video that occupies only the left third of the display should have its sound stage correctly placed.
Earlier we discussed the concept of immersion, and how the video element of the experience can be balanced against other components to reflect the levels of interest both across a program’s length and between viewers. This has a direct mapping to processing of the audio. Whilst volume levels are one key part of this, they are best combined with controlled compression – a reduction of the dynamic range of the content so that quieter parts become louder and louder parts become quieter. Such processing allows the volume to be reduced in a fashion that retains access to the quiet sections of the content.
Much of the required functionality described above appears to be relatively easy to implement with the proposed Web Audio API that has recently become available on various platforms [2]. This should make implementing the required audio architecture within an HTML5 environment relatively straightforward.
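As an example, the sketch below builds a Web Audio graph covering both of the operations discussed above, repositioning and dynamic-range compression, using standard nodes; the parameter values are purely illustrative.

```javascript
// Sketch: repositioning and dynamic-range compression of a video
// element's audio using standard Web Audio API nodes.

const audioCtx = new AudioContext();
const source = audioCtx.createMediaElementSource(
  document.getElementById('secondary-video'));

// Place the sound stage to match the video's on-screen position:
const panner = audioCtx.createStereoPanner();
panner.pan.value = -0.7; // e.g. video occupying the left third

// Reduce dynamic range as immersion drops, so quiet passages stay
// audible at a lower overall volume:
const compressor = audioCtx.createDynamicsCompressor();
compressor.threshold.value = -40; // dB
compressor.ratio.value = 6;

const level = audioCtx.createGain();
level.gain.value = 0.5; // overall volume for a low-immersion state

source.connect(panner).connect(compressor).connect(level)
      .connect(audioCtx.destination);
```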

III. Layout
One final component of the architecture deals with the layout of the media items to be displayed. Earlier in this paper we discussed how content that is typically packed together can be transmitted in an unpacked form, with the chosen and relevant components then laid out by the Fresco system when the content is finally presented to the viewer. This is not the highly constrained process we are used to, where precise locations can be given for each item; as the displays in use might well be substantially different in each viewing environment, the process must be very flexible, and it is this flexibility that presents an interesting challenge.
One aspect of the required flexibility comes from the number of inputs to the layout process that control what is displayed. These come from the local environment, such as the range, sizes, locations and properties of the available displays, the viewer’s immersion level, and domotic content sources or interrupts; and from the broadcaster, such as the list of potential components, their relative priorities and a preferred immersion level. It is the layout engine that balances these inputs and selects a suitable set of components to display, and locations for them.
In addition to the “what” of the layout there is the “how”: the appearance. More specifically, certain components may need to be adapted to the environment into which they are to be placed. For instance, if the room has white walls and the content item is white text, some means of making the text legible must be provided automatically. More generally, the design of an item should be able to adapt to the predominant background colors of the environment.
This introduces challenges at several levels that go beyond most current content presentation designs, such as may be found in many websites. Firstly, we need an adaptive description of the broadcaster’s requirements beyond those commonly in use today. Next, we need a mechanism that can quickly and efficiently resolve these requirements against a collection of local inputs. Finally, and perhaps most challengingly, we need content producers and designers to understand that their content can and will be presented in many different ways, and that complete control over this presentation is potentially very counter-productive to the viewer’s engagement.

AUTHORING
I. Goals
When designing a content presentation for Fresco, the goals of content producers and designers would typically be:
• To create a large screen content presentation comprising video and/or other information and content that can adapt to:
– the available display resources in people’s homes, which by definition will vary considerably in both size and shape, and in all likelihood much more than today’s range of TV screen sizes; and
– the viewer’s desire for immersion as they watch, which will vary across the audience.
• To create companion applications to support personal interaction, typically drawing on the kinds of applications that we have seen before in interactive TV (voting, play-along games, supporting information etc.), but also potentially interacting with the large screen content presentation.
In Project Fresco, the large screen presentation is realized through two sets of metadata:
• A playlist which defines instances of on-screen elements (video/audio, subtitles/captions, images, information feeds, web content etc.) and their lifetime on a content timeline
• A layout which defines a set of layout requirements for each element in terms of its priority and (rectangular and non-overlapping) size and position on-screen (either relative or absolute), which the layout engine will honor as far as possible, as well as an entrance / exit transition style.
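The schema itself is beyond the scope of this paper, but a hypothetical sketch of these two sets of metadata for a single program might look as follows; all field names are illustrative assumptions rather than the actual Fresco format.

```javascript
// Sketch: one hypothetical shape for the playlist and layout metadata.

const playlist = {
  contentId: 'soap-ep42',
  elements: [
    { id: 'main',   type: 'video',   src: 'ep42.mp4',
      begin: 0, end: 2700 },                        // whole program
    { id: 'social', type: 'twitter', query: '#soap42',
      begin: 0, end: 2700 },
    { id: 'recap',  type: 'html',    src: 'recap-fragment.html',
      begin: 0, end: 120 }                          // first two minutes only
  ]
};

const layout = {
  elements: [
    { id: 'main',   priority: 1, aspect: '16:9',
      spans: ['A+B+C', 'B+C', 'C'] },               // fallback targets
    { id: 'social', priority: 3, minWidth: 320,     // omit below 320 px
      spans: ['A'], transition: 'fade' },
    { id: 'recap',  priority: 2, spans: ['B'], transition: 'slide' }
  ]
};
```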

Our prototype implementation provides a set of elements which can be instantiated ‘as is’ (video/audio, subtitles/captions, images, RSS, twitter), or re-styled using CSS. For more bespoke content, Ajax (HTML) fragments can be created. A data binding framework supports data-driven content in these Ajax fragments, allowing changes to either the content itself or its styling in response to a data change, which may originate from the playlist timeline or from user interaction via a companion application.

II. Adapting the Experience
A fundamental question we have been exploring in the development of Project Fresco is ‘How does a content presentation adapt to the available space?’ The layout engine will allocate a rectangular on-screen region to the content elements of each selected playlist / layout. The area allocated to a playlist / layout will vary as a function of:
• The size and shape of the display
• Other content items (i.e. playlist/layouts) displayed, and
• The immersion level of displayed content

In general, the higher the immersion level for this content, the larger the area of its region; at full immersion the largest area that the display can accommodate. There is a specific condition for both maximum and minimum immersion levels where only the video element will be presented regardless of the region size.
There are typically two mechanisms by which the experience will be adapted in response to the available size and shape of this region: the first is the selection, size and position of the elements presented, and the second is the way in which the content of these individual elements behaves in response to their own changing size and shape (or aspect ratio). As well as a priority for each of the elements (where, for example, the video may be given top priority so that under space constraints it may be the only element of the experience presented), we will typically define a minimum size, since for certain types of content (certainly text) it would be preferable to omit the element rather than present it in a minuscule form. The height and width can also be specified in absolute size, as proportions of the parent region, or with a fixed aspect ratio (for video and images).
A further mechanism for adapting the layout is to subdivide the region into a series of named spans of columns of a defined size. Each of these spans can be prioritized, and elements targeted at a span or set of spans with fallbacks, such that as the region gets smaller, spans collapse in a deterministic order and elements can move between spans if fallbacks are specified.
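A minimal sketch of this collapse-and-fallback behavior is given below (data structures hypothetical); figure 7, which follows, illustrates the same mechanism as a wireframe.

```javascript
// Sketch: deterministic span collapse with element fallbacks.

function resolveLayout(spans, elements, availableWidth) {
  // Spans collapse in a deterministic order: largest priority number
  // (i.e. lowest priority) first, until the remaining spans fit.
  const live = [...spans];
  const totalWidth = ss => ss.reduce((w, s) => w + s.width, 0);
  while (totalWidth(live) > availableWidth && live.length > 1) {
    const lowest = live.reduce((a, b) => (b.priority > a.priority ? b : a));
    live.splice(live.indexOf(lowest), 1);
  }
  const liveNames = new Set(live.map(s => s.name));

  // Each element takes the first of its fallback targets whose spans
  // all survived; an element with no surviving target is omitted.
  return elements.map(el => {
    const target = el.spans.find(t =>
      t.split('+').every(name => liveNames.has(name)));
    return target ? { id: el.id, target } : null;
  }).filter(Boolean);
}

// The figure 7 example at a reduced width (spans B and D, priority 3,
// collapse first; element b falls back from A+B+C+D+E to A+C+E):
resolveLayout(
  [{ name: 'A', width: 300, priority: 2 },
   { name: 'B', width: 200, priority: 3 },
   { name: 'C', width: 600, priority: 1 },
   { name: 'D', width: 200, priority: 3 },
   { name: 'E', width: 300, priority: 2 }],
  [{ id: 'b', spans: ['A+B+C+D+E', 'A+C+E', 'C'] }],
  1200
); // => [{ id: 'b', target: 'A+C+E' }]
```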

Figure 7: Example layout wireframe

As an example of layout, figure 7 shows a ‘wireframe’ of a five-span layout, and how the layout engine responds to a reduction in horizontal size. In this layout, spans B and D have the lowest priority (3, shown in brackets), so they collapse as the horizontal space is reduced (taking the elements they contain with them). Element b has been defined to target spans A+B+C+D+E, with A+C+E and C as fallbacks, and so will exist in all forms of the layout as it adapts; however, its content will need to adapt to the changing aspect ratio.
The way in which the content of these individual elements behaves in response to their own changing size and shape (or aspect ratio) is somewhat content-dependent. For raster-based content such as video or images, conventional scaling is likely most appropriate (and preferable to cropping), but for text, reflowing may be more appropriate. A reduction in font size may be tolerable, although legibility at a reasonable viewing distance will determine a minimum acceptable font size. Unlike a web page layout, where reflowing typically results in a ‘longer’ rendered page (with the inevitable scroll bars), we should not overflow our element area, and hence an appropriate truncation of the content will be necessary.
The problem domain of large-screen layout is very similar to that of adapting web content to a wide range of device screen sizes and resolutions, which so-called ‘responsive’ design [4] is addressing. However, the additional factor of immersion, and in our case a fixed screen height imposing constraints on reflowing content, mean that standard responsive design tools do not meet all of our needs.

III. Companion Applications
Within a Fresco content presentation, any of the large-screen elements can have an associated companion application, from a simple ‘branded’ URL for a web site or application through to an interactive Ajax (HTML) application which runs within the companion experience. All these applications are accessible whether the corresponding large-screen element has been laid out or not (so, for example, even at maximum immersion, where only the video element would be visible on the large screen, all of the companion applications and links would still be accessible). Of course these companion applications could be a ‘mirror’ presentation of the large screen version, although presented appropriately for viewing on the companion device.

IV. Lessons learned
Certainly a challenge for designers when considering an adaptive content presentation for Fresco is to avoid designing a rigid layout or composition of elements that would only work with a particular screen size or aspect ratio, and instead to adopt a more flexible or ‘elastic’ approach to layout and composition. Fundamental in determining how the experience should adapt is deciding which elements are essential, and which can, and indeed should, be sacrificed where there is insufficient space. In our experience to date, once a wireframe of the large screen layout and companion application functionality has been designed, getting a quick ‘prototype’ of the layout up and running can be invaluable before refining the visual design and implementing the interactive and data-driven aspects of the large-screen experience and any companion interaction.
To date our design and implementation of Fresco content presentations has been manual (and hence somewhat labor intensive!). The content presentations have all been for previously produced content, so we have only been able to use assets (images and graphics) that were created as part of the existing production process (e.g. for press packs, web sites and mobile applications); even so, we have been able to build rich experiences from just these assets. Having the opportunity to plan and capture specific assets as part of the broader content commissioning and production process would only expand what could be created for a Fresco experience. We would anticipate a visual timeline and layout authoring tool which could expedite the authoring process. We would also expect that over time a series of ‘tried and trusted’ layouts would emerge, and that re-use of a common layout would be pragmatic, certainly for episodic content, as would, for live content, use of a pre-defined layout built from either pre-produced or proxy assets (e.g. images and graphics).

V. Metadata and Content Delivery
In order for a Fresco system (i.e. layout engine and client(s)) to play back a content presentation, it will need to acquire the playlist and layout metadata, as well as all of the supporting assets referenced by the playlist. For pre-produced (i.e. non-live) content, this playlist and layout metadata would be acquired prior to the content presentation, with the other assets being acquired in advance of the presentation according to the playlist (typically on-demand). For live content it is likely that a pre-defined layout would be used, which could be published in advance of the video and audio going ‘to air’, but fragments of the playlist would be published on a dynamic basis (for example, they could be delivered to the layout engine via a suitable server-side push method such as web-sockets, server-sent events or HTTP long polling), reflecting the dynamic nature of the live content. Examples of the dynamic parts of the metadata might include live broadcaster-signaled immersion changes, or updates to sports game statistics delivered near-live and presented via a data-bound Ajax (HTML) fragment.
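As an illustration of the server-sent events option named above, a client-side sketch follows; the endpoint, event names and layout-engine API are all hypothetical.

```javascript
// Sketch: receiving dynamic playlist fragments and immersion changes
// via server-sent events (endpoint and event names hypothetical).

const feed = new EventSource('https://broadcaster.example/fresco/ep42/live');

feed.addEventListener('playlist-fragment', (e) => {
  // e.g. a new element appearing on the content timeline
  layoutEngine.applyFragment(JSON.parse(e.data)); // hypothetical engine API
});

feed.addEventListener('immersion', (e) => {
  // a broadcaster-signaled immersion change (0..1)
  layoutEngine.setBroadcastImmersion(Number(e.data));
});
```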

CLOSING THOUGHTS

Our thinking started when considering the possibilities that the display industry will be offering in just a few years, when the black boxes in the corners of our rooms disappear and unobtrusive, frameless, ultra-high definition ambient displays take their place. In exploring the opportunities this technology will offer, we have come to consider how content is presented, and the way in which its various components (current and future) will be assembled for the viewer. We have come to an appreciation of the way in which control of, and interaction with, such an experience can work in both a personal and a collective manner. And, in contrast to the ‘lean forward’ experience of today’s connected TV, we have seen how the ‘lean back’ experience of Project Fresco requires a sophisticated automatic layout control engine, driven by metadata that allows content designers to express how potentially rich experiences can adapt to different viewing environments and appropriate user immersion.
As we have explored function, so we have explored form, and the PC-based solution of the first phase demonstration has evolved into a believably scalable and cost-effective hardware and software architecture. The second phase demonstration has validated this architecture, and allowed us to further explore a range of experiences.
It is often commented that the role of television in our lives has changed dramatically as other devices have fought for our time and won our attention. And yet families and groups still wish to spend time together, sharing space and switching between personal and collective experiences. A developed television experience which embraces this truth, and which invites immersion and interaction at appropriate levels, must surely be a goal worth aiming for in our industry. Project Fresco is, for us, a vehicle to explore this space, and we are excited by the future we see before us and the reaction we have received. The future is not one where the medium is marginalized, but one in which people will truly find a new way of looking at TV.

REFERENCES
[1] Jeong, B., Renambot, L., Jagodic, R., Singh, R., Aguilera, J. et al., “High-Performance Dynamic Graphics Streaming for Scalable Adaptive Graphics Environment”, SuperComputing 2006, http://www.evl.uic.edu/files/pdf/SAGE-SC06-final.pdf, November 11-17, 2006.
[2] Rogers, C., “Web Audio API”, W3C, https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html.
[3] Putman, P., “Of Large LCDs, Unused Fabs, and Projector Killers”, Display Daily, http://displaydaily.com/2012/04/09/of-large-lcds-unused-fabs-and-projector-killers/, April 9, 2012.
[4] Marcotte, E., “Responsive Web Design”, ISBN 978-0-9844425-7-7, http://www.abookapart.com/products/responsive-web-design, 2011.
[5] Ashley, A., Costello, M., Murray, K., Parnall, S., Walker, J., “Project Fresco – Tileable Technology for Television’s Big Future in the Home”, Proceedings of the 2012 International Broadcasting Convention, 2012.

Simon Parnall, Cisco, Staines, England.
Kevin Murray, Cisco, Chandlers Ford, England.
James Walker, Cisco, Chandlers Ford, England.