blog coding out loud https://ferjm.github.io/ Thu, 18 Nov 2021 12:04:18 +0000 Jekyll v3.9.0 opentok-rs: easy WebRTC with Rust <p><a href="https://www.vonage.com/communications-apis/video/">OpenTok</a> is <a href="https://www.vonage.com">Vonage</a>’s (formerly <a href="https://en.wikipedia.org/wiki/TokBox">TokBox’s</a>) <a href="https://en.wikipedia.org/wiki/Platform_as_a_service">PaaS</a> (Platform as a Service) that enables developers to easily build custom video experiences within any mobile, web, or desktop application, on top of a <a href="https://en.wikipedia.org/wiki/WebRTC">WebRTC</a> stack.</p> <p>One of the customer projects that I am working on at <a href="https://igalia.com">Igalia</a> requires publishing streams to, and subscribing to streams from, OpenTok sessions. The main application of this project needs to run on a Linux box, and Vonage already provides a nice <a href="https://tokbox.com/developer/sdks/linux/">OpenTok C++ SDK for Linux</a>. However, the entire application for this customer project is written in Rust, so together with my colleague <a href="https://base-art.net/">Philippe Normand</a>, we decided to write Rust bindings for the OpenTok C++ SDK.</p> <p><a href="https://github.com/ferjm/opentok-rs">opentok-rs</a> contains the result of this work. There you can find the <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#calling-rust-functions-from-other-languages">FFI</a> bindings, mostly generated with <a href="https://rust-lang.github.io/rust-bindgen/">bindgen</a>, and the safe wrapper API.</p> <p>We recently published a first version on <a href="https://crates.io/crates/opentok">crates.io</a>.</p> <p>There is not much documentation yet, apart from the <a href="https://doc.rust-lang.org/rustdoc/what-is-rustdoc.html">rustdoc</a> published <a href="https://ferjm.github.io/opentok-rs/opentok/">here</a>, which is mostly a copy &amp; paste of the C++ documentation. 
But there are a few <a href="https://github.com/ferjm/opentok-rs/tree/main/examples/src/bin">examples</a> that demonstrate how easily and quickly you can write your own custom video experiences.</p> <h1 id="basic-video-chat-application">Basic video chat application</h1> <p>With opentok-rs you can write a very basic video chat application like <a href="https://github.com/ferjm/opentok-rs/blob/09ac4d8f38dcb5aa443308a2f9e82444530e745e/examples/src/bin/basic_video_chat.rs">this one</a> in only a few dozen lines of code.</p> <div style="text-align:center;"> <iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/i48iw0GYgcA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe> </div> <p>If you are not familiar with the basic concepts of OpenTok, I recommend reading the <a href="https://tokbox.com/developer/guides/basics/">official documentation</a> at Vonage’s developer site.</p> <p>In a nutshell, all OpenTok activity occurs within a session, which is somewhat like a “room” where clients interact with one another in real-time. Each participant in a session can publish streams to the session or subscribe to other participants’ streams.</p> <p>To connect to an OpenTok session you need its identifier and a token. For testing purposes, you can obtain a session ID and a token from the project page in your <a href="https://tokbox.com/developer/">Vonage Video API</a> account. 
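</p> <p>For quick local experiments, a common pattern is to read these three values from the environment. This is just a sketch: the <code class="language-plaintext highlighter-rouge">Credentials</code> struct and the <code class="language-plaintext highlighter-rouge">OPENTOK_*</code> variable names are assumptions for illustration, not part of the opentok crate.</p>

```rust
use std::env;

// Hypothetical helper: the three values needed to join a session.
struct Credentials {
    api_key: String,
    session_id: String,
    token: String,
}

impl Credentials {
    // Builds credentials from any lookup function, so the same code
    // works against real environment variables or a test fixture.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Option<Credentials> {
        Some(Credentials {
            api_key: lookup("OPENTOK_API_KEY")?,
            session_id: lookup("OPENTOK_SESSION_ID")?,
            token: lookup("OPENTOK_TOKEN")?,
        })
    }

    // Reads the assumed OPENTOK_* environment variables.
    fn from_env() -> Option<Credentials> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

<p>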
However, in a production application, you will need to dynamically obtain the session ID and token from a web service that uses one of the <a href="https://tokbox.com/developer/sdks/server/">Vonage Video API server SDKs</a>.</p> <p>For a basic chat application you need to create a <a href="https://ferjm.github.io/opentok-rs/opentok/publisher/struct.Publisher.html">Publisher</a> instance, to publish your video stream, and a <a href="https://ferjm.github.io/opentok-rs/opentok/subscriber/struct.Subscriber.html">Subscriber</a> instance, likely in a different thread, to subscribe to the rest of the streams in the session. Each entity may connect to the session separately.</p> <h3 id="publisher">Publisher</h3> <p>The OpenTok SDK is heavily based on callbacks. Starting with the session, you need to provide a <a href="https://ferjm.github.io/opentok-rs/opentok/session/struct.SessionCallbacks.html">SessionCallbacks</a> instance to the <a href="https://ferjm.github.io/opentok-rs/opentok/session/struct.Session.html">Session</a> constructor. For the sake of simplicity, we only care about the <code class="language-plaintext highlighter-rouge">on_connected</code> and <code class="language-plaintext highlighter-rouge">on_error</code> callbacks in this case.</p> <p>You also need to provide the session credentials. 
This is the Vonage API key, the session ID and its token.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let session_callbacks = SessionCallbacks::builder()
    .on_connected(move |session| {
        // At this point, we can start publishing
        session.publish(&amp;*publisher.lock().unwrap())
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();
let session = Session::new(
    &amp;credentials.api_key,
    &amp;credentials.session_id,
    session_callbacks,
)?;
session.connect(&amp;credentials.token)?;
</code></pre></div></div> <p>The Publisher constructor gets a <a href="https://ferjm.github.io/opentok-rs/opentok/publisher/struct.PublisherCallbacks.html">PublisherCallbacks</a> instance and optionally a <a href="https://ferjm.github.io/opentok-rs/opentok/video_capturer/struct.VideoCapturer.html">VideoCapturer</a> instance. If you do not provide a custom video capturer, the default one, capturing audio and video from your local mic and webcam, will be used.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let publisher_callbacks = PublisherCallbacks::builder()
    .on_stream_created(move |_, stream| {
        println!("Publishing stream with ID {}", stream.id());
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();
let publisher = Arc::new(Mutex::new(Publisher::new(
    "publisher", /* Publisher name */
    None,        /* Use WebRTC's video capturer */
    publisher_callbacks,
)));
</code></pre></div></div> <p>The <a href="https://github.com/ferjm/opentok-rs/blob/09ac4d8f38dcb5aa443308a2f9e82444530e745e/utils/src/publisher.rs#L122">basic video chat example</a> demonstrates how to add a custom video capturer. <a href="https://github.com/ferjm/opentok-rs/blob/main/utils/src/capturer.rs#L23">In this case</a>, it uses a <a href="https://gstreamer.freedesktop.org/">GStreamer</a> <a href="https://gstreamer.freedesktop.org/documentation/videotestsrc/index.html?gi-language=c">videotestsrc</a> element to produce test video data. You can use whatever mechanism you prefer to produce video, though.</p> <h3 id="subscriber">Subscriber</h3> <p>The subscriber part is somewhat similar. It needs to connect to the session, providing the credentials and the session callbacks. In this case, the callback that we care about the most is the <code class="language-plaintext highlighter-rouge">on_stream_received</code> callback. 
Within this callback, you can set the stream on your Subscriber instance and instruct the session to use it.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let session_callbacks = SessionCallbacks::builder()
    .on_stream_received(move |session, stream| {
        if subscriber.set_stream(stream).is_ok() {
            if let Err(e) = session.subscribe(&amp;subscriber) {
                eprintln!("Could not subscribe to session {:?}", e);
            }
        }
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();
</code></pre></div></div> <p>The Subscriber gets the <a href="https://ferjm.github.io/opentok-rs/opentok/video_frame/struct.VideoFrame.html">video frames</a> through repeated calls to the <code class="language-plaintext highlighter-rouge">on_render_frame</code> callback.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let subscriber_callbacks = SubscriberCallbacks::builder()
    .on_render_frame(move |_, frame| {
        let width = frame.get_width().unwrap() as u32;
        let height = frame.get_height().unwrap() as u32;
        let get_plane_size = |format, width: u32, height: u32| match format {
            FramePlane::Y =&gt; width * height,
            FramePlane::U | FramePlane::V =&gt; {
                let pw = (width + 1) &gt;&gt; 1;
                let ph = (height + 1) &gt;&gt; 1;
                pw * ph
            }
            _ =&gt; unimplemented!(),
        };
        let offset = [
            0,
            get_plane_size(FramePlane::Y, width, height) as usize,
            get_plane_size(FramePlane::Y, width, height) as usize
                + get_plane_size(FramePlane::U, width, height) as usize,
        ];
        let stride = [
            frame.get_plane_stride(FramePlane::Y).unwrap(),
            frame.get_plane_stride(FramePlane::U).unwrap(),
            frame.get_plane_stride(FramePlane::V).unwrap(),
        ];
        renderer_
            .lock()
            .unwrap()
            .as_ref()
            .unwrap()
            .push_video_buffer(
                frame.get_buffer().unwrap(),
                frame.get_format().unwrap(),
                width,
                height,
                &amp;offset,
                &amp;stride,
            );
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();
</code></pre></div></div> <p>The snippet above uses a <a href="https://github.com/ferjm/opentok-rs/blob/main/utils/src/renderer.rs">video renderer</a> based on the GStreamer <a href="https://gstreamer.freedesktop.org/documentation/autodetect/autovideosink.html?gi-language=c#autovideosink-page">autovideosink</a> element. But just like with the custom video capturer, you can use whatever you like to render your video frames.</p> <h3 id="audio">Audio</h3> <p>The OpenTok SDK handles audio and video in different ways. While video streams are independently tied to each publisher and each subscriber in a session, audio is tied to a global <a href="https://ferjm.github.io/opentok-rs/opentok/audio_device/struct.AudioDevice.html">audio device</a> that is shared by all publishers and subscribers.</p> <p>This design imposes two hard limitations:</p> <ul> <li> <p>There is no way to obtain the independent audio stream from each participant. 
OpenTok provides a single audio stream which is a mix of every participant’s audio, so there is no way to do things like speech-to-text, moderation, or any kind of audio processing per participant, unless you create a somewhat <a href="https://github.com/opentok/opentok-linux-sdk-samples/issues/25#issuecomment-916155032">complex workaround</a> where you run each audio subscriber in its own dedicated process.</p> </li> <li> <p>It is not possible to run two instances of the OpenTok SDK in the same process. A second instance of the OpenTok SDK overwrites the audio callbacks set by the previous instance.</p> </li> </ul> <p>Vonage claims to be working on improving this design.</p> <h1 id="there-is-more">There is more</h1> <p>Everything in opentok-rs is meant to run in client applications, but as mentioned before, Vonage also provides <a href="https://tokbox.com/developer/sdks/server/">server side OpenTok SDKs</a>.</p> <p><a href="https://github.com/ferjm/opentok-server-rs">opentok-server-rs</a> wraps a minimal subset of the OpenTok REST API. 
It lets developers securely create sessions and generate tokens for their OpenTok applications.</p> <p>I started it only to be able to write automated tests for opentok-rs, so the functionality is limited and will hopefully be extended soon.</p> <h1 id="acknowledgements">Acknowledgements</h1> <ul> <li>I would like to thank <a href="https://www.televic-conference.com/en">Televic Conference</a> for sponsoring this work.</li> <li>Huge thanks to <a href="https://github.com/joliveraortega">José Antonio Olivera</a> from Vonage for his continuous guidance and support while writing the bindings.</li> </ul> Wed, 17 Nov 2021 00:00:00 +0000 https://ferjm.github.io/rust/opentok/webrtc/2021/11/17/opentok-rs.html rust opentok webrtc gst-dots: live view of GStreamer pipelines <p>These days I spend a lot of time dealing with large dynamic <a href="https://gstreamer.freedesktop.org/">GStreamer</a> pipelines. More often than not, I find myself stuck on problems that take some careful analysis of the endless stream of debug logs that GStreamer produces. In these situations, taking a look at what the pipelines of the application look like really helps me with the debugging process. To get this information, <a href="https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=c#getting-pipeline-graphs">GStreamer has the capability of outputting graph files</a> that describe the topology of your pipelines. The information that you get is really well presented, but the process of getting it can be a bit cumbersome when you have to do it over and over. The output files are <code class="language-plaintext highlighter-rouge">.dot</code> files that require programs like <a href="http://www.graphviz.org/">GraphViz</a> to get a displayable version of the graph. Many GStreamer developers end up writing scripts or creating their own tools to ease this process. 
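</p> <p>That manual loop looks roughly like this (a sketch; the dump directory is arbitrary, and the <code class="language-plaintext highlighter-rouge">gst-launch-1.0</code> line is only one example of an application that dumps graphs on pipeline state changes):</p>

```shell
# Ask GStreamer applications launched from this shell to dump .dot
# pipeline graphs into this directory.
export GST_DEBUG_DUMP_DOT_DIR=/tmp/gst-dots
mkdir -p "$GST_DEBUG_DUMP_DOT_DIR"

# Any GStreamer application now writes one .dot file per pipeline
# state change, e.g.:
#   gst-launch-1.0 videotestsrc num-buffers=30 ! autovideosink

# Convert every dumped graph into a viewable SVG with GraphViz.
for f in "$GST_DEBUG_DUMP_DOT_DIR"/*.dot; do
    [ -e "$f" ] || continue  # skip when nothing was dumped yet
    dot -Tsvg "$f" -o "${f%.dot}.svg"
done
```

<p>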
My version of this kind of tool is <a href="https://github.com/ferjm/gst-dots">gst-dots</a>, an extremely simple Node.js server that watches for GStreamer <code class="language-plaintext highlighter-rouge">.dot</code> files in the path defined by the <code class="language-plaintext highlighter-rouge">GST_DEBUG_DUMP_DOT_DIR</code> environment variable, converts them into SVG images, and displays them in a browser with live reload.</p> <p>This is how it looks in action.</p> <div style="text-align:center;"> <video controls="" src="/content/videos/2021/11/gst_dots.mp4"></video> </div> Thu, 11 Nov 2021 00:00:00 +0000 https://ferjm.github.io/gstreamer/2021/11/11/gst-dots.html gstreamer 2021 WebKit Contributors Meeting talk - WPE Android <p>A couple of weeks ago I attended my first <a href="https://webkit.org/meeting">WebKit Contributors Meeting</a> and I presented this talk about WPE WebKit for Android.</p> <div style="text-align:center;"> <iframe width="560" height="315" src="https://www.youtube.com/embed/h5V-ZLE97-I" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe> </div> Fri, 15 Oct 2021 00:00:00 +0000 https://ferjm.github.io/wpe/webkit/android/talk/2021/10/15/wpe-webkit-android-talk.html wpe webkit android talk WPE WebKit for Android <p>WPE WebKit is the official WebKit port for embedded and low-consumption computer devices. 
It has been designed from the ground up with performance, small footprint, accelerated content rendering, and simplicity of deployment in mind.</p> <p>It brings the excellence of the WebKit engine to countless platforms and target devices, serving as a base for systems and environments that primarily or completely rely on web platform technologies to build their interfaces.</p> <p>WPE WebKit’s architecture allows for inclusion in a variety of use cases and applications. It can be custom embedded into an existing application, or it can run as a standalone web runtime under a variety of presentation systems, from platform-specific display managers to existing window management protocols like Wayland or X11.</p> <p>Today, we (<a href="https://igalia.com">Igalia</a>) are happy to announce initial support of WPE for Android.</p> <p>This effort was initiated back in 2017 by my colleague <a href="https://www.igalia.com/igalian/zdobersek">Žan Doberšek</a>, who fully implemented a <a href="https://github.com/Igalia/WPEBackend-android">WPE backend</a> for Android along with the required pieces to get rendering and basic input working. The work was paused for quite some time until the beginning of this year, when I joined Igalia and took over his work. Since then, I have been heads down working on it, trying to make it more usable thanks to <a href="https://github.com/Igalia/cerbero/tree/wpe-android">Cerbero</a> and a <a href="https://developer.android.com/reference/android/webkit/WebView">WebView</a>-based Java API.</p> <h1 id="how-it-looks">How it looks</h1> <p>A picture is worth a thousand words. 
This is how it currently looks running on an Android phone:</p> <div style="text-align:center;"> <video controls="" src="/content/videos/2021/05/wpeandroid_may.mp4"></video> </div> <p>As you can see, we have enough basic functionality to implement a simple multi-tab web browser with progress reporting, navigation controls, and IME support.</p> <p>Support is not limited to mobile devices though. Thanks to the wide range of architectures and devices that support Android, we can now run WPE WebKit on an even wider set of devices. Like a pair of XR glasses. This is a video of a port of <a href="https://mixedreality.mozilla.org/firefox-reality/">Firefox Reality</a> using <a href="https://github.com/Igalia/wpe-android#wpeview-api">WPEView</a> instead of <a href="https://mozilla.github.io/geckoview/">GeckoView</a>:</p> <div style="text-align:center;"> <video controls="" src="/content/videos/2021/05/wpeandroid_fxr.mp4"></video> </div> <h2 id="building-blocks">Building blocks</h2> <h1 id="cerbero-build-system">Cerbero build system</h1> <p>WPE WebKit has a very long list of dependencies. Cross compiling all these dependencies manually can be quite cumbersome, so in order to ease the development process I focused my first weeks of work on setting up a more usable build system. We decided to use <a href="https://github.com/Igalia/cerbero/tree/wpe-android">Cerbero</a>, GStreamer’s cross compilation system, which already had <em>recipes</em> - this is how Cerbero names its build scripts - for many of the required dependencies. I wrote all the missing Cerbero recipes and integrated Cerbero into WPE Android’s build system, to the point that building everything requires a single <code class="language-plaintext highlighter-rouge">python3 scripts/bootstrap.py --build</code> command.</p> <p>For now the only supported architecture is arm64. 
There are plans to support other architectures soon.</p> <h1 id="wpeview-api">WPEView API</h1> <p>WPEView wraps the WPE WebKit browser engine in a reusable Android API. WPEView serves a similar purpose to Android’s built-in WebView and tries to mimic its API, aiming to be an easy-to-use drop-in replacement with extended functionality.</p> <p>Setting up WPEView in your Android application is fairly simple.</p> <p>First, add the WPEView widget to your <a href="https://developer.android.com/training/basics/firstapp/building-ui">Activity layout</a>:</p> <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;com.wpe.wpeview.WPEView
    android:id="@+id/wpe_view"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity"/&gt;
</code></pre></div></div> <p>And next, wire it up in your Activity implementation to start using the API, for example, to load a URL:</p> <div class="language-kotlin highlighter-rouge"><div class="highlight"><pre class="highlight"><code>override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    val browser = findViewById&lt;WPEView&gt;(R.id.wpe_view)
    browser?.loadUrl(INITIAL_URL)
}
</code></pre></div></div> <p>To get a better sense of how to use WPEView, check the code of the <em>MiniBrowser</em> demo in the <a href="https://github.com/Igalia/wpe-android/tree/main/examples/minibrowser">examples</a> folder.</p> <h1 id="process-model">Process model</h1> <p>In order to safeguard the rest of the system and to allow the application to remain responsive even if the user loads a web page that infinite loops or otherwise hangs, the modern incarnation of WebKit uses a multi-process architecture. Each web page is loaded in its own WebProcess. Multiple WebProcesses can share a browsing session, which lives in a shared NetworkProcess. In addition to handling all network accesses, this process is also responsible for managing the disk cache and Web APIs that allow websites to store structured data, such as the Web Storage and IndexedDB APIs.</p> <p>Given that Android forbids the fork syscall on non-rooted devices, we cannot directly spawn child processes. Instead, we use Android Services to host the logic of WebKit’s auxiliary processes. The life cycle of all WebKit’s auxiliary processes is managed by WebKit itself. The Android layer only proxies requests to spawn and terminate these processes/services.</p> <p>In addition to the multi-process architecture, modern WebKit versions introduce the PSON model (Process Swap On Navigation), which aims to improve security by creating an independent WebProcess for each security origin. 
PSON is currently disabled for WPE Android, although partial support is already in place.</p> <h1 id="browser-and-pages">Browser and Pages</h1> <p>The central piece of WPE Android is the top-level <a href="https://github.com/Igalia/wpe-android/blob/main/wpe/src/main/java/com/wpe/wpe/Browser.java">Browser</a> singleton object. It is roughly the equivalent of WebKit’s UIProcess. Among other duties, it:</p> <ul> <li>Manages the creation and destruction of <code class="language-plaintext highlighter-rouge">Page</code> instances.</li> <li>Funnels <code class="language-plaintext highlighter-rouge">WPEView</code> API calls to the appropriate <code class="language-plaintext highlighter-rouge">Page</code> instance.</li> <li>Manages the Android Services equivalent to WebKit’s auxiliary processes (Web and Network processes).</li> <li>Hosts the UIProcess thread where the <a href="https://wpewebkit.org/reference/wpewebkit/2.23.90/WebKitWebContext.html">WebKitWebContext</a> instance lives and where the main loop is run.</li> </ul> <p>A <a href="https://github.com/Igalia/wpe-android/blob/main/wpe/src/main/java/com/wpe/wpe/Page.java">Page</a> roughly corresponds to a tab in a regular browser UI. There is a 1:1 relationship between WPEView and Page. Each Page instance has its own associated <a href="https://github.com/Igalia/wpe-android/blob/main/wpe/src/main/java/com/wpe/wpe/gfx/View.java">gfx.View</a> and <a href="https://wpewebkit.org/reference/wpewebkit/2.23.90/WebKitWebView.html">WebKitWebView</a> instances.</p> <h1 id="wpe-backend">WPE Backend</h1> <p>The common interface between WPEWebKit and its rendering backends is provided by <a href="https://github.com/WebPlatformForEmbedded/libwpe">libwpe</a>.
<a href="https://github.com/Igalia/WPEBackend-android">WPEBackend-android</a> is our Android-oriented implementation of the libwpe API, bridging the gap between the WebKit architecture and the internal composition structure on one side and the Android system on the other.</p> <h1 id="gfxview">gfx.View</h1> <p><a href="https://github.com/Igalia/wpe-android/blob/main/wpe/src/main/java/com/wpe/wpe/gfx/View.java">gfx.View</a> is an extension of <a href="https://developer.android.com/reference/android/opengl/GLSurfaceView?hl=en">android.opengl.GLSurfaceView</a> living in the UI Process. It manages the life cycle of a <a href="https://developer.android.com/reference/android/graphics/SurfaceTexture">Surface Texture</a>, which acts as a buffer consumer and is handed off to the Web Process through Android’s IPC mechanisms, where the actual rendering happens.</p> <p>It is also in charge of relaying input events to the internal WebKit input methods.</p> <p>This part is currently being significantly changed by Žan to use <a href="https://developer.android.com/ndk/reference/group/a-hardware-buffer">Native Hardware Buffers</a>.</p> <h2 id="future-work">Future work</h2> <p>There are still plenty of things to do, and we have a growing <a href="https://github.com/Igalia/wpe-android/issues">list of issues</a> in the main repository. The next steps will be towards extending support to other architectures (so far, only arm64 is supported). Multimedia support is also on the list of immediate plans, along with the big rendering engine refactor that Žan is working on.</p> <h2 id="try-it-yourself">Try it yourself</h2> <p>If you want to try the current prototype, you can follow the instructions in the <a href="https://github.com/Igalia/wpe-android/blob/main/README.md#setting-up-your-environment">README</a> of the main repo.</p> <p>We welcome contributions of all kinds.
Give it a try and <a href="https://github.com/Igalia/wpe-android/issues/new">file issues</a> as you encounter them. And if you feel encouraged enough, send us patches!</p> <h2 id="acknowledgements">Acknowledgements</h2> <ul> <li>I would like to thank <a href="https://igalia.com">Igalia</a> for giving me the time and space to work on this project.</li> <li>Huge thanks to <a href="https://www.igalia.com/igalian/zdobersek">Žan Doberšek</a> for his amazing work and continuous guidance.</li> <li>Kudos to <a href="https://www.igalia.com/igalian/pnormand">Philippe Normand</a> and <a href="https://www.igalia.com/igalian/tsaunier">Thibault Saunier</a> for their recommendations and support around Cerbero.</li> <li>Many thanks to <a href="https://www.igalia.com/igalian/ifernandez">Imanol Fernández</a> for his contributions so far and for the VR demo.</li> </ul> Mon, 10 May 2021 00:00:00 +0000 https://ferjm.github.io/wpe/webkit/android/2021/05/10/wpe-webkit-android.html https://ferjm.github.io/wpe/webkit/android/2021/05/10/wpe-webkit-android.html wpe webkit android Servo Media Mid-Year review <p>We recently closed the first half of 2019 and, with that, it is time to look back and do a quick summary of what the media team has achieved during this six-month period.</p> <p>Looking at some stats, we merged 87 pull requests, opened 56 issues, closed 42 issues, and welcomed 13 amazing new contributors to the media stack.</p> <h2 id="av-playback">A/V playback</h2> <p>These are some of the selected A/V-playback-related H1 accomplishments:</p> <h4 id="media-cache-and-improved-seeking">Media cache and improved seeking</h4> <p>We significantly <a href="https://github.com/servo/servo/pull/22692">improved</a> the seeking experience of audio and video files by implementing preloading and buffering support and a media cache.</p> <div style="text-align:center;"> <iframe title="vimeo-player" src="https://player.vimeo.com/video/311414154" width="640" height="360" frameborder="0"
allowfullscreen=""></iframe> </div> <h4 id="basic-media-controls">Basic media controls</h4> <p>After a few months of work, we got <a href="https://github.com/servo/servo/pull/22743">partial support for the Shadow DOM API</a>, which gave us the opportunity to implement our first basic set of <a href="https://github.com/servo/servo/pull/23208">media controls</a>.</p> <div style="text-align:center;"> <img src="https://s3.amazonaws.com/media-p.slid.es/uploads/105177/images/6275339/Jun-19-2019_17-11-57.gif" alt="media controls" width="640" /> </div> <p>The UI is not perfect, among other reasons because we still have no way to render a progress or volume bar properly, as that depends on the <code class="language-plaintext highlighter-rouge">&lt;input type="range"&gt;</code> layout, which so far is rendered as a simple text box instead of the usual slider with a thumb.</p> <h4 id="gstreamer-backend-for-magicleap">GStreamer backend for MagicLeap</h4> <p>Another great achievement by <a href="https://github.com/xclaesse">Xavier Claessens</a> from <a href="https://www.collabora.com/">Collabora</a> has been the GStreamer backend for <a href="https://www.magicleap.com/">Magic Leap</a>.
The work is not completely done yet, but as you can see in the animation below, he already managed to paint a full-screen video on the Magic Leap device.</p> <div style="text-align:center;"> <img src="https://s3.amazonaws.com/media-p.slid.es/uploads/105177/images/6274304/Jun-19-2019_13-12-31.gif" alt="magic leap video" width="640" /> </div> <h4 id="hardware-accelerated-decoding">Hardware accelerated decoding</h4> <p>One of the most wanted features, which we have been working on for almost a year and which has recently landed, is hardware-accelerated decoding.</p> <p>Thanks to the excellent and constant work from the <a href="https://www.igalia.com/">Igalian</a> <a href="https://github.com/ceyusa">Víctor Jáquez</a>, Servo recently gained <a href="https://github.com/servo/servo/pull/23483">support for hardware-accelerated media playback</a>, which means lower CPU usage, better battery life and better thermal behaviour, among other goodies.</p> <p>We only have support on Linux and Android (EGL and Wayland) so far. Support for other platforms is on the roadmap.</p> <div style="text-align:center;"> <video src="https://s3.amazonaws.com/media-p.slid.es/videos/105177/rzteE40V/hwacceleration.mp4" width="640" controls=""></video> </div> <p>The numbers we are getting are already pretty nice.
You might not be able to see it clearly in the video, but the renderer CPU time for the non-hardware-accelerated playback is ~8ms, compared to the ~1ms of CPU time that we get with the accelerated version.</p> <h4 id="improved-web-compatibility-of-our-media-elements-implementation">Improved web compatibility of our media elements implementation</h4> <p>We also got a bunch of other smaller features that significantly improved the web compatibility of our media elements.</p> <ul> <li><a href="https://github.com/ferjm">ferjm</a> <a href="https://github.com/servo/servo/pull/22399">added</a> support for the HTMLMediaElement <code class="language-plaintext highlighter-rouge">poster</code> frame attribute.</li> <li><a href="https://github.com/swarnimarun">swarnimarun</a> <a href="https://github.com/servo/servo/pull/23236">implemented</a> support for the HTMLMediaElement <code class="language-plaintext highlighter-rouge">loop</code> attribute.</li> <li><a href="https://github.com/jackxbritton">jackxbritton</a> <a href="https://blog.servo.org/2019/07/09/media-update-h1-2019/">implemented</a> the HTMLMediaElement <code class="language-plaintext highlighter-rouge">crossorigin</code> attribute logic.</li> <li>Servo got the ability to <a href="https://github.com/servo/servo/pull/22347">mute and unmute</a> as well as to control the <a href="https://github.com/servo/servo/pull/22324">volume</a> of audio and video playback thanks to <a href="https://github.com/stevesweetney">stevesweetney</a> and <a href="https://github.com/lucasfantacuci">lucasfantacuci</a>.</li> <li><a href="https://github.com/sreeise">sreeise</a> <a href="https://github.com/servo/servo/pull/22622">implemented</a> the AudioTrack, VideoTrack, AudioTrackList and VideoTrackList interfaces.</li> <li><a href="https://github.com/georgeroman">georgeroman</a> <a href="https://github.com/servo/servo/pull/22449">coded</a> the required changes to allow changing the playback rate of audio and video files.</li> <li><a
href="https://github.com/georgeroman">georgeroman</a>, again, <a href="https://github.com/servo/media/pull/232">implemented</a> support for the HTMLMediaElement <code class="language-plaintext highlighter-rouge">canPlayType</code> function.</li> <li><a href="https://github.com/dlrobertson">dlrobertson</a> paved the way for timed text track support by implementing the basics of the <a href="https://github.com/servo/servo/pull/22392">TextTrack API</a> and the <a href="https://github.com/servo/servo/pull/22563">HTMLTrackElement interface</a>.</li> </ul> <h1 id="webaudio">WebAudio</h1> <p>We also got a few additions in WebAudio land.</p> <ul> <li><a href="https://github.com/PurpleHairEngineer">PurpleHairEngineer</a> <a href="https://github.com/servo/media/pull/243">implemented</a> the StereoPannerNode backend.</li> <li><a href="https://github.com/collares">collares</a> <a href="https://github.com/servo/servo/pull/22648">implemented</a> the DOM side of the ChannelSplitterNode.</li> <li><a href="https://github.com/Akhilesh1996">Akhilesh1996</a> <a href="https://github.com/servo/servo/pull/23259">implemented</a> the AudioParam setValueCurveAtTime function.</li> <li><a href="https://github.com/snarasi6">snarasi6</a> <a href="https://github.com/servo/servo/pull/23279">implemented</a> the deprecated setPosition and setOrientation AudioListener methods.</li> </ul> <h1 id="webrtc">WebRTC</h1> <p>Thanks to <a href="https://github.com/jdm">jdm</a>’s and <a href="https://github.com/Manishearth">Manishearth</a>’s work, Servo now has the foundations of a <a href="https://github.com/servo/servo/pull/23377">WebRTC implementation</a> and is able to perform 2-way calls with audio and video playback coming from the <a href="https://github.com/servo/servo/pull/22780">getUserMedia API</a>.</p> <div style="text-align:center;"> <iframe src="https://player.vimeo.com/video/328247783" width="640" height="392" frameborder="0" allow="autoplay; fullscreen" allowfullscreen=""></iframe>
</div> <h2 id="next-steps">Next steps</h2> <p><em>That’s <strong>not</strong> all folks!</em> We have exciting plans for the second half of 2019.</p> <h1 id="av-playback-1">A/V playback</h1> <p>In A/V playback land, we want to:</p> <ul> <li>Focus on adding hardware-accelerated playback on Windows and macOS.</li> <li>Add support for fullscreen playback.</li> <li>Add support for 360 video.</li> <li>Improve the existing media controls by, for instance, implementing a nicer layout for the <code class="language-plaintext highlighter-rouge">&lt;input type="range"&gt;</code> element, with a proper slider and a thumb, so we can have progress and volume bars.</li> </ul> <h1 id="webaudio-1">WebAudio</h1> <p>For WebAudio there are plans to make some architectural improvements related to the timeline and the graph traversals.</p> <p>We would also love to work on the MediaElementAudioSourceNode implementation.</p> <h1 id="webrtc-1">WebRTC</h1> <p>For WebRTC, data channels are on the roadmap for the second half.</p> <p>We currently support the playback of a single stream of audio and video simultaneously, so allowing the playback of multiple simultaneous streams of each type is also something that we would like to get during the following months.</p> <h1 id="others">Others</h1> <p>There were also plans to implement support for a global mute feature, and I am happy to say that <a href="https://github.com/khodzha">khodzha</a> already <a href="https://github.com/servo/media/pull/271">got this done</a> right at the start of the second half.</p> <p>Finally, we have been trying to get YouTube to work on Servo, but it turned out to be a difficult task because of non-media-related issues (i.e.
layout or web compatibility issues), so we decided to adjust the goal and focus on embedded YouTube support instead.</p> <p><small>Originally published at <a href="https://blog.servo.org/">https://blog.servo.org/</a></small></p> Tue, 09 Jul 2019 00:00:00 +0000 https://ferjm.github.io/media/2019/07/09/media-update-h1-2019.html https://ferjm.github.io/media/2019/07/09/media-update-h1-2019.html media TIDx 2018 talk - Rust 101 <p>In February 2018, I gave an introductory talk about Rust at the TIDx conference.</p> <div style="text-align:center;"> <iframe width="560" height="315" src="https://www.youtube.com/embed/eLYfMDApTVA" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe> </div> Wed, 28 Feb 2018 00:00:00 +0000 https://ferjm.github.io/rust/talk/2018/02/28/tidx-talk-rust-101.html https://ferjm.github.io/rust/talk/2018/02/28/tidx-talk-rust-101.html rust talk Project Link Networking <p>For the last few months, I have been involved in <a href="https://wiki.mozilla.org/Project_Link">Project Link</a>, one of the new research projects from Mozilla’s <a href="https://wiki.mozilla.org/Connected_Devices">Connected Devices</a> group, which aims to create a personal user agent for the smart home.</p> <p>We have recently completed our <a href="https://wiki.mozilla.org/Project_Link#Phase_1">first milestone</a>, where we managed to prototype a device that is able to communicate with a small set of other devices through wireless communication protocols like <a href="https://en.wikipedia.org/wiki/ZigBee">Zigbee</a> and <a href="https://en.wikipedia.org/wiki/Z-Wave">Z-Wave</a>, and that exposes an <a href="https://wiki.mozilla.org/Connected_Devices/Projects/Project_Link/Taxonomy#Current_REST_API">HTTP API</a> for clients to get moderated access to these devices through the Link hub.
As of today, we are able to set up a Link device in a network where other devices, like a set of smart light bulbs, a smart door lock and a motion sensor, are connected, and we are able to create rules, from inside and outside of that network, to do things like turning off the lights, locking the door and sending a notification when the motion sensor detects that the user leaves her home.</p> <p>Making Link communicate with the different devices through Zigbee or Z-Wave was certainly not an easy task, and it required a lot of effort from many members of the team. But it was something that we somehow knew we could do. In the end, these are known protocols, and even if we had to write a lot of code from scratch because of the choice of technology (<a href="https://www.rust-lang.org/">Rust</a>), there are already a lot of products in the market based on these technologies and a few examples of code that we could take as a starting point for our work.</p> <p>To me, one of the most interesting challenges that we had to face during this initial stage of the project has been how to discover and securely connect to Link (a.k.a. <em>the box</em>) from the client side while keeping a decent UX.</p> <p>As Mozillians, <a href="https://www.youtube.com/watch?v=Aw4mTrFW9sU">we believe in the power of the web</a>, so one of our self-imposed initial requirements for this project was that we wanted our <a href="https://github.com/fxbox/app">client demo application</a> to be written entirely with web technologies. We wanted to make this client potentially able to run on any platform with a modern web browser.
And there were also other requirements:</p> <ul> <li>This client had to be able to access Link locally, from the same network the Link device was running on, but also remotely, from outside of that network.</li> <li>The connection between Link and the client had to be securely encrypted in both cases (local and remote access).</li> <li>And both things needed to happen seamlessly and transparently for the user.</li> </ul> <p><a href="http://michielbdejong.com/">Michiel B. de Jong</a> did excellent research on the discovery and secure connection area and proposed a few <a href="https://github.com/fxbox/RFC/issues/3">different solutions</a> to these problems, which included different combinations of cloud services, <a href="https://en.wikipedia.org/wiki/QR_code">QR codes</a>, <a href="https://letsencrypt.org/">Let’s Encrypt</a>, <a href="https://en.wikipedia.org/wiki/Multicast_DNS">mDNS</a> and other technologies and protocols.</p> <p>While we have not ruled out implementing other of these proposals in the next phases of the project, for the initial prototype we ended up choosing a solution that most of the team considered to have a good balance between security, privacy and a user-friendly experience, and that could work cross-platform and cross-browser, taking advantage of the full power of the web.</p> <p><strong>Discovering the box</strong></p> <p>For the discovery part, we implemented the same mechanism that Philips uses to discover their <a href="http://www.developers.meethue.com/documentation/getting-started">Hue Lights Bridge</a>. They call this <em>nUPNP</em> (network UPnP), and it is pretty simple. It requires Link to periodically register itself with a server in the cloud that has a known URL for the client. The data that is stored for this registration is a match between Link’s public and local IP addresses.
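Conceptually, the registration server just keeps, per public IP, the list of boxes that last registered from that IP. A rough sketch of that idea, in hypothetical TypeScript (the `RegistrationStore` class and its method names are illustrative, not the actual fxbox registration server code):

```typescript
// Hypothetical sketch of an nUPNP-style registration store.
// A box periodically registers itself; clients are later matched to
// boxes by the public IP their requests come from.

interface Registration {
  publicIp: string;    // public IP the registration request came from
  localOrigin: string; // the box's address inside its local network
  timestamp: number;   // when the box last registered
}

class RegistrationStore {
  private byPublicIp = new Map<string, Registration[]>();

  // Called periodically by each box; replaces any previous entry for
  // the same local origin behind the same public IP.
  register(reg: Registration): void {
    const existing = this.byPublicIp.get(reg.publicIp) ?? [];
    const others = existing.filter((r) => r.localOrigin !== reg.localOrigin);
    this.byPublicIp.set(reg.publicIp, [...others, reg]);
  }

  // The ping lookup: a client calling from `clientPublicIp` gets back
  // the boxes that registered from the same public IP, i.e. the boxes
  // that live in the same local network as the client.
  ping(clientPublicIp: string): Registration[] {
    return this.byPublicIp.get(clientPublicIp) ?? [];
  }
}
```

The real server also associates a client fingerprint and a tunnel origin with each box, as the sample <em>ping</em> response shown later in the post illustrates.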
To get the local address, the client just needs to make an HTTP GET request to the registration server’s <em>ping</em> endpoint, which should return a JSON object containing this information. This request has to be made from the same network Link is connected to.</p> <p><strong>Securely connecting to the box</strong></p> <p>Unfortunately, we cannot securely connect to local IP addresses through HTTPS. At least not with a proper UX that would not require a terrified user to accept warnings about <a href="https://support.cdn.mozilla.net/media/uploads/gallery/images/2011-10-19-09-09-25-5809bb.jpg">insecure connections</a>, and even in that case (with a self-signed certificate), it would be quite a poor security solution. We needed host names and a trusted <a href="https://en.wikipedia.org/wiki/Certificate_authority">CA</a> for this. And this is where Let’s Encrypt and <a href="https://blog.filippo.io/how-plex-is-doing-https-for-all-its-users/">Plex’s solution</a> enter the game.</p> <p>We heard about this company called Plex that has a very similar use case to ours and offers secure TLS connections to all their users. They have media servers that users can self-host on their machines and access securely from other devices. You can read about the details of Plex’s implementation in this <a href="https://blog.filippo.io/how-plex-is-doing-https-for-all-its-users/">blog post</a> and see how it slightly differs from ours.</p> <p><strong>Remotely accessing the box (a.k.a. tunneling)</strong></p> <p>To provide remote access to Link for those users who choose to have this kind of feature, we initially tried to use <a href="https://ngrok.com/">ngrok</a>, but we found out that they do not support <a href="https://es.wikipedia.org/wiki/Server_Name_Indication">SNI</a> in their open source version.
So we ended up moving to <a href="https://pagekite.net/">PageKite</a>, which offers the same core functionality but also provides SNI support.</p> <p><strong>Putting it all together</strong></p> <p>With all the above, we ended up implementing the following bootstrap process for Link:</p> <ol> <li>Link exposes HTTP and WebSockets services.</li> <li>The first thing Link does is generate a self-signed certificate that becomes its identifier.</li> <li>It connects to an <a href="https://github.com/fxbox/dns-server">API</a> on <code class="language-plaintext highlighter-rouge">knilxof.org</code> (our dev server) to create its public DNS zone under <code class="language-plaintext highlighter-rouge">&lt;fingerprint&gt;.knilxof.org</code>, using its self-signed certificate as a client certificate. The API server checks the fingerprint from the DNS zone edit request against the fingerprint of the client certificate presented.</li> <li>Now that the box has a public DNS zone it can control, it can get a Let’s Encrypt certificate, using the <a href="https://letsencrypt.github.io/acme-spec/#rfc.section.7.4">DNS-01 challenge</a>.</li> <li>Link sets its main DNS A record to its current <strong>local</strong> IP address, which it obtained via DHCP earlier. It will update this A record whenever its local IP address changes.</li> <li>It also sets two or more mirror A records to its current local IP address. The idea here is that only one of the records will be cached by caching DNS servers, so switching to the other one at the right time will avoid downtime due to DNS propagation delays.
This is currently not implemented.</li> <li>If Link is setup to allow remote access, it starts up a PageKite client, which connects to a PageKite frontend, and adds the IP address of the public interface to the PageKite frontend into its DNS zone.</li> <li>With the local, mirrors and tunneled URLs, Link sends a registration request to the nUPNP like <a href="https://github.com/fxbox/registration_server">registration server</a>.</li> </ol> <p>After the above process is completed, when the user browses to our <a href="https://github.com/fxbox/app">client demo application</a>, the app makes a cross-origin request to the registration server <em>ping</em> endpoint to obtain the URLs the app can use to securely connect to Link.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GET /ping HTTP/1.1 HTTP/1.1 200 OK Access-Control-Allow-Origin: <span class="k">*</span> Access-Control-Allow-Headers: accept, authorization, content-type Content-Type: application/json<span class="p">;</span> <span class="nv">charset</span><span class="o">=</span>utf-8 Access-Control-Allow-Methods: GET, POST, PUT Content-Length: 312 Date: Fri, 22 Apr 2016 14:39:44 GMT <span class="o">[</span> <span class="o">{</span> <span class="s2">"public_ip"</span>:<span class="s2">"88.xxx.xxx.xxx"</span>, <span class="s2">"client"</span>:<span class="s2">"80a3c3ff0ffc7da455214fe7daaed9216bc4a5a6"</span>, <span class="s2">"message"</span>: <span class="o">{</span> <span class="s2">"local_origin"</span>: <span class="s2">"https://local.80a3c3ff0ffc7da455214fe7daaed9216bc4a5a6.box.knilxof.org:3000"</span>, <span class="s2">"tunnel_origin"</span>:<span class="s2">"https://remote.80a3c3ff0ffc7da455214fe7daaed9216bc4a5a6.box.knilxof.org"</span> <span class="o">}</span>, <span class="s2">"timestamp"</span>:1461335726 <span class="o">}</span> <span class="o">]</span> </code></pre></div></div> <p>The connection to the box is completely seamless for the user as she is 
never asked to enter a URL or to add any security exception in her browser.</p> <p><img src="/content/images/2016/04/Screen-Shot-2016-04-22-at-5-09-24-PM.png" alt="" /></p> <p><strong>Credits</strong></p> <p>Most of the design and implementation work was done by <a href="http://michielbdejong.com/">Michiel B. de Jong</a> and <a href="https://twitter.com/samuelgiles_">Sam Giles</a>.</p> Fri, 22 Apr 2016 00:00:00 +0000 https://ferjm.github.io/mozilla/iot/networking/2016/04/22/project-link-networking.html https://ferjm.github.io/mozilla/iot/networking/2016/04/22/project-link-networking.html mozilla iot networking Improving the Firefox OS Contacts application start-up time <p>One of the biggest challenges that we have in Firefox OS is <a href="https://developer.mozilla.org/en-US/Apps/Build/Performance/Performance_fundamentals">performance</a>. We have been fighting it since day one, and by applying some <a href="https://developer.mozilla.org/en-US/Apps/Build/Performance/Optimizing_startup_performance">different techniques</a> we managed to get to a point where we have some very decent <a href="https://datazilla.mozilla.org/b2g">application start-up time numbers</a>.</p> <p>The last application to get a considerable performance boost has been the Contacts application.</p> <p>During the last few weeks, the Contacts team <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1112551">has been working</a> on a <a href="https://github.com/mozilla-b2g/gaia/commit/f1d0684817e5802961c02a04dcf667cfaf09d6ee">patch</a> that finally landed on master yesterday.
The result is an improvement of around 720 milliseconds of <a href="https://developer.mozilla.org/en-US/Apps/Build/Performance/Firefox_OS_app_responsiveness_guidelines#Stages">perceived start-up time</a>, which means that we saved almost 50% of the previous start-up time.</p> <p><a href="https://datazilla.mozilla.org/b2g/?branch=master&amp;device=flame-319MB&amp;range=7&amp;test=startup_%3E_moz-app-visually-complete&amp;app_list=communications/contacts&amp;app=communications/contacts&amp;gaia_rev=9645d45d5777880e&amp;gecko_rev=f6259882882b&amp;plot=median">Datazilla</a> already shows the change.</p> <p><img src="/content/images/2015/03/contactsperfimprovement-1.png" alt="Datazilla changes" /></p> <p><a href="https://github.com/stasm/test-perf-summary">Comparing</a> the results of running the <a href="https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_performance_tests">Gaia performance tests</a> with a heavy workload before and after the patch we get the following numbers:</p> <table> <thead> <tr> <th style="text-align: center">communications/contacts (means in ms)</th> <th>Base</th> <th>Patch</th> <th>Δ</th> </tr> </thead> <tbody> <tr> <td style="text-align: center">moz-chrome-dom-loaded</td> <td>1147</td> <td>585</td> <td>-562</td> </tr> <tr> <td style="text-align: center">moz-chrome-interactive</td> <td>1267</td> <td>1393</td> <td>126</td> </tr> <tr> <td style="text-align: center">moz-app-visually-complete</td> <td>1601</td> <td>874</td> <td>-727</td> </tr> <tr> <td style="text-align: center">moz-content-interactive</td> <td>2131</td> <td>1393</td> <td>-738</td> </tr> <tr> <td style="text-align: center">moz-app-loaded</td> <td>10942</td> <td>10409</td> <td>-533</td> </tr> </tbody> </table> <p>As you can see we are sending the <code class="language-plaintext highlighter-rouge">moz-app-visually-complete</code> event ~727 milliseconds earlier than before. 
This is the event that we use to indicate that the application appears visually ready for user interaction, and the one that we really want to send as soon as possible. We also get similar improvements for the <code class="language-plaintext highlighter-rouge">moz-chrome-dom-loaded</code>, <code class="language-plaintext highlighter-rouge">moz-content-interactive</code> and <code class="language-plaintext highlighter-rouge">moz-app-loaded</code> events. You can also notice that we had to make some trade-offs and we lost some ground with the <code class="language-plaintext highlighter-rouge">moz-chrome-interactive</code> event. If you look closely, you will see that chrome and content are marked as interactive at the same time. This was not happening before. We were not able to interact with the application content until almost one second after being able to interact with the application chrome. Now we have everything ready at once, ~700 milliseconds earlier, and given that the most important part of the Contacts application is the contacts data itself, we consider that the small loss in the <code class="language-plaintext highlighter-rouge">moz-chrome-interactive</code> event is worth it. You can check the <a href="https://developer.mozilla.org/en-US/Apps/Build/Performance/Firefox_OS_app_responsiveness_guidelines#Stages">MDN responsiveness guidelines</a> page for more details about these events.</p> <p>I recorded a quick video comparing the previous situation (left) with the current one (right). (Apologies for the low quality of the recording.)</p> <iframe title="vimeo-player" src="https://player.vimeo.com/video/121901924" width="710" height="360" frameborder="0" allowfullscreen=""></iframe> <h2 id="how-did-we-get-there">How did we get there</h2> <p>The target was to have some usable UI ready as soon as possible before the browser painted anything on the screen.
For us, this usable UI is the application chrome with the <code class="language-plaintext highlighter-rouge">Add contact</code> and <code class="language-plaintext highlighter-rouge">Settings</code> options and the first chunk of contacts, including favorite and <a href="http://en.wikipedia.org/wiki/In_case_of_emergency">ICE</a> contacts.</p> <p>So far, we were not doing badly at showing the application chrome, but we were taking extra time to load the first group of visible contacts. To show this first content, we needed to make a request to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Navigator/mozContacts">MozContacts API</a> to obtain the list of stored contacts and start appending one new node to the DOM for each contact retrieved. The thing is that the result of this request rarely changed from one execution to the next. So why not cache it?</p> <p>We followed the same approach that the Email team already applied in the <a href="https://groups.google.com/forum/#!topic/mozilla.dev.gaia/v_jVuwOJMKI">Email application</a> for caching the email list. We used <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/localStorage">localStorage</a> to save the result of getting the contacts list from the MozContacts API and rendering the first chunk of contacts. To avoid having to do object serialization and parsing before and after accessing the localStorage item, we initially tried storing the whole <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/outerHTML">outerHTML</a> string of the <a href="https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/index.html#L218">contacts groups container</a> holding the first chunk of contacts and applying it via <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML">innerHTML</a>, but that did not give good enough performance and it made the logic for managing the contacts cache harder.
Also, in the end we figured out that we needed to store other information, like the language direction or the cache date, along with the HTML to decide whether the cache was valid or not, so object serialization and parsing was required in any case. Instead of that, we ended up storing an object with this information to ease the cache eviction decision and enough information to rebuild the DOM containing the first chunk of contacts. We applied this data to the DOM via <a href="https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment">documentFragment</a>. You can check out the code for <a href="https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/views/list.js#L2274">building</a> and <a href="https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/bootstrap.js#L173">applying</a> the cache.</p> <p>The trickiest part of maintaining this cache is the eviction policy. We need to evict and rebuild the cache every time a contact is changed (added, removed or edited), and because this can happen from inside and from outside of the Contacts app (even when the app is closed), we need to be especially careful and verify the cache after applying it to the DOM without affecting the performance or causing visual reflows. You can follow <a href="https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/views/list.js#L2375">this code</a> to see how we managed to do that. Other scenarios where we need to evict the cache are language direction changes, <a href="http://en.wikipedia.org/wiki/In_case_of_emergency">ICE</a> contacts changes, favorite contacts modifications and when the user changes the way the contacts are displayed (by first or last name).</p> <p>Apart from building the cache mechanism, we also changed the application bootstrap process so that we only load the minimum set of scripts required to get the cached information from localStorage and to apply it in the DOM.
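The cached object approach described above can be sketched roughly as follows. This is a hypothetical TypeScript simplification: the function names, the entry fields and the `KVStore` abstraction (standing in for `window.localStorage`) are illustrative, not the actual Gaia code.

```typescript
// Sketch of a localStorage-backed render cache with eviction on
// language direction change. Illustrative names, not the Gaia code.

interface CacheEntry {
  createdAt: number;         // when the cache was built
  languageDirection: string; // "ltr" | "rtl" at build time
  firstChunk: string[];      // pre-rendered markup for the first contacts
}

// Minimal storage interface so the sketch runs outside a browser;
// in the real app this role is played by window.localStorage.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const CACHE_KEY = "contacts-cache";

function buildCache(store: KVStore, firstChunk: string[], dir: string): void {
  const entry: CacheEntry = {
    createdAt: Date.now(),
    languageDirection: dir,
    firstChunk,
  };
  // One serialized object: a single read/parse at startup.
  store.setItem(CACHE_KEY, JSON.stringify(entry));
}

// Apply the cache optimistically at startup. Returns the pre-rendered
// chunk, or null (after evicting) when the entry is no longer valid.
function applyCache(store: KVStore, currentDir: string): string[] | null {
  const raw = store.getItem(CACHE_KEY);
  if (raw === null) return null;
  const entry: CacheEntry = JSON.parse(raw);
  if (entry.languageDirection !== currentDir) {
    store.removeItem(CACHE_KEY); // evict: stale layout direction
    return null;
  }
  return entry.firstChunk;
}

// In-memory KVStore so the sketch is self-contained.
function memoryStore(): KVStore {
  const m = new Map<string, string>();
  return {
    getItem: (k) => (m.has(k) ? m.get(k)! : null),
    setItem: (k, v) => { m.set(k, v); },
    removeItem: (k) => { m.delete(k); },
  };
}
```

A minimal bootstrap path would call something like `applyCache` before loading the rest of the scripts, and the full app would later verify the applied markup against the real MozContacts data, evicting and rebuilding on any mismatch.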
Once this process is completed, we load the rest of the application JavaScript required to continue the boot process. You can see this logic in the new <a href="https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/bootstrap.js#L410">bootstrap</a> script.</p> <h2 id="next-steps">Next steps</h2> <p>We want to keep improving the performance of the Contacts application. The next thing we want to target is improving the loading of the contacts thumbnails. In fact, <a href="https://twitter.com/mepartoconmigo">Francisco Jordano</a> has already started working on <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1089538">it</a> and there are some visible improvements already.</p> <iframe width="710" height="381" src="https://www.youtube-nocookie.com/embed/lOx-Ym2qUlM" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe> <p>We also want to experiment with caching most of, or even the whole, contacts list in different chunks, to let the user use the alpha scrolling and to get a fully loaded application even sooner.</p> <p>Finally, the Gaia team is starting to play around with <a href="http://www.html5rocks.com/en/tutorials/service-worker/introduction/">Service Workers</a> and with the idea of using this new feature to cache already rendered views in a similar way to what we did for Contacts. 
I cannot wait to see more progress in this area :)</p> <h2 id="credits">Credits</h2> <p><a href="https://twitter.com/mepartoconmigo">Francisco Jordano</a>, <a href="https://github.com/JohanLorenzo">Johan Lorenzo</a>, <a href="http://sergimansilla.com/">Sergi Mansilla</a>.</p> Wed, 11 Mar 2015 00:00:00 +0000 https://ferjm.github.io/firefoxos/performance/2015/03/11/improving-fxos-contacts-application-start-up-time.html https://ferjm.github.io/firefoxos/performance/2015/03/11/improving-fxos-contacts-application-start-up-time.html firefoxos performance Behind the scenes of the new Web Payments API from Mozilla <p>When we started working on <a href="http://blog.digital.telefonica.com/?press-release=telefonica-outlines-launch-plans-for-firefox-os">Firefox OS</a>, we realized that one of the biggest challenges would be enabling web application developers to securely monetise their content, not only for Firefox OS but for the Open Web in general.</p> <p>We were looking for the same seamless experience that developers find in existing mobile app stores but we wanted to avoid tying them to any store or proprietary solution, while also allowing them to use the same payment mechanisms in desktop and mobile. We also had the challenge of easing the user’s payment process by adding carrier billing capabilities, along with other payment methods like credit cards, to this solution.</p> <p>The Mozilla Marketplace team already had an experimental feature based on <a href="https://developers.google.com/commerce/wallet/digital/docs/index">google.payments.inapp.buy</a> to allow developers to add the capability for in-app payments to their apps. However, this solution was tied to the Firefox Marketplace and, as with Google’s solution, it involved the injection of a JS shim in the application code. 
So even though we liked this approach, we needed to modify it to fulfill our self-imposed requirements.</p> <p>With <a href="http://andreasgal.com/">Andreas Gal’s</a> help, we started writing the first draft of <a href="https://docs.google.com/document/d/1NLKbHVPQXa9uvDBC3cfgOD7sIrtIxi0qDoXMQrxcCsI/edit">navigator.mozPay()</a> with the intention of proposing the first steps for an API that allows Open Web Apps to initiate payment requests from the user for digital goods with multiple payment providers and carrier billing options.</p> <p>Once we had an agreement from both the Telefónica and Mozilla teams, we started implementing it for Firefox OS with valuable support from <a href="https://github.com/fabricedesre">Fabrice Desré</a> and <a href="https://github.com/kumar303">Kumar McMillan</a>. Along with this work, the <a href="http://bluevia.com/">BlueVia</a> and Firefox Marketplace teams, one after the other, also started working on first implementations of the WebPaymentProvider <a href="https://wiki.mozilla.org/WebAPI/WebPaymentProvider">spec</a>, and we started working with <a href="http://bango.com/">Bango</a> to enable them as a payment partner.</p> <p><strong>How it works</strong></p> <p>The navigator.mozPay API allows the developer to create payment requests for different payment providers to charge a user for the purchase of a digital good. To create each payment request, the developer needs to generate a <a href="http://openid.net/specs/draft-jones-json-web-token-07.html">JSON Web Token (JWT)</a> for each payment provider, signed with the Application Secret given by the corresponding provider. 
This token contains the details of the payment request, including the Application Key, which uniquely identifies the developer and the product being sold.</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"iss"</span><span class="p">:</span><span class="w"> </span><span class="err">APPLICATION_KEY</span><span class="p">,</span><span class="w"> </span><span class="nl">"aud"</span><span class="p">:</span><span class="w"> </span><span class="s2">"marketplace.firefox.com"</span><span class="p">,</span><span class="w"> </span><span class="err">...</span><span class="w"> </span><span class="nl">"request"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Magical Unicorn"</span><span class="p">,</span><span class="w"> </span><span class="nl">"pricePoint"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"postbackURL"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://yourapp.com/postback"</span><span class="p">,</span><span class="w"> </span><span class="nl">"chargebackURL"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://yourapp.com/chargeback"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>Applications using navigator.mozPay asynchronously receive responses about the completion of payment requests through JavaScript callbacks and through POST notifications made by the payment provider to the URLs specified by the developer within the JWT request as postbackURL (for payments) and chargebackURL (for refunds) parameters. 
The application must only rely on the server-side notification to determine the result of a purchase.</p> <div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kd">const</span> <span class="nx">request</span> <span class="o">=</span> <span class="nb">navigator</span><span class="p">.</span><span class="nx">mozPay</span><span class="p">([</span><span class="nx">signedJWT1</span><span class="p">,</span> <span class="nx">signedJWTn</span><span class="p">]);</span> <span class="nx">request</span><span class="p">.</span><span class="nx">onsuccess</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">evt</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Payment flow successfully completed </span><span class="p">${</span><span class="nx">evt</span><span class="p">.</span><span class="nx">target</span><span class="p">.</span><span class="nx">result</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span> <span class="c1">// The payment buy flow completed without errors.</span> <span class="c1">// This does NOT mean the payment was successful.</span> <span class="nx">waitForServerPostback</span><span class="p">();</span> <span class="p">};</span> <span class="nx">request</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">evt</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="s2">`navigator.mozPay() error: </span><span class="p">${</span><span class="nx">evt</span><span class="p">.</span><span class="nx">target</span><span class="p">.</span><span class="nx">errorMsg</span><span class="p">.</span><span class="nx">name</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span> <span class="p">};</span> </code></pre></div></div> <p>For more in-depth documentation, read the <a href="https://wiki.mozilla.org/WebAPI/WebPayment">navigator.mozPay()</a> spec and the Firefox Marketplace <a href="https://developer.mozilla.org/en-US/docs/Apps/Publishing/In-app_payments">guide</a> to in-app payments.</p> <p>The Firefox Marketplace itself uses navigator.mozPay() to request payments for application purchases, so it is a good proof of concept of the WebPayments API in use.</p> <p><strong>What next?</strong></p> <p>As Mozilla wrote on their <a href="https://hacks.mozilla.org/2013/04/introducing-navigator-mozpay-for-web-payments/">blog</a> last week, navigator.mozPay() is an experimental API and just a first step towards an Open Web Standard for payments. We are already working on some improvements, such as removing the <a href="https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.webapps/0vUFHASyWB4">server pre-requisite</a> entirely and providing a better user experience for the <a href="https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.b2g/4-FVBgM577I">payment flow</a> on the payment provider’s side.</p> <p>The plan is to keep working closely with Mozilla and others through the W3C to make a flexible API for payments part of the Open Web Standards.</p> <p>After the launch of the first Firefox OS devices, we will be helping Mozilla add support for navigator.mozPay() to Firefox desktop and Firefox for Android.</p> <p><small>Originally published at <a href="http://en.blogthinkbig.com/2013/04/09/mozilla-web-payments-api/">BlogThinkBig</a></small></p> Tue, 09 Apr 2013 00:00:00 +0000 https://ferjm.github.io/web/payments/api/firefoxos/telefonica/mozilla/2013/04/09/behind-scenes-web-payments-api.html https://ferjm.github.io/web/payments/api/firefoxos/telefonica/mozilla/2013/04/09/behind-scenes-web-payments-api.html web payments api firefoxos telefonica mozilla