Ending Android USB Freezes

In 2024 we undertook an effort to optimize the time taken to navigate to our Select Payment Method screen. This is one of our most common navigations — it’s critical that it completes in a timely manner.

We found that devices with USB barcode scanners occasionally had egregiously bad performance. We attached USB barcode scanners to our development devices and captured Perfetto traces while simulating payments.

We weren’t able to reproduce the egregiously bad performance every time, but we found instances where our main thread was blocked on a somewhat mysterious Perfetto span named IncrementalDisableThreadFlip which isn't part of our own codebase — hinting at a deeper system-level problem.

In this particular trace, this span took 867ms, which is much longer than the entire navigation takes under normal circumstances. Clearly this was problematic.

Digging deeper to find the root cause

To better understand the cause, we examined what other threads in the process were doing during the delay. Two stood out, and both were blocked at the same time.

The thread name HeapTaskDaemon and the span name Background concurrent copying GC were a strong hint that this was related to garbage collection. The other thread involved was Sq-usb_barcode_30652. That was not surprising, since we knew the problem involved a USB barcode scanner, but we didn’t know what the scanner had to do with garbage collection, and the Perfetto trace provided very little detail. We collected a Simpleperf trace to see the stack while the thread was blocked, then looked through each method on the stack for hints:


    at __schedule
    at __schedule
    at schedule
    at schedule_timeout
    at wait_for_common
    at wait_for_completion_timeout
    at usb_start_wait_urb
    at usb_bulk_msg
    at proc_bulk
    at usbdev_do_ioctl
    at usbdev_ioctl
    at do_vfs_ioctl
    at sys_ioctl
    at __sys_trace
    at __ioctl
    at ioctl
    at usb_device_bulk_transfer
    at android_hardware_UsbDeviceConnection_bulk_request
    at android.hardware.usb.UsbDeviceConnection.bulkTransfer
    at android.hardware.usb.UsbDeviceConnection.bulkTransfer
    at com.squareup.cdx.barcodescanners.UsbBarcodeBulkTransferCommunication.transferBytes
    at com.squareup.cdx.barcodescanners.UsbBarcodeScanner$start$usbRequestRunnable$1.run
    at java.util.concurrent.ThreadPoolExecutor.runWorker
    at java.util.concurrent.ThreadPoolExecutor$Worker.run
    at com.squareup.thread.Threads$namedThreadFactory$1$newThread$1.invoke
    at com.squareup.thread.Threads$namedThreadFactory$1$newThread$1.invoke
    at kotlin.concurrent.ThreadsKt$thread$thread$1.run
    at __pthread_start
    at __start_thread

One method that caught our attention was android_hardware_UsbDeviceConnection_bulk_request, which is native code in the Android framework:


    ...
    jbyte* bufferBytes = NULL;
    if (buffer) {
        bufferBytes = (jbyte*)env->GetPrimitiveArrayCritical(buffer, NULL);
    }

    jint result = usb_device_bulk_transfer(device, endpoint, bufferBytes + start, length, timeout);

    if (bufferBytes) {
        env->ReleasePrimitiveArrayCritical(buffer, bufferBytes, 0);
    }
    ...

GetPrimitiveArrayCritical/ReleasePrimitiveArrayCritical are Java Native Interface (JNI) APIs. The Critical suffix indicates that special caution is warranted. The API documentation states:

After calling GetPrimitiveArrayCritical, the native code should not run for an extended period of time before it calls ReleasePrimitiveArrayCritical. We must treat the code inside this pair of functions as running in a "critical region." Inside a critical region, native code must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread. (For example, the current thread must not call read on a stream being written by another Java thread.)

These restrictions make it more likely that the native code will obtain an uncopied version of the array, even if the VM does not support pinning. For example, a VM may temporarily disable garbage collection when the native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical.

Garbage Collection theory review

Garbage Collection (GC) is the process by which a runtime (like ART — the Android Runtime) automatically reclaims unused memory. In a "concurrent copying GC" the garbage collector copies live objects from one part of the heap to another, updating object references in the process. These parts of the heap are commonly called from-space and to-space. At the end of the copying process, all the live objects are in the to-space — allowing the from-space to be completely reclaimed.

The GC is considered concurrent because the garbage collector cooperates closely with the rest of the VM so that application threads can keep running in parallel with it. The mechanisms that do this are interesting, but out of scope for this blog post. Suffice it to say there is a critical point at which the garbage collector must briefly pause each thread in order to flip it: that is, scan through its stack and update each object reference to point to the corresponding location in the to-space.
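
To make the from-space and to-space idea concrete, here is a minimal sketch of a copying collector in C++. It is a stop-the-world toy, not a concurrent collector and not ART's implementation. The Obj layout, the Heap class, and the explicit forward field are all assumptions made for illustration. It shows live objects being copied, a forwarding pointer being left behind in each original, and references being redirected to the new copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object layout: a payload, at most one outgoing reference,
// and a forwarding slot standing in for header bits in a real collector.
struct Obj {
    int value;
    Obj* ref;      // reference to another object, or nullptr
    Obj* forward;  // set once this object has been copied to to-space
};

struct Heap {
    std::vector<Obj> fromSpace;
    std::vector<Obj> toSpace;

    explicit Heap(std::size_t capacity) {
        // Reserve up front so that pointers into the spaces stay valid.
        fromSpace.reserve(capacity);
        toSpace.reserve(capacity);
    }

    Obj* allocate(int value, Obj* ref = nullptr) {
        assert(fromSpace.size() < fromSpace.capacity());  // toy heap is fixed-size
        fromSpace.push_back(Obj{value, ref, nullptr});
        return &fromSpace.back();
    }

    // Copy one object into to-space (at most once) and leave a
    // forwarding pointer in the original.
    Obj* copy(Obj* obj) {
        if (obj == nullptr) return nullptr;
        if (obj->forward == nullptr) {
            toSpace.push_back(Obj{obj->value, obj->ref, nullptr});
            obj->forward = &toSpace.back();
        }
        return obj->forward;
    }

    // Copy everything reachable from the roots, fix up references,
    // then reclaim from-space wholesale.
    void collect(std::vector<Obj*>& roots) {
        toSpace.clear();
        for (Obj*& root : roots) root = copy(root);
        // Scan the copies; toSpace may grow while we walk it.
        for (std::size_t i = 0; i < toSpace.size(); ++i) {
            toSpace[i].ref = copy(toSpace[i].ref);
        }
        fromSpace.clear();        // everything left behind is garbage
        fromSpace.swap(toSpace);  // the survivors become the new from-space
    }
};
```

In this sketch, an object reachable from the roots ends up in the new space with its ref pointing at the relocated copy, while anything unreachable disappears when the old from-space is cleared.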

Most JNI APIs are carefully designed to include a level of indirection — C/C++ code cannot access Java objects directly. Instead C/C++ code calls JNI methods like GetByteField to retrieve a field of type byte. This level of indirection allows the VM to direct the access to the correct space, cooperating with the GC to allow objects to be moved.

Critical JNI APIs throw a wrench in the mix. The GC updates object references using temporary forwarding pointers placed in the original (from-space) object headers — but these cannot be created until the object is moved, and the object cannot be moved while it is held in place via GetPrimitiveArrayCritical. Thus, the thread flip is blocked until a corresponding ReleasePrimitiveArrayCritical is called for each prior GetPrimitiveArrayCritical. This is what the following line from the API documentation refers to:

a VM may temporarily disable garbage collection when the native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical

When used carefully, Critical JNI APIs can avoid unnecessary copies, leading to better performance. But if used incorrectly, these APIs can cause problems far worse than the copies they strive to avoid.

Applying the theory to our problem at hand

android_hardware_UsbDeviceConnection_bulk_request was calling usb_device_bulk_transfer which was taking a long time to complete. This was a clear violation of the GetPrimitiveArrayCritical contract.

The particular barcode scanner we were using would block waiting for new data. We called it with a timeout value of one second. That's not a problem with the barcode scanner — USB transfers are allowed to block and/or take a long time. It's incorrect for android_hardware_UsbDeviceConnection_bulk_request to hold an array from GetPrimitiveArrayCritical while calling an API that can block like this.

The last piece of the puzzle: the GC cannot complete the thread flip until all objects have been relocated to to-space, and it can't relocate arrays until ReleasePrimitiveArrayCritical is called for each outstanding GetPrimitiveArrayCritical. Meanwhile, the main thread was itself attempting to call GetPrimitiveArrayCritical while constructing a view, and the GC was blocking that call until it finished the thread flip. The main thread's stack shows where it was stuck:


    at art::JNI::GetPrimitiveArrayCritical
    at art::(anonymous namespace)::CheckJNI::GetPrimitiveArrayCritical
    at android::NativeApplyStyle
    at android.content.res.AssetManager.applyStyle
    at android.content.res.ResourcesImpl$ThemeImpl.obtainStyledAttributes
    at android.content.res.Resources$Theme.obtainStyledAttributes
    at android.content.Context.obtainStyledAttributes
    at android.view.View.<init>
    at android.widget.TextView.<init>
    ...
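
The chain of waits described in this section can be modeled with a counter and a flag. The following C++ sketch is a simplified model under stated assumptions, not ART's actual code: ThreadFlipGate and all of its method names are hypothetical. enterCritical and exitCritical stand in for GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical, while beginFlip and endFlip stand in for the start and end of the GC's thread flip.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Simplified model of how critical regions and the thread flip wait on
// each other. All names are hypothetical; this is not ART's code.
class ThreadFlipGate {
public:
    // Stands in for GetPrimitiveArrayCritical. A new critical region
    // may not start while a flip is pending or in progress.
    void enterCritical() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !flipPending_; });
        ++criticalCount_;
    }

    // Stands in for ReleasePrimitiveArrayCritical.
    void exitCritical() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(criticalCount_ > 0);  // every exit must match an enter
        --criticalCount_;
        changed_.notify_all();
    }

    // Called by the "GC": announce the flip, then wait until every
    // outstanding critical region has been released.
    void beginFlip() {
        std::unique_lock<std::mutex> lock(mutex_);
        flipPending_ = true;
        changed_.wait(lock, [this] { return criticalCount_ == 0; });
    }

    void endFlip() {
        std::lock_guard<std::mutex> lock(mutex_);
        flipPending_ = false;
        changed_.notify_all();
    }

    // Non-blocking probes for observing the model from a single thread.
    bool flipWouldWait() {
        std::lock_guard<std::mutex> lock(mutex_);
        return criticalCount_ > 0;
    }

    bool enterWouldWait() {
        std::lock_guard<std::mutex> lock(mutex_);
        return flipPending_;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    int criticalCount_ = 0;
    bool flipPending_ = false;
};
```

In this model, if a USB thread stays between enterCritical and exitCritical for the length of a one-second transfer, beginFlip waits for that whole transfer, and a main-thread call to enterCritical then waits behind the pending flip. That matches the pattern seen in the traces.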

Fixing the problem in AOSP

This problematic use of GetPrimitiveArrayCritical has been in AOSP since 2013, but it didn't actually become a problem until a year and a half later when Lollipop was released, which included Android's first moving garbage collector (you can learn about the history of garbage collection on Android here).

We prepared a change to use a non-Critical API variant within android_hardware_UsbDeviceConnection_bulk_request. The problem occurred on our own hardware, so we could have shipped a fix independently of AOSP, but we chose to upstream our fix too. In addition to keeping our sources in sync with AOSP, we also benefited from more thorough code reviews and testing, which uncovered a subtle flaw in our first attempt. Others had reported the same issue years earlier, so they will benefit from this fix too.
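
One non-Critical pattern is to stage the data in a native scratch buffer, so that nothing is pinned while the transfer blocks. The sketch below illustrates that idea in plain C++ and should not be read as the upstream change itself. transferViaCopy, Direction, and Transfer are hypothetical names, a std::vector stands in for the Java byte[], and the two std::copy_n calls mark where real JNI code would call GetByteArrayRegion and SetByteArrayRegion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

enum class Direction { In, Out };

// Stand-in for a blocking call such as usb_device_bulk_transfer: returns
// the number of bytes transferred, or a negative value on failure.
using Transfer = std::function<int(std::int8_t* buffer, int length)>;

// Hypothetical helper. `managed` plays the role of the Java byte[].
int transferViaCopy(std::vector<std::int8_t>& managed, std::size_t start,
                    std::size_t length, Direction direction,
                    const Transfer& transfer) {
    assert(transfer);  // an empty std::function cannot be called
    if (start > managed.size() || length > managed.size() - start) {
        return -1;  // requested range does not fit in the array
    }
    const auto offset = static_cast<std::ptrdiff_t>(start);
    std::vector<std::int8_t> scratch(length);  // native memory the GC never moves

    if (direction == Direction::Out) {
        // Short, bounded copy out of the managed array before sending.
        std::copy_n(managed.begin() + offset, length, scratch.begin());
    }

    // The potentially slow call runs with no managed array pinned.
    const int result = transfer(scratch.data(), static_cast<int>(length));

    if (direction == Direction::In && result > 0) {
        // Copy back only the bytes that were actually received.
        const std::size_t received =
            std::min(static_cast<std::size_t>(result), length);
        std::copy_n(scratch.begin(), received, managed.begin() + offset);
    }
    return result;
}
```

The cost of this approach is one extra copy per transfer. That copy is bounded by the buffer size, unlike the wait for a transfer that may block until its timeout expires.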

We eventually identified this problem in some of our production ANR traces, which led us to identify the same problem in android_hardware_UsbDeviceConnection_control_request.

Both of our changesets are in Android 16, ensuring the fix is available to a wide audience.

Results

Even before we fixed the problem at the OS level we were able to work around the issue by using UsbRequest instead of bulkTransfer. We saw our P90 latency decrease by nearly 40%.

Takeaways

Monitor your application's performance in production so you know when your customers experience problems.
Follow API contracts, but when things go wrong it pays to also understand the theory behind those contracts.
Android provides powerful tools — like Perfetto and Simpleperf — for understanding your application's behavior.
No code of non-trivial size is perfect. Platform bugs are rare, but they do occur.
Contributing to open-source is in everyone's best interest.

Tom Mulcahy