Waken: Reverse Engineering Usage Information and Interface Structure from Software Videos

Nikola Banovic, Tovi Grossman, Justin Matejka, George Fitzmaurice
January 2012 · Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology (UIST)

Abstract

We present Waken, an application-independent system that recognizes UI components and activities from screen-captured videos, without any prior knowledge of that application. Waken can identify the cursors, icons, menus, and tooltips that an application contains, and when those items are used. Waken uses frame differencing to identify occurrences of behaviors that are common across graphical user interfaces. Candidate templates are built, and then other occurrences of those templates are identified using a multi-phase algorithm. An evaluation demonstrates that the system can successfully reconstruct many aspects of a UI without any prior application-dependent knowledge. To showcase the design opportunities that are introduced by having this additional metadata, we present the Waken Video Player, which allows users to directly interact with UI components that are displayed in the video.
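
As a rough illustration of the frame differencing the abstract mentions, the sketch below compares two grayscale frames pixel by pixel and reports the region that changed. It is not the authors' implementation: the function names, the list-of-rows frame representation, and the fixed noise threshold of 16 are all assumptions made for this example, and the simple threshold only stands in for whatever filtering Waken actually applies.

```python
from typing import List, Optional, Tuple

# Assumed representation: a grayscale frame is a list of rows of 0-255 intensities.
Frame = List[List[int]]
Mask = List[List[bool]]


def frame_difference(prev: Frame, curr: Frame, threshold: int = 16) -> Mask:
    """Return a mask that is True where the frames differ by more than `threshold`.

    The threshold is a hypothetical stand-in for a noise filter: small
    intensity changes (e.g. compression noise) are ignored.
    """
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have identical dimensions")
    return [
        [abs(p - c) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of the changed pixels, or None."""
    coords = [
        (y, x)
        for y, row in enumerate(mask)
        for x, changed in enumerate(row)
        if changed
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


# Usage: a 2x2 bright patch appears in the second frame, plus one noisy pixel.
previous = [[0] * 4 for _ in range(4)]
current = [[0] * 4 for _ in range(4)]
current[0][0] = 5  # small change, below the threshold
for y in (1, 2):
    for x in (2, 3):
        current[y][x] = 255

changed = frame_difference(previous, current)
region = bounding_box(changed)
```

In this toy input the noisy pixel is filtered out and `region` covers only the bright patch. A changed region like this could then serve as a candidate for a cursor or icon template, which is the role the abstract assigns to frame differencing.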

Figures

Figure 1. Waken uses frame differencing to extract UI elements, such as cursors (a) and icons (b).
Figure 2. UI button states in a) Google SketchUp, b) Adobe Photoshop, and c) Microsoft Word: i) default, ii) highlighted, iii) clicked, and iv) active.
Figure 3. The four main phases of the Waken processing system.
Figure 4. Pixel differences between frames (a) and (b) are shown in (c). Applying our filter removes differences due to noise (d).
Figure 5. (Top) Consecutive frames and the corresponding blobs in each of the two absolute differences when only the cursor moves (Bottom).
Figure 6. Consecutive frames and a) corresponding single blobs in the absolute difference frames when cursor moves a short distance. b) Cursor shapes in intersection of consecutive differences.
Figure 7. The cursor viewer application visualizes cursor templates we generate. Variance is represented by color in (b) and height in (c).
Figure 8. Typical cursor movement when approaching and acquiring an icon and resulting frame differences.
Figure 9. Typical cursor movement when clicking an icon and resulting frame differences.
Figure 10. a) Cursor reconstructed using our algorithm on the test data. b) Actual cursor icon. c) System cursor with hotspot estimate (solid black dot indicates 95% confidence).
Figure 11. Cursor tracking accuracy by video. Adjusted accuracy shows results when the low-sampled cursors are removed from the data set.
Figure 12. Cursor tracking accuracy by cursor.
Figure 13. The Waken Video Player user interface components. a) The playback area. b) Highlighted cursor. c) Navigation panel. d) Event based timelines. e) Cursor highlight toggle.
Figure 14. a) A tooltip is rendered over the video. b) Menu contents are rendered over the video.
Figure 15. Cursor and clickable icon recognition from Adobe Photoshop (a) and Microsoft Word (b).

BibTeX

@inproceedings{10.1145/2380116.2380129,
 abstract = {We present Waken, an application-independent system that recognizes UI components and activities from screen-captured videos, without any prior knowledge of that application. Waken can identify the cursors, icons, menus, and tooltips that an application contains, and when those items are used. Waken uses frame differencing to identify occurrences of behaviors that are common across graphical user interfaces. Candidate templates are built, and then other occurrences of those templates are identified using a multi-phase algorithm. An evaluation demonstrates that the system can successfully reconstruct many aspects of a UI without any prior application-dependent knowledge. To showcase the design opportunities that are introduced by having this additional metadata, we present the Waken Video Player, which allows users to directly interact with UI components that are displayed in the video.},
 address = {New York, NY, USA},
 author = {Banovic, Nikola and Grossman, Tovi and Matejka, Justin and Fitzmaurice, George},
 booktitle = {Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology},
 doi = {10.1145/2380116.2380129},
 isbn = {9781450315807},
 keywords = {pixel-based reverse engineering, tutorials, videos},
 location = {Cambridge, Massachusetts, USA},
 numpages = {10},
 pages = {83--92},
 publisher = {Association for Computing Machinery},
 series = {UIST '12},
 title = {Waken: Reverse Engineering Usage Information and Interface Structure from Software Videos},
 url = {https://doi.org/10.1145/2380116.2380129},
 year = {2012}
}