What we typically call an Augmented Reality (AR) app on a smartphone or tablet is built the following way: it captures the video stream provided by the built-in camera, transforms it, and delivers the output stream to the screen. The ‘transformation’ generally includes detecting a known object (a marker) in the input stream, calculating its position in the 3D scene geometry, overlaying the scene with an artificial 3D model placed at the correct position and angles, and compositing all of this together. Depending on the application, the model can be static or dynamic, interactive or not, and so on.
The main challenge here is that the whole process needs to happen in real time, so that the artificial objects melded into the real scene look natural. The higher the achieved video frame rate (frames per second, FPS), the smoother objects move, and the better the overall user experience is.
Modern middle-class handheld devices are typically able to record and output VGA (640 by 480 pixels) or even HD (1280 by 720 pixels) video at a rate of 30 frames per second. However, this speed assumes zero processing time, with the input stream literally wired to the screen. As soon as the captured stream undergoes any non-trivial processing, the frame rate drops dramatically.
The cause of this drop is that handheld devices still lag significantly behind their desktop counterparts in computational power, despite having vivid, responsive UIs and fast graphics, video, and sound processing. Those tasks are mostly handled by dedicated hardware that implements standard algorithms such as audio/video codecs and graphics rendering.
Since object capture and 3D scene reconstruction cannot be done purely in hardware, AR algorithms face CPU performance limitations, and the resulting frame rate drops, on virtually any handheld model, the recently released Apple iPhone 5 being a notable exception.
Plenty of companies provide AR toolkits and SDKs. Nevertheless, none of them fully escapes the performance shortfall that prevents AR tasks from running in true real time (with no FPS drop). An AR library targeted at such resource-tight conditions must be carefully engineered and coded to minimize CPU time and memory footprint. Profiling of several popular AR frameworks shows that they employ various software tricks, such as exploiting device motion data, to gain better performance.
DataArt Computer Vision Competence Center has recently started creating its own all-custom AR engine in response to an RFP that required capturing a dynamic marker (constant in form, but with changeable content). Several third-party AR engines were tried for this task, but none of them could handle dynamic markers.
Based on the custom engine, DataArt is building an iOS-targeted AR solution that not only performs typical AR app tasks, such as replacing the captured marker with an artificial one (optionally animated and/or interactive), but also captures the inner contents of a marker, decodes them, and acts according to the obtained code.
The engine is written in C++ using OpenCV, an open-source computer vision library, and supports COLLADA as its 3D animation format.
The AR algorithms for the engine were modeled, proven, and polished in MATLAB before being coded for the Apple iOS platform. The challenges the team faced besides the performance limitations included 3D spatial vector instability, loss of the tracked object, and frame timing and smoothing tricks. Some of these issues are addressed in the currently released version; others are modeled in a MATLAB prototype and pending delivery.