Optical Character Recognition with parallel computing?

Some references to collect here on the use of general-purpose graphics processors (GPGPUs) in OCR.  Nominally, you run some kind of MapReduce-based algorithm on a GPGPU over an image, and out comes text much faster than an ordinary CPU could produce it; of course, there are lots of details to attend to.

From Berkeley EECS: Algorithms and Frameworks for OCR and Content-based Image Retrieval

Jike Chong, Bryan Christopher Catanzaro, Narayanan Sundaram, Fares Hedayati and Kurt Keutzer

A new breed of general-purpose manycore computing platform is
emerging. Exemplary examples include the Niagara from Sun Microsystems,
the G80 from Nvidia, the Cell from IBM, and the up-coming Larrabee from
Intel. These manycore processors each pack 8-32 relatively simple cores
on a chip, capable of supporting up to 100s of threads, and boast
tremendous potential peak single-chip performances up to the range of
Tera-FLOPS. However, traditional algorithms and applications in many
domains cannot take advantage of much of the parallelism provided by
these platforms.

The ParLab at Berkeley was recently founded in part to help
meet this acute need for novel algorithmic approaches to unleash the
performance potentials of emerging manycore platforms for a wide range
of application domains. It proposes to concentrate on analyzing the
communication and computation patterns (or Dwarfs) of important classes
of algorithms underlying modern application domains, and develop
techniques to efficiently parallelize them for the general purpose
manycore platforms.

We concentrate on the domain of image recognition and
retrieval, leveraging the Intel PIRO content-based image retrieval
framework as a motivating application. Specifically, we study the
parallelization of classification algorithms for machine learning, and
develop parallelization techniques to improve the performance of these
algorithms and applications on the emerging manycore platforms.

Here's a discussion of cocr, an OCR system built on the CUDA libraries:

The cuda implementation allows for exceptionally fast image
manipulation, cleaning and segmentation before being presented to the
template based ocr system.  Traditional cpu based ocr systems are very
slow, especially in the image rotation, cleaning etc departments and
although cocr is by no means a complete ocr package it is orders of
magnitude faster than the various cpu based ones I’ve tried. With a bit
more work and the addition of a neural network ocr it could easily
become a system able to do greater than realtime ocr’ing.  Neural
networks are extremely well suited to GPU implementations due to their
inherent parallelism. I will post more snippets and modules of the
system here over the coming weeks.
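
To make the "clean the image, then hand it to a template-based OCR" pipeline concrete, here is a minimal CPU-only sketch in plain Python. It is not cocr's code: the 3x3 glyph templates, the threshold of 128, and the function names are all made up for illustration. The point is only that both steps are per-pixel operations that do not depend on each other, which is why they map well onto a GPU.

```python
# Hypothetical 3x3 templates: 1 = ink, 0 = paper. Real templates would be larger.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

def binarize(gray, threshold=128):
    """Cleaning step: turn grayscale pixels (0-255) into 1 (ink) or 0 (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def score(glyph, template):
    """Count the pixel positions where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def classify(glyph, templates=TEMPLATES):
    """Template matching: pick the character whose template agrees most."""
    return max(templates, key=lambda ch: score(glyph, templates[ch]))

# A noisy scan of an "L": dark pixels are ink, and one stray dark pixel
# sits in the middle of the glyph.
scan = [
    [20, 200, 210],
    [30, 100, 250],
    [10, 40, 90],
]
result = classify(binarize(scan))
```

Every pixel in `binarize` and every comparison in `score` can be computed independently, so on a GPU each would get its own thread; the sketch just runs them in a loop.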

I wasn't able to find any evidence that the OCRopus OCR system, which Google has used and released, uses any parallel computing mechanisms, so this still looks like "gee, it should work for someone, if you can throw a PhD student or two at it."


massively parallel systems of massively parallel machines

Here are some notes about parallel computing, going in two directions at the same time.  In one direction you have tools like Map/Reduce or Hadoop managing the work of parceling a problem into many pieces and then coordinating the collection of the results.  In the other direction you have graphics chips being turned into interesting general-purpose parallel computing chips that can do some specific operations very fast.
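
The first direction, parceling a problem into pieces and then collecting the results, can be sketched in a few lines of Python. This is a toy, not Hadoop: a thread pool stands in for the cluster, word counting stands in for the real job, and the names `split`, `map_piece`, `merge` and `map_reduce` are my own.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split(lines, n_pieces):
    """Parcel the input into roughly equal contiguous pieces."""
    size = max(1, -(-len(lines) // n_pieces))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_piece(piece):
    """Map step: count the words in one piece, independently of the others."""
    counts = {}
    for line in piece:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def merge(a, b):
    """Reduce step: combine two partial counts into one."""
    out = dict(a)
    for word, n in b.items():
        out[word] = out.get(word, 0) + n
    return out

def map_reduce(lines, n_pieces=4):
    """Split the input, map each piece on a worker, then reduce the partials."""
    pieces = split(lines, n_pieces)
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        partials = list(pool.map(map_piece, pieces))
    return reduce(merge, partials, {})

counts = map_reduce(["a b a", "b c", "a"], n_pieces=2)
```

The map step never looks outside its own piece, which is what lets a framework ship pieces to different machines; only the reduce step has to see results from more than one worker.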

It's mostly in the form of clippings, in part because I don't have a whole story yet, just the fragments.

from P16: Practical Progress – Supercomputing for the masses

One thing CUDA doesn't provide is a way to manage and process
massive datasets. The cards have somewhat limited memory, and you have
to write an app that runs on the host and feeds the card with data. For
search applications like topic clustering — which I'd like to use this
for — CUDA alone doesn't provide an answer.

Perhaps it would make sense eventually to use Hadoop plus CUDA —
write your map/reduce tasks in CUDA, and rely on Hadoop to distribute
data around a cluster of Nvidia-accelerated boxes?

from Map Reduce on GPUs

A fellow at the German Hadoop user meeting (thanks to Isabel, who
organized it again) pointed out to me that the GPUs on graphics
cards basically work like server grids.
He mentioned there are some research papers in this field. I spent some
time reading through what I could find, and it was quite interesting.
Let me cite some of the facts from the two most interesting papers:

+ “A Map Reduce Framework for Programming Graphics Processors” by
Bryan Catanzaro, Narayanan Sundaram and Kurt Keutzer, UC Berkeley
+ “Mars: A MapReduce Framework on Graphics Processors” by Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, Tuyong Wang

and, hiding at the end of a long set of searches, is Cloudera, a company that supports Hadoop:

Hadoop is the popular open source
implementation of MapReduce, a powerful tool designed for the detailed
analysis and transformation of very large data sets. Hadoop enables you
to explore complex data in its native form, using custom analyses
tailored to the information and questions you have.

Cloudera can help you install, configure and run Hadoop for large-scale data processing and analysis.

The vision is not a rack of ordinary CPUs using Hadoop to manage them; the vision is a rack of CPU+GPU combinations where you can take advantage of parallelization both across machines and within each machine.
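
Here is a rough Python simulation of that two-level shape, with no real Hadoop or CUDA in it. The outer level deals data out to "machines" (a thread pool stands in for the cluster); the inner level has each machine feed its "card" one fixed-size batch at a time, since the card has limited memory. Everything named here (`device_kernel`, `device_capacity`, `run_on_cluster`) is hypothetical, and squaring numbers stands in for the real per-element work.

```python
from concurrent.futures import ThreadPoolExecutor

def batches(items, batch_size):
    """Yield successive slices small enough to fit on the device."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def device_kernel(batch):
    """Stand-in for a GPU kernel: the same operation applied to every element."""
    return [x * x for x in batch]

def run_on_machine(shard, device_capacity=3):
    """Inner level: the host program feeds the device one batch at a time."""
    total = 0
    for batch in batches(shard, device_capacity):
        total += sum(device_kernel(batch))
    return total

def run_on_cluster(data, n_machines=2):
    """Outer level: deal the data out to machines, then combine their results."""
    shards = [data[i::n_machines] for i in range(n_machines)]
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        return sum(pool.map(run_on_machine, shards))

total = run_on_cluster(list(range(10)))
```

In a real setup the outer level would be a Hadoop job and `device_kernel` would be a CUDA kernel launched by the map task; the sketch only shows how the two levels nest.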