04/12/2017

Using Win10 Built-in OCR

TL;DR

To get OCR in a C# console, WPF, or WinForms app:

  1. run on a modern Windows version (e.g. Windows 10)
  2. add the NuGet package UwpDesktop
  3. add the following code (OcrEngine lives in the Windows.Media.Ocr namespace):
var engine = OcrEngine.TryCreateFromLanguage(new Windows.Globalization.Language("en-US"));
string filePath = TestData.GetFilePath("testimage.png");
var file = await Windows.Storage.StorageFile.GetFileFromPathAsync(filePath);
var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
var decoder = await Windows.Graphics.Imaging.BitmapDecoder.CreateAsync(stream);
var softwareBitmap = await decoder.GetSoftwareBitmapAsync();
var ocrResult = await engine.RecognizeAsync(softwareBitmap);
Console.WriteLine(ocrResult.Text);

OCR Troubles

When UWP (Universal Windows Platform) apps were introduced, I was interested in what new APIs came with them. Soon the OcrEngine (https://docs.microsoft.com/en-us/uwp/api/windows.media.ocr.ocrengine) piqued my interest, because it promised a simple and quick way to retrieve text from images.

A simple OCR engine was exactly what I was looking for, since the alternatives are big and cumbersome to use (I am looking at you, Tesseract), discontinued (MODI, which was included with Office), cloud-based, and/or expensive.

Back then, the problem was that you needed to create a UWP application to access the UWP APIs, but a UWP application was completely sandboxed! You couldn't even use cross-process communication (apart from going through the cloud or a very basic file-based approach).

That meant I couldn't use the OcrEngine in a Windows service, a web service, or even from the command line!

So with that being the case, I put together a quick solution using Tesseract, but I never got around to tuning it and it never performed well.

UwpDesktop

Time went by, and then the great Lucian Wischik (https://blogs.msdn.microsoft.com/lucian) published the library uwp-desktop (https://github.com/ljw1004/uwp-desktop) as a NuGet package called UwpDesktop.

This package made the UWP APIs available to applications based on the normal .NET Framework. When I read the announcement, I was instantly reminded of my previous failure to make use of the OcrEngine. Today I finally took it out for a spin, and it worked great!

Example Code

The following code reads in the supplied file and prints out the detected text:

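// requires: using Windows.Media.Ocr; (and an async method, because of the awaits)
// TryCreateFromLanguage returns null if the language is not available for OCR
var engine = OcrEngine.TryCreateFromLanguage(new Windows.Globalization.Language("en-US"));
// TestData.GetFilePath is presumably a helper from the example project that resolves the test image path
string filePath = TestData.GetFilePath("testimage.png");
var file = await Windows.Storage.StorageFile.GetFileFromPathAsync(filePath);
var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
// decode the image into a SoftwareBitmap, which is what the engine takes as input
var decoder = await Windows.Graphics.Imaging.BitmapDecoder.CreateAsync(stream);
var softwareBitmap = await decoder.GetSoftwareBitmapAsync();
var ocrResult = await engine.RecognizeAsync(softwareBitmap);
Console.WriteLine(ocrResult.Text);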
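
If you need more than the flat text, the result also exposes the recognized lines and words. The following is only a rough sketch of how that might look, not code from the example app: the class name OcrSketch, the method PrintLinesAsync and its imagePath parameter are made up for illustration, and it assumes the UwpDesktop package is referenced and that imagePath is an absolute path. It also guards against TryCreateFromLanguage returning null, which happens when the requested language is not installed for OCR.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Globalization;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;
using Windows.Storage;

// Hypothetical helper class, for illustration only.
static class OcrSketch
{
    // imagePath is an assumed parameter; it should be an absolute path.
    public static async Task PrintLinesAsync(string imagePath)
    {
        var language = new Language("en-US");

        // TryCreateFromLanguage returns null when the language is not available for OCR.
        var engine = OcrEngine.TryCreateFromLanguage(language);
        if (engine == null)
        {
            Console.Error.WriteLine("OCR is not available for " + language.LanguageTag);
            // List the languages the engine could be created for instead.
            foreach (var available in OcrEngine.AvailableRecognizerLanguages)
                Console.Error.WriteLine("  available: " + available.LanguageTag);
            return;
        }

        var file = await StorageFile.GetFileFromPathAsync(imagePath);
        using (var stream = await file.OpenAsync(FileAccessMode.Read))
        {
            var decoder = await BitmapDecoder.CreateAsync(stream);
            using (var bitmap = await decoder.GetSoftwareBitmapAsync())
            {
                var result = await engine.RecognizeAsync(bitmap);

                // Print each recognized line, followed by its words and their bounding boxes.
                foreach (var line in result.Lines)
                {
                    Console.WriteLine(line.Text);
                    foreach (var word in line.Words)
                    {
                        var r = word.BoundingRect;
                        Console.WriteLine($"  {word.Text} @ ({r.X:F0}, {r.Y:F0}) {r.Width:F0}x{r.Height:F0}");
                    }
                }
            }
        }
    }
}
```

From a classic console Main without async support, this could be called as OcrSketch.PrintLinesAsync(path).GetAwaiter().GetResult(). The using blocks are there so that the file stream and the bitmap are released as soon as recognition is done.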

Example Application

I've put together a very simple example app that makes use of the OcrEngine and pushed it to GitHub (https://github.com/8/ConsoleUwpOcr).

Example Output:

ocr.exe ..\..\..\ConsoleUwpOcr.Test\TestData\testimage.png
Welcome to Thunderbird Donate to Thunderbird Thunderbird IS the leading open source, cross- platform email and calendaring client, free for business and personal use. We want it to stay secure and become even better. If you like Thunderbird, please consider a donation! By donating, you Will help us to continue delivering an ad-free top-notch email client. Make a donation » Other ways to contribute to Thunderbird Now IS a great time for you to get involved: writing code, testing, support, localization and more. Join a global community! Share your skills and Pick up a few new ones along the way. Volunteer as much as you like. Or as little. It's totally up to you. Learn more » Why we need donations You might already know that Thunderbird improvements are no longer paid for by Mozilla. Fortunately there IS an active community keeping it running and developing it further. But to survive long term, the project needs funding. Thunderbird IS currently transitioning to an independent organization. Being independent, we can shape our own fate, but there IS significant infrastructure that must be majntajned to deliver the application to our tens of millions of users. For Thunderbird to survive and continue to evolve, we need your support and ask for your donation today. All the money donated Will go directly to funding Thunderbird development and infrastructure.

Last updated 04/12/2017 18:17:43