The objective of this page is to teach you how to use the Intel libraries to build applications in which images or sequences of images have to be processed. In addition, the DirectShow technology is introduced; it is particularly useful for processing image sequences, including sequences captured with PC cameras.
Since this is a beginner’s guide, efforts have been made to describe in detail all the steps necessary to obtain the results shown. In addition, all the source code used here has been made available. Note, however, that the goal was to keep these programs as simple and short as possible; as a consequence, the programming style is not always of good quality. In particular, closer adherence to the object-oriented paradigm would have considerably improved the quality of the code.
All applications presented here will be simple dialog-based applications. This kind of application can easily be created using the MFC application wizard. On your Visual C++ menu bar, select the File|New option, then start the MFC AppWizard (exe). Choose a dialog-based application and select a name for it (here it is called cvision). VC++ should create a simple OK/Cancel dialog for you. The class whose name ends in Dlg will contain the member functions that control the widgets of the dialog.
The first task will be to open and display an image. To do this, we will first add a button that will allow us to select the file that contains the image. Drag a button onto the dialog, then right-click on it and select the Properties option; this will allow you to change the caption to Open Image. Once this is done, double-click on the new button and change the corresponding member function name to OnOpen. The dialog now looks like this:
The CFileDialog class is the one to use to create a file dialog. It will show up once the following code is added to the OnOpen member function:
void CCvisionDlg::OnOpen()
{
	CFileDialog dlg(TRUE, _T("*.bmp"), "",
		OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
		"image files (*.bmp; *.jpg) |*.bmp;*.jpg|AVI files (*.avi) |*.avi|All Files (*.*)|*.*||", NULL);
	if (dlg.DoModal() == IDOK) {
		// the selected file is then given by dlg.GetPathName()
	}
}
Note how the extensions of interest for the files to be opened (here .bmp, .jpg and .avi) are specified using the filter argument (the fifth) of the CFileDialog constructor. Now, by clicking on the Open Image button, the following dialog appears:
Now that we have learnt how to select a file, let’s load and display the underlying image. The Intel libraries will help us accomplish this task; in particular, the HighGUI component of OpenCV will be put to use. It contains the functions required to load, save and display images under the Windows environment. Since we will be using these libraries in all the examples that follow, we will first see how to set up our VC++ projects so that the libraries are linked to our application. Select the Project|Settings… option. A dialog will pop up. Select the C/C++ tab and the category Preprocessor. Add the following directories to additional include directories:
- C:\Program Files\Intel\plsuite\include
- C:\Program Files\Intel\opencv\cv\include
- C:\Program Files\Intel\opencv\otherlibs\highgui
Now select the Link tab, category Input. Add the following directories to additional library path:
- C:\Program Files\Intel\plsuite\lib\msvc
- C:\Program Files\Intel\opencv\lib
Finally, select the General category of the Link tab and add the libraries to be linked to the library modules:
This setup is valid for the current project only. It is a good idea to add all these directories to the global search path of your VC++ so that they are always active each time you create a new project. This can be done from the Tools|Options… menu; you then select the Directories tab. The following two screenshots show the information that should be included there.
Note also that we have included the DirectX directory information (in our case, C:\DXSDK\Lib), which we will use in later examples. This one should always be first in the list to avoid incompatibilities with other libraries.
With these global settings, only the names of the library modules need to be specified when a new project is
created:
Now add the following header file to the project, here called cvapp.h:
#if !defined CVAPP
#define CVAPP
#include <stdio.h>
#include <math.h>
#include <string.h>
#include "cv.h" // include core library interface
#include "highgui.h" // include GUI library interface
class ImageProcessor {
	IplImage *img;
public:
	ImageProcessor(const char* filename, bool display= true) {
		// load the image from file
		img= cvLoadImage(filename);
		if (display) {
			// create a window
			cvvNamedWindow( "Original Image", 1 );
			// display the image in it
			cvvShowImage( "Original Image", img );
		}
	}
	~ImageProcessor() {
		cvReleaseImage( &img );
	}
};
#endif
The function names starting with cvv are HighGUI functions. To use the ImageProcessor class in the application, just include the header in the dialog class file. Once a file has been selected, an ImageProcessor instance can be created as follows:
void CCvisionDlg::OnOpen()
{
	CFileDialog dlg(TRUE, _T("*.bmp"), "",
		OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
		"BMP files (*.bmp) |*.bmp|AVI files (*.avi) |*.avi|All Files (*.*)|*.*||", NULL);
	if (dlg.DoModal() == IDOK) {
		ImageProcessor proc(dlg.GetPathName());
	}
}
3. Processing an image
Now let’s try to call one of the OpenCV functions. We rewrite the header as follows:
#if !defined CVAPP
#define CVAPP
#include <stdio.h>
#include <math.h>
#include <string.h>
#include "cv.h" // include core library interface
#include "highgui.h" // include GUI library interface
void process(void *img); // the processing function
class ImageProcessor {
	IplImage *img;
public:
	ImageProcessor(const char* filename, bool display= true) {
		img= cvLoadImage(filename);
		if (display) {
			cvvNamedWindow( "Original Image", 1 );
			cvvShowImage( "Original Image", img );
		}
	}
	void display() {
		cvvShowImage( "Original Image", img );
	}
	void execute();
	~ImageProcessor() {
		cvReleaseImage( &img );
	}
};
#endif
and we add a C++ source file, here named cvapp.cpp, that contains the function that does the processing.
#include "stdafx.h"
#include "cvapp.h"
// A global variable
ImageProcessor *proc = 0;
// the processing function: here a simple morphological erosion
void process(void *img) {
	IplImage* image= reinterpret_cast<IplImage*>(img);
	cvErode( image, image, 0, 1 );
}
void ImageProcessor::execute() {
	process(img);
}
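The erosion performed by cvErode replaces each value by the minimum over its neighborhood. As a sanity check of the idea, here is a small OpenCV-free sketch (the function name is ours; 1-D with a 3-element window for brevity):

```cpp
#include <algorithm>
#include <vector>

// Illustrative 1-D version of a morphological erosion: each output value
// becomes the minimum of the 3-element window centered on it; border
// values are left unchanged.
std::vector<unsigned char> erode3(const std::vector<unsigned char>& in) {
    std::vector<unsigned char> out(in);
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        out[i] = std::min(in[i - 1], std::min(in[i], in[i + 1]));
    return out;
}
```

Applied to {9,5,7,3,8}, this yields {9,5,3,3,8}: small (dark) values grow, which is exactly the effect seen on an eroded image.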
The process function is the one that calls the OpenCV function that does the processing. In this example, the processing consists of a simple morphological erosion (cvErode). Obviously, all the processing could have been done directly inside the execute member function. Also, there is no justification, at this point, for having used a void pointer as parameter of the process function; this has been done for consistency with the examples to follow, where the process function will become a callback function in the processing of a sequence. Note that, for simplicity, we have added a global variable that points to the ImageProcessor instance used by this application. Let’s now modify our dialog by adding another button, i.e.:
void CCvisionDlg::OnOpen()
{
	CFileDialog dlg(TRUE, _T("*.bmp"), "",
		OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
		"image files (*.bmp; *.jpg) |*.bmp;*.jpg|AVI files (*.avi) |*.avi|All Files (*.*)|*.*||", NULL);
	if (dlg.DoModal() == IDOK) {
		if (proc != 0)
			delete proc;
		proc= new ImageProcessor(dlg.GetPathName());
	}
}
void CCvisionDlg::OnProcess()
{
	if (proc != 0) {
		proc->execute();
		proc->display();
	}
}
If you open an image and push the process button, then the result is:
In the preceding example, the image was created from a file. In many applications, it is also useful to create an image from scratch. This can be done using the IPL functions, in which case you must first create a header that specifies the image format. The following two examples show how to create a gray-level image and a color image:
IplImage* gray= iplCreateImageHeader(
	1, 0, IPL_DEPTH_8U, "GRAY", "GRAY",
	IPL_DATA_ORDER_PIXEL, IPL_ORIGIN_TL, IPL_ALIGN_QWORD,
	width, height, NULL, NULL, NULL, NULL);
iplAllocateImage(gray, 1, 0);
IplImage* color= iplCreateImageHeader(
	3, 0, IPL_DEPTH_8U, "RGB", "BGR",
	IPL_DATA_ORDER_PIXEL, IPL_ORIGIN_TL, IPL_ALIGN_QWORD,
	width, height, NULL, NULL, NULL, NULL);
iplAllocateImage(color, 1, 0);
The first parameter specifies the number of channels and the second is 0 if there is no alpha channel in the image (which is most often the case in computer vision). The third parameter defines the pixel type. An unsigned 8-bit pixel (IPL_DEPTH_8U) is the common choice, but 2-byte signed integers (IPL_DEPTH_16S) and 4-byte floats (IPL_DEPTH_32F) are also very useful. The next parameters specify the color model (basically "GRAY" or "RGB") and the channel sequence (in the case of a color image). The data order parameter specifies how the different color channels are ordered: under IPL the choices are pixel-oriented, i.e. RGBRGBRGB…, or plane-oriented, i.e. RRRR…GGGG…BBBB…. The origin is normally at the top left corner (IPL_ORIGIN_TL). For an efficient use of the MMX capabilities of the processor, the line length of an image should be a multiple of 8 bytes. This is guaranteed by choosing the quad-word alignment, each line being padded with dummy pixels if necessary. Finally, the width (number of columns) and the height (number of lines) of the image are specified. The last four parameters are usually NULL.
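The quad-word alignment mentioned above simply means that each line of the buffer is rounded up to a multiple of 8 bytes. A portable sketch of this computation (the function name is ours, not part of IPL):

```cpp
#include <cstddef>

// Compute the padded line length (what IPL stores as the line stride) for
// an image whose lines must start on align-byte boundaries.
std::size_t alignedStep(std::size_t width, std::size_t nChannels,
                        std::size_t bytesPerChannel, std::size_t align = 8) {
    std::size_t rowBytes = width * nChannels * bytesPerChannel;
    return ((rowBytes + align - 1) / align) * align; // round up to a multiple of align
}
```

A 10-pixel-wide, 8-bit RGB image thus has 30 useful bytes per line but a padded line length of 32.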
Once the header is created, memory must be allocated. This is the role of the iplAllocateImage function. An initial value for the pixel data can be specified; this is the last parameter. The middle parameter of this function must be set to 0 if no initialization is required. Do not forget to deallocate the images at the end of the process by calling iplDeallocate(image, IPL_IMAGE_ALL). Note that for floating-point images, iplAllocateImageFP and iplDeallocateImageFP must be used instead.
An alternative way to create and allocate an image is to use the equivalent OpenCV function. Here only the size, the pixel depth and the number of channels need to be specified, e.g.:
IplImage* img= cvCreateImage( cvSize(width,height), IPL_DEPTH_8U, 3 );
When manipulating images, it is common to sequentially access all the pixels of an image. To this end, the iplPutPixel and iplGetPixel functions can be used. You just specify the pixel coordinates and an array containing the channel values, as follows:
unsigned char values[3];
iplGetPixel( img, x, y, values ); // read the pixel at (x,y)
values[0]= 255;
iplPutPixel( img, x, y, values ); // write it back
But for a more efficient loop, it is possible to directly access the buffer containing the pixels. Caution must however be taken, because the way this loop must be executed depends on the exact image format. This is illustrated by the following process function, where an 8-bit RGB image with pixel-oriented data order is scanned:
void process(void* img) {
	IplImage* image= reinterpret_cast<IplImage*>(img);
	int nl= image->height; // number of lines
	int nc= image->width * image->nChannels; // number of useful bytes per line
	int step= image->widthStep; // effective line length, padding included
	unsigned char* data= reinterpret_cast<unsigned char*>(image->imageData);
	for (int i=0; i<nl; i++) {
		for (int j=0; j<nc; j++) {
			data[j]= 255-data[j]; // here a simple inversion of each value
		}
		data+= step; // next line
	}
}
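The role of the line stride in this kind of loop can be checked on a plain buffer, independently of OpenCV (illustrative names; a 2-line image with one padding byte per line):

```cpp
#include <vector>

// Invert all useful bytes of an image buffer stored line by line with a
// stride (step) possibly larger than the number of useful bytes (nc);
// the padding bytes are skipped.
void invertBuffer(std::vector<unsigned char>& buf, int nl, int nc, int step) {
    unsigned char* data = buf.data();
    for (int i = 0; i < nl; i++) {
        for (int j = 0; j < nc; j++)
            data[j] = 255 - data[j];
        data += step; // jump over the padding to the next line
    }
}
```

Forgetting the stride and iterating over all bytes would corrupt or misinterpret the padding bytes, which is exactly the format-dependent pitfall mentioned above.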
Although this is the most efficient way to scan an image, this process can be error-prone. In order to simplify this frequent task, an image iterator can be introduced. The role of this iterator template is to take care of the pointer manipulation involved in the processing of an image. The template is as follows:
template <class PEL>
class IplImageIterator {
	int i, i0, j;
	PEL* data;
	PEL* pix;
	int step;
	int nl, nc;
	int nch;
public:
	/* constructor */
	IplImageIterator(IplImage* image,
		int x=0, int y=0, int dx= 0, int dy=0) :
		i(x), j(y), i0(0) {
		data= reinterpret_cast<PEL*>(image->imageData);
		step= image->widthStep / sizeof(PEL);
		nl= image->height;
		if ((y+dy)>0 && (y+dy)<nl) nl= y+dy;
		if (y<0) j=0;
		data+= step*j;
		nc= image->width;
		if ((x+dx)>0 && (x+dx)<nc) nc= x+dx;
		nc*= image->nChannels;
		if (x>0) i0= x*image->nChannels;
		i= i0;
		nch= image->nChannels;
		pix= new PEL[nch];
	}
	/* destructor */
	~IplImageIterator() { delete [] pix; }
	/* has next ? */
	bool operator!() const { return j < nl; }
	/* next pixel */
	IplImageIterator& operator++() {
		i++;
		if (i >= nc) { i=i0; j++; data+= step; }
		return *this;
	}
	IplImageIterator& operator+=(int s) {
		i+=s;
		if (i >= nc) { i=i0; j++; data+= step; }
		return *this;
	}
	/* pixel access */
	PEL& operator*() { return data[i]; }
	const PEL operator*() const { return data[i]; }
	const PEL neighbor(int dx, int dy) const
		{ return *(data+dy*step+i+dx); }
	PEL* operator&() const { return data+i; }
};
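To see the iteration logic of this template in isolation, here is a stripped-down, OpenCV-free version of the same idea (simplified names of our own; no window or multi-channel support):

```cpp
#include <vector>

// Row-major iterator over a padded buffer, following the same
// ! / * / ++ protocol as the IplImageIterator template.
class RowIterator {
    unsigned char* data; // start of the current line
    int i, j;            // current byte within the line, current line
    int step, nl, nc;    // line stride, number of lines, useful bytes per line
public:
    RowIterator(unsigned char* buf, int lines, int cols, int stride)
        : data(buf), i(0), j(0), step(stride), nl(lines), nc(cols) {}
    bool operator!() const { return j < nl; }      // more data?
    unsigned char& operator*() { return data[i]; } // current value
    RowIterator& operator++() {                    // advance, skipping padding
        if (++i >= nc) { i = 0; j++; data += step; }
        return *this;
    }
};
```

A loop such as while (!it) { ...; ++it; } then touches every useful byte and never the padding.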
An iterator of this type is declared by specifying the type of the pixels in the image and by giving a pointer to the IplImage as argument to the iterator constructor, e.g.:
IplImageIterator<unsigned char> it(img);
Once the iterator is constructed, two operators can be used to iterate over an image: the ! operator, which determines whether the end of the image has been reached, and the * operator, which gives access to the current pixel. A typical loop will therefore look like this:
while (!it) {
	if (*it < 10) {
		*it= 0; // e.g. set low values to zero
	}
	++it;
}
Note that if the image contains more than one channel, each iteration will give access to one of the channels of a pixel. This means that, in the case of a color pixel, you have to iterate three times for each pixel. In order to access all the components of a pixel, the & operator can be used. It returns an array that contains the current pixel channel values. For example, the previous example will look like this (note how the iterator is incremented this time to make sure that we go from one pixel to the next):
while (!it) {
	unsigned char* pixel= &it;
	if (pixel[1]>pixel[0] && pixel[1]>pixel[2]) {
		// a predominantly green pixel: process it here
	}
	it+= 3;
}
The use of image iterators is as efficient as directly looping with pointers. This is true as long as you set the compiler to optimize for speed (the /O2 option of Visual C++).
When the processing involves more than one image, more than one iterator can be used. This is illustrated in
the following example:
IplImage* tmp= cvCloneImage(image);
IplImageIterator<unsigned char>
	src(tmp,1,1,tmp->width-2,tmp->height-2);
IplImageIterator<unsigned char>
	res(image,1,1,image->width-2,image->height-2);
while (!src) {
	// neighborhood processing (here, for illustration,
	// a simple sharpening based on the 4 neighbors)
	int v= 5 * *src - src.neighbor(-1,0) - src.neighbor(1,0)
		- src.neighbor(0,-1) - src.neighbor(0,1);
	*res= (unsigned char)(v<0 ? 0 : (v>255 ? 255 : v));
	++src;
	++res;
}
cvReleaseImage(&tmp);
Here the clone of the source image is used as input while the source image itself is modified inside the loop; two iterators are therefore defined. Since the processing also involves the neighboring pixels, the neighbor method defined by the iterator is used. Also, in this case, a window is specified when creating the iterators (here it defines a 1-pixel strip around the image where no processing is undertaken). The resulting image is:
In order to process image sequences (from files or from a camera), you have to use DirectShow. The DirectShow architecture, part of Microsoft DirectX, relies on a filter architecture. There are three types of filters: source filters that output video and/or audio signals, transform filters that process an input signal and produce one (or several) outputs, and finally rendering filters that display or save a media signal. The processing of a sequence is therefore done using a series of filters connected together, the output of one filter becoming the input of the next one (you can also have filters with multiple outputs). The first filter is usually a source that reads a file stream, and the last filter could be a renderer that displays the sequence in a window. In DirectShow terminology, a series of filters is called a filter graph.
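The pipeline idea behind a filter graph can be sketched without any DirectShow code (a toy model of our own, where a "frame" is just an int and each "filter" a function):

```cpp
#include <functional>
#include <vector>

// Toy filter graph: the output of each filter feeds the input of the next,
// as in a source -> transform -> renderer chain.
int runGraph(int frame,
             const std::vector<std::function<int(int)> >& filters) {
    for (const auto& f : filters)
        frame = f(frame); // connect the output "pin" to the next input "pin"
    return frame;
}
```

With a +1 filter followed by a *2 filter, frame 3 comes out as 8: order matters, exactly as when assembling real filters.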
We will first try to process an AVI sequence. Let’s first see if DirectX is working fine. To do so, just use the GraphEdit application. This is a very useful application included in the DirectX SDK that makes it easy to build filter graphs. It can be started from the Start|Programs|Microsoft DirectX 8.1 SDK|DirectX Utilities menu. The GraphEdit application window will pop up.
Our objective is now to visualize the building blocks required to obtain an AVI renderer.
Select Graph|Insert Filters… A window will display the list of available filters.
Choose the DirectShow Filters tree and select the File Source(Async.) filter.
You will be asked to select an AVI file. The filter will appear in the GraphEdit window in the form of a box. Right-click on the output pin and select the Render Pin option. This is an intelligent option that determines which filters are required to render the selected source file and automatically assembles them together, as shown here:
For an AVI sequence, the rendering chain should be composed of 3 filters. The first one is the splitter that separates the video and audio components; this filter normally has two outputs (video and audio), but note that in the case of the selected sequence, no audio component was available. The second one is the appropriate decompressor that decodes the video sequence. Finally, the third filter is the renderer itself, which creates the window and displays the frame sequence in it. Just push the play button to execute the graph and the selected AVI sequence should be displayed in a window.
We can build the same filter graph using Visual C++. You first need to add the following include path to your project settings:
C:\DXSDK\samples\Multimedia\DirectShow\BaseClasses
the library path:
C:\DXSDK\lib
and link with the library module:
STRMBASE.LIB
DirectX is implemented using the Microsoft COM technology. This means that when you want to do
something, you do it by using a given COM interface. In order to initialize the COM layer, you must call:
CoInitialize(NULL);
And similarly, when you are done with COM, you need to uninitialize it:
CoUninitialize();
A COM interface is an abstract class containing pure virtual functions (together forming the interface). Using a COM interface is the only way to communicate with a COM object. Interfaces are obtained by calling the appropriate API functions. These functions return a value of type HRESULT representing an error code; the simplest way to verify whether a COM call failed or succeeded is to check the return value using the FAILED macro. All COM interfaces derive from the IUnknown interface.
A very important rule when you use an interface is to never forget to release it after you have finished using it; otherwise resource leaks will result. This is done by calling the Release method of the IUnknown interface, which decrements the object's reference count by 1; when the count reaches 0, the object is deallocated. The safest way to call the Release method is to use the SAFE_RELEASE macro that can be found in dxutil.h, located in C:\DXSDK\samples\Multimedia\Common\include.
This macro is simply defined as:
#define SAFE_RELEASE(p) { if(p){(p)->Release();(p)=NULL;}}
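The contract this macro relies on can be demonstrated with a toy reference-counted class (not a real COM object; names are ours):

```cpp
#include <cstddef>

// Toy object following the COM Release contract: Release decrements the
// reference count and the object destroys itself when the count reaches 0.
struct RefCounted {
    static int alive;                 // number of live instances
    unsigned long refs;
    RefCounted() : refs(1) { ++alive; }
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() {
        unsigned long r = --refs;
        if (r == 0) delete this;      // self-destruct at zero, as COM objects do
        return r;
    }
private:
    ~RefCounted() { --alive; }        // private: destruction only via Release
};
int RefCounted::alive = 0;

#define SAFE_RELEASE(p) { if(p){(p)->Release();(p)=NULL;} }
```

Releasing through the macro both frees the object (the count falls from 1 to 0) and nulls the pointer, so a second SAFE_RELEASE on the same variable is harmless.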
To use a DirectX component, you must first create it and obtain its top-level interface. Components are identified by a CLSID and each interface is identified by an IID. For example, to create a DirectShow filter graph (used to build a series of filters) you call:
IGraphBuilder *pGraph;
CoCreateInstance(CLSID_FilterGraph, // object identifier
NULL, CLSCTX_INPROC,
IID_IGraphBuilder, // interface identifier
(void **)&pGraph); // pointer to the
// top-level interface
To request the other interfaces of this object, you use the QueryInterface method. For example:
pGraph->QueryInterface(
	IID_IMediaControl, // interface identifier
	(void **)&pMediaControl); // pointer to the interface
Once the filter graph is created, it becomes easy to create all the filters required to render an AVI file. This is
done by calling
pGraph->RenderFile(MediaFile, NULL);
This call does what the Render Pin option does in the GraphEdit application. To play the video, two more interfaces are required: IMediaControl, used to start the playback, and IMediaEvent, used to detect when the stream rendering has completed. Here is the complete class:
class SequenceProcessor {
	IGraphBuilder *pGraph;
public:
	SequenceProcessor(CString mediaFile, bool display= true) {
		CoInitialize(NULL);
		pGraph= 0;
		CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC,
			IID_IGraphBuilder, (void **)&pGraph);
		if (display) {
			BSTR name= mediaFile.AllocSysString();
			pGraph->RenderFile(name, NULL); // create the rendering filters
			SysFreeString(name);
		}
	}
	void run() { // play the sequence and wait until it completes
		IMediaControl *pControl; IMediaEvent *pEvent; long evCode;
		pGraph->QueryInterface(IID_IMediaControl, (void **)&pControl);
		pGraph->QueryInterface(IID_IMediaEvent, (void **)&pEvent);
		pControl->Run();
		pEvent->WaitForCompletion(INFINITE, &evCode);
		SAFE_RELEASE(pControl); SAFE_RELEASE(pEvent);
	}
	~SequenceProcessor() {
		SAFE_RELEASE(pGraph);
		CoUninitialize();
	}
};
When an AVI file is selected, a rendering filter is created and the sequence is displayed. To have an idea of
what filters have been created, we can enumerate them by adding the following member function to our
class:
std::vector<CString> enumFilters() {
	std::vector<CString> names;
	IEnumFilters *pEnum;
	IBaseFilter *pFilter;
	pGraph->EnumFilters(&pEnum);
	while (pEnum->Next(1, &pFilter, 0) == S_OK) {
		FILTER_INFO FilterInfo;
		pFilter->QueryFilterInfo(&FilterInfo);
		char szName[256];
		WideCharToMultiByte(CP_ACP, 0, FilterInfo.achName,
			-1, szName, 256, 0, 0);
		CString fname= szName;
		names.push_back(fname);
		SAFE_RELEASE(FilterInfo.pGraph);
		SAFE_RELEASE(pFilter);
	}
	SAFE_RELEASE(pEnum);
	return names;
}
This method simply creates a vector of strings (you have to include <vector>) containing the names of the filters associated with the generated filter graph. Each name is obtained by reading the FILTER_INFO structure. The enumeration is obtained by calling the EnumFilters method of the filter graph instance. Note how all interfaces are released, including the one indirectly obtained through FILTER_INFO, which also contains a pointer to the associated filter graph.
void CCvisionDlg::OnOpen()
{
	CFileDialog dlg(TRUE, _T("*.bmp"), "",
		OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
		"image files (*.bmp; *.jpg) |*.bmp;*.jpg|AVI files (*.avi) |*.avi|All Files (*.*)|*.*||", NULL);
	if (dlg.DoModal() == IDOK) {
		if (proc != 0)
			delete proc;
		if (procseq != 0)
			delete procseq;
		CString ext= dlg.GetFileExt();
		if (ext.Compare("avi")) { // not an AVI: process as a still image
			proc= new ImageProcessor(dlg.GetPathName());
		} else { // an AVI file: create the rendering filter graph
			procseq= new SequenceProcessor(dlg.GetPathName());
			std::vector<CString> names= procseq->enumFilters();
			m_list.ResetContent();
			for (int i=0; i<names.size(); i++)
				m_list.AddString(names[i]);
		}
	}
}
and now if you open an AVI file, you can see the filter list:
The next step is to build the same filter graph ourselves, without using the RenderFile method. Instead, we will create each filter and connect them together. This way, we will be able to modify the graph by adding our own filters and thus perform the processing we want. Filters are connected together using their pins: an output pin of a filter is connected to the input pin of the next filter. To obtain a pin of a filter, you have to use the EnumPins method; you then iterate through the pins until you find the required one (either output or input). This is what the following function does:
IPin* GetPin(IBaseFilter *pFilter, PIN_DIRECTION PinDir)
{
BOOL bFound = FALSE;
IEnumPins *pEnum;
IPin *pPin;
pFilter->EnumPins(&pEnum);
while(pEnum->Next(1, &pPin, 0) == S_OK)
{
PIN_DIRECTION PinDirThis;
pPin->QueryDirection(&PinDirThis);
if (bFound = (PinDir == PinDirThis))
break;
pPin->Release();
}
pEnum->Release();
return (bFound ? pPin : 0);
}
To add a filter (it must first be created) to the filter graph, we use the AddFilter method:
pGraph->AddFilter(pFilter, L"My Filter");
The second argument is a name for the filter that must identify it uniquely in the filter graph (if you set it to NULL, the graph manager will generate one for you). To connect two pins together, we simply use the Connect method:
pGraph->Connect(pOutputPin, pInputPin);
What filters do we need to display an AVI sequence? We know the answer from the results displayed in the filter list box or in the GraphEdit application:
Note that for some filters, the pins are created dynamically. This is the case for the AVI splitter, which creates the required output pins (video and/or audio) only when the source is connected to its input. This makes sense, since the output format of this kind of filter is known only when the type of its input is known. It should also be obvious that, to be connected together, the respective output and input pins of two filters must be of compatible types. The properties of a given pin (such as major type and subtype) can be obtained as follows:
AM_MEDIA_TYPE amt;
pPin->ConnectionMediaType(&amt);
The following member function will now create the complete filter graph. The procedure is simple: we create each filter using CoCreateInstance (finding the right CLSID identifier is the key to obtaining the filter we want), add it to the filter graph, obtain its input pin and connect it to the output pin of the previous filter.
int createFilterGraph() {
	// the source filter for the selected media file
	IBaseFilter *pSource;
	pGraph->AddSourceFilter(MediaFile, L"Source", &pSource);
	IPin *pSourceOut= GetPin(pSource, PINDIR_OUTPUT);
	// the AVI splitter
	IBaseFilter *pAVISplitter;
	CoCreateInstance(CLSID_AviSplitter, NULL, CLSCTX_INPROC_SERVER,
		IID_IBaseFilter, (void **)&pAVISplitter);
	pGraph->AddFilter(pAVISplitter, L"Splitter");
	IPin *pAVIsIn= GetPin(pAVISplitter, PINDIR_INPUT);
	if (!pAVIsIn) {
		::MessageBox( NULL,
			"Unable to obtain input splitter pin", "Error",
			MB_OK | MB_ICONINFORMATION );
		return 0;
	}
	pGraph->Connect(pSourceOut, pAVIsIn);
	IPin *pAVIsOut= GetPin(pAVISplitter, PINDIR_OUTPUT);
	if (!pAVIsOut) {
		::MessageBox( NULL,
			"Unable to obtain output splitter pin", "Error",
			MB_OK | MB_ICONINFORMATION );
		return 0;
	}
	// the AVI decompressor
	IBaseFilter *pAVIDec;
	CoCreateInstance(CLSID_AVIDec, NULL, CLSCTX_INPROC_SERVER,
		IID_IBaseFilter, (void **)&pAVIDec);
	pGraph->AddFilter(pAVIDec, L"Decoder");
	IPin *pAVIDecIn= GetPin(pAVIDec, PINDIR_INPUT);
	if (!pAVIDecIn) {
		::MessageBox( NULL,
			"Unable to obtain decoder input pin", "Error",
			MB_OK | MB_ICONINFORMATION );
		return 0;
	}
	pGraph->Connect(pAVIsOut, pAVIDecIn);
	IPin *pAVIDecOut= GetPin(pAVIDec, PINDIR_OUTPUT);
	if (!pAVIDecOut) {
		::MessageBox( NULL,
			"Unable to obtain decoder output pin",
			"Error", MB_OK | MB_ICONINFORMATION );
		return 0;
	}
	// render the decoded video (this creates the video renderer)
	pGraph->Render(pAVIDecOut);
	SAFE_RELEASE(pAVIDecIn);
	SAFE_RELEASE(pAVIDecOut);
	SAFE_RELEASE(pAVIDec);
	SAFE_RELEASE(pAVIsOut);
	SAFE_RELEASE(pAVIsIn);
	SAFE_RELEASE(pAVISplitter);
	SAFE_RELEASE(pSourceOut);
	SAFE_RELEASE(pSource);
	return 1;
}
By executing this manually built filter graph, the result is the same as previously.
It is now time to process an image sequence. What we want to do is to sequentially process each frame of an AVI sequence. To do so, the OpenCV library offers a special filter called ProxyTrans. It should be located in C:\Program Files\Intel\opencv\bin. Before it can be used, it must first be registered. This can be done from the MS-DOS window using the regsvr32 application (just type regsvr32 ProxyTrans.ax; you might have to include C:\Program Files\Intel\opencv\bin in your PATH environment variable).
To check if the ProxyTrans filter is ready to be used, we use again the GraphEdit application. Build a rendering filter graph and then delete the connection between the decompression filter and the video renderer (just click on the arrow and push the Delete button). Now select Graph|Insert Filters…; the ProxyTrans filter should be in the list of DirectShow filters. Insert it and connect its input pin to the decompressor and its output pin to the renderer. The sequence should appear again when you play the filter graph.
Obviously, if you play this graph, the resulting file will be the same as the original, because the ProxyTrans filter that is supposed to do the processing does not do anything for now. However, the size of the output sequence might differ from the size of the original one; this is because the compressor used in the graph might use different parameters to compress the sequence. You probably also noted that when you play the graph, no sequence is displayed, simply because we removed the renderer. It is quite easy to add an extra path to the graph in order to allow the simultaneous display and saving of the sequence; the Smart Tee is the filter you need. Add it and create the following graph:
As you can see in the figure above, the creation of a video processing filter graph requires connecting several filters together. Many lines would have to be added to our createFilterGraph method, and the probability of making an error then becomes quite high. However, a closer look at this method reveals that the same sequence of operations is repeated several times, suggesting that some generic function could be introduced to help the programmer.
This function can be written in a straightforward manner. First the filter is created using CoCreateInstance, then its input pin is obtained and connected to the specified output pin. Once this is done, the last step consists in obtaining the required number of output pins. The function is then as follows:
int addFilter(const GUID& filterCLSID, LPCWSTR filtername,
	IGraphBuilder *pGraph, IPin **outputPin, int numOutput= 1)
{
	char tmp[256];
	IBaseFilter *baseFilter;
	if(FAILED(CoCreateInstance(
		filterCLSID, NULL, CLSCTX_INPROC_SERVER,
		IID_IBaseFilter,
		(void**)&baseFilter)) ||!baseFilter)
	{
		sprintf(tmp,"Unable to create %ls filter", filtername);
		::MessageBox( NULL, tmp, "Error",
			MB_OK|MB_ICONINFORMATION );
		return 0;
	}
	pGraph->AddFilter(baseFilter, filtername);
	IPin *inputPin= GetPin(baseFilter, PINDIR_INPUT);
	if (!inputPin) {
		sprintf(tmp,
			"Unable to obtain %ls input pin", filtername);
		::MessageBox( NULL, tmp, "Error",
			MB_OK | MB_ICONINFORMATION );
		return 0;
	}
	if (FAILED(pGraph->Connect(*outputPin, inputPin))) {
		sprintf(tmp,
			"Unable to connect %ls filter", filtername);
		::MessageBox( NULL, tmp, "Error",
			MB_OK | MB_ICONINFORMATION );
		return 0;
	}
	SAFE_RELEASE(inputPin);
	SAFE_RELEASE(*outputPin);
	// obtain the required number of output pins
	for (int i=0; i<numOutput; i++) {
		outputPin[i]= 0;
		// n-th output pin (GetPin overloaded to return the n-th pin)
		outputPin[i]= GetPin(baseFilter, PINDIR_OUTPUT, i+1);
		if (!outputPin[i]) {
			sprintf(tmp,
				"Unable to obtain %ls output pin (%d)",
				filtername, i);
			::MessageBox( NULL, tmp, "Error",
				MB_OK | MB_ICONINFORMATION );
			return 0;
		}
	}
	SAFE_RELEASE(baseFilter);
	return 1;
}
Using this function, it becomes easy to create a complex filter graph. The one we will build now will include the ProxyTrans filter (note that the header file initguid.h must be included to be able to use this filter). To be useful, this filter must do something. In fact, its objective is to give the programmer access to each frame of the sequence, which can thus be processed. This is realized through a callback function that is automatically called for each frame of the sequence. This callback function receives as argument a pointer to the current image; the user is then free to analyze and modify this image. Here is an example of a valid callback function that can be used with the ProxyTrans filter:
void process(void* img) {
	IplImage* image= reinterpret_cast<IplImage*>(img);
	cvErode( image, image, 0, 1 ); // process the current frame
}
In order to have this function called, it must be registered with the ProxyTrans filter. This is simply done by calling the following method of the IProxyTransform interface:
pProxyTrans->set_transform(process, 0);
Here is now the function that creates the filter graph that processes an input sequence and saves the result in a file. Two preview windows are displayed, one for the original sequence, the other for the output sequence.
bool createFilterGraph() {
	IPin* pSourceOut[2];
	pSourceOut[0]= pSourceOut[1]= NULL;
	// Video source
	addSource(ifilename, pGraph, pSourceOut);
	// splitter, decompressor and processing filter
	addFilter(CLSID_AviSplitter, L"Splitter", pGraph, pSourceOut);
	addFilter(CLSID_AVIDec, L"Decoder", pGraph, pSourceOut);
	addFilter(CLSID_ProxyTransform, L"ProxyTrans", pGraph, pSourceOut);
	// a Smart Tee provides a capture output and a preview output
	addFilter(CLSID_SmartTee, L"Tee", pGraph, pSourceOut, 2);
	// preview path: render the second output
	pGraph->Render(pSourceOut[1]);
	// capture path: helper analogous to addSource that adds the
	// filters writing the sequence to the output file
	addOutputFile(ofilename, pGraph, pSourceOut);
	return 1;
}
You will note that the output file produced by this program is quite big. This is simply because we are not using any compressor when the sequence is saved; such a filter can only be obtained through enumeration, which is discussed in the next section.
Filters are registered COM objects made available by the operating system to your applications. Depending on the software installed on your machine, different sets of filters might be available. These filters are classified by category: there is, for example, a category identified by CLSID_VideoCompressorCategory that includes all the available compression filters. When you wish to use a filter of a given category, you must enumerate the available filters and select one of them.
To enumerate the filters, the first step consists in the creation of a system device enumerator:
ICreateDevEnum *pSysDevEnum;
CoCreateInstance(CLSID_SystemDeviceEnum, NULL,
	CLSCTX_INPROC_SERVER,
	IID_ICreateDevEnum,
	(void **)&pSysDevEnum);
An enumerator for the desired category is then obtained, and each filter of the category is represented by a moniker:
IEnumMoniker *pEnumCat;
pSysDevEnum->CreateClassEnumerator(
	CLSID_VideoCompressorCategory, &pEnumCat, 0);
IMoniker *pMoniker;
ULONG cFetched;
while(pEnumCat->Next(
	1, // number of elements requested
	&pMoniker, // pointer to the moniker
	&cFetched) // number of elements returned
	== S_OK)
{
	IPropertyBag *pPropBag;
	pMoniker->BindToStorage(0, 0, IID_IPropertyBag,
		(void **)&pPropBag);
	// process the moniker (see the complete function below)
}
Properties of a filter are obtained using the IPropertyBag interface. This generic interface is used to read and write properties as text. The moniker can also be used to create a filter:
IBaseFilter* baseFilter;
pMoniker->BindToObject(NULL, NULL,
	IID_IBaseFilter, (void**)&baseFilter);
This latter approach must be used to create an enumerated filter, instead of using the CoCreateInstance function. The function presented below can be used to obtain the available filters of a category. It returns the friendly name and the CLSID identifier of each filter; either can be used afterwards to create a given filter.
void enumFilters(const GUID& category,
	std::vector<CString>& fname, std::vector<CLSID>& fclsid)
{
	ICreateDevEnum *pSysDevEnum;
	CoCreateInstance(CLSID_SystemDeviceEnum, NULL, CLSCTX_INPROC_SERVER,
		IID_ICreateDevEnum, (void **)&pSysDevEnum);
	IEnumMoniker *pEnumCat;
	if (pSysDevEnum->CreateClassEnumerator(category, &pEnumCat, 0) == S_OK) {
		IMoniker *pMoniker;
		ULONG cFetched;
		while (pEnumCat->Next(1, &pMoniker, &cFetched) == S_OK) {
			IPropertyBag *pPropBag;
			pMoniker->BindToStorage(0, 0, IID_IPropertyBag,
				(void **)&pPropBag);
			VARIANT varName;
			VariantInit(&varName);
			HRESULT hr= pPropBag->Read(L"FriendlyName", &varName, 0);
			if (hr == S_OK) {
				fname.push_back(CString(varName.bstrVal));
				VariantClear(&varName);
			}
			VARIANT varFilterClsid;
			varFilterClsid.vt = VT_BSTR;
			hr= pPropBag->Read(L"CLSID", &varFilterClsid, 0);
			if(SUCCEEDED(hr))
			{
				CLSID clsidFilter;
				CLSIDFromString(varFilterClsid.bstrVal, &clsidFilter);
				fclsid.push_back(clsidFilter);
				SysFreeString(varFilterClsid.bstrVal);
			}
			// Clean up.
			pPropBag->Release();
			pMoniker->Release();
		}
		pEnumCat->Release();
	}
	pSysDevEnum->Release();
}
This function is used to select the compression filter to be used in our sequence processing application. The list of compression filters is displayed in a list box (shown after the output file has been selected):
void CCvisionDlg::OnSave()
{
// Select output file
std::vector<CString> fname;
std::vector<CLSID> fclsid;
enumFilters(CLSID_VideoCompressorCategory,
fname, fclsid);
m_list.ResetContent();
for (int i=0; i<fname.size(); i++)
m_list.AddString(fname[i]);
}
The compression filter is selected by clicking on the corresponding item before pushing the process button.
The sequence will then be saved, compressed according to the default control parameters of the chosen
compressor. What if you are not satisfied with the resulting compression rate? You can obviously try to
select another compression filter; however, it is also possible to use different control parameter values for the
chosen filter. This can be done through a special interface called IAMVideoCompression. This interface
is normally supported by the output pin of a compression filter. You can obtain the interface by calling the
QueryInterface method of the pin:
IAMVideoCompression *pCompress;
pPin->QueryInterface(IID_IAMVideoCompression,
(void**)&pCompress);
Once obtained, the interface can be used to set the compression properties, namely: the key frame rate (a long integer), the number of predicted frames per key frame (also a long integer), and the relative compression quality (a double expressing a percentage between 0.0 and 1.0). It is then easy to set these values using the appropriate methods, e.g.:
pCompress->put_KeyFrameRate(15);
pCompress->put_PFramesPerKeyFrame(4);
pCompress->put_Quality(0.80);
The same strategy can be used to select a video capture device (e.g. a USB camera). The only difference is that these devices obviously do not have input pins. However, they normally have two output pins (one for capture and one for preview). The basic steps to build a camera-based video processing filter graph are as follows. First, add the video capture device through enumeration:
CString cameraName= …; // the friendly name selected by the user
IPin* pSourceOut[2];
pSourceOut[0]= pSourceOut[1]= NULL;
addFilterByEnum(CLSID_VideoInputDeviceCategory,
cameraName,pGraph,pSourceOut,2);
addFilter(CLSID_ProxyTransform, L"ProxyTrans",
pGraph, pSourceOut);
And finally, you add the required filters to save the resulting sequence to a file:
The complete application that includes the camera selection is given here.
Now, to be able to change the camera settings (such as resolution or frame rate), you must access the facilities offered by the camera driver. The easiest way to do this is to use the old Video for Windows technology (an ancestor of DirectShow). If the camera you use has a driver compatible with this technology, then it is possible to obtain dialog boxes to control the camera settings. This is done through the IAMVfwCaptureDialogs interface of the camera filter. The first thing to do is then to check whether the camera supports this interface and, if so, which dialogs are available. The three standard dialogs are designated by an enumerated type: VfwCaptureDialog_Source, VfwCaptureDialog_Format, VfwCaptureDialog_Display. The procedure to obtain one of these dialogs is quite straightforward:
IAMVfwCaptureDialogs *pVfw = 0;