
This is a preview of Chapter 18, An Introduction to Thread, from the upcoming book Introduction to the C++ Boost Libraries, Volume I: Foundations.

18.1 Introduction and Objectives

In this chapter we discuss how to create C++ code that makes use of multi-core processors. In particular, we introduce the thread concept. A thread is a software entity that represents an independent unit of execution in a program. We design an application by creating threads and letting them execute separate parts of the program code, with the objective of improving the speedup of the application. We define the speedup as a number that indicates how many times faster a parallel program is than its serial equivalent. The formula for the speedup S(p) on a CPU with p processors is:

S(p) = T(1)/T(p)

where T(p) is the total execution time on a CPU with p processors. When the speedup equals the number of processors we say that the speedup is perfectly linear.

The improved performance of parallel programs comes at a price: we must ensure that the threads are synchronised so that they do not destroy the integrity of shared data. To this end, Boost.Thread has a number of synchronisation mechanisms that protect the program from data races and ensure that the code is thread-safe. We also show how to define locks on objects and data so that only one thread can update the data at any given time.

When working with Boost.Thread you should use the following header file:

Code (cpp):

#include <boost/thread.hpp>

and the following namespaces: Code (cpp):

using namespace boost;
using namespace boost::this_thread;

This chapter is a gentle introduction to multi-threading. We recommend that you also run the source code that accompanies the book to see how multi-threaded code differs from sequential code. In Volume II we use Boost.Thread as the basis for the implementation of parallel design patterns (Mattson 2005).

18.2 An Introduction to Threads

A process is a collection of resources that enables the execution of program instructions. Examples of resources are (virtual) memory, I/O descriptors, a run-time stack and signal handlers. It is possible to create a program that consists of a collection of cooperating processes. What is the structure of a process?

o A read-only area for program instructions.
o A read-write area for global data.
o A heap area for memory that we allocate dynamically using the new operator or the malloc system call.
o A stack where we store the automatic variables of the current procedure.

Processes have control over their resources. Processes can communicate with each other using IPC (Inter Process Communication) mechanisms and they can be seen as heavyweight units of execution. Context-switching between processes is expensive.

A thread is a lightweight unit of execution that shares an address space with other threads. A process has one or more threads, and the threads share the resources of the process. The execution context for a thread is the data address space that contains all variables in a program. This includes global variables, automatic variables in routines and dynamically allocated variables. Furthermore, each thread has its own stack within the execution context, so multiple threads can invoke their own routines without interfering with the stack frames of other threads.

18.3 The Life of a Thread

Each process starts with one thread, the master or main thread. Before a new thread can be used, it must be created. The main thread can have one or more child threads. Each thread executes independently of the other threads. What is happening in a thread after it has been created and before it no longer exists? A general answer is that it is either executing or not executing. The latter state may have several causes:

o It is sleeping.
o It is waiting on some other thread.
o It is blocked, that is, it is waiting on system resources to perform an input or output operation.

An application should make the best use of its threads because each thread may run on its own processor, and the presence of idle threads is synonymous with resource waste.

18.3.1 How Threads Communicate

A multi-threaded application consists of a collection of threads. Each thread is responsible for some particular task in the application. In order to avoid anarchy we need to address a number of important issues:

o Synchronisation: ensuring that an event in one thread notifies another thread. This is called event synchronisation; it signals the occurrence of an event among multiple threads. Another type of synchronisation is mutual exclusion, which gives a thread exclusive access to a shared variable or to some other resource for a certain amount of time. This ensures the integrity of the shared variable when multiple threads attempt to access and modify it. We place a lock on the resource; failure to do this may result in a race condition, which occurs when multiple threads share data and at least one of the threads accesses this data without using a defined synchronisation mechanism.
o Scheduling: we order the events in a program by imposing some kind of scheduling policy on them. In general, there are more concurrent tasks to be executed than there are processors to run them. The scheduler synchronises access to the different processors on a CPU; thus the scheduler determines which threads are currently executing on the available processors.

18.4 What Kinds of Applications are Suitable for Multi-Threading?

The main reasons for creating a multi-threaded application are performance and responsiveness. As such, threaded code does not add new functionality to a serial application, and there should be compelling reasons for using parallel programming techniques. In this section we give an overview of a number of issues to address when developing parallel applications (see Mattson 2005 and Nichols 1996). First, we give a list of criteria that help us determine the categories of applications that could benefit from parallel processing. Second, having determined that a given application should be parallelised, we discuss how to analyse and design the application with parallelism in mind.

18.4.1 Suitable Tasks for Multi-threading

The ideal situation is when we can design an application that consists of a number of independent tasks in which each task is responsible for its own input, processing and output. In practice, however, tasks are inter-dependent and we must take this into account. Concurrency is a property of software systems in which several computations execute simultaneously and potentially interact with each other. We maximise concurrency while we minimise the need for synchronisation. We identify a task as a candidate for threading based on the following criteria (Nichols 1996):

o Its degree of independence from other tasks. Does the task need results or data from other tasks, and do other tasks depend on its results? These questions determine the provide/require constraints between tasks. An analysis of these questions will lead us to questions concerning task dependencies and resource sharing.
o Does a task spend a long time in a suspended state, and is it blocked in potentially long waits? Tasks that consume resources are candidates for threads. For example, if we dedicate a thread to I/O operations then our program will run faster instead of having to wait for slow I/O operations to complete.
o Compute-intensive routines. In many applications we may be able to dedicate threads to tasks with time-consuming calculations. Examples of such calculations are array processing, matrix manipulation and random number generation.

18.5 The Boost thread Class

This class represents a thread. It has member functions for creating threads, firing up threads, thread synchronisation and notification, and changing thread state. We discuss the functionality of the thread class in this chapter. There are three constructors in thread:

o The default constructor.
o A constructor that takes an instance of a callable type (which can be a function object, or a global or static function) as argument. This function is run when the thread fires up, that is, after thread creation.
o A constructor that takes a callable type together with the arguments to bind to it.

We now discuss the second option. The callable type, which plays the role of the thread function, can be a free function, a static member function or a function object. The thread function has a void return type and when it has finished the thread that called it will stop executing. We now discuss some code to show how to create a simple multi-threaded program. There are two threads, namely the main thread (in the main() function) and a thread that we explicitly create in main(). The program is simple: each thread prints some text on the console. The first case is when we create a thread whose thread function is a free (global) function: Code (cpp):


// Global function called by thread void GlobalFunction() { for (int i=0; i<10; ++i) { cout<< i << "Do something in parallel with main method." << endl; boost::this_thread::yield(); // 'yield' discussed in section 18.6 } }

We now create a thread with GlobalFunction() as thread function and we fire the thread up: Code (cpp):

int main()
{
    boost::thread t(&GlobalFunction);

    for (int i=0; i<10; i++)
    {
        cout << i << " Do something in main method." << endl;
    }

    return 0;
}

Each thread prints information on the console. There is no coordination between the threads and you will get different output each time you run the program; the output depends on the thread scheduler. You can run the program and view the output. We now discuss how to create a thread whose thread function is a static member function of a class that is also a functor. Code (cpp):


class CallableClass { private: // Number of iterations int m_iterations; public: // Default constructor CallableClass() { m_iterations=10; } // Constructor with number of iterations CallableClass(int iterations) { m_iterations=iterations; } // Copy constructor CallableClass(const CallableClass& source) { m_iterations=source.m_iterations; } // Destructor ~CallableClass() { } // Assignment operator


CallableClass& operator = (const CallableClass& source) { m_iterations=source.m_iterations; return *this; } // Static function called by thread static void StaticFunction() { for (int i=0; i < 10; i++) // Hard-coded upper limit { cout<<i<<"Do something in parallel (Static function)."<<endl; boost::this_thread::yield(); // 'yield' discussed in section 18.6 } } // Operator() called by the thread void operator () () { for (int i=0; i<m_iterations; i++) { cout<<i<<" - Do something in parallel (operator() )."<<endl; boost::this_thread::yield(); // 'yield' discussed in section 18.6 } } };

We can now create threads based on the static member function StaticFunction() and on the fact that CallableClass is a function object: Code (cpp):


int main() { boost::thread t(&CallableClass::StaticFunction); for (int i=0; i<10; i++) { cout<<i<<" - Do something in main method."<<endl; } return 0; }

and

Code (cpp):


int main() { // Using a callable object as thread function int numberIterations = 20; CallableClass c(numberIterations); boost::thread t(c); for (int i=0; i<10; i++) { cout<< i <<" Do something in main method." << endl; } return 0; }

Finally, when a thread's destructor is called the thread of execution becomes detached and no longer has an associated boost::thread object. In other words, the member function detach() is called while the thread function continues running.

18.6 The Life of a Thread

In general, a thread is either doing something (running its thread function) or doing nothing (wait or sleep mode). The state transition diagram is shown in Figure 18.1. The scheduler is responsible for some of the transitions between states:

o Running: the thread has been created and is already started or is ready to start (this is a runnable state). The scheduler has allocated processor time for the thread.
o WaitSleepJoin: the thread is waiting for an event to trigger. The thread will be placed in the Running state when this event triggers.
o Stopped: the thread function has run its course (has completed).

Figure 18.1 Thread Lifecycle

We now discuss some of the member functions that appear in Figure 18.1. First, we note that multi-tasking is not guaranteed to be preemptive, and this can result in performance degradation because a thread can be involved in a computationally intensive algorithm. Preemption is the ability of the operating system to stop a running thread in favour of another thread. In order to give other threads a chance to run, a running thread may voluntarily give up, or yield, control; the yielding thread is rescheduled as soon as possible. For example, we use the global function yield() in the boost::this_thread namespace. As an example, consider a callable object that computes the powers of numbers (this class could be adapted to compute powers of very large matrices, which would constitute a computationally intensive algorithm): Code (cpp):


class PowerClass {


private: // Version II: m will be a large matrix int m, n; // Variables for m^n public: double result; // Public data member for the result // Constructor with arguments PowerClass(int m, int n) { this->m=m; this->n=n; this->result=0.0; } // Calculate m^n. Supposes n>=0 void operator () () { result=m; // Start with m^1 for (int i=1; i<n; ++i) { result*=m; // result=result*m boost::this_thread::yield(); } if (n==0) result=1; // m^0 is always 1 } };

A thread can put itself to sleep for a certain duration (the units can be a POSIX time duration, hours, minutes, seconds, milliseconds or nanoseconds). We use the sleep option when we wish to give other threads a chance to run and for tasks that fire at regular intervals. The main difference is that with yield the thread gets the processor as soon as possible. We give an example to show how to put a thread to sleep. We simulate an animation application by creating a thread whose thread function displays some information, then sleeps only to be awoken again by the scheduler at a later stage. The thread function is modelled as AnimationClass: Code (cpp):


class AnimationClass { private: boost::thread* m_thread; // The thread runs this object int m_frame; // The current frame number // Variable that indicates to stop and the mutex to // synchronise "must stop" on (mutex explained later) bool m_mustStop; boost::mutex m_mustStopMutex;

public:
    // Default constructor
    AnimationClass()
    {
        m_thread=NULL;
        m_mustStop=false;
        m_frame=0;
    }

    // Destructor
    ~AnimationClass()
    {
        if (m_thread!=NULL) delete m_thread;
    }

    // Start the thread
    void Start()
    {
        // Create thread and start it with myself as argument.
        // Pass myself as reference since I don't want a copy
        m_thread=new boost::thread(boost::ref(*this));
    }

    // Stop the thread
    void Stop()
    {
        // Signal the thread to stop (thread-safe)
        m_mustStopMutex.lock();
        m_mustStop=true;
        m_mustStopMutex.unlock();

        // Wait for the thread to finish.
        if (m_thread!=NULL) m_thread->join();
    }

    // Display next frame of the animation
    void DisplayNextFrame()
    {
        // Simulate next frame
        cout<<"Press <RETURN> to stop. Frame: "<<m_frame++<<endl;
    }

    // Thread function
    void operator () ()
    {
        bool mustStop;

        do


{ // Display the next animation frame DisplayNextFrame(); // Sleep for 40ms (25 frames/second). boost::this_thread::sleep(boost::posix_time::millisec(40)); // Get the "must stop" state (thread-safe) m_mustStopMutex.lock(); mustStop=m_mustStop; m_mustStopMutex.unlock(); } while (mustStop==false); } };

Note that the boost thread is created in the Start() function, passing the object itself as thread function. This function object will loop until we call Stop(); in the current case the main() function calls this member function. The code corresponding to the second thread is: Code (cpp):


int main() { // Create and start the animation class AnimationClass ac; ac.Start(); // Wait for the user to press return getchar(); // Stop the animation class cout << "Animation stopping..." << endl; ac.Stop(); cout << "Animation stopped." << endl; return 0; }

We note the presence of the variable boost::mutex m_mustStopMutex and the calls to lock() and unlock() on that variable in the Stop() and operator() functions. We discuss mutexes in section 18.7. The next question is: how does a thread wait on another thread before proceeding? The answer is to use join() (wait for a thread to finish) or timed_join() (wait a certain amount of time for a thread to finish). The effect in both cases is to put the calling thread into the WaitSleepJoin state. It is used when we need to wait on the result of a lengthy calculation. To give an example, we revisit the class PowerClass and use it in main() as follows:

Code (cpp):


int main() { int m=2; int n=200; // Create a m^n calculation object PowerClass pc(m, n); // Create thread and start m^n calculation in parallel // Since we read the result from pc, we must pass it as reference, // else the result will be placed in a copy of pc boost::thread t(boost::ref(pc)); // Do calculation while the PowerClass is calculating m^n double result=m*n; // Wait till the PowerClass is finished // Leave this out and the result will be bogus t.join(); // Display result. cout << "(" << m << "^" << n << ") / (" << m << "*" << n << ") = "<<pc.result/result<<endl; }

Here we see that the main thread does some calculations and then waits until the computationally intensive thread function in PowerClass has completed.

18.7 Basic Thread Synchronisation

One of the key concerns when writing multi-threaded code is determining how to organise threads in such a way that access to shared data is done in a controlled manner. This is because the order in which threads access data is non-deterministic, and this can lead to inconsistent results, called race conditions. A classic example is when two threads attempt to withdraw funds from an account at the same time. The steps in a sequential program to perform this transaction are:

1. Check the balance (are there enough funds in the account?).
2. Give the amount to withdraw.
3. Commit the transaction and update the account.

When there are two threads involved, steps 1, 2 and 3 will be interleaved, which means the threads can update data in a non-deterministic way. For example, the scenario in Figure 18.2 shows that after withdrawing 70 and 90 money units the balance is -60 money units, which destroys the invariant condition, which states in this case that the balance may never become negative. Why did this transaction go wrong?
Thread 1               Thread 2               balance
if (balance-70>=0)                            100
                       if (balance-90>=0)     100
balance-=70                                   30
                       balance-=90            -60

Figure 18.2 Thread Synchronisation

The solution is to ensure that steps 1, 2 and 3 constitute an atomic transaction, by which we mean that only one thread at a time can execute them, holding a lock for the duration. Boost.Thread has a number of classes for thread synchronisation. The first class is called mutex (mutual exclusion) and it allows us to define a lock on a code block and release the lock when the thread has finished executing the code block. To do this, we create an Account class containing an embedded mutex: Code (cpp):


class Account { private: // The mutex to synchronise on boost::mutex m_mutex; // more... };

We now give the code for withdrawing funds from an account. Notice the thread-unsafe version (which can lead to race conditions) and the thread-safe version using mutex: Code (cpp):


// Withdraw an amount (not synchronized). Scary! void Withdraw(int amount) { if (m_balance-amount>=0) { // For testing we now give other threads a chance to run boost::this_thread::sleep(boost::posix_time::seconds(1)); m_balance-=amount; } else throw NoFundsException(); } // Withdraw an amount (locking using mutex object) void WithdrawSynchronized(int amount) { // Acquire lock on mutex. // If lock already locked, it waits till unlocked


m_mutex.lock(); if (m_balance-amount>=0) { // For testing we now give other threads a chance to run boost::this_thread::sleep(boost::posix_time::seconds(1)); m_balance-=amount; } else { // Release lock on mutex. Forget this and it will hang m_mutex.unlock(); throw NoFundsException(); } // Release lock on mutex. Forget this and it will hang m_mutex.unlock(); }

Only one thread holds the lock at any time. If another thread tries to lock a mutex that is already locked it will enter the WaitSleepJoin state. Summarising, only one thread can hold a lock on a mutex, and the code following the call to mutex.lock() can only be executed by one thread at a given time. A major disadvantage of using mutex directly is that the system will deadlock (hang) if you forget to call mutex.unlock(). For this reason we use the unique_lock<Lockable> adapter class that locks a mutex in its constructor and unlocks the mutex in its destructor. The new version of the withdraw member function is: Code (cpp):


// Withdraw an amount (locking using unique_lock)
void WithdrawSynchronized2(int amount)
{
    // Acquire lock on mutex. Will be automatically unlocked
    // when lock is destroyed at the end of the function
    boost::unique_lock<boost::mutex> lock(m_mutex);

    if (m_balance-amount>=0)
    {
        // For testing we now give other threads a chance to run
        boost::this_thread::sleep(boost::posix_time::seconds(1));
        m_balance-=amount;
    }
    else throw NoFundsException();
} // Mutex automatically unlocked here

Note that it is not necessary to unlock the mutex in this case.

18.8 Thread Interruption

A thread that is in the WaitSleepJoin state can be interrupted by another thread, which results in the former thread transitioning into the Running state. To interrupt a thread we call the thread member function interrupt(); an exception of type thread_interrupted is then thrown in the interrupted thread. We note that interrupt() only works when the thread is in the WaitSleepJoin state. If the thread never enters this state, you should call boost::this_thread::interruption_point() to specify a point where the thread can be interrupted. The following function contains a defined interruption point: Code (cpp):


// The function that will be run by the thread
void ThreadFunction()
{
    // Never ending loop. Normally the thread will never finish
    while(true)
    {
        try
        {
            // Interruption can only occur in a wait/sleep or join operation.
            // If the thread never performs one, call interruption_point().
            // Remove this line, and the thread will never be interrupted.
            boost::this_thread::interruption_point();
        }
        catch(const boost::thread_interrupted&)
        {
            // Thread interruption request received, break the loop
            cout<<"- Thread interrupted. Exiting thread."<<endl;
            break;
        }
    }
}

We now use this function in a test program; in this case we start a thread with ThreadFunction() as thread function. We let it run and then we interrupt it. Code (cpp):

int main()
{
    // Create and start the thread
    boost::thread t(&ThreadFunction);

    // Wait 2 seconds for the thread to finish
    cout<<"Wait for 2 seconds for the thread to stop."<<endl;
    while (t.timed_join(boost::posix_time::seconds(2))==false)
    {
        // Interrupt the thread
        cout<<"Thread not stopped, interrupt it now."<<endl;
        t.interrupt();
        cout<<"Thread interrupt request sent. ";
        cout<<"Wait to finish for 2 seconds again."<<endl;
    }

    // The thread has been stopped
    cout<<"Thread stopped"<<endl;
}

18.9 Thread Notification

In some cases a thread A needs to wait for another thread B to perform some activity. Boost.Thread provides an efficient way for thread notification:

o wait(): thread A releases the lock when wait() is called; A then sleeps until another thread B calls notify().
o notify(): signals a change in an object related to thread B. One waiting thread (in this case A) then wakes up after the lock has been released.
o notify_all(): this has the same intent as notify() except that all waiting threads wake up.

We shall see examples of this mechanism when we discuss synchronising queues and the Producer-Consumer pattern in section 18.12.

18.10 Thread Groups

Boost.Thread contains the class thread_group that supports the creation and management of a group of threads as one entity. The threads in the group are related in some way. The functionality is:

o Create a new thread group with no threads.
o Delete all threads in the group.
o Create a new thread and add it to the group.
o Remove a thread from the group without deleting the thread.
o join_all(): call join() on each thread in the group.
o interrupt_all(): call interrupt() on each thread object in the group.
o size(): return the number of threads in the group.

We shall give an example of how to use this class when we discuss the Producer-Consumer pattern, in which a producer group writes data (enqueues) to a synchronised queue while threads in a consumer group extract (dequeue) the data from this queue.

18.11 Shared Queue Pattern

This pattern is a specialisation of the Shared Data pattern (Mattson 2005). It is a thread-safe wrapper for the STL queue<T> container. It is a blocking queue: a thread wishing to dequeue data will go into sleep mode if the queue is empty and will only resume when it receives a notify() from another thread, which implies that new data is in the queue. While the thread waits, the lock is automatically released, and waiting threads are notified using a condition variable. A condition variable provides a way of naming an event in which threads have a general interest. The interface is: Code (cpp):


// Queue class that has thread synchronisation template <typename T> class SynchronisedQueue


{ private: std::queue<T> m_queue; // Use STL queue to store data boost::mutex m_mutex; // The mutex to synchronise on boost::condition_variable m_cond; // The condition to wait for public: // Add data to the queue and notify others void Enqueue(const T& data) { // Acquire lock on the queue boost::unique_lock<boost::mutex> lock(m_mutex); // Add the data to the queue m_queue.push(data); // Notify others that data is ready m_cond.notify_one(); } // Lock is automatically released here // Get data from the queue. Wait for data if not available T Dequeue() { // Acquire lock on the queue boost::unique_lock<boost::mutex> lock(m_mutex); // When there is no data, wait till someone fills it. // Lock is automatically released in the wait and obtained // again after the wait while (m_queue.size()==0) m_cond.wait(lock); // Retrieve the data from the queue T result=m_queue.front(); m_queue.pop(); return result; } // Lock is automatically released here };

We now use this class as a data container in the Producer-Consumer pattern.

18.12 The Producer-Consumer Pattern

This pattern is useful in a variety of situations and has many applications (POSA 1996, Mattson 2005, GOF 1995). In general, one or more producer agents write information to a synchronised queue while one or more consumer agents extract information from the queue. It is possible to extend the pattern to support multiple queues. The Producer-Consumer pattern is depicted in Figure 18.3.

Figure 18.3 Producer-Consumer Pattern

We create a producer class as follows: Code (cpp):


// Class that produces objects and puts them in a queue class Producer { private: int m_id; // The id of the producer SynchronisedQueue<string>* m_queue; // The queue to use public: // Constructor with id and the queue to use Producer(int id, SynchronisedQueue<string>* queue) { m_id=id; m_queue=queue; } // The thread function fills the queue with data void operator () () { int data=0; while (true) { // Produce a string and store in the queue string str = "Producer: " + IntToString(m_id) + " data: " + IntToString(data++); m_queue->Enqueue(str); cout<<str<<endl; // Sleep one second boost::this_thread::sleep(boost::posix_time::seconds(1)); } } };

Similarly, the interface for the consumer class is given by: Code (cpp):


// Class that consumes objects from a queue
class Consumer
{
private:
    int m_id;                           // The id of the consumer
    SynchronisedQueue<string>* m_queue; // The queue to use

public:
    // Constructor with id and the queue to use.
    Consumer(int id, SynchronisedQueue<string>* queue)
    {
        m_id=id;
        m_queue=queue;
    }

    // The thread function reads data from the queue
    void operator () ()
    {
        while (true)
        {
            // Get the data from the queue and print it
            cout<<"Consumer "<<IntToString(m_id)
                <<" consumed: "<<m_queue->Dequeue()<<endl;

            // Make sure we can be interrupted
            boost::this_thread::interruption_point();
        }
    }
};

Finally, the following code creates thread groups for producers and consumers using the thread-group class: Code (cpp):


#include "Producer.hpp" #include "Consumer.hpp" using namespace std; int main() { // Display the number of processors/cores cout<<boost::thread::hardware_concurrency() <<" processors/cores detected."<<endl<<endl; cout<<"When threads are running, press enter to stop"<<endl; // The number of producers/consumers


int nrProducers, nrConsumers; // The shared queue SynchronisedQueue<string> queue; // Ask the number of producers cout<<"How many producers do you want? : "; cin>>nrProducers; // Ask the number of consumers cout<<"How many consumers do you want? : "; cin>>nrConsumers; // Create producers boost::thread_group producers; for (int i=0; i<nrProducers; i++) { Producer p(i, &queue); producers.create_thread(p); } // Create consumers boost::thread_group consumers; for (int i=0; i<nrConsumers; i++) { Consumer c(i, &queue); consumers.create_thread(c); } // Wait for enter (two times because the return from the // previous cin is still in the buffer) getchar(); getchar(); // Interrupt the threads and stop them producers.interrupt_all(); producers.join_all(); consumers.interrupt_all(); consumers.join_all(); }

18.13 Thread Local Storage We know that global data is shared between threads. In some cases we may wish to give each thread its own copy of global data. To this end, we use boost::thread_specific_ptr<T>, a pointer to the data (initially set to NULL). Each thread must initialise this pointer by calling reset(), and subsequently the data can be accessed by dereferencing the pointer. The data is automatically deleted when the thread exits. Here is an example of a thread function that defines its own copy of global data: Code (cpp):


// Global data. Each thread has its own value
boost::thread_specific_ptr<int> threadLocalData;


// Callable function
void CallableFunction(int id)
{
    // Initialise thread local data (for the current thread)
    threadLocalData.reset(new int);
    *threadLocalData = 0;

    // Do this a number of times
    for (int i = 0; i < 5; i++)
    {
        // Print value of global data and increase value
        cout << "Thread: " << id << " - Value: "
             << (*threadLocalData)++ << endl;

        // Wait one second
        boost::this_thread::sleep(boost::posix_time::seconds(1));
    }
}

We now initialise the main thread's copy of the global data, create a thread group, and add a number of threads to it, each with its own copy of the global data: Code (cpp):


int main()
{
    // Initialise thread local data (for the main thread)
    threadLocalData.reset(new int);
    *threadLocalData = 0;

    // Create threads and add them to the thread group
    boost::thread_group threads;
    for (int i = 0; i < 3; i++)
    {
        boost::thread* t = new boost::thread(&CallableFunction, i);
        threads.add_thread(t);
    }

    // Wait till they are finished
    threads.join_all();

    // Display thread local storage value, should still be zero
    cout << "Main - Value: " << (*threadLocalData) << endl;

    return 0;
}

18.14 Summary and Conclusions We have included a chapter on multi-threading using Boost.Thread. It is now possible to create parallel applications in C++. We see a future for multi-tasking and multi-threading applications and for this reason we decided to give an introduction to the most important functionality in this library. Boost.Thread contains low-level operations or implementation mechanisms that we use to design and implement multithreaded applications. It contains the building blocks that can be used with parallel design patterns (see Mattson 2005). We summarise the main steps in the process of creating a multithreaded application:

1. Finding Concurrency: we decide if a problem is a suitable candidate for a parallel solution. System decomposition based on tasks or data allows us to find potentially concurrent tasks and their dependencies. In particular, we need a way of grouping tasks and ordering the groups in order to satisfy temporal constraints.
2. An initial design is produced.
3. Algorithm Structure Design: we elaborate the initial model in order to move it closer to a program. We pay attention to forces such as efficiency, simplicity, portability and scalability. The algorithm structure is determined by tasks on the one hand or by data on the other. Examples of high-level algorithms are Divide and Conquer, Geometric Decomposition and Pipeline.
4. Supporting Structures Design: in this phase we decide how to model program structure and shared data. For example, the program could be designed as an SPMD (Single Program Multiple Data), Master Worker or Loop Parallelism pattern. Possible data structures are Shared Data and Shared Queue, whose implementation we discussed in section 18.11.
5. Implementation Mechanisms: in this phase we deploy the functionality of Boost.Thread to implement the design. We discuss this process and its applications in Volume II.

Concurrent programming using the extremely popular Boost libraries is a lot of fun. Boost has several libraries within the concurrent programming space: the Interprocess (IPC) library for shared memory, memory-mapped I/O, and message queues; the Thread library for portable multi-threading; the Message Passing Interface (MPI) library for message passing, which finds use in distributed computing; and the Asio library for portable networking using sockets and other low-level functions, to name just a few. This article introduces the IPC and MPI libraries along with some of the functionality they offer. In this article, you learn how to use the Boost IPC library to implement shared memory objects, message queues, and synchronized file locking. Using the Boost MPI library, you learn about the environment and communicator classes and how you can achieve distributed communication. Note: The code in this article was tested using the gcc-4.3.4 and boost-1.45 packages. Frequently used acronyms:

API: Application programming interface
I/O: Input/output
POSIX: Portable Operating System Interface for UNIX
SDK: Software development kit

Using the Boost IPC library

Boost Interprocess is a header-only library, so all you need to do is include the appropriate header in your sources and make the compiler aware of the include path. This is a nifty feature to have; you just download the Boost sources (see Resources for a link), and you're ready to get started. For example, to use shared memory in your code, use the include shown in Listing 1.

Listing 1. The Boost IPC library is a header-only affair


#include <boost/interprocess/shared_memory_object.hpp>

using namespace boost::interprocess;
// your sources follow

When passing this information to the compiler, adjust the include path as appropriate for your installation. Then, compile the code:
bash-4.1$ g++ ipc1.cpp -I../boost_1_45_0

Create a shared memory object Let's begin with the customary "Hello World!" program. You have two processes: The first writes the string "Hello World!" into shared memory, and the latter reads and displays the string. Create your shared memory object as shown in Listing 2.

Listing 2. Creating the shared memory object


#include <boost/interprocess/shared_memory_object.hpp>

int main(int argc, char* argv[ ])
{
    using namespace boost::interprocess;
    try {
        // creating our first shared memory object
        shared_memory_object sharedmem1(create_only, "Hello", read_write);

        // setting the size of the shared memory
        sharedmem1.truncate(256);

        // more code follows
    } catch (interprocess_exception& e) {
        // ... clean up
    }
}

The object sharedmem1 is of type shared_memory_object (declared and defined in Boost headers) and takes three arguments in its constructor:

The first argument, create_only, means that this shared memory object is to be created and has not already been created. If a shared object with the same name already exists, an exception is thrown. A process that wants access to an already-created shared memory object should pass open_only as the first argument.

The second argument, "Hello", is the name of the shared memory region. Another process that accesses this shared memory uses this name for the access. The third argument, read_write, is the access specifier of the shared memory object. Because this process modifies the contents of the shared memory object, you use read_write. A process that only reads from this shared memory uses the read_only specifier for access.

The truncate method sets the size of the shared memory in bytes. The code should ideally be wrapped in try/catch blocks; for example, if the shared memory object cannot be created, an exception of type boost::interprocess::interprocess_exception is thrown. Using the shared memory object for writing For a process to use a shared memory object, the process has to map the object into its address space. The mapping is done using the mapped_region class, declared and defined in the header mapped_region.hpp. Another benefit of using mapped_region is that both full and partial access to the shared memory object is possible. Listing 3 shows how to use your mapped_region.

Listing 3. Using mapped_region to access shared memory objects


#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>

int main(int argc, char* argv[ ])
{
    using namespace boost::interprocess;
    try {
        // creating our first shared memory object
        shared_memory_object sharedmem1(create_only, "Hello", read_write);

        // setting the size of the shared memory
        sharedmem1.truncate(256);

        // map the shared memory into the current process
        mapped_region mmap(sharedmem1, read_write);

        // access the mapped region using get_address
        std::strcpy(static_cast<char*>(mmap.get_address()), "Hello World!\n");
    } catch (interprocess_exception& e) {
        // ... clean up
    }
}

That's about it, really. You have created your mapped_region object and accessed it using the get_address method. The static_cast has been done, because get_address returns a void*. What happens to the shared memory when main exits? The shared memory is not deleted when the process exits. To delete shared memory, you need to call shared_memory_object::remove. The access mechanism for process 2 is simple enough: Listing 4 proves this point.

Listing 4. Accessing the shared memory object from the second process
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <cstdlib>
#include <iostream>

int main(int argc, char* argv[ ])
{
    using namespace boost::interprocess;
    try {
        // opening an existing shared memory object
        shared_memory_object sharedmem2(open_only, "Hello", read_only);

        // map shared memory object in current address space
        mapped_region mmap(sharedmem2, read_only);

        // need to type-cast since get_address returns void*
        char* str1 = static_cast<char*>(mmap.get_address());
        std::cout << str1 << std::endl;
    } catch (interprocess_exception& e) {
        std::cout << e.what() << std::endl;
    }
    return 0;
}

In Listing 4, you create the shared memory object using the open_only and read_only attributes. If the shared memory object cannot be found, an exception is thrown. Now, build and run the code in Listing 3 and Listing 4. You should see "Hello World!" in your terminal. Next, add the following lines in the code for the second process (Listing 4) just after std::cout, and rebuild the code:
// std::cout code here
shared_memory_object::remove("Hello");
// } catch (interprocess_exception& e) {

Execute the code twice in succession. The second run prints the line "No such file or directory," confirming that the shared memory has been deleted.

Interprocess communication using message queue Now, let's explore another popular mechanism for interprocess communication: the message queue. Each communicating process may add messages to the queue and read messages from the queue. The message queue comes with the following properties:

It has a name, and processes access it using the given name.

During queue creation, the user must specify the maximum length of the queue and the maximum size of an individual message. The queue is persistent, which means that it remains in memory when the process that created it dies. The queue may be removed using an explicit call to boost::interprocess::message_queue::remove.

Listing 5 shows a code snippet in which a process has created a message queue of 20 integers.

Listing 5. Creating a message queue of 20 integers


#include <boost/interprocess/ipc/message_queue.hpp>
#include <iostream>

int main(int argc, char* argv[ ])
{
    using namespace boost::interprocess;
    try {
        // creating a message queue
        message_queue mq(create_only,   // only create
                         "mq",          // name
                         20,            // max message count
                         sizeof(int));  // max message size

        // more code follows
    } catch (interprocess_exception& e) {
        std::cout << e.what() << std::endl;
    }
}

Note the create_only attribute passed in the constructor for message_queue. Similar to the case for a shared memory object, a message queue that is opened only for reading will have the open_only attribute passed in the constructor. Sending and receiving data On the sending side, you use the send method of the queue to add data. The send method signature has three inputs: a pointer to the raw data (void*), the size of the data, and a priority. For now, send all the numbers with the same priority. Listing 6 shows the code.

Listing 6. Sending messages to the queue


#include <boost/interprocess/ipc/message_queue.hpp>
#include <iostream>

int main(int argc, char* argv[ ])
{
    using namespace boost::interprocess;
    try {
        // creating a message queue
        message_queue mq(create_only,   // only create
                         "mq",          // name
                         20,            // max message count
                         sizeof(int));  // max message size

        // now send the messages to the queue
        for (int i = 0; i < 20; ++i)
            mq.send(&i, sizeof(int), 0); // the 3rd argument is the priority
    } catch (interprocess_exception& e) {
        std::cout << e.what() << std::endl;
    }
}

On the receiving side, the queue takes in the open_only attribute. The individual messages are obtained from the queue by calling the receive method of the message_queue class. Listing 7 shows the receive method signature.

Listing 7. Method signature for message_queue::receive


void receive(void*        buffer,
             std::size_t  buffer_size,
             std::size_t& recvd_size,
             unsigned int& priority);

Let's decipher this a bit. The first argument is where the received data from the queue will be stored. The second argument is the expected size of the received data. The third argument is the actual size of the data received, and the fourth argument is the priority of the received message. Clearly, if the second and third arguments turn out to be unequal during the course of program execution, that's an error. Listing 8 provides the code for the receiver process.

Listing 8. Receiving messages from the message queue


#include <boost/interprocess/ipc/message_queue.hpp>
#include <iostream>

int main(int argc, char* argv[ ])
{
    using namespace boost::interprocess;
    try {
        // opening the message queue whose name is mq
        message_queue mq(open_only, // only open
                         "mq");     // name

        size_t recvd_size;
        unsigned int priority;

        // now receive the messages from the queue
        for (int i = 0; i < 20; ++i) {
            int buffer;
            mq.receive((void*) &buffer, sizeof(int), recvd_size, priority);
            if (recvd_size != sizeof(int))
                ; // do the error handling
            std::cout << buffer << " " << recvd_size << " " << priority;
        }
    } catch (interprocess_exception& e) {
        std::cout << e.what() << std::endl;
    }
}

That was reasonably simple. Note that you still have not removed the message queue from memory; much like the shared memory object, this queue is persistent. For removing the queue, add the following line whenever you are done using the queue:
message_queue::remove("mq"); // remove the queue using its name

Message priority Make the modification shown in Listing 9 on the sending side. The receiver code needs no changes.

Listing 9. Changing the priority of messages


message_queue::remove("mq"); // remove the old queue
message_queue mq(create_only, "mq", 20, sizeof(int)); // create as before

for (int i = 0; i < 20; ++i)
    mq.send(&i, sizeof(int), i % 2); // the 3rd argument is the priority
// rest as usual

On re-running the code, you should see the output provided in Listing 10.

Listing 10. The output as seen in the receiving process


1 4 1 3 4 1 5 4 1 7 4 1 9 4 1 11 4 1 13 4 1 15 4 1 17 4 1 19 4 1 0 4 0 2 4 0 4 4 0 6 4 0 8 4 0 10 4 0 12 4 0 14 4 0 16 4 0 18 4 0

Higher-priority messages will be available for removal by the second process, as Listing 10 confirms.

Synchronized access to a file Shared memory and message queues are fine, but file I/O is also an important tool that processes use to communicate with each other. Synchronizing the file accesses that concurrent processes use to communicate is not an easy task, but the file-locking capability of the Boost IPC library does make life simpler. Before any further explanation, look at Listing 11 to understand how a file_lock object works.

Listing 11. Using a file_lock object for synchronizing file accesses


#include <fstream>
#include <iostream>
#include <boost/interprocess/sync/file_lock.hpp>
#include <cstdlib>

int main()
{
    using namespace boost::interprocess;
    std::string fileName("test");
    std::fstream file;

    file.open(fileName.c_str(),
              std::ios::out | std::ios::binary | std::ios::trunc);
    if (!file.is_open() || file.bad()) {
        std::cout << "Open failed" << std::endl;
        exit(-1);
    }

    try {
        file_lock f_lock(fileName.c_str());
        f_lock.lock();
        std::cout << "Locked in Process 1" << std::endl;
        file.write("Process 1", 9);
        file.flush();
        f_lock.unlock();
        std::cout << "Unlocked from Process 1" << std::endl;
    } catch (interprocess_exception& e) {
        std::cout << e.what() << std::endl;
    }

    file.close();
    return 0;
}

This code first opens a file, then locks it using file_lock. On completion of the writing, it flushes the file buffers and unlocks the file. You use the lock method to gain exclusive access to the file. If there's another process that is also trying to write to the file and has already invoked lock, the second process waits until the first process has voluntarily relinquished using unlock. The constructor for the file_lock class accepts the name of the file to be locked, and it's important to open the file before lock is invoked; otherwise, an exception will be thrown. Now, copy the code in Listing 11 and make some changes to it. Specifically, make it the second process that's requesting the lock. Listing 12 shows the relevant changes.

Listing 12. Code for the second process trying to access the file
// .. as in Listing 11
file_lock f_lock(fileName.c_str());
f_lock.lock();
std::cout << "Locked in Process 2" << std::endl;
system("sleep 4");
file.write("Process 2", 9);
file.flush();
f_lock.unlock();
std::cout << "Unlocked from Process 2" << std::endl;
// file.close();

Now, if these two processes are run concurrently, you expect the first process to wait 4 seconds before acquiring the file_lock 50 percent of the time, all other things being equal. Here are a few things you must remember when using file_lock. You're talking about interprocess communication here, with emphasis on process. This means that you're not supposed to use file_lock to synchronize data accesses by threads of the same process. On POSIX-compliant systems, file handles are process and not thread attributes. Here are a few guidelines for using file locking:

Use a single file_lock object per file per process.
Use the same thread to lock and unlock a file.
Flush data in writer processes before unlocking a file, by calling either C's flush library routine or the flush method (if you prefer a C++ fstream).

Using file_lock with scoped locks It is possible that during program execution some exception is thrown, and the file is not unlocked. Such an occurrence might result in undesirable program behavior. To avoid this situation, consider wrapping the file_lock object in a scoped_lock, defined in boost/interprocess/sync/scoped_lock.hpp. Using scoped_lock, you don't need to explicitly lock or unlock the file; the locking occurs inside the constructor, and the unlocking happens automatically whenever you exit the scope. Listing 13 shows the modification to Listing 11 to make it use scoped locks.

Listing 13. Using scoped_lock with file_lock


#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/file_lock.hpp>

// code as in Listing 11
file_lock f_lock(fileName.c_str());
scoped_lock<file_lock> s_lock(f_lock); // internally calls f_lock.lock()

// No need to call explicit lock anymore
std::cout << "Locked in Process 1" << std::endl;
file.write("Process 1", 9);
// code as in Listing 11

Note: See Resources for links to more information on the Resource Acquisition Is Initialization (RAII) programming idiom.

Learning Boost MPI If you are not already familiar with the Message Passing Interface, before delving into Boost MPI, you should briefly check out the links to MPI resources provided in the Resources section. The MPI is an easy-to-use standard that works on the model of processes communicating with each other by passing messages. You don't need to use sockets or other low-level communication primitives; the MPI back end manages all the hard work. So, where does Boost MPI fit in? The creators of Boost MPI have provided an even higher level of abstraction and a simple set of routines built on top of the MPI-provided API, such as MPI_Init and MPI_Bcast. Boost MPI is not a stand-alone library in the sense that you download it, build it, and you are ready for work. Instead, you must install one of the MPI implementations, such as MPICH or Open MPI, and build the Boost Serialization library. For details on how to build Boost MPI, see Resources. Typically, you would use the following command to build Boost MPI:
bash-4.1$ bjam --with-mpi

Windows users can download the pre-built libraries for MPI from BoostPro (see Resources). The libraries are compatible with Microsoft HPC Pack 2008 and 2008 R2 (see Resources) and work on Windows XP with Service Pack 3 and later client operating systems.

Back to top Hello World with MPI There are two primary classes in the Boost MPI library that you must learn: the environment class and the communicator class. The former is responsible for the initialization of the distributed environment; the latter is used for communicating between processes. Because we're talking about distributed computing here, let's have four processes all printing "Hello World" to the terminal. Listing 14 shows the code.

Listing 14. Hello World using Boost MPI


#include <boost/mpi.hpp>
#include <iostream>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    std::cout << argc << std::endl;
    std::cout << argv[0] << std::endl;
    std::cout << "Hello World! from process " << world.rank() << std::endl;
    return 0;
}

Now build the code in Listing 14 with proper linking to the Boost MPI and Serialization libraries. Run the executable at the shell prompt. You should see "Hello World! from process 0". Next, use your MPI dispatcher tool (for example, mpirun for Open MPI users and mpiexec for Microsoft HPC Pack 2008) and run the executable as:
mpirun -np 4 <executable name>   OR   mpiexec -n 4 <executable name>

You should now see something like Listing 15, with mympi1 being the executable name.

Listing 15. Output from running the MPI code


1
mympi1
Hello World! from process 3
1
mympi1
Hello World! from process 1
1
mympi1
Hello World! from process 2
1
mympi1
Hello World! from process 0

There you have it. Within the MPI framework, four copies of the same process have been created. Within the MPI environment, each process has its unique ID, as determined by the communicator object. Now, try communicating between the processes. Have one process communicate with another using the send and receive function calls. Call the process sending the message the master process and the processes receiving the message the worker processes. The source code is the same for both the master and the workers, with the functionality decided using the rank that the world object provides (see Listing 16).

Listing 16. Code for processes 0, 1, and 2 communicating with each other
#include <boost/mpi.hpp>
#include <iostream>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    if (world.rank() == 0) {
        world.send(1, 9, 32);
        world.send(2, 9, 33);
    } else {
        int data;
        world.recv(0, 9, data);
        std::cout << "In process " << world.rank()
                  << " with data " << data << std::endl;
    }
    return 0;
}

Let's start with the send function. The first argument is the ID of the receiver process; the second is the message tag; and the third is the actual data. Why do you need the message tag? The receiver process might want to deal with messages that have a specific tag at some point during execution, so this scheme helps. For processes 1 and 2, the recv function is blocking: the program waits until it receives a message with tag ID 9 from process 0. When it does receive the message, the information is stored in data. Here's the output when running the code:
In process 1 with data 32 In process 2 with data 33

What happens, then, if you have something like world.recv(0, 1, data); on the receiver side? The code hangs: the receiver process waits for a message with a tag that's never going to arrive.

Conclusion This article just scratched the surface of the functionality that these two useful libraries provide. Other functionality they offer includes IPC's memory-mapped I/O and MPI's broadcast ability. From a usability standpoint, IPC is easy to use. However, the MPI library is dependent on native MPI implementations, and off-the-shelf availability of a native MPI library along with the pre-built Boost MPI and Serialization libraries is still an issue. Nevertheless, it is well worth the effort to build both the MPI implementation and Boost from sources.

The Boost.Threads Library


By Bill Kempf, May 01, 2002

Standard C++ threads are imminent and will derive from the Boost.Threads library, explored here by the library's author.

Condition Variables
Sometimes it's not enough to lock a shared resource and use it: the shared resource may need to be in some specific state before it can be used. For example, a thread may try to pull data off of a stack, waiting for data to arrive if none is present. A mutex is not enough to allow for this type of synchronization. Another synchronization type, known as a condition variable, can be used in this case.

A condition variable is always used in conjunction with a mutex and the shared resource(s). A thread first locks the mutex and then verifies that the shared resource is in a state that can be safely used in the manner needed. If it's not in the state needed, the thread waits on the condition variable. This operation causes the mutex to be unlocked during the wait so that another thread can actually change the state of the shared resource. It also ensures that the mutex is locked when the thread returns from the wait operation. When another thread changes the state of the shared resource, it needs to notify the threads that may be waiting on the condition variable, enabling them to return from the wait operation.

Listing Four illustrates a very simple use of the boost::condition class. A class is defined implementing a bounded buffer, a container with a fixed size allowing FIFO input and output. This buffer is made thread-safe internally through the use of a boost::mutex. The put and get operations use a condition variable to ensure that a thread waits for the buffer to be in the state needed to complete the operation. Two threads are created, one that puts 100 integers into this buffer and the other pulling the integers back out. The bounded buffer can only hold 10 integers at one time, so the two threads wait for each other periodically. To verify that this is happening, the put and get operations output diagnostic strings to std::cout. Finally, the main thread waits for both threads to complete.

Listing Four: The boost::condition class


#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>
#include <iostream>

const int BUF_SIZE = 10;
const int ITERS = 100;

boost::mutex io_mutex;

class buffer
{
public:
    typedef boost::mutex::scoped_lock scoped_lock;

    buffer() : p(0), c(0), full(0) { }

    void put(int m)
    {
        scoped_lock lock(mutex);
        if (full == BUF_SIZE)
        {
            {
                boost::mutex::scoped_lock lock(io_mutex);
                std::cout << "Buffer is full. Waiting..." << std::endl;
            }
            while (full == BUF_SIZE)
                cond.wait(lock);
        }
        buf[p] = m;
        p = (p+1) % BUF_SIZE;
        ++full;
        cond.notify_one();
    }

    int get()
    {
        scoped_lock lk(mutex);
        if (full == 0)
        {
            {
                boost::mutex::scoped_lock lock(io_mutex);
                std::cout << "Buffer is empty. Waiting..." << std::endl;
            }
            while (full == 0)
                cond.wait(lk);
        }
        int i = buf[c];
        c = (c+1) % BUF_SIZE;
        --full;
        cond.notify_one();
        return i;
    }

private:
    boost::mutex mutex;
    boost::condition cond;
    unsigned int p, c, full;
    int buf[BUF_SIZE];
};

buffer buf;

void writer()
{
    for (int n = 0; n < ITERS; ++n)
    {
        {
            boost::mutex::scoped_lock lock(io_mutex);
            std::cout << "sending: " << n << std::endl;
        }
        buf.put(n);
    }
}

void reader()
{
    for (int x = 0; x < ITERS; ++x)
    {
        int n = buf.get();
        {
            boost::mutex::scoped_lock lock(io_mutex);
            std::cout << "received: " << n << std::endl;
        }
    }
}

int main(int argc, char* argv[])
{
    boost::thread thrd1(&reader);
    boost::thread thrd2(&writer);
    thrd1.join();
    thrd2.join();
    return 0;
}

Thread Local Storage

Many functions are not implemented to be reentrant. This means that it is unsafe to call the function while another thread is calling the same function. A non-reentrant function holds static data over successive calls or returns a pointer to static data. For example, std::strtok is not reentrant because it uses static data to hold the string to be broken into tokens.

A non-reentrant function can be made into a reentrant function using two approaches. One approach is to change the interface so that the function takes a pointer or reference to a data type that can be used in place of the static data previously used. For example, POSIX defines strtok_r, a reentrant variant of std::strtok, which takes an extra char** parameter that's used instead of static data. This solution is simple and gives the best possible performance; however, it means changing the public interface, which potentially means changing a lot of code. The other approach leaves the public interface as is and replaces the static data with thread local storage (sometimes referred to as thread-specific storage).

Thread local storage is data that's associated with a specific thread (the current thread). Multithreading libraries give access to thread local storage through an interface that allows access to the current thread's instance of the data. Every thread gets its own instance of this data, so there's never an issue with concurrent access. However, access to thread local storage is slower than access to static or local data; therefore it's not always the best solution. However, it's the only solution available when it's essential not to change the public interface.

Boost.Threads provides access to thread local storage through the smart pointer boost::thread_specific_ptr. The first time every thread tries to access an instance of this smart pointer, it has a NULL value, so code should be written to check for this and initialize the pointer on first use. The Boost.Threads library ensures that the data stored in thread local storage is cleaned up when the thread exits.

Listing Five illustrates a very simple use of the boost::thread_specific_ptr class. Two new threads are created to initialize the thread local storage and then loop 10 times, incrementing the integer contained in the smart pointer and writing the result to std::cout (which is synchronized with a mutex because it is a shared resource). The main thread then waits for these two threads to complete. The output of this example clearly shows that each thread is operating on its own instance of data, even though both are using the same boost::thread_specific_ptr.
Listing Five: The boost::thread_specific_ptr class
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/tss.hpp>
#include <iostream>

boost::mutex io_mutex;
boost::thread_specific_ptr<int> ptr;

struct count
{
    count(int id) : id(id) { }

    void operator()()
    {
        if (ptr.get() == 0)
            ptr.reset(new int(0));

        for (int i = 0; i < 10; ++i)
        {
            (*ptr)++;
            boost::mutex::scoped_lock lock(io_mutex);
            std::cout << id << ": " << *ptr << std::endl;
        }
    }

    int id;
};

int main(int argc, char* argv[])
{
    boost::thread thrd1(count(1));
    boost::thread thrd2(count(2));
    thrd1.join();
    thrd2.join();
    return 0;
}

Once Routines

There's one issue left to deal with: how to make initialization routines (such as constructors) thread-safe. For example, when a global instance of an object is created as a singleton for an application, knowing that there's an issue with the order of instantiation, a function is used that returns a static instance, ensuring the static instance is created the first time the method is called. The problem here is that if multiple threads call this function at the same time, the constructor for the static instance may be called multiple times as well, with disastrous results.

The solution to this problem is what's known as a "once routine." A once routine is called only once by an application. If multiple threads try to call the routine at the same time, only one actually is able to do so while all others wait until that thread has finished executing the routine. To ensure that it is executed only once, the routine is called indirectly by another function that's passed a pointer to the routine and a reference to a special flag type used to check if the routine has been called yet. This flag is initialized using static initialization, which ensures that it is initialized at compile time and not run time. Therefore, it is not subject to multithreaded initialization problems. Boost.Threads provides calling once routines through boost::call_once and also defines the flag type boost::once_flag and a special macro used to statically initialize the flag named BOOST_ONCE_INIT.

Listing Six illustrates a very simple use of boost::call_once. A global integer is statically initialized to zero and an instance of boost::once_flag is statically initialized using BOOST_ONCE_INIT. Then main starts two threads, both trying to initialize the global integer by calling boost::call_once with a pointer to a function that increments the integer. Next main waits for these two threads to complete and writes out the final value of the integer to std::cout. The output illustrates that the routine truly was only called once because the value of the integer is only one.

Listing Six: A very simple use of boost::call_once.


#include <boost/thread/thread.hpp>
#include <boost/thread/once.hpp>
#include <iostream>

int i = 0;
boost::once_flag flag = BOOST_ONCE_INIT;

void init()
{
    ++i;
}

void thread()
{
    boost::call_once(&init, flag);
}

int main(int argc, char* argv[])
{
    boost::thread thrd1(&thread);
    boost::thread thrd2(&thread);
    thrd1.join();
    thrd2.join();
    std::cout << i << std::endl;
    return 0;
}

The Future of Boost.Threads

There are several additional features planned for Boost.Threads. There will be a boost::read_write_mutex, which will allow multiple threads to read from the shared resource at the same time, but will ensure exclusive access to any threads writing to the shared resource. There will also be a boost::thread_barrier, which will make a set of threads wait until all threads have entered the barrier. A boost::thread_pool is also planned to allow for short routines to be executed asynchronously without the need to create or destroy a thread each time.

Boost.Threads has been presented to the C++ Standards Committee's Library Working Group for possible inclusion in the Standard's upcoming Library Technical Report, as a prelude to inclusion in the next version of the Standard. The committee may consider other threading libraries; however, they viewed the initial presentation of Boost.Threads favorably, and they are very interested in adding some support for multithreaded programming to the Standard. So, the future is looking good for multithreaded programming in C++.

Threading with Boost - Part I: Creating Threads


By Gavin Baker on May 13, 2009

Boost is an incredibly powerful collection of portable class libraries for C++. There are classes for such tasks as date/time manipulation, filesystem interfaces, networking, numerical programming, interprocess communication and much more. The Boost documentation is substantial, but can still be daunting to new users. So this article is the first of a series on using Boost, starting with basic threading. It aims to provide an accessible introduction, with complete working examples.

First, this article assumes you know about general threading concepts and the basics of how to use them on your platform. (For a refresher, see the Wikipedia article on Threads.) Here I focus specifically on how to use Boost threads in a practical setting, starting from the basics. I also assume you have Boost installed and ready to use (see the Boost Getting Started Guide for details). This article looks specifically at the different ways to create threads. There are many other techniques necessary for real multi-threaded systems, such as synchronisation and mutual exclusion, which will be covered in a future article.

Overview

A boost::thread object represents a single thread of execution, as you would normally create and manage using your operating system specific interfaces. For example: on POSIX systems, a Boost thread uses the Pthreads API, and on Win32 it uses the native CreateThread and related calls. Because Boost abstracts away all the platform-specific code, you can easily write sophisticated and portable code that runs across all major platforms.

A thread object can be set to a special state of not-a-thread, in which case it is inactive (or hasn't been given a thread function to run yet). A boost::thread object is normally constructed by passing the threading function or method it is to run. There are actually a number of different ways to do so. I cover the main thread creation approaches below.
Code Examples

All the code examples are provided in a single download below, which you can use for any purpose, no strings attached. (The usual disclaimer is that no warranties apply!) And just as Boost runs on many platforms (ie. Windows, Unix/Linux, Mac OS X and others), the example code should be similarly portable.

Download boost_threads_eg1.zip (4kB)

There is a separate example program for each section below, and a common Bjam script to build them all (Jamroot). Bjam is the Boost build system: very powerful, but notoriously difficult to learn, and worthy of a whole series of articles. Having said that, you are certainly not obliged to use Bjam. It is still worth knowing how to build applications manually before relying on scripts, so here is an example command line for manual compilation on my system (Mac OS X with Boost installed from MacPorts):

g++ -I/opt/local/include -L/opt/local/lib -lboost_thread-mt -o t1 t1.cpp

All this does is add an include path (with the -I option) pointing to the root of the boost headers, add the library search path (with the -L option) and link in the threading library (boost_thread-mt). You can use the above as the basis for writing your own Makefile if you prefer, or creating build rules in your IDE of choice.

Doing 'Real' Work

And before we dive in, a quick note on doing "real work"... In the examples below, a simple sleep call is used to simulate performing actual work in the threads. This is simply to avoid cluttering up the examples with code that would take some finite time to execute but would otherwise be irrelevant. The simplest way to sleep for a given duration using Boost is to first create a time duration object, and then pass this to the sleep method of the special boost::this_thread class. The this_thread class gives us a convenient way to refer to the currently running thread, when we otherwise may not be able to access it directly (ie. within an arbitrary function). Here's how to sleep for a few seconds:

// Three seconds of pure, hard work!
boost::posix_time::seconds workTime(3);
boost::this_thread::sleep(workTime);



In the main() function, we then wait for the worker thread to complete using the join() method. This will cause the main thread to sleep until the worker thread completes (successfully or otherwise). The observant reader will wonder then what the advantage is in spawning a thread, only to wait for it to complete? Surely that serialises the execution path and places us firmly back in sequential world? What is the point of spawning a thread? Well, until we figure out how to use the synchronisation mechanisms, this is the most straightforward approach to illustrate thread creation. And knowing how to 'join' threads is also very important. To synchronise the completion of a thread, we wait for it to finish by calling workerThread.join(). The general structure of the examples below is shown in the following sequence diagram:

The application starts, and the main thread runs at (a). Then at (b), the main thread spawns the worker thread by constructing a thread object with the worker function. Right after, at (c), the main thread calls join on the thread, which means it will go to sleep (and not consume any CPU time) until the worker thread has completed. As soon as the worker thread is created at (b), it will start execution. At some point later at (d), the worker completes. Since the main thread was joining on its completion, main wakes up and continues running. It finishes at (e) and the process terminates. Each of the examples below follows this general scheme - the difference lies in how the threads are created.

Type 1: A Thread Function

The simplest threading scenario is where you have a simple (C-style) function that you want to run as a separate thread. You just pass the function to the boost::thread constructor, and it will start running. You can then wait for the thread to complete by calling join(), as shown above. The following example shows how. First, we include the correct Boost thread header, create the thread object and pass in our worker function. The main thread in the process will then wait for the thread to complete.

#include <iostream>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>

void workerFunc()
{
    boost::posix_time::seconds workTime(3);

    std::cout << "Worker: running" << std::endl;

    // Pretend to do something useful...
    boost::this_thread::sleep(workTime);

    std::cout << "Worker: finished" << std::endl;
}

int main(int argc, char* argv[])
{
    std::cout << "main: startup" << std::endl;

    boost::thread workerThread(workerFunc);

    std::cout << "main: waiting for thread" << std::endl;

    workerThread.join();

    std::cout << "main: done" << std::endl;

    return 0;
}


When you run the program, you should see output similar to the following:

% ./t1
main: startup
main: waiting for thread
Worker: running
Worker: finished
main: done

It's as simple as that! This created a thread and ran it, and you would have seen a pause while the worker thread was running (ok, busy sleeping!). Later on, we'll do something a bit more substantial. But this example shows the absolute minimum code required to start a simple thread. Simply pass your function to the boost::thread constructor.

Type 2: Function with Arguments

So the above function wasn't terribly useful by itself. We really want to be able to pass in arguments to the thread function. And fortunately, it's very easy - you simply add parameters to the thread object's constructor, and those arguments are automagically bound and passed in to the thread function. Let's say your thread function had the following signature:

void workerFunc(const char* msg, unsigned delaySecs)
//...



You simply pass the arguments to the thread constructor after the name of the thread function, thus:

boost::thread workerThread(workerFunc, "Hello, boost!", 3);



This example is called 't2' in the source examples.

Type 3: Functor

A functor is a fancy name for an object that can be called just like a function. The class defines a special method by overloading the operator() which will be invoked when the functor is called. In this way, the functor can encapsulate the thread's context and still behave like a thread function. (Functors are not specific to threads, they are simply very convenient.) This is what our functor looks like:

class Worker
{
public:
    Worker(unsigned N, float guess, unsigned iter)
        : m_Number(N), m_Guess(guess), m_Iterations(iter)
    {
    }

    void operator()()
    {
        std::cout << "Worker: calculating sqrt(" << m_Number
                  << "), iterations = " << m_Iterations << std::endl;

        // Use Newton's Method
        float x;
        float x_last = m_Guess;

        for (unsigned i = 0; i < m_Iterations; i++)
        {
            x = x_last - (x_last*x_last - m_Number) / (2*x_last);
            x_last = x;

            std::cout << "Iter " << i << " = " << x << std::endl;
        }

        std::cout << "Worker: Answer = " << x << std::endl;
    }

private:
    unsigned m_Number;
    float    m_Guess;
    unsigned m_Iterations;
};

This worker functor calculates the square root of a number using the Newton-Raphson method, just for fun (ok, I got bored putting sleep in all the threads). The number, a rough guess and the number of iterations are passed to the constructor. The calculation (which is in fact extremely fast) is performed in the thread itself, when the operator()() gets called. So in the main code, first we create our callable object, constructing it with any necessary arguments as normal. Then you pass the instance to the boost::thread constructor, which will invoke the operator()() method on your functor. This becomes the new thread, and runs just like any other thread, with the added benefit that it has access to the object's context and other methods. This approach has the benefit of wrapping up a thread into a convenient bundle.

int main(int argc, char* argv[])
{
    std::cout << "main: startup" << std::endl;

    Worker w(612, 10, 5);
    boost::thread workerThread(w);

    std::cout << "main: waiting for thread" << std::endl;

    workerThread.join();

    std::cout << "main: done" << std::endl;

    return 0;
}


Note: A very important consideration when using functors with boost threads is that the thread constructor takes the functor parameter by value, and thus makes a copy of the functor object. Depending on the design of your functor object, this may have unintended side-effects. Take care when writing functor objects to ensure that they can be safely copied.

Type 4: Object method I

It is frequently convenient to define an object with an instance method that runs on its own thread. After all, we're coding C++ here, not C! With Boost's thread object, this is only slightly more work than making a regular function into a thread. First, we have to specify the method using its class qualifier, as you would expect, and we use the & operator to pass the address of the method. Because methods in C++ always have an implicit this pointer passed in as the first parameter, we need to make sure we call the object's method using the same convention. So we will pass the object pointer (or this depending on whether we are inside the object or not) as the first parameter, along with any other actual parameters we might have after that, thus:

Worker w(3);
boost::thread workerThread(&Worker::processQueue, &w, 2);



As an aside, take care in your own code that you don't accidentally allocate an object on the stack in one place, spawn a thread, then have the object go out of scope and be destroyed before the thread has completed! This could be the source of many tricky bugs. The full listing is essentially the same. To pass additional parameters to the method, simply add them in the constructor to the thread object after the object pointer.

Type 5: Object method II

You may want to create a bunch of objects that manage their own threads, and can be created and run in a more flexible manner than keeping around a bunch of objects along with their associated threads in the caller. (I think they call this encapsulation.) So our final example places the thread instance within the object itself, and provides methods to manage them. Since the thread object exists as an instance member (as opposed to a pointer), what happens in the constructor? In particular, what if we don't want to run the thread at the same time as we create our object? Fortunately, the default constructor for the thread creates it in an "invalid" state, called not-a-thread, which will do nothing until you assign a real one to it (in our start method, for example). So now our class declaration has the following data member added:


//...
private:
    boost::thread m_Thread;
//...

The Worker::start() method spawns the thread which will run the processQueue method. Notice how we pass in this as the first bound parameter? Because we are using an instance method (and not a class method or regular function), we must ensure the first parameter is the instance pointer. The N parameter is the first actual parameter for the thread function, as can be seen in its signature.

void start(int N)
{
    m_Thread = boost::thread(&Worker::processQueue, this, N);
}



The join method is very simple:



void join()
{
    m_Thread.join();
}



which means our main function becomes no more than:



Worker worker;

worker.start(3);

worker.join();



This encapsulation of the threading can be very useful, especially when combined with patterns such as the Singleton (which we will look at in a future article).

Conclusion

We have seen a variety of techniques for creating threads using the Boost threading library. From simple C functions to instance methods with parameters, the thread class permits a great deal of flexibility in how you structure your application.

Future Articles

In later installments, we will look at synchronisation methods, mutexes, and all sorts of other interesting techniques. Please leave a comment below, with any questions or feedback you may have. Too long? Too brief? More details required? Something confusing? Let me know.
