Picking on Threads

By: Courtesy of Texas Instruments Inc.
( 1 Jan 2009 )


The trend toward multicore processors spans a wide variety of applications, from PCs to audio/video gear to cellular network infrastructure. It's not hard to see why: as applications require more processing power, simply increasing a DSP's clock frequency is no longer a viable solution because of the power required and the heat produced. Multicore DSPs avoid that problem by spreading the workload over multiple cores. For example, if the system requires 3GHz worth of performance from a DSP, a multicore architecture could have three cores running at 1GHz each, all in a single DSP package. At the same time, the multicore DSP has lower power requirements and produces less heat than a single-core DSP delivering the same performance.

Although no one is arguing about whether multicore processors are here to stay, there is some debate about threading, which is the process of dividing a program into multiple tasks, or threads, that run in parallel. Two Texas Instruments Inc. (TI) experts, Alan Gatherer and Bill Mills, explain what the debate is all about.

NEW WAYS OF THINKING ABOUT THREADING
Alan Gatherer, CTO, Communications Infrastructure Group
When I was a child, my mother would knit while watching television. Though I never learned to knit myself, I did appreciate the importance of careful counting. If one stitch too many was added due to a lack of concentration during a particularly exciting episode of "Dr. Who", then the pattern would not line up. Even worse, the mistake might not be found until much later in the process; finding exactly where the extra stitch occurred involved careful unraveling until you found it. So my mother, who only recently started using a computer, taught me at an early age that just one thread out of place can ruin a whole sweater, and when you have a lot of threads, debugging is a nightmare. In the early days of circuit design, a similar problem, where systems developed using many circuit components often failed, was called the "tyranny of numbers" [1]. If you have enough little things put together, something is bound to go wrong. And it was this problem that led TI's Jack Kilby to the development of the first integrated circuit.

It seems that with the advent of serious multicore we are facing a similar situation. We may need a new Jack Kilby to solve this problem for us again. Until a new Jack comes along, it looks like we are stuck carefully knitting threads together. Like my mother, we need to count carefully and keep a close eye on the pattern we are trying to create. Breaking a problem into more threads than it really needs is asking for trouble; some authors have even suggested that threading is the root of the problem in multiprocessing today [2].

But assuming that threads are here to stay, and that the number of threads a system supports will increase significantly, a multiprocessing system is going to look less and less like a collection of threads and more like a piece of material. If we do not learn to create, visualize and debug at the sweater level, but instead continue to deal with the individual threads, we will not be able to knit ourselves anything worth wearing.

The good news is that companies such as TI are developing tools to help designers accommodate the growing number of threads. Many baby steps have been taken, both in the tooling that is used to create the system and in the run-time software and hardware support that manages the threads and provides visibility into them and virtualization of them as they run. Message-passing abstractions, standard thread communication APIs, semaphores and memory-locking mechanisms are all examples of ideas that make multithreaded programs more robust and manageable. OpenMP is a great example of a programming paradigm where the user describes the parallelism that is possible rather than making explicit decisions on the threading model that will be used. By stepping back from managing the details of the threading model, the user is free to create more complex and more parallelizable systems.
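As a rough illustration of describing the available parallelism rather than managing threads by hand, here is a minimal sketch in Python. It is not OpenMP; it uses the standard library's `concurrent.futures` as an analogy, and `scale_sample` and `process_block` are hypothetical names invented for this example. In standard CPython the interpreter lock limits the speed-up for CPU-bound work, so the sketch shows the structure of the idea rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_sample(sample):
    """Hypothetical per-element work; each call is independent of the others."""
    return sample * 2


def process_block(samples):
    # We state only that every element may be processed independently.
    # The executor, not the caller, decides how many worker threads to
    # create and which element each worker handles.
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(scale_sample, samples))


print(process_block([1, 2, 3, 4]))
```

The caller never creates, starts or joins a thread, which is the property the paragraph above attributes to this style of programming.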

The optimal parallelism strategy may not be the "natural" or obvious one, and researchers have proposed compilers that search the space of possible parallel strategies to find the best one. This obviously requires users to let go of managing the threads. However, as the level of abstraction increases, debugging the system becomes more of a challenge, especially in the kinds of real-time systems that TI and its partners often deal with.

It is an exciting time to be in the world of multicore and multithreading. The challenges and opportunities are many, and little by little we are making progress while we wait for the paradigm shift that will change the industry. Who knows, now that she is using a computer, maybe my mother has a good idea or two.

Bill Mills, CTO, Open Linux Solutions, Software Development Organization
I think the solution to Alan's situation is simple: He should get his mother a sonic screwdriver. As the good doctor has proven over and over, this tool can solve just about any problem. However, as we wait for this fantastic tool to be invented, I find the tools I have today to be very useful. I still use a pair of pliers that were handed down to me from my grandfather. Sure, the yellow paint is peeling off, but they still aptly perform the function they did for my Dad's Pop.

When it comes to programming tomorrow's massively multicore processors, I wonder, what programming model will we use? Do we need an evolution or a revolution? Will we continue to use our trusty existing tools or do we need something completely new? Threads and plain old C are two tools that have served us well over the years. When we talk about software running in parallel, we cannot avoid different threads of execution; that is the whole point. However, as commonly defined, the term "threads" usually means separate paths of execution sharing particular resources, with a shared memory space being the most important. Critics of threaded programming models say that they are hard to validate for functional correctness and that their worst-case performance is hard to predict. There is certainly truth in these arguments; however, I believe they are often overstated.

The issue with functional correctness comes from the problems of sharing read/write data between asynchronously executing threads. One common answer to this problem is the semaphore or mutex. Ed Lee points out the problems with this approach [3].
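To make the mutex approach concrete, the following is a minimal sketch in Python, assuming a shared counter as the read/write data. `SharedCounter` and `run_workers` are hypothetical names used only for illustration.

```python
import threading


class SharedCounter:
    """Read/write data shared by several threads, guarded by one mutex."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write must not interleave with another thread's,
        # so it runs while holding the lock.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run_workers(num_threads=4, increments_per_thread=10_000):
    counter = SharedCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()


print(run_workers())
```

The correctness of the whole program depends on every access going through the lock, which is the discipline that critics argue is hard to verify as programs grow.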

I agree that indiscriminate use of mutexes is a problem, just as indiscriminate use of global variables is a problem. But I am not ready to give up either. Multiple threads in one address space has been the only model for most embedded programmers until fairly recently. With the strict priority models of real-time operating systems (RTOS), embedded programmers have already seen lots of preemption. I do not fear more.

The issue with predictability of performance also stems from data access. With threaded programming models, it is often the case that any thread can touch any piece of data at any time. As the speed of interconnecting processing elements becomes more and more of a limiting factor, the ratio of cache misses to cache hits gets higher. This makes the pathological worst case hard to predict.

Opponents of the thread model would have us strictly regulate which actors can touch which data at what times. Then it becomes the problem of the scheduler to ensure that this data (and code) is available in the right processing element when needed. This scheme has a lot to offer, and I have used it myself when I can. However, it is hard to apply this model universally; and the thread model is certainly more universal.
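One way to picture the regulated model is to give each piece of data a single owner and let everyone else send it messages. The sketch below assumes a histogram owned by one thread and fed through a queue; `histogram_owner` and `count_symbols` are hypothetical names, and a real scheduler-driven system would be considerably more involved.

```python
import queue
import threading


def histogram_owner(inbox, histogram):
    """The only thread that ever reads or writes `histogram` while running."""
    while True:
        symbol = inbox.get()
        if symbol is None:  # sentinel: no more messages will arrive
            break
        histogram[symbol] = histogram.get(symbol, 0) + 1


def count_symbols(batches):
    inbox = queue.Queue()
    histogram = {}
    owner = threading.Thread(target=histogram_owner, args=(inbox, histogram))
    owner.start()

    def producer(batch):
        # Producers never touch the histogram; they only send messages.
        for symbol in batch:
            inbox.put(symbol)

    producers = [threading.Thread(target=producer, args=(b,)) for b in batches]
    for p in producers:
        p.start()
    for p in producers:
        p.join()

    inbox.put(None)  # all producers are done, so tell the owner to stop
    owner.join()     # after this, the caller may safely read the result
    return histogram


print(count_symbols(["aab", "bc"]))
```

No lock appears in the application code because the data has exactly one writer; the cost is that every interaction must be expressed as a message.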

I am old enough to remember my fellow engineers' predictions of doom when caches came to embedded systems. The criticisms were valid, but in the end caches did not cause the disruption in programming that had been feared.

So should we do nothing? Well, I think that doing nothing will work out a lot better than some people today predict. On the other hand, I do think we can do better. I suggest we start on the outside and work toward the inside, applying techniques that are already in use in other areas. My concern with some solutions like OpenMP is that they seem to start on the inside, focusing on the language and the "for loop". That is certainly the harder and more interesting problem to solve, and it will need to be tackled. However, I believe there is much parallelism to achieve just at the module level. After that, we can bring the granularity down to the function call level. Once we have accomplished that, we can tackle the granularity of an element within a "for loop".
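Parallelism at the module level can be sketched as follows, assuming two independent modules that operate on the same input frame. `filter_channel`, `measure_peak` and `process_frame` are hypothetical names, and in standard CPython the interpreter lock means this shows the coarse-grained structure rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_channel(samples):
    """Hypothetical module A: attenuate every sample by half."""
    return [s * 0.5 for s in samples]


def measure_peak(samples):
    """Hypothetical module B: find the largest magnitude in the frame."""
    return max(abs(s) for s in samples)


def process_frame(samples):
    # The two modules only read the input and share no other state, so
    # they can run side by side without changes to their internals.
    with ThreadPoolExecutor(max_workers=2) as pool:
        filtered = pool.submit(filter_channel, samples)
        peak = pool.submit(measure_peak, samples)
        return filtered.result(), peak.result()


print(process_frame([1, -4, 2]))
```

Neither module contains any threading code; the only parallel decision is made at the boundary where the modules are composed.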

As for the techniques, we need to concentrate on asynchronous event-based programming. GUI programmers have been doing this for years, but only some embedded system programmers are comfortable with it. I think interesting analogies we should be looking to are the distributed server infrastructure of a business Web site, Python's Twisted framework for event-based programming of network servers, and filter graph description frameworks like the open-source GStreamer project or Microsoft's DirectShow. The filter graph examples are particularly interesting because they popularize a view of signal processing applications that has been commonly employed by TI and TI's customers. These frameworks embrace the distributed and asynchronous nature of multicore processors instead of trying to hide it; also, most focus on the data flow and not the control flow.
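The filter graph idea can be sketched with Python's standard `asyncio` module. This is not the API of GStreamer, DirectShow or Twisted; it is a minimal illustration with hypothetical stage names (`source`, `gain`, `sink`) in which each stage reacts to data arriving on its input queue and pushes results downstream.

```python
import asyncio

_END = object()  # marker passed downstream when the stream is finished


async def source(out_q, samples):
    for s in samples:
        await out_q.put(s)
    await out_q.put(_END)


async def gain(in_q, out_q, factor):
    # A filter stage: wakes up when data arrives, transforms it, forwards it.
    while True:
        item = await in_q.get()
        if item is _END:
            await out_q.put(_END)
            return
        await out_q.put(item * factor)


async def sink(in_q):
    received = []
    while True:
        item = await in_q.get()
        if item is _END:
            return received
        received.append(item)


async def run_graph(samples, factor):
    # Bounded queues are the edges of the graph; a full queue makes the
    # upstream stage wait, which provides simple back-pressure.
    q1 = asyncio.Queue(maxsize=4)
    q2 = asyncio.Queue(maxsize=4)
    _, _, output = await asyncio.gather(
        source(q1, samples), gain(q1, q2, factor), sink(q2)
    )
    return output


def process(samples, factor=2):
    return asyncio.run(run_graph(samples, factor))


print(process([1, 2, 3]))
```

The program is described entirely by what data flows where; no stage knows or cares which other stage runs next, which is the data-flow emphasis mentioned above.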

Do we need any new tools? Yes. I expect that the debugging environment needs to evolve just as it did when we embraced object-oriented programming, and when we added OS awareness. Debuggers will need to look beyond the individual thread and track transactions as they pass from element to element. Some sort of framework awareness will be necessary. The other area where tooling may help is a better way to describe the boundaries and relationships between modules or components. One such effort is real-time software components (RTSC), a technology that TI has been working on for a number of years and that is now being developed in the open at the Eclipse Foundation.

So are we looking at a revolution or an evolution? I am quite certain that the answer will turn out to be "both". Whenever you try to ask a black or white question, the answer, once you really understand the question, invariably is "gray". The challenges ahead of us will look radically different until we have adapted to them. After we have incorporated the new ways of thinking into our consciousness, we will look back on the changes as a minor evolution to the way we have always done software.

So my advice is, don't throw out what you already know when tackling the new challenges of massively multicore programming. If my grandfather could see me work, he might not recognize the computer equipment I work on, but he would certainly recognize how I use that pair of banged-up pliers.

REFERENCES
1. Texas Instruments Inc. "The Chip that Jack Built."

2. Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. "The Landscape of Parallel Computing Research: A View from Berkeley." Electrical Engineering and Computer Sciences, University of California at Berkeley. 18 December 2006.

3. Edward A. Lee. "The Problem with Threads." University of California, Berkeley. May 2006.

 
 
 
 
© 2010 Reed Business Information, a division of Reed Elsevier Inc.
All rights reserved.