Update 12 December 2013: This post is now over two years old. Just recently McNeel have released an update to the Grasshopper Python component that supports threading. I’m leaving this post up for prosperity, but you would be much better off using the Grasshopper Python component than following the advice below.
An alternative title might be Why CAD software hasn’t gotten any faster in the last three years. Simply, most CAD software is written to take advantage of only one processor, leaving the other three processors in your shiny new Quad-core i7 idle. The solution to the multi-processor problem is threading. Programming threads can seem daunting, I put them off until I had to create one for Yeti to, of all things, check for updates in the background without hanging up the interface. To my surprise I found threading in modern programming languages almost elegant. Naturally I was curious about whether threads can be used in Grasshopper, and whether they give any performance increase. It turns out threads can be used in Grasshopper, and that they significantly increase performance – on my shiny new Quad-core i7 I get an almost 200% improvement. This post is the third part of a series on Optimising Grasshopper (following on from part one and part two). In this post I will explain a fairly ninja technique for breaking C# nodes down into multiple threads to be processed in parallel by all the processors on your computer.
What are threads?
Roughly speaking, a thread is a list of tasks for the processor in your computer to execute. All programs and scripts get compiled into a thread of tasks, which are then feed through the processor in linear order to produce the desired outcome. Conceptually this could be thought of as a chain of grasshopper nodes being evaluated one at a time by the processor.
Threads get interesting when you have more than one processor, like on the Quad-core Intel processor, which has four processors sitting in parallel. With a multi-processor computer, you can have each processor working on a separate thread. So the Quad-core processor can work through four threads simultaneously (it can actually work through more due to some processor magic but that is getting a little technical). The difficulty with a multi-processor computer is that the processors can not ‘talk’ to each-other, so (as shown above) a thread can only go through one processor at a time. This essentially means that if one of the processors on the Quad-core processor gets a really long thread of tasks, it cannot ask the neighboring processors to help out with some of the tasks, even if they are sitting idle. This is why sometimes you will be updating a really complex model in Grasshopper, and see the processors only working at 25% – there is only one thread going through the processors so only one of the four processors is actually working.
Multi-threading
Since the processor can not break a single thread into multipul strands, we have to break the tread up for the processor. Essentially rather than giving the processor one long thread of tasks, we can break the tasks up into little bundles of work, and generate a thread of tasks for each bundle. That way we can send a single thread of tasks as multipul threads and get each processor to work on a little part by itself.
Multi-threading in Grasshopper
To test threading in Grasshopper I decided to recreate the Project-point-to-surface node in Grasshopper. The full node can be downloaded from Parametric Model. The critical line of code in the node comes within the for-loop on lines 114-123:
private void RunScript(List<System.Object> p, Surface s, ref object P, ref object uvP, ref object D)
{
int total = p.Count;
//setup arrays to catch data as it is produced.
outP = new GH_Point[total];
outUVP = new GH_Point[total];
outD = new GH_Number[total];
//data needed to create the objects.
done = total;
mySurface = s;
//loop through each task, sending it to the threadpool
for(int i = 0; i < total; i++)
{
if(p[i] is Point3d)
{
int tempI = i;
Point3d tempP = (Point3d) p[i];
taskInfo task = new taskInfo(tempI, tempP);
System.Threading.ThreadPool.QueueUserWorkItem(pntToSrf, task);
} else done-;
}
//wait until all the threads are done.
//TODO: Need a better wait method.
while(done > 1)
{
System.Threading.Thread.Sleep(2);
}
//push out the new data.
P = outP;
uvP = outUVP;
D = outD;
}
private static Surface mySurface = null;
private static object locker = new object();
private static GH_Point[] outP = null;
private static GH_Point[] outUVP = null;
private static GH_Number[] outD = null;
private static int done = 0;
//Adds the data to the outputs arrays. Have to lock this function to prevent co-current access.
private static void addToList(GH_Point myPoint, GH_Point uvPoint, GH_Number myDis, int index)
{
lock(locker)
{
outP[index] = myPoint;
outUVP[index] = uvPoint;
outD[index] = myDis;
done-;
}
}
//This is the function we feed into threads.
static void pntToSrf(object data)
{
//check data is as we expect it.
if(data is taskInfo)
{
taskInfo task = (taskInfo) data;
//find the closest point and distance to it.
double u = 0;
double v = 0;
mySurface.ClosestPoint(task.startPoint, out u, out v);
Point3d closestPoint = mySurface.PointAt(u, v);
double dis = closestPoint.DistanceTo(task.startPoint);
//cast data into Grasshopper friendly terms.
GH_Point uvPoint = new GH_Point(new Point3d(u, v, 0));
GH_Point newPoint = new GH_Point(closestPoint);
GH_Number newDis = new GH_Number(dis);
//add the new data to the output lists.
addToList(newPoint, uvPoint, newDis, task.index);
} else {
done-;
}
}
public class taskInfo
{
public int index;
public Point3d startPoint;
public taskInfo(int _index, Point3d _startPoint)
{
index = _index;
startPoint = _startPoint;
}
}
Essentially this loop works its way through every point passed to the node. It then adds the point and the point’s index to a taskInfo object – a custom object created to store the information about the point being projected to the surface. The point is then projected onto the surface by calling the pntToSrf function and passing the taskInfo object in this line:
System.Threading.ThreadPool.QueueUserWorkItem(pntToSrf, task);
Basically this line of code calls the pntToSrf function, but rather than calling it on the main thread (as you would do in a singally threaded application), it sends the function to the ThreadPool where a manager runs it on the next available thread. The ThreadPool has many different threads running through each processor in parallel, so the manager will send the pntToSrf function to whatever processor is not doing any work, which utilises the power each processor.
A couple of tricky things happen inside the thread, like the arrays are locked whenever data is written to them, to prevent the threads sending data into the array simultaneously:
lock(locker)
{
outP[index] = myPoint;
outUVP[index] = uvPoint;
outD[index] = myDis;
done-;
}
Finally, since the threads are executing co-currently to the main thread, we have to pause the main thread until the other threads have finished processing before returning the data from the node. For simplicity sake I have used a really naive method of waiting, which loops until the right number of tasks are done.
while(done > 1)
{
System.Threading.Thread.Sleep(2);
}
And that is it.
In terms of performance, on my quad-core computer, the normal Grasshopper srfCP node can project 1,000,000 points onto a sphere in 10.0 seconds. Compared to both Digital Project and GC, this is remarkably fast, although it is nothing like the performance of Open Cascade. The same task performed with my threaded srfCP node takes 4.5 seconds, which is roughly a 2X improvement over the non-threaded version. You will notice using four cores does not result in a straight up 4X improvement, this is in part because some processor power is used to manage the threads, and partly because aspects of the code still happen in serial.
Native types in Grasshopper
outP = new GH_Point[total];
outUVP = new GH_Point[total];
outD = new GH_Number[total];
Some keen eyes would have picked up that the C# node is returning an array of GH_point’s rather than an array of point3d’s. When a C# node returns a point3d, this is automatically converted by Grasshopper into a GH_Point (allowing it to be baked ect.). This conversion is computationally expensive, presumably because Grasshopper is checking the data is valid. To avoid this performance hit, we can sidestep the conversion by giving Grasshopper the data back in its native format, a GH_Point. The other native Grasshopper formats can be found under Grasshopper.Kernal.Data. Similarly if the inputs to a node are of a specific type, Grasshopper does an automatic conversion. By asking for the inputs as a System.Object this conversion can be avoided.
A word to the eager
While multi-threading offers significant speed improvements, it is not without downsides. It can be challenging and frustrating to debug threaded code since interactions between threads can throw unforeseen errors from bugs that are invisible in the Grasshopper IDE. These errors will often crash Rhino: save frequently. Furthermore, not all functions in Rhino 4 are thread safe. So if you use the unsafe function of plane-brep intersection in different threads, you will either crash Rhino or end up with strange results. Rhino 5 addresses some of the thread safety issues and once it moves to .Net4.0 there will be access to even easier threading functions. For now take this post as an explanation for why CAD software hasn’t gotten any faster in the last three years, but expect to see significant performance increases in the near future once CAD finally catches up to processor developments.
For more information about multi-threading I recommended Joseph Albahari’s excellent tutorial on C# threading.
Daniel
Great article!
I’d been keen to try some threading, but had found it a bit daunting before. I think this clear and simple guide is really going to help.
Thanks Daniel
Daniel
Hey Daniel,
I too had always put off threading thinking it was only for super experienced programmers. For simple tasks like breaking the contents of for-loop’s into threads, it is fairly straight forward, especially in .Net4.0 where you can just write parallel.for (http://msdn.microsoft.com/en-us/library/dd460703.aspx). I found things get a little tricky when doing anything more than a for loop because you have to be careful about what parts of the code the tread is accessing, but the Albahari tutorial got me started on understanding that problem.
I imagine the update cycle in Kangaroo could be broken to solve every node separately. Looking forward to the next iteration…
Daniel
Mostapha
Hi Daniel,
Do you have any idea how to do the same for the current GH components. “Exposure”, for instance, should use sort of ray-tracing method and the grids potentially could be allocated on different cpu’s. Have you ever tried something similar…
Thanks,
Mostapha
Daniel
Hi Mostapha,
This was more a ‘proof of concept’ to demonstrate the sort of performance gains we can expect in the future, rather than something I have been using in practice. Having said that, if you were desperate it would be possible to do a similar thing with other Grasshopper components. To do it with the Exposure component, you would first have to rewrite the exposure component into a C# / VB script. Then you would take the script and identify the parts that can be multi-threaded (as you guessed, there is probably a for-loop running through each ray which could be multithreaded). However there is no guarantee the functions you are using in Rhino will be thread-safe so it might all come crashing down in the end.
Daniel
Mostapha
Hi Daniel,
Thank you for the reply. To re-write the exposure component from the scratch is above my head. I assume there should be ways to decode the component and get the concept though. I’ll give it a try soon.
Mostapha
Daniel
Hi Mostapha,
One way to get started would be to try to use standard Grasshopper components to recreate the functionality of the Exposure component. That way you could mockup the logic of the exposure component, and once it is all working, transfer it to C#/VB.
Daniel