Gillespie algorithm for stochastic simulations of signaling pathways – vectorization in MATLAB

Modeling of signaling pathways is an important part of cancer research, as it is essential to understand how proteins interact with each other to provide or impair a specific cell function. There are two main ways to tackle the problem mathematically: 1) the deterministic approach, in which we assume that the number of molecules is large and stochastic effects are negligible; 2) the stochastic approach (typically using the Gillespie algorithm), in which we track the exact number of molecules and take into account the inherent noise of the system. Today I’ll focus on the stochastic approach. You can find plenty of packages online that perform stochastic simulations using the Gillespie algorithm, but I haven’t seen any that harnesses vectorization in MATLAB. Today I will show that using vectorization we can speed up the simulations about 100-fold. Such an increase in computational speed is very important, as agent-based models more and more frequently incorporate signaling pathways at the cell level.

Let us start with basic information about the Gillespie algorithm. Using the algorithm we describe the change in the number of molecules of K different species (x1(t),…,xK(t)) that at each moment can undergo one of L different biochemical reactions. Each reaction has a specific propensity function, e.g. binding of two molecules from different species can have propensity r*x1(t)*x2(t), where r is the binding rate. The propensities determine the time to the next reaction (exponentially distributed) and which reaction occurs next (a discrete distribution with propensity-based probabilities). After a reaction occurs, the system is updated according to the stoichiometric matrix, i.e. the matrix describing how many reactants are removed from the system and what products are created.
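To make the two sampling steps concrete, here is a minimal C++ sketch (hypothetical helper functions, not part of the MATLAB code shown below): the waiting time comes from inverting a uniform draw, and the reaction index from comparing a second uniform draw against the cumulative propensities.

```cpp
#include <cmath>
#include <vector>

// Time to the next reaction: exponential with rate Qs (total propensity),
// generated by inversion from a uniform draw u in (0,1).
double timeToNext(double Qs, double u) {
    return -std::log(u) / Qs;
}

// Index of the next reaction: sample the discrete distribution with
// probabilities Q[i]/Qs by comparing u against the cumulative sums.
int nextReaction(const std::vector<double>& Q, double u) {
    double Qs = 0.0;
    for (double q : Q) Qs += q;
    double cum = 0.0;
    for (std::size_t i = 0; i < Q.size(); ++i) {
        cum += Q[i];
        if (u < cum / Qs) return (int)i;
    }
    return (int)Q.size() - 1; // guard against rounding when u is close to 1
}
```

In a full simulation u would come from a random number generator; with fixed draws the selection logic is easy to check by hand (e.g. with propensities {1, 2, 1}, a draw below 1/4 picks reaction 0).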

As a working example we will consider a very basic gene expression pathway – we will describe the number of mRNAs and of the corresponding proteins. The four considered reactions are: transcription, translation, mRNA degradation, and protein degradation. OK, let us start coding by defining the considered pathway in the main script.

%DEFINING REACTION RATES
par.kr = 1/10;     %transcription rate
par.kp = 10;   %translation rate
par.gr = 1/150; %mRNA degradation rate
par.gp = 1/30;  %protein degradation rate

%DEFINING PROPENSITY FUNCTIONS AS A FUNCTION HANDLE
prop = @(x,par)([par.kr,...      %transcription, one mRNA molecule created
                 par.kp*x(1),... %translation, one protein created
                 par.gr*x(1),... %mRNA degradation, one mRNA molecule removed
                 par.gp*x(2)]);  %protein degradation, one protein molecule removed

%DEFINING INITIAL CONDITION, order [mRNA, Protein]
init = [0;1];

%DEFINING STOICHIOMETRIC MATRIX
%column corresponds to the reaction, row corresponds to the molecule
%order as in prop and init variables
stoch = [1 0 -1 0;...
         0 1 0 -1];

%DEFINING TIME MESH FOR THE OUTPUT TRAJECTORY
tmesh = linspace(0,1000,100) ;

Now in a few lines we can code the whole Gillespie algorithm in a separate function file (without using any vectorization techniques for now).

function trajectory = SSA(tmesh, par,prop,stoch, init )
   %tmesh - time mesh on which solution should be returned
   %par - parameters of the pathway
   %prop - definition of propensity functions
   %stoch - stoichiometric matrix
   %init - initial condition for the pathway

   t = 0;                                          %current time
   state = init(:);                                %variable with current system state
   trajectory = zeros(length(init),length(tmesh)); %preparing output trajectory
   trajectory(:,1) = init(:);                      %setting initial value as the first element in trajectory
   cindx = 2;                                      %current trajectory index
   N = length(tmesh);                              %number of time points

   while t<tmesh(end)
      Q = feval(prop,state,par);          %calculating propensities of the reactions
      Qs = sum(Q);                        %total propensity
      dt = -log(rand())/Qs;               %generating time to the next reaction
      R = sum(rand >= cumsum([0 Q])/Qs);  %selecting reaction
      state = state + stoch(:,R);         %updating state
      t = t + dt;                         %updating time

      %writing the output
      while cindx<=N && t>tmesh(cindx)
         trajectory(:,cindx) = state;
         cindx = cindx+1;
     end
   end
end

and add the following lines to the main script to simulate and plot a trajectory.

%simulating
traj = SSA(tmesh, par,prop,stoch, init );

%plotting
figure(1)
plot(tmesh,traj(1,:))
xlabel('Time, min')
ylabel('#mRNAs')

figure(2)
plot(tmesh,traj(2,:),'r')
xlabel('Time, min')
ylabel('#Proteins')

Below are the resulting plots after one stochastic realization of the whole pathway.

[Figure: simulated mRNA and protein trajectories]

So now we can simulate our simple pathway, but there is one problem: it takes about 3.5 seconds to generate the trajectory presented in the above plots. If we want to simulate the pathway for each agent present in the system and we reach the level of 2000 agents, then simulating the pathway for each agent in a simple for loop will take about 2 hours… What can we do to decrease the computational time? We could of course use multiple CPUs – using 100 of them would bring the computations down to about a minute. But who has access to 100 CPUs? Only a few of us. What about using vectorization in MATLAB and modifying the SSA function to perform those 2000 simulations simultaneously? That is definitely worth trying. We only need to add an additional argument to the SSAv function – the number of trajectories to generate – and modify a few lines to do vectorized calculations:

function trajectory = SSAv(tmesh, par,prop,stoch, init, nSim )
   %tmesh - time mesh on which solution should be returned
   %par - parameters of the pathway
   %prop - definition of propensity functions
   %stoch - stoichiometric matrix
   %init - initial condition for the pathway
   %nSim - number of simulations to perform

   tmesh = tmesh(:);   %reshaping mesh to be vertical vector
   t = zeros(nSim,1);  %current time for each simulation
   state = repmat(init(:)',nSim,1); %current state variable for each simulation
   trajectory = zeros(nSim,length(init),length(tmesh)); %preparing output trajectory
   trajectory(:,:,1) = state;%setting initial value as the first element in trajectory
   cindx = 2*ones(nSim,1);%current trajectory indices
   N = length(tmesh); %number of time points
   aux = 1:nSim; %original indices of the still-active simulations

   while ~isempty(t)
      Q = feval(prop,state,par);         %calculating propensities of the reactions
      Qs = sum(Q,2);                     %total propensities
      dt = -log(rand(size(Qs,1),1))./Qs; %generating time to the next reaction
      P = bsxfun(@rdivide, cumsum([zeros(size(Qs,1),1) Q],2),Qs); %probabilities for each reaction
      R = sum(bsxfun(@ge,rand(size(Qs,1),1),P),2);                %selecting reaction
      state = state + stoch(:,R)';       %updating state
      t = t + dt;                        %updating time

     %writing the output
     update = t > tmesh(cindx);
     while any(update)
        %updating state
        iupdt = find(update);
        for i = 1:length(iupdt)
           trajectory(aux(iupdt(i)),:,cindx(iupdt(i))) = state(iupdt(i),:);
        end
        cindx = cindx+update;

        %removing finished simulations from the variables
        indx = cindx > N;
        if any(indx)
           cindx(indx) = [];
           t(indx) = [];
           aux(indx) = [];
           state(indx,:) = [];
           if isempty(cindx)
              break;
           end
        end
        update = t > tmesh(cindx);
     end
   end
end

We also need to make sure that the propensity functions in the main script are defined in a vectorized manner (everything else remains the same).

%DEFINING PROPENSITY FUNCTIONS AS A FUNCTION HANDLE IN VECTORIZED MANNER
prop = @(x,par)([par.kr*ones(size(x,1),1),...
                 par.kp*x(:,1),...
                 par.gr*x(:,1),...
                 par.gp*x(:,2)]);

Below are the plots of trajectories generated using the vectorized function for nSim = 100.

[Figure: 100 simulated mRNA and protein trajectories]

Now the important question is how much time it took to simulate those 100 trajectories. We know that using the simple non-vectorized code and a for loop it would take about 360 seconds. The vectorized function did it in 16 seconds! That is a 22-fold decrease in computational time. The plot below shows how the computational time changes with the number of trajectories to simulate.

[Figure: comparison of computational times]

The vectorized code generated 2000 trajectories in 62 seconds! So we don’t need those 100 CPUs to obtain a reasonable computation time.

I hope that this post will convince some people that MATLAB is not bad after all.

Agent-Based Modeling (ABM) of Tumors in Java

Before I begin I would like to quickly introduce myself — my name is Sameer Puri and I am a high school student working in an internship under Dr. Heiko Enderling through the HIP IMO program at Moffitt Cancer Center.

Over the past 5 weeks, I have been developing an ABM in Java for my research. My principal goal in developing this ABM was to quickly generate tumors with 1 million cells. (And prove that Java can perform better in some cases…!)

When I started, I was working with an ABM written by @jpoleszczuk and Heiko in C++ (available here). Having a strong background in Java and little past experience with C++, it was difficult for me to understand. Structs, vectors, memcpy, and other features of C++ are not present in Java. Instead of trying to learn C++, I converted the code to Java, my language of expertise.

Optimization

I decided to optimize the code before making any extensions for my research. In Java, finding slow-down points is simple, since a variety of profiling tools is available to find which part of the code is slowing down the ABM. I utilized the web-based profiler WarmRoast, developed by sk89q (on GitHub here).

  • One of the biggest time hogs was the Random class which was implemented in the 1990s, and after 20 years, much more efficient pseudo-random number generators have emerged. I replaced it with the XorShift generator, developed by George Marsaglia. The XorShift is theoretically 3 times faster than the default generator. However, the speed-up was far more significant — the call for a random double was so fast that it disappeared from the WarmRoast profiler’s results.
  • The returnEmptyPlace method was optimized with a strategy known as pregeneration, or exchanging memory for lower CPU time. The returnEmptyPlace method looks for an empty spot around a cell in the ABM, shuffling a list of 8 directions each time. Instead, I pregenerated all 40,320 permutations of the 8 direction vectors surrounding a lattice square. Now, the returnEmptyPlace method simply uses the speedy XorShift generator to select a random permutation of the directions list to use.
  • I also optimized the shuffling of the list of cells in the main simulation loop. I created a new class called SPCollections which uses the XorRandom generator for shuffling. There was a large speed-up here as well.
  • Using NetBeans’ built-in hashcode generator, I added hashcodes for the Cell and Indx classes, which will speed up the usage of HashMaps or other hash-based structures.
  • Memory was not a serious concern for me, but I still made one memory optimization. The original model in C++ used the char primitive type for its lattice. I turned the lattice into a byte lattice, which has half the memory footprint (a Java char is 16 bits, a byte is 8 bits).
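As a rough C++ sketch of the first two optimizations above (hypothetical code, not the Java source): a Marsaglia-style xorshift generator, and a pregenerated table of all 8! = 40,320 orderings of the neighborhood directions so that a random table index replaces a per-call shuffle.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Marsaglia's xorshift64: three shift-xor operations per draw,
// much cheaper than a linear-congruential java.util.Random step.
struct XorShift64 {
    uint64_t s;
    explicit XorShift64(uint64_t seed) : s(seed ? seed : 1) {}
    uint64_t next() {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        return s;
    }
};

// Pregenerate every permutation of the 8 neighborhood directions once;
// picking a random row then costs one RNG draw instead of a shuffle.
std::vector<std::array<int, 8>> allDirectionOrders() {
    std::array<int, 8> dirs = {0, 1, 2, 3, 4, 5, 6, 7}; // direction indices
    std::vector<std::array<int, 8>> perms;
    do {
        perms.push_back(dirs);
    } while (std::next_permutation(dirs.begin(), dirs.end()));
    return perms;
}
```

At run time a random order is simply `perms[rng.next() % perms.size()]`, trading about 1.3 MB of memory for CPU time, exactly the memory-for-speed exchange described above.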

Additions

  • For my research, I had to grow 10 tumors in silico with one million cells each. To speed up the simulations I utilized parallel computing with an ExecutorService. With my ABM, you can specify the number of “trials” you want to run. For each trial, a simulator instance is created and run on a separate thread. Most CPUs today have at least 4 cores, but code in most programming languages uses only one core unless the programmer explicitly allows for parallel execution. By creating a separate thread for each simulation, the threads can operate on separate cores and maximize CPU usage!
  • To universalize my data, I saved it utilizing Java serialization. With serialization, Java can output objects to files, which can be loaded later on without needing to code a loader. Others will be able to easily load data generated using my ABM and run their own operations on it.
  • I coded an image generator to automatically generate tumor images. Once a simulation is finished, the image generator takes the serialized output of the simulation and generates three types of PNG images for each dumped instance: proliferation, quiescence, and nutrition. It also creates an animated GIF to show the growth of the tumor over time. Each image has the time step (1 step = 1 hour), number of stem cells and number of regular cancer cells at the upper left corner.
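The trial-per-thread scheme can be sketched in C++ as well (a hypothetical std::thread analogue of the ExecutorService setup; runTrial stands in for a full simulator instance):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical aggregate result written by the trials.
std::atomic<long> totalCells{0};

// One "trial": in the real ABM a whole simulator instance would run here.
void runTrial(int trialId) {
    totalCells += trialId; // placeholder work standing in for a simulation
}

// Launch one thread per trial and wait for all of them to finish,
// letting the OS spread the trials over the available cores.
void runTrials(int nTrials) {
    std::vector<std::thread> workers;
    for (int i = 0; i < nTrials; ++i)
        workers.emplace_back(runTrial, i);
    for (auto& w : workers) w.join();
}
```

A thread pool (the closer analogue of an ExecutorService) would cap the number of live threads at the core count, but for a handful of long-running trials one thread each is the simplest design.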

Proliferation during Development


Yellow = glioma stem cell; reddish = few divisions → black = many divisions

Nutrition during Development (on rim)

Nutrient levels on the outer rim of the lattice are displayed. Unoccupied (0) and occupied (9) lattice spots are white.

Quiescence during Development

Orange = quiescent; red = can divide/proliferate


Download

You can download my ABM here.

Parameter dependent implementations

When dealing with ordinary or partial differential equations we know that existing numerical solvers have different performance depending on the specific mathematical problem. In particular, the performance of any given solver can dramatically change if we change model parameters. Let us take the well known logistic equation as an example. Let C(t) be the current number of cancer cells, r the growth rate, and K the maximal number of cancer cells that can be sustained by the host. Then the growth dynamics can be described by the following equation:

\frac{dC(t)}{dt}=rC(t)\left(1-\frac{C(t)}{K}\right).

The above equation of course has an analytical solution, but let us use a MATLAB ordinary differential equation solver to find a numerical solution. Generally it is a good idea to start with the solver that has the highest order – in our case ode45, based on the Runge-Kutta method. We will also consider another solver, ode15s, which is designed for solving stiff systems. We fix the value of the carrying capacity K, the initial condition C(0), and the time interval in which we solve the equation, and investigate the performance of both solvers for different values of the growth rate r. The plots below show that for lower values of the growth rate ode45 outperforms ode15s, but the situation changes when the growth rate is large (while the relative difference between the calculated solutions stays small).
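For reference, the analytical solution mentioned above (with initial condition C(0) = C_0) is

C(t)=\frac{K C_0 e^{rt}}{K+C_0\left(e^{rt}-1\right)},

which makes explicit how the growth phase steepens as r increases.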

[Figure: computational times and relative difference between the solutions]

The transition in solver performance can be explained by looking at how the solution changes for different values of the growth rate (see figure below) – the larger the growth rate, the steeper the solution. Stiff solvers like ode15s are designed to deal efficiently with such rapid transitions.

[Figure: solutions for different growth rates]

So, if the performance of numerical solvers for differential equations depends on the parameters, should we expect the same for agent-based models of cancer growth? Should we create parameter-dependent implementations and switch between them on the fly? I think the answer is yes, and I’ll try to convince you with the simple example below.

Let us consider a very simple CA: a cell can either divide or migrate if there is a free spot – nothing else. We can adapt the very basic C++ implementation from the previous post as a baseline: we have a boolean matrix (in which the value true indicates that the spot is occupied by a cell) and an additional vector of cells keeping the locations of all viable cells present in the system. At each simulation step we go through all of the cells in the vector in random order and decide about the fate of each cell.

Let us tweak the code so that at each iteration we drop from memory (from the vector) cells that are quiescent (completely surrounded). If a cell stays quiescent for a prolonged time, we save computational time by skipping its neighborhood search. However, after each migration event we need to add some cells back to the vector – that is an additional cost generated by searching the lattice again. The essential part of the tweaked code in C++ is:

   for (int i=0; i<nSteps; i++) {
        random_shuffle(cells.begin(), cells.end()); //shuffling cells
        while (!cells.empty()) {
            currCell=cells.back(); //pick the cell
            cells.pop_back();
            newSite = returnEmptyPlace(currCell);
            if (newSite) {//if there is a free spot in neigh
                if ((double)rand()/(double)RAND_MAX < pDiv) {//cell decides to proliferate
                    lattice[newSite] = true;
                    cellsTmp.push_back(currCell);
                    cellsTmp.push_back(newSite);
                    nC++;
                } else if ((double)rand()/(double)RAND_MAX < pmig) {//cell decides to migrate
                    toDelete.push_back(currCell);
                    lattice[newSite] = true;
                } else {//cell is doing nothing
                    cellsTmp.push_back(currCell);
                }
            } 
        }
        cells.swap(cellsTmp);
        
        //freeing spots after cell migration and adding cells to vector
        for(int j =0; j<toDelete.size(); j++)
            lattice[toDelete.at(j)] = false;
    
        c = !toDelete.empty();
        while(!toDelete.empty()) {
            newSite = toDelete.back();
            toDelete.pop_back();
            //adding cells that can have space after migration event
            for(int j=0;j<8;j++) {//searching through neighborhood
                if (newSite+indcNeigh[j]>=0 && newSite+indcNeigh[j]<N*N) {//staying within the lattice
                    cells.push_back(newSite+indcNeigh[j]);
                }
            }
        }
        
        //removing duplicates in cells vector
        if (c) {
            sort( cells.begin(), cells.end() );
            cells.erase( unique( cells.begin(), cells.end() ), cells.end() );
        }
    }

Depending on the proliferation/migration rates, the tweaked code will have different performance. For a low probability of migration only cells on the tumor boundary have some free space (green cells in the left figure below), and we save a lot of time by dropping the quiescent cells (blue cells in the figures below). For higher migration rates, however, less than 30% of the cells are quiescent at any given step (compare the right figure below), and the benefit of dropping cells can be outweighed by the post-migration update steps.
[Figure: solid (left) and diffusive (right) tumor morphologies]

The plot below shows the computational time needed to reach 500,000 cells when using the baseline code (all cells, blue line) and the tweaked one (only proliferating cells, dashed green line) for a fixed proliferation probability. For lower proliferation rates the change in performance would occur even earlier.

[Figure: computational times for the two CA implementations]

I think this simple example shows that we always need to think about how our model will behave before choosing the way to implement it.

Quick parallel implementation of local sensitivity analysis procedure for agent-based tumor growth model

In the last couple of posts I’ve shown how to implement an agent-based model of cancer stem cell driven tumor growth, both in MATLAB and C++. Having the implementations, we can go one step further and perform some analysis of the tumor growth characteristics predicted by the model. We will start with local sensitivity analysis, i.e. we will try to answer the question of how perturbing the parameter values impacts the growth dynamics. Typically this is done by perturbing one of the parameters by a fixed amount and analyzing the response of the model to that change. In our case the response will be the percentage change in the average tumor size after 3 months of growth. Sounds fairly simple, but…

We have 5 different parameters in the model: proliferation capacity, probability of division, probability of spontaneous death, probability of symmetric division, and probability of migration. Moreover, let us assume that we would like to investigate 3 different magnitudes of parameter perturbation (5%, 10% and 20%). In order to analyze the change in the average size we need a decent estimator, i.e. a sufficient number of simulated stochastic trajectories – let us assume that 100 simulations are enough for a good estimate of the “true” average. So to perform the sensitivity analysis we will need 100 + 3*5*100 = 1600 simulations (remember the run for the nominal parameter values). Even if a single simulation typically takes 30 seconds, we will wait more than 13 hours for the result on a single CPU – that is a lot!

Looking at the above numbers we could make a straightforward decision right now – use C++ instead of MATLAB, because the model implementation in C++ is several times faster. However, 1) we would need to write a lot of code to perform the sensitivity analysis, and 2) using multiple CPUs is not as straightforward as in MATLAB. Is there a better way to proceed?

A few weeks ago I showed here how to wrap your C++ code so it can be used from within MATLAB as a function without losing the C++ performance. Why not use it to make our lives easier – keeping the sensitivity analysis code short while easily harnessing the power of multiple CPUs?

We will start the coding (in MATLAB) by setting all simulation parameters.

nSim = 100; %number of simulations to perform for a given set of parameters
tmax = 30*3; %number of days to simulate
N = 1000; %size of the simulation domain

%nominal parameter values [rhomax, pdiv, alpha, ps, pmig]
nominal = [10, 1/24, 0.05, 0.3,10/24];

perturb = [0.05 0.1 0.2]; %perturbation magnitudes, percent of initial value up

Now we just need to construct loops that iterate through all possible perturbations and simulations. If we don’t use the Parallel Computing Toolbox, i.e. don’t use multiple CPUs, it doesn’t really matter how we do that – the performance will be similar. Otherwise, the implementation is not that straightforward, even though from MATLAB’s documentation it seems that it is enough to change for to parfor. The most important question is how we divide the work between the CPUs. The simplest idea is to spread the considered perturbations among the CPUs – in our setting that would allow us to use 15 CPUs. However, I’ve got a machine with 24 CPUs, so that would be a waste of resources – bad idea. The other idea would be to use a parfor loop to spread the 100 simulations for a given perturbation over all 24 CPUs and go through the perturbation values in a simple outer loop – now we are using all available CPUs. But are we doing that efficiently? No. The problem is that the CPUs need to be synchronized before proceeding to the next iteration of the outer loop, so some CPUs will sit idle while waiting for the others to finish the parfor loop. To make the code even more efficient we will use a single parfor loop and throw all 1600 simulations at the 24 CPUs at the same time. Let us first prepare the output variable.

HTCG = zeros(1,nSim + length(nominal)*length(perturb)*nSim);

Before writing the final piece of the code we need to solve one more issue. Namely, in the C++ implementation we used srand(time(NULL)) to initialize the seed of the random number generator. That is perfectly fine when we use a single CPU – each simulation takes some time and we don’t need to worry about the uniqueness of the seed. The problem arises when we use multiple CPUs – all of the simulations that start in parallel will begin with exactly the same seed. One way to solve that is to pass the current loop iteration number (i) to C++ and use srand(time(NULL)+i) – that is what I have done. After solving that issue we can write the final piece of the code.
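As an aside, the seed clash is easy to reproduce with a small C++ sketch (hypothetical, using the same srand mechanism): two workers seeded with the same time(NULL) value get identical random streams, while offsetting the seed by the loop index makes the streams distinct.

```cpp
#include <cstdlib>
#include <vector>

// Draw a short sequence from rand() after seeding with the given value;
// this mimics what each parallel worker would see at startup.
std::vector<int> drawAfterSeed(unsigned seed, int n) {
    std::srand(seed);
    std::vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = std::rand();
    return v;
}
```

Workers launched in the same second share time(NULL), so seeding with it alone replays the very same sequence in every worker; seeding with time(NULL)+i, as in the loop below, avoids that.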

parfor i = 1:length(HTCG)
    %%PREPARING PARAMETERS VALUES
    params = nominal; %setting parameters to nominal values
    if i>nSim %simulation is for perturbed parameters
        %translating linear index to considered parameter and perturbation value
        j = ceil((i-nSim)/(nSim*length(perturb)));
        k = ceil((mod((i-nSim)-1,nSim*length(perturb))+1)/nSim);
        %updating parameters
        params(j) = params(j)*(1+perturb(k));
        if j == 1 %if proliferation capacity parameter we need to round it
           params(j) = round(params(j)); 
        end
    end

    %%RUNNING SIMULATION AND SAVING OUTPUT
    [~, cells] = CA(params,[tmax*24,N,i]);
    HTCG(i) = length(cells)/3;
    clear mex %important! without that the internal 
              %variables in CA functions won't be cleared and next simulation
              %won't begin with single initial cell
end
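The linear-index arithmetic in the loop above is easy to get wrong, so as a sanity check here is a hypothetical C++ re-implementation of the j/k translation, together with the values it should produce for nSim = 100 and 3 perturbation magnitudes.

```cpp
#include <cmath>
#include <utility>

// Translate a 1-based linear simulation index i (with i > nSim) into
// (j, k): j = index of the perturbed parameter (1..5),
//         k = index of the perturbation magnitude (1..3).
std::pair<int, int> toJK(int i, int nSim, int nPerturb) {
    int m = i - nSim; // 1 .. nPar*nPerturb*nSim
    int j = (int)std::ceil((double)m / (nSim * nPerturb));
    int k = (int)std::ceil((double)(((m - 1) % (nSim * nPerturb)) + 1) / nSim);
    return {j, k};
}
```

For example, i = 101 (the first perturbed run) should map to parameter 1 at magnitude 1, and i = 1600 (the last run) to parameter 5 at magnitude 3.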

Then in the command line we start a parallel pool with 24 workers (CPUs) by typing the parpool(24) command, and run the code. The screenshot below shows nicely that all 24 CPUs are being used – no resources wasted!

[Figure: CPU occupancy during the parallel run]

We can then add a few additional lines of code to plot the results.


nom = mean(HTCG(1:100)); %average for nominal parameters value
%calculating averages for perturbed sets values
av = squeeze(mean(reshape(HTCG(101:end),nSim,length(perturb),length(nominal))));

%plotting results of sensitivity analysis
set(0,'DefaultAxesFontSize',18)
figure(1)
clf
bar(perturb*100,(av-nom)/nom*100)
legend({'\rho_{max}', 'p_{div}', '\alpha', 'p_s', 'p_{mig}'})
xlabel('% perturbation')
ylabel('% change')

And voilà – the resulting figure shows that a perturbation in the proliferation capacity has the highest impact on the tumor growth dynamics.

[Figure: results of the sensitivity analysis]

Wrapping C++ code of agent-based tumor growth model into MATLAB

Last week I posted a C++ implementation of a basic cancer stem cell driven tumor growth model. About 100 lines of code allowed us to perform the simulation, but there was nothing about exporting the results, doing more complicated analysis, visualization, or performing data fitting. Of course each of those tasks can be accomplished by writing another large piece of code.

Two weeks ago I posted MATLAB code for a very similar model, which was reasonably fast but a lot slower than the C++ implementation. On the other hand, it was straightforward to visualize the tumor, and it is not much of a hassle to perform more complicated analysis or data fitting using the variety of MATLAB’s built-in functions.

Each of those two approaches has its own advantages: C++ is fast, MATLAB has a lot of built-in tools.

Today I will show how to take the best of both approaches, i.e. I’ll show how to easily wrap our C++ code into MATLAB. The whole C++-based simulation will be invoked from MATLAB as a function and the results will be visible straight from it.

First we need to include the necessary headers.

#include "mex.h"
#include <matrix.h>

Then we need to modify our original definition of global variables (see previous post) so they can be adjusted based on the input parameters (for example lattice has to be dynamically allocated).

int N;
bool* lattice;
vector<cell> cells;

char pmax;
double pDiv, alpha, ps, pmig; 
int* indcNeigh;

Then the only thing we need to do is to delete the old main() function and use mexFunction() instead. In mexFunction() we first read the parameters passed from MATLAB (the input arguments of the function). Then we allocate memory for the lattice and set the other auxiliary variables. Finally, after performing the simulation, we write the variables to the output so that MATLAB can see the results.

void mexFunction(
		 int          nlhs, //number of output arguments
		 mxArray      *plhs[], //output arrays
		 int          nrhs, //number of input arguments
		 const mxArray *prhs[] //input arrays
		 )
{
    /*READING INPUT PARAMETERS*/
    double *par=mxGetPr(prhs[0]); //model parameters, first input variable
    double *settings=mxGetPr(prhs[1]); //simulation settings, second input variable

    int Nsteps = settings[0]; //number of simulation steps to perform
    N=(int)settings[1]; //size of the lattice
    pmax = par[0]; //divisions before proliferative capacity exhaustion 
    pDiv = par[1]; //division probability 
    alpha = par[2]; //spontaneous death probability 
    ps = par[3]; //probability of symmetric division 
    pmig = par[4]; //probability of migration
    
    /*CREATING LATTICE AND AUXILARY VARIABLES*/
    //creating lattice with a given size
    lattice = new bool [N*N];
    fill_n(lattice, N*N, false);
    
    indcNeigh = new int [8];//cell neighborhood
    //assigning values
    indcNeigh[0] = -N-1; indcNeigh[1] = -N; indcNeigh[2] = -N+1;
    indcNeigh[3] = -1; indcNeigh[4] = 1; indcNeigh[5] = N-1;
    indcNeigh[6] = N; indcNeigh[7] = N+1;

    /*OLD PIECE OF MAIN() FUNCTION*/	
    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(Nsteps); //simulate CA
    
    /*WRITING TO OUTPUT VARIABLES*/
    //1) writing the lattice as a logical matrix
	plhs[0] = mxCreateLogicalMatrix(N,N); //lattice
    mxLogical *ptr = mxGetLogicals(plhs[0]);
    for(int i=0; i<N*N; i++) 
        ptr[i] = lattice[i];

    //2) writing the cells vector as a vector
    plhs[1] = mxCreateDoubleMatrix(1,cells.size()*3,mxREAL);
    double* ptr2 = mxGetPr(plhs[1]);
    for(int i=0; i<cells.size(); i++) {
        ptr2[3*i] = cells.at(i).place;
        ptr2[3*i+1] = cells.at(i).p;
        ptr2[3*i+2] = cells.at(i).is_stem;
    }
}

After making all of the changes we can compile our .cpp file using the MATLAB command “mex CA.cpp”. Remember that all other pieces of the C++ code associated with the model itself remain the same – there is no need to change anything else.

From now on we can perform our simulation in MATLAB environment by invoking the following command

[lattice, cells] = CA([pmax, pdiv, alpha, ps,pmig],[nSteps, N]);

and then make use of all of MATLAB’s benefits, especially its data fitting features. At the same time all of the simulations will be performed at C++ speed! Imagine using the Parallel Computing Toolbox in MATLAB to harness multiple CPUs with only a few lines of code.

Setting complex domains for agent based models using bitmaps

In the previous posts (CA in MATLAB and C++) I’ve shown how to implement a cancer stem cell driven tumor growth model. In both codes I set the boundary sites of the lattice to true without adding them to the additional cells vector, so the boundary was treated as occupied sites, but not as viable cells. This way of coding the system makes it easy to implement more complex domains. Today I’ll show how to read complex domains from a bitmap (.bmp) file and use them in simulations with the previously posted codes. The basic idea is that the image will be loaded into the program and pixels with values “close” to white will be treated as free sites.

First we need to prepare the image. The posted codes assume that the image is a square (height = width). We can draw anything we want, remembering that white pixels will be interpreted as free sites. We will write C++ code that properly reads 24-bit bitmap images without embedded color space information. To export an image in that format we can use the GIMP program with the following settings during export.
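One detail worth spelling out: in a 24-bit BMP every pixel occupies 3 bytes and every row is padded up to a multiple of 4 bytes. A small helper (hypothetical, mirroring the Nb computation in the loader below) gives the true row size in bytes:

```cpp
// Bytes per row in a 24-bit BMP: 3 bytes per pixel, rounded up
// to the nearest multiple of 4 (the BMP row-padding rule).
int bmpRowSize(int width) {
    return ((3 * width + 3) / 4) * 4;
}
```

Forgetting the padding shifts every row read from the file, which scrambles the resulting domain, so the loader has to read bmpRowSize(N) bytes per row even though only 3*N of them carry pixel data.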

[Figure: GIMP BMP export settings]

I’ve prepared two exemplary images that I will use further in the simulations. The first one is adapted from the paper by Enderling et al. and the second is a generated text using PowerPoint.
[Figure: tissue domain (left) and text domain (right)]

First, the basic C++ code. We could use additional libraries to load images, but we want to keep the code as platform-independent as possible. We assume that the lattice array and its size are already defined as global variables (see the previous code).

#include <stdio.h>
#include <math.h>

void domainFromImage(char* fileName) {

    FILE* f = fopen(fileName, "rb"); //open to read as binary
    unsigned char head[54];
    fread(head, sizeof(unsigned char), 54, f); // read the header
    
    //in BMP format the size of each row is rounded up to a multiple of 4 bytes
    //(note: BMP stores rows bottom-up, so the loaded domain is vertically flipped)
    int Nb = ceil(3.*(double)N/4.)*4; //true size of the row in bytes, including padding

    unsigned char* img = new unsigned char[Nb]; // allocate one row of pixels
    
    for(int i = 0; i < N; i++) { //for each row
        fread(img, sizeof(unsigned char), Nb, f);//read one row
        for (int j=0; j<N; j++)
            //set to free only those sites whose pixels are close to white (>250 in all three channels)
            lattice[i*N+j] = !(img[3*j]>250 && img[3*j+1]>250 && img[3*j+2]>250);
    }
    fclose(f);//close file
    delete[] img;//free the row buffer
    
    //filling boundary
    for (int i=0; i<N; i++) {lattice[i]=true;};//first row
    for (int i=0; i<N*N; i+=N) {lattice[i]=true;};//first column
    for (int i=N-1; i<N*N; i+=N) {lattice[i]=true;};//last column
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//last row
}

The same task in MATLAB is way easier to implement…

function [L, N] = domainFromImage( filename )
    L = imread(filename); %reading image
    L = ~(L(:,:,1)>250 & L(:,:,2)>250 & L(:,:,3)>250); %occupied unless all three channels are close to white (>250)
    N = size(L,1);%size of the domain (assuming square)
    
    %filling border
    L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
end

What is most important, the MATLAB code does not depend much on the format of the image – it can be easily modified to read other image types (the imread function is very flexible).

Below is an example result of a simulation using the image representing the tissue.

tissueTumor

Tumor growth can also be simulated for the domain generated using the PowerPoint text.

ht4

Cancer stem cell driven tumor growth model in C++

Because of the feedback that I've received after publishing the first three posts, I've decided to change the tool of interest from MATLAB to C++. Today, I'll show how to quickly implement a cancer stem cell driven tumor growth model in C++. It is almost the same model as the one implemented in my previous post (I'll explain why it is "almost" the same at the end of this post).

Quick guide to the model: A cell, either a cancer stem cell (CSC) or a non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs erode their proliferation potential at each division and die when it reaches 0. Moreover, at each proliferation attempt, a CC may undergo spontaneous death and be removed from the system.

First we start with including the necessary headers and setting the namespace. Apart from the standard functions and datatypes, we will use the vector datatype to store the cells and a shuffling function from the algorithm library.

#include <vector>
#include <algorithm>
#include <stdlib.h>
#include <time.h>

using namespace std;

Now we can define a cell. It will be described by three variables: an index to its place on the lattice (int), the remaining proliferation capacity (char), and a Boolean variable defining stemness.

struct cell {//defining cell
    int place;
    char p;
    bool is_stem;
};

Next step is to define the lattice size, the lattice itself, vector that will contain all viable cells, and auxiliary variable defining the cells neighborhood.

static const int N = 2000; //lattice size
bool lattice[N*N] = {false}; //empty lattice
vector<cell> cells; //vector containing all cells present in the system

static const int indcNeigh[] = {-N-1, -N, -N+1, -1, 1, N-1, N, N+1};//neighborhood

The parameters of the model are defined as follows.

char pmax=10; //proliferation capacity
double pDiv=1./24.; //division probability
double alpha=0.05; //spontaneous death probability
double ps=0.05; //probability of symmetric division
double pmig=10./24.; //probability of migration

Having made all of those initial definitions we can finally start coding the main program. Let us start by writing the function that will initialize the whole simulation, i.e. fill the lattice boundary and place the initial stem cell.

void initialize() {
    for (int i=0; i<N; i++) {lattice[i]=true;};//filling left
    for (int i=0; i<N*N; i=i+N) {lattice[i]=true;};//filling top
    for (int i=N-1; i<N*N; i=i+N) {lattice[i]=true;};//filling bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//filling right
    
    lattice[N/2*N+N/2] = true; //initial cell in the middle
    cell initialCell = {N/2*N+N/2,pmax,true};//initial cell definition
    cells.push_back(initialCell);
}

As in the previous post, we set all the boundary values of lattice to true in order to make sure that we will not address any site outside the lattice. Typically one would use an if statement when addressing a lattice site to make sure that the index is within the domain. Here, because we have set the boundaries to true without adding those sites to the cells vector, we don't need to do that – the boundary will be treated as an occupied site, but not as a viable cell.

The second auxiliary function that we will use returns the index to the randomly selected empty space around a given spot.

int returnEmptyPlace(int indx) {
    int neigh[8], nF = 0;
    for(int j=0;j<8;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {//if free spot
            neigh[nF] = indx+indcNeigh[j]; //save the index
            nF++; //increase the number of found free spots
        }
    }
    if(nF) {//selecting free spot at random
        return neigh[rand() % nF];
    } else {//no free spot
        return 0;
    }
}

If there is no free spot the function returns 0. This is a safe sentinel, because index 0 lies on the lattice boundary, which is always occupied, so it can never be returned as a legitimate free spot.
Finally we can code the part in which all the magic happens – the main simulation procedure. It is hard to dissect the whole procedure into parts, so I did my best to explain everything in the comments.

void simulate(int nSteps) {
    
    vector<cell> cellsTmp;
    int newSite;
    cell currCell, newCell;
    
    for (int i=0; i<nSteps; i++) {
       random_shuffle(cells.begin(), cells.end()); //shuffling cells (in C++17 and later use std::shuffle)
       while (!cells.empty()) {
           currCell=cells.back(); //pick the cell
           cells.pop_back();
           newSite = returnEmptyPlace(currCell.place);
           
           if (newSite) {//if there is a new spot
               newCell = currCell;
               newCell.place = newSite;
               if ((double)rand()/(double)RAND_MAX < pDiv) {
                   if (currCell.is_stem) {
                       lattice[newSite]=true;
                       cellsTmp.push_back(currCell);
                       if ((double)rand()/(double)RAND_MAX > ps) {//asymmetric division
                           newCell.is_stem = false;
                       }
                       cellsTmp.push_back(newCell);
                   } else if (currCell.p > 0 && (double)rand()/(double)RAND_MAX > alpha) {
                       currCell.p--;
                       newCell.p--;
                       lattice[newSite] = true;
                       cellsTmp.push_back(currCell);
                       cellsTmp.push_back(newCell);
                   } else {
                       lattice[currCell.place] = false;
                   }
               } else if ((double)rand()/(double)RAND_MAX < pmig) {
                   lattice[currCell.place] = false;
                   lattice[newSite] = true;
                   cellsTmp.push_back(newCell);
               } else {//doing nothing
                 cellsTmp.push_back(currCell);
               }
           } else {//no free spot
               cellsTmp.push_back(currCell);
           }
       }
       cells.swap(cellsTmp);
    }
}

Now we wrap everything in the main function, compile and run.

int main() {
    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(24*30*6);
    return 0;
}

Everything in about 100 lines of code.
What about the speed? It took about 40 seconds to simulate the tumor presented below, which consists of about 460,000 cells (on a 2000×2000 lattice).

Screen Shot 2015-06-06 at 12.24.53 AM

Why is it "almost" the same model as the one presented in the previous post? The answer is in the details. In the above C++ implementation the lattice is updated after every single cell movement/death, so because of the random shuffling a new free spot can be created for a cell that at the beginning of the iteration was quiescent (without a free spot), and that cell can then successfully proliferate in the same iteration. In the MATLAB implementation that kind of behavior was avoided.

Visualization of 3D tumor using isosurfaces and simple blur

Today, I’ll show how to simply visualize 3D tumor using MATLAB’s isosurface function combined with a simple blurring technique.

First, we will modify the code from the previous post in order to simulate a tumor in 3D. There are only a few changes: redefinition of the domain and the neighborhood, generating permutations on the fly instead of storing them in the Pms variable, and the way in which the initial cell is placed.

N = 200; %cubic domain dimension
nSteps = 30*24; %number of simulation steps

pprol = 1/24; %probability of proliferating
pmig = 10/24; %probability of migrating
pdeath = 1/100; %probability of dying
pmax = 10; %proliferation capacity
ps = 3/10; %probability of symmetric division

L = false(N,N,N); %domain definition
L(1,:,:) = true; L(end,:,:) = true; %filling boundary
L(:,1,:) = true; L(:,end,:) = true; %filling boundary
L(:,:,1) = true; L(:,:,end) = true; %filling boundary

L(N*N*round(N/2)+N*round(N/2)+round(N/2)) = true;
cells = int32(N*N*round(N/2)+N*round(N/2)+round(N/2));
cellsIsStem = true;
cellsPmax = uint8(pmax);

aux = int32([[-N-1 -N -N+1 -1 1 N-1 N N+1] ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]-N*N ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]+N*N])'; %indices to neighborhood

for i = 1:nSteps
    sh = randperm(length(cells));
    cells = cells(sh);
    cellsIsStem = cellsIsStem(sh);
    cellsPmax = cellsPmax(sh);

    Pms = cell2mat(arrayfun(@(x)randperm(26),(1:length(cells))','UniformOutput',0))';
    S = bsxfun(@plus,cells,aux(Pms));
    S(L(S)) = 0; %setting occupied spots to 0
    indxF = find(any(S)); %selecting cells with at least one free spot
    nC = length(indxF); %number of cells with free spot
    
    P = rand(1,nC)<pprol; %proliferation
    Ps = P & rand(1,nC)<ps & cellsIsStem(indxF);%symmetric division
    De = P & (cellsPmax(indxF) == 0);%proliferation capacity exhaustion
    D = P & (rand(1,nC)<pdeath) & ~cellsIsStem(indxF); %death at proliferation attempt
    M = ~P & (rand(1,nC)<pmig); %go when no grow
    
    del = D | De; %cells to delete
    act = find((P | M) & ~del); %indices to the cells that will do something
    for ii = act %only for those that will do anything
        ngh = S(:,indxF(ii)); 
        ngh(ngh==0) = [];
        indO = find(~L(ngh),1,'first'); %selecting free spot
        if ~isempty(indO) %if there is still a free spot
            L(ngh(indO)) = true;
            if P(ii) %proliferation
                cells = [cells int32(ngh(indO))]; %int32 to match the class of cells
                if Ps(ii) %symmetric division
                   cellsIsStem = [cellsIsStem true];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))];  
                else
                   cellsIsStem = [cellsIsStem false];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))-1];
                   if ~cellsIsStem(indxF(ii))
                    cellsPmax(indxF(ii)) = cellsPmax(indxF(ii))-1;
                   end
                end
            else %migration
                L(cells(indxF(ii))) = false;
                cells(indxF(ii)) = int32(ngh(indO));
            end
        end
    end
    
    if ~isempty(del) %updating death
        L(cells(indxF(del))) = false;
        cells(indxF(del)) = [];
        cellsIsStem(indxF(del)) = [];
        cellsPmax(indxF(del)) = [];
    end 
end

Using the above code I simulated two tumors with different growth characteristics, both with pmig = 5/24, one with ps = 3/10 and the other with ps = 5/100. Now we need to visualize them.

Having the whole tumor stored in the L variable we can use the isosurface function straight away and plot the tumor as it is. The only thing that we need to do first is to remove the cells from the boundary of L.

function visualize( N, L)

%clearing cells from the boundary
L(1,:,:) = false; L(end,:,:) = false;
L(:,1,:) = false; L(:,end,:) = false;
L(:,:,1) = false; L(:,:,end) = false;

%calculating isosurfaces and plotting
p = patch(isosurface(1:N,1:N,1:N,L,0.25));
isonormals(1:N,1:N,1:N,L,p)
set(p,'FaceColor','red','EdgeColor','none');

xlim([1 N]);
ylim([1 N]);
zlim([1 N]);
view([90 0]);
camlight 
lighting gouraud    
end

The results of the above function are plotted below.

tumor1tumor1a

As you can see the visualization doesn't really allow us to see the 3D structure of the tumors: individual separated cells are plotted and there is no clear 3D surface. We will fix that by using a simple blurring technique. The basic idea is to replace each site value by the average value of its neighborhood (several times in a loop). That will eliminate separated cells and make the 3D surface smoother. We add an input variable blLevels to the function to define how many iterations of that procedure should be performed.

function visualize( N, L, cells, blLevels )

%clearing cells from the boundary
L(1,:,:) = false; L(end,:,:) = false;
L(:,1,:) = false; L(:,end,:) = false;
L(:,:,1) = false; L(:,:,end) = false;

if blLevels %if perform basic blur

%auxiliary variable with indices to the cell neighborhood
aux = int32([[-N-1 -N -N+1 -1 1 N-1 N N+1] ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]-N*N ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]+N*N])';

%creating indices to cells and their neighborhoods
S = [cells unique(reshape(bsxfun(@plus,cells,aux),1,[]))];
%creating indices to neighborhoods of indices in S
S2 = bsxfun(@plus,S,aux);
%clamping indices to the lattice (deleting out-of-range entries would
%flatten S2 into a vector and break the columnwise mean below)
S2 = min(max(S2,int32(1)),int32(N*N*N));

    %changing lattice from logical variable to float
    L = single(L);
    for i = 1:blLevels %for number of blurs
        L(S) = mean(L(S2)); %taking the average of neighborhood
    end
end

%calculating isosurfaces and plotting
p = patch(isosurface(1:N,1:N,1:N,L,0.25));
isonormals(1:N,1:N,1:N,L,p)
set(p,'FaceColor','red','EdgeColor','none');

xlim([1 N]);
ylim([1 N]);
zlim([1 N]);
view([90 0]);
camlight 
lighting gouraud    
end

If we apply blLevels = 1 we get the following, nicer plot of the tumor.

tumor2tumor2a

For blLevels = 2 we get the following plots.

tumor3tumor3a

Finally, when applying three iterations of blur (blLevels = 3) we get 3D plots nicely showing the tumor shape.

tumor4tumor4a

What is also great is that it takes only about 3 seconds to plot a tumor with more than a quarter of a million cells (the left one).

Cancer stem cell driven tumor growth model in less than 70 lines of code

Today I've decided to show how to efficiently code a cancer stem cell driven tumor growth model in less than 70 lines of MATLAB code.

Quick guide to the model: A cell, either a cancer stem cell (CSC) or a non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs erode their proliferation potential at each division and die when it reaches 0. Moreover, at each proliferation attempt, a CC may undergo spontaneous death and be removed from the system.

We start with defining initial settings of a domain and simulation timespan.

N = 1000; %square domain dimension
nSteps = 6*30*24; %number of simulation steps

Next we define the values of all parameters used in the simulation.

pprol = 1/24; %probability of proliferation
pmig = 10/24; %probability of migrating
pdeath = 1/100; %probability of death
pmax = 10; %initial proliferation capacity of CC
ps = 3/10; %probability of symmetric division

Now we can initialize the domain and place the initial CSC in its center. Our computational domain is represented by an N x N Boolean matrix (a true value indicates that a lattice point is occupied). An additional integer vector cells is maintained to store indices of all viable cells present in the system (to avoid a costly search for cells on the lattice).

L = false(N,N);
L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
L(N*round(N/2)+round(N/2)) = true;
cells = int32(N*round(N/2)+round(N/2));
cellsIsStem = true;
cellsPmax = uint8(pmax);

To reduce the amount of used memory we utilize the int32 and uint8 types, as they occupy less memory than double (the default type in MATLAB). We also set all the boundary values of L to true in order to make sure that we will not address any site outside the lattice. Typically one would use an if statement when addressing a lattice site to make sure that the index is within the domain. Here, because we have set the boundaries to true without adding those sites to the cells vector, we don't need to do that – the boundary will be treated as an occupied site, but not as a viable cell.

We can now define auxiliary variables that will be utilized further.

aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])'; %indices to neighborhood
Pms = perms(uint8(1:8))'; %permutations
nP = size(Pms,2); %number of permutations

Variable aux simply defines the cell's neighborhood, and in Pms we store all possible permutations of the indices 1:8, used later to scan each cell's neighborhood in random order.

Now we can start the main loop of the program and randomly shuffle cells at its beginning.

for i = 1:nSteps

sh = randperm(length(cells));
cells = cells(sh);
cellsIsStem = cellsIsStem(sh);
cellsPmax = cellsPmax(sh);

Now a few lines in which we first create indices to the neighborhoods of all of the cells, already in random order (variable S), then flag with 0 all of the spots that are occupied, and finally select only the indices of those cells that have at least one free spot in the neighborhood (variable indxF).

S = bsxfun(@plus,cells,aux(Pms(:,randi(nP,1,length(cells)))));
S(L(S)) = 0; %setting occupied spots to 0
indxF = find(any(S)); %selecting cells with at least one free spot
nC = length(indxF); %number of cells with free spot

Now we can decide about the fate of the cells that can perform an action (those not completely surrounded by other cells).

   
    P = rand(1,nC)<pprol; %proliferation
    Ps = P & rand(1,nC)<ps & cellsIsStem(indxF); %symmetric division
    De = P & (cellsPmax(indxF) == 0); %proliferation capacity exhaustion
    D = P & (rand(1,nC)<pdeath) & ~cellsIsStem(indxF); %death at proliferation attempt
    M = ~P & (rand(1,nC)<pmig); %go when no grow

In the additional loop we perform update of the system.

   
    del = D | De; %cells to delete
    act = find((P | M) & ~del); %indices to the cells that will perform action
    for ii = act %only for those that will do anything
        ngh = S(:,indxF(ii)); %cells neighborhood
        ngh(ngh==0) = []; %erasing places that were previously occupied
        indO = find(~L(ngh),1,'first'); %selecting free spot
        if ~isempty(indO) %if there is still a free spot
            L(ngh(indO)) = true; %updating occupancy
            if P(ii) %proliferation
                cells = [cells int32(ngh(indO))]; %adding new cell (int32 to match the class of cells)
                if Ps(ii) %symmetric division
                   cellsIsStem = [cellsIsStem true];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))];  
                else
                   cellsIsStem = [cellsIsStem false];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))-1];
                   if ~cellsIsStem(indxF(ii))
                    cellsPmax(indxF(ii)) = cellsPmax(indxF(ii))-1;
                   end
                end
            else %migration
                L(cells(indxF(ii))) = false; %freeing spot
                cells(indxF(ii)) = int32(ngh(indO));
            end
        end
    end

At the end we need to erase cells that died.

    
    if ~isempty(del) %updating death
        L(cells(indxF(del))) = false;
        cells(indxF(del)) = [];
        cellsIsStem(indxF(del)) = [];
        cellsPmax(indxF(del)) = [];
    end 
end

This is it. The whole CA is coded and we are ready to run simulations!

In order to visualize simulation results we can implement a short function.

    
function visualize( N, cells, cellsIsStem, cellsPmax, pmax )

    M = ones(N,N,3); %matrix for image
    color = hot(3*pmax); %colormap
    %drawing CCs
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+2*N*N) = color(cellsPmax(~cellsIsStem)+1,3);
    %drawing CSCs, we want to draw them as 3x3 points
    aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])';
    CSCs = cells(cellsIsStem);
    plusSurr = bsxfun(@plus,CSCs,aux);
    M(plusSurr) = color(2*pmax,1);
    M(plusSurr+N*N) = color(2*pmax,2);
    M(plusSurr+2*N*N) = color(2*pmax,3);
   
    figure(1)
    imshow(M);
end

Below is the visualization of a tumor simulated using the above code. It consists of 369,504 cells in total (8,563 CSCs). The whole simulation took about 12 minutes on my laptop, which, taking into account the lattice size and the number of cells, is a reasonable time.

higherps

Loop-free search for proliferative cells in MATLAB

When thinking about the implementation of an agent-based model, only a few of us consider MATLAB as a coding environment, even though it has plenty of built-in visualization and analysis tools. I get that. It's a high-level language that is not very efficient with for loops, so a basic loop-based code will execute much faster when coded in C++ (and other compiled languages). However, there is a very cool feature of MATLAB – vectorization – that allows us to skip plenty of loops and significantly speed up the computations. In this post I'll show how to utilize this feature to make a loop-free search for the cells that have a free spot in the neighborhood.

Our computational domain is represented by an N x N Boolean matrix L (a true value indicates that a lattice point is occupied). An additional integer vector cells is maintained to store indices of all viable cells present in the system (to avoid a costly search for cells on the lattice). Our task is to write code that returns the indices of those cells that have at least one free spot in the 8-site neighborhood.

A basic loop-based code in MATLAB that will do the task is the following:

cellsInd = false(size(cells));
for j = 1:length(cells)
    loop = true;
    for ii = -1:1:1
        if ~loop
            break;
        end
        for kk = -N:N:N
            if ~L(cells(j)+ii+kk)
               loop = false;
               cellsInd(j) = true;
               break;
            end
        end
    end
end
cellsInd = find(cellsInd);

The code is similar to the one that would be written in C++. For a 2000×2000 lattice with 800,000 cells, the evaluation of 200 iterations of the above code takes about 83 seconds. Not very impressive… The same task coded in C++ takes about 3 seconds (when using vectors from the STL). Can we do something about that?

As I wrote, vectorization is a very cool MATLAB feature. When using it, coding the whole task takes only two lines:

aux = [-N-1; -N; -N+1; -1; 1; N-1; N; N+1]';
cellsInd = find(~all(L(bsxfun(@plus,cells,aux))));

Moreover, the evaluation time is reduced from about 83 seconds to about 9.5 seconds. An impressive speed-up, and elegant code in two lines.

Still, both codes are outperformed by C++, but with the vectorized code the difference is bearable.