New Video Tutorial: Make a Neural Net Simulator in C++

February 4, 2011

Released today! Our newest video programming tutorial, A Neural Net Simulator in C++, is now available for your viewing pleasure at the following link:

[ Update for 2013: Also see the new companion video for visualizations of how neural nets work and how to train them: The Care and Training of Your Backpropagation Neural Net.  ]

Neural Net in C++ Tutorial from David Miller on Vimeo.

If you’re a beginning to intermediate C++ programmer, this tutorial will guide you through the analysis, design, and coding of a command line console program that implements a neural net in C++. You’ll end up with a classic backpropagation model with adjustable gradient descent learning and adjustable momentum. You’ll see how to teach your neural net to solve a simple task, then you can supply your own training data to train your net to do wonderful and amazing things.

Besides showing how a neural net works, we also discuss:

  • C++ class design
  • prototyping with portable C++
  • test early, test often
  • encapsulation, data hiding
  • static class members
  • accessor functions
  • const correctness
  • the assert() macro
  • the vector<> template container, and .size(), .back(), and .push_back() member functions
  • reference variables

This tutorial does not cover or require exception handling, memory management, STL iterators, inheritance, threads, or graphical input or output.

The finished neural net source code is also available for download here (see the video for instructions on what to do with it):



posted in C++, Tutorials by Dave


33 Comments to "New Video Tutorial: Make a Neural Net Simulator in C++"

  1. Ben wrote:


    You have a bug in your code. After you set up the net in the Net() constructor, you do not initialize the bias node.

    This code must be slightly different from the code you showed in your online video, because the one in the video has this initialization.

    Essentially, you’re missing this line:

    // force the output node’s bias to 1.0

  2. Dave wrote:

    Ben, thank you very much for spotting and reporting that omission. The downloadable source code should match the video now. (For reference, the line we’re talking about is near the end of the video at around 59:45).

    The bias neurons are interesting. As far as I can tell, they don’t have to be initialized to any particular value; they just need to be nonzero. Then the backprop algorithm will change the connection weights of the bias nodes to scale the value to whatever the neurons need in the subsequent layer.

  3. Theo wrote:

    Hi Dave, first of all thanks for this great video tut. It really helped me to understand certain things :)

    I downloaded the code for Windows (DOS version), and after building the project (which by the way ends without any errors or warnings), when I start the .exe file I get the message:

    Debug Error!
    abort() has been called

    Can you give me a hand on this issue? Thanks in advance :)

  4. Dave wrote:

    Hi Theo, the only abort() statement in the program is at line 45, so this should be easy. It checks that the input training data begins with a line that contains the string “topology:”. So just be sure that your input training file is formatted the same way it is shown in the video and it should get past that abort(). The sample input file used in the video (at around 1:00:50) begins with these lines:

    topology: 2 4 1
    in: 1.0 0.0
    out: 1.0
    in: 1.0 1.0
    out: 0.0

    As mentioned in the video, that entire class called TrainingData was intended to be just a placeholder, with the expectation that you’ll replace it with something more appropriate for your needs, and so that class is not burdened with a lot of friendly error messages and error recovery. So if you use the TrainingData class as shown without modification, just be careful to supply it with the same format of input data used in the video.

  5. Theo wrote:

    Thanks a lot Dave :) I got it.

  6. B Rehart wrote:

    This is my favorite tutorial! It’s over an hour long and I didn’t even notice. I always wondered about neural nets and how they worked. Since it seems they take a lot of iterations to learn stuff, I wonder how to optimize them for game AI agents. AI agents can’t take 2000 iterations to learn not to attack an opponent through the valley, surrounded on both sides in the killing zone. Is there a faster learning algorithm?

  7. Dov wrote:

    This is a nice intro to neural networks, and it’s certainly making my task of getting something going in that area easier. However, you made one serious mistake. The rand function has no specified implementation, so it’s highly nonstandard. On Turbo C way back when, I actually saw an implementation that cycled after 137 numbers! Hopefully no implementation is as bad today, but rand() is not designed for floating point applications. drand48() is the ANSI standard uniformly distributed u[0,1]. Even better, use boost, but if you’re trying to keep things simple I can understand why you wouldn’t.

    The other nitpicky things are not serious, but I don’t see why you abstracted certain things the way you did. Bias seems to me to be intrinsic to the neuron, so it seems to me that having a private member with the bias number is the way to go, not creating an extra bogus Neuron that is always one, and then weighting it. Sure it’s slower and bigger, but it’s not the speed, it’s the lack of clarity that bothers me. Similarly, I don’t see that you gain anything by making a bunch of bogus neurons on the input. The abstraction should be that you link each neuron with the location where its input values are.

    Any way you slice it, you’ve done a real service to anyone getting started in neural networking. I hope these comments serve to make your code even better.


  8. Dov wrote:

    Oh, another small thing I noticed: you pass strings by value unnecessarily, and don’t always declare read-only parameters const:

    void showVectorVals(string label, vector<double> &v)

    should be:

    void showVectorVals(const string &label, const vector<double> &v)

  9. Dave wrote:

    Hi Dov, Thanks very much for the thoughtful feedback; it deserves a thoughtful reply.

    You’re absolutely correct that showVectorVals() could take its string argument by reference to const. That’s a good suggestion.

    The bias input is slightly special, but only because its input is constant when running a trained simulator. The reason the bias neuron is not a private member of the neuron class is so that the backpropagation loop can adjust its weight the same way it adjusts all the other weights. A neuron is just an integrator, and the bias input carries what you could think of as a DC component that goes with the signals that are on the other inputs. The bias is no more or less important than the other input signals that get summed, and its weighting is recalculated during training in the same way as the other neurons. They’re more alike than different, and many loops are simplified by including the bias neuron in the loops. But you can certainly abstract it differently and implement it in several alternative ways, all mathematically equivalent and all quite correct.

    I appreciate your comments about the deficiencies of rand(), but it’s not a problem here. Besides, rand() is our only choice if we’re sticking with standard C++ features. (The drand48() function is not standard in C++98 nor C++11.) In C++11, you could use the new cryptographic-quality random number generators if you needed random bits that would survive intense statistical analysis. But we don’t need that in this program. We’re only using rand() to initialize weights so that the infant neural net begins its walk toward a solution somewhere in the midst of the huge solution space instead of always starting out at the same remote corner of the solution space. We only need randomness like a shotgun. Even the old Turbo C implementation would have sufficed, and the rand() function as implemented in today’s compilers is more than sufficient for our needs.

    Thanks again for the comments!

  10. Steve(UK) wrote:

    Really enjoyed your tutorial, David. It helps in understanding the basics of neural networks, and what a bonus, some C++ coding to play around with. Many, many thanks.

    Best Regards

  11. Irina (CR) wrote:

    This is the first time I have seen one of your tutorials. I have to say, I’ve been really interested in AI for some years now, but I’m still an undergraduate student just seeing where her career is going, which I hope is that area. I was looking for information about neural nets, and I found your video. This not only made me love AI more, it inspired me. Thank you for your awesome tutorial and for sharing your knowledge.

  12. Hindi wrote:

    Thanks a lot for this tutorial. I followed some courses about neural networks and leaving the theory for some practice is really interesting.

  13. CK wrote:

    When calculating gradients in the hidden layers, you calculated the bias’s error signal. Is that right?
    The bias is not fixed?

  14. admin wrote:

    That’s right: during training, the bias weights get adjusted on each backpropagation pass.

  15. Eak wrote:

    When I tried your C++ code for the neural net with Visual Studio 2012 and debugged it, I got "R6010 Error: abort() has been called" and the program stopped.

  16. Dave wrote:

    Hi Eak, check out comment #4 above: the only abort() in the program occurs if the input data file isn’t formatted as shown.

  17. Eak wrote:

    It’s me again… I got the (R6010: abort() has been called) error even when I try to run makeTrainingSamples.cpp

  18. Dave wrote:

    Sorry you’re having troubles with it. You may need to run the neural net with a debugger and set a breakpoint somewhere in lines 41 – 44 and see why the string “topology” is not being found in the line that was read from the file.

  19. Mike wrote:

    Hello Dave,
    first of all, great tutorial! I am trying to run this neural net on a set of data that requires more than 1 output (actually it requires 10 outputs) and has 64 inputs. From what I understand from the tutorial, the topology has to be something like 64 64 10, right? The thing is, when I try to run the program with those numbers, the program crashes (actually only the output causes the crash) with the error: Assertion failed! at line 390, targetVals.size() == topology.back(). This application has requested the Runtime to terminate it in an unusual way!

    Any help appreciated :)

  20. Mike wrote:

    Hello again… A correction to my previous comment: the outputs need to be between 0 and 9, that’s what I meant! Because the targets are between those numbers!

    Thanks again :)

  21. admin wrote:

    Hi Mike, you’re on the right track. That assert at line 390 is triggered if your input data file isn’t formatted exactly as the program expects. First, make sure that the input data filename is correct at line 361. The contents of trainingData.txt should be formatted as shown in comment #4 above. For example, try the following and verify that it works in a simple case; this creates a net with 64 inputs, 10 outputs, and one trivial training sample:

    topology: 64 64 10
    in: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    out: 0 0 0 1 0 0 0 0 0 0

    If that works, then append all your input samples with their actual input and target output values, following that same format. All the input and output values should be between -1 and 1 if you’re using the default tanh transfer function.

    Also, if you have a large number of inputs, you might want to see if the newer program neural2d is easier to work with:

  22. Mike wrote:

    Good Afternoon!

    First of all, thanks for the fast reply! Yes, I have made sure that the trainingData file matches that format! My data set looks something like this:

    in: 0 1 11 13 2 0 0 0 0 8 15 15 6 0 0 0 0 10 9 6 14 0 0 0 0 3 10 4 10 0 0 0 0 0 0 11 7 0 0 0 0 0 2 15 3 0 0 0 0 2 13 16 13 11 2 0 0 1 12 12 12 15 11 0
    out: 2
    in: 0 0 3 10 14 3 0 0 0 8 16 11 10 13 0 0 0 7 14 0 1 15 2 0 0 2 16 9 16 16 1 0 0 0 12 16 15 15 2 0 0 0 12 10 0 8 8 0 0 0 9 12 4 7 12 0 0 0 2 11 16 16 9 0
    out: 8

    the outputs are between 0 and 9. But as you said, I have to change the transfer function and the transfer function derivative in order to train the network to output values between 0 and 9, right? Can I use something like this: return abs(9 * cos(x)); ??

    Thanks a lot again!
    Oh and happy holidays!!

  23. admin wrote:

    Thanks, and Happy Holidays!

    What you do next depends on whether you’re training the net to be a scorer or a classifier. I don’t know which you are doing, so I’ll describe both types.

    First, let’s talk about topology.

    An example of a scorer is where the input values are financial data about a stock and the output is a single floating point number indicating how strongly you should buy that stock, where a low value means “don’t buy this stock” and a high value means “buy immediately”. In that case, you only need one neuron in the output layer and your topology would be specified as 64-64-1 (not 64-64-10).

    On the other hand, an example of a classifier is where the input values are from an image of ten possible things and the net detects which thing it is. In that case, you’ll want ten individual neurons in the output layer, one for each possible thing, and the topology would be specified as 64-64-10. In the topology file, the out: line would have ten numbers. You want to train the net to output a high value on just one output neuron that corresponds to the pattern detected, and train all the other output neurons to be a low value. Then whichever output neuron has the highest value indicates which pattern was detected most strongly.

    Next, let’s talk about the value ranges of inputs and outputs. Rather than change the code to output a different range, just scale the values outside of the neural net: give your net input values between -1 and 1, and train it to output values between -1 and 1. In the training data you provided above, you could divide all the inputs by, say, 100, so that the first input line looks like:

    in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc

    Same for the outputs. If the output is a scorer, then instead of training the net to output 0.0 – 9.0, divide that by ten so that the output values are in the range 0.0 – 0.9. Or if the output is a classifier, then train it to output 1.0 on the neuron corresponding to the class, and -1.0 (or 0.0) on all the other neurons. In the training data you provided above, if the first training sample output means “class 2 of ten possible things,” then the out: line would look like:

    out: -1 -1 1 -1 -1 -1 -1 -1 -1 -1

    Putting that all together, if you’re doing a classifier, then your input data file would start out as:

    topology: 64 64 10
    in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc.
    out: -1 -1 1 -1 -1 -1 -1 -1 -1 -1

    And if it’s a scorer, then your input data file would start out as:

    topology: 64 64 1
    in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc.
    out: 0.2

  24. Mike wrote:

    Ah I see… that clarified things a lot!!

    Thanks for all the effort you’ve been through explaining this to me :) Because the dataset is quite big (around 1700 patterns), I will have a look at the other neural net as you said! But really, that helped me a lot to understand more about how it should be trained :)

    Thanks again and keep up the good work!

  25. Jay wrote:

    Great tutorial! I followed your tutorial and reimplemented it in Python. I also included a function to visualize the ANN to better debug or see the system. It can be found at:

  26. Catalin wrote:


    First of all I want to thank you for your great tutorial about the neural network C++ implementation. :)

    I have started to build my own neural network starting from your code, and I have written similar code in Java.

    However, after some research I have found that your code uses a different method to calculate the neuron gradient for hidden layers:

    void Neuron::calcHiddenGradients(const Layer &nextLayer)
    {
        double dow = sumDOW(nextLayer);
        m_gradient = dow * Neuron::transferFunctionDerivative(m_outputVal);
    }

    In some books I’ve noticed that the gradient for a hidden node is calculated using the formula:

    m_gradient = dow * transferFunctionDerivative(sumIOW());

    where sumIOW() is the sum of inputs * weights. In the resource I looked at, the formula for the hidden layer gradient (equation 10) uses sumIOW, and the entire mathematics of that article is correct.

    If I change your code to use this formula, the neural network does not converge (I’ve used different values for the learning rate and for momentum), which is a little bit strange, because it seems that this is the correct formula for hidden layer gradients. I have tried different implementations of the neural network, but your formula seems to be the best way to calculate hidden layer gradients.

    My question is: why is your method better, and how did you find that this is the correct implementation for hidden gradients? I haven’t seen this method anywhere else.

    Thank you in advance!

  27. Dave wrote:

    Hi Catalin, I find that part of the math confusing too, and I don’t think I could adequately explain the relationship between those approaches, but they do work out the same. You might want to compare with the code in , which uses a similar approach, and compare that with the math in .

  28. Martin wrote:


    thank you for the great tutorial about neural networks in C++. :) I am writing a diploma thesis (physics) on the draught force on a plough, and I find the code very useful for training a neural network on my draught-force data.

    May I use your code to train a neural network with my own training data? Of course I will cite it.

    Thanks, and keep up the good work.

  29. Dave wrote:

    Thanks Kamensek, feel free to use the code, and good luck with the thesis.

  30. Catalin wrote:

    Why is this code not working for a [2 20 1] topology? I use 20 neurons in the hidden layer. The output is always 0.999 for any input values. I think there are some math issues here… This implementation works only for a small topology (max 7 neurons in the hidden layer). In my opinion there is a problem regarding the layer gradient. E.g., for a big error (0.999) the gradient for the output neuron is 0.0018… This gradient should be bigger, to decrease the error.

    Thank you!

  31. Catalin wrote:

    I have fixed the issue regarding the big topology (more than 7 neurons in the hidden layer). If I use the actual transfer function derivative, 1 – tanh * tanh, the neural network works with a big hidden layer. Using the approximation for the transfer function derivative (1 – x*x) does not work for a neural network with 20 hidden neurons, for example. I tested the neural network on the XOR function.

  32. Dave wrote:

    Thanks for reporting that, Catalin. That helps confirm a comment that was made earlier, which I’ll copy below for reference:

    …the weights are starting out way too large, causing many of the neurons to start out saturated, and that can be a handicap for an infant neural net. (During training, if the net wants to saturate some neurons, that’s its business.) To fix that, scale down the initial weight in line 223.

    Second, the approximate tanh derivative function works suboptimally for neurons that are very saturated. For a more robust formula, change the expression in line 201 to the exact formula: 1.0 – tanh(x) * tanh(x).

  33. Catalin wrote:

    In your code you are using the following formula to calculate the neuron gradient: node delta = transferFunctionDerivative(m_output) * sumDOW.

    I have done some research, and I think it is better to use the formula node delta = transferFunctionDerivative(sumIOW) * sumDOW, where sumIOW is the sum of inputs * weights.

    Also, you have to use the exact derivative of the transfer function (not the 1 – x*x approximation). If you use 1 – x*x, the result for the node delta may go to +/- infinity.

    If you make these changes, the neural network minimizes the error faster and you can avoid the situation described above (for big neural networks).

    However, these two approaches are not very different. In your code the node delta formula is an approximation of the standard method, and I think your method is a little bit faster.

    Also, I have translated your code into Java, and I can send you that code. Maybe some Java programmers would be interested in your implementation.

    Thank you.

