New Video Tutorial: Make a Neural Net Simulator in C++

February 4, 2011

Released today! Our newest video programming tutorial, A Neural Net Simulator in C++, is now available for your viewing pleasure at the following link:

[ Update for 2013: Also see the new companion video for visualizations of how neural nets work and how to train them: The Care and Training of Your Backpropagation Neural Net.  ]

Neural Net in C++ Tutorial from David Miller on Vimeo.

If you’re a beginning to intermediate C++ programmer, this tutorial will guide you through the analysis, design, and coding of a command line console program that implements a neural net in C++. You’ll end up with a classic backpropagation model with adjustable gradient descent learning rate and adjustable momentum. You’ll see how to teach your neural net to solve a simple task, and then you can supply your own training data to train your net to do wonderful and amazing things.

Besides showing how a neural net works, we also discuss:

  • C++ class design
  • prototyping with portable C++
  • test early, test often
  • encapsulation, data hiding
  • static class members
  • accessor functions
  • const correctness
  • the assert() macro
  • the vector<> template container, and .size(), .back(), and .push_back() member functions
  • reference variables

This tutorial does not cover or require exception handling, memory management, STL iterators, inheritance, threads, or graphical input or output.

The finished neural net source code is also available for download here (see the video for instructions on what to do with it):


posted in C++, Tutorials by Dave


67 Comments to "New Video Tutorial: Make a Neural Net Simulator in C++"

  1. Ben wrote:


    You have a bug in your code. After you set up the net in the Net() constructor, you do not initialize the bias node.

    This code must be slightly different than the code you showed in your online video, because the one in your online video has this initialization.

    Essentially, you’re missing this line:

    // force the output node’s bias to 1.0

  2. Dave wrote:

    Ben, thank you very much for spotting and reporting that omission. The downloadable source code should match the video now. (For reference, the line we’re talking about is near the end of the video at around 59:45).

    The bias neurons are interesting. As far as I can tell, they don’t have to be initialized to any particular value; they just need to be nonzero. Then the backprop algorithm will change the connection weights of the bias nodes to scale the value to whatever the neurons need in the subsequent layer.

  3. Theo wrote:

    Hi Dave, first of all thanks for this great video tut. It really helped me understand certain things :)

    I downloaded the code for Windows (DOS version), and after building the project (which, by the way, finishes without any errors or warnings), when I start the .exe file I get the message:

    Debug Error!
    abort() has been called

    Can you give me a hand with this issue? Thanks in advance :)

  4. Dave wrote:

    Hi Theo, the only abort() statement in the program is at line 45, so this should be easy. It checks that the input training data begins with a line that contains the string “topology:”. So just be sure that your input training file is formatted the same way it is shown in the video and it should get past that abort(). The sample input file used in the video (at around 1:00:50) begins with these lines:

    topology: 2 4 1
    in: 1.0 0.0
    out: 1.0
    in: 1.0 1.0
    out: 0.0

    As mentioned in the video, that entire class called TrainingData was intended to be just a placeholder, with the expectation that you’ll replace it with something more appropriate for your needs, and so that class is not burdened with a lot of friendly error messages and error recovery. So if you use the TrainingData class as shown without modification, just be careful to supply it with the same format of input data used in the video.
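For anyone replacing the TrainingData class, the header check it performs amounts to something like this sketch (simplified, with illustrative names; it is not the tutorial’s exact code):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse the "topology: 2 4 1" header line; trip an assert if the
// stream doesn't start with the expected label, mirroring the abort()
// the tutorial's TrainingData constructor triggers on bad input.
std::vector<unsigned> parseTopology(std::istream &in)
{
    std::string line, label;
    std::getline(in, line);
    std::stringstream ss(line);
    ss >> label;
    assert(label == "topology:");  // malformed header -> abort()

    std::vector<unsigned> topology;
    unsigned n;
    while (ss >> n) {
        topology.push_back(n);
    }
    return topology;
}
```

Feeding it a first line of topology: 2 4 1 yields the vector {2, 4, 1}; any other first token trips the assert, which is the abort() Theo saw.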

  5. Theo wrote:

    Thanks a lot, Dave :) I got it.

  6. B Rehart wrote:

    This is my favorite tutorial! It’s over an hour long and I didn’t even notice. I always wondered about neural nets and how they worked. Since it seems they take a lot of iterations to learn stuff, I wonder how to optimize them for game AI agents. AI agents can’t take 2000 iterations to learn not to attack an opponent through the valley, surrounded on both sides in the killing zone. Is there a faster learning algorithm?

  7. Dov wrote:

    This is a nice intro to neural networks, and it’s certainly making my task of getting something going in that area easier. However, you made one serious mistake. The rand() function has no specified implementation, so it’s highly nonstandard. On Turbo C way back when, I actually saw an implementation that cycled after 137 numbers! Hopefully no implementation is as bad today, but rand() is not designed for floating point applications. drand48() is the ANSI standard uniformly distributed u[0,1]. Even better, use Boost, but if you’re trying to keep things simple I can understand why you wouldn’t.

    The other nitpicky things are not serious, but I don’t see why you abstracted certain things the way you did. Bias seems to me to be intrinsic to the neuron, so it seems to me that having a private member with the bias number is the way to go, not creating an extra bogus Neuron that is always one, and then weighting it. Sure it’s slower and bigger, but it’s not the speed, it’s the lack of clarity that bothers me. Similarly, I don’t see that you gain anything by making a bunch of bogus neurons on the input. The abstraction should be that you link each neuron with the location where its input values are.

    Any way you slice it, you’ve done a real service to anyone getting started in neural networking. I hope these comments serve to make your code even better.


  8. Dov wrote:

    Oh, another small thing I noticed: you pass strings by value unnecessarily, and don’t always make read-only parameters const:

    void showVectorVals(string label, vector<double> &v)

    should be:

    void showVectorVals(const string &label, const vector<double> &v)

  9. Dave wrote:

    Hi Dov, Thanks very much for the thoughtful feedback; it deserves a thoughtful reply.

    You’re absolutely correct that showVectorVals() could take its string argument by reference to const. That’s a good suggestion.

    The bias input is slightly special, but only because its input is constant when running a trained simulator. The reason the bias neuron is not a private member of the neuron class is so that the backpropagation loop can adjust its weight the same way it adjusts all the other weights. A neuron is just an integrator, and the bias input carries what you could think of as a DC component that goes with the signals that are on the other inputs. The bias is no more or less important than the other input signals that get summed, and its weighting is recalculated during training in the same way as the other neurons. They’re more alike than different, and many loops are simplified by including the bias neuron in the loops. But you can certainly abstract it differently and implement it in several alternative ways, all mathematically equivalent and all quite correct.
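To illustrate the point (a minimal sketch with made-up names, not the tutorial’s classes): because the bias is just one more neuron of the previous layer with a constant output, the feed-forward sum needs no special case for it.

```cpp
#include <cmath>
#include <vector>

struct Neuron {
    double outputVal;  // this neuron's output
    double weight;     // its connection weight to one next-layer neuron
};

// Feed-forward sum for a single neuron: the bias is simply the last
// neuron of the previous layer with a constant output of 1.0, so one
// loop handles signal inputs and bias alike.
double feedForward(const std::vector<Neuron> &prevLayer)
{
    double sum = 0.0;
    for (const Neuron &n : prevLayer) {
        sum += n.outputVal * n.weight;  // bias contributes 1.0 * its weight
    }
    return std::tanh(sum);              // the tutorial's transfer function
}
```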

    I appreciate your comments about the deficiencies of rand(), but it’s not a problem here. Besides, that’s our only choice if we’re sticking with standard C++ features. (The drand48() function is not standard in C++98 nor C++11.) In C++11, you could use the new cryptographic-quality random number generators if you needed random bits that would survive intense statistical analysis. But we don’t need that in this program. We’re only using rand() to initialize weights so that the infant neural net begins its walk toward a solution somewhere in the midst of the huge solution space instead of always starting out at the same remote corner of the solution space. We only need randomness like a shotgun. Even the old Turbo C implementation would have sufficed. And the rand() function as implemented in today’s compilers is way more than sufficient for our needs.
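For reference, the kind of weight initialization being discussed boils down to something like this (a sketch; see the video for the exact code):

```cpp
#include <cstdlib>

// A random initial connection weight in [0, 1]. Plain rand() is fine
// here: we only need a scattered starting point in the solution
// space, not statistical-quality randomness.
double randomWeight()
{
    return rand() / double(RAND_MAX);
}
```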

    Thanks again for the comments!

  10. Steve(UK) wrote:

    Really enjoyed your tutorial, David. It helps in understanding the basics of neural networks, and what a bonus, some C++ code to play around with. Many, many thanks.

    Best Regards

  11. Irina (CR) wrote:

    This is the first time I’ve seen one of your tutorials. I have to say, I’ve been really interested in AI for some years now, but I’m still an undergraduate student just seeing where her career is going, which I hope is that area. I was looking for information about neural nets, and I found your video. This not only made me love AI more, but inspired me. Thank you for your awesome tutorial and for sharing your knowledge.

  12. Hindi wrote:

    Thanks a lot for this tutorial. I followed some courses about neural networks and leaving the theory for some practice is really interesting.

  13. CK wrote:

    When calculating gradients in hidden layers, you calculated the bias’s signal error. Is that true?
    Is the bias not fixed?

  14. admin wrote:

    That’s right: during training, the biases get adjusted during each backpropagation pass.

  15. Eak wrote:

    When I tried your C++ code for the neural net with Visual Studio 2012, I got “R6010 Error: abort() has been called” when I debugged it, and the program stops.

  16. Dave wrote:

    Hi Eak, check out comment #4 above: the only abort() in the program occurs if the input data file isn’t formatted as shown.

  17. Eak wrote:

    It’s me again… I get the (R6010: abort() has been called) error even when I try to run makeTrainingSamples.cpp

  18. Dave wrote:

    Sorry you’re having troubles with it. You may need to run the neural net with a debugger and set a breakpoint somewhere in lines 41 – 44 and see why the string “topology” is not being found in the line that was read from the file.

  19. Mike wrote:

    Hello Dave,
    first of all, great tutorial! I am trying to run this neural net on a set of data that requires more than 1 output (actually it requires 10 outputs) and has 64 inputs. From what I understand from the tutorial, the topology has to be something like 64 64 10, right? The thing is that when I try to run the program with those numbers, the program crashes (actually only the output crashes the program) with the error: Assertion failed! at line 390, targetVals.size() == topology.back(). This application has requested the Runtime to terminate it in an unusual way!

    Any help appreciated :)

  20. Mike wrote:

    Hello again.. Correction to my previous comment! The outputs need to be between 0 and 9, that’s what I meant! Because the targets are between those numbers!

    Thanks again :)

  21. admin wrote:

    Hi Mike, you’re on the right track. That assert at line 390 is triggered if your input data file isn’t formatted exactly as the program expects. First, make sure that the input data filename is correct at line 361. The contents of trainingData.txt should be formatted as shown in Comment #4 above. For example, try the following and verify that it works in a simple case — this creates a net with 64 inputs, 10 outputs, and it contains one trivial training sample:

    topology: 64 64 10
    in: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    out: 0 0 0 1 0 0 0 0 0 0

    If that works, then append all your input samples with their actual input and target output values, following that same format. All the input and output values should be between -1 and 1 if you’re using the default tanh transfer function.

    Also, if you have a large number of inputs, you might want to see if the newer program neural2d is easier to work with.

  22. Mike wrote:

    Good Afternoon!

    First of all, thanks for the fast reply! Yes, I have made sure that the trainingData file matches that format! My data set looks something like this:

    in: 0 1 11 13 2 0 0 0 0 8 15 15 6 0 0 0 0 10 9 6 14 0 0 0 0 3 10 4 10 0 0 0 0 0 0 11 7 0 0 0 0 0 2 15 3 0 0 0 0 2 13 16 13 11 2 0 0 1 12 12 12 15 11 0
    out: 2
    in: 0 0 3 10 14 3 0 0 0 8 16 11 10 13 0 0 0 7 14 0 1 15 2 0 0 2 16 9 16 16 1 0 0 0 12 16 15 15 2 0 0 0 12 10 0 8 8 0 0 0 9 12 4 7 12 0 0 0 2 11 16 16 9 0
    out: 8

    the outputs are between 0 and 9. But as you said, I have to change the transfer function and the transfer function derivative in order to get the network to train to values between 0 and 9, right? Can I use something like this: return abs(9 * cos(x)); ??

    Thanks a lot again!
    Oh and happy holidays!!

  23. admin wrote:

    Thanks, and Happy Holidays!

    What you do next depends on whether you’re training the net to be a scorer or a classifier. I don’t know which you are doing, so I’ll describe both types.

    First, let’s talk about topology.

    An example of a scorer is where the input values are financial data about a stock and the output is a single floating point number indicating how strongly you should buy that stock, where a low value means “don’t buy this stock” and a high value means “buy immediately”. In that case, you only need one neuron in the output layer and your topology would be specified as 64-64-1 (not 64-64-10).

    On the other hand, an example of a classifier is where the input values are from an image of ten possible things and the net detects which thing it is. In that case, you’ll want ten individual neurons in the output layer, one for each possible thing, and the topology would be specified as 64-64-10. In the topology file, the out: line would have ten numbers. You want to train the net to output a high value on just one output neuron that corresponds to the pattern detected, and train all the other output neurons to be a low value. Then whichever output neuron has the highest value indicates which pattern was detected most strongly.

    Next, let’s talk about the value ranges of inputs and outputs. Rather than change the code to output a different range, just scale the values outside of the neural net — give your net input values between -1 and 1, and train it to output values between -1 and 1. In the training data you provided above, you could divide all the inputs by, say, 100, so that the first input line looks like:

    in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc

    Same for the outputs. If the output is a scorer, then instead of training the net to output 0.0 – 9.0, divide that by ten so that the output values are in the range 0.0 – 0.9. Or if the output is a classifier, then train it to output 1.0 on the neuron corresponding to the class, and -1.0 (or 0.0) on all the other neurons. In the training data you provided above, if the first training sample output means “class 2 of ten possible things,” then the out: line would look like:

    out: -1 -1 1 -1 -1 -1 -1 -1 -1 -1

    Putting that all together, if you’re doing a classifier, then your input data file would start out as:

    topology: 64 64 10
    in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc.
    out: -1 -1 1 -1 -1 -1 -1 -1 -1 -1

    And if it’s a scorer, then your input data file would start out as:

    topology: 64 64 1
    in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc.
    out: 0.2
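The scaling described above happens outside the net, and can be sketched like this (illustrative helper names; the divisor 16.0 assumes the raw pixel values run 0 to 16 as in the data shown, and the target encoding is for the classifier case):

```cpp
#include <vector>

// Scale raw pixel values (0..16 in this dataset) into [0, 1] for the
// net's inputs.
std::vector<double> scaleInputs(const std::vector<int> &raw)
{
    std::vector<double> scaled;
    for (int v : raw) {
        scaled.push_back(v / 16.0);
    }
    return scaled;
}

// Encode a class label 0..9 as ten target outputs: +1.0 on the
// matching output neuron, -1.0 on all the others.
std::vector<double> encodeTarget(int label, int numClasses = 10)
{
    std::vector<double> target(numClasses, -1.0);
    target[label] = 1.0;
    return target;
}
```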

  24. Mike wrote:

    Ah I see… that clarified things a lot!!

    Thanks for all the effort you’ve gone through explaining this to me :) Because the dataset is quite big (around 1700 patterns), I will have a look at the other neural net as you said! But really, that helped me a lot to understand more about how it should be trained :)

    Thanks again and keep up the good work!

  25. Jay wrote:

    Great tutorial! I followed your tutorial and reimplemented it in Python. I also included a function to visualize the ANN to better debug or see the system. It can be found at:

  26. Catalin wrote:


    First of all I want to thank you for your great tutorial about neural network – c++ implementation. :)

    I have started to build my own neural network starting from your code, and I have written a similar implementation in Java.

    However, after some research I have found that your code uses a different method to calculate the neuron gradient for hidden layers:

    void Neuron::calcHiddenGradients(const Layer &nextLayer)
    {
        double dow = sumDOW(nextLayer);
        m_gradient = dow * Neuron::transferFunctionDerivative(m_outputVal);
    }

    In some books I’ve noticed that the gradient for a hidden node is calculated using the formula:
    m_gradient = dow * transferFunctionDerivative(sumIOW());
    where sumIOW() is the sum of inputs * weights.
    If you take a look at this resource, the formula for the hidden layer gradient (equation 10) uses sumIOW, and the entire mathematics of that article is correct.
    If I change your code to use this formula, the neural network does not converge (I’ve used different values for the learning rate and for momentum), which is a little bit strange, because it seems that this is the correct formula for hidden layer gradients.
    I have tried different implementations of neural networks, but your formula seems to be the best way to calculate hidden layer gradients.
    My question is: why is your method better, and how did you find that this is the correct implementation for hidden gradients? I didn’t see this method in other places.
    Thank you in advance!

  27. Dave wrote:

    Hi Catalin, I find that part of the math confusing too, and I don’t think I could adequately explain the relationship between those approaches, but they do work out the same. You might want to compare with the code in, which uses a similar approach, and compare that with the math in.
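One way to convince yourself numerically that the two approaches agree (a standalone check, not code from the tutorial): since the stored output is tanh of the weighted input sum, applying 1 - x*x to the stored output is exactly the tanh derivative evaluated at the input sum.

```cpp
#include <cmath>

// The tutorial stores m_outputVal = tanh(sumIOW) and applies
// f'(x) = 1 - x*x to the stored output...
double derivViaOutput(double sumIOW)
{
    double outputVal = std::tanh(sumIOW);  // what the net stores as m_outputVal
    return 1.0 - outputVal * outputVal;
}

// ...which equals the textbook form: the exact tanh derivative
// 1 - tanh(sumIOW)^2, evaluated at the input sum.
double derivViaInput(double sumIOW)
{
    return 1.0 - std::tanh(sumIOW) * std::tanh(sumIOW);
}
```

So the two formulations are the same quantity written in terms of different variables; they only disagree if the derivative formula is applied to the wrong argument.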

  28. Martin wrote:


    thank you for the great tutorial about neural networks in C++. :) I am writing a diploma thesis (physics) on the draught force on a plough, and I find the code very useful for training a neural network on my draught-force data.

    May I use your code to train a neural network with my training data? Of course I will cite it.

    Thanks and keep up good work.

  29. Dave wrote:

    Thanks Kamensek, feel free to use the code, and good luck with the thesis.

  30. Catalin wrote:

    Why is this code not working for a [2 20 1] topology? I use 20 neurons in the hidden layer. The output is always 0.999 for any input values. I think there are some math issues here… This implementation works only for a small topology (max 7 neurons in the hidden layer). In my opinion there is a problem regarding the layer gradient. E.g., for a big error (0.999) the gradient for the output neuron is 0.0018… This gradient should be bigger to decrease the error.

    Thank you!

  31. Catalin wrote:

    I have fixed the issue regarding the big topology (more than 7 neurons in the hidden layer). If I use the transfer function derivative 1 – tanh(x) * tanh(x) (the actual derivative), the neural network works for a big hidden layer. Using the approximation for the transfer function derivative (1 – x*x) does not work for a neural network with 20 hidden neurons, for example. I have tested the neural network on the XOR function.

  32. Dave wrote:

    Thanks for reporting that, Catalin. That helps confirm a comment that was made at, which I’ll copy below for reference:

    …the weights are starting out way too large, causing many of the neurons to start out saturated, and that can be a handicap for an infant neural net. (During training, if the net wants to saturate some neurons, that’s its business.) To fix that, scale down the initial weight in line 223.

    Second, the approximate tanh derivative function works suboptimally for neurons that are very saturated. For a more robust formula, change the expression in line 201 to the exact formula: 1.0 – tanh(x) * tanh(x).

  33. Catalin wrote:

    In your code you are using the following formula to calculate the neuron gradient: node delta = transferFunctionDerivative(m_outputVal) * sumDOW.

    I have done some research and I think it is better to use the formula: node delta = transferFunctionDerivative(sumIOW) * sumDOW,
    where sumIOW is the sum of inputs * weights.
    You can take a look at

    Also, you have to use the exact derivative for the transfer function (not the 1 – x*x approximation). If you use 1 – x*x, the result for the node delta may go to +/- infinity.

    If you make these changes, the neural network minimizes the error faster and you can avoid the situation described above (for big neural networks).

    However, these two approaches are not very different. In your code the node delta formula is an approximation of the standard method, and I think your method is a little bit faster.

    Also, I have translated your code into Java, and I can send you this code. Maybe some Java programmers are interested in your implementation.

    Thank you.

  34. Tom wrote:


    Excellent tutorial, thanks for sharing.

    I ported your code to VB.NET (yes, indeed :$) and implemented the XOR test. Sometimes it trains well; most of the time, however, it doesn’t. Is there a possibility you can help me?

  35. Dave wrote:

    Hi Tom, Thanks for the feedback, and congrats on doing that port. That symptom sounds like the program is overly sensitive to the random starting weights. The two modifications mentioned in comment #32 above might fix that. If you’ve already made those changes, then I would be tempted to run the VB version and the C++ version side-by-side with some debugging enabled and see which calculations are different. Let us know what you find out.

  36. Tom wrote:

    Hi Dave,

    It works :)

    I did what you suggested, but at first didn’t see a problem when debugging and comparing with your C++ code. By doing so, though, I focused on the random weight generator code.

    The problem went away when breakpointing the random number generator. Then I wrote some helper functions to print out the weights. When running, they were all the same. This was because I generated a random number in the connection’s constructor.

    The random generator in VB (the one I used) initializes with a seed, and for the seed I used the system time. So of course, when running, it was always initialized with the same millisecond seed (but not when debugging).

    So I’ve moved the initialization of the random generator two levels up, and all works great!!

    Thanks for the help, and if you’d like, I will send you the VB-code… Thanks.

  37. Eduardo Nascimento wrote:

    Hello, Dave, thanks for doing those lessons. I have some questions to ask you, and I would appreciate it if you gave me your e-mail.

  38. Fabian wrote:

    Hello, Thanks for your tutorial,

    I don’t know, but the formula that Catalin is citing is the same one that you use in your code (just with g(z) factored out); maybe the author updated the article.

    Also, the way you are using the function transferFunctionDerivative, you already get the exact derivative. Replacing the implementation with 1 – tanh(x)*tanh(x) is wrong, because you are passing m_outputVal to this function, and m_outputVal = tanh(sum(wi*xi)), so the derivative is 1 – tanh(sum(wi*xi))^2 = 1 – m_outputVal^2.

  39. daniel wrote:

    Great tutorial! I have a question: which parameter tells the program to stop training?

  40. Jakob Thomsen wrote:

    It seems the variable m_recentAverageError is not initialized (at least in the downloadable source code),
    which can result in a wrong average error being reported. Adding
    m_recentAverageError = 0;
    to the Net constructor fixes this.

    By the way: nice program & great tutorial!

  41. Tomm wrote:

    I have a little project I am doing with a Raspberry Pi (never dreamed of this kind of computer power in such a small package). I can compile with gcc in Linux on the Pi, or go to the version in Python posted above. Before I take off down that trail, which path do you suggest will produce fewer headaches for me?

    You asked for us to post what we were trying to do with the code… I will try to find your email address and forward that to you.

    Thanks for sharing your talents!

  42. Abdullah wrote:

    Is there any way to save this trained neural network for future use?
    It’s expensive to train every time new data arrives.

  43. Dave wrote:

    Hi Jakob, thanks for the feedback and suggestion. You can also initialize m_recentAverageError to a value that is more like what an untrained net is expected to yield. Then you can compare that value against some threshold to determine when to stop training.

  44. Dave wrote:

    @daniel, you can insert a line in the program at any convenient place to test the net’s error measurement and stop training when the error falls below some threshold. You can use the variable m_recentAverageError for that if you initialize it to some nonzero value.
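A sketch of what such a stopping test could look like (illustrative names; the smoothing factor and initial value are assumptions, not the tutorial’s exact numbers):

```cpp
// A running "recent average" error in the spirit of the tutorial's
// m_recentAverageError, plus the kind of stop test daniel asked about.
class ErrorTracker {
public:
    // Start at a plausible untrained-net error, not zero, so the
    // threshold test doesn't fire before any training has happened.
    explicit ErrorTracker(double initial) : m_recentAverageError(initial) {}

    // Fold one training pass's error into the running average.
    void record(double passError)
    {
        m_recentAverageError =
            (m_recentAverageError * kSmoothingFactor + passError)
            / (kSmoothingFactor + 1.0);
    }

    // Stop training once the smoothed error drops below a threshold.
    bool doneTraining(double threshold) const
    {
        return m_recentAverageError < threshold;
    }

private:
    static constexpr double kSmoothingFactor = 100.0;  // assumed value
    double m_recentAverageError;
};
```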

  45. Dave wrote:

    Hi Abdullah, that’s a matter of saving all the weights somewhere. This simple demo program does not have code for that, but you can add that function pretty easily. For example, you could insert a test in the program somewhere to determine when to stop training, then at that point, loop through all the weights in all the neurons in all the layers and write the weights to a file or to a database or however you want to store the weights. Then you can write a complementary routine that reads the weights and initializes the net when you want to run the trained net. Also feel free to check out neural2d, a very similar neural net but with more features, including saving weights to a file.
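A minimal sketch of that save/restore idea (illustrative types and names, assuming the weights have been gathered into one vector per layer):

```cpp
#include <iostream>
#include <sstream>
#include <vector>

typedef std::vector<std::vector<double> > WeightMatrix;  // [layer][weight]

// Write every connection weight to a text stream; a file stream
// (std::ofstream) works the same way.
void saveWeights(std::ostream &out, const WeightMatrix &weights)
{
    for (const std::vector<double> &layer : weights) {
        for (double w : layer) {
            out << w << " ";
        }
        out << "\n";
    }
}

// Read the weights back in the same order; `shape` supplies the
// layer dimensions so we know how many values to expect.
WeightMatrix loadWeights(std::istream &in, const WeightMatrix &shape)
{
    WeightMatrix weights = shape;  // copy to get the same dimensions
    for (std::vector<double> &layer : weights) {
        for (double &w : layer) {
            in >> w;
        }
    }
    return weights;
}
```

For real use you would raise the stream precision (e.g. out.precision(17)) so the weights survive the round trip exactly, and store the topology alongside them.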

  46. Abdullah wrote:

    Thanks a lot, Dave, for such a quick response!
    Your solution really helped me solve my long-standing problem.
    Cheers :D
    And I will surely take a look at neural2d.

  47. erfan wrote:

    Hi, thanks for your great tutorial.
    Sorry, but I’m new to this field. I want to implement online training, so is it possible to use this code for online learning?
    Thank you

  48. Dave wrote:

    Hi erfan, I’m sorry, I’m not sure what you are asking. You can use the neural net source code any way you like. Be sure to check out the video that goes with it, and the very helpful comments on this page contributed by many individuals.

  49. erfan wrote:

    Hi, thanks for your quick response.
    I accidentally skipped what you said about the training data in the code; now I get it. Great job, thanks again.

  50. Muhammad Ghazanfar Ali wrote:

    It was an awesome tutorial. I used to hate watching tutorials, and this one has put love in my heart for learning from this kind of source. May ALLAH reward you for this.
    Well, I was able to understand the code while you were explaining =P and I was coding it myself, but I can’t work out how the data gets trained, because I don’t understand the TrainingData code. I went through it, but in vain. Also, can you please explain how I can use the breast cancer dataset from the UCI repository, which is in .txt format, with this code? Please help me asap with both queries. Thank you!

  51. adhu wrote:

    I am getting output as below:
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!

    Pass 1
    Program ended with exit code: 0

    Can you let me know why?

  52. Paulo wrote:

    Hi Dave, I used your network to train a classifier using Haar-like features. I fed those features to the input layer and then ran backpropagation with a target of 0 for a negative image and 1 for a positive image. The result is that the network has the same output for both negative and positive images, so the network is not actually trained. What does that mean? Thanks for the tutorial.

  53. Nikos wrote:

    Hello Dave,

    Thanks for this great tutorial. I knew how to create and train NNs in Matlab, but in C++ I had no idea!

    I used your code and I got some pretty nice results!

    I have one question though!

    Now that I have trained my net, how can I see the trained weights, so as to use them to make a prediction with new values?

    Kind Regards,

  54. john wrote:

    Great video

  55. Max Lumnar wrote:

    I absolutely love your tutorial, very well done and easy to follow.

    I have a question. After so many years, do you think this same code could be used to train a nn for image classification? Or would that require different formulas?

    Thank you for writing this article!

  56. john wrote:

    @Jay, I tried installing your library on Windows with Python 2.7 and 3.2, but it reports an error that it could not be installed. What should I do to install it?

  57. ahmed wrote:

    I am new to NNs. I am trying to build a prediction model for my PhD project, and I want to know how to save and run the NN against new input to predict new output after training it. I would appreciate it very much if you could provide a source…

  58. Dave wrote:

    Hi Ahmed, see Comment #45 above about that. This little demonstration program does not have a way to save the trained weights for use with new inputs, so you would have to modify the program to do that. Or take a look at for a similar neural net which has that capability already built in. Good luck with the Ph.D. project.

  59. john wrote:

    Thanks for your program. In the case where one has large sets of inputs and targets, what is the recommended approach? Your program did not simulate well, especially on weather data.

  60. Vitaly wrote:

    Greatest NN tutorial by far! The only complaints are the bias neurons (storing *input* weights plus a bias inside each neuron would result in a much cleaner design: no more “size()-1” mess all over the code, and no more m_myIndex), and the other one is the TrainingData thing: generating random 0/1 pairs and XORing them on the fly requires 2-3 lines of code, while TrainingData is a big chunk of non-portable code totally irrelevant to NN/C++. Well, you got punished for that last one already.
    But the whole tutorial is brilliant, thanks a lot, Dave! You rock!

  61. Alfie wrote:

    Hey Dave, I was challenged to make a neural network with backprop, and your tutorial was very helpful. However, I know no C++, so I had to translate what you wrote into Python, and I’m encountering an error. I get an index-out-of-range error whenever I call updateInputWeights(), when referencing the outputWeights using the neuron’s index as the index into the connection container. Would you know a simple fix for that that wouldn’t affect how the backprop works? (Sorry if my terminology is bad; I’m 16 and still learning computing as a whole, and this is probably the most complex thing I’ve ever programmed :P)

  62. Alfie wrote:

    Hello, I fixed my issue. However, I’m not sure what I should initialize the deltaWeight to. If I initialize it as 0.0, then the weight never changes, and anything that’s a non-zero value doesn’t do anything either. It doesn’t seem to learn to drift toward what the target data expects.

  63. Dave wrote:

    Congrats @Alfie on your Python progress.

    I’m afraid I can’t help much with the Python debugging, but I can comment on how the deltaWeight member of the neuron class works. It’s ok to let all the deltaWeights initialize to zero. That will cause no extra momentum to be added to the weights at the first back propagation. If you initialize the deltaWeights to nonzero, the very first call to back propagation will add momentum adjustments to the neurons’ weights that are not quite right, but I would expect that should smooth itself out after several training passes.

    During your debugging, it might be easier to set the alpha parameter to zero. That will effectively disable the use of deltaWeight while you get the rest of the neural net working. With alpha set to zero and momentum disabled, you should still see each neuron’s weight member get updated after every call to back propagation. After the neural net is debugged and working, you can set alpha to nonzero and debug the momentum part to get a little better training performance.

    Even though you’re not into C++ yet, this project would be easier if you could compile the C++ version of the program. Then you can run the C++ and Python versions side by side, set breakpoints or trace the programs line by line, and compare the weight calculations. As soon as the two versions diverge in their calculations, you can focus on that area and find out why. Otherwise, you’ll need to thoroughly understand the mathematics that’s going on in the neural net program and then figure out how to get the Python version to do the right arithmetic.

  64. Cheron wrote:

    Hey Dave, once I ran the program, it triggered a breakpoint at:

    Could you please explain the reason for that?
    Thank you in advance!

  65. Dave wrote:

    Hi @Cheron, I wonder if the filename you’re passing into that function is valid. Check in the main() function where the filename is specified for the training data, and verify that it’s a valid path on your computer.

  66. Phyllis Sutherland wrote:

    Hi David, this is an awesome tutorial. It helped me immensely. I have a question about how to add cross-entropy to your code as a cost function for binary classification. I asked it here if you have a chance to offer your insight: Much much much appreciated!!! :)

  67. Phyllis Sutherland wrote:

    Finally found the answer. In Neuron::calcOutputGradients(double targetVal), set m_gradient = targetVal – m_outputVal; if you want to use cross-entropy. That is the only change that needs to be made. Surprisingly simple! See here:
