New Video Tutorial: Make a Neural Net Simulator in C++

Released today! Our newest video programming tutorial, A Neural Net Simulator in C++, is now available for your viewing pleasure at the following link:

[ Update for 2013: Also see the new companion video for visualizations of how neural nets work and how to train them: The Care and Training of Your Backpropagation Neural Net.  ]

Neural Net in C++ Tutorial from David Miller on Vimeo.

If you’re a beginning to intermediate C++ programmer, this tutorial will guide you through the analysis, design, and coding of a command line console program that implements a neural net in C++. You’ll end up with a classic backpropagation model with an adjustable gradient-descent learning rate and adjustable momentum. You’ll see how to teach your neural net to solve a simple task, then you can supply your own training data to train your net to do wonderful and amazing things.

Besides showing how a neural net works, we also discuss:

  • C++ class design
  • prototyping with portable C++
  • test early, test often
  • encapsulation, data hiding
  • static class members
  • accessor functions
  • const correctness
  • the assert() macro
  • the vector<> template container, and .size(), .back(), and .push_back() member functions
  • reference variables

This tutorial does not cover or require exception handling, memory management, STL iterators, inheritance, threads, or graphical input or output.

The finished neural net source code is also available for download here (see the video for instructions on what to do with it):

 


Comments

131 responses to “New Video Tutorial: Make a Neural Net Simulator in C++”

  1. Hello,

    You have a bug in your code. After you set up the net in the Net() constructor, you do not initialize the bias node.

    This code must be slightly different than the code you showed in your online video, because the one in your online video has this initialization.

    Essentially, you’re missing this line:

    // force the output node’s bias to 1.0
    m_layers.back().back().setOutputVal(1.0);

    1. Ben, thank you very much for spotting and reporting that omission. The downloadable source code should match the video now. (For reference, the line we’re talking about is near the end of the video at around 59:45).

      The bias neurons are interesting. As far as I can tell, they don’t have to be initialized to any particular value; they just need to be nonzero. Then the backprop algorithm will change the connection weights of the bias nodes to scale the value to whatever the neurons need in the subsequent layer.

  2. Hi Dave, first of all thanks for this great video tut. It really helped me to understand certain things :)

    I downloaded the code for Windows (DOS version), and after building the project (which, by the way, finishes without any errors or warnings), when I start the .exe file I get the message:

    Debug Error!
    R6010
    - abort() has been called

    Can you give me a hand with this issue? Thanks in advance :)

    1. Hi Theo, the only abort() statement in the program is at line 45, so this should be easy. It checks that the input training data begins with a line that contains the string “topology:”. So just be sure that your input training file is formatted the same way it is shown in the video and it should get past that abort(). The sample input file used in the video (at around 1:00:50) begins with these lines:


      topology: 2 4 1
      in: 1.0 0.0
      out: 1.0
      in: 1.0 1.0
      out: 0.0

      As mentioned in the video, that entire class called TrainingData was intended to be just a placeholder, with the expectation that you’ll replace it with something more appropriate for your needs, and so that class is not burdened with a lot of friendly error messages and error recovery. So if you use the TrainingData class as shown without modification, just be careful to supply it with the same format of input data used in the video.
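
      For readers hitting this, here is roughly what that check looks like (a sketch reconstructed from the video, using <sstream>; the exact member names may differ slightly in your copy of the source):

      void TrainingData::getTopology(vector<unsigned> &topology)
      {
          string line;
          string label;

          getline(m_trainingDataFile, line);   // first line of trainingData.txt
          stringstream ss(line);
          ss >> label;
          if (this->isEof() || label.compare("topology:") != 0) {
              abort();   // the first line must begin with "topology:"
          }

          while (!ss.eof()) {
              unsigned n;
              ss >> n;
              topology.push_back(n);   // e.g. 2 4 1
          }
      }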

  3. Thanks a lot Dave :) I got it.

  4. B Rehart

    This is my favorite tutorial! It’s over an hour long and I didn’t even notice. I always wondered about neural nets and how they worked. Since it seems they take a lot of iterations to learn stuff, I wonder how to optimize them for game AI agents. AI agents can’t take 2000 iterations to learn not to attack an opponent through the valley, surrounded on both sides in the killing zone. Is there a faster learning algorithm?

  5. This is a nice intro to neural networks, and it’s certainly making my task of getting something going in that area easier. However, you made one serious mistake. The rand function has no specified implementation, so it’s highly nonstandard. On Turbo C way back when, I actually saw an implementation that cycled after 137 numbers! Hopefully no implementation is as bad today, but rand() is not designed for floating point applications. drand48() is the ANSI standard uniformly distributed U[0,1] generator. Even better, use Boost, but if you’re trying to keep things simple I can understand why you wouldn’t.

    The other nitpicky things are not serious, but I don’t see why you abstracted certain things the way you did. Bias seems to me to be intrinsic to the neuron, so it seems to me that having a private member with the bias number is the way to go, not creating an extra bogus Neuron that is always one, and then weighting it. Sure it’s slower and bigger, but it’s not the speed, it’s the lack of clarity that bothers me. Similarly, I don’t see that you gain anything by making a bunch of bogus neurons on the input. The abstraction should be that you link each neuron with the location where its input values are.

    Any way you slice it, you’ve done a real service to anyone getting started in neural networking. I hope these comments serve to make your code even better.

    Dov

  6. Oh, another small thing I noticed: you pass strings by value unnecessarily, and don’t always make read-only parameters const:

    void showVectorVals(string label, vector<double> &v)

    should be:

    void showVectorVals(const string& label, const vector<double> &v)

    1. Hi Dov, Thanks very much for the thoughtful feedback; it deserves a thoughtful reply.

      You’re absolutely correct that showVectorVals() could take its string argument by reference to const. That’s a good suggestion.

      The bias input is slightly special, but only because its input is constant when running a trained simulator. The reason the bias neuron is not a private member of the neuron class is so that the backpropagation loop can adjust its weight the same way it adjusts all the other weights. A neuron is just an integrator, and the bias input carries what you could think of as a DC component that goes with the signals that are on the other inputs. The bias is no more or less important than the other input signals that get summed, and its weighting is recalculated during training in the same way as the other neurons. They’re more alike than different, and many loops are simplified by including the bias neuron in the loops. But you can certainly abstract it differently and implement it in several alternative ways, all mathematically equivalent and all quite correct.

      I appreciate your comments about the deficiencies of rand(), but it’s not a problem here. Besides, that’s our only choice if we’re sticking with standard C++ features. (The drand48() function is not standard in C++98 or C++11.) In C++11, you could use the new cryptographic-quality random number generators if you needed random bits that would survive intense statistical analysis. But we don’t need that in this program. We’re only using rand() to initialize weights so that the infant neural net begins its walk toward a solution somewhere in the midst of the huge solution space instead of always starting out at the same remote corner of the solution space. We only need randomness like a shotgun. Even the old Turbo C implementation would have sufficed. And as you can see in http://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use, the rand() function as implemented in today’s compilers is way more than sufficient for our needs.
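
      (For the curious, here is a minimal C++11 sketch of a higher-quality alternative, in case your application someday needs one; the names here are my own:)

      #include <random>

      std::mt19937 engine(std::random_device{}());
      std::uniform_real_distribution<double> dist(0.0, 1.0);

      double randomWeight(void) { return dist(engine); }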

      Thanks again for the comments!

  7. Steve(UK)

    Really enjoyed your tutorial, David. It helps in understanding the basics of neural networks, and what a bonus, some C++ code to play around with. Many, many thanks.

    Best Regards
    Steve

  8. Irina (CR)

    This is the first time I have seen one of your tutorials. I have to say, I’ve been really interested in AI for some years now, but I’m still an undergraduate student just seeing where her career is going, which I hope is that area. I was looking for information about neural nets, and I found your video. This not only made me love AI more, but inspired me. Thank you for your awesome tutorial and for sharing your knowledge.

  9. Hindi

    Thanks a lot for this tutorial. I followed some courses about neural networks, and leaving the theory for some practice is really interesting.

  10. When calculating gradients in hidden layers, you calculated the bias’s error signal. Is that right?
    Is the bias not fixed?
    Thanks

    1. That’s right, during training, the biases get adjusted during each backpropagation pass.

  11. Greetings;
    when I tried your written C++ code for the neural net with Visual Studio 2012 and debugged it, I got the R6010 error (abort() has been called), then the program stops.
    Thanks

    1. Hi Eak, check out comment #4 above (http://www.millermattson.com/dave/?p=54#comment-75 ) — the only abort() in the program occurs if the input data file isn’t formatted as shown.

  12. Greetings;
    It’s me again… I get the R6010 (abort() has been called) even when I try to run makeTrainingSamples.cpp

    1. Sorry you’re having troubles with it. You may need to run the neural net with a debugger and set a breakpoint somewhere in lines 41 – 44 and see why the string “topology” is not being found in the line that was read from the file.

  13. Hello Dave,
    first of all, great tutorial! I am trying to run this neural net on a set of data that requires more than 1 output (actually it requires 10 outputs) and has 64 inputs. From what I understand from the tutorial, the topology has to be something like 64 64 10, right? The thing is that when I try to run the program with those numbers, the program crashes (actually it’s only the output that crashes it) with the error: Assertion failed! at line 390, targetVals.size() == topology.back(). This application has requested the Runtime to terminate it in an unusual way!

    Any help appreciated 🙂

  14. Hello again.. Correction to my previous comment! The output needs to be between 0-9, that’s what I meant! Because the targets are between those numbers!

    Thanks again 🙂

    1. Hi Mike, You’re on the right track. That assert at line 390 is triggered if your input data file isn’t formatted exactly as the program expects. First, make sure that the input data filename is correct at line 361. The contents of trainingData.txt should be formatted as shown in Comment #4 above. For example, try the following and verify that it works in a simple case — this creates a net with 64 inputs, 10 outputs, and it contains one trivial training sample:

      topology: 64 64 10
      in: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      out: 0 0 0 1 0 0 0 0 0 0

      If that works, then append all your input samples with their actual input and target output values, following that same format. All the input and output values should be between -1 and 1 if you’re using the default tanh transfer function.

      Also, if you have a large number of inputs, you might want to see if the newer program neural2d is easier to work with: http://neural2d.net.

  15. Good Afternoon!

    First of all, thanks for the fast reply! Yes, I have made sure that the trainingData file matches that format! My data set looks something like this:

    in: 0 1 11 13 2 0 0 0 0 8 15 15 6 0 0 0 0 10 9 6 14 0 0 0 0 3 10 4 10 0 0 0 0 0 0 11 7 0 0 0 0 0 2 15 3 0 0 0 0 2 13 16 13 11 2 0 0 1 12 12 12 15 11 0
    out: 2
    in: 0 0 3 10 14 3 0 0 0 8 16 11 10 13 0 0 0 7 14 0 1 15 2 0 0 2 16 9 16 16 1 0 0 0 12 16 15 15 2 0 0 0 12 10 0 8 8 0 0 0 9 12 4 7 12 0 0 0 2 11 16 16 9 0
    out: 8

    the outputs are between 0 and 9. But as you said, I have to change the transfer function and the transfer function derivative in order to get the network to train to values between 0 and 9, right? Can I use something like this: return abs(9 * cos(x)); ?

    Thanks a lot again!
    Oh and happy holidays!!

    1. Thanks, and Happy Holidays!

      What you do next depends on whether you’re training the net to be a scorer or a classifier. I don’t know which you are doing, so I’ll describe both types.

      First, let’s talk about topology.

      An example of a scorer is where the input values are financial data about a stock and the output is a single floating point number indicating how strongly you should buy that stock, where a low value means “don’t buy this stock” and a high value means “buy immediately”. In that case, you only need one neuron in the output layer and your topology would be specified as 64-64-1 (not 64-64-10).

      On the other hand, an example of a classifier is where the input values are from an image of ten possible things and the net detects which thing it is. In that case, you’ll want ten individual neurons in the output layer, one for each possible thing, and the topology would be specified as 64-64-10. In the topology file, the out: line would have ten numbers. You want to train the net to output a high value on just one output neuron that corresponds to the pattern detected, and train all the other output neurons to be a low value. Then whichever output neuron has the highest value indicates which pattern was detected most strongly.

      Next, let’s talk about the value ranges of input and outputs. Rather than change the code to output a different range, just scale the values outside of the neural net — give your net input values between -1 and 1, and train it to output values between -1 and 1. In the training data you provided above, you could divide all the inputs by, say 100, so that the first input line looks like:

      in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc

      Same for the outputs. If the output is a scorer, then instead of training the net to output 0.0 – 9.0, divide that by ten so that the output values are in the range 0.0 – 0.9. Or if the output is a classifier, then train it to output 1.0 on the neuron corresponding to the class, and -1.0 (or 0.0) on all the other neurons. In the training data you provided above, if the first training sample output means “class 2 of ten possible things,” then the out: line would look like:

      out: -1 -1 1 -1 -1 -1 -1 -1 -1 -1

      Putting that all together, if you’re doing a classifier, then your input data file would start out as:

      topology: 64 64 10
      in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc.
      out: -1 -1 1 -1 -1 -1 -1 -1 -1 -1

      And if it’s a scorer, then your input data file would start out as:

      topology: 64 64 1
      in: 0.0 0.01 0.11 0.13 0.02 0.0 0.0 0.0 0.0 0.08 0.15 , etc.
      out: 0.2
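
      One more note: when running a trained classifier, you pick the class by finding the output neuron with the largest value. A minimal sketch (this helper is not part of the tutorial code):

      #include <algorithm>
      #include <vector>

      // After calling getResults(resultVals) on a 10-output classifier,
      // the index of the largest output is the detected class:
      int predictedClass(const std::vector<double> &resultVals)
      {
          return std::max_element(resultVals.begin(), resultVals.end())
                 - resultVals.begin();
      }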

  16. Ah I see… that clarified things a lot!!

    Thanks for all the effort you went through explaining this to me 🙂 Because the dataset is quite big (around 1700 patterns), I will have a look at the other neural net as you said! But really, that helped me a lot to understand how it should be trained 🙂

    Thanks again and keep up the good work!

  17. Great tutorial! I followed your tutorial and reimplemented it in Python. I also included a function to visualize the ANN to better debug or see the system. It can be found at:

    https://github.com/jayluan/jANN

  18. Catalin

    Hello,

    First of all I want to thank you for your great tutorial about neural networks and their C++ implementation. 🙂

    I have started to build my own neural network starting from your code, and I have written similar code in Java.

    However, after some research I have found that your code uses a different method to calculate the neuron gradient for hidden layers:

    void Neuron::calcHiddenGradients(const Layer &nextLayer)
    {
        double dow = sumDOW(nextLayer);
        m_gradient = dow * Neuron::transferFunctionDerivative(m_outputVal);
    }
    In some books I’ve noticed that the gradient for a hidden node is calculated using the formula:
    m_gradient = dow * transferFunctionDerivative(sumIOW());
    where sumIOW() is the sum of inputs * weights.
    If you take a look at this resource: https://theclevermachine.wordpress.com/2014/09/06/derivation-error-backpropagation-gradient-descent-for-neural-networks/, the formula for the hidden layer gradient (equation 10) uses sumIOW, and the entire mathematics in that article is correct.
    If I change your code to use this formula, the neural network does not converge (I’ve tried different values for the learning rate and momentum), which is a little bit strange, because it seems that this is the correct formula for hidden layer gradients.
    I have tried different neural network implementations, but your formula seems to be the best way to calculate hidden layer gradients.
    My question is: why is your method better, and how did you find that this is the correct implementation for hidden gradients? I didn’t see this method anywhere else.
    Thank you in advance!

    1. Hi Catalin, I find that part of the math confusing too, and I don’t think I could adequately explain the relationship between those approaches, but they do work out the same. You might want to compare with the code in http://www.cs.bham.ac.uk/~jxb/NN/nn.html which uses a similar approach and compare that with the math in http://en.wikipedia.org/wiki/Backpropagation#Finding_the_derivative_of_the_error .

  19. Hello,

    thank you for the great tutorial about neural networks in C++. 🙂 I am writing a diploma thesis (physics) on the draught force on a plough, and I find the code very useful for training a neural network on my draught force data.

    May I use your code to train a neural network with my training data? Of course I will cite it.

    Thanks, and keep up the good work.

    1. Thanks Kamensek, feel free to use the code, and good luck with the thesis.

  20. Catalin

    Why is this code not working for a [2 20 1] topology? I use 20 neurons in the hidden layer. The output is always 0.999 for any input values. I think there are some math issues here… This implementation works only for small topologies (max 7 neurons in the hidden layer). In my opinion there is a problem regarding the layer gradient. E.g., for a big error (0.999) the gradient for the output neuron is 0.0018… This gradient should be bigger to decrease the error.

    Thank you!

  21. Catalin

    I have fixed the issue regarding big topologies (more than 7 neurons in the hidden layer). If I use the transfer function derivative 1 - tanh(x) * tanh(x) (the actual derivative), the neural network works with a big hidden layer. Using the approximation for the transfer function derivative (1 - x*x) does not work for a neural network with 20 hidden neurons, for example. I have tested the neural network on the XOR function.

    1. Thanks for reporting that, Catalin. That helps confirm a comment that was made at https://vimeo.com/19569529, which I’ll copy below for reference:

      …the weights are starting out way too large, causing many of the neurons to start out saturated, and that can be a handicap for an infant neural net. (During training, if the net wants to saturate some neurons, that’s its business.) To fix that, scale down the initial weight in line 223.

      Second, the approximate tanh derivative function works suboptimally for neurons that are very saturated. For a more robust formula, change the expression in line 201 to the exact formula: 1.0 - tanh(x) * tanh(x).

  22. Catalin

    In your code you are using the following formula (node delta = transferFunctionDerivative(m_output) * sumDOW) to calculate the neuron gradient.

    I have done some research, and I think it is better to use the following formula (node delta = transferFunctionDerivative(sumIOW) * sumDOW) to calculate the neuron gradient,
    where sumIOW is the sum of inputs * weights.
    You can take a look at https://theclevermachine.wordpress.com/2014/09/06/derivation-error-backpropagation-gradient-descent-for-neural-networks/

    Also, you have to use the exact derivative of the transfer function (not the 1-x*x approximation). If you use 1-x*x, the result for the node delta may go to +/- infinity.

    If you make these changes, the neural network is going to minimize the error faster and you can avoid the situation described above (for big neural networks).

    However, these two approaches are not very different. In your code the node delta formula is an approximation of the standard method, and I think your method is a little bit faster.

    Also, I have translated your code into Java, and I can send you this code. Maybe some Java programmers are interested in your implementation.

    Thank you.

  23. Hi,

    Excellent tutorial, thanks for sharing.

    I ported your code to VB.NET (yes, indeed :$) and implemented the XOR test. Sometimes it trains well; most of the time, however, it doesn’t. Is there a possibility you can help me?

    1. Hi Tom, Thanks for the feedback, and congrats on doing that port. That symptom sounds like the program is overly sensitive to the random starting weights. The two modifications mentioned in comment #32 above might fix that. If you’ve already made those changes, then I would be tempted to run the VB version and the C++ version side-by-side with some debugging enabled and see which calculations are different. Let us know what you find out.

  24. Hi Dave,

    It works 🙂

    I did what you suggested, but at first didn’t see a problem when debugging and comparing with your C++ code. In doing so, though, I focused on the random weight generator code.

    The problem went away when breakpointing the random number generator. Then I wrote some helper functions to print out the weights. When running, they were all the same. This was because I generated a random number in the connection’s constructor.

    The random generator in VB however (the one I used) initializes with a seed. And for this seed I used the system time. And, of course, when running, it was always initialized with the same millisecond seed (not when debugging).

    So, I’ve put the initialization of the random generator two levels up, and all works great!!

    Thanks for the help, and if you’d like, I will send you the VB-code… Thanks.

  25. Eduardo Nascimento

    Hello, Dave, thanks for doing those lessons. I have some questions to ask you, and I would appreciate it if you gave me your e-mail.

  26. Hello, Thanks for your tutorial,

    I don’t know, but the formula that Catalin is citing in
    http://www.millermattson.com/dave/?p=54#comment-3492
    is the same one that you use in your code (but with g(z) factored out); maybe the author updated the article.

    Also, the way you are using the function transferFunctionDerivative, you already get the “exact” derivative.
    Replacing the implementation with 1-tanh(x)*tanh(x) is wrong, because you are passing m_outputVal to this function, and
    m_outputVal = tanh(v) where v = sum(wi*xi), so tanh(v)' = 1-tanh(v)^2 = 1-m_outputVal^2.
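
    To make the distinction concrete, both of these compute the exact d/dv tanh(v); they just expect different arguments (hypothetical function names):

    #include <cmath>

    double derivFromOutput(double outputVal)   // pass m_outputVal, which is tanh(v)
    {
        return 1.0 - outputVal * outputVal;    // exact: 1 - tanh(v)^2
    }

    double derivFromInput(double v)            // pass the raw summed input v
    {
        return 1.0 - tanh(v) * tanh(v);        // exact when given v itself
    }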

  27. Great tutorial. I have a question: which parameter tells the program when to stop training?

    1. @daniel, you can insert a line in the program at any convenient place to test the net’s error measurement and stop training when the error falls below some threshold. You can use the variable m_recentAverageError for that if you initialize it to some nonzero value.
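
      For example, here is a sketch of the pattern (self-contained stand-in; in the real program you would read the error through an accessor after each backProp() call, and the threshold value is your choice):

      #include <iostream>
      using namespace std;

      int main()
      {
          double recentAverageError = 1.0;     // stand-in for m_recentAverageError
          const double errorThreshold = 0.01;  // assumption: tune for your data

          for (int pass = 1; pass <= 100000; ++pass) {
              // ... feedForward() and backProp() would go here ...
              recentAverageError *= 0.999;     // dummy decay so the sketch terminates

              if (recentAverageError < errorThreshold) {
                  cout << "Stopping after pass " << pass << endl;
                  break;
              }
          }
      }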

  28. Jakob Thomsen Avatar
    Jakob Thomsen

    It seems the variable m_recentAverageError is not initialized (at least in the downloadable source code),
    which can result in a wrong average error being reported.
    Inserting
    m_recentAverageError = 0;
    in the Net-constructor fixes this.

    By the way: nice program & great tutorial!

    1. Hi Jakob, thanks for the feedback and suggestion. You can also initialize m_recentAverageError to a value that is more like what an untrained net is expected to yield. Then you can compare that value against some threshold to determine when to stop training.

  29. Hi,
    I have a little project I am doing with a Raspberry Pi (never dreamed of this kind of computer power in such a small package). I can compile with gcc in Linux on the Pi, or go to the Python version posted above. Before I take off on that trail, which path do you suggest will produce fewer headaches for me?

    You asked for us to post what we were trying to do with the code… I will try to find your email address and forward that to you.

    Thanks for sharing your talents!
    tomm

  30. Is there any way to save this trained neural network for future use?
    It’s expensive to retrain every time new data arrives.

    1. Hi Abdullah, that’s a matter of saving all the weights somewhere. This simple demo program does not have code for that, but you can add that function pretty easily. For example, you could insert a test in the program somewhere to determine when to stop training, then at that point, loop through all the weights in all the neurons in all the layers and write the weights to a file or to a database or however you want to store the weights. Then you can write a complementary routine that reads the weights and initializes the net when you want to run the trained net. Also feel free to check out neural2d, a very similar neural net but with more features, including saving weights to a file.
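
      A minimal sketch of that idea (these helpers are my own, not in the tutorial; they assume you add accessors exposing the weights as [layer][neuron][connection]):

      #include <fstream>
      #include <vector>
      using namespace std;

      typedef vector<vector<vector<double> > > WeightCube;

      // One weight per line, in a fixed traversal order:
      void saveWeights(const WeightCube &w, const char *filename)
      {
          ofstream out(filename);
          for (size_t lay = 0; lay < w.size(); ++lay)
              for (size_t n = 0; n < w[lay].size(); ++n)
                  for (size_t c = 0; c < w[lay][n].size(); ++c)
                      out << w[lay][n][c] << "\n";
      }

      // Read them back in the same order (assumes the same topology as when saved):
      void loadWeights(WeightCube &w, const char *filename)
      {
          ifstream in(filename);
          for (size_t lay = 0; lay < w.size(); ++lay)
              for (size_t n = 0; n < w[lay].size(); ++n)
                  for (size_t c = 0; c < w[lay][n].size(); ++c)
                      in >> w[lay][n][c];
      }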

  31. Thanks a lot Dave!!! for such a quick response..
    your solution really helped me solve my long-standing problem..
    cheers :D..
    and I will surely take a look at neural2d

  32. Hi, thanks for your great tutorial.
    Sorry, but I’m new to this field. I want to implement online training, so is it possible to run this code for online learning?
    Thank you

    1. Hi erfan, I’m sorry, I’m not sure what you are asking. You can use the neural net source code any way you like. Be sure to check out the video that goes with it, and the very helpful comments on this page contributed by many individuals.

  33. Hi, thanks for your quick response.
    I accidentally skipped what you said about the training data in the code. Now I get it. Great job, thanks again.

  34. Muhammad Ghazanfar Ali

    It was an awesome tutorial. I used to hate watching tutorials, and this tutorial has put love in my heart for learning from these kinds of sources. May ALLAH reward you for this.
    Well, I was able to understand the code while you were explaining =P and I was coding it myself, but I can’t grasp how the data gets trained, as I don’t understand the TrainingData code. I went through it, but in vain. Also, can you please explain how I can use the breast cancer dataset from the UCI repository, which is in .txt format, with this code? Please help me ASAP with both queries. Thank you!

  35. I am getting an output as below:
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!
    Made a Neuron!

    Pass 1
    Done
    Program ended with exit code: 0

    Can you let me know why?

  36. Hi Dave, I used your network to train a classifier using Haar-like features. I fed those features into the input layer and then ran backpropagation with a target of 0 for a negative image and 1 for a positive image. The result is that the network gives the same output for both negative and positive images, so the network is not actually being trained. What does that mean? Thanks for the tutorial.

  37. Hello Dave,

    Thanks for this great tutorial. I knew how to create and train NNs in Matlab, but in C++ I had no idea!

    I used your code and I got some pretty nice results!

    I have one question though!

    Now that I have trained my data, how can I see the trained weights, so I can use them to make predictions with new values?

    Kind Regards,
    Nikos

  38. Great video

  39. I absolutely love your tutorial, very well done and easy to follow.

    I have a question. After so many years, do you think this same code could be used to train a NN for image classification? Or would that require different formulas?

    Thank you for writing this article!

  40. @Jay, I tried installing your library on Windows with Python 2.7 and 3.2, but it reports an error that it could not be installed. What should I do to install it?

  41. I am new to NNs. I am trying to build a prediction model for my PhD project, and I want to know how to save the network and run it against new input to predict new output after training it. I would appreciate it very much if you could provide a source…

    1. Hi Ahmed, see Comment #45 above about that. This little demonstration program does not have a way to save the trained weights for use with new inputs, so you would have to modify the program to do that. Or take a look at https://github.com/davidrmiller/neural2d for a similar neural net which has that capability already built in. Good luck with the Ph.D. project.

  42. Thanks for your program. In the case where one has large sets of inputs and targets, what is the approach? Your program did not simulate well, especially on weather data.

  43. Greatest NN tutorial by far! The only complaints are the bias neurons (storing *input* weights + bias inside a neuron would result in a much cleaner design: no more “size()-1” mess all over the code, and no more m_myIndex), and the other one is the TrainingData thing: generating random 0/1 pairs and XORing them on the fly requires 2-3 lines of code, while TrainingData is a big chunk of non-portable code totally irrelevant to NN/C++. Well, you got punished for that last one already.
    But the whole tutorial is brilliant, thanks a lot, Dave! You rock!

  44. Hey Dave, I was challenged to make a neural network with back prop and your tutorial was very helpful. However, I know no C++ so I had to translate what you wrote into Python to try and make it, and I’m encountering an error. I get an index out of range error whenever I call updateInputWeights() when referencing the outputWeights using the neuron’s index as the index for the position of the connection container. Would you know a simple fix to that without it affecting how the back prop works? (Sorry if my terminology is bad, I’m 16 and still learning computing as a whole and this is probably the most complex thing I’ve programmed ever :P)

  45. Hello, I fixed my issue. However, I’m not sure what I should be initializing the deltaWeight to. If I initialize it to 0.0, the weight never changes, and any non-zero value doesn’t do anything either. It doesn’t seem to be learning to drift towards what the target data expects.

    1. Congrats @Alfie on your Python progress.

      I’m afraid I can’t help much with the Python debugging, but I can comment on how the deltaWeight member of the neuron class works. It’s ok to let all the deltaWeights initialize to zero. That will cause no extra momentum to be added to the weights at the first back propagation. If you initialize the deltaWeights to nonzero, the very first call to back propagation will add momentum adjustments to the neurons’ weights that are not quite right, but I would expect that should smooth itself out after several training passes.

      During your debugging, it might be easier to set the alpha parameter to zero. That will effectively disable the use of deltaWeight while you get the rest of the neural net working. With alpha set to zero and momentum disabled, you should still see each neuron’s weight member get updated after every call to back propagation. After the neural net is debugged and working, you can set alpha to nonzero and debug the momentum part to get a little better training performance.

      Even though you’re not into C++ yet, this project would be easier if you could compile the C++ version of the program. Then you can run the C++ and Python versions side by side, set breakpoints or trace the programs line by line, and compare the weight calculations. As soon as the two versions diverge in their calculations, then you can focus on that area and find out why. Otherwise, you’ll need to thoroughly understand the mathematics that’s going on the neural net program and then figure out how to get the Python version to do the right arithmetic.
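
      For reference, the weight update in Neuron::updateInputWeights() goes roughly like this (reconstructed from the video; check your copy of the source for the exact lines):

      double oldDeltaWeight = neuron.m_outputWeights[m_myIndex].deltaWeight;
      double newDeltaWeight =
              eta * neuron.getOutputVal() * m_gradient  // learning-rate term
              + alpha * oldDeltaWeight;                 // momentum term
      neuron.m_outputWeights[m_myIndex].deltaWeight = newDeltaWeight;
      neuron.m_outputWeights[m_myIndex].weight += newDeltaWeight;

      With alpha set to zero, the momentum term vanishes, which is also why zero-initialized deltaWeights are harmless: the very first pass simply adds no momentum.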

  46. Hey Dave, when I ran the program, it triggered a breakpoint at:
    m_trainingDataFile.open(filename.c_str());

    Could you please explain the reason for that?
    Thank you in advance!

    1. Hi @Cheron, I wonder if the filename you’re passing into that function is valid. Check in the main() function where the filename is specified for the training data, and verify that it’s a valid path on your computer.

  47. Phyllis Sutherland

    Hi David, this is an awesome tutorial. It helped me immensely. I have a question about how to add cross-entropy to your code as a cost function for binary classification. I asked it here if you have a chance to offer your insight: http://stackoverflow.com/questions/41705055/cross-entropy-applied-to-backpropagation-in-neural-network Much much much appreciated!!! 🙂

  48. Phyllis Sutherland

    Finally found the answer. In Neuron::calcOutputGradients(double targetVal), set m_gradient = targetVal - m_outputVal; if you want to use cross-entropy. That is the only change that needs to be made. Surprisingly simple! See here: https://visualstudiomagazine.com/articles/2014/04/01/neural-network-cross-entropy-error.aspx
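
    In code, the change looks like this (a sketch using the tutorial’s member names, per the article linked above):

    void Neuron::calcOutputGradients(double targetVal)
    {
        // Cross-entropy cost: drop the transferFunctionDerivative() factor
        // that the mean-squared-error version multiplies in here.
        m_gradient = targetVal - m_outputVal;
    }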

  49. Sir, please upload the trainingData.txt file, as we are unable to solve the abort() problem. Kindly publish the trainingData.txt file.

    1. Hi @umesh, I no longer have that original training file, but you can just copy the format from Comment #4 above.

  50. Hi sir, actually I’m working on an air quality prediction project, and I have input data of more than 2000 rows containing different gases. I am unable to transform it into the 2 4 1 topology. Kindly suggest some links or posts, or give some material on how to form the training data set.

    1. Hi @umesh, I’m not sure what you’re asking about transforming something into a 2-4-1 topology, and I’m not sure if the 2000 number refers to the number of test cases (input samples) you have, or if that’s the number of floating point components in each input sample. You can define your neural net to have however many inputs and outputs you need to match the problem you’re solving. If each test case has 2000 components, you can define a neural net topology with 2000 input neurons. The same applies to the output layer — if you want to train the net to give you N numbers for each test case, then define a topology with N output neurons. You may also find this video helpful: http://vimeo.com/technotes/neural-net-care-and-training .

  51. Hi Dave, thanks for your tutorial. It’s my first time programming in C++, I’m really not good at it, and my project is about MLPs. I don’t know what I should change in this code to display the initial weights and the updated weights during execution.

    1. Hi darine, I’m sorry I don’t understand your question about weights, so here are some comments about weights in general, and feel free to ask your question again in different words. The initial weights are assigned by member function randomWeight() in class Neuron. You can experiment with different kinds of initialization by changing that function. Also see Comment #32 above about that. After your net is successfully trained on a data set, you would typically want to save all the weights in a file or in a database so that you can reconstruct the same net again. This program does not have a way to save the weights, so you would need to add a function to do that in a way that fits your particular needs. Also see Comment #45 about that. If you don’t want to do a lot of C++ coding, you may want to explore other neural net programs that have more features already built in. For example, check out the neural net at neural2d, a very similar neural net in C++ but with more features built in, including a function for saving weights to a file.

  52. Hi Dave, thanks very much. I’m sorry, I’m not good at English either 🙁 but I hope that you understand me. When I build the code, it should show the averageError and the updated weights in every iteration; but when I build this code, it shows just the averageError in every iteration. So what change do I make, or what instruction must I add to this code?

  53. Hi Dave, please, I want some help 🙁 When I debugged this code, only the averageError and the target input/output were displayed, but I also want the updated weights displayed. How do I change this in the code?

  54. Hi darine, the connection weights are updated in function Neuron::updateInputWeights(). Each time that function is called, it will loop through all the neurons in the layer to the left. One connection weight is updated inside each loop iteration. The amount of change to the weight is calculated in the variable newDeltaWeight, and the actual change in weight is made in line 162:

    neuron.m_outputWeights[m_myIndex].weight += newDeltaWeight;

    You could insert a cout statement near line 162 to display the value of newDeltaWeight in each iteration. That will display a lot of data — one number for each connection. You may also want to display additional information to make the data more readable. For example, you could also display the layer number at line 305.

    (For reference for other readers, the weight update function is described in the video starting at 55:00.)

  55. Hi Dave,
    This is by far the best NN tutorial out there! You are an excellent teacher! As a hello-world learning project, I have ported the code to Qlang, a proprietary language and dev environment tailored for professional finance. There I repeat your XOR example with nice results and also add some financial examples. You have been credited in my YouTube video both at the start and end of the short session, as credits are due. Hope you like it. https://youtu.be/fJ-9oqQRmKY Thanks again!

  56. Hi David, thank you very much. I inserted this statement near line 162:
    cout << "newDeltaWeight: " << Neuron.getOutputWeight() << endl; and I declared in class Neuron a public vector<Connection> getOutputWeight() const { return m_outputWeights; }, but I get an error in this declaration,
    and I have trouble creating the right statement.
    Can you tell me how it should be declared to display the updated weight? I really need your help, please tell me. Thanks.

    1. Hi darine, near line 162, to see the change in weight, display the variable newDeltaWeight. Try something like cout << newDeltaWeight << endl; Unfortunately, this blog is not the best place to learn basic C++ debugging. You may get faster and better answers from a blog made for learning C++, such as stackoverflow.com. Also, when you ask a question there, it will help if you copy and paste the exact error message you received from the compiler. Good luck!

  58. Thanks very much Dave, you are a very good person 🙂

  59. Hello, I am new to programming, and due to my interest in neural networks I watched your complete video on YouTube; it is really nice. For deeper understanding I want to run the source code given on your website, but I am facing some problems running the code. As I do not have any programming background, can you please let me know how I can run the code?
    To run the code I installed CLion and the MinGW compiler, but I am still not able to run the code.

    ERROR
    C:\Users\Mohit\CLionProjects\untitled3\cmake-build-debug\untitled3.exe

    This application has requested the Runtime to terminate it in an unusual way.
    Please contact the application’s support team for more information.

    1. Hi Mac, Welcome to the world of programming. I admire your eagerness to learn AI coding. However, I’m sorry I’m not familiar with either of those compilers, and I’m also not very familiar with Windows, so I can’t help much with getting those tools set up. Maybe other readers can jump in here and help.

      I would suggest finding a tutorial on how to compile and run a simple hello-world console program in C++ using your specific compiler. After you accomplish that, you can use the same technique to compile the neural net source code.

      The C++ language is not the easiest language to learn for a beginner. If you want to learn C++, I would suggest you get connected with several online support communities for C++ such as stackoverflow.com and cplusplus.com. For compiler-specific questions, you can search for support forums for your specific compilers.

      You may also want to search for online tutorials in AI, many of which use the Python programming language, an excellent language for learning AI concepts and programming at the same time.

  60. Thank you Dave. I really appreciate the reply. I have been trying to fix this since morning; maybe I will break my head over it for some more time. Only one last thing: is there anything in the given source code that I need to modify?
    Or, if I am able to run a Hello World program, will I be able to run the source code you provided without modifying it?
    I am just asking as I am new to programming, so I don’t have much knowledge to modify code.
    Thank you.

    1. You may need to change the path to the training data file in line 361 to something appropriate for your Windows environment. Also see Comment #4 above about the training data file. Other than that, it should compile and run with a standard-compliant C++ compiler.

  61. sanjay

    Hello, really commendable work, but I have one question:
    why am I unable to execute this code? It is the same code that you ran in your video tutorial after executing the program for the first time (at 22:20 in the video).

    #include <iostream>
    #include <vector>

    using namespace std;

    class Neuron {};

    typedef vector<Neuron> Layer;

    class Net
    {
    public:
        Net(const vector<unsigned> &topology);
        void feedForward(const vector<double> &inputVals) {};
        void backProp(const vector<double> &targetVals) {};
        void getResults(vector<double> &resultVals) const {};

    private:
        vector<Layer> m_layers; // m_layers[layerNum][neuronNum]
    };

    Net::Net(const vector<unsigned> &topology)
    {
        unsigned numLayers = topology.size();
        for (unsigned layerNum = 0; layerNum < numLayers; ++layerNum)
        {
            m_layers.push_back(Layer());

            // add a bias neuron in each layer
            for (unsigned neuronNum = 0; neuronNum <= topology[layerNum]; ++neuronNum) {
                m_layers.back().push_back(Neuron()); // .back() gives the last element in the container
                cout << "Made a Neuron!" << endl;
            }
        }
    }

    int main()
    {
        vector<unsigned> topology;
        topology.push_back(3);
        topology.push_back(2);
        topology.push_back(1);
        Net myNet(topology);

        vector<double> inputVals;
        myNet.feedForward(inputvals);

        vector<double> targetVals;
        myNet.backProp(targetvals);

        vector<double> resultVals;
        myNet.getResults(Resultvals);
    }

    ERROR: in function int main(),
    ‘inputvals’ was not declared in this scope

    1. Hi sanjay, it looks like you spelled it inputVals in one place and inputvals in another — C++ is case-sensitive. Also check the spelling of targetVals and resultVals.

  62. Sahin Isik

    Hi,
    Thanks for your great efforts to explain the concepts of NN with a simple example.

    I need a bit of your help :) I have modified your code in C#, but something goes wrong: my NN gives inconsistent outputs for making decisions. As an example, I have a [2,4,1] topology with 2000 input samples having 2 features each, and 2000 corresponding outputs. But the results are not consistent. If you have enough time, I would like to share the code with you so you can correct it.

    1. Hi Sahin, I’m not familiar with C#, but I can try to answer questions about how the program works. If you’re getting different results each time you run the program on the same training set, you might have a data set that is sensitive to the starting weights. You could temporarily change the function randomWeight() in line 131 to set the initial weights to some small deterministic value. Then each run should give the same results. Also, if you could find a way to compile the C++ version, then you could run it in parallel with the C# version and compare the results at different breakpoints to see if the C# version works the same.
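
      For example, a temporary debugging stand-in (my own suggestion, not the shipped code):

      // deterministic initial weights, for debugging only:
      static double randomWeight(void) { return 0.05; }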

  63. sahin isik

    Thank you so much 🙂

  64. Hi,

    I tried your code with the XOR example, doing a NN with topology {2, 4, 1}, but it always ends with result “1” for all combinations, giving error = 1 for 0 ^ 0 and 1 ^ 1, and minimal error (3.18e-314) for 0 ^ 1 and 1 ^ 0.

    In some successive tests I made, I found that at iteration 1780 all training passes give “-NaN” as a result.

    Thanks for your attention.

    1. Hi Leo, sorry I don’t know what would cause those results, but we can try to figure it out. Which compiler and version are you using, and on which OS? How are you invoking the compiler? Have you made any changes to the source code? Can you show us the first few lines of the training data file?

  65. Hi,

    Re-reading the file, I found I had missed a line from your code, where you statically set the recent average smoothing factor. But it still gives the same results, except that the recent average error is around 0.5.

    I am using Cygwin’s g++ compiler version 5.4.0, on Windows 8.1 64-bit.

    I invoke the compiler with g++ -Wall -o <output> <source>

    My training data file has:
    1.0 1.0 0.0
    1.0 1.0 0.0
    0.0 0.0 0.0
    0.0 1.0 1.0
    0.0 1.0 1.0
    0.0 0.0 0.0
    1.0 0.0 1.0
    1.0 0.0 1.0
    1.0 0.0 1.0
    1.0 1.0 0.0

    The only change I made was to not use your TrainingData class.

    1. Hi Leo, as far as I know, that compilation environment should work fine. Assuming the unmodified source code compiles and runs correctly for you, then perhaps the modifications you made were not an exact functional replacement for class TrainingData. I’m glad that you replaced that class, but the new code needs to interface with Net::feedForward() and Net::backProp() the same way as before or you’ll get different results. You might be able to find the point of difference by comparing the unmodified and modified versions of the source code, using the same input data, with a breakpoint set in Net::feedForward() or Net::backProp().

  66. Hi Dave, thanks a lot for those fantastic tutorials you put up. First of all, for those people who want to copy-paste some code to generate a training file:

    #include <cstdlib>
    #include <fstream>
    #include <string>
    #include <vector>

    using namespace std;

    class genTrainingData
    {
    public:
        genTrainingData();
        void calcData();

    private:
        vector<string> pos;
    };

    genTrainingData::genTrainingData()
    {
        pos.push_back("in: 1.0 0.0\nout: 1.0\n");
        pos.push_back("in: 0.0 1.0\nout: 1.0\n");
        pos.push_back("in: 1.0 1.0\nout: 0.0\n");
        pos.push_back("in: 0.0 0.0\nout: 0.0\n");
        calcData();
    }

    void genTrainingData::calcData()
    {
        ofstream ost("tmp/trainingData.txt");

        ost << "topology: 2 4 1" << endl;

        for (int i = 0; i < 2000; ++i) {
            int randNum = rand() % 4;
            ost << pos[randNum];
        }
    }

    int main()
    {
        genTrainingData gen; // the constructor writes the file
        return 0;
    }

    (Just change the number in the for loop in genTrainingData::calcData() to determine the size, currently at 2000)

    For some reason I could never get a solution like you did in your video, a clear:

    Pass 2000: Inputs: 1 1
    Outputs: 0
    Targets: 0
    Net recent average error: 0.0175172

    I get something like this towards the end:

    Pass 1998: Inputs: 1 1
    Outputs: -0.00287236
    Targets: 0
    Net recent average error: 0.0178216

    Pass 1999: Inputs: 0 0
    Outputs: 0.0013849
    Targets: 0
    Net recent average error: 0.0176589

    Pass 2000: Inputs: 1 1
    Outputs: 0.00334959
    Targets: 0
    Net recent average error: 0.0175172

    Pass 2001
    Done

    Can you tell me why that is?
    Thank you, best regards,
    Phil

  67. Dear David,

    thank you very much for your NN video tutorial. I am sure it helps a lot of people to dive into the exciting field of NNs!

    In comment #14 (http://www.millermattson.com/dave/?p=54#comment-1344) you wrote:
    “That’s right, during training, the biases get adjusted during each backpropagation pass.”
    I don’t quite agree with that. In your backProp member function you write for the weight updates:

    for (unsigned layerNum = m_layers.size() - 1; layerNum > 0; --layerNum) {
        Layer &layer = m_layers[layerNum];
        Layer &prevLayer = m_layers[layerNum - 1];

        for (unsigned n = 0; n < layer.size() - 1; ++n) {
            layer[n].updateInputWeights(prevLayer);
        }
    }

    So the update is performed for all the Neurons, but not for the bias (note the -1 in the last for loop). I hope I didn't miss anything here, if so, I am sorry.

    But anyways: great tutorial

    1. Hi Mac, thanks for the comment. Remember that the backprop looping goes right to left, and the weights between any two layers are stored with the neurons in the layer to the left. The loop you found is the loop through the right-hand layer of a pair of layers. For each of those neurons, there’s a loop through all the input neurons on the layer to the left (at line 148) including the bias neuron, and that’s where the weights gets updated.

  68. Hi Dave,

    regarding your comment (http://www.millermattson.com/dave/?p=54#comment-5049). Of course you are right!

    I have to admit that the one “extra” neuron per layer being held constant at 1 and acting as the bias for all the neurons in the subsequent layer was not that intuitive to me. In textbooks you often see that there is
    an individual bias per neuron (to each neuron a real number is added, which is updated in each training step, and if I am not mistaken TensorFlow uses that approach). With the help of your comment I now understand that those two approaches are mathematically equivalent (obviously, how could I have missed that?!). But of course the weights of the “extra bias neuron per layer” need to be adjusted in each training step.

    Still, I am not sure which of the two approaches is “better” regarding computation time, memory, … Any ideas on that?

    Again, thanks a lot for your help. Next time I will reflect more in detail before posting!

  69. Mousumi Paul

    Hi,
    Thanks for your great efforts to explain NNs using C++. I have used your code for my time series prediction. I tried the sigmoid activation function, but I got errors in most cases. Then I tried tanh; it is working. But anyhow, I am getting an output of 1 after the first iteration. My expected output is something like 23405, so how do I calculate the error and train? In short, I want my network to produce values like 23405. What should the activation functions for the hidden and output layers then be? I read a suggestion on ResearchGate that said to use sigmoid in the hidden layers and a linear activation function in the output layer. I tried to modify the code but failed. Can you please provide the changes in the code to achieve my desired output?

    1. Hi Mousumi Paul, you can scale the training data so that the range of training output values matches the range of values that the activation function can produce. For example, tanh produces values from -1 to +1, so if you need outputs in the range X to Y, just scale the training data so that the range X to Y gets scaled to the range -1 to +1. No changes are needed in the neural net code for this.

      To change the activation function, just replace the member functions transferFunction() and transferFunctionDerivative() in class Neuron. In this program, that will change it for all layers. It’s difficult to predict which activation functions will work the best for a particular data set, so you may need to experiment.
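
      A minimal sketch of the scaling idea, assuming you know (or choose) bounds lo and hi for your data:

      #include <iostream>

      // Map a raw value in [lo, hi] onto tanh's output range [-1, +1], and back.
      double toNetRange(double x, double lo, double hi)
      {
          return 2.0 * (x - lo) / (hi - lo) - 1.0;
      }

      double fromNetRange(double y, double lo, double hi)
      {
          return (y + 1.0) / 2.0 * (hi - lo) + lo;
      }

      int main()
      {
          // e.g., if the series runs from 0 to 50000 (assumed bounds):
          double scaled = toNetRange(23405.0, 0.0, 50000.0);
          std::cout << scaled << " -> "
                    << fromNetRange(scaled, 0.0, 50000.0) << std::endl; // 23405
      }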

  70. Mousumi Paul

    Hi Dave,
    Thank you so much for your suggestion. It improved the outputs quite a bit after scaling with a min-max scaler. But I have one more query: if I need to implement the Adam optimizer in this network, how can I modify the existing code? Should I change the backpropagation method? Can you please provide some suggested links for implementing the Adam optimizer?

    1. Hi @mousumipaul, I’ve seen this paper about the Adam optimizer and it looks very interesting, but I’ve not taken time to digest it, so I don’t know what kind of modifications would be needed. You’re in new territory, and here’s wishing you success. If you get something working, let us know here.

  71. Udonyah Kingsley

    Hello Dave. I must commend you on your work. This is the best tutorial I’ve seen on neural networks. I am new to AI and I really want to implement my first neural network. I wish I had seen this tutorial much earlier.

    I am currently doing a B.Sc. project and would love to apply some of your code to the work; of course you will be cited.

    As a beginner, I have some challenges though. Towards the end of your video lecture you did a lot of quick pasting which I could not really follow.

    Please, I would like you to explain how to make the “makeTrainingSamples.cpp” and “trainingData.txt”

    Please, how do I implement this on Windows? I am using the Dev-C++ compiler.

    1. Hi Udonyah Kingsley, thanks for the email and the kind words. I’m sorry I’m not a Windows programmer, so I don’t know the exact steps to compile and run a console program using your compiler, but there should be plenty of tutorials on how to do that. If you can run a simple single-file Hello-World console program, then you should be able to compile and run this neural net source code using the same steps.

      Sorry about how fast the video went through the TrainingData class. The whole purpose of that class is to repeatedly call the functions Net::feedForward() and Net::backProp(), passing the input values to feedForward() and the target output values to backProp() for each training sample. You can write any kind of loop you want that calls those two function repeatedly with your particular training data. If you want to use the TrainingData class as it’s written, then just create a text file called trainingData.txt, and change line 361 to refer to that file using Windows syntax. If you want to replicate the XOR training data, see Comment #4 above for the format of trainingData.txt. After the “topology” line, each two lines is a single training sample — one line specifies the inputs, and the next line specifies the expected outputs. Although there are only four possible XOR combinations with two input neurons, you’ll need to present them to the neural net a few hundred or a few thousand times, preferably in random order. You could create this training file by hand with any text editor by entering a few XOR training samples and then copying them randomly, or you could write a simple little program that outputs random training data for you. I’m sorry I no longer have the source code online for that program called makeTrainingSamples.cpp, but you should be able to freeze the video and see how the source code is structured and replicate it.

      Good luck with the neural net experiments.
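
      For example, a bare-bones XOR training loop that skips the file entirely might look like this (a sketch that assumes the tutorial’s Net class is in scope; the pass count is arbitrary):

      #include <cstdlib>
      #include <vector>
      using namespace std;

      int main()
      {
          vector<unsigned> topology;           // 2-4-1, as in the video
          topology.push_back(2);
          topology.push_back(4);
          topology.push_back(1);
          Net myNet(topology);

          for (int pass = 0; pass < 2000; ++pass) {
              int a = rand() % 2, b = rand() % 2;  // a random XOR sample
              vector<double> inputVals, targetVals, resultVals;
              inputVals.push_back(a);
              inputVals.push_back(b);
              targetVals.push_back(a ^ b);

              myNet.feedForward(inputVals);
              myNet.getResults(resultVals);
              myNet.backProp(targetVals);
          }
      }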

  72. Hello Dave!

    Thanks for such a great tutorial. It helped me a lot, although I have a little problem and I wonder if you can help me a bit. My net takes as input 7 normalized values, and based on this it must decide if the object X belongs to class A, B, C or D. So let’s say X belongs to class A. The target value is a vector like this: (1, -1, -1, -1). If X belongs to class B we have (-1, 1, -1, -1), and so on. The problem is that the network doesn’t converge. The average error is around 0.5. The best results are around 0.05, but that is still not sufficiently good. I managed to balance the training data so it takes exactly the same number of elements from each class. Should I use the m_error for updating weights or something? If you have any ideas to improve my network, I would really appreciate it.

    Best regards,
    Dorian.

    1. Hi Dorian, Congrats on what you’ve done so far. Your approach is sound. At this point, you might benefit from more training samples, or there might be a way you could precondition the input data to make it easier for the neural net to extract the signal from the noise. Also see Comment #32 above for some tweaks to the program. A different network topology might work better, or different transfer functions, or active management of the learning rate. Check out these (external) resources for more technical details about those topics:

      My Neural Network isn’t working! What should I do?

      37 Reasons why your Neural Network is not working

      Debugging Neural Networks: A Checklist

      You need elbow room to experiment, and the program itself is a limiting factor now. You could hack on this little pedagogical program and add many of the improvements described in those resources above, or if you’re more interested in getting a trained net than programming the inner machinery of the simulator, then check out the many excellent neural net frameworks out there with more features so that you can explore the possibilities. Good luck on the journey.
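
      P.S. To make the preconditioning suggestion concrete: one common first step is to standardize each input feature to zero mean and unit variance across the training set. A minimal, untested sketch, independent of the neural net classes:

      #include <cmath>
      #include <vector>

      // Standardizes each input column to zero mean and unit variance.
      // "samples" holds one vector of input values per training sample.
      void standardize(std::vector<std::vector<double> > &samples)
      {
          if (samples.empty()) return;
          for (size_t col = 0; col < samples[0].size(); ++col) {
              double mean = 0.0;
              for (const auto &s : samples) mean += s[col];
              mean /= samples.size();

              double var = 0.0;
              for (const auto &s : samples) var += (s[col] - mean) * (s[col] - mean);
              const double stddev = std::sqrt(var / samples.size());

              for (auto &s : samples)
                  s[col] = (stddev > 0.0) ? (s[col] - mean) / stddev : 0.0;
          }
      }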

  73. Udonyah Kingsley

    RE: REPLY #104
    Hello Dave. Thank you very much for the prompt reply. Your explanation was really helpful. There is still very much for me to learn, but you have given me a very easy start. Keep up the good work! Once again, thank you.

  74. Udonyah Kingsley

    Hello Dave, I have found in you an excellent teacher already. Please tell me: for this neural network that you have trained here, once the training is finished, how can you make it remain in the trained state, or do you have to retrain it whenever you want to use it? If you did the training today by feeding training data to it, could you pick it up and use it another day with it still trained? What can be done to keep a network trained?

    1. Hi kingsleyudonyah, that’s a great question. See if Comments #42 and #45 above answer it sufficiently.

  75. Thanks a lot! I will try to improve my net using your links.
    Best wishes!

  76. Udonyah Kingsley

    Hello Dave! Thanks for your prompt answer! I tried a few pieces of code for inserting a function to indicate when to stop training, writing the weights at that point to a file, and then reading them back to make a trained network, but as a “newbie” in this field, I have not been able to get this to work. Please could you walk me through how to go about this? Some code snippets to illustrate would help to explain better.

    Thanks in advance!

  78. Hi Dave,
    where did you get the formula with the smoothing factor from? I mean the recent-average-error update, which in the downloadable source looks roughly like this:
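
    // in Net::backProp(): a running average weighted over roughly the
    // last m_recentAverageSmoothingFactor training passes
    m_recentAverageError =
            (m_recentAverageError * m_recentAverageSmoothingFactor + m_error)
            / (m_recentAverageSmoothingFactor + 1.0);

    Thanks!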

  79. Hi Dave, great tutorial. I successfully managed to follow it and train it for the logic gate.

    I’m trying to get the network to work on handwriting training examples. I’ve vectorised images that are 28×28, so I have 784 inputs into the network, and I want to classify the images as 0, 1, …, 9.

    I get the network to run over a large number of training examples, but for each pattern it always outputs 1. I noticed in one of your comments you said to scale down the initial weights, which I have done, but I still get the same output.

    Any ideas? Thanks

  80. Sandeep Yadav

    Hi Dave,

    Great tutorial for a beginner like me.
    I have done the implementation in Java.
    Sharing below for anyone who might be interested.

    https://github.com/sandeepyadavkec/BasicNeuralNetJava

    Thanks again.

  81. Norio Kosaka

    Thank you David,

    I have tried coding it myself while following your nice video!
    If you have time, please check it out.
    I have added ReLU and sigmoid activation functions as well.

    https://github.com/Rowing0914/neural_network_cplusplus

    Best,
    Norio

  82. Harish Guruaribam

    Hi David! Thank you for the great tutorial.
    I tried the code in Code::Blocks. It compiles fine with no errors and no warnings. However, the application stops at run time. It says, “The application has requested the Runtime to terminate it in an unusual way.”

    1. Hi Harish, that sounds like it might be related to the abort() statement in line 45, which is triggered if the training data file is not in the format the program expects. See Comment #4 above to see if that helps.

  83. Hi Dave,

    thank you for the video, it is really great.

    I am working with 40 inputs and 1 output. The input is formatted as you requested, like this:

    topology: 40 20 1
    in: 0.7 0.7 0.8 0.8 0.8 0.7 0.6 0.8 0.8 0.9 0.5 0.7 0.6 0.6 0.6 0.7 0.8 0.8 0.8 0.7 0.6 0.7 0.8 0.9 0.5 0.8 0.7 0.8 0.6 0.8 0.5 0.5 0.2 0.2 0.1 0.1 0.8 0.5 0.7 0.3
    out: 0.0
    in: 0.7 0.7 0.8 0.8 0.8 0.7 0.5 0.8 0.8 0.9 0.5 0.7 0.6 0.6 0.6 0.7 0.8 0.8 0.8 0.7 0.7 0.7 0.8 0.9 0.5 0.7 0.7 0.8 0.7 0.8 0.4 0.5 0.3 0.2 0.2 0.2 0.6 0.7 0.5 0.3
    out: 0.0
    in: 0.3 0.1 0.5 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.2 0.2 0.3 0.9 0.5 0.3 0.8 0.9 0.9 0.9 0.2 0.2 0.7 0.8 0.3 0.2 0.5 0.5 0.6 0.8 0.1 0.1 0.1 0.1 0.4 0.5 0.6 0.6 0.5 0.7
    out: 1.0
    in: 0.9 0.9 0.8 0.7 0.8 0.3 0.2 0.3 0.3 0.2 0.9 0.9 0.9 0.5 0.2 0.2 0.2 0.4 0.4 0.1 0.8 0.8 0.9 0.5 0.1 0.2 0.2 0.1 0.1 0.3 0.5 0.2 0.2 0.7 0.3 0.5 0.1 0.3 0.5 0.5
    out: 1.0
    in: 0.9 0.9 0.9 0.9 0.9 0.5 0.5 0.5 0.3 0.2 0.7 0.9 0.9 0.9 0.2 0.2 0.3 0.1 0.1 0.1 0.7 0.7 0.8 0.8 0.2 0.1 0.1 0.4 0.5 0.5 0.5 0.2 0.5 0.7 0.3 0.6 0.2 0.2 0.5 0.5
    out: 1.0
    in: 0.8 0.9 0.9 0.9 0.9 0.8 0.9 0.7 0.4 0.5 0.7 0.8 0.9 0.8 0.4 0.2 0.2 0.1 0.1 0.1 0.5 0.5 0.7 0.8 0.4 0.2 0.1 0.4 0.5 0.8 0.4 0.2 0.2 0.7 0.8 0.6 0.1 0.2 0.3 0.3
    out: 1.0
    in: 0.8 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.7 0.7 0.8 0.8 0.8 0.4 0.2 0.3 0.3 0.3 0.5 0.5 0.5 0.7 0.5 0.7 0.6 0.1 0.1 0.3 0.4 0.2 0.5 0.7 0.5 0.5 0.5 0.2 0.1 0.1
    out: 1.0

    But I am still getting an error like “Aborted (core dumped)”, i.e. an unusual program termination.

    Is this data too much for the program to handle?

    Thanks in advance. :-)

    1. Hi Ray, I just tried your input data, and it worked fine here. It sounds like your program might be failing to open the input data file. If you’re using the original program without modification, then be sure to change that hardcoded filename and path in line 361 to whatever is appropriate for your operating system. If that’s not the problem, or if you’ve made any other changes, you may need to get out your favorite debugger and see what’s going on in the program at the time of the abort.

  84. Hi Dave,
    Thanks for the quick response,

    I have tried the same location with the default training data and it works fine. I used the original code unmodified for that, and it compiles cleanly; it is only at execution time that it shows that error.

    I am compiling with ‘g++ filename -o ./Prog’.
    Do you think that might create a problem?

    1. Hi Ray, I think your command line looks fine. I’m not sure what’s happening over there. I’d suggest using a debugger to see where the program stops and why. My prediction is that the problem has something to do with naming or opening that file, and if you get past the open() statement in line 59, then the rest of the program should run without error.
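
      If it would help to fail more gracefully, you could also add a check right after that open() call. A rough sketch, assuming the member names from the video (m_trainingDataFile and the filename constructor parameter):

      m_trainingDataFile.open(filename.c_str());
      if (!m_trainingDataFile.is_open()) {
          std::cout << "Couldn't open training data file '" << filename << "'" << std::endl;
          abort();
      }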

  85. Thomas Haenselmann

    Dear Dave,

    your video tutorial on neural networks got me started on understanding how feed-forward and back propagation actually work. I paused the video many times to think about how I would code it next. I find it amazing how posting a video like this can help others open up a whole new world in the field of machine learning.

    I tried a simple example with your code. The net is shown 2×2 binary pixel images. It is taught to like the two pictures with the diagonal pixels set and to dislike the other 14.

    I trained a 4:4:1 network 1000 times and was wondering how to set alpha and eta. The obvious way to better understand the impact of (alpha, eta) is to sample the area [0..1]×[0..1] about 1600 times and to show a surface plot. Anyone interested can find the plot here:

    https://postimg.cc/image/9putimol3/

    Values close to 1 mean the network performs no better than random answers, and 0 stands for perfect results.

    There is a plane of bad (alpha, eta) values. Basically, combinations with (alpha + eta) > 1.3 fail to train the network. My understanding of the two parameters is still somewhat vague, but they seem to act as a kind of convex combination which fails for sums > 1.

    The half-plane for (alpha + eta) < 1 works fine, but interestingly there is an island in the middle where combinations don't work out optimally. In addition, there are single instances of (alpha, eta) in the bad half-plane that work out well.

    You might argue that this is a coincidence, but I ran each combination of (alpha, eta) about 100 times to take the average of the prediction error.

    In any case, keep on inspiring other people with your great work,

    Thomas

    1. Hi Thomas, Thanks for the kind words and for sharing the results of your investigations. That was brilliant to make a plot of alpha-eta on that data set. The results are intriguing. It certainly does appear that for any given eta value, there are two bands of alpha values that work optimally, divided by an island of suboptimal values. I can’t immediately visualize in my mind what causes that bimodal contour in the results.

      It would be interesting to compare with plots for more complex training data sets, and to somehow plot the training speed to compare the trade-off of speed and accuracy.
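
      For anyone who wants to reproduce that kind of sweep, the harness is just two nested loops over (alpha, eta). Here is a rough sketch, where trainAndMeasure() is a hypothetical stand-in for code that trains a fresh net with those parameters and returns the averaged prediction error:

      #include <cstdio>

      // Hypothetical stand-in: construct a Net, set eta and alpha, run the
      // training samples many times, and return the averaged prediction error.
      double trainAndMeasure(double alpha, double eta)
      {
          return 0.0; // plug your training loop in here
      }

      int main()
      {
          const int steps = 39; // 40 x 40 = 1600 grid points over [0..1] x [0..1]
          for (int i = 0; i <= steps; ++i) {
              for (int j = 0; j <= steps; ++j) {
                  const double alpha = double(i) / steps;
                  const double eta = double(j) / steps;
                  printf("%g %g %g\n", alpha, eta, trainAndMeasure(alpha, eta));
              }
          }
          return 0;
      }

      The three-column output can then be fed to a surface plotter, e.g. gnuplot.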

      Best wishes in the experimentations.

  86. MrBang

    Hello Dave,
    Thank you so much for the video. I learned a ton! I’m somewhat of a novice when it comes to programming, and I want to try to use your code for a project of mine. If I understood correctly, you first present your network the inputs and correct outputs, then save the weights and biases. For the next step of putting this into production, you feed the network the inputs and the saved ANN coefficients to get the results you want. I believe this is called supervised learning?

    My problem is that I don’t quite understand how to extract these weights and biases.

    I want to create a program that can authenticate users. It uses some information like biometric data (a vector) as input, the output (a secret) is created from a CSPRNG, and the network is trained on this input and output. Once I see that it works, I’ll extract the ANN coefficients and see if I can get a match on the passwords.

    Thanks for your time!

    Best regards
    MrBang

    1. Hi @MrBang, thanks for the feedback. You’re asking the right questions. This little demo program does not have the ability to save and restore the weights, so you’d need to write a pair of functions for saving and restoring the weights. Those functions would have nested loops that save/restore all the values in the weights arrays for each neuron in each layer. After successful training, you’d save all the weights, then later you can restore the weights back into a net of the same topology and start forward-feeding it data (without invoking backpropagation any more). The way this program is structured, the bias is just one of the values in the weights array that will get saved and restored, so no special handling would be required for the bias values.

      However, if you’re more interested in getting results on your project instead of fiddling with the neural net internals, then you may have outgrown this little demo program, and you may want to experiment with several other neural net simulators to find one with the features that fit your project well.
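
      If it helps to see the shape of it, here is a rough, untested sketch using plain nested vectors as a stand-in for however you end up exposing the Connection weights from the Neuron objects (outer index = layer, then neuron, then connection):

      #include <fstream>
      #include <string>
      #include <vector>

      typedef std::vector<std::vector<std::vector<double> > > WeightTable;

      // Writes every connection weight (bias weights included) to a file,
      // one weight per line, in layer/neuron/connection order.
      void saveWeights(const WeightTable &weights, const std::string &filename)
      {
          std::ofstream os(filename.c_str());
          for (const auto &layer : weights)
              for (const auto &neuron : layer)
                  for (double w : neuron)
                      os << w << "\n";
      }

      // Reads the weights back in the same order. The net being restored
      // must have the same topology as the net that saved the weights.
      void loadWeights(WeightTable &weights, const std::string &filename)
      {
          std::ifstream is(filename.c_str());
          for (auto &layer : weights)
              for (auto &neuron : layer)
                  for (double &w : neuron)
                      is >> w;
      }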

  87. Thomas Haenselmann

    Dear Dave,

    hopefully you are still interested in your project. I kept digging
    deeper into the subject, in particular trying to understand how back propagation actually works.

    One keyword everyone comes across in this context certainly is the “chain rule”. It simply means that the derivative

    of g(f(x)) is g'(f(x)) * f'(x)

    As you explain in your video, the error between what the network provides and what we would like to see can be computed as the squared deltas between the targetVals and the outputLayer. In fact you calculate that in backProp().

    void Neuron::calcOutputGradients(double targetVal)
    {
        double delta = targetVal - m_outputVal;
        m_gradient = delta * Neuron::transferFunctionDerivative(m_outputVal);
    }

    That is where the delta variable in calcOutputGradients() comes from. In fact, more precisely, it could be

    delta = 2 * (targetVal - m_outputVal);

    because the derivative of x^2 becomes 2*x. From my understanding and for the sake of training the network this is nit-picking because the network is just as happy being trained with delta and we save a multiplication.

    Next comes the derivative of the transfer function. I guess it takes what the previous neurons told us (their activations), weighted by the weights and squeezed into the right shape by the sigmoid function.

    And this is where my understanding deviates from the code – which I have to admit works perfectly well for me *as* *is*.

    If we differentiate the transferFunction( m_outputVal ) it becomes the transferFunctionDerivative( m_outputVal ) * m_outputVal’ according to the chain rule. But the code does not seem to need the derivative of the inner function m_output.

    How come?

    Thanks for your time and dedication to the net community.

    Thomas

  88. Hi, I want to use this for a simple snake game. There is no problem executing the code, but it does not guess the snake directions and does not produce the necessary outputs. What am I doing wrong? What do you suggest? There are 24 inputs and 4 outputs indicating the directions.

  89. Hey, I am getting an error in the program while running it:

    vector subscript out of range

    Please help me with this error.

  90. Dear Dave

    Excellent tutorial! It tremendously helped me in understanding neural networks! Thank you!

    I implemented the solution like you did in this example/video, but I am not sure why I always get the same output, 6.95326e-310, at each training pass.

    I can see the net’s recent average error changing at each training pass, but why are the outputs always stuck at the same value? I checked your code and mine is exactly the “same”, but when I ran your code it didn’t produce the same output as mine… I really don’t know where I went wrong.

  91. Jelle Bleeker

    So I split your neural network up into multiple header and cpp files. Everything compiles as it should, but when I try to train it, it keeps outputting -nan(ind) for the feedForward output and the recent average error.

    Any idea what could cause this? I have gone over my changes at least 7 times now; everything should be the same except for the missing TrainingData class and the split files.