I would suggest taking a simple architecture, say ResNet-18, replacing ReLU with tanh or sigmoid, and comparing the two runs: training performance, losses, etc. You will get the gist of them.
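For anyone who wants to try it, here is a minimal sketch (my own, not from the book) of one way to swap every ReLU in a torchvision ResNet-18 for tanh so the two variants can be trained with the same loop and compared:

```python
# Sketch: build a ReLU baseline and a tanh variant of ResNet-18 for comparison.
import torch.nn as nn
from torchvision.models import resnet18

def replace_relu(module, new_act=nn.Tanh):
    # Recursively replace every nn.ReLU child with the chosen activation.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, new_act())
        else:
            replace_relu(child, new_act)

model_relu = resnet18(num_classes=10)   # baseline with ReLU
model_tanh = resnet18(num_classes=10)
replace_relu(model_tanh)                # same architecture, tanh activations

# Train both with identical data, optimizer, and schedule, then compare loss curves.
```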
As Jeremy once said, and Sanyam said a while ago, the more you experiment with code and get your hands dirty, the stronger your intuitions will become.
Hi! I was working on the SGD optimizer and noticed that it gives loss = nan when I use t_u (not normalized); the authors use t_un = 0.1 * t_u. Why does this happen?
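For context, here is a minimal sketch of that setup as I understand it, assuming the chapter's linear model w * t_u + b with an MSE loss and SGD at lr = 1e-2; the data below is illustrative, just on roughly the same scale as the book's thermometer readings, not the exact tensors. Since the gradient of the loss with respect to w is proportional to t_u, inputs in the tens make w's updates far larger than b's for the same learning rate, so the parameters overshoot and the loss diverges to nan; multiplying by 0.1 puts the two gradients on a similar scale.

```python
# Sketch of the unnormalized-input problem (illustrative data, not the book's).
import torch

t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8])  # raw inputs
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0])     # targets

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([params], lr=1e-2)

for epoch in range(10):
    t_p = model(t_u, *params)        # try raw t_u vs. 0.1 * t_u here
    loss = loss_fn(t_p, t_c)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())        # with raw t_u the loss blows up to nan
```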
I was working on the validation set loss and found that it decreases, then increases, and then stabilizes. It seems something is wrong, but I don't know what to look for. Any advice/help is appreciated.
I had read Chapter 5, Mechanics of Learning, earlier, but I recently read it again, and it's basically simple high-school stuff: differentiating functions and getting slopes, or, as they call them here, gradients.
But the whole point is that the authors come down from the high level of dealing with complex problems to basic statistics: they make the reader see the loss function, then differentiate it with respect to the weights and biases, and then optimize it all with code that is as plain as pen and paper. It makes us realize, as users of torch.nn or even sklearn, how close we get to the truth and yet how far we are with our own implementations…
The chapter starts with a simple "mx + c" and beautifully fits and optimizes everything from there, and you can only appreciate the calm of it if you truly try to forget everything you have learnt about ML or DL… The only way to enjoy this chapter is to know that the derivative of x^2 with respect to x is 2x, and nothing more!
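In that spirit, here is my own toy version of the idea (made-up data, hand-derived gradients, no autograd and no torch.nn), not the book's code, just to show how little machinery the "pen and paper" approach actually needs:

```python
# Sketch: fit y = w*x + b with hand-derived gradients, nothing but tensors.
import torch

x = torch.linspace(-1, 1, 20)
y_true = 3.0 * x + 1.0                      # pretend this is our data

w, b = torch.tensor(0.0), torch.tensor(0.0)
lr = 0.1

for epoch in range(100):
    y_pred = w * x + b
    loss = ((y_pred - y_true) ** 2).mean()  # MSE, the loss function
    # Pen-and-paper gradients of the loss w.r.t. w and b
    grad_w = (2 * (y_pred - y_true) * x).mean()
    grad_b = (2 * (y_pred - y_true)).mean()
    w = w - lr * grad_w                     # gradient descent step
    b = b - lr * grad_b

print(w.item(), b.item())                   # approaches 3.0 and 1.0
```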