Friday, 11 July 2025

Objective of this blog

This blog is a summary of everything I learn about Machine Learning, Computer Vision and software programming in general. It is basically an effort by me to accumulate (and in some way link) everything I read and understand about these topics while bringing it together in one space. I will cite all sources of information along with links to find them.

Disclaimer: Everything on this blog is MY understanding of and notes on topics and issues. It could be wrong, so please refrain from relying too heavily on it for important projects.

Friday, 5 August 2016

C++ basics and hacks

1. The volatile keyword is useful when a value can be changed by code outside the normal program flow (for example a signal handler or memory-mapped hardware). Reads and writes of a volatile variable are never optimized away, which is the important and useful part. It often comes up in the context of a global flag modified from elsewhere while a loop keeps reading it; note that for actual multi-threaded code std::atomic is the safer tool.
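A minimal sketch of a legitimate use (a flag set from a signal handler; illustrative, not tied to any particular source):

    #include <csignal>

    // Set from a signal handler (code outside the normal flow), so the
    // compiler must re-read it on every loop iteration instead of
    // optimizing the load away.
    volatile std::sig_atomic_t stop_requested = 0;

    void handle_sigint(int) { stop_requested = 1; }

    int main() {
        std::signal(SIGINT, handle_sigint);
        while (!stop_requested) {
            // do work until Ctrl+C is pressed
        }
        return 0;
    }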

2. Concept of Inheritance:

  • Creating an abstract class from which other classes derive. For example we create an animal class, and classes dog and cat derive from this class. Here, animal class is called the base class and dog, cat are derived classes. 
  • Base class has more general functions and variables and every derived class has more specific features. 
  • Private variables are not accessible outside a class or to any derived classes of this class. 
  • Protected variables are not accessible outside a class but the derived classes inherit these variables and functions as a part of their own class. 
  • Public variables are accessible outside the class also.
  • A very important thing to note is that derived classes of the same base class share only the declarations of the inherited variables and functions, not the values stored in them: every object gets its own copies of the data members. The sketch below illustrates the three access levels on a small Animal/Dog hierarchy.
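A small sketch of these access levels (Animal/Dog and the member names are illustrative):

    #include <iostream>
    #include <string>

    class Animal {                    // base class: general features
    public:
        std::string name;             // public: accessible everywhere
        void eat() { std::cout << name << " eats\n"; }
    protected:
        int age = 0;                  // protected: accessible in Animal and in derived classes
    private:
        int id = 42;                  // private: accessible only inside Animal
    };

    class Dog : public Animal {       // derived class: more specific features
    public:
        void birthday() { ++age; }    // ok: protected member is inherited
        // void leak() { ++id; }      // error: private member of Animal is not accessible
        void bark() { std::cout << name << " barks\n"; }
    };

    int main() {
        Dog d;
        d.name = "Rex";               // ok: public
        // d.age = 3;                 // error: protected, not accessible outside the class
        d.birthday();
        d.bark();
    }

Note that each Dog object holds its own copies of name and age; the class only shares the declarations, not the values.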

3. Virtual Functions:

  • If you want a derived class to have a different implementation of a function declared in the base class, you declare that function as virtual in the base class. 
  • Another use is when you don't want to define the function in the base class at all but only implement it in some derived classes: declaring it as a pure virtual function (= 0) makes the base class abstract. 
  • A small sketch of both uses follows below.
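A minimal sketch of both cases (Animal/Dog are illustrative names):

    #include <iostream>
    #include <string>

    class Animal {
    public:
        virtual void speak() { std::cout << "some generic sound\n"; }  // overridable
        virtual std::string kind() = 0;   // pure virtual: no definition in the base class
        virtual ~Animal() = default;
    };

    class Dog : public Animal {
    public:
        void speak() override { std::cout << "woof\n"; }   // different implementation
        std::string kind() override { return "dog"; }
    };

    int main() {
        Dog d;
        Animal* a = &d;     // base-class pointer
        a->speak();         // prints "woof" because speak() is virtual
        std::cout << a->kind() << "\n";
    }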



4. Template function/class
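A template function or class lets the same code work with any type that supports the operations it uses; a minimal sketch (myMax is an illustrative name):

    #include <iostream>
    #include <string>

    // Works for any type that supports operator<
    template <typename T>
    T myMax(T a, T b) {
        return (a < b) ? b : a;
    }

    int main() {
        std::cout << myMax(3, 7) << "\n";                               // T deduced as int
        std::cout << myMax(2.5, 1.5) << "\n";                           // T deduced as double
        std::cout << myMax(std::string("a"), std::string("b")) << "\n"; // T is std::string
    }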

5. Polymorphism?

  • Characters of a string are accessed by position using str.at(pos), not getat(pos).
  • In a string literal, "a\*c\*d" is read by C++ as a*c*d; to keep the backslashes, the literal should be "a\\*c\\*d", which is stored as a\*c\*d (see the sketch after this list).
  • Consequently, you compare against '\\' (in single quotes) to check whether a backslash character exists in a string.
  • Problem: http://www.cprogramming.com/challenges/regexp.html
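A quick sketch of the escaping rules above:

    #include <iostream>
    #include <string>

    int main() {
        std::string s = "a\\*c\\*d";      // in source code; the stored string is a\*c\*d
        std::cout << s << "\n";           // prints a\*c\*d
        std::cout << s.at(1) << "\n";     // prints \  (accessed by position with at())
        for (char c : s)
            if (c == '\\')                // check for a single backslash character
                std::cout << "found a backslash\n";
    }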

6. Static Variables:


  • Static memory is allocated at compile/load time, before the program runs, unlike dynamic memory allocation where memory is obtained as required at run time. 
  • Static global vs global: a static global variable has internal linkage, so other translation units (files) cannot see it. Two files can each have a static variable of the same name without collision, but neither is accessible from the other file. A normal global variable has external linkage: you can refer to it from a different file by declaring it there with extern. However, defining two plain globals of the same name in different files causes a collision, because the linker cannot tell them apart.
  • In the sketch below, a plain global a in both files would collide, but if both were static they would be limited to their own files with no collision.
  • A static local's scope is the function in which it is defined; a static global's scope is the file in which it is defined; a plain global can be used across the entire project.
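A sketch of the linkage rules (two source files shown in one listing; names are illustrative):

    // file1.cpp
    static int a = 1;     // internal linkage: only visible inside file1.cpp
    int shared = 10;      // external linkage: visible to other files via 'extern'

    // file2.cpp
    static int a = 2;     // a second 'a' is fine because both are static;
                          // if both files defined a plain 'int a' instead, the
                          // linker would report a duplicate-definition collision
    extern int shared;    // refers to the 'shared' defined in file1.cpp

    void counter() {
        static int calls = 0;   // static local: initialized once, value persists
        ++calls;                // across calls; scope is just this function
    }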


Sources:
1. https://en.wikipedia.org/wiki/Static_variable

Tuesday, 12 July 2016

Residual Networks (ResNet)

ResNet
As the number of layers in a plain net increases, the training accuracy decreases (training error goes up). This is attributed to the degradation problem, which might be caused by exponentially slow convergence of deeper networks (I think).


Adding a skip connection allows preconditioning the problem. Using an identity mapping (I think this means just carrying the input x over without any modification) adds no extra parameters and aids this preconditioning. In back propagation, the gradient at the addition node simply distributes equally among its children, so the layers inside the skip connection learn to produce H(x) - x while the gradient also flows back unchanged through the skip connection to the input. I think this causes faster convergence. Another way to think about it: the net effectively carries the identity through, and the layers in between only compute a small delta over x.

H(x) = F(x) + x;
F(x) is a small delta over x that layers are learning to set up.
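In equation form (my own restatement; L is the loss): the skip connection contributes an identity term to the gradient, so the signal reaching x does not vanish even when the gradient through F is small.

    \[
    H(x) = F(x) + x
    \quad\Rightarrow\quad
    \frac{\partial L}{\partial x}
      = \frac{\partial L}{\partial H}\left(\frac{\partial F}{\partial x} + I\right)
    \]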





The difference between a plain net and a ResNet is that the layers now learn parameters so as to produce H(x) - x rather than H(x) for the same desired output. It has been shown experimentally that a deeper ResNet converges successfully and has a lower training error than its plain-net counterpart. For a shallow network, the accuracy of a ResNet is similar to that of a plain net, although it converges faster.

- Uses batch normalization after every conv layer and before the ReLU activation.
- Does not use dropout, because the BN paper says dropout is not needed when using BN.
- Learning rate 0.1, divided by 10 when the validation error plateaus.

Summary:
- Plain nets like VGGNet, AlexNet, etc. have been used till now.
- It is observed that increasing the number of layers in these plain nets increases training error, contrary to the obvious intuition.
- ResNet decreases the error consistently.
- Even though it has 152 layers, it is faster than VGGNet.
- The good thing about this network is that in back prop the gradient distributes evenly among the children of the addition node, so it flows back undiminished through the skip connection and the layers in between learn to adjust accordingly.
- This is very similar to RNN/LSTM, where past knowledge is carried forward to affect decisions made in the future.

Extra Notes:



Sources and References:
1. http://arxiv.org/pdf/1512.03385v1.pdf
2. https://www.youtube.com/watch?v=LxfUGhug-iQ&index=7&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
3. https://www.youtube.com/watch?v=1PGLj-uKT1w

Left to cover:
1. code analysis
2. Math of forward and back prop
3. bottleneck architecture
4. handling the change from a lower to a higher dimension on the skip connection.


Monday, 11 July 2016

Stack as a Template class

Following is the implementation of a stack as a template class.
The most interesting thing to note is the default return value of pop() in case the stack is empty. It can be written as T(), which calls the default constructor of whatever type is bound to T. I checked, and the default value for int, double and float is 0, and for string it is an empty string. This solves so many other problems too :)
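A minimal sketch along those lines (the container choice and method names are my own, illustrative):

    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    template <typename T>
    class Stack {
        std::vector<T> data;
    public:
        void push(const T& value) { data.push_back(value); }

        T pop() {
            if (data.empty())
                return T();        // default-constructed value: 0 for int/double/float,
                                   // "" for std::string
            T top = data.back();
            data.pop_back();
            return top;
        }

        bool empty() const { return data.empty(); }
        std::size_t size() const { return data.size(); }
    };

    int main() {
        Stack<int> si;
        si.push(1); si.push(2);
        std::cout << si.pop() << "\n";   // 2
        std::cout << si.pop() << "\n";   // 1
        std::cout << si.pop() << "\n";   // 0 : stack empty, T() for int

        Stack<std::string> ss;
        std::cout << "[" << ss.pop() << "]\n";   // [] : empty string from T()
    }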


Hash Map vs STL Map

A hash map, or unordered_map, stores values by hash. It converts the key into a hash value and stores the value linked to that hash. Whenever we look up a particular key, it converts the search key into its hash value and checks whether that hash value is present in the table; if it is, it returns the corresponding value tied to this key, else nothing. Insertion and lookup therefore take O(1) on average.

A map stores data in sorted order by key. It overwrites the value of a key if the key already exists. This is good for smaller collections and when sorted data is needed. It is implemented as a balanced binary search tree and takes O(log n) to search and insert.

A hash map stores multiple entries that land in the same bucket in the form of a linked list. Collisions occur when the hash function returns the same hash value for two different keys, so it is a good idea to store both key and value in the position pointed to by the hash value: each linked-list element holds both key and value, so lookups can compare the actual keys. When the table becomes too full it resizes (rehashes) accordingly, and a good hash function avoids or reduces collisions.

However, a hash map is still good for huge amounts of data, because its O(1) average lookup beats the O(log n) lookup of a map as the data grows.
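A quick sketch of the two containers side by side:

    #include <iostream>
    #include <map>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, int> hashed;   // O(1) average insert/find, no ordering
        std::map<std::string, int> ordered;            // O(log n), keys kept sorted

        for (const std::string& k : {"orange", "apple", "banana"}) {
            hashed[k] = static_cast<int>(k.size());
            ordered[k] = static_cast<int>(k.size());
        }

        ordered["apple"] = 99;                         // existing key: value is overwritten

        for (const auto& kv : ordered)                 // iterates in sorted key order:
            std::cout << kv.first << " ";              // apple banana orange
        std::cout << "\n";

        if (hashed.find("banana") != hashed.end())     // lookup goes via the hash of "banana"
            std::cout << "banana -> " << hashed.at("banana") << "\n";
    }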

Problem: Return the mapping of objects from one list to another

I was asked this question in a Google interview. I did not know templates then, which is fair because C++ was not my primary programming language at the time, so the interviewer described them to me and I used them fine. Now that I am learning about templates, maps and vectors, I am attempting to redo the question.

Problem: Imagine there are 2 arrays/lists. Return the relative position mapping of them.
   list1 = {"apple", "banana", "orange", "honey"}
   list2 = {"banana","apple", "honey",  "orange"}

Answer: {1,0,3,2}

Follow-up1: What if this list had repeated items?
Follow-up2: What if the data type of this list was unknown?


1. It is easier to convert the array into a vector because vectors have more built-in functionality, like size().
2. You cannot pass an array to a function and compute its length there with sizeof: the array decays to a pointer, so sizeof reports the pointer's size rather than the element count; for me, sizeof(arr)/sizeof(arr[0]) inside the function printed a length of 0. (The same decay happens in C, despite what I assumed at the time.) See the sketch after this list.
3. It is important to check whether the object is actually found in the map.
4. I iterate over list1 by index rather than with an iterator, because the index is what we need as the value in the key-value pair of the map.
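A minimal sketch of the basic version; the lengthOf helper and the exact myMap signature are illustrative choices:

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    void lengthOf(const std::string arr[]) {
        // arr has decayed to a pointer, so this is sizeof(pointer)/sizeof(element),
        // not the element count; with 8-byte pointers and a larger std::string this
        // integer division typically prints 0.
        std::cout << sizeof(arr) / sizeof(arr[0]) << "\n";
    }

    // Basic version: map each item of list1 to its index, then look up items of list2.
    std::vector<int> myMap(const std::vector<std::string>& list1,
                           const std::vector<std::string>& list2) {
        std::map<std::string, int> positions;
        for (int i = 0; i < static_cast<int>(list1.size()); ++i)
            positions[list1[i]] = i;             // key = item, value = its index in list1

        std::vector<int> result;
        for (const std::string& item : list2) {
            auto it = positions.find(item);
            if (it != positions.end())
                result.push_back(it->second);    // index of this item in list1
            else
                std::cout << "not found\n";      // item of list2 missing from list1
        }
        return result;
    }

    int main() {
        std::string raw[] = {"apple", "banana", "orange", "honey"};
        lengthOf(raw);                            // wrong length, as noted above

        std::vector<std::string> list1 = {"apple", "banana", "orange", "honey"};
        std::vector<std::string> list2 = {"banana", "apple", "honey", "orange"};
        for (int idx : myMap(list1, list2))
            std::cout << idx << " ";             // 1 0 3 2
        std::cout << "\n";
    }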


Follow-up 1:
If list1 had two identical objects, the key-value pair would get overwritten with the highest index, i.e. the last value that was added. This is because all keys in a map are unique; we can't have two keys of the same name. Therefore, I changed the value from a single integer to a vector of integers that stores all the index occurrences of a particular key, as in the sketch below.
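A sketch of the modified myMap (a drop-in replacement for the version above; details are illustrative):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Follow-up 1: allow repeated items by storing ALL indices of each key.
    std::vector<int> myMap(const std::vector<std::string>& list1,
                           const std::vector<std::string>& list2) {
        std::map<std::string, std::vector<int>> positions;
        for (int i = 0; i < static_cast<int>(list1.size()); ++i)
            positions[list1[i]].push_back(i);     // every occurrence keeps its own index

        std::vector<int> result;
        for (const std::string& item : list2) {
            auto it = positions.find(item);
            if (it != positions.end() && !it->second.empty()) {   // both conditions matter
                result.push_back(it->second.front());             // use one stored index...
                it->second.erase(it->second.begin());             // ...then remove it
            } else {
                std::cout << "not found\n";
            }
        }
        return result;
    }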


The interesting thing to note here is that it->second now refers to a vector of ints storing all the index occurrences of a key. After using one index of "orange", we erase that value from the vector, so that next time a different index is returned. It is also important to check that this vector of values is not empty, otherwise, when an object is requested more times than it appears in list1, we get a segmentation fault. With and without that second condition, on an example with such a repeated object, the outputs were:



Without condition:
2 0 1 3 3 
segmentation fault

With condition:
2 0 1 3 
not found

Follow-up 2:
I used a template function to change myMap so that it works with any input data type, including string, int, double, etc.

Output:
20134

24310
           
1. When declaring an iterator over a map or vector whose element type involves a template parameter T, you need to put typename in front of the declaration, because it is a dependent type.
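For example, a templated sketch of the same lookup (illustrative; replaces the string-only version above), where the iterator declaration needs typename:

    #include <map>
    #include <vector>

    // Templated version: works for std::string, int, double, ... elements.
    template <typename T>
    std::vector<int> myMap(const std::vector<T>& list1, const std::vector<T>& list2) {
        std::map<T, std::vector<int>> positions;
        for (int i = 0; i < static_cast<int>(list1.size()); ++i)
            positions[list1[i]].push_back(i);

        std::vector<int> result;
        for (const T& item : list2) {
            // 'typename' is required: map<T, ...>::iterator is a dependent type.
            typename std::map<T, std::vector<int>>::iterator it = positions.find(item);
            if (it != positions.end() && !it->second.empty()) {
                result.push_back(it->second.front());
                it->second.erase(it->second.begin());
            }
        }
        return result;
    }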




Recurrent Neural Networks(RNN) and Long- Short Term Memory (LSTM)

Recurrent Neural Network (RNN)
In a CNN, we applied the same filter across space, i.e. the same filter over different areas of the image. We do something similar in an RNN: if we know the sequence of events is stationary over time, we can apply the same weights W at every time step.

If we are training a network on a sequence of data, it is a good idea to feed a summary of the past into the classifier. Keeping the full history at every step would make the model very complicated, especially for deep networks. Therefore we have a single recurrent connection per step, linking to the previous step and carrying information from the immediate past.
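In the usual vanilla-RNN formulation (standard notation, not taken from the lecture slides), the same weights are reused at every time step:

    \[
    h_t = \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b\right),
    \qquad
    y_t = W_{hy}\, h_t
    \]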

Theoretically, this should give us a good, stable model, but it is not good for SGD. This is because, when we back-propagate through time, all the gradients want to modify the same W. This causes a lot of correlation problems and leads to either exploding or vanishing gradients.



Exploding gradients can be handled by clipping, i.e. rescaling (normalizing) the gradient when its norm grows too large. Vanishing gradients are harder to fix: the classifier starts forgetting older information and only remembers recent inputs.




Long Short Term Memory
Here is where LSTM comes into action. We replace the simple W in the vanilla network with a memory cell; we are trying to help RNNs remember better. A memory cell must be able to do three things: write, read and forget.

So we replace W with the following memory cell. At each cell (node of the neural net), we decide whether to write to, read from, or forget the contents of the cell. So, roughly, the probability of reading, writing and forgetting at each cell is the parameter we are tuning this time.

We could make these gates hard 0s and 1s to indicate gate open or closed, but instead we use a continuous function on each gate so that it is differentiable and we can take derivatives and back-propagate through it.
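Written out, the standard LSTM cell uses sigmoids as these continuous gates (the common textbook formulation, given here for reference):

    \[
    \begin{aligned}
    i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{(write / input gate)} \\
    f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
    o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{(read / output gate)} \\
    \tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c) \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
    \qquad h_t = o_t \odot \tanh(c_t)
    \end{aligned}
    \]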


All these little gates help the model remember longer when it needs to and ignore things when it should not. So now all of these Ws are what we are tuning.
LSTM Regularization:
L2 can always be used. Dropout works fine if we apply it to the inputs and outputs, not to the recurrent connections between past and future.



Sources:
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
2.