- For an extensive overview of the tools used, see: https://github.com/Farooq-azam-khan/sklearn-datascience.git
- More complex data science libraries were not used in some scenarios for the following reasons:
  - To provide a first-principles approach.
  - To make a case for why such libraries are useful when dealing with abstractions.
- A Pythonic / functional approach is used. Some notes and examples on these topics can be found at `./PythonConcepts.md`.
Follow these instructions to use this repository:
- Launch a Python virtual environment by typing `pipenv shell`.
  - If you do not have a virtualenv, use `pipenv install`.
  - Once it is set up, activate the virtual environment by typing `pipenv shell`.
  - This way the Python modules installed on your computer will not affect the modules in this virtual environment, nor will they be affected by them.
- Download the zip file and extract the `src\` directory and the `requirements.txt` file into the virtualenv directory.
- Install the required Python packages from `requirements.txt`. The following command installs all of the requirements: `pip3 install -r requirements.txt`.
  - To check that you have done the above steps correctly, type `pip freeze` to see all the packages that are installed in your virtual environment.
- Now you can run any Python file in this repository: just type `python [file_name].py` in that directory.
- To learn about the perceptron learning algorithm (PLA), look at the following files: `preceptron.py`, `linear_function.py`, `boolean_function.py`, and `planar_equation.py`.
- In the `preceptron.py` file, the `Perceptron()` class contains the perceptron learning algorithm. This is an extremely useful algorithm to understand because neural networks and deep neural networks build on it.
  - The algorithm has two main parts: predicting a result from the inputs, and training based on the difference between the desired outcome and the actual outcome.
  - PLA is simple, and of limited use in modern research, because it can only classify linearly separable data (i.e. it can only separate classes with a line, which can be proven with linear algebra).
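The two parts described above, predicting and training, can be sketched roughly as follows. This is a minimal illustration, not the code in `preceptron.py`: the class name `SketchPerceptron`, the learning rate, and the choice of logical AND as training data are all assumptions made for the example.

```python
class SketchPerceptron:
    """Hypothetical stand-in for the repository's Perceptron() class."""

    def __init__(self, n_inputs, learning_rate=0.5):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # Weighted sum followed by a sign activation: +1 or -1.
        total = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if total >= 0 else -1

    def train(self, inputs, target):
        # Nudge the weights in proportion to the error (desired - actual).
        error = target - self.predict(inputs)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, inputs)]
        self.bias += self.learning_rate * error


# Logical AND with -1/+1 labels is linearly separable, so training settles.
and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
perceptron = SketchPerceptron(n_inputs=2)
for _ in range(20):
    for inputs, target in and_data:
        perceptron.train(inputs, target)

and_predictions = [perceptron.predict(inputs) for inputs, _ in and_data]
print(and_predictions)  # [-1, -1, -1, 1]
```

When a prediction is already correct the error is zero, so the weights stop changing once every training example is classified correctly.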
- The `linear_function.py` file gives a graphical view of how PLA linearly separates 2D inputs.
  - The `Point()` class generates random points with a label. The `label=1` if the point is above the actual line and `label=-1` if it is below. PLA tries to approximate this line as closely as possible by placing each point on one side or the other of its own line. Think of it as sorting a bowl of dimes and nickels: the dimes go in one basket and the nickels in another.
  - There are four possible outcomes when predicting a set of inputs: false positive, false negative, true positive, and true negative. These can be seen in the legend of the graph.
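The labelling rule and the four outcomes can be illustrated with a short sketch. The line `y = 0.5x + 1`, the function names, and the convention that a point exactly on the line counts as above it are assumptions for this example; the repository's `Point()` class may differ.

```python
import random


def true_line(x):
    # Hypothetical "actual" line used only for this illustration.
    return 0.5 * x + 1.0


def label_point(x, y):
    # +1 if the point is on or above the line, -1 if it is below.
    return 1 if y >= true_line(x) else -1


def outcome(actual, predicted):
    # Name the result of comparing a predicted label with the actual label.
    if predicted == 1:
        return "true positive" if actual == 1 else "false positive"
    return "true negative" if actual == -1 else "false negative"


random.seed(0)
points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(5)]
for x, y in points:
    actual = label_point(x, y)
    guess = 1  # a deliberately naive classifier that always answers +1
    print(f"({x:.2f}, {y:.2f}) actual={actual:+d} -> {outcome(actual, guess)}")
```

A classifier that always answers `+1` can only produce true positives and false positives, which is why a trained line is needed to get all four categories right.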
- The `boolean_function.py` file contains examples of PLA successes as well as its failure, i.e. the XOR problem. Look at `neural_network.py` for an improvement to PLA and a solution to the `xor` problem.
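One way to see why XOR defeats a single perceptron is to search for a separating line directly. The sketch below tries every combination of two weights and a bias on a small grid; it is an illustration under assumed names and an assumed grid, not the approach taken in `boolean_function.py`, and a finite grid search is evidence rather than a proof.

```python
from itertools import product


def classifies(weights, bias, data):
    # True if this line labels every example in data correctly.
    for inputs, target in data:
        total = bias + sum(w * x for w, x in zip(weights, inputs))
        if (1 if total >= 0 else -1) != target:
            return False
    return True


def separable_on_grid(data, values):
    # Try every (w1, w2, bias) combination drawn from values.
    return any(classifies((w1, w2), bias, data)
               for w1, w2, bias in product(values, repeat=3))


grid = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
or_data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
xor_data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], -1)]

for name, data in [("AND", and_data), ("OR", or_data), ("XOR", xor_data)]:
    print(name, separable_on_grid(data, grid))
```

AND and OR each have a separating line on the grid (for example weights `(1, 1)` with bias `-1.5` and `-0.5` respectively), while no combination works for XOR.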
- to be implemented
- In the directory `Neural_Network` you will find 2 files: `matrix.py` and `neural_network.py`.
- `matrix.py` contains matrix operations (you can look at it if you are interested, but it is not necessary for an intuitive understanding).
- `neural_network.py` contains two important methods, `feed_forward(inputs)` and `train(inputs, targets)`. Both expect arrays as parameters. This is very similar to the Perceptron Learning Algorithm; the added complexity comes from the linear algebra and calculus involved, because the weights and biases are stored in matrices and in some cases we need the derivative/gradient of those matrices. The necessary linear algebra comes from the `matrix.py` file.
  - An interesting feature in this class is the `map(func)` method. If you are coming from `Java/C/C++/JS`, note that in `Python` you can pass functions to other functions, i.e. a function can be treated as a parameter. For example, if `func(x) = 2*x`, then `map(func)` is allowed and applies `2*x` to every element.
  - On the subject of Python, there is no built-in `array` or `ArrayList` type. The equivalent is `list`, which behaves like an `ArrayList`.
  - You do not have to worry about double versus single quotation marks; you can use either as long as you are consistent.
  - A common way to build or walk through a `list` is a `for each loop`, which in Python is the default `for loop`.
- As the `neural_network.py` file shows, the `xor` problem, although simple to us, cannot be solved by the PLA, yet it is easy for the NN (after 1000 iterations of training).
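To make the ideas above concrete, here is a small sketch of a feed-forward pass that computes XOR with one hidden layer, using a function passed as a parameter for the activation. The weights are picked by hand rather than learned, and the names `apply_map`, `layer`, and this version of `feed_forward` are assumptions for illustration; the repository's `neural_network.py` trains its weights and uses its own matrix class.

```python
def step(x):
    # Threshold activation: 1 if the weighted sum is positive, else 0.
    return 1 if x > 0 else 0


def apply_map(values, func):
    # Functions are ordinary values in Python, so func can be passed in.
    return [func(value) for value in values]


def layer(weights, biases, inputs, activation):
    # One weighted sum per row of weights, then the activation on each sum.
    sums = [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]
    return apply_map(sums, activation)


# Hand-picked (not trained) parameters, chosen only for this example.
HIDDEN_WEIGHTS = [[1, 1],   # first hidden unit behaves like OR
                  [1, 1]]   # second hidden unit behaves like AND
HIDDEN_BIASES = [-0.5, -1.5]
OUTPUT_WEIGHTS = [[1, -1]]  # output fires for "OR but not AND"
OUTPUT_BIASES = [-0.5]


def feed_forward(inputs):
    hidden = layer(HIDDEN_WEIGHTS, HIDDEN_BIASES, inputs, step)
    return layer(OUTPUT_WEIGHTS, OUTPUT_BIASES, hidden, step)


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:  # the default for loop is a for-each
    print((a, b), feed_forward([a, b]))
```

The hidden layer is what makes the difference: each hidden unit draws one line, and the output unit combines the two, which a single perceptron cannot do.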
- Before using Tensorflow we should try to understand what a tensor is. We have a strong understanding of scalars and vectors, and matrices are intuitive, but tensors can be a bit tricky. A scalar is a single real number, while a vector has both magnitude and direction. Note also that scalars and vectors follow different rules for multiplication, addition, etc. A matrix is a collection of vectors, or simply a table of rows and columns. Tensors go a step further and give us a higher-order generalization.
- A tensor can be pictured as an array of matrices. For example, if we had two m by n matrices, we could store them in an array object and we would have a tensor. Operations on tensors are well defined in theory, but they are considerably harder and more expensive to carry out on a computer. Large matrix operations are already very expensive, and the cost grows further with each added dimension.
- The beauty of Tensorflow is that it is heavily optimized. It takes care of the memory management involved in tensor operations and lets tensors flow through a graph of operations, hence the "flow" in "Tensorflow".
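The progression from scalar to tensor can be sketched with plain nested lists. This is only an illustration of the idea, using hypothetical helper names `shape` and `add`; Tensorflow has its own tensor type and does not work this way internally.

```python
def shape(nested):
    # Walk down the first element of each level to read off the dimensions.
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def add(a, b):
    # Element-wise addition that recurses through any nesting depth.
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b


scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1, 2, 3], [4, 5, 6]]
tensor = [matrix, [[7, 8, 9], [10, 11, 12]]]  # two 2-by-3 matrices

print(shape(scalar), shape(vector), shape(matrix), shape(tensor))
print(add(tensor, tensor))
```

Each extra level of nesting multiplies the number of elements an operation has to visit, which is one reason tensor operations get expensive quickly.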
- In the directory `Tensorflow` we have the following files: `NN_tf.py`, `iris_tf.py`.
- The `NN_tf.py` file contains an implementation of a Deep Neural Network with Tensorflow.
  - A Deep Neural Network is essentially a Neural Network with many hidden layers.
  - `NN_tf.py` trains a model on the mnist dataset, with 784 inputs, 3 hidden layers, and 10 outputs. The mnist dataset contains handwritten digits. With our model we are trying to predict which digit an image shows, hence the 10 outputs; the 784 inputs are the pixels of each 28 by 28 image. The choice of 3 hidden layers is arbitrary and was arrived at through experimentation.
- The `iris_tf.py` file is another application of the Tensorflow library. The dataset used here is another famous dataset, the iris dataset. It has 4 inputs and 3 outputs. The 4 inputs are the sepal width, sepal length, petal length, and petal width. The model tries to predict the type of iris flower (setosa, virginica, or versicolor) based on the inputs.
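The input and output sizes mentioned for both scripts can be sketched without Tensorflow. The helper names, the blank image, the species order, and the example network outputs below are all assumptions for illustration and are not taken from `NN_tf.py` or `iris_tf.py`.

```python
IRIS_SPECIES = ["setosa", "versicolor", "virginica"]  # order is an assumption


def flatten(image):
    # Turn a 2D grid of pixels into one flat list of inputs.
    return [pixel for row in image for pixel in row]


def one_hot(index, n_classes):
    # Target vector with a 1 at the correct class and 0 elsewhere.
    return [1 if i == index else 0 for i in range(n_classes)]


def argmax(outputs):
    # Index of the largest output, i.e. the predicted class.
    return max(range(len(outputs)), key=lambda i: outputs[i])


# mnist: a 28 by 28 image becomes 784 inputs, and a digit becomes 10 outputs.
blank_digit = [[0] * 28 for _ in range(28)]
mnist_inputs = flatten(blank_digit)
mnist_target = one_hot(7, 10)
print(len(mnist_inputs), mnist_target)

# iris: 4 measurements in, 3 scores out (the scores here are made up).
hypothetical_outputs = [0.1, 0.2, 0.7]
print(IRIS_SPECIES[argmax(hypothetical_outputs)])
```

In both cases the network's prediction is read off as the position of its largest output.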
- to be implemented