In this post, I share my observations and best practices for publishing a Python package. The four simple steps below will help you follow a systematic approach and make publishing easier.

Yes, this is also a step, and the first one to be done. Here you need to check plenty of points, including some post-development points such as:
Adding test scripts makes the package more robust and helps you find cases that still need to be covered. The Python package pytest offers a test framework for checking your package against those cases. Whenever you add a feature or make a change, try to add a corresponding test case.
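For example, a minimal pytest-style test module might look like the sketch below (frequency_to_wavelength is a hypothetical helper used purely for illustration, not an actual function of the package):

```python
# test_wavelength.py -- run with: pytest test_wavelength.py

def frequency_to_wavelength(freq_hz):
    """Return the free-space wavelength in metres for a frequency in Hz."""
    c = 299792458.0  # speed of light in m/s
    return c / freq_hz

def test_frequency_to_wavelength():
    # 300 MHz corresponds to roughly a 1 m wavelength.
    assert abs(frequency_to_wavelength(3e8) - 1.0) < 0.01
```

pytest collects every function whose name starts with `test_`, so adding a new case is just adding another such function.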
PyTest
I suggest reading the referenced post How to Publish an Open-Source Python Package to PyPI. However, the summarized steps are:
python setup.py sdist bdist_wheel
twine check dist/*
twine upload dist/*
This step explains how GitHub and Travis CI are used in publishing Python packages.
I have added both of these facilities to my GitHub repo patch_antenna; let us take this repo as the example for the discussion below.
Travis CI is integrated with a GitHub repo by adding a .travis.yml file to the repo. The steps needed to test my package are listed in that YAML file, as shown below.
.travis.yml
sudo: false
language: python
python:
  - "3.7"
before_install:
  - pip3 install scipy
  - python3 setup.py install
script: pytest
notifications:
  email: false
Once this is added to your GitHub repo, you need to enable Travis CI under GitHub's authorized applications and configure which repos Travis CI should use at https://travis-ci.com/.
GitHub offers a workflows facility for various requirements. I use it to publish my Python package to PyPI on each release, as we did with Travis CI.
To use the facility, a YAML file .github/workflows/python-publish.yml is created in the same repo.
python-publish.yml
name: Upload Python Package
on:
  release:
    types: [created]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install setuptools wheel twine
    - name: Build and publish
      env:
        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
      run: |
        python setup.py sdist bdist_wheel
        twine upload dist/*
The username and password that twine asks for can be added as secrets under the repository's Settings > Secrets, as shown in the image below.
Screenshot

Once you have done all of the above, follow these steps as best practices to publish your package.
The goal of this post is to share my experience with One-Shot Learning, a technique normally used when we have a small training data set, for example for face recognition. After testing various codes shared on GitHub and the posts listed in the references, I wrote this post to present my observations and collected results.
One shot learning :
It is a classification / categorization / similarity-identification technique for cases with a small training data set, used in computer vision tasks such as object detection, face recognition and handwriting recognition. Computer vision models are normally very large deep neural networks that are hard to train and require significant resources and training data, but real-world problems often do not have that much data. One-shot learning is the recommended solution for these kinds of problems.

Sample structure - One Shot Learning - source : reference_1
Here two pre-trained convolutional networks are followed by a custom final layer with sigmoid activation, used to learn the similarity between the two image inputs. The workflow is:
The pre-trained Keras-VGG-Face ResNet-50 model is retrained to learn the faces in our custom data. The reason for choosing ResNet-50 was discussed in the Face Authentication evaluation. A custom final layer followed by a sigmoid activation function was implemented on the tensor layers, working with the Euclidean distance between the two embeddings.
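To make the scoring idea concrete, here is a minimal pure-Python sketch of the distance-plus-sigmoid step. The weight and threshold stand in for parameters the real network learns; all names and values here are illustrative, not the actual model:

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b, weight=4.0, threshold=0.5):
    """Map the distance through a sigmoid: small distances give scores
    near 1.0, large distances give scores near 0.0. In the real network
    weight and threshold are learned; here they are fixed toy values."""
    d = euclidean_distance(a, b)
    return 1.0 / (1.0 + math.exp(-weight * (threshold - d)))

same = similarity([0.1, 0.9, 0.3], [0.1, 0.9, 0.3])  # identical embeddings -> high score
diff = similarity([0.1, 0.9, 0.3], [0.9, 0.1, 0.8])  # distant embeddings -> low score
```

A threshold on this score then decides "same person" versus "different person".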

The Siamese network's test accuracy scores and real-time scores did not live up to the expectations set in the theory. I observed this across various datasets, varying both the size of the data (though always low per class) and the number of training epochs. Without increasing the training data per class, I could not find any improvement in the test-set or real-time accuracy.
The results are shown below,
After increasing the number of epochs, the model looks fine on cross-validation test data. But when this Siamese model is trained, saved and loaded for a real-time test, it may even score 0 % accuracy.
$ Loaded model accuracy : 0 %
The point is that a Siamese network for face authentication with the one-shot learning technique discussed here is not reliable in my observations, or perhaps my implementation is wrong (if so, please correct me). Contrary to what the theory claims, the Siamese network with a transfer-learned deep neural network could not learn from very little data (4-5 images per class; some even claim 1 image per class, which I cannot explain), even with a high-performing transfer-learned model loaded.
One-shot learning with a Siamese network may work well with simple convolutional neural networks having only a few layers. That kind of architecture fits similarity-detection tasks such as handwriting recognition and shape-similarity scoring.
If we increase the size of the convolutional network, the learning phase requires more system resources and takes much longer, so continuous / online learning becomes difficult in these situations. Please correct me if anything here is wrong.
Thanks to the sources, - One shot learning - siamese - Keras VGG-Face - One shot learning - Machine learning mastery - One shot learning - wiki
Getting an HttpsURLConnection working with SSL certificates is the ultimate aim. After a long struggle I came up with this post, which should be useful for anyone who has all, or some, of the files listed below.
These certificate file formats often confuse the novice. This post explains a basic, easy way to connect to an HTTPS URL with a small workaround.
Step 1 : Create a .p12 / .pfx from the .key and .crt files using the openssl command. The created file SomeThing.pfx normally contains all three files inside it in binary form.
Step 2 : Create a KeyStore instance of type JKS for the TrustManager and load the certificate SomeThing-CA.crt into the trust store.
Step 3 : Create a KeyStore instance of type PKCS12 for the KeyManager and load the certificate SomeThing.pfx into the key store.
Step 4 : Load these two managers (TrustManager, KeyManager) into the SSLContext and get the SocketFactory for the HttpsURLConnection.
Step 5 : Now we can use the HttpsURLConnection and set the SSLSocketFactory generated from the certificates.
These steps are implemented below in plain Java / Groovy code to get the SSLSocketFactory.
Step 1 : openssl command to export as pfx file
openssl pkcs12 -export -in SomeThing.crt -inkey SomeThing.key -out SomeThing.pfx -certfile SomeThing-CA.crt
Step 2 : Getting Trust Manager
String caCertPath = "/path/to/SomeThing-CA.crt";
String certPath = "/path/to/SomeThing.crt";
String passWord = "changeit";
private Certificate getCertificate (String path) throws Exception
{
CertificateFactory cf = CertificateFactory.getInstance("X.509");
InputStream caInput = new FileInputStream(new File(path));
Certificate c = cf.generateCertificate(caInput);
caInput.close();
return c;
}
private TrustManager [] getTrustManagers() throws Exception
{
KeyStore tKeyStore = KeyStore.getInstance("JKS")
tKeyStore.load (null, null);
tKeyStore.setCertificateEntry ("CA-Cert", getCertificate (caCertPath));
TrustManagerFactory tmf = TrustManagerFactory.getInstance (TrustManagerFactory.getDefaultAlgorithm());
tmf.init (tKeyStore);
return tmf.getTrustManagers();
}
Step 3 : Getting Key Manager
private KeyManager [] getKeyManagers() throws Exception
{
KeyStore keyStore = KeyStore.getInstance("PKCS12");
keyStore.load (new FileInputStream(new File(certPath)), passWord.toCharArray());
KeyManagerFactory kmf = KeyManagerFactory.getInstance (KeyManagerFactory.getDefaultAlgorithm());
kmf.init (keyStore, passWord.toCharArray());
return kmf.getKeyManagers();
}
Step 4 : Getting SocketFactory
SSLSocketFactory getSocketFactory(){
SSLContext sslContext = SSLContext.getInstance("TLS");
sslContext.init(getKeyManagers(), getTrustManagers(), new SecureRandom());
return sslContext.getSocketFactory();
}
Step 5 : Creating HttpsURLConnection
HttpsURLConnection connection = (HttpsURLConnection) new URL(u).openConnection()
connection.setSSLSocketFactory(getSocketFactory())
The common certificate file formats are shown below:
.crt .cer .der .p12 .jks .pfx .pem .key
Of these, .crt and .key are the primary formats from which all the others can be generated using the openssl command.
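As a side note, the same trust-store / key-store wiring can be sketched with Python's standard ssl module, in case you need the equivalent from Python. The paths are placeholders, and this is an analogy to the Java steps rather than code from this post:

```python
import ssl

def make_ssl_context(ca_path, cert_path, key_path):
    """Build an SSLContext that trusts our CA (the trust-store role)
    and presents our client certificate and key (the key-store role)."""
    ctx = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH)
    ctx.load_verify_locations(cafile=ca_path)                  # TrustManager role
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)  # KeyManager role
    return ctx

# Usage with placeholder paths:
# ctx = make_ssl_context("SomeThing-CA.crt", "SomeThing.crt", "SomeThing.key")
# urllib.request.urlopen("https://url/path", context=ctx)
```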

The types of managers used are,
For the files added to the KeyStore, the related CA certificate needs to be added to the trust store. Generally $JAVA_HOME/jre/lib/security/cacerts is the JRE's default trust store. You can add a certificate from the command line as shown below:
keytool -importcert -file SomeThing.crt -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit
If you skip this, or did not add the CA certificate to the trust store, you will get an exception like:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.
Alternatively, setting the values as system properties can also solve the issue, but this is not recommended.
The related system properties to assign are:
System.setProperty("javax.net.debug", "ssl")
System.setProperty("javax.net.ssl.trustStore", "$JAVA_HOME/jre/lib/security/cacerts")
System.setProperty("javax.net.ssl.trustStorePassword", "changeit") // commonly used password
System.setProperty("javax.net.ssl.trustStoreType", "JKS")
System.setProperty("javax.net.ssl.keyStoreType", "PKCS12")
System.setProperty("javax.net.ssl.keyStore", "/path/to/SomeThing.pfx")
System.setProperty("javax.net.ssl.keyStorePassword", "password_created_for_pfx")
This initialization needs to be done before calling openConnection in the Java code.
The same connection can be made with a normal curl command, as shown below:
curl -k https://url/path --cacert /path/to/SomeThing-CA.crt --cert /path/to/SomeThing.crt --key /path/to/SomeThing.key
or even without the trust-store certificate, we can connect with only the .crt and .key files in curl (-k skips server certificate verification):
curl -k https://url/path --cert /path/to/SomeThing.crt --key /path/to/SomeThing.key
Thanks to the references… - Certificate Authority - java-client-certificates-over-https-ssl - SSL Converter
This post explains the steps involved in containerizing an application using Docker. It also tries to give an overview of related areas.
A container is a running instance of a Docker image. Containers are an abstraction at the application layer that packages code and dependencies together and ships the application with a runtime environment.
Containers run on the Docker engine almost like VMs, but not exactly. They have many merits in production, and the technology is booming in the IT industry; containerizing an application with Docker or Kubernetes provides enormous facilities.
Simply put, instead of releasing our application without its environment and dependencies, we release it as an image which runs as a bounded, isolated OS. This greatly reduces the environment and dependency problems we often face.
Overall, we are just going to follow these three steps:
| Step | Artifact | Details |
|---|---|---|
| Build | Dockerfile | Packaging the application with required dependencies and custom files |
| Ship | Docker image | Releasing it as an image file globally using docker registry |
| Run | Container | Run containers using the image which will act as your application along with an isolated VM like environment |
At a very basic level, to understand containers, I am just going to containerize the simple file writer shown below (test.py):
filename = "log.txt"
myfile = open(filename, 'w')
myfile.write( "Hi Here we are ..!" + '\n')
myfile.close()
This code just creates a file named log.txt and saves the content into it. Let's containerize it and see what happens!
A Dockerfile is somewhat like a Makefile; it is used to build the Docker image. The file normally contains three types of instructions:
For our scenario, the Dockerfile looks like this:
FROM python:2
COPY test.py /usr/
WORKDIR /usr/
RUN python test.py
CMD bash
Description :
Go to the build directory and make sure the Dockerfile is in your current directory. To build the Docker image, use the command:
docker build -t testimage .
Sending build context to Docker daemon  3.072kB
Step 1/5 : FROM python:2
 ---> 92c086fc9702
Step 2/5 : COPY test.py /usr/
 ---> Using cache
 ---> 1b9aa6ce04cd
Step 3/5 : WORKDIR /usr/
 ---> Using cache
 ---> 6dcef3ef8785
Step 4/5 : RUN python test.py
 ---> Using cache
 ---> 6edf4708cf6c
Step 5/5 : CMD bash
 ---> Using cache
 ---> 21303e2891d6
Successfully built 21303e2891d6
Successfully tagged testimage:latest
This command creates an image with the name and tag you provided.
You can check the created image by listing available images using the command,
docker image ls
or you can also specify the command
docker image ls testimage
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
testimage           latest              21303e2891d6        3 days ago          914MB
Method 1 :
We can create and start a container from this image in two steps:
docker container create --name testcontainer testimage
docker container start testcontainer
Method 2 :
Or it can be done in a single step with:
docker run -itd --name testcontainer testimage
The flags -itd mean interactive, tty and detached; see the help for more details. We can check the container status using the command:
docker container ls
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
c706bd3fe268        testimage           "/bin/sh -c bash"      5 seconds ago       Up 3 seconds                            testcontainer
We can get a shell inside the container using the exec facility with the bash command, as shown below:
docker exec -it testcontainer bash
Inside the container environment,
root@c706bd3fe268:/usr# ls
bin  games  include  lib  local  log.txt  sbin  share  src  test.py
Here we can see the file log.txt, which was generated when the image was built (the RUN python test.py step).
I always suggest checking the official documentation for commands, but here I have shared some often-used commands that are useful for a quick view :).
Some Basic commands,
For Images,
For containers,
Some important facilities:
Docker has many more awesome facilities, such as:
Docker compose
Composing and managing multiple containers with a single configuration file, docker-compose.yaml.
Docker swarm
Swarm mode is for managing the docker clusters using manager and worker scenario.
Docker Service
It is used in swarm mode for deploying the application as a service, with facilities such as rollback, scaling and updates.
Docker Stack
Docker stack, also in swarm mode, manages a collection of multiple services.
These facilities are used for high availability, distributed processing, scaling and (almost) zero-downtime deployment.
I have also created a single-page guide to help you get going with Docker quickly. It consists of a simple four-step guide to dockerization.
Here I have shared the Docker overview slides, which explain the concepts behind this technology. Have a look to understand it quickly and move on :)
Update 1 : 11/01/2024 Posts section updated.
This post tries to cover rarely shared but highly demanded methods in containerization with Docker and Kubernetes. You can enhance this post by contributing to the Docker-tutorial GitHub repo.
I have added these kinds of notes under ## Quick Details in the readme.md of every chapter.
Fundamental Instructions
Configuration Instructions
chown must be done manually regardless of your USER / WORKDIR setting, the same as with ADD.
Execution Instructions
docker stop containerName
docker commit containerName tempImageName
docker run -p 8080:8080 --name tempContainerName -td tempImageName
Reference : -An answer in Stackoverflow by Fujimoto Youichi
Follow these simple steps to connect and disconnect multiple containers with a bridge network.
docker network create --driver bridge testbridge
docker network connect testbridge containerOne
docker network connect testbridge containerTwo
docker network inspect testbridge
"Containers": {
    "2475796b7bb161fafd661eb9e1f23233104ca57915dd88a3fc33aa6dc9d73700": {
        "Name": "containerOne",
        "EndpointID": "45319bd6ce083bf7e7d3015750e35f7644d4a2d3e5db8c27153c613958ab43d2",
        "MacAddress": "02:42:ac:1e:00:03",
        "IPv4Address": "172.30.0.3/16",
        "IPv6Address": ""
    },
    "ae7ca4ac1e4aaa2bab4d53e24f76afa1f83de620d1ce7d244e03cb8707a8448b": {
        "Name": "containerTwo",
        "EndpointID": "166b2c5eea217c9baeeb906c5d83b04d1c1bab93e46ab01bbf6e94fc21c47c81",
        "MacAddress": "02:42:ac:1e:00:02",
        "IPv4Address": "172.30.0.2/16",
        "IPv6Address": ""
    }
},
....
You can verify the network by checking the containers in the generated JSON output.
Follow these two reliable steps to share an image as a .tar archive:
docker save --output imageName.tar imageName
docker load --input imageName.tar
Dangling images are created when you build a new version of an image without renaming it or updating the tag; the old image then becomes a dangling image, as shown below.
$ docker images -f dangling=true
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 3f4ae2ddf543 4 days ago 1.37GB
<none> <none> b4c8cecab3bc 8 days ago 655MB
$ docker rmi $(docker images -f dangling=true -q)
Reference : - What is Dangling images by SO
It can also be done by configuring a registry, but I found this approach handy. First go to the build directory and make sure the Dockerfile is there.
-currentdirectory
|--- Dockerfile
|--- Other-Project-Files
minikube start
eval $(minikube docker-env)
docker build -t imageName:version .
kubectl run hello-foo --image=foo:0.0.1 --image-pull-policy=Never

Or it can be set inside the YAML config file like shown below:

- image: imageName:latest
  name: imageName
  imagePullPolicy: Never
References -Kubernetes official doc -How to use local docker images with Minikube? - stackoverflow

The need for a ReplicaSet:
Running a Pod alone is dangerous because:
A Deployment has these merits:
This is a very simple tutorial post on doing machine learning in Groovy. It covers clustering algorithms such as:
These algorithms differ in their motivation and methodology. A simple and excellent summary of them is given in the Commons Math3 Java documentation.
View the Document here.
The source code is also shared on GitHub; you can find it at the link above. Here I am sharing it again with some explanation.
class ClusterWork
{
List<DoublePoint> points = new ArrayList<DoublePoint>()
Map<DoublePoint, List<String>> pointMap = [:]
ClusterWork(Map table)
{
table.each{ k,v ->
DoublePoint dArr = new DoublePoint(v)
points.add(dArr)
if (!(dArr in pointMap.keySet()))
pointMap[dArr] = []
pointMap[dArr].add(k)
}
}
List<ClusterDetail> dbscan (double d, int i)
{
DBSCANClusterer DBScan = new DBSCANClusterer(d, i)
collectDetails DBScan.cluster(this.points)
}
List<ClusterDetail> fuzzykmean (int k, double fuzziness)
{
FuzzyKMeansClusterer fKMean = new FuzzyKMeansClusterer(k, fuzziness)
collectDetails fKMean.cluster(this.points)
}
List<ClusterDetail> multiplekmean (int k, int trials)
{
MultiKMeansPlusPlusClusterer mkppc = new MultiKMeansPlusPlusClusterer(new KMeansPlusPlusClusterer(k), trials)
collectDetails mkppc.cluster(this.points)
}
List<ClusterDetail> kmean (int k)
{
KMeansPlusPlusClusterer kMean = new KMeansPlusPlusClusterer(k)
collectDetails kMean.cluster(this.points)
}
private List<ClusterDetail> collectDetails(def clusters)
{
List<ClusterDetail> ret = []
clusters.eachWithIndex{ c, ci ->
c.getPoints().each { pnt ->
DoublePoint pt = pnt as DoublePoint
ret.add new ClusterDetail (ci + 1 as Integer, pt, this.pointMap[pt])
}
}
ret
}
}
class ClusterDetail
{
int cluster
DoublePoint point
List<String> labels
ClusterDetail(int no, DoublePoint pt, List<String> labs)
{
cluster = no; point= pt; labels = labs
}
}
Running the algorithms involves multiple steps, as shown below.
Step 1 :
@Grab('com.xlson.groovycsv:groovycsv:1.1')
@Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
import org.apache.commons.math3.ml.clustering.DBSCANClusterer
import org.apache.commons.math3.ml.clustering.DoublePoint
import org.apache.commons.math3.ml.clustering.FuzzyKMeansClusterer
import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer
import org.apache.commons.math3.ml.clustering.MultiKMeansPlusPlusClusterer
import static com.xlson.groovycsv.CsvParser.parseCsv
// All imported
Step 2 :
//Read the csv input data
df = new FileReader('data.csv')
Step 3 :
Map<String, double[]> dfMap = [:]
for (line in parseCsv (df))
{
double [] point= [line.temp.toDouble(), line.humidity.toDouble()]
dfMap[line.city] = point
}
// Map dfMap formed.
Step 4 :
// Construct the cluster work using our Map
ClusterWork clusterWork = new ClusterWork (dfMap)
// Simple print closure implementation
def showClosure = {detail ->
println "Cluster : " + detail.cluster + " Point : " + detail.point + " Label : "+ detail.labels
}
Step 5 :
// Running All algorithms accordingly
println 'DBSCAN'
clusterWork.dbscan(6, 0).each(showClosure)
println '-----------'
println 'Kmean'
clusterWork.kmean( 5).each(showClosure)
println '-----------'
println 'FuzzyKMean'
clusterWork.fuzzykmean(5, 300).each(showClosure)
println '-----------'
println 'MultipleKMean'
clusterWork.multiplekmean(5, 5).each(showClosure)
println '-----------'
Here I have attached the sample output for DBSCAN algorithm.
DBSCAN
Cluster : 1 Point : [284.624954535, 76.0] Label : [Vancouver]
Cluster : 1 Point : [282.100480976, 80.0] Label : [Portland]
Cluster : 1 Point : [281.78244858, 80.0] Label : [Seattle]
Cluster : 1 Point : [286.213142193, 71.0] Label : [Saint Louis]
Cluster : 1 Point : [283.994444444, 76.0] Label : [Indianapolis]
Cluster : 1 Point : [284.278140131, 75.0] Label : [Detroit]
Cluster : 1 Point : [286.276495879, 81.0] Label : [Toronto]
Cluster : 1 Point : [290.07866575, 70.0] Label : [Kansas City]
Cluster : 1 Point : [287.009165955, 66.0] Label : [Minneapolis]
Cluster : 1 Point : [284.300133393, 70.0] Label : [Chicago]
Cluster : 1 Point : [285.85044048, 70.0] Label : [Philadelphia]
Cluster : 1 Point : [287.277251086, 68.0] Label : [Boston]
Cluster : 1 Point : [291.553209206, 81.0] Label : [San Diego]
Cluster : 1 Point : [284.59253007, 62.0] Label : [Denver]
Cluster : 1 Point : [289.89855969, 86.0] Label : [Dallas]
Cluster : 1 Point : [289.446243412, 87.0] Label : [San Francisco]
Cluster : 1 Point : [291.857503395, 88.0] Label : [Los Angeles]
Cluster : 1 Point : [288.650991196, 87.0] Label : [Charlotte]
Cluster : 1 Point : [289.373344722, 92.0] Label : [San Antonio]
Cluster : 1 Point : [288.371111111, 92.0] Label : [Houston]
Cluster : 1 Point : [285.860929124, 91.0] Label : [Montreal]
Cluster : 1 Point : [294.064062959, 94.0] Label : [Atlanta]
Cluster : 1 Point : [281.151870096, 93.0] Label : [Pittsburgh]
Cluster : 2 Point : [293.381212832, 21.0] Label : [Las Vegas]
Cluster : 2 Point : [296.654466164, 23.0] Label : [Phoenix]
Cluster : 3 Point : [285.313345004, 49.0] Label : [Albuquerque]
Cluster : 4 Point : [287.48791359, 99.0] Label : [Nashville]
Cluster : 5 Point : [298.393960613, 87.0] Label : [Jacksonville]
Cluster : 5 Point : [299.800641223, 82.0] Label : [Miami]
Cluster : 6 Point : [288.406203155, 57.0] Label : [New York]
Cluster : 7 Point : [307.145199718, 51.0] Label : [Beersheba]
Cluster : 7 Point : [304.4, 51.0] Label : [Haifa, Nahariyya]
Cluster : 7 Point : [303.5, 50.0] Label : [Jerusalem]
Cluster : 8 Point : [304.238014609, 62.0] Label : [Tel Aviv District]
Cluster : 9 Point : [310.327307692, 22.0] Label : [Eilat]
JFreeChart visualizations of all the algorithms, with cluster details, are attached accordingly. JFreeChart requires additional dependencies such as:
A scatter plot is used to visualize the clusters. The resulting plots are:
DBSCAN
KMean
Fuzzy KMean
Multiple KMean
Thanks to the sources, 1. Machine Learning with Math3 2. JFree Chart Doc 3. Tutorials Point - JFree chart
It may seem a funny question to ask in this technological era; anyway, let's start with it. Machine learning means computers learn by themselves, without being explicitly programmed. We need to align the process flow to achieve this, for example:
Why? Normal statistical methods such as correlation could do the work, but the key point is that we let the machine learning model perform the various statistical analyses and return summarized, model-based outputs.
It can cover the topics like,
In these techniques, the actual data is converted into components which carry the characteristics of the data. It is like extracting the most important information, what the data is actually trying to convey to you. Some of the methods are:
Reference : - The Ultimate Guide to 12 Dimensionality Reduction Techniques (with Python codes) by Analytics Vidhya
Accuracy is the best-known metric, but it is not the only important one. Others include precision, recall, F1 score and ROC.
Precision and recall are derived from the confusion-matrix values TP, TN, FP and FN. The confusion matrix looks like:
|  | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
So there may be some confusion between precision and recall :P. To grab the difference quickly: precision suffers when non-true observations are predicted as true (false positives). For example,
It is like telling a non-infected patient that he (or) she is infected.
Recall, in turn, suffers from false negatives:
It is like telling an infected patient that he (or) she is not infected.
From the above two examples you can see that both are important, and we need a good balance between these metrics. That is what the F1 score provides: the harmonic mean of precision and recall.
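As a quick sketch, these metrics can be computed directly from the confusion-matrix counts (the numbers below are toy values, purely illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Toy counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(8, 2, 4)
```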
ROC Curve (Receiver Operating Characteristics) and AUC (Area Under ROC Curve)
This curve is constructed from the parameters,
Image source in Ref
Reference : - Machine Learning - Towards data science - ROC and AUC by Google Crash course
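Each point on the ROC curve is just a (FPR, TPR) pair computed at one decision threshold; sweeping the threshold traces the curve, and the area under it is the AUC. A minimal sketch with toy counts:

```python
def roc_point(tp, fn, fp, tn):
    """One (FPR, TPR) point on the ROC curve for a single threshold."""
    tpr = tp / (tp + fn)   # true-positive rate (= recall / sensitivity)
    fpr = fp / (fp + tn)   # false-positive rate (= 1 - specificity)
    return fpr, tpr

# Toy counts at one decision threshold.
fpr, tpr = roc_point(8, 4, 2, 6)
```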
Regularization is an important technique, as it determines how learning should proceed. The word itself suggests making the training general, not letting it converge tightly on the current training data; it is a technique to protect the model from over-fitting. It can be simply explained by the lines below.
Well-known-techniques :
Reference : - Regularization post from towards-data-science - Fundamentals of Regularization Techniques by analyticsvidhya - L1 and L2 Regularization - Keras Regularizers - Regularization in Machine Learning - Regularization for simplicity by Google
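As a small illustration of the L2 (weight decay) idea from above: the penalty adds lam * w to each gradient, so large weights shrink at every update. All names and values here are illustrative, not from any particular library:

```python
def l2_update(weights, grads, lr=0.1, lam=0.5):
    """One gradient-descent step with an L2 penalty added to each gradient."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]

w = [4.0, -2.0]
for _ in range(50):
    # With a zero data gradient, the penalty alone decays the weights
    # toward 0 by a factor of (1 - lr * lam) per step.
    w = l2_update(w, [0.0, 0.0])
```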
K-fold is a technique for evaluating a model, like a test-train split. It splits the training data into k groups (10 by default); each group serves as the test data in turn, while all the other groups are used for training. This technique is commonly known as cross-validation. It logs a result for each group, so you can see how your machine learning model behaves on the various groups. If your model produces widely varying accuracies, it has over-fitted to some of the classes, which is why it cannot detect the poorly recognized ones. The overall objective is to find out which model works well for all the classes in the input data.
Reference : - KFold Doc by scikit - Cross Validation - K-Fold Cross validation
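The splitting described above can be sketched in plain Python at the index level, with no libraries (a simplified version of what scikit-learn's KFold does):

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; each fold is the test
    set once, with the remaining indices used for training."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

# 10 samples, 3 folds -> test sets of sizes 4, 3, 3.
for train, test in kfold_indices(10, 3):
    print(len(train), len(test))
```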
The goal of using epochs is to reduce the error rate of the model. In machine learning, it really works to improve model accuracy by trying to fit the model well. This can be observed in a history plot. We use epochs until the metric saturates at some point, to avoid over-fitting and remove unnecessary runs. In practice we cannot be sure in advance how many epochs are needed for good results; manual observation is required to find the right value. In deep learning, the sample code looks like:
history = model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, verbose=0)
Reference : - Display Deep Learning Model Training History in Keras
This is because of the stochastic nature of the model: every time you run it, the model starts from randomly initialized values. To get the same results from a machine learning model each time, we can make the random values reproducible. This is the syntax for setting the random seed with the numpy package:
from numpy.random import seed
seed(7)
So you need to be aware of which random package your model uses.
Reference : - Reproducible Results by machine learning mastery
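The same reproducibility idea, shown with the standard library's random module (the numpy seed shown above works analogously):

```python
import random

def sample_weights(seed, n=5):
    """Draw n 'initial weights' from a generator seeded deterministically."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Two runs with the same seed give identical values; a different seed does not.
a = sample_weights(7)
b = sample_weights(7)
c = sample_weights(8)
```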
This question arises often and is rather complicated. I hope the notes and references below help you understand why I say that.
Reference : - Stack overflow discussion - 5.2 Capacity Over-fitting and Under-fitting - Stack Exchange Question and Answer 1 - Stack Exchange Question and Answer 2
Due to the nature of this post, you can always expect updates here.
Note : Use the browser console logs to see outputs where the script requires it.
<head>
<script>
alert("Hello world !");
</script>
</head>
<head>
<script>
var info = "Hi by variable";
info += "\nSo Hello world"
alert(info);
</script>
</head>
Note : Type Conversion
REPL - Read Eval Print Loop
<head>
<script>
var info = "Hi by variable";
info += "\nSo Hello world"
alert(info);
console.log(info);
</script>
</head>
Types :
<head>
<script>
var x;
var p = "Now x is : "
console.log(p + typeof x);
// datatype : number
x = 5;
console.log(p + typeof x);
// null is a special case: typeof null returns "object"
x = null;
console.log(p + typeof x);
// datatype : string
x = "Hi by variable";
console.log(p + typeof x);
// Using comments like this line
// checking for Booleans data type
x = false;
var y;
function checkBool(z){
if(z){
console.log(z + ' is True Type.')
}else{
console.log(z + ' is False Type.')
}
}
checkBool(x);
checkBool(y);
y = 'String';
checkBool(y);
</script>
</head>
Note :
| Operators | |||||
|---|---|---|---|---|---|
| Arithmetic | + | - | * | / | % |
| Conditional | < | <= | > | >= | |
| Incre / Decre | ++ | -- | | | |
<head>
<script>
function isEven(z){
if(z%2 == 0){
alert(z + ' is Even.');
}else{
alert(z + ' is Odd.');
}
}
var x = "0";
isEven(x);
</script>
</head>
<head>
<script>
function isEven(z){
switch(z % 2){
case 0: return true;
default: return false;
}
}
var x = "0";
isEven(x);
</script>
</head>
Using break and continue is also explained.
<head>
<script>
var i=0;
while (i <= 10)
{
i++;
if (i == 5){break;}
if (i == 3){continue;}
console.log('While Loop Iteration : '+i)
}
for (var j=1; j<=5 ;j++){
console.log('For Loop Iteration : '+j);
}
</script>
</head>
We can use an external script file inside the HTML to organize the code and keep the build healthier.
<!-- script used for the IE9 browser -->
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<!-- local js -->
<script src="path/to/your/all.js"></script>
// In the browser console:
var x = new Object();
x.name = "chitti"
x.version = "2.0"
console.log(x);
Output : {name: "chitti", version: "2.0"}
Normally, a JSON object looks like:
{
"root": {
"binaries": {
"0": {
"val": "0"
},
"1": {
"val": 1
}
},
"ternaries": [0, 1, 2]
}
}
There are some rules, such as…
At first it may look like a complex structure; play around with JSON to get familiar with it.
Types :
var x = [1,2];
x.unshift(0)
// x = [0,1,2]
x.shift()
// x = [1,2]
x.push(3)
// x = [1,2,3]
x.pop()
// x = [1,2]
x = [1,2,3,4,5,6]
var s = x.slice(3,7)
//s = [4, 5, 6]
x.splice(3)
// x is now [1, 2, 3]; splice returned [4, 5, 6]
var x = [1, 2, 5, 6, 10, 2, 7];
x.sort()
// Output : [1, 10, 2, 2, 5, 6, 7]  (the default sort compares values as strings)
function sort_asc(a,b){ return (a-b)}
function sort_desc(a,b){ return (b-a)}
x.sort(sort_asc);
// Output : [1, 2, 2, 5, 6, 7, 10]
x.sort(sort_desc);
// Output : [10, 7, 6, 5, 2, 2, 1]
var x = [1, 2, 5, 6, 10, 2, 7];
x.filter(function(v){ return v % 2 == 0 })
// Output : [2, 6, 10, 2]
// The callback also receives the index and the whole array:
[1, 2, 3, 4].filter(function (val, id, li){ console.log(val, id, li) })
// 1 0 [1, 2, 3, 4]
// 2 1 [1, 2, 3, 4]
// 3 2 [1, 2, 3, 4]
// 4 3 [1, 2, 3, 4]
var x = [1, 2, 5, 6, 10, 2, 7];
x.map(function(x){ return x / 2 })
// Output : [0.5, 1, 2.5, 3, 5, 1, 3.5]
var x = [1, 2, 3, 4];
x.reduce(function(prev, curr){ console.log(prev, curr); return curr; }, 0)
// 0 1
// 1 2
// 2 3
// 3 4
// final value : 4
function is_even(x){
    return x % 2 == 0
}
function a(filter){
    var li = [1, 2, 3, 4];
    for (var x of li){
        if (filter(x)){
            console.log(x)
        }
    }
}
a(is_even)
// Output : 2 4
var li = [1,2,3,4];
li.forEach(
    function (i){ console.log(i) }
)
// Output : 1 2 3 4
function hypotenuse(a, b){
    function sq(x){ return x * x }
    return Math.sqrt(sq(a) + sq(b))
}
function stepiter(start, step){
    return function (){
        var x = start;
        start += step;
        return x;
    }
}
var iter = stepiter(3, 6);
iter()   // 3
iter()   // 9
iter()   // 15
function mul_to_arr(){
    return Array.prototype.slice.call(arguments)
}
mul_to_arr(1, 2, 3, 4)
// Output : [1, 2, 3, 4]
var square = {
    side : 4,
    area : function (){ return this.side * this.side },
    perimeter : function (){ return 4 * this.side }
};
square.area()        // 16
square.perimeter()   // 16
// Without functions, `this` is evaluated once at creation time and refers
// to the enclosing scope, where `side` is undefined:
var square = { side : 4, area : (this.side * this.side), perimeter : (4 * this.side) };
square.area        // NaN
square.perimeter   // NaN
function Square(x){
    this.side = x;
    this.area = function (){ return this.side * this.side };
}
var x = new Square(3);
console.log(x.area());        // 9
Square.prototype.perimeter = function (){ return 4 * this.side };
console.log(x.perimeter());   // 12
setTimeout(function() {console.log("Delay time 5s over. The line printed.")},5000)
Delay time 5s over. The line printed.
The DOM model represents a document with a logical tree like XML and HTML.
document
 |__ html
      |__ head
      |     |__ . . .
      |__ body
            |__ . . .
Some methods are getElementById, getElementsByTagName, getElementsByClassName and querySelector.
For example, we can change the document with JavaScript as shown below,
document.getElementById("testElem").style.display = "block";
AJAX - Asynchronous JavaScript and XML
With AJAX we can communicate with the server and update the page without reloading it. For example, clicking a button can call a function() that does the rest.
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
        console.log(this.responseText);
    }
};
xhttp.open("GET", "additional.html", true);
xhttp.send();
In the real world, we deal with various types of data, for example dates, currency, stock rates, categories and ranks. These are not all the same data type, and it is not easy to combine them into a single piece of information. Data Mining offers many methods to extract associations or information from such complex data.
In this post, I try to explain the data mining process on a nominal data set.
Association Rule Mining is the technique for extracting interesting information from nominal (categorical) data.
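The two core measures behind it can be computed by hand. For a rule A => C over a set of transactions, support is the fraction of transactions containing both A and C, and confidence is support(A and C) divided by support(A). A stdlib-only Python sketch on a toy transaction list (the rows are invented for illustration):

```python
# Invented toy transactions, one set of nominal values per passenger.
transactions = [
    {"Adult", "male", "Dead"},
    {"Adult", "female", "Survived"},
    {"Child", "female", "Survived"},
    {"Adult", "male", "Dead"},
    {"Adult", "female", "Survived"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Rule: {female} => {Survived}
antecedent, consequent = {"female"}, {"Survived"}
supp = support(antecedent | consequent)   # 3 of 5 rows -> 0.6
conf = supp / support(antecedent)         # 0.6 / 0.6 -> 1.0
print("support =", supp, "confidence =", conf)
```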
—
import matplotlib.pyplot as plt
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
titanic = pd.read_csv('train.csv')
nominal_cols = ['Embarked','Pclass','Age', 'Survived', 'Sex']
cat_cols = ['Embarked','Pclass','Age', 'Survived', 'Title']
titanic['Title'] = titanic.Name.str.extract(r'\, ([A-Z][^ ]*\.)', expand=False)
titanic['Title'].fillna('Title_UK', inplace=True)
titanic['Embarked'].fillna('Unknown',inplace=True)
titanic['Age'].fillna(0, inplace=True)
# Replacing Binary with String
rep = {0: "Dead", 1: "Survived"}
titanic.replace({'Survived' : rep}, inplace=True)
## Binning method to categorize the continuous variables
def binning(col, cut_points, labels=None):
    minval = col.min()
    maxval = col.max()
    break_points = [minval] + cut_points + [maxval]
    if not labels:
        labels = range(len(cut_points) + 1)
    colBin = pd.cut(col, bins=break_points, labels=labels, include_lowest=True)
    return colBin

cut_points = [1, 10, 20, 50]
labels = ["Unknown", "Child", "Teen", "Adult", "Old"]
titanic['Age'] = binning(titanic['Age'], cut_points, labels)
in_titanic = titanic[nominal_cols]
cat_titanic = titanic[cat_cols]
The Age column is converted from numeric to categorical using the binning method; its values are now one of ["Unknown", "Child", "Teen", "Adult", "Old"], so all columns hold only nominal data. The data set is then separated into two variants:
in_titanic.head()
| | Embarked | Pclass | Age | Survived | Sex |
|---|---|---|---|---|---|
| 0 | S | 3 | Adult | Dead | male |
| 1 | C | 1 | Adult | Survived | female |
| 2 | S | 3 | Adult | Survived | female |
| 3 | S | 1 | Adult | Survived | female |
| 4 | S | 3 | Adult | Dead | male |
cat_titanic.head()
| | Embarked | Pclass | Age | Survived | Title |
|---|---|---|---|---|---|
| 0 | S | 3 | Adult | Dead | Mr. |
| 1 | C | 1 | Adult | Survived | Mrs. |
| 2 | S | 3 | Adult | Survived | Miss. |
| 3 | S | 1 | Adult | Survived | Mrs. |
| 4 | S | 3 | Adult | Dead | Mr. |
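The binning above relies on `pandas.cut`; the same idea can be sketched with only the stdlib `bisect` module. The cut points and labels mirror the ones used above, though the exact boundary handling may differ slightly from `pandas.cut`:

```python
import bisect

cut_points = [1, 10, 20, 50]
labels = ["Unknown", "Child", "Teen", "Adult", "Old"]

def bin_age(age):
    # bisect_right returns how many cut points are <= age,
    # which is exactly the index of the matching label.
    return labels[bisect.bisect_right(cut_points, age)]

for age in [0, 5, 15, 30, 70]:
    print(age, "->", bin_age(age))
```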
for x in ['Embarked', 'Pclass', 'Age', 'Sex', 'Title']:
    sns.set(style="whitegrid")
    ax = sns.countplot(y=x, hue="Survived", data=titanic)
    plt.ylabel(x)
    plt.title('Survival Plot')
    plt.show()





The Title field also indicates the gender of a person (Mr., Mrs., Miss.). Analysing Sex and Title together would therefore produce rules with a near-100% association between the two fields, which adds no information. Putting these two fields together does not make sense, so the analysis is split into two variants.
dataset = []
for i in range(0, in_titanic.shape[0]):
    dataset.append([str(in_titanic.values[i, j]) for j in range(0, in_titanic.shape[1])])
# dataset = in_titanic.to_xarray()
oht = TransactionEncoder()
oht_ary = oht.fit(dataset).transform(dataset)
df = pd.DataFrame(oht_ary, columns=oht.columns_)
print(df.head())
print(oht.columns_)
['1', '2', '3', 'Adult', 'C', 'Child', 'Dead', 'Old', 'Q', 'S', 'Survived', 'Teen', 'Unknown', 'female', 'male']
output = apriori(df, min_support=0.2, use_colnames=True)
print(output.head())
   support  itemsets
0  0.242697  (1)
1  0.206742  (2)
2  0.550562  (3)
3  0.528090  (Adult)
4  0.615730  (Dead)
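Conceptually, the `apriori` call above counts how often each itemset occurs and keeps those with support at or above `min_support`. A brute-force stdlib sketch on invented toy transactions (the real algorithm prunes candidates level by level instead of enumerating every combination):

```python
from itertools import combinations

# Invented toy transactions.
transactions = [
    {"3", "Adult", "Dead", "male"},
    {"1", "Adult", "Survived", "female"},
    {"3", "Adult", "Survived", "female"},
    {"3", "Adult", "Dead", "male"},
]
min_support = 0.5
items = sorted(set().union(*transactions))

frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = set(combo)
        supp = sum(1 for t in transactions if s <= t) / len(transactions)
        if supp >= min_support:
            frequent[combo] = supp

# Show frequent itemsets, most common first.
for itemset, supp in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(itemset, supp)
```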
config = [
    ('antecedent support', 0.7),
    ('support', 0.5),
    ('confidence', 0.8),
    ('conviction', 3)
]
for metric_type, th in config:
    rules = association_rules(output, metric=metric_type, min_threshold=th)
    if rules.empty:
        print('Empty Data Frame For Metric Type :', metric_type, ' on Threshold :', th)
        continue
    print(rules.columns.values)
    print('-------------------------------------')
    print('Configuration : ', metric_type, ' : ', th)
    print('-------------------------------------')
    print(rules)
    support = rules['support'].values
    confidence = rules['confidence'].values
    plt.scatter(support, confidence, edgecolors='red')
    plt.xlabel('support')
    plt.ylabel('confidence')
    plt.title(metric_type + ' : ' + str(th))
    plt.show()
Output : Config 1: antecedent support = 0.7
*(support vs confidence scatter plot)*
Output : Config 2: support = 0.5
*(support vs confidence scatter plot)*
Output : Config 3: confidence = 0.8
*(support vs confidence scatter plot)*
Output : Config 4: conviction = 3
*(support vs confidence scatter plot)*
dataset = []
in_titanic = cat_titanic
for i in range(0, in_titanic.shape[0]):
    dataset.append([str(in_titanic.values[i, j]) for j in range(0, in_titanic.shape[1])])
# dataset = in_titanic.to_xarray()
oht = TransactionEncoder()
oht_ary = oht.fit(dataset).transform(dataset)
df = pd.DataFrame(oht_ary, columns=oht.columns_)
print(df.head())
print(oht.columns_)
output = apriori(df, min_support=0.2, use_colnames=True)
print(output.head())
   support  itemsets
0  0.242697  (1)
1  0.206742  (2)
2  0.550562  (3)
3  0.528090  (Adult)
4  0.615730  (Dead)
config = [
    ('antecedent support', 0.7),
    ('confidence', 0.8),
    ('conviction', 3)
]
for metric_type, th in config:
    rules = association_rules(output, metric=metric_type, min_threshold=th)
    if rules.empty:
        print('Empty Data Frame For Metric Type :', metric_type, ' on Threshold :', th)
        continue
    print(rules.columns.values)
    print('-------------------------------------')
    print('Configuration : ', metric_type, ' : ', th)
    print('-------------------------------------')
    print(rules)
    support = rules['support'].values
    confidence = rules['confidence'].values
    plt.scatter(support, confidence, edgecolors='red')
    plt.xlabel('support')
    plt.ylabel('confidence')
    plt.title(metric_type + ' : ' + str(th))
    plt.show()
Output : Config 1: antecedent support = 0.7
*(support vs confidence scatter plot)*
Output : Config 2: confidence = 0.8
*(support vs confidence scatter plot)*
Output : Config 3: conviction = 3
*(support vs confidence scatter plot)*
rules[rules['confidence']==rules['confidence'].min()]
rules[rules['confidence']==rules['confidence'].max()]
| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 8 | (True) | (female) | 0.38427 | 0.352809 | 0.261798 | 0.681287 | 1.931035 | 0.126224 | 2.030636 |
| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 12 | (1, female) | (True) | 0.105618 | 0.38427 | 0.102247 | 0.968085 | 2.519286 | 0.061661 | 19.292884 |
rules = association_rules (output, metric='support', min_threshold=0.1)
rules[rules['confidence'] == rules['confidence'].min()]
rules[rules['confidence'] == rules['confidence'].max()]
| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 274 | (S) | (True, Adult, female) | 0.723596 | 0.14382 | 0.103371 | 0.142857 | 0.993304 | -0.000697 | 0.998876 |
| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 55 | (1, female) | (True) | 0.105618 | 0.38427 | 0.102247 | 0.968085 | 2.519286 | 0.061661 | 19.292884 |
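Picking the weakest and strongest rules as done above is simply a min/max over the confidence column; a stdlib sketch on an invented rule list (field names mirror the tables above):

```python
# Invented rules with confidence values for illustration.
rules = [
    {"antecedents": ("S",), "consequents": ("Adult", "female"), "confidence": 0.14},
    {"antecedents": ("1", "female"), "consequents": ("Survived",), "confidence": 0.97},
    {"antecedents": ("female",), "consequents": ("Survived",), "confidence": 0.68},
]

weakest = min(rules, key=lambda r: r["confidence"])
strongest = max(rules, key=lambda r: r["confidence"])
print("weakest :", weakest["antecedents"], "=>", weakest["consequents"])
print("strongest:", strongest["antecedents"], "=>", strongest["consequents"])
```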
Use this Python script to evaluate the algorithms Apriori and FP Growth.
The evaluation output would be like,
The two algorithms differ in how they read the data: FP Growth scans the file only twice, so it is more efficient than Apriori on bigger data sets. In my experiments, however, both behaved similarly and took almost the same time for a given data set; the difference depends on the data and the number of nominal values. In any case, before implementing these algorithms, run the Algorithm-Evaluation script mentioned above and find the one that suits your work.
Also published on Kaggle.
Thanks to the sources:
- Apriori
- FP Growth
- Association Rule Mining Via Apriori Algorithm in python
- Mining Frequent Items using apriori algorithm
- Finding Frequent Patterns
- Efficient Apriori (Python 3.6)
- Data mining with apriori