TheWorkAround: The infinite ramblings and ideas of Brandon Hicks

Devise-OTP without jQuery (2023-03-13, by Brandon Hicks)

<p>Ruby/Rails boasts a unique diversity of available libraries and gems, including several options for authentication such as <a href="https://github.com/heartcombo/devise">Devise</a>, <a href="https://github.com/omniauth/omniauth">omniauth</a>, <a href="https://github.com/thoughtbot/clearance">clearance</a>, <a href="https://github.com/sorcery/sorcery">sorcery</a>, and <a href="https://github.com/jeremyevans/rodauth">rodauth</a>, among others. While there are many options to choose from, I personally prefer Devise for its modularity and ease of use.</p> <p>However, any modern application needs more than just an authentication system; you’ll also want permissions <em>(for authorization)</em>, 2FA, and a solid hashing library for additional security measures.</p> <p>There are several <a href="https://en.wikipedia.org/wiki/Multi-factor_authentication">2FA</a> libraries that you can integrate with Devise, such as <a href="https://github.com/tinfoil/devise-two-factor">devise-two-factor</a>, <a href="https://github.com/wmlele/devise-otp">devise-otp</a>, <a href="https://github.com/williamatodd/devise-2fa">devise-2fa</a>, and <a href="https://github.com/Houdini/two_factor_authentication">two_factor_authentication</a>, among others. 
I honestly prefer <a href="https://github.com/wmlele/devise-otp">Devise-OTP</a> for its ease of use and its adherence to <a href="https://datatracker.ietf.org/doc/html/rfc6238">RFC 6238</a>, implemented using the <a href="https://github.com/mdp/rotp">ROTP</a> gem.</p> <p>However, I did encounter an issue with Devise-OTP: it still assumes applications using it are built on Sprockets and jQuery, and the <a href="https://github.com/wmlele/devise-otp/blob/master/app/assets/javascripts/qrcode.js">qrcode JS</a> file it includes requires jQuery. While I still use jQuery regularly at work, I have moved to StimulusJS for personal projects.</p> <p>After attempting to use several npm packages <em>(like <a href="https://github.com/soldair/node-qrcode">node-qrcode</a>)</em> for generating users’ 2FA QR codes, I found that 1Password had issues scanning them for a valid 2FA QR code, which could be a significant concern for many users. It’s worth noting that the gem has a handy <a href="https://github.com/wmlele/devise-otp/blob/master/lib/devise_otp_authenticatable/controllers/helpers.rb#L148">helper</a> <code class="highlighter-rouge">otp_authenticator_token_image_google</code> for generating QR codes using the Google Chart API, but generating QR codes with a third party like Google may not be the most security-conscious decision.</p> <p><img src="/images/posts/devise_otp_without_query/qrcode.png" alt="qrcode" class="img-fluid w-3/12" /></p> <p>After some playing around I was able to find an alternative method to get the QR codes generated within the application, without requiring jQuery.</p> <p>First, to modify devise-otp’s views, I ran the generator so they would live in the app rather than in the gem.</p> <p><code class="highlighter-rouge">rails g devise_otp:views</code></p> <p>Next I added the following gems to my Gemfile and bundled:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gem</span> 
<span class="s1">'rotp'</span>
<span class="n">gem</span> <span class="s1">'rqrcode'</span>
</code></pre></div></div> <p>In my users helper I added the following method to generate the user’s 2FA QR code:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">otp_rqr_image</span><span class="p">(</span><span class="n">otp_url</span><span class="p">)</span>
  <span class="n">qr_code</span> <span class="o">=</span> <span class="no">RQRCode</span><span class="o">::</span><span class="no">QRCode</span>
    <span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">otp_url</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">as_png</span><span class="p">(</span><span class="ss">resize_exactly_to: </span><span class="mi">300</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">to_data_url</span>

  <span class="n">image_tag</span><span class="p">(</span><span class="n">qr_code</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div> <p>Now within the view files that the gem generated you should have <code class="highlighter-rouge">app/views/devise/otp_tokens/_token_secret.html.slim</code>. This <a href="https://github.com/wmlele/devise-otp/blob/master/app/views/devise/otp_tokens/_token_secret.html.erb#L4">file</a> is where the QR code is generated for the user enabling 2FA on their account. I replaced the jQuery-based <code class="highlighter-rouge">otp_authenticator_token_image</code> method with the <code class="highlighter-rouge">otp_rqr_image</code> helper we created earlier. 
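</p> <p><em>For reference, the swap in the generated view might look roughly like the sketch below. This is a guess at the shape, not the gem’s exact API: the partial name, locals, and how the provisioning URL is exposed differ between devise-otp versions, so check your generated view for the real call.</em></p>

```erb
<%# Before: the jQuery-dependent helper shipped with devise-otp %>
<%#= otp_authenticator_token_image(resource) %>

<%# After: the rqrcode-based helper defined above. `otp_url` stands in for %>
<%# however your view exposes the otpauth:// provisioning URL (hypothetical name). %>
<%= otp_rqr_image(otp_url) %>
```

<p>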
You should now have a valid 2FA QR code which is also scannable by 1Password and most password managers and 2FA authenticators.</p> <p><img src="/images/posts/devise_otp_without_query/2fa_user_qrcode.png" alt="qrcode" class="img-fluid w-4/5" /></p>

Brandon

Setting up a Pi-Hole (2019-08-27)

<p>Before we get started: if you haven’t heard of <a href="https://pi-hole.net/">PiHole</a>, it’s an Open Source networking product that’s <a href="https://www.reddit.com/r/pihole/">exploding</a> in the consumer networking world. Let me begin by saying that at my family’s house we have multiple TVs (a smart TV and a TV with a Roku), the kids each have tablets, my wife and I have our phones and laptops, and there are a few personal servers (a local NAS and development servers). The majority of these devices are sending and receiving traffic at any given time.</p> <p>For several months in a row our household’s bandwidth peaked over our Xfinity (<em>Comcast</em>) data cap. An extra $10/$20 every month doesn’t sound like much, but it adds up (especially when you have kids). So I started monitoring our network’s traffic a little more thoroughly, and it turns out a pretty sizeable amount of our throughput was from advertisements. I’m not going to deny that Netflix, Hulu, Disney+, and YouTube consume a ton of bandwidth; but our network was also congested with a ton of traffic that, honestly, we had no idea was even being used.</p> <p>Since I set up our PiHole our network speed seems to have dramatically increased, and as you can see from the picture below our network traffic has halved. 
I’m only theorizing here, but the increase in network speed seems to be from a combination of having a local DNS server within our home network, as well as denying network traffic to a <em><strong>TON</strong></em> of external resources.</p> <p><img src="/images/posts/setting_up_a_pihole/network_traffic.png" alt="" class="img-fluid w-3/12 sm-w-100" /></p> <p>I won’t be walking you through how to set up the Raspberry Pi with Raspbian, but if you haven’t set it up yet I would suggest starting with the official <a href="https://www.raspberrypi.org/documentation/raspbian/">documentation</a>. (I set up my Pi using <a href="https://www.raspberrypi.org/downloads/raspbian/">Raspbian Lite</a> to reduce the number of wasted resources, since it will mainly be used as a headless network device.)</p> <p>My PiHole setup consists of a <a href="https://www.amazon.com/gp/product/B07TZNW8X2/">Raspberry Pi 4</a> with 4GB of RAM, and because of the <a href="https://www.jeffgeerling.com/blog/2019/raspberry-pi-4-needs-fan-heres-why-and-how-you-can-add-one">heat issue</a> that everyone’s complaining about, I also purchased a <a href="https://www.amazon.com/gp/product/B07TXSQ47L">case with a cooling fan</a>. I know, 4GB is way overboard for such a <a href="https://pi-hole.net/2017/05/24/how-much-traffic-can-pi-hole-handle/">lightweight process</a> used as a local DNS server, but I’m also using the Pi for a number of other services as well.</p> <h3 id="begin-setup">Begin setup</h3> <p>This first step isn’t required, but I ran the following command (<code class="highlighter-rouge">lsb_release -a &amp;&amp; uname -a</code>) so you could see what Raspbian release this setup was built on.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 <span class="o">(</span>buster<span class="o">)</span>
Release:        10
Codename:       buster
Linux Falion 4.19.58-v7l+ <span class="c">#1245 SMP Fri Jul 12 17:31:45 BST 2019 armv7l GNU/Linux</span>
</code></pre></div></div> <h3 id="adjust-rasbian-configuaton">Adjust Raspbian Configuration</h3> <p>Next you’ll need to run <code class="highlighter-rouge">sudo raspi-config</code> and update the configurations listed below.</p> <ul> <li>Set up a new password</li> <li>Set the hostname</li> <li>Set locales (en_US.UTF-8, timezone, keyboard layout, country)</li> <li>Update the raspi-config tool</li> </ul> <p>After these changes, let’s restart the Pi so the new configuration will be applied: <code class="highlighter-rouge">shutdown now -r</code></p> <p>Since this will be an important component in your network, let’s start out on the right foot and remember that we need to be thinking about security every step along the way. Once your Pi has finished restarting, you’ll need to change the root password.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>su <span class="c"># Switch to the root user</span>
passwd <span class="c"># Change the current user's password</span>
</code></pre></div></div> <p>Next, let’s update all currently installed packages and make sure HTTPS transport is available for securely installing additional packages.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Update all software on the Pi</span>
apt-get update <span class="o">&amp;&amp;</span> apt-get upgrade <span class="nt">-y</span> <span class="o">&amp;&amp;</span> apt-get <span class="nb">install </span>apt-transport-https ca-certificates <span class="nt">-y</span>
</code></pre></div></div> <h3 id="hardening-the-pi">Hardening the Pi</h3> <p>I’m not going to cover every single step of hardening your Raspberry Pi, 
but if you need a deeper understanding of Linux security there is an endless supply of guides on how to harden a Debian-based distro.</p> <p>Let’s start by creating an additional user and removing the default <code class="highlighter-rouge">pi</code> user. This is more of a <a href="https://en.wikipedia.org/wiki/Security_through_obscurity">security by obscurity</a> measure, because everyone knows that by default your device will have a user named <code class="highlighter-rouge">pi</code>, which gets an attacker one step closer to having access to your device.</p> <p>First we need to create the new user: <code class="highlighter-rouge">sudo adduser user_name</code></p> <p>Now let’s add the new user to the list of sudoers</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano /etc/sudoers.d/010_pi-nopasswd
<span class="c"># This allows the new user to run sudo, but requires a valid password to be entered first</span>
user_name <span class="nv">ALL</span><span class="o">=(</span>ALL<span class="o">)</span> PASSWD: ALL

<span class="c"># You can also run the following</span>
<span class="nb">sudo</span> /usr/sbin/useradd <span class="nt">--groups</span> <span class="nb">sudo</span> <span class="nt">-m</span> user_name
</code></pre></div></div> <p>You’ll now need to be sure to completely kill the current terminal session as the <code class="highlighter-rouge">pi</code> user; I’d suggest typing <code class="highlighter-rouge">logout</code> to make sure the session is properly killed.</p> <p>The next step is to log in as the user we just created, remove the default <code class="highlighter-rouge">pi</code> user, and then remove that user from the list of sudoers. 
(This may seem like a bunch of unnecessary steps, but we want to ensure at least a minimum level of security.)</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Now ssh into the pi using the new user</span>
ssh user_name@&lt;pi_ip_address&gt;
<span class="nb">sudo </span>su

<span class="c"># Remove the pi user</span>
<span class="nb">sudo </span>deluser <span class="nt">--remove-home</span> pi

<span class="c"># Remove the pi user from the sudoer list as well, by deleting this line:</span>
<span class="nb">sudo </span>nano /etc/sudoers.d/010_pi-nopasswd -&gt; pi <span class="nv">ALL</span><span class="o">=(</span>ALL<span class="o">)</span> NOPASSWD: ALL
</code></pre></div></div> <h4 id="lets-start-with-ssh">Let’s start with SSH</h4> <p>Since I’m running this as a headless device on my network, I’ll only be accessing it through a shell terminal. So the next step is to lock down OpenSSH. I know this complicates things to another level, but in my setup I generally prefer to have a different SSH key for a number of services. I use one key for my DigitalOcean servers, one key for GitHub, another key for my work’s GitLab account, etc. The biggest issue with this is managing a growing number of keys, but if you keep up with configuring your <a href="https://linuxize.com/post/using-the-ssh-config-file/">SSH config</a> file <code class="highlighter-rouge">~/.ssh/config</code> as you generate the keys, your maintenance should be pretty minimal. My rationale behind this is: some people use a single key for every service, which is logical; I mean it’s <a href="https://security.stackexchange.com/questions/112718/difficulty-of-breaking-private-key-password">not exactly</a> easy to crack a 4096-bit key. But if you somehow happen to leak that one private key, any and all services you have associated with it are now compromised.
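</p> <p><em>As a sketch of that per-service layout (the hostnames and key filenames here are made-up examples, not my actual config), a <code class="highlighter-rouge">~/.ssh/config</code> might look like:</em></p>

```ini
# ~/.ssh/config -- one key per service; names are illustrative
Host github.com
  IdentityFile ~/.ssh/github_id
  IdentitiesOnly yes

Host gitlab.example.com
  IdentityFile ~/.ssh/work_gitlab_id
  IdentitiesOnly yes

# Devices on the local home network share one dedicated key
Host 192.168.1.*
  IdentityFile ~/.ssh/localnetwork_id
  IdentitiesOnly yes
```

<p>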
If you have a different RSA key for each major service you use and one key gets compromised, the impact should be minimal rather than affecting a large number of services.</p> <p>Now it may be a personal preference, but rather than just generating a generic RSA key I prefer to use <a href="https://ed25519.cr.yp.to">ed25519 keys</a>, mainly because they’re small, fast, and very secure. (If you need instructions on how to <a href="https://theworkaround.com/2016/11/11/enhancing-ssh-security.html#newer-key-types">generate ed25519 keys</a>, I have a snippet in one of my previous posts.)</p> <p>Now we need to ensure we have the latest version of OpenSSH installed; to do this run the following command: <code class="highlighter-rouge">sudo apt install openssh-server</code></p> <p>The configuration I use follows pretty closely a combination of the <a href="https://infosec.mozilla.org/guidelines/openssh">Mozilla SSH Guidelines (Modern)</a> and <a href="http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap15sec122.html"><abbr title="The Linux Documentation Project">TLDP</abbr></a> recommendations.</p> <p>Let’s open our sshd_config file and replace the configuration with the snippet posted below: <code class="highlighter-rouge">sudo nano /etc/ssh/sshd_config</code></p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Adapted from the "Modern" configuration detailed on the Mozilla Security
# Guidelines wiki (https://wiki.mozilla.org/Security/Guidelines/OpenSSH).
# https://github.com/mozilla/socorro-infra/blob/master/puppet/modules/socorro/files/etc_ssh/sshd_config
# http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap15sec122.html
# Package generated configuration file
# See the sshd_config(5) manpage for details
</span>
Port 22
Protocol 2
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_ecdsa_key

KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com

<span class="c"># Completely disable password based logins, PublicKey login is required</span>
PubkeyAuthentication yes
<span class="c">#AuthenticationMethods publickey,keyboard-interactive:pam # enables multi-step authentication PublicKey-&gt;Password</span>
AuthenticationMethods publickey
KbdInteractiveAuthentication yes

<span class="c"># Lifetime and size of ephemeral version 1 server key</span>
KeyRegenerationInterval 3600 <span class="c"># to prevent decryption, regenerate connection keys</span>
ServerKeyBits 1024

<span class="c"># Limit to users/groups</span>
AllowUsers tarellel

<span class="c"># Don't read the user's ~/.rhosts and ~/.shosts files</span>
IgnoreRhosts yes
<span class="c"># prevents login from trusted networks
#RhostsAuthentication no</span>
RhostsRSAAuthentication no

<span class="c"># Logging
#SyslogFacility AUTH</span>
SyslogFacility AUTHPRIV
LogLevel INFO
<span class="c"># Log sftp level file access (read/write/etc.) that would not be easily logged otherwise.</span>
Subsystem sftp /usr/lib/openssh/sftp-server -f AUTHPRIV -l INFO

<span class="c"># Authentication:</span>
PermitRootLogin no
UsePrivilegeSeparation sandbox <span class="c"># prevent user privilege escalation</span>
LoginGraceTime 30 <span class="c"># default 120/2m</span>
StrictModes yes <span class="c"># checks user [~] permissions</span>

X11Forwarding no <span class="c"># may wish to turn this off for security purposes, it was defaulted to yes</span>
AllowTcpForwarding no
<span class="c"># ClientAliveCountMax 2 # max amount of concurrently connected clients
# http://serverfault.com/questions/275669/ssh-sshd-how-do-i-set-max-login-attempts
# MaxAuthTries 1 # 1 login attempt per connection, before being dropped
# MaxSessions specifies the maximum number of open sessions permitted per network connection
# MaxSessions 2</span>

<span class="c"># https://patrickmn.com/aside/how-to-keep-alive-ssh-sessions/
# Keep idle TCP connections alive (kill idle connections)</span>
TCPKeepAlive no
ClientAliveInterval 120 <span class="c"># how long the connection can be idle (seconds)</span>

<span class="c"># NOTE: It's best to disable this when forwarding is also disabled</span>
Compression no

PasswordAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication yes

<span class="c"># Login/Logout messages</span>
PrintMotd no
<span class="c"># Banner /etc/issue.net</span>

UsePAM yes
<span class="c"># Ensure /bin/login is not used so that it cannot bypass PAM settings for sshd.</span>
UseLogin no
UseDNS no
</code></pre></div></div> <p>Next you’ll need to ensure your personal machine’s public SSH key is on the Raspberry Pi, so you won’t be locked out of the device. On your local machine run <code class="highlighter-rouge">pbcopy &lt; ~/.ssh/localnetwork_id.pub</code>; this copies the contents of your SSH public key to your clipboard. (As I mentioned before, I prefer to generate SSH keys for different services. 
The SSH key <code class="highlighter-rouge">localnetwork</code> was generated for all devices in my local home network.)</p> <p>Now go back to the terminal and ensure the authorized_keys file is created for the current user in the .ssh folder: <code class="highlighter-rouge">mkdir ~/.ssh; chmod 0700 ~/.ssh; touch ~/.ssh/authorized_keys; nano ~/.ssh/authorized_keys</code>. The last part of this command opens up the <code class="highlighter-rouge">authorized_keys</code> file in nano; since you previously copied the contents of your public key to your clipboard, let’s go ahead and paste it into this file, hit CTRL-X, and save the file. After the contents have been saved you’ll need to chmod the file to ensure access to its contents is limited: <code class="highlighter-rouge">chmod 0600 ~/.ssh/authorized_keys</code>. In order for all these changes to be applied you’ll need to restart the ssh process: <code class="highlighter-rouge">sudo systemctl restart ssh</code> (you could always do something like <code class="highlighter-rouge">systemctl reload ssh</code>, but I prefer to just restart the process).</p> <h4 id="lets-add-a-firewall">Let’s add a firewall</h4> <p>I’m sure there will be some moaning that I’m using <a href="https://launchpad.net/ufw">UFW</a> over <a href="https://linux.die.net/man/8/iptables">IPtables</a>, but I find UFW easy to get started with and it does exactly what I need.</p> <p>Let’s begin by installing the UFW package: <code class="highlighter-rouge">sudo apt-get install ufw</code>. For me, after installing UFW I started getting “an IP tables error has occurred” when trying to start the UFW service. So I restarted the Pi again, and after it had finished reloading all errors were gone. 
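</p> <p><em>(If you hit the same error, a quick status check is an easy way to confirm UFW is responding before adding any rules; at this point it should report <code class="highlighter-rouge">Status: inactive</code>, since we haven’t enabled it yet. A sketch:)</em></p>

```shell
# Sanity check: UFW should respond without an iptables error.
# It will report "Status: inactive" until `ufw enable` is run later on.
sudo ufw status verbose
```

<p>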
Below are the UFW rules used to begin securing incoming network services for your device. Before running these commands it helps to <code class="highlighter-rouge">sudo su</code> so you’re creating these rules as the root user.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Some people also prefer to be extra cautious and begin by blocking all outgoing connections as well: `ufw default deny outgoing`</span>
ufw default deny incoming <span class="c"># Block ALL incoming ports</span>
ufw allow ssh
ufw allow 53 <span class="c"># DNS port</span>
ufw allow http <span class="c"># Ports 80 &amp; 443 are used by PiHole to display a dashboard with PHP/lighttpd</span>
ufw allow https

<span class="c"># Limit the number of login attempts on the SSH port</span>
ufw limit ssh/tcp

<span class="c"># Allow the pihole FTL engine from the LAN (if you are using a different subnet, specify that instead)</span>
ufw allow from 192.168.1.0/24 to any port 4711 proto tcp

ufw <span class="nb">enable</span>
</code></pre></div></div> <p>Now that a few basic firewall rules have been created, let’s reload the UFW service in order for these rules to start being applied: <code class="highlighter-rouge">sudo ufw reload</code>.</p> <h4 id="fail2ban">Fail2Ban</h4> <p>Next we’ll set up <a href="https://www.fail2ban.org/wiki/index.php/Main_Page">Fail2Ban</a>, which is one of my favorite tools to prevent <a href="https://en.wikipedia.org/wiki/Brute-force_attack">brute-force</a> and/or <a href="https://www.owasp.org/index.php/Credential_stuffing">credential stuffing</a> login attempts. With the firewall set up, we’ve already taken the measure of preventing network requests to a large number of ports and services. But whenever there’s an open SSH port, I guarantee you it will get hit with requests. 
(I averted this by only exposing a VPN port on my router; in order to access my network from the outside you have to connect via VPN, but I’ll cover that farther down.)</p> <p>Begin by installing the fail2ban package, <code class="highlighter-rouge">apt install fail2ban</code>, then we’ll adjust the configuration to begin monitoring any and all SSH login attempts</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo cp</span> /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
<span class="nb">sudo </span>nano /etc/fail2ban/jail.local
</code></pre></div></div> <p>Once nano has opened up <code class="highlighter-rouge">/etc/fail2ban/jail.local</code>, hit Ctrl+W, search for <code class="highlighter-rouge">[sshd]</code>, and change the configuration for ssh to the following.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[sshd]</span>
<span class="py">enabled</span> <span class="p">=</span> <span class="s">true</span>
<span class="py">port</span> <span class="p">=</span> <span class="s">ssh</span>
<span class="py">filter</span> <span class="p">=</span> <span class="s">sshd</span>
<span class="py">logpath</span> <span class="p">=</span> <span class="s">/var/log/auth.log</span>
<span class="py">maxretry</span> <span class="p">=</span> <span class="s">3</span>
<span class="py">bantime</span> <span class="p">=</span> <span class="s">-1</span>
</code></pre></div></div> <p>Now let’s restart and enable Fail2Ban, and verify the process and configuration have properly loaded</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service fail2ban restart
systemctl <span class="nb">enable</span> fail2ban

<span class="c"># Verify fail2ban is running</span>
service fail2ban status
</code></pre></div></div> <p>This step isn’t required, but I like 
to get the fail2ban client status to ensure the sshd jail (or any others you may add) is enabled: <code class="highlighter-rouge">sudo fail2ban-client status</code></p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Status <span class="k">for </span>the jail: sshd
|- Filter
|  |- Currently failed: 0
|  |- Total failed: 0
|  <span class="sb">`</span>- File list: /var/log/auth.log
<span class="sb">`</span>- Actions
   |- Currently banned: 0
   |- Total banned: 0
   <span class="sb">`</span>- Banned IP list:
</code></pre></div></div> <h4 id="unattended-upgrades">Unattended Upgrades</h4> <p>Another one of my favorite tools on a <a href="https://www.debian.org/">Debian</a>-based distro is <a href="https://wiki.debian.org/UnattendedUpgrades">Unattended Upgrades</a>. For those of you who have never used it before, it will systematically apply security updates, distro updates, or whatever kind of upgrades you choose to allow. I prefer to keep everything up to date, but it can be dangerous to enable automatic updates on something such as a webserver. If this is the case, I’d suggest only enabling security updates, because major package upgrades can and will break your software.</p> <p>Again, let’s begin by installing the required package: <code class="highlighter-rouge">sudo apt-get install unattended-upgrades -y</code>. 
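</p> <p><em>(One handy extra, once the configuration below is in place: the package supports a dry-run mode that reports what it would upgrade without changing anything, which is useful for sanity-checking your origin patterns. Flags as documented by the Debian package:)</em></p>

```shell
# Show what unattended-upgrades would do, without applying anything
sudo unattended-upgrades --dry-run --debug
```

<p>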
Once the package has installed you’ll need to update it’s default configuration <code class="highlighter-rouge">sudo nano /etc/apt/apt.conf.d/50unattended-upgrades</code> to the following: <em>(The contents of the file specify what updates to look for.)</em></p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">Unattended</span><span class="o">-</span><span class="nx">Upgrade</span><span class="p">::</span><span class="nx">Origins</span><span class="o">-</span><span class="nx">Pattern</span> <span class="p">{</span> <span class="c1">// Codename based matching:</span> <span class="c1">// This will follow the migration of a release through different</span> <span class="c1">// archives (e.g. from testing to stable and later oldstable).</span> <span class="c1">// Software will be the latest available for the named release,</span> <span class="c1">// but the Debian release itself will not be automatically upgraded.</span> <span class="dl">"</span><span class="s2">origin=Debian,codename=${distro_codename}-updates</span><span class="dl">"</span><span class="p">;</span> <span class="c1">// origin=Debian,codename=${distro_codename}-proposed-updates";</span> <span class="dl">"</span><span class="s2">origin=Debian,codename=${distro_codename},label=Debian</span><span class="dl">"</span><span class="p">;</span> <span class="dl">"</span><span class="s2">origin=Debian,codename=${distro_codename},label=Debian-Security</span><span class="dl">"</span><span class="p">;</span> <span class="c1">// Archive or Suite based matching:</span> <span class="c1">// Note that this will silently match a different release after</span> <span class="c1">// migration to the specified archive (e.g. 
testing becomes the</span> <span class="c1">// new stable).</span> <span class="c1">// "o=Debian,a=stable";</span> <span class="c1">// "o=Debian,a=stable-updates";</span> <span class="c1">// "o=Debian,a=proposed-updates";</span> <span class="c1">// "o=Debian Backports,a=${distro_codename}-backports,l=Debian Backports";</span> <span class="p">};</span> </code></pre></div></div> <p>The next step is to open up <code class="highlighter-rouge">/etc/apt/apt.conf.d/20auto-upgrades</code> and configure what components of unattended-upgrades you want to enable. In order to keep my packages up to date with the latest changes I changed mine to the following configuration:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">APT</span><span class="p">::</span><span class="nx">Periodic</span><span class="p">::</span><span class="nx">Update</span><span class="o">-</span><span class="nx">Package</span><span class="o">-</span><span class="nx">Lists</span> <span class="dl">"</span><span class="s2">1</span><span class="dl">"</span><span class="p">;</span> <span class="nl">APT</span><span class="p">::</span><span class="nx">Periodic</span><span class="p">::</span><span class="nx">Download</span><span class="o">-</span><span class="nx">Upgradeable</span><span class="o">-</span><span class="nb">Packages</span> <span class="dl">"</span><span class="s2">1</span><span class="dl">"</span><span class="p">;</span> <span class="nl">APT</span><span class="p">::</span><span class="nx">Periodic</span><span class="p">::</span><span class="nx">Unattended</span><span class="o">-</span><span class="nx">Upgrade</span> <span class="dl">"</span><span class="s2">1</span><span class="dl">"</span><span class="p">;</span> <span class="nl">APT</span><span class="p">::</span><span class="nx">Periodic</span><span class="p">::</span><span class="nx">Verbose</span> <span class="dl">"</span><span class="s2">1</span><span 
class="dl">"</span><span class="p">;</span> <span class="nl">APT</span><span class="p">::</span><span class="nx">Periodic</span><span class="p">::</span><span class="nx">AutocleanInterval</span> <span class="dl">"</span><span class="s2">7</span><span class="dl">"</span><span class="p">;</span> </code></pre></div></div> <p>Now let’s enable unattended-upgrades as a low-priority process <code class="highlighter-rouge">sudo dpkg-reconfigure --priority=low unattended-upgrades</code> to begin monitoring and applying future updates in order to keep our device secure.</p> <h2 id="installing-pihole">Installing PiHole</h2> <p>I know thus far it’s been like provisioning any other Linux box; but now comes the fun part. To get started quickly run the following command <code class="highlighter-rouge">curl -sSL https://install.pi-hole.net | bash</code>; it does take a few moments because it attempts to install any additional required libraries. It also requires you to select a number of configuration options based on how you like your PiHole set up.</p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_1.png" alt="Beginning Install (15%)" class="img-fluid w-50 sm-w-100" /></p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_2.png" alt="PiHole Network Notice" class="img-fluid w-50 sm-w-100" /></p> <p>After going through a few of the setup screens you’ll be presented with your first big choice: which DNS upstream do you wish to use? You’ll be presented with a number of options, including but not limited to Google, OpenDNS, Level3, and CloudFlare.
In my opinion I suggest picking CloudFlare; their <a href="https://www.cloudflare.com/dns/">DNS</a> service is extremely fast and their whole suite of services is about providing security by default.</p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_3.png" alt="Selecting Upstream Provider" class="img-fluid w-50 sm-w-100" /></p> <p>Next you’ll be presented with a list of third-party blocklists to choose from; this is completely up to you and what you want to block. And you can always add more later (which we do later in this walkthrough).</p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_4.png" alt="Choosing your blocklists" class="img-fluid w-50 sm-w-100" /></p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_5.png" alt="Installing additional packages" class="img-fluid w-50 sm-w-100" /></p> <p>Since we set up UFW, network requests should be filtered. Part of the installation script ensures that the proper ports are opened in order for the PiHole to function properly.</p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_6.png" alt="Opening up the firewall" class="img-fluid w-50 sm-w-100" /></p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_7.png" alt="Terminal display of PiHole Installing" class="img-fluid w-50 sm-w-100" /></p> <p>Once the installation has finished you should be presented with a final menu telling you your PiHole’s dashboard address and password. (The address should be something like <code class="highlighter-rouge">http://192.168.1.5/admin</code>). When visiting the dashboard you should be presented with something similar to the following image:</p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_8.png" alt="PiHole Dashboard" class="img-fluid w-50 sm-w-100" /></p> <p>Now for the most part you’re almost done; next we’ll need to configure our router to query the PiHole service for any DNS requests.
To do this you’ll need to change your router’s DNS settings to use the internal IP address of the Raspberry Pi. I also applied CloudFlare’s DNS IP addresses <code class="highlighter-rouge">1.1.1.1</code> and <code class="highlighter-rouge">1.0.0.1</code>. This way, if for any reason the Pi is shut down or inaccessible, your network won’t come to a dead halt; it’ll fall back to the secondary DNS server (CloudFlare).</p> <p><img src="/images/posts/setting_up_a_pihole/installing_pihole_9.png" alt="PiHole Dashboard" class="img-fluid w-50 sm-w-100" /></p> <h3 id="setting-up-vpn-access">Setting up VPN Access</h3> <p><a href="http://www.pivpn.io/">PiVPN</a> is insanely easy to install and is the Pi-friendly alternative to setting up OpenVPN by hand on an ARM chip.</p> <p>To install it let’s run the following command <code class="highlighter-rouge">curl -L https://install.pivpn.io | bash</code></p> <p>Again the command-line menu screen will come up and walk you through the various steps to install PiVPN; some of the steps are pretty quick, but once it needs to generate a certificate for your VPN it will take a while. Once OpenVPN started generating a certificate for the VPN I stepped away for a snack, but it took a good 5 or 10 minutes to generate.</p> <p>Something to note, I ended up having to set the VPN’s DNS settings to the Pi’s static IP address. It may have just been my situation, but when I was VPN’ing in and the DNS was set as the IP address of the router <code class="highlighter-rouge">192.168.1.1</code> it was causing my computer to pull DNS from the actual network I was connected to rather than through the PiHole service.
In order to mitigate this I ended up changing the VPN’s DNS settings to use the Pi’s own IP address as the DNS server.</p> <p>To do this we’ll need to modify the openvpn config <code class="highlighter-rouge">nano /etc/openvpn/server.conf</code> and change the following <code class="highlighter-rouge">push "dhcp-option DNS 192.168.1.1"</code> to <code class="highlighter-rouge">push "dhcp-option DNS 192.168.71.1"</code> or whatever your Pi’s IP address is.</p> <p>If you don’t know your current local IP address you can get it by running <code class="highlighter-rouge">ifconfig</code> and reading the IP address listed under the eth0 network adapter.</p> <p>After this step was complete I ended up rebooting the Pi again, so all the new configuration changes and services would be applied.</p> <p>We’re still a ways off from being finished with setting up the VPN; once the Pi has finished rebooting we need to add users to the VPN. This can be done by running the command <code class="highlighter-rouge">pivpn add</code>. You may want to just have a user authenticate with the signed key alone; I’m a little cautious and decided to require my user <em>(of course, me)</em> to have a password as well as the signed key to authenticate to the VPN. If you end up needing help or want additional commands for the VPN you can run the command <code class="highlighter-rouge">pivpn help</code>.</p> <p>The next step was to pull my generated VPN file to my computer so I could add it to my computer and phone.
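As an aside, the <code class="highlighter-rouge">server.conf</code> DNS change described above can also be scripted rather than edited by hand. Below is a small sketch against a sample file — the real config lives at <code class="highlighter-rouge">/etc/openvpn/server.conf</code>, and <code class="highlighter-rouge">192.168.71.1</code> stands in for your Pi’s address:

```shell
# Script the dhcp-option DNS change instead of editing server.conf by hand.
# (Sketch: "server.conf" here is a sample copy so the example is self-contained;
#  the real file is /etc/openvpn/server.conf, and PI_IP is an example address.)
PI_IP="192.168.71.1"    # on the Pi itself: hostname -I | awk '{print $1}'

# Recreate the stock line for demonstration purposes:
echo 'push "dhcp-option DNS 192.168.1.1"' > server.conf

# Point VPN clients at the Pi for DNS resolution:
sed -i "s/dhcp-option DNS [0-9.]*/dhcp-option DNS $PI_IP/" server.conf
cat server.conf    # push "dhcp-option DNS 192.168.71.1"
```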
If you do plan on using your phone, for security reasons I suggest having a separate VPN key for your computer and mobile phone.</p> <p>To pull the openvpn signature file to your computer using scp, you can run a command similar to the following: <em>(Towards the end of this post I’ll explain how to use it on your phone or MacBook)</em></p> <p><code class="highlighter-rouge">scp -i /Users/pi_user/.ssh/localnetwork_id [email protected]:/home/tarellel/ovpns/tarellelRemote.ovpn .</code></p> <p>The next step is to enable access to the VPN ports, otherwise we’ll never be able to VPN into the network through the Pi. I started by enabling the UDP VPN port <code class="highlighter-rouge">ufw allow 1194/udp</code>, but for some reason I had issues. To get around this I ended up having to remove this rule and just enable access to port 1194 in general <code class="highlighter-rouge">ufw allow 1194</code>.</p> <p>After opening up the VPN ports I decided to reload OpenVPN’s configuration; it may be a bit redundant, but what’s it going to hurt.</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> /etc/init.d/openvpn reload <span class="c"># Just to verify openvpn reloaded properly</span> <span class="nb">sudo </span>systemctl status openvpn </code></pre></div></div> <p>Now we need to update openvpn’s network device priority <code class="highlighter-rouge">sudo nano /etc/pihole/setupVars.conf</code> and add the following <code class="highlighter-rouge">PIHOLE_INTERFACE=tun0</code> below eth0. This pretty much tells OpenVPN to use <code class="highlighter-rouge">eth0</code> as its primary device and <code class="highlighter-rouge">tun0</code> as the PiHole’s virtual network device.</p> <p>Next we’ll need to list the network interfaces we want to allow to make DNS requests through the <a href="http://www.thekelleys.org.uk/dnsmasq/doc.html">dnsmasq</a> network service.
To add them, let’s open up <code class="highlighter-rouge">nano /etc/dnsmasq.d/01-pihole.conf</code> and add the following list of interfaces.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">interface</span><span class="p">=</span><span class="s">eth0</span> <span class="py">interface</span><span class="p">=</span><span class="s">tun0</span> </code></pre></div></div> <p>Since we’ve just made some more changes to the OpenVPN configuration and firewall, let’s restart these services <em>(yes, again, I know)</em>.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service openvpn reload service openvpn restart systemctl <span class="nb">enable </span>ufw </code></pre></div></div> <h2 id="install-log2ram">Install log2ram</h2> <p>This next step is pretty important; the reason being SD cards aren’t meant to have files written on and removed from at a constant pace, especially when it comes to generating logs. It’s like taking a piece of paper, writing on it, and then erasing it, over and over again. Eventually that piece of paper will become useless; the same can be said for SD cards. To mitigate this, we end up using <a href="https://github.com/azlux/log2ram">log2ram</a> which will keep our logs in memory and, once they consume X amount of memory, save them as an actual log file.</p> <p>The following steps are copied directly from the project’s documentation.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-Lo</span> log2ram.tar.gz https://github.com/azlux/log2ram/archive/master.tar.gz <span class="nb">tar </span>xf log2ram.tar.gz <span class="nb">cd </span>log2ram-master <span class="nb">chmod</span> +x install.sh <span class="o">&amp;&amp;</span> <span class="nb">sudo</span> ./install.sh <span class="nb">cd</span> ..
<span class="nb">rm</span> <span class="nt">-r</span> log2ram-master </code></pre></div></div> <p>Now before doing anything else we’ll need to restart the server again <code class="highlighter-rouge">shutdown now -r</code>. After your device has come back up, we’ll need to adjust log2ram’s configuration by editing the following file <code class="highlighter-rouge">/etc/log2ram.conf</code> and changing the following:</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Change </span><span class="py">SIZE</span><span class="p">=</span><span class="s">40M</span> <span class="c"># Let's increase the RAM log size to 100M </span><span class="py">SIZE</span><span class="p">=</span><span class="s">100M</span> <span class="c"># You also want to disable creating error report mails </span><span class="py">MAIL</span><span class="p">=</span><span class="s">true</span> <span class="c"># Change it to false </span><span class="py">MAIL</span><span class="p">=</span><span class="s">false</span> </code></pre></div></div> <p>I know you’re getting tired of it, but again we’ll need to restart the Pi in order for the new RAM/log configuration to properly come into effect. <code class="highlighter-rouge">shutdown now -r</code></p> <h2 id="update-raspberry-pis-bootloader">Update Raspberry Pi’s bootloader</h2> <p>If your device is pretty fresh out of the box I can almost guarantee that your Pi will need to have its bootloader updated.
I’m just going to list the steps taken in order to verify and apply a bootloader update</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get update <span class="o">&amp;&amp;</span> apt-get <span class="nb">install </span>rpi-eeprom <span class="c"># enable checking for bootloader updates</span> systemctl unmask rpi-eeprom-update <span class="c"># For the O/S to check for a bootloader update</span> rpi-eeprom-update <span class="c"># If the results show an update you'll need the Pi to prepare the update</span> rpi-eeprom-update <span class="nt">-a</span> <span class="c"># In order for the bootloader update to be applied a restart is required</span> shutdown now <span class="nt">-r</span> </code></pre></div></div> <p>If your device requires an update your results should look similar to the following screenshot.</p> <p><img src="/images/posts/setting_up_a_pihole/bootloader_update.png" alt="Bootloader Update" class="img-fluid sm-w-100" /></p> <h2 id="update-pihole-blocklists">Update PiHole Blocklists</h2> <p>By default the basic list that your PiHole uses is pretty decent; it blocks quite a bit of the heavy ad and tracking systems, but I prefer to block more. This is because as I watched my traffic I noticed several of the devices in my house were still sending requests to various tracking URLs, including my smart TV, my kids’ tablets, my printer, etc.</p> <p><img src="/images/posts/setting_up_a_pihole/pihole_stats.png" alt="Pihole Request Stats" class="img-fluid w-50 sm-w-100" /></p> <p>First you’ll want to set up your <a href="https://docs.pi-hole.net/guides/whitelist-blacklist/">blocklist</a> of DNS requests to start blocking as many trackers as you possibly can.
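If you end up gathering blocklist URLs from several sources, it helps to merge and dedupe them before loading them into the dashboard. A small sketch — the filenames and URLs below are placeholders, not real lists:

```shell
# Merge several blocklist files, drop comments/blank lines, and dedupe.
# (Sketch: list1.txt / list2.txt and the URLs are placeholder examples.)
printf 'https://example.com/ads.txt\n# a comment\nhttps://example.com/trackers.txt\n' > list1.txt
printf 'https://example.com/ads.txt\nhttps://example.com/malware.txt\n' > list2.txt

cat list1.txt list2.txt | grep -v '^#' | grep -v '^$' | sort -u > blocklists.txt
cat blocklists.txt   # three unique URLs remain
```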
My list of blocklists ended up including about 2 million different entries and blocks anywhere between 40-70% of my daily traffic <em>(<a href="https://gist.githubusercontent.com/tarellel/40296f278405e48365cf91b319a9dd3d/raw/af63425d1506dbb03aa47b57aae7ed32bbd7f92a/PiHole_Blocklists.txt">my list of blocklists</a>)</em>. Once you add these lists to your PiHole’s blocklist and update your Pi’s gravity list, you’ll almost instantly notice pages loading faster and your traffic congestion massively reduced.</p> <p>Now let’s add some <a href="https://docs.pi-hole.net/ftldns/regex/tutorial/">regex</a> filters to catch any DNS requests that haven’t been caught by the blocklists. The <a href="https://gist.githubusercontent.com/tarellel/40296f278405e48365cf91b319a9dd3d/raw/af63425d1506dbb03aa47b57aae7ed32bbd7f92a/PiHole_Regex_filers.txt">regex filters I use</a> I believe I pieced together from a few Reddit posts. They are specifically set up to catch any DNS requests that contain the phrase <code class="highlighter-rouge">tracker, traffic, ads, analytics</code> or various other phrases.</p> <p>Next you’ll need to add your whitelist; mine is a bit liberal and I need to go through and trim it down. But I started out by making it pretty broad, because otherwise specific services and devices would no longer work on my home network since their uptime ping-backs were completely blocked. These included our Xbox, Spotify, updating my kids’ Android devices, updating our LG Smart TV, using our Plex server, Hulu, accessing Namecheap, and unblocking Facebook <em>(which I’d prefer to block, but my wife can’t live without)</em>. My whitelist can also be found on <a href="https://gist.githubusercontent.com/tarellel/40296f278405e48365cf91b319a9dd3d/raw/af63425d1506dbb03aa47b57aae7ed32bbd7f92a/PiHole_Whitelist.txt">github as a gist</a>.
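Before loading regex filters like these into the PiHole, it’s worth sanity-checking them against sample hostnames with <code class="highlighter-rouge">grep -E</code>. The pattern below is a simplified stand-in for the filters in the gist, not the exact contents:

```shell
# Sanity-check a simplified tracker/ad regex against sample hostnames.
# (The pattern and hostnames are illustrative, not the actual gist filters.)
PATTERN='(^|\.)(ads?|tracker|analytics|telemetry)[0-9]*\.'

for host in ads.example.com tracker7.cdn.net www.example.com; do
  if echo "$host" | grep -Eq "$PATTERN"; then
    echo "BLOCK $host"
  else
    echo "ALLOW $host"
  fi
done
# → BLOCK ads.example.com
# → BLOCK tracker7.cdn.net
# → ALLOW www.example.com
```

The anchors matter: requiring the keyword to sit at the start of a label (after <code class="highlighter-rouge">^</code> or a dot) keeps a hostname like <code class="highlighter-rouge">www.example.com</code> from matching on a coincidental substring.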
After adding all these URLs and snippets to your PiHole’s lists, you’ll also want to update and reload your PiHole’s gravity list again. This ensures anything added to the whitelist won’t still be filtered and anything matching the regex filters you added will be properly blocked.</p> <h2 id="applications-to-use">Applications to use</h2> <p>In order to VPN into your network <em>(if you want to use the PiHole when outside your network)</em>, you’ll need to download the appropriate VPN client. If you are using an iPhone you’ll need to use the <a href="https://apps.apple.com/us/app/openvpn-connect/id590379981">OpenVPN</a> iOS client, for Android you can download the <a href="https://play.google.com/store/apps/details?id=net.openvpn.openvpn">OpenVPN</a> app, and for macOS I used <a href="https://tunnelblick.net/downloads.html">Tunnelblick</a>.</p> <p>In order to use the VPN client on your iPhone you will need to connect it to your computer, similar to how you would sync data between the two or do a backup. <img src="/images/posts/setting_up_a_pihole/trust_iphone.png" alt="Trust this computer prompt" class="img-fluid w-50 sm-w-100" /></p> <p>Then, in the list of options available for your phone, you need to click the Files tab; this will allow you to access the files on your phone. We’ll now need to find the VPN key we generated and <code class="highlighter-rouge">scp</code>‘ed from the Raspberry Pi earlier, and drop it on the OpenVPN application. It may prompt you asking if you’d like to trust the file being transferred to your phone. Accept it and it should set up the VPN connection configuration for you on your phone.</p> <p><img src="/images/posts/setting_up_a_pihole/iphone_application_list.png" alt="iPhone application list" class="img-fluid w-3/12 sm-w-100" /></p> <p>This won’t work while you’re in the same network, but if you turn off WiFi or connect from outside the network you should be able to connect like the pictures below.
(if you added a password to your VPN key you may also need to occasionally input the password before it will allow you to connect or use the key).</p> <p><img src="/images/posts/setting_up_a_pihole/VPN_list.png" alt="List of VPNs on iPhone" class="img-fluid w-3/12 sm-w-100" /> <img src="/images/posts/setting_up_a_pihole/VPN_connected.png" alt="Connected to VPN through the Pi-Hole" class="img-fluid w-3/12 sm-w-100" /></p> <h3 id="references">References</h3> <ul> <li><a href="https://docs.pi-hole.net/">PiHole Documentation</a></li> <li><a href="https://www.raspberrypi.org/documentation/raspbian/">Raspbian Documentation</a></li> </ul>BrandonDocker Swarm Persistent Storage2019-05-15T00:00:00+00:002019-05-15T00:00:00+00:00repo://posts.collection/_posts/2019-05-15-docker-swarm-persistent-storage.md<p>Unless you’ve been living under a rock, you should need no explanation of what <a href="https://www.docker.com/">Docker</a> is. Using Docker over the last year has drastically improved my deployment experience and, coupled with <a href="https://about.gitlab.com/">GitLab’s</a> CI/CD, has made deployments extremely easy. Mind you, not all our applications being deployed have the same requirements; some are extremely simple and others are extraordinarily complex. So when we start a new project we have a base docker build to begin from, and based on the application’s requirements we add/remove as needed.</p> <h3 id="a-little-about-docker-swarm">A little about Docker Swarm</h3> <p>For the large majority of our applications, having a volume associated with the deployed containers and storing information in the database fits the application’s needs.</p> <p>In front of all our applications we used to use <a href="https://proxy.dockerflow.com/">Docker Flow Proxy</a> to quickly integrate our application into our deployed environment and assign it a subdomain based on its service.
For a few months we experienced issues with the proxy hanging up, resources not being cleared, and lots of dropped connections. Since then I have rebuilt our docker infrastructure and now we use <a href="https://traefik.io/">Traefik</a> for our proxy routing, and it has been absolutely amazing! It’s extremely fast, very robust and extensible, and easy to manipulate to fit your needs. Heck, before even deploying it I was using <a href="https://docs.docker.com/compose/">docker-compose</a> to build a local network proxy to ensure it was what we needed. While Traefik was running in compose I was hitting domains such as <code class="highlighter-rouge">http://whoami.localhost/</code> and this was a great way to learn the basic configuration before pushing it into a staging/production swarm environment. <em>(How we got started with Traefik is a whole other post of its own.)</em></p> <p>Now back to our docker swarm. I know the big thing right now is <a href="https://kubernetes.io/">Kubernetes</a>. But every organization has their specific needs, for their different environments, application types, and deployment mechanisms. In my opinion the current docker environment we’ve got running right now is pretty robust. We’ve got dozens of nodes, a number of deployment environments (cybersec, staging, and production), dozens of applications running at once, and some of them requiring a number of services in order to function properly.</p> <p>A few of the things that won me over on the docker swarm in the first place are its load-balancing capabilities, its fault tolerance, and the self-healing mechanism that it uses in case a container crashes, a node locks up or drops, or a number of other issues occur.
<em>(We’ve had a number of servers go down due to networking issues or a rack server crapping out, and with the docker swarm running you could never even tell we were having issues as an end user of our applications.)</em></p> <p><em class="small">(Below is an image showing traffic hitting the swarm. If you have an application replicated upon deployment, traffic will be distributed amongst the nodes to prevent bottlenecks.)</em></p> <p><img src="/images/posts/docker-swarm-persistent-storage/SwarmTraffic.svg" alt="Docker Swarm Traffic" class="img-fluid" /></p> <h3 id="why-would-you-need-persistent-storage">Why would you need persistent storage?</h3> <p>Since the majority of our applications are data orientated (with most of them hitting several databases in a single request), we hadn’t really had to worry about persistent storage. This is because once we deployed the applications, their volumes held all of their required assets and any data they needed was fetched from the database.</p> <p>The easiest way to explain volumes is that when a container is deployed to a node (if specified) it will put aside a section of storage specifically for that container. For example, say we have an application called DogTracker that was deployed on nodes A and B. This application can create and store files in its volumes on those nodes. But what happens when there’s an issue with the container on node A and the container cycles to node C? The data created by the container is left in the volume on node A and is no longer available until that application’s container cycles back to node A.</p> <p>And from this arises the problem we began to face. We were starting to develop applications that required files to be shared amongst each other. We also have numerous applications that require files to be saved and distributed without them being dumped into the database as a blob.
And these files were required to be available without cycling volumes and/or dumping them into the containers during build time. Because of this, we needed some form of persistent and distributed file storage across our containers.</p> <p><em class="small">(Below is an image showing how a docker swarm’s volumes are oriented)</em></p> <p><img src="/images/posts/docker-swarm-persistent-storage/DockerSwarm.svg" alt="Docker Swarm Diagram" class="img-fluid" /></p> <h3 id="how-we-got-around-this">How we got around this!</h3> <p>Now in this day and age there have got to be ways to get around this. There are at least 101 ways to do just about anything and it doesn’t always have to be the newest, shiniest toy everyone’s using. I know saying this while using Docker is kind of a hypocritical statement, but shared file systems have been around for decades. You’ve been able to mount network drives, ftp drives, have organization-based shared folders; the list can go on for days.</p> <p>But the big question is, how do we get a container to mount a local shared folder or distribute volumes across all swarm nodes? Well, there’s a whole list of distributed filesystems and modern storage mechanisms in the <a href="https://docs.docker.com/engine/extend/legacy_plugins/">docker documentation</a>.
Below is a list of the top recommended alternatives I found for <a href="https://en.wikipedia.org/wiki/Distributed_File_System_(Microsoft)">distributed file systems</a> or <a href="https://en.wikipedia.org/wiki/Network_File_System">NFS’s</a> in the docker ecosystem for container development.</p> <ul> <li><a href="https://ceph.com/">Ceph</a></li> <li><a href="https://github.com/rancher/convoy">Convoy</a></li> <li><a href="https://github.com/rexray/rexray">RexRay</a></li> <li><a href="https://portworx.com/use-case/docker-persistent-storage/">PortWorx</a></li> <li><a href="https://github.com/pvdbleek/storageos">StorageOS</a></li> <li><a href="http://www.xtreemfs.org/">xtreemfs</a></li> </ul> <p>I know you’re wondering why we didn’t use <a href="https://aws.amazon.com/s3/">S3</a>, <a href="https://www.digitalocean.com/products/spaces/">DigitalOcean Spaces</a>, <a href="https://cloud.google.com/storage/docs/">GCS</a>, or some other cloud storage. But internally we have a finite amount of resources and we can spin up VMs and be rolling in a matter of moments. Especially considering we have built a number of <a href="https://www.ansible.com/">Ansible</a> playbooks to quickly provision our servers. Plus, why throw resources out on the cloud when it’s not needed? Especially when we can metaphorically create our own network-based file system and have our own cloud-style storage system.</p> <p><em class="small">(Below is an image showing how we want to distribute file system changes)</em></p> <p><img src="/images/posts/docker-swarm-persistent-storage/DockerSwarm_wStorage.svg" alt="" class="img-fluid" /></p> <p>After looking at several methods I settled on <a href="https://www.gluster.org/">GlusterFS</a>, a scalable network filesystem. Don’t get me wrong, a number of the other alternatives are pretty groundbreaking and some amazing work has been put into developing them.
But I don’t have thousands of dollars to drop on setting up a network file system that may or may not work for our needs. There were also several others that I did look pretty heavily into, such as <a href="https://github.com/pvdbleek/storageos">StorageOS</a> and <a href="https://ceph.com/">Ceph</a>. With StorageOS I really liked the idea of a container-based file system that stores, synchronizes, and distributes files to all other storage nodes within the swarm. And it may just be me, but Ceph looked like the prime competitor to Gluster. They both have their <a href="https://technologyadvice.com/blog/information-technology/ceph-vs-gluster/">high points</a> and seem to work very reliably. But at the time it wasn’t for me, and after using Gluster for a few months, I believe that I made the right choice and it’s served its purpose well.</p> <p><a href="https://www.gluster.org/"><img src="/images/posts/docker-swarm-persistent-storage/gluster-ant.png" alt="Gluster Ant" class="img-fluid w-3/12" /></a></p> <h4 id="gluster-notes">Gluster Notes</h4> <p><em>(<strong>Note:</strong> The following steps are to be used on a Debian/Ubuntu based install.)</em></p> <p>Documentation for using Gluster can be found on their <a href="https://docs.gluster.org/en/latest/">docs</a>. Their installation instructions are very brief and explain how to install the gluster packages, but they don’t go into depth on how to set up a Gluster network. I also suggest thoroughly reading through the documentation to understand Gluster volumes, bricks, pools, etc.</p> <h3 id="installing-glusterfs">Installing GlusterFS</h3> <p>To begin you will need to list all of the Docker Swarm nodes you wish to connect in the <code class="highlighter-rouge">/etc/hosts</code> files of each server.
On Linux (Debian/Ubuntu), you can get the current node’s IP address by running the following command <code class="highlighter-rouge">hostname -I | awk '{print $1}'</code></p> <p><em class="fa fa-info-circle text-primary"> </em> <em class="small">(The majority of the commands listed below need to be run on each and every node simultaneously unless specified. To do this I opened a number of terminal tabs and connected to each server in a different tab.)</em></p> <div class="language-config highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># /etc/hosts </span><span class="m">10</span>.<span class="m">10</span>.<span class="m">10</span>.<span class="m">1</span> <span class="n">staging1</span>.<span class="n">example</span>.<span class="n">com</span> <span class="n">staging1</span> <span class="m">10</span>.<span class="m">10</span>.<span class="m">10</span>.<span class="m">2</span> <span class="n">staging2</span>.<span class="n">example</span>.<span class="n">com</span> <span class="n">staging2</span> <span class="m">10</span>.<span class="m">10</span>.<span class="m">10</span>.<span class="m">3</span> <span class="n">staging3</span>.<span class="n">example</span>.<span class="n">com</span> <span class="n">staging3</span> <span class="m">10</span>.<span class="m">10</span>.<span class="m">10</span>.<span class="m">4</span> <span class="n">staging4</span>.<span class="n">example</span>.<span class="n">com</span> <span class="n">staging4</span> <span class="m">10</span>.<span class="m">10</span>.<span class="m">10</span>.<span class="m">5</span> <span class="n">staging5</span>.<span class="n">example</span>.<span class="n">com</span> <span class="n">staging5</span> </code></pre></div></div> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Update &amp; Upgrade all installed packages</span> apt-get update <span class="o">&amp;&amp;</span> apt-get upgrade <span
class="nt">-y</span> <span class="c"># Install gluster dependencies</span> <span class="nb">sudo </span>apt-get <span class="nb">install </span>python-software-properties <span class="nt">-y</span> </code></pre></div></div> <p>Add the GlusterFS <a href="https://itsfoss.com/ppa-guide/">PPA</a> package to the list of trusted packages, so we can install from a community repository.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>add-apt-repository ppa:gluster/glusterfs-3.10<span class="p">;</span> <span class="nb">sudo </span>apt-get update <span class="nt">-y</span> <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>apt-get update </code></pre></div></div> <p>Now let’s install gluster</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> glusterfs-server attr </code></pre></div></div> <p>Now, before starting the Gluster service, I had to copy some files into systemd <em>(you may or may not have to do this)</em>.
But since Gluster was developed by <a href="https://www.redhat.com/en/technologies/storage/gluster">RedHat</a> primarily for <a href="https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux">RedHat</a> and <a href="https://www.centos.org/">CentOS</a>, I had a few issues starting the system service.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo cp</span> /etc/init.d/glusterfs-server /etc/systemd/system/ </code></pre></div></div> <p>Let’s start and enable the glusterfs system service</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl <span class="nb">enable </span>glusterfs-server<span class="p">;</span> systemctl start glusterfs-server </code></pre></div></div> <p>This step isn’t necessary, but I like to verify that the service is enabled and running properly.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Verify the gluster service is enabled</span> systemctl is-enabled glusterfs-server <span class="c"># Check the system service status of the gluster-server</span> systemctl status glusterfs-server </code></pre></div></div> <p>If for some reason you haven’t done this yet, each and every node should have its own SSH key generated.</p> <p><em class="small">(The only reason I can think of why they wouldn’t have a different key is if a VM was provisioned and then cloned for similar use across a swarm.)</em></p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># This is to generate a very basic SSH key, you may want to specify a key type such as ED25519 or bit length if required.</span> ssh-keygen <span class="nt">-t</span> rsa </code></pre></div></div> <p>Depending on your Docker Swarm environment and which server you’re running as a manager, you’ll probably want one of the node managers to also be a gluster node manager as
well. I’m going to say server <code class="highlighter-rouge">staging1</code> is one of our node managers, so on this server we’re going to probe all other gluster nodes to add them to the gluster pool. (Probing essentially tells every server on the list to connect to the others.)</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster peer probe staging1<span class="p">;</span> gluster peer probe staging2<span class="p">;</span> gluster peer probe staging3<span class="p">;</span> gluster peer probe staging4<span class="p">;</span> gluster peer probe staging5<span class="p">;</span> </code></pre></div></div> <p>It’s not required, but probably good practice to ensure all of the nodes have connected to the pool before setting up the file system.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster pool list <span class="c"># =&gt; You should get results similar to the following</span> UUID Hostname State a8136a2b-a2e3-437d-a003-b7516df9520e staging3 Connected 2a2f93f6-782c-11e9-8f9e-2a86e4085a59 staging2 Connected 79cb7ec0-f337-4798-bde9-dbf148f0da3b staging4 Connected 3cfc23e6-782c-11e9-8f9e-2a86e4085a59 staging5 Connected 571bed3f-e4df-4386-bd46-3df6e3e8479f localhost Connected <span class="c"># You can also run the following command for another view of the results</span> gluster peer status </code></pre></div></div> <p>Now let’s create the gluster data storage directories <em>(<strong>It’s very important you do this on every node.</strong> This is because this directory is where all gluster nodes will store the distributed files locally.)</em></p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mkdir</span> <span class="nt">-p</span> /gluster/brick </code></pre></div></div> <p>Now let’s create a gluster volume across all nodes (again run this on the master 
node/node manager).</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>gluster volume create staging-gfs replica 5 staging1:/gluster/brick staging2:/gluster/brick staging3:/gluster/brick staging4:/gluster/brick staging5:/gluster/brick force </code></pre></div></div> <p>The next step is to start the GlusterFS volume so it begins synchronizing across all nodes.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster volume start staging-gfs </code></pre></div></div> <p>This step is also not required, but I prefer to verify that the gluster volume replicated across all of the designated nodes.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster volume info </code></pre></div></div> <p>Now let’s ensure gluster mounts its shared directory at <code class="highlighter-rouge">/mnt</code>, especially on a reboot. 
<strong><em>(It’s important to run these commands on all gluster nodes.)</em></strong></p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>umount /mnt <span class="nb">echo</span> <span class="s1">'localhost:/staging-gfs /mnt glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0'</span> <span class="p">|</span> <span class="nb">sudo </span>tee <span class="nt">-a</span> /etc/fstab <span class="nb">sudo </span>mount.glusterfs localhost:/staging-gfs /mnt <span class="nb">sudo chown</span> <span class="nt">-R</span> root:docker /mnt </code></pre></div></div> <p><em class="small">(You may have noticed the setting of file permissions using <code class="highlighter-rouge">chown -R root:docker</code>; this is to ensure docker will have read/write access to the files in the specified directory. The fstab line is appended with <code class="highlighter-rouge">sudo tee -a</code> because a plain <code class="highlighter-rouge">sudo echo ... &gt;&gt;</code> would perform the redirection without root privileges.)</em></p> <p>If for some reason you’ve already deployed your staging gluster-fs and need to remount the staging-gfs volume, you can run the following command. Otherwise you should be able to skip this step.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>umount /mnt<span class="p">;</span> <span class="nb">sudo </span>mount.glusterfs localhost:/staging-gfs /mnt<span class="p">;</span> <span class="nb">sudo chown</span> <span class="nt">-R</span> root:docker /mnt </code></pre></div></div> <p>Let’s list all of our mounted partitions and ensure that <code class="highlighter-rouge">staging-gfs</code> is listed.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">df</span> <span class="nt">-h</span> <span class="c"># =&gt; staging-gfs should be listed in the partitions/disks listed</span> localhost:/staging-gfs 63G 13G 48G 21% /mnt </code></pre></div></div> <p>Now that all of the work is pretty much done, here comes the fun part: let’s test to make sure it all works. 
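</p>

<p>Before eyeballing the sync with <code class="highlighter-rouge">ls</code>, it helps to have a byte-level check on hand. The sketch below compares checksums locally; on the real cluster the second <code class="highlighter-rouge">md5sum</code> would be run on another node (e.g. over ssh), and the file paths here are only illustrative:</p>

```shell
# Sketch: verify a replicated file is byte-identical by comparing checksums.
# Locally we fake the replica with `cp`; on the cluster the second checksum
# would come from another node, e.g. `ssh staging2 md5sum /mnt/output.log`.
dd if=/dev/urandom of=/tmp/gfs_test.log bs=1M count=2 status=none
cp /tmp/gfs_test.log /tmp/gfs_test_replica.log   # stand-in for the replica
src=$(md5sum /tmp/gfs_test.log | awk '{print $1}')
rep=$(md5sum /tmp/gfs_test_replica.log | awk '{print $1}')
[ "$src" = "$rep" ] && echo "replica matches"
# -> replica matches
```

<p>If the checksums ever differ across nodes, <code class="highlighter-rouge">gluster volume heal staging-gfs info</code> is the first place to look.</p>

<p>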
Let’s <code class="highlighter-rouge">cd</code> into the <code class="highlighter-rouge">/mnt</code> directory and create a few files to make sure they will sync across all nodes. <em>(I know this is one of the first things I wanted to try out.)</em> You can run one of the following commands to generate a random file in the <code class="highlighter-rouge">/mnt</code> directory. Depending on your servers and network connections, this should sync up across all nodes almost instantly. The way I tested this, I was in the <code class="highlighter-rouge">/mnt</code> directory on several nodes in several terminals. As soon as I issued the command, I ran the <code class="highlighter-rouge">ls</code> command in the other tabs. Depending on the file size, it may not sync across all nodes instantly, but it is at least accessible.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># This creates a 24MB file full of zeros</span> <span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/zero <span class="nv">of</span><span class="o">=</span>output.dat <span class="nv">bs</span><span class="o">=</span>24M <span class="nv">count</span><span class="o">=</span>1 <span class="c"># Creates a 2MB file of random characters</span> <span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/urandom <span class="nv">of</span><span class="o">=</span>output.log <span class="nv">bs</span><span class="o">=</span>1M <span class="nv">count</span><span class="o">=</span>2 </code></pre></div></div> <h3 id="using-glusterfs-with-docker">Using GlusterFS with Docker</h3> <p>Now that all the fun stuff is done, this would probably be a good time to look at docker <a href="https://docs.docker.com/storage/volumes/">volumes</a> and <a href="https://docs.docker.com/storage/bind-mounts/">bind</a> mounts. 
Usually docker will store a volume’s contents in a folder structure similar to the following: <code class="highlighter-rouge">/var/lib/docker/volumes/DogTracker/_data</code>.</p> <p>But in your <code class="highlighter-rouge">docker-compose.yml</code> or <code class="highlighter-rouge">docker-stack.yml</code> you can specify mount points for the docker volumes. If you look at the following <a href="https://en.wikipedia.org/wiki/YAML">YAML</a> snippet you will notice I’m saying to store the container’s <code class="highlighter-rouge">/opt/couchdb/data</code> directory on the local mount point <code class="highlighter-rouge">/mnt/staging_couch_db</code>.</p> <div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3.7'</span> <span class="na">services</span><span class="pi">:</span> <span class="na">couchdb</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">couchdb:2.3.0</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">bind</span> <span class="na">source</span><span class="pi">:</span> <span class="s">/mnt/staging_couch_db</span> <span class="na">target</span><span class="pi">:</span> <span class="s">/opt/couchdb/data</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">internal</span> <span class="na">deploy</span><span class="pi">:</span> <span class="na">resources</span><span class="pi">:</span> <span class="na">limits</span><span class="pi">:</span> <span class="na">cpus</span><span class="pi">:</span> <span class="s1">'</span><span class="s">0.30'</span> <span class="na">memory</span><span class="pi">:</span> <span class="s">512M</span> <span class="na">reservations</span><span class="pi">:</span> <span 
class="na">cpus</span><span class="pi">:</span> <span class="s1">'</span><span class="s">0.15'</span> <span class="na">memory</span><span class="pi">:</span> <span class="s">256M</span> </code></pre></div></div> <p>Now, as we previously demonstrated, any file(s) saved, created, and/or deleted in the <code class="highlighter-rouge">/mnt</code> directory will be synchronized across all of the GlusterFS nodes.</p> <p>I’d just like to mention this may not work for everyone, but this is the method that worked best for us. We’ve been running a number of different Gluster networks for several months now with no issues <em>thus far</em>.</p>BrandonRuby’s Year of Performance (2018)2018-06-04T00:00:00+00:002018-06-04T00:00:00+00:00repo://posts.collection/_posts/2018-06-04-rubys-year-of-performance-2018.md<p>Many people claim Ruby is no longer relevant, and quite a few people have moved on to Elixir, Go, Rust, and Node. This is because Ruby was not originally built for speed; it was built for ease of use. It does have its limitations, and Rails is a monstrosity with all its services, workers, etc. But I’ve never had an issue with this; I started using Ruby because of its ease of use. I came from a world of PHP and ugly spaghetti code to Ruby, where coding is more a work of art.</p> <p>But 2018 has been a big year for Ruby releases and trying to meet their <a href="https://blog.heroku.com/ruby-3-by-3">3x3</a> performance goals (<a href="https://developers.redhat.com/blog/2018/03/22/ruby-3x3-performance-goal/">RedHat Writeup</a>). And in the last year alone, <a href="https://twitter.com/tenderlove">Aaron Patterson (tenderlove)</a> and various contributors have made amazing advances in improving Ruby’s performance. 
And the addition of a <a href="https://www.ruby-lang.org/en/news/2018/05/31/ruby-2-6-0-preview2-released/">JIT compiler</a> to Ruby is no easy feat, and it looks like it will have a HUGE effect in helping Ruby regain some ground and shed the “slow language” label. I admit <a href="https://github.com/oracle/truffleruby">Ruby Truffle</a> has extreme potential to improve Ruby’s performance using the <a href="https://www.graalvm.org/">GraalVM</a>, but Oracle has a tendency to take people to court for using its various technological components, so I’m hesitant to depend on it.</p> <p>Another modification that I have found dramatically improves performance and reduces memory usage is adding <a href="http://jemalloc.net/">jemalloc</a>. By default, MRI uses the glibc malloc library.</p> <p>To use jemalloc with ruby, let’s first install the library so we can use it when compiling our ruby binaries.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># OSx</span> brew <span class="nb">install </span>jemalloc <span class="c"># Ubuntu/Debian</span> <span class="nb">sudo </span>apt-get update <span class="nb">sudo </span>apt-get <span class="nb">install </span>libjemalloc1 libjemalloc-dev </code></pre></div></div> <p>Many people prefer <a href="https://github.com/rbenv/ruby-build">ruby-build</a> for compiling new ruby versions, but I prefer <a href="https://rvm.io/">RVM</a> because of its ease of use. 
Now to compile with jemalloc, we need to add the flags for RVM to compile using the jemalloc library.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rvm <span class="nb">install </span>2.5 <span class="nt">-C</span> <span class="nt">--with-jemalloc</span> <span class="nt">--autolibs</span><span class="o">=</span>disable </code></pre></div></div> <p>I tend to use the <a href="https://fishshell.com/">Fish</a> shell; it has dramatically increased my productivity with its ease of use, auto-completion libraries, and great features. So to make compiling a new RVM instance easier, I created a function titled <code class="highlighter-rouge">rvm_install</code>. Now when I want to compile a new Ruby version with the jemalloc flag, I issue a command similar to <code class="highlighter-rouge">rvm_install 2.6</code> and wait. Below is a copy of the function I created. I know I should probably add some validation to verify the value of the argument, but I’m the only one using this on my computer and it works wonders for what I need it for.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install the specified Ruby version through RVM, with the jemalloc library included</span> <span class="k">function </span>rvm_install <span class="c"># verify a version number was specified</span> <span class="k">if </span>count <span class="nv">$argv</span> <span class="o">&gt;</span> /dev/null <span class="nb">echo</span> <span class="s2">"Installing Ruby-</span><span class="nv">$argv</span><span class="s2"> with jemalloc"</span> rvm <span class="nb">install</span> <span class="nv">$argv</span> <span class="nt">-C</span> <span class="nt">--with-jemalloc</span> <span class="k">else </span><span class="nb">echo</span> <span class="s2">"Please specify a version to install."</span> end end </code></pre></div></div> <p>Now let’s take a look at a few Ruby 
versions to test how well they perform with and without jemalloc. Sam Saffron’s <a href="https://github.com/SamSaffron/allocator_bench/blob/master/stress_mem.rb">stress test</a> is a great way to compare performance gains and memory allocation. And let me reiterate that I didn’t just run this test a single time and compare the results. These use the averages after running each stress test several times.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Results for Ruby_v2.5.0p0</span> <span class="no">Duration</span><span class="p">:</span> <span class="mf">9.542045</span> <span class="no">RSS</span><span class="p">:</span> <span class="mi">137860</span> <span class="c1"># Ruby_v2.5.0p0 with jemmalloc</span> <span class="no">Duration</span><span class="p">:</span> <span class="mf">7.420393</span> <span class="no">RSS</span><span class="p">:</span> <span class="mi">129616</span> <span class="no">Faster</span> <span class="n">w</span><span class="o">/</span><span class="no">Jemalloc</span><span class="p">:</span> <span class="mf">0.222347725251767</span> <span class="o">==&gt;</span> <span class="mi">22</span><span class="o">%</span> <span class="n">faster</span> <span class="c1"># Ruby_v2.6.0-preview2 with jemalloc</span> <span class="no">Duration</span><span class="p">:</span> <span class="mf">6.743956</span> <span class="no">RSS</span><span class="p">:</span> <span class="mi">144108</span> <span class="c1"># Faster than 2.5</span> <span class="mf">0.09115918792980371</span> <span class="o">==&gt;</span> <span class="mi">9</span><span class="o">%</span> <span class="n">faster</span> </code></pre></div></div> <p>As you can see, just adding jemalloc to MRI ruby adds quite a noticeable performance gain. 
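</p>

<p>For clarity, the “22% faster” figure above is just the relative drop in duration, which you can reproduce with a one-liner (the two numbers are the durations from the results above):</p>

```shell
# speedup = (baseline - jemalloc) / baseline, using the durations above
awk 'BEGIN { base = 9.542045; jm = 7.420393; printf "%.1f%% faster\n", (base - jm) / base * 100 }'
# -> 22.2% faster
```

<p>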
And even when not using a different memory allocator, each new Ruby release has had quite a significant impact on building on the language’s potential.</p> <p>~ <strong>NOTE:</strong> These tests were performed on a MBP with OS X 10.12, a 2.2GHz i7, 16GB of RAM, and an SSD. So the results may vary from device to device.</p>BrandonMikroTik - Mobile Rate-Limiting2018-05-05T00:00:00+00:002018-05-05T00:00:00+00:00repo://posts.collection/_posts/2018-05-05-mikrotik---mobile-rate-limiting.md<p><img src="/images/posts/mikrotik.png" alt="" class="img-fluid w-3/12" /></p> <h3 id="a-brief-intro">A Brief Intro</h3> <p>I’ve been using Mikrotik routers for about a year now and I’ve had nothing but an amazing experience thus far. I haven’t taken any of the <a href="https://mikrotik.com/training/">certification</a> training courses (MTCNA, MTCRE, MTCWE) for the Mikrotik routers; thus far all my learning has been more of a hands-on experience, following the RouterOS <a href="https://wiki.mikrotik.com/wiki/Manual:TOC">documentation</a>.</p> <p>The main reason I got into Mikrotik routers was that when I started at my current position a little over a year ago, I was expected to “make some magic happen”. You see, I work at a nonprofit that occupies several interconnected buildings, with the internet piped through a fiber line. I knew there were issues after my first couple of days, because they were using a consumer-grade NetGear (<a href="https://www.netgear.com/landings/ad7200/">NightHawk x10</a>) router to try and support everything. There’s generally anywhere between 30-70 people in the building at any time; and considering everyone has a laptop or 2, a VoIP Phone, a cellphone, some have PCs, and some people even have various smart devices connected as well. This was far too many devices for a basic home-grade router to manage. 
The network congestion was terrible: calls were always dropping, active IP addresses were being reassigned, and upload and download rates were abysmal.</p> <p>I had never seen anything like it, and it was definitely outside my scope of knowledge. I’m a web developer, not a networking administrator. But that’s the thing about working at a nonprofit: options are limited, and as the IT guy I’m expected to solve a large number of issues at any given time. To help resolve this issue I contacted an acquaintance of mine who manages the networking infrastructure of a chain of convenience stores, rest stops, and office buildings within the area.</p> <p>He recommended I use MikroTik routers because they’re cheap, highly efficient, easy to learn, and you don’t have to pay any crazy licensing fees. So my first introduction to the router was jumping straight into using the <a href="https://mikrotik.com/product/CCR1016-12G">CCR1016-12G</a> Cloud-Core Router. And I’ve had nothing but excellent results with it, thus far.</p> <hr /> <h3 id="now-to-the-issue">Now To the Issue</h3> <p>Across the facility there is a large number of devices assigned static IP addresses; i.e. printers, VoIP phones, and several special-purpose devices. Now we always have people coming from all over for meetings, training, consultation, etc. So in any one day you may have a few hundred devices connect to the network. And having a huge wireless network with a plethora of devices always connecting can be a nightmare. To ensure a quality network experience, I reduced DHCP lifespans to 10 minutes; this removes devices from the DHCP table after a short amount of time. I also set up a rollover subnet, so when the primary subnet is full it starts assigning IP addresses to a secondary subnet. I also have a default rate-limit (Queues) for when a new device connects to the network.</p> <p>Now one of the biggest issues we have is people using a ton of bandwidth on social media. 
Part of the problem is that we have several people who do marketing, advertising, and outreach across various social media platforms, including but not limited to facebook, twitter, youtube, and a few others. But as we all know, people like to stream videos, baseball games, concerts, and tons of other high-bandwidth streams on their phones when visiting these sites.</p> <p>To counter this, I decided it’d be a good idea to specifically rate-limit/set Queue speeds for mobile devices. This isn’t a foolproof method, but it does tend to catch about 99% of all mobile devices that connect to the network. It compares the device’s $hostname to a regular expression list of mobile device manufacturers. To set up the regular expression, go to IP&gt;Firewall&gt; [Tab] Layer7 Protocols, then create a new Firewall L7 protocol and label it <code class="highlighter-rouge">mobileDevices</code> with the following regex.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">^</span><span class="p">.</span><span class="o">*</span><span class="p">(</span><span class="nx">android</span><span class="o">|</span><span class="nx">ANDROID</span><span class="o">|</span><span class="nx">AppleWatch</span><span class="o">|</span><span class="nx">BLACKBERRY</span><span class="o">|</span><span class="nx">Galaxy</span><span class="o">|</span><span class="nx">HTC</span><span class="o">|</span><span class="nx">Huawei</span><span class="o">|</span><span class="nx">iPhone</span><span class="o">|</span><span class="nx">iPhne</span><span class="o">|</span><span class="nx">Moto</span><span class="o">|</span><span class="nx">SAMSUNG</span><span class="o">|</span><span class="nx">Xperia</span><span class="p">).</span><span class="o">*</span><span class="nx">$</span> </code></pre></div></div> <p>You’ll now create a scheduler by going to System&gt;Scheduler and clicking the blue plus button. 
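</p>

<p>Before wiring that regex into a scheduler script, you can sanity-check it from an ordinary shell, since <code class="highlighter-rouge">grep -E</code> approximates RouterOS’s regex matching. The hostnames below are made up for illustration, and the pattern is a trimmed copy of the manufacturer list above:</p>

```shell
# Hypothetical hostnames checked against the manufacturer pattern
regex='android|ANDROID|AppleWatch|BLACKBERRY|Galaxy|HTC|Huawei|iPhone|Moto|SAMSUNG|Xperia'
for host in Galaxy-S9 iPhone-Brandon DESKTOP-4F2K1; do
  if printf '%s' "$host" | grep -Eq "$regex"; then
    echo "$host => mobile"
  else
    echo "$host => pc"
  fi
done
# -> Galaxy-S9 => mobile
# -> iPhone-Brandon => mobile
# -> DESKTOP-4F2K1 => pc
```

<p>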
I went conservative and assigned it to run every 5 minutes; if you’re in a pretty busy office I’d say you may even want to use 2-minute intervals. I’d also say part of this script is unnecessary, but I decided to reassign the queue limits to non-mobile devices just as a secondary measure. The second loop in the script is because I have various VIP devices that need higher bandwidth limits than the rest of the network, so they are assigned static IP addresses with their own queue limits. Like I said, about half this script isn’t necessary, but I implemented it just to take precautions.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">/</span><span class="nx">queue</span> <span class="nx">simple</span> <span class="nx">remove</span> <span class="p">[</span><span class="sr">/queue simple find</span><span class="err">] </span><span class="p">:</span><span class="nb">global</span> <span class="nx">layer7</span> <span class="p">[</span><span class="sr">/ ip firewall layer7-protocol find name="mobileDevices"</span><span class="err">] </span><span class="p">:</span><span class="nb">global</span> <span class="nx">mobileDevices</span> <span class="p">[</span><span class="sr">/ ip firewall layer7-protocol get $layer7 regexp</span><span class="err">] </span><span class="p">:</span><span class="nb">global</span> <span class="nx">mobileLimit</span> <span class="dl">"</span><span class="s2">1024k/1024k</span><span class="dl">"</span> <span class="p">:</span><span class="nb">global</span> <span class="nx">pcLimit</span> <span class="dl">"</span><span class="s2">3M/5M</span><span class="dl">"</span> <span class="c1">// if the specified device's IP address is being assigned with DHCP</span> <span class="p">:</span><span class="nx">foreach</span> <span class="nx">i</span> <span class="k">in</span><span class="o">=</span><span class="p">[</span><span class="sr">/ip dhcp-server lease find 
dynamic=yes] do=</span><span class="err">{ </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">ipAddr</span> <span class="p">[</span><span class="sr">/ip dhcp-server lease get $i address]</span><span class="err">; </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">hostname</span> <span class="p">[</span><span class="sr">/ip dhcp-server lease get $i host-name]</span><span class="err">; </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">macAddress</span> <span class="p">[</span><span class="sr">/ip dhcp-server lease get $i mac-address</span><span class="err">] </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">queueName</span> <span class="dl">"</span><span class="s2">Client - $macAddress</span><span class="dl">"</span> <span class="p">:</span><span class="k">if</span> <span class="p">(</span><span class="nx">$hostname</span> <span class="o">~</span> <span class="nx">$mobileDevices</span><span class="o">=</span> <span class="kc">true</span><span class="p">)</span> <span class="k">do</span><span class="o">=</span><span class="p">{</span> <span class="c1">// if the device has been found to be a mobile device, reduce it's bandwidth - $mobileLimit</span> <span class="o">/</span><span class="nx">queue</span> <span class="nx">simple</span> <span class="nx">add</span> <span class="nx">name</span><span class="o">=</span><span class="dl">"</span><span class="s2">$queueName</span><span class="dl">"</span> <span class="nx">comment</span><span class="o">=</span><span class="dl">"</span><span class="s2">$hostname</span><span class="dl">"</span> <span class="nx">target</span><span class="o">=</span><span class="dl">"</span><span class="s2">$ipAddr</span><span class="dl">"</span> <span class="nx">max</span><span class="o">-</span><span class="nx">limit</span><span class="o">=</span><span class="dl">"</span><span class="s2">$mobileLimit</span><span class="dl">"</span> <span 
class="p">}</span> <span class="k">else</span><span class="o">=</span><span class="p">{</span> <span class="c1">// otherwise set the devices bandwidth limits to the default bandwidth limits - $pcLimit</span> <span class="o">/</span><span class="nx">queue</span> <span class="nx">simple</span> <span class="nx">add</span> <span class="nx">name</span><span class="o">=</span><span class="dl">"</span><span class="s2">$queueName</span><span class="dl">"</span> <span class="nx">comment</span><span class="o">=</span><span class="dl">"</span><span class="s2">$hostname</span><span class="dl">"</span> <span class="nx">target</span><span class="o">=</span><span class="dl">"</span><span class="s2">$ipAddr</span><span class="dl">"</span> <span class="nx">max</span><span class="o">-</span><span class="nx">limit</span><span class="o">=</span><span class="dl">"</span><span class="s2">$pcLimit</span><span class="dl">"</span> <span class="p">}</span> <span class="p">}</span> <span class="c1">// If device is connected with a static IP address or not using DHCP to assign it's IP</span> <span class="p">:</span><span class="nx">foreach</span> <span class="nx">i</span> <span class="k">in</span><span class="o">=</span><span class="p">[</span><span class="sr">/ip dhcp-server lease find dynamic=no] do=</span><span class="err">{ </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">ipAddr</span> <span class="p">[</span><span class="sr">/ip dhcp-server lease get $i address]</span><span class="err">; </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">hostname</span> <span class="p">[</span><span class="sr">/ip dhcp-server lease get $i host-name]</span><span class="err">; </span> <span class="p">:</span><span class="nx">local</span> <span class="nx">macAddress</span> <span class="p">[</span><span class="sr">/ip dhcp-server lease get $i mac-address</span><span class="err">] </span> <span class="p">:</span><span class="nx">local</span> <span 
class="nx">queueName</span> <span class="dl">"</span><span class="s2">Client - $macAddress</span><span class="dl">"</span> <span class="p">:</span><span class="nx">local</span> <span class="nx">vipLimit</span> <span class="dl">"</span><span class="s2">10M/10M</span><span class="dl">"</span> <span class="c1">// hostnames for VIP devices in which to have a high bandwidth limit - $vipLimit</span> <span class="p">:</span><span class="k">if</span> <span class="p">(</span><span class="nx">$hostname</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">VIPdesktops</span><span class="dl">"</span> <span class="o">||</span> <span class="nx">$hostname</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">VIPlaptops</span><span class="dl">"</span> <span class="o">||</span> <span class="nx">$hostname</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">VIPdevice</span><span class="dl">"</span><span class="p">)</span> <span class="k">do</span><span class="o">=</span><span class="p">{</span> <span class="o">/</span><span class="nx">queue</span> <span class="nx">simple</span> <span class="nx">add</span> <span class="nx">name</span><span class="o">=</span><span class="dl">"</span><span class="s2">$queueName</span><span class="dl">"</span> <span class="nx">comment</span><span class="o">=</span><span class="dl">"</span><span class="s2">$hostname</span><span class="dl">"</span> <span class="nx">target</span><span class="o">=</span><span class="dl">"</span><span class="s2">$ipAddr</span><span class="dl">"</span> <span class="nx">max</span><span class="o">-</span><span class="nx">limit</span><span class="o">=</span><span class="dl">"</span><span class="s2">$vipLimit</span><span class="dl">"</span> <span class="p">}</span> <span class="k">else</span><span class="o">=</span><span class="p">{</span> <span class="c1">// otherwise set the devices bandwidth limits to the default bandwidth limits - $pcLimit</span> <span 
class="o">/</span><span class="nx">queue</span> <span class="nx">simple</span> <span class="nx">add</span> <span class="nx">name</span><span class="o">=</span><span class="dl">"</span><span class="s2">$queueName</span><span class="dl">"</span> <span class="nx">comment</span><span class="o">=</span><span class="dl">"</span><span class="s2">$hostname</span><span class="dl">"</span> <span class="nx">target</span><span class="o">=</span><span class="dl">"</span><span class="s2">$ipAddr</span><span class="dl">"</span> <span class="nx">max</span><span class="o">-</span><span class="nx">limit</span><span class="o">=</span><span class="dl">"</span><span class="s2">$pcLimit</span><span class="dl">"</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>This isn’t a foolproof method, but it does catch the vast majority of mobile devices. This is because, by default, devices include their specific manufacturer as part of the device name, and rarely does anyone ever rename their device’s hostname. While watching the network traffic, I can say I’ve only seen a handful of mobile devices that weren’t labeled with either Samsung, Iphone, or Galaxy.</p> <p>~ <strong>Note:</strong> This may not be the best or the most effective script for what I wanted to achieve, but it accomplished what I needed to do. And it’s been tested and proven to work effectively for exactly what I needed.</p> <h5 id="updated-53118">UPDATED: <small class="text-muted">5/31/18</small></h5> <p>After updating our Routers’ packages and routerboard to v6.42.3, I began having issues with the script shown above. So I removed the Layer7 protocol and revamped the script to match hostnames against a variable string. 
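</p>

<p>The matching logic of the revised script below boils down to three bandwidth tiers: VIP, mobile, and the default PC limit. Distilled into portable shell as a sketch (RouterOS syntax differs; the limits and name lists are the ones used in these scripts):</p>

```shell
# Pick a queue limit for a hostname: VIP first, then mobile, else default PC
mobile='android|AppleWatch|BLACKBERRY|Galaxy|HTC|Huawei|iPad|iPhone|Moto|SAMSUNG|Xperia'
vip='VIPdesktops|VIPlaptops|VIPdevice|VIPservers|MikroTik|CapAC'
limit_for() {
  if echo "$1" | grep -Eq "$vip"; then echo "10M/10M"
  elif echo "$1" | grep -Eq "$mobile"; then echo "1024k/1024k"
  else echo "2M/5M"
  fi
}
limit_for "Galaxy-S9"      # -> 1024k/1024k
limit_for "VIPlaptops"     # -> 10M/10M
limit_for "DESKTOP-4F2K1"  # -> 2M/5M
```

<p>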
The new revision appears to run a bit faster than the previous version, and in my opinion it’s a bit easier to read.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/queue simple remove [/queue simple find]

:global mobileDevices "android|ANDROID|AppleWatch|BLACKBERRY|Galaxy|HTC|Huawei|iPad|iPhone|iphone|iPhne|Moto|SAMSUNG|Unknown|Xperia"
:global mobileLimit "1024k/1024k"
:global pcLimit "2M/5M"
:global vipDevices "VIPdesktops|VIPlaptops|VIPdevice|VIPservers|MikroTik|CapAC"
:global vipLimit "10M/10M"

/ip dhcp-server lease {
  :foreach i in=[find (dynamic &amp;&amp; status="bound")] do={
    :local activeAddress [get $i active-address]
    :local activeMacAddress [get $i active-mac-address]
    :local macAddress [get $i mac-address]
    :local hostname [get $i host-name]
    :local ipAddr [get $i address]
    :local queueName "Client - $macAddress"
    :if ($hostname ~ $mobileDevices) do={
      /queue simple add name="$queueName" comment="$hostname" target="$ipAddr" max-limit="$mobileLimit"
    } else={
      /queue simple add name="$queueName" comment="$hostname" target="$ipAddr" max-limit="$pcLimit"
    }
  }
  :foreach i in=[find (!dynamic &amp;&amp; status="bound")] do={
    :local activeAddress [get $i active-address]
    :local activeMacAddress [get $i active-mac-address]
    :local macAddress [get $i mac-address]
    :local hostname [get $i host-name]
    :local ipAddr [get $i address]
    :local queueName "Client - $macAddress"
    :if ($hostname ~ $vipDevices) do={
      /queue simple add name="$queueName" comment="$hostname" target="$ipAddr" max-limit="$vipLimit"
    } else={
      /queue simple add name="$queueName" comment="$hostname" target="$ipAddr" max-limit="$pcLimit"
    }
  }
}
</code></pre></div></div>Brandon2017 - Was a Year of Change2018-03-01T00:00:00+00:002018-03-01T00:00:00+00:00repo://posts.collection/_posts/2018-03-01-2017---a-year-of-change.md<p>Let me start off by saying, it’s been one heck of a year and so much has changed in my life, it’s almost unrecognizable. 
To start, I had a few career changes; I went from doing IT support for a car rental agency to being laid off. Being laid off was quite an uncomfortable experience, because I have kids and a family to support. While waiting to get hired I used the downtime as a learning experience and worked on various projects. At least, until I got some contract work. At first, it was a dream come true: working as a developer, getting paid for what I loved. There were several very smart developers I was working with and learning from.</p> <p>That was until the owner slowly started cutting everyone’s checks back and saying “Oh sorry, I’ll add it onto your next check.” And then after a few missed and partial payments, he completely disappeared, or at least stopped all contact with my team. One by one, we were all removed from the various SaaS services we had used. We were removed from the GitHub repos, time-trackers, invoicing, etc. Luckily I still have a local copy of the git repo, if I ever have to prove any of my work.</p> <p>From here I was kind of desperate for work and took the first thing that came along. For several months, I actually worked for a local locksmith, and it was quite the learning experience. Don’t get me wrong; I respect the hell out of any locksmith out there, it just wasn’t for me.</p> <p>But I lucked out, and for the past year or so I’ve been the Web Developer/IT Coordinator for a pretty sizable non-profit in my area. For the most part I really enjoy my job, but like any job, some days it can be quite trying. Some days I’m doing data entry and adding info to one of the databases, helping an older lady re-sync her email, or writing a sum formula in Excel. Other days I’m mounting a new rack router and patch panel, and running Cat6 through one of our several buildings. And sometimes I’m working on one of the several websites our organization operates. I’m always writing code to assist with certain tasks, though it may go unnoticed. 
But I think of it this way: if it gets noticed, then I did something wrong.</p> <p>One of my biggest accomplishments thus far has to be the upgrade of the organization’s networking. First of all, I’m not a networking/telecommunications expert, but I know when you’re not using the right equipment. When I first started, they had networking equipment that would have been great for a home setup. But when you have 70+ users at any given time, each with a VoIP phone, a laptop, and usually a cellphone hooked up to the network, a low-quality home router just doesn’t cut it. Their network was always congested, phone calls were choppy, and their DHCP server was only allowing a range of 70 devices, so every few minutes several people’s phones, laptops, or other devices were getting kicked from the network.</p> <p>I came in, pulled out several of the little crappy routers, added a few centralized switches, and hooked everything into a <a href="https://mikrotik.com/">MikroTik</a> Cloud Core Router. I’ve spent the last several months learning the ins, outs, and little gotchas of this machine (and I’m far from being an expert). At first, it was <em>extremely</em> intimidating, but the more I play with it, the more I’ve come to love and appreciate it. I’m absolutely sold on MikroTik routers now; they’re great for home, business, or whatever needs you may have. Heck, they even have some amazing hotspots you can manage through the routers as well. One of my other favorite things about MikroTik routers is that, unlike regular home routers, you can build scripts, schedules, or filters to do whatever you want with your device.</p> <p>My accomplishments aren’t limited to just networking; since starting my current position I’ve helped build several websites. 
Most of them are for various causes we’re associated with, and others promote our various programs or generate revenue for the organization. We’ve got an Outreach/Media guy and a Public Relations associate, but seeing as I’m really the only one who does any web development/programming, everything that’s been put on my plate has been a pick-and-choose opportunity. Some projects are quite exciting to work on, but occasionally I’ve had more than a full plate and had to pass on them.</p> <p>One of the great things about not being limited to a specific technology by the organization is that it has allowed me to pick up and learn various other technologies. Over the last several months I’ve picked up and learned <a href="https://vuejs.org/">VueJS</a>, which is an absolutely amazing JavaScript framework. I’ve never been a heavy JavaScript user; more like I had been disgusted with the state of JavaScript. jQuery made the world a mess, and old snippet sites such as <a href="http://dynamicdrive.com/">Dynamic Drive</a> are out of touch with modern needs. But <a href="https://babeljs.io/learn-es2015/">ES6</a> combined with <a href="https://reactjs.org/">React</a>, <a href="https://angular.io/">Angular</a>, <a href="https://vuejs.org/">Vue</a>, and <a href="https://www.emberjs.com/">Ember</a> have completely changed the dynamic of modern web applications. 
With the introduction to VueJS, my love for frontend development has returned, and I’ve been jumping at various projects to expand my knowledge and experience and to try new and exciting things.</p>BrandonMapping an ArcGIS Parcel dataset as a Mapbox tileset2016-12-15T00:00:00+00:002016-12-15T00:00:00+00:00repo://posts.collection/_posts/2016-12-15-mapping-an-arcgis-parcel-dataset-as-a-mapbox-tileset.md<p>Before I begin on how I mapped out several large datasets for <a href="http://sanjuanmaps.com/">SanJuanMaps</a>, I would like to say that there was plenty of <a href="https://www.mapbox.com/gallery/">inspiration</a> behind my actions. I was especially inspired by <a href="https://twitter.com/caged">Justin Palmer’s</a> <a href="http://labratrevenge.com/pdx/#12/45.4800/-122.6706">The Age of a City</a>, where he tastefully mapped out the building ages of Portland, Oregon. But what sets my county maps apart from various others is that I didn’t just map out buildings, people, or objects; I mapped out several sets of data, including building age, building zoning types, current active well sites, and recent crimes.</p> <h2 id="loading-the-parcel-map">Loading the Parcel Map</h2> <p>Depending on your city, state, or county, it may be a struggle to get a hold of your local GIS information. But for me, I was able to find my county’s GIS information for all buildings (not located on federal and/or local reservation land). It took jumping through a few hoops to make the data loadable and usable for processing.</p> <p>To begin, San Juan County currently allows you to download a 240MB zip file that’s supposed to contain all GIS information throughout the area. This zip file contains numerous files, but the most important one is a <a href="http://www.dbase.com/">dBase</a> file <code class="highlighter-rouge">*.dbf</code>. 
This database table is linked to several ArcGIS index files <code class="highlighter-rouge">*.atx</code>, meaning that each database column has a unique atx file (phys_address, ownerName, etc.). Now that we have access to the GIS dataset, there are a few ways we can access the data it contains. Now let me ask: who wants to pay <a href="http://www.arcgis.com/features/plans/pricing.html">hundreds</a> of dollars to access a GIS file that we’ll only be using a handful of times? Don’t get me wrong, ArcGIS is one hell of a product, and if you plan on working with GIS information often or with large datasets, it’s worth it. But since this is a once in a blue moon type thing for me, I’ll stick with using <a href="http://www.qgis.org/">QGIS</a>, which is an open-source GIS projection application. In order to get the application running there are several components that we’ll require, but in the long run it’s well worth it.</p> <p>Now let’s begin by starting up QGIS and opening up the <code class="highlighter-rouge">*.dbf</code> file mentioned before. Since this is a pretty significant file with a ton of datapoints and components, it will take a few moments for everything to load. Once the parcel project loads, you may be faced with a very intricate map that looks similar to a CAD wireframe. You’re on the right path; let’s slow down for a moment. An easy way to think of it is to consider these map components as vector points, similar to using the Pen Tool to build shapes in Adobe Illustrator.</p> <p><img src="/images/posts/mapping_arcgis/gis_grid500.jpg" alt="GIS Grid" class="img-fluid center-block" /></p> <p>Except this GIS map doesn’t just include points to create a layout of buildings (there is quite a bit of metadata included with each component on the map). If you zoom in, switch to the Identify Features/vector information tool, and click individual plots or buildings, you’ll notice it allows you to view information for each parcel. 
This consists of tons of information used by the county to identify the area (PARCELNO, GrossAcres, PhysAddr, ACCTTYPE, etc.). It doesn’t seem like anything important now, but when we convert the dBase table to a PostGIS table, each one of these attributes will be used as a column to identify each and every building throughout the county.</p> <h2 id="understanding-coordinate-reference-systems-and-projections">Understanding Coordinate Reference Systems and Projections</h2> <p><em>~ <strong>Note:</strong> This section is not required reading, but it helps with understanding why we need to perform a CRS conversion.</em></p> <p>Before we get started on converting the parcel dataset to a database table, let’s talk about <a href="https://docs.qgis.org/2.8/en/docs/gentle_gis_introduction/coordinate_reference_systems.html">Coordinate Reference Systems (CRS)</a>. Wait, why aren’t we just using latitude/longitude? With different maps, we use different coordinate systems. Geographic Coordinate Systems use latitude/longitude, while Projected Coordinate Systems use points (X and Y) that originate at a specified lat/long. Think of a Projected Coordinate System as a pane in a window frame; it’s got its own size, area, and dimensions. 
But no matter how you look at it, all these glass panes end up going together and making one big window.</p> <p>Let’s look at why there have been several methods developed to map the Earth’s surface (<a href="https://en.wikipedia.org/wiki/Mercator_projection">Mercator</a>, <a href="https://en.wikipedia.org/wiki/Sinusoidal_projection">Flamsteed-Sinusoidal</a>, <a href="https://en.wikipedia.org/wiki/Cylindrical_equal-area_projection">Equal area</a>, <a href="https://en.wikipedia.org/wiki/Azimuthal_equidistant_projection">Equidistant</a>, <a href="https://en.wikipedia.org/wiki/Albers_projection">Albers</a>, <a href="https://en.wikipedia.org/wiki/Lambert_projection">Lambert</a>, and <a href="http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=List_of_supported_map_projections">many more</a>). Now I know what you’re thinking: “What are all these different projections for? And why would I care?” Throughout the years, <a href="https://en.wikipedia.org/wiki/List_of_cartographers">cartographers</a> have experimented with creating the projections they thought were best for mapping out land, water, cities, and the various features of our planet. Many of them were quite accurate for their time, while others were slightly <a href="http://www.livescience.com/14754-ingenious-flat-earth-theory-revealed-map.html">narrow-minded</a> at depicting the earth.</p> <p>As we can see, it’s no easy task depicting a spherical planet and all of its features on a flat surface. The reference shape used to model the Earth for these projections is called an <a href="https://en.wikipedia.org/wiki/Earth_ellipsoid">ellipsoid</a>. No matter what sort of map is used, maps have always been an important asset for traveling, freight, military operations, and space travel. Some of these projections are better at depicting land forms, street planning, distances, or scale and accuracy, but each one has had its place and time. 
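</p> <p>As a quick sanity check of the coordinate formats themselves, here is a minimal Python sketch (the sample coordinate is purely illustrative) converting degrees/minutes/seconds to decimal degrees:</p>

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float) -> float:
    """Convert a degrees/minutes/seconds coordinate to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600

# 25 degrees, 35 minutes, 22.3 seconds
print(dms_to_decimal(25, 35, 22.3))  # roughly 25.5895278
```

<p>Both notations encode the same position; only the formatting differs.</p> <p>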
Now the point is, it <a href="http://www.directionsmag.com/site/latlong-converter/">doesn’t matter</a> whether you’re using degrees/minutes/seconds (25 degrees, 35 minutes, and 22.3 seconds), meters, or decimal degrees (25.58952777777778); whatever plotting method you use, you should always end up at approximately the same place.</p> <p>So <a href="https://www.maptoaster.com/maptoaster-topo-nz/articles/projection/datum-projection.html">what is a datum</a>? In my opinion, the easiest way to explain a <a href="http://oceanservice.noaa.gov/facts/datum.html">datum</a> is to say that it is a standard or method for mapping geographic coordinates to a projection. A datum can consist of various datasets or an individual GIS vector, but generally consists of a large set of data points to be mapped out (roads, buildings, elevation changes, land formations, water, etc.).</p> <p>Mapping with GIS can be a really complicated task, but the right tools make things quite a bit easier.</p> <h2 id="lets-convert-it-to-usable-data">Let’s convert it to usable data</h2> <p>The ArcGIS data we obtained from the county is geo-referenced using the <a href="https://en.wikipedia.org/wiki/GRS_80">GRS:80</a> reference system. QGIS will display this and various other attributes when loading up the vector map projection. 
When looking at the data, you should see something similar to the following attributes upon loading up the map.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>USER:1000001 (* Generated CRS (+proj=tmerc +lat_0=31 +lon_0=-107.8333333333333 +k=0.9999166666666667 +x_0=829999.9999999998 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs))
</code></pre></div></div> <p>We’re now well on our way to having usable data for mapping out the county. 
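</p> <p>Those <code class="highlighter-rouge">+key=value</code> attributes are easier to read once split apart. Here’s a rough, stdlib-only Python sketch (the proj string is copied from the output above):</p>

```python
# Split a generated proj string into its +key=value parameters so the
# CRS attributes are easier to read at a glance.
proj = ("+proj=tmerc +lat_0=31 +lon_0=-107.8333333333333 "
        "+k=0.9999166666666667 +x_0=829999.9999999998 +y_0=0 "
        "+ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs")

params = {}
for token in proj.split():
    key, _, value = token.lstrip("+").partition("=")
    # Flags like +no_defs have no value; store them as True.
    params[key] = value if value else True

print(params["ellps"])  # GRS80  (the reference ellipsoid)
print(params["units"])  # us-ft  (US survey feet, not meters or degrees)
```

<p>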
The only issue we’re going to have at this point is that the majority of online mapping tools use the <a href="https://confluence.qps.nl/pages/viewpage.action?pageId=29855173">EPSG:4326/WGS84</a> standard for projecting data. This can easily be fixed by reprojecting the layer to WGS84 coordinates. <strong>Go To:</strong> <code class="highlighter-rouge">Menu -&gt; Processing -&gt; Toolbox -&gt; QGIS geoalgorithms -&gt; Vector General Tools -&gt; Reproject Layer</code>. Alternatively, if you already have the Toolbox panel open, you can just search for the tool <code class="highlighter-rouge">Reproject Layer</code>.</p> <p><img src="/images/posts/mapping_arcgis/reproject_layer.png" alt="Reprojected Layer" class="img-fluid center-block" /></p> <p>You then set the “Target CRS” to EPSG:4326/WGS84 in the dropdown menu, but be warned: depending on your computer, this may take a while. Because some of the structures have numerous repetitive points, I had to reduce the point precision in order to keep my computer from freezing up when I would reproject the layer. This can be found in the toolbox under <code class="highlighter-rouge">Grass -&gt; Vector -&gt; v.clean</code>. If you do need to use this feature, I suggest you use it sparingly, because it can and will readjust the boundaries of various structures.</p> <p>Now let’s save the data as a <a href="https://en.wikipedia.org/wiki/Well-known_text">WKT</a> CSV file, to make it easier to load all the county’s properties and their attributes into a database. To do this, rather than saving the project, right-click the Layer and choose Save As. From here, make sure to change the CRS to <code class="highlighter-rouge">Default CRS (EPSG:4326 - WGS 84)</code> and the format to <code class="highlighter-rouge">CSV</code>. The next important thing is to force the geometry to be exported as <code class="highlighter-rouge">AS_WKT</code> and to set the CSV separator to <code class="highlighter-rouge">COMMA</code>. 
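</p> <p>The exported rows can later be spot-checked to confirm the geometry column really contains WGS84 WKT. A hedged Python sketch, using a sample ring from this dataset (a real check would read rows out of Buildings.csv instead):</p>

```python
import re

# Sample WKT geometry value from this dataset (one ring, three vertices).
wkt = ("POLYGON ((-108.195455993822 36.979941009114,"
       "-108.195717330387 36.987213809214,"
       "-108.19557299488 36.987214765187))")

# Each vertex is a "lon lat" pair; commas separate the pairs.
coords = [(float(lon), float(lat))
          for lon, lat in re.findall(r"(-?\d+\.\d+) (-?\d+\.\d+)", wkt)]

print(len(coords))  # 3 vertex pairs in this sample ring
# WGS84 sanity check: San Juan County sits around lon -107..-109, lat 36..37
assert all(-109 < lon < -107 and 36 < lat < 38 for lon, lat in coords)
```

<p>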
Now saving may take a while, because quite a few properties need their attributes converted to CSV.</p> <p>Many people may not realize it, but CSV files aren’t just large datafiles that you can open up in Excel. If you think about it, a CSV is pretty much a plain-text database file. It’s a pretty efficient way to access data, but it can be bulky and slow depending on your editor. With 81,000 or so rows, it locks up and freezes <a href="https://atom.io/">Atom</a> and even slows down <a href="https://www.sublimetext.com/3">Sublime</a> significantly, with 16GB of RAM. But we’re not going to be editing them in an IDE; we’ll be loading them into <a href="https://postgresapp.com/">Postgres</a>. This is because Postgres supports <a href="http://postgis.net/">GIS</a> data structures and is very efficient at processing large amounts of data.</p> <p>Before we import the data into Postgres, we need to ensure we have a proper table structure. One of my favorite tools for building database tables out of CSV files is <a href="https://csvkit.readthedocs.io/en/0.9.1/index.html">csvkit</a>. If you’ve never used it, it lets you print column names, convert CSV to JSON, reorder columns, import to SQL, and much more.</p> <p>Now normally we’d just attempt to build the database schema with the following command.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>csvsql <span class="nt">-i</span> postgresql Buildings.csv </code></pre></div></div> <p>But since there are so many rows, the Python script will probably freeze up or start consuming a HUGE amount of memory. So in order to make it more efficient, we’ll take just the first 20 rows of data. 
And using command-line piping, we’ll send the data to the csvsql tool for processing.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">head</span> <span class="nt">-n</span> 20 Buildings.csv | csvsql <span class="nt">--no-constraints</span> <span class="nt">--table</span> buildings </code></pre></div></div> <p>Depending on the information available, your SQL output should be something similar to the following. Several of these columns we’ll never even use, but for now it’s better not to chance corrupting any of our fields. Before you go any further, be sure to rename the geometry field to geom, and change its type to geometry if csvkit labeled it as a VARCHAR.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE TABLE "buildings" (
  geom geometry,
  ogc_fid INTEGER,
  parcelno BIGINT,
  accountno VARCHAR,
  book INTEGER,
  page INTEGER,
  grossacres FLOAT,
  accttype VARCHAR,
  ownername VARCHAR,
  ownername2 VARCHAR,
  owneraddr VARCHAR,
  ownctystzp VARCHAR,
  subdivis VARCHAR,
  legaldesc VARCHAR,
  legaldesc2 VARCHAR,
  weblink VARCHAR,
  physaddr VARCHAR,
  physcity VARCHAR,
  physzip VARCHAR
);
</code></pre></div></div> <p><strong>NOTE</strong>: Depending on the GIS software you used, you may run into various issues with the CSV file that need correcting before importing it; I did. The issues we’ll encounter involve the geometry field. Since CSV fields are separated by commas, Postgres tries to import the polygon’s coordinates as separate columns as well.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POLYGON ((-108.195455993822 36.979941009114)),1,2075189351198,R0051867,,,57.06,EXEMPT,UNITED STATES OF AMERICA US DEPT OF INTE,,6251 COLLEGE BLVD STE A,"FARMINGTON, NM 87402",,A TRACT OF LAND IN THE SESW AND NESW AND NWSE OF 153213 DESCRIBED ON PERM CARD BK.1172 PG.996,,http://propery.url,NM 170,LA_PLATA,
</code></pre></div></div> <p>In order to fix this, we need to quote the geometry column to make it a field of its own. With Sublime’s find/replace regex functionality, this is a very straightforward step. To find the rows with the issue, I used the following: <code class="highlighter-rouge">^(?!"POLYGON)</code>. 
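</p> <p>The same negative lookahead works outside of Sublime too. Here’s a hedged Python sketch (the sample rows are trimmed-down versions of the rows shown above) that finds the unquoted rows and wraps the geometry field in quotes:</p>

```python
import re

rows = [
    '"POLYGON ((-108.19 36.97))",1,R0051867',  # geometry already quoted
    'POLYGON ((-108.19 36.97)),2,R0051868',    # geometry unquoted -- broken
]

# The same negative lookahead: match rows that do NOT start with "POLYGON
needs_fix = re.compile(r'^(?!"POLYGON)')

fixed = []
for row in rows:
    if needs_fix.match(row):
        # Wrap everything up to the close of the outer ring in quotes.
        # (Non-greedy match; fine for simple single-ring polygons like these.)
        row = re.sub(r'^(POLYGON \(\(.*?\)\)),', r'"\1",', row)
    fixed.append(row)

print(fixed[1])  # "POLYGON ((-108.19 36.97))",2,R0051868
```

<p>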
Out of 81,000 or so rows, there were only a few hundred with this issue that needed to be changed.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">"POLYGON ((-108.195455993822 36.979941009114,-108.195717330387 36.987213809214,-108.19557299488 36.987214765187))"</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2075189351198</span><span class="p">,</span><span class="n">R0051867</span><span class="p">,,,</span><span class="mi">57</span><span class="p">.</span><span class="mi">06</span><span class="p">,</span><span class="n">EXEMPT</span><span class="p">,</span><span class="n">UNITED</span> <span class="n">STATES</span> <span class="k">OF</span> <span class="n">AMERICA</span> <span class="n">US</span> <span class="n">DEPT</span> <span class="k">OF</span> <span class="n">INTE</span><span class="p">,,</span><span class="mi">6251</span> <span class="n">COLLEGE</span> <span class="n">BLVD</span> <span class="n">STE</span> <span class="n">A</span><span class="p">,</span><span class="nv">"FARMINGTON, NM 87402"</span><span class="p">,,</span><span class="n">A</span> <span class="n">TRACT</span> <span class="k">OF</span> <span class="n">LAND</span> <span class="k">IN</span> <span class="n">THE</span> <span class="n">SESW</span> <span class="k">AND</span> <span class="n">NESW</span> <span class="k">AND</span> <span class="n">NWSE</span> <span class="k">OF</span> <span class="mi">153213</span> <span class="n">DESCRIBED</span> <span class="k">ON</span> <span class="n">PERM</span> <span class="n">CARD</span> <span class="n">BK</span><span class="p">.</span><span class="mi">1172</span> <span class="n">PG</span><span class="p">.</span><span class="mi">996</span><span class="p">,,</span><span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">propery</span><span class="p">.</span><span class="n">url</span><span
class="p">,</span><span class="n">NM</span> <span class="mi">170</span><span class="p">,</span><span class="n">LA_PLATA</span><span class="p">,</span> </code></pre></div></div> <h3 id="importing-the-data-into-postgres">Importing the data into Postgres</h3> <p>Now before we import the dataset, lets enable the <a href="http://postgis.net/">PostGIS</a> extension so the database can process the buildings vectors properly.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="n">EXTENSION</span> <span class="n">postgis</span><span class="p">;</span> </code></pre></div></div> <p>Now depending on how your managing your database, you either import the CSV file through pgAdmin or through the command line. I chose the the command line, because of its speed and convenience.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">copy</span> <span class="n">buildings</span> <span class="k">FROM</span> <span class="s1">'/Users/Tarellel/Desktop/SJC_GIS/Exported/Buildings.csv'</span> <span class="k">DELIMITER</span> <span class="s1">','</span> <span class="n">CSV</span> <span class="n">HEADER</span><span class="p">;</span> </code></pre></div></div> <p>Compared to loading data in IDE’s and/or Excel, Postgres is extremely fast at accessing, modifying, and deleting rows, columns, and fields. Depending on what data you plan on mapping you may not need to make any additional changes. 
I’m going to be mapping out all building ages in the county, so I needed to add a <code class="highlighter-rouge">built_in</code> column to the table for the year in which each structure was built.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">buildings</span> <span class="k">ADD</span> <span class="n">built_in</span> <span class="nb">integer</span><span class="p">;</span> </code></pre></div></div> <p>If you’re like me, you probably already started running queries to verify the integrity of the imported information. Something you may notice is that the structure’s geometry field no longer looks like <code class="highlighter-rouge">"POLYGON ((-108.195455993822 36.979941009114,-108.195717330387 36.987213809214,-108.19557299488 36.987214765187))"</code>. That is because PostGIS stores the locations in a binary format, but when queried the information is displayed as a hex-encoded string. 
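That hex string is just Well-Known Binary. If you’re curious, you can peek at its header in plain Ruby, no PostGIS required; a small sketch (the byte layout comes from the WKB format, and note that PostGIS’s extended WKB can also fold an SRID flag into the type field):

```ruby
# Decode the header of a hex-encoded WKB value: byte 0 is the byte
# order (1 = little-endian, 0 = big-endian), and the next 4 bytes
# are the geometry type (1 = Point, 2 = LineString, 3 = Polygon).
def wkb_header(hex)
  bytes  = [hex].pack('H*')                              # hex -> raw bytes
  endian = bytes.getbyte(0) == 1 ? :little : :big
  type   = bytes[1, 4].unpack1(endian == :little ? 'V' : 'N')
  { endian: endian, type: type }
end

wkb_header('0103000000')   # => {:endian=>:little, :type=>3}, i.e. a Polygon
```

Running it against the first ten characters of the value returned by the `SELECT geom` query below confirms the stored geometry really is a little-endian polygon.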
This makes it easier for PostGIS to store and project the data in various projection formats.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">geom</span> <span class="k">FROM</span> <span class="n">buildings</span> <span class="k">LIMIT</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">--------------------</span> <span class="mi">01030000000100000005000000</span><span class="n">F9A232DD00065BC034E0D442D7594240F0567E50FB055BC094B9B760D55942409E97C29CFE055BC0CAC4E0D8BB594240080B801404065BC0440327CEBD594240F9A232DD00065BC034E0D442D7594240</span> <span class="p">(</span><span class="mi">1</span> <span class="k">row</span><span class="p">)</span> <span class="c1">--------------------</span> </code></pre></div></div> <h3 id="lets-fetch-the-buildproperties-initial-build-date">Let’s fetch each property’s initial build date</h3> <p>In order to get each and every building’s initial build date, I ended up using Nokogiri to scan the county assessor’s property listings. Don’t get me wrong, I tried to look for an easier way to get the information. But they have no publicly accessible API for pulling data, and I never received a response from anyone I attempted to contact. 
So, I used the next best thing: scraped the county’s web site for property listing information.</p> <p><em>Fair warning:</em> this script was built to be a quick and simple solution, rather than being well formatted and following best practices.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'pg'</span> <span class="nb">require</span> <span class="s1">'open-uri'</span> <span class="nb">require</span> <span class="s1">'nokogiri'</span> <span class="nb">require</span> <span class="s1">'restclient'</span> <span class="c1"># Fetch current buildings across the County and determine their build date</span> <span class="k">class</span> <span class="nc">FetchYear</span> <span class="vi">@conn</span> <span class="o">=</span> <span class="kp">nil</span> <span class="c1">###</span> <span class="c1"># Set up the required variables for processing all properties</span> <span class="c1"># ----------</span> <span class="c1"># A connection to the database is required for requests and updates</span> <span class="k">def</span> <span class="nf">initialize</span> <span class="vi">@conn</span> <span class="o">=</span> <span class="no">PG</span><span class="p">.</span><span class="nf">connect</span><span class="p">(</span><span class="ss">dbname: </span><span class="s1">'SJC_Properties'</span><span class="p">)</span> <span class="vi">@base_link</span> <span class="o">=</span> <span class="s1">'http://county_assessor.url'</span> <span class="k">end</span> <span class="c1">###</span> <span class="c1"># Fetches the next X(number) of buildings</span> <span class="c1"># all selects require that built_in year field to be empty, to prevent an infinite loop</span> <span class="k">def</span> <span class="nf">fetch_buildings</span> <span class="vi">@conn</span><span class="p">.</span><span class="nf">exec</span><span class="p">(</span><span class="s1">'SELECT * FROM buildings WHERE weblink IS NOT
NULL AND built_in IS NULL LIMIT 50'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">results</span><span class="o">|</span> <span class="c1"># if any results are found, begin processing them to get the properties link/year build</span> <span class="k">if</span> <span class="n">results</span><span class="p">.</span><span class="nf">any?</span> <span class="n">results</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">result</span><span class="o">|</span> <span class="c1"># for some reason, some buildings are empty on most fields</span> <span class="c1"># if no weblink is supplied, skip it</span> <span class="c1"># ie: "parcelno"=&gt;"2063170454509"</span> <span class="k">next</span> <span class="k">if</span> <span class="n">result</span><span class="p">[</span><span class="s1">'weblink'</span><span class="p">].</span><span class="nf">nil?</span> <span class="n">link</span><span class="p">,</span> <span class="n">built_in</span> <span class="o">=</span> <span class="s1">''</span> <span class="n">link</span> <span class="o">=</span> <span class="n">fetch_weblink</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="n">link</span> <span class="o">||=</span> <span class="s1">''</span> <span class="c1"># Attempt to fetch the property's build_date</span> <span class="c1"># Note: some properties are EXEMPT and/or Vacant</span> <span class="c1"># so no built_date will be specified</span> <span class="n">built_in</span> <span class="o">=</span> <span class="p">(</span><span class="o">!</span><span class="n">link</span><span class="p">.</span><span class="nf">empty?</span> <span class="p">?</span> <span class="n">fetch_yearbuilt</span><span class="p">(</span><span class="n">link</span><span class="p">)</span> <span class="p">:</span> <span class="mi">0</span><span class="p">)</span> <span
class="n">update_build_year</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">built_in</span><span class="p">)</span> <span class="k">end</span> <span class="k">else</span> <span class="nb">puts</span> <span class="s1">'It appears all database rows have been processed'</span> <span class="nb">abort</span> <span class="k">end</span> <span class="k">end</span> <span class="k">end</span> <span class="kp">private</span> <span class="c1"># Private: Fetches the properties unique information URI from the requested</span> <span class="c1"># SJC assessors page</span> <span class="c1"># ----------------------------------------</span> <span class="c1"># building - all attributes for the current building/residence being processed</span> <span class="c1"># Returns: Returns the properties unique information UIR</span> <span class="k">def</span> <span class="nf">fetch_weblink</span><span class="p">(</span><span class="n">building</span><span class="p">)</span> <span class="n">link</span> <span class="o">=</span> <span class="p">[]</span> <span class="c1"># If it processes successfully attempt to load the Redidental/Commercial page</span> <span class="c1"># Other output the error</span> <span class="k">begin</span> <span class="c1"># Page required the visitor be logged in as a guest, with a GET form request</span> <span class="n">page</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">(</span> <span class="no">RestClient</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span> <span class="n">building</span><span class="p">[</span><span class="s1">'weblink'</span><span class="p">],</span> <span class="ss">cookies: </span><span class="p">{</span> <span class="ss">isLoggedInAsPublic: </span><span class="s1">'true'</span> <span class="p">}</span> <span class="p">)</span> <span class="p">)</span> <span class="c1"># Each property 
contains a unique URL similar to:</span> <span class="c1"># http://county_assessor.url/account.jsp?accountNum=R0070358&amp;doc=R0070358.COMMERCIAL1.1456528185534</span> <span class="c1"># Winthin the left sidebar there is a link table that contains the accttype</span> <span class="c1"># and based on the accttype/property-type each link will be different</span> <span class="k">case</span><span class="p">(</span><span class="n">building</span><span class="p">[</span><span class="s1">'accttype'</span><span class="p">])</span> <span class="c1"># Select the property when it is a class of Residential</span> <span class="k">when</span> <span class="s1">'RESIDENTIAL'</span><span class="p">,</span> <span class="s1">'MULTI_FAMILY'</span><span class="p">,</span> <span class="s1">'RES_MIX'</span> <span class="n">link</span> <span class="o">=</span> <span class="n">page</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'#left'</span><span class="p">).</span><span class="nf">xpath</span><span class="p">(</span><span class="s1">'//a[contains(text(), "Residential")]'</span><span class="p">).</span><span class="nf">css</span><span class="p">(</span><span class="s1">'a[href]'</span><span class="p">)</span> <span class="c1"># Process Commercial based Properties</span> <span class="k">when</span> <span class="s1">'COMMERCIAL'</span><span class="p">,</span> <span class="s1">'COMM_MIX'</span><span class="p">,</span> <span class="s1">'MH_ON_PERM'</span><span class="p">,</span> <span class="s1">'PARTIAL_EXEMPT'</span> <span class="n">link</span> <span class="o">=</span> <span class="n">page</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'#left'</span><span class="p">).</span><span class="nf">xpath</span><span class="p">(</span><span class="s1">'//a[contains(text(), "Commercial/Ag")]'</span><span class="p">).</span><span class="nf">css</span><span class="p">(</span><span 
class="s1">'a[href]'</span><span class="p">)</span> <span class="c1"># Some Vacant and EXEMPT properties only have land listed with no building data available</span> <span class="c1"># When this is the case or default to no type, return a 0.</span> <span class="c1"># Because no building info is currently available</span> <span class="k">when</span> <span class="s1">'EXEMPT'</span><span class="p">,</span> <span class="s1">'VACANT_LAND'</span><span class="p">,</span> <span class="s1">'AGRICULTURAL'</span> <span class="k">else</span> <span class="k">return</span> <span class="s1">''</span> <span class="k">end</span> <span class="c1"># Some properties have several links, pull the first/original building info</span> <span class="n">first_link</span><span class="p">(</span><span class="n">link</span><span class="p">)</span> <span class="k">if</span> <span class="o">!</span><span class="n">link</span><span class="p">.</span><span class="nf">empty?</span> <span class="k">rescue</span> <span class="no">OpenURI</span><span class="o">::</span><span class="no">HTTPError</span> <span class="o">=&gt;</span> <span class="n">e</span> <span class="k">if</span> <span class="n">e</span><span class="p">.</span><span class="nf">message</span> <span class="o">==</span> <span class="s1">'404 Not Found'</span> <span class="nb">puts</span> <span class="s1">'Page not found'</span> <span class="k">else</span> <span class="nb">puts</span> <span class="s1">'Error loading this page'</span> <span class="k">end</span> <span class="k">end</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">fetch_yearbuilt</span><span class="p">(</span><span class="n">link</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span> <span class="k">return</span> <span class="s1">''</span> <span class="k">if</span> <span class="n">link</span><span class="p">.</span><span class="nf">empty?</span> <span class="k">begin</span> <span class="c1"># Load the URL for the 
buildings summary</span> <span class="n">summary</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">(</span> <span class="no">RestClient</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span> <span class="n">link</span><span class="p">,</span> <span class="ss">cookies: </span><span class="p">{</span> <span class="ss">isLoggedInAsPublic: </span><span class="s1">'true'</span> <span class="p">}</span> <span class="p">)</span> <span class="p">)</span> <span class="n">yearbuilt</span> <span class="o">=</span> <span class="n">summary</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'tr'</span><span class="p">).</span><span class="nf">xpath</span><span class="p">(</span><span class="s1">'//span[contains(text(), "Actual Year Built")]'</span><span class="p">).</span><span class="nf">first</span><span class="p">.</span><span class="nf">parent</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s1">'span'</span><span class="p">).</span><span class="nf">last</span><span class="p">.</span><span class="nf">content</span> <span class="c1"># year must be stripped and turned into an into an integer</span> <span class="c1"># because it trails with an invisible space '&amp;nbsp;'</span> <span class="n">yearbuilt</span><span class="p">.</span><span class="nf">strip</span><span class="p">.</span><span class="nf">to_i</span> <span class="k">rescue</span> <span class="no">OpenURI</span><span class="o">::</span><span class="no">HTTPError</span> <span class="o">=&gt;</span> <span class="n">e</span> <span class="nb">puts</span> <span class="s1">'--------------------'</span> <span class="nb">puts</span> <span class="s1">'Error loading property summary page'</span> <span class="nb">puts</span> <span class="s2">"URL: </span><span class="si">#{</span><span class="n">link</span><span 
class="si">}</span><span class="s2">"</span> <span class="nb">puts</span> <span class="s1">'--------------------'</span> <span class="k">end</span> <span class="k">end</span> <span class="c1">###</span> <span class="c1"># Process the current link:</span> <span class="c1"># ----------</span> <span class="c1"># some properties have their attctype listed several times causing errors when processing</span> <span class="c1"># this is because of upgrades or additions added to the current property</span> <span class="c1"># Returns the first link for the property array</span> <span class="k">def</span> <span class="nf">first_link</span><span class="p">(</span><span class="n">link</span><span class="p">)</span> <span class="n">link</span><span class="p">.</span><span class="nf">length</span> <span class="o">&lt;=</span> <span class="mi">1</span> <span class="p">?</span> <span class="vi">@base_link</span> <span class="o">+</span> <span class="n">link</span><span class="p">.</span><span class="nf">attribute</span><span class="p">(</span><span class="s1">'href'</span><span class="p">).</span><span class="nf">to_s</span> <span class="p">:</span> <span class="vi">@base_link</span> <span class="o">+</span> <span class="n">link</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">attribute</span><span class="p">(</span><span class="s1">'href'</span><span class="p">).</span><span class="nf">to_s</span> <span class="k">end</span> <span class="c1">###</span> <span class="c1"># Update the database record with the year it was built</span> <span class="c1"># ----------</span> <span class="c1"># record (hash): The current record/addr in which to update</span> <span class="c1"># year (int): This will be the year/value that needs to be updated in the record</span> <span class="k">def</span> <span class="nf">update_build_year</span><span class="p">(</span><span class="n">record</span><span class="p">,</span> <span class="n">year</span><span 
class="o">=</span><span class="s1">'0'</span><span class="p">)</span> <span class="nb">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="n">record</span><span class="p">[</span><span class="s1">'accountno'</span><span class="p">]</span><span class="si">}</span><span class="s2"> &gt;/ </span><span class="si">#{</span><span class="n">record</span><span class="p">[</span><span class="s1">'physaddr'</span><span class="p">]</span><span class="si">}</span><span class="s2"> - </span><span class="si">#{</span><span class="n">year</span><span class="si">}</span><span class="s2">"</span> <span class="vi">@conn</span><span class="p">.</span><span class="nf">exec</span><span class="p">(</span><span class="s2">"UPDATE buildings SET built_in=</span><span class="si">#{</span><span class="n">year</span><span class="si">}</span><span class="s2"> WHERE accountno='</span><span class="si">#{</span><span class="n">record</span><span class="p">[</span><span class="s1">'accountno'</span><span class="p">]</span><span class="si">}</span><span class="s2">' AND physaddr='</span><span class="si">#{</span><span class="n">record</span><span class="p">[</span><span class="s1">'physaddr'</span><span class="p">]</span><span class="si">}</span><span class="s2">'"</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="n">fetch</span> <span class="o">=</span> <span class="no">FetchYear</span><span class="p">.</span><span class="nf">new</span> <span class="mi">500</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span> <span class="n">fetch</span><span class="p">.</span><span class="nf">fetch_buildings</span> <span class="k">end</span> </code></pre></div></div> <p>I ran this script over night and came back with a fully populated database and ready for to be fully utilized. 
Depending on the server load, each request usually took a few seconds per building/property. But one thing that really surprised me was that the county’s server had no rate limiting set up (at least none that I ever experienced). So I was getting results as fast as my script could run and connect. I’m sure the next morning it might have looked like a small DDoS attack by a script kiddie, with thousands of page loads from a single source. But I tried to minimize the effect by doing all my data retrieval in the middle of the night, when it would have as little impact as possible.</p> <h3 id="how-the-script-works">How the Script Works</h3> <p>Now, before you start scratching your head thinking “what the hell?”, let me explain. Each and every property has a unique weblink associated with the county assessor’s office. When you access the page, a cookie <code class="highlighter-rouge">isLoggedInAsPublic</code> is set to true via a GET request. I believe this is used to prevent general web scraping, because if the cookie isn’t set when the page is loaded, you are redirected to a user login.</p> <p>I know it looks like a mess and overcomplicated, but let me explain a few things. Some of the properties are owned by the same owner, so we can’t rely on <code class="highlighter-rouge">accountno</code> alone to update properties. And we can’t use <code class="highlighter-rouge">physaddr</code> alone because some properties have more than one building on them, so we match on both fields when updating.</p> <p>And what about the EXEMPT and Vacant properties? Well, the majority of them consist of Churches, Post Offices, Government Buildings, the Airport, and unclassified BLM land. And yes, there are quite a few in the general area. 
This is because the surrounding area is heavily funded by the government; we’re also a city bordering the edge of several reservations.</p> <h3 id="lets-convert-it-to-a-geojson-file-for-vectoring">Let’s convert it to a GeoJSON file for vectoring</h3> <p>For those of you who have never used it, Postgres has amazing support for exporting data as <a href="http://www.json.org/">JSON</a>. Some people seem to be unaware that you can export your database queries in different data formats, and some databases (such as Postgres) will even write the query results straight to a file.</p> <p>I can assure you, I didn’t get this right on the first or even second try. I’ll honestly admit it took me a few hours to build a query that used the columns I required and exported a valid <a href="http://geojson.org/">GeoJSON</a> format. I’d say I probably rebuilt the query maybe 20 times, each attempt completely different from the last. But in the end, I ended up with a swath of subqueries and assignments that exported the data extremely fast and well structured. As you can see below, the query may look like a complicated mess. 
The funny thing is, it’s a heck of a lot simpler than some of the other query selections I ended up trying to come up with.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">COPY</span> <span class="p">(</span> <span class="k">SELECT</span> <span class="n">row_to_json</span><span class="p">(</span><span class="n">collection</span><span class="p">)</span> <span class="k">FROM</span><span class="p">(</span> <span class="k">SELECT</span> <span class="s1">'FeatureCollection'</span> <span class="k">AS</span> <span class="k">type</span><span class="p">,</span> <span class="n">array_to_json</span><span class="p">(</span><span class="n">array_agg</span><span class="p">(</span><span class="n">feature</span><span class="p">))</span> <span class="k">AS</span> <span class="n">features</span> <span class="k">FROM</span><span class="p">(</span> <span class="k">SELECT</span> <span class="s1">'Feature'</span> <span class="k">AS</span> <span class="k">type</span><span class="p">,</span> <span class="n">row_to_json</span><span class="p">(</span> <span class="p">(</span><span class="k">SELECT</span> <span class="n">l</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="n">initcap</span><span class="p">(</span><span class="n">accttype</span><span class="p">)</span> <span class="k">AS</span> <span class="k">type</span><span class="p">,</span> <span class="n">built_in</span><span class="p">)</span> <span class="k">AS</span> <span class="n">l</span><span class="p">)</span> <span class="p">)</span> <span class="k">AS</span> <span class="n">properties</span><span class="p">,</span> <span class="n">ST_AsGeoJSON</span><span class="p">(</span><span class="n">lg</span><span class="p">.</span><span class="n">geom</span><span class="p">)::</span><span class="n">json</span> <span class="k">AS</span> <span class="n">geometry</span> <span class="k">FROM</span> <span 
class="n">buildings</span> <span class="k">AS</span> <span class="n">lg</span> <span class="k">WHERE</span> <span class="n">geom</span> <span class="k">IS</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="p">)</span> <span class="k">AS</span> <span class="n">feature</span> <span class="p">)</span> <span class="k">AS</span> <span class="n">collection</span> <span class="p">)</span> <span class="k">TO</span> <span class="s1">'/Users/Tarellel/Desktop/buildings.json'</span><span class="p">;</span> </code></pre></div></div> <p>Just a heads up you may be expecting a nice and beautiful json structure similar to the following.</p> <p><small>Example from <em><a href="http://geojson.org/geojson-spec.html">geojson-spec data</a></em></small></p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FeatureCollection"</span><span class="p">,</span><span class="w"> </span><span class="nl">"features"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Feature"</span><span class="p">,</span><span class="w"> </span><span class="nl">"geometry"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span><span class="w"> </span><span class="nl">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mf">102.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.5</span><span class="p">]},</span><span class="w"> </span><span class="nl">"properties"</span><span 
class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"prop0"</span><span class="p">:</span><span class="w"> </span><span class="s2">"value0"</span><span class="p">}</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Feature"</span><span class="p">,</span><span class="w"> </span><span class="nl">"geometry"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"LineString"</span><span class="p">,</span><span class="w"> </span><span class="nl">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">[</span><span class="mf">102.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">103.0</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">104.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">105.0</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="p">]</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"prop0"</span><span class="p">:</span><span class="w"> </span><span class="s2">"value0"</span><span class="p">,</span><span class="w"> 
</span><span class="nl">"prop1"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.0</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Feature"</span><span class="p">,</span><span class="w"> </span><span class="nl">"geometry"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Polygon"</span><span class="p">,</span><span class="w"> </span><span class="nl">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">[</span><span class="mf">100.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">101.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">101.0</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">100.0</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mf">100.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">]</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nl">"properties"</span><span 
class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"prop0"</span><span class="p">:</span><span class="w"> </span><span class="s2">"value0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"prop1"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"this"</span><span class="p">:</span><span class="w"> </span><span class="s2">"that"</span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>The query I used ends up exporting the data as a crunched (minified) dataset, which is great because it dramatically reduces filesize and is a must when optimizing for loading on webpages. Plus, no matter how you use it, it’s not like whatever processor you’re using is really going to worry about how beautiful your file is, as long as the data is valid.</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"FeatureCollection"</span><span class="p">,</span><span class="nl">"features"</span><span class="p">:[{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"Feature"</span><span class="p">,</span><span class="nl">"geometry"</span><span class="p">:{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"Point"</span><span class="p">,</span><span class="nl">"coordinates"</span><span class="p">:[</span><span class="mi">102</span><span class="p">,</span><span class="mf">0.5</span><span class="p">]},</span><span class="nl">"properties"</span><span class="p">:{</span><span class="nl">"prop0"</span><span class="p">:</span><span class="s2">"value0"</span><span
class="p">}},{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"Feature"</span><span class="p">,</span><span class="nl">"geometry"</span><span class="p">:{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"LineString"</span><span class="p">,</span><span class="nl">"coordinates"</span><span class="p">:[[</span><span class="mi">102</span><span class="p">,</span><span class="mi">0</span><span class="p">],[</span><span class="mi">103</span><span class="p">,</span><span class="mi">1</span><span class="p">],[</span><span class="mi">104</span><span class="p">,</span><span class="mi">0</span><span class="p">],[</span><span class="mi">105</span><span class="p">,</span><span class="mi">1</span><span class="p">]]},</span><span class="nl">"properties"</span><span class="p">:{</span><span class="nl">"prop0"</span><span class="p">:</span><span class="s2">"value0"</span><span class="p">,</span><span class="nl">"prop1"</span><span class="p">:</span><span class="mi">0</span><span class="p">}},{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"Feature"</span><span class="p">,</span><span class="nl">"geometry"</span><span class="p">:{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"Polygon"</span><span class="p">,</span><span class="nl">"coordinates"</span><span class="p">:[[[</span><span class="mi">100</span><span class="p">,</span><span class="mi">0</span><span class="p">],[</span><span class="mi">101</span><span class="p">,</span><span class="mi">0</span><span class="p">],[</span><span class="mi">101</span><span class="p">,</span><span class="mi">1</span><span class="p">],[</span><span class="mi">100</span><span class="p">,</span><span class="mi">1</span><span class="p">],[</span><span class="mi">100</span><span class="p">,</span><span class="mi">0</span><span class="p">]]]},</span><span class="nl">"properties"</span><span class="p">:{</span><span 
class="nl">"prop0"</span><span class="p">:</span><span class="s2">"value0"</span><span class="p">,</span><span class="nl">"prop1"</span><span class="p">:{</span><span class="nl">"this"</span><span class="p">:</span><span class="s2">"that"</span><span class="p">}}}]}</span><span class="w"> </span></code></pre></div></div> <p>Now that you’ve exported the dataset as a GeoJSON file, it’s all downhill from here. You’ll just need to use the Mapbox <a href="https://www.mapbox.com/api-documentation/#uploads">upload API</a> to upload the file. From here you can edit and <a href="https://www.mapbox.com/help/getting-started-studio-datasets/">style</a> the data and vector points however you see fit.</p> <hr /> <h3 id="references">References</h3> <ul> <li>Coordinate Reference Systems <ul> <li><a href="https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf">Overview of Coordinate Reference Systems</a></li> <li><a href="http://docs.qgis.org/2.0/en/docs/gentle_gis_introduction/coordinate_reference_systems.html">Understanding of Coordinate Reference Systems</a></li> <li><a href="http://www.ogcnetwork.net/node/338">Latitude and longitude coordinates are not unique</a></li> <li><a href="http://gis.stackexchange.com/questions/23690/is-wgs84-itself-a-coordinate-reference-system">Is WGS84 itself a Coordinate Reference System?</a></li> <li><a href="http://www.ogcnetwork.net/webgeoformats">Web-Friendly Geo formats</a></li> <li><a href="https://en.wikipedia.org/wiki/Spatial_reference_system">Spatial_reference_system</a></li> <li><a href="https://en.wikipedia.org/wiki/Geodetic_datum">Geodetic datum</a></li> <li><a href="http://gis.stackexchange.com/questions/664/whats-the-difference-between-a-projection-and-a-datum">What’s the difference between a projection and a datum?</a></li> </ul> </li>
<li>Round Earth, Flat Maps <ul> <li><a href="http://piecubed.co.uk/flat-world-map-lies/">Maps are all lies – Representing a spherical earth on a flat world map</a></li> <li><a href="http://www.nationalgeographic.com/features/2000/projections/">Round Earth, Flat Maps (National Geographic)</a></li> <li><a href="http://www.icsm.gov.au/mapping/images/MapProjections.pdf">Map Projections: From Spherical Earth to Flat Map</a></li> <li><a href="http://www.progonos.com/furuti/MapProj/Normal/CartHow/HowSanson/howSanson.html">Deducing the Sanson-Flamsteed (sinusoidal) Projection</a></li> <li><a href="http://www.progonos.com/furuti/MapProj/Dither/CartProp/DistPres/distPres.html">Useful Map Properties: Distances and Scale</a></li> </ul> </li> <li>GIS : Coordinate Converters <ul> <li><a href="http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Converting_Degrees_Minutes_Seconds_values_to_Decimal_Degree_values">Converting Degrees Minutes Seconds values to Decimal Degree values</a></li> <li><a href="http://www.pgc.umn.edu/tools/conversion">PGC Coordinate Converter</a></li> </ul> </li> <li>GeoForms <ul> <li><a href="http://www.ogcnetwork.net/webgeoformats">Web-Friendly Geo formats</a></li> </ul> </li> <li>Postgres / PostGIS <ul> <li><a href="http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut01">Getting Started With PostGIS</a></li> </ul> </li> </ul>BrandonEnhancing SSH Security2016-11-11T00:00:00+00:002016-11-11T00:00:00+00:00repo://posts.collection/_posts/2016-11-11-enhancing-ssh-security.md<p>With the whole security/privacy revolution rolling throughout the internet, it has recently come to my attention that specific services are being heavily focused on, while others are completely neglected. When securing your server and its services you need to attempt to secure the whole stack rather than a few specific services. For example, let’s take a look at web servers: they’re full of new ideas and technology, innovative, and always changing.
And recently the world was introduced to <a href="https://letsencrypt.org/">Let’s Encrypt</a>, which makes them and your data many times more secure using HTTPS (when properly configured).</p> <p>Another very important service that I’d like you to really think about is SSH. It’s another service that we use for tons of tasks, but you don’t think “Does it need to be secured?”, because everyone automatically seems to assume that it’s hardened by default. But in my own words, I’d say “It’s easy to use by default, but not necessarily ready for use”.</p> <p>I’m going to assume you’ve already hardened your SSH config with the basic settings (disable root login, AllowUsers, AllowGroups, MaxRetries, Fail2ban, LoginGraceTime, etc.). There are tons of configuration variables to set up; one place I would suggest starting would be <a href="http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap15sec122.html">Securing and Optimizing Linux</a> to get a basic configuration going.</p> <p>Now let’s get to the main topic of this post: enhanced security. Lately, I’ve seen quite a bit of talk suggesting everyone disable password authentication and only use key- and/or certificate-based authentication to secure SSH connections. I mean, really, how probable is it that someone will be able to generate an ssh-key exactly like the one below?</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4MCqOxhfmNP/uv8sl5EYSIqQSGuV4v17B50xMWXMcwTJrriOi9W6nNfxF8wu/i2HB1/nUUuSu+ZxQdYaD2cRkelzSGcq191z+b8lNY2lz+bxB547H465U5EQPlxJ5w7WU6QOV1hrZ7quWh/GYrDnU1aZrhEQ++EV5chQIUxoP3YgBSSb8D5Bpns9gR0IZVtlEqhF8eyCypZSiyKumQxK8e/W8Y8iHWCtRfvZbh+bnemCkHrXI/xc+CuCY9TQmWZkwFfTRBJQo3pmoRSZAZpqwYSl1kySrasw771rfy2rowFiCogkBYu2W9FTR2kMwB4btBrpA4Af97AjxwzkHyXUt [email protected] </code></pre></div></div> <p>Well, let’s take a look at how vulnerable your keys actually are.
If you’re like the vast majority of developers, using a <code class="highlighter-rouge">id_rsa.pub</code> or <code class="highlighter-rouge">id_dsa.pub</code> key for most of your logins and connections, then you may already be an easy target. You may be asking, how and why? Well, let’s take a look at some keys people use with GitHub. If, for example, you use GitHub or a number of other services, your public key is already out in the open. This is mainly for system-wide user verification purposes. For example, let’s take a moment to look at user <a href="https://github.com/GrahamCampbell.keys">GrahamCampbell’s publickey</a>; now that his key is available, what about <a href="https://gist.github.com/paulmillr/2657075/">everyone else’s</a>? If you still don’t trust me, try it out with your own profile/username <a href="https://github.com/username.keys">https://github.com/username.keys</a>. Now tell me, how many services do you actually use your id_rsa key with? If you’re using this one key for DigitalOcean, GitHub, Google Cloud, AWS, your company’s servers, your local NAS, Vagrant/Docker, and who knows what else, then your services are ready for the taking. Now the simple question is: how can we fix this? Well, we can begin by using the <a href="https://linux.die.net/man/5/ssh_config">ssh_config</a> file ‘~/.ssh/config’. This allows us to assign the appropriate keys to be used when trying to access specific services/addresses. This is part of the approach known as “security by obscurity”: you reduce the number of parameters an attacker can access by making things more complicated.
If someone manages to get ahold of one of your keys, they won’t have access to every single web service you use.</p> <p>An example of how you can tell ssh which key to use when connecting to specific services:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Host github.com user sample_user HostName github.com IdentityFile ~/.ssh/github_rsa IdentitiesOnly <span class="nb">yes </span>Host digitalocean.com HostName digitalocean.com IdentityFile ~/.ssh/digitalocean_key IdentitiesOnly <span class="nb">yes </span>And the list goes on... </code></pre></div></div> <p>Another step to make things a bit more complicated is to add a password to your SSH keys.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen <span class="nt">-t</span> rsa <span class="nt">-q</span> <span class="nt">-N</span> <span class="s1">'sample_password'</span> <span class="nt">-f</span> ~/.ssh/sample_id <span class="nt">-C</span> <span class="s2">"[email protected]"</span> </code></pre></div></div> <p>You probably know by now that adding a password to your keys can be a pain in the butt, because every time you connect to a service, do a push, etc., it’ll ask you for that crazy password you used <code class="highlighter-rouge">XL7pa5wnV/nQgUqi5mf7oQ6uG0hk5NwGh+5OYU+Mu6</code>. Well, you can solve this by using a tool such as <a href="http://www.funtoo.org/Keychain">keychain</a> and/or your ssh-agent with the <a href="https://linux.die.net/man/1/ssh-add">ssh-add</a> command. If configured properly, they require you to input the password only once per login session.
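</p> <p>As a quick sketch of that agent workflow (the key path below is a throwaway placeholder, and the passphrase is left empty only so the example runs non-interactively; use a real passphrase in practice):</p>

```shell
# generate a throwaway key for this demo; in practice you'd pass a real
# passphrase to -N and the agent would then prompt for it just once
ssh-keygen -t ed25519 -N '' -f /tmp/demo_key -q
# start an ssh-agent for this shell session and export its variables
eval "$(ssh-agent -s)" > /dev/null
# hand the key to the agent; with a passphrase-protected key this is
# the single moment you'd be asked for the passphrase
ssh-add /tmp/demo_key 2> /dev/null
# list the fingerprints of the keys the agent is now holding
ssh-add -l
```

<p>Every later ssh, scp, or git invocation in this session asks the agent for the key instead of re-reading (and re-decrypting) the key file.</p> <p>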
So in other words, you won’t have to re-input your crazy password until your next reboot.</p> <h3 id="newer-key-types">Newer Key types</h3> <p>Another very important thing to look at is your key encryption method: if you’re still using id_dsa and/or id_rsa, please update to <a href="https://ed25519.cr.yp.to/">Ed25519</a> immediately. I admit, some services like GitHub and DigitalOcean may have issues with this key type, but if you’re connecting over ssh I’d highly suggest it. I’m no cryptologist and I can’t tell you exactly how and why a shorter key is more secure, other than that it uses elliptic curve cryptography. But at the moment Ed25519 is the recommended standard, and as we know, the more rounds used to encrypt the key, the more secure it’ll be.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen <span class="nt">-o</span> <span class="nt">-a</span> 100 <span class="nt">-b</span> 4096 <span class="nt">-t</span> ed25519 <span class="nt">-q</span> <span class="nt">-f</span> ~/.ssh/dev_key <span class="nt">-N</span> <span class="s1">'sample_password123'</span> <span class="nt">-C</span> <span class="s1">'sample_user'</span> <span class="c"># ~/.ssh/dev_key</span> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBCz0L+cnm3RSHawNK/h7hkCs7ZQIeeAyjKs4S+tHnPF sample_user </code></pre></div></div> <h3 id="duel-method-authentication">Dual Method Authentication</h3> <p>Another enhancement you may wish to add to your SSH connections is requiring two forms of authentication. This is a technique in which you require two methods of authentication in order to connect to the server (publickey and password). In other words, if a hacker “somehow” manages to get ahold of your publickey or your password, your server won’t be completely vulnerable. Now let me warn you, this may cause issues with Capistrano, Ansible, Vagrant, or various other services.
Because the majority of them attempt to authenticate with the server by password or by publickey, not both. But if that’s not the case and you really want to secure your SSH connections, this can be achieved with ease. We’ll be using <a href="https://www.kernel.org/pub/linux/libs/pam/whatispam.html">PAM</a>, which should be fairly straightforward because most modern Linux installations have various PAM modules installed by default. Simply change the AuthenticationMethods line in /etc/ssh/sshd_config to something similar to the following:</p> <div class="language-config highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">AuthenticationMethods</span> <span class="n">publickey</span>,<span class="n">keyboard</span>-<span class="n">interactive</span>:<span class="n">pam</span> </code></pre></div></div> <p>In other words, in order to connect to the specified server you will be required to supply the proper publickey as well as a valid password. It also helps to enforce higher levels of encryption, use modern ciphers, and lock down specific users and/or groups. If you’re looking to properly secure your SSH server, I suggest you begin by taking a very thorough look at Mozilla’s <a href="https://wiki.mozilla.org/Security/Guidelines/OpenSSH">OpenSSH Guidelines</a>.</p> <h3 id="important">Important</h3> <p><strong><em>Please note</em></strong> that the methods suggested for securing your SSH in this post are not the only methods required to properly secure your SSH server.
They mostly consist of methods to enhance the existing security of your SSH connections.</p> <h3 id="references">References</h3> <ul> <li><a href="https://wiki.mozilla.org/Security/Guidelines/OpenSSH">Mozilla SSH Guidelines</a></li> </ul>BrandonInstalling Fish on OSx2016-10-11T00:00:00+00:002016-10-11T00:00:00+00:00repo://posts.collection/_posts/2016-10-11-installing-fish-on-osx.md<p>As developers we all use the command line to make things happen. And by now you know there are several shell environments ranging from <a href="http://www.zsh.org/">zsh</a>, <a href="http://www.kornshell.com/">kornshell</a>, <a href="http://www.grymoire.com/Unix/Csh.html">C-Shell</a>, and <a href="https://www.gnu.org/software/bash/">bash</a> to who knows what other variants out there. Some of these shells are complicated beasts and some of them include a scripting language of their own. But they all have a purpose and they have their dedicated users. I for one can say that for the past several years I’ve been a heavy user of zsh along with <a href="http://ohmyz.sh/">oh-my-zsh</a> for the customization, easy plugins, and user-friendly terminal.</p> <p><a href="https://github.com/laughedelic/fish_logo"><img src="/images/posts/installing_fish_on_OSx/fish_logo.png" alt="Fish Logo" class="img-fluid mx-auto" /></a></p> <p>From what I could find, the majority of POSIX developers tend to stick with BASH, which is very universal, widely used, and easy to learn. And macOS users tend to stick with either bash or zsh; I mean, why change what’s not broken, right? Well, it never hurts to try something new, and I don’t just mean the newest, shiniest toy available. And who knows, it could always make a huge difference in your productivity. Well, after hearing quite a bit of hype from various developers about the <a href="https://fishshell.com/">fish</a> shell, I decided to give it a try. And thus far, I’m loving it.
It’s very smooth, has a quick response time, and seems less bogged down than various other shells. It also has an amazing auto-complete/command prediction behavior that makes things quite a bit easier and faster for the user.</p> <p>In this article, I am going to show you how I installed the fish shell, several of its extensions, and themes.</p> <h1 id="installing-fish-on-osx">Installing Fish on OSx</h1> <p>First of all, if you’re a macOS developer looking to install custom software, you more than likely already have <a href="http://brew.sh/">homebrew</a>. Brew makes installing programs through the command line a heck of a lot easier than building with a makefile, or having to bundle dependencies and packages and then having the compilation fail on you.</p> <p>Installing fish is quite simple, just begin by issuing the following commands:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew update brew install fish </code></pre></div></div> <p>Now you’re not quite ready to go; you need to add fish to your system’s shell listing in <code class="highlighter-rouge">/etc/shells</code>. This does require administrative access, and since a plain <code class="highlighter-rouge">&gt;&gt;</code> redirect would run as your regular user even under sudo, use tee:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s1">'/usr/local/bin/fish'</span> | <span class="nb">sudo</span> tee <span class="nt">-a</span> /etc/shells </code></pre></div></div> <p>Now that fish is installed, in order to use it let’s begin by setting it as your default terminal shell environment.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chsh <span class="nt">-s</span> /usr/local/bin/fish </code></pre></div></div> <p>Now in order to load fish as your shell you’ll need to open a new window, tab, session, etc. It may look like the same old terminal that you’re used to, but it’s not.
If you would like to see some of the features fish offers, I’d suggest checking out their <a href="https://fishshell.com/docs/current/tutorial.html">tutorials</a> page. Now, for some reason when I first installed fish it didn’t update/load the autocompletions; if this is the case for you, issue the command <code class="highlighter-rouge">fish_update_completions</code>. It’ll take a moment because there are quite a few to download. If you’re looking for a rough idea of what their autocompletions consist of, I’d suggest checking out their list of <a href="https://github.com/fish-shell/fish-shell/tree/master/share/completions">autocompletion</a> files.</p> <h3 id="lets-talk-plugins">Let’s talk plugins</h3> <p>Part of using fish is to make life easier on you, to get things done faster and more efficiently. That’s why no matter what we do, we’re always looking for plugins, shortcuts, and extensions. Just like with brew, we’ll install a fish plugin manager to make installing plugins as smooth as possible. At the moment there are a variety of fish plugin managers, including <a href="http://fisherman.sh/">fisherman</a>, <a href="https://github.com/tuvistavie/fundle">fundle</a>, <a href="http://oh-my.fish/">oh-my-fish</a>, and a few others. In my opinion the most mature one seems to be fisherman, which is what quite a few fish users seem to prefer. On the other hand, I chose oh-my-fish for its ease of use and simple commands. I mean, as developers we’ve learned hundreds of Linux commands, functions, methods, attributes, etc.; who wants to complicate our lives and add even more to memorize?
So let’s install oh-my-fish to get the customization started.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-L</span> http://get.oh-my.fish | fish </code></pre></div></div> <p>Now if you would like to get a rough idea of the oh-my-fish commands available, let’s just issue the <code class="highlighter-rouge">omf</code> command.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$:</span> omf Usage: omf <span class="nb">install</span> <span class="o">[</span>&lt;name&gt;|&lt;url&gt;] omf theme <span class="o">[</span>&lt;name&gt;] omf remove <span class="o">[</span>&lt;name&gt;] omf search <span class="o">[</span>&lt;name&gt;] omf update omf <span class="nb">help</span> <span class="o">[</span>&lt;<span class="nb">command</span><span class="o">&gt;]</span> Commands: list List <span class="nb">local </span>packages. describe Get information about what packages <span class="k">do</span><span class="nb">.</span> <span class="nb">install </span>Install one or more packages. theme List / Use themes. remove Remove a theme or package. update Update Oh My Fish. <span class="nb">cd </span>Change directory to plugin/theme directory. new Create a new package from a template. search Search <span class="k">for </span>a package or theme. submit Submit a package to the registry. destroy Uninstall Oh My Fish. doctor Troubleshoot Oh My Fish. <span class="nb">help </span>Shows <span class="nb">help </span>about a specific action. Options: <span class="nt">--help</span> Display this help. <span class="nt">--version</span> Display version. For more information visit → git.io/oh-my-fish </code></pre></div></div> <p>By now you’ve probably installed several programs that generate .dotfile directories in your <code class="highlighter-rouge">$HOME</code> directory.
Fish/oh-my-fish is no different, except instead of piling up even more in your <code class="highlighter-rouge">$HOME</code> directory, fish config files will be stored in <code class="highlighter-rouge">~/.config/fish</code>. The .config path is used by several programs to reduce clutter building up in your $HOME directory, because I’m sure if you enter ‘ls -alH’ at the prompt you’ll get something similar to the following.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.oh-my-zsh .rbenv .rvm .vim .tux .ssh .npm .irbrc <span class="c"># And tons of others as well</span> </code></pre></div></div> <p>Whether it’s a new install or not, I always suggest updating your package manager so you have the newest database and latest packages available. Let’s issue the <code class="highlighter-rouge">omf update</code> command and continue on. By now, you’ve probably noticed that OMF will generate an entire set of config files of its own. I believe it’s best practice to use the OMF config files for your aliases, functions, etc., but I stuck with using the basic fish config directory/files. I guarantee you will have to set several of your $PATH directories to access your programs.
If you wish to see those which are currently set, use <code class="highlighter-rouge">printf "%s\n" $PATH</code> to print out what you have access to executing.</p> <p>Now let’s begin installing plugins.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># fish supports atom out of the box, but I also use SublimeText as well</span> <span class="c"># this enables the sublime command 'subl', if it can be found</span> omf <span class="nb">install </span>sublime <span class="c"># To enable quick access to your rbenv without doing anything extra</span> omf <span class="nb">install </span>rbenv <span class="c"># https://github.com/oh-my-fish/plugin-osx</span> <span class="c"># enables OSx based commands: flushdns, showhidden, trash, updatedb, etc.</span> omf <span class="nb">install </span>osx </code></pre></div></div> <p>You may or may not want to install grc; it is used for colorizing the output of specified commands. But I believe it comes in handy for various commands and makes reading output easier.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install </span>grc omf <span class="nb">install </span>grc <span class="c"># this should automatically load grc to highlight fields rather than having to issue the following command</span> <span class="nb">source</span> /usr/local/etc/grc.bashrc </code></pre></div></div> <p>If you install it, you can verify it’s working properly by issuing a command with assorted output, e.g. <code class="highlighter-rouge">ps au</code>.
You should get a colorized version of something similar to the below table.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND Tarellel 17590 5.2 0.0 2446204 4976 s005 S+ 12:16AM 0:00.04 /usr/bin/python /usr/local/bin/grc <span class="nt">-es</span> <span class="nt">--colour</span><span class="o">=</span>auto ps au Tarellel 14325 1.5 0.0 2500316 5860 s005 S 11:13PM 0:00.73 <span class="nt">-fish</span> root 17591 0.0 0.0 2444720 1172 s005 R+ 12:16AM 0:00.00 ps au Tarellel 15150 0.0 0.0 2492120 5428 s002 S+ 11:36PM 0:00.40 <span class="nt">-fish</span> </code></pre></div></div> <p>Now back to installing some essential plugins:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># similar to zsh term, allows you to open a new tab with the current directory</span> omf <span class="nb">install </span>tab <span class="c"># Add all brew paths to fish $PATH</span> omf <span class="nb">install </span>brew <span class="c"># used to help enforce 256 color terminal support</span> <span class="c"># https://github.com/oh-my-fish/plugin-ssh</span> omf <span class="nb">install </span>ssh <span class="c"># used to make ssh term colors consistent</span> <span class="c"># https://github.com/oh-my-fish/plugin-ssh</span> omf <span class="nb">install </span>ssh-term-helper </code></pre></div></div> <p>Now if you’re like me, you have several aliases, custom functions, variables, and other config in your zsh settings. Fish’s function and alias declarations are slightly different from zsh and bash, but not too hard to understand. And now would be the perfect time to begin adding them before attempting to do a heck of a lot more.
A basic example of variable declaration below is assigning the $EDITOR variable to use atom.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># -U makes it a universal variable</span> <span class="nb">set</span> <span class="nt">-U</span> EDITOR atom </code></pre></div></div> <h3 id="lets-make-it-beautiful">Let’s make it beautiful</h3> <p>Now you may think it’s hipster as hell to want custom colors, fonts, and outputs in your terminal. But you want what’s comfortable for you, what’s easy on the eyes, and easy to read.</p> <p>For all I know you may like the default colors and theme that you have set up for your terminal. But if you want to spruce it up, I suggest taking a look at the oh-my-fish <a href="https://github.com/oh-my-fish/oh-my-fish/blob/master/docs/Themes.md">theme</a> page; several of the themes are very similar to their oh-my-zsh counterparts.</p> <p><strong>Powerline</strong> - As you browse through the themes you’ll notice several of them require powerline installed in order to fully function. This is a python package, and python comes installed by default on most *nix and POSIX operating systems.
Depending on your operating system and setup you may want to look at the powerline <a href="https://powerline.readthedocs.io/en/latest/installation/osx.html">installation instructions</a>.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install --user powerline-status </code></pre></div></div> <p>Now powerline uses HD fonts with special characters for its stylized output, so you’ll need to install various powerline fonts and symbols.</p> <p>Download/install the powerline <a href="https://github.com/powerline/powerline/raw/develop/font/PowerlineSymbols.otf">symbols</a></p> <p>You can either install it through the Font Book app or by moving it to <code class="highlighter-rouge">~/Library/Fonts/</code></p> <p>Next let’s install all the available powerline fonts; each one looks amazing and adds a whole new experience to your terminal.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/powerline/fonts.git <span class="nb">sudo</span> ./fonts/install.sh </code></pre></div></div> <p>Now if you’re using one of those themes that does require powerline and a powerline font, you’ll need to set your terminal application’s font to use the powerline fonts. In order to do this in iTerm2, go to <code class="highlighter-rouge">iterm &gt; preferences &gt; profiles &gt; text</code> and make sure the fonts used are ones with a powerline tag on them.
Plain old <code class="highlighter-rouge">Courier</code> or <code class="highlighter-rouge">Monaco</code> font just won’t cut it.</p> <h3 id="my-setup">My setup</h3> <p>oh-my-fish theme: <a href="https://github.com/oh-my-fish/oh-my-fish/blob/master/docs/Themes.md#bobthefish">bobthefish</a></p> <p>iTerm theme colors: <a href="https://github.com/chriskempson/base16-iterm2">base16-ocean-dark.256</a></p> <p>Terminal fonts used</p> <ul> <li>Regular Font: <code class="highlighter-rouge">14pt Inconsolata for Powerline</code></li> <li>Non-ASCII font: <code class="highlighter-rouge">12pt Sauce Code Powerline</code></li> </ul> <p>As you can see below, I tend to enjoy using smooth charcoal gray themes, with colors that aren’t too vibrant.</p> <p><img src="/images/posts/installing_fish_on_OSx/smooth_fish_terminal.png" alt="iTerm with Multitabs" class="img-fluid mx-auto" /></p> <hr /> <h3 id="resources">Resources</h3> <ul> <li><a href="https://github.com/fisherman/awesome-fish">Awesome-fish</a></li> <li><a href="http://ohmyz.sh/">oh-my-fish</a></li> <li>iTerm2 theme colors: <ul> <li><a href="https://github.com/chriskempson/base16-iterm2">base16</a></li> <li><a href="https://draculatheme.com/iterm/">dracula</a></li> <li><a href="https://github.com/romainl/flattened">flattened</a></li> <li><a href="https://github.com/altercation/solarized/tree/master/iterm2-colors-solarized">solarized-dark</a></li> <li><a href="http://color.smyck.org/">smyck</a></li> <li><a href="https://github.com/chriskempson/tomorrow-theme">tomorrow theme</a></li> <li><a href="http://iterm2colorschemes.com/">Iterm2-color-schemes</a> (big list of colors)</li> <li><a href="https://github.com/MartinSeeler/iterm2-material-design">iterm2-material-design</a></li> <li><a href="https://github.com/ahmetsulek/flat-terminal">flat-terminal</a></li> <li><a href="https://github.com/raphamorim/lucario">lucario</a></li> <li><a href="https://terminal.sexy/">terminal.sexy</a> : terminal color schemer</li> </ul> </li>
</ul>BrandonSetting up tmux with Rails2016-08-29T00:00:00+00:002016-08-29T00:00:00+00:00repo://posts.collection/_posts/2016-08-29-setting-up-tmux-with-rails.md<h2 id="why-use-tmux">Why use tmux?</h2> <p>You may have a workflow similar to mine. When I’m working on a Rails project I tend to have several tabs and/or windows open in the terminal. After a while it tends to become a heaping mess and I end up having to close the terminal down and start all over again. Other times, I simply become tired of having to switch between different tabs and windows, trying to find the one that I need.</p> <p>As you can see from the image below, this method works. But having to switch from tab to tab just to view your output logs or guard builds/tests gets to be a pain after a while.</p> <p><img src="/images/posts/tmux_wRails/multitabs.png" alt="iTerm with Multitabs" class="img-fluid mx-auto" /></p> <p>So what is tmux? It’s a multiplexer, very similar to the *nix screen program. It allows you to run several processes all within the same screen, without having to switch tabs or windows. And there are tons more things you can do with it than building a Rails application, but for the moment that’s a great example of why it makes such a great tool.</p> <p>I can tell you this much: so far I’ve only used tmux for a short period of time, but it has become an essential tool in my development workflow. Tmux’s window and session management makes it very easy to get started and increases my productivity.</p> <h3 id="setting-up-tmux">Setting up tmux</h3> <p>I’m just going to assume you’re on OSx, with homebrew installed. If you are using Debian, Ubuntu, or some other distro it’s pretty close to the same commands. You’ll just be using different package manager commands. 
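</p> <p>For example, where this guide runs a brew install, Debian or Ubuntu users would run something like the following <em>(a sketch assuming apt-get; the package name is the same, but check your distro’s repositories)</em>:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Debian/Ubuntu equivalent of the homebrew-based install
sudo apt-get update
sudo apt-get install tmux
</code></pre></div></div> <p>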
Instead of using <code class="highlighter-rouge">brew</code>, you will be making the installs with <code class="highlighter-rouge">apt-get</code>, <code class="highlighter-rouge">pacman</code>, etc.</p> <p>Let’s begin by installing tmux:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install </span>tmux

<span class="c"># allows you to access the OSx clipboard (pbcopy &amp; pbpaste) through tmux</span>
brew <span class="nb">install </span>reattach-to-user-namespace
</code></pre></div></div> <p>Now, if this is your first time installing tmux, we’ll need to create a <code class="highlighter-rouge">.tmux.conf</code> file.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># this will contain colors, settings, key-bindings, etc.</span>
<span class="nb">touch</span> ~/.tmux.conf
</code></pre></div></div> <p>Now let’s install the tmux Plugin Manager (yes, I know, another one).</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># make a .tmux directory to store everything</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> ~/.tmux/plugins

<span class="c"># While we're at it, let's install the tmux package manager</span>
git clone https://github.com/tmux-plugins/tpm ~/.tmux/plugins/tpm
</code></pre></div></div> <p>The tmux plugin manager makes it easy to install, update, and remove any and all plugins you may wish to use.</p> <p>Let’s begin by adding our first plugin to tmux: tmux-sensible. Before you dive in, it won’t seem like much, but it contains various handy default values, bindings, etc. that will make tmux easier to use. The best part of this plugin is that it’s not supposed to overwrite any config values you may have in your <code class="highlighter-rouge">~/.tmux.conf</code> file. 
Now add the following to the very bottom of your <code class="highlighter-rouge">~/.tmux.conf</code> file.</p> <div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">set</span> -<span class="n">g</span> @<span class="n">plugin</span> <span class="s1">'tmux-plugins/tmux-sensible'</span> <span class="c"># recommended tmux defaults
</span><span class="n">set</span> -<span class="n">g</span> @<span class="n">plugin</span> <span class="s1">'tmux-plugins/tmux-yank'</span> <span class="c"># allows copying to the system clipboard via tmux
</span>
<span class="c"># Initialize TMUX plugin manager (keep this line at the very bottom of tmux.conf)
</span><span class="n">run</span> <span class="s1">'~/.tmux/plugins/tpm/tpm'</span>
</code></pre></div></div> <p>Now is where it gets tricky: with tmux you have a prefix-key combo that you have to use in order to issue particular commands. By default the prefix keys are <code class="highlighter-rouge">CTRL+b</code>. With some commands/key-combos, case makes a difference between <code class="highlighter-rouge">b</code> and <code class="highlighter-rouge">B</code> or <code class="highlighter-rouge">T</code> and <code class="highlighter-rouge">t</code>.</p> <p>Now let’s fire up tmux for the first time. At first it won’t be anything we haven’t seen before, but once you learn the keys and get the right plugins, it’ll be a whole new experience.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># fire up tmux by issuing the following command</span>
tmux
</code></pre></div></div> <p>Your terminal will flicker for a second and a very generic, ASCII-based status bar will appear at the bottom. This means you’re doing great. Now let’s install the tmux plugins with the following key combo: <code class="highlighter-rouge">CTRL+b I</code>. 
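</p> <p>As a side note, tpm provides a couple of companion key combos beyond install. These are taken from tpm’s own documentation, so double-check them against the version you cloned:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># tpm key bindings (each one comes after the prefix, CTRL+b by default)
#   I        install the plugins listed in ~/.tmux.conf
#   U        update installed plugins
#   alt + u  remove/uninstall plugins no longer listed in ~/.tmux.conf
</code></pre></div></div> <p>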
You may have to wait a few minutes, but it’ll install all of the plugins specified at the bottom of the tmux.conf file.</p> <p><img src="/images/posts/tmux_wRails/basic_tmux.png" alt="Basic tmux" class="img-fluid center-block" /></p> <p>Now, if you want to play around with tmux to get additional windows, panes, sessions, etc., great. As you can see from the image, tmux is a multiplexer, which means it allows you to have several processes all running within a single terminal screen/tab.</p> <h3 id="setup-and-install-tmuxinator">Setup and Install Tmuxinator</h3> <p>Now I’m going to assume you have some version of ruby installed already (various *nix OSes come with it preinstalled). So up next, we will install <a href="https://github.com/tmuxinator/tmuxinator">tmuxinator</a>, a Ruby gem that makes it a heck of a lot easier to make preconfigured panes and windows with default commands that run when they’re created.</p> <p>We first need to install tmuxinator and verify all the requirements are met in order to use it.</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># install tmuxinator</span>
gem <span class="nb">install </span>tmuxinator

<span class="c"># verify everything is ready</span>
tmuxinator doctor
</code></pre></div></div> <p>You will now need to add tmuxinator to your shell; for consistency’s sake I’ll just assume you’re using zsh/oh-my-zsh. You will need to add <code class="highlighter-rouge">source ~/.bin/tmuxinator.zsh</code> to your .zshrc file.</p> <p>Now you’ll be thinking: but no file by the name <code class="highlighter-rouge">~/.bin/tmuxinator.zsh</code> even exists. What gives? Well, let’s download the required completion file and get started. 
And if you’re not using zsh, there are files for bash and fish as well.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Download zsh completion file</span>
curl <span class="nt">-O</span> https://raw.githubusercontent.com/tmuxinator/tmuxinator/master/completion/tmuxinator.zsh
<span class="nb">mkdir</span> ~/.bin
<span class="nb">mv </span>tmuxinator.zsh ~/.bin/
</code></pre></div></div> <p>Now, in order to make sure everything is loaded and configured, reload the current terminal.</p> <h3 id="lets-create-a-tmuxinator-project">Let’s create a tmuxinator project</h3> <p>Tmuxinator projects let us set up a predefined layout with panes performing specific tasks, rather than messing around and building a layout every time we want to open up tmux.</p> <p>To create a tmuxinator project, let’s issue the following command:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tmuxinator new <span class="o">[</span>project_name]

<span class="c"># for example</span>
tmuxinator new red_willow
</code></pre></div></div> <p>This will generate a project.yml file in the <code class="highlighter-rouge">~/.tmuxinator</code> folder. It should also open up your specified $EDITOR, allowing you to get a rough idea of what tmuxinator’s configuration looks like. 
It’s very basic: there are panes, windows, even layouts; the <code class="highlighter-rouge">root: ~/[project_name]</code> at the top is the folder you specify for tmuxinator to open all the tmux terminals in.</p> <p>Below is an example of a pretty basic Rails-based tmux session, built with tmuxinator.</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># ~/.tmuxinator/tasky.yml</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">tasky</span>
<span class="na">root</span><span class="pi">:</span> <span class="s">~/Desktop/red_willow</span>

<span class="na">windows</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">main</span><span class="pi">:</span>
      <span class="c1"># http://stackoverflow.com/questions/9812000/specify-pane-percentage-in-tmuxinator-project</span>
      <span class="c1"># use: tmux list-windows to get coords and window sizes</span>
      <span class="na">layout</span><span class="pi">:</span> <span class="s">b147,208x73,0,0[208x62,0,0,208x10,0,63{104x10,0,63,103x10,105,63}]</span>
      <span class="na">panes</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s">rails server thin</span>
        <span class="pi">-</span> <span class="s">guard</span>
        <span class="pi">-</span> <span class="s">atom .</span>
        <span class="pi">-</span> <span class="s">sleep 7 &amp;&amp; rails -t</span>
        <span class="pi">-</span> <span class="s">rails c</span>
        <span class="c1"># - foreman start</span>
  <span class="pi">-</span> <span class="na">logs</span><span class="pi">:</span> <span class="s">tail -f log/development.log</span>
</code></pre></div></div> <p>Now issue the following command to initiate the tmux session: <code class="highlighter-rouge">tmuxinator [project_name]</code>. From here it may take a few seconds for everything to load up and start running. You should now have a terminal session similar to the image below. 
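</p> <p>If the session doesn’t come up the way you expect, newer versions of tmuxinator can also print the startup script they generate without launching anything, which is handy for debugging a config <em>(check <code class="highlighter-rouge">tmuxinator help</code>, since the available subcommands vary by version)</em>:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># print the shell commands tmuxinator would run, without starting tmux
tmuxinator debug tasky

# when you're done, kill the project's session
tmuxinator stop tasky
</code></pre></div></div> <p>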
Your session’s colors, panes, and windows may be a bit different from what the image shows. That’s because you will need to set up your tmux config, and trust me, figuring out all the settings to use can be a major beast to deal with. So to make things a little easier and give you somewhere to start from, here’s a copy of my current <a href="https://raw.githubusercontent.com/tarellel/dotfiles/master/rc/tmux.conf">tmux.conf</a>. And if you’re lazy, you can always search <a href="https://github.com/">github</a> for dotfile repositories, because there are tons and tons of tmux.conf files listed.</p> <p><img src="/images/posts/tmux_wRails/tmuxinator_wRails.png" alt="tmuxinator with Rails" class="img-fluid center-block" /></p> <p>Now it’s time to let you loose on your own and see what kind of project configuration best suits your needs.</p> <hr /> <h3 id="tmux-commandsshortcuts">tmux commands/shortcuts</h3> <p>Your prefix key can be configured in <code class="highlighter-rouge">~/.tmux.conf</code>, but I’ve chosen to leave it as <code class="highlighter-rouge">CTRL+b</code>, because it means there’s no way I will accidentally close or move between a window, pane, session, etc. 
And this is just a small batch of the commands you can use; tmux is a massive program with tons of features I haven’t even begun to think about.</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># All of these require you to first initiate the prefix key combo:</span>
<span class="c"># Default Prefix: CTRL+b, and then whatever you want it to do</span>
<span class="s2">" - horizontal split
% - vertical split
&lt;arrow keys&gt; - move between session panes
z - zoom the current pane to fill the window (toggle)
c - create a new tmux window
&lt;window number&gt; - switch to the specified window
d - detach the current tmux session
x - kill the current pane in focus
[space] - toggle between layouts
</span></code></pre></div></div> <p>And here are a few essential tmux console commands:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># to view the list of current tmux sessions</span>
tmux <span class="nb">ls</span>

<span class="c"># to reattach a tmux session (the number is the session number from tmux ls)</span>
tmux attach <span class="nt">-t</span> 0

<span class="c"># to kill a specific tmux session</span>
tmux kill-session <span class="nt">-t</span> 0

<span class="c"># kills the last tmux session on the list</span>
tmux kill-session

<span class="c"># to completely kill tmux</span>
tmux kill-server
</code></pre></div></div> <ul> <li> <p><a href="http://hyperpolyglot.org/multiplexers">Terminal Multiplexers commands and shortcuts</a></p> </li> <li> <p><a href="https://tmuxcheatsheet.com/">tmux cheatsheet</a></p> </li> <li> <p><a href="tmux &amp; screen cheat-sheet">tmux and screen cheatsheet</a></p> </li> </ul> <hr /> <h3 id="additional-resources">Additional Resources</h3> <ul> <li> <p><a href="http://manpages.ubuntu.com/manpages/precise/en/man1/tmux.1.html">tmux man page</a></p> </li> <li> <p><a href="https://robots.thoughtbot.com/a-tmux-crash-course">A 
tmux crash course</a> : thoughtbot</p> </li> <li> <p><a href="https://www.sitepoint.com/tmux-a-simple-start/">Tmux: A Simple Start</a></p> </li> <li> <p><a href="http://www.hamvocke.com/blog/a-guide-to-customizing-your-tmux-conf/">Making tmux Pretty and Usable</a></p> </li> <li> <p><a href="https://github.com/tmuxinator/tmuxinator">tmuxinator</a></p> </li> </ul>Brandon