<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Another Blog</title>
 <link href="https://kscherer.github.io//atom.xml" rel="self"/>
 <link href="https://kscherer.github.io/"/>
 <updated>2023-02-11T16:47:28+00:00</updated>
 <id>https://kscherer.github.io/</id>
 <author>
   <name>Konrad Scherer</name>
   <email>kmscherer@gmail.com</email>
 </author>

 
 <entry>
   <title>Book Review: How the Word Is Passed by Clint Smith</title>
   <link href="https://kscherer.github.io//reviews/2022/08/03/book-review-how-the-word-is-passed-by-clint-smith"/>
   <updated>2022-08-03T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//reviews/2022/08/03/book-review-how-the-word-is-passed-by-clint-smith</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;This was a difficult but important book to read. The legacy of slavery and
racism on our society is both overt and subtle. Much of it is ignored and
glossed over but some people are trying to integrate a more historically
accurate interpretation into our institutions and lives.&lt;/p&gt;

&lt;h3 id=&quot;thomas-jefferson&quot;&gt;Thomas Jefferson&lt;/h3&gt;

&lt;p&gt;The first part of the book takes place at the farm/plantation of the
founding father Thomas Jefferson. The reality is that slavery was a big part
of all the founding fathers’ lives and businesses, but Jefferson’s behavior
was especially egregious. He split up families, tortured slaves and even
had six (!) children with a slave who was his wife’s half sister! The
estate is trying very hard to integrate this lesser-told story into the
traditional story, and it is a difficult task. How can the misery and
suffering of the slaves be reconciled with the fact that their labor allowed
him to do his incredibly important work?&lt;/p&gt;

&lt;h3 id=&quot;whitney-plantation&quot;&gt;Whitney Plantation&lt;/h3&gt;

&lt;p&gt;This plantation is an independent project of a wealthy retired businessman
to preserve the stories of the slaves that sustained it. It is incredible
work that I am glad is being done. The reality of our history needs to be
preserved and integrated, it cannot be ignored and buried. I hope to be able
to visit this important contribution.&lt;/p&gt;

&lt;h3 id=&quot;modern-day-slavery-in-angola&quot;&gt;Modern Day Slavery in Angola&lt;/h3&gt;

&lt;p&gt;Angola is a maximum security prison in Louisiana that is filled with black
men performing forced labour for cents an hour. It is modern day slavery
and it highlights the corruption of the justice system as an overt means of
continuing slavery. All talk of reconciliation is empty as long as this
practice continues. This section was so infuriating.&lt;/p&gt;

&lt;h3 id=&quot;juneteenth-and-manhattan&quot;&gt;Juneteenth and Manhattan&lt;/h3&gt;

&lt;p&gt;The actual events of Juneteenth were fascinating to me. The end of slavery
in the South after the Civil War was a messy, chaotic process. I was not aware
that plans to give freed slaves plots of land were rejected at the last
moment. How different things might be today if that had happened.&lt;/p&gt;

&lt;p&gt;The legacy of slavery in Manhattan is also something not talked about
often. The economics of slavery were embedded throughout all of the states,
and New York was no exception. Just because it was on the “winning” side
doesn’t mean that it doesn’t bear any responsibility.&lt;/p&gt;

&lt;h3 id=&quot;blackford-cemetery&quot;&gt;Blandford Cemetery&lt;/h3&gt;

&lt;p&gt;The Blandford cemetery is a Confederate cemetery and a focal point for
Confederate culture. Is it possible to honor the soldiers without getting
tangled in their implicit support of slavery? Is it possible to celebrate a
culture when that culture justified a war to maintain slavery? It is a
similar situation to postwar Germany, where everyone has to grapple with the
fact that normal people enabled the slaughter of millions of people. What I
experienced is that Germans have reframed the teaching of this history in
terms of “Never Again”, i.e. we must teach this to ensure that it can never
happen again. Unfortunately I don’t see this kind of hard work being done in
the US.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I don’t think we will be able to address the legacy of slavery and racism
without a clear acknowledgment and acceptance of the past. I really
appreciated the way this book presented the historical blind spots of US
history. Highly recommended.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Book Review: 1491 and 1493 by Charles Mann</title>
   <link href="https://kscherer.github.io//reviews/2022/08/02/book-review-1491-and-1493"/>
   <updated>2022-08-02T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//reviews/2022/08/02/book-review-1491-and-1493</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;These books changed the way I see and think about our modern world. They
cover what we know today about the American continent before Christopher
Columbus arrived in the “New World” and the many ways the entire world was
changed forever afterwards. Mostly for my own benefit I will list a bunch of
the things I learned and perhaps this will encourage you to read these books
as well.&lt;/p&gt;

&lt;h3 id=&quot;impact-of-pathogens&quot;&gt;Impact of Pathogens&lt;/h3&gt;

&lt;p&gt;I had heard before about the impact of various pathogens brought over by the
Europeans, but I didn’t appreciate the scale. The exact population of the
Americas is impossible to know, but there is lots of evidence for as high as
100 million people. These people had to deal with multiple waves of
influenza, typhoid, smallpox, yellow fever, malaria, etc., and over 50 years
these diseases probably killed over 95% of the population.&lt;/p&gt;

&lt;p&gt;The consequences of this are mind boggling. The Indians (there doesn’t seem
to be a better word for the people that lived in the Americas before
Columbus) had done controlled burns of the forests for thousands of years, and
this stopped. Without the burning, the ecosystem completely changed. Huge
forests grew, herds of bison formed, and the regrowth may even have been
responsible for the mini ice age in the 1600s because the forests captured so
much CO2.&lt;/p&gt;

&lt;p&gt;All attempted European settlements failed until the Indians were wiped
out. There are so many stories of how ill prepared the settlers were for
life in America. It wasn’t until they weren’t in direct competition with the
Indians that settlement had a chance of succeeding. Also, the weakened Indian
tribes often formed alliances with the settlers in their own
conflicts. These alliances never worked out for the Indians, as the
Europeans, once established, would systematically wipe them out.&lt;/p&gt;

&lt;h3 id=&quot;impact-of-joining-two-separate-ecosystems&quot;&gt;Impact of joining two separate ecosystems&lt;/h3&gt;

&lt;p&gt;The Americas contained ecosystems completely isolated from the rest of the
world. As the world became a single global ecosystem there were many winners
and losers. Much of North America didn’t have earthworms, which is one of the
reasons the controlled burns were so critical. Now earthworms are everywhere,
and the way they break down biological material has profound consequences for
the native plants.&lt;/p&gt;

&lt;p&gt;The Americas contained three plants that changed the world: tomatoes, corn
and potatoes. It is hard to imagine cuisine today without tomatoes and to
think that something as “universal” as spaghetti and tomato sauce is a very
recent phenomenon. Corn is the backbone of our industrial agriculture,
feeding cattle and being converted to corn syrup and various other food
additives. The potato alone has been credited with allowing the human
population to grow by billions. It is impossible to imagine our society
without these plants, not to mention chocolate or tobacco.&lt;/p&gt;

&lt;h3 id=&quot;american-agriculture&quot;&gt;American agriculture&lt;/h3&gt;

&lt;p&gt;The early settlers often remarked on how healthy the Indians looked. Turns
out the Indians ate a diet of corn, beans and squash that was nutritionally
superior to the European diet of wheat and meat. The Indians didn’t have
pack animals or large domesticated animals, which means all farming was done
by hand and all messages had to be carried on foot. The Europeans mistook the
lack of visible farms as a sign that the land was “unused”. But without oxen
to pull plows, giant farms aren’t feasible, and the Indians had very different
forms of farming and hunting. Even in the Amazon, there are strange super
fertile regions that contain millions of pottery shards mixed in with the
soil. We still don’t understand how this works or how it was possible.&lt;/p&gt;

&lt;h3 id=&quot;silver-from-the-andes&quot;&gt;Silver from the Andes&lt;/h3&gt;

&lt;p&gt;After the Incas were defeated and enslaved, the Spanish found a silver mine
in the Andes that had been mined by the Incas. The ore was very pure and
plentiful and soon the silver was moving around the world. It was supposed
to go directly back to Spain to fund various wars. Some enterprising Spanish
sailors realized they could ship the silver across the Pacific and trade with
the Chinese for silks and spices, then bring the goods back across the
Pacific, through Mexico and across the Atlantic, and make a fortune. Almost
two thirds of the silver went to China, and even the one third that reached
Spain was enough to trigger massive inflation in both countries. This
inflation was the proximate trigger of various regime changes.&lt;/p&gt;

&lt;h3 id=&quot;rubber&quot;&gt;Rubber&lt;/h3&gt;

&lt;p&gt;The industrial revolution requires three things: steel, oil and rubber. I
underappreciated the critical role that rubber plays in all the gaskets,
seals and tires that are part of modern machines. The rubber/latex tree is
still a core part of our economy and now grows all over the world. Rubber
tree farms cause all kinds of ecological disasters and the rubber boom of
the late 1800s also caused massive economic upheaval. It is another example
of a natural product for which we haven’t been able to make an economical
substitute.&lt;/p&gt;

&lt;h3 id=&quot;slavery&quot;&gt;Slavery&lt;/h3&gt;

&lt;p&gt;I was not aware of the link between malaria and slavery. Malaria was so
deadly to Europeans and Indians that settlement in the Americas was almost a
death sentence. It was the greater natural immunity of Africans to malaria
that drove much of the Atlantic slave trade. Many African slaves were in fact
prisoners of wars between African tribes, and so many slaves had military
training. Unsurprisingly, many escaped their slavery and formed “free”
communities. One incredible example is a state in modern day Brazil that
survived for almost 90 years. It was strategically placed on a cliff side
with access to water, etc. The populace was trained and was able to resist
many attacks. It is such an amazing story that I hope it becomes a movie
some day.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This is just a fraction of the incredible history of the Americas that I am
so grateful to have been able to learn about. Highly recommended.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>The Kubernetes batch job memory challenge</title>
   <link href="https://kscherer.github.io//2021/08/03/the-kubernetes-batch-job-memory-challenge"/>
   <updated>2021-08-03T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//2021/08/03/the-kubernetes-batch-job-memory-challenge</id>
   <content type="html">&lt;h3 id=&quot;batch-jobs-in-kubernetes&quot;&gt;Batch Jobs in Kubernetes&lt;/h3&gt;

&lt;p&gt;The design of Kubernetes has its origins at Google as a platform for
microservices. It does support batch jobs but they do not have the
same level of support as microservices. The only difference between a
Pod and a Job is that a Job does not restart automatically.&lt;/p&gt;
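
&lt;p&gt;As a minimal sketch (the image and command are placeholders), a Job
wraps a Pod template and runs it to completion:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: compile-job          # illustrative name
spec:
  backoffLimit: 0            # do not retry a failed build
  template:
    spec:
      restartPolicy: Never   # the Pod is not restarted when it exits
      containers:
      - name: build
        image: ubuntu:20.04            # placeholder build image
        command: [&quot;make&quot;, &quot;-j8&quot;]       # placeholder build command
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;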

&lt;h4 id=&quot;pod-abstraction&quot;&gt;Pod Abstraction&lt;/h4&gt;

&lt;p&gt;The Pod abstraction assumes that the CPU and memory usage of the
processes running inside a Pod are predictable and fairly
constant. When a Pod’s memory usage is larger than its allocation, the
assumption is that there is a bug or memory leak and the process
should be killed.&lt;/p&gt;

&lt;h4 id=&quot;best-effort-resource-allocation&quot;&gt;Best Effort resource allocation&lt;/h4&gt;

&lt;p&gt;For Pods the default resource allocation is “Best Effort”. When no CPU
or memory requests or limits are defined, the Pod is considered low priority
and is free to consume whatever resources are available. During my initial explorations with
K8s I created 10 Best Effort Jobs that compiled software. K8s started
all the Pods on a single Node. The compile jobs quickly used up all
available memory on the system and K8s started killing Jobs at random.&lt;/p&gt;
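
&lt;p&gt;One way to confirm the QoS class a Pod ended up with (the pod name is a
placeholder):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; kubectl get pod build-pod -o jsonpath='{.status.qosClass}'
BestEffort
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;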

&lt;p&gt;This behavior makes sense when the Pods are stateless, part of
Deployments, and can easily be moved to other nodes. Of course the
compile jobs can just be restarted, but it is wasteful to throw away
the work in progress. In this case “Best Effort” doesn’t really seem
appropriate for running batch jobs.&lt;/p&gt;

&lt;h4 id=&quot;podinterantiaffinity&quot;&gt;Pod AntiAffinity&lt;/h4&gt;

&lt;p&gt;Using Pod AntiAffinity with the node hostname and/or labels, the K8s
scheduler will spread the Pods out over a set of Nodes. This way, if
the nodes have enough resources, “Best Effort” pods have the best
chance of not consuming too many resources on any one node. But it isn’t
a guarantee, and it is difficult to predict if a Job/Pod will finish.&lt;/p&gt;
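
&lt;p&gt;A sketch of the kind of anti-affinity stanza meant here, assuming the
build Pods share an illustrative app: yocto-build label:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: yocto-build                  # illustrative label
        topologyKey: kubernetes.io/hostname   # one build Pod per node
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;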

&lt;h4 id=&quot;burst-and-guaranteed-pods&quot;&gt;Burstable and Guaranteed Pods&lt;/h4&gt;

&lt;p&gt;If “Best Effort” isn’t appropriate, then the next step is to give each
Job a CPU and memory limit. The question then becomes: what should those
limits be? A Burstable Pod has a lower request and a higher max limit; a
Guaranteed Pod has its request equal to its max limit. A Burstable Pod
therefore allows a form of resource over commitment. If the processes in the
Pod ever try to allocate more memory than the resource limit, the
allocation will fail and K8s will kill the Pod.&lt;/p&gt;
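
&lt;p&gt;A sketch of a Burstable container spec (the numbers are illustrative);
making the requests equal to the limits would yield a Guaranteed Pod
instead:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;resources:
  requests:
    cpu: &quot;4&quot;          # what the scheduler reserves
    memory: &quot;8Gi&quot;
  limits:
    cpu: &quot;8&quot;          # throttled above this
    memory: &quot;16Gi&quot;    # OOM-killed above this
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;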

&lt;h4 id=&quot;compressible-and-uncompressible-resources&quot;&gt;Compressible and Uncompressible resources&lt;/h4&gt;

&lt;p&gt;The default Pod resources CPU and Memory have different behavior when
the resource is exhausted. CPU over commitment is handled by the
kernel scheduler and the scheduler will allocate CPU time based on the
scheduler policy. A Pod that attempts to use more CPU time than
allocated will be throttled. CPU is considered a compressible
resource.&lt;/p&gt;

&lt;p&gt;Memory is different because if there isn’t any memory available,
attempted allocations will fail. Memory is considered an uncompressible
resource. There isn’t a way to throttle process memory usage. The only
option is swap memory which can allow allocations to proceed but it
comes with problems like thrashing. With containers it gets even more
complicated because by default the kernel does not account for swap
memory in the memory resource usage.&lt;/p&gt;

&lt;h4 id=&quot;setting-memory-limits&quot;&gt;Setting memory limits&lt;/h4&gt;

&lt;p&gt;So the only way to prevent a Pod from being killed when it uses too
much memory is to set the memory allocation high enough. This is where
the predictability of microservices makes figuring out the max memory
limit easier.&lt;/p&gt;

&lt;h4 id=&quot;finding-a-memory-limit-for-yocto-builds&quot;&gt;Finding a memory limit for Yocto builds&lt;/h4&gt;

&lt;p&gt;But what about large and unpredictable Yocto builds? Yocto provides
infinite configuration options and also supports two forms of build
parallelization: parallel jobs and parallel packages. These options speed up
the build but make the package build order non
deterministic. The shared state (sstate) cache can be used to speed up
builds, but it makes predicting memory usage even more difficult because it
is impossible to know ahead of time which packages will be rebuilt.&lt;/p&gt;

&lt;h3 id=&quot;swap&quot;&gt;Swap?&lt;/h3&gt;

&lt;p&gt;What about using swap to add memory temporarily to a Pod that has used
all of its allocation? K8s does not support swap and requires that
swap be disabled. As far as I can tell the main issue is around how to
account for the swap memory. Swap cannot simply be added to the memory of
the machine because it is much slower. The current kube tools do not
track swap usage, and the kernel does not enable swap memory accounting
by default for performance reasons. Swap can increase performance by
moving unused memory pages to disk and giving applications more
RAM. However, a Pod using swap could thrash the entire node, causing
failures in Pods that are not using too many resources.&lt;/p&gt;

&lt;p&gt;There is work underway to add swap alpha support to K8s 1.22 but it
has many limitations and may be restricted to “Best Effort”
Pods. Exactly how swap will be managed and accounted for are still
open questions. At some point swap may be a part of the solution but
it isn’t feasible now.&lt;/p&gt;

&lt;h3 id=&quot;workarounds&quot;&gt;Workarounds&lt;/h3&gt;

&lt;p&gt;Without a way to make memory compressible there are only workarounds
and tradeoffs.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reducing parallelism for large packages like tensorflow or chromium
does keep memory usage down, at the cost of slowing down the
build. When a build fails due to a resource allocation failure, we
record the packages being compiled from the process list and
use that to identify potential problem packages.&lt;/li&gt;
  &lt;li&gt;Tracking the memory usage of a Pod with a monitoring system like
Prometheus lets us follow changes in memory usage over time. It
also gives us a better picture of how the memory usage of a Yocto
build changes over the course of the build. Ideally the processes
causing peak memory usage can be identified and reduced to keep
memory usage more stable, allowing for better utilization of the
node’s resources. The Prometheus alerting system can also be used to
warn when builds cross memory usage boundaries like 90%, which may
give us time to deploy workarounds before builds fail.&lt;/li&gt;
  &lt;li&gt;A build that failed due to a failed memory allocation could
potentially be restarted with lower parallelization and/or changed
memory limits. This is tricky because our builds use local disks
with HostPath, and the build Pod would need to be rescheduled to the
node with the in-progress build files. K8s 1.21 has added improved
support for local volumes. Local volume management in K8s deserves a
post of its own.&lt;/li&gt;
  &lt;li&gt;The make tool has an option --load-average which tells make to stop
spawning new jobs if the load average is above a specific
value. This doesn’t work for Pods because load is measured system wide
and reflects CPU, but the concept of feedback into make or ninja is
interesting. Since a process can use the /proc filesystem to monitor
the memory usage of its container, it might be possible to have these
tools reduce the number of spawned jobs based on current container
memory usage; see the sketch after this list.&lt;/li&gt;
&lt;/ul&gt;
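
&lt;p&gt;A sketch of the load-average option and of reading current container
memory usage (the numbers are illustrative and the cgroup v1 path is
kernel dependent):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# stop spawning new make jobs when the load average exceeds 8
&amp;gt; make -j 16 --load-average 8
# current memory usage of this container (cgroup v1)
&amp;gt; cat /sys/fs/cgroup/memory/memory.usage_in_bytes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;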

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;It is the combination of the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;memory being uncompressible&lt;/li&gt;
  &lt;li&gt;K8s/container memory limits being hard limits without second chances&lt;/li&gt;
  &lt;li&gt;Yocto builds having unpredictable memory consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;that makes this a very difficult problem. The only “proper” solution
would be to make the builds have more predictable memory usage. This
would require a feedback mechanism to make/ninja/bitbake to adjust the
number of running processes based on the current container memory
usage.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Golang app development using Skaffold</title>
   <link href="https://kscherer.github.io//2021/04/20/golang-app-development-using-skaffold"/>
   <updated>2021-04-20T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//2021/04/20/golang-app-development-using-skaffold</id>
   <content type="html">&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;Developing an application to be distributed as a K8s service is a
complicated undertaking. Besides learning the application language and
solving the application problem, there are all the K8s workflows that
need to be automated. This is my attempt to navigate the insane K8s
ecosystem of tools as I try to make a decent development and
production workflow.&lt;/p&gt;

&lt;h1 id=&quot;development-workflow&quot;&gt;Development workflow&lt;/h1&gt;

&lt;p&gt;The local development workflow needs to have a fast feedback
loop. For a K8s application, that means at minimum a container build
and deployment.&lt;/p&gt;

&lt;h1 id=&quot;ubuntu-setup-of-go-115&quot;&gt;Ubuntu setup of go 1.15&lt;/h1&gt;

&lt;p&gt;The latest go at this time is 1.16.3, but for a sample app the
distro-supplied 1.15 is fine.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo apt install golang-1.15
cd $HOME/bin &amp;amp;&amp;amp; ln -s /usr/lib/go-1.15/bin/go
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I have $HOME/bin in my $PATH, which makes it easy to manage the
installation of single-binary tools. Technically, with buildpacks I
don’t even need to install the go toolchain, but I want to explore
things like debugging of a running go application.&lt;/p&gt;

&lt;h1 id=&quot;buildpacks&quot;&gt;Buildpacks&lt;/h1&gt;

&lt;p&gt;I am not a big fan of Dockerfiles, especially the multi-stage
Dockerfiles that are the right way to separate the build and runtime
containers. Due to the single-binary structure of Go applications they
can have a tiny runtime image. So I decided to investigate using
buildpacks&lt;a href=&quot;https://buildpacks.io&quot;&gt;4&lt;/a&gt;, which look like a much better alternative for application
development. Buildpacks even support new features like reproducible builds
and image rebasing.&lt;/p&gt;

&lt;p&gt;Install the pack tool.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd $HOME/bin
curl -LO https://github.com/buildpacks/pack/releases/download/v0.18.1/pack-v0.18.1-linux.tgz
tar xzf pack-v0.18.1-linux.tgz
chmod +x pack
rm -f pack-v0.18.1-linux.tgz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Start with the golang buildpacks sample app&lt;a href=&quot;https://github.com/paketo-buildpacks/samples/tree/main/go/mod&quot;&gt;1&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since this is a golang app, the default builder can be tiny:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pack config default-builder paketobuildpacks/builder:tiny
cd $APP
pack build mod-sample --buildpack gcr.io/paketo-buildpacks/go
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With buildpacks running locally, the workflow is: edit and save, run
pack to build, then run docker and test. It takes a few seconds and
multiple steps.&lt;/p&gt;

&lt;h1 id=&quot;skaffold-and-minikube&quot;&gt;Skaffold and Minikube&lt;/h1&gt;

&lt;p&gt;This app will run in K8s and will depend on K8s features, so it will
need to run inside K8s. Enter Minikube&lt;a href=&quot;https://minikube.sigs.k8s.io&quot;&gt;2&lt;/a&gt; for a local K8s setup and
Skaffold&lt;a href=&quot;https://skaffold.dev&quot;&gt;3&lt;/a&gt; to orchestrate the development workflow.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd $HOME/bin
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
chmod +x minikube
minikube start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This starts up a full K8s instance locally using the docker
driver. The initial download was ~1GB so it takes a while.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd $HOME/bin
curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64
chmod +x skaffold
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now switch to the golang buildpack sample with skaffold&lt;a href=&quot;https://github.com/GoogleContainerTools/skaffold/tree/master/examples/buildpacks&quot;&gt;5&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;term 1&amp;gt; skaffold dev
&amp;lt;term 2&amp;gt; minikube tunnel
&amp;lt;term 3&amp;gt; kubectl get svc # to get IP
&amp;lt;term 3&amp;gt; curl -s http://&amp;lt;external IP&amp;gt;:8080
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Minikube has its own docker daemon, so the buildpacks used and the images
built are located inside minikube and not in the host docker&lt;a href=&quot;https://minikube.sigs.k8s.io/docs/handbook/pushing/&quot;&gt;6&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;term 4&amp;gt; eval $(minikube docker-env)
&amp;lt;term 4&amp;gt; docker images
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This makes deploying the image very fast because it isn’t copied.&lt;/p&gt;

&lt;h1 id=&quot;development-workflow-1&quot;&gt;Development workflow&lt;/h1&gt;

&lt;p&gt;The skaffold sample is set up to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gcr.io/buildpacks/builder:v1&lt;/code&gt; and
it also works with the builder &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;paketobuildpacks/builder:tiny&lt;/code&gt;. The
Google buildpacks&lt;a href=&quot;https://github.com/GoogleCloudPlatform/buildpacks&quot;&gt;7&lt;/a&gt; support “file sync”, which copies changed files
directly to the image. This means changes are available in seconds
which is great for development.&lt;/p&gt;
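
&lt;p&gt;A sketch of what the relevant part of skaffold.yaml might look like; the
apiVersion and exact schema should be checked against the Skaffold docs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: skaffold/v2beta13
kind: Config
build:
  artifacts:
  - image: mod-sample
    buildpacks:
      builder: gcr.io/buildpacks/builder:v1
    sync:
      auto: true   # copy changed files directly into the running image
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;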

&lt;h1 id=&quot;next-steps&quot;&gt;Next steps&lt;/h1&gt;

&lt;p&gt;My application will be a multi-cluster app that exchanges K8s resource
data. The first step is to query the resource utilization of the K8s
cluster using client-go.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Update: Git Server option bigFileThreshold</title>
   <link href="https://kscherer.github.io//git/2021/02/11/update-git-server-option-bigfilethreshold"/>
   <updated>2021-02-11T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//git/2021/02/11/update-git-server-option-bigfilethreshold</id>
   <content type="html">&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;Many years ago (2014) I set up my git servers with the git option
core.bigFileThreshold=100k. This reduced memory usage dramatically
because git stopped trying to delta-compress already compressed files. I have used
this option for many years without apparent problems until one of my
colleagues alerted me that cloning an internal mirror of the Linux
kernel from my git server was transferring over 9GB of data! Cloning
the same repo from kernel.org transferred only approx 1.5GB.&lt;/p&gt;

&lt;h1 id=&quot;so-many-repack-options&quot;&gt;So many repack options&lt;/h1&gt;

&lt;p&gt;When I looked at the bare repo everything seemed normal. The repo had
been repacked properly less than a month ago thanks to
grokmirror. There was a single pack file with a bitmap, but that
pack file was 9.1GB! I tried all the standard repack commands:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; git repack -A -d -l -b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and when that didn’t help:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; git repack -A -d -l -b -F -f
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But nothing changed. Then my colleague reported that rebuilding with
the above options did work on his machine and reduced the git repo
size. This meant that there must be a local setting on the server
that was causing the problem. I looked at the local ~/.gitconfig and
saw the bigFileThreshold option I had set so long ago. So I did a
quick experiment with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; git -c core.bigFileThreshold=512m repack -A -d -F -f
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and it did indeed reduce the bare git repo from 9.1GB to 1.9GB! It
seems that there are ~200K objects in the Linux kernel repo’s history that
are over 100k, and when they are not delta-compressed the size of the
repository grows a lot!&lt;/p&gt;

&lt;p&gt;Curious how large the files in the kernel repo can get, I did a
checkout of the mainline kernel and looked for files over 100KB.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; find . -path '*/.git/*' -prune -o -type f -size +100k -print | wc -l
914
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Four of these files are even over 10MB!&lt;/p&gt;

&lt;h1 id=&quot;solution&quot;&gt;Solution&lt;/h1&gt;

&lt;p&gt;Once the problem has been clearly identified the solution is usually
simple. In this case the gitolite config for all the kernel repos sets
the core.bigFileThreshold to its default value of 512m. This way all
the other repos can still use the smaller bigFileThreshold setting.&lt;/p&gt;
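
&lt;p&gt;A sketch of the gitolite.conf stanza (the repo pattern is illustrative,
and gitolite has to whitelist the key via GIT_CONFIG_KEYS in .gitolite.rc):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;repo kernel/..*
    config core.bigFileThreshold = 512m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;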

&lt;p&gt;There is also a way to tell git not to delta compress files with
certain extensions. I created a global git attributes file
/etc/gitattributes with the following content:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;*.bz2 binary -delta
*.gz binary -delta
*.xz binary -delta
*.tgz binary -delta
*.zip binary -delta
*.lz binary -delta
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This covers all the compressed files in our repos and had the same
effect, so I reverted the bigFileThreshold option to the default of
512m.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Developer productivity</title>
   <link href="https://kscherer.github.io//2020/10/12/developer-productivity"/>
   <updated>2020-10-12T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//2020/10/12/developer-productivity</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;During a recent job interview I was asked “Do you think you are a 10x
developer?”. The concept of a “10x” developer and developer
productivity is something I have thought a lot about. Fundamentally
the hard part is figuring out what to measure. I don’t have any good
answers but here is how I think about it today.&lt;/p&gt;

&lt;h2 id=&quot;how-to-measure-productivity&quot;&gt;How to measure productivity?&lt;/h2&gt;

&lt;p&gt;Since programming is a fairly creative activity it will always be
difficult to find a measure that cannot be gamed.&lt;/p&gt;

&lt;p&gt;A simple but flawed measure is something like “lines of code” or
“features completed” or “bugs fixed”. These measurements are flawed
because they are only loosely linked to the things users of the code
actually care about. In University I met someone who allegedly
completed a 5 hour coding interview in 1.5 hours with code that passed
all the unit tests. If true, this is impressive and a testament to
that particular developer’s skills. I doubt I would ever be able to
match such a feat.&lt;/p&gt;

&lt;p&gt;Just as a person has many personality facets, a developer can work on
different facets of productivity. I like the word facets because each
is unique while still contributing to the whole.&lt;/p&gt;

&lt;h2 id=&quot;cost-of-programming-errors&quot;&gt;Cost of programming errors&lt;/h2&gt;

&lt;p&gt;An important skill is the ability to produce “error-free” code. I
think computer programming is unique in that a single bug can cost
millions of dollars to fix. Even perfectly correct code can require
rewriting when the requirements or execution environment
changes. Examples of insanely expensive bugs include OpenSSL
HeartBleed, Intel Meltdown and more. These bugs cause the users damage
and also generate rework for the entire industry.&lt;/p&gt;

&lt;p&gt;Programming is a continuous tradeoff between getting the code working
for a specific use-case and making it robust enough to handle multiple
use-cases. Figuring out how much it will cost to develop a feature is hard
enough, and the risk of an expensive bug is rarely factored in. There
isn’t an easy way to measure the cost of expensive bugs. The cost to
fix bugs is also hard to measure and rarely accounted for as an
engineering cost.&lt;/p&gt;

&lt;p&gt;Developing the skill of writing code that doesn’t result in expensive
bugs often requires:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Using tools like static analyzers, linters, enabling all compiler
warnings, fuzzers, code quality scanners, etc. Catching errors early
is often the best return on investment. However, each tool takes
time to learn and integrate. Each run takes time and a high rate of
false positives can result in lost productivity.&lt;/li&gt;
  &lt;li&gt;Developing and maintaining a set of runtime tests. Code developed at
the same time as tests tends to be better designed because it works
best when dependencies are minimized. Code with a good test suite
can be refactored more easily. On the other hand, runtime testing of
a large software base requires significant infrastructure in order
to minimize false positives and maintain a good feedback loop.&lt;/li&gt;
  &lt;li&gt;Careful software reuse. Sometimes using an existing code base is the
right thing to do. For example, almost none of the developers that
thought they could write an encryption library have succeeded. Each
dependency on a third party becomes a liability and has to be
managed carefully. Ideally, it is an open source library and you can
become part of its community and keep up with the upgrades and
security fixes. In the worst case scenario, you end up maintaining a
fork of the software or have to apply horrible workarounds.&lt;/li&gt;
  &lt;li&gt;Creating operationally simple software. Even bug free software can
be a pain to upgrade or keep operational in a high availability
configuration. Software has many different user interfaces and one
is how the software is installed, configured, upgraded and
maintained. I wasn’t exposed to this facet of software until I had
to maintain a cluster of 100+ machines. I have found that whether a
service can reload its configuration without a restart is a good
indication of whether the operator interface has been taken
seriously. Reloading configuration at runtime requires a good
software design and test suite. When there are bugs it is too easy
for the developers to just deprecate the feature and force
restarts. But being able to reload a configuration without impact on
running sessions is an operationally valuable feature.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the wrong environment an inexperienced developer can introduce
programming errors that will cost more than their
contributions. Everyone likes to talk about “10x” programmers, but I
think we should also talk about “negative productivity” programmers
and what can be done to reduce the cost of these errors by catching
and preventing them earlier.&lt;/p&gt;

&lt;h2 id=&quot;cost-of-fixing-bugs&quot;&gt;Cost of fixing bugs&lt;/h2&gt;

&lt;p&gt;Debugging is a specific developer skill. It is difficult to teach and
hard to explain the instincts of a good debugger to an inexperienced
developer. Being able to make an intermittent bug easily reproducible
or use gdb to track down some memory corruption are critical skills at
the right time. I also saw a talk by a Google engineer who was
investigating a 99th percentile latency outlier and found a Linux
kernel scheduler bug that saved Google millions of dollars a year. As
systems become more complex, the bugs also become harder to fix. I
wish there were better ways to capture and train debugging expertise.&lt;/p&gt;

&lt;h2 id=&quot;individual-productivity-versus-team-productivity&quot;&gt;Individual productivity versus team productivity&lt;/h2&gt;

&lt;p&gt;One of the amazing properties of software is leverage, where a single
tool can make a large group of developers more productive. The goal of
every manager should also be to make their team more productive. The
goal of almost every software product is to make their customers more
productive. Being able to find and address productivity bottlenecks in
a team is another developer skill. Developing this skill often
requires:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Understanding of the workflow of the different members of the team&lt;/li&gt;
  &lt;li&gt;Use of automation tools to transition manual work to the computer&lt;/li&gt;
  &lt;li&gt;Creating tools with a compelling user interface for the team&lt;/li&gt;
  &lt;li&gt;Talking with upstream and downstream teams to find ways to make
interactions smoother and more automated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This assumes that the team works well together. A toxic team member
can reduce the productivity of an entire team. Language, timezone and
cultural differences can also hinder productivity.&lt;/p&gt;

&lt;h2 id=&quot;choosing-the-right-work&quot;&gt;Choosing the “right” work&lt;/h2&gt;

&lt;p&gt;Even the most perfect code is useless if it doesn’t solve the right
problems. Keeping development aligned with business needs can
contribute to team productivity by eliminating rework. Some of the
skills required to do this well are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Interacting with customers directly and understanding what their
problems are and why they are looking to you to solve them&lt;/li&gt;
  &lt;li&gt;Communicating technical concepts to non-technical people in an
effective way&lt;/li&gt;
  &lt;li&gt;Communicating non-technical requirements to technical people in an
effective way&lt;/li&gt;
  &lt;li&gt;Potentially developing expertise in the customer domain to
understand their domain specific language and problem context&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;It is impossible to be excellent in all these skills. The most
important thing is to constantly find ways to improve individual and team
productivity. I suspect this isn’t the answer an interviewer is
expecting. I need to come up with a shorter answer.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Using AWS Session Manager to connect to machines in a private subnet</title>
   <link href="https://kscherer.github.io//aws/2019/11/07/using-aws-session-manager-to-connect-to-machines-in-a-private-subnet"/>
   <updated>2019-11-07T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//aws/2019/11/07/using-aws-session-manager-to-connect-to-machines-in-a-private-subnet</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;We are experimenting with AWS as many people are. One of the first
hurdles is connecting over SSH to the EC2 instances that have been
created. The “standard” mechanism is to set up a Bastion host that has
a restrictive “Security Group” (also known as Firewall). This Bastion
host is accessible from the Internet and once the user has logged into
this host they can then access other instances in the VPC.&lt;/p&gt;

&lt;p&gt;The Bastion host has a few limitations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;It is exposed to the Internet: A Security Group can restrict access
to specific IPs and only open port 22. This is reasonably secure,
but an exploit in the SSH server is always a
possibility.&lt;/li&gt;
  &lt;li&gt;SSH key management: The AWS console allows for the creation of SSH
keypairs that can be automatically installed on the instance, which
is great. If you have multiple people accessing the Bastion
instance, then either everyone will have to use the same keypair
(which is bad) or there needs to be some other mechanism for managing
the authorized_keys file on the Bastion instance. Ideally this is
automated using a tool like Puppet Bolt or Ansible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of my weekly newsletters pointed me to &lt;a href=&quot;https://github.com/xen0l/aws-gate&quot;&gt;aws-gate&lt;/a&gt; which mentioned
the possibility of logging into an instance using SSH without the need
for a Bastion host. This post documents my experience getting it
working.&lt;/p&gt;

&lt;h2 id=&quot;local-requirements&quot;&gt;Local Requirements&lt;/h2&gt;

&lt;p&gt;On the local machine the AWS CLI must be installed. I use a python
virtualenv to keep the python environment separate and avoid requiring
root access.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; python3 -m venv awscli
&amp;gt; cd awscli
&amp;gt; bin/pip3 install awscli
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately it turns out the Session Manager functionality requires
a special plugin which is only distributed as a deb package.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; curl &quot;https://s3.amazonaws.com/session-manager-downloads/plugin/latest/ubuntu_64bit/session-manager-plugin.deb&quot; -o &quot;session-manager-plugin.deb&quot;
&amp;gt; sudo apt install ./session-manager-plugin.deb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The AWS CLI requires an access key. Go to the AWS console -&amp;gt; “My Security
Credentials” and create a new Access key (or use existing
credentials).&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; ~/awscli/bin/aws configure
AWS Access Key ID [None]: accesskey
AWS Secret Access Key [None]: secretkey
Default region name [None]: us-west-2
Default output format [None]:
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Also in the AWS EC2 console, create a new KeyPair and download the
.pem file locally. I put the file in ~/.ssh and gave it 0600
permissions. Now add the following to your .ssh/config file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# SSH over Session Manager
host i-* mi-*
ProxyCommand sh -c &quot;~/awscli/bin/aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'&quot;
IdentityFile ~/.ssh/&amp;lt;keypair name&amp;gt;.pem
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;aws-iam-setup&quot;&gt;AWS IAM Setup&lt;/h2&gt;

&lt;p&gt;By default an EC2 instance will not be manageable by the Systems
Manager. Go to AWS Console -&amp;gt; IAM -&amp;gt; Roles to update the roles.&lt;/p&gt;

&lt;p&gt;I already had a default EC2 instance role and I had to add
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AmazonSSMManagedInstanceCore&lt;/code&gt; permissions to the instance role.&lt;/p&gt;
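
&lt;p&gt;The same policy attachment can be scripted with the CLI (the role name
is a placeholder):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; ~/awscli/bin/aws iam attach-role-policy \
    --role-name &amp;lt;instance role&amp;gt; \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;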

&lt;h2 id=&quot;launching-the-instance&quot;&gt;Launching the Instance&lt;/h2&gt;

&lt;p&gt;According to the docs, the official Ubuntu 18.04 server AMI has the SSM
agent integrated, and I relied on this. Finding the right AMI is really
frustrating because there aren’t proper organization names attached to
AMIs. The simplest way is to go to the &lt;a href=&quot;https://cloud-images.ubuntu.com/locator/ec2/&quot;&gt;Ubuntu AMI finder&lt;/a&gt;, search for
‘18.04 us-west-2 ebs’ and select the most recent AMI.&lt;/p&gt;

&lt;p&gt;In the launch options:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;choose the correct VPC with a private subnet&lt;/li&gt;
  &lt;li&gt;choose the ‘IAM Role’ with the correct permissions&lt;/li&gt;
  &lt;li&gt;choose a “Security Group” with port 22 open to you&lt;/li&gt;
  &lt;li&gt;select the Keypair that was downloaded earlier and set up in your
.ssh/config file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Launch the instance and wait a while. Go to the AWS Console -&amp;gt; Systems
Manager -&amp;gt; Inventory to see that the instance is running and the SSM
agent is working properly.&lt;/p&gt;

&lt;h2 id=&quot;connecting-over-ssh&quot;&gt;Connecting over SSH&lt;/h2&gt;

&lt;p&gt;If everything is set up correctly, grab the instance name and do the
login:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; ssh ubuntu@i-014633b619400dfff
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-1052-aws x86_64)
&amp;lt;snip&amp;gt;
ubuntu@ip-10-0-1-193:~$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;SSH access without a Bastion host is possible!&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>ZFS Disk replacement on Dell R730</title>
   <link href="https://kscherer.github.io//linux/2019/11/01/zfs-disk-replacement-on-dell-r730"/>
   <updated>2019-11-01T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//linux/2019/11/01/zfs-disk-replacement-on-dell-r730</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;I manage a bunch of Dell servers and I use OpenManage and
check_openmanage to monitor for hardware failures. Recently one
machine started showing the following error:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Logical Drive '/dev/sdh' [RAID-0, 3,725.50 GB] is Ready
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately “Drive is Ready” isn’t a helpful error message. So I log
into the machine and check the disk:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; omreport storage vdisk controller=0 vdisk=7
Virtual Disk 7 on Controller PERC H730P Mini (Embedded)

Controller PERC H730P Mini (Embedded)
ID                                : 7
Status                            : Critical
Name                              : Virtual Disk 7
State                             : Ready
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The RAID controller log shows a more helpful message:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Bad block medium error is detected at block 0x190018718 on Virtual Disk 7 on Integrated RAID Controller 1.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From experience I know that I could just clear the bad blocks, but the
drive is dying and more will come. Luckily Dell will replace drives
with uncorrectable errors and I received a replacement drive quickly.&lt;/p&gt;

&lt;h2 id=&quot;cleanly-removing-the-drive&quot;&gt;Cleanly removing the drive&lt;/h2&gt;

&lt;p&gt;I know the drive is /dev/sdh, but I created the ZFS pool using drive
paths. Searching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/dev/disk/by-path/&lt;/code&gt; gave me the correct drive.&lt;/p&gt;
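
&lt;p&gt;A quick way to do that search (output trimmed):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; ls -l /dev/disk/by-path/ | grep sdh
... pci-0000:03:00.0-scsi-0:2:7:0 -&amp;gt; ../../sdh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;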

&lt;p&gt;First step is to mark the drive as offline.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; zpool offline pool 'pci-0000:03:00.0-scsi-0:2:7:0'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To make sure I replaced the correct drive I also forced it to blink:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; omconfig storage vdisk controller=0 vdisk=7 action=blink
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next came the manual step of actually replacing the drive.&lt;/p&gt;

&lt;h2 id=&quot;activating-the-new-drive&quot;&gt;Activating the new drive&lt;/h2&gt;

&lt;p&gt;After inserting the new disk I was able to determine the physical disk
number and recreate the RAID-0 virtual disk.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; omconfig storage controller action=discardpreservedcache controller=0 force=enabled
&amp;gt; omconfig storage controller controller=0 action=createvdisk raid=r0 size=max pdisk=0:1:6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I use single drive RAID0 because I prefer that ZFS use the disks in
raidz2 mode rather than using RAID6 on the controller.&lt;/p&gt;

&lt;p&gt;Then I did a quick verify that the new virtual disk is using the same PCI
device and drive letter, and added it back into the ZFS pool.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; omreport storage vdisk controller=0 vdisk=7
Virtual Disk 7 on Controller PERC H730P Mini (Embedded)

Controller PERC H730P Mini (Embedded)
ID                                : 7
Status                            : Ok
Name                              : Virtual Disk7
State                             : Ready
Device Name                       : /dev/sdh
&amp;gt; parted -s /dev/sdh mklabel gpt
&amp;gt; zpool replace pool 'pci-0000:03:00.0-scsi-0:2:7:0'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;ZFS will add the new drive and resilver the data.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; zpool status
pool: pool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It would be slightly easier if the disk replacement were handled by the
RAID controller, but the rebuild would take much longer. So far ZFS on
Linux has worked very well for me and I will continue to rely on it.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Building a multi node build cluster</title>
   <link href="https://kscherer.github.io//build/2019/10/18/building-a-multi-node-build-cluster"/>
   <updated>2019-10-18T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//build/2019/10/18/building-a-multi-node-build-cluster</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;I now have built, deployed and managed three internal build systems
that handle thousands (yes thousands) of Yocto builds daily. Each
build system has its tradeoffs and requirements. The latest one I call
&lt;a href=&quot;https://github.com/WindRiver-OpenSourceLabs/ci-scripts&quot;&gt;Wrigel&lt;/a&gt; was specifically designed to be usable outside of WindRiver and
is available on our WindRiver-OpenSourceLabs GitHub repo and on the
Docker Hub. Recently there has been a lot of internal discussion about
build systems and the current state of various open source projects
and I will use this post to clarify my thinking.&lt;/p&gt;

&lt;h2 id=&quot;wrigel-design-constraints&quot;&gt;Wrigel Design Constraints&lt;/h2&gt;

&lt;p&gt;The primary use case of Wrigel was to make it easy for a team inside
or outside WindRiver to join 3-5 “spare” computers into a build
cluster. For this I used a combination of Docker, Docker Swarm and
Jenkins.&lt;/p&gt;

&lt;p&gt;Docker makes it really easy to distribute preconfigured
Jenkins and build container images. Thanks to the generous support of
Docker Cloud all the container images required for Wrigel are built
and distributed on Docker Hub.&lt;/p&gt;

&lt;p&gt;Docker Swarm makes it really easy to join 3-5 (Docker claims up to
thousands) systems together into a cluster. The best part is that
Docker Compose supports using the same yaml file to run services on a
single machine or distributed over a swarm. This has been ideal for
developing and testing the setup.&lt;/p&gt;
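
&lt;p&gt;For example, the same compose file can drive both modes (the stack name
is illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# single machine
&amp;gt; docker-compose up -d
# distributed over a swarm, from the same yaml file
&amp;gt; docker stack deploy -c docker-compose.yml wrigel
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;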

&lt;p&gt;Jenkins is an incredible piece of software with an amazing community
that is used everywhere and has plugins for almost any
functionality. I rely heavily on the Pipeline plugin which provides a
sandboxed scripted pipeline DSL. This DSL supports both single and
multi-node workflows. I have abused the groovy language support to do
some very complicated workflows.&lt;/p&gt;

&lt;p&gt;I have a system that works and looks to be scalable. Of course the
system has limitations. It is these limitations and the current
landscape of alternatives that I have been investigating.&lt;/p&gt;

&lt;h2 id=&quot;wrigel-limitations&quot;&gt;Wrigel Limitations&lt;/h2&gt;

&lt;p&gt;Jenkins is a great tool, but the Pipeline plugin is very specific to
Jenkins. There isn’t a single other tool that can run the Jenkins
Pipeline DSL. To be fair, every build tool from CircleCI to Azure
Pipelines and Tekton has its own syntax and lock-in. There are
many kinds of lock-in and not all are bad. One of the perennial
challenges with all build systems has been reproducing the build
environment outside of the build system. Failures due to some special
build system state tend to make developers really unhappy, so I wanted
to explore what running a pipeline outside of a build system would
look like. I acknowledge the paradox of building a system to run
pipelines that also supports running pipelines outside of the system.&lt;/p&gt;

&lt;p&gt;The other limitation is security. The constant stream of CVE reports
and fixes for Jenkins and its plugins is surprising. I am very
impressed with Cloudbees and the community with the way they are
taking these problems seriously. Cloudbees has made significant
progress improving the default Jenkins security settings. This is no
small feat considering Jenkins has a very old codebase. On the
downside my own attempts to secure the default setup have been broken
by Jenkins upgrades three times in the last year. While I understand
the churn I am reluctant to ship Jenkins as part of a potential
commercial product because each CVE would impose additional non
business value work on our team.&lt;/p&gt;

&lt;h2 id=&quot;docker-and-the-root-access-problem&quot;&gt;Docker and the root access problem&lt;/h2&gt;

&lt;p&gt;Docker is an amazing tool and has completely transformed the way I
work. One major problem is that giving a build script access to run
Docker is equivalent to giving root on the machine. Since most build
clusters are internal systems running mostly trusted code it isn’t a
huge problem, but I have always been interested in
alternatives. Recently Podman and rootless Docker have announced
support for user namespaces. I was able to do a Yocto build using
Podman and user namespaces with the 4.18 kernel so huge progress has
been made. I would prefer that the build system required as little
root access as possible, so I will continue to investigate using
rootless Podman and/or Docker.&lt;/p&gt;

&lt;h2 id=&quot;breaking-down-the-problem&quot;&gt;Breaking down the problem&lt;/h2&gt;

&lt;p&gt;At its core, Jenkins is a cluster manager and a batch job
scheduler. It is also a plugin manager, but that isn’t directly
relevant to this discussion. For a long time Jenkins was probably the
most common open source cluster manager. It is only recently, with the rise
of datacenter scale computers, that more sophisticated cluster managers
have become available. In 2019 the major open source cluster managers
are Kubernetes, Nomad, Mesos + Marathon and Docker Swarm. Where
Jenkins is designed around batch jobs with an expected end time, newer
cluster managers are designed around the needs of a long lived
service. These managers have support for batch jobs, but it isn’t the
primary abstraction. They also have many features that Jenkins does
not:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Each job specifies its resource requirements. Jenkins only supports
label selectors for choosing hosts&lt;/li&gt;
  &lt;li&gt;The jobs are packed to maximize utilization of the systems. Jenkins
by default packs jobs onto a single machine and prefers to reuse
workareas.&lt;/li&gt;
  &lt;li&gt;Each manager supports high availability configurations in the open
source version whereas the HA feature for Jenkins is an Enterprise
only feature&lt;/li&gt;
  &lt;li&gt;Jobs can specify complex affinities and constraints on where the
jobs can run.&lt;/li&gt;
  &lt;li&gt;Each manager has integration with various container runtimes,
storage and network plugins. Jenkins has integration with Docker but
generally doesn’t manage storage or network settings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So by comparison Jenkins looks like a very limited scheduler, but it
does have pipeline support, which none of the other projects do. So I
started exploring projects that add pipeline support to these
schedulers. I found many very new projects like Argo and Tekton for
Kubernetes. There are plugins for Jenkins that allow it to use
Kubernetes, Nomad or Mesos, but they can’t really take advantage of
all the features.&lt;/p&gt;

&lt;h2 id=&quot;cluster-manager-comparison&quot;&gt;Cluster manager comparison&lt;/h2&gt;

&lt;p&gt;Now I will compare the cluster managers on the features which I feel
are most relevant to a build cluster setup:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;How easy is the setup and maintenance?&lt;/li&gt;
  &lt;li&gt;How complicated is the HA setup?&lt;/li&gt;
  &lt;li&gt;Can it be run across multiple datacenters, i.e. Federated?&lt;/li&gt;
  &lt;li&gt;Community and Industry support?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker Swarm:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Very easy setup&lt;/li&gt;
  &lt;li&gt;Automatic cert creation and rotation&lt;/li&gt;
  &lt;li&gt;transparent overlay network setup&lt;/li&gt;
  &lt;li&gt;HA easy to setup&lt;/li&gt;
  &lt;li&gt;no WAN support&lt;/li&gt;
  &lt;li&gt;Docker Inc. is focused on Kubernetes and future of Swarm is uncertain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nomad:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Install is a simple binary&lt;/li&gt;
  &lt;li&gt;integration with Consul for HA&lt;/li&gt;
  &lt;li&gt;encrypted communications&lt;/li&gt;
  &lt;li&gt;no network setup&lt;/li&gt;
  &lt;li&gt;plugins for job executors including Docker&lt;/li&gt;
  &lt;li&gt;WAN setup supported by Consul&lt;/li&gt;
  &lt;li&gt;Support for Service, Batch and System jobs&lt;/li&gt;
  &lt;li&gt;Runs at large scale&lt;/li&gt;
  &lt;li&gt;Well supported by Hashicorp and community&lt;/li&gt;
  &lt;li&gt;job configuration in JSON or HCL (see the sketch below)&lt;/li&gt;
&lt;/ul&gt;
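
&lt;p&gt;Since Nomad is the alternative I am most drawn to, here is a minimal
sketch of what a batch job might look like in HCL. All names and
values are illustrative, not from an actual Wrigel setup:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;job &quot;yocto-build&quot; {
  datacenters = [&quot;dc1&quot;]
  type        = &quot;batch&quot;

  group &quot;build&quot; {
    task &quot;bitbake&quot; {
      driver = &quot;docker&quot;
      config {
        image   = &quot;ubuntu:18.04&quot;
        command = &quot;/bin/sh&quot;
        args    = [&quot;-c&quot;, &quot;bitbake core-image-minimal&quot;]
      }
      # each job declares its resource requirements
      resources {
        cpu    = 4000 # MHz
        memory = 8192 # MB
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;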

&lt;p&gt;Mesos + Marathon:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Docker and custom containerizer&lt;/li&gt;
  &lt;li&gt;No network setup by default&lt;/li&gt;
  &lt;li&gt;Runs at large scale at Twitter&lt;/li&gt;
  &lt;li&gt;Commercial support available&lt;/li&gt;
  &lt;li&gt;Complicated installation and setup&lt;/li&gt;
  &lt;li&gt;HA requires zookeeper setup&lt;/li&gt;
  &lt;li&gt;no federation or WAN support&lt;/li&gt;
  &lt;li&gt;Small community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kubernetes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Very popular with lots of managed options&lt;/li&gt;
  &lt;li&gt;Runs at large scale at many companies&lt;/li&gt;
  &lt;li&gt;Supports build extensions like Tekton and Argo&lt;/li&gt;
  &lt;li&gt;Federation support&lt;/li&gt;
  &lt;li&gt;Lots of support options and great community&lt;/li&gt;
  &lt;li&gt;Complicated setup and configuration&lt;/li&gt;
  &lt;li&gt;Requires setup and management of etcd&lt;/li&gt;
  &lt;li&gt;Requires setup and rotation of certs&lt;/li&gt;
  &lt;li&gt;Requires network overlay setup using one of 10+ network plugins like Flannel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my experience with Wrigel, Docker Swarm has worked well. It is only
its uncertain future that has encouraged me to look at Nomad.&lt;/p&gt;

&lt;h2 id=&quot;running-pipelines-outside-jenkins&quot;&gt;Running Pipelines outside Jenkins&lt;/h2&gt;

&lt;p&gt;Many years ago I saw a reference to a small tool on Github called
&lt;a href=&quot;https://github.com/walter-cd/walter&quot;&gt;Walter&lt;/a&gt;. The idea is to have a small Go tool that can execute a
sequence of tasks as specified in a yaml file. It can execute steps
serially or in parallel. Each stage can have an unlimited number of
tasks and some cleanup tasks. Initially it supported only two stages
so I modified it to support unlimited stages. This tool can only
handle a single node pipeline, but that covers a lot of use cases. Now
the logic for building the pipeline is in the code that generates the
yaml file and not inside a Jenkinsfile. Ideally a developer could
download the yaml file and the walter binary and recreate the entire
build sequence on a local development machine. The temptation is to
have the yaml file call shell scripts, but by placing the full
commands in the yaml file with proper escaping, each command can be
cut and pasted out of the yaml and run in a terminal.&lt;/p&gt;
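
&lt;p&gt;To give a feel for the format, here is a hypothetical pipeline in the
spirit of Walter’s yaml. The exact keys vary by Walter version, so this
is illustrative rather than a verified schema:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;build:
  tasks:
    - name: fetch sources
      command: git clone https://example.com/project.git
    - name: checks
      parallel:
        - name: lint
          command: make lint
        - name: unit tests
          command: make test
deploy:
  tasks:
    - name: publish
      command: make publish
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;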

&lt;h2 id=&quot;workflow-support&quot;&gt;Workflow Support&lt;/h2&gt;

&lt;p&gt;It turns out that Jenkins Pipelines are an implementation of a much
larger concept called Workflow. Scientific computing has been building
multi-node cluster workflow engines for a long time. There is a list
of &lt;a href=&quot;https://github.com/meirwah/awesome-workflow-engines&quot;&gt;awesome workflow engines&lt;/a&gt; on Github. I find the concept of
directed acyclic graphs of workflow steps as mentioned by &lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache
Airflow&lt;/a&gt; very interesting because it matches my mental model of
some of our larger build jobs.&lt;/p&gt;

&lt;p&gt;With a package like &lt;a href=&quot;https://github.com/spotify/luigi&quot;&gt;Luigi&lt;/a&gt;, the workflow can be encoded as a graph
of tasks and executed on a scheduler using “contribs”, which are
interfaces to services outside of Luigi. There are &lt;a href=&quot;https://luigi.readthedocs.io/en/stable/api/luigi.contrib.html&quot;&gt;contribs&lt;/a&gt; for
Kubernetes, AWS, ElasticSearch and more.&lt;/p&gt;
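
&lt;p&gt;Luigi tasks are Python classes with requires(), run() and output()
methods; once a module defines the task graph, the whole workflow can
be launched from the command line. A hypothetical invocation, with
made-up module and task names:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; luigi --module build_workflow FullBuild --workers 4 --local-scheduler
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;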

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;With a single node pipeline written in yaml and executed by walter and
a multi node workflow built in Luigi, the build logic would be
independent of the cluster manager and scheduler. A developer could
run the workflows on a machine not managed by a cluster manager. The
build steps could be fairly easily executed on a cluster managed by
Jenkins, Nomad or Kubernetes. Combined with rootless containers the
final solution would be much more secure than current solutions.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Getting started with ElasticSearch</title>
   <link href="https://kscherer.github.io//elasticsearch/2018/05/10/getting-started-with-elasticsearch"/>
   <updated>2018-05-10T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//elasticsearch/2018/05/10/getting-started-with-elasticsearch</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;I manage an internal build system that creates a simple text file of
key-value pairs containing build statistics. These statistics are then
processed using a fairly gnarly shell script. When I first saw this
years ago I thought it looked like the perfect candidate for
ElasticSearch, and I finally had time to look into it.&lt;/p&gt;

&lt;h2 id=&quot;elasticsearch-and-kibana&quot;&gt;ElasticSearch and Kibana&lt;/h2&gt;

&lt;p&gt;ES is a text database and search engine which is useful, but it also
has a neat frontend called Kibana which can be used to query and
visualize the data. Since I manage the system, there was no need to
set up Logstash to preprocess the data since I could just convert it to
json myself.&lt;/p&gt;

&lt;h2 id=&quot;official-docker-images&quot;&gt;Official Docker Images&lt;/h2&gt;

&lt;p&gt;The documentation for ElasticSearch covers installation using Docker,
but there is one gotcha. The webpage that lists all the available
docker images at https://www.docker.elastic.co/ only lists the images
that contain the starter X-Pack with a 30 day trial license. I ended
up using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker.elastic.co/elasticsearch/elasticsearch-oss&lt;/code&gt; image
which contains only open source content. The same goes for the Kibana image.&lt;/p&gt;

&lt;h2 id=&quot;docker-compose&quot;&gt;Docker-compose&lt;/h2&gt;

&lt;p&gt;I wanted to run ES and Kibana on the same server, but if you do that
using two separate docker run commands, the auto configuration of
Kibana doesn’t work. I also wanted a local volume to hold the
data, so I created a simple docker-compose file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;---
version: '2'
services:
  kibana:
    image: docker.elastic.co/kibana/kibana-oss:6.2.4
    environment:
      SERVER_NAME: $HOSTNAME
      ELASTICSEARCH_URL: http://elasticsearch:9200
    ports:
      - 5601:5601
    networks:
      - esnet

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
    container_name: elasticsearch
    environment:
      discovery.type: single-node
      bootstrap.memory_lock: &quot;true&quot;
      ES_JAVA_OPTS: &quot;-Xms512m -Xmx512m&quot;
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    networks:
      - esnet

volumes:
  esdata1:
    driver: local

networks:
  esnet:
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now I have ES and Kibana running on ports 5601 and 9200.&lt;/p&gt;

&lt;h2 id=&quot;json-output-from-bash&quot;&gt;JSON output from Bash&lt;/h2&gt;

&lt;p&gt;I have a large collection of files in the form:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;key1: value1
key2: value2
&amp;lt;etc&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Converting this to JSON should be simple, but there were a few
surprises:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JSON requires double quotes, so wrapping the key and value in
quotes requires either echo with escaped &quot; characters or printf, which
I found cleaner.&lt;/li&gt;
  &lt;li&gt;JSON requires that the last element does not have a trailing
comma. I abused bash control characters by using backspace to erase
the last comma.&lt;/li&gt;
  &lt;li&gt;ElasticSearch would fail to parse the JSON if values contained
backslash escapes like \( or \). I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tr&lt;/code&gt; to delete all backslashes
from the JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final code looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;convert_to_json()
{
    # read each &quot;key: value&quot; line into a flat array of keys and values
    local ARR=();
    local FILE=$1
    while read -r LINE
    do
        ARR+=( &quot;${LINE%%:*}&quot; &quot;${LINE##*: }&quot; )
    done &amp;lt; &quot;$FILE&quot;

    local LEN=${#ARR[@]}
    echo &quot;{&quot;
    for (( i=0; i&amp;lt;LEN; i+=2 ))
    do
        printf '  &quot;%s&quot;: &quot;%s&quot;,' &quot;${ARR[i]}&quot; &quot;${ARR[i+1]}&quot;
    done
    # backspace over the trailing comma before closing the object
    printf &quot;\b \n}\n&quot;
}

for FILE in &quot;$@&quot;; do
    # 'local' is only valid inside a function, so use a plain variable here
    JSON=$(convert_to_json &quot;$FILE&quot; | tr -d '\\' )
    # unquoted on purpose: word splitting collapses the JSON onto one line
    echo $JSON
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;elasticsearch-type-mapping&quot;&gt;ElasticSearch type mapping&lt;/h2&gt;

&lt;p&gt;The data was being imported into ES properly, but when I tried to
search and visualize it I found it really hard. Every field had been
imported as a text and keyword type, which meant that the date and
number fields could not be visualized as expected.&lt;/p&gt;

&lt;p&gt;The solution was to create a mapping which assigns types to each field
in the document. If the numbers had not been sent as strings, ES would
have converted them automatically, but I had dates in epoch seconds
which is indistinguishable from a large number. Date parsing is its
own challenge and ES supports many different date formats. In my
specific case, epoch_second was the only date format required.&lt;/p&gt;

&lt;p&gt;I took the default mapping and added the type information to each
field. I tried to apply this mapping to the existing document, but ES
does not allow the mapping of a field to be changed because that would
change the interpretation of the data. The solution is to create a new
index and reindex the old index to the new one with types. This worked
and I was now able to visualize the data much more easily.&lt;/p&gt;
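
&lt;p&gt;As a sketch, creating a typed index and reindexing into it looks
roughly like this in ES 6.x. The index, type and field names here are
made up:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; curl -XPUT http://localhost:9200/buildstats-v2 \
    -H 'Content-Type: application/json' -d '{
  &quot;mappings&quot;: {
    &quot;_doc&quot;: {
      &quot;properties&quot;: {
        &quot;build_date&quot;:     { &quot;type&quot;: &quot;date&quot;, &quot;format&quot;: &quot;epoch_second&quot; },
        &quot;build_duration&quot;: { &quot;type&quot;: &quot;integer&quot; }
      }
    }
  }
}'
&amp;gt; curl -XPOST http://localhost:9200/_reindex \
    -H 'Content-Type: application/json' -d '{
  &quot;source&quot;: { &quot;index&quot;: &quot;buildstats&quot; },
  &quot;dest&quot;:   { &quot;index&quot;: &quot;buildstats-v2&quot; }
}'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;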

&lt;h2 id=&quot;curator&quot;&gt;Curator&lt;/h2&gt;

&lt;p&gt;I now had one index and it was growing quickly. I remembered from
previous research that LogStash uses indexes with a date suffix. This
allows data to be cleaned up regularly and also allows a new mapping
to be applied to new indexes. Creating and deleting indexes is handled
by the Curator tool.&lt;/p&gt;

&lt;p&gt;I created two scripts: one for deleting indexes that are 45 days old
and another for creating tomorrow’s index with the specified
mapping. Running these daily from cron will automate the creation and
cleanup. The last piece is to have the JSON sent to the index that
matches the day.&lt;/p&gt;
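
&lt;p&gt;A sketch of the delete side as a Curator action file; the index
prefix and timestring are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;actions:
  1:
    action: delete_indices
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: buildstats-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 45
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;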

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Every new piece of software has its learning curve. So far the curve
has been quite reasonable for ElasticSearch. I look forward to working
with it more.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Thoughts on Exercise</title>
   <link href="https://kscherer.github.io//exercise/2017/05/23/thoughts-on-exercise"/>
   <updated>2017-05-23T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//exercise/2017/05/23/thoughts-on-exercise</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Three years ago, I read &lt;a href=&quot;http://www.drmcguff.com/&quot;&gt;‘Body by Science’&lt;/a&gt; and it changed the way I
think about exercise completely. Unfortunately I wasn’t able to find a
gym in Ottawa that used this type of training. I then asked Google for
results on ‘HIT bodyweight’ and found &lt;a href=&quot;http://baye.com/&quot;&gt;Drew Baye&lt;/a&gt; and &lt;a href=&quot;http://baye.com/store/project-kratos/&quot;&gt;Project
Kratos&lt;/a&gt;. After reading most of Drew Baye’s blog and watching any videos
I could find with him, I purchased Project Kratos and started my
experiment.&lt;/p&gt;

&lt;p&gt;Doing a workout once a week at home was perfect for my life situation
at the time: two young children and full time jobs for my wife and
me. Despite the infrequency I made progress surprisingly quickly. The
squat, heel raise and back extension exercises almost tripled in under
six months. But I quickly plateaued on exercises like the pushup,
chinup and crunch. I tried many different things: negatives, forced
reps, more rest, less rest, split routines, etc. but nothing allowed
me to break through the plateau.&lt;/p&gt;

&lt;p&gt;I started experimenting with different programs like &lt;a href=&quot;https://gmb.io/&quot;&gt;GMB&lt;/a&gt;
and &lt;a href=&quot;https://www.gymnasticbodies.com/&quot;&gt;GB&lt;/a&gt;, but I lacked the mobility to do even their entry
movements. I also read &lt;a href=&quot;https://www.mobilitywod.com/the-supple-leopard/&quot;&gt;‘Supple Leopard’&lt;/a&gt; by Kelly Starrett
and &lt;a href=&quot;http://www.therollmodel.com/&quot;&gt;‘Roll Model’&lt;/a&gt; by Jill Miller and realized that mobility was
probably my limiting factor. It took a while before I was able to
integrate my reading about mobility into my mental model of
exercise. Here is the current model that I used to set up my latest
exercise routine and goals.&lt;/p&gt;

&lt;p&gt;Each movement has three components: mobility, skill and
strength. These three components are related in non-linear ways. Even
if they cannot be separated, it can still be useful to think about the
effect of each component on a movement:&lt;/p&gt;

&lt;h2 id=&quot;mobility&quot;&gt;Mobility&lt;/h2&gt;

&lt;p&gt;Do the joints, connective tissue and muscles that participate in the
movement have the range of motion required? There are &lt;em&gt;three&lt;/em&gt; answers
to this question: yes, no and almost.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;No means the joints do not have the required range of motion. For
example I cannot do a single leg squat because I lack the ankle range
of motion necessary.&lt;/li&gt;
  &lt;li&gt;Yes means the range of motion is sufficient for the movement.&lt;/li&gt;
  &lt;li&gt;Almost is the trickiest condition because it can look like the
movement can be done, but it is not optimal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of “almost” include missing shoulder range of motion so that
the lats are not properly engaged for chinups or pullups. When only
the arm muscles are used, the number of reps will always be limited
and risk of shoulder injury is higher.&lt;/p&gt;

&lt;p&gt;Another example is doing a squat without full ankle or hip range of
motion. The squat is possible but it may lead to a rounded back and
that can lead to joint wear and injury.&lt;/p&gt;

&lt;p&gt;There are many examples of world class athletes that are able to
excel even with mobility limitations, but they often pay the price for
it later. If mobility restrictions are not addressed, injuries and
plateaus will keep happening. More complicated movements often require
more range of motion but developing range of motion takes much longer
than building skill or strength.&lt;/p&gt;

&lt;p&gt;Mobility is more than stretching. Increasing the range of motion
requires convincing the nervous system that extending further is
safe. This requires that the muscle be strong enough, and lots of time
spent relaxed at the end of the range of motion. The fascia are also
involved. Sometimes there can be adhesion of fascia layers that
prevents proper movement. Many times I have had injuries and pains
resolve after using a lacrosse ball or ART, where force is applied to
a trigger point.&lt;/p&gt;

&lt;h2 id=&quot;skill&quot;&gt;Skill&lt;/h2&gt;

&lt;p&gt;Skill is the neurological coordination required to do a movement
efficiently. Some movements require little practice, some full body
movements require lots of practice to coordinate all the muscles and
parts of the body properly. This is why practicing a movement without
going to failure can still result in extra repetitions or apparent
strength gains.&lt;/p&gt;

&lt;p&gt;Full muscle contraction is another skill that is an important part of
HIT. Learning to contract a muscle or a set of muscles under intense
discomfort takes practice.&lt;/p&gt;

&lt;h2 id=&quot;strength&quot;&gt;Strength&lt;/h2&gt;

&lt;p&gt;How best to stimulate the body to produce the desired adaptation
response of greater power output and/or muscle size? This is the
component that HIT has focused on. The slow movement to momentary
muscular failure protocol works well, but there are limitations. If a
mobility or skill component is lacking, this will prevent a trainee
from achieving proper muscular failure and stimulating an adaptation
response.&lt;/p&gt;

&lt;h2 id=&quot;training-for-all-components&quot;&gt;Training for all components&lt;/h2&gt;

&lt;p&gt;The best way to train for strength is HIT: short, infrequent
movement to momentary muscular failure.&lt;/p&gt;

&lt;p&gt;Skill training is best done when the muscles are rested as many skills
require strength to perform properly. Skill training therefore works
best with many repetitions using a lighter load with careful focus on
form.&lt;/p&gt;

&lt;p&gt;Mobility training is very different from skill or strength
training. It takes lots of time and is specific to each individual
body. It works best when done daily and integrated into other daily
activities. I have had to find creative ways to combine everyday
activities with stretching or working on fascia. For instance, I will
read in a straddle stretch and meditate in a squat position.&lt;/p&gt;

&lt;h2 id=&quot;programming&quot;&gt;Programming&lt;/h2&gt;

&lt;p&gt;How best to combine all this information into a weekly program? That
depends on the goals and time available of course.&lt;/p&gt;

&lt;p&gt;If the goal is maximum ROI for minimum time investment, HIT strength
training has the best returns. Movements that do not require skill or
mobility components will have the best return, and this is why many HIT
gyms use machines. Bodyweight HIT works, but many of the movements have
a mobility and skill component and each individual will experience
different limitations.&lt;/p&gt;

&lt;p&gt;The next step is to add mobility and skill practice. Unfortunately
both these require significantly more time investment. Choose a
specific skill and do daily mobility and skill repetitions.&lt;/p&gt;

&lt;p&gt;For example, I chose L-Sit and Crow Pose as skills and I do daily
shoulder and wrist mobility followed by those skills on days when I do
not do HIT strength. The mobility work takes 15-20 minutes. The skill
work only takes a few minutes. I try to fit some more skill work in
during the day to maximize the repetitions.&lt;/p&gt;

&lt;p&gt;I have carved out a time for this every day. I will keep going while
there is progress and then switch to something else when I stop
progressing.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Thanks for reading this far. The key to staying motivated is progress
and the key to progress is focusing on very specific goals.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Hashicorp Vault based PKI</title>
   <link href="https://kscherer.github.io//vault/2017/05/09/hashicorp-vault-based-pki"/>
   <updated>2017-05-09T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//vault/2017/05/09/hashicorp-vault-based-pki</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;One of the trends I have noticed is that open source tools encrypt
network connections by default. Some tools like Puppet even make it
impossible to disable TLS encryption and provide tooling to build an
internal Certificate Authority. Docker requires overrides for any
registry that does not have a verified TLS cert. Many tools also
generate self signed certs which Firefox and Chrome always require
manual overrides to use.&lt;/p&gt;

&lt;p&gt;The solution is to have an internal Certificate Authority with the
root CA as part of the trusted store. This internal CA can then be
used to generate certs which will be trusted. But there are always
complications. Many programs do not use the OS trusted store and
require extra configuration to add trusted certs. For example Java
applications require several steps to generate a new trusted store
file and configuration to make that available to the
application. Docker has a special directory to place trusted certs for
registries.&lt;/p&gt;

&lt;h2 id=&quot;options&quot;&gt;Options&lt;/h2&gt;

&lt;p&gt;There are many CA solutions available: &lt;a href=&quot;https://pki.openca.org/&quot;&gt;OpenCA&lt;/a&gt;, &lt;a href=&quot;https://github.com/square/certstrap&quot;&gt;CertStrap&lt;/a&gt;,
&lt;a href=&quot;https://github.com/cloudflare/cfssl&quot;&gt;CFSSL&lt;/a&gt;, &lt;a href=&quot;https://github.com/Netflix/lemur&quot;&gt;Lemur&lt;/a&gt; and many others. As I looked through all these
programs a couple things kept bugging me. Creating certs is easy,
revocation is where it gets really messy. The critical question is how
to handle revocation in a sensible way. How can the system recover
from a root CA compromise? Once I started reading about CRLs, OCSP
and cert stapling, I got really discouraged. That is why I was
intrigued by &lt;a href=&quot;https://www.vaultproject.io/&quot;&gt;Hashicorp Vault&lt;/a&gt; and its PKI backend.&lt;/p&gt;

&lt;h2 id=&quot;vault&quot;&gt;Vault&lt;/h2&gt;

&lt;p&gt;Vault is a tool for managing secrets of all kinds, including tokens,
passwords and private TLS keys. It is quite complex and the CLI is non
obvious. It supports backends for Authentication, Secret Storage
and Auditing. It has a comprehensive access control language and a
generic wrapper concept that makes it possible to pass secrets without
revealing secrets to the middle man.&lt;/p&gt;

&lt;p&gt;Vault solves the revocation and CA compromise problem by making it
unnecessary. It provides a secure audited out of band channel for
distributing secrets like certs which enables very short lived certs
and secure automated reissuing of certs.&lt;/p&gt;

&lt;h2 id=&quot;vault-pki&quot;&gt;Vault PKI&lt;/h2&gt;

&lt;p&gt;That is the theory, so I decided to try it in practice by creating a
CA and some certs.&lt;/p&gt;

&lt;p&gt;1) Start Vault server, initialize, unseal and authenticate as root:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; vault server -config config.hcl
&amp;gt; export VAULT_ADDR='http://127.0.0.1:8200'
&amp;gt; vault init -key-shares=1 -key-threshold=1
Unseal Key 1: LbOw129fyB3OAzZvxq9RMQefNH8fFm7twS3wlg5Zv2o=
Initial Root Token: d9e9d69b-5d49-e753-3ef2-e6b36c0fb45a
&amp;gt; vault unseal LbOw129fyB3OAzZvxq9RMQefNH8fFm7twS3wlg5Zv2o=
&amp;gt; vault auth
Token (will be hidden): d9e9d69b-5d49-e753-3ef2-e6b36c0fb45a
Successfully authenticated! You are now logged in.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course this is for development only. A production deployment would
use more shares and a higher threshold. The unseal keys should be
encrypted using gpg. Note that the root token can be changed.&lt;/p&gt;
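
&lt;p&gt;For example, a production-style init might look like this. The key
file names are illustrative; the -pgp-keys flag encrypts each unseal
key share to an administrator’s public gpg key:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; vault init -key-shares=5 -key-threshold=3 \
    -pgp-keys=&quot;alice.asc,bob.asc,carol.asc,dave.asc,erin.asc&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;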

&lt;p&gt;2) Create self signed Cert with 10 year expiration&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; vault mount -path=wrlinux -description=&quot;WRLinux Root CA&quot; -max-lease-ttl=87600h pki
Successfully mounted 'pki' at 'wrlinux'!
&amp;gt; vault write wrlinux/root/generate/internal common_name=&quot;WRlinux Root CA&quot; \
    ttl=87600h key_bits=4096 exclude_cn_from_sans=true
certificate     -----BEGIN CERTIFICATE-----
MIIFBDCCAuygAwIBAgIUMt8NYFtqaYk8Q1OUfdOWuPjXI0IwDQYJKoZIhvcNAQEL
...
serial_number   32:df:0d:60:5b:6a:69:89:3c:43:53:94:7d:d3:96:b8:f8:d7:23:42
&amp;gt; curl -s http://localhost:8200/v1/wrlinux/ca/pem | openssl x509 -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            32:df:0d:60:5b:6a:69:89:3c:43:53:94:7d:d3:96:b8:f8:d7:23:42
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that this is the only time the private cert is exposed.&lt;/p&gt;

&lt;p&gt;3) Keep root CA offline and create second vault for intermediate CA&lt;/p&gt;

&lt;p&gt;Create CSR for Intermediate CA&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; vault mount -path=lpd -description=&quot;LPD Intermediate CA&quot; -max-lease-ttl=26280h pki
&amp;gt; vault write lpd/intermediate/generate/internal common_name=&quot;LPD Intermediate CA&quot; \
ttl=26280h key_bits=4096 exclude_cn_from_sans=true
csr     -----BEGIN CERTIFICATE REQUEST-----
MIIEYzCCAksCAQAwHjEcMBoGA1UEAxMTTFBEIEludGVybWVkaWF0ZSBDQTCCAiIw
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;4) Sign CSR and import Certificate&lt;/p&gt;

&lt;p&gt;Note: Intermediate private key never leaves Vault&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; vault write wrlinux/root/sign-intermediate csr=@lpd.csr \
common_name=&quot;LPD Intermediate CA&quot; ttl=8760h
Key             Value
---             -----
certificate     -----BEGIN CERTIFICATE-----
MIIFSzCCAzOgAwIBAgIUAY8RmTDEzwbkUQ0smevPPIPXOkYwDQYJKoZIhvcNAQEL
...
-----END CERTIFICATE-----
expiration      1523021374
issuing_ca      -----BEGIN CERTIFICATE-----
MIIFBDCCAuygAwIBAgIUMt8NYFtqaYk8Q1OUfdOWuPjXI0IwDQYJKoZIhvcNAQEL
...
-----END CERTIFICATE-----
serial_number   01:8f:11:99:30:c4:cf:06:e4:51:0d:2c:99:eb:cf:3c:83:d7:3a:46
&amp;gt; vault write lpd/intermediate/set-signed certificate=@lpd.crt
Success! Data written to: lpd/intermediate/set-signed
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;5) Create Role and generate Certificate&lt;/p&gt;

&lt;p&gt;Vault uses roles to set up cert creation rules.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; vault write lpd/roles/hosts key_bits=2048 \
max_ttl=8760h allowed_domains=wrs.com allow_subdomains=true \
organization='Wind River' ou=WRLinux
Success! Data written to: lpd/roles/hosts
&amp;gt; vault write lpd/issue/hosts common_name=&quot;yow-kscherer-l1.wrs.com&quot; \
ttl=720h
private_key             -----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAvxHQzyEjc13djntQfCo1ncpwU18a8c8iI4OdaOSQV72zbHf2
...
-----END RSA PRIVATE KEY-----
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;6) Final Steps&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Import root CA cert into trusted store&lt;/li&gt;
  &lt;li&gt;Create Policy to limit role access to cert creation (see the sketch below)&lt;/li&gt;
  &lt;li&gt;Use program like vault-pki-client to automate cert regeneration&lt;/li&gt;
  &lt;li&gt;Audit that certs are only created at expected times&lt;/li&gt;
  &lt;li&gt;Automate cert regeneration&lt;/li&gt;
&lt;/ul&gt;
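
&lt;p&gt;As a minimal sketch of that policy step, a Vault policy that only
allows issuing certs from the hosts role might look like this. The
path and capabilities are illustrative, not a drop-in file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# hypothetical policy: allow only cert issuance from the hosts role
path &quot;lpd/issue/hosts&quot; {
  capabilities = [&quot;create&quot;, &quot;update&quot;]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;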

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Once this is set up, Heartbleed is a non-event! As well as a PKI, I
can use Vault to manage other kinds of secrets.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Docker Multi Host Networking</title>
   <link href="https://kscherer.github.io//docker/2017/05/02/docker-multi-host-networking"/>
   <updated>2017-05-02T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//docker/2017/05/02/docker-multi-host-networking</id>
   <content type="html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;I recently did a presentation at work covering the basics of getting
docker containers on different hosts to talk to one another. I was
motivated by wanting to understand all the strange networking options
available and why Kubernetes chose the one network per pod
model as the default.&lt;/p&gt;

&lt;p&gt;Docker networking breaks many of the current assumptions about
networking. A modern server can easily run 100+ containers and a
datacenter rack can hold 80+ servers. If the networking model is one
IP per container, that implies 100+ IPs per machine and 1000s per
rack. Ephemeral containers with short lifespans mean that the
network has to react quickly.&lt;/p&gt;

&lt;p&gt;Of course there are competing container networking standards: CNM
(libnetwork from Docker) and CNI (CoreOS and Kubernetes). Beyond the
supported network models in Docker there is also a docker network
plugin ecosystem with various vendors providing special integration
with their gear.&lt;/p&gt;

&lt;h2 id=&quot;bridge-mode&quot;&gt;Bridge Mode&lt;/h2&gt;

&lt;p&gt;Let’s start simple with the default bridge mode. Docker creates a
linux bridge and a veth per container. By default containers can access
the external network but the external network cannot access the
containers. This is the safe default. To allow external access to a
container, host ports are forwarded to container ports. IPTables rules
can be used to prevent inter-container communication. This
functionality works with older kernels.&lt;/p&gt;

&lt;h3 id=&quot;bridge-mode-example&quot;&gt;Bridge mode example&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; docker run --detach --publish 1234:1234 ubuntu:16.04 sleep infinity

# docker0 is the bridge, veth is connected to the docker0 bridge
&amp;gt; ip addr
2: eth0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500
    inet 128.224.56.107/24 brd 128.224.56.255 scope global eth0
3: docker0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500
    inet 172.17.0.1/16 scope global docker0
8: vethea44ea7@if7: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 master docker0 state UP

# iptables rules for packet forwarding
&amp;gt; iptables -L
Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER     all  --  anywhere             anywhere

Chain DOCKER (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.17.0.2           tcp dpt:1234

# docker-proxy program forwards traffic from host port 1234 to container port 1234
&amp;gt; pgrep -af proxy
30676 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 1234 \
    -container-ip 172.17.0.2 -container-port 1234
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;bridge-mode-limitations&quot;&gt;Bridge Mode Limitations&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;The container IP is hidden and cannot be used for service
discovery&lt;/li&gt;
  &lt;li&gt;Host ports become a limiting resource&lt;/li&gt;
  &lt;li&gt;Service discovery must have host ip and port&lt;/li&gt;
  &lt;li&gt;Port forwarding has a performance cost&lt;/li&gt;
  &lt;li&gt;Does not scale well&lt;/li&gt;
  &lt;li&gt;Application must support non standard port numbers&lt;/li&gt;
  &lt;li&gt;Large scale solutions involve load balancers + service discovery&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;overlay&quot;&gt;Overlay&lt;/h2&gt;

&lt;p&gt;The overlay network feature uses VXLAN to create a private network. It
is part of Docker swarm mode. Each group of containers (Pod) has a
dedicated network which is the Kubernetes network model. It does not
require any underlay network modification, i.e. the network that the
hosts are using. The docker swarm integration is very well done and
many of the details are nicely abstracted away.&lt;/p&gt;

&lt;p&gt;Benefits include:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Applications can use standard ports&lt;/li&gt;
  &lt;li&gt;Simplified service discovery can use DNS&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;overlay-example---create-swarm&quot;&gt;Overlay Example - Create Swarm&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;manager&amp;gt; docker swarm init --advertise-addr 128.224.56.106
Swarm initialized: current node (tqxsn8ytpdq8ntd4sswl6qxjo) is now a manager.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To add a worker to this swarm, run the following command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker swarm join \
--token SWMTKN-1-0f49cat8w4xm29qndjza1u294i2 128.224.56.106:2377

worker1&amp;gt; docker swarm join ...
This node joined a swarm as a worker.

manager&amp;gt; docker node ls
ID                           HOSTNAME         STATUS  AVAILABILITY  MANAGER STATUS
qshqkznzaty8ggbyiodzb9jy9    worker2          Ready   Active
r0fj6inhs1tsin07mdsxoaiam    worker1          Ready   Active
tqxsn8ytpdq8ntd4sswl6qxjo *  manager          Ready   Active        Leader
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;swarm-network-and-load-balancing&quot;&gt;Swarm Network and Load Balancing&lt;/h3&gt;

&lt;p&gt;I found this great example from the Nginx example repository:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/docker-swarm-load-balancing.png&quot; alt=&quot;Docker Swarm Load Balancing&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Docker Swarm has built in DNS, scheduling and load balancing! In the
following example, A is Service1 and B is Service2, which is not
externally accessible.&lt;/p&gt;

&lt;h3 id=&quot;overlay-example---create-service&quot;&gt;Overlay Example - Create Service&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; docker network create --driver overlay demo_net
&amp;gt; docker service create --name service1 --replicas=3 \
        --network demo_net -p 8111:80 service1
&amp;gt; docker service create --name service2 --replicas=3 --network demo_net service2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Containers spread across three machines&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; docker service ps service1
5fl2xpzbka28  service1.1  service1  worker1  Running
u3f4bd8q3p6d  service1.2  service1  worker2  Running
i85jdtgtinxr  service1.3  service1  manager  Running
&amp;gt; docker service ps service2
b5bzfdqw10y2  service2.1  service2  worker1  Running
k39m6utcq56o  service2.2  service2  worker2  Running
uaftc3ax0k17  service2.3  service2  manager  Running
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;overlay-example---load-balancing&quot;&gt;Overlay Example - Load Balancing&lt;/h3&gt;

&lt;p&gt;Service 1 contacts Service 2 using internal DNS.
Swarm uses Round Robin DNS lookup by default.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;manager&amp;gt; curl -s http://worker1:8111/service1.php | grep address
service1 address: 10.255.0.9
service2 address: 10.0.0.5
manager&amp;gt; curl -s http://worker2:8111/service1.php | grep address
service1 address: 10.255.0.8
service2 address: 10.0.0.4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;overlay-limitations&quot;&gt;Overlay Limitations&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;VXLAN MTU and UDP complications&lt;/li&gt;
  &lt;li&gt;VXLAN adds latency (10-20%) and reduces throughput (50-75%)&lt;/li&gt;
  &lt;li&gt;Debugging VXLAN problems difficult&lt;/li&gt;
  &lt;li&gt;Docker swarm hides all the setup and routing complexity&lt;/li&gt;
  &lt;li&gt;Some network vendors provide VXLAN integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;macvlan&quot;&gt;Macvlan&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Linux Networking driver feature&lt;/li&gt;
  &lt;li&gt;Low performance overhead&lt;/li&gt;
  &lt;li&gt;MAC and IP per container, similar to VM&lt;/li&gt;
  &lt;li&gt;MacVlan does not use VLANs!&lt;/li&gt;
  &lt;li&gt;Recently moved from docker experimental&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;macvlan-example&quot;&gt;Macvlan Example&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;host1&amp;gt; docker network create --driver macvlan --subnet 128.224.56.0/24 \
    --gateway 128.224.56.1 -o parent=eth0 mv1
host2&amp;gt; docker network create --driver macvlan --subnet 128.224.56.0/24 \
    --gateway 128.224.56.1 -o parent=eth0 mv1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Choose unused IPs&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;host1&amp;gt; docker run -it --rm --net=mv1 --ip=128.224.56.119 alpine /bin/sh
host2&amp;gt; docker run -it --rm --net=mv1 --ip=128.224.56.120 alpine /bin/sh
/ # ping 128.224.56.119
PING 128.224.56.119 (128.224.56.119): 56 data bytes
64 bytes from 128.224.56.119: seq=0 ttl=64 time=0.782 ms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Imagine a /16 subnet where each host has a /24 for container IPs.&lt;/p&gt;
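
&lt;p&gt;Docker can carve per-host ranges out of a shared subnet with
--ip-range. A hypothetical two-host setup, with example addresses:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;host1&amp;gt; docker network create --driver macvlan --subnet 10.0.0.0/16 \
    --ip-range 10.0.1.0/24 --gateway 10.0.0.1 -o parent=eth0 mv16
host2&amp;gt; docker network create --driver macvlan --subnet 10.0.0.0/16 \
    --ip-range 10.0.2.0/24 --gateway 10.0.0.1 -o parent=eth0 mv16
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;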

&lt;h3 id=&quot;macvlan-limitations&quot;&gt;Macvlan Limitations&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Subnet and gateway must match host network&lt;/li&gt;
  &lt;li&gt;Requires new kernels: 4.2+&lt;/li&gt;
  &lt;li&gt;Requires IPAM and network cooperation&lt;/li&gt;
  &lt;li&gt;Isolation requires VLANs and/or firewalls&lt;/li&gt;
  &lt;li&gt;Limited to one broadcast domain&lt;/li&gt;
  &lt;li&gt;Too many MACs can overflow NIC buffer&lt;/li&gt;
  &lt;li&gt;Docker can allocate IPs in a given range&lt;/li&gt;
  &lt;li&gt;IPVLan L2 mode very similar&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;ipvlan-l3-mode&quot;&gt;IPVlan L3 Mode&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Linux Networking driver feature&lt;/li&gt;
  &lt;li&gt;Low performance overhead&lt;/li&gt;
  &lt;li&gt;Multicast and broadcast traffic silently dropped&lt;/li&gt;
  &lt;li&gt;Mimics Internet architecture of aggregated L3 domains&lt;/li&gt;
  &lt;li&gt;Scales well due to no broadcast domain&lt;/li&gt;
  &lt;li&gt;Docker experimental as of 1.13&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;ipvlan-example&quot;&gt;IPVlan Example&lt;/h3&gt;

&lt;p&gt;Create network - requires dockerd run with --experimental&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;host1&amp;gt; docker network create --driver ipvlan --subnet 192.168.120.0/24 \
    -o parent=eth0 -o ipvlan_mode=l3 iv1
host2&amp;gt; docker network create --driver ipvlan --subnet 192.168.121.0/24 \
    -o parent=eth0 -o ipvlan_mode=l3 iv1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Setup routes: host1=128.224.56.106, host2=128.224.56.107&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;host1&amp;gt; ip route add 192.168.121.0/24 via 128.224.56.107
host2&amp;gt; ip route add 192.168.120.0/24 via 128.224.56.106
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create containers&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;host1&amp;gt; docker run -it --rm --net=iv1 --ip=192.168.120.10 alpine /bin/sh
host2&amp;gt; docker run -it --rm --net=iv1 --ip=192.168.121.10 alpine /bin/sh
/ # ping 192.168.120.10
PING 192.168.120.10 (192.168.120.10): 56 data bytes
64 bytes from 192.168.120.10: seq=0 ttl=64 time=0.408 ms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;ipvlan-limitations&quot;&gt;IPVLan Limitations&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Currently experimental&lt;/li&gt;
  &lt;li&gt;Requires new kernels: 4.2+&lt;/li&gt;
  &lt;li&gt;Isolation requires VLANs and/or iptables&lt;/li&gt;
  &lt;li&gt;Manage routes using BGP with Calico, Cumulus, etc.&lt;/li&gt;
  &lt;li&gt;Container networking becomes a routing problem, which is a well
understood problem&lt;/li&gt;
  &lt;li&gt;Policies using BPF on veth and Cilium&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Docker Multi-Host Networking is complicated!&lt;/li&gt;
  &lt;li&gt;Performance and Scale dictate solution&lt;/li&gt;
  &lt;li&gt;Balance between simplifying applications and infrastructure&lt;/li&gt;
&lt;/ul&gt;

</content>
 </entry>
 
 <entry>
   <title>Book Review: Sapiens</title>
   <link href="https://kscherer.github.io//review/2017/05/01/book-review-sapiens"/>
   <updated>2017-05-01T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//review/2017/05/01/book-review-sapiens</id>
   <content type="html">&lt;p&gt;“Sapiens: A Brief History of Humankind” by Yuval Noah Harari&lt;/p&gt;

&lt;p&gt;It is hard to do such a dense and well written book justice in a short
blog post. I really enjoyed the content and writing style.&lt;/p&gt;

&lt;p&gt;Much of the content overlaps with books like “Guns, Germs and Steel” and
“The Third Chimpanzee” by Jared Diamond. The mass extinctions and
genocides directly attributed to our ancestors are covered. The book
is very careful to draw clear boundaries around the limits of our
historical knowledge.&lt;/p&gt;

&lt;p&gt;The first concept that really got me thinking was Culture as shared
myth or fiction. Getting large groups of humans to live together
requires mechanisms to limit anti-social behaviour, but violence and
surveillance do not scale well. Shared fictions like the hierarchy of
royalty over common people can be much more effective at regulating
behavior. The clearest example from the book is the concept of a
corporation. It exists only because people accept that it exists. It
does not exist because a few people scribbled on some paper, although
the ritual can be important. A corporation is technically just a group
of people. What binds them together is an imagined construct of
hierarchy, rules, values and an identity which is accepted as real by
potentially millions of people.&lt;/p&gt;

&lt;p&gt;One of my favorite lines of the book:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Yet it is an iron rule of history that every imagined hierarchy
disavows its fictional origins and claims to be natural and
inevitable.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Every culture from the Greeks to modern democracy to Communist Russia
made the same claim of being natural and inevitable. The book even
takes on imagined hierarchies like race and gender and dismantles
the arguments of their proponents. Money is another convenient shared fiction that many
people claim as inevitable. It also makes the excellent point that our
current society places rich above poor and this is no more natural
than placing men above women or whites above blacks. It makes the
current discussions of wealth inequality even more urgent.&lt;/p&gt;

&lt;p&gt;There is so much thought provoking material in this book I cannot
cover it all. The last section talks about the future of our species:
changing our genetics, becoming cyborgs and creating an intelligence
more capable than our own. Each of these paths has mind boggling
possibilities. The final line of the book sums it up very well:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Since we might soon be able to engineer our desires too, the real
question facing us is not &quot;What do we want to become?&quot; but &quot;What
do we want to want?&quot; Those who are not spooked by this question
probably haven't given it enough thought.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is a book that changed the way I see and think about the
world. That is highest praise for a book that I can think of.&lt;/p&gt;

&lt;p&gt;Rating: Highly recommended&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Book Review: Ego is the Enemy</title>
   <link href="https://kscherer.github.io//review/2017/04/20/book-review-ego-is-the-enemy"/>
   <updated>2017-04-20T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//review/2017/04/20/book-review-ego-is-the-enemy</id>
   <content type="html">&lt;p&gt;“Ego is the Enemy” by Ryan Holiday&lt;/p&gt;

&lt;p&gt;The central message of this book is not new. Ego has always a been a
double edged sword. It motivates and energizes, but it also undermines
us in many ways. Fundamentally all worthwhile progress involves more
than one person and ego undermines human relationships. This book was
a fantastic reminder of all the ways ego can undermine our
relationships with other people and progress on our goals. The most
enjoyable part was all the examples of famous and less famous people
and how they succeeded by controlling ego or failed due to their ego.&lt;/p&gt;

&lt;p&gt;The single line that resonated with me the most was “We choose to
be or to do”. We either choose to expend our energy projecting an
image of who we want to be or expend our energy doing the work. Do the
work because it is important, not because we expect to be rewarded or
acknowledged.&lt;/p&gt;

&lt;p&gt;Thoughts provoked by this book:&lt;/p&gt;

&lt;p&gt;The meaning of work is often a matter of perspective. A piece of code
can be both “just a hack” and a valuable contribution to the world of
open source software at the same time. Still, I don’t want
to make a small contribution seem more important than it really is.&lt;/p&gt;

&lt;p&gt;Sometimes the work feels meaningful and sometimes I have to remind
myself to change my perspective. Sometimes I think a different job
would be more meaningful, but that ignores all the drudgery that is
part of any job. I feel most motivated when I feel part of something
much bigger than myself. For me it has always been the mythical
“community” of open source software. Time to buckle down and do the
work.&lt;/p&gt;

&lt;p&gt;Rating: Recommended&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>PXE install on UEFI using Foreman and GRUB2</title>
   <link href="https://kscherer.github.io//linux/2017/03/20/pxe-install-on-uefi-using-foreman-and-grub2"/>
   <updated>2017-03-20T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//linux/2017/03/20/pxe-install-on-uefi-using-foreman-and-grub2</id>
   <content type="html">&lt;p&gt;Most of the bare metal hardware that I manage now supports or defaults
to UEFI. Many have the option to use “Legacy BIOS” mode, but the main
feature I require from UEFI is support for boot volumes larger
than 2TB. I prefer one single RAID0 volume for all the builders for
operational simplicity.&lt;/p&gt;

&lt;h3 id=&quot;foreman&quot;&gt;Foreman&lt;/h3&gt;

&lt;p&gt;My preferred solution for installing the base OS on the hardware
is &lt;a href=&quot;https://theforeman.org/&quot;&gt;Foreman&lt;/a&gt;. It makes automated installs very simple and
reproducible but has only recently supported UEFI and PXE. I will
describe my previous attempts to get this working and how I was able
to get it working with Foreman 1.14.2.&lt;/p&gt;

&lt;h3 id=&quot;pxelinux-and-uefi&quot;&gt;Pxelinux and UEFI&lt;/h3&gt;

&lt;p&gt;Pxelinux is part of the &lt;a href=&quot;http://www.syslinux.org/wiki/index.php?title=The_Syslinux_Project&quot;&gt;syslinux&lt;/a&gt; project and provides many
different types of bootloaders. Pxelinux depends on a custom ROM
inside the network card to run DHCP and download kernel+initrd using
TFTP. It also has support for displaying interactive menus to the
user.&lt;/p&gt;

&lt;p&gt;UEFI contains all this functionality, but its designers unfortunately
did not preserve backwards compatibility. All boot time
programs like grub2 and pxelinux required significant rework. I was
able to use the syslinux git tree and compile a working EFI version of
pxelinux that was able to boot the 14.04 Ubuntu installer. But there
were limitations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Foreman only supported non-efi Pxelinux and I had to manually swap
the binaries on the TFTP server&lt;/li&gt;
  &lt;li&gt;The menu system didn’t work so I could not use the Foreman feature
of leaving the system to boot PXE by default and booting the local
hard drive if rebuild was not enabled for that host in Foreman.&lt;/li&gt;
  &lt;li&gt;I could not get this pxelinux to work with 16.04 installer. The
initrd would be downloaded and would hang and trigger a system reset.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;uefi-and-grub2&quot;&gt;UEFI and GRUB2&lt;/h3&gt;

&lt;p&gt;Foreman 1.13 added support for GRUB2 and UEFI, but my initial attempts
failed. When I changed the boot template from PXELinux to PXEGRUB2 the
update of the DHCP server would fail. The DHCP entry was added
properly to the DHCP server using the Foreman Proxy, but it would
cause a traceback on the server and prevent the Host change from being
saved. This bug was fixed in 1.14 and I was finally able to get this
working. There was one more bug in the PXEGRUB2 boot template
involving an assumption about Profiles. I opened an issue and have
submitted a PR to the community templates for this.&lt;/p&gt;

&lt;p&gt;Foreman 1.14.2 was also missing the Preseed default PXEGrub2 template,
but one had already been submitted to the community templates repo, so
I had to manually add &lt;a href=&quot;https://github.com/theforeman/community-templates/blob/develop/provisioning_templates/PXEGrub2/preseed_default_pxegrub2.erb&quot;&gt;this template&lt;/a&gt; to my provisioning templates.&lt;/p&gt;

&lt;h3 id=&quot;tftp-preparation&quot;&gt;TFTP preparation&lt;/h3&gt;

&lt;p&gt;Foreman adds a DHCP record which contains the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;server.filename = &quot;grub2/grubx64.efi&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First step was to find the proper grub2 binary. Fortunately the Ubuntu
wiki had a helpful post covering &lt;a href=&quot;https://wiki.ubuntu.com/UEFI/PXE-netboot-install&quot;&gt;UEFI PXE netboot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was able to find the xenial grubnetx64.efi &lt;a href=&quot;http://archive.ubuntu.com/ubuntu/dists/xenial/main/uefi/grub2-amd64/current/grubnetx64.efi.signed&quot;&gt;here&lt;/a&gt;. But it turns
out the Debian/Ubuntu grub2 is missing a few useful features that have
been added to the Fedora grub2. The Ubuntu/vanilla grub2 only looks
for grub/grub.conf whereas the Fedora grub2 has patches to search the
grub2 directory and search for grub.cfg-[mac address] which is a
convention that Foreman expects. Since Foreman is a project mostly run
by RedHat employees, this makes sense. The Fedora prebuilt grub2
bootloader is &lt;a href=&quot;https://download-ib01.fedoraproject.org/pub/fedora/linux/releases/25/Server/x86_64/os/EFI/BOOT/grubx64.efi&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There is a PR which adds a default grub/grub.cfg and uses the grub2
regexp feature to search for $prefix/grub.cfg-[mac address]. This
means that Foreman will support vanilla Grub2 soon.&lt;/p&gt;

&lt;p&gt;Foreman will also place the correct kernel and initrd into the boot
directory. It will not replace an older kernel, so sometimes a newer
kernel and initrd need to be downloaded from &lt;a href=&quot;http://archive.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/&quot;&gt;here&lt;/a&gt; and manually
added to the boot directory.&lt;/p&gt;
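
&lt;p&gt;For example, something like the following, where the TFTP root path
is illustrative and the URL is the netboot directory linked above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; cd /var/lib/tftpboot/boot
&amp;gt; wget http://archive.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/linux
&amp;gt; wget http://archive.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/initrd.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;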

&lt;h3 id=&quot;how-does-it-work&quot;&gt;How does it work?&lt;/h3&gt;

&lt;p&gt;Here is how this works:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Put the host in build mode. This sets up the grub2/grub.cfg-[mac address] file with the automated build setup. It also adds a DHCP
entry specifying to download the &quot;grub2/grubx64.efi&quot; file.&lt;/li&gt;
  &lt;li&gt;Start PXE boot and UEFI retrieves IP, filename and next-server/TFTP
from DHCP server&lt;/li&gt;
  &lt;li&gt;UEFI downloads grub2/grubx64.efi from TFTP&lt;/li&gt;
  &lt;li&gt;GRUB2 looks for grub2/grub.cfg-[mac address]&lt;/li&gt;
  &lt;li&gt;The Grub2 template contains the automated install configuration
generated by Foreman (a sketch follows this list)&lt;/li&gt;
  &lt;li&gt;GRUB2 downloads kernel and initrd and boots the kernel and starts
the installer&lt;/li&gt;
  &lt;li&gt;After the install is complete, the boot template is changed back
to chainload the local disk&lt;/li&gt;
&lt;/ol&gt;
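
&lt;p&gt;To make step 5 concrete, here is a hypothetical sketch of the
rendered file. The file name, kernel paths and preseed URL are all
illustrative:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# grub2/grub.cfg-01-aa-bb-cc-dd-ee-ff (MAC address suffix is an example)
set default=0
set timeout=10
menuentry 'Ubuntu 16.04 automated install' {
  linux boot/ubuntu-xenial-amd64-linux auto=true priority=critical \
      url=http://foreman.example.com/unattended/provision
  initrd boot/ubuntu-xenial-amd64-initrd.gz
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;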

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;The deficiencies of the previous process have been addressed. GRUB2
can boot the 16.04 kernels and even the hwe kernels and installer if I
want to. The menus and boot to local disk are working.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Book Review: Rebirth</title>
   <link href="https://kscherer.github.io//review/2017/02/18/book-review-rebirth"/>
   <updated>2017-02-18T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//review/2017/02/18/book-review-rebirth</id>
   <content type="html">&lt;p&gt;This book is a fictional/auto biographical account of one mans journey
on the Camino pilgrimage trail in Spain. I really enjoyed it. The
characters are quirky and very human with baggage and beautiful
experiences. The dialogue is a little too perfect, but it made me
consider doing a long walk like this.&lt;/p&gt;

&lt;p&gt;Some of my favorite quotes from the book:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Questions that help guide the way, like the yellow arrows on the
Camino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Too often I hear about guiding values and statements, but I really
like the idea of guiding questions. Tim Ferris and his podcast guests
often talk about questions that guided their decisions.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;If I loved myself, what would I do?&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I find this a tough question because it feels selfish. Finding a
balance between selfishness and selflessness never ends. I wish there
was a single answer, but I know that isn’t possible.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;Don't ask why, ask 'Now what'? People have made it through horrific
times not by focusing on why but moving on and asking 'Now What?'&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Trying to understand is important, but sometimes the energy is better
spent on getting ready for the future.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;It is not the wound that makes you special, it is the light that
shines through it&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A great reminder that the hardships of life define you as much as the
successes. I have always marvelled at artists that were able to
transform immense pain into incredible music and art.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;Perfect is no unnecessary pain. I wish you a perfect Camino.&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately sometimes pain is necessary. Pain is such a
multifaceted concept and hard to talk about. Maybe I will find someone
who can do it more eloquently than I can.&lt;/p&gt;

&lt;p&gt;Rating: Recommended&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Puppet Infrastructure Overhaul</title>
   <link href="https://kscherer.github.io//puppet/2016/12/22/puppet-infrastructure-overhaul"/>
   <updated>2016-12-22T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//puppet/2016/12/22/puppet-infrastructure-overhaul</id>
   <content type="html">&lt;p&gt;I have been planning to upgrade my infrastructure to Puppet 4 but
other priorities have delayed it. I was finally able to find a way to
start the upgrade work. There are many new pieces of technology
available which I hope will make things work even better than before.&lt;/p&gt;

&lt;h3 id=&quot;puppet-4&quot;&gt;Puppet 4&lt;/h3&gt;

&lt;p&gt;Since Puppet 3 reaches End Of Life at the end of 2016, this upgrade is
probably the most urgent. I am looking forward to being able to use
the improved Puppet language and r10k. The Puppet Server is supposed
to be much faster, and the AIO packages should be easier to install and
support.&lt;/p&gt;

&lt;h3 id=&quot;mcollective-choria&quot;&gt;MCollective Choria&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.devco.net/&quot;&gt;R.I.Pienaar&lt;/a&gt; has been busy and built a new mcollective deployment package
called &lt;a href=&quot;http://choria.io/&quot;&gt;Choria&lt;/a&gt;. It has puppet modules which automatically enable
SSL everywhere, an audit plugin, a packager for plugins, and it uses
NATS instead of ActiveMQ. My federated cluster with three ActiveMQ
servers has been stable, but it was a pain to set up and
upgrade. It is also managed using a custom puppet module which I do
not want to maintain. I am also hoping to be able to use NATS as a
message bus for some application orchestration.&lt;/p&gt;

&lt;h3 id=&quot;gitolite&quot;&gt;Gitolite&lt;/h3&gt;

&lt;p&gt;I maintain a large internal network of git servers. The base
configuration is very open: anyone with a valid ssh login using NIS
can create or push to repositories, and every repository is available for
unauthenticated read-only access. We have a few post-receive hooks to
limit who can push to which repositories, but our developers respect
our gatekeeper model and do not push to repositories they aren’t
supposed to. The open access model has allowed people to do emergency
fixes when necessary. But there have occasionally been requests for
some sort of access control, and I have also considered locking down
the repository with the Puppet modules because it is so critical to
the business, so I decided to experiment with &lt;a href=&quot;http://gitolite.com/&quot;&gt;Gitolite&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;r10k&quot;&gt;R10K&lt;/h3&gt;

&lt;p&gt;I have been using &lt;a href=&quot;http://librarian-puppet.com/&quot;&gt;librarian-puppet&lt;/a&gt; with a custom git
synchronization program which relies on the ActiveMQ network. I have 3
puppet masters, and the post-receive hook uses STOMP to broadcast
changes. The &lt;a href=&quot;https://github.com/kscherer/git-stomp-hooks&quot;&gt;git-stomp-hook&lt;/a&gt; receives the broadcast and calls
librarian-puppet as appropriate. This has worked well except when
the ActiveMQ network was having problems. So I was happy to notice
that the &lt;a href=&quot;https://github.com/voxpupuli/puppet-r10k&quot;&gt;puppet-r10k&lt;/a&gt; module contains a webhook program that can
be used to trigger r10k deploys on the puppet masters. Since r10k was
integrated into &lt;a href=&quot;https://puppet.com/product&quot;&gt;Puppet Enterprise&lt;/a&gt;, I decided to move away from
librarian-puppet. R10k actually works very similarly to the solution I
had cobbled together; it just ignores module dependencies. This is
both a blessing and a curse, but because Puppet does not support
conditional dependencies, it may be better long term to manage
dependencies manually.&lt;/p&gt;
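
&lt;p&gt;For illustration, r10k drives everything from a Puppetfile in the
control repo; a minimal sketch (module names and versions are examples,
not my actual list):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Puppetfile sketch: r10k installs exactly what is listed, with no
# dependency resolution
forge 'https://forgeapi.puppetlabs.com'

mod 'puppetlabs/stdlib', '4.13.1'
mod 'puppet/r10k', '4.2.0'

# modules can also come from internal git mirrors
mod 'profiles',
  :git =&amp;gt; 'git@gitserver:puppet/profiles.git',
  :ref =&amp;gt; 'production'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;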

&lt;h3 id=&quot;bootstrapping-a-puppet-server&quot;&gt;Bootstrapping a Puppet Server&lt;/h3&gt;

&lt;p&gt;I manage my Puppet 3 server using Puppet, and the bootstrap process is
tricky: given a machine with just the puppet agent, how do you get the
Puppet Server + Hiera + R10K and my control repo installed in a
reproducible way? I started with the Puppetlabs &lt;a href=&quot;https://github.com/puppetlabs/control-repo.git&quot;&gt;control-repo&lt;/a&gt;
skeleton, which gave me the basics but no bootstrap. I looked through
a lot of repos and finally found the &lt;a href=&quot;https://github.com/puppetinabox/controlrepo&quot;&gt;puppetinabox control-repo&lt;/a&gt;
by &lt;a href=&quot;https://rnelson0.com/&quot;&gt;rnelson0&lt;/a&gt;. This repo uses a script to install the bootstrap
modules locally and puppet apply with some simple puppet manifests to
do the bootstrap. I decided to use this approach as well.&lt;/p&gt;

&lt;h3 id=&quot;a-puppet-module-to-manage-puppet&quot;&gt;A Puppet module to manage Puppet&lt;/h3&gt;

&lt;p&gt;The next step was to choose a module to manage the Puppet server. I
reviewed many, but most had crazy dependencies or didn’t support the
way I wanted to configure my systems. I ended up using
the &lt;a href=&quot;https://github.com/theforeman/puppet-puppet&quot;&gt;puppet-puppet&lt;/a&gt; module maintained by the &lt;a href=&quot;https://theforeman.org/&quot;&gt;Foreman&lt;/a&gt;
team. It is a big module, but it supports:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Puppet agent run using cron&lt;/li&gt;
  &lt;li&gt;Puppet server setup on Ubuntu 16.04&lt;/li&gt;
  &lt;li&gt;Compatible with Puppetdb and r10k&lt;/li&gt;
  &lt;li&gt;Foreman integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had added Foreman integration to my Puppet 3 module, so having it
built in was interesting to me.&lt;/p&gt;

&lt;h3 id=&quot;scripting-the-bootstrap&quot;&gt;Scripting the bootstrap&lt;/h3&gt;

&lt;p&gt;The bootstrap script does the following (a condensed sketch follows the list):&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Make sure git is installed&lt;/li&gt;
  &lt;li&gt;Clone all the required modules into a bootstrap directory. I make
internal git mirrors of all the puppet modules I use.&lt;/li&gt;
  &lt;li&gt;Run puppet apply using 3 manifests to install puppet server, hiera
and r10k.&lt;/li&gt;
  &lt;li&gt;Run r10k deploy to generate the local production environment.&lt;/li&gt;
&lt;/ol&gt;
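
&lt;p&gt;A condensed sketch of the script (paths and module names are
illustrative, not my exact setup):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/bin/bash
# bootstrap.sh sketch
set -e

# 1. make sure git is installed
command -v git &amp;gt;/dev/null || sudo apt-get install -y git

# 2. clone the required modules from internal mirrors
for mod in puppet hiera r10k; do
    git clone &quot;git://gitmirror/puppet-${mod}.git&quot; &quot;bootstrap/${mod}&quot;
done

# 3. apply the bootstrap manifests with the local modules
for manifest in puppetserver.pp hiera.pp r10k.pp; do
    sudo puppet apply --modulepath=bootstrap &quot;manifests/${manifest}&quot;
done

# 4. generate the local production environment
sudo r10k deploy environment production -pv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;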

&lt;p&gt;I now had a server setup and could start creating roles and profiles
to manage the server.&lt;/p&gt;

&lt;h3 id=&quot;r10k-webhook&quot;&gt;R10K Webhook&lt;/h3&gt;

&lt;p&gt;Redundancy in infrastructure is good, and having two ways to synchronize
the environments on the masters is also a good idea. I could use
mcollective, but that hasn’t been set up yet. The &lt;a href=&quot;https://github.com/voxpupuli/puppet-r10k&quot;&gt;puppet-r10k&lt;/a&gt;
module comes with a webhook. This webhook is a small ruby sinatra
application that listens for http connections and triggers r10k
commands as appropriate. It supports GitHub, GitLab, Bitbucket,
etc., but I don’t need those. Since I am using a local gitolite server,
I created a git post-receive hook that calls curl with the updated
branch:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl -d &quot;{ \&quot;ref\&quot;: \&quot;$REFNAME\&quot; }&quot; -H &quot;Accept: application/json&quot; \
 &quot;https://puppet:puppet@$HOST:8088/payload&quot; -k -q
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;By default the webhook and r10k run as root, which is something I try
to avoid. I was able to change the user for the webhook to puppet,
chown all the r10k cache and environment dirs to the puppet user, and
everything works. It also uses the SSL certs signed by the Puppet
CA to encrypt the communication.&lt;/p&gt;

&lt;h3 id=&quot;bash-post-receive-hook-and-subshells&quot;&gt;Bash post receive hook and subshells&lt;/h3&gt;

&lt;p&gt;The only problem with this approach is that the user must wait, when
running git push, for the script to complete. I was able to run the
curl command in a subshell and have the post-receive script exit
quickly.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;( trigger_webhook &quot;$refname&quot; &amp;lt;hostname&amp;gt; ) &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This code will run to completion even after the parent shell has
exited. The logs of the synchronization are stored on the puppet
master. They could be stored on the git server as well, but that isn’t
necessary.&lt;/p&gt;
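
&lt;p&gt;Putting the two pieces together, the post-receive hook ends up along
these lines (a sketch; the host name and log path are placeholders):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/bin/bash
# post-receive sketch: fast push, asynchronous r10k deploy

trigger_webhook() {
    local refname=$1 host=$2
    curl -d &quot;{ \&quot;ref\&quot;: \&quot;${refname}\&quot; }&quot; -H &quot;Accept: application/json&quot; \
         &quot;https://puppet:puppet@${host}:8088/payload&quot; -k -q \
         &amp;gt;&amp;gt; /var/log/r10k-webhook.log 2&amp;gt;&amp;amp;1
}

# git feeds &quot;oldrev newrev refname&quot; on stdin for each updated ref
while read -r oldrev newrev refname; do
    # detach so 'git push' returns immediately
    ( trigger_webhook &quot;$refname&quot; puppetmaster.example.com ) &amp;amp;
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;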

&lt;h3 id=&quot;next-steps&quot;&gt;Next steps&lt;/h3&gt;

&lt;p&gt;Install MCollective Choria and start porting the base configuration
with ntp, ssh keys, package management, etc. to Puppet 4.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Book Review: 'The War of Art' by Steven Pressfield</title>
   <link href="https://kscherer.github.io//review/2016/10/24/book-review-the-war-of-art-by-steven-pressfield"/>
   <updated>2016-10-24T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//review/2016/10/24/book-review-the-war-of-art-by-steven-pressfield</id>
   <content type="html">&lt;p&gt;I found the format strange for a book because it felt like a
collection of short blog posts. The writing was very repetitive and
with all the other books I have read by authors like Seth Godin, the
message didn’t feel new or motivating. I kept reading hoping for some
insight or inspiring wording but I finished the book feeling
disappointed. Almost every day the daily blog post by Seth Godin
resonates with me and/or inspires me, but this book did not.&lt;/p&gt;

&lt;p&gt;Rating: Not recommended&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Book Review: 'The Time Paradox' by Philip Zimbardo and John Boyd</title>
   <link href="https://kscherer.github.io//review/2016/10/24/book-review-the-time-paradox-by-philip-zimbardo-and-john-boyd"/>
   <updated>2016-10-24T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//review/2016/10/24/book-review-the-time-paradox-by-philip-zimbardo-and-john-boyd</id>
   <content type="html">&lt;p&gt;“The Time Paradox” was another highly rated book by Deric Sivers. The
central message of this book also falls into that category of “obvious
now except it wasn’t before”. The premise is very ambitious because it
attempts to universally categorize people and their behaviors across
six dimensions. The usual categories of race, religion, age, gender,
education, etc. are flawed, but this book identifies time as the
universal experience of all humans. Every human has a unique
perspective on the past, present and future we all inhabit. The book
breaks the time perspectives into past positive, past negative,
present-hedonistic, present-fatalistic, future and transcendental future.&lt;/p&gt;

&lt;p&gt;Past Positive: Views past events in a positive way. Finds the good in
the way things happened.&lt;/p&gt;

&lt;p&gt;Past Negative: Views past events in a negative way. Finds the bad in
the way things happened.&lt;/p&gt;

&lt;p&gt;Present Hedonistic: In the moment and focused on the pleasures of the
now.&lt;/p&gt;

&lt;p&gt;Present Fatalistic: In the moment but discounting or ignoring the
risks of present actions.&lt;/p&gt;

&lt;p&gt;Future: Planning for things that will happen to you later than
now. Delaying gratification and waiting for the larger reward that
will come later.&lt;/p&gt;

&lt;p&gt;Transcendental Future: Planning for things that will happen after the
life of that person has ended. This can be spiritual concepts like the
afterlife, or non-spiritual concepts like the 10,000 year “Long Now”
Foundation or the Native American tribes considering the seventh
generation.&lt;/p&gt;

&lt;p&gt;Once these dimensions are defined, the book presents a fictional
dialog between six people, each of whom represents one time
dimension. The conversation is a little too scripted, but I recognized
people I know and their behaviors in each of the characters. Of course
no real person lives in only one time dimension. We all shift through
the different dimensions to different degrees at different times in
our lives. For example, children are very present oriented, while our
society rewards and encourages a future orientation.&lt;/p&gt;

&lt;p&gt;The authors then provide the full test that they created to determine
time orientation in their studies. I ended up skipping this section
because I was convinced I already knew that I was too future
oriented. I intend to go back and take the full test and see if there
are any surprises.&lt;/p&gt;

&lt;p&gt;The last part of the book explores the dimensions and suggests ways to
deal with and minimize the less desirable orientations like
past-negative and present-fatalistic. It also suggests ways to balance
the positive dimensions like past-positive, present and future with
specific suggestions to help future oriented people live in the moment
and help everyone find ways to reframe negative past events in
positive ways.&lt;/p&gt;

&lt;p&gt;The tricky one is the transcendental future orientation. Taking a
perspective that extends past one’s own life can be very noble, but it
can also lead to a state where choosing to be a suicide bomber is a
rational option.&lt;/p&gt;

&lt;p&gt;As stated before, I recognize that I am too future oriented and have
been exploring ways to focus on the present moment more. I am
experimenting with meditation and am trying to find more time for
activities that encourage present awareness: massage, dancing,
music, exercise, cooking and just being silly. I also recognize that
my photography hobby is a great way to encourage a past positive
orientation. The goal is to find a balance because we need all three
dimensions to be happy and feel fulfilled.&lt;/p&gt;

&lt;p&gt;Rating: Highly recommended&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Book Review: 'So Good They Cannot Ignore You' by Cal Newport</title>
   <link href="https://kscherer.github.io//review/2016/10/24/book-review-so-good-they-cannot-ignore-you-by-cal-newport"/>
   <updated>2016-10-24T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//review/2016/10/24/book-review-so-good-they-cannot-ignore-you-by-cal-newport</id>
   <content type="html">&lt;p&gt;I read regularly thanks to our local Public Library. Recently the Tim
Ferris podcast has expanded my reading list with lots of interesting
books. Luckily the local library has most of these books and hold
waiting lists tend to space the reading out well.&lt;/p&gt;

&lt;p&gt;One of the first Tim Ferris podcasts I listened to was with Derek
Sivers, and he mentioned that he maintains a list of book reviews with
ratings on his website. I immediately went to the website, looked
through the highest rated books and set up holds at the library.&lt;/p&gt;

&lt;p&gt;The first book I read was “So good they cannot ignore you” by Cal
Newport. This was a short read and the simplicity of its message
resonated with me. The basic message is that our society rewards
people with rare and valuable skills, not people with passion. Many
sources of career advice talk about “following passion”, but passion
without skills is not sufficient. If passion drives the building of
valuable skills then it is helpful. The media often portrays having
passion as the most important requirement to getting a great job but
that is harmful because it often leads to confusion and inflated
expectations. Cal calls these valuable skills “career capital” and I
really like that perspective. Career capital is skills and experience
that can be exchanged for career opportunities.&lt;/p&gt;

&lt;p&gt;The best books motivate you to make changes in your life. This book
helped me reflect on the career capital that would help me advance my
career. For a software developer that would be submitting patches to
open source projects and writing technical blog posts. I have decided
to commit 10% of my work time to doing this. I haven’t been able to
implement it fully, but I am making more of an effort. I have reported
bugs and submitted some small patches. Ideally this will grow to more
and larger patches.&lt;/p&gt;

&lt;p&gt;Rating: Highly recommended.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Book Review: 'Money: Master the Game' by Tony Robbins</title>
   <link href="https://kscherer.github.io//review/2016/10/24/book-review-money-master-the-game-by-tony-robbins"/>
   <updated>2016-10-24T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//review/2016/10/24/book-review-money-master-the-game-by-tony-robbins</id>
   <content type="html">&lt;p&gt;After listening to a Tim Ferris podcast with Tony Robbins, I decided
to read his latest book “Money: Master the Game”. I was skeptical of
Tony Robbins and his style of motivation speaking. The book has a very
informal speaking style with lots of bold text where you feel Tony is
waving his hands frantically. But the content of the book is superb if
a little long winded. I have read a few books about finance and
investing, but this was the first to take a full life time look at
saving, investing and retiring.&lt;/p&gt;

&lt;p&gt;The first part is about saving regularly, starting early and avoiding
excessive fees which are usually hidden. I feel my family is doing
well here, but making saving automatic is a good reminder that saving
is against our basic nature. Money finds ways to get spent.&lt;/p&gt;

&lt;p&gt;The next part made the case that we probably require less money
than we think to sustain the retirement lifestyle we think we
want. There were three levels of lifestyle, and each had a worksheet
that required estimates of how much we currently spend. I should have
filled out these worksheets, but I did not because I have this fantasy
of managing our finances with something like hledger, which will
provide these answers. When I imagine my retirement, it isn’t filled
with high expense activities like travelling the world on a yacht or
having a private jet. Imagining retirement is something I want to do
more of with my family. It will make this kind of planning easier.&lt;/p&gt;

&lt;p&gt;The part about investing had some real gems. I was aware of the
importance of asset allocation and having investments that are not
correlated, but the Ray Dalio “all weather” portfolio was
fascinating. It was the first time I had heard of a portfolio that had
so little downside risk with such substantial upside. I always assumed
that any investment with high return required accepting extra
risk. This is an investment strategy that outperformed the “market” or
S&amp;amp;P 500 over decades with almost no loss in capital (maximum loss was
less than 4%). The “secret” is a large allocation of long term bonds
with a small allocation in gold and commodities. The logic is that the
economy has four seasons, which are the combinations of growth and
inflation. Having assets that do well in each “season” has finally
shown me what proper diversification looks like. Since stocks and
bonds are correlated, the classic advice of stock and bond
diversification is problematic. Right now the world economy is in a
period of low inflation and low growth. When it switches (and it will)
to higher inflation and “negative” growth (what a silly term), the
standard advice for asset allocation will cause big problems.&lt;/p&gt;

&lt;p&gt;For me the action item from this has been to look very carefully at
the current asset allocation of my portfolio. Right now I am following
a very contrarian style, and my largest holdings are real return bonds,
short US equities and long commodities like gold and energy. But this
is very short term focused, and hopefully a longer focus will expose me
to less risk and volatility.&lt;/p&gt;

&lt;p&gt;The next part focused on what Tony referred to as the “back of investment
mountain”. I have spent a lot of time thinking about how to invest,
but not what to do with that investment. I assumed I would retire at
some point and spend the rest of my retirement managing my pile of
investments. The book again showed me options that I was not aware
of. Apparently there are “hybrid” annuities that provide payments for
life while growing with the equity market but with full capital
preservation! Frankly it sounds too good to be true, but I have made a
note to investigate this further. The possibility of “getting out of
the game” by having an income without having to worry about it is very
appealing. I am skeptical because I don’t understand why an insurance
company would take on this much long term risk. At the least, I think the
premium must be very high to offset the risk, but Tony insists this
plan is now available to all US citizens, and I intend to see if I can
find something similar in Canada.&lt;/p&gt;

&lt;p&gt;There is a section with interviews of some of the greatest investors
ever, like Charles Schwab, Ray Dalio, Warren Buffett, etc. This part was
nice, but did not contain helpful specific advice that wasn’t
mentioned in other parts of the book. There was one brief mention of
technical trading, which I found strange because it actually goes
against most of the advice in the book. Technical trading assumes that
past stock price behavior can be used to predict the future stock
price. It is true that some people have become very rich that way, but
to me it is too much like gambling, without any acknowledgment that the
stock represents a company or group of companies with assets and
revenue and people. On the other hand, technical trading increases
volatility, which can be useful to contrarian investors like myself.&lt;/p&gt;

&lt;p&gt;The last chapter is about the power of giving. I am very motivated to
give my time and energy to my family and friends, but the giving of
money is complicated. All things being equal, I would like any
donations to do the most good possible. Even defining what I mean by
good is difficult: less suffering? more education? more opportunity?
more equality? less disease? Maybe whatever makes me feel the happiest
is the simplest approach, and I need to accept that it will probably
not be the most efficient. If my money isn’t making me happy, why even
bother working so hard to accumulate it in the first place? My action
item is to manage our finances better and look for ways to give more
money in ways that will make me and my family happier.&lt;/p&gt;

&lt;p&gt;I learned a lot from this book. It has also changed my opinion of Tony
Robbins. The book was a real gift to me and I am planning my future
differently because of it.&lt;/p&gt;

&lt;p&gt;Rating: Highly Recommended&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Running mesos agents over an unreliable network</title>
   <link href="https://kscherer.github.io//mesos/2016/07/12/running-mesos-agents-over-unreliable-network"/>
   <updated>2016-07-12T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//mesos/2016/07/12/running-mesos-agents-over-unreliable-network</id>
   <content type="html">&lt;p&gt;I have mesos agents located in three datacenters with a usually
reliable WAN connection. Occasionally though all the running tasks in
a DC get killed and it gets traced back to a WAN connection
interruption.&lt;/p&gt;

&lt;p&gt;This hasn’t been a big problem until recently, when a failover link
had high enough latency that the agents would disconnect and kill all
running tasks approximately every half hour for about 12 hours. I tried to
figure out which configuration options needed to be tweaked for the
master and agents to wait longer before killing tasks, and this is what
I came up with.&lt;/p&gt;

&lt;p&gt;Current setup:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Three DCs: DC1 (central), DC2 and DC3&lt;/li&gt;
  &lt;li&gt;Mesos 0.27.2 with a custom python scheduler&lt;/li&gt;
  &lt;li&gt;3 node Zookeeper 3.4.5 cluster in DC1 with 3 HA mesos masters&lt;/li&gt;
  &lt;li&gt;Zookeeper observer nodes in DC2 and DC3&lt;/li&gt;
  &lt;li&gt;Agents in DC2 connect to the Zookeeper observer in DC2&lt;/li&gt;
&lt;/ul&gt;
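
&lt;p&gt;For reference, the observers are declared in zoo.cfg roughly like this
(hostnames are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# zoo.cfg sketch: 3 node ensemble in DC1 plus observers in DC2/DC3
server.1=zk1-dc1:2888:3888
server.2=zk2-dc1:2888:3888
server.3=zk3-dc1:2888:3888
server.4=zk-dc2:2888:3888:observer
server.5=zk-dc3:2888:3888:observer

# and on the observer nodes themselves:
peerType=observer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;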

&lt;p&gt;From my research there are several timeouts that are at play here:&lt;/p&gt;

&lt;p&gt;1) Zookeeper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ticktime&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;synclimit&lt;/code&gt;. Unfortunately the zookeeper
&lt;a href=&quot;https://issues.apache.org/jira/browse/ZOOKEEPER-1607&quot;&gt;read-only observer feature&lt;/a&gt; is not available yet, so when the
observer loses its connection it drops the connections to the agents. There
isn’t an agent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zk_session_timeout&lt;/code&gt; configuration option, but it looks
like the agent force-expires the zk session after 10 sec (the master
default). If zk reconnects in less than 10 sec the session still
expires, but the master is detected and everything works.&lt;/p&gt;

&lt;p&gt;2) Mesos master &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;agent_ping_timeout&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_agent_ping_timeout&lt;/code&gt;. The
master shuts down the agent after this timeout (75 sec by
default). This causes the agent to restart and kill all running tasks.&lt;/p&gt;

&lt;p&gt;3) &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;agent_reregister_timeout&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_agent_reregister_timeout&lt;/code&gt;. If there
was a master failover during a WAN outage, then this timeout may be
triggered. But the default is 10 min, so that shouldn’t be a problem.&lt;/p&gt;

&lt;p&gt;Here are my conclusions for my setup. Please let me know if I missed anything.&lt;/p&gt;

&lt;p&gt;1) Since the ZK observers in DC2 and DC3 do not affect the main ZK cluster
when disconnected, changing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ticktime&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;synclimit&lt;/code&gt; is not necessary.&lt;/p&gt;

&lt;p&gt;2) Increase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_agent_ping_timeout&lt;/code&gt; on the masters so that
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(agent_ping_timeout * max_agent_ping_timeout)&lt;/code&gt; is longer than most
WAN outages. In my case most outages are less than 10 mins, so I am
trying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_agent_ping_timeout&lt;/code&gt; = 40. This means I do not need to
increase the reregister timeout. Unfortunately &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_agent_ping_timeout&lt;/code&gt; is
a global configuration, and I cannot set this value differently for
agents in the different DCs.&lt;/p&gt;
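
&lt;p&gt;Concretely, with the default 15 second ping interval that works out to
15 * 40 = 600 seconds, i.e. 10 minutes, before a disconnected agent is
shut down. A sketch of the master invocation (flag names as used in this
post; check &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mesos-master --help&lt;/code&gt; for the exact spelling in your version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mesos-master --zk=zk://zk1-dc1:2181/mesos --quorum=2 \
             --agent_ping_timeout=15secs \
             --max_agent_ping_timeouts=40
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;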

</content>
 </entry>
 
 <entry>
   <title>Dell FX2 and Intel X710 nics</title>
   <link href="https://kscherer.github.io//linux/2016/06/21/dell_fx2_and_intel_x710"/>
   <updated>2016-06-21T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//linux/2016/06/21/dell_fx2_and_intel_x710</id>
   <content type="html">&lt;p&gt;What follows is an attempt to document a 6 month long debugging
odyssey. This is easily the strangest computer behavior I have ever
debugged or tried to understand.&lt;/p&gt;

&lt;h3 id=&quot;background&quot;&gt;Background&lt;/h3&gt;

&lt;p&gt;I manage a cluster of bare metal servers used for coverage build
testing of Wind River Linux. The collection of git repos alone is over
15GB, and the resulting IO traffic is high enough that using a public
cloud is not cost effective. The current sweet spot for price to
build performance to rack space is a chassis that squeezes 4 blade
servers into 2U. We have a bunch of the Dell C6220 series
servers, and then I decided to try the Dell FX2 chassis. The selling
points for me were the full Dell iDRAC and the network IO aggregator
system. The IO module theoretically would allow me to network 4 X
10GbE per system (160 GbE total) to a redundant switch pair providing
80 GbE uplink capability. We already had a good experience with the
M-8024K module for the M1000e chassis.&lt;/p&gt;

&lt;h3 id=&quot;hardware-setup&quot;&gt;Hardware setup&lt;/h3&gt;

&lt;p&gt;The first chassis was installed and networked. The IO modules were
set up in the same way as the M-8024K, which is as a VLAN access port. The
network was configured as VLAN 105, but with the access port this
detail is hidden from the systems. The main reason is that the RedHat
and Debian installers do not support VLAN configuration of the network
devices, so all my machines have this configuration, which allows me to
use Foreman for automated PXE installs.&lt;/p&gt;

&lt;h3 id=&quot;using-newest-ubuntu-installer&quot;&gt;Using newest Ubuntu installer&lt;/h3&gt;

&lt;p&gt;Things were finally ready for me in January 2016. I started the PXE
install of Ubuntu 14.04. This failed because the kernel drivers for
the X710 nic were only integrated in Linux 4.2 and the Ubuntu
installer with the 3.13 kernel could not detect the X710 nic.&lt;/p&gt;

&lt;p&gt;Luckily Ubuntu rebuilds the 14.04 installer image with the 15.10
kernel. I switched to the newest version of the installer, and it
was able to detect the X710 nic.&lt;/p&gt;

&lt;h3 id=&quot;the-first-hiccup&quot;&gt;The first hiccup&lt;/h3&gt;

&lt;p&gt;This time the kernel and initrd were downloaded, the initial DHCP
succeeded but then DNS lookup to download the preseed failed. This was
strange but not unheard of. It had happened a long time ago but I
hadn’t seen it in years. I was quick to blame our Microsoft DNS
servers and replaced all DNS names in the preseed with IP
addresses. This allowed the installation to complete and the machine
booted Ubuntu as usual. Then things started to get really
strange. DHCP on boot would occasionally fail and then I noticed that
DNS would occasionally time out and then succeed right
afterwards. This made using programs like Puppet impossible.&lt;/p&gt;

&lt;p&gt;I completed the install of the other 3 servers and noticed that
occasionally DHCP would fail during the install process. This was
mystifying to me because the PXE boot process uses the same DHCP setup
to download the kernel and initrd, and I never saw it fail.&lt;/p&gt;

&lt;p&gt;I checked the DHCP server, and the server logs showed that it
received the request and sent the offer back to the host, but
that offer was never received. Running ethtool did not show any
dropped or corrupted packets reported by the nic.&lt;/p&gt;

&lt;p&gt;After the installation of the remaining 3 servers in the chassis was
complete, I opened a support case with Dell.&lt;/p&gt;

&lt;p&gt;Initial setup - Approx two weeks:&lt;/p&gt;

&lt;p&gt;TOR access port config, IOA VLAN 1 untagged, Hosts untagged = problem&lt;/p&gt;

&lt;h3 id=&quot;round-one---tor-switch-config&quot;&gt;Round one - TOR switch config&lt;/h3&gt;

&lt;p&gt;The configuration of the TOR Cisco switch connecting to the IOA was
the subject of the first round of debugging. The IOA comes by default
in a “no touch” configuration, and it made sense to verify the
setup of the TOR switch. It took a few weeks to get together all the
people involved: myself, on site IT, IT networking, Dell tech support
and a Dell networking specialist. After many hours, the TOR switch was
changed from an access port to a VLAN 105 tagged port. This resulted
in all traffic being dropped until the IOA was changed to make the
105 VLAN untagged. But the DNS/DHCP problem persisted.&lt;/p&gt;

&lt;p&gt;Round One - Approx one month&lt;/p&gt;

&lt;p&gt;TOR VLAN 105, IOA VLAN 105 untagged, Hosts untagged = problem&lt;/p&gt;

&lt;h3 id=&quot;round-two---internal-reproducer&quot;&gt;Round Two - Internal reproducer&lt;/h3&gt;

&lt;p&gt;Moving up levels of Dell support always takes time. While waiting for
networking support to become available, I started experimenting. I
wanted to see if I could reproduce the problem without involving the
TOR switch, so I set up dnsmasq on blade #1 as a dns caching proxy. I
then added a fake host entry to /etc/hosts so I could be sure that
dnsmasq was being queried, and started running nslookup queries on
blade #2. To my surprise, I was able to reproduce the problem even
with the network traffic completely internal to the FX2 chassis.&lt;/p&gt;
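
&lt;p&gt;The reproducer boils down to something like this (a sketch; names and
addresses are made up):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# blade #1: dnsmasq as a caching proxy, with a fake entry in /etc/hosts
echo &quot;10.0.0.99 fakehost.example.com&quot; | sudo tee -a /etc/hosts
sudo dnsmasq --no-daemon --log-queries

# blade #2: hammer the proxy and watch for timeouts
while true; do
    nslookup fakehost.example.com blade1 || echo &quot;FAIL $(date)&quot;
    sleep 1
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;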

&lt;p&gt;Round Two - Approx one month&lt;/p&gt;

&lt;p&gt;IOA VLAN 1 untagged, Hosts untagged = problem&lt;/p&gt;

&lt;h3 id=&quot;round-three---a-solution&quot;&gt;Round Three - A solution?&lt;/h3&gt;

&lt;p&gt;Then I decided to investigate whether VLAN tagging at the Linux host level
would change things. I PXE booted blade #3 with the IOA configured as
VLAN 105 untagged, and when DHCP failed, I switched the IOA to VLAN 1
untagged and used the secondary install console to change the network
config from em1 to em1.105. I was able to complete the install and boot the
machine.&lt;/p&gt;

&lt;p&gt;Amazingly the DHCP/DNS problems went away! It took some time to fix my
Puppet configuration to work with the VLAN tagging and get everything
working. I was also able to demonstrate, using blades #1 and #2, that the
problem was present with VLAN 1 untagged and hosts untagged, and not
present with VLAN 1 untagged and the linux hosts configured for VLAN 105.&lt;/p&gt;
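
&lt;p&gt;For reference, the host side of the workaround is just a VLAN
subinterface in /etc/network/interfaces (a sketch; needs the vlan
package installed):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# tag VLAN 105 on the host instead of relying on the IOA
auto em1.105
iface em1.105 inet dhcp
    vlan-raw-device em1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;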

&lt;p&gt;Round Three - Approx one month&lt;/p&gt;

&lt;p&gt;IOA VLAN 1 untagged, Hosts tagged 105 = no problem!&lt;/p&gt;

&lt;h3 id=&quot;round-four---debugging-the-ioa&quot;&gt;Round Four - Debugging the IOA&lt;/h3&gt;

&lt;p&gt;Now the focus turned exclusively to the IOA. With the help of Dell
network support, we disabled the outbound ports of the IOA and ran
tcpdump on the hosts. We were able to see packets being sent from the
DNS “server” and not being received by the client about 25% of the
time. About 5-10% of the time the initial DNS query would not even
make it to the DNS server.&lt;/p&gt;

&lt;p&gt;It was around this time that a second FX2 chassis with identical
hardware arrived, but with a newer IOA firmware version. Full of hope,
I did a PXE install on a blade, only to hit the exact same problem.&lt;/p&gt;

&lt;p&gt;The Dell networking support team attempted to reproduce the problem in
their internal lab, but even with an FX2 chassis and an Ubuntu 14.04 install
on an FC630 with the X710 nic, they were unable to reproduce the
problem. To ensure the systems were configured identically, we went
through the entire BIOS setup line by line to compare. I even tried
installs using UEFI and “Legacy BIOS” modes with no change in behavior.&lt;/p&gt;

&lt;p&gt;I then got a crash course in F10 network configuration. It took a
while to find the proper command line incantations, but we set up
counters on the various ports to count incoming and outgoing
packets. We set up fixed ARP entries and tried to reduce the network
traffic as much as possible. Unfortunately the outgoing port counters
did not work, but from the incoming counters it looked like the IOA
was not seeing the packets come in on the interface.&lt;/p&gt;

&lt;p&gt;Round Four - Approx two months&lt;/p&gt;

&lt;p&gt;IOA functioning as designed.&lt;/p&gt;

&lt;p&gt;Bonus: I learned how to use the Dell iDRAC virtual media feature to
transfer files to and from a system without network access.&lt;/p&gt;

&lt;h3 id=&quot;round-five---debugging-the-x710-nic&quot;&gt;Round Five - Debugging the X710 nic&lt;/h3&gt;

&lt;p&gt;Now another Dell Linux support tech was brought in, and he confirmed
that the Linux config was correct. We then tried a firmware upgrade
for the X710 nic. This involved a failed upgrade attempt using an ISO
upgrade package (which only works in Legacy BIOS mode), a DRAC upgrade with
HTML5 support, and finally using the iDRAC upgrade functionality to
upgrade the NIC firmware.&lt;/p&gt;

&lt;p&gt;Unfortunately the firmware upgrade made things even worse!! DHCP
worked but I could not ping inside the chassis. To make things even
more bizarre, ARP would occasionally work but ping would not!&lt;/p&gt;

&lt;p&gt;At this point, we decided to replace the Intel X710 nic with the
Broadcom BCM57840 nic with a similar feature set to see how/if the
problem changed.&lt;/p&gt;

&lt;p&gt;Round Five - Approx one month&lt;/p&gt;

&lt;p&gt;Several failed firmware upgrades and violations of the laws of
networking.&lt;/p&gt;

&lt;h3 id=&quot;round-six---something-goes-right-for-a-change&quot;&gt;Round Six - Something goes right for a change&lt;/h3&gt;

&lt;p&gt;A technician swapped out the Intel nics for Broadcom nics. I redid
another PXE install (luckily it is completely automated) and
everything worked as expected! No DHCP/DNS errors or any hint of
strange behavior.&lt;/p&gt;

&lt;p&gt;We finally had a solution and the remaining Intel X710 nics were
swapped out over a few weeks.&lt;/p&gt;

&lt;p&gt;Final setup:&lt;/p&gt;

&lt;p&gt;TOR 105 access port, IOA VLAN 1 untagged default config, Linux host untagged.&lt;/p&gt;

&lt;h3 id=&quot;recap&quot;&gt;Recap&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;Bug not reproducible by support&lt;/li&gt;
  &lt;li&gt;Intermittent dropping of UDP packets with no connection to non-Dell
hardware&lt;/li&gt;
  &lt;li&gt;Enabling VLAN tagging on the host “solved” the problem&lt;/li&gt;
  &lt;li&gt;Incorrect hardware counters&lt;/li&gt;
  &lt;li&gt;Firmware upgrades make things worse&lt;/li&gt;
  &lt;li&gt;Debugging requires coordination of at least 3 teams&lt;/li&gt;
  &lt;li&gt;Root cause never determined&lt;/li&gt;
  &lt;li&gt;Everyone involved agreed it was one of the strangest problems they
have ever debugged&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;number-of-people-involved&quot;&gt;Number of people involved&lt;/h3&gt;

&lt;p&gt;At Wind River: myself, IT and IT networking&lt;/p&gt;

&lt;p&gt;At Dell: 2 tech support, 2 networking support, 1 Linux support, 2
managers&lt;/p&gt;

&lt;p&gt;Total time consumed: approx 2-3 man months over 6 months of calendar
time.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;The reality is that Dell shipped us something broken. The open
question is whether testing could have found this problem before the
hardware shipped. Dell was unable to reproduce the problem internally
and without knowing the root cause of the problem, I can only
speculate.&lt;/p&gt;

&lt;p&gt;Ideally I would like to know the cause of the problem, know that it
was fixed and that no one else will have to suffer through this. But
that would be the fairy tale ending and life doesn’t work that
way. The case is considered closed and I will get back to all the
tasks I had to put on hold for this.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Python Packaging with make and pex</title>
   <link href="https://kscherer.github.io//python/2016/06/03/python-packaging-with-make-and-pex"/>
   <updated>2016-06-03T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//python/2016/06/03/python-packaging-with-make-and-pex</id>
   <content type="html">&lt;p&gt;As it often happens in the life of a professional programmer, a small
python script had grown into a large script and needed to be split
apart and properly packaged. Most of my experience with python had
been with small scripts. I had tried before to understand the python
packaging ecosystem but always got confused by the combinations of
tools and formats.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Python development tools like virtualenv and pip&lt;/li&gt;
  &lt;li&gt;Code distributed in eggs and/or wheels&lt;/li&gt;
  &lt;li&gt;Packages installed using easy_install and/or pip&lt;/li&gt;
  &lt;li&gt;Python packaging tools like setuptools and distutils&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There seemed to be at least two different tools that did almost the
same thing, but neither had good documentation. I did find some decent
blog posts like &lt;a href=&quot;http://jeffknupp.com/blog/2013/08/16/open-sourcing-a-python-project-the-right-way/&quot;&gt;Open Sourcing a Python Project the Right Way&lt;/a&gt; but
there were still workflow steps that I needed to figure out. In the
past I was able to avoid figuring it out, but this time was different
because my “small” script had grown to over 1000 lines of python and
there was no way to avoid it.&lt;/p&gt;

&lt;p&gt;I had an informal set of requirements:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;No root access should be required. Python supports local
installation and virtualenv&lt;/li&gt;
  &lt;li&gt;Bootstrap a development environment quickly&lt;/li&gt;
  &lt;li&gt;The development setup should be self contained and not affect any
other part of the machine&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My research took me all over the web, but one of the most important
pieces of inspiration was this small post on
&lt;a href=&quot;http://blog.bottlepy.org/2012/07/16/virtualenv-and-makefiles.html&quot;&gt;Virtualenv and Makefiles&lt;/a&gt;. I was also inspired by &lt;a href=&quot;https://pex.readthedocs.io/en/stable/&quot;&gt;Pex&lt;/a&gt; which
provided a way to bundle all the python pieces together into a single
self extracting package.&lt;/p&gt;

&lt;p&gt;It took a few days but I was able to combine make, mkvirtualenv, pip
and pex to implement a nice workflow. The &lt;a href=&quot;https://github.com/kscherer/wraxl-scheduler/blob/master/Makefile&quot;&gt;Makefile&lt;/a&gt; will (a condensed sketch follows the list):&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Install pip into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$HOME/.local/bin/pip&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Use local pip to install &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;virtualenv&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;virtualenvwrapper&lt;/code&gt; into
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$HOME/.local/bin&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Create a per project virtualenv for the project and install all the
development dependencies like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pylint&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flake8&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pex&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Check if required development packages are installed. Some python
packages have C extensions and require a compiler and header
files.&lt;/li&gt;
  &lt;li&gt;Runs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python setup.py develop&lt;/code&gt; which installs the package
dependencies like yaml and redis. This step also adds the package
to the virtualenv and can be used if development is spread across
multiple git repositories.&lt;/li&gt;
  &lt;li&gt;Uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python setup.py bdist_pex&lt;/code&gt; to build the pex file&lt;/li&gt;
&lt;/ol&gt;
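
&lt;p&gt;A condensed sketch of the structure (names are illustrative; recipe
lines must be indented with tabs):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Makefile sketch
VENV := $(HOME)/.virtualenvs/myproject

$(VENV):
	pip install --user virtualenv virtualenvwrapper
	virtualenv $(VENV)
	$(VENV)/bin/pip install pylint flake8 pex

develop: $(VENV)
	$(VENV)/bin/python setup.py develop

myproject.pex: $(VENV) $(wildcard myproject/*.py)
	$(VENV)/bin/python setup.py bdist_pex

clean:
	rm -rf build dist *.egg-info
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;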

&lt;p&gt;Other nice touches:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The source py files are dependencies of the pex target, so editing
a file causes the pex file to be rebuilt. Pattern support in Make
simplifies this step&lt;/li&gt;
  &lt;li&gt;Has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make help&lt;/code&gt; which reads comments embedded in the Makefile to
generate nice help output&lt;/li&gt;
  &lt;li&gt;Has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make clean&lt;/code&gt; for easy cleanup&lt;/li&gt;
  &lt;li&gt;Each make step loads the proper virtualenv, so the developer does
not even have to activate the virtualenv manually.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Some annoyances:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Pex does not pick up local python file changes unless I delete the
egg file in the pex build dir.&lt;/li&gt;
  &lt;li&gt;To keep timestamps in order, sometimes it is necessary to touch
certain files.&lt;/li&gt;
  &lt;li&gt;I had to create a .check file to prevent the system package
checking from running every build&lt;/li&gt;
  &lt;li&gt;Dependent on Pypi being available, though pip does cache downloads
locally&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The last step was to integrate the pex file into a docker image. If
the package does not contain dependencies on system libraries, the
Alpine Linux Python docker images can be used as a base. Unfortunately
the python mesos.native packages I am using have dependencies on
libraries like libsasl, so I could not use Alpine Linux. But I was
able to use the base Ubuntu image and only needed to install a few
libraries, which made the image much smaller than before.&lt;/p&gt;

&lt;p&gt;I noticed that the pex file is unpacked into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PEX_ROOT&lt;/code&gt;, which is under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$HOME&lt;/code&gt;
by default. The last tweak I made was to ensure that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PEX_ROOT&lt;/code&gt; was a
docker volume to avoid the overhead of writing to the union
filesystem. This isn’t strictly necessary, but I try to work as if the
docker image is effectively read-only.&lt;/p&gt;
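
&lt;p&gt;The resulting Dockerfile is short; a sketch (image, package and file
names are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM ubuntu:14.04

# only the shared libraries the native bindings need
RUN apt-get update &amp;amp;&amp;amp; \
    apt-get install -y libsasl2-2 libcurl3 &amp;amp;&amp;amp; \
    apt-get clean &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*

COPY dist/myproject.pex /usr/local/bin/myproject

# unpack the pex into a volume instead of the union filesystem
ENV PEX_ROOT /pex
VOLUME /pex

CMD [&quot;/usr/local/bin/myproject&quot;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;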

&lt;p&gt;I have already reused this Makefile structure for other python
projects. I was pleasantly surprised when a colleague of mine was able
to clone the project and rebuild the docker image without any
intervention.&lt;/p&gt;

&lt;p&gt;I am now able to focus on refactoring and developing the project. The
packaging part is solved in a clean way that can easily be shared with
others.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Docker Daemon and Systemd</title>
   <link href="https://kscherer.github.io//docker/2016/02/29/docker-and-systemd"/>
   <updated>2016-02-29T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//docker/2016/02/29/docker-and-systemd</id>
   <content type="html">&lt;p&gt;I recently read an article on &lt;a href=&quot;http://lwn.net/&quot;&gt;LWN&lt;/a&gt; about &lt;a href=&quot;http://lwn.net/Articles/676831/&quot;&gt;Systemd vs Docker&lt;/a&gt;
and I was disappointed. As far as I am concerned, this is preventing
one of the worst design flaws in Docker from being addressed. Docker
CEO Solomon Hykes also thinks this should be resolved, though
&lt;a href=&quot;https://github.com/docker/docker/issues/2658&quot;&gt;Issue #2658&lt;/a&gt; has remained open since Nov 2013.&lt;/p&gt;

&lt;p&gt;The current Docker design sets up all containers as children of the
Docker daemon process. The consequence of this is that upgrading the
daemon requires stopping/killing all the containers. Other
operations, like changing the daemon command line, also require stopping all
the containers. I have to be extra careful with my Puppet
configuration because any change to the config files will restart the
docker daemon. To prevent inadvertent restarts, I had to remove the
normal configuration-to-service dependency which normally restarts the
daemon when the configuration changes.&lt;/p&gt;
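
&lt;p&gt;In Puppet terms that means managing the file without the usual notify;
a sketch (resource names are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# manage the config but never bounce the daemon automatically
file { '/etc/default/docker':
  ensure  =&amp;gt; file,
  content =&amp;gt; template('docker/default.erb'),
  # deliberately omitted: notify =&amp;gt; Service['docker'],
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;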

&lt;p&gt;From an operational perspective this is a pain. It represents another
in a long line of software that requires significant operational
resources to deploy properly. If the operator is lucky, the
containerized application can be managed with load balancers or DNS
rotation. If the service cannot work this way or the Ops team cannot
build the required infrastructure, then upgrades mean downtime. With
VMs it is possible to move the application to another machine, but
CRIU isn’t ready yet. These “solutions” require large amounts of
operational effort. I built a rolling upgrade system around Ansible to
handle docker upgrades.&lt;/p&gt;

&lt;p&gt;My experience with &lt;a href=&quot;http://mesos.apache.org/&quot;&gt;Mesos&lt;/a&gt; has been very different. The Mesos team has a
&lt;a href=&quot;http://mesos.apache.org/documentation/latest/upgrades/&quot;&gt;supported upgrade path&lt;/a&gt; with lots of testing. I have upgraded at least
5 releases of Mesos without issues or any downtime.&lt;/p&gt;

&lt;p&gt;What does this have to do with systemd? In order to support seamless
upgrades of the docker daemon, the ownership of the container
processes will have to be shared with some other process. This could
be another daemon, but the init system is an obvious choice. If the
docker daemon co-operated with another daemon or systemd by sharing
ownership of the processes, then a nice upgrade path could be
developed.&lt;/p&gt;

&lt;p&gt;The Docker team is working on containerd and has stated that RunC
would be integrated, and this may be where better integration with an init
system becomes possible. I realize this is selfish, but for me all
these squabbles are just distracting developers from addressing one of
my major pain points with using Docker.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Benchmarking docker storage backends</title>
   <link href="https://kscherer.github.io//docker/2015/07/09/benchmarking-docker-storage-backends"/>
   <updated>2015-07-09T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//docker/2015/07/09/benchmarking-docker-storage-backends</id>
   <content type="html">&lt;p&gt;I am using docker simulate building Wind River Linux (which is based
on OE-Core and Poky) on different hosts. The actual build is done on a
bind mount outside of the container so I did not expect the storage
backend to affect performance, but it did.&lt;/p&gt;

&lt;p&gt;See &lt;a href=&quot;https://github.com/docker/docker/issues/2891&quot;&gt;Docker Issue #2891&lt;/a&gt; for full history.&lt;/p&gt;

&lt;h3 id=&quot;setup&quot;&gt;Setup&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;docker 1.7&lt;/li&gt;
  &lt;li&gt;Ubuntu 14.04.2&lt;/li&gt;
  &lt;li&gt;Vivid kernel 3.19.0-21-generic&lt;/li&gt;
  &lt;li&gt;Dual 6C Xeon with 64GB RAM and 100GB root SSD and dual 3TB RAID0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the following Dockerfile:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM ubuntu:14.04.2

MAINTAINER Konrad Scherer &amp;lt;Konrad.Scherer@windriver.com&amp;gt;

RUN useradd --home-dir /home/wrlbuild --uid 1000 --gid 100 --shell /bin/bash wrlbuild &amp;amp;&amp;amp; \
    echo &quot;wrlbuild ALL=(ALL) NOPASSWD: ALL&quot; &amp;gt;&amp;gt; /etc/sudoers

RUN dpkg --add-architecture i386 &amp;amp;&amp;amp; \
    apt-get update &amp;amp;&amp;amp; \
    DEBIAN_FRONTEND=noninteractive apt-get -qy install --no-install-recommends \
    libc6:i386 libc6-dev-i386 libncurses5:i386 texi2html chrpath \
    diffstat subversion libgl1-mesa-dev libglu1-mesa-dev libsdl1.2-dev \
    texinfo gawk gcc gcc-multilib help2man g++ git-core python-gtk2 bash \
    diffutils xz-utils make file screen sudo wget time patch &amp;amp;&amp;amp; \
    apt-get clean &amp;amp;&amp;amp; \
    rm -rf /var/lib/apt/lists/* &amp;amp;&amp;amp; \
    rm -rf /usr/share/man &amp;amp;&amp;amp; \
    rm -rf /usr/share/doc &amp;amp;&amp;amp; \
    rm -rf /usr/share/grub2 &amp;amp;&amp;amp; \
    rm -rf /usr/share/texmf/fonts &amp;amp;&amp;amp; \
    rm -rf /usr/share/texmf/doc

USER wrlbuild

CMD [&quot;/bin/bash&quot;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Building poky (fido release, core-image-minimal) on an ext4 bind mount
with the docker image using different storage backends.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd &amp;lt;buildarea&amp;gt;
mkdir downloads
git clone --branch fido git://git.yoctoproject.org/poky
source poky/oe-init-build-env mybuild
ln -s ../downloads .
sed -i 's/#MACHINE ?= &quot;qemux86-64&quot;/MACHINE ?= &quot;qemux86-64&quot;/' conf/local.conf
bitbake -c fetchall core-image-minimal
time bitbake core-image-minimal
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;

&lt;p&gt;Bare-metal:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    33m5.260s
user    289m41.356s
sys     27m23.488s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Aufs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    40m24.416s
user    258m48.932s
sys     56m29.284s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Devicemapper with official binary in loopback mode:
This requires &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--storage-opt dm.override_udev_sync_check=true&lt;/code&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    35m24.415s
user    289m10.660s
sys     34m21.168s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Devicemapper with my own compiled dynamic binary:
This still requires &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--storage-opt dm.override_udev_sync_check=true&lt;/code&gt;
even though docker info states udev sync is supported.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    34m18.387s
user    294m1.720s
sys     31m43.764s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Overlayfs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    33m46.890s
user    293m40.084s
sys     35m31.480s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Aufs still has a measurable performance overhead even when the IO is
done on a bind mount outside of the aufs filesystem. Devicemapper and
overlayfs do not add overhead in this specific scenario. I did have
problems with devicemapper on Ubuntu 14.04 and the 3.13 kernel, but
since I upgraded to the 3.16 kernel I have not had any problems with
devicemapper errors. The only problems I have had were related to the
udev sync detection, a new requirement with Docker 1.7.&lt;/p&gt;

&lt;p&gt;My options are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ignore the udev sync requirement with a flag&lt;/li&gt;
  &lt;li&gt;Compile and distribute my own dynamically linked version of docker
and hope that docker will provide an official version on Ubuntu&lt;/li&gt;
  &lt;li&gt;Switch to Overlayfs&lt;/li&gt;
&lt;/ul&gt;
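
&lt;p&gt;For the first option, the flag can be made permanent on Ubuntu by
adding it to the init script defaults. A minimal sketch, assuming the
stock /etc/default/docker file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# /etc/default/docker
DOCKER_OPTS=&quot;--storage-driver=devicemapper --storage-opt dm.override_udev_sync_check=true&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;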

&lt;p&gt;There are reports of problems with Overlayfs when using rpm inside a
container. I will do some more testing with Overlayfs, but it seems my
best option now is to move all my Ubuntu builders to Overlayfs.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>XFS and RAID setup</title>
   <link href="https://kscherer.github.io//2015/06/26/xfs-and-raid-setup"/>
   <updated>2015-06-26T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//2015/06/26/xfs-and-raid-setup</id>
   <content type="html">&lt;h2 id=&quot;choosing-xfs&quot;&gt;Choosing XFS&lt;/h2&gt;

&lt;p&gt;I manage a cluster of builder machines and all the builders use the
ext4 filesystem. To load the machines effectively, the builds are
heavily parallelized, and a RAID0 striped setup keeps IO from becoming
a bottleneck. When RedHat 7 was released with xfs as the default
filesystem, I realized xfs would be a viable alternative to ext4:
RedHat wouldn’t have made that change if xfs weren’t a fast and solid
filesystem. I recently got some new hardware and started an
experiment.&lt;/p&gt;

&lt;h2 id=&quot;default-raid-settings&quot;&gt;Default RAID settings&lt;/h2&gt;

&lt;p&gt;The system has 6 4TB disks from which I created 2 RAID0 virtual disks
of 3 disks each, for a total of 12TB per virtual disk. The machine has a
battery backed RAID controller and each virtual disk had as its default
settings: stripe size of 64KB, write back, adaptive read ahead, disk
cache enabled and a few more.&lt;/p&gt;

&lt;h2 id=&quot;creating-the-xfs-drives&quot;&gt;Creating the xfs drives&lt;/h2&gt;

&lt;p&gt;Once the machine was provisioned, I started reading about xfs
filesystem creation options and mount options. There were several
points of confusion:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Some web pages referred to a crc option which validates
metadata. This sounds like a good idea, but is not available with
the xfsprogs version on Ubuntu 14.04&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I didn’t realize at first that the inode64 option is a mount option
and not a filesystem creation option&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since the disks are using hardware RAID, which is not generally
detectable by the mkfs program, the geometry needs to be specified when
creating the filesystem.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart build1 xfs 1M 100%
mkfs.xfs -d su=64k,sw=3 /dev/sdb1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These commands create the partition and tell xfs the stripe unit
(su=64k, matching the controller stripe size) and the stripe width
(sw=3, one stripe unit per disk in the RAID0 set).&lt;/p&gt;

&lt;h2 id=&quot;xfs-mount-options&quot;&gt;XFS mount options&lt;/h2&gt;

&lt;p&gt;It was clear that inode64 was useful because the disks are large
and the option lets metadata be spread out over the whole drive. The
interesting option was the barrier entry. There is an entry in the &lt;a href=&quot;http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F&quot;&gt;XFS Wiki FAQ&lt;/a&gt;
about this situation. If the storage is battery backed, then the
barrier is not necessary. Ideally the disk write cache is also
disabled to prevent data loss if power to the machine is lost. So
I went back to the RAID controller settings, disabled the disk cache
on all the drives, and then added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nobarrier,inode64,defaults&lt;/code&gt; to the
mount options for the drives.&lt;/p&gt;
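
&lt;p&gt;For reference, a minimal sketch of the resulting /etc/fstab entry (the
mount point is illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# battery backed RAID controller, so barriers are disabled
/dev/sdb1  /build1  xfs  nobarrier,inode64,defaults  0  0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;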

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The experiment has started. The first build on the machine was very
fast, but the contribution of the filesystem is hard to determine. If
there are any interesting developments I will post updates.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Adventures with Git and server packfile bitmaps</title>
   <link href="https://kscherer.github.io//git/2015/05/15/git-and-bitmaps"/>
   <updated>2015-05-15T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//git/2015/05/15/git-and-bitmaps</id>
   <content type="html">&lt;p&gt;In git 2.0, a new feature called bitmaps was added. The git
&lt;a href=&quot;https://git.kernel.org/cgit/git/git.git/tree/Documentation/RelNotes/2.0.0.txt&quot;&gt;Changelog&lt;/a&gt; describes it as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;The bitmap-index feature from JGit has been ported, which should
significantly improve performance when serving objects from a
repository that uses it.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One of my colleagues told me that he had experimented with it and
noticed some impressive speedups, which I was able to reproduce. On the
local GigE network a linux kernel clone went from approx 3 minutes to
1.5 minutes, a speedup of almost 50%!&lt;/p&gt;

&lt;p&gt;The instructions seemed very simple. Just log into the git server and
run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git repack -A -b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;on every bare repo. The first hurdle was upgrading to a newer version
of git. Our git servers are running CentOS 5, CentOS 6 and Ubuntu
14.04. The EPEL version of git is 1.8 and 14.04 ships with 1.9.1.&lt;/p&gt;

&lt;p&gt;For Ubuntu 14.04 the solution was to use the LaunchPad
&lt;a href=&quot;https://launchpad.net/~git-core/+archive/ubuntu/ppa&quot;&gt;Git Stable PPA&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But for CentOS, it was a little trickier. Since I hate distributing
binaries directly I decided to backport the latest Fedora git
srpm. Getting it to build required a few hacks with bash completion
and installing a few dependencies, but it took less than 30 minutes to
get both CentOS 5 and 6 rpms.&lt;/p&gt;

&lt;p&gt;The upgrade of git on the servers went very smoothly: because they
use xinetd to run the git-daemon, the very next connection to the
server after the upgrade started using the newly installed git 2.3.5
binary.&lt;/p&gt;

&lt;p&gt;There were of course a few hiccups. An internal tool that used git
request-pull was relying on one of the “heuristics” (see the changelog)
that were removed.&lt;/p&gt;

&lt;p&gt;The next step was to repack all the bare repos on the server. So I
wrote a script to run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git repack -A -b&lt;/code&gt; and left it to run
overnight. Recovering from this the next few days would require me to
become very familiar with the git man pages.&lt;/p&gt;

&lt;p&gt;The first problem was that the git server ran out of disk space. It
turns out I needed to add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-d&lt;/code&gt; flag in order to delete the previous pack
files. I had effectively doubled the disk space requirements of every
repo!&lt;/p&gt;

&lt;p&gt;It also turns out that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-A&lt;/code&gt; flag leaves packfiles that contain dangling
objects. So I reran my script with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git gc --aggressive
git repack -a -d -b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This helped a lot but repos that were using alternates were still
taking a lot more space than before because repack was making one big
packfile of all the objects and effectively ignoring the alternates
file. This is documented in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git clone&lt;/code&gt; man page.&lt;/p&gt;

&lt;p&gt;So I went to all the repos with alternates and ran:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git repack -a -d -b -l
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-l&lt;/code&gt; flag only repacks objects that are not available in the
alternates. With some extra cleanup, this resulted in even less disk
space usage than before. Unfortunately this does mean that a repo with
alternates cannot have a bitmap.&lt;/p&gt;

&lt;p&gt;On one server many repos still did not contain the bitmap file. After
much experimentation I finally figured out that the pack.packSizeLimit
option had been set to 500M on that server only. This meant that repos
larger than 500M would have multiple pack files, and since the bitmap
requires a single pack file, no bitmap was created. The lack of a
warning extended the debugging time considerably.&lt;/p&gt;
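
&lt;p&gt;Checking for and removing the limit is straightforward. A minimal
sketch, run in the bare repo (add --global if the limit was set in the
server-wide config):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# show the current limit, if any
git config --get pack.packSizeLimit
# remove it so repack can produce a single pack plus bitmap
git config --unset pack.packSizeLimit
git repack -a -d -b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;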

&lt;p&gt;Finally one of my servers had an old mirror of the upstream Linux
kernel repo and even after &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git gc --aggressive&lt;/code&gt; the repo was 1.5GB,
which is over 500MB larger than a new clone. So I started
experimenting with the other repack flags, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-F&lt;/code&gt;. The result
was that the repo ballooned to over 4GB and I couldn’t find a way to
reduce the size. Even cloning the repo to another machine resulted in
a 1.5GB transfer. In the end, I ended up doing a fresh clone and
swapping the objects/pack directories.&lt;/p&gt;

&lt;p&gt;I was able to reproduce the behavior with a fresh clone as well:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone --bare git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git repack -a -d -F
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In summary:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;To create bitmaps without increasing disk space usage:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; git repack -a -d -b -l
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
&lt;p&gt;I was not able to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git repack -F&lt;/code&gt; in a way that did not
quadruple the size of the Linux kernel repo. It even caused clones
of the repo to be larger.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Git should have a warning if bitmaps are requested but cannot be
created due to packSizeLimit restrictions. I plan to file a bug or
make a patch.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

</content>
 </entry>
 
 <entry>
   <title>Docker backend performance update</title>
   <link href="https://kscherer.github.io//2015/03/03/docker-backend-performance-update"/>
   <updated>2015-03-03T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//2015/03/03/docker-backend-performance-update</id>
   <content type="html">&lt;p&gt;A long time ago I filed Docker issue &lt;a href=&quot;https://github.com/docker/docker/issues/2891&quot;&gt;2891&lt;/a&gt; regarding the
performance of the aufs backend vs devicemapper.&lt;/p&gt;

&lt;p&gt;Quick summary is that the aufs backend was approx 30% slower even
though the build was being done in a bind mount outside of the
container.&lt;/p&gt;

&lt;p&gt;I finally got around to checking again using Docker 1.5 on Ubuntu
14.04 with the 3.16 utopic LTS enablement kernel.&lt;/p&gt;

&lt;p&gt;The current stable poky release is dizzy:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd &amp;lt;buildarea&amp;gt;
mkdir downloads
chmod 777 downloads
git clone --branch dizzy git://git.yoctoproject.org/poky
source poky/oe-init-build-env mybuild
ln -s ../downloads .
bitbake -c fetchall core-image-minimal
time bitbake core-image-minimal
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is no longer any need to set the package and job parallelism in
local.conf because bitbake now chooses reasonable defaults.&lt;/p&gt;
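
&lt;p&gt;For reference, these are the kinds of settings that previously had to
be tuned by hand in conf/local.conf (the values are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# number of bitbake tasks to run in parallel
BB_NUMBER_THREADS = &quot;8&quot;
# passed to make for each compile task
PARALLEL_MAKE = &quot;-j 8&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;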

&lt;p&gt;Bare Metal:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    29m59.190s
user    278m0.988s
sys     59m47.379s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Devicemapper:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    32m21.074s
user    281m53.994s
sys     68m45.554s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;AUFS:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;real    37m14.612s
user    259m19.226s
sys     85m50.269s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I only ran each build once so this is not an authoritative
benchmark. It shows that there is a performance overhead of approx 20%
when using the aufs backend even if the IO is done on a bind mount.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Git Server option bigFileThreshold</title>
   <link href="https://kscherer.github.io//git/2014/09/26/git-server-option-bigfilethreshold"/>
   <updated>2014-09-26T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//git/2014/09/26/git-server-option-bigfilethreshold</id>
   <content type="html">&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;I manage the git infrastructure for the Linux group at Wind River: the
main git server and 5 regional mirrors which are kept in sync using
grokmirror. I plan to do a post about our grokmirror setup. The main
git server holds over 500GB of bare git repos and over 600 of those
are mirrored. Many repos are not mirrored. Some repos are internal,
some are mirrors of external upstream repos and some are mirrors of
upstream repos with internal branches. The git server runs CentOS 5.10
and git 1.8.2 from EPEL.&lt;/p&gt;

&lt;h1 id=&quot;the-toolchain-binary-repos&quot;&gt;The toolchain binary repos&lt;/h1&gt;

&lt;p&gt;One of the largest repos contains the source for the toolchain &lt;em&gt;and&lt;/em&gt;
all the binaries. Since the toolchain takes a long time to build, it
was decided that Wind River Linux should ship pre-compiled binaries
for the toolchain. There is also an option which allows our customers
to rebuild the toolchain if they have a reason to.&lt;/p&gt;

&lt;p&gt;The bare toolchain repo size varies between 1 and 3GB depending on
supported architectures. Many of the files in the repo were tarballs
around 250MB in size.&lt;/p&gt;

&lt;h1 id=&quot;why-is-the-git-server-down-again&quot;&gt;Why is the git server down again?&lt;/h1&gt;

&lt;p&gt;When a new toolchain is ready for integration, it is uploaded to the
main git server and mirrored. Then the main tree is switched to enable
the new version of the toolchain and all the coverage builders start
to download the new version. Suddenly the git servers would become
unresponsive and would thrash under memory pressure until they would
be inevitably rebooted. Sometimes I would have to disable the coverage
builders and stage their activation to prevent a thundering herd from
knocking the git server over again.&lt;/p&gt;

&lt;h1 id=&quot;why-does-cloning-a-repo-require-so-much-memory&quot;&gt;Why does cloning a repo require so much memory?&lt;/h1&gt;

&lt;p&gt;I finally decided to investigate this and found a reproducer
quickly. Cloning a 2.9GB bare repo would consume over 7GB of RAM
before the clone was complete. The graph of used memory was
spectacular. I started reading the git config man page and asking
Google various questions.&lt;/p&gt;

&lt;p&gt;I tried setting the binary attributes on various file types, but
nothing changed. See man gitattributes for more information. The
defaults seem to be fine.&lt;/p&gt;

&lt;p&gt;I tried various git config options like core.packedGitWindowSize and
core.packedGitLimit and core.compression as recommended in many blog
posts. But the memory spike was still the same.&lt;/p&gt;

&lt;h1 id=&quot;corebigfilethreshold&quot;&gt;core.bigFileThreshold&lt;/h1&gt;

&lt;p&gt;From the git config man page:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Files larger than this size are stored deflated, without attempting delta compression.
Storing large files without delta compression avoids excessive memory usage, at the slight
expense of increased disk usage.

Default is 512 MiB on all platforms. This should be reasonable for most projects as source
code and other text files can still be delta compressed, but larger binary media files
won’t be.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The 512MB number is key. The reason the git server was using so much
memory is that it was attempting delta compression on the binary
tarballs. This didn’t make the files any smaller, because they were
already compressed, but it required a lot of memory. I tried one command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git config --global --add core.bigFileThreshold 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And suddenly (no git daemon restart necessary), the clone took a
fraction of the time and the memory spike was gone. The only downside
was that the repo required more disk space: about 4.5GB. I then tried:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git config --global --add core.bigFileThreshold 100k
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This resulted in approx 10% more disk space (3.3GB) and no memory spike
when cloning.&lt;/p&gt;
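
&lt;p&gt;Note that the on-disk packs only pick up the new threshold when they
are rewritten. A minimal sketch of applying it to an existing repo:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# rewrite the packs so blobs over 100k are stored without delta compression
git -c core.bigFileThreshold=100k repack -a -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;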

&lt;p&gt;This setting seems very reasonable to me. The chance of having a text
file larger than 100KB is very low and the only downside is slightly
higher disk usage. Git is already very efficient in this regard.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;UPDATE&lt;/em&gt; This setting can cause disk space issues on linux kernel
repos. See &lt;a href=&quot;/git/2021/02/11/update-git-server-option-bigfilethreshold.html&quot;&gt;update here&lt;/a&gt;&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Replacing disks before they fail</title>
   <link href="https://kscherer.github.io//linux/2013/11/08/replacing-disks-before-they-fail"/>
   <updated>2013-11-08T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//linux/2013/11/08/replacing-disks-before-they-fail</id>
   <content type="html">&lt;h1 id=&quot;hardware-setup&quot;&gt;Hardware setup&lt;/h1&gt;

&lt;p&gt;I am managing an R710 Dell server with 6 2TB disks. The RAID
controller does not support JBOD mode, so I had to create 6 RAID0
virtual disks with one disk per group. The disks are then passed
through to Linux as /dev/sda to /dev/sdf. I am running 6 xen vms and
each vm gets a dedicated disk. The vms are coverage builders and not
mission critical so there is no point in added redundancy. I have a
nice Cobbler/Foreman setup that makes provisioning very quick.&lt;/p&gt;

&lt;h2 id=&quot;openmanage-and-check_openmanage&quot;&gt;OpenManage and check_openmanage&lt;/h2&gt;

&lt;p&gt;I am running the Dell OpenManage software on the system. In fact I am
running it on all my hardware. I am using the &lt;a href=&quot;https://github.com/camptocamp/puppet-dell&quot;&gt;puppet/dell&lt;/a&gt; module
graciously shared on Github. The OpenManage package does many things
including CLI query access to all the hardware.&lt;/p&gt;

&lt;p&gt;Then I stumbled across &lt;a href=&quot;http://folk.uio.no/trondham/software/check_openmanage.html&quot;&gt;check_openmanage&lt;/a&gt;, a Nagios check
that queries all the hardware and notifies Nagios if there are any
problems. I had already used the Puppet integration with Nagios to
set up a bunch of checks for ntp, disk and some other services. To make
things even easier, check_openmanage is in EPEL and Debian. It did not
take much time to add this check to the existing checks.&lt;/p&gt;

&lt;h2 id=&quot;predicted-failure&quot;&gt;Predicted Failure&lt;/h2&gt;

&lt;p&gt;Once everything was set up, I started getting warnings about many
things I was not aware of, like out-of-date firmware and hard drives
predicted to fail. The output of check_openmanage
looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WARNING: Physical Disk 1:0:4 [Seagate ST32000444SS, 2.0TB] on ctrl 0 is Online, Failure Predicted
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A reasonably painless call to Dell and a replacement disk is shipped.&lt;/p&gt;

&lt;h2 id=&quot;disk-replacement&quot;&gt;Disk replacement&lt;/h2&gt;

&lt;p&gt;When a disk fails it has a really nice blinking yellow light. To make
things clean, I wanted to shut down and delete the correct vm before
changing the disk. How do I figure out which vm to shut down?&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; omreport storage pdisk controller=0 pdisk=1:0:4
Physical Disk 1:0:4 on Controller PERC 6/i Integrated (Embedded)
Controller PERC 6/i Integrated (Embedded)
ID                              : 1:0:4
Status                          : Non-Critical
Name                            : Physical Disk 1:0:4
State                           : Online
Failure Predicted               : Yes

&amp;gt; omreport storage pdisk controller=0 vdisk=5
List of Physical Disks belonging to Virtual Disk 5
Controller PERC 6/i Integrated (Embedded)
ID                              : 1:0:4
Status                          : Non-Critical
Name                            : Physical Disk 1:0:4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Okay, I found the correct physical disk and the associated virtual disk.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; omreport storage vdisk controller=0 vdisk=5
Virtual Disk 5 on Controller PERC 6/i Integrated (Embedded)
ID                            : 5
Status                        : Ok
Name                          : Virtual Disk 5
State                         : Ready
Device Name                   : /dev/sdf
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now I know that this physical disk maps to the device /dev/sdf, so I
initiated a shutdown of the vm that uses that disk.&lt;/p&gt;
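
&lt;p&gt;With one disk per virtual disk, the same lookup can be scripted across
all six virtual disks using the Device Name field shown above. A minimal
sketch:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# print the vdisk to Linux device mapping for all six virtual disks
for v in 0 1 2 3 4 5; do
    echo &quot;vdisk $v:&quot;
    omreport storage vdisk controller=0 vdisk=$v | grep 'Device Name'
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;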

&lt;p&gt;The disk with predicted failure has a flashing amber light which makes
it easy to figure out which one to swap.&lt;/p&gt;

&lt;p&gt;Once the swap is complete, run the following command to recreate the vdisk.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;omconfig storage controller controller=0 action=createvdisk raid=r0 size=max pdisk=1:0:4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And /dev/sdf is available once again.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>OpenStack Grizzly deployment using puppet modules</title>
   <link href="https://kscherer.github.io//linux%20openstack/2013/08/30/openstack-grizzly-deployment-using-puppet-modules"/>
   <updated>2013-08-30T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//linux%20openstack/2013/08/30/openstack-grizzly-deployment-using-puppet-modules</id>
   <content type="html">&lt;h2 id=&quot;openstack-grizzly-3-node-cluster-installation&quot;&gt;Openstack Grizzly 3 node cluster installation&lt;/h2&gt;

&lt;p&gt;There is a lot of infrastructure that I leveraged to do this
installation:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Local ubuntu mirror&lt;/li&gt;
  &lt;li&gt;Debian Preseed files to automate installation&lt;/li&gt;
  &lt;li&gt;Dell iDRAC and faking netboot using virtual CDROM&lt;/li&gt;
  &lt;li&gt;Puppet master with git branch to environment mapping&lt;/li&gt;
  &lt;li&gt;Git subtrees to integrate OpenStack puppet modules&lt;/li&gt;
  &lt;li&gt;An example hiera data file to handle configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;local-ubuntu-mirror&quot;&gt;Local Ubuntu mirror&lt;/h2&gt;

&lt;p&gt;Having a local mirror makes installations much simpler because
packages download very quickly. The ideal setup uses netboot because
the mirror already contains the kernel and initrd and packages needed
to do the installation. I used:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ubuntu/dists/precise/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/linux
ubuntu/dists/precise/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/initrd.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To create the mirror I used the &lt;a href=&quot;https://launchpad.net/ubumirror&quot;&gt;ubumirror&lt;/a&gt; scripts provided by
Canonical.&lt;/p&gt;

&lt;h2 id=&quot;debian-preseed&quot;&gt;Debian Preseed&lt;/h2&gt;

&lt;p&gt;I already have some experience using debian preseed files to automate
installation of Ubuntu and Debian. The documentation is spread out all
over the Internet. Most of the preseed just sets the local mirror
and the network setup. The OpenStack-related options were the disk layout
and adding the Ubuntu Cloud Archive.&lt;/p&gt;

&lt;h3 id=&quot;openstack-compute-node-disk-layout&quot;&gt;Openstack Compute Node disk layout&lt;/h3&gt;

&lt;p&gt;The machines I am using were purchased before I even knew OpenStack
existed. They were used for Wind River Linux coverage builds and the
simplest configuration uses 2 900GB SAS drives in RAID0. The builds
require a lot of disk space, and builds on SSD and in memory provided
only a small speedup relative to the increase in cost.&lt;/p&gt;

&lt;p&gt;My idea was to use LVM and allow cinder to use the remaining space to
create volumes for the vms. Here are the relevant preseed options to
handle the disk layout.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;d-i partman-auto/method string lvm
d-i partman-auto/purge_lvm_from_device  boolean true
d-i partman-auto-lvm/new_vg_name string cinder-volumes
d-i partman-auto-lvm/guided_size string 500GB
d-i partman-auto/choose_recipe select atomic
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are 3 kinds of storage in OpenStack: instance/ephemeral, block and
object.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Object storage is handled by swift and not part of this
installation.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Block storage is done by default using iscsi and LVM
logical volumes. Cinder looks for a LVM volume group called
cinder-volumes and creates logical volumes there.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Instance/Ephemeral storage by default goes into /var on the root
filesystem. This is why I made the root filesystem 500GB. But this
does not allow live migration because the root filesystem is not
shared. If the vm was booted using block storage then the iscsi
driver can handle the migration of vms. Another option is to mount
/var on a shared nfs drive.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;ubuntu-cloud-archive&quot;&gt;Ubuntu Cloud Archive&lt;/h3&gt;

&lt;p&gt;I added the cloud and puppetlabs apt repos in the preseed to prevent
older versions of packages being installed.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;d-i apt-setup/local0/repository string \
    http://apt.puppetlabs.com/ precise main dependencies
d-i apt-setup/local0/comment string Puppetlabs
d-i apt-setup/local0/key string http://apt.puppetlabs.com/pubkey.gpg

d-i apt-setup/local1/repository string \
    http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/grizzly main
d-i apt-setup/local1/comment string Ubuntu Cloud Archive
d-i apt-setup/local1/key string \
    http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/precise-updates/grizzly/Release.gpg

tasksel tasksel/first multiselect ubuntu-server
d-i pkgsel/include string openssh-server ntp ruby libopenssl-ruby \
    vim-nox mcollective rubygems git puppet mcollective facter \
    ruby-stomp puppetlabs-release ubuntu-cloud-keyring
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;dell-idrac-and-faking-netboot-using-virtual-cdrom&quot;&gt;Dell iDRAC and faking netboot using virtual CDROM&lt;/h2&gt;

&lt;p&gt;Unfortunately I do not have DHCP, PXE and TFTP in this subnet to do
netboot provisioning. I am working on this with our IT department. So
for now I have to fake it.&lt;/p&gt;

&lt;p&gt;I grab the mini.iso from the Ubuntu mirror:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ubuntu/dists/precise/main/installer-amd64/current/images/netboot/mini.iso
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This contains the netboot kernel and initrd. I can then log into the
Dell iDRAC and start the remote console for the server. Using Virtual
Media redirection, I connect the mini.iso and boot the server. Press
F11 to get the boot menu and select Virtual CDROM.&lt;/p&gt;

&lt;p&gt;But using this directly means I have to type everything into a tiny
console window. So I modified the isolinux.cfg to change the kernel
params to load the preseed automatically.&lt;/p&gt;

&lt;p&gt;Mount mini.iso locally and copy the contents to the hard drive:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo mount -o loop mini.iso /mnt/ubuntu/
cp -r /mnt/ubuntu/ .
chmod -R +w ubuntu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here are the contents of the isolinux.cfg after editing:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;default preseed
prompt 0
timeout 0

label preseed
    kernel linux
    append vga=788 initrd=initrd.gz locale=en_US auto \
        url=&amp;lt;server&amp;gt;/my.preseed priority=critical interface=eth0 \
        console-setup/ask_detect=false console-setup/layout=us --
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then make a new iso:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkisofs -o ubuntu-precise.iso -b isolinux.bin -c boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table -R -J -v -T ubuntu/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then the process is almost completely automated, except that the
server cannot download the preseed until the networking is
configured. This info can be added to the kernel params, but then I
would have to edit each iso for each server. With RedHat kickstarts I
was able to add a script that mapped MAC address to IP and completely
automate this. But with preseeds I need to manually enter the network
info. The proper solution is a provisioner like Cobbler or Foreman.&lt;/p&gt;

&lt;h2 id=&quot;puppet-master-with-git-branch-to-environment-mapping&quot;&gt;Puppet master with git branch to environment mapping&lt;/h2&gt;

&lt;p&gt;I have setup my puppet masters based on the &lt;a href=&quot;https://puppetlabs.com/blog/git-workflow-and-puppet-environments/&quot;&gt;post&lt;/a&gt; by Puppetlabs:&lt;/p&gt;

&lt;p&gt;I like this setup a lot. All development happens on my desktop and I
have a consistent version controlled collection of all modules
available to my systems. I am using it to give some colleagues who are
learning puppet a nice environment that won’t mess up my systems.&lt;/p&gt;

&lt;p&gt;But I have some custom in-house modules and I want to put the
OpenStack puppet modules in the same git branch beside them. The
existing tools like puppet module, puppet librarian, etc. do not
work for this use case. I want to be able to use git for these external
repos and be able to easily share any patches I make with
upstream. Enter git subtree.&lt;/p&gt;

&lt;h2 id=&quot;git-subtrees-to-integrate-openstack-puppet-modules&quot;&gt;Git subtrees to integrate OpenStack puppet modules&lt;/h2&gt;

&lt;p&gt;Git subtree is part of the git package contrib files. Enabling it on
my system was simple:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd ~/bin
cp /usr/share/doc/git/contrib/subtree/git-subtree.sh .
chmod +x git-subtree.sh
mv git-subtree.sh git-subtree
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now I can go to my modules directory and add in the OpenStack puppet
modules&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for arg in cinder glance horizon keystone nova; do \
    git subtree add --prefix=modules/$arg \
      --squash https://github.com/stackforge/puppet-$arg stable/grizzly;\
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
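
&lt;p&gt;Keeping the subtrees up to date and sharing local patches works the
same way. A minimal sketch for one module (the fork URL and branch names
are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# pull upstream changes into the subtree
git subtree pull --prefix=modules/nova --squash \
    https://github.com/stackforge/puppet-nova stable/grizzly
# split local commits back out to share with upstream
git subtree push --prefix=modules/nova \
    git@github.com:myfork/puppet-nova.git my-fixes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;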

&lt;p&gt;There are some more supporting modules like inifile, rabbitmq, apt,
vcs, etc. Look in openstack/Puppetfile for the full list.&lt;/p&gt;

&lt;p&gt;Next was to enable the modules on my machines. First the hiera data
needs to be added for the network config. I was inspired by Chris Hodge’s
&lt;a href=&quot;http://www.youtube.com/watch?v=owpi1WF9dws&quot;&gt;video&lt;/a&gt; and &lt;a href=&quot;https://gist.github.com/ody/5718115&quot;&gt;hiera data&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The gist has some minor issues. I posted a &lt;a href=&quot;https://gist.github.com/kscherer/6383077&quot;&gt;revised version&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The last piece is to enable the modules on the nodes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;node 'controller' {
    include openstack::repo
    include openstack::controller
    include openstack::auth_file
    class { 'rabbitmq::repo::apt':
        before =&amp;gt; Class['rabbitmq::server']
    }
}
node 'compute' {
    include openstack::repo
    include openstack::compute
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Most of this infrastructure already existed or I had already done in
the past. I was able to reimage 3 machines and have a working grizzly
installation in about 3 hours.&lt;/p&gt;

&lt;p&gt;Many thanks to all people who have contributed to Debian, Ubuntu, Puppet and
the OpenStack puppet modules.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Starting Openstack deployment</title>
   <link href="https://kscherer.github.io//2013/04/22/starting-openstack-deployment"/>
   <updated>2013-04-22T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//2013/04/22/starting-openstack-deployment</id>
   <content type="html">&lt;h1 id=&quot;starting-with-openstack&quot;&gt;Starting with Openstack&lt;/h1&gt;

&lt;p&gt;I have some experience with Xen, but no experience with any software
that controls a hypervisor. Wind River has several customers
interested in using oVirt and Openstack with Wind River Linux. Another
team is looking at oVirt, but no one had taken up the Openstack
investigation. I have experience with Puppet and Puppetlabs has some
official Openstack modules, so that seemed a good place to start.&lt;/p&gt;

&lt;h1 id=&quot;fedora-18&quot;&gt;Fedora 18&lt;/h1&gt;

&lt;p&gt;I re-purposed a coverage builder and installed Fedora 18. I had read
about Openstack and Fedora and thought that would be a good place to
start. Then Redhat announced the &lt;a href=&quot;https://github.com/redhat-openstack/packstack&quot;&gt;Packstack&lt;/a&gt; and &lt;a href=&quot;http://openstack.redhat.com/Main_Page&quot;&gt;RDO&lt;/a&gt; project
and I decided to give it a try.&lt;/p&gt;

&lt;p&gt;The initial install failed due to selinux being disabled and conflicts
with NIS (our NIS deployment contains users with uids that conflict
with the ones in the rpms). When I finally got the packstack installer
to complete after a clean install, openstack refused to recognize the
admin user. So I did a reinstall using CentOS 6.4 and everything
worked without issue.&lt;/p&gt;

&lt;h1 id=&quot;openstack-and-images&quot;&gt;Openstack and images&lt;/h1&gt;

&lt;p&gt;My experience with virtual machines has always been boot and install
onto some empty, usually virtual, disk. Openstack was my first
interaction with images. The docs recommend a base F18 image. The
first attempt to download it using the Horizon interface seemed to
hang: 30 minutes after that download had started, I had already fetched
the image with wget on my local machine.&lt;/p&gt;

&lt;h1 id=&quot;openstack-and-lvm&quot;&gt;Openstack and LVM&lt;/h1&gt;

&lt;p&gt;My initial install of the host OS created a large LVM partition called
cinder-volumes for the Openstack Block storage service. Unfortunately,
the packstack installer renamed the volume group. I had to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Stop the cinder-volume service&lt;/li&gt;
  &lt;li&gt;Delete the packstack created volume group and physical volume&lt;/li&gt;
  &lt;li&gt;Rename the local LVM volume group&lt;/li&gt;
  &lt;li&gt;Restart the cinder-volume service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;openstack-and-volumes&quot;&gt;Openstack and volumes&lt;/h1&gt;

&lt;p&gt;I went into the Volume section of the Horizon Web UI and created a
Volume. Running lvs on the host shows that a volume was created in the
correct place. I launched the instance as tiny and noticed that it did
not require a volume, which I found strange. I had setup a security
group to allow ssh and inject my public key. I then associated a
floating ip and was able to log into the vm! This was a happy moment.&lt;/p&gt;

&lt;p&gt;After some poking around, a disk space check revealed that the VM had
10 GB disk space. This confused me because I had not associated it
with a volume. So I repeated the process but setup the VM to boot off
the volume I created earlier. This time the boot failed due to missing
boot image.&lt;/p&gt;

&lt;h1 id=&quot;some-ec2-history&quot;&gt;Some EC2 history&lt;/h1&gt;

&lt;p&gt;I did more research and found this &lt;a href=&quot;http://alestic.com/2012/01/ec2-ebs-boot-recommended&quot;&gt;article&lt;/a&gt;. It explains some of the
history of virtual machine infrastructure. When EC2 was first
launched, the VMs had no persistent storage. Customers had to use some
sort of web service like S3 to persist information. This kind of image
is called instance-store; Openstack refers to it as Ephemeral storage.&lt;/p&gt;

&lt;p&gt;Then Amazon introduced EBS to provide persistent storage. It could be
attached to an instance-store image as a another block device. In
Openstack this is handled by Cinder as block level storage.&lt;/p&gt;

&lt;p&gt;Then came the ability to boot from EBS volumes. This matches my
internal model of a virtual machine as persistent like a physical
machine. By default the volumes are empty, so the next step is
populating the volume with the proper bits. I have experience with
Cobbler to use kickstart and others to install new systems, but I was
curious if the image could be “transferred” to the volume.&lt;/p&gt;

&lt;p&gt;The Horizon Web UI was not helpful. Some more research revealed the
following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cinder create --image-id &amp;lt;image-id&amp;gt; --display-name mybootable-vol 10
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This runs qemu-img convert and writes the raw image to the new cinder
volume. This volume can be booted directly, but the Web UI still
requires an image name which is ignored.&lt;/p&gt;

&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;

&lt;p&gt;Types of Openstack VMs:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Ephemeral storage only. The default size of 0 means use the image
disk size.&lt;/li&gt;
  &lt;li&gt;Ephemeral + block storage. The VM must format the volume if blank
and mount it. A volume can only be attached to one VM.&lt;/li&gt;
  &lt;li&gt;Block storage only. The Web UI does not support image to volume
conversion but cinder does.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next on my list are the NetApp cinder driver and installation on Ubuntu
12.04 Server using the official puppet modules.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Lessons learned running e2croncheck</title>
   <link href="https://kscherer.github.io//linux/2013/03/19/lessons-learned-running-e2croncheck"/>
   <updated>2013-03-19T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//linux/2013/03/19/lessons-learned-running-e2croncheck</id>
   <content type="html">&lt;p&gt;Filesystems (ext4, xfs, zfs, etc) are one of those things whose
failure nobody really wants to think about. The difference between a
hard disk failure and complete filesystem corruption is largely
academic. However a filesystem has many failure modes and the scariest
is silent corruption that goes undetected for a long time. Worst case
scenario is that backups are rendered useless.&lt;/p&gt;

&lt;p&gt;The long time solution to detecting and correcting minor filesystem
issues is fsck. The tool has several limitations:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;The check can only be run while the filesystem is offline.&lt;/li&gt;
  &lt;li&gt;The check is serial per filesystem. It can be parallelized across
multiple filesystems.&lt;/li&gt;
  &lt;li&gt;As the amount of data on the filesystem grows, the time to complete
the check grows as well.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What seems to be standard practice is the following:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Install and configure system with defaults&lt;/li&gt;
  &lt;li&gt;Leave system running as long as possible&lt;/li&gt;
  &lt;li&gt;When the machine hangs at a critical moment, reboot the machine&lt;/li&gt;
  &lt;li&gt;Wait for hours until admin logs into console and fsck check is
manually killed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This has several obvious drawbacks:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;fsck almost never gets a full run, especially if the system uses
hibernation and/or S3 sleep&lt;/li&gt;
  &lt;li&gt;The downtime always happens at the worst possible time&lt;/li&gt;
  &lt;li&gt;No one knows how long an fsck is actually going to take&lt;/li&gt;
  &lt;li&gt;The fsck may not be necessary, but the disk/machine needs to be
offline anyways&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Online fsck seems to be impossible, because the state of the filesystem
can change in ways that make the check wrong.&lt;/p&gt;

&lt;p&gt;Databases have a similar problem: how to do a backup while the system
is in operation. The solution there is to use filesystem
snapshots. This is how I stumbled upon e2croncheck. The original from
Theodore Ts’o is &lt;a href=&quot;http://ftp.sunet.se/pub/Linux/kernels/people/tytso/e2croncheck&quot;&gt;here&lt;/a&gt;. I found a revised version on GitHub by
&lt;a href=&quot;https://github.com/ion1/e2croncheck/blob/master/e2croncheck&quot;&gt;Ion&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The script creates a read write snapshot of the filesystem. LVM uses a
copy on write snapshot volume to track changes to the original
filesystem. The script then runs e2fsck on the snapshot, which will
report whether there is actual corruption on the filesystem that needs
to be repaired offline.&lt;/p&gt;
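
&lt;p&gt;The core of the technique fits in a few lines. A minimal sketch,
assuming a volume group named vg with a logical volume named data and
enough free space for the snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# create a snapshot volume to hold writes made during the check
lvcreate -s -L 500G -c 64k -n data-snap /dev/vg/data
# read-only forced check of the snapshot at idle IO priority
nice ionice -c3 e2fsck -fn /dev/vg/data-snap
# drop the snapshot when done
lvremove -f /dev/vg/data-snap
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;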

&lt;p&gt;This seems like a better solution than the standard practice of
ignoring the problem, so I set up my next servers in the following way:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Six physical disks in hardware RAID 5&lt;/li&gt;
  &lt;li&gt;Two virtual disks: 500GB system and 8.6TB data&lt;/li&gt;
  &lt;li&gt;System uses ext4&lt;/li&gt;
  &lt;li&gt;Data uses lvm with one lvm physical volume and one lvm volume group&lt;/li&gt;
  &lt;li&gt;Single logical volume at 8TB with 500GB unused space for snapshot&lt;/li&gt;
  &lt;li&gt;Cronjob to run e2croncheck weekly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LVM snapshots are not without their problems. The big one is
performance. There is overhead to the copy on write mechanism, but thanks
to the Internet I found some benchmarks comparing performance by
&lt;a href=&quot;http://www.nikhef.nl/~dennisvd/lvmcrap.html&quot;&gt;chunksize&lt;/a&gt;. The default chunksize is 4kB and increasing the
chunksize to 64kB increases performance by 10x!&lt;/p&gt;

&lt;p&gt;I also added ionice with e2fsck set to idle priority. So far the
changes mean that the background check does not interfere with
programs that are running.&lt;/p&gt;

&lt;p&gt;The final version of the script is located &lt;a href=&quot;https://github.com/kscherer/puppet-modules/blob/production/modules/e2croncheck/files/e2croncheck&quot;&gt;here&lt;/a&gt; inside a puppet
class to install the file and cron job.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>When root cannot delete a file</title>
   <link href="https://kscherer.github.io//2012/10/20/operation-not-permitted"/>
   <updated>2012-10-20T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//2012/10/20/operation-not-permitted</id>
   <content type="html">&lt;p&gt;Operation not permitted&lt;/p&gt;

&lt;p&gt;It started when dpkg could not upgrade the util-linux package because the
file /usr/bin/delpart could not be symlinked. So I tried to delete the file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo rm /usr/bin/delpart
rm: cannot remove `/usr/bin/delpart': Operation not permitted
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All operations on the file failed. I tried mv, fsck, reboot into
rescue, etc.&lt;/p&gt;

&lt;p&gt;So I googled “linux ext4 Operation not permitted”. This did not help
much, but I noticed a link about ext2 extended attributes. I have
never used extended attributes, so I did a quick read of the man pages
for lsattr and chattr.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd /usr/bin
sudo lsattr delpart
---D-a-----tT-- delpart
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That was a strange set of attributes. So I compared to another random file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo lsattr zip
-------------e- zip
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the problem is found, the solution is straightforward&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo chattr +e -DatT delpart
sudo lsattr delpart
-------------e- delpart
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The question now is how the file got into this state. I can only
speculate that an fsck run “repaired” this corrupted file into this
strange but consistent state. I wonder if there are other surprises
waiting for me on this disk.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Ruby adventure</title>
   <link href="https://kscherer.github.io//2012/09/25/ruby-adventure"/>
   <updated>2012-09-25T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//2012/09/25/ruby-adventure</id>
   <content type="html">&lt;p&gt;Puppet is a Ruby project and many of the tools that work with Puppet
are also Ruby tools. For example, RSpec and Vagrant. To get access to
these tools, the “normal” path would be to use the Ubuntu package
manager, apt-get. But the Ruby world has its own packaging system,
rubygems. The first thing I tried and used was:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;gem install --user-install
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which installs gems under the local user’s home directory. Less use of
sudo and root access is a good thing. The only downside is adjusting
the PATH variable to find the installed rubygems.&lt;/p&gt;
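
&lt;p&gt;A minimal sketch of the PATH adjustment, assuming bash (rubygems can
report the user gem directory itself):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# add the user gem bin directory to PATH, e.g. in ~/.bashrc
export PATH=&quot;$(ruby -e 'print Gem.user_dir')/bin:$PATH&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;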

&lt;p&gt;The next trick is using RVM to install multiple ruby versions into the
user account. This allows another level of containerization. Installation
is simple.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl -L https://get.rvm.io | bash -s stable
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The only annoyance is that the script modifies my bashrc and
bash_profile. I erased those edits, sourced the rvm initialization
file and installed a recent ruby.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;source ~/.rvm/scripts/rvm
rvm install 1.9.3
rvm use 1.9.3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next install the gem packages for testing inside the rvm&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;gem install --no-ri --no-rdoc puppet rspec-puppet puppetlabs_spec_helper
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The next step was to use rspec-puppet to run my puppet class unit tests,
but after much debugging, it turned out the move to Puppet 3.0.0 broke
rspec-puppet, so I downgraded:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;gem install puppet -v 2.7.19
gem uninstall puppet -v 3.0.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now my rspec unit tests pass, but unfortunately just as slowly as
before. Looks like Ruby 1.9.3 didn’t speed things up much.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Dell R720 Server install</title>
   <link href="https://kscherer.github.io//2012/09/12/dellr720-remote-install"/>
   <updated>2012-09-12T00:00:00+00:00</updated>
   <id>hhttps://kscherer.github.io//2012/09/12/dellr720-remote-install</id>
   <content type="html">&lt;p&gt;Got a brand new Dell R720 server to install recently. The task sounded
simple enough: Install CentOS 6.3 x86_64 on the machine as quickly as
possible. The configuration of the server included 6 2TB drives which
will be used to store the code that will be shipped to the customer,
so RAID0 is not a good choice. A good place to start would be the RAID
configuration.&lt;/p&gt;

&lt;p&gt;The server comes with iDRAC7, which allows me to connect to the server
even though it is physically located over 3000km away. On Linux, the
iDRAC6 version of the VNC viewer did not work with arrow keys and a
crazy hack was necessary, described in detail at
&lt;a href=&quot;https://github.com/pjr/keycode-idrac&quot;&gt;pjr/keycode-idrac&lt;/a&gt;. This was fixed with
iDRAC7. Progress!&lt;/p&gt;

&lt;p&gt;Out of the box, Dell grouped the 6 disks into a RAID5 disk group, but
on top of that were 5 2TB and 1 80GB virtual disks on this disk
group. Further research showed that the BIOS cannot boot partitions
larger than 2TB and that many older file systems cannot handle disks
larger than 2TB. But newer technologies like
&lt;a href=&quot;http://en.wikipedia.org/wiki/GUID_Partition_Table&quot;&gt;GPT&lt;/a&gt; and
&lt;a href=&quot;http://en.wikipedia.org/wiki/Ext4&quot;&gt;ext4&lt;/a&gt; can theoretically
handle this. Let’s give it a whirl.&lt;/p&gt;

&lt;p&gt;In the RAID controller BIOS, the 6 virtual disks are deleted and one
massive 9TB disk is created. This disk will need to be booted using
UEFI.&lt;/p&gt;

&lt;p&gt;Next step, go into the system setup. The BIOS now uses the latest
&lt;a href=&quot;http://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface&quot;&gt;UEFI&lt;/a&gt;
and the arrow keys that worked in the text mode BIOS no longer
work. However, after pressing the Num Lock key, the UEFI options can be
navigated using the keyboard. This is very annoying, but workable.&lt;/p&gt;

&lt;p&gt;Under Boot Settings the boot system is switched to UEFI. To install
CentOS from CD, the virtual media option on the iDRAC uses a special
USB device to attach the CentOS 6.3 install iso as a CD on the
server. After waiting almost 5 minutes for the server to reboot, the
UEFI boot from Virtual CD fails. Oh well, back to the old BIOS booting.&lt;/p&gt;

&lt;p&gt;This means a smaller boot disk will be necessary. Back into the RAID
controller BIOS, delete the single disk and recreate one 500GB disk
and a larger 8.5TB disk. This time, the Virtual CD is found and the
install proceeds as usual.&lt;/p&gt;

&lt;p&gt;But upon reboot, the system hangs and does not boot! I used the
CentOS netinstall CD as a rescue disk to find out that the kickstart
had decided to install onto the large data drive, which the BIOS cannot
boot! To tell kickstart to ignore the large data drive it was necessary
to add:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ignoredisk --drives=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1*
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;to the partition information in the kickstart and reinstall. This time
the install is successful and the server reboots into CentOS 6.3 and
puppet does the base configuration.&lt;/p&gt;

&lt;p&gt;Next step is to prepare the large data drive to hold the data. One of
the irritations of using ext and many other filesystems is that fsck
is only possible when the disk is offline. For servers, this means
that fsck never gets run. Occasionally the server is rebooted and an
fsck is started. But this is usually the worst possible time to do
it. The result is that fsck is completely disabled and fingers are
crossed. One potential solution is e2croncheck.&lt;/p&gt;

&lt;p&gt;It uses lvm read only snapshots to run fsck on a disk without making
it necessary to take the disk offline. There are a couple caveats of
course:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;There is a performance impact of running fsck on a disk&lt;/li&gt;
  &lt;li&gt;The disk must obviously be on an lvm partition&lt;/li&gt;
  &lt;li&gt;There must be free space in the volume group to hold any writes that
are done while the snapshot is active&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this and alignment issues in mind, the following commands were
used to create the lvm and ext4 partition:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart data ext2 1M 100%
pvcreate --dataalignment=1M -M2 /dev/sdb1
vgcreate vg /dev/sdb1
lvcreate -L 8T -n git vg
mkfs.ext4 -m 0 -E stride=16,stripe_width=80 /dev/mapper/vg-git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The RAID controller uses a 64KB stripe unit, so I use a partition offset
and a data alignment of 1MB to ensure all blocks line up on
boundaries. To help ext4 work within the RAID effectively,
stride = 64KB stripe unit / 4KB block = 16, and
stripe_width = stride * (6 disks - 1 parity = 5 data disks) = 80.&lt;/p&gt;
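
&lt;p&gt;The values can be verified after the fact; a quick check with
tune2fs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# the superblock records the stride and stripe width
tune2fs -l /dev/mapper/vg-git | grep -i 'stride\|stripe'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;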

&lt;p&gt;The server is now finally ready to be used.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Download CBC Radio stream</title>
   <link href="https://kscherer.github.io//linux/2012/04/20/download-unofficial-cbc-podcast"/>
   <updated>2012-04-20T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//linux/2012/04/20/download-unofficial-cbc-podcast</id>
   <content type="html">&lt;p&gt;I listen to CBC Radio a lot. I often find the quality of the show
&lt;a href=&quot;http://www.cbc.ca/ideas&quot;&gt;Ideas&lt;/a&gt; superb. Recently there was a show
called “All in the Family” which introduced me to the
&lt;a href=&quot;http://www.cdc.gov/ace/index.htm&quot;&gt;ACE&lt;/a&gt; (Adverse Childhood
Experiences) study. The results of this study are worthy of another
blog post, but this post is about something technical. Sorry.&lt;/p&gt;

&lt;p&gt;I wanted to download this show as a podcast so I could share it. I
went to the Ideas website; the show was not available as a podcast, but
there was a link to listen to the current show. That link brings up a
Flash audio player which plays the show.&lt;/p&gt;

&lt;p&gt;At this point I knew that since the audio was being played on my
computer, I could capture it. First I looked in the page source for an
obvious link, but the code was so obfuscated I gave up quickly.&lt;/p&gt;

&lt;p&gt;Next I considered recording the audio while it was playing, but I did
not want to tie up the computer for an hour.&lt;/p&gt;

&lt;p&gt;After a few Google searches I stumbled across some posts that
mentioned using UrlSnooper to figure out the location of a
stream. UrlSnooper is a Windows program, but there was a mention of a
Linux program called &lt;a href=&quot;http://ngrep.sourceforge.net/&quot;&gt;ngrep&lt;/a&gt;. What a
great tool! I ran the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo aptitude install ngrep
# match the word 'get' (case-insensitive) in TCP traffic to port 80,
# printing each line of the payload as it goes by
sudo ngrep -W byline -qilw 'get' tcp dst port 80
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then I used FireFox to open the audio stream and saw the following in
a long stream of output in the console where I ran ngrep:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;T xxx.xxx.xxx.xxx:35064 -&amp;gt; 64.208.5.41:80
GET /maven_legacy/thumbnails/ideas_20111213_27203_uploaded.mp3 HTTP/1.1.
Host: thumbnails.cbc.ca.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An mp3 on thumbnails.cbc.ca?? I tried:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;wget http://thumbnails.cbc.ca//maven_legacy/thumbnails/ideas_20111213_27203_uploaded.mp3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Done! I had the mp3 of the Ideas show. I love Linux.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Rebuild MD3000 RAID0</title>
   <link href="https://kscherer.github.io//md3000/2012/04/13/rebuild-md3000-raid0"/>
   <updated>2012-04-13T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//md3000/2012/04/13/rebuild-md3000-raid0</id>
   <content type="html">&lt;p&gt;Some background. We do lots of coverage builds of the Wind River Linux
products and we have a blade cluster attached to various SAN devices
which hold the temporary build data. The builds are very CPU and disk
intensive and push the limits of the SAN devices. The default
configuration for an MD3000i SAN is one large RAID5 group, but this
results in one unused RAID controller and unnecessary redundancy for
our case of temporary build files.&lt;/p&gt;

&lt;p&gt;So I reconfigured the MD3000i to have 2 RAID0 disk groups, one for
each controller. This keeps both RAID controllers busy. Within each
RAID group I made a virtual disk for each host, i.e. 2 disks per host
in total. Each host then spreads its builds evenly across its two
disks. Each builder runs 4 simultaneous builds, 2 on each disk.&lt;/p&gt;

&lt;p&gt;Because the MD3000i has redundant controllers, the multipath driver
is necessary. I also found that I needed to create an alias for the wwid
of each iSCSI disk to avoid naming problems when /dev/mapper/mpath0
unpredictably became mpath3.&lt;/p&gt;
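
&lt;p&gt;A minimal sketch of such an alias stanza in /etc/multipath.conf; the
wwid below is illustrative, not one of the real ones:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;multipaths {
    multipath {
        wwid  36001c230d080d00fd4e5f6a7b8c9d0e1   # illustrative wwid
        alias ba2
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;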

&lt;p&gt;The only problem with RAID0 is that when a physical disk dies, the
whole disk group dies with it. Since the build data is temporary,
nothing of value is lost, but I thought I would capture the rebuild
process here for future reference.&lt;/p&gt;

&lt;p&gt;First, log into the Dell storage manager.&lt;/p&gt;

&lt;p&gt;Go to Modify &amp;gt; Delete Disk Groups and delete the failed RAID0
virtual disks and their group.&lt;/p&gt;

&lt;p&gt;Now create another RAID0 disk group and the virtual disks. Make sure
the names of the virtual disks and the host mappings are the same as
before. Make sure the preferred owner for the two disks used by the
same host is different, so both controllers share the load.&lt;/p&gt;

&lt;p&gt;Log into the machine. Unmount the failed drive and remove it from
multipath. I use the device ba2 (buildarea2) in these examples.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;puppet agent --disable
umount /ba2
multipath -f ba2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run multipath -d (dry run) to get the wwid of the new disk.&lt;/p&gt;

&lt;p&gt;Edit /etc/multipath.conf to replace the now-invalid wwid with the
new one. Also change the wwid in the puppet class for this host; I am
using extdata in puppet to manage the contents of the
/etc/multipath.conf file.&lt;/p&gt;

&lt;p&gt;Run multipath -d to verify that the multipath alias is working, then
run multipath (no arguments) to actually create the device.&lt;/p&gt;
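
&lt;p&gt;Roughly, the sequence; the -ll listing at the end is just a
convenient way to confirm the alias and paths:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;multipath -d    # dry run: the new disk should show up under its alias
multipath       # create the device maps for real
multipath -ll   # confirm /dev/mapper/ba2 exists with healthy paths
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;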

&lt;p&gt;Now create the partitions. Ensure that the partition is RAID stripe
aligned. Use gpt because of the large partition sizes (and it is the
newer standard).&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;parted /dev/mapper/ba2 mklabel gpt
parted -s /dev/mapper/ba2 mkpart ba2 ext2 1M 100%
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
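
&lt;p&gt;parted can verify the alignment directly; a quick sanity check along
these lines, for partition 1 on the ba2 device:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;parted /dev/mapper/ba2 align-check opt 1   # should report: 1 aligned
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;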

&lt;p&gt;Format the drive. On CentOS 5 I have no choice but to use ext3. Use
a stride of 32, which corresponds to the 4K block size x 32 = 128K RAID
stripe size.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkfs.ext3 -m 0 -E stride=32 /dev/mapper/ba2p1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For some reason the format blazes through blocks 0 to 9000 and then
slows to a crawl.&lt;/p&gt;

&lt;p&gt;Final step: turn off the periodic fsck checks so reboots don’t
hang. I don’t like it, but keeping the builders offline for hours to
run fsck is not acceptable. It is easier just to reformat the drive
regularly. Again, the data on these drives is temporary and easily
replaced.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;tune2fs -i 0 /dev/mapper/ba2p1   # -i 0 disables the time-based check interval
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
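
&lt;p&gt;Note that -i 0 only disables the time-based interval; the
mount-count trigger is separate. Disabling it as well is not part of
the procedure above, but would be a one-liner:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;tune2fs -c 0 /dev/mapper/ba2p1   # disable the mount-count-based check
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;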

&lt;p&gt;With the disk recreated, I can run puppet to rebuild all
infrastructure necessary for the coverage builders.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;puppet agent --enable
puppet agent --test
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All done. It takes a couple of hours. If lots of disks fail, then
this takes too long. I never did a test with RAID5 in this
configuration to see if the performance is acceptable. The builds are
very sensitive to I/O bandwidth and are running well with RAID0, so I
may not have time to run a comparison.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Wisdom from David TT</title>
   <link href="https://kscherer.github.io//hobbies/2012/03/27/wisdom-from-david-tt"/>
   <updated>2012-03-27T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//hobbies/2012/03/27/wisdom-from-david-tt</id>
   <content type="html">&lt;p&gt;I play violin with the
&lt;a href=&quot;http://ottawachamberorchestra.com/&quot;&gt;Ottawa Chamber Orchestra&lt;/a&gt; which
is an amateur orchestra. Our conductor is David Theis-Thompson who is
also a professional violinist/violist with the NAC orchestra. He is
one of the best conductors I have ever been lucky enough to play
with. I hope to share some of his wisdom here:&lt;/p&gt;

&lt;p&gt;“Remember that in Edvard Grieg’s music, there are always trolls”&lt;/p&gt;

&lt;p&gt;“You have to play it just like Brahms, even if it is Schumann”&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Toi+Moi - Gregoire and Star Akademie</title>
   <link href="https://kscherer.github.io//culture/2012/03/27/toimoi"/>
   <updated>2012-03-27T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//culture/2012/03/27/toimoi</id>
   <content type="html">&lt;p&gt;My daughter attends a French Catholic school and recently the
administration has chosen
&lt;a href=&quot;https://www.youtube.com/watch?v=W5zMPmu-EJw&quot;&gt;Toi+Moi&lt;/a&gt; as a theme
song. My daughter has been singing it almost non-stop. It is wonderful
to see her so excited and I am enjoying her version of the song. One
evening she wanted to hear the song at home, but we do not have a
TV, so I went to Google and the first result was the Wikipedia
entry for the song. From there I found a link to the official video by
&lt;a href=&quot;https://www.youtube.com/watch?v=kOru9ITtVIg&quot;&gt;Gregoire&lt;/a&gt;. The song is
simple and catchy and I was finally able to understand some of the
lyrics that had been lost through the school PA system.&lt;/p&gt;

&lt;p&gt;The contrast between these two videos could not be more striking. The
original video has ordinary happy people and the song has dynamics and
a nice understated piano part. The album was funded by 347 people
using the music equivalent of Kickstarter, and 40 of them were invited
to appear in the video. The video invites everyone to join the
dance and the people are genuinely silly and happy.&lt;/p&gt;

&lt;p&gt;The Star Academie video is a pure celebrity-making machine. Lots of
makeup, glamorous clothing, carefully staged emotion, heavy bass,
zero nuance and predictable choreography. I know that many people like
it, but as a parent I much prefer my child to watch the original
version. The story of a bunch of strangers donating money to
help someone realize their dream is preferable to the unreality of a
competition manufactured by an entertainment corporation.&lt;/p&gt;

&lt;p&gt;I do not want to diminish the talents of the performers on Star
Academie. They are chasing their dreams as well. It is unfortunate
that those dreams are being exploited by the producers. This has been
the model for a long time. But the success of Gregoire shows that a
new option is now possible.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Introduction</title>
   <link href="https://kscherer.github.io//2012/03/23/intro"/>
   <updated>2012-03-23T00:00:00+00:00</updated>
   <id>https://kscherer.github.io//2012/03/23/intro</id>
   <content type="html">&lt;p&gt;This is my first post using Jekyll. Most of the blog aesthetics was
copied from &lt;a href=&quot;http://julianyap.com/&quot;&gt;Julian Yap&lt;/a&gt;. Thank you Julian!&lt;/p&gt;

&lt;p&gt;I am preparing to upload a multi-part series on setting up a Xen
cluster with over 30 different flavours of Linux.&lt;/p&gt;
</content>
 </entry>
 
 
</feed>
