Fuzzmz | ramblings on tech, life and randomness

Covid-19 vaccines and Romania

Serban Constantin — Sun, 28 Mar 2021 18:20:05 GMT

It saddens me to say that the way the Covid-19 vaccinations have been handled in Romania is, for lack of a better word, complete shitstorm .

The main problem came not from the technology behind the platform, though we'll get to talk about that a bit as well. As is usually the case with Romanian authorities, the communication side of things is where things break down.

For a ;tldr you can jump straight to the timeline or, for the ~~geeks~~ data scientists out there, to the code behind part of the analysis.

I'd suggest though at least skimming through the full article.

A short intro

You see, in Romania, like in most other countries, the population was split into three groups in order to more efficiently distribute the vaccine:

Category 1: persons working in healthcare and social services;
Category 2: persons with high risk (over 65 years old, people diagnosed with chronic conditions, etc.) and essential workers;
Category 3: the general population.

To better handle this, towards the end of December 2020 an online platform is launched which initially allows individuals from category 1 to make vaccination appointments at any of the vaccination centers with open spots. Following that, around the middle of February, people from category 2 can do the same.

An issue slowly creeps

Playing a bit with the official data from the government Covid-19 vaccination transparency report, something interesting shows up.

Despite not being officially possible to get a vaccine if you're not in these two categories, looking over the data we can spot something:

Based on the stats from the 14th of March we can notice that already 1950 people from Category 3 had at least one dose of vaccine administered to them.

As is usual in Romania, the official communications said that only essential and at-risk people are able to get vaccinated, and yet the data says otherwise.

I agree that maybe I'm being nitpicky about the issue considering that when looking at the total vaccination numbers Category 3 is insignificant compared to the other two.

The problem is though that the data generally ignores the social impact: after a long period of lockdown where most of the population respected the rules it can be frustrating to see that others cut ahead in line.

And this exact issue of a queue, so dreaded in the Romanian culture, is what will come to bite us again in a bit.

The announcement for the masses

Around the 13th of March the officials announce that starting from the 15th, the general population will be able to sign up for a vaccine, or at least, get on the waiting list for one.

Good communication, right? We have a date, people can prepare, your tech team can finish doing maintenance and testing considering that a large part of the population will be using the website.

Except that on the 14th of March at around 6:30PM the website was wide open for registrations.

As expected, a large number of people flooding the website brings it down, but not before being available for almost an hour during which the waiting lists for the few vaccine centers in Bucharest filled up with hundreds of people.

To address the situation, the officials in charge of the platform announced that they're going to do a staged rollout the next day based, if I remember correctly, on county infection metrics. Based on that, the waiting lists for Bucharest would be open at 3PM on May 15th.

Unexpectedly, at around 4am on the 15th, people could yet again schedule themselves not only on the waiting list, but also to get actual vaccines in some centers.

The rest of the day goes on without surprises, with the platform being either online or offline depending on the actual load.

Communication issues

My main frustration comes from the issues the government had around clearly communicating the scope and timelines of the Covid-19 vaccination program. By not respecting its set schedules it eroded the trust people had in them and their ability to manage the situation.

This was yet again clearly seen when it came to adding (and removing) vaccination centers on the platform: it seemed to happen haphazardly and randomly, without any clear details.

This lead to situations such as people being the 7000th person on the waiting list for a center while new centers were being added with open waiting lists. What's even worse, existing centers were being removed and people reassigned to others without knowing what position you're holding in the queue.

To address this communication problems different websites popped up.

Date la zi

Datelazi.ro prezintă infografii actualizate periodic cu datele furnizate de autoritățile competente.

COVID-19: Date La Zi

Date la zi offers a great overview dashboard of the Covid-19 situation in Romania, and what's more important, it's an open source project.

Vaccin.live

Informatii pentru locuri libere in centrele de vaccinare

Vaccin.live

Vaccin.live is the most popular of them, offering a per-county list of available centers, their waiting lists and the vaccine used (something which the official website doesn't display, but makes indirectly available in the response payload). Initially it also had the advantage of being updated more often than the map present on the platform itself.

vaccinica

Statistici vaccinare Romania

vaccinica is a newer appearance, but a very useful one, providing a data dashboard around the vaccination effort based on the official Covid-19 transparency report.

The timeline

Bellow is a timeline of the events, but please note that it's mostly from my recollection, so there might be discrepancies around the time and dates of early events (platform launch, etc.)

27 Dec 2020

Vaccination platform launches allowing medical and social workers to make vaccination appointments.

15ish Jan 2021

People from category 2 (over 65, at risk, and essential workers) can make vaccination appointments.

13 Mar 2021

Announcement is made that category 3 (general population) will be able to make appointments or get in the queue starting from 15 Mar.

14 Mar 2021 18:30

Category 3 people can for a short time make appointments before the official launch.

14 Mar 2021 19:30

Platform goes down.

14 Mar 2021 20:40ish

Officials announce a staged rollout starting from 9am the next day.

15 Mar 2021 04:00ish

Platform is again up and open for scheduling.

15 Mar 2021

Platform is either online or offline depending on load for most of the day.

16 Mar 2021 onward

Platform is stable but centers get added or removed randomly with no prior notice.

The code

You too can also play with the data if you're interested using the following Jupyter Notebook.

GitHub: fuzzmz/rocovidstats/main

Click to run this interactive environment. From the Binder Project: Reproducible, sharable, interactive computing environments.

The source code for it is also available for those so inclined.

fuzzmz/rocovidstats

Exploratory JupyterLab notebook for the Covid-19 transparency data provided by the Romanian government. - fuzzmz/rocovidstats

GitHubfuzzmz

Jenkins as a back-end

Serban Constantin — Tue, 06 Mar 2018 21:00:00 GMT

I have had the honor of being invited as a speaker to Jenkins User Conference 2018 in Tel Aviv, Israel to talk about how Jenkins can be used as a sort-of backend for jobs and queues.

The slides

The video

Unfortunately in the video recording they used the presentation without triggering the animations. If you want to see them, follow along with the slides above.

The abstract

Automate all the things? But when you can't, why not pass them to other people? In this presentation I'll be talking about the history of our deployment process and how this got simplified by leveraging Jenkins parametrized jobs and custom front-ends to create tools used by project managers and stakeholders. From editing config files to deploy, to clickable solutions.

Scientific code quality

Serban Constantin — Tue, 06 Mar 2018 21:00:00 GMT

Was recently talking to a friend pursuing a physics Ph.D. about bad scientific code quality and the possible reasons behind it. We came to the conclusion that it's most likely equal parts culture, motivation, and experience. What follows is that chat distilled.

Looking over some scientific code written in Python with a friend doing his Ph.D. I couldn't not notice how bad its quality was. This in turn lead to a discussion around coding culture and a biased scientific stance on coding.

The coding culture

One of the possible reasons for horrible code is that, for most of it, academic code tends to be throwaway code. In academia you write chunks of code (sometimes fairly big chunks of code) to test your hypothesis, or to prove that some algorithm works, but rarely if ever will anyone, most likely including the one who wrote the code, will ever read or modify it; this in turn means that it doesn't make any sense to optimize for maintainability, weed out bugs or use proper engineering practices such as doing code review, writing documentation, tests, CI/CD flows, etc.

This coding culture goes hand in hand with...

Scientific upbringing

Maths (and by extension physics and other hard sciences) is terse, so it's more natural for someone heavily steeped in math to begin writing code that resembles it (also a reason I think Python was adopted is because it looks more like math than other languages). That means terribly unreadable code for someone not intimately acquainted to the problem space; so you'll see lots of single character variables in functions that look like submissions to a code golf competition.

To give even more context to terse variable names, sometimes they are certainly cultural (e.g. fhat = ... for Fourier coefficients), but more often than not scientists are implementing variable names that read straight out of a paper; this makes sense, as it makes it easier to follow the code in parallel to the paper. There's little chance to read some paper talking about a quantity x_ij and name it something else, there's usually just no point. Even if it did read smoother in the code alone, you can lose some readability that goes along with the paper.

Of course when the quantity is something physical and simple it's not a big deal, like temperature instead of T, but often these variables represent some really random thing that would be silly to name. I mean if we're talking about a second order mixed partial there's no getting around it, it's getting named f_xy, code reviewers be damned.

This means that once you know the underlying math, a lot of scientific programming is pretty straight-forward. There usually aren't a lot of inter-connected moving parts, so they never really have the need to discover design patterns, standards, and best practices common in the software world, which takes us back to the coding culture.

Of COM, memory and restarts

Serban Constantin — Wed, 31 May 2017 21:00:00 GMT

If you're old enough to have played with Windows back in the 95/98 era then you might have messed around with .com files. I once stumbled upon a quick way to restart the PC by using a basic .com file, but why did that work, and more importantly, why doesn't it work on more recent versions of the OS?

A bit of history

Most people might think of EXE files when asked "what is a program on Windows?", but before thta the only programs that existed on Windows land were COM files. The COM format, inherited from CP/M wasn't actually a format, it was just a memory image which got loaded into memory unchanged and then ran from the first byte.

The downside of COM files, and what eventually lead to the creation of EXE was that the programs couldn't be bigger than 64KB (I guess this had something to do with register segments being around 64KB on the 8086), and slowly COM was forgotten.

Of memory and processor instructions

Getting to how to restart a PC using a COM file, I ran accross it by accident, by renaming a .bat file which had something like cd c:\games in it to COM, which reseted the PC.

Now, knowing tha COM files are actually memory images, the restart actually makes sense. cd disassembles into arpl [si+0x20],sp which despite looking like a priviledged instruction, it actually isn't; more than this, it isn't even a valid instruction in real or virtual mode, which causes the illegal instruction exception and restarts the PC.

Another fun fact is that the code for the warmest reboot on PC is 0xcd 0x19 (int 0x19) which jumps back into the BIOS code when it starts looking for the operating system to boot. You could then have files containing INT 19; (CD 19) which was the fastest way to soft reboot when the file was run from a MSDOS mode shortcut.

But wait, there's more! This has the side-effect of confusing the NT NTLDR, which spits out a "your computer does not have enough free conventional memory to run Windows NT" error message.

SSL security and trust

Serban Constantin — Thu, 23 Mar 2017 21:00:00 GMT

Symantec has been caught yet again failing to make its due diligence when issuing certificates: this time it looks like they have failed to properly validate at least 30.000 certificates.

In response, the Chrome team announced that it will stop recognizing the extended validation status of all certificates issued by Symantec-owned certificate authorities, and to nullify all currently valid certificates in a staggered approach by decreasing the maximum age of Symantec-issued certs over a series of releases, ending with a validity period of only 9 months by Chrome 64' release.

The web of trust

The problem with certificates authorities is that you need to trust them in order for the whole system to work. That is, you trust that Symantec, or Thawte, Verisign, Equifax and others (all bought by Symantec in the past years) actually checked that whoever requested that certificate is the actual owner of that business.

The issue here is not the validity of the certificates but the trust under which they were issued. EV certificates promise Extended Validation of the legal entity requesting the certificate: check with BBB and double check with Dun & Bradstreet, for instance. If you can skip the 'extended validation'-part by paying more, EV becomes meaningless because a fraudster with deep pockets can get an EV-certificate for his bankofamurica.com-website.

For an example of the process someone goes through to get an EV certificate you can read the great post written by Troy Hunt, Journey to an extended validation certificate.

TooBigToFail™

The problem is these entities getting TooBigToFail™, which then makes them almost invulnerable to meaningful actions.

By consolidating so many certificate authorities under one umbrella it makes it almost impossible to revoke their certificates without breaking a large part of the Internet. To put this into perspective, Symantec certificates represent more than 30 percent of the Internet's valid certificates by volume in 2015 and around 42% of all certificate validations.

In a perfect world companies who have their root certificates entrusted as part of the TLS core infrastructure need to have better checks and balances than to simply say "oops, we done goofed" after the fact. If they demonstrate - as Symantec has demonstrated - that they can't manage that, their root certificates need to be yanked out of the chain of trust as soon as possible.

It will be interesting to see how the customers who have paid for their certs in good faith react to this move by Google, and hopefully the backlash will make Symantec take better care in the future.

What next?

The question now is what happens next? What do we do to ensure that we're in the clear?

Well, a solution would be to move away from Symantec and the rest and towards Let's Encrypt, especially if you don't need EV certificates.

But Let's Encrypt really does almost NO validation what-so-ever, I hear you say. As long as you are in control of the web server at said domain, you can get a certificate. How many web servers are compromised on a daily basis at any one domain?

This is only sometimes helpful in practice, but it's important to remember the distinction between "what you promise" and "how much of what you promise you deliver".

Let's Encrypt doesn't promise all that much; their system is pretty much just designed to ensure that the entity that requested the certificate has (or very recently had, their certificates do last a modest period of time after issue) operational control of the host for which the certificate is issued.

That isn't a terribly grand promise: it doesn't imply anything about the real-world owner of the site; tell you whether or not somebody compromised the host, etc. but it has the virtue of both being relatively easy and cheap to automatically verify and of being something where the most common failure mode (compromised host) isn't a threat that SSL is supposed to protect against, no matter how exhaustively vetted (it is supposed to protect the channel between you and the server from 3rd parties, not assure you that the guy running the server is trustworthy); so failures there aren't terribly serious.

To the best of my knowledge, while LE doesn't promise much, they have so far delivered it; and have avoided failures that threaten other people's sites. If my little VPS gets hacked, or my shoddy admin dashboard has an exploit, somebody else getting an LE cert for my site is quite plausible; but that's because somebody else does, indeed, have operational control of my site. Getting an LE cert without demonstrating operational control is what would be really worrisome.

And, that is where Symantec has been trouble: they've been caught issuing enormously powerful certs for high profile domains without the request of the domain holder, which are potent weapons for MiTM attacks. That is bad.

A short on switchable graphics

Serban Constantin — Thu, 29 Sep 2016 21:00:00 GMT

With the increase in the number of people using only laptops for their computing needs came new technologies used to provide both power and battery life. Switchable graphics - moving from an integrated GPU to a dedicated one, is one such technologies. But how do they work?

Per-display graphics card?

This whole post started after we received new laptops at work equipped with both an Intel HD Graphics 530 integrated GPU (iGPU) and an AMD FirePro W5170M (discrete GPU - dGPU). One of my colleagues was wondering if, when running multiple monitors, it is possible to get one or the other of the displays to be driven exclusively by the dGPU.

The short answer to the above question is no. It's impossible because only the iGPU is connected to the display; the discrete GPU only renders to video memory, and the integrated GPU displays the image on the screen.

Remember that a graphics chip has two jobs: fill the frame buffer with data and output that data to the monitor. The first and the second job actually happen at the same time, but in different locations in memory; the GPU is actually reading and writing to VRAM at the same time. The part of the memory being currently displayed is called the frame buffer, and the part being filled up is called the back buffer.

On a switchable graphics system the iGPU does both jobs most of the time: it takes commands like "draw a red line from here to there" and sets the value of the pixels along that line to the color red while another part of the iGPU reads the data out to the video port. This second part is actually fairly slow and takes the same amount of time, no matter what data is in video memory. When you operate in high power mode, the dGPU fills the back buffer and the iGPU reads it.

So in a push-pull configuration like this, the dedicated video chip is not ever actually connected to the video port. Instead, it simply feeds data into the frame buffers that the Intel Graphics Processor uses to display the images. Because of that, there's really no way for you to connect one GPU to one monitor - there's literally no connection between the GPU and the monitor at all. All video goes through the iGPU.

From AccuRev to git

Serban Constantin — Thu, 05 May 2016 21:00:00 GMT

Version control systems are really important tools in the day to day life of a software developer. What happens though when you have to move both code and people off of one system to another? Let's just say that you'll need time, patience and the desire to write your own tools...

As you know, I'm a big fan of git, so when work decided to move our source control from AccuRev to git I was one of the first ones to jump up in excitement. Don't get me wrong, AccuRev isn't a bad tool per se, but it does have enough downsides which make it a pain to work with, especially if you're used to something more lean such as git.

AccuRev downsides

My main issues with AccuRev are mainly the following:

it's slow as hell, especially the GUI
command line options aren't as good compared to git
user hooks are a joke compared to git, which in turn leads to
harder to ensure coding style and local tests by running scripts before committing/pushing code
harder to integrate correctly with continuous integration systems
getting history is a pain in the ass: for example it's next to impossible from the output of accurev hist to figure out if a file was added or deleted
not at all conductive to agile development: branches (streams) can't be re-used if deleted, workspaces need to be manually moved to track another stream, which usually leads to "one workspace per stream syndrome"
not portable: workspaces are hardcoded to the system (they have the PC name in their metadata); this means that if you'll switch PC's you'll need to either manually trawl through your list of workspaces and update them one by one, or try and script it (which doesn't work if you're switching from Windows to Linux and vice-versa).
still on the workspace part, if you work on multiple PCs you'll end up having more than one workspace on a given stream, because workspaces can't be reused (seems like nothing in AccuRev can be reused).

Migrating developers

All of the issues from above cascade and make the developer behave in a certain way in order to accommodate the tool.

The best example to give in order to show the change of mentality needed to move from AccuRev to git is caused by issues number 7 in my list. Because it's hard to make a workspace track another stream (equivalent to a git branch) users started creating separate folders for each stream; this in turn allowed them to simply diff the folders in order to check the differences between the development and stable streams for example. When moving to git, the first question I had was what happens when they change the branch they're on and how to see the differences between branches.

Another bad behavior was caused by how slow AccuRev can be when it comes to branching: in order to reduce the time spent waiting around, most users pushed changes directly to the development (or worse, master) branch instead of having feature branches; this then made it hard to push just specific changes to the master branch when you wouldn't want to integrate all changes, or one of them needed to wait for another fix.

Most of these workflow differences got solved via trainings as well as due to git's popularity, which means that most questions are just a Google search away.

Migrating code

The next step in making the move was actually getting the code into git. Some teams decided that history wasn't important for them, so they'll just dump all of their existing code on the stable branch and then go from there. For us though that was unacceptable, so I decided to write something that would migrate our AccuRev history to git.

Before going into the code let's talk a bit about some AccuRev terms, and compare them to git.

AccuRev has the concept of depots, which should map to git repositories, but in our workplace we usually assigned them per team, so they ended up holding completely different components and systems. This way, depots are better mapped to projects in our specific case.

Next are the streams, which are the branches from git. Streams can optionally have a parent, which is the equivalent to git branching, or they can start from scratch, which is how we translated repositories.

Basically we would have the following AccuRev structure:

TEAM_DEPOT -> Wizard -> Wizard_stable -> Wizard_develop

TEAM_DEPOT -> Toolchain -> Toolchain_stable -> Toolchain_develop

In the TEAM_DEPOT we'd have two components, Wizard and Toolchain, which start from the parent (empty) stream of Wizard and Toolchain, which in turn has the child of Wizard/Toolchain_stable on which actual code resides.

If we are to map this to git, we'd have the following:

Wizard repository -> master branch -> develop branch

Toolchain repo -> master branch -> develop branch

Instead of having a single repository which contains both projects each of them is split into it's own repo, with it's own history, permissions and so on.

In order to migrate our code we had to do the following steps:

select an AccuRev stream to migrate (this was usually the stable stream for each particular project)
get the full history for that stream
for each historical event get the author, message and timestamp
for each historical event get the actual files and commit them to a git repo with the info obtained at the previous step
keep doing 3 and 4 until you're up to date

Optionally, because for a period of time both AccuRev and git would be available, with development being done in AccuRev, allow updating a git repository with the new history from AccuRev:

get the stream history
check the latest commit from git and map it to AccuRev history
start migrating from the newest transaction in AccuRev not available in git

The first issue we ran across was due to workspaces and how they're attached to streams: we needed to check if the location used was already associated with an existing workspace, as that would prevent us from creating the new workspace.

The second one was having to move the migration workspace once created in case the user had to perform multiple migrations on different streams in the depot.

Then we found out that AccuRev doesn't really sanitize the messages in any way, which can in turn lead to failures when trying to commit the changes into git.

Another strange case is that, while AccuRev insists that all streams start with the depot name, that match isn't case sensitive, so you could have a depot named Project and the stream could start with project, which in turn caused my script to fail.

But the worst thing is that you can't tell if a file is deleted or added from the AccuRev history file. This in turn lead to my first implementation being slow for large repositories, because I would have to get all the files for each history step (not just the changed ones), copy the files to the local git repository, commit them, then deleting all the files in the git repo before copying the next round of files from AccuRev in order to detect deleted files.

To fix this we decided to move to a stream and workspace implementation, where we would have a pass-through stream tied to the one we would migrate and a workspace tied to this stream that points locally to git repo folder. Pass-through streams are interesting, as they allow you to change the history element they're pointing at without modifying the original stream; what this means is that, by having an AccuRev workspace tied to it, we wouldn't have to get all the files each time, but just the changed ones by simply modifying the transaction at which the pass-through stream was pointing and then updating our workspace.

Get it while it's hot!

In case anyone has to go through this themselves I've made the code available on Github and GitLab, and each and every contribution is appreciated.

Unfortunately the tests can't be made available because you'd need a reference depot which isn't portable. You can see the Making private tests public article for more information about this.

github.com/fuzzmz/accurev-to-git

gitlab.com/fuzzmz/accurev-to-git

Dockerize all the things!?!

Serban Constantin — Tue, 12 Apr 2016 21:00:00 GMT

Back in my build engineer days (and then in a sort-of devops role) I kept hearing some colleagues rave and praise Docker as the end-all-be-all for, basically, everything. Is it really like that?

Following is just me throwing some thoughts against the wall, so take everything with a grain of salt. Also, some of the info might be a bit outdated, since the last time I seriously looked over at Docker for work was middle of 2015.

What is Docker?

An open platform for distributed applications for developers and sysadmins.

So you'll notice that unless you're a developer or a sysadmin, Docker will not be useful for you. If you're a developer or a sysadmin, but you don't write or manage distributed (clustered) applications, Docker will not be useful for you.

Docker is particularly useful for continuous integration workflows and automated applicaton scaling. If you don't currently do continuous integration, Docker will probably not be very useful for you. If your application does not consist of multiple components (often called "micro services") that can scale independently, Docker will probably not be very useful to you.

If you're an end user, using Docker is very much like using any other virtualization technology. You get a "vm" image and you run it in your "VM" hypervisor (e.g. VMWare Fusion or VirtualBox), or you get a "docker container" and you run it in your "docker daemon".

Here is a good list of the top 8 reasons to use Docker. Now, answer honestly, how man apply to your daily workflow?

Here is a good list of system you must have in place before you deploy Docker:

secured least-privilege access (key based logins, firewalls, fail2ban, etc)
restorable secure off-site database backups
automated system setup (using Ansible, Puppet, etc)
automated deploys
automated provisioning
monitoring of all critical services
and more (documentation, etc)

Do you think you're ready?

What about CoreOS?

Linux for Massive Server Deployments. CoreOS enables warehouse-scale computing on top of a minimal, modern operating system.

So you'll notice that unless you spend your day deploying massive "warehouse-scale" numbers of systems, CoreOS will not benefit you. CoreOS is particularly useful if you run lots of Linux virtual machines that are very similar. Where "lots" is probably thousands+. CoreOS works in combination with two cluster management frameworks called "etcd" and "fleet". If you don't already use a cluster management framework for your applications to handle things like "service discovery" and "task scheduling", these cluster management frameworks will not be useful for you.

Plus, CoreOS only supports applications that run in containers. If your distributed application is not comprised of containers, CoreOS won't be useful to you.

Docker strengths

If your app runs in Docker, and you are willing to pay to run it in a big provider's infrastructure, you get a lot of stuff for your money, like really well designed automated monitoring and metrics and role-based-access-control and networking/routing/load-balancing infrastructure. But much of the workflow around managing/launching containers will be specific to AWS/Google.

Similar older technologies

Java - run your program on any system without recompilation!
OSv - a stripped-down "vm" for running java apps on top of a hypervisor
OpenVZ / Virtuozzo / Solaris Zones / FreeBSD Jails - see Operating-system-level virtualization

How do I get started with Docker?

It really depends on how you currently provision and deploy systems/apps. Specifically, where do you store the configuration information for your systems and services? For example if you're using Puppet, you may want to try agentless puppet first. If you're using git, you may want to use more pre-/post-commit hooks. If you're not using Jenkins, you probably want to start using Jenkins first. If you're already in a heavy auto-scaling environment where SSH is impractical and you have to use zeromq or MCollective, then, you already know more about docker than me.

Burn it all

Serban Constantin — Thu, 31 Mar 2016 21:00:00 GMT

It is a known fact that some of the hardware we sell gets either a bit too hot, or a bit too loud, but I never until now thought that we're throwing out money in the (cooler) wind... at least until we received the following email from one of our field engineers, the people who are in direct contact with our customers and help diagnose their issues.

What follows is a sanitized email from one of them, published with consent and sanitized.

All,

I'd like to share a small story that currently ends with one experimental, somewhat incomplete,
uncalibrated, but possibly helpful line of cryptic U-Boot environment to keep the suspense going:

`silentfan=i2c mw 2f 0.1 80; i2c mw 2f 58.1 1; i2c mw 2f 0.1 82; i2c mw 2f 10.1 34; i2c mw 2f 11.1
34; i2c mw 2f 7.1 3f; i2c mw 2f 1.1 f; i2c mw 2f 2.1 f; i2c mw 2f 5.1 f; i2c mw 2f 6.1 f; i2c mw 2f 0.1 80`

The story starts with an BOARD1, involves early and late BOARD2, and goes back to
a BOARD3.

Once upon a time an engineer received a BOARD1 and got some very explicit comments in the cubicle office
environment for turning it on because of the noise it made. So he came up with a simple cardboard
shield for the flat front fan to ensure that all the air actually went through the SoC heatsink properly.
That way, he could disconnect two of the three fans in the back and add a resistor cable to the front fan
to slow it down. The BOARD1 was ok then.

Time passes and along comes the first BOARD2 which looked like the early BOARD1 inside the box. After
turning it on for the very first time, it got turned off VERY quickly because it was much worse. The same
cardboard shield and modification was added. It was bearable then.
When another engineer visited the office with an unmodified BOARD2 and used it in open space, he got
nearly threatened with violence.
Then new BOARD2s with a black airflow shield over the heatsink happened, which was a really great idea.
The fans used were not, so that one also got modified by replacing the back fans with quiet ones and
disconnecting the useless front fan completely. Again it was ok then.

Along the way the engineer discovered that a BOM mistake had been made for the BOARD2 that caused I2C
to be effectively partially non-functional without software patches to the SDK. Two resistors were
missing. Adding software patches to the SDK for U-Boot and Linux, all I2C magically worked, and, e.g,
temp sensors could suddenly be used. This was important because some customers wanted to know
much more about the thermal side of our high end chips and jet engine fans didn't position us well
against Intel. Also, over 100 of the BOARD2s were to be modified also to be quiet by buying many few
fans and resistors to address a specific market. The new part number for such systems is XXXX.
The engineer that started the idea didn't really like the extra effort and cost that someone would have to
spend to make this happen, so he thought about the topic some more, to see if there was a simpler way
to make the system quiet.

Along comes a new special customer opportunity asking for a BOARD1 in an application space where you can't
sell by going there with a noisy box. So a BOARD1 was found to be modified, but the engineer was still thinking
about inexpensive options to make BOARD2s office use compatible.
Wasn't there a device in there called W83793G in both the BOARD2 and BOARD1 schematics called "Winbond H/W
Monitor"? Wasn't that device capable of doing PWM fan control? Wasn't it also capable of measuring
temp and doing autonomous fan control properly? Couldn't that chip be configured to do the right thing
about fans and noise?

So he looked at the different boards available to him and found the following situation:

* BOARD1
    * The W83793G is sufficiently wired up in the MODEL1 to be used for autonomous fan
control. It controls both fan PWM lines and can measure chip temp if properly
configured. Default pin config settings are not quite ok, but good enough.
    * The MODEL1 BOARD2s don't have the PWM fans though. They only have fans with the tacho
signal, so there is nothing to control
    * The BOARD1s don't have the black shield that current BOARD2s have, so the back fans
can't be used to create suction through the main heat sink, eliminating all usefulness of
fan control. This can be remedied with a simple piece of U-shaped cardboard to fill the
gap between heat sink and back fans... which permits disconnecting the front fan
completely.
    * The old MODEL1 did have the four wire PWM fans, but such an old board can
only serve as spare parts box to strip.
* BOARD2
    * The BOARD2 comes with the back airflow shield that is needed for proper heat sink
temp control through the back fans. The front fan is totally pointless due to the  black
airflow shield, but we still connect it and pay for it
    * The fan PWM lines to the W83793G are wired up, so fan speed can be controlled
    * The W83793G is accessible with I2C SW patches applied to the SDK, despite not being
quite accessible as the schematics suggest. But XX schematics can be a bit strange
anyway at times as it seems …
    * It is shipped with four wire PWM fans that are not perfectly quiet, but quite bearable
when speed is reduced
    * … but the temp diodes are disconnected from the W83793G by choice of BOM, so
automatic control in HW is impossible. Linux could in theory do software based fan
control but our SDK doesn't.

So we have BOARD1 which would be perfectly capable of being quiet by doing automatic fan control if it
had the PWM fans and the black airflow shield … but doesn't. *sigh*

And we have BOARD2s which would be perfectly capable of being quiet by doing automatic fan control if
they had four resistors in a different place on the PCB … but don't. *sigh*

The moral of the story is: We could have saved money on fans and leave a much better impression at
customers by doing it right with minimal extra review and changes.

Yes, the W83793G is a pretty obsolete part these days and other pure HW means may be better now,
but the fact that I found now that we had only 95% of the solution in place leads me to the final
question.

Couldn't we have thought noise and airflow through before we issue a production manufacturing
order for such boards? We are actively paying for useless BOM that could have been VERY useful.

Coming back to the start: So what is the U-Boot environment line about? If you take PWM fans and put
them into a BOARD1, add the U-cardboard shape in the back to be like the BOARD2 black airflow shield,
disconnect the front fan completely, and then run the line … you end up with a reasonable system.

And now I probably should figure out if there is a reasonable way to do SW based fan control on the BOARD2
under Linux. Too late for the many BOARD2s to be modified … Oh well more money down the drain …

BR

Green URL bar

Serban Constantin — Tue, 08 Mar 2016 22:00:00 GMT

The blog is now green! Well, better said, parts of the URL bar are now green because this is being served to you by way of a magical HTTPS connection.

Despite this space being a collection of static web pages, and despite the fact that I'm not using any tracking or advertising, I still think it's a good idea to serve content over a secure connection, partly because this way the content can't be tampered with or ads injected into it by ISPs or other 3rd parties (in most cases).

Thanks to the great people at Kloudsec who auto-provision and auto-renew LetsEncrypt certs for Github Pages with custom domains. You can find out more about them here.

Making private tests public

Serban Constantin — Tue, 08 Mar 2016 22:00:00 GMT

Say you've been developing a tool for your company or your own use, and want to make the source public. As any good coder, you've also written some tests, but guess what, those tests can't be easily made available.

Or better said, the tests rely on non-free tools and a specific configuration for them to boot.
— Serban Constantin (@fuzzmz) March 9, 2016

To give some more context, the tests are specific to a tool I wrote for migrating history from AccuRev to git. The problem here is that AccuRev is a tool which isn't widely available, and to make things even more fun, we're talking about migrating history, so we need to have a known good state in the tests in order to check the migration was done successfully.

Bellow is a simple test which will fail if the user doesn't have AccuRev installed on his machine:

def test_accurev_login_error_handling():
    assert not accurev_login('badtest', 'badtest')

That's a simple one, but we can get into even more complicated ones, like comparing the history obtained by the migration script to a known good configuration:

def test_get_history(set_fixture):
    historyfile = set_fixture[2]
    modhist = set_fixture[3]
    acchist = get_history('Test_gitmigrate_testhist')

    with open(acchist, 'r') as fin:
        content = fin.read().splitlines(True)
    with open(modhist, 'w') as fout:
        fout.writelines(content[4:])
    assert filecmp.cmp(modhist, historyfile)

The problem here is that even if the developer has AccuRev installed on his system, there's no way for me to provide him with a valid stream history which will allow the tests to pass because, well, it's history, which contains information like the user who commited the code, when it was commited, what the message was and so on.

I'm wondering what's the proper way to make the tests for such a project available.

Should I make them public and just note that they'll fail?
Write a guide on how to modify the tests in order for them to work? But then, doesn't this pose the risk of sabotaging your own tests, so that they'll always pass no mater what?
Simply mark all AccuRev specific tests are failable and tell possible contributors to enable them if they have AccuRev? But then I'll have to be careful during pull requests to not merge any of the changes done to make the tests work for their specific environment.
Not publish any tests at all? Or just leave in the very generic python ones?

I know that this is mitigated to some level by the fact that if someone is interested in this migration script they're most likely going to have all the software installed, but it's still a bit of a headache. Not to mention not being able to have automated testing via Travis-CI once the code is made public, which means even more fun with running manual tests in case of a pull request.

So, how would you handle this? Hit me up on Twitter (just reply to the tweets embedded above) or leave me your thoughts on the bellow Google+ post.

Google Photos Shared Albums annoyances

Serban Constantin — Thu, 10 Dec 2015 22:00:00 GMT

Google recently launched an update to its Photos service which allows users to give collaborative access to albums to other people. This way your family and friends can all come together and upload photos to the same album to preserve them for posterity (or however long Google decides to run the service - RIP Google Reader).

This feature is a nice starting point for Google Photos going social, but it needs a bit more feature for it to be truly awesome (see what you did, I'm almost agreeing with The Verge here!).

After playing with the new feature for a bit I've found the following:

Controls appear to be all-or-nothing. I can control whether an album allows people to add or not, but no way to give people different links with different controls to the same album
There's no way to remove a person from a shared album
A contributor can add/remove their own photos, but cannot remove other people's
The owner can remove other people's photos that were added to the album
If a contributor joins an album, adds photos, then leaves the album, all of their photos are removed with them
The owner (and presumably anyone else) has the option to add any of the photos from a shared album to their library
Each picture gets labeled with the person who added its name
There's no notification that someone left an album
If you turn "collaborate" off on a previously shared album, it removes all other people that have joined & their photos

I was hoping that turning collaborate off might "freeze" the album in its current state, and still allow people to get notifications if I added photos, but it looks like this is not the case.

One other feature that I'd personally really like to see would be the ability to tap someone's picture (which is present in the list of who's in the album at the top) and see all the photos they added.

Oh kernel, my kernel

Serban Constantin — Wed, 09 Dec 2015 22:00:00 GMT

So you're a geek and get your hands on the newest Android goodness (say, a Nexus 6P or 5X), install a terminal emulator - you're a geek, remember - and then run uname -r only to be surprised at running a kernel that's a couple of years old. Why is that?

Upstream fast and large, Android slow

One of the most cited reasons for this is that Android kernels still carry a lot of custom patches which are not available in the upstream kernel.

Say that an upstream developer introduces an API change; the first step he should do after this is to check that all existing code compiles and works without regressions. However, he's not obligated to take custom kernel modules (like grsecurity, rt or Android) into account, nor is he responsible for third party drivers.

As such, all of the workload of bringing in upstream changes to the Android kernel lays solely on Google developers, and considering the fast pace of kernel development this task is time consuming.

Except that's not it

The problem is that all of the above simply isn't that accurate. For example, for an Android device running Linux 3.18, there are only 700 patches to Android-ify your kernel (adding modules like cpufreq_interactive, binder, SElinux fixes, etc.). You can see the list of kernel branches supported here to get an idea on how much, or little, they differ from upstream.

One of the reasons might be that SoC vendor brings up the kernel on a new chip (so Qualcomm for example ensures that everything is working on the SD808/810). The vendor invests and a lot of time and money in making sure everything is stable. The issue here is that the kernel changes a lot of hands: upstream (Linus) provides the base kernel, Google provides the core Android patches on top of a specific version (3.10, 3.14, 3.18, 4.1), the SoC vendor brings up their chip with their own set of patches, and then the actual device OEM will apply the final changes on top to support a specific device.

This means that SoC vendors don't really have an incentive to bring new kernels to an existing chip. Also note that the development time from when a SoC starts getting worked on and when it's released is pretty long, so your kernel might have actually been not that old when it was selected. It's much easier from a SoC vendor perspective to deal with a new kernel when you're working on bringup for a new SoC. Likewise, even if the SoC vendor upgrades a kernel, the OEM would have to pick that up, rebase their device-specific patches on top o it, then go through all the testing and bug fixing that entails.

Realistically that won't happen though, as the SoC vendor is already doing bringup on the next generation of chips, and the OEM is working on their next devices.

Does it matter though?

Well, it doesn't really matter if you're running 3.10 when upstream is at 4.4. If you're on a desktop and there are bugfixes or new drivers that improve some piece of hardware in your system, great, a new kernel might improve your life significantly. For your phone though, the kernel has been developed specifically for your device because the hardware isn't interchangeable.

If there are better drivers or bugfixes from 4.4, they've probably been backported to your device on 3.10 - and yes, those will come in with your standard OTAs. You don't need a full kernel version bump to take advantage of those.

In fact you don't want a full kernel version bump because all of your device and SoC specific code can break in mysterious ways. Watch as some mmc refactoring upstream randomly breaks your wifi driver! Missed one armv8 patch? Boom, random SIMD register corruption that causes programs to randomly segfault. People are especially sensitive to kernel stability on their phones, so the kernels need to be rock solid. Backporting patches to a kernel the SoC vendor has spent a lot of time on making stable is always the safer route.

But what about PCs?

The only reason that this is agnostic to Android devices, and not say, desktop PC's, is because of how drivers/modules are implemented on Android.

Because there is no standardized bootloader on Android, the kernel cannot go about detecting hardware on the platform, and drivers need to be baked into the kernel itself (and thus, loses modularity). This is why it's such an intensive process to upgrade the kernel, as drivers are statically written in, rather than dynamically being linked.

In short, this would be much less of a problem if there were a standardized bootloader, and easy hardware detection in Android Kernel. Once that happens, dynamic linking of modules could be implemented, and much less customization would need to be done to the kernel itself.

In theory, implementing dynamic linking would also be the gateway for Google to provide universal updates to Android devices. It is because of these deep, inflexible changes that Google cannot even begin to push universal updates.

If you've reached this point and still want more, then you can read Running a mainline kernel on a cellphone by Jonathan Corbet on the struggles and fun you'd have to face in having the latest and greatest.

Extending Equinox p2 for (fun and) profit

Serban Constantin — Wed, 24 Jun 2015 21:00:00 GMT

I've had the honor of being selected as an Ignite speaker at EclipseCon France 2015 to talk about how we are extending Equinox p2 for use in our CodeWarrior offering.

The slides

The video

The abstract

The goal of this talk is to share our experience in making things happen between clicking on Help - Install New Software and Finish and how the features we developed evolved from one another.

The talk will present an overview of the following custom Equinox p2 actions, as well as how they integrate into shipping both an Eclipse based IDE and enabling future updates.

The following custom actions are treated:

1. Freescale Install - enabling placing the binary contents of extensions outside of the Eclipse home folder and ensuring that even files that are write-locked get updated
2. Freescale Copy - a faster Freescale Install by disabling the write-lock check and acknowledging the caveats that come with this
3. Freescale Remove - at uninstall make sure that all binary files added using Freescale Install or Copy are correctly removed
4. Freescale Execute - allowing execution of files during the install phase, on both Linux and Windows
5. Freescale Merge - XML files are fun and useful, and at times they need to be updated.
6. Freescale Process Check - conditional installation denial based on running processes
7. Freescale Common Deploy - installing features and plugins outside of the Eclipse location

Travis-CI article publishing

Serban Constantin — Tue, 24 Mar 2015 22:00:00 GMT

Geeking out after switching back the blog to Pelican and hosting it on GitHub Pages, I decided to set up automated publishing of the HTML content from the markdown articles without manually running Pelican on my own PC. So what better way to do this than plug everything into Travis CI, because continuous integration doesn't automatically imply code.

Configuring the job

I'm not going to go through the whole set-up of having Pelican up and running, nor about how to configure GitHub pages in order to publish your blog, though those are some nice ideas for future articles :fa-lightbulb-o:.

The publishing flow would be as follows:

Create a separate article/article-name branch on which to work on the article
Submit a pull request to the source branch once the article is finished
On pull request merge run a Travis CI job to get the latest changes, publish using Pelican and push the output back to the master branch in order to make the changes public.

I am fully aware that this sounds so much more complicated compared to publishing an article using Wordpress for example, but it fits well into my mostly command line world, love and general geekiness.

The actual implementation can be split in two:

Getting Travis CI to notice that a merge was made
Telling Travis CI what to do once 1 happens

1. Travis integration

Actually setting up Travis CI is pretty easy and their docs are amazing, so in short you flip a setting in your GitHub repository options and add a .travis.yml file which basically configures how a Travis CI job works.

My .travis.yml file is as follows:

language: python
python:
  - '2.7'
branches:
  only:
  - source
before_install:
  - "export TRAVIS_COMMIT_MSG=\"$(git log --format=%B -n 1 | head -c23)\""
install:
  - pip install -r requirements.txt
  - pip install git+https://github.com/fuzzmz/[email protected]
script:
  - git config --global user.name "Serban Constantin"
  - git config --global user.email [email protected]
  - make publish
  - make github
env:
  global:
  - secure: ASDASDADS

This file tells us that the scripts will run on Python 2.7, and to only run when changes happen to the source branch.

The next step is to tell Travis that before installing our required packages he should export the git commit message which prompted the build to the TRAVIS_COMMIT_MSG variable, which will then be used when pushing the HTML files to the master branch.

After that we go ahead and install all the packages required to publish the blog (Pelican, Markdown, etc), as well as a custom fork of one of the packages and configure git so that the correct author is displayed in the repository master branch after pushing the published content to it.

We then go ahead and actually publish the content via make publish and push it to GitHub using make github, functions which will get explained a bit further down.

One more thing that needs done is to have a way to let Travis push the changes back to the repo. This is done by generating a token on GitHub and encoding it using the travis ruby application so that others can't use said token to push to my repository.

The last thing that needs to be done is to disable pull request builds in Travis CI to prevent the blog being updated by a pull request.

2. Travis commands

The actual steps to publish the blog and push it back to GitHub are detailed in the Makefile and called via make publish and make github.

PELICAN=pelican

BASEDIR=$(CURDIR)
INPUTDIR=$(BASEDIR)/content
OUTPUTDIR=$(BASEDIR)/output
CONFFILE=$(BASEDIR)/pelicanconf.py

clean:
    [ ! -d $(OUTPUTDIR) ] || rm -rf $(OUTPUTDIR)

publish:
    $(PELICAN) $(INPUTDIR) --debug --output $(OUTPUTDIR) --settings $(CONFFILE)

github:
    ghp-import -n -b master -m "${TRAVIS_COMMIT_MSG}" $(OUTPUTDIR)
    @git push -fq https://${GH_TOKEN}@github.com/$(TRAVIS_REPO_SLUG).git master

In the Makefile we define the current directory, the directory which holds the articles in markdown, the output directory which will store the generated HTML files as well as the path to the Pelican configuration file.

To publish the blog we simply run Pelican.

To push the changes to the repo so they become live we use ghp-import to simplify the process of committing the files and setting the correct branch (master in this case) and commit message, and then do a git push.

And that's it, now you just have to wait for the Travis CI build to take place, after which your content will be automatically made available.

Fuzzmz | ramblings on tech, life and randomness

Covid-19 vaccines and Romania

A short intro

An issue slowly creeps

The announcement for the masses

Communication issues

The timeline

The code

Jenkins as a back-end

The slides

The video

The abstract

Scientific code quality

The coding culture

Scientific upbringing

Of COM, memory and restarts

A bit of history

Of memory and processor instructions

SSL security and trust

The web of trust

TooBigToFail™

What next?

A short on switchable graphics

Per-display graphics card?

Sharing is caring

From AccuRev to git

AccuRev downsides

Migrating developers

Migrating code

Get it while it's hot!

Dockerize all the things!?!

What is Docker?

What about CoreOS?

Docker strengths

Similar older technologies

How do I get started with Docker?

Burn it all

Green URL bar

Making private tests public

Google Photos Shared Albums annoyances

Oh kernel, my kernel

Upstream fast and large, Android slow

Except that's not it

Does it matter though?

But what about PCs?

Read more

Extending Equinox p2 for (fun and) profit

The slides

The video

The abstract

Travis-CI article publishing

Configuring the job

1. Travis integration

2. Travis commands