In my endeavours to make AI safer and more understandable, I make use of neural networks which I call “Lipschitz networks” (even though this term is not used consistently in the literature).
Lipschitz networks constrain the gradient p-norm $\|\nabla_x f(x)\|_p$ of your network with respect to the inputs to a maximum of your choice, let's say 1. In practice, there are multiple ways to do this. The requirements for a suitable implementation are as follows:
One very safe, deterministic way of implementing requirement 1 is to constrain the operator norm of each layer's Jacobian with respect to its input. In fully connected networks, the weight matrices coincide with the Jacobians, so it is convenient to constrain these directly, layerwise: $\|W^i\|_p \leq 1 \ \forall i$
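As a minimal sketch of such a layerwise constraint (NumPy, assuming the p = 2 case, where the operator norm is the spectral norm; the function name is my own), one can simply rescale a weight matrix whenever its norm exceeds 1:

```python
import numpy as np

def constrain_layer(W, p=2):
    # rescale W so that its operator p-norm is at most 1
    # (for 2-D arrays, np.linalg.norm with ord=2 is the spectral norm)
    norm = np.linalg.norm(W, ord=p)
    return W if norm <= 1.0 else W / norm

W = np.array([[3.0, 0.0],
              [0.0, 0.5]])
W_c = constrain_layer(W)  # spectral norm of W_c is now exactly 1
```

A projection like this would typically be applied to every layer after each optimizer step.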
However, a layerwise constraint can overdo the trick. Since the Lipschitz constant of a (fully connected) neural network is bounded by the product of the layerwise Jacobian norms and the Lipschitz constants of the activations, a layerwise constraint easily accumulates into something much smaller than 1 and cannot recover the “full gradient”. Anil et al. (2019) refer to this as gradient norm attenuation.
The specific problem is the fact that the usual activation functions, while being Lipschitz-1, cannot maintain the maximum allowed gradient everywhere. For instance, if one neuron has a preactivation $< 0$, ReLU has a gradient of 0 there, and a possible $\|\nabla_x f(x)\|_p = 1$ becomes unachievable. Anil et al. (2019) show this very nicely by trying to fit a layerwise constrained network with ReLU activation to the absolute value function. Spoiler: it does not work.
So they went ahead and derived a new activation function: GroupSort. It sorts
the preactivations within n subgroups of the input. Example: GroupSort(1) is the
full sort operation, GroupSort(d/2) is the MaxMin operation.
Since it is merely a permutation, it maintains gradient 1 everywhere, while being
a sufficient nonlinearity to serve as activation. Together with a specific constraint,
they are able to prove universal approximation of GroupSort Lipschitz networks!
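As a sketch of the idea (NumPy; `group_sort` is my own naming, and its argument counts the number of groups, matching the notation above):

```python
import numpy as np

def group_sort(x, n_groups):
    # sort the preactivations within n_groups equally sized groups
    d = x.shape[-1]
    assert d % n_groups == 0
    groups = x.reshape(*x.shape[:-1], n_groups, d // n_groups)
    return np.sort(groups, axis=-1).reshape(x.shape)

x = np.array([3.0, 1.0, 2.0, 0.0])
group_sort(x, 1)  # full sort: [0., 1., 2., 3.]
group_sort(x, 2)  # MaxMin on pairs: [1., 3., 0., 2.]
```

Since the output is just a permutation of the input, the Jacobian is a permutation matrix, which has operator norm 1.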
The weight norm constraint to achieve p-normed Lipschitzness (as given by Anil et al., 2019) is $\|W^1\|_{p,\infty} \leq 1$ for the first layer and $\|W^i\|_\infty \leq 1$ for all subsequent layers.
Ok, so let us train a Lipschitz network for some binary classification task! For training data, let's use the two-moons dataset. Using BCE as the loss and Adam as the optimizer, we can train a Lipschitz network with a Lipschitz constant of 1. We immediately see that the network is unable to achieve good classification performance.

The reason for this is not that the gradient is too constrained. In fact, we should be able to achieve perfect classification performance with any Lipschitz constant > 0, because the decision frontier is defined only by the sign of the output (or the sign of output - 0.5 if it’s in [0,1]), and the sign is scale invariant. So why does this not work?
Recall how BCE works: It tries to maximize the margins, i.e. get the output for
class 0 as close as possible to 0 and the output of class 1 as close as possible
to 1. With a sigmoid as output activation, this means unbounded increase for
the preactivations to minimize BCE.
In unconstrained networks, that is fine and actually desirable. In a Lipschitz
network, it is inadvisable to concentrate on margin maximization because, when
the gradient is bounded, this objective may clash with the actual goal of
classification: maximizing accuracy. A loss function that cares about margin
maximization only up to a certain point is much better suited here: hinge loss!

Hinge loss does not assign a penalty to training data outside of a margin of specified size, so the Lipschitz network can concentrate its efforts on optimizing the decision frontier.
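A minimal sketch of this behaviour (NumPy; labels in {-1, +1} and a margin of 1 assumed):

```python
import numpy as np

def hinge_loss(y, out, margin=1.0):
    # zero penalty once a sample is on the correct side by at least `margin`;
    # unlike BCE, there is no incentive to push outputs further than that
    return np.maximum(0.0, margin - y * out).mean()

y = np.array([1.0, -1.0])
out = np.array([2.0, -3.0])  # both correct and outside the margin
hinge_loss(y, out)           # -> 0.0, the loss is "done" with these points
hinge_loss(y, np.zeros(2))   # -> 1.0, points on the decision frontier are penalized
```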
More on this when I find more time.
I run a Vagrant VM (called Y) on a remote server (called X) and want to use VSCode on my local machine to connect to Y.
Now, Vagrant in its default setting hosts a VM on localhost:2222 which you can then
ssh to via $ vagrant ssh. With $ vagrant ssh-config you will get the ssh config used to
connect to it. It looks something like this, which works well from the remote machine:
Host default
HostName 127.0.0.1
User vagrant
Port 2222
UserKnownHostsFile /dev/null
StrictHostKeyChecking no
PasswordAuthentication no
IdentityFile /path/to/some/generated/private_key
IdentitiesOnly yes
LogLevel FATAL
By pasting it into your ~/.ssh/config you can then use $ ssh default to connect to it.
However, this only works on the remote machine where you set the VM up.
In order to connect to it from the outside, you need to somehow connect to the remote machine X at port
2222, which is likely closed to the outside.
The solution to this problem is a ProxyJump. ProxyJumps connect you to a remote machine via an intermediate “gateway” machine. The only unusual thing here is that the gateway machine is the same as the
target, only on a different port.
So I tried something like this. Notice that I copied the private key to my local machine.
# On local machine
Host X
HostName X.com
User nnolte
...
Host default
HostName X.com
User vagrant
Port 2222
ProxyJump X
UserKnownHostsFile /dev/null
StrictHostKeyChecking no
IdentityFile path/to/copied/private_key
This did not work; it gave me a connection refused, just like when I tried connecting to port 2222 directly.
I am no expert, but presumably Vagrant binds the forwarded port only to the loopback interface of X, so a connection to X.com:2222 still counts as an external connection, even when it is made from X itself via the ProxyJump.
Funnily enough, swapping HostName X.com to HostName localhost turns out to be the solution:
# On local machine
Host X
HostName X.com
User nnolte
...
Host default
HostName localhost # NOT X.com
User vagrant
Port 2222
ProxyJump X
UserKnownHostsFile /dev/null
StrictHostKeyChecking no
IdentityFile path/to/copied/private_key
✨✨✨✨
Cool, this worked. I am no expert on SSH configs, but I did not expect localhost to be interpreted “relative” to the ProxyJump: with a ProxyJump, the HostName is resolved and connected to from the jump host, so localhost here means X itself, which is exactly where the forwarded port is listening.
With this setup, you can connect to default from your local machine and use VSCode with the Remote extension as usual.
Well, function composition is something that seems achievable with Python:
def f(x):
    return x - 2

def g(x):
    return 2 * x

x = 7
# now I want (g . f)
g(f(x))  # applies g to the output of f
What if I want to compose only, without immediate application? I just want the function that can be represented by h = (g . f).
x = list(range(10))
h = lambda x: g(f(x))
list(map(h, x))
Ok, that works fine. What if I wanted the incredibly convenient syntax that Haskell has?
What we need to do then is to overload a binary operator to compose. How about __mul__?
The problem: Builtins cannot be extended
def compose(g, f):
    return lambda x: g(f(x))

type(f).__mul__ = compose
# TypeError: can't set attributes of built-in/extension type 'function'
Fortunately for us, clarete hacked around in the CPython bindings to make extending builtins possible directly from Python: forbiddenfruit. Whether or not that is a good idea, who knows?
from forbiddenfruit import curse

def f(x):
    return x - 2

def g(x):
    return 2 * x

curse(type(f), '__mul__', compose)
x = list(range(10))
list(map(g * f, x))
# [-4, -2, 0, 2, 4, 6, 8, 10, 12, 14]
One can also compose more than functions, h*g*f, or adjust the compose function to take *args or **kwargs.
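For completeness, a variadic compose in plain Python (no forbiddenfruit; this is my own sketch, and *args/**kwargs only make sense for the innermost function) might look like:

```python
def compose(*funcs):
    # right-to-left composition: compose(h, g, f)(x) == h(g(f(x)))
    def composed(*args, **kwargs):
        *outer, innermost = funcs
        result = innermost(*args, **kwargs)
        for fn in reversed(outer):
            result = fn(result)
        return result
    return composed

f = lambda x: x - 2
g = lambda x: 2 * x
compose(g, f)(7)  # -> 10, same as g(f(7))
```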
What can you do with extended builtins?
We have a tuple of types
#include <tuple>
template<typename ... Ts>
using t=std::tuple<Ts...>;
struct a{};
struct b{};
struct c{};
using my_tuple = t<a,b,c>;
and we would like to get all possible type combinations of length n, taken from this tuple.
That corresponds to the n-th Cartesian power of the tuple.
So, my result should look like this:
combinations<my_tuple, 2> // returns t<t<t<a,a>, t<a,b>, t<a,c>>,
// t<t<b,a>, t<b,b>, t<b,c>>,
// t<t<c,a>, t<c,b>, t<c,c>>>
I like to prototype algorithms in Python first and then translate; less fiddling with details.
One possible solution for this combinatorics task looks like this:
def combinations(arr, n, res=[]):
    if n == 0:
        return res
    return [combinations(arr, n - 1, res + [i]) for i in arr]
It will recursively call combinations, “keeping track” of the current indices by appending them to the result and then returning when we have reached the desired dimension.
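As a quick sanity check (repeating the function so the snippet is self-contained):

```python
def combinations(arr, n, res=[]):
    if n == 0:
        return res
    return [combinations(arr, n - 1, res + [i]) for i in arr]

combinations(["a", "b"], 2)
# -> [[['a', 'a'], ['a', 'b']], [['b', 'a'], ['b', 'b']]]
```

Note that the mutable default `res=[]` is harmless here because `res` is only ever read, never mutated in place.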
There is a neat trick for checking which type you are currently fiddling with.
Declare some type that holds your type of interest and do not define it,
then gcc and clang give you a nice error if you try to instantiate one of these bad boys,
displaying your type nicely:
template<typename ... Ts>
struct type_printer;
int main () {
    type_printer<my_tuple>{};
}
in gcc-9.1 gives
<source>: In function 'int main()':
<source>:47:37: error: invalid use of incomplete type 'struct type_printer<std::tuple<a, b, c> >'
47 | type_printer<std::tuple<a,b,c>>{};
| ^
<source>:5:8: note: declaration of 'struct type_printer<std::tuple<a, b, c> >'
5 | struct type_printer;
| ^~~~~~~~~~~~
Recursion works fairly straightforwardly in the C++ type system.
You can see that in many parts of the STL and everywhere on StackOverflow.
Remember, we need something that refers to itself and some stopping condition.
A small example of recursion is something along the lines of std::make_index_sequence:
template<std::size_t ... Is>
struct index_sequence{};

//result... carries the ascending pack of integers
template<std::size_t n, std::size_t ... result>
struct make_index_sequence {
    //every time we iterate, we prepend n-1 to the result.
    using type = typename make_index_sequence<n-1, n-1, result...>::type;
};

//stopping condition: we will not continue if we reached 0
template<std::size_t ... result>
struct make_index_sequence<0, result...> {
    using type = index_sequence<result...>;
};
To concatenate tuple types and append to them, we use these little helpers, making use of std::tuple_cat to determine the type:
template <typename... tups>
using tuple_cat_t = decltype(std::tuple_cat(std::declval<tups>()...));
template <typename tup, typename item>
using append = tuple_cat_t<tup, std::tuple<item>>;
We will also need to “iterate over tuples”, which is normally done via index sequences, therefore we define an index sequence with the length of a tuple:
template <typename tup>
using index_sequence_for_tuple =
std::make_index_sequence<std::tuple_size_v<tup>>;
Now, we need a helper to execute one operation on each entry of a tuple and “return” a transformed tuple, very similar to boost::hana::transform:
template <typename tup,
template <typename> typename op,
std::size_t... Is>
auto operate_t_impl(std::index_sequence<Is...>)
-> std::tuple<op<std::tuple_element_t<Is, tup>>...>;
template <typename tup, template <typename> typename op>
using operate_t = decltype(
operate_t_impl<tup, op>(std::declval<index_sequence_for_tuple<tup>>()));
The usual way to get a parameter pack of the types in a tuple is
std::tuple_element_t<Is, tup>...
where Is is a parameter pack of the indices you want to gather the types from, so in our case all of them: 0,1,2,3,4....
That is the reason for the existence of the helper index_sequence_for_tuple.
Since the index_sequence is no parameter pack as we need it for the tuple iteration,
we use a common trick involving function template argument deduction in operate_t_impl.
To get the std::size_t ... Is from our index_sequence, we give (a std::declval of) the sequence as function argument
and let the argument deduction deduce std::size_t ... Is for us.
Ok, so now we can invoke “unary operations” (type transformations) with a signature template <typename> typename op on all elements of a tuple,
and “return” a result tuple. So to say, we just implemented a poor man's hana::transform.
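Its Python analogue (hypothetical naming, just to pin down the semantics) is an eager map that returns a tuple:

```python
def operate(tup, op):
    # apply op to every element and "return" a new tuple, like operate_t
    return tuple(op(x) for x in tup)

operate((1, 2, 3), lambda x: x * 2)  # -> (2, 4, 6)
```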
Now that we can perform elementwise transformations on a tuple and recurse, let's bring it together to perform our task:
template <typename tup, std::size_t n, typename result = std::tuple<>>
struct combinations {
    //this "operation" is conceptually similar to a unary lambda given to std::transform
    //its python equivalent: combinations(arr, n-1, res+[i])
    template <typename item>
    using operation = typename ::combinations<tup, n - 1, append<result, item>>::type;
    //this "loops" over the tuple, each time invoking operation, which takes care of the recursion
    //its python equivalent: [operation for i in arr]
    using type = operate_t<tup, operation>;
};
and the partial template specialization corresponding to the stopping condition:
//its python equivalent:
// if n == 0:
// return res
template <typename tup, typename result>
struct combinations<tup, 0, result> {
    using type = result;
};
That's it, much less code than I would have expected when starting this exercise. :D See the example on godbolt.