Looking for the right archive format with deduplication

I’m often copying my files to an external disk as a backup option. So far, nothing fancy.

However, I end up with many copies of more or less the same folders over time. And that takes a lot of space for nothing.

Incremental backups, that only add new files to an archive, might look as a solution for that, but over time the paths changes, the structure changes. So the deduplication would break.

One solution could be to format my disk with a system such as ZFS or BTRFS which have a deduplication built-in. However I can’t use my disk for quick operations/archiving on Windows, nor on Mac (easily at least).

So I want an archive format with:

  • Deduplication
  • Some compression (lot of source code files)

One might think, and I often came across that statement while searching on the web for a solution, that one shouldn’t worry about that as compression algorithm inheritely support deduplication. That is actually wrong, they’re not made to merge very large patterns. The following experiment shows it with file of random data (incompressible) :

-rwxrwxrwx 1 tbarbette tbarbette 10M Apr 17 21:09 binA
-rwxrwxrwx 1 tbarbette tbarbette 10M Apr 17 21:10 binACOPY
-rwxrwxrwx 1 tbarbette tbarbette 10M Apr 17 21:09 binB
-rwxrwxrwx 1 tbarbette tbarbette 256M Apr 17 21:19 binBIG
-rwxrwxrwx 1 tbarbette tbarbette 256M Apr 17 21:19 binBIGCOPY
-rwxrwxrwx 1 tbarbette tbarbette 10M Apr 17 21:09 binC

Here’s the size for a few formats :

SizeTimeComma,d
tar.gz553M19.48star -cpazf [archive] [file]
tar.xz553M303star -cpaJf [archive] [file]
zpaq287M18.7szpaq add [archive] [file]
wim287M4.4s7zz a [archive].wim [file]
.7z543M37s7zz a [archive].7z [file]
.7z (512M dictionnary)287M140s7zz a [archive].7z -m0=LZMA2:d512m [file]
.7z (1513M dictionnary)287M114s7zz a [archive].7z -m0=LZMA2:d1513m [file]

zpaq is the clear winner considering it can also compress at the same time, however it’s not very widespread, and has no good GUI available. I’m wondering about recovery in case of problems.

wim has no compression, so it will need to be encapsulated in something else. The problem is then to add some files. I have to uncompress the inner wim format first. The idea being to save the same computer again and again, one of those archives is 200G, so it means adding a single file would take a huge time. While zpaq can add one quite fast.

7zip with a dictionary large enough to hold big duplicates would seem to be a good compromise.

Any input?

MPTCP on Windows with WSL2

Limitations

It is possible to use MPTCP, but WSL2 uses a virtual interface that prevents advertising multiple paths. There might be a solution using multiple forwarded ports but I haven’t been able to use it yet.

Prerequisite

Install Ubuntu in WSL2 (simply look for Ubuntu in the Microsoft Store)

Optional: Allow Windows to keep both Wifi and Ethernet open

Windows will automatically turn off wifi when Ethernet is plugged in. If you want to try MPTCP over Wifi + Ethernet (or 4G through USB, all the same) you must disable this behavior :

1. Open Registry Editor.

2. Go to HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\Windows\WcmSvc\Local.

3. Create/change the fMinimizeConnections registry DWORD to 0.

4. Close Registry Editor and reboot.

Step 1 : Install an MPTCP-compatible Kernel (easier than it sounds!)

sudo apt install build-essential flex bison libssl-dev libelf-dev pahole
git clone https://github.com/microsoft/WSL2-Linux-Kernel.git
cd WSL2-Linux-Kernel
cp Microsoft/config-wsl .config

Edit .config and change “#CONFIG_MPTCP is not set” by CONFIG_MPTCP=y

make -j4
cp arch/x86/boot/vmlinux.bin /mnt/c/vmlinux

Then shut down WSL in a CMD window:

wsl --shutdown

And to boot in your new kernel add a file in C:\Users\$USER\.wslconfig

[wsl2]
kernel=C:\vmlinux

Step 2 : Install mptcpd

This is to get the “mptcpize” command to run a legacy TCP application with mptcp

sudo apt install mptcpd

Step 3 : Try it out !

sudo apt install iperf
sudo tcpdump -i  lo -w capture.pcap
mptcpize run iperf -s
mptcpize run iperf -c 127.0.0.1 -b 1k -l 1

Then open capture.pcap with wireshark and you should see MPTCP instead of TCP 🙂

Step 3 : SSH and failover

[todo!]

VOO in bridge mode with IPv6 (optional: and prefix delegation!)

Despite old threads that can be seen on VOO’s forum, VOO do not seem to use SLAAC in bridge mode (anymore?), but DHCPv6. Also VOO only gives a /64 prefix so you can’t do internal subnets 🙁

Important: my outgoing (WAN) interface directly connected to the VOO modem in bridge mode is enx000ec6ec03b3 . My internal LAN interface is br0 (it’s a bridge between my actual eth0 LAN interface and a WiFi access point using hostapd, but that’s for another day).

This tutorial assumes Ubuntu 18.04:

sudo apt install wide-dhcpv6-client

sudo vi /etc/wide-dhcpv6/dhcp6c.conf

interface enx000ec6ec03b3 {
  send ia-na 1;
  send ia-pd 1;
  request domain-name-servers;
  request domain-name;
  script "/etc/wide-dhcpv6/dhcp6c-script";
};

# Only for prefix delegation
id-assoc pd 1 {
  prefix-interface br0 { #internal facing interface (LAN)
    sla-id 0; # subnet. Combined with ia-pd to configure the subnet for this interface.
    ifid 1; #IP address "postfix". if not set it will use EUI-64 address of the interface. Combined with SLA-ID'd prefix to create full IP address of interface.
    sla-len 0; # Number of prefix bits assigned. Sadly this is 0 with voo... 
    };
  };

  id-assoc na 1 {
  # id-assoc for eth1
};

sudo vi /etc/default/wide-dhcpv6-client

INTERFACES="enx000ec6ec03b3"

sudo service wide-dhcpv6-client restart

At this point you should get an IPv6 address:

enx000ec6ec03b3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 109.89.XXX  netmask 255.255.255.0  broadcast 109.89.XXXX
        inet6 2a02:2788:XXXXXXXXX:8458  prefixlen 128  scopeid 0x0<global>
        inet6 fe80::20e:c6ff:feec:3b3  prefixlen 64  scopeid 0x20<link>
        ether 00:0e:c6:ec:03:b3  txqueuelen 1000  (Ethernet)
        RX packets 1358557038  bytes 1701875645905 (1.7 TB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 648168501  bytes 176987273193 (176.9 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Enable prefix delegation

Actually enable the prefix delegation with radvd:

sudo apt-get install radvd

sudo vi /etc/radvd.conf

interface br0 # LAN interface
{
  AdvManagedFlag off; # no DHCPv6 server here.
  AdvOtherConfigFlag off; # not even for options.
  AdvSendAdvert on;
  AdvDefaultPreference high;
  AdvLinkMTU 1280;
  prefix ::/64 #pick one non-link-local prefix assigned to the interface and start advertising it
  {
    AdvOnLink on;
    AdvAutonomous on;
  };
};

sudo service radvd restart

Some configuration is taken and adapted from https://www.ipcalypse.ca/?p=204

GitLab with Docker : Fixing “Error: PG::ConnectionBad” or “DETAIL: The data directory was initialized by PostgreSQL version 9.6, which is not compatible with this version 11.7.”

An issue I had where I could not find any fix online. The closest help I could find was this blog post : https://gotanbl.com/foss/how-update-gitlab-in-docker/, but there are multiple mistakes in the article, and the author is not reachable to fix them, so I’ll put the main trivia here…

The root of the issue (in my case) is that updating GitLab (with docker at least) is quite cumbersome. If you run it almost every week, then you can always upgrade to “latest”. If you do it from time to time, you have to upgrade from minor to latest minor then to first major, then to latest minor, then first major etc… And there is no way to do that automatically. So sometimes, running the “usual command” won’t work.

The problem which leads to this error is that the Postgresql database version will only be updated on some versions. If you skip the right one, then you’ll never update and all subsequent updates will break…

Worst, you’ll end up being told to run some commands to do this and that… However the problem is that the docker container will die as it fails to start. So you won’t be able to enter those commands.

The solution

First, note your current version:

sudo docket exec -it gitlab bash
cat /opt/gitlab/version-manifest.txt |grep gitlab-ce|awk '{print $2}'

Then, stop and remove the container. It’s safe, as the real files, db, etc are kept in the $GITLAB_HOME:

sudo docker exec -t gitlab gitlab-backup create
sudo docker stop gitlab
sudo docker rm gitlab

Basically, you’ll have to follow a specific upgrade path that can be found at : https://docs.gitlab.com/ce/update/#upgrade-paths

At the time of writing, this is the path:

8.11.x -> 8.12.0 -> 8.17.7 -> 9.5.10 -> 10.8.7 -> 11.11.8 -> 12.0.12 -> 12.1.17 -> 12.10.14 -> 13.0.14 -> 13.1.11 - > 13.x (latest)

So if your version is 11.10, you’ll have to upgrade at 11.11.8, and continue up to the latest.

To update to a version, do the following.

Verify the command to run the container matches what you used to install GitLab in the first place, you should re-use exactly the same command, only the last $VERSION should change:

export GITLAB_HOME=/srv/gitlab
sudo docker run --detach --hostname gitlab.tombarbette.be --env GITLAB_OMNIBUS_CONFIG="external_url 'https://gitlab.tombarbette.be/'; gitlab_rails['gitlab_shell_ssh_port'] = 2022; " --publish 2443:443 --publish 2080:80 --publish 2022:22 --name gitlab --restart always --volume $GITLAB_HOME/config:/etc/gitlab --volume $GITLAB_HOME/logs:/var/log/gitlab --volume $GITLAB_HOME/data:/var/opt/gitlab gitlab/gitlab-ce:$VERSION

The $VERSION should be the version with “-ce.0”, so for instance 11.11.8-ce.0, this is the docker container version that can be found at https://hub.docker.com/r/gitlab/gitlab-ce/tags?page=1&ordering=last_updated

Normally, when you launch that command the update for postgresql should be done automatically. If somehow you start hearing complaints, you can force the upgrade to the version XXX with:

gitlab-ctl pg-upgrade -v XXX

Where XXX is the database version.

After launching a specific version, you have to wait for GitLab to completely start, to be sure all migration was terminated.

If in troubles, you might want to check the logs with :

sudo docker logs -f gitlab

Typically the logs in $GITLAB_HOME starts to be meaningful when this problem is fixed and gitlab completely started, so it was not helpful for me.

So at this point, go back to the version list above and advance one by one…

It may seem crazy, but now to avoid that you have no choices than updating every weeks… Or you’ll have to play with versions again…

Now, I run a cron script that will save and update GitLab every week. Anyway, GitLab is an horror memory-wise and slows down with time. So removing and re-adding the container every week is actually helpful…

#!/bin/bash
sudo docker exec -t gitlab gitlab-backup create
sudo docker stop gitlab
sudo docker rm gitlab
export GITLAB_HOME=/srv/gitlab
sudo docker run --detach --hostname gitlab.tombarbette.be --env GITLAB_OMNIBUS_CONFIG="external_url 'https://gitlab.tombarbette.be/'; gitlab_rails['gitlab_shell_ssh_port'] = 2022; " --publish 2443:443 --publish 2080:80 --publish 2022:22 --name gitlab --restart always --volume $GITLAB_HOME/config:/etc/gitlab --volume $GITLAB_HOME/logs:/var/log/gitlab --volume $GITLAB_HOME/data:/var/opt/gitlab gitlab/gitlab-ce:latest

A poster of our latest work, CrossRSS, a Stateless CPU-Aware Datacenter Load-Balancer

Today we will present a poster of our latest work, at CoNEXT’20 : CrossRSS! CrossRSS is a load-balancer that spreads the load uniformly even inside the servers. It uses knowledge of the dispatching done inside the servers, RSS, to purposely select less-loaded cores without any server modification, or inter-core communications on the server. Learn more by watching the short video!

The poster session will be held on the 4th of December, 2:30 CET on the Mozilla VR Hub

Extended Abstract ; Hub ; Video ; Poster-As-Slides

Dynamic DNS with OVH

It may not be a clear thing, but OVH allows to have your own Dynamic DNS if you rent a domain name, surely a better thing than the weird paid website from dyndns.org. I will explain how to handle the update with Linux using ddclient.

On the manager

Connect to https://www.ovh.com/manager/web/#/configuration/domain/ , select your domain name, and create a new dynhost with the button on the right.

Enter a sub-domain name such as “mydyn” (.tombarbette.be), and add the actual IP for now, or just 8.8.8.8 for the time being.

Then it is not finished, you have to create a login that will be able to update that dns entry. Select the second button to handle accesses and create a new login.

Select a login, probably the name of the subdomain, the subdomain itself, and a password.

On the server

sudo apt install ddclient

Then edit /etc/ddclient.conf

protocol=dyndns2
use=web,web=checkip.dyndns.com
server=www.ovh.com
login=tombarbette.be-mydns
password='password'
mydns.tombarbette.be

Just do “sudo ddclient” to update once then “sudo service ddclient restart” to get it updated automatically.

May this be helpful to someone, personally I just forget it all the time so I wanted to leave a post-it somewhere.

PROXIMUS_AUTO_FON automatic connexion on linux using wpa_supplicant

If you understand this title, you don’t need more explanation :

/etc/network/interfaces
auto wlan1
iface wlan1 inet dhcp
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

/etc/wpa_supplicant/wpa_supplicant.conf
ctrl_interface=/var/run/wpa_supplicant

network={
ssid="PROXIMUS_AUTO_FON"
scan_ssid=1
key_mgmt=WPA-EAP
eap=TTLS
identity="LOGIN@proximusfon.be"
password="PASS1234"
phase2="auth=MSCHAPV2"
}

Some may ask why some people would want to do that… I’m now using Voo, but I use my parent’s FON login when voo crash. My current project is towards aggregating the two links by load balancing, or at least have some kind of automatic failover. The more interesting part would be to switch to “FON only” when I reach my 100Gb limit…

Install and share the Canon Pixma MX395 Scanner with Sane

Found a Pixma MX395 at 27€ yesterday… It’s quite easy to find the Canon debian package to install the printer (use these one and not the included) and “scangearmp” which is the specific tool from Canon to scan, but it is not standard, and do not allow to share your scanner on the network through SANE.

The current version of sane do not support that printer, so you’ll need to use an updated one. Do :

sudo add-apt-repository ppa:rolfbensch/sane-git
sudo apt-get install sane sane-utils libsane

And it’s up !

scangearmp -L should show your scanner :
scanimage -L < ~ [14:04:02] device `v4l:/dev/video0' is a Noname USB2.0 Camera virtual device device `pixma:04A91766_21F9AD' is a CANON Canon PIXMA MX390 Series multi-function peripheral

Also edit /etc/sane.d/saned.cong to add the network subnet which can access the scanner :
10.0.0.0/24
[2a02:578:3fe:8139::]/64

For me. Do not forget the IPv6 address, of course 😉

Then on your client, install sane and edit /etc/sane.d/net.conf to add the server address :
10.0.0.1

And if you run scanimage -L on your client you should now see the remote scanner :
scanimage -L
device `v4l:/dev/video0' is a Noname USB2.0 UVC HD Webcam virtual device
device `net:10.0.0.1:v4l:/dev/video0' is a Noname USB2.0 Camera virtual device
device `net:10.0.0.1:pixma:04A91766_21F9AD' is a CANON Canon PIXMA MX390 Series multi-function peripheral

Proximus BBOX 3 in bridge mode with prefix delegation on Linux

Using bridge mode allows you to get a public IP address on one computer (which can serve as a router) behind your modem. This allows you to know your public IP address without using a third-party service, and control more finely all your routing parameters inside your own Linux-based router (this tutorial) or a better router than the BBOX’s one.

We’ll call “the router” the device you want to use behind the modem for clarity.

The bridge mode of the Proximus BBOX 3 is quite interesting. You connect normally to your BBOX using DHCP and will get a locally routable address (i.e. 192.168.0.0/24), but you can use PPP over Ethernet (PPPoE) to get a virtual interface inside your router. This virtual “ppp” interface will have a public IP address, and packets will flow IN and OUT the internet through that interface.

Proximus allows you to therefore maintain 2 PPP connections, one established by the BBOX (also used for the TV), and the other inside your router. It also means your home gets 2 IPv4 addresses.

I prefer that mode to the VOO one, where the external IP address is given by DHCP to only one host in the LAN, the first device to connect to the router using DHCP (dangerous and prone to configuration errors...). Same and independently for IPv6 using DHCPv6. While Proximus not only gives you an IPv6 address but also a /64 prefix via PPPoE to get a direct connection without using a crappy NAT to all your PCs. For IPv6, Proximus is much simpler than setting up an independent DHCPv6 client which gives back the v6 prefix to your LAN side. The second downside is that VOO must use ugly hacks to allow connection to the box as there is no "modem internal network" anymore. You can access your modem at the normally-illegal 192.168.100.1 address as this is on the "public web" space from the router perspective. Moreover, it seems that the modem stops responding to DHCP requests from time to time, losing connectivity... VOO bridge mode is definitively not good... But this may be a temporary bug. I did not observe this anymore...

The bridge/WAN part

Edit /etc/network/interfaces to add the following lines , assuming that eth0 is the interface used to connect to your BBOX.

auto dsl-provider
 iface dsl-provider inet ppp
 pre-up /bin/ip link set eth0 up
 provider dsl-provider

Install pppoe with sudo apt-get install pppoe on ubuntu/debian or sudo yum install pppoe centos/fedora

Then create a file named /etc/ppp/peers/dsl-provider and add the following lines :

noipdefault
defaultroute
replacedefaultroute
hide-password
noauth
persist
mtu 1492
plugin rp-pppoe.so eth0
user "fc0123456@skynet"
usepeerdns

Then edit the file /etc/ppp/chap-secrets and add the line :
"fc012345@skynet" * "password"

If you lost your skynet credentials (personally, I just never received them), you can change them online on MyProximus. You’ll have to reboot your modem so it receives automatically the new credentials.

And that’s all, you can reboot or do a sudo pon dsl-provider and you’ll have a new interface with a public IPv4 and a /64 IPv6.

The router/LAN part

To give connectivity in IPv4 for your hosts and use your Linux host as a router, you’ll have to do a NAT. But you can delegate your IPv6 range and give public IPv6 addresses to all your PCs using SLAAC! Remember to also install a firewall…

To do so, install radvd and add in /etc/radvd.conf (if br0 is the interface connected to your internal network) :

interface br0

{
 AdvSendAdvert on;
 prefix ::/64
 {
   AdvOnLink on;
   AdvAutonomous on;
   AdvRouterAddr on;
 };
 RDNSS 2001:4860:4860::8888 2001:4860:4860::8844
 {
   # AdvRDNSSLifetime 3600;
 };
};

Then do a sudo radvd restart and that’s it.

The RDNSS line gives the address of Google’s public DNS to your host. We could use Proximus’ one, but I don’t have the address on hand.

Do not hesitate to contact me!

Creating a dynamic and redundant array with LVM and MDADM

RAID5 allows to create an array of N+1 drives where N is the number of drives which will contain real data. The last drive will be used to store parity about the other drives (in practice, the parity information is stored by chunks across all drives and not only on one drive). RAID 5 allows to loose any of the drive without loosing the data thanks to the parity drive, and has a cheaper cost than RAID 1 where the usable data will be N/ instead of N-1.

MDADM is the tool of predilection to build a RAID5 drive. Given 3 disks, the command to build a raid 5 array is :

[code lang=”bash”]mdadm –create /dev/md0 –level=5 –raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1[/code]

Problem is, RAID5 drives are not easily splittable/shrinkable/resizable, the operation is complex and must be done offline. The solution is to use LVM on top of MDADM to build a big volume group which will be “protected” by RAID5 allowing to make dynamic paritions on it :

[code lang=”bash”]pvcreate /dev/md0
vgcreate group0 /dev/md0[/code]

And then create multiple, online-resizeable partitions with :

[code lang=”bash”]lvcreate /dev/group0 -n system -L 10G
mkfs.ext4 /dev/mapper/group0-system[/code]

[code lang=”bash”]lvcreate /dev/group0 -n home -L 50G
mkfs.ext4 /dev/mapper/group0-home[/code]

To resize a partition, one can do :

[code lang=”bash”]lvresize /dev/mapper/group0-home -L +10G
resize2fs /dev/mapper/group0-home[/code]

Which will add 10G to the partition, and resize it. It will work even with the system partition, without needing any reboot.