How to pass host device to podman container in user mode using podman-compose

One of the main ideas of containerization is a way to achieve repeatable, hassle-free results. So, when I see headlines that podman gives the same experience and supports docker-compose like deployments (or even provides socket for native docker-compose), I expect that I can grab someone else’s docker-compose file, run $ podman-compose up and enjoy working installation of something. But it’s a trap.

In theory, if you leave service state management out of the equation, it works, but only if the payload doesn’t need access to devices, privileged ports, and you don’t need to organize interpods file sharing or sharing files between the host and pods.

Nevertheless, from time to time when I install a new host with the intention to use containers, I give podman a try as the “new shiny better” alternative to docker. I allocate time (like one evening), and if I’m unable to make it work as I need, I purge it and fall back to docker.

The last time I wanted to run prind in podman, my idea was:

  1. Create separate user
  2. Add it to dialout and video groups on the host system
  3. Run services in user-mode with podman-compose

Looks simple, right? But not in the case of podman.
First of all, device pass-thru doesn’t work in user mode, so if you want to use podman’s --device option, or device: section of compose file, you have to run it under root, which immediately wipes out a major part of podman’s advantages.
Second, podman uses user namespaces, uids:gids which  you see in a container not the same as in a host system. Because of that you may have user root in container which started by regular user, but in fact uids:gids will be offset to a host system by values configured in /etc/{subuid,subgid}, which means that even you have video group in the container, it will be different from the video group that the host system user may be a member of.
Because of that, my first two steps of the plan don’t work out of the box.

To counter that, quite recently, podman has an elegant and beautiful solution: you should add a group named keep-groups to the container with the --group-add option. So, here it is? You just add the group keep-groups to the groups section of the compose file? End of the story? Everything works as expected and everyone is happy? Nope!

For some reason it doesn’t work (at least in debian’s podman-compose 1.0.6-1~bpo12+1). But before this masterpiece was born, the same effect may be achieved with --annotation run.oci.keep_original_groups=1 and it works with compose. You need to mount volume /dev and add the next section:

services:
klipper:
...
volumes:
- /dev:/dev
annotations:
run.oci.keep_original_groups: 1
...

It makes groups looks weird and counter-intuitive in containers and forbids you adding new groups with `groups:` section, but it’s better than nothing:

print3d@aurora:~$ id
uid=1001(print3d) gid=1001(print3d) groups=1001(print3d),20(dialout),44(video),100(users)
print3d@aurora:~$ podman run --group-add keep-groups -v /dev:/dev -it m5p3nc3r/v4l-utils
✔ docker.io/m5p3nc3r/v4l-utils:latest
Trying to pull docker.io/m5p3nc3r/v4l-utils:latest...
Getting image source signatures
Copying blob 366c4c59e228 done
Copying blob 5843afab3874 done
Copying config 13e86697f5 done
Writing manifest to image destination
Storing signatures
/ # id
uid=0(root) gid=0(root) groups=65534(nobody),65534(nobody),65534(nobody),0(root)
/ # ls -l /dev/{ttyUSB1,video0}
ls: /dev/{ttyUSB1,video0}: No such file or directory
/ # ls -ln /dev/ttyUSB1
crw-rw---- 1 65534 65534 188, 1 Dec 26 15:26 /dev/ttyUSB1
/ # ls -ln /dev/video0
crw-rw---- 1 65534 65534 81, 0 Dec 17 15:01 /dev/video0
/ # v4l2-ctl --list-devices
UVC Camera (046d:0825) (usb-0000:04:00.3-5):
/dev/video0
/dev/video1
/dev/media0

LVM recovery

Few days ago i made mistake and forced fsck to check partition that contain LVM instead of logic volume, as result i got broken LVM metadata. I was unable to see volume group an logic volumes.
pvs output looked like that:

# pvs -v

Scanning for physical volume names
Incorrect metadata area header checksum

I tried to run pvck but it did not help me, it founded corrupted metadata but did not repair LVM:

# pvck -d -v /dev/md5
Scanning /dev/md5
Incorrect metadata area header checksum
Found label on /dev/md5, sector 1, type=LVM2 
Found text metadata area: offset=4096, size=193024
Incorrect metadata area header checksum

Finally i founded that it’s possible to make backups of LVM metadata and restore it when needed, but i think that i had only broken LVM with broken metadata.
It’s hard to describe how happy I was when I found that by default LVM create backups of metadata when you make any changes. I found it into /etc/lvm/backup dir, after that recovery become easy task, first i recreate physical volume:

pvcreate -u b3Lk2a-pydG-Vhf3-DSEJ-9b84-RLm9-UEr6r3 --restorefile /etc/lvm/backup/vg-320 /dev/md5

UUID can be founded in pv section into metadata file:

 physical_volumes {
 
 pv0 {
 id = "<strong>b3Lk2a-pydG-Vhf3-DSEJ-9b84-RLm9-UEr6r3</strong>"
 device = "/dev/md5" # Hint only

Next i restored volume group:

vgcfgrestore -f /etc/lvm/backup/vg-320 vg-320

After that logical volumes became visible:

# lvs
 LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
 root vg-320 -wi-a--- 15.00g 
 swap vg-320 -wi-a--- 1.00g 
 var vg-320 -wi-ao-- 200.00g 
 zoneminder vg-320 -wi-a--- 15.00g

After reinitialization with vgscan -v && vgchange -ay commands, volume groups ready for fsck.

Libvirt + vnc + sasl

Error: connection to hypervisor host got refused or disconnected!Yesterday i wanted to configure libvirt with kvm virtualization, while i read comments in config file, i observed, that qemu can share credentials  with  libvirt via sasl. Also i found few how-to, that said ‘just copy /etc/sasl2/libvirt.conf to /etc/sasl2/qemu.conf’.
I done that, but when i tried to open console of VM i got “Error: connection to hypervisor host got refused or disconnected!”.
May be you think, that you can find something interesting in log? Nope. May be you think that you can run virt-manager in debug mode and will see something useful? Nope. The reason, why this happened is because, libvirt  run as root, but they start VM’s as libvirt-qemu user. And sasl2 database has owner root:root and 640 permissions. I changed owner of /etc/libvirt/passwd.db to libvirt-qemu:root and problem is gone.

Pulseview compilation

Half year ago i wanted to make device that can be used to clone ski pass. I thought that ski passes use RFID 125kHz. First i bought itead module RDM6300 but it turned out that it can only read tags, so i bought  EM4095 chip. At this time i also noticed, that most ski passes use MIFARE tags that operated at 13MHz.
Anyway i want to complete this project and build device that can read and write 125kHz tags (really there is to many different tags that operate on 125kHz and uses different protocols, so i want to start with EM4100 tags).That tags use manchester encoding to transfer data, also tags can use different bitrate. It is easy task to encode data into manchester, but it’s really pain in the ass if you want to decode them and does not know bitrate.
I have clone of saleae logic analizer so i decided to practice with decode manchester with libsigrokdecode. Sigrok have ‘official’ gui for libsigrok and libsigrokdecode called Pulseview.
I found that debian wheezy have old libsigrok and do not have pulseview at all, after that i decided to build sigrok and pulseview from scratch. It is really not easy quest, because additionally to libsigrok, libsigrokdecode you need to compile old libusb and libvisa.
Finally when i compiled all that stuff, i faced with errors when i tried to compile pulseview with decoders support.

First, libsigrokdecode need Python >= 3.0, Python.h placed in python3.2/Python.h, so you need to change it into libsigrokdecode.h:

./include/libsigrokdecode/libsigrokdecode.h:#include <python3.2/Python.h> /* First, so we avoid a _POSIX_C_SOURCE warning. */

Second, if you will got that error:

[ 40%] Building CXX object CMakeFiles/pulseview.dir/pv/view/decodetrace.cpp.o
/var/tmp/sigrok/pulseview/pv/view/decodetrace.cpp: In member function ‘virtual void pv::view::DecodeTrace::paint_mid(QPainter&, int, int)’:
/var/tmp/sigrok/pulseview/pv/view/decodetrace.cpp:203:3: error: ‘hash_combine’ is not a member of ‘boost’
/var/tmp/sigrok/pulseview/pv/view/decodetrace.cpp:204:3: error: ‘hash_combine’ is not a member of ‘boost’
/var/tmp/sigrok/pulseview/pv/view/decodetrace.cpp:205:3: error: ‘hash_combine’ is not a member of ‘boost’
make[2]: *** [CMakeFiles/pulseview.dir/pv/view/decodetrace.cpp.o] Error 1

Then you need to add “#include <boost/functional/hash.hpp>” into /var/tmp/sigrok/pulseview/pv/view/decodetrace.cpp

Third,  if you got that:

CMakeFiles/pulseview.dir/pv/data/decoderstack.cpp.o: In function `pv::data::DecoderStack::decode_proc(boost::shared_ptr<pv::data::Logic>)':
/var/tmp/sigrok/pulseview/pv/data/decoderstack.cpp:267: undefined reference to `srd_session_new'
/var/tmp/sigrok/pulseview/pv/data/decoderstack.cpp:283: undefined reference to `srd_inst_stack'

You need to add -lsigrokdecode into CMakeFiles/pulseview.dir/link.txt

I spent too many time to compile that stuff, so i decided to place here archive with complete libsigrok, libsigrokdecode, libvisa, libusb, sigrok and pulseview. I compiled it with preffix /opt/sigrok, so if you want to use it, place that stuff into /opt and run like that:

LD_LIBRARY_PATH=/opt/sigrok/lib /opt/sigrok/bin/pulseview

Enjoy: sigrok.tar
md5: 7bbb1d434959848c741230fe90a590c5 /tmp/sigrok.tar.gz
PS
Also you must install  libboost-thread.

Zram

Few months ago, i tried very cool feature called ‘zram’. It is linux kernel module that allow to create compressed block devices into memory, it can be used for creating compressed fs in ram  (/tmp for example) or for swap.
May be you think, that in context of swap, it is dumb to keeping  memory pages in memory when OS need that memory. =) But compress and store pages in memory is faster than write it onto disk, in most cases memory pages can be heavily compressed, that will help OS to free RAM, if you have SSD it will save life of your disk, also you can continue using swap on disk. If you want to keep you swap partition on-line, you must give higher priority for swap in zram, when zram will full, OS will started to using swap on disk.

I used that init.d script for debian, but i changed it to use not  a whole RAM for zram devices, but half of all memory (in worst case, when pages can not be compressed, zram will use only half of my memory). If you want to do same modification, just change echo $((mem_total / num_cpus )) to echo $((mem_total / num_cpus / 2)) in that script.
Without modifications this script will slice you memory by number of CPU core in your system, create swaps on that slices and attach it to your system with priority 100 (usualy swap partitions have priority -1).
I made simple test of compression ratio for zram:
Detached one of my swaps:

$ sudo swapoff /dev/zram3

Created core file of iceweasel process and wrote it into zram:

$ pgrep -lf icewea
3375 sh -c /usr/bin/iceweasel
3376 /usr/bin/iceweasel
$ gcore 3376
[Thread debugging using libthread_db enabled]
[New Thread 0x7f7c0b9fd700 (LWP 8455)]
...blablabla...
0x00007f7c62a57c13 in poll () from /lib/libc.so.6
Saved corefile core.3376
$ sudo dd if=./core.3376 of=/dev/zram3
dd: writing to `/dev/zram3': No space left on device
2027297+0 records in
2027296+0 records out
1037975552 bytes (1,0 GB) copied, 6,79003 s, 153 MB/s

Core file does not fit completely into zram device, but it is dose not mater, let’s look at compression ratio:

$ cd /sys/block
$ echo `cat ./zram3/orig_data_size`/`cat ./zram3/compr_data_size`|bc
2.68475926964795164955

So, in most cases zram has compress ratio more than 2.5.
Huh, i think it is pretty cool.

How to fix ‘Timezone database is corrupt – this should *never* happen!’

Today i upgraded to wheezy. When i entered my blog i observed that it does not work anymore. I looked into logs and found next errors:

[26-Jun-2013 12:59:41 UTC] PHP Fatal error:  date(): Timezone database is corrupt - 
this should *never* happen! in ...
[26-Jun-2013 12:59:49 UTC] PHP Fatal error:  strtotime(): Timezone database is 
corrupt - this should *never* happen! in /html/wp-includes/functions.php on line 33...

After a little investigation I discovered that happens when you use php in chroot enviroment (i using php5-fpm with chroot, so it is my case). I tried to copy /usr/share/zoneinfo in chroot environment with parent dir structure and correct permissions, but nothing change. Somewhere i read that it problem can happen in debian, because maintainers of php packages, patch source files, the solution – is to install tzdatadb:

apt-get install php-pear php5-dev
pecl install timezonedb
echo 'extension=timezonedb.so'> /etc/php5/mods-available/timezonedb.ini
ln -sf /etc/php5/mods-available/timezonedb.ini /etc/php5/conf.d/30-timezonedb.ini
service php5-fpm restart

After that all work like a charm.

PS

strings /usr/sbin/php5-fpm|grep Quake| head -n8
Quake I save: ddm4 East side invertationa
Quake I save: d7 The incinerator plant
Quake I save: d12 Takahiro laboratories
Quake I save: e1m1 The slipgate complex
Quake I save: e1m2 Castle of the damned
Quake I save: e2m6 The dismal oubliette
Quake I save: e3m4 Satan's dark delight
Quake I save: e4m2 The tower of despair

Easter egg?

Kali linux on LiveUSB with working persistent partition

Few days ago i wanted to make liveusb with kali linux (i had backtrack before). I used this guide to install kali, but i observed, that persistence partition does not work. There is partitions on my usb drive:
Kali - old partition table
When init script found persistent partition and tried to mount  their return error “mount: mounting /dev/sdaX on /root/lib/live/mount/persistence/sdaX failed: Device or resource busy”.
I think this happened because official guide suggest to write iso9660 image on usb drive, and init script think that it is cd drive and mount whole usb device, not a partition where iso placed. This i found in boot.log:

There you can see, that after kali boot, usb drive (sda) still mounted, and i can not mount second partition:
Kali1
After that whole usb drive is busy.  I attached boot.log to this post with enabled debug, may be it will help someone to fix that.

I decided to make bootable usb disk instead of flashing iso on it. For doing that i used extlinux and original kali iso file.
First i create 2 partitions on usb drive, one for kali and second for persistent files:

Kali2

Do not forget to set bootable flag on first partition and correct label for persistent paririon.
After that, install mbr from extlinux:

$ dd if=/usr/lib/extlinux/mbr.bin of=/dev/sda
0+1 records in
0+1 records out
440 bytes (440 B) copied, 0.00126658 s, 347 kB/s

Copy kali linux on first partition:

$ mkdir /mnt/sr0 /mnt/kali
$ mount /dev/sr0 /mnt/sr0/
mount: block device /dev/sr0 is write-protected, mounting read-only
$ mount /dev/sda1 /mnt/kali/
$ rsync -a /mnt/sr0/* /mnt/kali

Also i modify boot menu and add entry with persistence boot option at live.cfg:

label live-686-pae-persistence
menu label ^Live persistence (686-pae)
menu default
linux /live/vmlinuz
initrd /live/initrd.img
append boot=live noconfig=sudo username=root hostname=android-53f31a089339194f persistence

After that you need to rename isolinux.cfg to extlinux.conf and install extlinux:

$ cp /mnt/kali/isolinux/isolinux.cfg /mnt/kali/isolinux/extlinux.conf
$ extlinux --install /mnt/kali/isolinux/
/mnt/kali/isolinux/ is device /dev/sda1

Mount persistence partition and create config:

$ mkdir /mnt/persist
$ mount /dev/sda2 /mnt/persist/
$ echo "/ union" > /mnt/persist/persistence.conf

After that you can reboot and check, that persistent partition work.
Kali3

Split and encode FLAC/CUE to mp3/ogg in one run.

flacsFew days ago i  needed to split and encode one flac file to mp3. I found few solutions, one of them only split flac file, other encode flac to mp3, but no one do not do it in one run.
So, i wrote script, you must specify flac and cue/toc files, after that script will convert flac file to group of mp3 or ogg files and will add tags.
To get this script work, you need to install next packages “cuetools”, “shntool”, “id3v2”, “vorbis-tools” and “lame”.
You will not found lame in standard repositories of squeeze, but you can install it from backports or from debian multimedia repository (all packages available in modern debian based distribution, at least I tested it in debian 8-10).
Usage: flac2mp3 -f /path/to/flac.flac -s /path/to/cue.cue
Also, you can choose between mp3 or ogg with swich -e mp3 or -e ogg (mp3 will be used by default).

Latest version of script available there: https://gist.github.com/IvanBayan/b8a1e7a1629de20b06ef6dc44843c9fe

#!/bin/bash
 
LAMEOPTS="-b 320 --quiet"
OGGOPTS="-b 320 --quiet"
 
while getopts ":s:f:e:" opt; do
  case $opt in
    f)
      FILE="$OPTARG"
      ;;
    s)
      SPLIT="$OPTARG"
      ;;
    e)
      FMT="$OPTARG"
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
    :)
      echo "Option -$OPTARG requires an argument." >&2
      exit 1
      ;;
  esac
done
 
# Set default format to mp3
case ${FMT:="mp3"} in
    mp3|MP3|ogg|OGG)
    ;;
    *)
        echo "Unknown format $FMT" >&2
        exit 1
    ;;
esac
 
# Check both files
if [ ! -f "$FILE" -o ! -f "$SPLIT" ]
then
    echo "You must specify correct flac file and CUE/TOC file." >&2
    exit 1
fi
 
#Get number of tracks
NUMTRACKS=`cueprint -d '%N' "${SPLIT}"`
 
for i in `seq 1 $NUMTRACKS`
do
    # Clear previous obtained variables 
    unset PERFORMER
    unset ALBUM
    unset TITLE
 
    # Set performer, album, track title
    PERFORMER=`cueprint -n $i -t '%p' "${SPLIT}"`
    ALBUM=`cueprint -d '%T' "${SPLIT}"`
    TITLE=`cueprint -n $i -t '%t' "${SPLIT}"`
 
    ## Check perfomer, album and track title
    if [ -z "$PERFORMER" ]
    then
        echo "Track $i: Can not obtain performer from cue, set it to 'Unknown Artist'" >&2
        PERFORMER="Unknown Artist"
    fi
 
    if [ -z "$ALBUM" ]
    then
        echo "Track $i: Can not obtain album from cue, set it to 'Unknown Album'" >&2
        ALBUM="Unknown Album"
    fi
 
    if [ -z "$TITLE" ]
    then
        echo "Track $i: Can not obtain track from cue, set it to 'Track $i'" >&2
        TITLE="Track $i"
    fi
 
    ## End
 
    # Split and encoding files
    echo "Encoding track $i/$NUMTRACKS."
    if [ "$FMT" = "mp3" -o "$FMT" = "MP3" ] 
    then        
        cuebreakpoints "$SPLIT"| shnsplit -q -o "cust ext=mp3 lame  $LAMEOPTS - %f" \
-x $i -O always -a "" -z " - $TITLE" "$FILE"
        OUTPUT=`printf "%.2d - ${TITLE}.mp3" $i`
        id3v2 -T $i -a "$PERFORMER" -A "$ALBUM" -t "$TITLE" "$OUTPUT"
    fi
 
    if [ "$FMT" = "ogg" -o "$FMT" = "OGG" ]
    then
        cuebreakpoints "$SPLIT"|shnsplit -q -o "cust ext=ogg oggenc $OGGOPTS -o %f -" \
-x $i -O always -a "" -z " - $TITLE" "$FILE"
        OUTPUT=`printf "%.2d - ${TITLE}.ogg" $i`
        echo -e "TRACKNUMBER=${i}\nARTIST=${PERFORMER}\nPERFORMER=${PERFORMER}" \
                             "\nALBUM=${ALBUM}\nTITLE=${TITLE}\n"|vorbiscomment "$OUTPUT"
    fi    
 
done

Failsafe zoneminder with gluster and geo-replication.

Near year ago i configured zoneminder for monitoring my approach. But i got one problem, host that running zoneminder placed in same approach, so, if it will be stolen, it will be stolen with recordings. I spend a lot of time while choosing solution to organize replication on remote server. Most solution that i found work in “packet” mode, they start replication once in NN sec or after accumulating a certain number of events and usually use rsync. Rsync allow to reduce traffic usage, but increase time between event when new frame will be wrote on disk and event when frame will be wrote on remote side. In this situation every second counts, so I thought that solutions like “unison”, “inosync”, “csync2”, “lsyncd” not applicable.

I tried to use drbd, but get few problems. First – i do not have separate partition on host and can not shrink existing partitions to get new. I tried it with loopback devices, but drbd have deadlock bug when used with loopback. This solution work near two days after deadlock occurs, you can not read content of mounted fs or unmount it and only one solution to fix it that i found – reboot. Second problem – “split brain” situation, when master host restart. I did not spent time to found solution because all ready had first problem. May be it can be fixed with split-brain handlers script or with “heartbeat”. Drbd has great write speed in asynchronous mode and has small overhead, so, may be in another situation i will choose drbd.

Next what i tried was “glusterfs”, first i tried glusterfs 3.0. It was easily to configure it, and glusterfs was worked, but very slow. For good performance glusterfs need short latency between hosts. It useful for local networks but completely useless for hosts connected over slow links. Also glusterfs 3.0 did not have asynchronous write like in drbd. I temporarily thrown searching for solutions, but after a while in release notes for glusterfs 3.2 i found that gluster got “geo-replication“. First i think that this it what i need and before has understood (see conclusion) how it work i started to configure it.

If you will have troubles with configuration, installation  here you can find manual.
For Debian, first that you need is to add backports repository on host with zonemnider and on hosts where you planned to replicate data:

$ echo 'deb http://backports.debian.org/debian-backports squeeze-backports \
main contrib non-free' >> /etc/apt/sources.list
$ apt-get update
$ apt-get install glusterfs-server

After that you must open ports for gluster on all hosts where you want to use it:

$ iptables -A INPUT -m tcp -p tcp --dport 24007:24047 -j ACCEPT 
$ iptables -A INPUT -m tcp -p tcp --dport 111 -j ACCEPT 
$ iptables -A INPUT -m udp -p udp --dport 111 -j ACCEPT 
$ iptables -A INPUT -m tcp -p tcp --dport 38465:38467 -j ACCEPT

If you do not planned to use glusterfs over NFS (as i) you can skip last rule.
After ports will opened, gluster installed and service running you need to add gluster peers. For example you run zoneminder on host “zhost” and want to replicate it on host “rhost” (do not forget add hosts in /etc/hosts on both sides).  Run “gluster” and add peer (here and below gluster commands must to be executed on master host, i.e. on zhost):

$ gluster
gluster> peer probe rhost

Let’s check it:

gluster> peer status
Number of Peers: 1
 
Hostname: rhost
Uuid: 5e95020c-9550-4c8c-bc73-c9a120a9e96e
State: Peer in Cluster (Connected)

Next  necessary to create volumes on both hosts, do not forget to create directories where gluster will save their files. For examples you planned to use “/var/spool/glusterfs”:

gluster> volume create zoneminder transport tcp zhost:/var/spool/glusterfs
Creation of volume zoneminder has been successful. Please start the volume to access data.
gluster> volume create zoneminder_rep transport tcp rhost:/var/spool/glusterfs
Creation of volume zoneminder_rep has been successful. Please start the volume to access data.
gluster> volume start zoneminder
Starting volume zoneminder has been successfu
gluster> volume start zoneminder_rep
Starting volume zoneminder_rep has been successfu

After that you must configure geo-replication:

gluster&gt; volume geo-replication zoneminder gluster://rhost:zoneminder_rep start
Starting geo-replication session between zoneminder &amp; gluster://rhost:zoneminder_rep has been successful

And check that it is work:

gluster> volume geo-replication status
MASTER               SLAVE                                              STATUS    
--------------------------------------------------------------------------------
zoneminder           gluster://rhost:zoneminder_rep           starting...
gluster> volume geo-replication status
MASTER               SLAVE                                              STATUS    
--------------------------------------------------------------------------------
zoneminder           gluster://rhost:zoneminder_rep           OK

Also i made next configurations:

gluster> volume geo-replication zoneminder gluster://192.168.107.120:zoneminder_rep  config sync-jobs 4
geo-replication config updated successfully
gluster> volume geo-replication zoneminder gluster://192.168.107.120:zoneminder_rep  config timeout 120
geo-replication config updated successfully
gluster> volume set zoneminder nfs.disable on
Set volume successful
gluster> volume set zoneminder_rep nfs.disable on
Set volume successful
gluster> volume set zoneminder nfs.export-volumes off
Set volume successful
gluster> volume set zoneminder_rep nfs.export-volumes off
Set volume successful
gluster> volume set zoneminder performance.stat-prefetch off
Set volume successful
gluster> volume set zoneminder_rep performance.stat-prefetch off
Set volume successful

I disabled nfs because i did not planned to use gluster volumes over nfs, also i disabled prefetch because they produce next error on geo-ip modules:

E [stat-prefetch.c:695:sp_remove_caches_from_all_fds_opened] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(fuse_setattr_resume+0x1b0) [0x7f2006ba8ed0] (-->/usr/lib/glusterfs/3.2.7/xlator/debug/io-stats.so
(io_stats_setattr+0x14f) [0x7f200467db8f] (-->/usr/lib/glusterfs/3.2.7/xlator/performance/stat-prefetch.so(sp_setattr+0x7c) [0x7f200489c99c]))) 0-zoneminder_rep-stat-prefetch: invalid argument: inode

In conclusion i must to say, that it is not a solution that i looked for, because as it turned out  gluster use rsync for geo-replication. Pros of this solution: it is work and easy to setup. Cons it is use rsync. Also i expected that debian squeeze can not mount gluster volumes in boot sequence, i tried to modify /etc/init.d/mountnfs.sh but without result, so i just add mount command in zoneminder start script.

PS
echo ‘cyclope:zoneminder      /var/cache/zoneminder   glusterfs       defaults        0       0’ >> /etc/fstab
And add ‘mount /var/cache/zoneminder’ in start section of /etc/init.d/zoneminder

PPS
Do not forget to stope zoneminder and copy content of /var/cache/zoneminder on new partition before using it.

How to block large ip subset on the example of TOR

I wrote before how i blocked TOR exit nodes by iptables, disadvantages of method that i used before – big amount of rules (one per each ip). This solution easy and obvious, but had speed penalty. Today i want write about more effective solution that use ipset, let’s see what is ipset:

IP sets are a framework inside the Linux 2.4.x and 2.6.x kernel, which can be administered by the ipset utility. Depending on the type, currently an IP set may store IP addresses, (TCP/UDP) port numbers or IP addresses with MAC addresses in a way, which ensures lightning speed when matching an entry against a set. If you want to store multiple IP addresses or port numbers and match against the collection by iptables at one swoop; dynamically update iptables rules against IP addresses or ports without performance penalty; express complex IP address and ports based rulesets with one single iptables rule and benefit from the speed of IP sets then ipset may be the proper tool for you.

(c)

Because of exit nodes can exist on hosts that using dynamic IP i want to delete addresses from list after timeout (otherwise list will always growing and contain unused addresses or addresses without TOR exit nodes, larger list requires more memory and CPU time for processing). If addresses persist between list updates, timeout will be reseted.

To do this i used iptree set (you can learn more about set types in manual for ipset), because that type provide timeout for each address.

First i installed ipset:

# apt-get install xtables-addons-common

After that i modify scripts that i used before. In perl script i made a pair of minor bug fixes and in shell script i add new facility for loading rules into ipset.
Perl script:

#!/usr/bin/perl -w
use strict;
use LWP::Simple;
 
my $list = get("http://exitlist.torproject.org/exit-addresses");
my $i;
my @ips;
 
if( ! defined( $list ) ) {
    exit( 1 );
}
if( $#ARGV == -1 ) {
exit(0);
}
 
foreach $i (split( /\n/, $list )) {
    push( @ips, $1 ) if( $i =~ m/((?:\d{1,3}\.){3}\d{1,3})/);
}
 
if( $ARGV[0] eq "-ip" ) {
    print( join( "\n", @ips ) . "\n");
}

That script return exit code without arguments that signaled can this script fetch addresses or not, with parameter “-ip” they return list of addresses.
Next shell script:

#!/bin/bash
SOLVE="/root/bin/solve.pl"
IPSET="/usr/sbin/ipset"
 
case "$1" in
ipset)
        if ! /usr/bin/perl $SOLVE
        then
                echo "Can not fetch Tor exit nodes" 1>&2
                exit 1
        fi
        if ! $IPSET -L tor 2>&1 > /dev/null
        then
                $IPSET -N tor iptree --timeout 259200
        fi
        for i in `/usr/bin/perl $SOLVE -ip`
        do
                $IPSET -A tor $i 2> /dev/null
        done
 
;;
 
iptables)
        if /usr/bin/perl $SOLVE
        then
                /sbin/iptables -F TorExitnodes
                /sbin/iptables -I TorExitnodes -j RETURN
 
                for i in `/usr/bin/perl $SOLVE -ip`
                do
                 /sbin/iptables -I TorExitnodes -s $i -j TorBlockAndLog
                done
        else
                        echo "Can not fetch Tor exit nodes" 1>&2
        exit 1;
        fi
 
;;
 
*)
        echo "Usage ./$0 "
        exit 1
;;
esac
 
exit 0

This script can add rules into iptables (old variant) or into ipset. I added this script in cron and run every few hours. When ipset found that address all ready in list, they update timeout, if address will not observed in 72 hours, they will be automatically deleted.

Finishing touch – new rule in iptables:

# iptables -A INPUT -i eth+ -m set --match-set tor src -m comment --comment "Block TOR exit nodes thru IPSET" -j DROP

That’s all. Do not forget to place this rule before rules where you permit access to your server.