yet another blog : en

Sat, 03 Jun 2006

Encrypting backups with one's gpg public key for fun and profit :-)

As part of my diseba project, I'm looking at ways to implement a secure backup on a remote machine. It appears that there is a simple way to ensure that the backup can only be read by yourself: encrypt it with gpg using your public key (gpg --encrypt --recipient is your friend). Only you with access to your private key can decrypt the backup. Of course, as a good friend told me, you'd better save your private key in a safe place outside your backup! :-)

Such a system could be implemented in existing softwares like backup2l or BackupPC.

2006-06-02T22:04:19Z [/ideas] permanent link

Wed, 29 Mar 2006

How arguments are passed to a shebang script

When implementing ocamlscript, I realized that argument passing for a shebang script (a Unix script starting with #!) was rather undocumented. So here is a tiny documentation of this rather contrived argument passing convention.

Suppose you have the OCaml program:

let _ =
  for i = 0 to Array.length Sys.argv  - 1 do
    Format.printf "argv(%d): \"%s\"@\n" i Sys.argv.(i)
  done
Compiled with: ocamlopt -o argv-test argv-test.ml.

And suppose you have following script-with-args shebang script:

#!./argv-test intarg1 intarg2 intarg3

and following script-without-args shebang script:

#!./argv-test

As a reference, if argv-test is called directly, you have:

$ ./argv-test arg1 arg2 arg3
argv(0): "./argv-test"
argv(1): "arg1"
argv(2): "arg2"
argv(3): "arg3"
Now, let's compare the different ways of calling the shebang script:
$ ./script-without-args 
argv(0): "./argv-test"
argv(1): "./script-without-args"

$ ./script-without-args arg1 arg2 arg3
argv(0): "./argv-test"
argv(1): "./script-without-args"
argv(2): "arg1"
argv(3): "arg2"
argv(4): "arg3"

$ ./script-with-args 
argv(0): "./argv-test"
argv(1): "intarg1 intarg2 intarg3"
argv(2): "./script-with-args"

$ ./script-with-args arg1 arg2 arg3
argv(0): "./argv-test"
argv(1): "intarg1 intarg2 intarg3"
argv(2): "./script-with-args"
argv(3): "arg1"
argv(4): "arg2"
argv(5): "arg3"

You see the pattern? :) If the shebang script is called with arguments given on the shebang line, those arguments are available in a single string as argv(1). Otherwise, argv(1) is equal to the name of the shebang script itself. If any argument in given on the command line to the shebang script, they are available as normal argument, after binary name, optional internal arguments and script name. To determine if the shebang script has internal arguments: if the called binary has at least three arguments, one should test if argv(1) is the name of an real file (and thus the script has not argument) or not (and thus the script has arguments).

As a side note, if one wants to use the env utility:

$ cat ./env-script
#!/usr/bin/env ./argv-test 

$ ./env-script 
argv(0): "./argv-test"
argv(1): "./env-script"

$ ./env-script arg1 arg2 arg3
argv(0): "./argv-test"
argv(1): "./env-script"
argv(2): "arg1"
argv(3): "arg2"
argv(4): "arg3"

However env does not support arguments on the shebang line:

$ cat ./env-script-with-args 
#!/usr/bin/env ./argv-test intarg1 intarg2 intarg3

$ ./env-script-with-args 
/usr/bin/env: ./argv-test intarg1 intarg2 intarg3: No such file or directory

2006-03-29T19:38:24Z [/misc] permanent link

Tue, 28 Mar 2006

Opening of diseba project on Gna!

I have opened diseba project on Gna!. You can browse the source code, download it under Subversion (aka svn) and subscribe to the mailing lists. Please join us!

2006-03-28T19:37:42Z [/misc] permanent link

Sat, 25 Mar 2006

Diseba: a distributed and secure backup (take II)

Continuing on the idea of a solution for distributed and secure backup, I have now a more clear view of the software. Compared to my first original idea, I changed some details. However the cryptographic details could be kept as is. And I have found a name for the project: diseba. :)

Overview

The general idea is the same as in my first note: save the content of your hard disk on your friends' hard-disks, in such a way that only you can read those copies (through strong encryption). Moreover, the network of friends is closed (i.e. you must agree with the others to enter the "club").

The main idea is to divide the backup into two main steps: storing the content to backup into a set of encrypted blocks and then distributing those encrypted blocks onto the hard-disks of your friends. More specifically, a backup phase is divided into four steps:

  1. Determine which files are new or updated;
  2. Cut them in fixed size and encrypted blocks, on local disk, along with a Master Index that contains all needed information to restore a backup;
  3. Save those blocks on the remote hard-disks, distributing enough copies to ensure long term availability. Two methods could be used: rsync or a bittorrent-like distribution;
  4. Send to remote nodes the list of blocks and master index we want to keep, all other blocks or master indexes for this node being erased.

The main disadvantage of this approach is that it might use, in the worse case, twice the disk space of the saved space. But its big advantage is it implementation simplicity and quite good security.

At any point in the above steps, the local software stores on disk its state so it can restart from where it was stopped.

In steps 1 and 2, all files, links and directories of the directories to backup are examined:

As result, a file is saved with following structure:

  +------------------+  +------------------+  +------------------+
  |IV| file content  |  |   file content   |  | file content |MAC|
  +------------------+  +------------------+  +------------------+
     encrypted block       encrypted block       encrypted block

Content of the Master Index

The Master Index is a password encrypted compressed file that contains all the information needed to restore a backup from a set of encrypted blocks. More specifically, it contains:

The Master Index is added a MAC, encrypted and stored on disk with salt and number of hash iterations for decryption.

On disk backup structure

The backup on the local disk has following hierarchy:

node0/master_index/1  <-- saved master index #1
node0/master_index/5
node0/backup_state    <-- local status of the software in case of interruption
node0/blocks/0/40
node0/blocks/0/80     <-- encrypted blocks as 1 KB files
 :
node0/blocks/f/67f
             |
             +---- the set of blocks are divided into 16 directories, 0 to f
node1/   <-- same directory structure as for node0
 :
node5/

Remote copy of blocks and master indexes

Once a new backup (blocks and master index) has been generated on local disk, one needs to save it on remote nodes. We propose to use two strategies:

  1. rsync synchronisation of the disk directory structure. This would be not efficient with several remote nodes but it could be quite useful on a local network (and simple to implement ;) ;
  2. a bittorrent like approach.

The bittorrent-like approach is quite simple: use the copy made on other nodes to reduce the amount of data that an initial node should send. More precisely, for each block that should be copied on N remote nodes:

  1. first copy: complete copy from node 0 to node 1;
  2. second copy: 1/2 copy from node 0 to node 2, 1/2 copy from node 1 to node 2;
  3. ...
  4. Nth copy: 1/N copy from node 0 to node N-1, ..., 1/N copy from node N-2 to node N-1.

Each node is in charge of making the copies of its own blocks and master index on other nodes. To implement above algorithm, one would need following network API:

trigger_block_copy([list of source nodes], [list of block ids])
 => returns success or failure for each block

trigger_master_index_copy([list of source nodes], master_index_id)
 => returns success or failure

pull_block_parts([list of (block id, offset, length)])
 => returns, for each requested part, failure or block content

pull_master_index_part(master index id, offset, length)
 => returns failure or master index content

erase_block_list([list of block id])
 => returns success or failure

erase_master_index(master index id)
 => returns success or failure

A source node would call trigger_blocks_copy to request to one or more nodes a copy of a list of blocks. Each one of the nodes receiving this call would in turn use pull_block_parts to get the needed parts from other nodes having a copy of the needed block. After all new blocks are copied on remote nodes, the source node can call erase_block_list to erase outdated blocks. The procedure is the same for master indexes.

In case a destination node crashes in the middle of a copy, the source node should consider that the copy has failed and it should elect a new destination node. In case a sending node crashes in the middle of a copy, the receiving node should erase the corresponding block and report a failure to the source node.

As the source node sends at backup step 4 the complete list of blocks and master index it wants to keep, we avoid keeping stale blocks or partially copied blocks in case of nodes' crashes.

It might appear that a source node cannot save enough copies onto remote nodes (because they are unavailable): it should attempt to retry those copies at a latter time.

Clique management

To create a new clique (group of friends making backups between each other), one should:

To join an existing clique, a new node should do:

In order to have the same node numbering on each node, the public keys of the clique nodes are sorted and each node is given a number equal to its public key rank. A local unencrypted configuration file contains public SSL keys of other nodes and local directories to backup.

Few notes on implementation

In order to answer remote node requests asynchronously, the diseba program should be multi-threaded: one thread to make local backup and trigger local to remote copies, one or more threads to fulfil remote nodes' copy requests.

I still intend to implement this program in OCaml. ;) RPC over SSL would be used for communication between nodes.

Any opinion on this design?

2006-03-25T14:27:40Z [/ideas] permanent link

Tue, 21 Mar 2006

A program to make some statistics on a filesystem

I wanted to know how files are distributed in my home directory, so I wrote a quick OCaml program to do this analysis. This program gives following result on my home directory (it counts links as files):

$ ./filesystem-analysis.ml ~
Total file size:  6120592065 bytes
Minimum non empty file size:    1.00 B 
Maximum file size:  163.77 MB
Average file size:   28.81 KB
Total number of files: 207476
 of which are empty files: 544
Total number of directories: 22932
File size distribution:
[   1.00 B  -    1.00 B ]     592 files - total average of  592.00 B 
[   2.00 B  -    3.00 B ]     574 files - total average of    1.68 KB
[   4.00 B  -    7.00 B ]     772 files - total average of    4.52 KB
[   8.00 B  -   15.00 B ]    2205 files - total average of   25.84 KB
[  16.00 B  -   31.00 B ]    4077 files - total average of   95.55 KB
[  32.00 B  -   63.00 B ]   15371 files - total average of  720.52 KB
[  64.00 B  -  127.00 B ]    7598 files - total average of  712.31 KB
[ 128.00 B  -  255.00 B ]    6258 files - total average of    1.15 MB
[ 256.00 B  -  511.00 B ]   10922 files - total average of    4.00 MB
[ 512.00 B  - 1023.00 B ]   21493 files - total average of   15.74 MB
[   1.00 KB -    2.00 KB]   24063 files - total average of   35.25 MB
[   2.00 KB -    4.00 KB]   35981 files - total average of  105.41 MB
[   4.00 KB -    8.00 KB]   27669 files - total average of  162.12 MB
[   8.00 KB -   16.00 KB]   16737 files - total average of  196.14 MB
[  16.00 KB -   32.00 KB]   16654 files - total average of  390.33 MB
[  32.00 KB -   64.00 KB]    7057 files - total average of  330.80 MB
[  64.00 KB -  128.00 KB]    4190 files - total average of  392.81 MB
[ 128.00 KB -  256.00 KB]    2905 files - total average of  544.69 MB
[ 256.00 KB -  512.00 KB]    1036 files - total average of  388.50 MB
[ 512.00 KB - 1024.00 KB]     738 files - total average of  553.50 MB
[   1.00 MB -    2.00 MB]     301 files - total average of  451.50 MB
[   2.00 MB -    4.00 MB]     115 files - total average of  345.00 MB
[   4.00 MB -    8.00 MB]     104 files - total average of  624.00 MB
[   8.00 MB -   16.00 MB]      38 files - total average of  456.00 MB
[  16.00 MB -   32.00 MB]      14 files - total average of  336.00 MB
[  32.00 MB -   64.00 MB]       8 files - total average of  384.00 MB
[  64.00 MB -  128.00 MB]       1 files - total average of   96.00 MB
[ 128.00 MB -  256.00 MB]       3 files - total average of  576.00 MB

Returned results are pretty expected. Over the 6GB of files in my home directory, I have about 200,000 files of an average size of 30KB and 23,000 directories (about 10% of the number of files). The vast majority of files are below 32KB but a small number of big files are taking the vast majority of my disk space.

A more interesting point is that a non negligible number of files (10%) is smaller than 64 bytes. In the case of the distributed backup system, it would be time and space saving to save them within the master index.

2006-03-21T20:07:50Z [/misc] permanent link

Mon, 20 Mar 2006

Some cryptographic details distributed backup

As a followup to my previous note on a distributed backup system, I give here some details on its cryptography. Most of this note comes from Niels Ferguson and Bruce Schneier Practical Cryptography book. Feel free to criticize my choices. ;)

Cryptographic primitives

I intend to use Niels and Schneier Fortuna random number generator. Compared to their original design, I won't implement my entropy pool system but use the /dev/random facility of Linux. That's not perfect but certainly much easier to implement. This random number generator is used to generate per file (or per backup phase) specific key, each time a file is saved. This gives perfect forward secrecy in case a key is compromised.

AES 256 bits is used as block cipher. I could also use Serpent 256 bits, or even both of them cascated, but I don't know if this has any value.

As block chaining mode, use Cipher Block Chaining (CBC) mode with a Nonce-Generated Initialization Vector (IV). The nonce is generated by encrypting with the file key the 64-bits file counter, incremented each time a file is saved (i.e. a new encryption is made).

  (* + is xor addition *)
C0 := E(K, 64-bits-counter)
Ci := E(K, Pi + C{i-1})   for i = 1, .. k (k = number of blocks)

As cryptograhic hash function, use two call to SHA256d defined from SHA-256 as:

SHA256d(m) := SHA-256(SHA-256(m))

As Message Authentication Code, use HMAC constructed with SHA-256 (K is the MAC key, m the message, a and b are two specified constants):

  (* + is xor operation *)
HMAC(K,m) := SHA-256((K + a) || SHA-256((K + b) || m))

Protection of master index

The master index stores the list of saved file with their encryption key and other associated information. This master index is protected by a password, known by the user. As such a password has not enough entropy, so a salting and streching technique is used to reinforce password strength. Firstly, a 256-bits random salt is generated and stored alongside the master index. Then, a cryptographic hash function SHA256d is used repeatedly as follow:

x0 := 0
xi := SHA256d(x{i-1} || password || salt)   for i = 1, .., r
K  := xr

r is the maximum number of iterations that can be made during 200-1000 ms. K is the final master key for the master index.

Each time the master index is encrypted, a new Initialization Vector is needed. For this, we use as nonce the global 64-bits file counter by taking a value from it.

To check that the master index as been correctly decrypted once the user has entered his password or that the master index has not been modified, the compressed master index is added a Message Authentication Code before being encrypted. The MAC key and encryption/decryption keys are generated from the derived password key with following formulas, where "string" is just a simple 16-bytes string concatenated to the key:

MAC_key        := SHA256d(password-derived-K || "authentication01")
encryption_key := SHA256d(password-derived-K || "encryptionabcdef")

Before being stored on remote nodes, the master index is thus compressed, added a MAC and encrypted.

Encryption and authentication of files

In my initial design, each file is encrypted with a freshly generated 256 bits key, the nonce used for Initialization Vector being the freshly allocated 64-bits file identifier. As that might be too costly, I think it would be easier to use a unique key for the set of files making a backup phase. So I indent to use following algorithm:

For each backup phase:
  1. generate a random encryption key
  2. generate a random Message Authentication Code key

  3. for each file
     a. take a value and increment the global 64-bits counter
     b. compress the file
     c. compute file MAC with MAC key
     d. encrypt the file and its MAC with IV and encryption key
     d. store an entry in master index, with file identifier and
        used keys (encryption and MAC)

We store for each file the used keys, so in there is no need to make cleanup in the index file, as each file can be re-encrypted with a different key in a latter backup phase.

As said in my previous note, it is not necessary to encrypt directories and empty files, except putting an entry for them in the master index. As an optimisation, if the compressed file is smaller than 32 bytes (the size of an encryption and MAC keys), it is more efficient effective to store it directly in the master index. The real question is: are there so many files of that type?

The main issue with above design is that file size is immediately visible to attackers. A refinement of the design would be to use a fixed block size (e.g. 8 or 16 Kbytes) and partition a file or a set of files amongst several blocks. For a complete backup phase, following algorithm would be used:

For each backup phase:
  1. compress all files and sort them according to there size

  2. starting from the biggest one:
     if the file is bigger that a block:
        split it in the needed amount of blocks and pad the last
        block with dummy data (zeros)
     else (the file is smaller than a block):
        put it in a new block and try to fit as much as possible
        other files in the same block

  3. for each generated block, give it a 64-bits block identifier
     and encrypt it

For each file, store in the master index the used keys (encryption and MAC), the list of block identifiers where parts of the file is in, compressed file size and offset of the file within the first block.

For very big files, it would be impractical to compress them in advance, so one would compress, split them in blocks and compute MAC and encryption on each block on the fly.

To conclude

I've used fairly standard cryptography, following the advice of others and being pretty conservative. However, I might have missed something: please tell it to me! ;)

A side note: to improve the backup program latency, one should seperate encryption and block segmentation on one side from remote backup phase and use on the other side using a set of threads to do the job in parallel. Moreover, in order to improve network usage, one should allow to send several blocks/files in the same RPC call.

2006-03-20T22:42:40Z [/ideas] permanent link

Sat, 18 Mar 2006

A software to securely backup your hard-disk on friends' hard-disk

The idea

Making backup in a cost effective way is usually difficult with current technology for indivuals (who can't support the cost of a tape backup system).

Considering that hard-disk are quite cheap and have more and more capacity, one solution would be to save the content of your hard-disk on the disk of your friends (and of course, they would do the same on yours). With enough duplication, a crash of a disk would not impact your backups. Of course, you don't want your friends to read your emails, so saved data should be properly encrypted.

Specification (or at least something close to it :-)

Note: following details are freely inspired by encfs and TrueCrypt.

A backup system is made as a set of nodes, each node running on the machine of an individual and saving parts of his files (probably as a Unix daemon-like program). Each node has an SSL-like private/public key and communicates with other nodes over SSL connections. At system setup, each node knows the public key of other nodes through out-of-band means (email). Each node has a master password, used to generate derivative keys.

Each node is configured to save a part of the local file hierachy, with exclusion patterns (for example ~ files).

Each node is in charge of the encryption and the copying of its files onto remote nodes. So a node will take care of copying each file of the backup in at least n encrypted copies. Each file copied on a remote node is compressed and encrypted with an individual key, generated from a master key and some random each time the file is copied.

A set of rules is enforced by each node so that total available capacity is equally shared between nodes. At launch time, each node knows the total space reserved for the backup system on local hard-disk. It agrees with other node of the group total capacity and the best way to use it.

Each saved file is indexed by a 64-bits integer, generated by the source node. On the source node, a global counter is used, initially set to zero and incremented each time a file is copied on a remote node. That way, remote nodes only knows a file number and its size and cannot deduce the file hierarchy on source node, when files are modified (well, maybe some information can be deduced from file size but I consider this a minor issue), etc.

On the source node, a master index is maintained. This index stores, for each saved file:

In case of a directory, saved information in the index are indentical except that no file identifier is necessary. The master index also stores the latest value of the 64-bits file identifier counter.

The master index of a source node is saved on remote nodes explicitely (i.e. they know this is the master index of the source node) and encrypted with a key generated from the node master password. In case of a crash on the source node, the master index can be recovered from remote nodes, in order to reconstruct the disk from saved files indexed by the master index (using the master password).

To support incremental backup, a local copy of each saved file is necessary in order to compute the difference. This local copy could be used as an added local backup, on a different disk for example.

Transactional system

One issue of the system is that the master index and saved files should be coherent, even in case of crash of one or more nodes at any step of a backup phase. So we need to use a transactional-like system to ensure this.

The API between nodes offers a way to start a transaction, and then close it or abort it. If the SSL connection supporting the transaction is interrupted (remote node crash for example), the transaction is aborted.

When saving a set of files (e.g. a directory), the source node:

  1. starts a transaction;
  2. saves a file on a remote node, allocating to it a new identifier;
  3. repeat previous step for all files of the directory;
  4. updates locally the index;
  5. saves the index on remote nodes, as a new one;
  6. closes successfully the transaction;
  7. starts a second transaction;
  8. removes the old files on previous nodes.
  9. removes the old index on all remote nodes.
  10. closes successfully the transaction;
  11. removes locally the old index.

Previous algorithm should probably be tailored, depending on the number of files to save. Maybe it would be better to saved a fixed amount of files (say 10 to 50 files) in each transaction.

If a remode node sees an opened transaction as aborted (either voluntarily or because the source node has disappeared), it erases all files created during the transaction. If a source node sees a remote node crash, it can elect a new remote node and use it in replacement of the previous one.

When a file is saved on a remote node, the remode node replies with a success only when the file is effectively stored on the remote node local disk (of course, real write of data depends of the underlying file-system and mounting parameters).

I should probably do a SPIN model of this algorithm. :)

API between nodes

One could use ONC RPC (as defined by IETF) to communicate between nodes. The data stored (files and index) on remote node should be defined in the XDR format.

We could use following API:

 /* transactions */
start_transaction() -> transaction_id
close_transaction(transaction_id)
abort_transaction(transaction_id)

 /* to handle files */
save_file(transaction_id, 64bits_id, file_content)
erase_file(transaction_id, 64bits_id)
get_file(transaction_id, 64bits_id) -> file_content

 /* to handle master index */
save_index(transaction_id, version)
erase_index(transaction_id, version)
get_index(transaction_id, version) -> index_content

 /* to manage groups */
enter_group() -> computed_group_parameters
leave_group()
pushed_group_update() -> computed_group_parameters

Each API call is identified by the source node, from the connection used. So each node can uniquely store a source node files into a separate directory.

Possible user interface for the program

I think it would be nice if the program would have a very simple user interface, like the following:

distrib-backup --initialize-start-group               # to initiate a new group
distrib-backup --initialize-join-group  a-remote-node # to join an existing group
distrib-backup                                        # usual start
distrib-backup --restore-after-crash a-remote-node    # to reconstruct after a crash
The backup software should have a local user interface: either a command line one (shell like) or a web interface, whichever is the easiest to implement.

Implementation

In OCaml of course! :) OCaml as already the bindings or libraries for:

And of course, the OCaml language has all the properties to make a safe system (garbage collection, strong typing, etc.).

Remaining issues

What do you think of it?

2006-03-18T20:11:59Z [/ideas] permanent link

Sat, 11 Feb 2006

Literate programming as source code comments

For a long time, I've been interested in literate programming, an idea originaly developed by Donald. E Knuth in its CWEB system when programming the TeX system. His idea is quite simple: write a program not as a piece of code but as a book, where actual code is interleaved with pieces of documentation about that code. For example, for more than two years now I've been using Noweb to write the code of demexp. Noweb is very nice. Most notably, it is language agnostic so it can be used on all the different pieces of source you have in a project.

The main issue with literate programming is a technical one: the mixed source code and documentation is written into a file, e.g. file.ml.nw, from which are derived compilable code (file.ml) and documentation (file.ml.tex). This approach is not compatible with proprietary IDE environments that your are forced to use for very specific programming (e.g. Intel's IXP SDK for IXP2x00 Network Processors or Xilinx's ISE for FPGA). Moreover, the more I think about it, the more I believe that compilable source code should be the reference point:

I have thus started to think about embedding literate programming chunks as language comments. A special processing on source code extracts documentation from literate markup.

My original idea was quite simple: use a Wiki-like syntax to write the documentation, with a few set of tags:

One could even consider transforming an existing Wiki system storing its data in text file, like DokuWiki, to write the literate parts of the code.

More recently, Felix BREUER has proposed to use TeXmacs as a literate programming tool. I have proposed him my idea of documentation as source comments and he seems to like it, so we will try to experiment that idea.

To conclude, while I'm pretty sure that literate programming as source code comments is the way to go, I'm not sure of the right path between a wiki-like syntax and a TeXmacs approach. While TeXmacs approach is very seducing because TeXmacs is a full fledge scientific editor, with image creation and inclusion, formulas, references and links, its file format is not very readable and could clutter a lot the source code, making it unreadable. Felix has proposed an original solution to this issue: put the TeXmacs code at the end of the source file (albeit still embedded in it) and use a system of references for edition within TeXmacs. On the other hand, a wiki-like system is always readable, even for people not using the literate programming approach. Time and experiment will tell what is the right approach.

2006-02-11T14:14:26Z [/ideas] permanent link

Wed, 25 Jan 2006

Idea: write a formally proven voting machine

For those having some spare time and who are willing to dig into tricky (but interesting) areas, I have a nice idea of project to do: write the (free) software of a voting machine where all (or at least most of) the code is formally proven.

The voting machine I consider would be a very classical one: no network connection (so no remote electronic voting), a touch screen with the different voting options, votes are stored on a local hard drive. One can use a standard PC as reference hardware (with for example a graphical screen with mouse click to emulate the touch screen), even if for a real voting machine a temper-proof hardware would be mandatory.

The purpose of the software would be:

The software would be written in plain C, using information (and maybe code) from Simple Operating System (SOS) for hardware interface (mouse and screen control). Using Why proof obligation producer, one would produce the formal version of the program for the Coq proof assistant, that would be used for the formal proofs.

Of course, all of those software are available as free software and the resulting code and formal proofs should be also available as free software. The net result of such a project would not be a complete voting machine (this is much more complex than that) but a first step towards a trustable voting machine.

What do you think of it?

2006-01-25T18:03:05Z [/ideas] permanent link

Fri, 30 Dec 2005

Switch to pyblosxom

I switched from blosxom to pyblosxom to make my blog: the permanent link of blog entries was changing with modification time! Pyblosxom has a sensible way of managing permanent links. Additional bonus: the user documentation of pyblosxom is very nice.

2005-12-30T17:56:39Z [] permanent link

Wed, 28 Dec 2005

Sage: a nice RSS reader for Firefox

I'm currently using Sage as a RSS reader for Firefox. It is pretty basic but suits my needs: to know what is new is a newsfeed (dynamic bookmarks of Firefox lack this functionnality). And being a Firefox extension, it is very simple to install and is integrated with the browser: clicking on a news item opens the right link (and even a vote on demexp if Firefox is properly configured).

2005-12-28T11:02:40Z [] permanent link

Tue, 27 Dec 2005

PGP pathfinder: have statistics about your GPG key

I recently discovered PGP pathfinder & key statistics a system to gather statistics about your GPG key and determine the trust path between you and another person.

For example, the trust path between me and Branden Robinson (Debian Project Leader) is, in the shortest case:

     0 	A3AD7A2A  stats  David MENTRE dmentre.at.linux-france.org #13401 signs
     1 	D0AB4075  stats  Pascal Terjan (Professional) pterjan.at.mandriva.com #2488 signs
     2 	29499F61  stats  Sam Hocevar sam.at.zoy.org #77 signs
     3 	2B46A27C  stats  Branden Robinson branden.at.deadbeast.net #478

The longest being:

     0 	A3AD7A2A  stats  David MENTRE dmentre.at.linux-france.org #13401 signs
     1 	6DEAAEFC  stats  Jean-Marie Favreau (jm) jeanmarie.favreau.at.free.fr #16593 signs
     2 	85C79389  stats  Damien Raude-Morvan (Mail) drazzib.at.drazzib.com #24021 signs
     3 	C3567689  stats  Emmanuel Berre manu.at.ouvaton.org #11053 signs
     4 	23706F87  stats  Pierre Machard pmachard.at.debian.org #1075 signs
     5 	3FCC2A90  stats  Amaya Rodrigo Sastre amaya.at.debian.org #55 signs
     6 	2B46A27C  stats  Branden Robinson branden.at.deadbeast.net #478
You can see the Web of Trust at work. Quite interesting isn't it?

The author of this script, Henk P. Penning, also provides an analysis of the strong set in the PGP web of trust which I haven't read yet.

2005-12-27T15:38:14Z [] permanent link

An analysis of Debian Sarge source code in main

I read in latest Debian Weekly News an analysis of Debian Source code made by Spanish students. In fact, they just ran the SLOCCount utility of David A. Wheeler. But I find always interesting to have up-to-date figures: it gives some hints on the accomplishment made by the free software community.

You'll find the report at: http://www.upgrade-cepis.org/issues/2005/3/up6-3Amor.pdf

About 60% of software in Sarge are written in C, 17% in C++. OCaml is not counted explicitely, but ML is at 0.31%. :(

The biggest programs are:

  1. OpenOffice.org (1.1.3): more than 5 millions of lines;
  2. Linux kernel (2.6.8): 4M;
  3. NVU: ~2.5M;
  4. Mozilla (1.7.7): ~2.5M;
  5. GCC (3.4.3): 2.4M;
  6. XFS-XTT (1.4.1): 2.3M;
  7. Xfree86 (4.3.0): 2.3M.

I'm quite surprised by the size of NVU, but I suppose this is because it is based on Mozilla: most lines of code are probably duplicates of Mozilla's code. XFS-XTT size is also surprising.

I like those figures because they give some hints on the complexity of modern softwares. We need to design programming languages and programming environments that help to write software of more than 10 million lines of code. None of the languages I've seen so far help much in managing so complex programs. Of course, counting lines of code is a rough metric and programs written in high level languages (O'Caml, Haskell) need less lines of code for the same functionnality.

2005-12-27T15:25:17Z [/free-software] permanent link

Slide on secure email

I've made a few slides to present GnuPG and Enigmail extension to Thundirbird and use them for securing email exchanges. Here there are in SXI format.

2005-12-27T15:12:24Z [] permanent link

Made with PyBlosxom