
Applying an MD5 checksum through multiple updates

Problem

We may want to compute an MD5 checksum over a collection of strings from a file, such as the textual content of an XML document. The content may not be available as a single string, only as many short substrings.

The md5 library from mzlib appears to support input only from a single byte string or a port, which suggests that using it on multiple strings requires either concatenating them or writing them to a scratch file that can be read as a port.
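
For comparison, the concatenation workaround can be sketched as follows. This is only an illustration and not part of the recipe: the name md5/concatenated is hypothetical, and the sketch assumes that all of the chunks fit in memory at once, which is the cost the recipe below tries to avoid.

```scheme
(require (lib "md5.ss"))

;; md5/concatenated: (listof bytes) -> bytes
;; Hypothetical helper: joins every chunk into one byte string,
;; then hashes the result in a single call to md5.
(define (md5/concatenated chunks)
  (md5 (apply bytes-append chunks)))

(md5/concatenated (list #"Nobody inspects" #" the " #"spammish repetition"))
;; => #"bb649c83dd1ea5c9d9dec9a18df0ffe9"
```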

Solution

The following code puts a small wrapper around md5 so that it can eat multiple bites of bytes.

(module updatable-md5 mzscheme
  
  ;; updatable md5: a wrapper around (lib "md5.ss") to allow multiple input sources.
  
  ;; The standard library interface from mzlib's md5 module requires a single
  ;; input port.  This library puts a wrapper around (lib "md5.ss") to take in
  ;; multiple input ports or byte strings.  When we're satisfied, we can then
  ;; call digest to get the digest of the total input.

  ;; I really wanted this so I could more accurately run the tree-folding examples
  ;; from Oleg Kiselyov's "A Better XML Parser through Functional Programming".
  ;; (http://okmij.org/ftp/Scheme/xml.html)
  
  (require (lib "port.ss")
           (lib "md5.ss")
           (lib "contract.ss"))
  

  ;; An md5-state encapsulates what we need to talk to the md5 library.  We
  ;; treat this as an opaque structure to the outside world.
  (define-struct md5-state (op ch result))
  
  
  ;; make-md5: -> md5-state
  ;; Makes a fresh md5-state.
  (provide make-md5)
  (define (make-md5)
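    ;; make-pipe returns the two ends of a fresh pipe: an input port and an
    ;; output port.
    (define-values (ip op) (make-pipe))
    (define ch (make-channel))
    (define state (make-md5-state op ch #f))
    ;; A background thread runs md5 over the pipe and sends the digest
    ;; through the channel once the output end has been closed.
    (thread (lambda () (channel-put ch (md5 ip))))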
    state)
  
  ;; update/port: md5-state input-port -> void
  ;; Feed in more data from an input port to be digested into our md5 state.
  (provide/contract (update/port (md5-state? input-port? . -> . any)))
  (define (update/port state ip)
    (check-not-digested! 'update/port state)
    (copy-port ip (md5-state-op state))
    (void))
  
  
  ;; update/bytes: md5-state bytes -> void
  ;; Feed in more data from bytes to be digested into our md5 state.
  (provide/contract (update/bytes (md5-state? bytes? . -> . any)))
  (define (update/bytes state bytes)
    (check-not-digested! 'update/bytes state)
    (write-bytes bytes (md5-state-op state))
    (void))


  ;; digested?: md5-state -> boolean
  ;; Has digest already been called on this state?
  (define (digested? state)
    (and (md5-state-result state) #t))
  
  ;; check-not-digested!: symbol md5-state -> void
  ;; raise error if digest has been called on state.
  (define (check-not-digested! who state)
    (when (digested? state)
      (error who "Already digested; no further updates allowed.")))

  
  
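  ;; digest: md5-state -> bytes
  ;; Closes the pipe on the first call, waits for the md5 thread, and caches
  ;; the result; later calls return the cached digest.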
  (provide/contract (digest (md5-state? . -> . bytes?)))
  (define (digest state)
    (when (not (digested? state))
      (close-output-port (md5-state-op state))
      (set-md5-state-result! state (channel-get (md5-state-ch state))))
    (md5-state-result state)))

Example usage:

> (define m (make-md5))
> (update/bytes m #"Nobody inspects")
> (update/port m (open-input-bytes #" the "))
> (update/port m (open-input-bytes #"spammish repetition"))
> (digest m)
#"bb649c83dd1ea5c9d9dec9a18df0ffe9"

Discussion
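
The mechanism behind the wrapper can be seen in isolation in a one-shot form. The sketch below is not part of the module above: md5/strings is a hypothetical name, and encoding each string as UTF-8 is an assumption made here for illustration. It uses the same three pieces as make-md5: a pipe, a thread that runs md5 on the pipe's input end, and a channel that carries the result back.

```scheme
(require (lib "md5.ss"))

;; md5/strings: (listof string) -> bytes
;; Hypothetical one-shot helper showing the pipe/thread/channel idea.
(define (md5/strings strs)
  (let-values (((ip op) (make-pipe)))
    (let ((ch (make-channel)))
      ;; The reader thread runs md5 over the pipe; md5 returns only once
      ;; the pipe reaches EOF, and the digest then goes over the channel.
      (thread (lambda () (channel-put ch (md5 ip))))
      ;; Each write makes more bytes available to the reader.
      ;; Assumption: strings are encoded as UTF-8 before hashing.
      (for-each (lambda (s) (write-bytes (string->bytes/utf-8 s) op))
                strs)
      ;; Closing the output end signals EOF, which lets md5 finish.
      (close-output-port op)
      (channel-get ch))))

(md5/strings (list "Nobody inspects" " the " "spammish repetition"))
;; => #"bb649c83dd1ea5c9d9dec9a18df0ffe9"
```

Splitting this into make-md5, the update functions, and digest, as the module does, simply lets the writes happen at different times before the pipe is closed.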


Comments about this recipe

This solution is quite clever.

Recently (today, actually), a message digest package supporting incremental generation was added to PLaneT. See the Digest package at PLaneT.

-- JensAxelSoegaard - 05 Feb 2007

Contributors

-- DannyYoo - 28 Jun 2006

CookbookForm
TopicType: Recipe
ParentTopic: StringChapter
TopicOrder: 999

----- Revision r1.2 - 05 Feb 2007 - 19:32 GMT - JensAxelSoegaard
Copyright © 1999-2003 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.