schematics : cookbook

Applying a md5 checksum through multiple updates

Problem

We may want to compute an MD5 checksum over a bunch of strings from a file, such as the textual content of an XML document. The content may not be available as a single string, only as many short substrings.

The md5 library from mzlib only appears to support input from a single byte string or a port, which makes it seem that using it on multiple strings requires either concatenating them or writing them to a scratch file that can be read back as a port.
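
For comparison, here is a minimal sketch of the concatenation workaround, assuming the substrings are already byte strings held in a list. The helper name md5/concat is made up for illustration and is not part of mzlib. It produces the right digest, but only by building one large byte string first.

```scheme
(require (lib "md5.ss"))

;; md5/concat: (listof bytes) -> bytes
;; Hypothetical helper: join every chunk into one byte string, then
;; hand the whole thing to md5 in a single call.
(define (md5/concat chunks)
  (md5 (apply bytes-append chunks)))
```

> (md5/concat (list #"Nobody inspects" #" the " #"spammish repetition"))
#"bb649c83dd1ea5c9d9dec9a18df0ffe9"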

Solution

The following code puts a small wrapper around md5 so that it can eat multiple bites of bytes.

(module updatable-md5 mzscheme
  
  ;; updatable md5: a wrapper around (lib "md5.ss") to allow multiple input sources.
  
  ;; The standard library interface from mzlib's md5 module requires a single
  ;; input port.  This library puts a wrapper around (lib "md5.ss") to take in
  ;; multiple input ports or byte strings.  When we're satisfied, we can then
  ;; call digest to get the digest of the total input.

  ;; I really wanted this so I could more accurately run the tree-folding examples
  ;; from Oleg Kiselyov's "A Better XML Parser through Functional Programming".
  ;; (http://okmij.org/ftp/Scheme/xml.html)
  
  (require (lib "port.ss")
           (lib "md5.ss")
           (lib "contract.ss"))
  

  ;; An md5-state encapsulates what we need to talk to the md5 library.  We
  ;; treat this as an opaque structure to the outside world.
  (define-struct md5-state (op ch result))
  
  
  ;; make-md5: -> md5-state
  ;; Makes a fresh md5-state.
  (provide make-md5)
  (define (make-md5)
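    ;; Whatever is written to op can be read back from ip.
    (define-values (ip op) (make-pipe))
    (define ch (make-channel))
    (define state (make-md5-state op ch #f))
    ;; A background thread runs md5 over the read end of the pipe and
    ;; delivers the digest on ch once op has been closed.
    (thread (lambda () (channel-put ch (md5 ip))))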
    state)
  
  ;; update/port: md5-state input-port -> void
  ;; Feed in more data from an input port to be digested into our md5 state.
  (provide/contract (update/port (md5-state? input-port? . -> . any)))
  (define (update/port state ip)
    (check-not-digested! 'update/port state)
    (copy-port ip (md5-state-op state))
    (void))
  
  
  ;; update/bytes: md5-state bytes -> void
  ;; Feed in more data from bytes to be digested into our md5 state.
  (provide/contract (update/bytes (md5-state? bytes? . -> . any)))
  (define (update/bytes state bytes)
    (check-not-digested! 'update/bytes state)
    (write-bytes bytes (md5-state-op state))
    (void))


  ;; digested?: md5-state -> boolean
  ;; Returns #t if digest has already been called on state.
  (define (digested? state)
    (and (md5-state-result state) #t))
  
  ;; check-not-digested!: symbol md5-state -> void
  ;; raise error if digest has been called on state.
  (define (check-not-digested! who state)
    (when (digested? state)
      (error who "Already digested; no further updates allowed.")))

  
  
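  ;; digest: md5-state -> bytes
  ;; Closes the input and returns the digest of everything fed in so far.
  ;; The result is cached, so calling digest again returns the same bytes.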
  (provide/contract (digest (md5-state? . -> . bytes?)))
  (define (digest state)
    (when (not (digested? state))
      (close-output-port (md5-state-op state))
      (set-md5-state-result! state (channel-get (md5-state-ch state))))
    (md5-state-result state)))

Example usage:

> (define m (make-md5))
> (update/bytes m #"Nobody inspects")
> (update/port m (open-input-bytes #" the "))
> (update/port m (open-input-bytes #"spammish repetition"))
> (digest m)
#"bb649c83dd1ea5c9d9dec9a18df0ffe9"

Discussion
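
The wrapper works because md5 is content to read from the input end of a pipe: a background thread blocks in md5 until the output end is closed, and a channel hands the finished digest back to the caller. The following is a minimal sketch of that same idea collapsed into a one-shot helper, assuming all the chunks are byte strings in a list. The name md5/chunks is hypothetical, and the sketch relies on make-pipe with no arguments giving an unbounded buffer, so the writes never block.

```scheme
(require (lib "md5.ss"))

;; md5/chunks: (listof bytes) -> bytes
;; Hypothetical helper: digest the chunks in order without ever
;; concatenating them into a single byte string.
(define (md5/chunks chunks)
  ;; Whatever is written to op can be read back from ip.
  (define-values (ip op) (make-pipe))
  (define ch (make-channel))
  ;; The reader thread consumes ip until end-of-file, then reports.
  (thread (lambda () (channel-put ch (md5 ip))))
  (for-each (lambda (chunk) (write-bytes chunk op)) chunks)
  ;; Closing op signals end-of-file to md5.
  (close-output-port op)
  (channel-get ch))
```

> (md5/chunks (list #"Nobody inspects" #" the " #"spammish repetition"))
#"bb649c83dd1ea5c9d9dec9a18df0ffe9"

The recipe's md5-state structure splits these steps apart so that updates can arrive over time, with digest doing the close and the channel-get.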


Comments about this recipe

This solution is quite clever.

Recently (today actually) a message digest package supporting incremental generation was added to PLaneT. See Digest package at PLaneT.

-- JensAxelSoegaard - 05 Feb 2007

Contributors

-- DannyYoo - 28 Jun 2006

Copyright © 2004 by the contributing authors. All material on the Schematics Cookbook web site is the property of the contributing authors.
The copyright for certain compilations of material taken from this website is held by the SchematicsEditorsGroup - see ContributorAgreement & LGPL.
Other than such compilations, this material can be redistributed and/or modified under the terms of the GNU Lesser General Public License (LGPL), version 2.1, as published by the Free Software Foundation.