Schematics: Cookbook

Matching Letters

Problem

You want to see if a value contains only alphabetic characters.

Solution

A first approximation can be achieved using the standard character classes:

> (regexp-match "^[A-Za-z]+$" "PLTScheme")
("PLTScheme")
> (regexp-match "^[A-Za-z]+$" "123-Have Fun")
#f
> (if (regexp-match "^[A-Za-z]+$" "Soup")
      (printf "I like alphabet soup!")
      (printf "I don't like number soup."))
I like alphabet soup!

Unfortunately, this does not handle languages whose alphabets include characters beyond the 26 unaccented English letters.

> (regexp-match "^[A-Za-z]+$" "Molière")
#f

If you need to match alternative alphabets (as defined by the user's locale settings), you should use SRFI 14 (Character-set Library) and the char-set:letter character set.

> (require (lib "14.ss" "srfi"))
> (define exp
    (regexp (string-append "^["
                           (char-set->string
                            (char-set-difference
                             char-set:letter char-set:punctuation))
                           "]+$")))
> (regexp-match exp "PLTScheme")
("PLTScheme")
> (regexp-match exp "Molière")
("Molière")
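
The same letter set can also be tested one character at a time, without building a regular expression. The following is a minimal sketch, assuming SRFI 14's char-set-contains? is available; all-letters? is a hypothetical name, not part of the recipe or of SRFI 14.

```scheme
(require (lib "14.ss" "srfi"))

;; Hypothetical helper: #t when STR is non-empty and every one of its
;; characters is a member of char-set:letter, #f otherwise.
(define (all-letters? str)
  (let loop ((chars (string->list str)))
    (cond ((null? chars) (> (string-length str) 0))
          ((char-set-contains? char-set:letter (car chars))
           (loop (cdr chars)))
          (else #f))))

(all-letters? "PLTScheme")      ; only letters
(all-letters? "123-Have Fun")   ; digits, a hyphen and a space
```

Unlike regexp-match, this returns a plain boolean rather than a list of matches, which may be more convenient when only a yes/no answer is needed.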

Discussion

SRFI 14 provides a large collection of character sets and tools for manipulating them. For this recipe, we build a suitable regular expression by concatenating the "beginning of line" anchor (^), the "match one-of" opening bracket ([), the Unicode letter characters (char-set:letter) less the punctuation characters (char-set:punctuation), converted to a string, the "match one-of" closing bracket (]), the "match at least one" operator (+), and the "end of line" anchor ($).
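
The assembly described above can be sketched with one piece per line. The helper name char-set->anchored-regexp and the variable letters-only are hypothetical, introduced only for illustration. The sketch assumes the character set contains no characters that are special inside a bracket expression (such as ], ^ or -), which should hold for a set of letters.

```scheme
(require (lib "14.ss" "srfi"))

;; Hypothetical helper: build an anchored "one or more of these
;; characters" pattern from a SRFI 14 character set.
(define (char-set->anchored-regexp cs)
  (regexp
   (string-append
    "^"                     ; anchor at the beginning
    "["                     ; open the "match one-of" bracket
    (char-set->string cs)   ; every character in the set
    "]"                     ; close the bracket
    "+"                     ; at least one such character
    "$")))                  ; anchor at the end

(define letters-only
  (char-set->anchored-regexp
   (char-set-difference char-set:letter char-set:punctuation)))

(regexp-match letters-only "Soup")    ; a list of matches on success
(regexp-match letters-only "Soup!")   ; #f on failure
```

Passing a different character set, for example char-set:digit, to the same helper would give a pattern for strings made up only of digits.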

Here's how you'd use this in a program:

(require (lib "14.ss" "srfi"))

;; Return the words that consist only of letters.
;; Matches are accumulated at the front, so the result is in reverse order.
(define (test-alphabetic words)
  (letrec ((exp
            (regexp (string-append "^["
                                   (char-set->string
                                    (char-set-difference
                                     char-set:letter char-set:punctuation))
                                   "]+$")))
           (checker (lambda (words alphawords)
                      (if (null? words)
                          alphawords
                          (let ((word (car words)))
                            (if (regexp-match exp word)
                                (checker (cdr words) (cons word alphawords))
                                (checker (cdr words) alphawords)))))))
    (checker words '())))


(define test-words
  (list "silly" "façade" "coöperate" "niño" "Renée" "Molière" 
        "hæmoglobin" "naïve" "tschüß" "random!stuff#here"))

> (test-alphabetic test-words)
("tschüß" "naïve" "hæmoglobin" "Molière" "Renée" "niño" "coöperate" "façade" "silly")
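
The result comes back in reverse order because checker conses each match onto the front of its accumulator. If the input order matters, one option is to reverse the accumulator at the end. The following is a minimal sketch of that variant; alphabetic-words and letters-only are hypothetical names, and the pattern is built once at definition time rather than on every call.

```scheme
(require (lib "14.ss" "srfi"))

;; Same pattern as in the recipe, built once.
(define letters-only
  (regexp (string-append "^["
                         (char-set->string
                          (char-set-difference
                           char-set:letter char-set:punctuation))
                         "]+$")))

;; Hypothetical variant of test-alphabetic that keeps the input order.
(define (alphabetic-words words)
  (let loop ((remaining words) (kept '()))
    (cond ((null? remaining) (reverse kept))
          ((regexp-match letters-only (car remaining))
           (loop (cdr remaining) (cons (car remaining) kept)))
          (else (loop (cdr remaining) kept)))))

(alphabetic-words (list "silly" "random!stuff#here" "Soup"))
```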

References

Your system's locale (3) manpage
Mastering Regular Expressions


Comments about this recipe

Even though this recipe handles most Latin-1 characters, Scheme is not fully Unicode compliant. This will be addressed for PLT Scheme in the soon-to-be-released update.

Contributors

-- BrentAFulgham - 18 May 2004

Copyright © 2004 by the contributing authors. All material on the Schematics Cookbook web site is the property of the contributing authors.
The copyright for certain compilations of material taken from this website is held by the SchematicsEditorsGroup - see ContributorAgreement & LGPL.
Other than such compilations, this material can be redistributed and/or modified under the terms of the GNU Lesser General Public License (LGPL), version 2.1, as published by the Free Software Foundation.