Erlang Cookbook

Matching Words

Problem

You want to select words from a string.

Solution

Determine the defining features of a word for your specific application, then write a regular expression that models this idea.

Words_1 = "[^ ]+".        % as many non-space characters as possible
Words_2 = "[A-Za-z'-]+".  % as many letters, apostrophes, and hyphens as possible

1> regexp:first_match("'alpha-beta gamma", Words_1).
{match,1,11}
2> string:substr("'alpha-beta gamma",1,11).
"'alpha-beta"
3> regexp:first_match("'alpha-beta&or gamma", Words_2).
{match,1,11}
4> string:substr("'alpha-beta&or gamma",1,11).   
"'alpha-beta"

Discussion

Erlang has no built-in definition of a word in a string. On the one hand, this is inconvenient, since you have to define your own meaning of "word". On the other hand, it is the right behavior, since the concept of a word varies significantly between applications, locales, encodings, and input sources.

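Even within a single application, deciding what counts as a word takes care. Languages usually support pluralization of singular nouns, attach possessive modifiers, allow hyphenated word combinations, and so forth. The regular expression you use must cover the full range of words you expect to encounter. In the Solution above, for example, Words_2 stops at the "&" because that character is not in its character class, while Words_1 would continue up to the next space.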
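To see how the choice of pattern changes the resulting word list, here is a minimal sketch that collects every match in a string. It assumes an Erlang/OTP release that ships the re module (which the recipe itself does not use); the module name word_list and the function words/2 are hypothetical names chosen for illustration.

```erlang
-module(word_list).
-export([words/2]).

%% Return every substring of String that matches Pattern, in order.
%% Assumes the re module is available; word_list and words/2 are
%% illustrative names, not part of the original recipe.
words(String, Pattern) ->
    case re:run(String, Pattern, [global, {capture, first, list}]) of
        {match, Matches} ->
            %% With 'global', each match is a one-element list
            %% holding the text of the whole match.
            [Word || [Word] <- Matches];
        nomatch ->
            []
    end.
```

With the two patterns from the Solution, word_list:words("'alpha-beta&or gamma", "[^ ]+") should yield ["'alpha-beta&or","gamma"], while the same call with "[A-Za-z'-]+" should yield ["'alpha-beta","or","gamma"], since the "&" separates words only under the second definition.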

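When this recipe was written, Erlang had no Perl-compatible regular expression module, which is why the examples use the regexp module. Later Erlang/OTP releases include the re module, which supports Perl-like regular expressions, and the regexp module has since been removed.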
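On Erlang/OTP releases that ship the re module, the first-match lookup from the Solution can be sketched as follows. This is an illustration under that assumption, not the recipe's original method; the module name first_word and the function first_word/2 are hypothetical. Note that re:run/3 reports a zero-based offset, whereas regexp:first_match/2 reports a one-based position, so the sketch adds 1 to keep the numbers comparable.

```erlang
-module(first_word).
-export([first_word/2]).

%% Find the first substring of String that matches Pattern.
%% Returns {match, Start, Length, Word} with a one-based Start
%% (mirroring regexp:first_match/2), or nomatch.
%% first_word/2 is an illustrative name, not a library function.
first_word(String, Pattern) ->
    case re:run(String, Pattern, [{capture, first, index}]) of
        {match, [{Offset, Length}]} ->
            %% Offset is zero-based; lists:sublist/3 is one-based.
            Start = Offset + 1,
            {match, Start, Length, lists:sublist(String, Start, Length)};
        nomatch ->
            nomatch
    end.
```

For the Solution's input, first_word:first_word("'alpha-beta&or gamma", "[A-Za-z'-]+") should return {match,1,11,"'alpha-beta"}. The sketch assumes plain Latin-1 input; for text outside that range, the unicode option of re:run/3 would be needed, and the reported offsets would then count bytes rather than characters.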

Contributors

-- BrentAFulgham - 30 Aug 2004

Copyright © 2004 by the contributing authors. All material on the Erlang Cookbook web site is the property of the contributing authors.
This material can be redistributed and/or modified under the terms of the GNU Lesser General Public License (LGPL), version 2.1, as published by the Free Software Foundation.