erlang : cookbook

String Basics

Few would recommend Erlang as a high-performance string-manipulation language. Strings in Erlang are simply lists of characters, with a bit of syntactic sugar that lets you construct such lists as text enclosed in quotation marks. In fact, to quote Sendmail's excellent case study on implementing their Sendmail load-balancing "Client Daemon" in Erlang:

But Erlang's treatment of strings as lists of bytes is as elegant as it is impractical. The factor-of-eight storage expansion of text, as well as the copying that occurs during message-passing, cripples Erlang for all but the most performance-insensitive text-processing applications.

To understand why Erlang string handling is less efficient than that of a language like Perl, you need to know that each character uses 8 bytes of memory on a 32-bit machine. That's right -- 8 bytes, not 8 bits! Erlang stores each character as a 32-bit integer, with a 32-bit pointer to the next item in the list (remember, strings are lists of characters). On a 64-bit machine both of these words are 64 bits wide, so the cost doubles to 16 bytes per character.
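The per-character cost can be checked from the erl shell. This is only a sketch: it assumes erts_debug:flat_size/1, an internal debugging helper that ships with OTP but is not part of the documented API, which reports the size of a term in machine words.

```erlang
%% Size of a five-character string, measured in machine words.
%% Each list cell takes two words: one for the character, one for the tail.
Words = erts_debug:flat_size("Hello").       % 10

%% Bytes per word: 4 on a 32-bit emulator, 8 on a 64-bit one.
WordSize = erlang:system_info(wordsize).

%% Bytes used per character on this machine: 8 or 16.
BytesPerChar = Words * WordSize div length("Hello").
```

On a 32-bit emulator BytesPerChar comes out as 8, matching the figure quoted above; on a 64-bit emulator it is 16.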

This was not done out of wanton wastefulness; using such large values means that Erlang can easily handle anything the Unicode people throw at it, and the decision to represent strings as lists of characters means that a host of built-in Erlang primitives work on strings without any work on our part. On the down side, this also means strings use a lot of memory, and that access to the nth element takes O(n) time (rather than the O(1) time we would get with strings represented as arrays of characters).
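As a small illustration of that last point, the expressions below (variable names are arbitrary) can be pasted into the erl shell. They show that a string literal is just a list of character codes, so pattern matching, list functions and list comprehensions apply to it directly.

```erlang
%% A string literal is only shorthand for a list of character codes.
true = ("abc" =:= [97, 98, 99]).
true = ("abc" =:= [$a, $b, $c]).

%% List pattern matching splits a string into its first character and the rest.
[First | Rest] = "Hello".        % First = 72 ($H), Rest = "ello"

%% Generic list functions and comprehensions need no string-specific support.
Reversed = lists:reverse("Hello").                               % "olleH"
Shifted  = lists:map(fun(C) -> C + 1 end, "HAL").                % "IBM"
NoVowels = [C || C <- "Erlang", not lists:member(C, "aeiou")].   % "Erlng"
```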

In Erlang, literal strings are enclosed in double quotes and may contain line breaks inline. For example:

1> A_string = "A literal string with a linebreak (\"\\n\")
in it".
"A literal string with a linebreak (\"\\n\")\nin it"
2> A_string.
"A literal string with a linebreak (\"\\n\")\nin it"
3> io:fwrite("~s~n", [A_string]).
A literal string with a linebreak ("\n")
in it
ok

Because Erlang strings are lists, you can use any of the functions in the lists library to manipulate them:

1> lists:sort("Hello").
"Hello"
2> lists:sort("ZYX").
"XYZ"
3> lists:subtract("123212", "212").
"312"
4> lists:suffix(".txt", "test.txt").
true
5> lists:suffix(".txt", "test.html").
false

If you want to find out what character is at a specific position in a string, use lists:nth(N, List), where N is a 1-based index into the string. The result is the character's integer code (65 is "A"):

6> lists:nth(1, A_string).
65

length/1 will tell you how long a string (or any list) is. Like lists:nth, it has to walk the list, so it takes O(n) time:

7> length(A_string).
46

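Extracting a substring is a list operation in the same spirit as lists:nth. A minimal sketch using lists:sublist/3 and lists:nthtail/2 (the variable names and sample string are chosen only for illustration):

```erlang
S = "Hello, Erlang!".

%% lists:sublist(List, Start, Len) copies Len elements starting at the
%% 1-based position Start.
Word = lists:sublist(S, 8, 6).       % "Erlang"

%% lists:nthtail(N, List) drops the first N elements and returns the rest.
Tail = lists:nthtail(7, S).          % "Erlang!"

%% lists:nth/2 raises an error if the index is past the end of the list,
%% so check the length first when the index is not known to be valid.
Char = case length(S) >= 14 of
           true  -> lists:nth(14, S);    % $! (33)
           false -> none
       end.
```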
Because Erlang variables are single-assignment and its data is immutable, it does not provide mutable strings. If you want to modify a string, you must build a new string out of the revised elements. You can make a copy of a string using lists:duplicate. However, this is of limited utility, since you can't modify the copy anyway:

8> C_String = hd(lists:duplicate(1, A_string)).
"A literal string with a linebreak (\"\\n\")\nin it"

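"Building a new string out of the revised elements" can look like the sketch below, which uses a list comprehension and list construction. The variable names are illustrative only.

```erlang
Original = "a b c".

%% Build a new list in which every space is replaced by an underscore.
Replaced = [case C of $\s -> $_; _ -> C end || C <- Original].    % "a_b_c"

%% Prepending with [H | T] is cheap: the new string reuses the old tail.
Capitalised = [$A | tl(Original)].                                 % "A b c"

%% Original is still bound to the same, unchanged value.
"a b c" = Original.
```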
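You can create new strings with a combination of lists:duplicate and lists:merge. Note that lists:merge expects sorted input, which holds here only because every element is the same; lists:duplicate(5, $*) gives the same result directly:

9> F_string = lists:merge(lists:duplicate(5, "*")).
"*****"
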
You can also use lists:append to join strings:

10> G_string = lists:append(["Hello, ", "Erlang", "!"]).
"Hello, Erlang!"
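A few other standard ways of joining strings, shown as a hedged sketch (variable names are illustrative):

```erlang
%% The ++ operator joins two lists; it copies its left operand.
Greeting = "Hello, " ++ "Erlang" ++ "!".                  % "Hello, Erlang!"

%% lists:flatten/1 collapses a nested list of strings and characters
%% into one flat string.
Flat = lists:flatten(["Hello", [", ", "Erlang"], $!]).    % "Hello, Erlang!"

%% lists:concat/1 also accepts atoms and numbers and converts them to text.
Mixed = lists:concat(["port_", 8080]).                    % "port_8080"
```

Because ++ copies its left operand, repeatedly appending to the end of a long string is costly; collecting the pieces in a list and joining them once is usually cheaper.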

Discussion

We should also add information about processing string data using the Erlang binary data type, which is often much more efficient since the data is stored as sequences of bytes.
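As a starting point for that, here is a minimal sketch of holding text in a binary, using only built-in functions and binary pattern matching. The variable names are illustrative, and how much a binary saves in practice depends on the workload.

```erlang
%% A binary literal stores the text as a contiguous sequence of bytes.
Bin = <<"Hello, Erlang!">>.
Size = byte_size(Bin).                    % 14

%% Convert between the list and binary representations.
AsList = binary_to_list(Bin).             % "Hello, Erlang!"
AsBin  = list_to_binary("Erlang").        % <<"Erlang">>

%% Binary pattern matching: take the first byte, keep the rest as a binary.
<<Byte, Remainder/binary>> = Bin.         % Byte = 72 ($H)

%% Match a fixed prefix and bind what follows it.
<<"Hello, ", Name/binary>> = Bin.         % Name = <<"Erlang!">>
```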

Contributors

Based on work by GordonWeakliem and NoelWelsh. Explanations about string handling inefficiencies based on the Erlang FAQ.

-- BrentAFulgham - 19 Aug 2004

CookbookForm
TopicType: Recipe
ParentTopic: StringRecipes
TopicOrder: 010

 
 
Copyright © 2004 by the contributing authors. All material on the Erlang Cookbook web site is the property of the contributing authors.
This material can be redistributed and/or modified under the terms of the GNU Lesser General Public License (LGPL), version 2.1, as published by the Free Software Foundation.