schematics : cookbook

Cookbook.XmlChapter
  Difference Topic XmlChapter (r1.12 - 11 Jun 2004 - NoelWelsh)
Added:
>
>

%META:TOPICMOVED{by="NoelWelsh" date="1086949700" from="Cookbook.XmlRecipes" to="Cookbook.XmlChapter"}%

  Difference Topic XmlChapter (r1.11 - 01 Jun 2004 - AntonVanStraaten)
Changed:
<
<

Introduction

XML (Extensible Markup Language) is a W3C Recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Languages based on XML (for example, RDF, SMIL, XSIL and SVG) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.

Syntax rules in XML

An XML document is text, usually a particular encoding of Unicode such as UTF-8 or UTF-16, although other encodings may be used.

Unlike HTML, for example, XML depends heavily on structure, content and integrity for its efficacy. To be considered "well-formed", a document must at the very least conform to the following:

  • It must have one and only one root element.
  • Non-empty elements must be delimited by a start-tag and an end-tag. Empty elements may be marked with an empty-element tag.
  • All attribute values must be quoted with either single (') or double (") quotes. The closing quote must be of the same kind as the opening quote, so the other kind can be used inside the value.
  • Tags may be nested but may not overlap; that is, each non-root element must be completely contained in another element.

Element names in XML are case-sensitive: for example, <title> and </title> are a well-formed matching pair whereas <title> and </Title> are not.
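
The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original text.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule discussed above.
samples = {
    "single root, matching tags": "<entry><title>Song</title></entry>",
    "empty-element tag": "<entry><mimetype/></entry>",
    "two root elements": "<entry/><entry/>",
    "unquoted attribute value": "<entry type=song/>",
    "overlapping tags": "<entry><title></entry></title>",
    "case mismatch in end-tag": "<title>Song</Title>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first two samples should be accepted; the parser rejects the others because they break one of the rules listed.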

Also, again unlike HTML, clever choice of XML element names allows the meaning of the data to be retained as part of the markup. This makes it more easily interpreted by software programs.

As a concrete example, a song entry from a music database expressed in an XML representation might be:

<?xml version="1.0" standalone="yes"?>
<rhythmdb version="1.0">
  <entry type="song">
    <title>Never Let Me Down Again</title>
    <genre>Pop/Rock</genre>
    <artist>Depeche Mode</artist>
    <album>Music For The Masses</album>
    <track-number>1</track-number>
    <duration>287</duration>
    <file-size>6533368</file-size>
    <location>file:///home/hector/ogg/depeche_mode/music_for_the_masses/never_let_me_down_again.ogg</location>
    <mtime>1079831032</mtime>
    <play-count>1</play-count>
    <last-played>1083552958</last-played>
    <mimetype></mimetype>
  </entry>
</rhythmdb>

Identifying information accurately enables programs to manipulate it easily: in this example, it is now easy to convert the duration and timestamps to other representations, or to present the fields with labels in a different language, or to refer to individual entries or fields from elsewhere (another music database, for example).
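
As a rough illustration of how labelled fields make data easy to manipulate, the sketch below reads a shortened copy of the entry above and reformats it. It assumes that the duration element holds a number of seconds, which the original text does not state; the function name describe is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example entry.
RECORD = """<rhythmdb version="1.0">
  <entry type="song">
    <title>Never Let Me Down Again</title>
    <artist>Depeche Mode</artist>
    <duration>287</duration>
    <play-count>1</play-count>
  </entry>
</rhythmdb>"""

def describe(entry):
    """Build a one-line summary, assuming <duration> is in seconds."""
    seconds = int(entry.findtext("duration"))
    minutes, rest = divmod(seconds, 60)
    return "{} - {} ({}:{:02d})".format(
        entry.findtext("artist"), entry.findtext("title"), minutes, rest)

root = ET.fromstring(RECORD)
summaries = [describe(entry) for entry in root.findall("entry")]
print(summaries[0])
```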

An XML document that meets certain other criteria in addition to being well-formed (such as complying with an associated DTD) is said to be "valid".

XML schema languages

Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers.

XML schema languages allow software designers to describe the structure of particular XML-based markup languages in a formal way. Such a description is called a schema. Well-tested tools exist to validate XML files against a schema to automatically verify whether the document conforms to the described structure. Other usages of the schema exist; XML editors for instance can use schemas to support the editing process.

The oldest XML schema format is the DTD, which is inherited from SGML. While DTD support is ubiquitous due to its inclusion in the XML 1.0 standard, it is seen as limited for the following reasons:

  • No support for newer features of XML, most importantly namespaces.
  • Lack of expressivity. Certain formal aspects of an XML document cannot be captured in a DTD.
  • Custom non-XML syntax to describe the schema, inherited from SGML.

A newer XML schema language, described by the W3C as the successor of DTDs, is simply called XML Schema, also referred to as XML Schema Definition (XSD). XSD is far more powerful than DTDs in describing XML languages. Additionally, XSD uses an XML-based format, which makes it possible to use the XML toolset to help process XML schemas. It also becomes possible to write a schema for the schema language itself. Criticisms of XSD are:

  • Standard is very large, which makes it difficult to understand and implement.
  • XML-based syntax leads to verbosity in schema description, which makes XSDs harder to read and write.

An alternative XML schema language recently gaining in popularity is Relax NG. It is standardized by OASIS. Relax NG comes in two formats, an XML-based syntax and a non-XML compact syntax. The compact syntax aims to increase readability and writability, but since there is a well-defined way to translate the compact syntax to the XML syntax and back again, the advantage of using standard XML tools is not lost. Relax NG has a more compact definition, which makes it easier to implement than XSD.

Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide attribute defaults. Relax NG intentionally does not provide these facilities.
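
The effect of attribute defaults can be observed with a parser that reads the DTD. The following sketch uses Python's xml.parsers.expat with an internal DTD subset; the ATTLIST declaration, the sample document and the function name collect_entry_types are assumptions made for illustration.

```python
import xml.parsers.expat

# Hypothetical document: the internal DTD subset declares a default
# value for the "type" attribute of <entry>.
DOCUMENT = """<!DOCTYPE rhythmdb [
  <!ATTLIST entry type CDATA "song">
]>
<rhythmdb><entry/><entry type="stream"/></rhythmdb>"""

def collect_entry_types(text):
    """Return the "type" attribute the parser reports for each <entry>."""
    types = []

    def start(name, attrs):
        if name == "entry":
            types.append(attrs.get("type"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(text, True)
    return types

entry_types = collect_entry_types(DOCUMENT)
print(entry_types)
```

The first entry carries no type attribute in the document itself, yet the parser reports one, because the DTD supplied the default. This is the kind of influence on processing that Relax NG leaves out.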

XML Extensions

  • XPath makes it possible to refer to individual components of an XML document. This allows stylesheets in (for example) XSL and XSLT to dynamically "cherry-pick" pieces of a document in any sequence needed in order to compose the required output.
  • XQuery is to XML what SQL is to relational databases.
  • XML namespaces enable the same document to contain XML elements and attributes taken from different vocabularies, without any naming collisions occurring.
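
A small sketch of path-based selection and namespaces, using Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document, the use of the Dublin Core namespace for the dc prefix, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies: un-namespaced elements
# and elements from a namespace bound to the "dc" prefix.
LIBRARY = """<library xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry type="song"><dc:title>First</dc:title><title>internal-1</title></entry>
  <entry type="stream"><dc:title>Second</dc:title><title>internal-2</title></entry>
</library>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}
root = ET.fromstring(LIBRARY)

# Path expression with an attribute predicate and a namespaced step.
song_titles = [t.text for t in root.findall("./entry[@type='song']/dc:title", NS)]

# The un-namespaced <title> elements are distinct from <dc:title>,
# so the two vocabularies do not collide.
plain_titles = [t.text for t in root.findall("./entry/title")]

print(song_titles, plain_titles)
```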

Processing XML files

The APIs widely used in processing XML data by programming languages are SAX and DOM. SAX is used for serial processing whereas DOM is used for random-access processing.
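
The difference between the two styles can be sketched with Python's standard xml.sax and xml.dom.minidom modules. The sample data and the handler class TitleCounter are assumptions for illustration: the SAX handler sees elements one at a time as the parser streams through the input, while the DOM tree is built in full and can then be visited in any order.

```python
import xml.sax
from xml.dom import minidom

DATA = (b"<rhythmdb><entry><title>A</title></entry>"
        b"<entry><title>B</title></entry></rhythmdb>")

class TitleCounter(xml.sax.ContentHandler):
    """SAX: react to each start-tag as it streams past."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "title":
            self.count += 1

handler = TitleCounter()
xml.sax.parseString(DATA, handler)
sax_count = handler.count

# DOM: the whole document is in memory, so we can jump straight
# to the last <title> without visiting the others first.
dom = minidom.parseString(DATA)
titles = dom.getElementsByTagName("title")
last_title = titles[-1].firstChild.data

print(sax_count, last_title)
```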

An XSL processor may be used to render an XML file for displaying or printing. XSL formatting objects (XSL-FO) are intended for paginated output such as PDF files. XSLT is for transforming to other formats, including HTML, other vocabularies of XML, and any other plain-text format.

Versions of XML

The current version of XML is 1.1 (as of 2004-05-04). The first version, XML 1.0, currently exists in its third edition. XML 1.0 and XML 1.1 differ in the characters they permit in element names, attribute names and so on: XML 1.0 only allows characters defined in Unicode 2.0, which covers most world scripts but excludes scripts that only entered Unicode in a later version, such as Mongolian, Cambodian, Amharic and Burmese. XML 1.1 only disallows certain control characters, which means that any other character can be used, even as the Unicode standard continues to grow.

It should be noted here that the restriction present in XML 1.0 only applies to element/attribute names: both XML 1.0 and XML 1.1 allow for the use of full Unicode in the content itself. Thus XML 1.1 is only needed if in addition to using a script added after Unicode 2.0 you also wish to write the elements in that script.

Other minor changes between XML 1.0 and XML 1.1 are that control characters are now allowed to be included but only when escaped, and that two additional line-break characters (U+0085 and U+2028) are recognised, which must be treated as line ends.

All XML 1.0 documents will be valid XML 1.1 documents, with one exception: XML documents declaring themselves as being ISO-8859-1 encoded which are actually CP1252 encoded may now be invalid.
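
The encoding mismatch behind this exception can be seen by decoding the same byte both ways. In this sketch (variable names are illustrative), byte 0x80 is a control character under ISO-8859-1 but the euro sign under CP1252, so a CP1252 document mislabelled as ISO-8859-1 appears to contain unescaped control characters.

```python
import unicodedata

raw = b"\x80"

# Declared encoding: ISO-8859-1 maps the byte to the control character U+0080.
as_latin1 = raw.decode("iso-8859-1")

# Actual encoding: CP1252 maps the same byte to the euro sign.
as_cp1252 = raw.decode("cp1252")

# "Cc" is the Unicode general category for control characters.
print(repr(as_latin1), unicodedata.category(as_latin1))
print(repr(as_cp1252), unicodedata.category(as_cp1252))
```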

See Also

Recipes

>
>

Added:
>
>


Changed:
<
<

Comments here

Contributors

-- HectorEGomezMorales - 19 May 2004

>
>




  Difference Topic XmlChapter (r1.10 - 31 May 2004 - AntonVanStraaten)
Changed:
<
<

Comments here

>
>

Comments here

Changed:
<
<

Contributors

>
>

Contributors

Changed:
<
<

%META:FIELD{name="NextTopic" title="Next Topic" value=""}%

>
>

%META:FIELD{name="TopicOrder" title="TopicOrder" value="16"}%

  Difference Topic XmlChapter (r1.9 - 20 May 2004 - HectorEGomezMorales)
Deleted:
<
<

See Also

  Difference Topic XmlChapter (r1.8 - 19 May 2004 - HectorEGomezMorales)
Added:
>
>


Comments here

Deleted:
<
<

Comments here

  Difference Topic XmlChapter (r1.7 - 19 May 2004 - HectorEGomezMorales)
Changed:
<
<

>
>

Changed:
<
<

-- HectorEGomezMorales - 03 May 2004

>
>

-- HectorEGomezMorales - 19 May 2004

  Difference Topic XmlChapter (r1.6 - 19 May 2004 - AntonVanStraaten)
Changed:
<
<

All XML 1.0 documents will be valid XML 1.1 documents, with one exception: XML documents declaring themselves as being ISO-8859-1 encoded which are actually CP1252 encoded may now be invalid: this because CP1252 uses the control characters block of ISO-8859-1 for special glyphs like €, Œ, and ™. XML 1.0 documents which declare CP1252 encoding will remain valid.

There are also discussions on an XML 2.0, although it remains to be seen if such will ever come about. XML-SW (SW for skunk works), written by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, integration of namespaces, XML Base and XML Information Set into the base standard.

See Also

>
>

Added:
>
>

See Also

Changed:
<
<

Comments here

>
>

Contributors

Changed:
<
<



>
>

Comments here

Deleted:
<
<

%META:FIELD{name="OtherParents" title="Other Parents" value=""}%

  Difference Topic XmlChapter (r1.5 - 15 May 2004 - AntonVanStraaten)
Added:
>
>



%META:FORM{name="CookbookForm"}% %META:FIELD{name="TopicType" title="TopicType" value="Chapter"}% %META:FIELD{name="ParentTopic" title="ParentTopic" value="TOC"}% %META:FIELD{name="OtherParents" title="Other Parents" value=""}% %META:FIELD{name="NextTopic" title="Next Topic" value=""}%

  Difference Topic XmlChapter (r1.4 - 11 May 2004 - HectorEGomezMorales)
Changed:
<
<

XML (Extensible Markup Language) is a W3C Recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Languages based on XML (for example, RDF, SMIL, XSIL, SVG, etc) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.

>
>

XML (Extensible Markup Language) is a W3C Recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Languages based on XML (for example, RDF, SMIL, XSIL, SVG, etc) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.

Changed:
<
<

Unlike, for example, HTML, XML is highly dependent upon structure, content and integrity for its efficacy. In order for a document to be considered "well-formed" [1], it must conform (at the very least) to the following:

>
>

Unlike, for example, HTML, XML is highly dependent upon structure, content and integrity for its efficacy. In order for a document to be considered "well-formed", it must conform at the very least to the following:

Changed:
<
<

  • It must have one (and only one) root element.
>
>

  • It must have one and only one root element.
Added:
>
>

XML schema languages

Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers.

XML schema languages allow software designers to describe the structure of particular XML-based markup languages in a formal way. Such a description is called a schema. Well-tested tools exist to validate XML files against a schema to automatically verify whether the document conforms to the described structure. Other usages of the schema exist; XML editors for instance can use schemas to support the editing process.

The oldest XML schema format is the DTD, which is inherited from SGML. While DTD support is ubiquitous due to its inclusion in the XML 1.0 standard, it is seen as limited for the following reasons:

  • No support for newer features of XML, most importantly namespaces.
  • Lack of expressivity. Certain formal aspects of an XML document cannot be captured in a DTD.
  • Custom non-XML syntax to describe the schema, inherited from SGML.

A newer XML schema language, described by the W3C as the successor of DTDs, is simply called XML Schema, also referred to as XML Schema Definition (XSD). XSD is far more powerful than DTDs in describing XML languages. Additionally, XSD uses an XML-based format, which makes it possible to use the XML toolset to help process schemas. It also becomes possible to write a schema for the schema language itself. Criticisms of XSD are:

  • The standard is very large, which makes it difficult to understand and implement.
  • The XML-based syntax leads to verbosity in schema descriptions, which makes XSDs harder to read and write.

An alternative XML schema language recently gaining in popularity is Relax NG, which is standardized by OASIS. Relax NG comes in two formats, an XML-based syntax and a non-XML compact syntax. The compact syntax aims to increase readability and writability, but since there is a well-defined way to translate the compact syntax to the XML syntax and back again, the advantage of using standard XML tools is not lost. Relax NG has a more compact definition, which makes it easier to implement than XSD.

Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide attribute defaults. Relax NG intentionally does not provide these facilities.
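
The attribute-default facility mentioned above can be observed with a small sketch. It is written in Python's standard library (an assumption for illustration; the cookbook itself targets Scheme), and the DTD fragment and the default value "cups" are hypothetical. It relies on the expat-based parser reporting defaults declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares a default
# value for the "unit" attribute of <ingredient>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Recipe [
  <!ATTLIST ingredient unit CDATA "cups">
]>
<Recipe>
  <ingredient amount="3">Flour</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</Recipe>"""

root = ET.fromstring(DOC)

# The first ingredient omits "unit", so the parser supplies the default;
# the second keeps the value written in the document.
units = [ing.get("unit") for ing in root.iter("ingredient")]
print(units)
```

The document text never mentions a unit for the flour, yet the application sees one. This is the kind of influence on processing that Relax NG deliberately leaves out.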

Added:
>
>

See Also

Changed:
<
<

Comments

>
>

Comments here

  Difference Topic XmlChapter (r1.3 - 05 May 2004 - HectorEGomezMorales)
Changed:
<
<
>
>

Deleted:
<
<


Added:
>
>

Changed:
<
<

Basic bread Flour Yeast Warm Water Salt Mix all ingredients together, and knead thoroughly. Cover with a cloth, and leave for one hour in warm room. Knead again, place in a tin, and then bake in the oven.

>
>

Never Let Me Down Again Pop/Rock Depeche Mode Music For The Masses 1 287 6533368 file:///home/hector/ogg/depeche_mode/music_for_the_masses/never_let_me_down_again.ogg 1079831032 1 1083552958

Added:
>
>

Changed:
<
<

Parsing XML into Data Structures

Problem

You want a Scheme List that corresponds to the structure and content of an XML file. For example you have this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours" >
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups" >Flour</ingredient>
  <ingredient amount="0.25" unit="ounce" >Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" >Warm Water</ingredient>
  <ingredient amount="1" unit="teaspoon" >Salt</ingredient>
<Instructions>
  <step>Mix all ingredients together, and knead thoroughly.</step>
  <step>Cover with a cloth, and leave for one hour in warm room.</step>
  <step>Knead again, place in a tin, and then bake in the oven.</step>
</Instructions>
</Recipe>

We will use the SSAX library to parse this XML file into an SXML list. Here ssax:xml->sxml is a function that takes an input port for an XML file (test.xml in this case) and the XML namespaces to be used (in this case the empty list, which maps to the default namespace).

(require (lib "ssax.ss" "ssax"))
(ssax:xml->sxml (open-input-file "test.xml") empty)

This outputs an SXML list structure:

(|*TOP*|
 (|*PI*| xml "version=\"1.0\" encoding=\"UTF-8\"")
 (|Recipe|
   (@ (prep_time "5 mins") (name "bread") (cook_time "3 hours"))
   (title "Basic bread")
   (ingredient (@ (unit "cups") (amount "3")) "Flour")
   (ingredient (@ (unit "ounce") (amount "0.25")) "Yeast")
   (ingredient (@ (unit "cups") (amount "1.5")) "Warm Water")
   (ingredient (@ (unit "teaspoon") (amount "1")) "Salt")
   (|Instructions|
     (step "Mix all ingredients together, and knead thoroughly.")
     (step "Cover with a cloth, and leave for one hour in warm room.")
     (step "Knead again, place in a tin, and then bake in the oven."))))
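
For readers without SSAX at hand, the shape of this structure can be imitated with a short sketch in Python's standard library (an assumption for illustration, not the SSAX implementation). The helper name to_sxml is hypothetical. It builds nested lists with the element name first, an optional "@" attribute list second, and the content after that, dropping whitespace-only text as the output above does.

```python
import xml.etree.ElementTree as ET

def to_sxml(elem):
    """Convert an Element into an SXML-like nested list."""
    node = [elem.tag]
    if elem.attrib:
        # Attributes go into a sub-list headed by "@", as in SXML.
        node.append(["@"] + [[k, v] for k, v in elem.attrib.items()])
    if elem.text and elem.text.strip():
        node.append(elem.text)
    for child in elem:
        node.append(to_sxml(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            node.append(child.tail)
    return node

doc = ('<Recipe name="bread"><title>Basic bread</title>'
       '<ingredient amount="3" unit="cups">Flour</ingredient></Recipe>')
tree = to_sxml(ET.fromstring(doc))
print(tree)
```

Unlike the SSAX result, this sketch omits the *TOP* wrapper and the processing-instruction node.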

Parsing XML into a DOM Tree

Problem

You want to use the Document Object Model (DOM) to access and perhaps change the parse tree of an XML file.

Parsing XML into SAX Events

Problem

You want to receive Simple API for XML (SAX) events from an XML parser because event-based parsing is faster and uses less memory than parsers that build a DOM tree.

Making Simple Changes to Elements or Text

Problem

You want to filter some XML. For example, you want to make substitutions in the body of a document, or add a price to every book described in an XML document, or you want to change:

<book id="1"> to <book> <id>1</id>

Validating XML

Problem

You want to ensure that the XML you're processing conforms to a DTD or XML Schema.

Finding Elements and Text Within an XML Document

Problem

You want to get to a specific part of the XML; for example, the href attribute of an a tag whose contents are an img tag with alt text containing the word "monkey".

Processing XML Stylesheet Transformations

Problem

You have an XML stylesheet that you want to use to convert XML into something else. For example, you want to produce HTML from files of XML using the stylesheet.

Processing Files Larger Than Available Memory

Problem

You want to work with a large XML file, but you can't read it into memory to form a DOM or other kind of tree because it's too big.

Reading and Writing RSS Files

Problem

You want to create a Rich Site Summary (RSS) file, or read one produced by another application.

Writing XML

Problem

>
>

Recipes

Changed:
<
<

You have a data structure that you'd like to convert to XML.

>
>

Added:
>
>

Comments

  Difference Topic XmlChapter (r1.2 - 04 May 2004 - HectorEGomezMorales)
Changed:
<
<

You want a Scheme data structure that corresponds to the structure and content of an XML file. For example, you have XML representing a configuration file, and you'd like to say $xml->{config}{server}{hostname} to access the contents of ....

>
>

You want a Scheme List that corresponds to the structure and content of an XML file. For example you have this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours" >
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups" >Flour</ingredient>
  <ingredient amount="0.25" unit="ounce" >Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" >Warm Water</ingredient>
  <ingredient amount="1" unit="teaspoon" >Salt</ingredient>
<Instructions>
  <step>Mix all ingredients together, and knead thoroughly.</step>
  <step>Cover with a cloth, and leave for one hour in warm room.</step>
  <step>Knead again, place in a tin, and then bake in the oven.</step>
</Instructions>
</Recipe>

We will use the SSAX library to parse this XML file into an SXML list. Here ssax:xml->sxml is a function that takes an input port for an XML file (test.xml in this case) and the XML namespaces to be used (in this case the empty list, which maps to the default namespace).

(require (lib "ssax.ss" "ssax"))
(ssax:xml->sxml (open-input-file "test.xml") empty)

This outputs an SXML list structure:

(|*TOP*|
 (|*PI*| xml "version=\"1.0\" encoding=\"UTF-8\"")
 (|Recipe|
   (@ (prep_time "5 mins") (name "bread") (cook_time "3 hours"))
   (title "Basic bread")
   (ingredient (@ (unit "cups") (amount "3")) "Flour")
   (ingredient (@ (unit "ounce") (amount "0.25")) "Yeast")
   (ingredient (@ (unit "cups") (amount "1.5")) "Warm Water")
   (ingredient (@ (unit "teaspoon") (amount "1")) "Salt")
   (|Instructions|
     (step "Mix all ingredients together, and knead thoroughly.")
     (step "Cover with a cloth, and leave for one hour in warm room.")
     (step "Knead again, place in a tin, and then bake in the oven."))))
  Difference Topic XmlChapter (r1.1 - 03 May 2004 - HectorEGomezMorales)
Added:
>
>

%META:TOPICINFO{author="HectorEGomezMorales" date="1083565800" format="1.0" version="1.1"}%

XML

Introduction

XML (Extensible Markup Language) is a W3C Recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Languages based on XML (for example, RDF, SMIL, XSIL, SVG, etc) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.

Syntax rules in XML

An XML document is text, usually a particular encoding of Unicode such as UTF-8 or UTF-16, although other encodings may be used.

Unlike, for example, HTML, XML is highly dependent upon structure, content and integrity for its efficacy. In order for a document to be considered "well-formed" [1], it must conform (at the very least) to the following:

  • It must have one (and only one) root element.
  • Non-empty elements must be delimited by a start-tag and an end-tag. Empty elements may be marked with an empty-element tag.
  • All attribute values must be quoted, with either single (') or double (") quotes. The closing quote must match the opening one, and the other kind of quote can then be used inside the value.
  • Tags may be nested but may not overlap, that is each non-root element must be completely contained in another element.
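
The rules above can be checked mechanically. The following is a minimal sketch in Python's standard library (an assumption for illustration; the cookbook itself targets Scheme), with a hypothetical helper is_well_formed that reports whether a parser accepts a document.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule.
samples = {
    "single root, properly nested": "<a><b>text</b></a>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a id=1/>",
    "overlapping tags": "<a><b></a></b>",
    "end-tag differs in case": "<Step>Mix</step>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample is accepted; each of the others breaks one of the listed rules.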

Element names in XML are case-sensitive: for example, <Step> and </Step> are a well-formed matching pair, whereas <Step> and </step> are not.

Also, again unlike HTML, clever choice of XML element names allows the meaning of the data to be retained as part of the markup. This makes it more easily interpreted by software programs.

As a concrete example, a simple recipe expressed in an XML representation might be:

        <?xml version="1.0" encoding="UTF-8"?>
        <Recipe name="bread" prep_time="5 mins" cook_time="3 hours" >
           <title>Basic bread</title>
           <ingredient amount="3" unit="cups" >Flour</ingredient>
           <ingredient amount="0.25" unit="ounce" >Yeast</ingredient>
           <ingredient amount="1.5" unit="cups" >Warm Water</ingredient>
           <ingredient amount="1" unit="teaspoon" >Salt</ingredient>
           <Instructions>
              <step>Mix all ingredients together, and knead thoroughly.</step>
              <step>Cover with a cloth, and leave for one hour in warm room.</step>
              <step>Knead again, place in a tin, and then bake in the oven.</step>
           </Instructions>
        </Recipe>
Identifying information accurately enables programs to manipulate it easily: in this example, it is now easy to convert the quantities to other measuring systems, or to print the ingredients as icons for those with low reading skills (or a different native language), or to refer to the individual ingredients or steps from elsewhere (another recipe, for example).
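
As a rough illustration of the first of these uses, the sketch below converts the cup quantities to millilitres. It is written in Python's standard library (an assumption; the cookbook itself targets Scheme). The function name cups_to_ml and the conversion factor are likewise assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RECIPE = """<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups">Warm Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</Recipe>"""

# Assumed conversion factor: one US cup is roughly 236.6 ml.
ML_PER_CUP = 236.6

def cups_to_ml(root):
    """Rewrite every ingredient measured in cups to millilitres, in place."""
    for ing in root.iter("ingredient"):
        if ing.get("unit") == "cups":
            ml = float(ing.get("amount")) * ML_PER_CUP
            ing.set("amount", f"{ml:.0f}")
            ing.set("unit", "ml")
    return root

root = cups_to_ml(ET.fromstring(RECIPE))
for ing in root.iter("ingredient"):
    print(ing.get("amount"), ing.get("unit"), ing.text)
```

Because the unit is an attribute rather than free text, the program can pick out the ingredients to convert without guessing at the wording.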

An XML document that meets certain other criteria in addition to being well-formed (such as complying with an associated DTD) is said to be "valid".

XML Extensions

  • XPath makes it possible to refer to individual components of an XML document. This allows stylesheets in (for example) XSL and XSLT to "cherry-pick" pieces of a document in whatever order is needed to compose the required output.
  • XQuery is to XML what SQL is to relational databases.
  • XML namespaces enable the same document to contain XML elements and attributes taken from different vocabularies, without any naming collisions occurring.
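
A small sketch may make the XPath and namespace items more concrete. It uses Python's standard library (an assumption; the cookbook itself targets Scheme), whose ElementTree module supports only a limited subset of XPath. The namespace URIs, the bk prefix and the book title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an element called "title"; the
# namespaces keep them apart. All URIs here are made up.
DOC = """<Recipe xmlns="http://example.org/recipe"
        xmlns:bk="http://example.org/book" name="bread">
  <title>Basic bread</title>
  <bk:title>A Hypothetical Baking Book</bk:title>
  <ingredient unit="cups">Flour</ingredient>
  <ingredient unit="teaspoon">Salt</ingredient>
</Recipe>"""

# Prefixes used in the path expressions below (local to this script).
NS = {"r": "http://example.org/recipe", "bk": "http://example.org/book"}
root = ET.fromstring(DOC)

recipe_title = root.find("r:title", NS).text
book_title = root.find("bk:title", NS).text

# An XPath-style predicate selects only ingredients measured in cups.
cup_items = [e.text for e in root.findall("r:ingredient[@unit='cups']", NS)]

print(recipe_title, "|", book_title, "|", cup_items)
```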

Processing XML files

The APIs most widely used by programming languages to process XML data are SAX and DOM. SAX is used for serial, event-driven processing, whereas DOM builds an in-memory tree that supports random access.
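
The difference between the two styles can be sketched with the SAX and DOM modules in Python's standard library (an assumption for illustration; the cookbook itself targets Scheme). The handler class IngredientCounter is a hypothetical name.

```python
import xml.sax
from xml.dom import minidom

XML = b"""<Recipe name="bread">
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</Recipe>"""

class IngredientCounter(xml.sax.ContentHandler):
    """SAX: react to each start-tag as the parser streams past it."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "ingredient":
            self.count += 1

counter = IngredientCounter()
xml.sax.parseString(XML, counter)

# DOM: build the whole tree first, then jump to any node in any order.
dom = minidom.parseString(XML)
ingredients = dom.getElementsByTagName("ingredient")
second_unit = ingredients[1].getAttribute("unit")

print(counter.count, second_unit)
```

The SAX handler never holds more than the current event, while the DOM version keeps the full document in memory in exchange for random access.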

An XSL processor may be used to render an XML file for display or printing. XSL Formatting Objects (XSL-FO) describes page layout and is typically used to produce print formats such as PDF. XSLT transforms XML into other formats, including HTML, other XML vocabularies, and any other plain-text format.

Versions of XML

The current version of XML is 1.1 (as of 2004-05-04). The first version, XML 1.0, is currently in its third edition. XML 1.0 and XML 1.1 differ in the characters permitted in element names, attribute names and so on: XML 1.0 only allows characters defined in Unicode 2.0, which covers most world scripts but excludes scripts added in later Unicode versions, such as Mongolian, Cambodian, Amharic and Burmese. XML 1.1 disallows only certain control characters, so any other character can be used, including characters added to Unicode in the future.

Note that the XML 1.0 restriction applies only to element and attribute names: both XML 1.0 and XML 1.1 allow full Unicode in the content itself. XML 1.1 is therefore needed only if, in addition to using a script added after Unicode 2.0, you also wish to write element or attribute names in that script.

Other minor changes between XML 1.0 and XML 1.1 are that control characters may now be included, but only when escaped as character references, and that two additional line-end characters, NEL (U+0085) and the Unicode line separator (U+2028), are recognised and must be treated as line breaks.

All XML 1.0 documents will be valid XML 1.1 documents, with one exception: documents that declare themselves ISO-8859-1 encoded but are actually CP1252 encoded may now be invalid. This is because CP1252 uses the control-character block of ISO-8859-1 for glyphs such as €, Œ, and ™. XML 1.0 documents that declare CP1252 encoding remain valid.
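
The encoding mismatch behind this exception can be seen by decoding the same bytes both ways. This is a minimal sketch using Python's built-in codecs (an assumption for illustration; the cookbook itself targets Scheme). It demonstrates only the character mapping, not an XML 1.1 parser.

```python
import unicodedata

# The bytes that CP1252 assigns to the euro sign, OE ligature and
# trademark sign.
raw = bytes([0x80, 0x8C, 0x99])

as_cp1252 = raw.decode("cp1252")      # printable glyphs
as_latin1 = raw.decode("iso-8859-1")  # control characters U+0080, U+008C, U+0099

for glyph, control in zip(as_cp1252, as_latin1):
    # Category "Cc" marks a control character.
    print(f"{glyph!r} vs U+{ord(control):04X} ({unicodedata.category(control)})")
```

A document labelled ISO-8859-1 that contains these bytes therefore carries raw control characters as far as the parser is concerned, even though its author meant printable glyphs.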

There are also discussions on an XML 2.0, although it remains to be seen if such will ever come about. XML-SW (SW for skunk works), written by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, integration of namespaces, XML Base and XML Information Set into the base standard.

Parsing XML into Data Structures

Problem

You want a Scheme data structure that corresponds to the structure and content of an XML file. For example, you have XML representing a configuration file, and you'd like to say $xml->{config}{server}{hostname} to access the contents of ....

Parsing XML into a DOM Tree

Problem

You want to use the Document Object Model (DOM) to access and perhaps change the parse tree of an XML file.

Parsing XML into SAX Events

Problem

You want to receive Simple API for XML (SAX) events from an XML parser because event-based parsing is faster and uses less memory than parsers that build a DOM tree.

Making Simple Changes to Elements or Text

Problem

You want to filter some XML. For example, you want to make substitutions in the body of a document, or add a price to every book described in an XML document, or you want to change:

<book id="1"> to <book> <id>1</id>
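
One possible approach to the last change is sketched here in Python's standard library (an assumption for illustration, not the cookbook's Scheme solution). The function name id_attribute_to_child and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def id_attribute_to_child(root):
    """Move each book's id attribute into a leading <id> child element."""
    # Take a snapshot of the matches so the tree can be edited safely.
    for book in list(root.iter("book")):
        book_id = book.attrib.pop("id", None)
        if book_id is not None:
            child = ET.Element("id")
            child.text = book_id
            book.insert(0, child)
    return root

source = '<books><book id="1"><title>T</title></book></books>'
root = id_attribute_to_child(ET.fromstring(source))
result = ET.tostring(root, encoding="unicode")
print(result)
```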

Validating XML

Problem

You want to ensure that the XML you're processing conforms to a DTD or XML Schema.

Finding Elements and Text Within an XML Document

Problem

You want to get to a specific part of the XML; for example, the href attribute of an a tag whose contents are an img tag with alt text containing the word "monkey".
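
A minimal sketch of this search, in Python's standard library (an assumption for illustration; the cookbook itself targets Scheme), is given below. It presumes the page is well-formed XML, and the sample markup and the function name monkey_links are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, written as well-formed XML.
PAGE = """<html><body>
  <a href="/zoo/monkeys.html"><img src="m.png" alt="A monkey eating"/></a>
  <a href="/zoo/lions.html"><img src="l.png" alt="A lion sleeping"/></a>
  <a href="/about.html">About</a>
</body></html>"""

def monkey_links(root):
    """Collect href values of <a> elements wrapping an <img> whose alt mentions monkey."""
    hrefs = []
    for a in root.iter("a"):
        href = a.get("href")
        if href is None:
            continue
        # Only direct <img> children of the link are considered.
        if any("monkey" in img.get("alt", "") for img in a.findall("img")):
            hrefs.append(href)
    return hrefs

links = monkey_links(ET.fromstring(PAGE))
print(links)
```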

Processing XML Stylesheet Transformations

Problem

You have an XML stylesheet that you want to use to convert XML into something else. For example, you want to produce HTML from files of XML using the stylesheet.

Processing Files Larger Than Available Memory

Problem

You want to work with a large XML file, but you can't read it into memory to form a DOM or other kind of tree because it's too big.
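
A common way around this is to process elements as they are parsed and discard them immediately. The sketch below uses iterparse from Python's standard library (an assumption for illustration; the cookbook itself targets Scheme). The function name count_steps and the generated input are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def count_steps(source):
    """Count <step> elements without keeping their content in memory."""
    count = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "step":
            count += 1
        # Drop the text and children of each finished element.
        elem.clear()
    return count

# Stand-in for a large file: a file name or any binary file object works too.
data = b"<Instructions>" + b"<step>Knead again.</step>" * 1000 + b"</Instructions>"
total = count_steps(io.BytesIO(data))
print(total)
```

The root element still accumulates one emptied child per step, so for truly huge inputs those references would also need to be removed from the parent.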

Reading and Writing RSS Files

Problem

You want to create a Rich Site Summary (RSS) file, or read one produced by another application.

Writing XML

Problem

You have a data structure that you'd like to convert to XML.
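
One way to do this is to build an element tree from the data and serialise it, leaving the escaping of special characters to the library. The sketch is in Python's standard library (an assumption for illustration, not the cookbook's Scheme solution). The dictionary layout and the function name recipe_to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory representation of a recipe.
recipe = {
    "name": "bread",
    "title": "Basic bread",
    "ingredients": [("3", "cups", "Flour"), ("1", "teaspoon", "Salt & herbs")],
}

def recipe_to_xml(data):
    """Serialise the recipe dictionary to an XML string."""
    root = ET.Element("Recipe", name=data["name"])
    ET.SubElement(root, "title").text = data["title"]
    for amount, unit, item in data["ingredients"]:
        ing = ET.SubElement(root, "ingredient", amount=amount, unit=unit)
        ing.text = item  # characters such as & are escaped on output
    return ET.tostring(root, encoding="unicode")

xml_text = recipe_to_xml(recipe)
print(xml_text)
```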

-- HectorEGomezMorales - 03 May 2004

 
 
Copyright © 2004 by the contributing authors. All material on the Schematics Cookbook web site is the property of the contributing authors.
The copyright for certain compilations of material taken from this website is held by the SchematicsEditorsGroup - see ContributorAgreement & LGPL.
Other than such compilations, this material can be redistributed and/or modified under the terms of the GNU Lesser General Public License (LGPL), version 2.1, as published by the Free Software Foundation.