eXtensible Markup Language

What is XML?


XML is a spin-off of Standard Generalized Markup Language (SGML). SGML grew out of IBM's GML and became an ISO standard in 1986, but the military was one of its biggest early adopters... which is the problem in itself. With a fairly extensive career in the military... I would say... if the military runs on it... it's complicated.

For that reason XML was designed to be a lightweight version of SGML. Actually, XML came around because the shortcomings of HTML created the need.

CREATED 2012-08-25 14:44:54.0

UPDATED 2012-09-19 21:11:27.0

The Fundamentals


<name>
<name age="29" sex="M" looks="good">
<name xmlns="http://www.leistware.com/dtd/MyDtd.dtd">
</name>
<name age="29" sex="M" looks="good">kenny</name>
<name first="Kenny" age="29" sex="M" looks="good" />
<person age="29" sex="M" looks="good">
 <firstname>Kenny</firstname>
 <lastname>L</lastname>
</person>
<![CDATA[note:CDATA sections are transferred as-is ]]>

The Tag - marks the boundaries of an element. The tag:

  • is always delimited with < and > e.g. <tagname>
  • must be a valid XML name - it cannot begin with a number, or with xml in any case (xml, XML, Xml...)
  • is case sensitive - <name></name> but not <name></Name>
  • cannot contain spaces within the name - a space ends the name and starts the attributes.
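The rules above are easy to try out. This is only a sketch using Python's standard xml.etree.ElementTree parser (my pick for illustration, any conforming parser should agree); the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Same case in the open and close tag: accepted.
matching_case = is_well_formed("<name></name>")
# Close tag differs in case from the open tag: rejected.
mismatched_case = is_well_formed("<name></Name>")
# Tag name begins with a number: rejected.
starts_with_digit = is_well_formed("<1name></1name>")
# Space inside the tag name: rejected.
space_in_name = is_well_formed("<first name></first name>")

print(matching_case, mismatched_case, starts_with_digit, space_in_name)
```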

There are two types of tags: the open tag and the close tag.

The Open Tag - can contain attributes in name/value pairs e.g. name="kenny". Attribute values must always be contained in quotes, either single or double, whether or not there are spaces in the value. Unlike HTML, XML does not accept unquoted values: name="Kenny" and name='Kenny L' are both valid, but name=Kenny would throw a parse error.
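A quick check of how strict a parser is about quotes. This is a sketch with Python's standard xml.etree.ElementTree (chosen for illustration only); it reads a tag with double and single quoted values, then tries one with no quotes at all.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are both accepted around attribute values.
quoted = ET.fromstring("<name first=\"Kenny L\" age='29' />")
first = quoted.get("first")
age = quoted.get("age")

# An unquoted value is rejected, even when it contains no spaces.
try:
    ET.fromstring("<name first=Kenny />")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(first, age, unquoted_ok)
```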

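The open tag can also contain namespace declarations (xmlns). A namespace is a URI that identifies the vocabulary the tag belongs to. It works as a unique name and often looks like a web address, but it does not have to point to an actual document.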
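To see what a parser does with a namespace, here is a small sketch using Python's standard xml.etree.ElementTree (my choice for illustration). It reuses the xmlns value from the examples at the top of this page. ElementTree simply folds the URI into the element's name; it does not try to download anything.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<name xmlns="http://www.leistware.com/dtd/MyDtd.dtd">kenny</name>'
)

# ElementTree reports the tag as {namespace-uri}localname.
qualified_tag = root.tag
print(qualified_tag)
```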

The Closing Tag - encased in < > but the name begins with a slash /. The name MUST be the same as the open tag (case sensitive) e.g. <name> is closed by </name>.

The Element - is made up of a set of tags that may or may not encase data. There are two kinds of elements: the element and the empty element. An empty element has no data. In the interest of saving bytes, the empty element may omit the closing tag but not the slash: the slash moves to the end of the open tag instead.

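The element: <name age="29" sex="M" looks="good">kenny</name>

The empty element (make the data an attribute): <name first="Kenny" age="29" sex="M" looks="good" />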
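The short form of the empty element is only a space saver. As a sketch (Python's standard xml.etree.ElementTree, picked for illustration), the two spellings below come out of the parser looking the same.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<name first="Kenny" />')
long_form = ET.fromstring('<name first="Kenny"></name>')

# Compare what the parser produced for each spelling.
same_tag = short_form.tag == long_form.tag
same_attributes = short_form.attrib == long_form.attrib
no_text = short_form.text is None and long_form.text is None
no_children = len(short_form) == 0 and len(long_form) == 0

print(same_tag, same_attributes, no_text, no_children)
```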

Elements can be nested but not overlapped. Empty elements cannot have children (other elements nested inside). In the person example above, person could be an empty element but it would have to dump its kids.
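Nesting versus overlapping, tried out. A sketch using Python's standard xml.etree.ElementTree (chosen for illustration): the first document nests firstname and lastname inside person, the second lets firstname and lastname overlap.

```python
import xml.etree.ElementTree as ET

nested = """<person age="29" sex="M" looks="good">
 <firstname>Kenny</firstname>
 <lastname>L</lastname>
</person>"""
person = ET.fromstring(nested)
children = [child.tag for child in person]  # ['firstname', 'lastname']

# lastname opens inside firstname but closes outside of it: overlapped.
overlapped = "<person><firstname>Kenny<lastname></firstname>L</lastname></person>"
try:
    ET.fromstring(overlapped)
    overlap_ok = True
except ET.ParseError:
    overlap_ok = False

print(children, overlap_ok)
```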

Entity Reference - To reference key characters like < an entity reference must be used so that the parser doesn't get confused. An entity reference begins with an ampersand & and ends with a semi-colon ;. For example: &lt; would make the < symbol (lt = LessThan), and &gt; would make the > symbol. Since the ampersand itself starts every entity reference, a literal & has to be written as &amp;.

CDATA - Sometimes entity references can become messy... an ocean of &'s. For this reason the CDATA section transfers all data as-is. A CDATA section begins with <![CDATA[ and ends with ]]>.
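Both routes end up with the same data. A sketch with Python's standard xml.etree.ElementTree (my choice for illustration; the note element and its text are made up): one version escapes every key character, the other wraps the text in a CDATA section, and a third leaves a raw < in place to show the parser getting confused.

```python
import xml.etree.ElementTree as ET

with_entities = ET.fromstring("<note>if a &lt; b &amp;&amp; b &gt; c</note>")
with_cdata = ET.fromstring("<note><![CDATA[if a < b && b > c]]></note>")

# Both spellings hand the same text to the application.
same_text = with_entities.text == with_cdata.text

# A raw < in the data looks like the start of a tag, so it is rejected.
try:
    ET.fromstring("<note>if a < b</note>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

print(with_cdata.text, same_text, raw_ok)
```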

CREATED 2012-08-27 12:03:48.0

UPDATED 2012-09-20 16:35:53.0

The Document


The XML Document has two basic parts:

  • Header - Declarations defining the document, XML version and basically how to treat the document
  • Content - A frame that contains the data

CREATED 2012-09-20 12:09:01.0

UPDATED 2012-09-20 12:47:57.0

The Header


<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<!DOCTYPE [name or root element] [type] [location]>
<!DOCTYPE person SYSTEM "dtd/Person.dtd">
<!DOCTYPE person PUBLIC "http://www.leistware.com/dtd" "Person.dtd">
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.leistware.com/schemas"
  xsi:schemaLocation="http://www.leistware.com/schemas MySchema.xsd">

The header contains instructions on how to handle the document. Processing instructions are written as <?...?> and declarations as <!...>. First is the XML declaration. If it is present it must be the very first thing in the document, with nothing in front of it, not even a blank line. Note that there is no space between <? and xml.

  • version - the version of the XML language: 1.0 or 1.1
  • encoding - the character set used in the document
  • standalone - yes or no: does this document stand by itself or does it depend on declarations in other documents (such as an external DTD)
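The XML declaration is picky about where it sits. A sketch using Python's standard xml.etree.ElementTree (picked for illustration): the same declaration is accepted at the very start of the document and rejected when a blank line sneaks in before it.

```python
import xml.etree.ElementTree as ET

declaration = b'<?xml version="1.0" encoding="utf-8" standalone="no" ?>'

good = declaration + b"\n<person />"
bad = b"\n" + declaration + b"\n<person />"  # blank line before the declaration

root_tag = ET.fromstring(good).tag

try:
    ET.fromstring(bad)
    late_declaration_ok = True
except ET.ParseError:
    late_declaration_ok = False

print(root_tag, late_declaration_ok)
```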

The header can also contain a DOCTYPE declaration:

  • name or root element - the name of the root element of this document. It looks redundant because there is only supposed to be one root element in each document, so it should be obvious. But a DTD can declare many elements without saying which one is the root, so the DOCTYPE has to. I'm not making the rules.
  • type - SYSTEM or PUBLIC
    • SYSTEM means location is a single system identifier: an absolute or relative path, or a URL, where the DTD can be found.
    • PUBLIC means two identifiers are given: first a public identifier (a well known name that the parser may recognize or look up in a catalog), then a system identifier to fall back on.
  • location - a path or URL where the DTD is located.

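The SYSTEM declaration (<!DOCTYPE person SYSTEM "dtd/Person.dtd">) identifies a document (Person.dtd) in the dtd directory and names person as the root element.

The PUBLIC declaration (<!DOCTYPE person PUBLIC "http://www.leistware.com/dtd" "Person.dtd">) points to a DTD published on the web: the first string is its public identifier and Person.dtd is the location. The root element is again person.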
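To see how a parser splits up a DOCTYPE declaration, here is a sketch using Python's standard xml.dom.minidom (my choice for illustration). It reads the two declarations from this page and prints the pieces. As far as I know minidom only records the identifiers and does not go and fetch the DTD, so nothing needs to exist at those locations.

```python
from xml.dom import minidom

system_doc = minidom.parseString(
    '<!DOCTYPE person SYSTEM "dtd/Person.dtd"><person />'
)
public_doc = minidom.parseString(
    '<!DOCTYPE person PUBLIC "http://www.leistware.com/dtd" "Person.dtd"><person />'
)

# SYSTEM: only a system identifier (the location) is present.
system_parts = (
    system_doc.doctype.name,
    system_doc.doctype.publicId,
    system_doc.doctype.systemId,
)
# PUBLIC: a public identifier first, then the system identifier.
public_parts = (
    public_doc.doctype.name,
    public_doc.doctype.publicId,
    public_doc.doctype.systemId,
)

print(system_parts)
print(public_parts)
```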

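Schema - The schema (newer than DOCTYPE) accomplishes the same thing in a different manner. Instead of a declaration in the header, the schema is referenced through attributes on the root element. This example references a schema named MySchema.xsd.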
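A sketch of how the schema reference can be read back, using Python's standard xml.etree.ElementTree (chosen for illustration). I am assuming the standard W3C XMLSchema-instance namespace for the xsi prefix. The schemaLocation value is one string holding a namespace and a file name separated by a space. ElementTree only reads the hint; it does not validate the document against the schema.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard W3C namespace for the xsi prefix.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = (
    '<person xmlns="http://www.leistware.com/schemas" '
    'xmlns:xsi="' + XSI + '" '
    'xsi:schemaLocation="http://www.leistware.com/schemas MySchema.xsd" />'
)
root = ET.fromstring(doc)

# The attribute name is qualified with the xsi namespace URI.
hint = root.get("{" + XSI + "}schemaLocation").split()
namespace_uri, schema_file = hint

print(namespace_uri, schema_file)
```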

CREATED 2012-09-20 12:48:14.0

UPDATED 2012-09-20 12:53:00.0

The Content


<person age="29" sex="M" looks="good">
  <firstname>Kenny</firstname>

The content is where the data is stored. The content is a structure made up of elements; however, the content can only contain one element at the top level. That is the root element. All other elements must be nested within the root element. In this example person is the root element. firstname, lastname, and address are child elements of person.
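The one-root rule can be checked the same way. A sketch using Python's standard xml.etree.ElementTree (my choice for illustration); address is left out because its contents are not shown on this page.

```python
import xml.etree.ElementTree as ET

one_root = "<person><firstname>Kenny</firstname><lastname>L</lastname></person>"
root = ET.fromstring(one_root)
child_tags = [child.tag for child in root]  # ['firstname', 'lastname']

# Two elements side by side at the top level: no single root element.
two_roots = "<person /><person />"
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

print(root.tag, child_tags, two_roots_ok)
```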

CREATED 2012-09-20 12:37:45.0

UPDATED 2012-09-20 16:36:23.0


©2012 Leistware Data Systems
