h1. Table of Contents
# [#Introduction]
#* [#Why another XML system?]
#** [#Off-the-shelf XML does not do enough during setup]
#** [#Existing data search tools are not enough]
#* [#Implementation]
# [#The memory-resident tree structure]
#* [#The basis of in-memory XML data is the DOM tree]
#* [#Nodepaths]
# [#The TreeAccNode class]
# [#The TreeAcc class]
#* [#Instantiation]
#* [#Basic search]
#* [#Adding a node]
#* [#Replace the value of an existing node]
#* [#Save the in-memory XML data tree to a file]
#* [#Walk the entire XML data tree]
# [#Pre-processing]
#* [#Input files which come into play]
#* [#Introducing the defval manifest]
#* [#Helper methods]
#* [#Makeup of a defval manifest]
#** [#Declaration of helper methods]
#** [#Mapping of defaults (hardwired and calculated) to a node]
#*** [#Hardwired default values]
#*** [#Calculated default values]
#** [#Mapping of validator helper methods to a node]
#*** [#Nodepath validation]
#*** [#Group validation]
#*** [#Exclude validation]
#* [#How defaults and content validation works]
#* [#Preprocessing cycle]
# [#Enhanced Nodepathing]
#* [#Specifying the constraints as part of the nodepath]
#* [#How it works]
# [#ManifestServ and ManifestRead implementation modules]
#* [#ManifestServ implementation module]
#* [#ManifestRead implementation module]
#* [#SocketServProtocol module]
# [#ManifestServ and ManifestRead commandline modules]
#* [#ManifestServ]
#* [#ManifestRead]
# [#Additional resources]
#* [#For more information]
#* [#Filing bugs]
#* [#Getting help]
{anchor:Introduction}
h1. Introduction
This XML preprocessor and parser system was developed as part of the OpenSolaris Caiman Distribution Constructor project, but is designed to be adaptable to any project which uses an XML manifest. It is currently used by the Caiman Automated Installer as well. It was written because off-the-shelf XML does not provide enough support for the Caiman project.
{anchor:Why another XML system?}
h3. Why another XML system?
{anchor:Off-the-shelf XML does not do enough during setup}
h4. Off-the-shelf XML does not do enough during setup
Document Type Definitions (DTDs) were the first XML format specification. A DTD is a specification against which an XML document can be checked. DTDs support syntax validation: they check that XML elements and attributes are properly ordered, and they do some type checking. They have limited default-setting capabilities.
Schemas have succeeded DTDs. Schemas are like DTDs but have clearer syntax, as they are written in XML. Like DTDs, they do syntax validation. However, they do not do default setting. There are many kinds of schemas. The one picked for this system is RelaxNG, since it is a powerful schema language and support for it is already delivered (in libxml2) as part of OpenSolaris.
This preprocessor uses a schema, but does much more. It provides full setting of defaults, whether calculated or hardwired. More than just plain syntax validation, it offers full semantic validation. Both of these features are fully flexible as users write python methods to do the work. The methods are mapped to data nodes in an in-memory XML data tree. Methods also have full access to all data, in case, for example, a default for one data node depends on the value of another.
While not currently implemented, hooks are in place to do layering of multiple XML files to produce an aggregate.
{anchor:Existing data search tools are not enough}
h4. Existing data search tools are not enough
The existing Document Object Model (DOM) searches and returns all XML data tree nodes with a given tag name using a preorder tree traversal order. Additional processing is required to narrow down to a single node. The new XML parser, by contrast, implements an easy way to specify a specific node, even among nodes with the same name.
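The limitation can be illustrated with a minimal sketch that uses the standard {{xml.dom.minidom}} module. The sample XML and its element names are made up for illustration; they are not taken from a Caiman manifest.

```python
from xml.dom import minidom

# Hypothetical sample data; the element names are illustrative only.
SAMPLE = """<root>
  <user><name>alice</name></user>
  <group><name>staff</name></group>
</root>"""

doc = minidom.parseString(SAMPLE)

# Plain DOM search: every <name> element, regardless of its parent.
all_names = [n.firstChild.data for n in doc.getElementsByTagName("name")]

# Extra work is needed to keep only the <name> that sits under <user>.
user_names = [
    n.firstChild.data
    for n in doc.getElementsByTagName("name")
    if n.parentNode.tagName == "user"
]
```

The second list comprehension is the kind of additional processing that the new parser is meant to make unnecessary.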
{anchor:Implementation}
h3. Implementation
The system described on this page is implemented in the following modules found in the slim_source (ssh://anon@hg.opensolaris.org/hg/caiman/slim_source) gate:
# usr/src/lib/install_utils/TreeAcc.py
#* TreeAccNode class - Representation of an XML data node as used by the system.
#* TreeAcc class - Implementation of the low level data tree manipulation and search API
# usr/src/lib/install_utils/ENParser.py
#* Enhanced Nodepathing Parser
# usr/src/lib/install_utils/DefValProc.py
#* Default setting and content validation engine
# usr/src/lib/install_utils/defval-manifest.rng
#* Schema used to validate defval manifest
# usr/src/lib/install_utils/ManifestServ.py
#* High-level preprocessor and parser API, and support for serving data to remote processes.
# usr/src/lib/install_utils/ManifestRead.py
#* Support for remote processes to read data served by ManifestServ
# usr/src/lib/install_utils/SocketServProtocol.py
#* Definitions used by both usr/src/lib/install_utils/ManifestServ and usr/src/lib/install_utils/ManifestRead modules.
# usr/src/lib/install_utils/install_utils.py
#* Contains support routines used for parsing, among other things.
# usr/src/cmd/install-tools/ManifestServ.py
#* ManifestServ commandline interface
# usr/src/cmd/install-tools/ManifestRead.py
#* ManifestRead commandline interface
{anchor:The memory-resident tree structure}
h1. The memory-resident tree structure
{anchor:The basis of in-memory XML data is the DOM tree}
h3. The basis of in-memory XML data is the DOM tree
The basis of the in-memory XML data is a DOM tree set up by Sax2. DOM is chosen so the data can be kept in memory. Anticipated applications have small, bounded amounts of data so there is no problem keeping it in memory.
The tree consists of DOM nodes. The core DOM node represents an XML element. Other DOM nodes may hang off of an element node to represent its attributes, value, or other aspects. xml.dom Python interfaces are used for setting up, writing out, finding data and getting around the tree at the lowest level.
The DOM interfaces and, to some degree, their tree structure, are hidden, wrapped by the TreeAcc API, discussed later. The tree made available by this system takes a different form, as shown in the next section.
{anchor:Nodepaths}
h3. Nodepaths
Define a _nodepath_ to be a route through various tree nodes to get from one place in a tree (usually the root) to certain destination nodes. The simplest nodepath would contain the names of nodes to traverse to get from point A to point B, separated by forward slashes. For example, consider the following tree:
{code}
                A
                |
        -----------------
        |               |
        B               C
    ---------       ---------
    |       |       |       |
    D       E       F       G
    |
---------
|       |
H       I
{code}
The simplest nodepath to get from A to I would be A/B/D/I.
Note: when starting from the root, specifying the root node is optional. Since it would always need to be specified, it can be implied and left off. The path to I from the top becomes B/D/I.
The nodepath lists element* nodes and attribute** nodes in the same way. The only apparent difference (other than their type) is that attribute nodes may appear only as the last node of a nodepath. For example, one cannot tell from looking at the tree or nodepath whether H is an attribute node or an element node.
\* An XML _element_ has a name and provides room for content in the form of a value and/or sub-elements.
\*\* An XML _attribute_ represents a property of an element, and is bound to the element it helps define.
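As a rough sketch of the idea, the function below resolves a simple nodepath against a DOM tree built with {{xml.dom.minidom}}. The function names and the XML rendering of the example tree are assumptions made for illustration; this is not the TreeAcc implementation. The sketch handles element nodes only and treats a leading piece equal to the root name as the optional root.

```python
from xml.dom import minidom

# The example tree from this section, written as XML (values are made up).
TREE = "<A><B><D><H>h</H><I>i</I></D><E/></B><C><F/><G/></C></A>"

def child_elements(node, name):
    """Return the direct child elements of node that are called name."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == name]

def find_by_nodepath(root, nodepath):
    """Follow a slash-separated nodepath down from root; return all matches."""
    pieces = nodepath.split("/")
    # Specifying the root is optional, so drop it when it is given.
    if pieces and pieces[0] == root.tagName:
        pieces = pieces[1:]
    matches = [root]
    for piece in pieces:
        matches = [c for m in matches for c in child_elements(m, piece)]
    return matches

root = minidom.parseString(TREE).documentElement
found_long = find_by_nodepath(root, "A/B/D/I")
found_short = find_by_nodepath(root, "B/D/I")
```

Both calls locate the same single node, with or without the root named in the path.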
{anchor:The TreeAccNode class}
h1. The TreeAccNode class
TreeAccNodes are a new software layer wrapped around a DOM element node. They provide object-oriented interfaces for getting information about the node and the tree it is in.
There are two kinds of TreeAccNodes:
* ELEMENT type:
** Associated with a DOM node. Takes the value of its DOM node.
** Can appear anywhere in a nodepath.
* ATTRIBUTE type: associated with an attribute of a DOM node.
** The parent TreeAccNode of an ATTRIBUTE TreeAccNode always corresponds to the DOM element containing that attribute. This is an internal implementation detail, however; to the outside, an ATTRIBUTE TreeAccNode is used/referenced the same way as an ELEMENT TreeAccNode.
** Always a tree leaf, the last piece of a nodepath.
** There can be multiple ATTRIBUTE TreeAccNodes hanging from a single ELEMENT TreeAccNode.
TreeAccNodes have the following fields:
# name: String to identify the node.
# type: ELEMENT or ATTRIBUTE
# value: String value of the represented attribute or element.
#* A value is optional to an element.
#* A value is mandatory to an attribute.
# attr_dict: a dictionary of attribute key-value pairs.
#* For ELEMENT type, it contains the list of all attributes under the current element.
#* For ATTRIBUTE type, it contains only the value of the current attribute represented.
# element_node: A handle to the corresponding DOM element represented.
# tree: Current DOM data tree instance.
These fields are accessible via methods:
* get_name(): Return the string name of this node.
* get_path(): Return the path from the root to this node.
* get_value(): Return the string value of this node.
* get_attr_dict(): Return the attributes as key-value pairs.
* get_tree(): Return a reference to the tree instance.
* get_element_node(): Return the corresponding DOM element of this node.
* is_leaf(): Return True if this node is at the end of a nodepath, False otherwise.
* is_attr(): Return True if this node represents an ATTRIBUTE, False otherwise.
* is_element(): Return True if this node represents an ELEMENT, False otherwise.
An implementation detail: TreeAccNodes are created in a just-in-time fashion. They are wrapped around a DOM element node and then returned. They contain a pointer to their reference DOM node.
{anchor:The TreeAcc class}
h1. The TreeAcc class
This class implements low-level data tree manipulation and search. TreeAcc instances correspond to XML data trees. Brief discussion of implementation follows. All of the methods below are bound to TreeAcc instances.
{anchor:Instantiation}
h3. Instantiation
A tree is created as follows:
{code}newTree = TreeAcc("xml_file"){code}
{anchor:Basic search}
h3. Basic search
Search is invoked via
{code}
find_node("path/to/node", starting_node=None)
{code}
This begins a search for nodepath "path/to/node". _starting_node_ specifies where in the tree to begin the search; _None_ specifies the root of the tree. The search is recursive and proceeds in preorder, consuming the leftmost path piece at each level as it descends into the tree. A match is recorded when the final path piece is matched. If two sibling nodes match a non-final path piece, then once the subtree of the first sibling has been searched, control returns to the second sibling and the search resumes, checking its progeny for more matches.
(Note: searching is covered in much more detail in the section on [#Enhanced Nodepathing].)
{anchor:Adding a node}
h3. Adding a node
Adding a node is useful, for example, when adding a default value to the tree where one was not specified in the input XML file. A node is added with the following method call:
{code}add_node(path, value, type, starting_ta_node=None, is_unique=True){code}
This adds a new element containing the value _value_ of type _type_ at path _path_. _starting_ta_node_ specifies where in the tree to start searching; _None_ means start from the root of the tree. Fail if _is_unique_ is _True_ and a node which matches _path_ already exists. Fail if there can be more than one possible parent for the new node.
Implementation is as follows:
# Strip the last path piece from the path provided. This yields the path of the would-be parent.
# Perform the search with the new parent path.
# If only one node is found, that's the parent.
# Add the new TreeAccNode (at least from the perspective of the caller...). Under the covers, what really happens is:
#* If ELEMENT type is specified, create new DOM element node
#* If ATTRIBUTE type specified, add attribute to found DOM element node.
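The steps above can be sketched against a plain {{xml.dom.minidom}} tree. The names {{find_elements}} and {{add_node_sketch}} are hypothetical, the path search is a simplified stand-in for the TreeAcc search, and the _is_unique_ check is left out to keep the example short.

```python
from xml.dom import minidom

def find_elements(root, nodepath):
    """Return elements matching a slash-separated path below root."""
    matches = [root]
    for piece in [p for p in nodepath.split("/") if p]:
        matches = [c for m in matches for c in m.childNodes
                   if c.nodeType == c.ELEMENT_NODE and c.tagName == piece]
    return matches

def add_node_sketch(doc, path, value, node_type):
    """Add an element or attribute at path; raise ValueError on ambiguity."""
    # Step 1: strip the last path piece to get the would-be parent's path.
    parent_path, _, new_name = path.rpartition("/")
    # Step 2: search for the parent.
    parents = find_elements(doc.documentElement, parent_path)
    # Step 3: exactly one match is required.
    if len(parents) != 1:
        raise ValueError("need exactly one parent, found %d" % len(parents))
    parent = parents[0]
    # Step 4: create a DOM element, or set an attribute on the parent.
    if node_type == "element":
        new_elem = doc.createElement(new_name)
        new_elem.appendChild(doc.createTextNode(value))
        parent.appendChild(new_elem)
    elif node_type == "attribute":
        parent.setAttribute(new_name, value)
    else:
        raise ValueError("unknown node type: %r" % node_type)

doc = minidom.parseString("<A><B/><C/><C/></A>")
add_node_sketch(doc, "B/D", "42", "element")
add_node_sketch(doc, "B/size", "large", "attribute")
```

Adding below "C" would fail in this sketch, since two C elements match and the parent cannot be pinned down to a single node.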
{anchor:Replace the value of an existing node}
h3. Replace the value of an existing node
Call the following:
{code}replace_value(path, new_value, starting_ta_node=None){code}
to replace the value of an existing element or attribute at _path_ with value _new_value_. _starting_ta_node_ specifies where in the tree to start searching; _None_ means start from the root of the tree. Replacement occurs only if exactly one node matches.
Implementation is similar to add_node():
# Search the path.
# If only one node matches, then replace the value of that node.
{anchor:Save the in-memory XML data tree to a file}
h3. Save the in-memory XML data tree to a file
The following method:
{code}save_tree(out_file){code}
saves the in-memory DOM tree to file _out_file_ using xml.dom methods.
{anchor:Walk the entire XML data tree}
h3. Walk the entire XML data tree
All TreeAccNodes can be returned and processed by doing the following:
{code}
walker = tree.get_tree_walker()
curr_list = tree.walk_tree(walker)
while curr_list is not None:
    process_list(curr_list)    # Your processing goes here...
    curr_list = tree.walk_tree(walker)
{code}
Implementation is a mixture of xml.dom methods and home grown code (get_TreeAccNode_cluster_from_element()) that packages up a DOM element node and its child attribute nodes into a list for return.
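For illustration only, the packaging of an element together with its attributes can be approximated with a generator over an {{xml.dom.minidom}} tree. The name {{walk_clusters}} and the tuple layout are assumptions; the real walker returns lists of TreeAccNodes rather than tuples.

```python
from xml.dom import minidom

def walk_clusters(element):
    """Yield (element name, attribute dict) for each element, in preorder."""
    attrs = {name: value for name, value in element.attributes.items()}
    yield element.tagName, attrs
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield from walk_clusters(child)

# Hypothetical sample tree.
doc = minidom.parseString('<A><B id="1"><D/></B><C id="2" kind="x"/></A>')
clusters = list(walk_clusters(doc.documentElement))
```

Each item pairs one element with all of the attributes that hang off it, which is the unit a caller would process on every pass of the loop shown earlier.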
{anchor:Pre-processing}
h1. Pre-processing
Raw XML data (as read from a manifest, for example) is processed before it is served or consumed. This processing completes and thoroughly validates the data. Processing consists of the following operations:
# Read the raw XML manifest data into memory.
# Fill in the raw data with defaults.
# Perform semantic validation on the data.
# Validate the data against schema via xmllint. The xmllint utility is part of libxml2.
{anchor:Input files which come into play}
h3. Input files which come into play
# Manifest to be validated: The input specification to the program employing this XML processing.
#* Provided by the user of the project.
#* May be specified incompletely if defaults will be filled in.
#* Name ends in ".xml" suffix, e.g. DC-manifest.xml
# Manifest schema: Schema against which the manifest is validated using xmllint.
#* Developed as part of the project.
#* Name is keyed to the manifest and ends in ".rng", e.g. DC-manifest.rng
# Defval manifest: Specification of defaults and semantic validation processing.
#* Developed as part of the project.
#* Name is keyed to the manifest and ends in ".defval.xml", e.g. DC-manifest.defval.xml
# Defval manifest schema: Schema against which the defval manifest is validated using xmllint.
#* Delivered as a part of this XML system.
#* Name is "defval-manifest.rng"
#* The same for all projects.
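The naming convention can be summarized in a short sketch. The function name {{derive_filenames}} is hypothetical; the optional {{valfile_base}} parameter mirrors the ManifestServ constructor argument of the same name described later on this page.

```python
def derive_filenames(manifest, valfile_base=None):
    """Return (manifest file, schema file, defval manifest file) names."""
    # The ".xml" suffix on the manifest name is optional.
    base = manifest[:-len(".xml")] if manifest.endswith(".xml") else manifest
    manifest_file = base + ".xml"
    # A separate base, when given, overrides the manifest name for the
    # schema and defval manifest names.
    val_base = valfile_base if valfile_base is not None else base
    return manifest_file, val_base + ".rng", val_base + ".defval.xml"

names = derive_filenames("DC-manifest.xml")
```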
{anchor:Introducing the defval manifest}
h3. Introducing the defval manifest
After a project's XML schema is written to define the syntax of a project's XML manifest, a defval manifest can be written to define and set default values in it, and to describe how to perform semantic and content validation on it. This makes the project's manifest data robust.
{anchor:Helper methods}
h3. Helper methods
Helper methods are methods written as part of the project, which perform validation and calculation of defaults. The defval manifest defines mappings of helper methods to nodes of in-memory data. The methods operate on the data, to validate it or to provide a default value for it.
Validator methods are given as an argument a nodepath corresponding to TreeAccNodes to validate. They return True if data validates, False if not. Note that the TreeAccNodes being validated contain a pointer to the tree itself as well as the value to validate, so the method can reference any part of the tree it needs to perform its function.
Deflt-setter methods return a string value which will be placed into a new node. For context, they are passed as an argument a handle to the parent_node of where the new node would be placed. As with validator methods, the deflt-setter methods have access to the entire tree through the node passed to it.
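A minimal sketch of what such helper methods might look like follows. {{FakeNode}} is a hypothetical stand-in that offers only two of the accessors listed for TreeAccNode, and the helper names and the "/export/home/" prefix are invented for the example. The sketch assumes a validator is handed one node at a time.

```python
class FakeNode:
    """Hypothetical stand-in exposing two accessors described for TreeAccNode."""

    def __init__(self, name, value):
        self._name = name
        self._value = value

    def get_name(self):
        return self._name

    def get_value(self):
        return self._value


def is_numeric(node):
    """Validator sketch: True if the node's value is a non-negative integer."""
    return node.get_value().isdigit()


def default_home_dir(parent_node):
    """Default-setter sketch: build a string value from the parent's value."""
    return "/export/home/" + parent_node.get_value()


ok = is_numeric(FakeNode("port", "8080"))
bad = is_numeric(FakeNode("port", "eighty"))
home = default_home_dir(FakeNode("user", "jack"))
```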
{anchor:Makeup of a defval manifest}
h3. Makeup of a defval manifest
{anchor:Declaration of helper methods}
h4. Declaration of helper methods
Helper methods are defined by their name and the module they live in. The {{<helpers>}} section of the defval manifest helps shorten helper references made throughout the rest of the file, since the defval manifest may refer to these helpers many times.
Each helper method must have an entry in this section. Here is the format:
{code}
<helpers>
<!-- Validator entry looks like this -->
<validator
    ref="shortcut_name"
    module="module-where-method-lives"
    method="methodname"
    invert="True"
/>
<!-- Default setter entry looks like this -->
<deflt_setter
    ref="shortcut_name"
    module="module-where-method-lives"
    method="methodname"
/>
</helpers>
{code}
All attributes except the {{invert}} validator attribute are mandatory. When {{invert}} is specified and set to "True", the result of the validator method is inverted. This is useful to apply the same validator to different nodes which validate with opposite values. For example, the same validator can be used to validate nodes which must be numeric and to validate other nodes which must *not* be numeric.
{anchor:Mapping of defaults (hardwired and calculated) to a node}
h4. Mapping of defaults (hardwired and calculated) to a node
Default values may be calculated or hardwired.
{anchor:Hardwired default values}
h5. Hardwired default values
Here is the format of a defval manifest entry that maps a hardwired value to (nodes matching) a particular nodepath. (Note that there can be multiple nodes which match.)
{code}
<default
    nodepath="/path/to/node"
    type=[ "element" | "attribute" ]
    missing_parent=["create" | "skip" | "error"]
    skip_if_no_exist="ancestor/node/path"
    empty_str=["set_default"|"valid"|"error"]
    from="value"
>
    hardwired-value
</default>
{code}
Explanations:
* The _nodepath_ is from the root of the tree.
* _missing_parent_ describes how to handle setting a default if no viable parent node exists in the tree.
** _create_ creates one or more ancestor nodes to forge a path from the root to where the default would be set. This is useful for creating nodes with default values for nodes which must exist. Note that no node is created if the location of the created parent cannot be locked down to a single place in the tree.
** _skip_ does not create a default node if a viable parent node does not exist. This is useful when it is inappropriate to create a new parent node. For example, if a "home directory" node exists as a child to a "user" node, it does not make sense to create a new user just to be able to create a home directory for it.
** _error_ causes preprocessing to fail if a viable parent node does not exist.
* _skip_if_no_exist_ dictates what to do if an optional high ancestor (for example, one level down from the root) does not exist. This is useful when a huge part of the tree (as a unit) is neither specified nor desired; it prevents the attempted setting of a default in the missing part of the tree from establishing the undesired part of the tree. _skip_if_no_exist_ conditions take precedence over _missing_parent_ conditions. Consider the tree in the [#Nodepaths] section: if the default for H is set up with 'skip_if_no_exist="B"', then H won't get a default set up for it if B doesn't exist, even if 'missing_parent="create"' is set for it.
* _empty_str_ dictates how to handle nodes with empty strings.
** _set_default_ indicates to treat nodes with empty strings as if they are not specified, and to set a default value in them.
** _valid_ indicates that empty strings are valid values as is.
** _error_ indicates that empty strings trigger errors.
* _from="value"_ indicates that this entry specifies a hardwired value. That is, the value of this entry is a hardwired value.
Note that a value string containing spaces is interpreted as a single value. If the value is a list, enclose the individual items in single- or double-quotes to individuate them.
{anchor:Calculated default values}
h5. Calculated default values
Default values, for example those requiring context for determination, may also be calculated by a Python method. To do this, change _from_ to indicate _"helper"_ and indicate the helper method where the hardwired value was:
{code}
<default
    nodepath="/path/to/node"
    ...
    ...
    from="helper"
>
    helper-method-shortcut-name
</default>
{code}
{anchor:Mapping of validator helper methods to a node}
h4. Mapping of validator helper methods to a node
Semantic and content validation is done by mapping a validator method to a nodepath. All nodes which match the nodepath will be passed to the validator method. The method will return True for each node that validates. There are three ways of invoking validator methods:
{anchor:Nodepath validation}
h5. Nodepath validation
{code}
<validate nodepath="path/to/nodes"
missing= [ "ok" | "ok_if_no_parent" | "error" ]
skip_if_no_exist="ancestor/node/path"
>
"helper-method-shortcut-name" [..."helper-method-shortcut-name"]
</validate>
{code}
Explanations:
* The _nodepath_ is from the root of the tree.
* _missing_ describes what to do if one or more viable parent nodes (whose nodepath matches all but the last part of the given _nodepath_) has no child nodes which match the _nodepath_.
** _ok_ treats missing nodes as if they were present and valid. This could be used as an alternative to setting a default value, allowing the project itself to default to a value internally.
** _ok_if_no_parent_ treats missing nodes as valid only if their parent is also missing. An example of use would be a "home directory" node which is a child to a "user" node. If the user exists the home directory must be valid, but there does not have to be a "home directory" node when there is no parent "user" node.
** _error_ treats missing nodes as errors.
* _skip_if_no_exist_ dictates what to do if an optional high ancestor (for example, one level down from the root) does not exist. This is useful when a huge part of the tree (as a unit) is not specified nor desired; it prevents validation failure when the optional subtree which would contain the nodes to be validated does not exist.
* The value of a nodepath validation entry is a list of one or more validator method shortcut names. Quotes around each name are needed if there is more than one item in the list.
{anchor:Group validation}
h5. Group validation
Group validation validates nodes in a group of nodepaths with a single validator method. This is useful when many types of nodes require the same validation method. Validation occurs only on nodes which are present; missing nodes are not treated as errors.
{code}
<validate group="helper-method-shortcut-name">
"path/one/to/nodes" [ ..."path/n/to/nodes" ]
</validate>
{code}
{anchor:Exclude validation}
h5. Exclude validation
Like _group validation_, _exclude validation_ applies a single validator method to a set of nodepaths. However, it is the inverse of group validation: it checks all nodes which *do not* match any of the given nodepaths. Validation occurs only on nodes which are present; missing nodes are not treated as errors.
{code}
<validate exclude="helper-method-shortcut-name">
"path/one/to/exclude" [ ..."path/n/to/exclude" ]
</validate>
{code}
{anchor:How defaults and content validation works}
h3. How defaults and content validation works
Initial defval manifest processing reads the {{<helpers>}} list and builds dictionaries of methods and modules, both indexed by the helper-method-shortcut-name. Later, as a node is presented for processing, strings for the method and module specified for that node are looked up. Method and module strings are passed to the python getattr() builtin to retrieve a callable function object. That function is then called to do the processing.
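The lookup can be sketched as follows. This is an illustration under stated assumptions, not the DefValProc code: a single hypothetical {{HELPERS}} table replaces the two dictionaries, {{importlib.import_module()}} is assumed as the way to obtain the module object before calling {{getattr()}}, and standard library functions stand in for project helper methods.

```python
import importlib

# Hypothetical helper table: shortcut name -> (module name, method name).
# Standard library functions stand in for project helper methods here.
HELPERS = {
    "basename": ("os.path", "basename"),
    "sqrt": ("math", "sqrt"),
}

def resolve_helper(ref):
    """Look up a shortcut name and return the callable it refers to."""
    module_name, method_name = HELPERS[ref]
    module = importlib.import_module(module_name)
    return getattr(module, method_name)

result = resolve_helper("basename")("/tmp/DC-manifest.xml")
```

Because the method is resolved from strings at run time, a project can add or swap helpers by editing only its defval manifest and helper module.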
For all default setting, and for _validate nodepath_ and _validate group_ processing, the TreeAcc search facilities are used. For _validate exclude_ processing, the search is made using the tree walking facilities (also made available by the TreeAcc module).
{anchor:Preprocessing cycle}
h3. Preprocessing cycle
Processing has been described at a high level in the beginning of this chapter. Now that all of the pieces have been described, a more detailed breakdown can be presented:
# The defval manifest data is read into memory from the defval manifest XML file. Call this tree the _defval tree_.
# The _defval tree_ is validated against the defval manifest schema using xmllint.
# The raw data is read into memory from the manifest XML file. Call this tree the _manifest tree_.
# The _manifest tree_ is filled in with defaults.
# Semantic validation is done on the _manifest tree_.
# The _manifest tree_ is written out to a temporary file.
# The temporary file is validated against schema via xmllint.
# If everything validates, the project will be served data from the _manifest tree_.
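The schema validation steps could be driven from Python roughly as shown below. The function names and file names are hypothetical; {{--noout}} and {{--relaxng}} are standard xmllint options. How the actual modules invoke xmllint is not shown on this page, so treat this as a sketch only.

```python
import subprocess

def xmllint_command(schema_file, xml_file):
    """Build an xmllint command line for RelaxNG validation."""
    # --noout suppresses echoing the document; --relaxng names the schema.
    return ["xmllint", "--noout", "--relaxng", schema_file, xml_file]

def validates(schema_file, xml_file):
    """Return True if xmllint exits with status 0 (requires xmllint)."""
    result = subprocess.run(xmllint_command(schema_file, xml_file),
                            capture_output=True)
    return result.returncode == 0

# Hypothetical file names, used only to show the resulting command.
cmd = xmllint_command("DC-manifest.rng", "/tmp/DC-manifest_temp_1234.xml")
```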
{anchor:Enhanced Nodepathing}
h1. Enhanced Nodepathing
When the TreeAcc class search facilities are given a nodepath which consists only of names of nodes, they return a list of all nodes which match that nodepath. Enhanced Nodepathing constrains the search to a subset or even a specific node, by looking at values of ancestor nodes found during the search. It can further narrow which ancestor nodes are relevant by looking at values of their progeny (child nodes, grand-child nodes, and so on), even though those progeny do not fall in the ancestral search path. Constraints can be aggregated, so searches can be narrowed as much as needed.
{anchor:Specifying the constraints as part of the nodepath}
h3. Specifying the constraints as part of the nodepath
Enhanced Nodepathing introduces a new language for specifying constraints as part of the nodepath itself. Here is the Backus Naur Form for that language:
{code}
nodename ::= name of a node (string)
value ::= value of a node (string)
nodepath ::= nodename | nodepath/nodename
valpathspec ::= nodepath=value | valpathspec:valpathspec
token ::= nodename | nodename=value | nodename[valpathspec]
enhanced_nodepath ::= token | enhanced_nodepath/token
{code}
Consider the following tree:
{code}
                                        A
                                        |
                  ----------------------------------------------
                  |                                            |
                  B                                            B
             (value=v1)                                   (value=v2)
                  |                                            |
     ---------------------------                  ---------------------------
     |            |            |                  |            |            |
     D            C            C                  C            C            D
(value=v3)   (value=v4)   (value=v5)         (value=v6)   (value=v7)   (value=v8)
                  |            | \                |            | \
                  E            F   \              E            F   \
             (value=v9)   (value=v10) \      (value=v11)  (value=v12) \
                                        G                               G
                                   (value=v13)                     (value=v14)
{code}
|| enhanced nodepath || values returned ||
| A/B/C | v4, v5, v6, v7 |
| A/B=v1/C | v4, v5 |
| A/B\[C=v5\]/D | v3 |
| A/B\[C=v4:C/E=v9\]/C\[F=v10\]/G | v13 |
Any value can be put inside single or double quote pairs so that special characters \[\]:=/ are not interpreted by the parser. This allows for values such as {{http://pkg.opensolaris.org}}.
Values which are lists (e.g. '"en_us" "Posix/C"') cannot be specified in enhanced nodepaths.
{anchor:How it works}
h3. How it works
The first thing the search entry point find_node() does is call parse_nodepath() of module ENParser.py to parse the path. The parse_nodepath() function creates a list of ENToken objects, each of which represents a parsed token. A token is delimited by forward slashes.
ENToken objects have three parts to them:
# name: the name of the node represented by the token.
# values: a list of values specified to narrow down the search.
# valpaths: a list of paths to values.
The ENTokens of the examples in the previous section are as follows:
{code}
A/B/C           A/B=v1/C        A/B[C=v5]/D       A/B[C=v4:C/E=v9]/C[F=v10]/G
name:A          name:A          name:A            name:A
values:[]       values:[]       values:[]         values:[]
valpaths:[]     valpaths:[]     valpaths:[]       valpaths:[]
name:B          name:B          name:B            name:B
values:[]       values:[v1]     values:[v5]       values:[v4, v9]
valpaths:[]     valpaths:[]     valpaths:[C]      valpaths:[C, C/E]
name:C          name:C          name:D            name:C
values:[]       values:[]       values:[]         values:[v10]
valpaths:[]     valpaths:[]     valpaths:[]       valpaths:[F]
                                                  name:G
                                                  values:[]
                                                  valpaths:[]
{code}
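The token breakdown above can be reproduced with a small parser sketch. This is not the ENParser.py code: the function names are hypothetical, tokens are plain dictionaries rather than ENToken objects, and quoted values and ".." tokens are not handled.

```python
def split_outside_brackets(text, sep):
    """Split text on sep, ignoring separators that appear inside [...]."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts

def parse_token(text):
    """Return a dict with the name, values and valpaths of one token."""
    token = {"name": text, "values": [], "valpaths": []}
    if text.endswith("]") and "[" in text:
        # Form: nodename[valpath=value:valpath=value...]
        name, _, spec = text[:-1].partition("[")
        token["name"] = name
        for constraint in spec.split(":"):
            valpath, _, value = constraint.partition("=")
            token["valpaths"].append(valpath)
            token["values"].append(value)
    elif "=" in text:
        # Form: nodename=value
        name, _, value = text.partition("=")
        token["name"] = name
        token["values"].append(value)
    return token

def parse_nodepath_sketch(nodepath):
    """Parse an enhanced nodepath into a list of token dicts."""
    return [parse_token(p) for p in split_outside_brackets(nodepath, "/")]

tokens = parse_nodepath_sketch("A/B[C=v4:C/E=v9]/C[F=v10]/G")
```

Splitting on slashes only outside brackets is what keeps a valpath such as C/E inside its token instead of turning it into two path pieces.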
The core search methods of the TreeAcc class are ___search_node()_, ___match()_, ___is_bracket_match()_ and ___do_dots()_.
___search_node()_ is the main search coordinator method. It calls __match() to look for matches on the current ENToken. If it does not find any, it returns. It then calls __do_dots() to eat and process ".." ENTokens (and possibly end up higher in the data tree). Then it calls itself to check children nodes for a match on the next ENToken. When it has a match for the last ENToken in the nodepath, it appends to the list of results a new TreeAccNode representing the found data. Since __search_node() is recursive, when it returns, control may return to a previous call instance which may then recurse down the tree in another direction. When all recursion is done, a list of found TreeAccNodes is returned to the caller.
___match()_ checks for a match on the current ENToken. If the current ENToken has only a name specified, this is a simple check on the current node. If it has a value specified, the name and value are checked. If the current ENToken has a valpath, __match() calls __is_bracket_match() to recurse down the valpath looking for a match on the corresponding value.
___is_bracket_match()_ does valpath searches. It makes use of __search_node(). Valpath searches are made with the cmp_match flag set so that success is returned and seeking stopped once a single matching value is found.
___do_dots()_, as already stated, reverts back up the tree when it encounters ENTokens with a name field of "..".
{anchor:ManifestServ and ManifestRead implementation modules}
h1. ManifestServ and ManifestRead implementation modules
This section focuses on the ManifestServ and ManifestRead implementation modules, which are in the gate at usr/src/lib/install_utils or installed on a system in /usr/lib/python2.4/vendor-packages/osol_install. These modules are not to be confused with other modules of the same name, which are in the gate at usr/src/cmd/install-tools or installed on a system in /usr/bin. The latter modules are commandline interfaces which operate the former modules.
The ManifestServ and ManifestRead implementation modules comprise an upper software layer to the TreeAcc, DefValProc and other modules already discussed. They provide easy-to-use interfaces for a project (such as the Distribution Constructor) to call to set things up and access data.
{anchor:ManifestServ implementation module}
h3. ManifestServ implementation module
ManifestServ objects export the following methods for setup and use of XML data:
# ___init\_\_()_: Most setup occurs when a project instantiates a ManifestServ object based on an XML manifest. The manifest is loaded and put through the [#Preprocessing cycle], in which its defaults are set and its content is validated. Among other things, the returned object has methods for retrieving data. The constructor takes several arguments:
#* _manifest_: Name of the project manifest file. The ".xml" suffix is optional and will be appended if not provided. Names of the manifest schema and defval manifest files are keyed off this name (less any provided ".xml" suffix) if the _valfile_base_ argument is not provided or None.
#* _valfile_base_: When provided, this name overrides _manifest_ for building the defval_manifest and manifest schema filenames. When provided, the defval_manifest will be called _valfile_base_.defval.xml and the manifest schema will be called _valfile_base_.rng. _valfile_base_ may contain prepended directory structure.
#* _out_manifest_: When provided, a nicely-formatted XML manifest file that includes all preprocessing will be output to this name. Defaults to None if not provided.
#* _verbose_: When _True_, enables on-screen printout of defaults, content validation and schema validation processing. Defaults to _False_.
#* _keep_temp_files_: When _True_, leaves the temporary file around after termination. The temporary file is the one sent to xmllint for final syntactic validation. The temporary file name is of the form "/tmp/_manifest_\_temp\_<pid>.xml". Default is _False_.
# _get_values()_: How a process holding a ManifestServ object retrieves XML data. It takes the following arguments:
#* _request_: An enhanced nodepath of requested data.
#* _is_key_: Special provision is made in the manifest for general key-value pairs. This argument will circumvent the need for a full nodepath for a key-value pair; when _True_ only the name of the _key_ needs to be provided. This works because all key-value pairs defined are placed in an expected location in the manifest. The location is at a level just below the root of the data tree in a section called {{<key_value_pairs>}}. The section in the manifest is formatted as follows: {code}
<key_value_pairs>
    <pair key="key1" value="value1"/>
    <pair key="key2" value="value2"/>
</key_value_pairs>
{code}
The preferred way to retrieve the value of the second key would be: {code}
values = manifest_tree.get_values("key2", True)
{code}
Note that if a retrieved value is a list, each list item will be returned individually. A value string not enveloped in single- or double-quotes in the manifest is seen as a single value, even if it has spaces. Lists of items may be stored as a single manifest value by enveloping each item with single- or double-quotes.
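The quoting rule above can be illustrated with a short sketch. This is not the ManifestServ code: the function name {{split_manifest_value}} is hypothetical, and the standard {{shlex}} module stands in for whatever parsing the real implementation performs.

```python
import shlex


def split_manifest_value(raw):
    """Hypothetical helper: split a manifest value into list items.

    A value that does not start with a quote is a single item, spaces
    and all. A value made of quoted items yields one item per quoted
    string.
    """
    stripped = raw.strip()
    if stripped[:1] in ("'", '"'):
        # shlex honors both single- and double-quoted items.
        return shlex.split(stripped)
    return [raw]


print(split_manifest_value("one two three"))        # a single value
print(split_manifest_value("'one two' \"three\""))  # two list items
```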
The remaining ManifestServ implementation object methods deal with the socket server. The socket server makes the XML data available to other processes on the same host. This is needed by the Distribution Constructor, which runs shell and Python scripts in separate process space from the main program which holds the ManifestServ object. Communication goes through an AF_UNIX socket. The protocol is defined in the SocketServProtocol.py module, placed in the same directories as the ManifestServ and ManifestRead modules it serves. The socket server is accessed via the following ManifestServ methods:
# _start_socket_server()_: The socket server is not started by default. Call this method to start it. To debug protocol errors, pass _True_ to its one argument to enable debugging.
# _stop_socket_server()_: Call this method to shut down the socket server before terminating the calling process. Failing to do this will leave the socket node lingering in /tmp upon process termination.
# _get_sockname()_: Retrieve the socket name with this call. The socket name is usually retrieved to pass to one of the ManifestRead modules. For example, the Distribution Constructor retrieves it to pass it along to finalizer scripts which call ManifestRead. The socket name will be {{/tmp/ManifestServ.<pid>}}
{anchor:ManifestRead implementation module}
h3. ManifestRead implementation module
Python scripts or the ManifestRead commandline interface module will instantiate a ManifestRead implementation object in order to retrieve data through the socket server. ManifestRead implementation objects have the following methods:
# ___init\_\_()_: Instantiate a ManifestRead object. Takes the socket name as its only argument.
# _get_values()_: Retrieve XML data through the socket and socket server. It has the same syntax and returns the same data as its ManifestServ counterpart. It takes the following arguments:
#* _request_: An enhanced nodepath of requested data.
#* _is_key_: Interpret _request_ as a key. Please see ManifestServ _get_values()_ for more information.
# _set_debug()_: Call this method with a _True_ argument to enable debugging, _False_ to disable.
{anchor:SocketServProtocol module}
h3. SocketServProtocol module
This module defines the protocol used between ManifestServ (server) and ManifestRead (client) implementation modules. The protocol is as follows:
# Client->server: Sends a prerequest containing is_key and the request size.
# Server->client: Sends PRE_REQ_ACK (pre-request-acknowledge) back to the client.
# Client->server: Sends the request in the form of a nodepath.
# Server->client: Sends the count of matching nodes and the size of the entire results string (which includes all results).
# Client->server: Sends RECV_PARAMS_RECVD (receive parameters received) to acknowledge.
# Server->client: Sends the matching node values as a single string, with STRING_SEP between each. EMPTY_STR is sent in place of any empty string. The last string, at an index one more than the count returned to the client, is REQ_COMPLETE.
# Client->server: Sends another prerequest, or TERM_LINK.
This module also defines where the key-value pairs are expected in the manifest.
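As an illustration of the results exchange (the server's count/size message and the results string that follows), here is a minimal sketch of how a results string could be packed and unpacked. The constant values are placeholders chosen for the example, not the ones defined in SocketServProtocol.py, and the function names {{pack_results}} and {{unpack_results}} are hypothetical. The sketch also assumes that no value contains the separator.

```python
# Placeholder values: the real constants live in SocketServProtocol.py
# and are not reproduced here.
STRING_SEP = "|"
EMPTY_STR = "<empty>"
REQ_COMPLETE = "<done>"


def pack_results(values):
    """Server side: return (count, size, payload) for matching values."""
    items = [value if value != "" else EMPTY_STR for value in values]
    payload = STRING_SEP.join(items + [REQ_COMPLETE])
    return len(values), len(payload), payload


def unpack_results(count, payload):
    """Client side: recover the values and check the terminator."""
    items = payload.split(STRING_SEP)
    if len(items) != count + 1 or items[count] != REQ_COMPLETE:
        raise ValueError("malformed results string")
    return ["" if item == EMPTY_STR else item for item in items[:count]]


count, size, payload = pack_results(["v4", "", "v5"])
print(count, size, payload)
print(unpack_results(count, payload))
```

Sending the count and size first lets the client know how much data to read and how many values to expect before the terminator.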
{anchor:ManifestServ and ManifestRead commandline modules}
h1. ManifestServ and ManifestRead commandline modules
{anchor:ManifestServ}
h3. ManifestServ
The ManifestServ commandline module is a useful testing tool for checking out the viability of an XML manifest or a schema or defval manifest which validates it. Invoke with -v to see detailed output of the default setting and semantic validation processing, if that processing is failing. Invoke with -t to save the temporary file to see where manifest validation against the schema is failing. Invoke with -o to generate a new nicely-formatted XML file which accounts for all preprocessing.
The ManifestServ commandline module serves as a stand-in for a project which will use the data, by creating a ManifestServ implementation object and providing a commandline query loop to retrieve XML data. Its usage is as follows:
{code}
Usage: ManifestServ [-d] [-h|-?] [-s] [-t] [-v] [-f <validation_file_base> ]
[-o <out_manifest.xml file> ] <manifest.xml file>
where:
-d: turn on socket debug output (valid when -s also specified)
-f <validation_file_base>: give basename for schema and defval files
Defaults to basename of manifest (name less .xml suffix) when not provided
-h or -?: print this message
-o <out_manifest.xml file>: write resulting XML after defaults and validation processing
-t: save temporary file
Temp file is "/tmp/<manifest_basename>_temp_<pid>.xml"
-v: verbose defaults/validation output
-s: start socket server for use by ManifestRead
{code}
Once in the query loop, specify the enhanced nodepath to retrieve data. Specify "+key" to switch to a mode to enter keys instead of full nodepaths to retrieve key-value pairs. Specify "-key" to switch back to the normal mode which recognizes enhanced nodepaths.
If "-s" is specified, the name of the socket to specify to ManifestRead will print just before the query loop is started.
The command to run on a customer system to help a customer debug a DC manifest, accounting for where files are installed, would be something like:
{code}
ManifestServ -f /usr/share/distro_const/DC-manifest -v -t <manifest.xml file>
{code}
See the section on the [#ManifestServ implementation module] for more information on storage and retrieval of a value which is a list of items.
{anchor:ManifestRead}
h3. ManifestRead
The ManifestRead commandline module is intended for use by shell scripts, such as finalizer scripts called by the Distribution Constructor. It takes a list of either nodepaths or keys (but not a mixture of the two), and returns the corresponding values. By default, if more than one nodepath or key is requested, the request is prepended to the results. Each result will be on its own output line. See the section on [#ManifestServ implementation module] for more information on storage and retrieval of a value which is a list of items.
ManifestRead has the following usage:
{code}
Usage:
ManifestRead [-d] [-r] <socket name> <nodepath> [ ...<nodepath> ]
ManifestRead [-d] [-r] [-k] <socket name> <key> [ ...<key> ]
ManifestRead [-h|-?]
where:
-d: turn on debug output
-h or -?: print this message
-k: specify keys instead of nodepaths
-r: Always print nodepath next to a value
(even when only one nodepath is specified)
{code}
{anchor:Additional resources}
h1. Additional resources
{anchor:For more information}
h3. For more information
# DC design document:
#* gate: ssh://anon@hg.opensolaris.org/hg/caiman/caiman-docs gate
#* file: distro_constructor/DC_DESIGN_DOC.odt
# RelaxNG schema: http://www.oasis-open.org/committees/relax-ng/tutorial.html
# Python DOM howto: http://pyxml.sourceforge.net/topics/howto/xml-howto.html
# xml.dom information: http://www.python.org/doc/2.4.2/lib/module-xml.dom.html
{anchor:Filing bugs}
h3. Filing bugs
Bugzilla (defect.opensolaris.org)
Classification:development, product:distro-constructor
{anchor:Getting help}
h3. Getting help
email: caiman-discuss@opensolaris.org
chat: #caiman-discuss channel at irc.freenode.net
{anchor:Introduction}
h1. Introduction
This XML preprocessor and parser system was developed as part of the OpenSolaris Caiman Distribution Constructor project, but is designed to be adaptable to any project which uses an XML manifest. It is currently used for the Caiman Automated Installer as well. It was written because off-the-shelf XML does not provide enough support for the Caiman project.
{anchor:Why another XML system?}
h3. Why another XML system?
{anchor:Off-the-shelf XML does not do enough during setup}
h4. Off-the-shelf XML does not do enough during setup
Document Type Definitions (DTDs) were the first XML format specification. A DTD is a specification against which an XML document can be checked. DTDs support syntax validation: they check that XML elements and attributes are properly ordered, and do some type checking. They have limited default setting capabilities.
Schemas have succeeded DTDs. Schemas are like DTDs but have clearer syntax, as they are written in XML. Like DTDs, they do syntax validation. However, they do not do default setting. There are many kinds of schemas. The one picked for this system is RelaxNG, since it is a powerful schema language and support for it is already delivered (as libxml2) as part of OpenSolaris.
This preprocessor uses a schema, but does much more. It provides full setting of defaults, whether calculated or hardwired. More than just plain syntax validation, it offers full semantic validation. Both of these features are fully flexible, as users write Python methods to do the work. The methods are mapped to data nodes in an in-memory XML data tree. Methods also have full access to all data, in case, for example, a default for one data node depends on the value of another.
While not currently implemented, hooks are in place to do layering of multiple XML files to produce an aggregate.
{anchor:Existing data search tools are not enough}
h4. Existing data search tools are not enough
The existing Document Object Model (DOM) searches and returns all XML data tree nodes with a given tag name using a preorder tree traversal order. Additional processing is required to narrow down to a single node. The new XML parser, by contrast, implements an easy way to specify a specific node, even among nodes with the same name.
{anchor:Implementation}
h3. Implementation
The system described on this page is implemented in the following modules found in the slim_source (ssh://anon@hg.opensolaris.org/hg/caiman/slim_source) gate:
# usr/src/lib/install_utils/TreeAcc.py
#* TreeAccNode class - Representation of an XML data node as used by the system.
#* TreeAcc class - Implementation of the low level data tree manipulation and search API
# usr/src/lib/install_utils/ENParser.py
#* Enhanced Nodepathing Parser
# usr/src/lib/install_utils/DefValProc.py
#* Default setting and content validation engine
# usr/src/lib/install_utils/defval-manifest.rng
#* Schema used to validate defval manifest
# usr/src/lib/install_utils/ManifestServ.py
#* High-level preprocessor and parser API, and support for serving data to remote processes.
# usr/src/lib/install_utils/ManifestRead.py
#* Support for remote processes to read data served by ManifestServ
# usr/src/lib/install_utils/SocketServProtocol.py
#* Definitions used by both usr/src/lib/install_utils/ManifestServ and usr/src/lib/install_utils/ManifestRead modules.
# usr/src/lib/install_utils/install_utils.py
#* Contains support routines used for parsing, among other things.
# usr/src/cmd/install-tools/ManifestServ.py
#* ManifestServ commandline interface
# usr/src/cmd/install-tools/ManifestRead.py
#* ManifestRead commandline interface
{anchor:The memory-resident tree structure}
h1. The memory-resident tree structure
{anchor:The basis of in-memory XML data is the DOM tree}
h3. The basis of in-memory XML data is the DOM tree
The basis of the in-memory XML data is a DOM tree set up by Sax2. DOM is chosen so the data can be kept in memory. Anticipated applications have small, bounded amounts of data so there is no problem keeping it in memory.
The tree consists of DOM nodes. The core DOM node represents an XML element. Other DOM nodes may hang off of an element node to represent its attributes, value, or other aspects. xml.dom Python interfaces are used for setting up, writing out, finding data and getting around the tree at the lowest level.
The DOM interfaces and, to some degree, their tree structure, are hidden, wrapped by the TreeAcc API, discussed later. The tree made available by this system takes a different form, as shown in the next section.
{anchor:Nodepaths}
h3. Nodepaths
Define a _nodepath_ to be a route through various tree nodes to get from one place in a tree (usually the root) to certain destination nodes. The simplest nodepath would contain the names of nodes to traverse to get from point A to point B, separated by forward slashes. For example, consider the following tree:
{code}
                A
                |
        -----------------
        |               |
        B               C
    ---------       ---------
    |       |       |       |
    D       E       F       G
    |
---------
|       |
H       I
{code}
The simplest nodepath to get from A to I would be A/B/D/I.
Note: when starting from the root, specifying the root node is optional. Since it would always need to be specified, it can be implied and left off. The path to I from the top becomes B/D/I.
The nodepath lists element* nodes and attribute** nodes in the same way. The only apparent difference (other than their type) is that attribute nodes may appear only as the last node of a nodepath. For example, one cannot tell from looking at the tree or nodepath whether H is an attribute node or an element node.
\* An XML _element_ has a name and provides room for content in the form of a value and/or sub-elements.
\*\* An XML _attribute_ represents a property of an element, and is bound to the element it helps define.
{anchor:The TreeAccNode class}
h1. The TreeAccNode class
TreeAccNodes are a new software layer wrapped around or centered around a DOM element node. They provide object-oriented interfaces for getting information about the node and the tree it is in.
There are two kinds of TreeAccNodes:
* ELEMENT type:
** Associated with a DOM node. Takes the value of its DOM node.
** Can appear anywhere in a nodepath.
* ATTRIBUTE type: associated with an attribute of a DOM node.
** The parent TreeAccNode of an ATTRIBUTE TreeAccNode always corresponds to the DOM element containing that attribute. This is an internal implementation detail, however; to the outside, an ATTRIBUTE TreeAccNode is used/referenced the same way as an ELEMENT TreeAccNode.
** Always a tree leaf, the last piece of a nodepath.
** There can be multiple ATTRIBUTE TreeAccNodes hanging from a single ELEMENT TreeAccNode.
TreeAccNodes have the following fields:
# name: String to identify the node.
# type: ELEMENT or ATTRIBUTE
# value: String value of the represented attribute or element.
#* A value is optional to an element.
#* A value is mandatory to an attribute.
# attr_dict: a dictionary of attribute key-value pairs.
#* For ELEMENT type, it contains the list of all attributes under the current element.
#* For ATTRIBUTE type, it contains only the value of the current attribute represented.
# element_node: A handle to the corresponding DOM element represented.
# tree: Current DOM data tree instance.
These fields are accessible via methods:
* get_name(): Return the string name of this node.
* get_path(): Return the path from the root to this node.
* get_value(): Return the string value of this node.
* get_attr_dict(): Return the attributes as key-value pairs.
* get_tree(): Return a reference to the tree instance.
* get_element_node(): Return the corresponding DOM element of this node.
* is_leaf(): Return True if this node is at the end of a nodepath, False otherwise.
* is_attr(): Return True if this node represents an ATTRIBUTE, False otherwise.
* is_element(): Return True if this node represents an ELEMENT, False otherwise.
An implementation detail: TreeAccNodes are created in a just-in-time fashion. They are wrapped around a DOM element node and then returned. They contain a pointer to their reference DOM node.
{anchor:The TreeAcc class}
h1. The TreeAcc class
This class implements low-level data tree manipulation and search. TreeAcc instances correspond to XML data trees. Brief discussion of implementation follows. All of the methods below are bound to TreeAcc instances.
{anchor:Instantiation}
h3. Instantiation
A tree is created as follows:
{code}newTree = TreeAcc("xml_file"){code}
{anchor:Basic search}
h3. Basic search
Search is invoked via
{code}
find_node("path/to/node", starting_node=None)
{code}
This begins a search for nodepath "path/to/node". _starting_node_ specifies where in the tree to begin the search; _None_ specifies the root of the tree. The search proceeds in preorder, recursive fashion, consuming the leftmost path piece at each level as it descends into the tree. A match is returned when the final path piece is matched. If two sibling nodes match a non-final path piece, control returns to the second sibling once the first has been explored, and the search continues through its descendants for more matches.
(Note: searching is covered in much more detail in the section on [#Enhanced Nodepathing].)
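The recursive descent described above can be sketched as follows. This is an illustration only: the {{Node}} class and {{find_nodes}} function are hypothetical stand-ins, not the TreeAcc implementation, and the nodepath is taken relative to the starting node (so the root name is left off, as described in the [#Nodepaths] section).

```python
class Node:
    """Toy stand-in for a tree node (not the real TreeAccNode)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def find_nodes(node, path):
    """Return all descendants of node whose nodepath matches path."""
    pieces = path.split("/")
    matches = []
    for child in node.children:
        if child.name != pieces[0]:
            continue
        if len(pieces) == 1:
            # Final path piece matched.
            matches.append(child)
        else:
            # Consume the leftmost piece and descend.
            matches.extend(find_nodes(child, "/".join(pieces[1:])))
    return matches


# The tree from the Nodepaths section.
root = Node("A", [
    Node("B", [Node("D", [Node("H"), Node("I")]), Node("E")]),
    Node("C", [Node("F"), Node("G")]),
])
print([n.name for n in find_nodes(root, "B/D/I")])
```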
{anchor:Adding a node}
h3. Adding a node
Adding a node is useful, for example, when adding a default value to the tree where one was not specified in the input XML file. A node is added with the following method call:
{code}add_node(path, value, type, starting_ta_node=None, is_unique=True){code}
This adds a new element containing the value _value_ of type _type_ at path _path_. _starting_ta_node_ specifies where in the tree to start searching; _None_ means start from the root of the tree. Fail if _is_unique_ is _True_ and a node which matches _path_ already exists. Fail if there can be more than one possible parent for the new node.
Implementation is as follows:
# Strip the last path piece from the path provided. This yields the path of the would-be parent.
# Perform the search with the new parent path.
# If only one node is found, that's the parent.
# Add the new TreeAccNode (at least from the perspective of the caller...). Under the covers, what really happens is:
#* If ELEMENT type is specified, create new DOM element node
#* If ATTRIBUTE type specified, add attribute to found DOM element node.
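The steps above can be sketched with the standard {{xml.dom.minidom}} module. This is a simplified illustration under stated assumptions, not the TreeAcc code: the function names and the {{is_attribute}} flag are hypothetical, paths are relative to the root element, and the _is_unique_ check is omitted.

```python
from xml.dom import minidom


def find_elements(element, pieces):
    """Return descendant elements matching a list of path pieces."""
    if not pieces:
        return [element]
    found = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == pieces[0]:
            found.extend(find_elements(child, pieces[1:]))
    return found


def add_node(doc, path, value, is_attribute=False):
    """Add an element or attribute at path (relative to the root)."""
    pieces = path.split("/")
    # Strip the last piece to get the would-be parent's path, then search.
    parents = find_elements(doc.documentElement, pieces[:-1])
    if len(parents) != 1:
        raise ValueError("need exactly one parent, found %d" % len(parents))
    parent = parents[0]
    if is_attribute:
        # ATTRIBUTE: add the attribute to the DOM element that was found.
        parent.setAttribute(pieces[-1], value)
    else:
        # ELEMENT: create a new DOM element holding the value.
        new_elem = doc.createElement(pieces[-1])
        new_elem.appendChild(doc.createTextNode(value))
        parent.appendChild(new_elem)


doc = minidom.parseString("<A><B><D/></B><C/></A>")
add_node(doc, "B/D/I", "42")
add_node(doc, "C/size", "big", is_attribute=True)
print(doc.documentElement.toxml())
```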
{anchor:Replace the value of an existing node}
h3. Replace the value of an existing node
Call the following:
{code}replace_value(path, new_value, starting_ta_node=None){code}
to replace the value of an existing element or attribute at _path_ with value _new_value_. _starting_ta_node_ specifies where in the tree to start searching; _None_ means start from the root of the tree. Replacement occurs only if one node matches.
Implementation is similar to add_node():
# Search the path.
# If only one node matches, then replace the value of that node.
{anchor:Save the in-memory XML data tree to a file}
h3. Save the in-memory XML data tree to a file
The following method:
{code}save_tree(out_file){code}
saves the in-memory DOM tree to file _out_file_ using xml.dom methods.
{anchor:Walk the entire XML data tree}
h3. Walk the entire XML data tree
All TreeAccNodes can be returned and processed by doing the following:
{code}
walker = tree.get_tree_walker()
curr_list = tree.walk_tree(walker)
while curr_list is not None:
    process_list(curr_list)    # Your processing goes here...
    curr_list = tree.walk_tree(walker)
{code}
Implementation is a mixture of xml.dom methods and home grown code (get_TreeAccNode_cluster_from_element()) that packages up a DOM element node and its child attribute nodes into a list for return.
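To give a feel for what packaging an element with its attributes into a list means, here is a minimal sketch built on {{xml.dom.minidom}}. It is not the TreeAcc walker: the generator {{walk_clusters}} is hypothetical, and plain tuples stand in for TreeAccNodes.

```python
from xml.dom import minidom


def walk_clusters(element):
    """Yield one [element, attribute, ...] cluster per element, in preorder."""
    cluster = [("ELEMENT", element.tagName)]
    attrs = element.attributes
    for i in range(attrs.length):
        cluster.append(("ATTRIBUTE", attrs.item(i).name))
    yield cluster
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield from walk_clusters(child)


doc = minidom.parseString('<A><B id="1"><D/></B><C/></A>')
for cluster in walk_clusters(doc.documentElement):
    print(cluster)
```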
{anchor:Pre-processing}
h1. Pre-processing
Raw XML data (as read from a manifest, for example) is processed before it is served or consumed. This processing completes and thoroughly validates the data. Processing consists of the following operations:
# Read the raw XML manifest data into memory.
# Fill in the raw data with defaults.
# Perform semantic validation on the data.
# Validate the data against schema via xmllint. The xmllint utility is part of libxml2.
{anchor:Input files which come into play}
h3. Input files which come into play
# Manifest to be validated: The input specification to the program employing this XML processing.
#* Provided by the user of the project.
#* May be specified incompletely if defaults will be filled in.
#* Name ends in ".xml" suffix, e.g. DC-manifest.xml
# Manifest schema: Schema against which the manifest is validated using xmllint.
#* Developed as part of the project.
#* Name is keyed to the manifest and ends in ".rng", e.g. DC-manifest.rng
# Defval manifest: Specification of defaults and semantic validation processing.
#* Developed as part of the project.
#* Name is keyed to the manifest and ends in ".defval.xml", e.g. DC-manifest.defval.xml
# Defval manifest schema: Schema against which the defval manifest is validated using xmllint.
#* Delivered as a part of this XML system.
#* Name is "defval-manifest.rng"
#* The same for all projects.
{anchor:Introducing the defval manifest}
h3. Introducing the defval manifest
After a project's XML schema is written to define the syntax of a project's XML manifest, a defval manifest can be written to define and set default values in it, and to describe how to perform semantic and content validation on it. This makes the project's manifest data robust.
{anchor:Helper methods}
h3. Helper methods
Helper methods are methods written as part of the project, which perform validation and calculation of defaults. The defval manifest defines mappings of helper methods to nodes of in-memory data. The methods operate on the data, to validate it or to provide a default value for it.
Validator methods are given as an argument a nodepath corresponding to TreeAccNodes to validate. They return True if data validates, False if not. Note that the TreeAccNodes being validated contain a pointer to the tree itself as well as the value to validate, so the method can reference any part of the tree it needs to perform its function.
Deflt-setter methods return a string value which will be placed into a new node. For context, they are passed as an argument a handle to the parent_node of where the new node would be placed. As with validator methods, the deflt-setter methods have access to the entire tree through the node passed to it.
{anchor:Makeup of a defval manifest}
h3. Makeup of a defval manifest
{anchor:Declaration of helper methods}
h4. Declaration of helper methods
Helper methods are defined by their name and the module they live in. The {{<helpers>}} section of the defval manifest helps shorten helper references made throughout the rest of the file, since the defval manifest may refer to these helpers many times.
Each helper method must have an entry in this section. Here is the format:
{code}
<helpers>
    <!-- Validator entry looks like this -->
    <validator
        ref="shortcut_name"
        module="module-where-method-lives"
        method="methodname"
        invert="True"
    />
    <!-- Default setter entry looks like this -->
    <deflt_setter
        ref="shortcut_name"
        module="module-where-method-lives"
        method="methodname"
    />
</helpers>
{code}
All attributes except the {{invert}} validator attribute are mandatory. When {{invert}} is specified and set to "True", the result of the validator method is inverted. This is useful to apply the same validator to different nodes which validate with opposite values. For example, the same validator can be used to validate nodes which must be numeric and to validate other nodes which must *not* be numeric.
{anchor:Mapping of defaults (hardwired and calculated) to a node}
h4. Mapping of defaults (hardwired and calculated) to a node
Default values may be calculated or hardwired.
{anchor:Hardwired default values}
h5. Hardwired default values
Here is the format of a defval manifest entry that maps a hardwired value to (nodes matching) a particular nodepath. (Note that there can be multiple nodes which match.)
{code}
<default nodepath="/path/to/node"
         type=[ "element" | "attribute" ]
         missing_parent=[ "create" | "skip" | "error" ]
         skip_if_no_exist="ancestor/node/path"
         empty_str=[ "set_default" | "valid" | "error" ]
         from="value"
>
    hardwired-value
</default>
{code}
Explanations:
* The _nodepath_ is from the root of the tree.
* _missing_parent_ describes how to handle setting a default if no viable parent node exists in the tree.
** _create_ creates one or more ancestor nodes to forge a path from the root to where the default would be set. This is useful for creating nodes with default values for nodes which must exist. Note that no node is created if the location of the created parent cannot be locked down to a single place in the tree.
** _skip_ does not create a default node if a viable parent node does not exist. This is useful when it is inappropriate to create a new parent node. For example, if a "home directory" node exists as a child to a "user" node, it does not make sense to create a new user just to be able to create a home directory for it.
** _error_ causes preprocessing to fail if a viable parent node does not exist.
* _skip_if_no_exist_ dictates what to do if an optional high ancestor (for example, one level down from the root) does not exist. This is useful when a huge part of the tree (as a unit) is neither specified nor desired; it prevents the attempted setting of a default in the missing part of the tree from establishing the undesired part of the tree. _skip_if_no_exist_ conditions take precedence over _missing_parent_ conditions. Consider the tree in the [#Nodepaths] section: if the default for H is set up with {{skip_if_no_exist="B"}}, then H won't get a default set up for it if B doesn't exist, even if {{missing_parent="create"}} is set for it.
* _empty_str_ dictates how to handle nodes with empty strings.
** _set_default_ indicates to treat nodes with empty strings as if they are not specified, and to set a default value in them.
** _valid_ indicates that empty strings are valid values as is.
** _error_ indicates that empty strings trigger errors.
* _from="value"_ indicates that this entry specifies a hardwired value. That is, the value of this entry is a hardwired value.
Note that a value string containing spaces is interpreted as a single value. If the value is a list, enclose the individual items in single- or double-quotes to individuate them.
{anchor:Calculated default values}
h5. Calculated default values
Default values, for example those requiring context for determination, may also be calculated by a Python method. To do this, change _from_ to indicate _"helper"_ and indicate the helper method where the hardwired value was:
{code}
<default nodepath="/path/to/node"
         ...
         ...
         from="helper"
>
    helper-method-shortcut-name
</default>
{code}
{anchor:Mapping of validator helper methods to a node}
h4. Mapping of validator helper methods to a node
Semantic and content validation is done by mapping a validator method to a nodepath. All nodes which match the nodepath will be passed to the validator method. The method will return True for each node that validates. There are three ways of invoking validator methods:
{anchor:Nodepath validation}
h5. Nodepath validation
{code}
<validate nodepath="path/to/nodes"
missing= [ "ok" | "ok_if_no_parent" | "error" ]
skip_if_no_exist="ancestor/node/path"
>
"helper-method-shortcut-name" [..."helper-method-shortcut-name"]
</validate>
{code}
Explanations:
* The _nodepath_ is from the root of the tree.
* _missing_ describes what to do if one or more viable parent nodes (whose nodepath matches all but the last part of the given _nodepath_) has no child nodes which match the _nodepath_.
** _ok_ treats missing nodes as if they were present and valid. This could be used as an alternative to setting a default value, allowing the project itself to default to a value internally.
** _ok_if_no_parent_ treats missing nodes as valid only if their parent is also missing. An example of use would be a "home directory" node which is a child to a "user" node. If the user exists the home directory must be valid, but there does not have to be a "home directory" node when there is no parent "user" node.
** _error_ treats missing nodes as errors.
* _skip_if_no_exist_ dictates what to do if an optional high ancestor (for example, one level down from the root) does not exist. This is useful when a huge part of the tree (as a unit) is not specified nor desired; it prevents validation failure when the optional subtree which would contain the nodes to be validated does not exist.
* The value of a nodepath validation entry is a list of one or more validator method shortcut names. Quotes around each name are needed if there is more than one item in the list.
{anchor:Group validation}
h5. Group validation
Validate nodes in a group of nodepaths with a single validator method. This is useful when many types of nodes require the same validation method. Validation occurs only on nodes which are present. Does not err on missing nodes.
{code}
<validate group="helper-method-shortcut-name">
"path/one/to/nodes" [ ..."path/n/to/nodes" ]
</validate>
{code}
{anchor:Exclude validation}
h5. Exclude validation
Like _group validation_, _exclude validation_ takes a list of nodepaths. However, it is the inverse of group validation: it checks all nodes which *do not* match any of the given nodepaths. Validation occurs only on nodes which are present; it does not err on missing nodes.
{code}
<validate exclude="helper-method-shortcut-name">
"path/one/to/exclude" [ ..."path/n/to/exclude" ]
</validate>
{code}
{anchor:How defaults and content validation works}
h3. How defaults and content validation works
Initial defval manifest processing reads the {{<helpers>}} list and builds dictionaries of methods and modules, both indexed by the helper-method-shortcut-name. Later, as a node is presented for processing, strings for the method and module specified for that node are looked up. Method and module strings are passed to the Python getattr() builtin to retrieve a callable function object. That function is then called to do the processing.
For all default setting, and for _validate nodepath_ and _validate group_ processing, the TreeAcc search facilities are used. For _validate exclude_ processing, the search is made using the tree walking facilities (also made available by the TreeAcc module).
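The lookup by shortcut name can be sketched as below. This is a hedged illustration rather than the DefValProc code: the {{helpers}} table, its shortcut names and the {{call_validator}} function are hypothetical, {{importlib}} is used here to load the module by name, and a standard-library function stands in for a project validator (which, in the real system, operates on tree nodes rather than plain strings).

```python
import importlib

# Hypothetical helper table, as it might look after reading <helpers>.
# A standard-library function stands in for a project validator method.
helpers = {
    "is_keyword": {"module": "keyword", "method": "iskeyword", "invert": False},
    "not_keyword": {"module": "keyword", "method": "iskeyword", "invert": True},
}


def call_validator(ref, arg):
    """Look up a helper by shortcut name and call it on arg."""
    entry = helpers[ref]
    module = importlib.import_module(entry["module"])
    # getattr() turns the method name string into a callable object.
    func = getattr(module, entry["method"])
    result = bool(func(arg))
    # invert="True" flips the validator's verdict.
    return (not result) if entry["invert"] else result


print(call_validator("is_keyword", "for"))
print(call_validator("not_keyword", "for"))
```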
{anchor:Preprocessing cycle}
h3. Preprocessing cycle
Processing has been described at a high level in the beginning of this chapter. Now that all of the pieces have been described, a more detailed breakdown can be presented:
# The defval manifest data is read into memory from the defval manifest XML file. Call this tree the _defval tree_.
# The _defval tree_ is validated against the defval manifest schema using xmllint.
# The raw data is read into memory from the manifest XML file. Call this tree the _manifest tree_.
# The _manifest tree_ is filled in with defaults.
# Semantic validation is done on the _manifest tree_.
# The _manifest tree_ is written out to a temporary file.
# The temporary file is validated against schema via xmllint.
# If everything validates, the project will be served data from the _manifest tree_.
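The final schema check in the cycle above can be approximated as follows. This is a sketch under assumptions: the function names are hypothetical, the schema is assumed to be RelaxNG (suggested by the .rng suffix mentioned later in this document), and the temporary file name follows the "/tmp/<manifest_basename>_temp_<pid>.xml" form described there. {{--relaxng}} and {{--noout}} are standard xmllint options.

```python
import os
import subprocess


def temp_manifest_name(manifest, pid):
    """Build a temporary file name of the form /tmp/<basename>_temp_<pid>.xml."""
    base = os.path.basename(manifest)
    if base.endswith(".xml"):
        base = base[:-len(".xml")]
    return "/tmp/%s_temp_%d.xml" % (base, pid)


def xmllint_relaxng_command(schema_file, xml_file):
    """Return the argument list for a RelaxNG validation run of xmllint."""
    return ["xmllint", "--noout", "--relaxng", schema_file, xml_file]


def validates(schema_file, xml_file):
    """Run xmllint (which must be installed); True when it exits with status 0."""
    result = subprocess.run(xmllint_relaxng_command(schema_file, xml_file),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0


# Hypothetical file names, for illustration only.
temp_file = temp_manifest_name("/work/DC-manifest.xml", 1234)
command = xmllint_relaxng_command("DC-manifest.rng", temp_file)
```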
{anchor:Enhanced Nodepathing}
h1. Enhanced Nodepathing
When the TreeAcc class search facilities are given a nodepath which consists of names of nodes, they return a list of all nodes which match that nodepath. Enhanced Nodepathing constrains the search to a subset or even a specific node, by looking at values of ancestor nodes found during the search. It can further narrow which ancestor nodes are relevant by looking at values of their progeny (child nodes, grand-child nodes, and so on), even though those progeny do not fall in the ancestral search path. Constraints can be aggregated, so searches can be narrowed as much as needed.
{anchor:Specifying the constraints as part of the nodepath}
h3. Specifying the constraints as part of the nodepath
Enhanced Nodepathing introduces a new language for specifying constraints as part of the nodepath itself. Here is the Backus Naur Form for that language:
{code}
nodename ::= name of a node (string)
value ::= value of a node (string)
nodepath ::= nodename | nodepath/nodename
valpathspec ::= nodepath=value | valpathspec | valpathspec:valpathspec
token ::= token | token/token | nodename | nodename=value | nodename[valpathspec]
enhanced_nodepath ::= token
{code}
Consider the following tree:
{code}
A
|
-----------------------
| |
B B
(value=v1) (value=v2)
| |
----------------------- -------------------------
| | | | | |
D C C C C D
(value=v3) (value=v4) (value=v5) (value=v6) (value=v7) (value=v8)
| | \ | | \
E F \ E F \
(value=v9) (value=v10) \ (value=v11) (value=v12) \
G G
(value=v13) (value=v14)
{code}
|| enhanced nodepath || values returned ||
| A/B/C | v4, v5, v6, v7 |
| A/B=v1/C | v4, v5 |
| A/B\[C=v5\]/D | v3 |
| A/B\[C=v4:C/E=v9\]/C\[F=v10\]/G | v13 |
Any value can be put inside single or double quote pairs so that special characters \[\]:=/ are not interpreted by the parser. This allows for values such as {{http://pkg.opensolaris.org}}.
Values which are lists (e.g. '"en_us" "Posix/C"') cannot be specified in enhanced nodepaths.
{anchor:How it works}
h3. How it works
The first thing the search entry point find_node() does is call parse_nodepath() of module ENParser.py to parse the path. The parse_nodepath() function creates a list of ENToken objects, each of which represents a parsed token. A token is delimited by forward slashes.
ENToken objects have three parts to them:
# name: the name of the node represented by the token.
# values: a list of values specified to narrow down the search.
# valpaths: a list of paths to values.
The ENTokens of the examples in the previous section are as follows:
{code}
A/B/C           A/B=v1/C        A/B[C=v5]/D       A/B[C=v4:C/E=v9]/C[F=v10]/G
name:A          name:A          name:A            name:A
values:[]       values:[]       values:[]         values:[]
valpaths:[]     valpaths:[]     valpaths:[]       valpaths:[]
name:B          name:B          name:B            name:B
values:[]       values:[v1]     values:[v5]       values:[v4, v9]
valpaths:[]     valpaths:[]     valpaths:[C]      valpaths:[C, C/E]
name:C          name:C          name:D            name:C
values:[]       values:[]       values:[]         values:[v10]
valpaths:[]     valpaths:[]     valpaths:[]       valpaths:[F]
                                                  name:G
                                                  values:[]
                                                  valpaths:[]
{code}
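A tokenizer that produces the listing above could be sketched as follows. This is not the ENParser.py source: the class and function bodies are a simplified reconstruction from the grammar, the helper names {{_split_outside()}}, {{_unquote()}} and {{parse_token()}} are hypothetical, and ".." tokens get no special treatment.

```python
class ENToken(object):
    """One parsed token: a node name plus optional value constraints."""

    def __init__(self, name, values=None, valpaths=None):
        self.name = name
        self.values = values or []
        self.valpaths = valpaths or []


def _split_outside(text, sep):
    """Split text on sep, ignoring separators inside [...] or quote pairs."""
    parts, current, depth, quote = [], [], 0, None
    for ch in text:
        if quote is not None:
            if ch == quote:
                quote = None          # closing quote
        elif ch in "'\"":
            quote = ch                # opening quote
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def _unquote(value):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def parse_token(text):
    """Parse nodename, nodename=value or nodename[valpathspec]."""
    eq, br = text.find("="), text.find("[")
    if br != -1 and (eq == -1 or br < eq):
        if not text.endswith("]"):
            raise ValueError("unterminated bracket in %r" % text)
        values, valpaths = [], []
        for spec in _split_outside(text[br + 1:-1], ":"):
            path, found, value = spec.partition("=")
            if not found:
                raise ValueError("missing '=' in %r" % spec)
            valpaths.append(path)
            values.append(_unquote(value))
        return ENToken(text[:br], values, valpaths)
    if eq != -1:
        return ENToken(text[:eq], [_unquote(text[eq + 1:])], [])
    return ENToken(text)


def parse_nodepath(nodepath):
    """Return the list of ENToken objects for an enhanced nodepath."""
    return [parse_token(part) for part in _split_outside(nodepath, "/")]


tokens = parse_nodepath("A/B[C=v4:C/E=v9]/C[F=v10]/G")
summary = [(t.name, t.values, t.valpaths) for t in tokens]
quoted = parse_nodepath('A/url="http://pkg.opensolaris.org"')
```

Tracking bracket depth and quote state in one pass is what lets the slash inside {{C/E}} and the slashes inside a quoted URL survive the top-level split.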
The core search methods of the TreeAcc class are ___search_node()_, ___match()_, ___is_bracket_match()_ and ___do_dots()_.
___search_node()_ is the main search coordinator method. It calls __match() to look for matches on the current ENToken. If it does not find any, it returns. It then calls __do_dots() to eat and process ".." ENTokens (and possibly end up higher in the data tree). Then it calls itself to check child nodes for a match on the next ENToken. When it has a match for the last ENToken in the nodepath, it appends to the list of results a new TreeAccNode representing the found data. Since __search_node() is recursive, when it returns, control may return to a previous call instance which may then recurse down the tree in another direction. When all recursion is done, a list of found TreeAccNodes is returned to the caller.
___match()_ checks for a match on the current ENToken. If the current ENToken has only a name specified, this is a simple check on the current node. If it has a value specified, the name and value are checked. If the current ENToken has a valpath, __match() calls __is_bracket_match() to recurse down the valpath looking for a match on the corresponding value.
___is_bracket_match()_ does valpath searches. It makes use of __search_node(). Valpath searches are made with the cmp_match flag set so that success is returned and seeking stopped once a single matching value is found.
___do_dots()_, as already stated, moves back up the tree when it encounters ENTokens with a name field of "..".
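The recursion described above can be illustrated with a much-reduced sketch. It is not the TreeAcc code. The {{Node}} class and function names are hypothetical, tokens are plain (name, values, valpaths) tuples, ".." handling is omitted, and the bracket match is a small standalone recursion rather than a call back into the main search. The sample tree assumes one reading of the diagram in the previous section: E hangs under C (v4), F and G hang under C (v5), and the grandchildren under the second B are left out because the sample queries do not need them.

```python
class Node(object):
    """Minimal stand-in for a tree node: name, value, child list."""

    def __init__(self, name, value="", children=()):
        self.name = name
        self.value = value
        self.children = list(children)


def is_bracket_match(node, names, wanted):
    """True if following names down from node reaches a node with value wanted."""
    for child in node.children:
        if child.name != names[0]:
            continue
        if len(names) == 1:
            if child.value == wanted:
                return True           # stop at the first matching value
        elif is_bracket_match(child, names[1:], wanted):
            return True
    return False


def match(node, token):
    """Check one node against one (name, values, valpaths) token."""
    name, values, valpaths = token
    if node.name != name:
        return False
    if not valpaths:                  # plain name, or name=value
        return not values or node.value == values[0]
    return all(is_bracket_match(node, path.split("/"), wanted)
               for path, wanted in zip(valpaths, values))


def search_node(node, tokens, results):
    """Recursive search: match this node, then try children on the next token."""
    if not match(node, tokens[0]):
        return
    if len(tokens) == 1:
        results.append(node)
        return
    for child in node.children:
        search_node(child, tokens[1:], results)


def find_values(root, tokens):
    results = []
    search_node(root, tokens, results)
    return [found.value for found in results]


TREE = Node("A", "", [
    Node("B", "v1", [
        Node("D", "v3"),
        Node("C", "v4", [Node("E", "v9")]),
        Node("C", "v5", [Node("F", "v10"), Node("G", "v13")]),
    ]),
    Node("B", "v2", [Node("C", "v6"), Node("C", "v7"), Node("D", "v8")]),
])

plain = find_values(TREE, [("A", [], []), ("B", [], []), ("C", [], [])])
by_value = find_values(TREE, [("A", [], []), ("B", ["v1"], []), ("C", [], [])])
by_child = find_values(TREE, [("A", [], []), ("B", ["v5"], ["C"]), ("D", [], [])])
combined = find_values(TREE, [("A", [], []), ("B", ["v4", "v9"], ["C", "C/E"]),
                              ("C", ["v10"], ["F"]), ("G", [], [])])
```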
{anchor:ManifestServ and ManifestRead implementation modules}
h1. ManifestServ and ManifestRead implementation modules
This section focuses on the ManifestServ and ManifestRead implementation modules which are in the gate at usr/src/lib/install_utils or installed on a system in /usr/lib/python2.4/vendor-packages/osol_install. These modules are not to be confused with other modules of the same name, which are in the gate at /usr/src/cmd/install-tools or installed on a system in /usr/bin. The latter modules are commandline interfaces which operate the former modules.
The ManifestServ and ManifestRead implementation modules comprise an upper software layer to the TreeAcc, DefValProc and other modules already discussed. They provide easy-to-use interfaces for a project (such as the Distribution Constructor) to call to set things up and access data.
{anchor:ManifestServ implementation module}
h3. ManifestServ implementation module
ManifestServ objects export the following methods for setup and use of XML data:
# ___init\_\_()_: Most setup occurs when a project instantiates a ManifestServ object based on an XML manifest. The manifest is loaded and taken through the [#Preprocessing cycle], in which its defaults are set and its content is validated. Among other things, the returned object has methods for retrieving data. The constructor takes several arguments:
#* _manifest_: Name of the project manifest file. The ".xml" suffix is optional and will be appended if not provided. Names of the manifest schema and defval manifest files are keyed off this name (less any provided ".xml" suffix) if the _valfile_base_ argument is not provided or None.
#* _valfile_base_: When provided, this name overrides _manifest_ for building the defval_manifest and manifest schema filenames. When provided, the defval_manifest will be called _valfile_base_.defval.xml and the manifest schema will be called _valfile_base_.rng. _valfile_base_ may contain prepended directory structure.
#* _out_manifest_: When provided, a nicely-formatted XML manifest file that includes all preprocessing will be output to this name. Defaults to None if not provided.
#* _verbose_: When _True_, enables on-screen printout of defaults, content validation and schema validation processing. Defaults to _False_.
#* _keep_temp_files_: When _True_, leaves the temporary file around after termination. The temporary file is the one sent to xmllint for final syntactic validation. The temporary file name is of the form "/tmp/_manifest_\_temp\_<pid>.xml". Default is _False_.
# _get_values()_: How a process holding a ManifestServ object retrieves XML data. It takes the following arguments:
#* _request_: An enhanced nodepath of requested data.
#* _is_key_: Special provision is made in the manifest for general key-value pairs. This argument will circumvent the need for a full nodepath for a key-value pair; when _True_ only the name of the _key_ needs to be provided. This works because all key-value pairs defined are placed in an expected location in the manifest. The location is at a level just below the root of the data tree in a section called {{<key_value_pairs>}}. The section in the manifest is formatted as follows: {code}
<key_value_pairs>
<pair key="key1" value="value1"/>
<pair key="key2" value="value2"/>
</key_value_pairs>
{code}
The preferred way to retrieve the value of the second key would be: {code}
values = manifest_tree.get_values("key2", True)
{code}
Note that if a retrieved value is a list, each list item will be returned individually. A value string not enveloped in single- or double-quotes in the manifest is seen as a single value, even if it has spaces. Lists of items may be stored as a single manifest value by enveloping each item with single- or double-quotes.
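The quoting rule can be illustrated with a small sketch. This only approximates the behavior described above and is not the module's parsing code: {{split_manifest_value()}} is a hypothetical name, and the standard {{shlex}} module stands in for whatever parsing the implementation really uses.

```python
import shlex


def split_manifest_value(raw):
    """Split a manifest value into items, following the quoting rule above.

    A value that starts with a single or double quote is treated as a list
    of quoted items; anything else is one value, spaces included.
    """
    text = raw.strip()
    if text[:1] in ("'", '"'):
        return shlex.split(text)      # one item per quoted string
    return [raw]


as_list = split_manifest_value('"en_us" "Posix/C"')
as_single = split_manifest_value("Posix C locale")
```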
The remaining ManifestServ implementation object methods deal with the socket server. The socket server makes the XML data available to other processes on the same host. This is needed by the Distribution Constructor, which runs shell and Python scripts in separate process space from the main program which holds the ManifestServ object. Communication goes through an AF_UNIX socket. The protocol is defined in the SocketServProtocol.py module, placed in the same directories as the ManifestServ and ManifestRead modules it serves. The socket server is accessed via the following ManifestServ methods:
# _start_socket_server()_: The socket server is not started by default. Call this method to start it. To debug protocol errors, pass _True_ to its one argument to enable debugging.
# _stop_socket_server()_: Call this method to shut down the socket server before terminating the calling process. Failing to do this will leave the socket node lingering in /tmp upon process termination.
# _get_sockname()_: Retrieve the socket name with this call. The socket name is usually retrieved to pass to one of the ManifestRead modules. For example, the Distribution Constructor retrieves it to pass it along to finalizer scripts which call ManifestRead. The socket name will be {{/tmp/ManifestServ.<pid>}}.
{anchor:ManifestRead implementation module}
h3. ManifestRead implementation module
Python scripts or the ManifestRead commandline interface module will instantiate a ManifestRead implementation object in order to retrieve data through the socket server. ManifestRead implementation objects have the following methods:
# ___init\_\_()_: Instantiate a ManifestRead object. Takes the socket name as its only argument.
# _get_values()_: Retrieve XML data through the socket and socket server. It has the same syntax and returns the same data as its ManifestServ counterpart. It takes the following arguments:
#* _request_: An enhanced nodepath of requested data.
#* _is_key_: Interpret _request_ as a key. Please see ManifestServ _get_values()_ for more information.
# _set_debug()_: Call this method with a _True_ argument to enable debugging, _False_ to disable.
{anchor:SocketServProtocol module}
h3. SocketServProtocol module
This module defines the protocol used between ManifestServ (server) and ManifestRead (client) implementation modules. Protocol is as follows:
# Client->server: Prerequest containing is_key and request size is sent.
# Server->client: Sends PRE_REQ_ACK (pre-request-acknowledge) back to the client.
# Client->server: Request in the form of a nodepath is sent.
# Server->client: Sends the count of matching nodes and the size of the entire results string (which includes all results).
# Client->server: Sends RECV_PARAMS_RECVD (receive parameters received) to acknowledge.
# Server->client: Sends the matching node values as a single string with STRING_SEP in between each. EMPTY_STR is sent in place of any empty string. The last string (which is at the index one more than the count returned to the client) is REQ_COMPLETE.
# Client->server: Another prerequest, or TERM_LINK is sent.
This module also defines where the key-value pairs are expected in the manifest.
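The exchange can be walked through with a self-contained sketch over a connected socket pair. Everything about the wire format here is an assumption made for illustration: the constant values, the newline-terminated control messages and the "is_key size" layout of the prerequest are hypothetical, and the real definitions live in SocketServProtocol.py.

```python
import socket
import threading

# Hypothetical wire values; the real ones are defined in SocketServProtocol.py.
PRE_REQ_ACK = b"PRE_REQ_ACK\n"
RECV_PARAMS_RECVD = b"RECV_PARAMS_RECVD\n"
TERM_LINK = b"TERM_LINK\n"
STRING_SEP = "\x1f"
EMPTY_STR = "<empty>"
REQ_COMPLETE = "<complete>"


def serve(conn, lookup):
    """Server side: answer requests until the client sends TERM_LINK."""
    stream = conn.makefile("rwb")
    while True:
        prerequest = stream.readline()                    # step 1
        if not prerequest or prerequest == TERM_LINK:     # step 7
            break
        is_key, size = prerequest.decode().split()
        stream.write(PRE_REQ_ACK)                         # step 2
        stream.flush()
        request = stream.read(int(size)).decode()         # step 3
        values = lookup(request, is_key == "1")
        strings = [value or EMPTY_STR for value in values] + [REQ_COMPLETE]
        body = STRING_SEP.join(strings).encode()
        stream.write(("%d %d\n" % (len(values), len(body))).encode())  # step 4
        stream.flush()
        if stream.readline() != RECV_PARAMS_RECVD:        # step 5
            break
        stream.write(body)                                # step 6
        stream.flush()
    stream.close()
    conn.close()


def get_values(stream, request, is_key=False):
    """Client side of one request/response cycle."""
    data = request.encode()
    stream.write(("%d %d\n" % (int(is_key), len(data))).encode())
    stream.flush()
    if stream.readline() != PRE_REQ_ACK:
        raise IOError("prerequest was not acknowledged")
    stream.write(data)
    stream.flush()
    count, size = (int(field) for field in stream.readline().split())
    stream.write(RECV_PARAMS_RECVD)
    stream.flush()
    parts = stream.read(size).decode().split(STRING_SEP)
    if len(parts) != count + 1 or parts[-1] != REQ_COMPLETE:
        raise IOError("malformed results string")
    return ["" if part == EMPTY_STR else part for part in parts[:-1]]


def demo():
    """Run two requests against an in-process server, then terminate the link."""
    data = {"key2": ["value2"], "some/node/path": ["a", "", "c"]}  # made-up data
    server_sock, client_sock = socket.socketpair()
    worker = threading.Thread(
        target=serve, args=(server_sock, lambda req, is_key: data.get(req, [])))
    worker.start()
    stream = client_sock.makefile("rwb")
    first = get_values(stream, "key2", is_key=True)
    second = get_values(stream, "some/node/path")
    stream.write(TERM_LINK)
    stream.flush()
    worker.join()
    stream.close()
    client_sock.close()
    return first, second


first, second = demo()
```

Sending the sizes ahead of each payload lets the receiver read an exact number of bytes, which is why the prerequest and the count/size message precede the request and the results string.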
{anchor:ManifestServ and ManifestRead commandline modules}
h1. ManifestServ and ManifestRead commandline modules
{anchor:ManifestServ}
h3. ManifestServ
The ManifestServ commandline module is a useful testing tool for checking out the viability of an XML manifest or a schema or defval manifest which validates it. Invoke with -v to see detailed output of the default setting and semantic validation processing, if that processing is failing. Invoke with -t to save the temporary file to see where manifest validation against the schema is failing. Invoke with -o to generate a new nicely-formatted XML file which accounts for all preprocessing.
The ManifestServ commandline module serves as a stand-in for a project which will use the data, by creating a ManifestServ implementation object and providing a commandline query loop to retrieve XML data. Its usage is as follows:
{code}
Usage: ManifestServ [-d] [-h|-?] [-s] [-t] [-v] [-f <validation_file_base> ]
[-o <out_manifest.xml file> ] <manifest.xml file>
where:
-d: turn on socket debug output (valid when -s also specified)
-f <validation_file_base>: give basename for schema and defval files
Defaults to basename of manifest (name less .xml suffix) when not provided
-h or -?: print this message
-o <out_manifest.xml file>: write resulting XML after defaults and validation processing
-t: save temporary file
Temp file is "/tmp/<manifest_basename>_temp_<pid>.xml"
-v: verbose defaults/validation output
-s: start socket server for use by ManifestRead
{code}
Once in the query loop, specify the enhanced nodepath to retrieve data. Specify "+key" to switch to a mode to enter keys instead of full nodepaths to retrieve key-value pairs. Specify "-key" to switch back to the normal mode which recognizes enhanced nodepaths.
If "-s" is specified, the name of the socket to specify to ManifestRead will print just before the query loop is started.
The command to run on a customer system to help a customer debug a DC manifest, accounting for where files are installed, would be something like:
{code}
ManifestServ -f /usr/share/distro_const/DC-manifest -v -t <manifest.xml file>
{code}
See the section on the [#ManifestServ implementation module] for more information on storage and retrieval of a value which is a list of items.
{anchor:ManifestRead}
h3. ManifestRead
The ManifestRead commandline module is intended for use by shell scripts, such as finalizer scripts called by the Distribution Constructor. It takes a list of either nodepaths or keys (but not a mixture of the two), and returns the corresponding values. By default, if more than one nodepath or key is requested, the request is prepended to the results. Each result will be on its own output line. See the section on [#ManifestServ implementation module] for more information on storage and retrieval of a value which is a list of items.
ManifestRead has the following usage:
{code}
Usage:
ManifestRead [-d] [-r] <socket name> <nodepath> [ ...<nodepath> ]
ManifestRead [-d] [-r] [-k] <socket name> <key> [ ...<key> ]
ManifestRead [-h|-?]
where:
-d: turn on debug output
-h or -?: print this message
-k: specify keys instead of nodepaths
-r: Always print nodepath next to a value
(even when only one nodepath is specified)
{code}
{anchor:Additional resources}
h1. Additional resources
{anchor:For more information}
h3. For more information
# DC design document:
#* gate: ssh://anon@hg.opensolaris.org/hg/caiman/caiman-docs gate
#* file: distro_constructor/DC_DESIGN_DOC.odt
# RelaxNG schema: http://www.oasis-open.org/committees/relax-ng/tutorial.html
# Python DOM howto: http://pyxml.sourceforge.net/topics/howto/xml-howto.html
# xml.dom information: http://www.python.org/doc/2.4.2/lib/module-xml.dom.html
{anchor:Filing bugs}
h3. Filing bugs
Bugzilla (defect.opensolaris.org)
Classification:development, product:distro-constructor
{anchor:Getting help}
h3. Getting help
email: caiman-discuss@opensolaris.org
chat: #caiman-discuss channel at irc.freenode.net