
XML

Probably someone should look at Jonathan Barman's article on XML in Vector 17.4 - conversion of arbitrary arrays to and from XML which is probably how they should be passed around. This code is very portable. Is this our VML parser?

Also, Mark Osborne's digital image facility uses a very simple XML file format. Let's start working with this.

Also see Bill Parke's function "ConvertHTMLtoXHTML" and Eric Lescasse's function "ConvertToXHTML".

XML Tools for APL+Win (excerpts), by Davin Church, Creative Software Design, October 2004

Introduction

As XML plays an increasing role in the computer industry, APL developers increasingly need a fast and simple way to process data in this array-oriented environment. This document describes some APL utility functions that can make that job easier.

XML Primer

If you have not yet learned XML, a brief introduction to its concepts, terminology, and structure is in order. Those already experienced with XML may wish to skip this section.

XML is technically a data representation. It is a way of describing data in a standard text-based format so that different computer programs (and even humans) can easily read and write it. XML is not particularly good for use inside a single program; it is instead designed to transfer data between programs. Thus it is said to be useful "only at the borders" of applications and machines.

XML uses plain ASCII text to represent its data (even numbers are written as text). This text may be stored in any convenient form, but is usually found as a file on disk. It may also be sent as a data stream over the Internet or held in memory during processing. In APL, raw XML may be kept as a simple character vector in a variable. (See the section below for a way of storing it structurally in APL.)

XML text contains data, of course, but it also contains a structural representation for that data. In many ways, this is similar to a nested array in APL where data has a size and shape, and each item of that data may itself contain more structured data. This is done in XML in a similar way. XML holds data in something called elements, and each element may itself contain other elements as needed.

While data in XML is stored as simple text, XML structural information is set off from that data by enclosing it in a pair of angle-bracket characters (you know these as the less-than and greater-than symbols, "< >").
Anything enclosed in these symbols is known as an XML tag, and is processed by XML-aware programs to provide structure. Anything outside these symbols is data. An example tag is:

XML tags normally come in two forms: a start-tag and an end-tag. In XML, these always come in pairs and surround the data they describe. The first word in a start-tag is called the tag name (also element name, see below) and its matching end-tag has the same name preceded by a slash ("/") character. So, if a start tag was then its matching end-tag would be .

The combination of a start-tag, some optional contents, and an end-tag is called an element in XML. While elements always have a matching start-tag and end-tag, the tag/element is named by the XML developer (similarly to naming a variable). Thus, while all XML files are structurally similar, they all look different because they all name their data differently. Of course, any programs sharing such a file would have to agree on what names to use. Here's an example XML element: Dream Park

If an element has no contents, then the start-tag may be immediately followed by the end-tag. Or, you may use a special-case composite tag to represent a null element by placing a slash ("/") at the end of the sole tag (combining both the start & end tags into one), as in: .

An element's start-tag may also contain extra information, usually used to describe the element contents in some way. These extra bits of information are called attributes, and they are each composed of an attribute-name (arbitrarily named), an "=" separator, and an attribute-value (which is simple text, always enclosed in quotation marks). The attributes are listed inside the start-tag, separated by spaces. End-tags never have attributes. For example: Dream Park

Multiple XML elements are simply listed one after the other, though they are often placed on separate lines for human readability.
For example:

    Dream Park
    The Barsoom Project
    The California Voodoo Game

Of course, this doesn't represent very complex data. Usually, XML data is nested, where an element may contain one or more other elements. (It is acceptable, but not often done, that an element may contain both data and sub-elements in any combination.) For instance, if you wanted to track the book's title and author, it might be done in this way:

        Dream Park
        Larry Niven
        Steven Barnes

Notice that the element and sub-elements were shown listed on separate lines and indented. This is purely for readability and is ignored when processing the XML data. The element content itself may also be listed on a separate line, if preferred, as in:

        Dream Park

All XML files must contain exactly one main (top-level, outside) element, called the root element. All the actual data in the XML file is contained somewhere within the root element, often as a list of parallel (identically-named) sub-elements.

Just in case you'd like to use them, comments may also be included anywhere in the XML text for human readability (they are not processed by applications). Comments are defined as a standalone tag that starts with the text "<!--" and ends with the text " -->" (including the leading space, or at least a character that's not another dash). Any other text may be contained within the comment tag (except the end marker text), including XML tags & data (which are ignored because they're in a comment). So be sure to end any comments exactly right. Comments may not be nested.

Well, that's the basic idea. To learn about the more complicated aspects of XML, numerous books and on-line resources on the subject are available.

An XML Data Structure for APL

Naturally, programs need to be able to read and write XML. It is possible to do this character-by-character, but this process is extremely clumsy (especially for reading).
To make this easier for programmers, generic XML processing programs have been written (in and for various languages and operating systems). For instance, Microsoft has created one such program (/library/object) for Windows called MSXML. However, they wrote it primarily for use by scalar languages like VB and C++. We can certainly use this in APL, but it is slow, awkward, complicated, and requires lots of looping and arcane commands. Wouldn't it be easier just to store the XML in a variable using a nested data structure so we can just process it with APL functions and primitives (particularly "Each") as a whole? Here's one way to do just that.

Note: The following description contains some seemingly complex technical details. Not all readers will wish to know the internal structures described here (especially if they are only writing XML and not interpreting it) and may prefer to skip over this section until it is needed. This structure is quite simple in concept, but can be deeply nested and the variety of options may be confusing at first, so be patient when reading (and re-reading if necessary) the description below.

The content of this XML data structure is inherently a character (text) vector. If it contains no XML coding, then it is a simple (unnested) text vector. If an XML element is found, then the entire element (start-tag, contents, and end-tag) is coded, APL-enclosed into a nested scalar, and substituted into the vector as if it were a single character. For instance, the following XML fragment:

    ... the book Dream Park is about ...

would produce a vector that looks something like:

    ... the book @ is about ...

where the "@" in the above text is actually a nested scalar. (See below for the structure of the nested item.) Multiple consecutive elements without any text data around them simply produce a vector of these nested items without any normal characters.
Since a valid XML file must contain exactly one root element (and rarely has anything special [of interest] outside it), decoding such a file would typically produce a vector of length one containing the nested version of the sole root element.

Does that make sense so far? If not, try reviewing the discussion above once more. Otherwise, you are likely to become more confused when we re-use this same concept again below. Ready? Now for the complicated part...

Any XML element will be a nested scalar (as an item in the above vector) containing a three-item vector, as follows:

    [1] The element name
    [2] The element's attributes (described below)
    [3] The element content

All XML elements will have exactly those three parts, and each part is (of course) itself nested to contain the above information.

The first item is the simplest and just contains the name of the XML element as a character vector. Use this, especially with "first-each" (↑¨), to locate any elements that you would like to process in parallel.

The second item is the most complicated, but fortunately it is the least used. It contains a list of all the attributes given in the element's start-tag. If there are no attributes given (which is quite common), then this will be an empty vector. If there are attributes, then this is a nested vector containing one item per attribute. Each such attribute item is itself a (nested) two-item vector, containing (nested again) the name of the attribute and its value. The attribute name and value are simple character vectors (text strings). Since attributes are always two-item vectors, it would normally make sense to structure them as a two-column matrix. Unfortunately, this makes it more difficult for APL code to process lists of elements and their attributes with "each" (¨). Since this is likely to happen often with attributes, the structure is instead defined as a more deeply nested vector (but see below).

The third item in the element vector is the element's content.
This is defined to be precisely the same as the top level of the data structure as described above! Thus, this could be called a recursively-defined structure. So an XML element that contains only text (the data content) would have a simple (unnested) character vector here. An empty (null) element would have an empty vector. And an element with only one or more sub-elements would have here a nested vector with that many items in it, one for each sub-element and each one defined as above. A mixture of text (data) and sub-elements is also possible, though rarely used in practice.

This data structure can become quite deep, depending on the source XML, but processing it is usually rather easy. Most XML files are simply a list of parallel elements. In APL, this is represented by a vector of (nested singleton) items, each of which is one element. Such an element list can be easily processed in sequence by calling a function (to process one element) with "Each" (¨), or by looping through them with ":FOR". If those elements contain sub-element lists, then they too can be processed in the same manner.

Also, if you have a list of different kinds (names) of elements, then "First-Each" (↑¨) will extract the names of those elements. Those names can be examined with simple APL and the vector compressed (/) to select only the desired elements for further processing. The "2-Pick-Each" (2⊃¨) on the vector will return only the elements' attributes for examination, and a "First-Each" (↑¨) on those will return just the names of the attributes for each element. The "3-Pick-Each" (3⊃¨) will, of course, return all the elements' contents without their names or attributes.

Processing most incoming XML is usually relatively simple because the expected structure is known in advance. However, if you need to detect the actual parent-child structure, then examining the depth of an element's content will tell you whether it is plain (text) data or whether it still contains sub-elements.
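The three-part element structure and the selection idioms described above can be modelled in a few lines. This is only an illustrative Python stand-in for the APL structure, not the author's code: the tuple layout, the helper names (`is_element`, `children`, `text_of`) and the element names `book`, `title`, and `author` are all assumptions made for the sketch.

```python
# Python stand-in for the nested APL structure (an assumption, not the
# author's code): a content list holds plain strings and elements, and each
# element is a (name, attributes, content) triple.
book = ("book", [], [
    ("title", [], ["Dream Park"]),
    ("author", [], ["Larry Niven"]),
    ("author", [], ["Steven Barnes"]),
])
doc = [book]  # a parsed file: a one-item list holding the root element


def is_element(item):
    """True for a three-item element triple, False for plain text."""
    return isinstance(item, tuple) and len(item) == 3


def children(element, name):
    """Select sub-elements by name (compare names, then compress)."""
    return [c for c in element[2] if is_element(c) and c[0] == name]


def text_of(element):
    """Join the plain-text parts of an element's content."""
    return "".join(c for c in element[2] if isinstance(c, str))


# Rough analogues of first-each, name selection, and a depth check.
names = [c[0] for c in book[2] if is_element(c)]
authors = [text_of(a) for a in children(book, "author")]
has_subelements = any(is_element(c) for c in book[2])

print(names)
print(authors)
print(has_subelements)
```

Because content is defined recursively, the same helpers work unchanged at any nesting level.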
Or if you're more comfortable with matrices, change an (element-only) vector into a three-column matrix with "disclose" (⊃); the names will be in [;1], the attributes in [;2], and the content (possibly nested further) in [;3]. The attributes column could even get a further "disclose-each" (⊃¨) to turn each of them into a two-column matrix, if that is preferred.

For example, the following XML:

        Dream Park
        Larry Niven
        Steven Barnes

...would be structured in APL code (with lines wrapped & indentation added for readability) as:

    ????????????????????????
        ?????????????????????
            ???????????????????????????????????
            ????????????????????????????????????
            ?????????????????????????????????????
        ? ?

Finally, there is one additional structure that may occur occasionally. Instead of the 3-item "element" structure shown above (which always has a relative depth of at least 2), a nested item could instead be a simple (relative depth 1) character vector. This might occur when a "symbolic name" is included in the XML content. There are several standard symbolic names in XML and these are usually handled automatically for you. But in the cases where the XML document author has invented new symbolic names (with entity declarations), these are not automatically converted into their equivalent values. In such cases, they are identified by being enclosed as an independent, singly-nested item in the coded vector. For instance, the following XML fragment:

    ... the book &booktitle; is about ...

would produce a vector that looks something like:

    ... the book @ is about ...

where the "@" in the above text is actually a nested scalar containing only the character vector "&booktitle;" within it. Your program would then have to know what to do with such a nested scalar if it is encountered.

Parsing (Decoding) XML

When your application is given XML to process, it is necessary to interpret it logically. This is difficult while it is still in plain text form.
Rather than using external programs to perform the interpretation for you, use the ParseXML utility function to turn the text into the nested APL data structure described above. This function does not use Microsoft's MSXML library. One reason is that MSXML is not guaranteed to be available on any particular machine, and even if it is there, the version is in doubt (and is important). Another reason is that MSXML has a reputation for being quite slow. ParseXML was written to be a standalone function and to run as fast as APL allows.

Once the data structure has been created, it may be processed using loops, subroutine calls with "Each", or just straight-line code to handle whole vectors at once or individual scalar pieces. "First-Each" (↑¨) and "Pick-Each" (⊃¨) are extremely useful in this regard, as noted in the structural description above. However, this style can be quite imposing for many APL programmers and usually produces less-than-readable code.

To this end, an additional utility function is available to help process data in this form. It is called ??????? and uses a syntax similar to XML's standard XPath language. The entire XML structure (or any legal subset of it) is passed to ??????? along with the element name(s) or other information to select from it and the matching subset of the structured XML is returned as a result, ready to be processed. So, a simple example for processing the previous XML example would be

Constructing (Encoding) XML

For simple XML, it is quite reasonable for your application to produce formatted text directly. However, there are many details that still need to be handled and it can often be rather unwieldy. For one thing, you'll usually want to create the XML in a variable before disposing of it by writing it to disk or sending it over the Internet or out via email. But if you're producing a large XML output, this can become very slow due to repeated copying of the data during memory management.
Also, you will often wish to produce properly indented lines for good human readability, but keeping track of this is tedious and any changes to the indentation depth (especially if adding a new level at the top) can be particularly frustrating. Large and complex structures are also very error-prone in several different ways. Many other issues are likely to be encountered as well, so an alternative mechanism is in order here.

To deal with all of these problems, utility functions have been written to simplify XML-creation coding and make it faster and more readable. The same data structure described above is also used for output. Once the structure is created, the BuildXML function is used to turn the whole thing into plain text for final output.

To assist in creating the structured data, a function named XMLItem is provided that produces a nested singleton containing an entire XML element encoded as described above. For most needs, this is as simple as providing the element name as a left argument and the element contents as the right argument. Here is an example function to create the sample XML shown above:

    ?????????????????????????? ???????????????????????????????????????????????? ??????????????????????????????????????? ?????????????????????????????????????????????? ???????????????????????????????????????????????? ????????????????????????????????????????????????????????????? ??? ???????????????????????????????????????????????????????????????? ?????????????????????????????? ??? ???????????????????????????????????????????????????????????????? ??????????????????????????????????????????????????????????????? ???????????????????????????????? ???? ???????????????????????????????????????????????????????????? ??????????????????????????????????????????????????????? ?????????????????????? ?????

For more complicated situations, subroutines could be called in loops or with "Each" (¨) to produce the output in pieces, and then assemble them together for the final output.
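To make the build-then-format workflow concrete, here is a minimal Python sketch of the same idea. It is not the APL utilities themselves: `xml_item` and `build_xml` are hypothetical stand-ins, the (name, attributes, content) tuple layout is an assumed model of the nested structure, and the element names are invented for illustration. One deliberate difference is that this sketch escapes special characters at formatting time.

```python
from xml.sax.saxutils import escape, quoteattr


def xml_item(name, content="", attributes=()):
    """Return a one-item list holding a (name, attributes, content) triple."""
    if not isinstance(content, list):
        content = [str(content)]  # plain text or a number
    return [(name, list(attributes), content)]


def build_xml(items, indent=4, level=0):
    """Format a content list as indented XML text, one element per line."""
    pad = " " * (indent * level)
    lines = []
    for item in items:
        if isinstance(item, str):
            if item:
                lines.append(pad + escape(item))
            continue
        name, attrs, content = item
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs)
        if not any(content):
            # empty element: use the composite start/end tag
            lines.append(f"{pad}<{name}{attr_text}/>")
        elif all(isinstance(c, str) for c in content):
            # leaf node: keep start-tag, text, and end-tag on one line
            text = escape("".join(content))
            lines.append(f"{pad}<{name}{attr_text}>{text}</{name}>")
        else:
            lines.append(f"{pad}<{name}{attr_text}>")
            lines.append(build_xml(content, indent, level + 1))
            lines.append(f"{pad}</{name}>")
    return "\n".join(lines)


# Elements are catenated with +, then wrapped by another xml_item call.
book = xml_item("book",
                xml_item("title", "Dream Park")
                + xml_item("author", "Larry Niven")
                + xml_item("author", "Steven Barnes"))
text = build_xml(book)
print(text)
```

Because every `xml_item` result is a one-item list, sibling elements are assembled by plain list concatenation, which mirrors the catenation of nested singletons described above.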
Now, this can still be a bit tedious if you have lots of data or a complex structure. And since we use an array-based language, it might be assumed that we have our data to be encoded already available in a nested array. So here's an alternate way to do the same thing by using a data array and the XMLItems function (designed to work with entire arrays at once). So it's quite easy to handle regularly structured data, and methods and pieces of data can be combined together as desired. XMLItem (and XMLItems) can also deal with element attributes and unusual elements like comments or declarations. And as you might have noticed, ParseXML and BuildXML are approximate inverses of one another.

Other XML Operations

Validation

While ParseXML is a quick and easy way to get XML text converted into a useful form, it does not also validate the incoming XML at the same time. Validation is a special term used by XML to mean (in general) that the XML conforms to an agreed-upon naming and construction convention. This is not to be confused with another XML term: well-formed. XML that is well-formed obeys the syntactical construction of XML, with angle brackets around tags, matching and nested start and end tags, quotes around attribute values, etc. All XML processing routines (including ParseXML) verify well-formedness, making sure that they're looking at legal XML encoding. But validation goes a step further to verify things. For example, one valid element name is and that one or more of them must be (and can only be) contained within a parent element.

XML standards allow XML processing engines to be designated as either validating or non-validating processors. ParseXML is a non-validating processor. So this means that while it correctly deconstructs the XML, it does not confirm that it was what you were expecting. That is, in general, left up to your application to detect and either ignore or respond to it however you feel is appropriate.
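The difference between well-formed and valid XML drawn above can be illustrated with a short Python sketch that uses the standard library's non-validating parser. The function names and the `library`/`book` element names are hypothetical, chosen only for the example; the second function stands in for the kind of check an application would otherwise have to write itself.

```python
import xml.etree.ElementTree as ET


def well_formed_error(text):
    """Return '' if text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(text)
        return ""
    except ET.ParseError as err:
        return str(err)


def breaks_convention(text, parent, child):
    """Application-level check: the root must be `parent` and must hold
    at least one `child` element. Assumes text is already well-formed."""
    root = ET.fromstring(text)
    return root.tag != parent or root.find(child) is None


print(well_formed_error("<library><book></library>") != "")          # mismatched tags
print(breaks_convention("<library/>", "library", "book"))            # well-formed, wrong shape
print(breaks_convention("<library><book/></library>", "library", "book"))
```

A document can pass the first check and still fail the second, which is exactly the gap that a validating processor (with a DTD or XSD) closes.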
This is usually fine as your code is normally already going to have to know what to expect and what to do with it, and you'll have already determined that the source of your XML is producing correct code for you.

However, in some cases you may not be too sure about your source data and you'd like it checked more thoroughly. In that case, you can call Microsoft's MSXML validating processor to do a complete analysis for you and report any problems. A cover function has been written to do this for you (since it's not used as often and therefore a standalone solution isn't as needed). The utility function is named ValidateXML. Call it with the source XML text and it will return a nested vector of error information. The first item of this vector is either empty (a '') if the XML is valid or the text of an error message otherwise.

Transformation

There are many generic things that can be done with XML, and one of the most common is called transformation. This is an operation that takes an XML file as input data, along with a second XML file that describes the transformation to be performed, and it produces a new output using those transformation rules. This second file is designated as an XSL or XSLT file and is called a stylesheet or template. A transformation could potentially produce nearly any kind of text file as output, but its two most common forms are to produce either (1) a new XML file in a different structure, or (2) an HTML file that can be displayed on a browser. In fact, this second usage is so popular that today's modern browsers know enough to recognize an XML file being returned from a web site and will perform the transformation to HTML internally so it can be displayed to the user already formatted.

Sometimes, you will want to perform this transformation yourself under program control. In that case, call the TransformXML utility function with your XML (data) as the right argument and the XSL (stylesheet/template) as the left argument.
Microsoft's MSXML library will be invoked to perform the transformation for you and the text output will be returned as the result.

SOAP

A relatively new use for XML is to pass processing requests and results back and forth (usually across the Internet) between otherwise unconnected machines. This can be used to implement certain types of Remote Procedure Call (RPC) facilities or similar functionality. An example of this might be a travel reservations web site that accepts a ticketing request in SOAP form, processes it, and returns a SOAP confirmation to the requestor. SOAP is a standardized protocol that uses XML-formatted data to exchange this information.

Since we now have tools for reading and writing XML easily in APL, handling SOAP becomes a simple matter. SOAP requests are just application-defined XML data wrapped in a specific structure known as a SOAP Envelope (which is itself also structured as XML). This is normally a very simple process: you may just add the SOAP Envelope yourself when creating SOAP messages and strip it off when reading the results. But building a SOAP Envelope makes a good sample program for demonstrating a simple use of these XML tools and it's also a useful utility in its own right. So the SOAPEnvelope function can be passed a nested XML data structure (produced by XMLItem[s]) and it will wrap a SOAP envelope structure around it.

Syntax Descriptions

BuildXML - Convert a nested structure into text

Syntax

    textxml ← [indent [pack]] BuildXML xml

General Information

BuildXML is the final stage in creating XML output. It takes the nested data structure created by XMLItem and XMLItems and converts it into a formatted text vector with proper syntax. It handles line breaks and indentation for neat and easy human viewing. BuildXML is the approximate inverse of ParseXML.

Right argument

This is a vector of nested XML data, constructed in the form produced by XMLItem and XMLItems.
If there are multiple results to be formatted together, they should simply be catenated together into a (longer) vector.

Left argument

The function's left argument is used to specify how line breaks and indentation are to be applied. It is optional and defaults to a common formatting style. The left argument may have up to two numeric items:

[1] Indentation spacing
The number of spaces to use for each level of element indentation. The default is 4 (spaces). There are three special values that may be used here:
    ??  All lines start at the left margin.
    ??  Use single ⎕TCHT (Tab) characters to indent lines instead of spaces (which uses fewer bytes).
    ??  Put entire XML result in a single line (no ⎕TCNL's).

[2] Leaf node packing levels
Usually, the lowest-level (leaf) nodes of an XML structure are listed with the start-tag, content, and end-tag on the same line of text (no ⎕TCNL's between them). The default value of 1 performs this function. Supplying a 0 here suppresses this behavior and will place the element tags and content on separate lines of output. A value larger than 1 will pack more than one level of elements together onto a single line.

Result

The result of this function is a character (text) vector, usually with embedded ⎕TCNL (new line) characters, of the formatted XML information.

Examples

    ????????????????????????????????????????
    ????????????????????????????????????????????

ParseXML - Convert text XML into a nested structure

Syntax

    xml ← ParseXML textxml

General Information

Given an XML data stream as a character (text) vector, decompose it into its constituent elements and produce a nested APL vector containing the same data in an array-friendly form. Most XML "files" are composed of exactly one "root" element at their outer level. From such an input, ParseXML will produce a nested vector of length 1 as a result.

Processing notes

* Element names effectively have their "<>" brackets removed and their attributes and contents separated out.
  All these parts of the element are then enclosed in an APL singleton within the returned result. Elements within content are similarly extracted and enclosed at a deeper level.
* All leading and trailing white space (including spaces, tabs, and newlines) is removed from the element content at all levels.
* All prologue and epilogue information (anything outside the root element) is removed.
* DTD or XSD validation is not performed (and is removed).
* Comments, special declarations, and processing instructions (those terms beginning with ?) are ignored and removed.
* strings are decomposed into actual raw data (ready for use).
* Symbolic names (those beginning with "&") are converted to raw data (ready for APL processing) if they are UTF-8 decimal or hex, or are one of the 5 standard symbols (&amp;, &lt;, &gt;, &quot;, or &apos;). Other symbolic names (which are not commonly encountered) are not expanded and are instead nested as a depth+1 text string for your application to examine.
* Improperly-formed XML is reported with an APL error.

Processing feedback for very large parsing tasks (such as displaying a progress bar) can be provided by writing an optional custom external feedback function named ParseXMLStatus. It should accept as its right argument the current decoding "step" and should return a boolean to indicate whether to continue (0) or abort (1) the processing. Further details on the feedback function can be found in the ParseXML comments.

ParseXML is the approximate inverse of BuildXML.

Right argument

Character (text) vector containing valid XML text. This often comes from either reading a text file or downloading data across the Internet. Both newlines (⎕TCNL) and linefeeds (⎕TCLF) are treated as white space (and thus ignored), though in APL variables it is common to have only newline characters (as line separators) and not linefeeds.

Left argument

None.
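For readers who want to experiment with the shape of the parsed result outside APL, the processing notes above can be approximated in Python. This is a rough analogue, not a port of ParseXML: `parse_xml` and `to_nested` are hypothetical names, the (name, attributes, content) tuple is an assumed model of the nested structure, and the standard library parser does the real work. By default it skips comments and processing instructions and expands the five standard entities.

```python
import xml.etree.ElementTree as ET


def to_nested(node):
    """Convert an ElementTree node to a (name, attributes, content) triple."""
    content = []
    if node.text and node.text.strip():
        content.append(node.text.strip())      # leading text, trimmed
    for child in node:
        content.append(to_nested(child))       # sub-elements nest recursively
        if child.tail and child.tail.strip():
            content.append(child.tail.strip())  # text following a sub-element
    attributes = list(node.attrib.items())      # list of (name, value) pairs
    return (node.tag, attributes, content)


def parse_xml(text):
    """Return a one-item list holding the root element.

    Malformed input raises xml.etree.ElementTree.ParseError.
    """
    return [to_nested(ET.fromstring(text))]


sample = '<book id="1">\n  <title>Dream Park</title>\n  <note>x &amp; y</note>\n</book>'
print(parse_xml(sample))
```

As in the APL version, a document with a single root element yields a result of length one, and white space used only for indentation disappears from the content.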
Result

The result of parsing XML is a deeply nested vector containing all the XML data represented in a hierarchical structure. This structure is described in detail in an earlier section of this document, but it is fundamentally a text vector with nested XML elements taking the place in the vector of a single character.

Example

    ??????????????????????????????????

SOAPEnvelope - Wrap a SOAP Envelope around nested XML

Syntax

    soap ← [headers] SOAPEnvelope body

General Information

The functionality provided by this routine is minimal, and may need to be customized for particular needs, but it serves as a coding example as well. SOAP messages are built of XML data surrounded by a SOAP Envelope. This "envelope" is used by SOAP processing programs to identify and handle the message contents. It is a simple matter for your application to add the necessary XML elements to enclose your data in a SOAP wrapper, but this function does just that if you'd prefer to use it.

Right argument

This is the main content of the SOAP message. It should be an application-specific, nested XML data structure of XMLItem(s) of the SOAP content to be transmitted. This XML data will be enclosed in an element and included in the result.

Left argument

The left argument is optional (indicating that no headers are present). Simple SOAP applications usually require no SOAP headers. If it is provided, it should be a vector of XMLItem(s) to be used as one or more SOAP header blocks and will be enclosed in an element. SOAP header block contents, if needed, are defined by the SOAP application's protocol and needs.

Result

The result of the function is a nested XML data structure. The SOAP header, if any (after being enclosed in its element) is prepended to the SOAP body (after being enclosed in its element). These objects are then enclosed in an element wrapper to complete the SOAP Envelope.
A standard prefix is also added as a convenience and the entirety is returned as a nested XML data structure, ready to be formatted by BuildXML.

Example

    ??????????????????????????????????????????????????????????

TransformXML - Convert text XML into another form

Syntax

    output ← textxsl [var]... TransformXML textxml [var]...

General Information

XSL stylesheets (templates) are used to convert XML data into another form. Usually this new form is either a different XML structure or is HTML suitable for displaying the data to human readers. This function accepts XML input data and an XSL stylesheet/template and returns the transformed data. This allows such transformations to be done easily under program control. The work is done by Microsoft's MSXML library. All work is done in memory; temporary disk files are not used.

Note: When transforming to HTML, MSXML forces the output to be in the UTF-16 character set. Therefore, trying to set it to use an alternate output character set (like "windows-1252" or "iso-8859-1") will not be successful.

Right argument

This should be the XML data to be transformed. It should be supplied in text (character vector) form.

Advanced feature: XSL "parameters" may be supplied as extra items in either argument. They should be specified as (enclosed) name-value pairs and appended to the (enclosed) argument data. These can be referenced inside the XSL to vary the processing being performed.

Left argument

This should be the XSL (stylesheet/template) to control the transformation. It should be supplied in text (character vector) form.

Advanced feature: XSL "parameters" may be supplied as extra items in either argument. They should be specified as (enclosed) name-value pairs and appended to the (enclosed) argument data. These can be referenced inside the XSL to vary the processing being performed.

Result

The result is the text output from the transformation, as generated by applying the stylesheet/template to the text XML data.
Example

    ?????????????????????????????????????????????????????????????????????????

ValidateXML - Verify that text XML follows conventions

Syntax

    error ← ValidateXML textxml

General Information

All XML has to follow the basic XML syntax, which is always verified. But beyond that, XML data should also conform to a naming and structural convention agreed upon by the sender and receiver of that data. This convention is often formalized in a separate document using one of two descriptor languages, either a DTD or an XSD, and referred to by the XML document itself. In such cases, the logical structure of the XML document can then be checked against this description to verify that it follows those conventions. This process is called "validating" the XML.

Normally, the ParseXML utility does no validating of incoming XML. This saves time and is usually not necessary. But for those cases where the XML data needs to be more thoroughly checked, ValidateXML provides an interface to Microsoft's MSXML library where a complete validation can be done on the data to ensure that it's in proper form.

Right argument

The text (character vector) containing the XML to be checked. Normally, this will include within the text a reference to the DTD or XSD document containing the structural definition to be followed.

Left argument

None.

Result

The result is a nested vector of error information. The first item of the result is the most reliable indication of an error. If it's an empty character vector (''), then no error has occurred. Here are the items being returned:

    [1] Text error message, or '' if no error was found.
    [2] Numeric error code, or 0 if no error was found. (This doesn't seem to be reliable on some systems.)
    [3] Text of XML source line that caused the error.
    [4] Byte position in the XML text where the error occurred.
    [5] Line number in the XML text where the error occurred.
    [6] Character position of the error within the failing line of XML.
[7] URL of file containing the error (if the error occurred in an external file).

Example
????????????????????????????????????????????????

XMLItem - Create an XML element as a nested singleton

Syntax
xml ← element [attributes] XMLItem contents
encodedtext ← XMLItem rawtext

General Information
The purpose of this function is to build any single item of the nested XML structure that was described earlier in this document. It takes as arguments the name of the element to create (and optionally any attributes) and the content to be placed inside that element. It returns a one-item nested vector (the one item representing the single element being created) which internally contains the three-part structure defining that element.

If more than one of these resulting elements is created, then they (the 1⍴ elements, after encoding each one with XMLItem) should simply be catenated together to form a vector of the same length as the number of sequential elements. If the resulting element(s) are to be enclosed in another element, then use this result directly (or catenated with additional elements) as the right argument to a further call to XMLItem to create the additional level of element nesting. The result of this function may also be catenated with ordinary text if a non-homogeneous structure is desired.

XMLItem can also perform a secondary utilitarian function. Since non-printing and reserved symbols may not be directly included as XML data, they must be specified using an entity-encoding scheme. XMLItem, when used monadically, can perform this encoding for you. Any time you're enclosing potentially unknown text (such as that entered by a user) in an XML element, you should first make sure that any special characters (such as ampersands or angle brackets) are correctly encoded. So pass that text (with an extra call) to XMLItem monadically before passing it on to the usual XMLItem or XMLItems call to create the element vector. For example:
????????????????????????????????
This will ensure that all special characters are converted before being included in the element. Only perform this operation on simple text (character vectors), and not on anything already returned from any call to XMLItem or XMLItems, or the data will be re-encoded incorrectly. This operation is done automatically for attribute values (which can only be text), so it only needs to be done manually for unknown element contents. It does not need to be performed at all on text which you are sure does not contain any special characters (such as typical constant text from your application).

Right argument
The right argument is the data to be "enclosed" in this XML element. This should either be plain text data or the (concatenated) results of one or more XMLItem or XMLItems calls. (Mixtures of these are uncommon but permitted.) Empty elements should just supply this value as an empty vector (''). Numeric values are also permitted for coding ease and are ⍕'d before use, but watch out for formatting problems (like negative numbers, limited precision, etc.) and format them in advance if you have any special requirements.

Any data argument that might contain special characters (those that need entity-encoding, like: newlines as data, ampersands, less-than or greater-than symbols, non-ASCII characters, etc.) must be character-encoded prior to enclosing them with XMLItem. (This would generally apply to any data that has been typed in by a user. Any constant text that you know does not have such characters can be used directly.) This can be done by using XMLItem monadically; see "General Information" above.

Left argument
The left argument is the name of the XML element with which to "enclose" the data/contents. This may be a simple name, or it may optionally include attributes to be inserted into the start-tag of the element. Just a simple element name is the most commonly used form, and the easiest to specify as an argument (just a simple character vector).
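The encoding rules above (encode unknown element contents once, never re-encode, and let attribute values be encoded automatically) can be illustrated outside APL. The following is a minimal Python sketch using the standard library, not the XMLItem implementation; the helper `make_element`, the element name `dish`, and the attribute `price` are hypothetical names chosen for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def make_element(name, contents, attributes=()):
    """Build '<name attr="v">contents</name>' as text.

    Hypothetical helper: contents are assumed to be already entity-encoded
    (or to be the text of other elements); attribute values are always
    encoded here, mirroring the automatic handling described above.
    """
    attr_text = "".join(
        f" {key}={quoteattr(str(value))}" for key, value in attributes
    )
    return f"<{name}{attr_text}>{contents}</{name}>"


# Unknown text, e.g. typed in by a user, is encoded exactly once...
user_text = "Fish & Chips <daily>"
encoded = escape(user_text)

# ...and only then enclosed in an element. The numeric attribute value is
# formatted as text before use.
element = make_element("dish", encoded, [("price", 7.5)])
print(element)

# Encoding text that is already encoded damages it, which is why the
# operation should only be applied to raw character data.
print(escape(encoded))
```

Nesting works the same way as described for XMLItem: the text returned for one element can be passed as the contents of an enclosing element without encoding it again.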
If attributes are specified, they may be given in any of several forms for maximum programming comfort and flexibility, so just choose the form that you like best. (Most of these forms are just an attempt to handle how you would expect it to "just work", so try not to be intimidated by their somewhat detailed descriptions.) Here are the different ways that the element name and attributes may be specified.

* Simple element name (or as a nested singleton): Just a simple character (text) vector containing the name to use for the element.
Example: ??????
Produces:

* Element name + nested vector of attributes: A two-item nested vector:
[1] Simple element name, as above (but nested here).
[2] A nested vector of zero or more attribute name-value pairs.
  Each attribute is a two-item nested vector of its name and value:
  [1] Attribute name, text vector.
  [2] Attribute value, text vector (or numbers will be ⍕'d).
Note: Be careful with the depth of nesting needed here.
Example: ?????????????????????????????????
Produces:

* Element name + nested matrix of attributes: Similar to the nested vector of attributes, but the attributes are listed in a two-column nested matrix rather than a vector of vectors, as in:
[1] Simple element name, as above (but nested here).
[2] A nested matrix of zero or more attribute name-value pairs.
  Each attribute is given on a row of the matrix:
  [;1] Attribute name, text vector.
  [;2] Attribute value, text vector (or numbers will be ⍕'d).
Example: ?????????????????????????????????
Produces:

* Element name followed by a list of attributes: Similar to the nested vector of attributes, but in a more relaxed form. Rather than the attributes all being nested into a single item, they are allowed to be separated as their own items of the left argument, as in:
[1] Simple element name, as above (but nested here).
[2] Attribute #1 (nested name-value pair, as above).
[3] Attribute #2 (nested name-value pair, as above).
[4+] (etc.)
Example: ???????????????????????????????
Produces:

Special kinds of element names are also supported, including:

* If the element name begins with a ???: Encode as a PI entity (both pseudo-content and pseudo-attributes are allowed).
Example: ???????????????????????????????????
Produces:

* If the element name begins with ????, or is only ??? or ???: Encode contents argument as a comment (attributes not allowed).
Example: ?????????????????????????
Produces:

* If the element name otherwise begins with ???: Encode contents argument as a special section or declaration.
Example: ????????????????????????????????????????????????
Produces:

Result
The result of XMLItem is a one-item nested vector. This single item can be concatenated into a vector of other similar items (or ordinary text) to produce a longer vector of items. If this item contains other (nested) items, then it may be APL-nested very deeply; this is quite normal.

The concept of returning a one-item nested (singleton) vector is that an XML element and all of its contents represents a single logical entity in a data stream. It is therefore being treated in the same way that a single text character would be treated in such a data stream.

In general, this nested result can be displayed, but it's not very readable by itself. To see what you're creating (while debugging), use BuildXML to display it.

Examples
???????????????????????????
??????????????????????????????????????????????????????
??????????????????????????????????????????????????????
????????????????????????????????????????????????????
??????????????????????????????????????????????????
?????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????????

XMLItems - Call XMLItem on an array

Syntax
xml ← elements XMLItems
contents

General Information
The purpose of XMLItems is to take multiple pieces of data (such as a multi-dimensional array) and run them all through XMLItem. The result is concatenated together into a single valid XMLItem-like result (usually with more than one item in the vector). This is functionally the same as calling XMLItem on each data item and then appropriately catenating the results together and recursing while managing the resulting rank and depth, but this single function is much easier to use.

The right argument is given as a nested array of any rank and the left argument is a vector of element names (each item is allowed to be a valid left argument for XMLItem). These left-argument element names are then paired with the right-argument data and XMLItem is called on each one. However, this correlation is not one-for-one. Each item of the left argument (from right to left) is paired with each dimension of the right argument (from last to first), and that one element name is used to enclose all the items along the matching data dimension. This process converts the right-argument rank into XML-depth.

You may supply more element names than the rank of the data if desired. In that case, the right-most element names are used to enclose the data dimensions and any additional (leading, left-most) element names are then used to contain the resultant single data item.

Note: If fewer element names are given in the left argument than there are dimensions of data, then the last-most dimensions will be enclosed in the given elements and the result will be returned as an array of separate results rather than a simple, valid XMLItem result value. The rank of such an array will be the rank of the right argument less the number of items in the left argument that were used to reduce it (i.e. ?????????????????????????????). In this case, these items may not simply be appended into a coded XML vector.
Instead, it is recommended that the data be used as an argument to an additional call to XMLItems until enough element names have been provided to reduce out its entire rank and return a valid XMLItem result vector.

Right argument
The right argument should be one or more content (data) items to be "enclosed" as XML element(s) by repeatedly using the XMLItem function. The right argument may be a nested scalar, vector, matrix, or array of any rank. Note that a single piece of data must be a nested singleton (e.g. ??????? or ?????) to be properly encoded.

Left argument
The left argument is a nested vector of one or more element names with which to "enclose" the data (contents) array using the XMLItem function. Normally there will be one item (element name) in the left argument for each dimension of the right argument data. The first item (element name) is applied to the first dimension of the data and the last item is applied to the last dimension. If there are too few or too many element names, then they are applied in right-to-left order. See the note above for more details about this situation.

Advanced usage
Each element name in the left argument is usually a simple XML element name. However, more complicated structures may optionally be provided to perform more advanced tasks. Each element name (for a given dimension of data) may be structured in any of the following ways:

* A simple name: This produces unadorned, consistent element names for each item of data it is enclosing.

* A two-item nested vector of an element name with attributes: This structure is a form acceptable for XMLItem, where the first sub-item is the element name and the second sub-item contains element attributes. Multiple attributes may be specified in any form allowed by XMLItem.

* An n-row, one-column nested matrix of multiple (different) element names: This is used to enclose each content (data) item (along the corresponding dimension) within a different element.
For instance:
?????????????????????????????????????????????????????
may be used to encode 10 sets of first name, last name, and phone number into their respective elements and then enclose each of those sets in an enclosing element, yielding a vector of 10 such elements.
Note: The height of the element-name matrix must equal the length along the corresponding dimension of the data being encoded with that element-list. A nested vector may be turned into a one-column matrix with ??????, if desired. Or, as a programming convenience, this list may also be specified as a simple text matrix rather than a one-column, nested matrix (e.g. ???????????????????????????).

* An n-row, two-column nested matrix of multiple element names and attributes: This encodes each data item into a different element name (as described above), but also allows for a specification of attributes to be supplied (in the second column) for each one (in any form acceptable for XMLItem).

Result
The result is a nested vector of the form described earlier in this document and also used by XMLItem. However, XMLItem always returns a single-item vector and XMLItems may return a multi-item vector (of the same type). This result can be used in the same way as results from XMLItem, including catenating them together (or with ordinary text) or using them as further input to XMLItem or XMLItems.

Note: If the number (rho) of element names supplied in the left argument is less than the rank of the data supplied in the right argument, then the XML encapsulation process is incomplete. The result cannot be used as a finished XMLItem object as described above until it is reprocessed with XMLItems to provide enough element names. See "General Information" above for more details.

Examples
???????????????????
Returns a vector of 4 XMLItem results, each item of which is a single element containing a number from 1 to 4. Note that XMLItem normally returns a one-item vector, so this is just 4 such vectors that are simply catenated together.
This is equivalent to:
????????????????????????
or just:
?????????????????????????????????????????????????????????????
which would result in:
1
2
3
4

??????????????????????????
Returns a one-item vector of an element that contains the 4-item vector from the example above. Additional prefix element names in the left argument would simply add a single element wrapper for each one. This is equivalent to:
??????????????????????????????????
which would result in:
  1
  2
  3
  4

????????????????????????
Returns a vector of separately-nested vectors of elements, which are not logically joined together and are only valid structures separately. Further use of XMLItems is needed.

???????????????????????????????
Returns a vector of elements, each of which contains 4 elements (each of which contains a number). This is a valid structure, suitable for catenating to an XMLItem-vector. It would result in:
t>1234 t>5678 count>9101112

?????????????????????????????????????
Returns a one-item vector of an element containing a vector of elements, as above.

XMLPath - Extract selected information from nested XML

Syntax
subset(s) ← path [path]… XMLPath xml

General Information
When extracting specific information out of a nested XML vector (the result of ParseXML), you may use your usual APL techniques to get what you want. These include Compression (/), Pick (⊃), First (↑), and especially Each (¨). But for complicated structures this can get to be tiring and difficult to read. In order to simplify this process, XMLPath has been written to help locate and extract particular elements from the XML vector.

XMLPath was designed to use a syntax similar to a simplified version of the standard XPath language. Whether you know this standard XML-related syntax or not, you should find the syntax for XMLPath reasonably easy to use.

Note: XMLPath can return either elements or low-level data as directed.
Any time that whole elements are returned, they are always in a valid form for nested (parsed) XML data, and thus the result is suitable for further use with XMLPath. This may sometimes cause confusion when a one-item nested vector is returned for a single item rather than disclosing it to reveal its contents. But this is necessary for consistency; just use First (↑) to obtain a single result when needed.

Note: When processing the result of XMLPath elements (or indeed the originally-parsed vector), it is often convenient to use ":FOR" or call a subroutine with Each (¨). However, doing this causes an implicit Disclose (⊃) of each item, changing the structure into one that is no longer valid for use with XMLPath. Simply re-enclosing each item (either outside or inside the loop) will restore the proper structure and avoid much confusion.

Right argument
The right argument to XMLPath should be a validly nested XML data structure vector (as described earlier in this document). Such a vector is returned from ParseXML, XMLItem/XMLItems, and from some uses of XMLPath.

Left argument
The left argument provides one or more Paths (character vectors) to indicate the data that should be selected. If more than one Path is provided, then the Paths are completely independent of one another, processed as if Each (¨) were used, and multiple results are returned. The rest of this description assumes that only one Path is provided.

A Path has zero or more Terms, each separated by the ??? character. Each Term beyond the first (in a Path) conceptually "discloses" the XML by one level. If you're familiar with XPath, it's rather like each Term is of axis "child::" (except for special cases like "attribute::"). An explicit axis operator ("::") is therefore not supported. All remaining contents (after "diving" through Terms/children) are returned. Empty vectors are returned if no matches are found.
If multiple matches are made at parent levels, then the children are joined together as a single group (with ???) before proceeding.

If attribute pairs or values (??? or ???), element-names-only (???), or plain text or numeric (??? or ???) results are selected, then the returned result is not a valid XML data structure. As such, these can only be used as final Terms in the Path. The final-only Terms are returned as a (nested) vector of items of the same length as the number of parent nodes from which they were extracted. All other returned values are valid XML node structures (a vector of 0 or more elements) and can be further processed with XMLPath (and its family of XML functions).

Each Term may use one of the following syntax choices:
????  Select only elements with that name.
?????  Select only elements without that name.
(Empty)  Select all/only contents at that level (both ??? and ???). This is often used last in a Path by terminating the Path with a ???.
?  Select all/only child nodes.

The following items may only be used as final Terms:
?  Select all/only text contents.
?  Exactly as ???, but convert to numeric result (or ?? if not numeric).
?  Select all/only element attributes. Each element returns a nested vector of attribute name-value pairs.
?????????  Select the value(s) of the named attribute (? if none exists).
?????????  Select the value(s) of the named attribute as above (? if none exists), but convert to a numeric result (or ?? if not numeric).
?  Return only the names of (child) elements. Equivalent to ?? of a ??? Term.

Terms that select nodes (whole elements) may be suffixed by a filtering criterion within square brackets (????) to further restrict which nodes are returned. At present, only one type of filtering is supported, but new filtering syntax (similar to XPath's) is planned for future enhancement. The following types of filtering are supported:
??  Numeric constant(s) select only those nodes by their sequential positions.
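
The Term-by-Term descent described above (filter the current group by name, then join all children into a single group before applying the next Term) can be sketched in Python with the standard library's ElementTree. This is only an analogue under stated assumptions, not XMLPath itself: the slash separator is borrowed from XPath because the separator character is not legible here, only name Terms and empty Terms are handled, and the function `select` and the element names `library`, `book`, `title`, and `dvd` are hypothetical.

```python
import xml.etree.ElementTree as ET


def select(nodes, path):
    """Return the elements reached by applying each Term of path in turn.

    Assumptions of this sketch: Terms are separated by "/", a non-empty
    Term keeps only elements with that name, and an empty Term keeps
    every element at that level.
    """
    for depth, term in enumerate(path.split("/")):
        if depth > 0:
            # Each Term beyond the first dives one level; the children of
            # all matching parents are joined into a single group.
            nodes = [child for node in nodes for child in node]
        if term:
            nodes = [node for node in nodes if node.tag == term]
    return nodes


document = ET.fromstring(
    "<library>"
    "<book lang='en'><title>A</title></book>"
    "<book lang='fr'><title>B</title></book>"
    "<dvd/>"
    "</library>"
)
top_level = [document]

books = select(top_level, "library/book")              # whole elements
titles = [t.text for t in select(top_level, "library/book/title")]
langs = [b.get("lang") for b in books]                 # attribute values
children = select(top_level, "library/")               # trailing empty Term
nothing = select(top_level, "library/magazine")        # no match: empty list

print(titles, langs, len(children), nothing)
```

Because `select` returns a list of whole elements, its result can be passed back in as the first argument, which mirrors the point that element results remain valid input for further path selection. Text and attribute values are extracted only at the end, as with the final-only Terms.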

Examples of Paths
????  Keep all top-level elements and discard any others.
?????????  Get all s? s (elements).
?  Select all children (both text and elements).
??????  All children not named .
??????  First child encountered.
?????????  Children of the first .
?????????  All child element names in .
??????  Text of elements.
??????  All of the s? attributes.
??????????  The lang= attribute of each .

Result
The result of XMLPath depends upon the extraction request specified in the left argument. If whole elements are being returned (such as when using ???), then the result is still a valid XML data structure (as described earlier in this document). If names (?), text content (? or ?), or attributes (? or ?) are being returned, then those are just data and cannot be further processed by XMLPath. See "Left argument" above for more details on what is returned for different requests.

If multiple extraction requests are made by supplying more than one left argument, then multiple results are nested and returned as if Each-Enclose (¨⊂) were used in the call to XMLPath.

Examples
??????????????????????????????????????
?????????????????????????????????????????
??????????????????????????????????????
?????????????????????????????????????????
? 23 ?
