Q2ADPZ User and Developer Manual

Contents

  1. About Q2ADPZ
    1. How to read this document
  2. How does it work?
  3. Requirements and supported platforms
  4. Security issues
  5. Installation and maintenance guide
    1. Obtaining latest working version
    2. Compiling the source code
    3. Installing components
      1. Installing master
      2. Installing slaves
      3. Installing clients
      4. Installing the webserver-based data store
    4. Configuring master and Q2ADPZ users
    5. Configuring slaves
    6. Configuring clients
    7. Upgrading to a new Q2ADPZ version
    8. Analyzing the runs: logs and statistical information
  6. User modes of operation
    1. Starting and configuring jobs using qadpz_run menu system (basic mode)
    2. Preparing and editing the project file manually (intermediate mode)
    3. Writing your own client application (advanced mode)
    4. Writing your own slave library (all modes)
    5. Understanding the internal communication protocol (hacker mode)
      1. Interfaces
    6. Participating in the Q2ADPZ development
  7. Appendixes
    1. Terminology
    2. Documentation for XMLData class
    3. Documentation for PostOffice class
    4. Documentation for Crypter class
    5. Credits and licensing issues


1 About Q2ADPZ

The recent growth of computational power of desktop computers calls for their efficient use in larger organizations, especially those which need to run computationally intensive tasks, such as universities and research centers.

Q2ADPZ ['kwod 'pi: 'si:] is a modular C++ implementation of a free, open source, multi-user, multi-platform system for distributing computing requests in a TCP/IP network. Users of the system can submit, monitor, and control computing tasks (grouped into jobs) to be executed by computers participating in the Q2ADPZ system. Tasks can take the form of dynamic shared libraries, executables, or interpreted programs (including Java applications). Users can specify software, hardware, and platform requirements for each task, and a suitable computer is selected automatically. The system automatically delivers the input and output data files. Computers executing tasks detect users logging in, and the tasks are then terminated or moved to other computers to minimize the disturbance of regular computer users. Q2ADPZ can operate both in an open Internet environment and in a closed local TCP/IP network. The internal communication protocol is based on optionally encrypted XML messages. The system provides basic statistical information for usage accounting. Several user modes are supported: from novice users submitting simple binary executable programs to advanced users who can alter the internal communication interfaces for their special needs. We are currently using the system for research tasks in the areas of large scale scientific visualization, evolutionary computation, and simulation of complex neural network models.

Q2ADPZ is being developed by a team in the Division of Intelligent Systems, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway. The best way to contact us is to write an e-mail.


1.1 How to read this document

This document is meant to be a complete manual for all users and developers of Q2ADPZ. In other words, all you need to know should be found here. If it is not, it has not been written yet. Of course, you don't have to read it all. The following guide tells you which sections are important for you.

Pretty much everybody should look at the chapter How does it work?

If you are a regular user and you only want to submit your tasks to an already installed Q2ADPZ system, you only need to read Starting and configuring jobs using qadpz_run menu system.

If you find qadpz_run too tiresome, you don't mind editing XML-formatted files manually, or you want to generate project files from your application, consult the section on Preparing and editing the project file manually. You may also want to read Configuring clients to learn about the options in the client config file. You will also need to contact your local Q2ADPZ administrator who will create a new Q2ADPZ user account for you.

If you want to use Q2ADPZ for more advanced projects, control the starting of the tasks in your jobs from your application, or use task libraries instead of executable or interpreted tasks, you may want to read Writing your own client application and Writing your own slave library.

If you are interested in setting up a new Q2ADPZ system at your location, you may want to become a Q2ADPZ administrator. In that case you need to read the full Installation and maintenance guide, and at least the first section of the chapter explaining User modes of operation. You probably also want to look at Requirements and supported platforms, and Security issues.

If you would like to join the huge crowds of Q2ADPZ developers, it is highly advisable that you read Understanding the internal communication protocol and Participating in the Q2ADPZ development before you contact us.

If you would like to reuse some of the components used in the system, one of the appendixes might serve your needs.

We hope you will have fun reading this manual. If there is something missing that you need to know, drop us a line: Zoran, Pavel, Atle and Diego.

And one more wish from us: don't print this document. It works much better on the screen and you can save your paper for something else.


2 How does it work?

A Q2ADPZ system consists of one master, many slaves and multiple clients who are delegating jobs to master (you may wish to see the terminology).

All slaves that participate in the system run a slave service program (a small resident application that accepts the tasks to be computed). The master also runs permanently and keeps track of what the slave computers are doing: whether they are idle, busy computing some Q2ADPZ task, or disabled because some user is logged in.

When a user wants to use the system, (s)he prepares a user application consisting of two parts: a slave user program (the code that performs the desired computation after being distributed to the slaves) and the client, which generates the jobs to be computed.

Each job consists of a set of tasks, and each task is generated by a client. (Don't be scared away: if you have a working executable or interpreted application, you can use Q2ADPZ right now, without writing anything; but you probably want to know what is happening with it.)

The main role of the master (the central component of the whole system) is to maintain the current availability status of the slaves, and to start and control the tasks. A client doesn't communicate with the slaves directly, instead it sends all its requests to the master.

As indicated above, there are several user modes. To keep things simple, one can use the Q2ADPZ standard client (qadpz_run), which allows you to set up and submit a job. The job description is saved into an XML-formatted project file and can also be edited manually by more advanced users. Alternatively, a user might want to write his or her own client application to have full control over the submission of tasks (for example, if one needs to wait for the results of the first few tasks and then, based on these results, send either one or another group of tasks).

It is also possible to write a slave library directly to speed up the execution. In that case, the slave service (daemon) will not start a new process with the downloaded executable; instead, a dynamic shared library will be loaded by the slave service process.

Please read the sections on User modes of operation and see the example source codes.


3 Requirements and supported platforms

Q2ADPZ works on both UNIX (including MacOS X) and Windows platforms. It also works very well in multiplatform environments: every node of your installation can run any of the supported platforms. The master and client nodes, as well as the master and slave nodes, need to be connected by a TCP/IP network (only the UDP protocol is used). Although we plan to provide precompiled binaries for the supported platforms, it is recommended that you compile the package yourself (especially on UNIX platforms). In addition to a C++ compiler, the libcurl library and libcrypto from OpenSSL are needed (see Compiling the source code).

The list of supported platform configurations is shown in the following table:

Operating System                    CPU      Compiler                           Comments
Linux                               i*86     GNU C/C++ (ver. 2.96, 2.95, 2.91)  ok
Linux                               sparc    GNU C/C++ (ver. 2.95)              ok
Linux                               sparc64  GNU C/C++ (ver. 2.95)              ok
FreeBSD                             i*86     GNU C/C++ (ver. 2.95, 3.0)         ok
SunOS                               sun4m    GNU C/C++ (ver. 2.95)              ok
SunOS                               sun4u    GNU C/C++ (ver. 2.95, 3.0)         ok
Win32 (Win2k, WinME, Win98, WinXP)  i*86     MSVC++ (ver. 6.0)                  ok
IRIX64                              IP27     MIPSpro C++ (ver. 7.3.1.2m)        ok

We have also experimented with the following configurations:

Operating system  CPU              Compiler                                                Notes
Darwin/MacOS X    Power Macintosh  Apple C/C++ (ver. gcc-932.1) (aka GNU C/C++ ver. 2.95)  slave CPU usage high!!
Linux             i*86             GNU C/C++ (ver. 3.0)                                    failed to compile qadpz_run_edit due to changed ios interface!?
Win32 (WinNT)     i*86             MSVC++ (ver. 6)                                         failed to compile slave due to inexistent CreateToolhelp32Snapshot() calls
SunOS             ipc86            GNU C/C++ (ver. 3.0)                                    linking errors for slave shared libraries!?

If you have tried to use Q2ADPZ on some other platform, please let us know.


4 Security issues

Because the TCP/IP protocol does not authenticate the sender, it cannot be guaranteed that the tasks arriving at the slave computers were really sent by the master. This is a serious security threat, since it would allow a malicious hacker to submit any piece of code to the slave nodes (IP spoofing). For that reason, and at the cost of decreased performance, all communication from clients to master and from master to slaves is encrypted or signed. In particular, the data flow from client to master has to be authorized by a Q2ADPZ user name and password and is encrypted with the master public key. The data flow from master to slaves is signed with the master private key, and its authenticity is verified on the slave nodes using the master public key.

It is important to note that the data flow from slaves to master and from master to clients is neither encrypted nor signed, which means that a malicious hacker can monitor (packet sniffing) or alter (IP spoofing) the data or control information arriving back to master or client nodes and thus:

In other words, the current Q2ADPZ security scheme is designed to protect the security of the computers in the network, i.e. a malicious hacker cannot submit an alien piece of code to be executed instead of a user computational task. However, this scheme doesn't protect the Q2ADPZ user data. We are considering allowing optional data integrity protection in future versions of Q2ADPZ.

To summarize:


5 Installation and maintenance guide

Q2ADPZ runs on multiple computers in a TCP/IP network and therefore the installation requires careful attention. The authors tried to simplify the installation procedure wherever possible. The following sections guide you through the necessary installation steps.


5.1 Obtaining latest working version

The official web site for downloading is: http://sourceforge.net/projects/qadpz/. Here you can find the latest version of the source code. We also intend to provide binary distributions for the main platforms in the near future.

The official web site for the project description is: http://qadpz.idi.ntnu.no/. You can find the latest information about the project here. There is also a web interface to the latest development CVS repository. We do not yet give public access to this CVS tree.


5.2 Compiling the source code

After uncompressing the package, the following directory structure appears:

src/ the source code for all the components, contains Makefile.base and the main Makefile.
bin/ the compiled binaries are placed here; this directory also contains sample project files (*.xml), the public and private keys (pubkey, privkey), and the encrypted list of users (users.txt).
doc/ all Q2ADPZ documentation
sample/ sources of sample applications (dumb - library-type application; simple - executable/interpreted type of application)
include/ include files from curl and openssl which are required for compilation; please set the CFLAGS, CFLAGS_CURL, CFLAGS_OPENSSL variables in Makefile.base based on your installation of the libraries

There are several ways how the system can be compiled:

To modify these options, set the variables HAVE_CURL and HAVE_OPENSSL in file Makefile.base to 0 or 1.

In order to compile the system, the libcurl and libcrypto from OpenSSL are needed.

The provided makefiles are for GNU C++. For Windows platforms, MSVC++ 6.0 project files and a workspace are provided.

When ready, go to the src/ directory and type make (or build all projects in the MSVC workspace). The compiled binaries for all components including the sample applications will be placed in the bin/ directory. Go to bin/ directory and type make keys. The public and private master keys will be generated (files: pubkey, privkey). Remove the users.txt file if present and create an admin user name and password using qadpz_admin -a admin. Your system should be compiled and ready for further installation.


5.3 Installing components

Three main components have to be installed: master - installed on a single computer in the Q2ADPZ network, clients - installed on all computers from which users will be submitting their tasks, and slaves - installed on all computers which will compute the slave user tasks. In addition, a data file server for the basic and intermediate user modes has to be configured.


5.3.1 Installing master

The following steps are needed to install master:


5.3.2 Installing slaves

The following steps are needed to install slave:


5.3.3 Installing clients

The following steps are needed to install client:


5.3.4 Installing the webserver-based data store

You need to have a www server supporting perl cgi-scripts installed.


5.4 Configuring master and Q2ADPZ users

The configuration of master is done through the master.cfg config file; the individual settings are documented there. The master_port variable must match the master_port variables in the client.cfg and slave.cfg files. In normal operation, you might want to turn all logs off, since with tens of slave computers present the log files grow very quickly.

When starting master, the current status can be either displayed on console, redirected to a file (for example special pipe file), or turned off completely (by redirecting the console to /dev/null, or NUL), while still producing html status file that can be viewed even remotely, if www-server is installed on the master computer.

The qadpz_admin utility can be used to manipulate the list of Q2ADPZ users and their passwords.

The keys should be generated in the distribution directory using make keys command and copied to master qadpz/bin directory.

Some platforms require random.bin file with random seed TODO: PLEASE EXPLAIN THIS.


5.5 Configuring slaves

The slave is configured through the slave.cfg config file; the individual settings are documented there. A temporary directory, where downloaded libraries, programs and data files are saved, should be created and made readable and writable for the user account under which qadpz_slave runs. Access to other computer resources for the same account should be restricted. A copy of the master public key (pubkey) should be stored locally on each slave.

For each software package that should be detected by the slave and reported to the master, several variables should be specified: software, soft_version, soft_detect, soft_detrow, and soft_detword. The autodetection mechanism executes the soft_detect command (including the specified command line arguments) and, from line number soft_detrow of its output, takes word number soft_detword counted from the beginning of the line. This word should describe the version of the software. For example, the settings:

software JDK
soft_version unused
soft_detect java -version
soft_detrow 1
soft_detword 3

are used to detect the version of the JDK. The entry soft_version is used only if soft_detrow and soft_detword are set to 0. In that case the version autodetection is not used, but the soft_detect command is still executed, and if its execution fails, the software is not reported.
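
The word selection described above can be sketched in C++ as follows. This is only an illustration and not the code used by qadpz_slave: the function name extractVersionWord is hypothetical, rows and words are assumed to be counted from 1 (since 0 switches the autodetection off), and words are assumed to be separated by whitespace.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns word number `word` from line number `row` of `output`
// (both counted from 1), or an empty string if the line or word is missing.
// Hypothetical helper; words are assumed to be separated by whitespace.
std::string extractVersionWord(const std::string& output, int row, int word) {
    assert(row >= 1 && word >= 1);  // 0 would mean "autodetection disabled"
    std::istringstream lines(output);
    std::string line;
    for (int r = 1; std::getline(lines, line); ++r) {
        if (r != row) continue;
        std::istringstream words(line);
        std::string token;
        for (int w = 1; words >> token; ++w) {
            if (w == word) return token;
        }
        return "";  // the line exists but has too few words
    }
    return "";      // the output has too few lines
}
```

With soft_detrow 1 and soft_detword 3, and assuming the first output line of the detection command looks like `java version "1.3.1"`, the function returns the third word of that line, including the quotation marks.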


5.6 Configuring clients

The client is configured through client.cfg config file, the individual settings are documented there. The public master key should be available.


5.7 Upgrading to a new Q2ADPZ version

The system contains the feature of automatic upgrade of all slaves to a new version. A new slave binary is distributed from master to all slaves (provided that the system was compiled with HAVE_CURL setting). The procedure is started by qadpz_admin utility. Because some of the slaves in the Q2ADPZ network might be unreachable, or busy computing some task, which should not be interrupted, there are several different modes for upgrade:

The upgrade is performed only for the specified platforms (os/cpu) - the new version of the qadpz_slave has to be available at a URL specified in the command line to qadpz_admin -u*.

For technical details of upgrade see the comments in the source code.


5.8 Analyzing the runs: logs and statistical information

There are two kinds of logs produced:

In addition, master produces an interim status html file; its location can be specified in the master.cfg configuration file. Future versions of the system will generate statistical log information.


6 User modes of operation


6.1 Starting and configuring jobs using qadpz_run menu system (basic mode)

For the description of the arguments accepted by qadpz_run, execute qadpz_run -h. When qadpz_run is started without arguments, it enters the interactive mode. Each time qadpz_run asks the user to enter some value, it usually displays a default (or previously set) value in brackets. For example, just after it is started, it asks for the name of the project configuration file:

Enter project file to edit [example.xml]:

By simply pressing ENTER, the user confirms the value offered in the brackets.

After the project file is specified, the main menu is displayed.


  1.   Job name: 'name_of_your_job'
  2.   Task menu (? task groups)
  3.   Save & run (project_file.xml)
  4/q. Exit (w or w/o save)

qadpz_run>

When you see this prompt, type in your choice from the menu and press ENTER. Enter 1 to change the name of your job, 4 to exit qadpz_run without starting the job, or 3 to save any changes you have made to the project config file and start the job now. To specify or change the tasks in this job, choose option 2 to enter the task menu.

In the task menu, the list of task groups that are currently defined for this job is displayed. You can add a new task group, modify or remove the existing one, duplicate an existing task group to create a new one, and exit the task menu.

When you choose to edit a task, a task editing menu appears, see below.

When you are adding a new task group, you need to supply a new task id - an integer that uniquely identifies the first task of this group within your job. Other tasks in the same group will receive ids with consecutive numbers (i.e. if you set the task id to 5 and there are 10 tasks in this group, they will have ids 5 through 14). The tasks within a group are started in the order of the task ids. The second piece of information you need to supply is the type of your task: Executable program, Interpreted program, or QADPZ library. If you have a binary executable, enter E; if you need an interpreter to execute your program (for example a Java Virtual Machine or a Lisp interpreter), enter I; and if you wrote your own QADPZ slave library, enter L. What follows is a task editing menu.
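
The numbering of task ids within a group can be sketched as follows. This is only an illustration of the rule stated above; the function name taskIdsForGroup is hypothetical and is not part of Q2ADPZ.

```cpp
#include <cassert>
#include <vector>

// Returns the ids of all tasks in a group: the first task gets `firstId`
// and the remaining tasks get the consecutive numbers that follow it.
std::vector<int> taskIdsForGroup(int firstId, int taskCount) {
    assert(taskCount >= 0);
    std::vector<int> ids;
    for (int i = 0; i < taskCount; ++i) {
        ids.push_back(firstId + i);
    }
    return ids;
}
```

For a group with task id 5 and 10 tasks, the function yields the ids 5 through 14, so a second group in the same job should use a first task id of 15 or higher to keep the ids unique.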

In the task editing menu, the user can view and modify the following options:

Task id: 1,  type: Executable, p. Serial group

1. number of runs: 3
2. datafiles in: './datafiles/'
3. URL of data server: 'http://increment/qadpz/cgi-bin'
4. Don't start before task with id=< not defined > finished
5. Platforms defined: 3...
6. Input files: 1...
7. Output files: 1...
9. Utilities to execute when tasks finished: 1...
0/q. Back to task menu

The user specifies the number of runs in this task group. Each run can be referred to by its run number 1, 2, 3, ... and the task id for each task in the group is generated as task id + 0, task id + 1, etc. In the second option, the user specifies the directory where all input and output data files are (or will be) located, relative to the current directory where qadpz_run has been started (or as an absolute path, if you wish). If you use any input or output files, or if your executable is not yet available for download from some URL (that is, it is located only in your file system), you need to specify the URL of the data server. You can probably leave this option unmodified, or ask your local Q2ADPZ administrator. Option 4 allows you to specify another task in your job that must finish before this task is started. More precisely, you have 3 options:
  1. all tasks in this group can be started when the task specified by a task id is finished (semaphore),
  2. all tasks in this group can be started only after all tasks from the group specified by a task id are finished (barrier),
  3. each task in this group can start only after the corresponding task from another task group is finished. A corresponding task is a task with the same run number (autoincrement).
For example, imagine there's one task group with task id 10 and number of runs: 5, and there's another task group with task id 20 with 10 runs.

Options 5 through 9 open menus for defining platform-dependent task details, input and output files, and utilities; these menus are described below.


6.2 Preparing and editing the project file manually (intermediate mode)

TODO: (Pavel) explain the format
example.xml   example client configuration file, contains list of tasks to run

format: the file contains several task and reservation descriptions, each consisting
        of the following (see spec. of TaskInfo). Everything is encapsulated in
		a <Job> tag.

        <Job Name=string>

          <Task ID="integer" Type="Library|Executable|Interpreted">
           [<ReserveName> ... </ReserveName>]
            <TaskInfo> ... </TaskInfo>
            <RunCount>integer</RunCount>
           [<DataPathPrefix>string</DataPathPrefix>]
           [<InputFile[ From="run#" To="run#"][ Constant="Yes"]>string</InputFile>]
           [<OutputFile[ From="run#" To="run#"][ Constant="Yes"]>string</OutputFile>]
           [<FilesURL>string</FilesURL>]
           [<CDataInputFile>string</CDataInputFile>]
           [<CDataOutputFile>string</CDataOutputFile>]
           [<Utility[ From="integer" To="integer"]>string</Utility>]
          </Task>
  or:
          <Reserve>
             <ReserveName> ... </ReserveName>
             <SlavesRequired> .. </SlavesRequired>
            [<Parallel/>]
             <SlaveInfo> ... </SlaveInfo>
          </Reserve>
or

       </Job>

The TaskInfo has the following form:

   <TaskInfo>
      <OS>string</OS>
      <CPU Speed=integer>string</CPU>
      <Memory Unit="MB">Integer</Memory>
      <Disk Unit="MB">Integer</Disk>
     [<Software version="string">string</Software>]
      <TimeOut>Integer</TimeOut>
      <URL>url_string</URL>
      <Executable Type="File|URL">string</Executable>
      <CmdLine>string</CmdLine>
   </TaskInfo>

[<CmdLine>string</CmdLine>]

 * if RunCount is more than 1, other Ids are generated automatically (++)
 * TaskInfo and SlaveInfo can appear more times if more slave types are acceptable
 * if From, To are not specified for input or output file, the file is
   used in each run, first run is #1
 * if Constant="Yes" is specified for an input file, it will be taken from the path
   specified otherwise a suffix ".<run#>" is appended for each task run
 * if Constant="Yes" is specified for an output file, it will be saved (copied) to
   the file name specified, otherwise an additional suffix ".<run#>" will be appended
   for each task run. Constant="Yes" makes sense for output files only if RunCount is 1,
   otherwise the output files from previous runs will be overwritten
 * if FilesURL is specified, it should point to a data web server which accepts
   upload and download requests for Q2ADPZ data files. Otherwise default is used.
 * in addition to all these files, a special file Q2ADPZ.task is sent to the
   slave tmp_dir. This file has the following structure:
   run# #input_files #output_files max_runtime_sec
   list of all input files (in the order specified in this config file, 1 file per line)
   list of all output files (same order, 1 file per line)
 * DataPathPrefix can be used for convenience - then it is appended in front
   of all file names. The file names can contain paths as well
 * The CmdLine string can contain multiple occurrences of the substring '#run#',
   each of which is replaced by the run number
 * even though QADPZ supports reserveIDs with the same names, the ReserveNames
   used within one config file should be unique
 * in case of Executable type, URL in task info should contain link to the
   generic library, and an extra tag <Executable> as part of TaskInfo should
   contain URL to executable
 * <Utility> can specify a local program (with optional full path)
   to be started after a task is finished. The specified string can contain a full command line
   including arguments, where all occurrences of #run# are replaced with the run number
   of the task that just finished. The optional From and To specify an interval of run
   numbers - the utility will be started only for those tasks.
   More than one utility can be specified for each task.
   Utilities are started with DataPathPrefix as the current directory (if specified).
 * note: it makes no sense to make a Reservation for a serial set of tasks. These
   entries might be ignored by the interactive editor of qadpz_run.
 * note: the comments are deleted by qadpz_run interactive editor!
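
A task program might read the Q2ADPZ.task file along the following lines. This is a minimal sketch, assuming that the four numbers on the first line are separated by whitespace and that each file name occupies exactly one line; the TaskFile structure and the parseTaskFile function are hypothetical names and are not part of the Q2ADPZ sources.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the Q2ADPZ.task file.
struct TaskFile {
    int run = 0;                           // run number of this task
    int maxRuntimeSec = 0;                 // max_runtime_sec from the header
    std::vector<std::string> inputFiles;   // one entry per input file line
    std::vector<std::string> outputFiles;  // one entry per output file line
};

// Parses "run# #input_files #output_files max_runtime_sec" followed by the
// input file names and then the output file names, one per line.
// Returns false if the header is malformed or file names are missing.
bool parseTaskFile(std::istream& in, TaskFile& out) {
    std::string header;
    if (!std::getline(in, header)) return false;
    std::istringstream fields(header);
    int inputCount = 0;
    int outputCount = 0;
    if (!(fields >> out.run >> inputCount >> outputCount >> out.maxRuntimeSec)) return false;
    if (inputCount < 0 || outputCount < 0) return false;

    out.inputFiles.clear();
    out.outputFiles.clear();
    std::string name;
    for (int i = 0; i < inputCount; ++i) {
        if (!std::getline(in, name)) return false;
        out.inputFiles.push_back(name);
    }
    for (int i = 0; i < outputCount; ++i) {
        if (!std::getline(in, name)) return false;
        out.outputFiles.push_back(name);
    }
    assert(out.inputFiles.size() == static_cast<std::size_t>(inputCount));
    return true;
}
```

The function takes any std::istream, so a task would typically pass a std::ifstream opened on Q2ADPZ.task in its working directory.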
Type="Library" specifics:
 * the CDATA input for library can be specified in files [DataPathPrefix]CDataInputFile.<run#>
 * the result sent back from a library is saved into [DataPathPrefix]CDataOutputFile.<run#>
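
The naming rules above (the ".<run#>" suffix, Constant="Yes", DataPathPrefix and the '#run#' substitution) can be illustrated with the following sketch. The function names expandRun and dataFileName are hypothetical, and the code assumes that DataPathPrefix is simply concatenated in front of the file name without adding a path separator.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of "#run#" in `text` with the run number.
std::string expandRun(const std::string& text, int run) {
    assert(run >= 1);  // the first run is #1
    const std::string marker = "#run#";
    const std::string value = std::to_string(run);
    std::string result;
    std::string::size_type pos = 0;
    std::string::size_type hit;
    while ((hit = text.find(marker, pos)) != std::string::npos) {
        result.append(text, pos, hit - pos);  // copy text before the marker
        result += value;
        pos = hit + marker.size();
    }
    result.append(text, pos, std::string::npos);  // copy the remaining tail
    return result;
}

// Builds the local name of a data file for one run: the prefix is put in
// front of the name, and ".<run#>" is appended unless the file is constant.
std::string dataFileName(const std::string& prefix, const std::string& name,
                         int run, bool constant) {
    assert(run >= 1);
    std::string path = prefix + name;
    if (!constant) {
        path += "." + std::to_string(run);
    }
    return path;
}
```

For example, with the prefix "./datafiles/" and run number 2, the input file "input.dat" maps to "./datafiles/input.dat.2", while a file marked Constant="Yes" keeps its plain name.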


Following is an example with 3 runs of simple executable
TODO: provide example


6.3 Writing your own client application (advanced mode)

TODO: how to write client: make it compatible with current interface ClientServ - client service library (see also the file: ~/src/client/ClientServ.h)

User interface functions:


6.4 Writing your own slave library (all modes)

TODO: how to write slave, make sure it is compatible with current interface:

SlaveServ - slave service library

(see also the file: ~/src/slave/SlaveServ.h)

User interface functions:

Example library source code: (see also the file: ~/sample/dumb/SlaveDumb.cpp)

  #include "SlaveServ.h"
  #include <stdlib.h>    // atoi()
  #include <stdio.h>     // sprintf()

  // flags for the callbacks
  int isTaskCtrl = 0;  // set to 1 when task control is required
  int isTaskStop = 0;  // set to 1 when task stop is required

  // callback functions for notification from the slave service
  void taskCtrl (const char *arg)
  {
    isTaskCtrl = 1;
    DBUG_PRINT("info", ("taskCtrl called arg=%s", arg));
  }
  void taskStop ()
  {
    isTaskStop = 1;
    DBUG_PRINT("info", ("taskStop"));
  }

  // this is the exec loop on each task-thread
  extern "C" {

  int taskExec (char *data, char *datares
  #ifdef _WIN32
                 ,int  (*q2adpz_slv_task_status) (task_state stat, char *err),
                 void (*q2adpz_slv_setcb_task_stop) (void (*cb) (void)),
                 void (*q2adpz_slv_setcb_task_ctrl) (void (*cb) (const char *arg))
  #endif
                )
  {
    int isFinished = 0;

    DBUG_PRINT("info", ("task computation successfuly started"));

    // set callback functions
    q2adpz_slv_setcb_task_stop (taskStop);
    q2adpz_slv_setcb_task_ctrl (taskCtrl);

    DBUG_PRINT("info", ("input data '%s'", data));

    // start main task loop
    while (1) {

      //do some crunching of the data
      //SLEEP_SEC(1);
      sprintf (datares, "res=%d", 2 * atoi (data));
      isFinished = 1;

      //task needs to be stopped
      if (isTaskStop) {
        DBUG_PRINT("info", ("task stop successfuly executed."));
        break;
      }

      if (isTaskCtrl) {
        // check the arguments from the message
        // ...

        // send back results
        q2adpz_slv_task_status (task_ok, "task ctrl received ok.");
        DBUG_PRINT("info", ("task ctrl successfuly executed."));
      }

      //if crunching finished
      if (isFinished) {
        DBUG_PRINT("info", ("task computation successfuly finished."));
        break;
      }

    } // while

    return 0;
  }

  }  // extern "C"


6.5 Understanding the internal communication protocol (hacker mode)

The communication between client, master and slave uses an XML-based language. The communication between the application part of the client and the standard client library, and between the slave daemon/NT service and the slave user program, is done using C++ function calls (see previous sections).

Each XML document that is sent can contain multiple messages. It has to contain one UserInfo structure to authenticate the user. This is the structure of all XML documents that are sent:


6.5.1 Interfaces

This section describes the details of the XML messages exchanged between system components. Every XML document that is sent has the following overall structure:

<Data>

  <!-- for security/accounting purposes -->
  <UserInfo>
    <User>string</User>
    <Pswd>string</Pswd>
  </UserInfo>

  <Message Type="string"> ... </Message>
</Data>
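
A client-side helper that assembles such a document could look like the following sketch. It only illustrates the envelope structure shown above; the function names xmlEscape and makeDataDocument are hypothetical, and the optional encryption of the message is deliberately left out.

```cpp
#include <cassert>
#include <string>

// Escapes the five characters that have a special meaning in XML text.
std::string xmlEscape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Wraps one already formatted message body into a <Data> document that
// carries the UserInfo structure. `messageBody` is inserted unchanged.
std::string makeDataDocument(const std::string& user, const std::string& pswd,
                             const std::string& messageType,
                             const std::string& messageBody) {
    assert(!messageType.empty());
    return "<Data>"
           "<UserInfo><User>" + xmlEscape(user) + "</User>"
           "<Pswd>" + xmlEscape(pswd) + "</Pswd></UserInfo>"
           "<Message Type=\"" + xmlEscape(messageType) + "\">" + messageBody +
           "</Message>"
           "</Data>";
}
```

The message body is passed in as ready-made XML so that the same helper can be used for any message type; the user name and password are escaped because they are plain text supplied by the user.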

The following data structures are used in the messages (see below for message types):


<!-- provides information about slave computer. Version contains an
     identifier of the slave service version. Version and Address appears
     when sent from slave. Software may appear multiple times and is defined
     by the slave config file (hopefully some autodetect routines downloaded
     by the slave service on startup which will determine the presence of
     software will exist later) -->
<SlaveInfo>
  [<Version>string</Version>]
  <OS>string</OS>
  <CPU Speed=integer>string</CPU>
  <Memory Unit="MB">Integer</Memory>
  <Disk Unit="MB">Integer</Disk>
  [<Software version="string">string</Software>]
  [<Address>ip_address_string</Address>]
</SlaveInfo>

<!-- specifies a single task to be computed for particular platform. Memory
     and Disk contain the minimal requirements for this task for this
     platform. TimeOut specifies the time after which the task is killed if
     it didn't finish. Speed of CPU is optional. Software can appear
     arbitrary number of times. UserData can contain any data and is reserved
     for use by user application client and slave (for example the standard
     qadpz library for submitting executables sends the parameters here).
     The difference compared to CDATA in TaskInit is that UserData can be
     different in each TaskInfo, i.e. for each platform. -->
<TaskInfo>
  <OS>string</OS>
  <CPU Speed=integer>string</CPU>
  <Memory Unit="MB">Integer</Memory>
  <Disk Unit="MB">Integer</Disk>
  [<Software version="string">string</Software>]
  <TimeOut>Integer</TimeOut>
  <!-- the following points either to DLL file or UNIX library -->
  <URL>url_string</URL>
  <UserData>string</UserData>
</TaskInfo>

<!-- name is chosen by client, number is added by master so that no other
     job with the same name and this number is executed at this time -->
<JobID>
  <Name>string</Name>
  <ID>Integer</ID>
</JobID>

<!-- ID is generated by client and is specific only within this job -->
<TaskID>
  <JobID> ... </JobID>
  <ID>Integer</ID>
</TaskID>

<!-- name is generated by client and the number by master -->
<ReserveID>
  <Name>string</Name>
  <ID>Integer</ID>
</ReserveID>

<!-- mainly for monitor purposes, and response to M_JOB_XXX back to client
     from master. Running means that the job is registered at Master and some
     client is taking care of it. Abandoned means that the job is still
     registered, but the client went down. Stopped is a response to
     M_JOB_CTRL(Stop) and Refused means that the M_JOB_CTRL was refused for
     some reason (error message is included in Error of M_JOB_STATUS). -->
<JobStatus>
  <Status>Running | Abandoned | Stopped | Refused </Status>
  <TasksRunning>Integer</TasksRunning>
  <TasksWaiting>Integer</TasksWaiting>
  <!-- Reservation appears as many times as is the number of reservations for
       this client/job if it is a response to M_JOB_CTRL(GetStatus), ReserveID
       appears once if it is a response to M_SLAVE_RESERVE, otherwise neither
       Reservation nor ReserveID is included -->
  [<ReserveID> ... </ReserveID> |
   <Reservation>
     <ReserveID> ... </ReserveID>
     [<Parallel/>]
     <SlavesRequired>Integer</SlavesRequired>
   </Reservation>]
</JobStatus>
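
To make the relation between SlaveInfo and TaskInfo more concrete, the following sketch checks whether a slave meets the requirements of a task. The matching rule is an assumption made for this illustration (equal OS and CPU strings, at least the required memory, disk and CPU speed, and all required software present); it is not taken from the master source code. The C++ structures are hypothetical mirrors of the XML structures above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical C++ mirror of <SlaveInfo>.
struct SlaveInfo {
    std::string os;
    std::string cpu;
    int cpuSpeed;                                 // Speed attribute of <CPU>
    int memoryMB;
    int diskMB;
    std::map<std::string, std::string> software;  // name -> version
};

// Hypothetical C++ mirror of the requirement part of <TaskInfo>.
struct TaskInfo {
    std::string os;
    std::string cpu;
    int cpuSpeed;                                 // 0 means "not specified"
    int memoryMB;                                 // minimal requirement
    int diskMB;                                   // minimal requirement
    std::map<std::string, std::string> software;  // required name -> version
};

// Assumed rule: an empty required version accepts any installed version.
bool slaveSatisfies(const SlaveInfo& slave, const TaskInfo& task) {
    assert(task.memoryMB >= 0 && task.diskMB >= 0);
    if (slave.os != task.os || slave.cpu != task.cpu) return false;
    if (task.cpuSpeed > 0 && slave.cpuSpeed < task.cpuSpeed) return false;
    if (slave.memoryMB < task.memoryMB || slave.diskMB < task.diskMB) return false;
    for (const auto& required : task.software) {
        auto found = slave.software.find(required.first);
        if (found == slave.software.end()) return false;
        if (!required.second.empty() && found->second != required.second) return false;
    }
    return true;
}
```

Since TaskInfo may appear once for each acceptable platform, a scheduler built on this sketch would call slaveSatisfies for each TaskInfo of a task and accept the slave if any of them matches.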

Client <=> standard client library (function calls)

Client <=> master (XML)

The following messages are recognized:

  • in the direction client to master:

    
    
    <!-- Controls jobs:
         - Stop stops all tasks associated with this job and removes the job
           from the master's agenda. Clients are encouraged to call this after
           they receive the result of the last task that belongs to this job,
           so that the master can free all resources used for accounting of
           this job.
         - StopAllName ignores the job number in JobID and stops all jobs
           with the same name.
         - StopAllUser stops all jobs of the user who sends the message. -->
    <Message Type="M_JOB_CTRL">
      <JobID> ... </JobID>
      <Action>Stop | GetStatus | StopAllName | StopAllUser</Action>
      [<ReserveID> ... </ReserveID>]
    </Message>

    <!-- The master finds an appropriate slave for starting this task.
         TaskInfo may appear more than once: once for each platform that the
         task can run on. Address should be included if the slave was
         reserved. The master replies with M_TASK_STATUS after the task is
         started, or when it is not possible to start it. -->
    <Message Type="M_TASK_INIT">
      <TaskID> ... </TaskID>
      <TaskInfo> ... </TaskInfo>
      [ <![CDATA[...anything...]]> ]
      [<Address> ... </Address>]
    </Message>
    (see below)

    <!-- Stops the task or sends a control message with an optional argument. -->
    <Message Type="M_TASK_CTRL">
      <TaskID> ... </TaskID>
      <Action>Stop | Control</Action>
      [<Argument>string</Argument>]
    </Message>
    (see below)

    <!-- Asks the master to notify the client about free slaves. The master
         confirms the message with an M_JOB_STATUS reply containing one
         ReserveID. When a suitable slave (according to SlaveInfo) becomes
         available, the master sends M_SLAVE_AVAIL to the client that has
         control over the specified job. This is done SlavesRequired times,
         i.e. M_SLAVE_AVAIL is sent once for each requested slave, each time
         immediately when the next slave becomes available. When several
         clients register, they are scheduled round-robin. When Parallel is
         specified, only one M_SLAVE_AVAIL is sent to the client, and only
         after SlavesRequired slaves are available at the same time; they are
         reserved for this client. SlaveInfo may appear several times if
         several types of slaves are acceptable. -->
    <Message Type="M_SLAVE_RESERVE">
      <ReserveID> ... </ReserveID>
      <SlavesRequired>Integer</SlavesRequired>
      [<Parallel/>]
      <JobID> ... </JobID>
      <SlaveInfo> ... </SlaveInfo>
    </Message>

    <!-- The client sends this message to a known master when it is started
         and when it quits. The client should specify the job which it is
         controlling, so that the master knows where to send M_TASK_FINISH.
         If the client is creating a new job, it should specify only the job
         name inside the JobID structure and set the number to -1. The master
         responds with M_JOB_STATUS, in which the valid JobID is returned (in
         this way there can be several jobs with the same name; they are
         distinguished by their number). JobID and Address should not be
         specified in case of ClientStatus Off. -->
    <Message Type="M_CLIENT_STATUS">
      [<JobID>...</JobID>]
      <ClientStatus>On | Off</ClientStatus>
      [<Address>ip_address_string</Address>]
    </Message>

    <!-- Controls/configures the slaves:
         - Upgrade = the "slave service" is replaced on all slaves for which
           a new executable is specified. New executables must be
           downloadable from the given URLs for each platform on which the
           upgrade should be performed, i.e. the URL element can appear
           multiple times, once for each OS - CPU combination.
           If Immediate is specified, all currently known slaves that report
           a version different than the one optionally specified in
           NewVersion are upgraded immediately and their running tasks are
           stopped (if no NewVersion is specified, all of them are upgraded).
           If Immediate is not specified, those slaves that are busy will be
           upgraded when they become ready.
           If PermanentUpgrade is set to Start, the master keeps upgrading
           all slaves that become ready later and report a version different
           than the one specified in NewVersion (in this case NewVersion is
           compulsory). The master saves the permanent upgrade info to its
           config file, so it keeps upgrading even after a restart. The
           PermanentUpgrade feature can be deactivated by sending a simple
           M_SLAVE_CTRL(Upgrade,PermanentUpgrade(Stop)) without other
           subelements. If PermanentUpgrade is not specified, its activation
           is not altered (i.e. if it was on, it remains on; it is off by
           default when the master starts and it was not explicitly turned
           on before).
           Upgrade is accepted only from a qadpz_admin client (which is
           always started by the user 'admin'). -->
    <Message Type="M_SLAVE_CTRL">
      <Action>Upgrade</Action>
      [<NewVersion>string</NewVersion>]
      [<Immediate/>]
      [<PermanentUpgrade>Start | Stop</PermanentUpgrade>]
      [<URL OS="string" CPU="string">string</URL>]
    </Message>
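The Upgrade rules described for M_SLAVE_CTRL can be read as a small decision function applied to each known slave. The following is only a sketch of that reading, not code from the Q2ADPZ sources: the names UpgradeDecision and decideUpgrade are hypothetical, the check against NewVersion in the non-Immediate case is an assumption, and the sketch assumes that a URL for the slave's platform was supplied.

```cpp
#include <cassert>
#include <string>

// Hypothetical outcome of an upgrade request for one slave.
enum class UpgradeDecision { Skip, UpgradeNow, UpgradeWhenReady };

// newVersion is empty when the message carries no NewVersion element.
UpgradeDecision decideUpgrade(const std::string &slaveVersion,
                              const std::string &newVersion,
                              bool immediate, bool slaveBusy) {
    // A slave that already reports the requested version is left alone.
    if (!newVersion.empty() && slaveVersion == newVersion)
        return UpgradeDecision::Skip;
    // Immediate: upgrade right away, even if a running task must be stopped.
    if (immediate)
        return UpgradeDecision::UpgradeNow;
    // Otherwise busy slaves are upgraded once they become ready.
    return slaveBusy ? UpgradeDecision::UpgradeWhenReady
                     : UpgradeDecision::UpgradeNow;
}
```

For example, a busy slave reporting version "1.0" would be upgraded only after it becomes ready when Immediate is absent, and at once when Immediate is present.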

  • in the direction master to client:
    
    
    <!-- Sent back to the client as a reply to M_CLIENT_STATUS and
         M_JOB_CTRL. Will also be used by the monitor. Error contains an
         explanation message in case of Refused JobStatus, otherwise it is
         not present. -->
    <Message Type="M_JOB_STATUS">
      <JobID> ... </JobID>
      <JobStatus> ... </JobStatus>
      [<Error>string</Error>]
    </Message>

    <!-- Sent back to the client when M_SLAVE_RESERVE is active.
         SlavesRequired contains the number of slaves assigned (if not
         parallel, this is always 1). -->
    <Message Type="M_SLAVE_AVAIL">
      <JobID> ... </JobID>
      <ReserveID> ... </ReserveID>
      <SlavesRequired>integer</SlavesRequired>
      <SlaveInfo> ... </SlaveInfo>
      <!-- SlaveInfo appears once for each allocated computer -->
    </Message>

    <!-- Sent back to the client as a response to M_TASK_INIT or M_TASK_CTRL,
         or when the status of the task changes; the message is also sent
         from the slave to the master (see below).
         * response to M_TASK_INIT:
           - Buffered  - no slave is available for the task, the master will
                         try later
           - Refused   - the message was inconsistent with the current system
                         state
           - Started   - the task started successfully on one of the slaves
         * response to M_TASK_CTRL:
           - Refused   - the message was inconsistent with the current system
                         state
           - Ok        - the control command was successful, Argument
                         contains the result
         * task status change:
           - Crashed   - the slave user application crashed (the master
                         abandons the task)
                       - the slave doesn't send status messages anymore
                       - the task is running for too long (timeout kill)
           - Stopped^  - the slave running the task was stopped (e.g. login)
           - MoveStart - the task is being moved to another slave, it is
                         buffered
           - MoveEnd   - the task was moved to another slave, the client gets
                         the SlaveInfo of the new slave
           If the task is moved or finishes while being stopped,
           M_TASK_STATUS(Stopped) is not sent.
         * Error is sent in case of Refused and Crashed status.
         * SlaveInfo is sent in case of Started and MoveEnd.
         ^ this state is sent only from the master to the client -->
    <Message Type="M_TASK_STATUS">
      <TaskID> ... </TaskID>
      <Status>Buffered | Refused | Started | Crashed | Stopped | Ok | MoveStart | MoveEnd</Status>
      [<Argument>string</Argument>]
      [<SlaveInfo> ... </SlaveInfo>]
      [<Error>string</Error>]
    </Message>

    <!-- Sent when the task is finished, or when the slave becomes disabled
         and generates the M_TASK_FINISH message instead of M_TASK_MOVE. In
         that case, the data should specify that the task was not finished. -->
    <Message Type="M_TASK_FINISH">
      <TaskID> ... </TaskID>
      <![CDATA[...anything...]]>
    </Message>
    (see below)

    <!-- Sent to a client/qadpz_admin as a reply to its M_SLAVE_CTRL message.
         - Ok      - response to M_SLAVE_CTRL(Upgrade), the upgrade process
                     was started at the master.
         - Refused - the upgrade was refused by the master (for example, if
                     the user is not admin); Error contains the description
                     of the reason. -->
    <Message Type="M_SLAVE_STATUS">
      <Status>Ok | Refused</Status>
      [<Error>string</Error>]
    </Message>

    Master <=> slave (XML)

  • in the direction master to slave:
    
    
    <!-- Starts the task. -->
    <Message Type="M_TASK_INIT">
      <TaskID> ... </TaskID>
      <URL> ... </URL>
      <TimeOut>integer</TimeOut>
      <UserData>string</UserData>
      [<![CDATA[...anything...]]>]
    </Message>
    (see above)

    <!-- Stops the task, or sends a control message to the task. In the
         second case, the optional argument may be specified. -->
    <Message Type="M_TASK_CTRL">
      <Action>Stop | Control</Action>
      [<Argument>string</Argument>]
    </Message>
    (see above)

    <!-- Controls/configures the slave:
         - Disable  = put the slave into the "Disabled" state (i.e. not
                      accepting any tasks)
         - Enable   = put the slave back into the ready state from the
                      disabled state
         - Shutdown = the slave is forced to shut down by the master
         - Upgrade  = the "slave service" on the slave is replaced (in this
                      case the URL specifies the new program); note that the
                      current task (if any) is stopped -->
    <Message Type="M_SLAVE_CTRL">
      <Action>Disable | Enable | Shutdown | Upgrade</Action>
      [<URL>string</URL>]
    </Message>

  • in the direction slave to master:

    <!-- Sent to inform the master about the status of the slave:
         - Off      - the slave computer is going to be turned off or the
                      slave service program is going to quit. If the slave
                      was busy and was going to save its partial results
                      locally, this message is sent after the slave has
                      successfully saved them.
         - Ready    - the slave service program has started, regularly
                      announces that it is still alive, or has just finished
                      a task; the slave also sends SlaveInfo.
         - Busy     - the slave accepted a task and is computing it; also
                      regularly posted to the master.
         - Disabled - the slave service is still running, but no slave user
                      program can be executed, most likely because somebody
                      logged into the slave computer. This message is sent
                      after the slave has optionally saved its partial
                      results locally.
         - Upgrade  - response to a slave service upgrade. If Error is
                      specified, the upgrade was unsuccessful. -->
    <Message Type="M_SLAVE_STATUS">
      <Status>Off | Ready | Busy | Disabled | Upgrade</Status>
      [<SlaveInfo> ... </SlaveInfo>]
      [<Error>string</Error>]
    </Message>

    <!-- Sent from the slave to the master as a response to M_TASK_INIT or
         M_TASK_CTRL, or when the status of the task changes; the message is
         also sent from the master to the client (see above). -->
    <Message Type="M_TASK_STATUS">
      <Status>Refused | Started | Crashed | Ok | Moved</Status>
      ...
    </Message>

    <!-- The task is finished and the data contain the result; also sent
         when the slave is forced to go down and decides to submit the
         partial results to the master through this message. -->
    <Message Type="M_TASK_FINISH">
      ...
    </Message>
    (see above)

    <!-- The task wants to be moved to another slave because this slave will
         become disabled; in this case the message is sent instead of
         M_SLAVE_STATUS(Disabled). -->
    <Message Type="M_TASK_MOVE">
      <TaskID> ... </TaskID>
      <![CDATA[...anything...]]>
    </Message>


    6.6 Participating in the Q2ADPZ development

    //TODO: info for developers, CVS, etc.


    7 Appendixes


    7.1 Terminology

    This section summarizes the most important terms used in Q2ADPZ.


    7.2 Documentation for XMLData class

    The constructors and methods of the XMLData class can be used to construct, access, and modify XML elements, their attributes, and their contents (values or subelements). The class can also read XML documents from, and write them to, strings and streams. The interface is designed to be easy to use and learn.

    1. Short introduction to data that XMLData class represents
    2. General notes about XMLData
    3. Creating XMLData objects
    4. Accessing element values
    5. Accessing subelements and their values
    6. Accessing element attributes
    7. Stream and string input and output
    8. Other methods


    1. Short introduction to data that XMLData class represents.

    Objects of class XMLData represent XML elements. These are in principle of two types (in simplified XML for the purpose of QADPZ): elements that hold a value, and elements that hold a list of subelements.

    An element can have several subelements with the same tag. In addition, XML elements can have attributes, such as:

          <Message Type="M_JOB_INIT">
            <Name>Dumb</Name>
          </Message>
    
    ...here the element Message has one attribute with the name "Type" and the value "M_JOB_INIT". The quotation marks are mandatory around the value and forbidden around the attribute name. An element can have several attributes (but they must have different names), for example:
          <Employee Type="Researcher" Position="Temporary">
             <Name>John</Name>
          </Employee>
    

    In addition, some elements have no subelements or value. They can be expressed in a simplified syntax, for example:

          <Greeting Type="Hello"/>
    or
          <Parallel/>
    

    Which is equivalent to:

          <Greeting Type="Hello">
          </Greeting>
    or
          <Parallel>
          </Parallel>
    
    XMLData class will always generate the simplified syntax when there are no subelements or value.
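    The rules above (quoted attribute values, and the simplified syntax for elements without a value or subelements) can be illustrated with a small serializer. This is a minimal sketch only: Elem and toXml are hypothetical names and do not reflect how XMLData is actually implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML element: tag, value, attributes, children.
struct Elem {
    std::string tag;
    std::string value;
    std::vector<std::pair<std::string, std::string>> attribs;
    std::vector<Elem> children;
};

// Serializes an element; empty elements use the simplified <Tag/> syntax.
std::string toXml(const Elem &e) {
    std::string out = "<" + e.tag;
    for (const auto &a : e.attribs)
        out += " " + a.first + "=\"" + a.second + "\"";  // value is quoted
    if (e.value.empty() && e.children.empty())
        return out + "/>";
    out += ">" + e.value;
    for (const Elem &c : e.children)
        out += toXml(c);
    return out + "</" + e.tag + ">";
}
```

    With this sketch, an element with the tag Greeting, one attribute Type="Hello", and no contents serializes to the simplified form shown above.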

    XML documents can contain comments, which start with '<!--' and are terminated with '-->'. In this version, comments are allowed to appear only before an element, not after it, and not before or after its value content. For example:

          <JobID>
            <!--this is an OK comment-->
            <Name>Dumb</Name>
            <Number> <!--this is NOT OK comment--> 3 </Number>
            <!--this is also NOT OK comment-->
          </JobID>
    
    Finally, element values may not contain the '<' character, because it starts another subelement. To overcome this problem, a special type of element is defined, which can contain any string terminated by ']]>' (so the string itself cannot contain ']]>'). The element has the CDATA tag and starts with '<![CDATA['. For example:
          <![CDATA[This string contains also '<' character]]>
    
    XMLData class treats CDATA as any other element. It is distinguished by its tag CDATA.
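    The one restriction on CDATA contents, that they cannot contain the terminator ']]>', is easy to check before a payload is placed into a message. The helpers below are a hedged sketch; wrapCData and unwrapCData are hypothetical names and are not part of the XMLData interface.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Wraps arbitrary text in a CDATA section. Throws if the text contains the
// terminator "]]>", which a CDATA section cannot hold.
std::string wrapCData(const std::string &text) {
    if (text.find("]]>") != std::string::npos)
        throw std::invalid_argument("CDATA content must not contain \"]]>\"");
    return "<![CDATA[" + text + "]]>";
}

// Returns the text between "<![CDATA[" and the first "]]>", or an empty
// string if the input does not start with a complete CDATA section.
std::string unwrapCData(const std::string &xml) {
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";
    if (xml.compare(0, open.size(), open) != 0)
        return "";
    std::string::size_type end = xml.find(close, open.size());
    if (end == std::string::npos)
        return "";
    return xml.substr(open.size(), end - open.size());
}
```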


    2. General notes about XMLData

    The XMLData is a universal class that holds a single element and its value or a list of subelements, which are again objects of class XMLData. Thus XMLData can be part of a list of elements and contains a pointer to the next XMLData in this list. The element attributes are stored as a linked list of XMLAttrib objects, which are just pairs of strings: name and value.

    XML elements are created by calling the XMLData constructors. Constructors and methods that take pointers to other XMLData objects or to XMLAttrib objects do not copy the objects passed in their arguments (with the exception of the copy constructor). Instead, they directly store the objects that the pointers refer to. That means you should always use the new operator to create the object you pass in. For example, when inserting part of one message into another message, create a copy with the copy constructor (i.e. by calling new XMLData(...)) instead of passing the pointer to the XMLData object that is part of the other message. On the other hand, methods that return pointers to XMLData objects return pointers to objects contained in the XMLData object without creating a copy (unless specified otherwise), so the returned objects don't have to be deallocated after they are used, but they should not be modified unless you really intend to change the element that contains them.

    Methods that don't store the passed XMLData objects (such as the searching methods) receive pointers to XMLData objects that must be deallocated by the caller, so there is no need to create a copy for them. See the method descriptions in the header file for details.

    The XMLData and XMLAttrib classes don't operate on standard C strings. Instead, they use a specialized class CharStr, which can hold both static and dynamic strings. <YOU_CAN_SKIP_THIS> The reasons for this decision come from the poor memory-management situation of C++. XMLData has to hold strings that are allocated in dynamic memory when read from a stream, and at the same time it has to hold strings that live in static memory. The destructor therefore has no idea what to do with a string: deallocate it or forget it? There are only two possibilities. The first is to copy everything, so that there would be only dynamic strings. This is not a very nice option, because a lot of strings (such as "SlaveInfo", "Message", and a lot of the element contents) would appear in memory in many copies without any actual need. The second possibility is the specialized class, which remembers the type of the string and counts references to dynamic strings. </YOU_CAN_SKIP_THIS>

    Static strings are created this way:

    #define p_s1 (&s1)
    static CharStr s1("static string", STR_STATIC);
    
    Dynamic strings are created this way:
    CharStr *s2 = new CharStr("dynamic string");
    
    To obtain a real string from CharStr, there is a public str member variable:
      cout << "s1: " << p_s1->str << ", s2: " << s2->str;
    
    Thus there is only a little syntactic sugar to add compared to standard strings. Once a pointer to CharStr is passed to a method, that method takes care of it. That means, your program doesn't have to deallocate CharStr objects passed to XMLData methods. See the examples below. Those methods that don't store strings passed in their arguments take regular C strings.
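    The idea of a string holder that remembers whether its contents are static or dynamic, and counts references to dynamic strings, can be sketched as follows. This is not the CharStr implementation: SketchStr, StrKind, KIND_STATIC, KIND_DYNAMIC, addRef() and release() are hypothetical names chosen for illustration, and the real class may manage ownership differently.

```cpp
#include <cassert>
#include <cstring>

enum StrKind { KIND_STATIC, KIND_DYNAMIC };  // hypothetical names

class SketchStr {
public:
    const char *str;  // the wrapped C string

    // Static strings are only referenced; dynamic strings are copied to the heap.
    SketchStr(const char *s, StrKind kind = KIND_DYNAMIC)
        : str(s), kind_(kind), refs_(1) {
        if (kind_ == KIND_DYNAMIC) {
            char *copy = new char[std::strlen(s) + 1];
            std::strcpy(copy, s);
            str = copy;
        }
    }
    SketchStr(const SketchStr &) = delete;
    SketchStr &operator=(const SketchStr &) = delete;
    ~SketchStr() { if (kind_ == KIND_DYNAMIC) delete[] str; }

    // Each additional holder of a dynamic string registers itself.
    void addRef() { if (kind_ == KIND_DYNAMIC) ++refs_; }

    // Drops one reference. Returns true if the object destroyed itself,
    // which happens only for dynamic strings with no holders left.
    bool release() {
        if (kind_ == KIND_STATIC) return false;  // static storage is never freed
        if (--refs_ > 0) return false;
        delete this;
        return true;
    }
    int refs() const { return refs_; }

private:
    StrKind kind_;
    int refs_;
};
```

    In this sketch a static string can be shared by any number of elements without copies, while a dynamic string is freed when its last holder releases it.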

    For convenience, you can forget about CharStr when passing strings to all methods and constructors. Instead of creating a dynamic string, e.g. x->set(new CharStr("new contents for element x")); you can just call x->set("new contents..."); However, the methods that return strings, return CharStr, so you still have to append '->str' suffix.

    NULL is not used in XMLData and XMLAttrib classes. Instead XMLData::Nil and XMLAttrib::Nil indicate that there are no subelements/attributes, or that there is no next subelement/attribute. As a result, using statements like data->sub("NonExistingSubcomponent")->getString(); will not cause the program to crash. Instead, this call will return CharStr::Error (unless there is a subcomponent called NonExistingSubcomponent).
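    The reason chained calls on a missing subelement are safe is the sentinel-object pattern: lookups return a special Nil object instead of a null pointer, and Nil itself answers every call. The sketch below shows the pattern in isolation; Node and the "<error>" placeholder are hypothetical and only stand in for XMLData, XMLData::Nil and CharStr::Error.

```cpp
#include <cassert>
#include <cstring>

struct Node {
    const char *tag;
    const char *value;
    Node *firstChild;
    Node *next;
    static Node Nil;  // sentinel used instead of a null pointer

    Node(const char *t, const char *v, Node *child = &Nil, Node *nxt = &Nil)
        : tag(t), value(v), firstChild(child), next(nxt) {}

    // Returns the first child with the given tag, or &Nil if there is none.
    Node *sub(const char *wanted) {
        for (Node *n = firstChild; n != &Nil; n = n->next)
            if (std::strcmp(n->tag, wanted) == 0) return n;
        return &Nil;
    }
    const char *getString() const { return value; }
};

// The sentinel has no children and carries an error marker as its value.
Node Node::Nil("", "<error>", &Node::Nil, &Node::Nil);
```

    Because Nil has no children, calling sub() on it returns Nil again, so a chain of lookups of any length ends in the error marker rather than in a crash.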

    A complementary material to this tutorial is the xmltest.cpp program, which tests the functionality of the XMLData class.


    3. Creating XMLData objects

    XMLData class provides several constructors, which create objects representing XML elements. Let's start with examples (see also below for a different method of creating the same XML elements; also note that instead of using dynamic strings, it is better to use static strings defined somewhere else, see src/common/messages.h, src/common/messages.cpp):

    Each XML element to create is shown first, followed by the code fragment that creates it:

    <Parallel/>

    XMLData *parallel = new XMLData("Parallel", XMLData::Nil);

    <Action>Stop</Action>

    XMLData *action = new XMLData("Action", "Stop");

    <JobID>
      <Name>Dumb</Name>
      <Number>3</Number>
    </JobID>
    
    XMLData *jobID = new XMLData("JobID",
                                    new XMLData("Name", "Dumb",
                                    new XMLData("Number", 3)));
    
    <Message Type="M_JOB_INIT">
      <Name>Dumb</Name>
    </Message>
    
    XMLData *msg = new XMLData("Message",
                                  new XMLData("Name", "Dumb"),
                                  XMLData::Nil,
                                new XMLAttrib("Type", "M_JOB_INIT"));
    
    <Message Type="M_SLAVE_AVAIL">
      <JobID>
        <Name>Dumb</Name>
        <Number>3</Number>
      </JobID>
      <Number>12</Number>
      <ReserveID>4015</ReserveID>
      <SlaveInfo>
        <OS>Win32</OS>
        <CPU>PIII/500</CPU>
        <Memory Unit="MB">50</Memory>
        <Disk Unit="MB">4</Disk>
        <IP>158.195.16.40</IP>
      </SlaveInfo>
    </Message>
    
    XMLData *msg = new XMLData("Message",
                                  new XMLData("JobID",
                                    new XMLData("Name", "Dumb",
                                    new XMLData("Number", 3)),
                                  new XMLData("Number", 12,
                                  new XMLData("ReserveID", 4015,
                                  new XMLData("SlaveInfo",
                                    new XMLData("OS", "Win32",
                                    new XMLData("CPU", "PIII/500",
                                    new XMLData("Memory", 50,
                                     new XMLData("Disk", 4,
                                      new XMLData("IP", "158.195.16.40"),
                                      new XMLAttrib("Unit", "MB")),
                                     new XMLAttrib("Unit", "MB")))))))),
                                  XMLData::Nil,
                                  new XMLAttrib("Type", "M_SLAVE_AVAIL"));
    
    Note: XMLData::Nil indicates that there are no more subelements. It is used either when the attributes argument is passed, or when creating a simple element without any value or subelements.

    The following constructors with some optional arguments can be used:

      //constructs element with given string content and attributes
      //(e.g. <Name>John</Name>)
      XMLData(CharStr *tag, CharStr *strValue, XMLData *next = XMLData::Nil, XMLAttrib *attributes = XMLAttrib::Nil);
      //constructs element with given double content and attributes
      //(e.g. <Number>3</Number>)
      XMLData(CharStr *tag, double doubleValue, XMLData *next = XMLData::Nil, XMLAttrib *attributes = XMLAttrib::Nil);
      //constructs element with given attributes containing subelements
      //(e.g. <JobID>...</JobID>)
      XMLData(CharStr *tag, XMLData *subElements, XMLData *next = XMLData::Nil, XMLAttrib *attributes = XMLAttrib::Nil);
    
      //sort of copy constructor  - copies substructures and also all elements that follow
      XMLData(XMLData *other);
    

    Finally, one can use the constructor that creates XMLData object directly from a string, such as:

         XMLData *action = new XMLData("<Action>Stop</Action>");
    
    However, this is a little bit less efficient, because the string has to be parsed. See also 7. Stream and string input and output.

    It can be useful to see if the constructor really constructed what it should just by printing out the XMLData object:

    XMLData *msg = new XMLData("Message", new XMLData("Name", "Dumb"),
                                          XMLData::Nil,
                                          new XMLAttrib("Type", "M_JOB_INIT"));
    
    cout << *msg;
    
    Remember to deallocate the objects you create with delete, such as:
    XMLData *jobID = new XMLData("JobID",
                                    new XMLData("Name", "Dumb",
                                    new XMLData("Number", 3)));
    // ...use the jobID here...
    
    delete jobID;
    

    Another way to create larger XML elements is to use the add(), sub(), set(), and setAttrib() methods (these methods are described below). The last element from the above table of examples can be created with this sequence:

    	XMLData *msg = new XMLData("Message", XMLData::Nil);
    	msg->setAttrib("Type", "M_SLAVE_AVAIL");
    	msg->add(new XMLData("JobID", XMLData::Nil));
    	msg->sub()->add(new XMLData("Name", "Dumb"));
    	msg->sub()->add(new XMLData("Number", 3));
    	msg->add(new XMLData("Number", 12));
    	msg->add(new XMLData("ReserveID", 4015));
    	XMLData *slaveInfo = new XMLData("SlaveInfo", XMLData::Nil);
    	slaveInfo->add(new XMLData("OS", "Win32"));
    	slaveInfo->add(new XMLData("CPU", "PIII/500"));
    	slaveInfo->add(new XMLData("Memory", 50, XMLData::Nil, new XMLAttrib("Unit", "MB")));
    	slaveInfo->add(new XMLData("Disk", 4, XMLData::Nil, new XMLAttrib("Unit", "MB")));
    	slaveInfo->add(new XMLData("IP", "158.195.16.40"));
    	msg->add(slaveInfo);
    


    4. Accessing element values

    The following methods return the value (contents) of the XMLData object:

      CharStr *getString();
      long getLong();
      double getDouble();
    
    The following example shows the use:
      XMLData *action = new XMLData("Action", "Stop");    // constructs <Action>Stop</Action>
      cout << action->getString()->str << '\n';  // prints the string "Stop" without q.marks
    
      XMLData *num = new XMLData("Number", 3);
      cout << num->getLong() << '\n';            // prints 3
    

    It is possible to modify the value of the element with the following methods:

      void set(CharStr *newContents);
      void set(double newContents);
    
    For example:
      // construct <Action/>
      action = new XMLData("Action", XMLData::Nil);
    
      // modify it to <Action>Stop</Action>
      action->set("Stop");
    
      // modify it to <Action>4</Action>
      action->set(4);
    


    5. Accessing subelements and their values

    Subelements can be accessed either directly or with help of subelement iterator. Subelement iterator is a pointer associated with every XMLData object. It always points to one of the subelements (if there are any). It is initialized to point to the first subelement.

    First of all, similarly to the set() methods above, there is a method that sets the subelement contents of the XMLData object:

      //sets the subelementlist of this element to a given list
      void set(XMLData *newContents);
    
    The newContents argument can either specify a single XMLData object, or a linked list of XMLData objects.

    To retrieve the subelements contents of some element, you can call method sub() with no arguments, which returns the subelement where the iterator is pointing. Since the iterator is pointing to the first element before you use it, sub() returns the pointer to the list of all subelements. If you are not sure whether the iterator is at the beginning, simply call reset() first (see below):

      //returns the subelement pointed by the iterator. If there are no elements, returns
      //XMLData::Nil
      XMLData *sub();
    

    You might need to obtain the number of subelements contained in an XML element:

      //returns the number of subelements
      int subCount();
    

    To retrieve a single subelement structure, the following method is recommended:

      //retrieves the first (or (skip+1)-th) subelement with a given tag
      //if the tag is the same as in the last call to sub() (please note that the pointer
      //has to be the same, not just strings equal), the method will start searching from
      //the element that follows the current iterator location,
      //otherwise it will start from the first subelement.
      //if the iterator is recursive, this will search recursively
      //the iterator will point to the located element.
      //If no element is found, XMLData::Nil is returned and iterator position is not changed.
      XMLData *sub(char *tag, int skip = 0);
    
    For example, if msg points to an XMLData object containing the following structure:
    <JobID>
      <Name>Dumb</Name>
      <Number>3</Number>
    </JobID>
    
    The Number subelement can be retrieved and its contents printed with the following call:
     cout << msg->sub("Number")->getLong();
    
    Similarly, it is possible to modify the value of some subelement:
     msg->sub("Number")->set(5);  // will change the value of Number subelement to 5
    
    If there are several subelements with the same tag, the skip argument can be used. For example for the following XML element:
    <TaskInfo>
      <Library>
        <OS>Win32</OS>
        <URL>http://www.microsoft.com/qadpz.exe</URL>
      </Library>
      <Library>
        <OS>Linux</OS>
        <URL>http://www.linux.org/qadpz.so</URL>
      </Library>
    </TaskInfo>
    
    We can obtain the second Library subelement like this:
     cout << msg->sub("Library", 1)->sub("OS")->getString()->str; // will print "Linux"
    
    We can also obtain the second URL subelement directly if we search recursively. For that, the iterator has to be reset as recursive:
      msg->reset(0, IT_RECURSIVE);
      cout << msg->sub("URL", 1)->getString()->str; // skip = 1 selects the second URL
    
    To add a new subelement, the following method can be used:
      //adds given subelement list at the end of the subelement list of this element
      void add(XMLData *newContents);
    
    For example, it is possible to add new library to the previous example structure like this:
     msg->add(new XMLData("Library",
                            new XMLData("OS", "Solaris",
                            new XMLData("URL", "http://www.sun.com/qadpz.so"))));
    
    To remove some subelement, use the following method:
      //removes single subelement specified by its tag, the search starts at
      //the first subelement of this element. If the iterator is recursive,
      //the search is recursive
      //iterator is reinitialized to point at the first element before returning
      void remove(char *tag);
    
    For example, to remove the Win32 library from the above example (if we know that it is the first Library subelement), we can use the following:
      msg->remove("Library");
    
    It is possible to search for subelements by specifying more details. For example, this will locate the Linux library in the above example:
      XMLData *searchLinux = new XMLData("Library", new XMLData("OS", "Linux"));
      XMLData *linuxLib = msg->sub(searchLinux);
      delete searchLinux;
    
      cout << linuxLib->sub("OS")->getString()->str;  // will print "Linux"
      cout << linuxLib->sub("URL")->getString()->str; // will print "http://www.linux.org/qadpz.so"
    
    Similarly, it is possible to delete a subelement structure by specifying more subdetails:
      XMLData *searchWin32 = new XMLData("Library", new XMLData("OS", "Win32"));
      msg->remove(searchWin32);		// will delete the whole Win32 library subelement
      delete searchWin32;
    
    The following variants of methods were used:
      //advanced version of sub()
      //retrieves the first (or (skip+1)-th) subelement which matches a subelement described by
      //the match argument. Match argument may describe one subelement with possible attributes and
      //subelements embedded inside of this element. The located subelement will contain all the
      //specified attributes with specified values and will contain all specified subelements,
      //sub-subelements, etc. but may contain other subelements, which are not specified in the
      //match structure. However, the order in which the subelements appear in match must be the same.
      //If you want an exact match, set the exact argument to 1.
      //If the contents contain numbers, they have to be in the same format to match.
      //if the iterator is recursive, this will search recursively
      //the iterator will point to the located element.
      //If no element is found, XMLData::Nil is returned and iterator position is not changed.
      //the match argument will not be deallocated, so make sure you deallocate it yourself
      //(i.e. in this case don't use new XMLData(...) just as an argument)
      //this starts to search from the first subelement, unless the match is the same pointer
      //as when the sub() with match was called last time. In the latter case, the search starts
      //with the element that follows after the iterator.
      XMLData *sub(XMLData *match, int skip = 0, int exact = 0);
    
      //advanced version of remove, see description of advanced sub()
      void remove(XMLData *match, int exact = 0);
    
    It is also possible to retrieve or remove all subelements of certain type:
      //retrieves all elements with a given tag, returns a newly constructed element list - completely copied
      XMLData *subAll(char *tag);
    
      //removes all subelements with this tag
      void removeAll(char *tag);
    
    For some operations, it might be necessary to iterate through subelements with subelement iterator. The following methods provide this functionality:
      //sets the iterator to point to the first subelement or subelement with a given index
      //if recursive is set to 1, the iterator will recursively enter all subelements of the
      //elements in the subsequent calls to other methods
      //However, the index is relative to the list of subelements on the top level in this call.
      //Note: resetting the iterator will reset the iterator of all subcomponents.
      //reset returns this so that it can be called in a sequence of commands, such as:
      //x->reset()->sub("SlaveInfo");
      XMLData *reset(int index=0, int recursive = 0);
    
      //moves the subelement iterator to the next subelement and returns a pointer to it. If iterator
      //is recursive, the recursive step is performed. Returns XMLData::Nil if there are no more elements.
      XMLData *next();
    
      //returns non-zero, if the current subelement is not the last one (i.e. the next call to next()
      //would not return XMLData::Nil). Doesn't move the iterator.
      int more();
    
    Then it is possible to insert a new subelement at a certain location with the help of the following method:
      //inserts subelement(s) after the subelement pointed by subelement iterator
      //if the iterator is recursive, the subelement(s) is(are) inserted at the current level
      //of the iterator
      //if there are no subelements, the new subelement(s) is(are) simply added.
      //the argument will become part of this object and will be deallocated when the object is
      //destroyed, use new XMLData(...) as an argument.
      //if you specify the second argument ontop as nonzero (e.g. INSERT_TOP), the new element
      // will be inserted as the first subelement and iterator will not be used.
      void insert(XMLData *subElement);
    
    For example, if we want to insert the <Parallel/> and GroupKillTimeout subelements into the following structure, we can do it this way:
      <Message Type="M_SLAVE_RESERVE">
        <Number>15</Number>
        <JobID> ... </JobID>
        <SlaveInfo> ... </SlaveInfo>
      </Message>
    
      // assume msg already contains pointer to XMLData object with above element
    
      msg->sub("Number");
      msg->insert(new XMLData("Parallel", XMLData::Nil,
                  new XMLData("GroupKillTimeout", 120)));
    
      // the resulting structure will be as follows:
    
      <Message Type="M_SLAVE_RESERVE">
        <Number>15</Number>
        <Parallel/>
        <GroupKillTimeout>120</GroupKillTimeout>
        <JobID> ... </JobID>
        <SlaveInfo> ... </SlaveInfo>
      </Message>
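
    The insertion rule used in this example can be modelled in a few lines. The following is only a sketch, not the XMLData code: the Element struct is hypothetical, it reduces every subelement to its tag name, and it represents the subelement iterator as a plain index. What the real iterator points to after an insertion is not specified here, so the sketch leaves it untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: an element is reduced to the tags of its subelements
// plus the position of the subelement iterator.
struct Element {
    std::vector<std::string> subs;
    std::size_t cursor = 0;        // index of the subelement the iterator points to

    // Positions the iterator on the first subelement with the given tag.
    // Returns false and leaves the iterator unchanged if there is none.
    bool sub(const std::string &tag) {
        for (std::size_t i = 0; i < subs.size(); ++i) {
            if (subs[i] == tag) { cursor = i; return true; }
        }
        return false;
    }

    // Inserts the new subelements after the one under the iterator.
    // With onTop set, or when there are no subelements yet, they go first.
    void insert(const std::vector<std::string> &fresh, bool onTop = false) {
        std::size_t pos = 0;
        if (!onTop && !subs.empty()) {
            assert(cursor < subs.size());
            pos = cursor + 1;
        }
        subs.insert(subs.begin() + static_cast<std::ptrdiff_t>(pos),
                    fresh.begin(), fresh.end());
    }
};
```

    With subs set to Number, JobID, SlaveInfo, calling sub("Number") followed by insert({"Parallel", "GroupKillTimeout"}) yields the same order of subelements as in the resulting structure shown above.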
    
    Not all possible uses of the methods were covered here; for more details, consult the xmldata.h file.


    6. Accessing element attributes

    The element attributes can be accessed either directly or using an attribute iterator. The number of attributes is returned by attribCount():

      //returns the number of attributes of this element
      int attribCount();
    

    To retrieve a single attribute of an element, use the following method:

      //retrieves the value of a given attribute.
      //If no such attribute exists, XMLAttrib::Error is returned
      CharStr *getAttrib(char *attrName);
    
    For example, this call will retrieve the type of a message:
    /*  assume msg points to the following message:
    
      <Message Type="M_SLAVE_RESERVE">
        <Number>15</Number>
        <JobID> ... </JobID>
        <SlaveInfo> ... </SlaveInfo>
      </Message>
    */
    
      cout << msg->getAttrib("Type")->str;   // will print "M_SLAVE_RESERVE"
    
    To set the value of an attribute, one of the following methods should be used:
      //sets value of single attribute, if such attribute doesn't exist, it is created,
      //and the optional index can be either -1 (default),
      //then the attribute is appended, if index is 0, it is inserted as the first one,
      //otherwise it is inserted after the attribute on index-th position.
      //If index is larger than the number of attributes, the attribute is just appended
      //If attribute with attrName already exists, index is ignored, attribute is set
      // and the attrName CharStr object is deallocated!
      void setAttrib(CharStr *attrName, CharStr *attrValue, int index = -1);
    
      //for convenience, this takes numeric value and converts it to string
      void setAttrib(CharStr *attrName, double attrValue, int index = -1);
    
    For example, to set the type of the message, the following can be used:
      msg->setAttrib("Type", "M_JOB_INIT");
    
    It is possible to remove attribute(s) from the element with the following methods:
      //removes single attribute specified by its name
      void removeAttrib(char *attrName);
    
      //removes all attributes
      void removeAttribs();
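
    The meaning of the index argument of setAttrib() is easy to misread, so the following stand-alone sketch restates it on a plain vector of name/value pairs. It is a model under stated assumptions, not the XMLData implementation: the Attrib alias and this free setAttrib() function are hypothetical, and negative indices other than -1 are treated like -1 because the comments do not define them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one attribute: (name, value).
using Attrib = std::pair<std::string, std::string>;

// Sets an attribute following the rules quoted in the comments:
//  - an existing name is overwritten in place and index is ignored,
//  - index -1 appends, index 0 inserts as the first attribute,
//  - index k inserts after the k-th attribute,
//  - an index larger than the attribute count appends.
void setAttrib(std::vector<Attrib> &attribs, const std::string &name,
               const std::string &value, int index = -1) {
    for (Attrib &a : attribs) {
        if (a.first == name) { a.second = value; return; }
    }
    std::size_t pos = attribs.size();                  // default: append
    if (index >= 0 && static_cast<std::size_t>(index) < attribs.size())
        pos = static_cast<std::size_t>(index);
    assert(pos <= attribs.size());
    attribs.insert(attribs.begin() + static_cast<std::ptrdiff_t>(pos),
                   Attrib(name, value));
}
```

    In this model, inserting "after the attribute on index-th position" and inserting at vector position index are the same operation, which is why index 0 needs no special case.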
    


    7. Stream and string input and output

    It is possible to construct an XMLData object by reading an element or a list of elements from an input stream, either by calling the constructor:

      //reads ONE element ... from the input stream. If the element has subelements,
      //all the subelement objects are constructed and dynamically allocated.
      XMLData(istream &s);
    
    or using an overloaded >> operator.

    Similarly, it is possible to print the XMLData object to output stream either with an overloaded << operator or using the method:

      //outputs the text representation of this component and following
      //elements to the output stream. You can specify indent, if you don't
      //want to print from the first column.
      //Set following to 0 in order to disable printing of the following
      //elements. By default, the whole list is printed.
      void print(ostream &s, int indent = 0, int following = 1);
    
    Analogously, the input can be taken directly from a string:
      //constructs the objects from string the same way as if read from a stream
      XMLData(char *input);
    
    or printed to a string:
      //outputs the text representation of this component and following elements as string
      void print(char *buffer, int maxlen, int indent = 0, int following = 1);
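
    To show how the indent and maxlen arguments could interact, here is a small stand-alone sketch of printing an element tree to a stream and to a bounded character buffer. It is not the XMLData code: Node, printToStream() and printToBuffer() are hypothetical names, attributes are left out, nested levels are indented by two extra columns, and truncating on overflow is an assumption, since the behaviour of the real print() with a too small buffer is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML element (attributes omitted).
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// Writes the element and its subelements, starting at column `indent`.
// Contents are printed only for elements without subelements.
void printToStream(std::ostream &out, const Node &n, int indent = 0) {
    const std::string pad(static_cast<std::size_t>(indent), ' ');
    if (n.children.empty() && n.text.empty()) {
        out << pad << '<' << n.tag << "/>\n";
    } else if (n.children.empty()) {
        out << pad << '<' << n.tag << '>' << n.text << "</" << n.tag << ">\n";
    } else {
        out << pad << '<' << n.tag << ">\n";
        for (const Node &child : n.children) printToStream(out, child, indent + 2);
        out << pad << "</" << n.tag << ">\n";
    }
}

// Writes at most maxlen - 1 characters plus the terminating zero.
// Returns true if the whole text fitted into the buffer.
bool printToBuffer(char *buffer, std::size_t maxlen, const Node &n, int indent = 0) {
    assert(buffer != nullptr && maxlen > 0);
    std::ostringstream out;
    printToStream(out, n, indent);
    const std::string text = out.str();
    const std::size_t len = std::min(text.size(), maxlen - 1);
    std::memcpy(buffer, text.data(), len);
    buffer[len] = '\0';
    return len == text.size();
}
```

    Returning a flag from the buffer version is a choice made for the sketch, so that a caller can detect truncation and retry with a larger buffer.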
    


    8. Other methods

    It is also possible to access or modify the tag of the element:

      //returns element type
      //(returns CharStr::Error if called on XMLData::Nil)
      CharStr *tag();
    
      //change/set element type
      void setTag(CharStr *tag);
    


    7.3 Documentation for PostOffice class

    The PostOffice class provides a mechanism for bidirectional communication between processes running on remote (or possibly also local) machines that are interconnected by a network running the TCP/IP protocol (such as machines connected to the Internet). The communication consists of sending and receiving XMLData objects. A single PostOffice object can be used by several threads of the process owning it at the same time.

    To use the PostOffice, the object has to be constructed. It is possible to specify the socket port at which the messages will be received:

        // construct PostOffice, incoming packets will be expected
        // at specified port, 0 means use default port
        // n specifies how many ports should be used for sending/receiving
        // the ports are used in a random manner. The application can
        // get the port numbers by calling getlocalPort()
        PostOffice(int port=0, int n = 1, int incremental = UDP_PORT_ITERATE);
    
    On the sending side, a thread can send an XMLData object to a remote process by calling the following method:
        //sends XMLData to a remote, returns 1 only after the remote
        //successfully received and confirmed the data, returns 0, if
        //there was an error while sending.
        //(there is a small chance in case of network congestion
        // that the peer received the data but the confirmation
        // did not come back)
        //if remote is not specified, the default remote is used
        int send(XMLData *data, Address *remote = NULL);
    
    Here data is the object to send and remote is the address (IP address and port) of the remote post office. This method returns only after a receipt confirmation from the receiver has been returned. If the confirmation is not received after several retry attempts with timeouts, the method returns 0.
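
    The return values of send() follow from a send, wait for confirmation, retry scheme. The sketch below models that scheme without any networking, so it is not the PostOffice implementation: Attempt, Receiver and sendWithRetry() are hypothetical names, the outcome of each attempt is supplied by the caller instead of a real network and timeout, the retry count of 3 is arbitrary, and the receiver is assumed to drop repeated copies of the same message.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Possible outcomes of one transmission attempt in this model.
enum class Attempt { DataLost, AckLost, Confirmed };

// Hypothetical receiving side: just collects what it was given.
struct Receiver {
    std::vector<std::string> inbox;
};

// Tries to deliver data up to maxRetries times. `network` decides what
// happens to attempt number 0, 1, 2, ... (a real sender would instead wait
// for the confirmation with a timeout).
// Returns 1 once a confirmation arrives, 0 if all attempts are used up.
int sendWithRetry(const std::string &data, Receiver &peer,
                  const std::function<Attempt(int)> &network,
                  int maxRetries = 3) {
    assert(maxRetries > 0);
    bool delivered = false;
    for (int attemptNo = 0; attemptNo < maxRetries; ++attemptNo) {
        const Attempt result = network(attemptNo);
        if (result == Attempt::DataLost) continue;     // timed out, try again
        if (!delivered) {                              // assumption: duplicates are dropped
            peer.inbox.push_back(data);
            delivered = true;
        }
        if (result == Attempt::Confirmed) return 1;    // confirmation came back
    }
    return 0;                                          // no confirmation received
}
```

    If every confirmation is lost, the function returns 0 although the receiver holds the message. This is the case the comment above warns about, so a caller should not treat 0 as proof that the peer never saw the data.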

    On the receiving side, a thread can receive a message in two ways: blocking and non-blocking. The blocking version of receive blocks the calling thread until a new message is received. The non-blocking version returns immediately, regardless of whether a message was received. All received messages are queued in the PostOffice until some thread collects them with one of the receive calls. In addition, the calling thread can either request a message from a particular Address or receive any message.

        //receives the next XMLData received from a remote. Returns
        //newly constructed object. If there is no data in the queue,
        //waits until some data arrives.
        //if remote is not specified, the default remote is used
        XMLData *receive(Address *remote = NULL);
    
        //non-blocking version of receive - returns XMLData::Nil if there
        //is no XMLData from remote, in the queue. Otherwise returns a new
        //XMLData with the data from a remote which are on top of the queue.
        //if remote is not specified, the default remote is used
        XMLData *receiveN(Address *remote = NULL);
    
    
        //receives the next XMLData object from any remote and stores
        //the source address to remote. If there is no data in the queue,
        //waits until some data arrives.
        XMLData *receive_any(Address &remote);
    
        //non-blocking version of receive_any - returns XMLData::Nil, if
        //there is no XMLData in the queue. Otherwise returns the first
        //XMLData object from the top of the queue. Please note that
        //the order of messages doesn't have to be the same
        XMLData *receive_anyN(Address &remote);
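
    The difference between the blocking and non-blocking calls, and between receiving from one address and from any address, can be sketched with a queue shared by several threads. This is an illustration only and not how PostOffice is implemented: Mailbox and its methods are hypothetical, addresses and messages are plain strings instead of Address and XMLData objects, and the sketch always hands out the oldest matching message, whereas the real PostOffice does not promise a particular order.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins: the real classes are Address and XMLData.
using Addr = std::string;
using Msg = std::string;
using Entry = std::pair<Addr, Msg>;

class Mailbox {
public:
    // Called by the network side whenever a message arrives.
    void deliver(const Addr &from, const Msg &msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.emplace_back(from, msg);
        }
        arrived_.notify_all();     // wake all waiting receivers to re-check
    }

    // Blocking: waits until a message from `from` is queued.
    Msg receive(const Addr &from) {
        std::unique_lock<std::mutex> lock(mutex_);
        std::optional<Entry> hit;
        arrived_.wait(lock, [&] { hit = take(&from); return hit.has_value(); });
        return hit->second;
    }

    // Non-blocking: empty result if nothing from `from` is queued.
    std::optional<Msg> receiveN(const Addr &from) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<Entry> hit = take(&from);
        if (!hit) return std::nullopt;
        return hit->second;
    }

    // Non-blocking, any sender: the result also tells who sent the message.
    std::optional<Entry> receiveAnyN() {
        std::lock_guard<std::mutex> lock(mutex_);
        return take(nullptr);
    }

private:
    // Removes and returns the oldest entry that passes the address filter
    // (nullptr means any sender). The caller must hold mutex_.
    std::optional<Entry> take(const Addr *from) {
        for (auto it = queue_.begin(); it != queue_.end(); ++it) {
            if (from == nullptr || it->first == *from) {
                Entry entry = std::move(*it);
                queue_.erase(it);
                return entry;
            }
        }
        return std::nullopt;
    }

    std::mutex mutex_;
    std::condition_variable arrived_;
    std::deque<Entry> queue_;
};

// Single-threaded walk-through; the addresses are made up.
void mailboxExample() {
    Mailbox box;
    box.deliver("10.0.0.1:9000", "<Ping/>");
    box.deliver("10.0.0.2:9000", "<Pong/>");
    assert(box.receive("10.0.0.2:9000") == "<Pong/>");  // already queued, returns at once
    assert(!box.receiveN("10.0.0.2:9000"));             // nothing left from that sender
    assert(box.receiveAnyN()->first == "10.0.0.1:9000");
}
```

    In a real program deliver() would be called from a network thread while other threads sit in receive(). The walk-through queues the messages first, so that it never blocks.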
    
    Finally, if the PostOffice is used to communicate with only a single remote process (at least most of the time), it is possible to set a default Address by calling the method:
        //sets the frequently used (default) remote address.
        int setRemote (Address &remote);
    
    Then the remote argument to send(), receive() and receiveN() may be omitted and the default remote address will be used.

    The following is an example application that uses PostOffice to send and receive XMLData objects:

    pofficetest.cpp


    7.4 Documentation for Crypter class

    //TODO: document crypter


    7.5 Credits and licensing issues

    Q2ADPZ has been developed by Atle Diego Pavel & Zoran, (c) 2001.

    //TODO: GPL