Q²ADPZ User and Developer Manual

About Q²ADPZ
1. How to read this document
How does it work?
Requirements and supported platforms
Security issues
Installation and maintenance guide
User modes of operation
Appendixes

1 About Q²ADPZ

The recent growth of computational power of desktop computers calls for their efficient use in larger organizations, especially those which need to run computationally intensive tasks, such as universities and research centers.

Q²ADPZ ['kwod 'pi: 'si:] is a modular C++ implementation of a free, open source, multi-user, multi-platform system for distributing computing requests in a TCP/IP network. Users of the system can submit, monitor, and control computing tasks (grouped into jobs) that are executed by the computers participating in the Q²ADPZ system. Tasks take the form of dynamic shared libraries, executables, or interpreted programs (including Java applications). Users can specify software, hardware, and platform requirements for each task, and a suitable computer is selected automatically. The system automatically delivers the input and output data files. Computers executing tasks detect users logging in, and the tasks are then terminated or moved to other computers to minimize the disturbance of regular computer users. Q²ADPZ can operate both in an open Internet environment and in a closed local TCP/IP network. The internal communication protocol is based on optionally encrypted XML messages. The system provides basic statistics for usage accounting. Several user modes are supported, from novice users submitting simple binary executable programs to advanced users who can alter the internal communication interfaces for their special needs. We are currently using the system for research tasks in the areas of large scale scientific visualization, evolutionary computation, and simulation of complex neural network models.

Q²ADPZ is being developed by a team in the Division of Intelligent Systems, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway. The best way to contact us is to write an e-mail.

1.1 How to read this document

This document is meant to be a complete manual for all users and developers of Q²ADPZ. In other words, everything you need to know should be found here; if it is not, it has not been written yet. Of course, you don't have to read it all. Here is a guide to the sections that matter for you.

Pretty much everybody should look at the chapter How does it work?

If you are a regular user and you only want to submit your tasks to an already installed Q²ADPZ system, you only need to read Starting and configuring jobs using qadpz_run menu system.

If you find qadpz_run too tiresome, you don't mind editing XML-formatted files manually, or you want to generate project files from your application, consult the section on Preparing and editing the project file manually. You may also want to read Configuring clients to learn about the options in the client config file. You will also need to contact your local Q²ADPZ administrator who will create a new Q²ADPZ user account for you.

If you want to use Q²ADPZ for more advanced projects, control the starting of tasks in your jobs from your application, or use task libraries instead of executable or interpreted tasks, you may want to read Writing your own client application and Writing your own slave library.

If you are interested in setting up a new Q²ADPZ system at your location, you may want to become a Q²ADPZ administrator; in that case you need to read the full Installation and maintenance guide and at least the first section of the chapter explaining User modes of operation. You probably also want to look at Requirements and supported platforms and Security issues.

If you would like to join the huge crowds of Q²ADPZ developers, we strongly advise that you read Understanding the internal communication protocol and Participating in the Q²ADPZ development before you contact us.

If you would like to reuse some of the components used in the system, one of the appendixes might serve your needs.

We hope you will have fun reading this manual, and if there is something missing that you need to know, drop us a line: Zoran, Pavel, Atle and Diego.

And one more wish from us: don't print this document. It works much better on the screen and you can save your paper for something else.

2 How does it work?

A Q²ADPZ system consists of one master, many slaves, and multiple clients that delegate jobs to the master (you may wish to see the terminology).

All slaves that participate in the system run a slave service program (a small resident application that accepts the tasks to be computed). The master also runs permanently and keeps track of what the slave computers are doing: whether they are idle, busy computing some Q²ADPZ task, or disabled because some user is logged in.

When a user wants to use the system, (s)he prepares a user application consisting of two parts: a slave user program, the code that will do the desired computation after being distributed to the slaves, and the client, which will generate the jobs to be computed.

Each job consists of a set of tasks, and each task is generated by a client. (Don't be scared away: if you have a working executable or interpreted application, you can use Q²ADPZ right now, without writing anything; but you probably want to know what is happening with it.)

The main role of the master (the central component of the whole system) is to maintain the current availability status of the slaves, and to start and control the tasks. A client doesn't communicate with the slaves directly; instead, it sends all its requests to the master.

As indicated above, there are several user modes. To keep things simple, one can use the Q²ADPZ standard client (qadpz_run), which allows the user to set up and submit a job. The job description is saved into an XML-formatted project file, which more advanced users can also edit manually. Alternatively, a user might want to write his or her own client application to have full control over the submission of tasks (for example, if one needs to wait for the results of the first few tasks and then, based on these results, send either one or another group of tasks).

It is also possible to write a slave library directly to speed up the execution. In that case, the slave service (daemon) will not start a new process with the downloaded executable; instead, it will load a dynamic shared library into its own process.

Please read the sections on User modes of operation and see the example source codes.

3 Requirements and supported platforms

Q²ADPZ works on both UNIX (including MacOS X) and Windows platforms. It also works very well in multiplatform environments: every node of your installation can run any of the supported platforms. The master and client nodes, and the master and slave nodes, need to be connected by a TCP/IP network (only the UDP protocol is used). Although we plan to provide precompiled binaries for the supported platforms, it is recommended that you compile the package yourself (especially on UNIX platforms). In addition to a C++ compiler, the following libraries and tools are required:

libcurl 7.8 or later (http://curl.haxx.se/), if you will use the data file server, which is required for the basic and intermediate user modes
libcrypto (from OpenSSL 0.9.6 or later), if you need to encrypt the communication stream that passes through the master from clients to slaves (otherwise the system has no protection against possible misuse of slave computers)
OpenSSL utilities for generating master keys, if you will use security
a www server that supports Perl CGI scripts, together with Perl, installed on any computer reachable from your network, if you will use the data file server, which is required for the basic and intermediate user modes
optionally, a www server on the master computer to view the current status of jobs, tasks, and slaves, which the master produces as an HTML file

The list of supported platform configurations is shown in the following table:

Operating System	CPU	Compiler	Comments
Linux	i*86	GNU C/C++ (ver. 2.96, 2.95, 2.91)	ok
Linux	sparc	GNU C/C++ (ver. 2.95)	ok
Linux	sparc64	GNU C/C++ (ver. 2.95)	ok
FreeBSD	i*86	GNU C/C++ (ver. 2.95, 3.0)	ok
SunOS	sun4m	GNU C/C++ (ver. 2.95)	ok
SunOS	sun4u	GNU C/C++ (ver. 2.95, 3.0)	ok
Win32 (Win2k, WinME, Win98, WinXP)	i*86	MSVC++ (ver. 6.0)	ok
IRIX64	IP27	MIPSpro C++ (ver. 7.3.1.2m)	ok

We have also experimented with the following configurations:

Operating system	CPU	Compiler	Notes
Darwin/MacOS X	Power Macintosh	Apple C/C++ (ver. gcc-932.1) (aka GNU C/C++ ver. 2.95)	slave CPU usage high!!
Linux	i*86	GNU C/C++ (ver. 3.0)	failed to compile qadpz_run_edit due to changed ios interface!?
Win32 (WinNT)	i*86	MSVC++ (ver. 6)	failed to compile slave due to nonexistent CreateToolhelp32Snapshot() calls
SunOS	ipc86	GNU C/C++ (ver. 3.0)	linking errors for slave shared libraries!?

If you have tried to use Q²ADPZ on some other platform, please let us know.

4 Security issues

Because TCP/IP by itself does not authenticate the sender of a packet, there is no guarantee that the executable tasks arriving at the slave computers were really sent by the master. This is a serious security threat, since it would allow a malicious hacker to submit any piece of code to the slave nodes (IP spoofing). For that reason, and at the cost of decreased performance, all communication from clients to the master and from the master to slaves is encrypted or signed. In particular, the data flow from a client to the master has to be authorized by a Q²ADPZ user name and password and is encrypted with the master public key. The data flow from the master to slaves is signed with the master private key, and its authenticity is verified on the slave nodes using the master public key.

It is important to note that the data flow from slaves to the master and from the master to clients is neither encrypted nor signed. This means that a malicious hacker can monitor (packet sniffing) or alter (IP spoofing) the data or control information arriving back at the master or client nodes and thus:

put the slave nodes out of operation
put the master node out of operation
modify the result data submitted by slaves
do any other kind of harm to the computational process

In other words, the current Q²ADPZ security scheme is designed to protect the security of the computers in the network, i.e. a malicious hacker cannot submit an alien piece of code to be executed instead of a user computational task. However, this scheme doesn't protect the Q²ADPZ user data. We are considering adding optional data integrity protection in future versions of Q²ADPZ.

To summarize:

Q²ADPZ can be used in two ways: with encryption/signing protection and without. The planned precompiled libraries will use encryption. To compile Q²ADPZ without encryption/verification, set the variable HAVE_OPENSSL = 0 in the file Makefile.base. The following applies only to the encryption/signing mode:
Each user of Q²ADPZ needs a user name and password in order to submit jobs/tasks. The user names are stored at the master and are created/modified with the qadpz_admin utility.
Public and private keys for each Q²ADPZ installation must be generated during installation, and the public key has to be installed on all client and slave nodes. The keys are generated using the command make keys in the bin directory.

5 Installation and maintenance guide

Q²ADPZ runs on multiple computers in a TCP/IP network and therefore the installation requires careful attention. The authors tried to simplify the installation procedure wherever possible. The following sections guide you through the necessary installation steps.

5.1 Obtaining latest working version

The official web site for downloading is http://sourceforge.net/projects/qadpz/. There you can find the latest version of the source code. We also intend to provide binary distributions for the main platforms in the near future.

The official web site for the project description is http://qadpz.idi.ntnu.no/. There you can find the latest information about the project. There is also a web interface to the latest development CVS repository. We do not yet give public access to this CVS tree.

5.2 Compiling the source code

After uncompressing the package, the following directory structure appears:

src/ the source code for all the components, contains Makefile.base and the main Makefile.

bin/ the compiled binaries are placed here; this directory contains also sample project files (*.xml), public and private keys (pubkey, privkey) and crypted list of users users.txt.

doc/ all Q²ADPZ documentation

sample/ sources of sample applications (dumb - library-type application; simple - executable/interpreted type of application)

include/ include files from curl and openssl which are required for compilation; please set the CFLAGS, CFLAGS_CURL, CFLAGS_OPENSSL variables in Makefile.base based on your installation of the libraries

The system can be compiled in several ways:

with or without support for the data file server (which is required for both basic and intermediate user modes)
with or without support for system integrity protection using encryption/signing

To modify these options, set the variables HAVE_CURL and HAVE_OPENSSL in the file Makefile.base to 0 or 1.

To compile the system with both options enabled, libcurl and libcrypto (from OpenSSL) are needed.

The provided makefiles are for GNU C++. For Windows platforms, MSVC++ 6.0 project files and a workspace are provided.

When ready, go to the src/ directory and type make (or build all projects in the MSVC workspace). The compiled binaries for all components, including the sample applications, will be placed in the bin/ directory. Go to the bin/ directory and type make keys. The public and private master keys will be generated (files: pubkey, privkey). Remove the users.txt file if present, and create an admin user name and password using qadpz_admin -a admin. Your system should now be compiled and ready for further installation.

5.3 Installing components

Three main components have to be installed: the master, installed on a single computer in the Q²ADPZ network; the clients, installed on all computers from which users will submit their tasks; and the slaves, installed on all computers that will compute the slave user tasks. In addition, a data file server has to be configured for the basic and intermediate user modes.

5.3.1 Installing master

The following steps are needed to install master:

create qadpz and qadpz/bin directories somewhere in your directory tree
copy the qadpz_master and qadpz_admin executables into your qadpz/bin subdirectory
if you use encryption, also copy the pubkey, privkey, and users.txt files into the qadpz/bin subdirectory
copy the file master.cfg into the qadpz/bin subdirectory and edit it (all settings are documented in this file; set the location of the status html file here as well)
create a qadpz/log subdirectory - optional logs will be saved here

5.3.2 Installing slaves

The following steps are needed to install slave:

create qadpz and qadpz/bin directories somewhere in your directory tree
copy the qadpz_slave executable into your qadpz/bin directory
copy the slave.cfg and pubkey files to the qadpz/bin directory
create a qadpz/tmp or /tmp/qadpz directory and give the user under which qadpz_slave will run full access to this directory; at the same time, restrict the access of this user to all other resources
on Windows platforms, install the service by running qadpz_slave -install (uninstall with qadpz_slave -remove); on UNIX platforms, update your system boot files so that the qadpz_slave daemon is started automatically

5.3.3 Installing clients

The following steps are needed to install client:

create qadpz and qadpz/bin directories somewhere in your directory tree
copy the qadpz_run executable into the qadpz/bin directory
copy client.cfg and pubkey files into qadpz/bin directory as well as example project files (*.xml)
copy also the sample directory with example applications
for advanced user mode, you also need to copy:
- client service API library (libcli_srv or cli_serv.lib)
- Q²ADPZ common library (libcommon, common.lib)
- header files (src/*/*.h, src/*/*/*.h)

5.3.4 Installing the webserver-based data store

You need to have a www server supporting Perl CGI scripts installed.

copy the files from the src/wscript directory of the Q²ADPZ distribution somewhere into your public www space on this server so that the CGI scripts from this directory can be executed
set the location of these scripts in the client.cfg and slave.cfg files before installing them
create a qadpz data directory somewhere on your system and make sure that the CGI scripts have read and write access to it; set the q2adpz_home and q2adpz_www variables in upload.cgi, download.cgi, and delete.cgi to point to this directory

5.4 Configuring master and Q²ADPZ users

The master is configured through the master.cfg config file; the individual settings are documented there. The master_port variable must match the master_port variables in the client.cfg and slave.cfg files. In normal operation, you might want to turn all logs off, since their size grows quickly when tens of slave computers are present.

When starting master, the current status can be either displayed on console, redirected to a file (for example special pipe file), or turned off completely (by redirecting the console to /dev/null, or NUL), while still producing html status file that can be viewed even remotely, if www-server is installed on the master computer.

The qadpz_admin utility can be used to manipulate the list of Q²ADPZ users and their passwords.

The keys should be generated in the distribution directory using make keys command and copied to master qadpz/bin directory.

Some platforms require random.bin file with random seed TODO: PLEASE EXPLAIN THIS.

5.5 Configuring slaves

The slave is configured through the slave.cfg config file; the individual settings are documented there. A temporary directory, where downloaded libraries, programs, and data files are saved, should be created and made readable and writable for the user account under which qadpz_slave runs. Access to other computer resources should be restricted for the same account. The master public key (pubkey) should be copied locally to each slave.

For each piece of software that should be detected by the slave and reported to the master, several variables should be specified: software, soft_version, soft_detect, soft_detrow, and soft_detword. The autodetection mechanism executes the soft_detect command (including the specified command line arguments), goes to line number soft_detrow of its output, and takes word number soft_detword counted from the beginning of that line. This word should describe the version of the software. For example, the settings:

software JDK
soft_version unused
soft_detect java -version
soft_detrow 1
soft_detword 3

are used to detect the version of the JDK. The entry soft_version is used only if soft_detrow and soft_detword are set to 0. In that case, version autodetection is not used, but the soft_detect command is still executed, and if it fails to execute, the software is not reported.
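
The row and word selection described above can be sketched as a small C++ helper. This is only an illustration, not the slave's actual code: the function name extractVersionWord is hypothetical, and it assumes that rows and words are counted from 1 (as the JDK example suggests) and that words are separated by whitespace.

```cpp
#include <sstream>
#include <string>

// Returns word number `word` from line number `row` of `output`,
// or an empty string if that position does not exist.
// Assumption: both numbers are 1-based, as the JDK example suggests.
std::string extractVersionWord(const std::string& output, int row, int word) {
    if (row < 1 || word < 1) return "";
    std::istringstream lines(output);
    std::string line;
    for (int r = 1; std::getline(lines, line); ++r) {
        if (r != row) continue;
        std::istringstream words(line);
        std::string w;
        for (int i = 1; words >> w; ++i) {
            if (i == word) return w;
        }
        return "";  // the selected line has fewer words than requested
    }
    return "";  // the output has fewer lines than requested
}
```

For a first output line such as java version "1.3.1", calling extractVersionWord(output, 1, 3) yields the third word, "1.3.1", including the quotation marks. Whether the slave strips such characters is not covered by this sketch.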

5.6 Configuring clients

The client is configured through client.cfg config file, the individual settings are documented there. The public master key should be available.

5.7 Upgrading to a new Q²ADPZ version

The system supports automatic upgrade of all slaves to a new version. A new slave binary is distributed from the master to all slaves (provided that the system was compiled with the HAVE_CURL setting enabled). The procedure is started with the qadpz_admin utility. Because some of the slaves in the Q²ADPZ network might be unreachable, or busy computing a task that should not be interrupted, there are several upgrade modes:

normal upgrade: all reachable slaves are upgraded (switch -u)
safe upgrade: all reachable slaves that are not busy are upgraded (switch -u2)
permanent upgrade: slaves are upgraded automatically as they become ready (are turned on or finish a computational task) if their version is older (to turn on: switch -up, to turn off: switch -up2); this setting persists even after the master is shut down and started anew

The upgrade is performed only for the specified platforms (os/cpu) - the new version of the qadpz_slave has to be available at a URL specified in the command line to qadpz_admin -u*.
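
The difference between the three upgrade modes can be modelled as a simple decision function. The sketch below is purely illustrative: the type and function names are hypothetical, versions are assumed to be comparable integers, and the platform (os/cpu) filter is left out.

```cpp
// Illustrative model of the three upgrade modes; all names are hypothetical.
enum class UpgradeMode { Normal, Safe, Permanent };

struct SlaveState {
    bool reachable;  // the master can currently contact the slave
    bool busy;       // the slave is computing a task
    int version;     // assumption: versions are comparable integers
};

// Decides whether a slave would be upgraded right now under the given mode.
bool upgradeNow(UpgradeMode mode, const SlaveState& slave, int newVersion) {
    switch (mode) {
        case UpgradeMode::Normal:     // switch -u
            return slave.reachable;
        case UpgradeMode::Safe:       // switch -u2
            return slave.reachable && !slave.busy;
        case UpgradeMode::Permanent:  // switch -up
            // Applied whenever a slave becomes ready, and only if it is older.
            return slave.reachable && !slave.busy && slave.version < newVersion;
    }
    return false;
}
```

In this model, a busy but reachable slave is upgraded by a normal upgrade, skipped by a safe upgrade, and picked up later by a permanent upgrade once it finishes its task.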

For technical details of upgrade see the comments in the source code.

5.8 Analyzing the runs: logs and statistical information

There are two kinds of logs produced:

Q²ADPZ logs - the messages transmitted between the individual components are saved into the qadpz/log directory (the location can be changed with an option in the master.cfg file). Each master session creates a new directory; the symbolic link last points to the last session. In each session directory, messages related to jobs are saved into the jobs subdirectory, which contains a subdirectory for each job with files for all tasks that belong to that particular job. Each message is time stamped. All messages related to slaves are saved in the slaves subdirectory, which contains one file per slave, named after the IP address of the slave. For technical details of the Q²ADPZ protocol, see the section describing it.
debug logs - multiple threads and subcomponents of each component save various debug information here. Use these only in hacker user mode.

In addition, the master produces the interim status html file; its location can be specified in the master.cfg configuration file. Future versions of the system will generate statistical log information.

6 User modes of operation

Basic. This is the likely choice of a typical user who has an executable program running on his or her workstation and needs to submit it to Q²ADPZ to run it multiple times, possibly with different input files. This can be done using the interactive menu-driven qadpz_run utility, which can also be started with command line arguments, for example from a script.
Intermediate. The user also uses the qadpz_run program to submit tasks to Q²ADPZ. The difference is that the user edits the project configuration file manually (or generates it from his or her user application). This requires the user to understand the format of the XML-formatted project file. On the other hand, it gives the user more ways to configure the project. Both basic and intermediate modes work with static projects: the whole computational project is configured before it is started.
Advanced. The user writes a specialized client application using the client service library API. Tasks are submitted and results are obtained by calling C functions from the API. This gives the user full control over the sequence of tasks that are submitted. However, at the moment there is no support for the universal slave library, which starts executable slave tasks. This means that the user either has to write the slave library as well, or has to implement the client part of the Q²ADPZ user protocol. In other words, the input and output files are not handled automatically, and the user has to explicitly control when and which files are uploaded to and downloaded from the data file server using calls to C functions in the client service library API.
Hacker. This mode means modifying the Q²ADPZ sources to fit your particular application needs. Users can alter the client and slave service APIs, modify the Q²ADPZ user protocol, or modify the source code as required. For this purpose, the user might want to study the documentation of the internal protocol, which is based on message exchange among the master, slave, and client components, and the documentation of classes in the appendixes.

6.1 Starting and configuring jobs using qadpz_run menu system (basic mode)

For the description of the arguments accepted by qadpz_run, execute qadpz_run -h. When qadpz_run is started without arguments, it enters the interactive mode. Each time qadpz_run asks the user to enter some value, it usually displays a default (or previously set) value in brackets. For example, just after it is started, it asks for the name of the project configuration file:


Enter project file to edit [example.xml]:

By simply pressing ENTER, the user confirms a value offered in the brackets.

After the project file is specified, the main menu is displayed.


  1.   Job name: 'name_of_your_job'
  2.   Task menu (? task groups)
  3.   Save & run (project_file.xml)
  4/q. Exit (w or w/o save)

qadpz_run>

When you see this prompt, type in your choice from the menu and press ENTER. Enter 1 to change the name of your job, enter 4 to exit qadpz_run without starting the job, or enter 3 to save any changes you have made to the project config file and start the job now. To specify or change the tasks in this job, choose option 2 to enter the task menu.

In the task menu, the list of task groups currently defined for this job is displayed. You can add a new task group, modify or remove an existing one, duplicate an existing task group to create a new one, or exit the task menu.

When you choose to edit a task, a task editing menu appears, see below.

When you add a new task group, you need to supply a new task id, an integer that uniquely identifies the first task of this group within your job. The other tasks in the same group receive consecutive ids (i.e. if you set the task id to 5 and there are 10 tasks in this group, they will have ids 5 through 14). The tasks within a group are started in the order of their task ids. The second piece of information you need to supply is the type of your task: Executable program, Interpreted program, or QADPZ library. If you have a binary executable, enter E; if you need an interpreter to execute your program (for example a Java Virtual Machine or a Lisp interpreter), enter I; and if you wrote your own QADPZ slave library, enter L. What follows is the task editing menu.

In the task editing menu, the user can view and modify the following options:

Task id: 1,  type: Executable, p. Serial group

1. number of runs: 3
2. datafiles in: './datafiles/'
3. URL of data server: 'http://increment/qadpz/cgi-bin'
4. Don't start before task with id=< not defined > finished
5. Platforms defined: 3...
6. Input files: 1...
7. Output files: 1...
9. Utilities to execute when tasks finished: 1...
0/q. Back to task menu

The user specifies the number of runs in this task group. Each run can be referred to by its run number 1, 2, 3, ..., and the task id of each task in the group is generated as task id + 0, task id + 1, etc. In the second option, the user specifies the directory where all input and output data files are (or will be) located, relative to the directory in which qadpz_run was started (or as an absolute path, if you wish). If you use any input or output files, or if your executable is not yet available for download from some URL (that is, it is located only in your file system), you need to specify the URL of the data server. You can probably leave this option unmodified, or ask your local Q²ADPZ administrator. Option 4 allows you to specify another task in your job that must finish before this task is started. More precisely, you have 3 options:

all tasks in this group can be started when the task specified by a task id is finished (semaphore),
all tasks in this group can be started only after all tasks from the group specified by a task id are finished (barrier),
each task in this group can start only after the corresponding task from another task group is finished; the corresponding task is the task with the same run number (autoincrement).

For example, imagine there's one task group with task id 10 and number of runs: 5, and there's another task group with task id 20 with 10 runs.
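
The id numbering and the notion of a corresponding task can be written down for these two groups as a short C++ sketch. The function names are hypothetical, and the sketch only restates the numbering rules given above; it does not model how the master schedules the tasks.

```cpp
#include <vector>

// Ids of all tasks in a group: the first id plus 0, 1, ..., runCount - 1.
std::vector<int> taskIdsForGroup(int firstTaskId, int runCount) {
    std::vector<int> ids;
    for (int run = 0; run < runCount; ++run) {
        ids.push_back(firstTaskId + run);
    }
    return ids;
}

// Id of the task in another group that has the same run number (1-based),
// as used by the autoincrement dependency. Returns -1 when the other group
// has no run with that number; the manual does not say how that case is
// treated, so the caller has to decide.
int correspondingTaskId(int runNumber, int otherFirstTaskId, int otherRunCount) {
    if (runNumber < 1 || runNumber > otherRunCount) return -1;
    return otherFirstTaskId + (runNumber - 1);
}
```

With these rules, the group with task id 10 and 5 runs occupies ids 10 through 14, and the group with task id 20 and 10 runs occupies ids 20 through 29. Under an autoincrement dependency of the second group on the first, run 3 of the second group (id 22) corresponds to id 12, while runs 6 through 10 have no corresponding task in the first group.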

Options 5 through 9 open menus for defining platform-dependent task details, input and output files, and utilities; these menus are described below.

6.2 Preparing and editing the project file manually (intermediate mode)

TODO: (Pavel) explain the format

example.xml   example client configuration file, contains list of tasks to run

format: the file contains several task and reservation descriptions, each consisting
        of the following (see spec. of TaskInfo). Everything is encapsulated in
        a <Job> tag.

        <Job Name="string">

          <Task ID="integer" Type="Library|Executable|Interpreted">
           [<ReserveName> ... </ReserveName>]
            <TaskInfo> ... </TaskInfo>
            <RunCount>integer</RunCount>
           [<DataPathPrefix>string</DataPathPrefix>]
           [<InputFile[ From="run#" To="run#"][ Constant="Yes"]>string</InputFile>]
           [<OutputFile[ From="run#" To="run#"][ Constant="Yes"]>string</OutputFile>]
           [<FilesURL>string</FilesURL>]
           [<CDataInputFile>string</CDataInputFile>]
           [<CDataOutputFile>string</CDataOutputFile>]
           [<Utility[ From="integer" To="integer"]>string</Utility>]
          </Task>
  or:
          <Reserve>
             <ReserveName> ... </ReserveName>
             <SlavesRequired> .. </SlavesRequired>
            [<Parallel/>]
             <SlaveInfo> ... </SlaveInfo>
          </Reserve>
or

       </Job>

The TaskInfo has the following form:

   <TaskInfo>
      <OS>string</OS>
      <CPU Speed="integer">string</CPU>
      <Memory Unit="MB">Integer</Memory>
      <Disk Unit="MB">Integer</Disk>
     [<Software version="string">string</Software>]
      <TimeOut>Integer</TimeOut>
      <URL>url_string</URL>
      <Executable Type="File|URL">string</Executable>
      <CmdLine>string</CmdLine>
   </TaskInfo>

[<CmdLine>string</CmdLine>]

 * if RunCount is more than 1, the other Ids are generated automatically (++)
 * TaskInfo and SlaveInfo can appear several times if more slave types are acceptable
 * if From, To are not specified for an input or output file, the file is
   used in each run; the first run is #1
 * if Constant="Yes" is specified for an input file, it will be taken from the path
   specified; otherwise a suffix ".<run#>" is appended for each task run
 * if Constant="Yes" is specified for an output file, it will be saved (copied) to
   the file name specified; otherwise an additional suffix ".<run#>" will be appended
   for each task run. Constant="Yes" makes sense for output files only if RunCount is 1,
   otherwise the output files from previous runs will be overwritten
 * if FilesURL is specified, it should point to a data web server which accepts
   upload and download requests for Q²ADPZ data files. Otherwise the default is used.
 * in addition to all these files, a special file Q²ADPZ.task is sent to the
   slave tmp_dir. This file has the following structure:
   run# #input_files #output_files max_runtime_sec
   list of all input files (in the order specified in this config file, 1 file per line)
   list of all output files (same order, 1 file per line)
 * DataPathPrefix can be used for convenience - it is then prepended to
   all file names. The file names can contain paths as well
 * The CmdLine string can contain multiple occurrences of the substring '#run#', each of which
   is replaced by the run number
 * even though QADPZ supports reserveIDs with the same names, the ReserveNames
   used within one config file should be unique
 * in case of Executable type, URL in task info should contain link to the
   generic library, and an extra tag <Executable> as part of TaskInfo should
   contain URL to executable
 * <Utility> can specify a local program (with optional full path)
   to be started after a task is finished. The specified string can contain a full command line
   including arguments, where all occurrences of #run# are replaced with the run number
   of the task that just finished. Optional From and To specify an interval of run
   numbers - the utility will be started only for those tasks.
   More than one utility can be specified for each task.
   Utilities are started with DataPathPrefix as the current directory (if specified).
 * note: it makes no sense to make a Reservation for a serial set of tasks. These
   entries might be ignored by the interactive editor of qadpz_run.
 * note: the comments are deleted by qadpz_run interactive editor!
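
A user program started on the slave may want to read the special Q²ADPZ.task file described in the list above. The following is a minimal sketch of such a reader, not code taken from Q²ADPZ: the names TaskFile and readTaskFile are hypothetical, and it assumes that the four header fields are separated by whitespace and that each file name occupies exactly one line.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Contents of the task description file, following the layout given above.
struct TaskFile {
    int run = 0;                       // run number (the first run is #1)
    long maxRuntimeSec = 0;            // max_runtime_sec
    std::vector<std::string> inputs;   // input files, one per line
    std::vector<std::string> outputs;  // output files, one per line
};

// Reads the header line and the two file lists from a stream.
// Returns false if the header is malformed or a file name is missing.
// Assumption: header fields are whitespace-separated integers.
bool readTaskFile(std::istream &in, TaskFile &out)
{
    std::string header;
    if (!std::getline(in, header)) return false;

    std::istringstream fields(header);
    int nIn = 0, nOut = 0;
    if (!(fields >> out.run >> nIn >> nOut >> out.maxRuntimeSec)) return false;
    if (nIn < 0 || nOut < 0) return false;

    std::string line;
    for (int i = 0; i < nIn; ++i) {
        if (!std::getline(in, line)) return false;
        out.inputs.push_back(line);
    }
    for (int i = 0; i < nOut; ++i) {
        if (!std::getline(in, line)) return false;
        out.outputs.push_back(line);
    }
    return true;
}
```

In a real program the stream would be an std::ifstream opened on Q²ADPZ.task in the slave tmp_dir.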
Type="Library" specifics:
 * the CDATA input for library can be specified in files [DataPathPrefix]CDataInputFile.<run#>
 * the result sent back from a library is saved into [DataPathPrefix]CDataOutputFile.<run#>
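
The naming rules above (the '#run#' placeholder, the ".<run#>" suffix, and DataPathPrefix) can be illustrated with two small helpers. This is only a sketch of the rules as stated in this section: expandRun and runFileName are hypothetical names, and it assumes that DataPathPrefix is concatenated verbatim, without inserting a path separator.

```cpp
#include <string>

// Replaces every occurrence of the literal "#run#" with the run number.
std::string expandRun(std::string text, int run)
{
    const std::string marker = "#run#";
    const std::string number = std::to_string(run);
    std::string::size_type pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        text.replace(pos, marker.size(), number);
        pos += number.size();   // continue after the inserted digits
    }
    return text;
}

// Builds the file name used for one run: the prefix is put in front, and
// ".<run#>" is appended unless the file is marked Constant="Yes".
std::string runFileName(const std::string &prefix, const std::string &name,
                        int run, bool constant)
{
    std::string full = prefix + name;
    if (!constant) full += "." + std::to_string(run);
    return full;
}
```

For example, runFileName("data/", "CDataInputFile", 2, false) yields "data/CDataInputFile.2".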


Following is an example with 3 runs of simple executable

TODO: provide example

6.3 Writing your own client application (advanced mode)

TODO: how to write client: make it compatible with current interface ClientServ - client service library (see also the file: ~/src/client/ClientServ.h)

User interface functions:

```
 JobId *q2adpz_cli_client_on (const char *user,
                             const char *passwd,
                             const char *jobName,
                             const int jobNumber = -1);
```
- called when the client starts (either starts a new job or takes back control of previously abandoned job)
- jobName specifies the name of the job that this client will handle.
- NEW JOB (default case): jobNumber should be -1 (or unspecified). A request is sent to the master, and the number of the newly created job (unique only for this job name) is returned. The name together with the number identifies the job in QADPZ (the so-called JobID).
- OLD JOB (advanced case, used when the client was turned off while slaves were computing the tasks, whether deliberately, because of a lost network connection, or because of some other error): jobNumber should be specified, and the client tries to take control of the previously abandoned job. The same job number is returned if the operation is successful. If the job is currently controlled by another client, the call fails and JOB_ALREADY_CONTROLLED is returned. However, if the method returned JOB_ALREADY_CONTROLLED and is immediately called a second time with the same jobName and jobNumber, the client is allowed to take control of the job, and the client that previously controlled it loses control.
- this method can be called again for a different job, but only after the previous job is finished, stopped with jobCtrl(jobID, stop_job), or clientOff() was called.
- returns a new JobId structure, which should not be deallocated or modified by application - it is maintained by the Client
- if there is an error, NULL is returned.
```
 int q2adpz_cli_client_off (int terminate_job = 1);
```
- called when the client is stopped, or when it wants to abandon its current job;
- must be called before exiting, if clientOn() was called
- returns 0 if the client was not ON, or if an error occurred,
- otherwise returns 1
```
 JobStatus *q2adpz_cli_job_ctrl (const JobId &jobID,
                               const job_ctrl_action action);
```
- sends a job control command to the master
- action can be: get_job_status, stop_job, (see M_JOB_CTRL for details)
- returns new JobStatus structure, which is received from master; this structure should be deallocated by the application when not needed anymore
- get_job_status can be called also without calling clientOn first
- in case of error, returns JobStatus(job_refused)
```
 TaskStatus *q2adpz_cli_task_create (const TaskId &taskID,
                              List &listTaskInfo,
                              const char *data,
                              const Address *slaveAddress = NULL);
```
- listTaskInfo is an array of TaskInfo structures describing the slave libraries and requirements for executing this task
- data is anything '\0'-terminated that fits into CDATA XML element. If application needs to send other data, it has to establish a direct connection with the slave. Use NULL if no data should be used.
- waits for an answer from the master
- taskId should contain a new valid TaskId with a valid JobId previously returned by clientOn()
- if this task has been reserved previously, use the slaveAddress returned in the SlaveInfo structure of the corresponding SlaveAvail message; otherwise set slaveAddress to NULL (or leave it NULL by default)
- returns a new TaskStatus structure, which should be deallocated by the application when not needed anymore
- in case of error, returns TaskStatus(task_refused)

 TaskStatus *q2adpz_cli_task_create (const TaskId &taskID,
                              XMLData *listTaskInfo,
                              const char *data,
                              const Address *slaveAddress = NULL);

- same as the taskCreate above, but list of TaskInfo structures is passed as pointer to linked list of XMLData objects

```
 TaskStatus *q2adpz_cli_task_ctrl (const TaskId &taskID,
                                  const task_ctrl_action action,
                                  char *arg = NULL);
```
- sends a task control command to master
- action can be: stop_task, control_task
- in case of control_task action, the ctrl_argument might be non-NULL and specify the argument of this message
- waits for TaskStatus from the master
- returns a new TaskStatus structure (the application should deallocate it when it is no longer needed)
- in case of error, returns TaskStatus(task_refused)
```
 ReserveId *q2adpz_cli_slave_reserve (const char *reserveName,
                                   const int slavesRequired,
                                   const int parallel,
                                   const int nSlaveInfos,
                                   const SlaveInfo *slaveInfos);
```
- reserve a specified number of slaves (for a parallel job or not)
- slaveInfos is an array of infos about acceptable slaves (or just a single object) and nSlaveInfos should contain the size of this array.
- wait for a reserveId from the master
- returns reserveId, it should be deallocated when not needed anymore by application
- in case of error, returns NULL

 ReserveId *q2adpz_cli_slave_reserve (const char *reserveName,
                                   const int slavesRequired,
                                   const int parallel,
                                   const int nSlaveInfos,
                                   XMLData *slaveInfos);

- same as slaveReserve() above, but the list of slave infos is passed as a pointer to a linked list of XMLData objects

```
 wait_master_result q2adpz_cli_wait_master (Object *&result,
                                    long timeout = 0);
```
- wait a specified amount of time (in milliseconds) for an event associated with the job that this client handles.
- if timeout is unspecified (or set to 0), the call will block indefinitely
- The following events terminate this call:
* wait_timeout: no event occurred within the specified time
* task_finish: some task has finished; a pointer to a new TaskFinish structure is returned in result
* task_status: some task has changed its status; a pointer to a new TaskStatus structure is returned in result
* slave_avail: a new slave is available and the application can submit a new task; a pointer to a new SlaveAvail structure is returned in result
- in all cases, the application is responsible for deallocating the structure *result when it is no longer needed. The event type is returned. In case of error, waitmaster_error is returned
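
A client typically calls the wait function in a loop and dispatches on the returned event type. The sketch below shows only the shape of such a loop. It does not link against the client library: the enum is a local stand-in for the event codes named above (the real definitions live in ClientServ.h and may differ), and waitMaster is a hypothetical callable that stands in for q2adpz_cli_wait_master() and is assumed to deallocate the result itself.

```cpp
#include <functional>

// Local stand-ins for the event codes named in the list above.
enum wait_master_result {
    wait_timeout, task_finish, task_status, slave_avail, waitmaster_error
};

// Counters collected while the loop runs.
struct LoopStats {
    int finished = 0;
    int statusChanges = 0;
    int slavesOffered = 0;
    int timeouts = 0;
    bool error = false;
};

// Runs until expectedTasks tasks have finished or an error is reported.
LoopStats runEventLoop(int expectedTasks,
                       const std::function<wait_master_result()> &waitMaster)
{
    LoopStats s;
    while (s.finished < expectedTasks && !s.error) {
        switch (waitMaster()) {
        case task_finish:      ++s.finished;      break;  // collect a result
        case task_status:      ++s.statusChanges; break;  // e.g. Started, Crashed
        case slave_avail:      ++s.slavesOffered; break;  // a new task could be submitted here
        case wait_timeout:     ++s.timeouts;      break;  // nothing happened in time
        case waitmaster_error: s.error = true;    break;
        }
    }
    return s;
}
```

In a real client, the slave_avail branch is where a reserved slave would be used to create the next task.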

 int q2adpz_cli_put_data (char *fileName, int taskId,
                                    char *url_upload = NULL);

- uploads given file to a QADPZ data server at given address, if NULL, then default data server is used

 int q2adpz_cli_get_data (char *fileName, int taskId,
                                    char *url_download = NULL);

- downloads given file from a QADPZ data server at given address, if NULL, default data server is used

```
 const char *state2str (task_state state);
```
- returns string representation of task state
```
 const char *state2str (job_state state);
```
- returns string representation of job state

6.4 Writing your own slave library (all modes)

TODO: how to write slave, make sure it is compatible with current interface:

SlaveServ - slave service library

(see also the file: ~/src/slave/SlaveServ.h)

User interface functions:

 int taskExec (char *data, char *datares); // Unix

 int taskExec (char *data, char *datares,  // Win32
               int  (*q2adpz_slv_task_status) (task_state stat, char *err),
               void (*q2adpz_slv_setcb_task_stop) (void (*cb) (void)),
               void (*q2adpz_slv_setcb_task_ctrl) (void (*cb) (const char *arg)));

- this is the entry function of the dynamic library provided by the user
- data represents the data for initializing the task
- datares represents the result data from the task execution
- both data and datares are text strings of MAX_DATA size, are allocated and deallocated by the slave service; they should not be deleted inside the user library

```
 void q2adpz_slv_setcb_task_stop (void (*cb) (void)); 
```
- sets the callback function for a task stop message
- each time a task stop message is received from the master, the cb() callback function is called by the service
```
void q2adpz_slv_setcb_task_ctrl (void (*cb) (const char *arg));
```
- sets the callback function for a task ctrl message
- each time a task ctrl message is received from the master, the cb() callback function is called by the service with the arg being the arguments passed with the control message
```
 int q2adpz_slv_task_status (task_state stat, char *err); 
```
- sends a task status message to the master
- this message is usually sent by the user library when a task ctrl message has been successfully executed
```
 int q2adpz_slv_put_data (char *fileName, char *url_upload); 
```
- uploads given file to a QADPZ data server at given address
- if url_upload=0 then the default data server is used
```
 int q2adpz_slv_get_data (char *fileName, char *url_download); 
```
- downloads given file from a QADPZ data server at given address
- if url_download=0 then the default data server is used

Example library source code: (see also the file: ~/ssample/dumb/SlaveDumb.cpp)

  #include "SlaveServ.h"
  #include <stdlib.h>   // atoi()
  #include <stdio.h>    // sprintf()

  // flags for the callbacks
  int isTaskCtrl = 0;  // set to 1 when task control is required
  int isTaskStop = 0;  // set to 1 when task stop is required

  // callback functions for notification from the slave service
  void taskCtrl (const char *arg)
  {
    isTaskCtrl = 1;
    DBUG_PRINT("info", ("taskCtrl called arg=%s", arg));
  }
  void taskStop ()
  {
    isTaskStop = 1;
    DBUG_PRINT("info", ("taskStop"));
  }

  // this is the exec loop on each task-thread
  extern "C" {

  int taskExec (char *data, char *datares
  #ifdef _WIN32
                 ,int  (*q2adpz_slv_task_status) (task_state stat, char *err),
                 void (*q2adpz_slv_setcb_task_stop) (void (*cb) (void)),
                 void (*q2adpz_slv_setcb_task_ctrl) (void (*cb) (const char *arg))
  #endif
                )
  {
    int isFinished = 0;

    DBUG_PRINT("info", ("task computation successfuly started"));

    // set callback functions
    q2adpz_slv_setcb_task_stop (taskStop);
    q2adpz_slv_setcb_task_ctrl (taskCtrl);

    DBUG_PRINT("info", ("input data '%s'", data));

    // start main task loop
    while (1) {

      //do some crunching of the data
      //SLEEP_SEC(1);
      sprintf (datares, "res=%d", 2 * atoi (data));
      isFinished = 1;

      //task needs to be stopped
      if (isTaskStop) {
        DBUG_PRINT("info", ("task stop successfuly executed."));
        break;
      }

      if (isTaskCtrl) {
        // check the arguments from the message
        // ...

        // send back results
        q2adpz_slv_task_status (task_ok, "task ctrl received ok.");
        DBUG_PRINT("info", ("task ctrl successfuly executed."));
      }

      //if crunching finished
      if (isFinished) {
        DBUG_PRINT("info", ("task computation successfuly finished."));
        break;
      }

    } // while

    return 0;
  }

  }  // extern "C"

6.5 Understanding the internal communication protocol (hacker mode)

The communication between client, master and slave uses an XML-based language. The communication between the application part of the client and the standard client library, and between the slave daemon/NT service and the slave user program, is done using C++ function calls (see previous sections).

Each XML document that is sent can contain multiple messages. It has to contain one UserInfo structure to authenticate the user. This is the structure of all XML documents that are sent:

6.5.1 Interfaces

This section describes the details of XML messages exchanged between system components. (you might want to view it from document with frames)

<Data>

  <!-- for security/accounting purposes -->
  <UserInfo>
    <User>string</User>
    <Pswd>string</Pswd>
  </UserInfo>

  <Message Type="string"> ... </Message>
</Data>
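
As an illustration of the envelope above, the following sketch assembles a document with one UserInfo and one Message. It is plain string building and does not use the XMLData class from the appendix. The helper names are hypothetical, and it assumes that the receiving parser understands the standard XML entities; encryption is not covered.

```cpp
#include <string>

// Escapes the characters that are special in XML text content.
std::string xmlEscape(const std::string &s)
{
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:  out += c;
        }
    }
    return out;
}

// Wraps one already serialized message body in the <Data> envelope.
// type is the message type, e.g. "M_JOB_CTRL"; body is inserted verbatim.
std::string makeDocument(const std::string &user, const std::string &pswd,
                         const std::string &type, const std::string &body)
{
    return "<Data><UserInfo><User>" + xmlEscape(user) + "</User><Pswd>" +
           xmlEscape(pswd) + "</Pswd></UserInfo><Message Type=\"" + type +
           "\">" + body + "</Message></Data>";
}
```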

The following data structures are used in the messages (see below for message types):




    <!-- provides information about slave computer. Version contains
          an identifier of the slave service version. Version and Address
          appears when sent from slave. Software may appear multiple times
          and is defined by the slave config file (hopefully some autodetect
          routines downloaded by the slave service on startup which will
          determine the presence of software will exist later) -->

    <SlaveInfo>
      [<Version>string</Version>]
      <OS>string</OS>
      <CPU Speed=integer>string</CPU>
      <Memory Unit="MB">Integer</Memory>
      <Disk Unit="MB">Integer</Disk>
      [<Software version="string">string</Software>]
      [<Address>ip_address_string</Address>]
    </SlaveInfo>



    <!-- specifies a single task to be computed for particular platform.
            Memory and Disk contain the minimal requirements for
            this task for this platform. TimeOut specifies the time after
            which the task is killed if it didn't finish. Speed of CPU
            is optional. Software can appear arbitrary number of times.
            UserData can contain any data and is reserved for use by
            user application client and slave (for example the standard
            qadpz library for submitting executables sends the parameters
            here). The difference compared to CDATA in TaskInit is that
            UserData can be different in each TaskInfo, i.e. for each
            platform. -->

    <TaskInfo>
      <OS>string</OS>
      <CPU Speed=integer>string</CPU>
      <Memory Unit="MB">Integer</Memory>
      <Disk Unit="MB">Integer</Disk>
     [<Software version="string">string</Software>]
      <TimeOut>Integer</TimeOut>
      <!-- the following points either to DLL file or UNIX library  -->
      <URL>url_string</URL>
      <UserData>string</UserData>
    </TaskInfo>



    <!-- name is chosen by client, number is added by master so that no
            other job with the same name and this number is executed at
            this time -->

    <JobID>
      <Name>string</Name>
      <ID>Integer</ID>
    </JobID>



    <!-- ID is generated by client and is specific only within this job -->

    <TaskID>
      <JobID> ... </JobID>
      <ID>Integer</ID>
    </TaskID>



    <!-- name is generated by client and the number by master -->

    <ReserveID>
      <Name>string</Name>
      <ID>Integer</ID>
    </ReserveID>



    <!-- mainly for monitor purposes, and response to M_JOB_XXX back
            to client from master. Running means that the job is
            registered at Master and some client is taking care of it
            Abandoned means that the job is still registered, but the
            client went down, Stopped is a response to M_JOB_CTRL(Stop)
            and Refused means that the M_JOB_CTRL was refused for some
            reason (error message is included in Error of M_JOB_STATUS).
             -->

    <JobStatus>
     <Status>Running | Abandoned | Stopped | Refused </Status>
     <TasksRunning>Integer</TasksRunning>
     <TasksWaiting>Integer</TasksWaiting>
     <!-- Reservation appears as many times as is the number of
             reservations for this client/job if it is response to
             M_JOB_CTRL(GetStatus),
          ReserveID appears once if it is a response to M_SLAVE_RESERVE,
          otherwise neither Reservation nor ReserveID is included -->
     [<ReserveID> ... </ReserveID> |
      <Reservation>
        <ReserveID> ... </ReserveID>
        [<Parallel/>]
        <SlavesRequired>Integer</SlavesRequired>
     </Reservation>]
    </JobStatus>
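
The nesting of the identifier structures can be mirrored directly in C++. The sketch below is only an illustration: the struct and function names are hypothetical and are not the classes used by the client library. It serializes a JobID and a TaskID in the element order shown above and assumes the job name needs no escaping.

```cpp
#include <string>

// Name is chosen by the client, id is assigned by the master.
struct JobID {
    std::string name;
    int id;
};

// A task id is only unique within its job, so it carries the JobID.
struct TaskID {
    JobID job;
    int id;
};

std::string toXml(const JobID &j)
{
    return "<JobID><Name>" + j.name + "</Name><ID>" +
           std::to_string(j.id) + "</ID></JobID>";
}

std::string toXml(const TaskID &t)
{
    return "<TaskID>" + toXml(t.job) + "<ID>" +
           std::to_string(t.id) + "</ID></TaskID>";
}

// Two tasks are the same only if both the job and the task number match.
bool sameTask(const TaskID &a, const TaskID &b)
{
    return a.id == b.id && a.job.id == b.job.id && a.job.name == b.job.name;
}
```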

Client <=> standard client library (function calls)

Client <=> master (XML)

The following messages are recognized:

In the direction client to master




  <!-- controls jobs:
        - Stop will stop all tasks associated with this job and removes the
          job from master's agenda. It is nice if clients call this after
          they receive the result from the last task that belongs to this
          job so that master can free all resources used for accounting of
          this job.
        - StopAllName will ignore the job number in JobID and will stop all
          jobs with the same name
        - StopAllUser will stop all jobs of the user who sends the message
          -->

  <Message Type="M_JOB_CTRL">
    <JobID> ... </JobID>
    <Action>Stop | GetStatus | StopAllName | StopAllUser </Action>
    [<ReserveID> ... </ReserveID>]
  </Message>



  <!-- master finds an appropriate slave for starting this task; TaskInfo
          may appear more than once - one for each
          different platform that the task can run on. Address should
          be included if the slave was reserved. Master replies with
          M_TASK_STATUS after the task is started, or when it is not
          possible to start it. -->

  <Message Type="M_TASK_INIT">
    <TaskID> ... </TaskID>
    <TaskInfo> ... </TaskInfo>
    [ <![CDATA[...anything...]]> ]
    [<Address> ... </Address>]
  </Message>

    (see below)
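
Data placed in the CDATA part of M_TASK_INIT must not contain the sequence "]]>", because that sequence ends a CDATA section. A common general XML technique is to split the data into two adjacent sections at that point. The sketch below shows this technique; toCData is a hypothetical helper, and whether the simplified XML parser used by Q²ADPZ accepts several adjacent CDATA sections is an assumption that should be checked before relying on it.

```cpp
#include <string>

// Wraps data in a CDATA section. Every "]]>" inside the data is split
// between two sections, so that no section contains the terminator.
std::string toCData(const std::string &data)
{
    std::string out = "<![CDATA[";
    std::string::size_type start = 0, pos;
    while ((pos = data.find("]]>", start)) != std::string::npos) {
        out.append(data, start, pos - start + 2);  // copy up to and including "]]"
        out += "]]><![CDATA[";                     // close and reopen; ">" follows
        start = pos + 2;
    }
    out.append(data, start, std::string::npos);    // remaining tail
    out += "]]>";
    return out;
}
```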



  <!-- stops the task or sends a control message with optional argument -->

  <Message Type="M_TASK_CTRL">
    <TaskID> ... </TaskID>
    <Action>Stop | Control</Action>
    [<Argument>string</Argument>]
  </Message>

    (see below)



  <!-- asks master to notify the client about free slaves. The message is
          confirmed by master with a reply M_JOB_STATUS containing one
          ReserveID. When a suitable slave (according to SlaveInfo) becomes
          available, master sends M_SLAVE_AVAIL to the client that has
          control over the specified job. This is done SlavesRequired
          times, i.e. one M_SLAVE_AVAIL is sent each time the next slave
          becomes available. When more clients register, they are
          round-robin scheduled. When Parallel is specified, only one
          M_SLAVE_AVAIL is sent to the client, and only after SlavesRequired
          slaves are available at the same time. They are reserved for
          this client. SlaveInfo may appear more than once if more types of
          slaves are acceptable. -->

  <Message Type="M_SLAVE_RESERVE">
    <ReserveID> ... </ReserveID>
    <SlavesRequired>Integer</SlavesRequired>
    [<Parallel/>]
    <JobID> ... </JobID>
    <SlaveInfo> ... </SlaveInfo>
  </Message>
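
The round-robin behaviour described for non-parallel reservations can be pictured with a small simulation. This is a simplified model written for this manual, not the master's scheduler: Reservation and roundRobin are hypothetical names, and matching of slaves against SlaveInfo is ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One pending non-parallel reservation.
struct Reservation {
    std::string client;   // who receives M_SLAVE_AVAIL
    int slavesRequired;   // how many notifications are still owed
};

// Hands out freeSlaves slaves, one at a time, to the pending reservations
// in round-robin order. Returns the client names in notification order.
std::vector<std::string> roundRobin(std::vector<Reservation> pending,
                                    int freeSlaves)
{
    std::vector<std::string> notified;
    std::size_t next = 0;
    while (freeSlaves > 0) {
        // skip reservations that are already satisfied
        std::size_t tried = 0;
        while (tried < pending.size() && pending[next].slavesRequired == 0) {
            next = (next + 1) % pending.size();
            ++tried;
        }
        if (tried == pending.size()) break;   // nobody is waiting any more

        notified.push_back(pending[next].client);
        --pending[next].slavesRequired;
        --freeSlaves;
        next = (next + 1) % pending.size();
    }
    return notified;
}
```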



  <!-- Client sends this message to known master when it is started
          and when it quits. Client should specify the job which it is
          controlling so that master knows where to send M_TASK_FINISH.
          If Client is creating a new job, it should specify only the
          job name inside of the JobID structure and set the number
          to -1. Master will respond with M_JOB_STATUS, where the
          valid JobID will be returned (in this way there might be
          more jobs with the same name, they are distinguished by their
          number). JobID and Address should not be specified in case of
          ClientStatus Off.
           -->

  <Message Type="M_CLIENT_STATUS">
    [<JobID>...</JobID>]
    <ClientStatus>On | Off</ClientStatus>
    [<Address>ip_address_string</Address>]
  </Message>


  <!-- controls/configures the slave:
        - Upgrade  = "slave service" is changed on all slaves for which a new
             executable is specified.
             New executables must be downloadable from the given URLs for
             each specific platform on which the upgrade should be performed,
             i.e. the URL element can appear multiple times, once for
             each OS - CPU combination.
             If Immediate is specified, all currently known slaves
             that report a version different from the one optionally specified
             in NewVersion are upgraded immediately (if no NewVersion is
             specified, all of them are upgraded) and their running tasks
             are stopped.
             If Immediate is not specified, those slaves that are busy will be
             upgraded when they become ready.
             If PermanentUpgrade is set to Start, master will keep upgrading
             all new slaves that become ready later and report a version
             different from the one specified in NewVersion (in this case
             NewVersion is compulsory).
             Master will save the permanent upgrade info to its config file,
             so it will keep upgrading even after a restart.
             The PermanentUpgrade feature can be deactivated by sending a simple
             M_SLAVE_CTRL(Upgrade,PermanentUpgrade(Stop)) without other subelements.
             Upgrade is accepted only from a qadpz_admin client (which is always
             started by the user 'admin').
             If PermanentUpgrade is not specified, its activation is not altered
             (i.e. if it was on, it will remain on; it is off by default when the
             master starts and it was not explicitly turned on before). -->

  <Message Type="M_SLAVE_CTRL">
    <Action>Upgrade</Action>
    [<NewVersion>string</NewVersion>]
    [<Immediate/>]
    [<PermanentUpgrade>Start | Stop</PermanentUpgrade>]
    [<URL OS="string" CPU="string">string</URL>]
  </Message>
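
The upgrade rules in the comment above can be condensed into a small decision function. This is one reading of those rules, offered as a sketch rather than the master's actual logic: the type and function names are hypothetical, an empty newVersion string stands for an absent NewVersion element, and it is assumed that slaves that are not busy are upgraded right away in both modes.

```cpp
#include <string>

enum class UpgradeAction { None, Now, WhenReady };

// What the master knows about one slave.
struct SlaveState {
    std::string version;  // version reported by the slave service
    bool busy;            // true while the slave computes a task
};

// Decides what to do with one slave when an Upgrade request arrives.
UpgradeAction decideUpgrade(const SlaveState &s,
                            const std::string &newVersion,
                            bool immediate)
{
    // With a NewVersion given, slaves already at that version are skipped;
    // without it, every slave qualifies.
    if (!newVersion.empty() && s.version == newVersion)
        return UpgradeAction::None;

    // Immediate upgrades stop running tasks; otherwise busy slaves wait.
    if (!s.busy || immediate)
        return UpgradeAction::Now;
    return UpgradeAction::WhenReady;
}
```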

in the direction master to client:




  <!-- sent back to client as a reply to M_CLIENT_STATUS and M_JOB_CTRL.
          Will be also used by monitor. Error contains an explanation
          message in case of Refused JobStatus, otherwise it's not present.  -->

  <Message Type="M_JOB_STATUS">
    <JobID> ... </JobID>
    <JobStatus> ... </JobStatus>
    [<Error>string</Error>]
  </Message>



  <!-- sent back to client when M_SLAVE_RESERVE is active
          SlavesRequired contains the number of slaves assigned
          (if not parallel, this is always 1) -->

  <Message Type="M_SLAVE_AVAIL">
    <JobID> ... </JobID>
    <ReserveID> ... </ReserveID>
    <SlavesRequired>integer</SlavesRequired>
    <SlaveInfo> ... </SlaveInfo>
    <!-- SlaveInfo appears as many times as is the
         number of allocated computers -->
  </Message>



  <!-- sent back to client as a response to M_TASK_INIT, M_TASK_CTRL
       or when the status of the task changed;
       the message is also sent from the slave to the master (see below).
       * response to M_TASK_INIT:
        - Buffered - no slave available for the task, master will try later
        - Refused  - message was inconsistent with current system state
        - Started  - task started successfully on one of the slaves
       * response to M_TASK_CTRL:
        - Refused  - message was inconsistent with current system state
        - Ok - ctrl command successful, Argument contains the result
       * task status change:
        - Crashed  - the slave user application crashed (master abandons task)
                   - the slave doesn't send status messages anymore
                   - the task is running for too long (timeout kill)
        - Stopped^ - the slave running the task was stopped (eg. login)
        - MoveStart - the task is being moved to another slave, it's buffered
        - MoveEnd   - the task was moved to another slave, client gets
                     a SlaveInfo of a new slave, in case of task moved
                     or finished when being stopped, M_TASK_STATUS(Stopped)
                     is not sent
       * Error is sent in case of Refused and Crashed status
       * SlaveInfo is sent in case of Started and Moved.
       ^ this state is sent only from master to slave  -->

  <Message Type="M_TASK_STATUS">
    <TaskID> ... </TaskID>
    <Status>Buffered | Refused | Started | Crashed | Stopped |
               Ok | MoveStart | MoveEnd</Status>
    [<Argument>string</Argument>]
    [<SlaveInfo> ... </SlaveInfo>]
    [<Error>string</Error>]
  </Message>
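
A client that parses M_TASK_STATUS itself needs to map the Status text to a value and to know which optional elements to expect. The sketch below is one possible way to do that. All names are hypothetical, the status text is assumed to arrive without surrounding whitespace, and "Moved" in the comment above is read as MoveEnd, which is an interpretation.

```cpp
#include <string>

enum class TaskStatusCode {
    Buffered, Refused, Started, Crashed, Stopped, Ok, MoveStart, MoveEnd,
    Unknown
};

// Maps the text of the Status element to an enum value.
TaskStatusCode parseTaskStatus(const std::string &text)
{
    static const struct { const char *name; TaskStatusCode code; } table[] = {
        {"Buffered",  TaskStatusCode::Buffered},
        {"Refused",   TaskStatusCode::Refused},
        {"Started",   TaskStatusCode::Started},
        {"Crashed",   TaskStatusCode::Crashed},
        {"Stopped",   TaskStatusCode::Stopped},
        {"Ok",        TaskStatusCode::Ok},
        {"MoveStart", TaskStatusCode::MoveStart},
        {"MoveEnd",   TaskStatusCode::MoveEnd},
    };
    for (const auto &entry : table)
        if (text == entry.name) return entry.code;
    return TaskStatusCode::Unknown;
}

// Error is sent with Refused and Crashed.
bool carriesError(TaskStatusCode c)
{
    return c == TaskStatusCode::Refused || c == TaskStatusCode::Crashed;
}

// SlaveInfo is sent with Started and (by assumption) MoveEnd.
bool carriesSlaveInfo(TaskStatusCode c)
{
    return c == TaskStatusCode::Started || c == TaskStatusCode::MoveEnd;
}
```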




  <!-- sent when the task is finished, or when the slave becomes disabled
          and generates the M_TASK_FINISH message instead of M_TASK_MOVE.
          In that case, DATA should specify that the task was not finished. -->

  <Message Type="M_TASK_FINISH">
    <TaskID> ... </TaskID>
    <![CDATA[...anything...]]>
  </Message>

    (see below)



  <!-- sent to a client/qadpz_admin as a reply to its M_SLAVE_CTRL message
       Ok - response to M_SLAVE_CTRL(Upgrade), the upgrade process was
            started at master.
       Refused - the upgrade was refused by master (if for example
                  the user is not admin); Error contains the description of the reason. -->


  <Message Type="M_SLAVE_STATUS">
    <Status>Ok | Refused</Status>
    [<Error>string</Error>]
  </Message>

Master <=> slave (XML)

in the direction master to slave:




  <!-- starts the task -->

  <Message Type="M_TASK_INIT">
    <TaskID> ... </TaskID>
    <URL> ... </URL>
    <TimeOut>integer</TimeOut>
    <UserData>string</UserData>
    [<![CDATA...anything...]]>
  </Message>

    (see above)



  <!-- stops the task, or sends a control message to the task.
        in the second case, the optional argument might be specified -->

  <Message Type="M_TASK_CTRL">
    <Action>Stop | Control</Action>
    [<Argument>string</Argument>]
  </Message>

    (see above)



  <!-- controls/configures the slave:
        - Disable  = put the slave in "Disabled" state (i.e. not accepting
                     any tasks)
        - Enable   = put the slave back to ready state from disabled state
        - Shutdown = slave is forced to shutdown from the Master
        - Upgrade  = "slave service" on a Slave is changed (in this case
                     the URL specifies the new program); note that the
                     current task is stopped (if any) -->

  <Message Type="M_SLAVE_CTRL">
    <Action>Disable | Enable | Shutdown | Upgrade</Action>
    [<URL>string</URL>]
  </Message>

in the direction slave to master:




  <!-- sent to inform the client about the status of the slave
       Off - the slave computer is going to be turned off or the slave
             service program is going to quit. If the slave was busy
             and was going to save its partial results locally, this
             message is sent after the slave successfully saved its
             partial results.
       Ready - the slave service program is started, regularly announces
               that it is still alive or it has just finished a task
             - slave sends also SlaveInfo
       Busy - the slave accepted a task and is computing it, also regularly
              posted to master
       Disabled - the slave service is still running, but no slave user
                  program can be executed, most likely because somebody
                  logged into the slave computer. This message is sent after
                  the slave optionally saved its partial results locally
       Upgrade - response to slave service upgrade. If Error is specified,
                 it was unsuccessful. -->


  <Message Type="M_SLAVE_STATUS">
    <Status>Off | Ready | Busy | Disabled | Upgrade</Status>
    [<SlaveInfo> ... </SlaveInfo>]
    [<Error>string</Error>]
  </Message>



  <!-- sent from the slave to the master as a response to M_TASK_INIT,
       M_TASK_CTRL or when the status of the task changed;
       the message is also sent from the master to the client (see above). -->

  <Message Type="M_TASK_STATUS">
    <Status>Refused | Started | Crashed | Ok | Moved</Status>
    ...
  </Message>



  <!-- task is finished, data contain the result; sent also
      when the slave is forced to go down and decides to submit the
      partial results to master through this message -->

  <Message Type="M_TASK_FINISH">
    ...
  </Message>

    (see above)



  <!-- the task wants to be moved to another slave because
       this slave will become disabled; in this case the message
       is sent instead of M_SLAVE_STATUS(Disabled) -->

  <Message Type="M_TASK_MOVE">
    <TaskID> ... </TaskID>
    <![CDATA[...anything...]]>
  </Message>

6.6 Participating in the Q²ADPZ development

//TODO: info for developers, CVS, etc.

7 Appendixes

7.1 Terminology

This section summarizes the most important terms used in the Q²ADPZ.

user - a real or fictive person that can use clients to start and control jobs and tasks; user name and password are used for authentication
administrator - a user with extended rights such as viewing logs, manually adjusting the system, etc. The developers are administrators
job - a collection of tasks that belong to the same user and have a common JobID. This allows the user to manipulate and monitor the computation at higher granularity than per-single-task. A client can create more than one job with the same name: these jobs are distinguished by different number (JobID consists of name and number, name is provided by client, number is automatically generated by master). The tasks that belong to each job can be generated on the fly as needed, the structure of the job doesn't have to be known in advance or when the job is started.
task - the smallest computational unit that is distributed to a single slave upon a request made by client to master (each task is requested separately). Each task must belong to some job. Task has an associated TaskID, which is unique only within its job. TaskID is generated by client.
user application - a distributed collaborative computing application written/provided by a user of the system. Consists of slave user program and client. Client is written using the standard client library.
standard client library - a library provided by developers that allows users to write client parts of their applications. This library provides function calls to communicate with master and slaves.
master computer - the computer where the master is running (most likely a dedicated computer with some UNIX/Linux)
slave computer - one of many computers where the distributed collaborative computation takes place, examples: a UNIX server, workstation, computer in a student PC LAB, etc.
client computer - any computer that the user uses to start his user application. It can be a notebook connected to the network using a dialup connection, a computer in the office, lab, etc.
master - process running at the master computer responsible for jobs-tasks-slaves accounting. The slaves talk to master when they join or leave the system, or receive or finish tasks; the clients talk to master when they start or control user jobs or tasks.
slave - process running on the slave computer as a daemon or WinNT service; communicates with the master (possibly also directly with a client) and starts the slave user process. Without a running slave, the slave computer cannot take part in collaborative computing.
slave user process - the actual program that performs the computation; it is compiled as a library and started as a separate thread/process by the slave on a slave computer
client - process running on a client computer; communicates with master to start and control jobs and tasks of a specific user, may also communicate directly with slaves; is responsible for scheduling the tasks of user jobs as required by particular user application (beyond the scope of scheduling done at master)
monitor - program obtaining status information about the progress of computation of job(s) in order to display this information to the user; runs on any computer that can connect to the master.
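
The relationship between job names and job numbers can be pictured with a small standalone sketch. The `JobId` struct and the `JobNumbering` class are hypothetical names invented for this illustration, and starting the numbering at 1 is an assumption; the real master may number jobs differently.

```cpp
#include <map>
#include <string>

// Hypothetical JobID: name chosen by the client, number chosen by the master.
struct JobId {
    std::string name;
    int number;
};

// Hypothetical master-side bookkeeping that hands out the next free
// number for each job name, so that two jobs with the same name still
// get distinct JobIDs.
class JobNumbering {
public:
    JobId create(const std::string &name) {
        // operator[] starts a new name at 0, so the first job gets number 1
        return JobId{name, ++last_[name]};
    }

private:
    std::map<std::string, int> last_;
};
```

Two calls to `create("Dumb")` yield the numbers 1 and 2 in this sketch, while a job with a different name starts again at 1.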

7.2 Documentation for XMLData class

The constructors and methods of the XMLData class can be used to construct, access, and modify XML elements, their attributes, and their contents (values or subelements). The class can also read and write XML documents from/to strings and streams. The interface is designed to be easy to use and learn.

1. Short introduction to data that XMLData class represents
2. General notes about XMLData
3. Creating XMLData objects
4. Accessing element values
5. Accessing subelements and their values
6. Accessing element attributes
7. Stream and string input and output
8. Other methods

1. Short introduction to data that XMLData class represents.

Objects of class XMLData represent XML elements. In the simplified XML used by Q²ADPZ, these are in principle of two types:

elements that contain some value, such as:
```
      <Action>Stop</Action>
```
...here the string "Stop" is a value of element with tag Action.
elements that contain subelements, such as:
```
      <JobID>
        <Name>Dumb</Name>
        <Number>3</Number>
      </JobID>
```
...here the element JobID has two subelements: Name and Number.

There can be more subelements with the same tags. In addition, XML elements can have attributes, such as:

      <Message Type="M_JOB_INIT">
        <Name>Dumb</Name>
      </Message>

...here the element Message has one attribute with name "Type" and value "M_JOB_INIT". The quotation marks are mandatory around the value and forbidden around the attribute name. There can be more attributes for one element (but they must have different names), for example:

      <Employee Type="Researcher" Position="Temporary">
         <Name>John</Name>
      </Employee>

In addition, some elements have no subelements or value. They can be expressed in a simplified syntax, for example:

      <Greeting Type="Hello"/>
or
      <Parallel/>

Which is equivalent to:

      <Greeting Type="Hello">
      </Greeting>
or
      <Parallel>
      </Parallel>

XMLData class will always generate the simplified syntax when there are no subelements or value.
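
The rule can be illustrated with a small standalone sketch. The `Element` struct and the `toXml` function below are hypothetical and are not part of Q²ADPZ; they only mimic how an element without a value and without subelements collapses to the short form.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an XML element; NOT the real XMLData class.
struct Element {
    std::string tag;
    std::string value;              // empty when the element has no value
    std::vector<Element> children;  // empty when there are no subelements
};

// Serializes an element on a single line. An element with neither a value
// nor subelements is written in the simplified <Tag/> form.
std::string toXml(const Element &e) {
    if (e.value.empty() && e.children.empty())
        return "<" + e.tag + "/>";
    std::string out = "<" + e.tag + ">" + e.value;
    for (const Element &child : e.children)
        out += toXml(child);
    return out + "</" + e.tag + ">";
}
```

Attributes and indentation are left out on purpose to keep the sketch short.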

The XML documents can contain comments, which start with '<!--' and end with '-->'. In this version, comments are allowed to appear only before an element, not after it, and not before or after its value content. For example:

      <JobID>
        <!--this is an OK comment-->
        <Name>Dumb</Name>
        <Number> <!--this is NOT OK comment--> 3 </Number>
        <!--this is also NOT OK comment-->
      </JobID>

Finally, element values may not contain the '<' character, because it would start another subelement. To overcome this problem, a special type of element is defined, which can contain any string terminated by ']]>' (so it cannot contain ']]>' itself). The element has the tag CDATA and starts with '<![CDATA['. For example:

      <![CDATA[This string contains also '<' character]]>

XMLData class treats CDATA as any other element. It is distinguished by its tag CDATA.
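
A minimal sketch of the CDATA rule, independent of XMLData, might look as follows. The helper name `wrapCData` is an assumption made for this example and does not exist in Q²ADPZ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: wraps arbitrary text in a CDATA section.
std::string wrapCData(const std::string &text) {
    // The terminator "]]>" ends the section, so it may not occur inside it.
    if (text.find("]]>") != std::string::npos)
        throw std::invalid_argument("CDATA content must not contain ]]>");
    return "<![CDATA[" + text + "]]>";
}
```

Text containing '<' passes through unchanged, while text containing ']]>' is rejected instead of producing a broken document.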

2. General notes about XMLData

The XMLData is a universal class that holds a single element and its value or a list of subelements, which are again objects of class XMLData. Thus XMLData can be part of a list of elements and contains a pointer to the next XMLData in this list. The element attributes are stored as a linked list of XMLAttrib objects, which are just pairs of strings: name and value.

XML elements are created by calling the XMLData constructors. Constructors and methods that take pointers to other XMLData objects or to XMLAttrib objects don't create copies of the objects passed in their arguments (with the exception of the copy constructor). Instead, they directly store the objects that the pointers refer to. That means you should always pass an object created with the new operator. For example, when inserting part of one message into another message, a copy has to be created with the copy constructor, i.e. by calling new XMLData(...), instead of passing the pointer to the XMLData object that is still part of the other message. On the other hand, methods that return pointers to XMLData objects return pointers to objects contained in the XMLData object without creating a copy (unless specified otherwise), so the returned objects don't have to be (and should not be) deallocated after they are used, and they should not be modified unless you intend to change the containing element.

Methods that don't store the passed XMLData objects (such as the searching methods) leave deallocation to the caller, so there is no need to create a copy for them. See the method descriptions in the header file for details.
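
The ownership rule can be illustrated with a standalone sketch. It does not use XMLData: the hypothetical `Item` type below relies on `std::unique_ptr`, a swapped-in technique (Q²ADPZ itself works with raw pointers and `new`), so that the transfer of ownership is visible in the types. It requires C++14 or later.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node; NOT the real XMLData class.
struct Item {
    std::string tag;
    std::vector<std::unique_ptr<Item>> subs;

    explicit Item(std::string t) : tag(std::move(t)) {}

    // Takes ownership of the child, like a method that stores its argument.
    void add(std::unique_ptr<Item> child) { subs.push_back(std::move(child)); }

    // Deep copy, playing the role of the copy constructor in the text.
    std::unique_ptr<Item> clone() const {
        auto copy = std::make_unique<Item>(tag);
        for (const auto &s : subs)
            copy->add(s->clone());
        return copy;
    }
};
```

To reuse a subelement of one message in another, the sketch clones it first, e.g. `reply.add(msg.subs[0]->clone());`, so that each tree owns its own nodes.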

XMLData and XMLAttrib classes don't operate on standard C strings. Instead, they use a specialized class CharStr, which is used to hold both static and dynamic strings. <YOU_CAN_SKIP_THIS> Reasons for this decision come from the poor memory-management situation of C++. XMLData has to hold strings that are allocated in dynamic memory when read from a stream and at the same time it has to hold strings that live in static memory. As a result, the destructor has no idea what to do with a string: deallocate or forget? There are only two possibilities. The first is to copy everything, so that there would be only dynamic strings. This is not a very nice option, because a lot of strings (such as "SlaveInfo", "Message", and a lot of contents of the elements) would appear in memory in many copies without any actual need. The second possibility is the specialized class, which remembers the type of the string and counts references to dynamic strings. </YOU_CAN_SKIP_THIS>.

Static strings are created this way:

#define p_s1 (&s1)
static CharStr s1("static string", STR_STATIC);

Dynamic strings are created this way:

CharStr *s2 = new CharStr("dynamic string");

To obtain a real string from CharStr, there is a public str member variable:

  cout << "s1: " << p_s1->str << ", s2: " << s2->str;

Thus there is only a little syntactic sugar to add compared to standard strings. Once a pointer to CharStr is passed to a method, that method takes care of it. That means, your program doesn't have to deallocate CharStr objects passed to XMLData methods. See the examples below. Those methods that don't store strings passed in their arguments take regular C strings.
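
To make the static/dynamic distinction more concrete, here is a standalone sketch of a string holder that remembers how its buffer was obtained. `TaggedStr` and all of its members are hypothetical names; the real CharStr class may be organized quite differently, so treat this only as an illustration of the idea.

```cpp
#include <cstring>

// Hypothetical sketch; NOT the real CharStr class.
class TaggedStr {
public:
    enum Kind { Static, Dynamic };

    // Static: keep the caller's pointer. Dynamic: make a private copy.
    TaggedStr(const char *s, Kind kind) : kind_(kind), refs_(1) {
        if (kind == Static) {
            str_ = s;
        } else {
            char *copy = new char[std::strlen(s) + 1];
            std::strcpy(copy, s);
            str_ = copy;
        }
    }

    // Only a dynamic buffer is freed; a static one is simply forgotten.
    ~TaggedStr() {
        if (kind_ == Dynamic) delete[] str_;
    }

    TaggedStr(const TaggedStr &) = delete;
    TaggedStr &operator=(const TaggedStr &) = delete;

    const char *str() const { return str_; }
    bool ownsBuffer() const { return kind_ == Dynamic; }

    // Reference counting: the holder that sees release() return true
    // is the last user and may delete the object.
    void addRef() { ++refs_; }
    bool release() { return --refs_ == 0; }

private:
    const char *str_;
    Kind kind_;
    int refs_;
};
```

Because the kind is stored with the string, the destructor always knows whether to deallocate, which is exactly the question a plain `char *` cannot answer.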

For convenience, you can forget about CharStr when passing strings to all methods and constructors. Instead of creating a dynamic string, e.g. x->set(new CharStr("new contents for element x")); you can just call x->set("new contents..."); However, the methods that return strings, return CharStr, so you still have to append '->str' suffix.

NULL is not used in XMLData and XMLAttrib classes. Instead XMLData::Nil and XMLAttrib::Nil indicate that there are no subelements/attributes, or that there is no next subelement/attribute. As a result, using statements like data->sub("NonExistingSubcomponent")->getString(); will not cause the program to crash. Instead, this call will return CharStr::Error (unless there is a subcomponent called NonExistingSubcomponent).
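
The sentinel idea can be shown in miniature. The `Node` type below is a hypothetical stand-in written for this example, and the "<error>" marker is an assumption that plays the role of CharStr::Error; none of it is taken from the Q²ADPZ sources.

```cpp
#include <string>

// Hypothetical miniature of the sentinel idea; NOT the real XMLData class.
struct Node {
    std::string tag;
    std::string value;
    Node *firstChild;
    Node *nextSibling;

    // Shared "nothing here" object, used in place of a null pointer.
    static Node Nil;

    Node(const std::string &t, const std::string &v)
        : tag(t), value(v), firstChild(&Nil), nextSibling(&Nil) {}

    // Returns the first child with the given tag, or &Nil if there is none.
    Node *sub(const std::string &wanted) {
        for (Node *c = firstChild; c != &Nil; c = c->nextSibling)
            if (c->tag == wanted) return c;
        return &Nil;
    }
};

// The sentinel has no children, so calling sub() on it is always safe.
Node Node::Nil("", "<error>");
```

A chained lookup such as `root.sub("Missing")->sub("Deeper")->value` never dereferences a null pointer in this sketch; it simply ends up at the sentinel and yields the error marker.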

A complementary material to this tutorial is the xmltest.cpp program, which tests the functionality of the XMLData class.

3. Creating XMLData objects

XMLData class provides several constructors, which create objects representing XML elements. Let's start with examples (see also below for a different method of creating the same XML elements; also note that instead of using dynamic strings, it is better to use static strings defined somewhere else, see src/common/messages.h, src/common/messages.cpp):

To create this	Use this code fragment
`<Parallel/>`	`XMLData *parallel = new XMLData("Parallel", XMLData::Nil);`
`<Action>Stop</Action>`	`XMLData *action = new XMLData("Action", "Stop");`
`<JobID> <Name>Dumb</Name> <Number>3</Number> </JobID>`	`XMLData *jobID = new XMLData("JobID", new XMLData("Name", "Dumb", new XMLData("Number", 3)));`
`<Message Type="M_JOB_INIT"> <Name>Dumb</Name> </Message>`	`XMLData *msg = new XMLData("Message", new XMLData("Name", "Dumb"), XMLData::Nil, new XMLAttrib("Type", "M_JOB_INIT"));`
`<Message Type="M_SLAVE_AVAIL"> <JobID> <Name>Dumb</Name> <Number>3</Number> </JobID> <Number>12</Number> <ReserveID>4015</ReserveID> <SlaveInfo> <OS>Win32</OS> <CPU>PIII/500</CPU> <Memory Unit="MB">50</Memory> <Disk Unit="MB">4</Disk> <IP>158.195.16.40</IP> </SlaveInfo> </Message>`	`XMLData *msg = new XMLData("Message", new XMLData("JobID", new XMLData("Name", "Dumb", new XMLData("Number", 3)), new XMLData("Number", 12, new XMLData("ReserveID", 4015, new XMLData("SlaveInfo", new XMLData("OS", "Win32", new XMLData("CPU", "PIII/500", new XMLData("Memory", 50, new XMLData("Disk", 4, new XMLData("IP", "158.195.16.40"), new XMLAttrib("Unit", "MB")), new XMLAttrib("Unit", "MB")))))))), XMLData::Nil, new XMLAttrib("Type", "M_SLAVE_AVAIL"));`

Note: XMLData::Nil indicates that there are no more subelements. It is used either when the attributes argument is passed, or when creating a simple element without any value or subelements.

The following constructors with some optional arguments can be used:

  //constructs element with given string content and attributes
  //(e.g. John)
  XMLData(CharStr *tag, CharStr *strValue, XMLData *next = XMLData::Nil, XMLAttrib *attributes = XMLAttrib::Nil);
  //constructs element with given double content and attributes
  //(e.g. 14.5)
  XMLData(CharStr *tag, double doubleValue, XMLData *next = XMLData::Nil, XMLAttrib *attributes = XMLAttrib::Nil);
  //constructs element with given attributes containing subelements
  //(e.g. ......)
  XMLData(CharStr *tag, XMLData *subElements, XMLData *next = XMLData::Nil, XMLAttrib *attributes = XMLAttrib::Nil);

  //sort of copy constructor  - copies substructures and also all elements that follow
  XMLData(XMLData *other);

tag is a string containing the label of the element
strValue or doubleValue specify the value for elements with value
alternately, subElements can contain a list of subelements of this element (i.e. pointer to the first element in this list)
optional next is a pointer to the element that follows after this element or XMLData::Nil, if it is a single element or last element in the list

optional attributes argument can provide a list of attributes constructed with one of the constructors for XMLAttrib class:

     //constructs attribute with string value, string should not
     //contain quotation marks
     XMLAttrib(CharStr *name, CharStr *strValue, XMLAttrib *next = XMLAttrib::Nil);
     //constructs attribute with double value (converts it internally to string)
     XMLAttrib(CharStr *name, double doubleValue, XMLAttrib *next = XMLAttrib::Nil);
     //sort of copy constructor
     XMLAttrib(XMLAttrib *other);

Finally, one can use the constructor that creates XMLData object directly from a string, such as:

     XMLData *action = new XMLData("<Action>Stop</Action>");

However, this is a little bit less efficient, because the string has to be parsed. See also 7. Stream and string input and output.

It can be useful to see if the constructor really constructed what it should just by printing out the XMLData object:

XMLData *msg = new XMLData("Message", new XMLData("Name", "Dumb"),
                                      XMLData::Nil,
                                      new XMLAttrib("Type", "M_JOB_INIT"));

cout << *msg;

Remember to deallocate the objects you create with delete, such as:

XMLData *jobID = new XMLData("JobID",
                                new XMLData("Name", "Dumb",
                                new XMLData("Number", 3)));
// ...use the jobID here...

delete jobID;

Another way to create larger XML elements is to use the add(), sub(), set(), and setAttrib() methods (described below). The last element from the above table of examples can be created with this sequence:

	XMLData *msg = new XMLData("Message", XMLData::Nil);
	msg->setAttrib("Type", "M_SLAVE_AVAIL");
	msg->add(new XMLData("JobID", XMLData::Nil));
	msg->sub()->add(new XMLData("Name", "Dumb"));
	msg->sub()->add(new XMLData("Number", 3));
	msg->add(new XMLData("Number", 12));
	msg->add(new XMLData("ReserveID", 4015));
	XMLData *slaveInfo = new XMLData("SlaveInfo", XMLData::Nil);
	slaveInfo->add(new XMLData("OS", "Win32"));
	slaveInfo->add(new XMLData("CPU", "PIII/500"));
	slaveInfo->add(new XMLData("Memory", 50, XMLData::Nil, new XMLAttrib("Unit", "MB")));
	slaveInfo->add(new XMLData("Disk", 4, XMLData::Nil, new XMLAttrib("Unit", "MB")));
	slaveInfo->add(new XMLData("IP", "158.195.16.40"));
	msg->add(slaveInfo);

4. Accessing element values

The following methods return the value (contents) of the XMLData object:

  CharStr *getString();
  long getLong();
  double getDouble();

The following example shows the use:

  XMLData *action = new XMLData("Action", "Stop");    // constructs <Action>Stop</Action>
  cout << action->getString()->str << '\n';  // prints the string "Stop" without quotation marks

  XMLData *num = new XMLData("Number", 3);
  cout << num->getLong() << '\n';            // prints 3

It is possible to modify the value of the element with the following methods:

  void set(CharStr *newContents);
  void set(double newContents);

For example:

  // construct <Action/>
  action = new XMLData("Action", XMLData::Nil);

  // modify it to <Action>Stop</Action>
  action->set("Stop");

  // modify it to <Action>4</Action>
  action->set(4);

5. Accessing subelements and their values

Subelements can be accessed either directly or with help of subelement iterator. Subelement iterator is a pointer associated with every XMLData object. It always points to one of the subelements (if there are any). It is initialized to point to the first subelement.

First of all, similarly to the previous set() methods, there is a method that sets the subelement contents of the XMLData object:

  //sets the subelementlist of this element to a given list
  void set(XMLData *newContents);

The newContents argument can either specify a single XMLData object, or a linked list of XMLData objects.

To retrieve the subelements contents of some element, you can call method sub() with no arguments, which returns the subelement where the iterator is pointing. Since the iterator is pointing to the first element before you use it, sub() returns the pointer to the list of all subelements. If you are not sure whether the iterator is at the beginning, simply call reset() first (see below):

  //returns the subelement pointed by the iterator. If there are no elements, returns
  //XMLData::Nil
  XMLData *sub();

You might need to obtain the number of subelements contained in an XML element:

  //returns the number of subelements
  int subCount();

To retrieve a single subelement structure, the following method is recommended:

  //retrieves the first (or (skip+1)-th) subelement with a given tag
  //if the tag is the same as in the last call to sub() (please note that the pointer
  //has to be the same, not just strings equal), the method will start searching from
  //the element that follows the current iterator location,
  //otherwise it will start from the first subelement.
  //if the iterator is recursive, this will search recursively
  //the iterator will point to the located element.
  //If no element is found, XMLData::Nil is returned and iterator position is not changed.
  XMLData *sub(char *tag, int skip = 0);

For example if msg points to an XMLData object containing the following structure:

<JobId>
  <Name>Dumb</Name>
  <Number>3</Number>
</JobId>

The Number subelement can be retrieved and its contents printed with the following call:

 cout << msg->sub("Number")->getLong();

Similarly, it is possible to modify the value of some subelement:

 msg->sub("Number")->set(5);  // will change the value of Number subelement to 5

If there are several subelements with the same tag, the skip argument can be used. For example for the following XML element:

<TaskInfo>
  <Library>
    <OS>Win32</OS>
    <URL>http://www.microsoft.com/qadpz.exe</URL>
  </Library>
  <Library>
    <OS>Linux</OS>
    <URL>http://www.linux.org/qadpz.so</URL>
  </Library>
</TaskInfo>

We can obtain the second library subelement like this:

 cout << msg->sub("Library", 1)->sub("OS")->getString()->str; // will print "Linux"

We can also obtain the second URL subelement directly if we search recursively. For that, the iterator has to be reset as recursive:

  msg->reset(0, IT_RECURSIVE);
  cout << msg->sub("URL", 1)->getString()->str;  // skip = 1 selects the second URL

To add a new subelement, the following method can be used:

  //adds given subelement list at the end of the subelement list of this element
  void add(XMLData *newContents);

For example, it is possible to add new library to the previous example structure like this:

 msg->add(new XMLData("Library",
                        new XMLData("OS", "Solaris",
                        new XMLData("URL", "http://www.sun.com/qadpz.so"))));

To remove some subelement, use the following method:

  //removes single subelement specified by its tag, the search starts at
  //the first subelement of this element. If the iterator is recursive,
  //the search is recursive
  //iterator is reinitialized to point at the first element before returning
  void remove(char *tag);

For example to remove the Win32 library from the above example (if we know that it is the first library subelement), we can use the following:

  msg->remove("Library");

It is possible to search for subelements by specifying more details. For example, this will locate the Linux library in the above example:

  XMLData *searchLinux = new XMLData("Library", new XMLData("OS", "Linux"));
  XMLData *linuxLib = msg->sub(searchLinux);
  delete searchLinux;

  cout << linuxLib->sub("OS")->getString()->str;  // will print "Linux"
  cout << linuxLib->sub("URL")->getString()->str; // will print "http://www.linux.org/qadpz.so"

Similarly, it is possible to delete a subelement structure by specifying more details:

  XMLData *searchWin32 = new XMLData("Library", new XMLData("OS", "Win32"));
  msg->remove(searchWin32);		// will delete the whole Win32 library subelement
  delete searchWin32;

The following variants of methods were used:

  //advanced version of sub()
  //retrieves the first (or (skip+1)-th) subelement which matches a subelement described by
  //the match argument. Match argument may describe one subelement with possible attributes and
  //subelements embedded inside of this element. The located subelement will contain all the
  //specified attributes with specified values and will contain all specified subelements,
  //sub-subelements, etc. but may contain other subelements, which are not specified in the
  //match structure. However, the order in which the subelements appear in match must be the same.
  //If you want an exact match, set the exact argument to 1.
  //If the contents contain numbers, they have to be in the same format to match.
  //if the iterator is recursive, this will search recursively
  //the iterator will point to the located element.
  //If no element is found, XMLData::Nil is returned and iterator position is not changed.
  //the match argument will not be deallocated, so make sure you deallocate it yourself
  //(i.e. in this case don't use new XMLData(...) just as an argument)
  //this starts to search from the first subelement, unless the match is the same pointer
  //as when the sub() with match was called last time. In the latter case, the search starts
  //with the element that follows after the iterator.  
  XMLData *sub(XMLData *match, int skip = 0, int exact = 0);

  //advanced version of remove, see description of advanced sub()
  void remove(XMLData *match, int exact = 0);
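
The matching rule of the advanced sub() can be sketched without the XMLData class. The `Elem` struct and the `matches` and `findSub` functions below are hypothetical; they cover only the default, non-exact mode and compare values as plain strings.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical element type for illustration; NOT the real XMLData class.
struct Elem {
    std::string tag;
    std::string value;  // empty means "no value given"
    std::map<std::string, std::string> attrs;
    std::vector<Elem> subs;
};

// True when `candidate` satisfies the pattern `match`: same tag, same value
// if the pattern has one, every pattern attribute present with the same
// value, and every pattern subelement matched by some candidate subelement
// in the same relative order. Extra content in the candidate is allowed.
bool matches(const Elem &candidate, const Elem &match) {
    if (candidate.tag != match.tag) return false;
    if (!match.value.empty() && candidate.value != match.value) return false;
    for (const auto &a : match.attrs) {
        auto it = candidate.attrs.find(a.first);
        if (it == candidate.attrs.end() || it->second != a.second) return false;
    }
    std::size_t pos = 0;  // next candidate subelement to try
    for (const Elem &want : match.subs) {
        while (pos < candidate.subs.size() && !matches(candidate.subs[pos], want))
            ++pos;
        if (pos == candidate.subs.size()) return false;
        ++pos;  // later pattern subelements must match later candidates
    }
    return true;
}

// Index of the first subelement of `parent` satisfying `match`, or -1.
int findSub(const Elem &parent, const Elem &match) {
    for (std::size_t i = 0; i < parent.subs.size(); ++i)
        if (matches(parent.subs[i], match)) return static_cast<int>(i);
    return -1;
}
```

With the TaskInfo example above, a pattern of a Library containing an OS with the value "Linux" selects the second Library (index 1), while a pattern that lists URL before OS finds nothing because the order differs.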

It is also possible to retrieve or remove all subelements of certain type:

  //retrieves all elements with a given tag, returns a newly constructed element list - completely copied
  XMLData *subAll(char *tag);

  //removes all subelements with this tag
  void removeAll(char *tag);

For some operations, it might be necessary to iterate through subelements with subelement iterator. The following methods provide this functionality:

  //sets the iterator to point to the first subelement or subelement with a given index
  //if recursive is set to 1, the iterator will recursively enter all subelements of the
  //elements in the subsequent calls to other methods
  //However, the index is relative to the list of subelements on the top level in this call.
  //Note: resetting the iterator will reset the iterator of all subcomponents.
  //reset returns this so that it can be called in a sequence of commands, such as:
  //x->reset()->sub("SlaveInfo");
  XMLData *reset(int index=0, int recursive = 0);

  //moves the subelement iterator to the next subelement and returns a pointer to it. If iterator
  //is recursive, the recursive step is performed. Returns XMLData::Nil if there are no more elements.
  XMLData *next();

  //returns non-zero if the current subelement is not the last one (i.e. the next call to next()
  //would not return XMLData::Nil). Doesn't move the iterator.
  int more();

Then it is possible to insert a new subelement at a certain location with the help of the following method:

  //inserts subelement(s) after the subelement pointed by subelement iterator
  //if the iterator is recursive, the subelement(s) is(are) inserted at the current level
  //of the iterator
  //if there are no subelements, the new subelement(s) is(are) simply added.
  //the argument will become part of this object and will be deallocated when the object is
  //destroyed, use new XMLData(...) as an argument.
  //if you specify the second argument ontop as nonzero (e.g. INSERT_TOP), the new element
  // will be inserted as the first subelement and iterator will not be used.
  void insert(XMLData *subElement);

For example, if we want to insert the <Parallel/> and GroupKillTimeout subelements into the following structure, we can do it this way:

  <Message Type="M_SLAVE_RESERVE">
    <Number>15</Number>
    <JobID> ... </JobID>
    <SlaveInfo> ... </SlaveInfo>
  </Message>

  // assume msg already contains pointer to XMLData object with above element

  msg->sub("Number");
  msg->insert(new XMLData("Parallel", XMLData::Nil,
              new XMLData("GroupKillTimeout", 120)));

  // the resulting structure will be as follows:

  <Message Type="M_SLAVE_RESERVE">
    <Number>15</Number>
    <Parallel/>
    <GroupKillTimeout>120</GroupKillTimeout>
    <JobID> ... </JobID>
    <SlaveInfo> ... </SlaveInfo>
  </Message>

Not all the possible uses of the methods were covered here, for more details, consult the xmldata.h file.

6. Accessing element attributes

The element attributes can be accessed either directly or using an attribute iterator. The number of attributes is returned by attribCount():

  //returns the number of attributes of this element
  int attribCount();

To retrieve a single attribute of an element, use the following method:

   //retrieves the value of a given attribute,
  //If such attribute doesn't exist, XMLAttrib::Error is returned
  CharStr *getAttrib(char *attrName);

For example, this call will retrieve the type of a message:

/*  assume msg points to the following message:

  <Message Type="M_SLAVE_RESERVE">
    <Number>15</Number>
    <JobID> ... </JobID>
    <SlaveInfo> ... </SlaveInfo>
  </Message>
*/

  cout << msg->getAttrib("Type")->str;   // will print "M_SLAVE_RESERVE"

To set the value of an attribute, one of the following methods should be used:

  //sets value of single attribute, if such attribute doesn't exist, it is created,
  //and the optional index can be either -1 (default),
  //then the attribute is appended, if index is 0, it is inserted as the first one,
  //otherwise it is inserted after the attribute on index-th position.
  //If index is larger than the number of attributes, the attribute is just appended
  //If attribute with attrName already exists, index is ignored, attribute is set
  // and the attrName CharStr object is deallocated!
  void setAttrib(CharStr *attrName, CharStr *attrValue, int index = -1);

  //for convenience, this takes numeric value and converts it to string
  void setAttrib(CharStr *attrName, double attrValue, int index = -1);

For example to set the type of the message, the following can be used:

  msg->setAttrib("Type", "M_JOB_INIT");
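
The placement rules for the index argument can be traced with a standalone sketch. `AttrList` and `setAttribSketch` are hypothetical names, and reading "after the attribute on index-th position" as 1-based counting is an assumption of this example, not something confirmed by the Q²ADPZ sources.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical attribute list: ordered (name, value) pairs.
using AttrList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper mirroring the placement rules described above.
void setAttribSketch(AttrList &attrs, const std::string &name,
                     const std::string &value, int index = -1) {
    for (auto &a : attrs) {
        if (a.first == name) {  // existing attribute: index is ignored
            a.second = value;
            return;
        }
    }
    if (index < 0 || static_cast<std::size_t>(index) >= attrs.size()) {
        // default (-1) or index beyond the end: append
        attrs.push_back(std::make_pair(name, value));
    } else {
        // 0 inserts as the first attribute, k inserts after the k-th one
        attrs.insert(attrs.begin() + index, std::make_pair(name, value));
    }
}
```

Setting an attribute that already exists only changes its value, so repeated calls never create duplicate attribute names.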

It is possible to remove attribute(s) from the element with the following methods:

  //removes single attribute specified by its name
  void removeAttrib(char *attrName);

  //removes all attributes
  void removeAttribs();

7. Stream and string input and output

It is possible to construct an XMLData object by reading the element or list of elements from an input stream either by calling constructor:

  //reads ONE element ... from the input stream. If the element has subelements,
  //all the subelement objects are constructed and dynamically allocated.
  XMLData(istream &s);

or using an overloaded >> operator.

Similarly, it is possible to print the XMLData object to output stream either with an overloaded << operator or using the method:

  //outputs the text representation of this component and following
  //elements to the output stream you can specify indent, if you don't
  // want to print from the first column
  //set following to 0 in order to disable printing of the following
  //elements. By default, the whole list is printed.
  void print(ostream &s, int indent = 0, int following = 1);

Analogously, the input can be taken directly from a string:

  //constructs the objects from string the same way as if read from a stream
  XMLData(char *input);

or printed to a string:

  //outputs the text representation of this component and following elements as string
  void print(char *buffer, int maxlen, int indent = 0, int following = 1);

8. Other methods

It is also possible to access or modify the tag of the element:

   //returns element type
  // (returns CharStr::Error if called on XMLData::Nil)  
  CharStr *tag();

  //change/set element type
  void setTag(CharStr *tag);

7.3 Documentation for PostOffice class

The PostOffice class provides a mechanism for bidirectional communication of processes running on remote (or possibly also local) machines that are interconnected with a network running TCP/IP protocol (such as machines connected to the Internet). The communication consists of sending and receiving XMLData objects. A single PostOffice object can be used by several threads of a process owning the PostOffice object at the same time.

To use the PostOffice, the object has to be constructed. It is possible to specify the socket port at which the messages will be received:

    // construct PostOffice, incoming packets will be expected
    // at specified port, 0 means use default port
    // n specifies how many ports should be used for sending/receiving
    // the ports are used in a random manner. The application can
    // get the port numbers by calling getlocalPort()
    PostOffice(int port=0, int n = 1, int incremental = UDP_PORT_ITERATE);

On the sending side, a thread can send an XMLData object to a remote process by calling the following method:

    //sends XMLData to a remote, returns 1 only after the remote
    //successfully received and confirmed the data, returns 0, if
    //there was an error while sending.
    //(there is a small chance in case of network congestion
    // that the peer received the data but the confirmation
    // did not come back)
    //if remote is not specified, the default remote is used
    int send(XMLData *data, Address *remote = NULL);

Where data is the object to send and remote is the address (IP address and port) of the remote post office. This method returns only after a receipt confirmation from receiver has been returned. If the confirmation is not received after several retry attempts with timeouts, the method returns 0.
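
The send-and-confirm behaviour can be sketched as a simple retry loop. Everything in the block below is an assumption made for illustration: the function name, the two callbacks, and the default retry count and timeout are hypothetical and are not taken from the PostOffice implementation.

```cpp
#include <functional>

// Hypothetical sketch of "send, wait for confirmation, retry".
// transmit   - sends the message once (assumed to be fire-and-forget)
// waitForAck - waits up to timeoutMs and reports whether a confirmation
//              arrived
// Returns 1 once a confirmation is seen, 0 after all attempts failed.
int sendWithRetries(const std::function<void()> &transmit,
                    const std::function<bool(int)> &waitForAck,
                    int maxAttempts = 3, int timeoutMs = 500) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        transmit();
        if (waitForAck(timeoutMs)) return 1;  // receipt confirmed
    }
    return 0;  // no confirmation after all attempts
}
```

The sketch also shows why a return value of 0 is not proof that nothing arrived: if only the confirmation is lost, the receiver already has the data although the sender gives up.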

On the receiving side, a thread can receive a message in two ways: blocking and non-blocking. The blocking version of receive blocks the calling thread until a new message is received. The non-blocking version returns immediately regardless of whether a message was received. All received messages are queued in the PostOffice until some thread collects them with one of the receive calls. In addition, the calling thread can either request a message from a particular Address, or it can receive any message.

    //receives the next XMLData received from a remote. Returns
    //newly constructed object. If there is no data in the queue,
    //waits until some data arrives.
    //if remote is not specified, the default remote is used
    XMLData *receive(Address *remote = NULL);

    //non-blocking version of receive - returns XMLData::Nil if there
    //is no XMLData from remote, in the queue. Otherwise returns a new
    //XMLData with the data from a remote which are on top of the queue.
    //if remote is not specified, the default remote is used
    XMLData *receiveN(Address *remote = NULL);


    //receives the next XMLData object from any remote and stores
    //the source address in remote. If there is no data in the queue,
    //waits until some data arrives.
    XMLData *receive_any(Address &remote);

    //non-blocking version of receive_any - returns XMLData::Nil, if
    //there is no XMLData in the queue. Otherwise returns the first
    //XMLData object from the top of the queue. Please note that
    //the order of messages doesn't have to be the same
    XMLData *receive_anyN(Address &remote);
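
The difference between the blocking and the non-blocking calls can be sketched with a small thread-safe queue. The `Inbox` class is hypothetical, uses `std::string` in place of XMLData objects, and ignores remote addresses; it is not how PostOffice is implemented, only an illustration of the two calling styles.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical message queue; std::string stands in for XMLData objects.
class Inbox {
public:
    // Called by the receiving machinery when a message has arrived.
    void deliver(const std::string &msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(msg);
        }
        cv_.notify_one();
    }

    // Blocking receive: waits until the queue is non-empty.
    std::string receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string msg = q_.front();
        q_.pop_front();
        return msg;
    }

    // Non-blocking receive: returns false at once if nothing is queued.
    bool receiveN(std::string &out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
};
```

A worker thread that has nothing else to do would call `receive()`, while a thread that also performs computation would poll with `receiveN()` between work steps.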

Finally, if the PostOffice is used to communicate with only a single remote process (or at least most of the time), it is possible to set a default Address by calling the method:

    //sets the frequently used (default) remote address.
    int setRemote (Address &remote);

Then the remote argument to send(), receive() and receiveN() may be omitted and the default remote address will be used.

The following is an example application that uses PostOffice to send and receive XMLData objects:

pofficetest.cpp

7.4 Documentation for Crypter class

//TODO: document crypter

7.5 Credits and licensing issues

//TODO: GPL

src/	the source code for all the components, contains `Makefile.base` and the main `Makefile`.
bin/	the compiled binaries are placed here; this directory contains also sample project files (*.xml), public and private keys (`pubkey`, `privkey`) and crypted list of users `users.txt`.
doc/	all Q²ADPZ documentation
sample/	sources of sample applications (dumb - library-type application; simple - executable/interpreted type of application)
include/	include files from curl and openssl which are required for compilation; please set the `CFLAGS, CFLAGS_CURL, CFLAGS_OPENSSL` variables in `Makefile.base` based on your installation of the libraries
