.. -*- mode: rst -*-

.. _architecture:

===========================
Detailed Bcfg2 Architecture
===========================

Bcfg2 is based on a client-server architecture. The client is responsible
for interpreting (but not processing) the configuration served by the
server. This configuration is literal, so no local processing is required.
After completion of the configuration process, the client uploads a set
of statistics to the server. This section describes the goals of Bcfg2 and
then the architecture motivated by them.

Goals
=====

* **Model configurations using declarative semantics.**

  Declarative semantics maximize the utility of configuration management
  tools; they provide the most flexibility for the tool to determine
  the right course of action in any given situation. This means that
  users can focus on the task of describing the desired configuration,
  while leaving the task of transitioning client states to the tool.

* **Configuration descriptions should be comprehensive.**

  This means that configurations served to the client should be sufficient
  to reproduce all desired functionality. This assumption allows the
  use of heuristics to detect extra configuration, aiding in reliable,
  comprehensive configuration definitions.

* **Provide a flexible approach to user interactions.**

  Most configuration management systems take a rigid approach to user
  interactions; that is, either the client system is always correct,
  or the central system is. This means that users are forced into an
  overly prescriptive model where the system asserts where correct data
  is. Configuration data modification is frequently undertaken on both
  the configuration server and clients. Hence, the existence of a single
  canonical data location can easily pose a problem during normal tool
  use. Bcfg2 takes a different approach.

  The default assumption is that data on the server is correct; however,
  the client has the option to run in another mode where local changes are
  catalogued for server-side integration. If the Bcfg2 client is run in
  dry-run mode, it can help to reconcile differences between current client
  state and the configuration described on the server. The Bcfg2 client
  also searches for extra configuration; that is, configuration that is
  not specified by the configuration description. When extra configuration
  is found, either configuration has been removed from the configuration
  description on the server, or manual configuration has occurred on the
  client. Options related to two-way verification and removal are useful
  for configuration reconciliation when interactive access is used.

* Plugins and administrative applications.
* Incremental operations.

The Bcfg2 Client
================

The Bcfg2 client performs all client configuration or reconfiguration
operations. It renders a declarative configuration specification, provided
by the Bcfg2 server, into a set of configuration operations which will,
if executed, attempt to change the client's state into that described by
the configuration specification. Conceptually, the Bcfg2 client serves to
isolate the Bcfg2 server and specification from the imperative operations
required to implement configuration changes.

This isolation allows declarative specifications to be manipulated
symbolically on the server, without needing to understand the properties
of the underlying system tools. In this way, the Bcfg2 client acts
as a sort of expert system that *knows* how to implement declarative
configuration changes.

The operation of the Bcfg2 client is intended to be as simple as
possible. The normal configuration process consists of four main steps:

* **Probe Execution**

  During the probe execution stage, the client connects to the server
  and downloads a series of probes to execute. These probes reveal
  local facts to the Bcfg2 server. For example, a probe could discover
  the type of video card in a system. The Bcfg2 client returns this
  data to the server, where it can influence the client configuration
  generation process.

* **Configuration Download and Inventory**

  The Bcfg2 client now downloads a configuration specification from the
  Bcfg2 server. The configuration describes the complete target state
  of the machine. That is, all aspects of client configuration should
  be represented in this specification. For example, all software
  packages and services should be represented in the configuration
  specification. The client now performs a local system inventory.
  This process consists of verifying each entry present in the
  configuration specification. After this check is completed, heuristic
  checks are executed for configuration not included in the configuration
  specification. We refer to this inventory process as 2-way verification,
  as first we verify that the client contains all configuration that
  is included in the specification, then we check if the client has
  any extra configuration that isn't present. This provides a fairly
  rigorous notion of client configuration congruence. Once the 2-way
  verification process has been performed, the client has built a list of
  all configuration entries that are out of spec. This list has two parts:
  specified configuration that is incorrect (or missing) and unspecified
  configuration that should be removed.

* **Configuration Update**

  The client now attempts to update its configuration to match the
  specification. Depending on options, changes may be performed only
  partially or not at all. First, if extra configuration correction is
  enabled, extra configuration can be removed. Then the remaining changes
  are processed. The Bcfg2 client loops while progress is made in the
  correction of these incorrect configuration entries. This loop results
  in the client being able to accomplish all it will be able to during
  one execution. Once all entries are fixed, or no progress is being
  made, the loop terminates. Once all configuration changes that can be
  performed have been, bundle dependencies are handled. Bundle groupings
  result in two different behaviors. Contained entries are assumed
  to be inter-dependent. To address this, the client re-verifies each
  entry in any bundle containing an updated configuration entry. Also,
  services contained in modified bundles are restarted.

* **Statistics Upload**

  Once the reconfiguration process has concluded, the client reports
  information back to the server about the actions it performed during the
  reconfiguration process. Statistics function as a detailed return code
  from the client. The server stores statistics information. This
  statistics update includes (but is not limited to):

  * Overall client status (clean/dirty)
  * List of modified configuration entries
  * List of uncorrectable configuration entries
  * List of unmanaged configuration entries
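
The inventory, update, and statistics steps above can be sketched in a few
lines of Python. This is only an illustration under simplifying assumptions,
not the Bcfg2 client's actual code: the specification and the client state
are modelled as plain dictionaries, and ``ClientState``, ``inventory``,
``update``, and the ``can_fix`` callback are hypothetical names. Bundle
re-verification and service restarts are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    """Hypothetical stand-in for the real system: entry name -> current value."""
    installed: dict = field(default_factory=dict)


def inventory(spec, state):
    """2-way verification: return (incorrect, extra) lists of entry names."""
    # Pass 1: specified entries that are missing or hold the wrong value.
    incorrect = [name for name, want in spec.items()
                 if state.installed.get(name) != want]
    # Pass 2: entries found on the client that the specification omits.
    extra = [name for name in state.installed if name not in spec]
    return incorrect, extra


def update(spec, state, can_fix, remove_extra=False):
    """Correct entries while progress is made; return a statistics dict."""
    incorrect, extra = inventory(spec, state)
    modified = []
    if remove_extra:
        for name in extra:
            del state.installed[name]
            modified.append(name)
        extra = []
    progress = True
    while incorrect and progress:
        progress = False
        for name in list(incorrect):
            # can_fix is an assumed hook: may this entry be corrected yet?
            if can_fix(name, state):
                state.installed[name] = spec[name]
                incorrect.remove(name)
                modified.append(name)
                progress = True
    return {
        "clean": not incorrect and not extra,
        "modified": modified,
        "uncorrectable": incorrect,
        "unmanaged": extra,
    }


def service_needs_package(name, state):
    """Example rule: the sshd service can be fixed only after its package."""
    if name == "Service:sshd":
        return "Package:ssh" in state.installed
    return True


spec = {"Service:sshd": "on", "Package:ssh": "1.0"}
state = ClientState({"Package:telnet": "0.9"})
stats = update(spec, state, service_needs_package)
```

In this example the service cannot be corrected on the first pass because
its package is not yet installed; the second pass of the loop fixes it. The
unspecified ``Package:telnet`` entry is reported as unmanaged because extra
configuration removal was not requested.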

Architecture Abstraction
------------------------

The Bcfg2 client internally supports the administrative tools available
on different architectures. For example, ``rpm`` and ``apt-get`` are
both supported, allowing operation on Debian, Redhat, SUSE, and Mandriva
systems. The client toolset is determined based on the availability of
client tools. The client includes a series of libraries which describe
how to interact with the system tools on a particular platform.

Three of these libraries exist. There is a base set of functions, which
contains definitions describing how to perform POSIX operations. Support
for configuration files, directories, symlinks, hardlinks, etc., is
included here. Two other libraries subclass this one, providing support
for Debian and rpm-based systems.

The Debian toolset includes support for apt-get and update-rc.d. These
tools provide the ability to install and remove packages, and to install
and remove services.

The Redhat toolset includes support for rpm and chkconfig. Any other
platform that uses these tools can also use this toolset. Hence, all
of the other familiar rpm-based distributions can use this toolset
without issue.

Other platforms can easily use the POSIX toolset, ignoring support for
packages or services. Alternatively, adding support for new toolsets
isn't difficult. Each toolset consists of about 125 lines of Python code.
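
The toolset layering described above might look roughly like the following
sketch. The class and method names are hypothetical and do not reflect the
real Bcfg2 client libraries; the methods only build command lines rather
than running them, and the exact arguments shown are illustrative
assumptions. Toolset selection is based on which tools are found on the
``PATH``.

```python
import shutil


class POSIXToolset:
    """Base toolset: file-level operations only, no packages or services."""
    required_tools = ()

    def install_package_cmd(self, name):
        raise NotImplementedError("no package support in the POSIX toolset")

    def install_service_cmd(self, name):
        raise NotImplementedError("no service support in the POSIX toolset")


class DebianToolset(POSIXToolset):
    """Adds package and service handling via apt-get and update-rc.d."""
    required_tools = ("apt-get", "update-rc.d")

    def install_package_cmd(self, name):
        return ["apt-get", "install", "-y", name]

    def install_service_cmd(self, name):
        return ["update-rc.d", name, "defaults"]


class RedhatToolset(POSIXToolset):
    """Adds package and service handling via rpm and chkconfig."""
    required_tools = ("rpm", "chkconfig")

    def install_package_cmd(self, package_file):
        return ["rpm", "-U", package_file]

    def install_service_cmd(self, name):
        return ["chkconfig", "--add", name]


def select_toolset(which=shutil.which):
    """Pick the first toolset whose tools are all available, else POSIX."""
    for cls in (DebianToolset, RedhatToolset):
        if all(which(tool) for tool in cls.required_tools):
            return cls()
    return POSIXToolset()
```

Passing a custom ``which`` function makes the selection logic easy to
exercise without the real tools being installed.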

The Bcfg2 Server
================

The Bcfg2 server is responsible for taking a network description and
turning it into a series of configuration specifications for particular
clients. It also manages probed data and tracks statistics for clients.

The Bcfg2 server takes information from two sources when generating
client configuration specifications. The first is a pool of metadata that
describes clients as members of an aspect-based classing system. That is,
clients are defined in terms of aspects of their behavior. The other is
a file system repository that contains mappings from metadata to literal
configuration. These are combined to form the literal configuration
specifications for clients.

The Configuration Specification Construction Process
----------------------------------------------------

As we described in the previous section, the client connects to the server
to request a configuration specification. The server uses the client's
metadata and the file system repository to build a specification that
is tailored for the client. This process consists of the following steps:

* **Metadata Lookup**

  The server uses the client's IP address to initiate the metadata
  lookup. This initial metadata consists of a (profile, image) tuple. If
  the client already has metadata registered, then it is used. If not,
  then default values are used and stored for future use. This metadata
  tuple is expanded using some profile and class definitions also included
  in the metadata. The end result of this process is metadata consisting
  of hostname, profile, image, a list of classes, a list of attributes
  and a list of bundles.

* **Abstract Configuration Construction**

  Once the server has the client metadata, it is used to create
  an abstract configuration. An abstract configuration contains
  all of the configuration elements that will exist in the final
  specification **without** any specifics. All entries will be typed
  (i.e., the tag name will be one of Package, Path, Action, etc.) and will
  include a name. These configuration entries are grouped into bundles,
  which document installation time interdependencies.

* **Configuration Binding**

  The abstract configuration determines the structure of the client
  configuration; however, it doesn't yet contain literal configuration
  information. After the abstract configuration is created, each
  configuration entry must be bound to a client-specific value. The Bcfg2
  server uses plugins to provide these client-specific bindings. The Bcfg2
  server core contains a dispatch table that describes which plugins can
  handle requests of a particular type. The responsible plugin is located
  for each entry. It is called, passing in the configuration entry and
  the client's metadata. The behavior of plugins is explicitly undefined,
  so as to allow maximum flexibility. The behaviors of the stock plugins
  are documented elsewhere in this manual. Once this binding process
  is completed, the server has a literal, client-specific configuration
  specification. This specification is complete and comprehensive; the
  client doesn't need to process it at all in order to use it. It also
  represents the totality of the configuration specified for the client.
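
The abstract-configuration and binding steps above can be illustrated with
a small sketch. It assumes a dictionary-based dispatch table and two
made-up handler functions standing in for plugins; the element and
attribute names other than the entry types mentioned in the text, and the
version strings, are hypothetical. It is not the server's actual
implementation.

```python
import xml.etree.ElementTree as ET


def bind_package(entry, metadata):
    """Hypothetical plugin: choose a package version based on the image."""
    versions = {"debian": "1.0-deb", "redhat": "1.0-rpm"}
    entry.set("version", versions[metadata["image"]])


def bind_path(entry, metadata):
    """Hypothetical plugin: attach literal file contents for this host."""
    entry.text = "# managed file for %s\n" % metadata["hostname"]


# Dispatch table: entry type -> handler able to bind entries of that type.
DISPATCH = {"Package": bind_package, "Path": bind_path}


def build_abstract(metadata, bundle_defs):
    """Create typed, named entries grouped into bundles, without specifics."""
    config = ET.Element("Configuration")
    for bundle_name in metadata["bundles"]:
        bundle = ET.SubElement(config, "Bundle", name=bundle_name)
        for tag, name in bundle_defs[bundle_name]:
            ET.SubElement(bundle, tag, name=name)
    return config


def bind(config, metadata):
    """Bind every entry that has a responsible handler in the table."""
    for entry in config.iter():
        handler = DISPATCH.get(entry.tag)
        if handler is not None:
            handler(entry, metadata)
    return config


metadata = {"hostname": "web1", "profile": "web", "image": "debian",
            "bundles": ["ssh"]}
bundle_defs = {"ssh": [("Package", "openssh-server"),
                       ("Path", "/etc/ssh/sshd_config")]}
literal = bind(build_abstract(metadata, bundle_defs), metadata)
```

Before ``bind`` runs, the ``Package`` entry carries only a name; afterwards
it also carries a client-specific version, which is the distinction between
the abstract and the literal configuration.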

The Literal Configuration Specification
=======================================

Literal configuration specifications are served to clients by the
Bcfg2 server. This is a differentiating factor for Bcfg2; all other
major configuration management systems use a non-literal configuration
specification. That is, the clients receive a symbolic configuration that
they process to implement target states. We took the literal approach
for a few reasons:

* A small list of configuration element types can be defined, each of
  which can have a set of defined semantics. This allows the server to
  have a well-formed model of client-side operations. Without a static
  lexicon with defined semantics, this isn't possible. This allows the
  server, for example, to record the update of a package as a coherent
  event.
* Literal configurations do not require client-side processing. Removing
  client-side processing reduces the critical footprint of the tool.
  That is, the Bcfg2 client (and the tools it calls) need to be
  functional, but the rest of the system can be in any state. Yet,
  the client will receive a correct configuration.
* Having static, defined element semantics also requires that all
  operations be defined and implemented in advance. The implementation
  can maximize reliability and robustness. In more ad-hoc setups, these
  operations aren't necessarily safely implemented.

The Structure of Specifications
-------------------------------

Configuration specifications contain some number of clauses. Two types
of clauses exist. Bundles are groups of inter-dependent configuration
entities. The purpose of bundles is to encode installation-time
dependencies such that all new configuration is properly activated
during reconfiguration operations. That is, if a daemon configuration
file is changed, its daemon should be restarted. Another example of
bundle usage is the reconfiguration of a software package. If a package
contains a default configuration file, but it gets overwritten by an
environment-specific one, then that updated configuration file should
survive a package upgrade. The purpose of bundles is to describe services,
or reconfigured software packages. Independent clauses contain groups
of configuration entities that aren't related in any way. This provides a
convenient mechanism that can be used for bulk installations of software.

Each of these clauses contains some number of configuration entities. A
number of configuration entities exist including Path, Package, Service,
etc. Each of these corresponds to the obvious system item. Configuration
specifications can get quite large; many systems have specifications
that top one megabyte in size. An example of one is included in an
appendix. These configurations can be written by hand, or generated by
the server.
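
As a rough illustration of this structure, the sketch below parses a tiny
specification containing one bundle and one independent clause, and uses
the bundle grouping to decide which services to restart after a change.
The element names follow the entity types named above, but the root
element, the attributes, and the ``services_to_restart`` helper are
assumptions made for the example rather than the real Bcfg2 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature specification; real ones are far larger.
SPEC = """
<Configuration>
  <Bundle name="ssh">
    <Package name="openssh-server" version="1.0"/>
    <Path name="/etc/ssh/sshd_config"/>
    <Service name="sshd" status="on"/>
  </Bundle>
  <Independent>
    <Package name="vim" version="2.0"/>
    <Package name="less" version="3.0"/>
  </Independent>
</Configuration>
"""


def services_to_restart(spec_xml, modified):
    """Return services from every bundle holding a modified (tag, name) entry."""
    root = ET.fromstring(spec_xml)
    restart = []
    for bundle in root.findall("Bundle"):
        entries = {(entry.tag, entry.get("name")) for entry in bundle}
        if entries & modified:
            restart += [svc.get("name") for svc in bundle.findall("Service")]
    return restart


# A changed daemon configuration file triggers a restart of its daemon.
changed = {("Path", "/etc/ssh/sshd_config")}
to_restart = services_to_restart(SPEC, changed)
```

Entries in the independent clause never trigger restarts in this sketch,
since they are not grouped with any service.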

Design Considerations
=====================

This section will discuss several aspects of the design of Bcfg2, and the
particular use cases that motivated them. Initially, this will consist
of a discussion of the system metadata, and the intended usage model
for package indices as well.

System Metadata
---------------

Bcfg2 system metadata describes the underlying patterns in system
configurations. It describes commonalities and differences between these
specifications in a rigorous way. The groups used by Bcfg2's metadata are
responsible for differentiating clients from one another, and building
collections of allocatable configuration.

The Bcfg2 metadata system has been designed with several high-level
goals in mind. Flexibility and precision are paramount concerns; no
configuration should be undescribable using the constructs present in
the Bcfg2 repository. We have found (generally the hard way) that any
assumptions about the inherent simplicity of configuration patterns tend
to be wrong, so obscenely complex configurations must be representable,
even if these requirements seem illogical during the implementation.

In particular, we wanted to streamline several operations that commonly
occurred in our environment.

* Copying one node's profile to another node.

  In many environments, many nodes are instances of a common configuration
  specification. They all have similar roles and software. In our
  environment, desktop machines were the best example of this. Other than
  strictly per-host configuration like SSH keys, all desktop machines
  use a common configuration specification. This trivializes the process
  of creating a new desktop machine.

* Creating a specialized version of an existing profile.

  In environments with highly varied configurations, departmental
  infrastructure being a good example, "another machine like X but with
  extra software" is a common requirement. For this reason, it must be
  trivially possible to inherit most of a configuration specification
  from some more generic source, while being able to describe overriding
  aspects in a convenient fashion.

* Composing several pre-existing configuration aspects to create a new profile.

  The ability to compose configuration aspects allows the easy creation
  of new profiles based on a predefined set of configuration
  specification fragments. The end result is more agility in environments
  where change is the norm.

  In order for a classing system to be comprehensive, it must be usable in
  complex ways. The Bcfg2 metadata system has constructs that map cleanly
  to first-order logic. This implies that any configuration pattern that
  can be expressed in first-order logic can be represented by the metadata.
  (There is a discussion later in the document describing the metadata
  system in detail, and showing how it corresponds to first-order logic.)

These use cases motivate several of the design decisions that we
made. There must be a many-to-one correspondence between clients and
groups. Membership in a given profile group must imbue a client with
all of its configuration properties.
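
The three use cases can be made concrete with a small sketch of group
expansion. The data layout here (plain dictionaries named ``GROUPS``,
``BUNDLES``, and ``CLIENTS``) and all group, bundle, and host names are
invented for illustration; the real metadata format is described elsewhere
in the manual.

```python
# Each group lists the groups it includes (hypothetical data).
GROUPS = {
    "desktop": ["base", "x11"],               # composed from two aspects
    "desktop-dev": ["desktop", "compilers"],  # specialization of "desktop"
    "base": [],
    "x11": [],
    "compilers": [],
}

# Bundles contributed by each group.
BUNDLES = {"base": ["ssh"], "x11": ["xorg"], "compilers": ["gcc"]}

# Many clients map to one profile group.
CLIENTS = {"ws1": "desktop", "ws2": "desktop", "dev1": "desktop-dev"}


def expand(profile, groups=GROUPS):
    """Return the profile and every group it includes, depth first."""
    seen, stack = [], [profile]
    while stack:
        group = stack.pop()
        if group not in seen:
            seen.append(group)
            stack.extend(reversed(groups.get(group, [])))
    return seen


def bundles_for(client):
    """Collect the bundles implied by a client's profile, without duplicates."""
    result = []
    for group in expand(CLIENTS[client]):
        for bundle in BUNDLES.get(group, []):
            if bundle not in result:
                result.append(bundle)
    return result
```

Copying a profile is then a one-line change to the client mapping (``ws2``
shares the ``desktop`` profile with ``ws1``), while ``desktop-dev`` inherits
everything from ``desktop`` and adds one extra aspect.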

Package Management
------------------

The interface provided in the Bcfg2 repository for package specification
was designed with automation in mind. The goal was to support an
append-only interface to the repository, so that users do not need to
continuously rewrite already existing bits of specification.