Name: V4/Healthcheck
Ticket: #7391
Author: rcritten
Status: Pending review
Last updated: 2018-11-29 by Rcritten


IPA provides no way to do introspection to discover possible issues. A framework is needed to assist with the identification, diagnosis and potentially repair of problems. This has the benefit of increasing confidence in an IPA installation and reducing costs associated with addressing issues.

The purpose of the healthcheck tool is to find and report error conditions that may impact the IPA environment. Automated repair is possible in some limited cases.

There are two pieces to this tool:

  1. The tool itself which consists of plugins to collect data and store the result in LDAP and optionally perform some repairs
  2. An IPA server plugin which retrieves and displays this data

Use Cases

The high level use case is:

As an administrator I want to be able to identify and correct issues in my IPA installation.

This will be achieved through plugins to a framework which implement the checks. Some specific use cases implemented by these plugins could include:

As an administrator I want to be able to identify and correct replication issues.

As an administrator I want to be able to identify and correct replication conflicts.

As an administrator I want to be able to ensure that my certificates are valid.

As an administrator I want to be able to diagnose issues with my CA infrastructure.

As an administrator I want to be able to identify and correct issues with AD Trust.

As an administrator I want to know that all file permissions/ownership are ok.

All tools run without issue during upstream integration testing.

Related Tickets

How to Use

Healthcheck is broken into two components: a collector tool named ipa-healthcheck that runs on an IPA master as root, either via cron or invoked by an administrator, and a reporting tool built into IPA using the command prefix healthcheck (ipa healthcheck-find, ipa healthcheck-show, etc.).

The collector executes a series of plugins to collect its information. Each plugin, referred to later as a source, is organized around a specific theme (certificate system, file system permissions and ownership, replication, etc.). A source is a collection of tests, referred to as checks, each of which should test one small piece of IPA. This granularity lets one control which tests are executed when developing and running them.

The output of running a check is nothing if the check is successful. Upon failure an LDAP entry will be created describing the failure, providing the source and check which found the failure and the date the failure was discovered.

An administrator can override a failure and mark it as IGNORED if desired. This is to provide flexibility in case the local system is hardened differently than the IPA default, for example if some file permissions or ownership varies slightly but is still secure and operational.

Since the data will be in LDAP, the report should be visible on all masters, assuming that replication is working.

By default the collector will execute once nightly on every master.

Each reported error is given a unique UUID to identify it. This is needed when marking an issue as ignored.

Running it manually

The ipa-healthcheck command will run nightly from cron by default. It will execute with the -q (quiet) option so there should be no output. If the command fails by returning a non-zero return code then this will be recorded in the journal and may spawn an e-mail to root on the system.

Normally ipa-healthcheck will exit with a returncode of 0, even if any checks discovered issues with the IPA installation. A non-zero returncode means that ipa-healthcheck failed in a non-recoverable way.

To run it manually simply execute:

# ipa-healthcheck

A specific check can be executed as well:

# ipa-healthcheck --source certificate --check expiration

Output will include the list of checks executed but not their findings. The ipa command-line tool healthcheck-find can be used to display the results.

Running it manually is useful if an administrator is attempting to correct issues and wants to double-check that something is resolved.

Displaying the Issues

The ipa healthcheck-find command is used to report on any issues uncovered by the ipa-healthcheck collector. It can include all masters or be run per-master:

# ipa healthcheck-find
# ipa healthcheck-find --master ipa.example.com

The output of the check will include as specific an error message as possible and a solution. In some cases the solution may be complex enough to require redirection to an external web page.

Repairing Issues

Repairing an issue involves the administrator making the suggested changes to their system.

Repairing an issue will not be reflected until the next time that ipa-healthcheck executes. It should see that the error condition no longer exists and will mark it as resolved at the time that ipa-healthcheck has run.

The sequence will generally be:

  1. ipa-healthcheck executes and finds one or more issues
  2. ipa healthcheck-find is run by an administrator
  3. The issues are addressed
  4. ipa-healthcheck executes and marks previously reported issues that are now in compliance as resolved

Ignoring Issues

The goal is to be flexible, so any reported error can be marked as ignored. This will prevent any future exact-match failures from being reported again.

The ipa healthcheck-ignore command is used to mark an issue as ignored. The sequence is:

  1. ipa healthcheck-find --master ipa.example.com
  2. Note the UUID of the issue(s) that should be ignored
  3. For each one: ipa healthcheck-ignore UUID-of-issue

Ignoring issues should be done cautiously because it doesn't make the underlying issue go away but it does remove the error from the default report output.



The healthcheck tool will reside in its own upstream git repository. It will import existing IPA modules for LDAP support, certificate handling, replication topology and communicating with IPA itself. It will be maintained separately so it can have its own release cycle, increasing the speed of development. Once the framework is in place, development and releases can proceed rapidly, with new releases made as plugins or capabilities are added.

Writing a tool that works across versions can be challenging for the following reasons:

  • Version of Python may be limited (e.g. IPA 3.0 was written against Python 2.6)
  • IPA libraries may be in different locations on different releases
  • IPA libraries may return different data types by version
  • Testing across all releases is a challenge

The IPA server plugin for displaying the data will reside in the FreeIPA upstream source repository.

The model is that ipa-healthcheck writes entries into LDAP and the ipa healthcheck-[show|find] commands report on that data. The reporting tool doesn't care what the errors are, just that they are in the correct format.

The ipa-healthcheck repository will consist of:

  • ipa-healthcheck framework: options, logging, wrapper to execute plugins and log results
  • ipa-healthcheck plugins which validate a specific operation of IPA (certificates, replication, per-service checks)

It will use branching similar to upstream IPA in order to deal with differing imports, data types, etc. Eventually there will be ipa-4-6, ipa-4-7 and master branches.

The freeipa repository will contain the LDAP schema, indexes and the healthcheck plugin used for reporting.


Error messages should be descriptive without being paragraphs long. It is very possible that external documentation will be needed to aid a user in resolving some issues.

A balance is needed for space reduction. Ideally the errors could be a code to be used with a lookup table, perhaps with variable substitution built in. This doesn't make for nice output for a user-rolled report using ldapsearch, though.

Each error will have a unique key so it can be looked up in the ipa-healthcheck libraries. It will be localized and will use name/value pairs for variable substitution.


Similarly solutions will use the same key and will pull from the ipa-healthcheck libraries. It may include references or links to external documentation.


Severity of a problem is defined as:

Value  Severity  Definition
0      success   The check executed and found no issues.
1      critical  Something is terribly wrong (e.g. a service is not started, certificates are expired, etc.).
2      error     Something is wrong but your IPA master is probably still working (e.g. a replication conflict).
3      warning   Not an issue yet, but may become one (e.g. a certificate expiring soon).

A success value is recorded so an administrator can know that all checks have executed.
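The severity codes could be modelled as an enumeration. This is only a sketch: the class and helper names are hypothetical and not part of the design. Note that the numeric order is not a simple "higher is worse" scale, since 0 is success and 1 is the most severe failure.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity codes from the table above (class name is hypothetical)."""
    SUCCESS = 0
    CRITICAL = 1
    ERROR = 2
    WARNING = 3


def is_failure(severity):
    """Return True for any severity other than SUCCESS."""
    return Severity(severity) is not Severity.SUCCESS


def most_severe(severities):
    """Pick the worst failure: the lowest non-zero code, or SUCCESS if none."""
    failures = [Severity(s) for s in severities if is_failure(s)]
    return min(failures) if failures else Severity.SUCCESS
```

A report could use most_severe() to summarize a master, for example treating a mix of warning and error entries as an overall error.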


A new LDAP container will be used to store healthcheck-related data: cn=healthcheck,cn=ipa,cn=etc,dc=example,dc=com.

Failures will be stored in cn=errors,cn=healthcheck,cn=ipa,cn=etc,dc=example,dc=com

The data will not be stored per master. All failures will live in this one container.

When a master is removed the entries for that master will be removed.

The error will be stored using a UUID as the key.

There are multiple areas of health checks: certificates, replication, etc. Within those checks are atomic tests looking at a single aspect. Both the overall healthcheck area and the specific test will be stored on failures, along with the details of the failure (TBD).

Each entry will represent a single failure and store:

  • the IPA master
  • date reported
  • date resolved (if any)
  • error source (the plugin that found the error, for sorting)
  • error check (the check within the plugin that found the error)
  • error message(s), a name=value pair.
  • severity (1,2,3)

This makes ldapsearch a reasonable way to get the data, alongside more complex reporting tools.

The ipa-healthcheck tool will autobind to LDAP over ldapi as root. This will be needed for some tests.

All dates are stored in UTC time.
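As an illustration of the stored fields, the following sketch assembles the DN and attributes for one failure without contacting LDAP. The function names, the choice of uuid4, and the use of the cn=errors container for the DN are assumptions; the attribute names follow the schema in this document.

```python
import uuid
from datetime import datetime, timezone


def generalized_time(moment):
    """Format an aware datetime as an LDAP GeneralizedTime string in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def build_failure_entry(basedn, fqdn, source, check, pairs, severity, now=None):
    """Return (dn, attrs) for one failure; nothing is written to LDAP here."""
    now = now or datetime.now(timezone.utc)
    unique_id = str(uuid.uuid4())  # assumption: any UUID flavour is acceptable
    dn = "ipaUniqueID={},cn=errors,cn=healthcheck,cn=ipa,cn=etc,{}".format(
        unique_id, basedn)
    attrs = {
        "objectClass": ["ipaHealthCheckObject"],
        "fqdn": [fqdn],
        "ipaErrorDateReported": [generalized_time(now)],
        "ipaErrorSource": [source],
        "ipaErrorCheck": [check],
        # One "name=value" string per pair, stored as a multi-valued attribute.
        "ipaErrorMessage": ["{}={}".format(k, v) for k, v in pairs.items()],
        "ipaSeverity": [str(severity)],
    }
    return dn, attrs
```

Passing an explicit now value keeps the function deterministic for testing; in normal use it defaults to the current UTC time.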

TBD: We need to store metadata on each master to know the last-run date.

TBD: How are errors reported/collected if the LDAP server is down? It will be reported via the journal/syslog at a minimum because the tool will/should fail.


attributeTypes: ( OID NAME 'ipaErrorDateReported' DESC 'Date error reported' SYNTAX EQUALITY generalizedTimeMatch ORDERING generalizedTimeOrderingMatch SINGLE-VALUE X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaErrorDateResolved' DESC 'Date error resolved' SYNTAX EQUALITY generalizedTimeMatch ORDERING generalizedTimeOrderingMatch SINGLE-VALUE X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaErrorSource' DESC 'Source of Error' SYNTAX SINGLE-VALUE X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaErrorCheck' DESC 'Test that failed' SYNTAX SINGLE-VALUE X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaErrorMessage' DESC 'Error message details' SYNTAX SINGLE-VALUE X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaErrorSolution' DESC 'A way to fix the reported error' SYNTAX X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaSeverity' DESC 'Error severity' SYNTAX SINGLE-VALUE X-ORIGIN 'IPA v4' )
attributeTypes: ( OID NAME 'ipaIgnoreError' DESC 'Ignore this error' SYNTAX EQUALITY booleanMatch SINGLE-VALUE X-ORIGIN 'IPA v4' )
objectClasses: ( OID NAME 'ipaHealthCheckObject' SUP top AUXILIARY DESC 'Class for reporting healthcheck issues' MUST ( fqdn $ ipaErrorDateReported $ ipaErrorSource $ ipaErrorMessage $ ipaSeverity $ ipaIgnoreError $ ipaErrorSolution) MAY ( ipaErrorCheck $ ipaErrorDateResolved ) X-ORIGIN 'IPA v4' )


dn: ipaUniqueID=f87a0a84-f713-11e7-92c7-52540039bfcb, cn=healthcheck, cn=ipa, cn=etc, dc=example,dc=com
objectclass: ipaHealthCheckObject
fqdn: ipa.example.com
ipaErrorDateReported: 20181018184904Z
ipaErrorSource: certificate
ipaErrorCheck: expired
ipaErrorMessage: certificate=Server-Cert
ipaErrorSolution: See https://some/site/for/instructions
ipaErrorMessage: expired_date=2016-10-18 09:55:26 EDT
ipaSeverity: 1

dn: ipaUniqueID=20ee34a8-d6f8-11e8-a94b-5254003c91bc, cn=healthcheck, cn=ipa, cn=etc, dc=example,dc=com
objectclass: ipaHealthCheckObject
fqdn: ipa.example.com
ipaErrorDateReported: 20181018184904Z
ipaErrorSource: files
ipaErrorCheck: permissions
ipaErrorMessage: file=/var/lib/ipa/ra-agent.key
ipaErrorMessage: expected=0600
ipaErrorMessage: got=0644
ipaErrorSolution: chmod 0600 /var/lib/ipa/ra-agent.key
ipaSeverity: 1
ipaIgnoreError: TRUE

ipaErrorMessage is a multi-valued attribute that contains name=value pairs that can be passed to a localized string (e.g. via the str.format() method in Python). This does not depend on the order of storage or retrieval in LDAP.
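A minimal sketch of that substitution, using the values from the second sample entry. The MESSAGES catalog and the function names are hypothetical; the template wording follows the 'filemode' message shown elsewhere in this design.

```python
# Hypothetical message catalog keyed by error name.
MESSAGES = {
    "filemode": "The file {file} has mode {got} and should have {expected}",
}


def pairs_to_dict(pairs):
    """Split each 'name=value' string on the first '=' only."""
    return dict(pair.split("=", 1) for pair in pairs)


def render(key, pairs):
    """Look up the template for key and substitute the name/value pairs."""
    return MESSAGES[key].format(**pairs_to_dict(pairs))


# Values as they might come back from LDAP, in arbitrary order.
values = ["got=0644", "file=/var/lib/ipa/ra-agent.key", "expected=0600"]
print(render("filemode", values))
```

Because the substitution is by name, the rendered message is the same whichever order LDAP returns the values in.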

The second error entry above is ignored when reporting the data.

Resolution will be stored in cn=solutions,cn=healthcheck,cn=ipa,cn=etc,dc=example,dc=com

The key is the plugin name + the test used, concatenated with a _: cn=certificate_expired

The attribute description will hold free-form data on resolution which can include external links for more complex operations.

objectClasses: ( OID NAME 'ipaHealthCheckSolutionObject' SUP top AUXILIARY DESC 'Class for solutions to healthcheck issues' MUST ( description ) X-ORIGIN 'IPA v4' )
dn: cn=certificate_expired, cn=solutions,cn=healthcheck,cn=ipa,cn=etc,dc=example,dc=com
objectclass: ipaHealthCheckSolutionObject
description: Renew the certificate. See some.external.uri for details.



dn: cn=ipaSeverity,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add


dn: cn=ipaErrorSource,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add


dn: cn=ipaErrorCheck,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add


dn: cn=ipaIgnoreError,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add


dn: cn=ipaErrorDateReported,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add


dn: cn=ipaErrorDateResolved,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add


A permission and privilege granting read access will be added so the IPA server healthcheck plugin can control access to issues on the masters:

managed_permissions = {
    'System: Read Healthcheck Errors': {
        'ipapermbindruletype': 'all',
        'ipapermright': {'read', 'search', 'compare'},
        'ipapermdefaultattr': {
            'cn', 'fqdn', 'ipaerrorcheck', 'ipaerrordatereported',
            'ipaerrordateresolved', 'ipaserrorlevel',
        },
        'default_privileges': {'Healthcheck Administrator'},
    },
    'System: Write Healthcheck Errors': {
        'ipapermright': {'write'},
        'ipapermdefaultattr': {
            'cn', 'fqdn', 'ipaerrorcheck', 'ipaerrordatereported',
            'ipaerrordateresolved', 'ipaserrorlevel',
        },
        'default_privileges': {'Healthcheck Administrator'},
    },
}

A role will be added with both of these privileges, "Healthcheck Administrator".

The admins group will be a member of this role and it will be added on upgrade.


The main flaw of this decentralized design is that it is decentralized. For example, we require one and only one CRL generator. There is no way to enforce this currently via healthcheck. Each master can see if it should be the master and warn as appropriate but there is no "require only one" option.

Note that for this particular example, and perhaps for all, we can add a server role for CRL generator. Every master would be able to see this role. A master that holds the role checks its configuration to confirm that it is configured appropriately. If it is not, it raises an error.


The healthcheck plugin framework will be thin, consisting of:

  • option parser
  • setup logging (just for when running manually)
  • LDAP connection (to be passed to plugins)
  • IPA api will be finalized and run in_server=True
  • plugin loader
  • plugin execution
  • recording results in LDAP

A failure entry will be created if a plugin fails to execute or raises an exception, and will be cleared if a subsequent run of the plugin is successful.


Plugins will define a name that is recorded as the ipaErrorSource and used to select individual tests when running manually on the command-line. This is called the "source".

The entry point to the plugin is a run() method. This will execute all of the tests provided by the plugin.

Each test will have a short, unique name known as the "check".

Care will be needed to ensure uniqueness of check names within a given source. The framework may be able to enforce this.

So: the healthcheck tool runs sources, which execute checks. Failed checks are stored as errors in LDAP.

Examples of sources and checks:

  • certtool
    • expired
    • expiring-soon
    • tracking
  • replication
    • sync
    • conflict

The on_request class variable will control whether a plugin is executed by default. It will need to be overridden by a plugin to cause it to not run. This can be used in the case of an upgrade to check for only certain types of errors for example.

Plugins will execute one or more discrete tests. Each test should be as atomic as possible. It is better to report:

File /path/to/foo has incorrect permissions, 0644 and should be 0600

Rather than

Files a, b, c, d have incorrect permissions

Plugins will return an error class containing the name/value pairs of errors and the severity as an iterator.

Plugins will return None if no errors are found.

All errors encountered by a plugin should be reported to the tool (so aggressive use of try/except is required). The failure of a source (or check) to execute is a failure that should be reported. There can be zero chance that a failed check can cause the entire healthcheck command execution to fail. If executing a source fails then there will be no value for ipaErrorCheck.
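A plugin following these rules might look like the sketch below. The plugin API is not defined yet, so the base class, the CheckError type, the check_ naming convention and the injected get_mode callable are all hypothetical. The sketch also assumes that on_request = False means "run by default".

```python
from collections import namedtuple

# Hypothetical result type: the failed check, a severity code, and the
# name/value pairs that describe the failure.
CheckError = namedtuple("CheckError", ["check", "severity", "pairs"])


class Source:
    """Hypothetical base class; the real plugin API may differ."""

    name = None         # stored as ipaErrorSource
    on_request = False  # a plugin sets this to True to opt out of default runs

    def run(self):
        """Entry point: execute every check_* method and yield its errors."""
        for attr in sorted(dir(self)):
            if attr.startswith("check_"):
                # A check may return None when it finds nothing.
                for error in getattr(self, attr)() or ():
                    yield error


class FilesSource(Source):
    name = "files"
    expected_modes = {"/var/lib/ipa/ra-agent.key": 0o600}

    def __init__(self, get_mode):
        # get_mode(path) is injected so the sketch runs without touching disk.
        self.get_mode = get_mode

    def check_permissions(self):
        """Report each file separately so every error stays atomic."""
        for path, want in self.expected_modes.items():
            got = self.get_mode(path)
            if got != want:
                yield CheckError(
                    "permissions", 1,
                    {"file": path, "expected": "%04o" % want, "got": "%04o" % got},
                )
```

For example, list(FilesSource(lambda path: 0o644).run()) yields a single error for the one mismatched file, while a matching mode yields nothing.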

The basic execution will look like:

for source in sources:
    for check in source.check():
        check()  # run one check; the framework records any reported errors

The analysis (deduplicating, writing to LDAP, etc.) can be done either per-source or once globally. Doing it globally would perhaps need fewer LDAP searches, but running it for each source would probably be fine as well, at least in the LDAP case.
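The loop can be fleshed out to show how a failing check is contained and how on_request sources are skipped. This is a sketch under assumptions: the checks() method returning (name, callable) pairs, the result tuples, and the Demo source are all hypothetical.

```python
def run_all(sources, only_source=None):
    """Run every check of every selected source; one failure never aborts the run."""
    results = []
    for source in sources:
        if only_source is not None and source.name != only_source:
            continue
        if only_source is None and getattr(source, "on_request", False):
            continue  # only runs when explicitly requested
        try:
            checks = list(source.checks())
        except Exception as exc:
            # The source itself failed, so there is no check name to record.
            results.append((source.name, None, {"exception": str(exc)}))
            continue
        for check_name, check in checks:
            try:
                for error in check() or ():
                    results.append((source.name, check_name, error))
            except Exception as exc:
                results.append((source.name, check_name, {"exception": str(exc)}))
    return results


class Demo:
    """Hypothetical source with one passing, one crashing and one failing check."""
    name = "demo"
    on_request = False

    def checks(self):
        return [
            ("ok", lambda: None),
            ("broken", lambda: 1 / 0),
            ("mode", lambda: [{"file": "/path/to/foo", "got": "0644"}]),
        ]
```

Running run_all([Demo()]) records one entry for the crashing check and one for the failed permission test, and still completes normally.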

The initial plugins for the tool are:


  • basic service status (are all services running that should)
  • file permission and ownership
  • SELinux contexts
  • hostname sanity
  • disk utilization (may require config to set threshold)


  • certificate expiration warnings (may require config to define period)
  • certificate tracking issues
  • NSS trust
  • compare CA entries between dogtag and IPA
  • ensure RA agent cert is working
  • ensure there is a renewal master
  • ensure there is a CRL master
  • certmonger request tracking correctness
  • CA chain validation
  • certificate serial number ranges


  • replication consistency (are masters missing entries? expensive)
  • replication status
  • replication conflicts (old and new style)
  • DNA ranges
  • Unused RUVs

AD Trust

  • connectivity


  • validate kvno of keytabs


  •  ???


  • Check number of agreements per master
  • Find weak points in topology
  • Find single points-of-failure


  • verify keys are consistent


This test is normally not executed by default. It needs to be requested on the command-line and is for upgrades only. If any critical failures are reported then an Upgrade failure is recorded in LDAP and the upgrade is aborted.

Reporting the error via LDAP would provide at least one window into alerting users that the IPA upgrade has failed.


As the plugins execute, any given test will have one of two outcomes: success or failure. Middle ground may be represented by the severity (ipaSeverity). The purpose of this tool is to report errors, not info.

Upon failure:

  • search for a matching error entry that has not been resolved
  • if there is no match, create a new record
  • otherwise continue

Upon success:

  • search for a matching error entry with an empty resolved date
  • if one is found, mark it as resolved with the current date

A 5-minute default timeout will wrap plugin execution to ensure completion (it should be customizable per-plugin).
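One way to bound execution time is to run the check in a worker thread and stop waiting after the timeout. This is a sketch only; the function name and the shape of the returned error are hypothetical. A thread cannot be forcibly stopped, so a hung check keeps running in the background; a real implementation might prefer a subprocess or a signal-based alarm.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as CheckTimeout

DEFAULT_TIMEOUT = 300  # seconds, the five-minute default


def run_with_timeout(check, timeout=DEFAULT_TIMEOUT):
    """Run check() in a worker thread and stop waiting after timeout seconds.

    Returns the check's result, or a one-element error list on timeout.
    The worker thread is abandoned, not killed.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check)
    try:
        return future.result(timeout=timeout)
    except CheckTimeout:
        return [{"exception": "timed out after %s seconds" % timeout}]
    finally:
        # Do not block on the worker; it may still be running.
        executor.shutdown(wait=False)
```

A per-plugin override would simply pass a different timeout value for that plugin's checks.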

The definition of match here is TBD and depends on localization. Automatic removal of failures would be done like this:

  1. There is an initial set of errors, perhaps 0
  2. A run is executed, returning 0 or more errors as the current list of errors
  3. The initial errors are compared to the current errors. Errors in the initial list which are not in the current list are marked as fixed
  4. Errors in the current run that are not in the initial set are recorded as new errors

This will automatically account for issues that are fixed either automatically (e.g. certificate renewal) or as part of a larger effort to close issues. It is not required for an administrator to mark anything as fixed. Manually adding a resolved date will make the error re-appear upon the next run. The exception is an error that is marked as ignored.
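The comparison of the initial and current error lists amounts to two set differences. In this sketch each error is reduced to a hashable key; since the definition of a match is still TBD, the (master, source, check) tuple used here is only an assumption, as is the function name.

```python
def reconcile(initial, current):
    """Compare unresolved errors in LDAP with the errors from this run.

    Both arguments are iterables of hashable keys. Returns (resolved, new):
    keys to mark as resolved and keys to record as new failures.
    """
    initial, current = set(initial), set(current)
    resolved = initial - current  # reported before, no longer failing
    new = current - initial       # failing now, not yet recorded
    return resolved, new


initial = {
    ("ipa.example.com", "certificate", "expired"),
    ("ipa.example.com", "files", "permissions"),
}
current = {
    ("ipa.example.com", "files", "permissions"),
    ("ipa.example.com", "replication", "conflict"),
}
resolved, new = reconcile(initial, current)
```

Errors present in both sets are left untouched, so an entry that is still failing, including one marked as ignored, is neither re-created nor resolved.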

The tool will return 0 if no errors are found, non-zero otherwise.


The ipa-healthcheck tool will store its configuration in /etc/ipa/healthcheck.conf. It will be an ini-style config file using the same config routines as IPA. The format is


In general it would be best to store configuration in LDAP. However, settings such as the timeout must be available even when LDAP is not reachable, so they need local configuration.

Other configuration identified (may be out-of-scope for initial implementation)

  • disk space threshold
  • days before certificate expiration warnings appear


Kerberos credentials will be required for some operations. Ideally this can be handled as a bind using the host principal. Bind to LDAP will be done using ldapi which should provide read access to any data not available as the host.


Some issues may be repairable automatically. Repair will need to be requested. It will not be interactive. If a repair fails then the LDAP entry will remain in its current state.

Anticipated types of automated repairs:

  • file and directory permission and ownership
  • file and directory mode
  • incorrectly configured certmonger tracking requests

Expected never to be automatable:

  • Removal of replication conflict entries (too much risk of data loss)

Resolving issues

A list of all problems and their resolution is required in order for users to be able to repair their systems. The resolution may range from something trivial, such as chmod <mode> <path>, to something extremely complex, such as rolling back time to renew certificates.

See #Schema for details on storage.


The design calls for two separate programs: ipa-healthcheck and the healthcheck plugin to the IPA framework. For messages to be displayed in IPA they will need to be exposed in ipa-healthcheck as an importable library.

Defined in ipa-healthcheck somewhere:

errors['filemode'] = _("The file {file} has mode {got} and should have {expected}")

Executed in IPA

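from ipahealthcheck import errors

# ipaErrorMessage is multi-valued; each value is a "name=value" string
pairs = entry_attrs.get('ipaerrormessage', [])
# split on the first '=' only, since a value may itself contain '='
d = dict(pair.split('=', 1) for pair in pairs)
msg = errors.get('filemode')
reported_error = msg.format(**d)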


The ipa-healthcheck command and plugins will be distributed as a separate tarball so will be a separate package. The freeipa-server package will have a dependency on this so it will be included by default.

The server healthcheck plugin will be delivered in the freeipa-server package so will be installed by default.

Note: there is still some uncertainty about whether ipa-healthcheck will be a separate upstream project or be included in freeIPA. The advantage to being separate is that it can be updated much more frequently. The disadvantage is the additional packaging work. This is still under discussion but for now it is separate.


Feature Management


The WebUI will only provide reporting via the IPA server plugin. There will be no on-demand execution.

Suggested reports:

  • Display the current errors per master
  • Display current errors for all masters by plugin source
  • Display resolved issues by master
  • Display issues by level (warning, error, critical)


There are two command-line tools. The first is the ipa-healthcheck tool which gathers data.

Command: ipa-healthcheck
Options:
  --source   execute only a specific set of tests
  --verbose  expanded output
  --quiet    no, or extremely limited, output; rely on the return code
  --dry-run  don't update LDAP
  --repair   try to repair the issue automatically

$ ipa-healthcheck
Check certificate renewal
Check file permissions
The ipa-healthcheck command was successful.

The ipa-healthcheck command return code indicates whether it was able to run successfully, not if it encountered any issues with the IPA installation. A 0 means that all sources and checks were executed. A non-zero means some unrecoverable condition was encountered and needs further investigation.

The ipa-healthcheck tool does not log to a file.
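The options listed above, plus the --check and -q options used earlier in this document, could be parsed as in the following sketch. The function names are hypothetical, and the rule that --check requires --source is an assumption for the collector.

```python
import argparse


def build_parser():
    """Build a parser for the collector options described in this design."""
    parser = argparse.ArgumentParser(prog="ipa-healthcheck")
    parser.add_argument("--source", help="execute only a specific set of tests")
    parser.add_argument("--check", help="execute only this check of the source")
    parser.add_argument("--verbose", action="store_true", help="expanded output")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="no, or extremely limited, output")
    parser.add_argument("--dry-run", action="store_true",
                        help="don't update LDAP")
    parser.add_argument("--repair", action="store_true",
                        help="try to repair the issue automatically")
    return parser


def parse(argv):
    """Parse argv; assumption: --check is only meaningful with --source."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.check and not args.source:
        parser.error("--check requires --source")
    return args
```

For example, parse(["--source", "certificate", "--check", "expiration"]) selects a single check, while parse(["-q"]) matches the nightly cron invocation.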

The second is the IPA server plugin which displays the data.

For brevity let's stipulate that these will provide the --raw, --all, --pkey-only, --timelimit and --sizelimit options available for all IPA commands.

--all will include errors marked as ignore.

--pkey-only will return the list of masters for which data is available. ( assuming UI doesn't need UUID )

Show a specific failure (UUID is unique). This will also look-up and display the solution.

If --source is used then --check is required as well.

Command: ipa healthcheck-show UUID [options]
Options:
  --severity [code]  displays the entries with the specified severity
  --source [SOURCE]  shows entries generated by a particular source for this host
  --check [CHECK]    shows entries matching a specific failed check for this host

$ ipa healthcheck-show 25003678-bae7-4d1a-a071-b6d42e3840c1
Source: certcheck
Check: bad_permissions
Severity: Error
Message: The file /etc/httpd/alias/key3.db has incorrect permissions. Expected 0640, got 0755
Solution: See URL
Reported: Wed Nov 14 18:35:11 2018 UTC
Ignored: FALSE

Find failures across all masters.

Command: ipa healthcheck-find [criteria] [options]
Options:
  --severity [code]           displays the entries with the specified severity
  --source [SOURCE]           shows entries generated by a particular source for this host
  --check [check]             shows entries matching a specific failed check for this host
  --since [generalized_time]  shows entries reported since a provided date
  --resolved                  includes entries that have been resolved
  --master [FQDN]             shows entries for a given master
  --include-ignored           shows entries that have been marked as ignored
  --solution                  includes the solution text in the output

$ ipa healthcheck-find --master=ipa.example.com
UUID: 25003678-bae7-4d1a-a071-b6d42e3840c1
Source: certcheck
Check: bad_permissions
Severity: Error
Message: The file /etc/httpd/alias/key3.db has incorrect permissions. Expected 0640, got 0755
Solution: See URL
Reported: Wed Nov 14 18:35:11 2018 UTC
Ignored: FALSE

UUID: 6aa93ba5-08ac-4eaf-bc19-3c052f831610
Source: certcheck
Check: expiration
Severity: Critical
Message: The certificate 'Server-Cert' in /var/lib/ipa/httpd.pem expires in 7 days
Solution: See URL
Reported: Wed Nov 14 18:35:11 2018 UTC
Ignored: FALSE

$ ipa healthcheck-find --severity=critical
UUID: 6aa93ba5-08ac-4eaf-bc19-3c052f831610
Source: certcheck
Check: expiration
Severity: Critical
Message: The certificate 'Server-Cert' in /var/lib/ipa/httpd.pem expires in 7 days
Solution: See URL
Reported: Wed Nov 14 18:35:11 2018 UTC
Ignored: FALSE

Mark a specific failure as something to ignore. Both --source and --check are required.

Command: ipa healthcheck-ignore UUID [options]

$ ipa healthcheck-ignore 6aa93ba5-08ac-4eaf-bc19-3c052f831610
UUID: 6aa93ba5-08ac-4eaf-bc19-3c052f831610
Source: certcheck
Check: expiration
Severity: Critical
Message: The certificate 'Server-Cert' in /var/lib/ipa/httpd.pem expires in 7 days
Solution: See URL
Reported: Wed Nov 14 18:35:11 2018 UTC
Ignored: TRUE




It is possible this could be used as a pre-upgrade checker, perhaps failing if there are any critical issues reported. This will be stored in LDAP like all other critical failures.

The admins group will be added to the new "Healthcheck Administrator" role.

Test Plan

It can be difficult to simulate some issues.

At a minimum it should return 100% success on new installations of the supported IPA versions.

For testing certificates at least one round of certificate renewals should be done.