

The Chef Indexer consists of a RabbitMQ message queue; chef-expander, which pulls messages from the queue and formats them for indexing; and chef-solr, a thin wrapper around Solr.

RabbitMQ

RabbitMQ provides a queuing service which stores requests for updates to the search index. This allows the API server to handle load spikes while maintaining an even load on Solr.

We recommend you read RabbitMQ's administration guide before deploying it into production.


Configuring Chef's RabbitMQ-Specific Settings

Setting             Default Value   Description
amqp_host           '0.0.0.0'       The IP address or resolvable hostname of the server running RabbitMQ
amqp_port           '5672'          The port on which RabbitMQ listens for requests
amqp_user           'chef'          The username to use when connecting to RabbitMQ
amqp_pass           'testing'       The password to use when connecting to RabbitMQ
amqp_vhost          '/chef'         The RabbitMQ vhost to use
amqp_consumer_id    nil             A tag to append to the queue name. The implications of this setting are discussed in the redundancy and scaling section below.
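
These options are set in the Chef server configuration file (server.rb). A minimal sketch using the defaults from the table above (adjust the host and credentials for your broker):

amqp_host        '0.0.0.0'
amqp_port        '5672'
amqp_user        'chef'
amqp_pass        'testing'
amqp_vhost       '/chef'
amqp_consumer_id nil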


Node in Knife, but not in Chef Searches?


This has happened when a stray quotation mark in node data trips up the indexer (for example, 3.5" rather than 3.5).

There are two possible solutions: remove the offending lines and then run knife index rebuild, or install the fast_xs gem.
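
For example, the two fixes look like this from a shell (fast_xs is installed as a Ruby gem; you may need sudo depending on your Ruby setup):

your-shell> knife index rebuild
your-shell> gem install fast_xs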

Configuring RabbitMQ for Chef

Chef requires you to configure a user and vhost in RabbitMQ. Normally, this configuration will be handled for you by either post-install scripts in Chef packages, or in the rabbitmq_chef recipe included in the bootstrap install. You only need to run these commands if you are doing a fully manual installation or have destroyed your RabbitMQ configuration somehow.

To configure RabbitMQ for Chef with the default vhost "/chef", user "chef", and password "testing", run the following commands (you may need to run them as the "rabbitmq" user, or whatever user RabbitMQ runs as):
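
A sketch of the rabbitmqctl commands for those defaults (the exact invocation may vary slightly between RabbitMQ versions):

your-shell> rabbitmqctl add_vhost /chef
your-shell> rabbitmqctl add_user chef testing
your-shell> rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"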

To verify that your configuration is correct:
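
For example, list the users, vhosts, and permissions and check that the entries created above appear (output will vary):

your-shell> rabbitmqctl list_users
your-shell> rabbitmqctl list_vhosts
your-shell> rabbitmqctl list_permissions -p /chef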

RabbitMQ Persistence

As of Chef 0.10, you can configure Chef to use RabbitMQ persistence. This means that RabbitMQ will store queued messages on disk, allowing them to be recovered if RabbitMQ should fail (but not if the disk on the box running RabbitMQ fails). To enable persistence, make sure you are running RabbitMQ 2.2 or higher. Then set persistent_queue to true in your server.rb configuration file.
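
For example, this is a single line in server.rb:

persistent_queue true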

Chef Expander

New in Chef 0.10

Chef Expander is new in Chef 0.10 and replaces chef-solr-indexer.

Chef Expander fetches messages from RabbitMQ, transforms them into the format required by Solr, and loads them into Solr.

Running Chef Expander

Chef Expander is designed for clustered operation, though small installations will only need one worker process. To run Chef Expander with one worker process, run chef-expander -n 1. You will then have a master and worker process, which looks like this in ps:

your-shell> ps aux|grep expander
you   52110   0.1  0.7  2515476  62748 s003  S+    3:49PM   0:00.80 chef-expander worker #1 (vnodes 0-1023)   
you   52108   0.1  0.5  2492880  41696 s003  S+    3:49PM   0:00.91 ruby bin/chef-expander -n 1

Workers are single-threaded and therefore cannot use more than 100% of a single CPU each. If you find that your queues are getting backlogged (see Operation and Troubleshooting, below), increase the number of workers.

Chef Expander Operation and Troubleshooting

Chef Expander includes chef-expanderctl, a management program that allows you to get status information or change the logging verbosity (without restarting). chef-expanderctl has the following commands:

  • chef-expanderctl help prints usage information.
  • chef-expanderctl queue-depth shows the total number of messages across all queues. See the Design section below for more explanation.
  • chef-expanderctl queue-status shows the number of messages in each queue. This is mainly useful when debugging a Chef Expander cluster.
  • chef-expanderctl log-level LEVEL sets the log level on a running Chef Expander or cluster.
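
For example, to check for a backlog and temporarily raise logging verbosity (the level name here is assumed to follow the standard Chef log levels such as debug, info, warn, and error):

your-shell> chef-expanderctl queue-depth
your-shell> chef-expanderctl log-level debug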

If you suspect that a worker process is stuck and you are running in clustered operation, you can simply kill the worker process; the master process will restart it.

Design

Chef Expander uses 1024 queues (called vnodes in some places) to allow you to scale the number of Chef Expander workers to meet the needs of your infrastructure. When objects are saved in the API server, they are added to queues based on their database IDs. These queues can be assigned to different Chef Expander workers to distribute the load of processing the index updates.
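
As an illustrative sketch only (not the actual Chef source, which may use a different scheme), mapping a database ID onto one of the 1024 vnode queues could be as simple as a modulo operation:

# Illustrative sketch: map an object's database ID to one of 1024 vnodes.
# The real Chef Expander/API server code may assign queues differently.
VNODE_COUNT = 1024

def vnode_for(database_id)
  database_id.to_i % VNODE_COUNT
end

vnode_for(52217)   # => 1017, so this object's index updates land on vnode 1017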

Redundancy and Scaling Options with Chef Indexer

0.9.x and Lower

The discussion below applies only to Chef 0.8.x and 0.9.x. In Chef 0.10 and higher, you can increase Chef Expander throughput by increasing the number of workers in the cluster.

The point of setting the queue name is to choose between messages going from one producer to N consumers, and messages going from one producer to one of N consumers. When multiple consumers share the same queue name, you get the 1->(1 of N) behavior; when consumers use distinct queue names, you get the 1->N behavior.

The use cases these address in Chef are:

  • With 1 to (1 of N) behavior, you can have multiple chef-solr-indexer processes reading data from the queue, munging the data, and then posting it to the same Solr instance. You would want to do this if you want redundancy in chef-solr-indexer, or if the data munging that chef-solr-indexer does becomes a bottleneck. To configure this behavior, give each instance of chef-solr-indexer the same amqp_consumer_id setting (see the example configurations after this list).
  • With 1 to N behavior, you have multiple chef-solr-indexer processes, each passing the data to its own Solr instance. You would want to do this if you want redundancy in chef-solr and prefer to achieve it this way rather than by replicating at the Solr level. To configure this behavior, give each instance of chef-solr-indexer a different amqp_consumer_id setting, or set amqp_consumer_id to nil.
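
A sketch of the corresponding configuration, assuming two indexer hosts and the settings described in the table above (the consumer id values are illustrative):

# 1 -> (1 of N): both indexers share one queue, so each message is processed by exactly one of them
# on indexer-a and indexer-b:
amqp_consumer_id 'shared'

# 1 -> N: each indexer gets its own queue and therefore its own copy of every message
# on indexer-a:
amqp_consumer_id 'indexer_a'
# on indexer-b:
amqp_consumer_id 'indexer_b'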





