Friday, June 24, 2016

[Ceilometer] To survey how to improve the performance of OpenStack Ceilometer

Frankly speaking, OpenStack Ceilometer will suffer some kind of performance issues sooner or later if you don't modify or tune the configuration. The issues has two parts that need you to consider. One is the message bus and API loading, and the other is database. However, I find some best practices which are easy and quick for us to adopt. Here you go:

1. Telemetry(Ceilometer) best practices

a. Data collection

  1. Based on your needs, you can edit the pipeline.yaml configuration file to include a selected number of meters while disregarding the rest.
  2. By default, Telemetry service polls the service APIs every 10 minutes. You can change the polling interval on a per meter basis by editing the pipeline.yaml configuration file.
    for example:
    vim /etc/ceilometer/ceilometer.conf
    => evaluation_interval=120
    vim /etc/ceilometer/pipeline.yaml
    => interval: 120

  3. you can delay or adjust polling requests by enabling the jitter support. This adds a random delay on how the polling agents send requests to the service APIs. To enable jitter, set shuffle_time_before_polling_task in the ceilometer.conf configuration file to an integer greater than 0.

b. Data storage

  1. We recommend that you avoid open-ended queries.
  2. You can install the API behind mod_wsgi, as it provides more settings to tweak, likethreads and processes in case of WSGIDaemon. a. For more information on how to configure mod_wsgi, see the Telemetry Install Documentation.
  3. The collection service provided by the Telemetry project is not intended to be an archival service. Set a Time to Live (TTL) value to expire data and minimize the database size.
    for example:
    vi /etc/ceilometer/ceilometer.conf
    => time_to_live=302400
  4. Use replica sets in MongoDB. Replica sets provide high availability through automatic failover. If your primary node fails, MongoDB will elect a secondary node to replace the primary node, and your cluster will remain functional.
    1. For more information on replica sets, see the MongoDB replica sets docs.
  5. Use sharding in MongoDB. Sharding helps in storing data records across multiple machines and is the MongoDB’s approach to meet the demands of data growth.

2. Metering Service (Ceilometer): Best Practices and Optimization

a. Modifying the List of Meters

    - name: meter_source
      interval: 604800
          - "instance"
          - "image"
          - "image.size"
          - "image.upload"
          - "image.delete"
          - "volume"
          - "volume.size"
          - "snapshot"
          - "snapshot.size"
          - "ip.floating"
          - "network.*"
          - "compute.instance.create.end"
          - "compute.instance.delete.end"
          - "compute.instance.update"
          - "compute.instance.exists"
          - meter_sink
    - name: meter_sink
          - notifier://

b. Modifying the Polling Intervals

The interval attribute is the time between polls. Meters that are available as both notification and polling are going to be polled at the specified interval. To rely on notifications rather than polling, set the interval attribute to 604800 seconds, or once a week.


One of the main issues operators relayed was the polling that Ceilometer was running on Nova to gather instance information. It had a highly negative impact on the Nova API CPU usage, as it retrieves all the information about instances on regular intervals.

Thursday, June 23, 2016

[Linux] Why does Linux require moving IP from eth interface to bridge interface?

This could be a common problem if you have KVM ( or other hypervisor in Linux ) on your physical server and want to use bridge mode with your VMs. At the same time, you also want to let your physical server has the network that can be accessed from other hosts at the same network subnet. At this moment, when a network interface (e.g., eth0) is added to a Linux bridge (e.g., br0), the IP address must be removed from eth0 and added to br0 for the networking to function properly.

I find some answers as follows:

The NIC represents the uplink cable. A cable is layer 1, not layer 3. Now the Bridge works as the device that is being addressed for network traffic (incoming) on the server - either on layer 2 (Ethernet/MAC) and/or layer 3 (IP). So the device that responds to ARP-requests is the bridge - which is good, since it needs to distribute the traffic to the other interfaces on that bridge. If the responding device were the NIC, traffic would not be passed further on to the bridge.

Normally it does not make sense to put any L3 protocol address on port interfaces - because incoming packets are diverted to the bridge interface before the L3 protocol is examined. This means the L3 protocol running on the port interface will never see any incoming packets.

Wednesday, June 15, 2016

[Neutron] Neutron Cheat Cheat Sheet

In recent days, the hand drawing style is becoming more and more pervasive in Taiwan. I don't know how it happens, but at least I can draw the Neutron cheat cheat sheet to echo this kind of style for fun.

P.S: This picture is originally for my colleagues to trouble shooting the Neutron networking problems.

Thursday, June 2, 2016

[Neutron and SDN] Warm-up for understaning the integration of Neutron and SDN

I just spend some time to study the integration of Neutron and SDN. Furthermore, I also take a look at how ODL and ONOS are integrated with OpenStack Neutron. The following content and picture ( with URL link) are excerpted from the variety of resource in the internet. Some of sections has my comment with P.S. I think it can give you a clear concept of Neutron and SDN controller.

Neutron and SDN
P.S: This picture gives an overall architecture about Neutron and SDN controller that are integrated together.

When an OpenStack user performs any networking related operation (create/update/delete/read on network, subnet and port resources) the typical flow would be as follows:
  1. The user operation on the OpenStack dashboard (Horizon) will be translated into a corresponding networking API and sent to the Neutron server.
  2. The Neutron server receives the request and passes the same to the configured plugin (assume ML2 is configured with an ODL mechanism driver and a VXLAN type driver).
  3. The Neutron server/plugin will make the appropriate change to the DB.
  4. The plugin will invoke the corresponding REST API to the SDN controller (assume an ODL).
  5. ODL, upon receiving this request, may perform necessary changes to the network elements using any of the southbound plugins/protocols, such as OpenFlow, OVSDB or OF-Config.

We should note there exists different integration options with the SDN controller and OpenStack; for example:
  1. one can completely eliminate RPC communications between the Neutron server and agents on the compute node, with the SDN controller being the sole entity managing the network
  2. or the SDN controller manages only the physical switches, and the virtual switches can be managed from the Neutron server directly.

第一種模型a) Neutron相當於SDN控制器,plugin與agent間的通信機制(如rpc)就相當於簡單的南向協議。第二種模型中Neutron作為SDN應用,將業務需求告知SDN控制器,SDN控制器再通過五花八門的南向協議遠程控製網絡設備。
第二種模型b) 也可以把Neutron看做超級控制器或者網絡編排器,去完成OpenStack中網絡業務的集中分發。

Plugin Agent

P.S: This picture shows the process flow between agents, api server and ovs when creating a VM.

About ML2

Neutron plugin体系

How OpenDaylight to integrate with Neutron

How is ONOS to integrate with Neutron

SONA Architecture

onos-networking plugin just forwards (or calls) REST calls from nova to ONOS, and OpenstackSwitching app receives the API calls and returns OK. Main functions to implement the virtual networks are handled in OpenstackSwitching application.

OpenstackSwitching (App on ONOS)

Neutron + SDN Controller (ONOS)  

P.S: ONOS provides its ONOSMechanismDriver instead of OpenvswitchMechanismDriver

Here is an article to talk about writing a dummy mechanism driver to record variables and data in logs