N_Port ID Virtualization (NPIV) in a virtualization environment (libvirt, KVM), or how to assign a LUN directly to a VM/guest

I had been thinking of writing about NPIV for a long time but kept putting it off, mainly because the libvirt side of it was largely incomplete. I can’t put it off any longer, especially after receiving lots of emails on this subject, so here it is:

Before writing about NPIV, I should share some background on the fabric and its terminology. Let me start with that.

At a high level, a fabric has at least an HBA (host bus adapter), a fabric switch, and a target (storage array).

These components are interconnected, and once they are connected each of them has a defined role and name in the Fibre Channel specification.

If you really don’t know anything about fabrics, take a look at the resources available on the web.

At least you should start with Wikipedia 🙂
http://en.wikipedia.org/wiki/Fibre_Channel

In a fabric, ports are one of the important components. Fibre Channel defines the following types of ports, among others. The first two are node ports; the rest are switch ports.

N_port – a port on the node (e.g. host or storage device). Also known as a node port.

NL_port – a port on the node that connects to an FC-AL loop. Also known as a node loop port.

F_port – a port on the switch that connects to a node point-to-point (i.e. connects to an N_port). Also known as a fabric port. An F_port is not loop capable.

FL_port – a port on the switch that connects to an FC-AL loop (i.e. to NL_ports). Also known as a fabric loop port.

E_port – the connection between two Fibre Channel switches. Also known as an expansion port. When E_ports between two switches form a link, that link is referred to as an inter-switch link (ISL).

…etc

Refer to http://en.wikipedia.org/wiki/Fibre_Channel#Ports for more information on the other port types.

Once you know about the port types, turn your attention to WWN, WWPN and WWNN:

WWN = World Wide Name -> This is the generic name, so let’s set it aside and look at the two below:

WWPN = World Wide Port Name
WWNN = World Wide Node Name

Obviously, if there are multiple components in a fabric, something has to identify each of them uniquely. That is the WWPN. Even when the node name (WWNN) is the same, the port names will be different. The WWPN of a physical port is the equivalent of a MAC address in Ethernet: it is assigned by the manufacturer and is not meant to change.
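To make the format concrete, here is a small sketch of a helper that checks that a string looks like a 64-bit WWN (16 hex digits) and prints it in the colon-separated form that many switch and array tools show. The function name `format_wwn` is my own invention for this article; it is not part of libvirt or any FC tool.

```shell
# format_wwn WWN
# Hypothetical helper: print a 64-bit WWN (16 hex digits, optional 0x
# prefix) as colon-separated bytes. Returns non-zero on malformed input.
format_wwn() {
    local wwn
    wwn=$(printf '%s' "$1" | tr 'A-F' 'a-f')   # normalise hex digits to lower case
    wwn=${wwn#0x}                               # drop an optional 0x prefix
    if ! printf '%s\n' "$wwn" | grep -Eq '^[0-9a-f]{16}$'; then
        echo "format_wwn: not a 16-digit hex WWN: $1" >&2
        return 1
    fi
    # Put a colon after every byte, then remove the trailing colon.
    printf '%s\n' "$wwn" | sed 's/../&:/g; s/:$//'
}
```

For example, `format_wwn 2101001b32a9da4e` prints the WWPN used later in this article as `21:01:00:1b:32:a9:da:4e`.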

I believe you now have some idea about the fabric and its terms. All right, let’s think about NPIV 🙂

First of all, DON’T CONFUSE NPIV (N_Port ID Virtualization) with “N_Port Virtualization”; they are different things.

Read it again 🙂

In this article, we only care about N_Port ID Virtualization. NPIV allows a single host bus adapter (HBA) port, or a target port on a storage array, to register multiple World Wide Port Names (WWPNs) and N_Port IDs.

NPIV is not only useful to virtual machines (VMs), but they benefit from it a lot. In a virtualized environment, a very common use case is to assign a LUN to one specific VM, without sharing it with the others.

I am sure you have wanted the same thing. That is exactly what NPIV achieves.

When a Fibre Channel device is connected to a fabric SAN, several login processes take place:

In short:

FLOGI (fabric login) -> This happens between an N_port/NL_port and an F_port. The FLOGI frame is sent to the well-known address 0xFFFFFE.
PLOGI (port login) -> This happens between one N_port and another N_port.
PRLI (process login) -> This happens between processes on the N_ports.

During the FLOGI process, an N_Port_ID is assigned by the Fibre Channel switch. Normally an N_port has one WWPN and one N_Port_ID, a one-to-one match.

With NPIV, you have multiple WWPNs/N_Port_IDs on a single physical N_port.

So, how does an N_port register multiple WWPNs/N_Port_IDs? It sends an FDISC to address 0xFFFFFE to obtain an additional address.

Refer to http://en.wikipedia.org/wiki/NPIV for more information on this.

Once you have multiple WWPNs, you can do zoning or LUN assignment against each of those unique WWPNs. Who actually uses a given WWPN is not a concern for the target/storage.

Does that also mean I can assign one WWPN to a VM and the next WWPN to another VM, so that I can assign different LUNs to different VMs without sharing them with the others? YES, you got it.

Obviously, your Fibre Channel switch must support NPIV.

In short, NPIV (N_Port ID Virtualization) is a Fibre Channel technology to
share a single physical Fibre Channel HBA with multiple virtual ports.
Each virtual port, henceforth known as a “virtual port” or “virtual Host
Bus Adapter” (vHBA), is identified by its own WWPN (World Wide
Port Name) and WWNN (World Wide Node Name). In the virtualization
world the vHBA controls the LUNs for virtual machines.

The libvirt implementation provides flexibility to configure the LUNs
either directly to the virtual machine or as part of a storage pool
which then can be configured for use on a virtual machine.

NPIV support in libvirt was first added to libvirt 0.6.5; however, the
following sections will primarily describe NPIV functionality as of the
current libvirt release, 1.1.2. There will be a troubleshooting and prior
version considerations section to describe some historical differences.

I believe I have described NPIV reasonably clearly. If not, please let me know in a comment 🙂

Anyway, let me proceed to my intended topic 🙂

How do you do all this in a virtualization environment? I take the libvirt/qemu/kvm combo here.

Let’s start. The part below summarizes NPIV and what libvirt
supports so far, and also mentions the TODO list.

Big thanks to Osier for putting together the libvirt part!

Follow below steps in order…

1) Discovery

Discovery of HBA(s) capable of NPIV is provided through the virsh
command “virsh nodedev-list --cap vports”. If no HBA is returned,
then the host configuration should be checked. The XML output from the
command “virsh nodedev-dumpxml” will list the fields <name>, <wwnn>, and
<wwpn> to be used in order to create a vHBA. Take care to also note
the <max_vports> value, as this lets you know whether the HBA is about to
exceed the maximum number of vHBAs it supports.

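The following output indicates a host that has two HBAs that support
vHBAs, and the XML layout of one HBA:

    # virsh nodedev-list --cap vports
    scsi_host4
    scsi_host5

    # virsh nodedev-dumpxml scsi_host5
    <device>
      <name>scsi_host5</name>
      <parent>pci_0000_04_00_1</parent>
      <capability type='scsi_host'>
        <host>5</host>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da4e</wwnn>
          <wwpn>2101001b32a9da4e</wwpn>
          <fabric_wwn>2001000dec9877c1</fabric_wwn>
        </capability>
        <capability type='vport_ops'>
          <max_vports>164</max_vports>
          <vports>5</vports>
        </capability>
      </capability>
    </device>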


    The “max_vports” value indicates that a maximum of 164 vports can be
    created on this HBA. The “vports” value indicates
    the number of vports currently in use.

    Detection of NPIV-capable HBAs prior to libvirt
    1.0.4 is described in the “Troubleshooting” section.
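As a rough sketch of how the two values can be used, the helper below reads “virsh nodedev-dumpxml” output on standard input and prints how many vports are still free (max_vports minus vports). The function name `free_vports` is hypothetical, and the sed-based parsing assumes each tag sits on its own line as in the dump above; it is not a general XML parser.

```shell
# free_vports
# Hypothetical helper: read node device XML on stdin and print
# max_vports - vports. Returns non-zero if either value is missing,
# e.g. for an HBA without the vport_ops capability.
free_vports() {
    local xml max used
    xml=$(cat)
    max=$(printf '%s\n' "$xml" | sed -n 's:.*<max_vports>\([0-9][0-9]*\)</max_vports>.*:\1:p')
    used=$(printf '%s\n' "$xml" | sed -n 's:.*<vports>\([0-9][0-9]*\)</vports>.*:\1:p')
    if [ -z "$max" ] || [ -z "$used" ]; then
        return 1
    fi
    echo $((max - used))
}

# Usage on a libvirt host:
#   virsh nodedev-dumpxml scsi_host5 | free_vports
```

With the scsi_host5 values shown in this section (164 and 5) the result is 159.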

    2) Creation of a vHBA using the node device driver

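    In order to create a vHBA using the node device driver, select an HBA with
    available “vport” space and use the HBA’s “<name>” field as the “<parent>”
    field in the following XML:

    <device>
      <parent>scsi_host5</parent>
      <capability type='scsi_host'>
        <capability type='fc_host'>
        </capability>
      </capability>
    </device>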



    Then create the vHBA with the command “virsh nodedev-create” (assuming
    above XML file is named “vhba.xml”):

    # virsh nodedev-create vhba.xml
    Node device scsi_host6 created from vhba.xml

    NOTE: If you specify “name” for the vHBA, then it will be ignored.
    The kernel will automatically pick the next SCSI host name in sequence not
    already used. The “wwpn” and “wwnn” values will be automatically generated
    by libvirt.
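Since the kernel picks the vHBA name, a script usually has to read it back from the command output. Here is a minimal sketch that does so, assuming the output line has exactly the “Node device NAME created from FILE” shape shown above; the function name `created_node_name` is my own.

```shell
# created_node_name
# Hypothetical helper: read "virsh nodedev-create" output on stdin and
# print only the name of the node device that was created.
created_node_name() {
    sed -n 's/^Node device \([^ ]*\) created from .*$/\1/p'
}

# Usage on a libvirt host:
#   vhba=$(virsh nodedev-create vhba.xml | created_node_name)
#   virsh nodedev-dumpxml "$vhba"
```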

    In order to see the generated vHBA XML, use the command “virsh
    nodedev-dumpxml” as follows:

    # virsh nodedev-dumpxml scsi_host6
    <device>
      <name>scsi_host6</name>
      <parent>scsi_host5</parent>
      <capability type='scsi_host'>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da5e</wwnn>
          <wwpn>2101001b32a9da5e</wwpn>
        </capability>
      </capability>
    </device>

    This vHBA will only exist as long as the host is not rebooted. In
    order to create a persistent vHBA, one must use a libvirt storage pool
    (see next section).

    3) Creation of vHBA by the storage pool

    By design, vHBAs managed by the node device driver are transient across
    host reboots. It is recommended to define a libvirt storage pool based
    on the vHBA in order to preserve the vHBA configuration. Using a storage
    pool has two primary advantages. First, libvirt finds the LUN’s path
    for you and shows it in simple virsh command output. Second, migrating a
    virtual machine requires only defining and starting a storage pool
    with the same vHBA name on the target machine, provided the virtual
    machine config refers to the LUN by storage pool and volume name (see
    section 5).

    In order to create a persistent vHBA configuration, create
    a libvirt “scsi” storage pool using XML as follows:

    <pool type='scsi'>
      <name>poolvhba0</name>
      <source>
        <adapter type='fc_host' wwnn='2001001b32a9da5e' wwpn='2101001b32a9da5e'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
        <permissions>
          <mode>0700</mode>
          <owner>0</owner>
          <group>0</group>
        </permissions>
      </target>
    </pool>

    You must use type='scsi' for the pool, and the source adapter
    type must be “fc_host”. The attributes “wwnn” and “wwpn” are provided as
    the unique identifier for the vHBA to be created (the values shown are
    those of the example vHBA from section 2, used here for illustration).

    There is an optional attribute “parent” for the source adapter. It
    indicates the name of the HBA which you want to use to create the
    vHBA. Its value should be consistent with what the node device driver
    dumps (e.g. scsi_host5). If it’s not specified, libvirt will pick
    the first HBA capable of NPIV that has not exceeded the maximum
    vports it supports.

    NOTE: You can also create a scsi pool with source adapter type “fc_host”
    for a HBA, and in that case the attribute “parent” is not necessary.

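    If you prefer to choose which parent HBA to use for your vHBA, then
    you must provide the parent, wwnn, and wwpn in the source adapter XML as
    follows (the WWNN and WWPN shown are those of the example vHBA from
    section 2, used for illustration):

    <adapter type='fc_host' parent='scsi_host5' wwnn='2001001b32a9da5e' wwpn='2101001b32a9da5e'/>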

    To define the persistent pool (assuming above XML is named as
    poolvhba0.xml):

    # virsh pool-define poolvhba0.xml

    NOTE: One must use pool-define to define the pool as persistent,
    since a pool created by pool-create is transient and it will disappear
    after a system reboot or a libvirtd restart.

    To start the pool:

    # virsh pool-start poolvhba0

    To destroy the pool:

    # virsh pool-destroy poolvhba0

    When starting the pool, libvirt will check if a vHBA with the same
    “wwnn:wwpn” already exists. If it does not exist, a new vHBA with the
    provided “wwnn:wwpn” will be created. Correspondingly, when destroying
    the pool the vHBA is destroyed too.

    Finally, in order to ensure that subsequent reboots of your host will
    automatically define vHBAs for use in virtual machines, one must set the
    storage pool autostart feature as follows (assuming the name of the created
    pool was “poolvhba0”):

    # virsh pool-autostart poolvhba0
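To tie this section together, here is a sketch of a small generator for the pool XML, so that the pool name, WWNN, WWPN and the optional parent HBA become parameters. The function name `vhba_pool_xml` and the output file name are my own choices; the XML layout follows the “scsi” pool with an “fc_host” source adapter described above, and the WWNN/WWPN in the usage lines are simply the example values from section 2.

```shell
# vhba_pool_xml NAME WWNN WWPN [PARENT]
# Hypothetical helper: print a 'scsi' storage pool definition whose
# source adapter is an fc_host vHBA. PARENT (e.g. scsi_host5) is optional.
vhba_pool_xml() {
    local name wwnn wwpn parent_attr
    name=$1 wwnn=$2 wwpn=$3
    parent_attr=""
    if [ -n "${4:-}" ]; then
        parent_attr=" parent='$4'"
    fi
    cat <<EOF
<pool type='scsi'>
  <name>$name</name>
  <source>
    <adapter type='fc_host'$parent_attr wwnn='$wwnn' wwpn='$wwpn'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>
EOF
}

# Usage on a libvirt host (file name is arbitrary):
#   vhba_pool_xml poolvhba0 2001001b32a9da5e 2101001b32a9da5e scsi_host5 > poolvhba0.xml
#   virsh pool-define poolvhba0.xml
#   virsh pool-start poolvhba0
#   virsh pool-autostart poolvhba0
```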

    4) Finding LUNs on your vHBA

    4.1) Utilizing LUN’s from a vHBA created by the storage pool

    Assuming that a storage pool was created for a vHBA, use the
    “virsh vol-list” command in order to generate a list of available LUNs
    on the vHBA, as follows:

    # virsh vol-list poolvhba0 --details
    Name        Path                                                            Type
    ---------------------------------------------------------------------------------
    unit:0:2:0  /dev/disk/by-path/pci-0000:04:00.1-fc-0x203500a0b85ad1d7-lun-0  block

    The list of LUN names displayed will be available for use as disk volumes
    in virtual machine configurations.
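If you want to feed that list to a script, something like the following sketch may help. It reads the “virsh vol-list --details” output on standard input and prints only the volume name and path of each row. The function name `vol_paths` is hypothetical, and it assumes that the second column of every data row is an absolute path, as in the output above.

```shell
# vol_paths
# Hypothetical helper: read "virsh vol-list POOL --details" output on
# stdin and print "NAME PATH" for each volume. Header and separator
# lines are skipped because their second field is not an absolute path.
vol_paths() {
    awk '$2 ~ /^\// { print $1, $2 }'
}

# Usage on a libvirt host:
#   virsh vol-list poolvhba0 --details | vol_paths
```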

    4.2) Utilizing LUN’s from a vHBA created using the node device driver

    Finding an available LUN from a vHBA created using the node device driver
    can be achieved either via the “virsh nodedev-list” command or
    through manual searching of the host’s file system.

    Use “virsh nodedev-list --tree | more” and find the parent HBA
    to which the vHBA was configured. The following example lists the
    pertinent part of the tree for the example HBA “scsi_host5”:

    +- scsi_host5
        |
        +- scsi_host7
        +- scsi_target5_0_0
        |   |
        |   +- scsi_5_0_0_0
        |
        +- scsi_target5_0_1
        |   |
        |   +- scsi_5_0_1_0
        |
        +- scsi_target5_0_2
        |   |
        |   +- scsi_5_0_2_0
        |       |
        |       +- block_sdb_3600a0b80005adb0b0000ab2d4cae9254
        |
        +- scsi_target5_0_3
            |
            +- scsi_5_0_3_0

    The “block_” prefix indicates it’s a block device, the “sdb_” is a
    convention to signify the short device path of “/dev/sdb”, and the
    short device path or the number can be used to search the
    “/dev/disk/by-{id,path,uuid,label}/” name space for the specific LUN
    by name, for example:

    # ls /dev/disk/by-id/ | grep 3600a0b80005adb0b0000ab2d4cae9254
    scsi-3600a0b80005adb0b0000ab2d4cae9254

    # ls /dev/disk/by-path/ -l | grep sdb
    lrwxrwxrwx. 1 root root 9 Sep 16 05:58 pci-0000:04:00.1-fc-0x203500a0b85ad1d7-lun-0 -> ../../sdb

    As an alternative to using “virsh nodedev-list”, it is possible to manually
    iterate through the “/sys/bus/scsi/devices” and “/dev/disk/by-path”
    directory trees in order to find a LUN using the following steps:

    1. Iterate over all the directories beginning with the SCSI host number
    of the vHBA under the “/sys/bus/scsi/devices” tree. For example, if the
    SCSI host number is 6, the command would be:

    # ls /sys/bus/scsi/devices/6:* -d
    /sys/bus/scsi/devices/6:0:0:0 /sys/bus/scsi/devices/6:0:1:0
    /sys/bus/scsi/devices/6:0:2:0 /sys/bus/scsi/devices/6:0:3:0

    2. List the “block” names of all the entries belonging to the SCSI host
    as follows:

    # ls /sys/bus/scsi/devices/6:*/block/
    /sys/bus/scsi/devices/6:0:2:0/block/:
    sdc
    /sys/bus/scsi/devices/6:0:3:0/block/:
    sdd

    This indicates that “scsi_host6” has two LUNs, one is attached to
    “6:0:2:0”, with the short device name “sdc”, and the other is attached
    to “6:0:3:0”, with the short device name “sdd”.

    3. Determine the stable path to the LUN.

    Unfortunately a device name such as “sdc” is not stable enough for use
    by libvirt. In order to get the stable path, use “ls -l /dev/disk/by-path”
    and look for the “sdc” path:

    # ls -l /dev/disk/by-path/ | grep sdc
    lrwxrwxrwx. 1 root root 9 Sep 10 22:28 pci-0000:08:00.1-fc-0x205800a4085a3127-lun-0 -> ../../sdc

    Thus “/dev/disk/by-path/pci-0000:08:00.1-fc-0x205800a4085a3127-lun-0”
    is the stable path of the LUN attached to address “6:0:2:0” and will be
    used in virtual machine configurations.
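The three manual steps above can be sketched as one shell function. This is only an illustration under the assumptions of this section: LUNs appear as “HOST:c:t:l” entries with a “block” subdirectory under /sys/bus/scsi/devices, and the stable names are symlinks in /dev/disk/by-path. The function name `vhba_lun_paths` and its optional directory arguments (there so it can be tried against a scratch tree) are my own.

```shell
# vhba_lun_paths HOST_NUMBER [SYSFS_ROOT] [BY_PATH_DIR]
# Hypothetical helper: for every block device behind SCSI host HOST_NUMBER,
# print "ADDRESS SHORTNAME STABLE_PATH".
vhba_lun_paths() {
    local host sys bypath blk dev addr link
    host=$1
    sys=${2:-/sys}
    bypath=${3:-/dev/disk/by-path}
    # Steps 1 and 2: walk HOST:*/block/* to find the short device names.
    for blk in "$sys"/bus/scsi/devices/"$host":*/block/*; do
        [ -e "$blk" ] || continue              # glob matched nothing
        dev=$(basename "$blk")                 # e.g. sdc
        addr=$(basename "$(dirname "$(dirname "$blk")")")   # e.g. 6:0:2:0
        # Step 3: find the by-path symlink that points at that device.
        for link in "$bypath"/*; do
            [ -L "$link" ] || continue
            if [ "$(basename "$(readlink "$link")")" = "$dev" ]; then
                printf '%s %s %s\n' "$addr" "$dev" "$link"
            fi
        done
    done
}

# Usage on a host where the vHBA is scsi_host6:
#   vhba_lun_paths 6
```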

    5) Virtual machine configuration change to use vHBA LUN

    Adding the vHBA LUN to the virtual machine configuration is done via
    an XML modification to the virtual machine.

    5.1) Using a LUN from a vHBA created by the storage pool

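    Adding the vHBA LUN to the virtual machine is handled via XML that creates
    a disk volume on the virtual machine, with the following example XML (the
    target device name and bus are illustrative):

    <disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='poolvhba0' volume='unit:0:2:0'/>
      <target dev='sda' bus='scsi'/>
    </disk>

    In particular note the usage of the “<source>” directive with the “pool”
    and “volume” attributes listing the storage pool and the short volume
    name.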

    5.2) Using a LUN from a vHBA created using the node device driver

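    Configuring a vHBA LUN on the virtual machine can be done with its
    stable path (path of {by-id|by-path|by-uuid|by-label}). The following is an
    XML example of a direct LUN path (the target device name and bus are
    illustrative):

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/pci-0000\:04\:00.1-fc-0x203500a0b85ad1d7-lun-0'/>
      <target dev='sda' bus='scsi'/>
    </disk>

    NOTE: The use of “device='disk'” and the long “<source>” device name.
    The example uses the “by-path” option. The backslashes prior to the
    colons are required, since colons can be considered as delimiters.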

    5.3) To configure the LUN as a pass-through device, use the following XML
    examples.

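    For a vHBA created using the node device driver (the target device name
    and bus are illustrative):

    <disk type='block' device='lun'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/pci-0000\:04\:00.1-fc-0x203500a0b85ad1d7-lun-0'/>
      <target dev='sda' bus='scsi'/>
    </disk>

    NOTE: The use of “device='lun'” and again the long “<source>” device
    name. Again, the backslashes prior to the colons are required.

    For a vHBA created by a storage pool:

    <disk type='volume' device='lun'>
      <driver name='qemu' type='raw'/>
      <source pool='poolvhba0' volume='unit:0:2:0'/>
      <target dev='sda' bus='scsi'/>
    </disk>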
    Although it is possible to use the LUN’s path as the disk source for a
    vHBA created by the storage pool, it is recommended to use libvirt storage
    pool and storage volume instead.
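For the recommended pool-and-volume form, here is a sketch of a generator that prints the disk element for either a plain disk or a pass-through LUN. The function name `volume_disk_xml` is my own, and the driver line and the “scsi” bus are assumptions picked for illustration; adjust them to your guest. The guest name in the usage lines is a placeholder.

```shell
# volume_disk_xml POOL VOLUME DEVICE TARGET_DEV
# Hypothetical helper: print a <disk type='volume'> element that refers to
# a LUN by storage pool and volume name. DEVICE must be 'disk' or 'lun'.
volume_disk_xml() {
    local pool volume device target
    pool=$1 volume=$2 device=$3 target=$4
    case "$device" in
        disk|lun) ;;
        *)
            echo "volume_disk_xml: DEVICE must be 'disk' or 'lun'" >&2
            return 1
            ;;
    esac
    # The driver and bus below are illustrative assumptions.
    cat <<EOF
<disk type='volume' device='$device'>
  <driver name='qemu' type='raw'/>
  <source pool='$pool' volume='$volume'/>
  <target dev='$target' bus='scsi'/>
</disk>
EOF
}

# Usage on a libvirt host (MYGUEST is a placeholder guest name):
#   volume_disk_xml poolvhba0 unit:0:2:0 lun sda > lun-disk.xml
#   virsh attach-device MYGUEST lun-disk.xml --config
```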

    6) Destroying a vHBA

    A vHBA created by the storage pool can be destroyed by the virsh command
    “pool-destroy”, for example:

    # virsh pool-destroy poolvhba0

    NOTE: If the storage pool is persistent, the vHBA will also be removed
    by libvirt when it destroys the storage pool.

    A vHBA created using the node device driver can be destroyed by the
    command “virsh nodedev-destroy”, for example (assuming that scsi_host6
    was created as shown earlier):

    # virsh nodedev-destroy scsi_host6

    Destroying a vHBA removes it just as a reboot would do since the node
    device driver does not support persistent configurations.

    7) Troubleshooting

    7.1) Discovery of HBA capable of NPIV prior to 1.0.4

    Prior to libvirt 1.0.4, discovery of HBAs capable of NPIV
    requires checking each of the HBAs on the host for the capability flag
    “vport_ops”, as follows:

    First, find all the HBAs by the capability flag “scsi_host”:

    # virsh nodedev-list --cap scsi_host
    scsi_host0
    scsi_host1
    scsi_host2
    scsi_host3
    scsi_host4
    scsi_host5

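    Now check each HBA to find one with the “vport_ops” capability,
    one at a time, as follows:

    # virsh nodedev-dumpxml scsi_host3
    <device>
      <name>scsi_host3</name>
      <parent>pci_0000_00_08_0</parent>
      <capability type='scsi_host'>
        <host>3</host>
      </capability>
    </device>

    That says “scsi_host3” doesn’t support vHBAs.

    # virsh nodedev-dumpxml scsi_host5
    <device>
      <name>scsi_host5</name>
      <parent>pci_0000_04_00_1</parent>
      <capability type='scsi_host'>
        <host>5</host>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da4e</wwnn>
          <wwpn>2101001b32a9da4e</wwpn>
          <fabric_wwn>2001000dec9877c1</fabric_wwn>
        </capability>
        <capability type='vport_ops' />
      </capability>
    </device>

    But “scsi_host5” supports it.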

    NOTE: In addition to libvirt 1.0.4 automating the lookup of HBAs capable
    of supporting a vHBA configuration, the XML tags “max_vports” and “vports”
    will describe the maximum vports allowed and the current vports in use.

    As a smarter alternative, you can avoid the cumbersome steps above
    with a simple script like:

    for i in $(virsh nodedev-list --cap scsi_host); do
        # print every SCSI host whose XML advertises the vport_ops capability
        if virsh nodedev-dumpxml "$i" | grep -q vport_ops; then
            echo "$i"
        fi
    done
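If virsh is not handy, the same check can be sketched directly against sysfs. This assumes, as far as I understand the kernel's FC transport class, that every FC host shows up under /sys/class/fc_host/hostN and that NPIV-capable ones have the “vport_create”, “max_npiv_vports” and “npiv_vports_inuse” attributes; please verify that on your kernel. The function name `list_npiv_hosts` and the optional directory argument are my own.

```shell
# list_npiv_hosts [FC_HOST_DIR]
# Hypothetical helper: print "hostN max=M inuse=U" for every FC host that
# exposes a vport_create attribute, i.e. appears to be NPIV capable.
list_npiv_hosts() {
    local dir h
    dir=${1:-/sys/class/fc_host}
    for h in "$dir"/host*; do
        [ -e "$h/vport_create" ] || continue   # not NPIV capable, or no match
        printf '%s max=%s inuse=%s\n' \
            "$(basename "$h")" \
            "$(cat "$h/max_npiv_vports")" \
            "$(cat "$h/npiv_vports_inuse")"
    done
}

# Usage on the host:
#   list_npiv_hosts
```

Note that sysfs names the hosts “hostN”, while the libvirt node device names are “scsi_hostN”.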

    NOTE: It is possible that the node device is named “pci_10df_fe00_scsi_host_0”.
    This is because libvirt supports two backends for the node device driver
    (“udev” and “HAL”), and they lead to completely different naming styles.
    The udev backend is preferred over the HAL backend since HAL support
    is in maintenance mode. The udev backend is more common; however, if
    your distribution packager built the libvirt binaries without the
    udev backend, then the more complicated names such as
    “pci_10df_fe00_scsi_host_0” must be used.

    7.2) Creation of a vHBA using the node device driver prior to 0.9.10

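    For libvirt prior to 0.9.10, you will need to specify the “wwnn” and “wwpn”
    manually when creating a vHBA, with example XML as follows:

    <device>
      <name>scsi_host6</name>
      <parent>scsi_host5</parent>
      <capability type='scsi_host'>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da5e</wwnn>
          <wwpn>2101001b32a9da5e</wwpn>
        </capability>
      </capability>
    </device>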


    7.3) Creation of storage pool based on vHBA prior to 1.0.5

    Prior to libvirt 1.0.5, one can define a “scsi” type pool based on a
    vHBA by its SCSI host name (e.g. “host5” in the XML below), using
    XML such as the following:

    <pool type='scsi'>
      <name>poolhba0</name>
      <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
      <capacity>0</capacity>
      <allocation>0</allocation>
      <available>0</available>
      <source>
        <adapter name='host5'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
        <permissions>
          <mode>0700</mode>
          <owner>0</owner>
          <group>0</group>
        </permissions>
      </target>
    </pool>

    There are two disadvantages to using the SCSI host name as the source
    adapter. First, the SCSI host number is not stable, thus it may cause trouble
    for your storage pool after a system reboot. Second, the adapter name
    (e.g. “host5”) is not consistent with the node device name (e.g. “scsi_host5”).

    Moreover, using the SCSI host name as the source adapter doesn’t
    allow you to create a vHBA.

    NOTE: Since 1.0.5, the source adapter name was changed to be consistent
    with the node device name, thus the second disadvantage no longer applies.

    [1]http://www.linuxtopia.org/online_books/rhel6/rhel_6_virtualization/

    As always I welcome your feedback!!!!!