Announcing Open Source Virtualization India meetup group

We are glad to announce the Open Source Virtualization India meetup group! For a long time we have been answering and discussing virtualization-related queries over email or IRC, so this is an effort to bring virtualizers together under one group.

This is a forum to discuss various open source Linux virtualization technologies like Xen, KVM, and oVirt, along with their friends libvirt, virt-manager, and so on.

If you would like to be part of it, please join www.meetup.com/open-virtualization-india/

We have scheduled three meetups (Bangalore, Pune, Aurangabad) for the near future; the exact dates are yet to be decided.


Pune – Organizer: Anil Vettathu

Aurangabad – Organizer: Ashutosh Sudhakar Bhakare

Please let me know if you would like to volunteer for any of the meetups in your area.

** Welcome Virtualizers!! **

virt-manager is not able to detect the ISO file?

[ ping blog]

Anyway, it is a common complaint that virt-manager is not able to scan or show the ISO files stored on a system. Even after making sure the permissions and other bits are correct, virt-manager does not populate them.

First of all, I would like to ask: where is that ISO stored?

If it is ** NOT ** in the path below, can you move the ISO file to this location and see if it helps?

/var/lib/libvirt/images

Once you move the ISO file to the above location, the issue should normally go away. Either way, please let me know.
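For example, assuming the ISO currently sits at /home/user/Downloads/distro.iso (a made-up path; use your own):

# mv /home/user/Downloads/distro.iso /var/lib/libvirt/images/
# restorecon -v /var/lib/libvirt/images/distro.iso

The restorecon step re-applies the SELinux label expected for that directory, if SELinux is enforcing.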

We can poke at it more from there. Hope this helps!

How to list QEMU device options using the command line in a KVM environment

[Back to blogging after a busy season of ….]

Anyway, it is a very common query for QEMU/KVM, and most people don't know the answer. It is really useful to know which options are available for a given device and for the qemu-kvm binary itself.

Here is an example; replace the QEMU binary path with the one used on your system.

First of all, if you want to list the devices available in the qemu binary, do so using:
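For example (here I assume the binary is /usr/libexec/qemu-kvm; replace it with the qemu binary path on your distribution):

# /usr/libexec/qemu-kvm -device ?

("-device help" works as well on recent builds.)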

If you want to list the specific options of a particular device, view them by invoking the binary as shown below. Here I used "virtio-net-pci" as an example.
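Again with an assumed binary path:

# /usr/libexec/qemu-kvm -device virtio-net-pci,?

This prints the properties of the virtio-net-pci device; "-device virtio-net-pci,help" is equivalent.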

In the same way, you can list the available CPU models and machine types:

-M ?
-cpu ?

For example:
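Using the same assumed binary path as above (the first lists machine types, the second lists CPU models):

# /usr/libexec/qemu-kvm -M ?
# /usr/libexec/qemu-kvm -cpu ?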

Understanding the qcow2 image format, its header, and verifying corruption

qcow2 is one of the best image formats available for virtual machines running on top of KVM. It is known as QEMU image format version 2. Use it to get smaller images (useful if your filesystem does not support holes, for example on Windows), optional AES encryption, zlib-based compression, and support for multiple VM snapshots.

Supported options:
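The exact set depends on your qemu version; you can query it yourself with something like "qemu-img create -f qcow2 -o ?". The commonly available ones are:

backing_file     file name of a base image
backing_fmt      image format of the base image
encryption       encrypt the image with AES
cluster_size     qcow2 cluster size
preallocation    preallocation mode (off, metadata)
lazy_refcounts   postpone refcount updates (newer qemu only)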

The qcow2 header is composed of the following byte allocation:

Byte 0 – 3: magic
QCOW magic string (“QFI\xfb”)

4 – 7: version
Version number (valid values are 2 and 3)

8 – 15: backing_file_offset
Offset into the image file at which the backing file name
is stored (NB: The string is not null terminated). 0 if the
image doesn’t have a backing file.

16 – 19: backing_file_size
Length of the backing file name in bytes. Must not be
longer than 1023 bytes. Undefined if the image doesn’t have
a backing file.

20 – 23: cluster_bits
Number of bits that are used for addressing an offset
within a cluster (1 << cluster_bits is the cluster size).
Must not be less than 9 (i.e. 512 byte clusters). Note: qemu
as of today has an implementation limit of 2 MB as the maximum
cluster size and won't be able to open images with larger
cluster sizes.

24 – 31: size
Virtual disk size in bytes

32 – 35: crypt_method
0 for no encryption
1 for AES encryption

36 – 39: l1_size
Number of entries in the active L1 table

40 – 47: l1_table_offset
Offset into the image file at which the active L1 table
starts. Must be aligned to a cluster boundary.

48 – 55: refcount_table_offset
Offset into the image file at which the refcount table
starts. Must be aligned to a cluster boundary.

56 – 59: refcount_table_clusters
Number of clusters that the refcount table occupies

60 – 63: nb_snapshots
Number of snapshots contained in the image

64 – 71: snapshots_offset
Offset into the image file at which the snapshot table
starts. Must be aligned to a cluster boundary.

Now let's verify a qcow2 image header by following the above byte allocation. I have an image whose first 1 MB looks like this:
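You can take such a dump of your own image with something like (the image path here is just an example):

# hexdump -C /var/lib/libvirt/images/test.qcow2 | head -8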

Let's start verifying the image integrity (well, at least its header) by manually decoding the above data in hexadecimal notation. The first 4 bytes represent the magic string, so starting from offset 0x00000000 we can see "51 46 49 fb", which is equivalent to "QFI\xfb". Luckily, at least these bytes are not corrupted.

Let's see the next 4 bytes: 00 00 00 02. Isn't that saying the version is "2"? Yes.

Now, let's see what the size of this image is: bytes 24–31 tell us the virtual disk size.

The size field is stored big-endian, so in the dump it appears as 00 00 00 00 3e 80 00 00, i.e. 0x3E800000, which is
00111110100000000000000000000000 in binary or 1048576000 in decimal. 1048576000 bytes is 1000 MiB, i.e. roughly 1 GB.

Let's see whether this image has any snapshots. Bytes 60–63 say it all:
00 00 00 00 -> so, NO SNAPSHOTS!

If you extract the above hexadecimal values, you end up with:
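Something along these lines (only the fields decoded above are listed):

magic            51 46 49 fb   ("QFI\xfb")
version          00 00 00 02   (2)
size             00 00 00 00 3e 80 00 00   (1048576000 bytes)
nb_snapshots     00 00 00 00   (0)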

Hmmm, it looks like there is no corruption, at least in this header. Well, is there a binary available for checking the integrity of the image itself? Yes, 'qemu-img check' can be used for that purpose.

Many operations can be performed on a qcow2 image using this binary.

To get some information about this image we can try:
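For example (image path is again just an example):

# qemu-img info /var/lib/libvirt/images/test.qcow2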

It is also possible to check the integrity of the image with:
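# qemu-img check /var/lib/libvirt/images/test.qcow2

(Again, substitute your own image path. A healthy image reports that no errors were found; a corrupted one lists the problematic clusters.)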

Hope this helps!!

/me @ KVM Forum 2013, Edinburgh, UK

Oh yeah, it was one of the most wonderful weeks of my life!

This year I was lucky enough to get the green flag for KVM Forum, and I really enjoyed it. It was an awesome time that let me experience the beauty of Scotland together with friends and renowned virtualization developers from the open source community.

I met many people with whom I have worked in the past, or I should say, the crew with whom I am working at present.

Day 1 @ KVM forum:

The sessions ran mainly in three parallel suites, so I couldn't get into all of them. Obviously I missed two or three sessions per day 🙂

I will write more about the sessions I attended in the different tracks; until then, I am placing the videos here as placeholders.

Keynote by Gleb

Naturally, Gleb had the privilege of presenting the keynote on the opening day, and here it is:

Modern QEMU Device Emulation – Andreas Färber, Suse

Andreas presented the device model, QOM, with a demo. Device emulation has undergone many changes lately, and how exactly new devices should be written (which example to copy and which not), or how to rebase out-of-tree device models against a changing upstream, is a recurring topic. Andreas took an in-tree device that had not even been qdev'ified yet, turned it into a modern QOM device, and showcased how the management infrastructure surrounding QOM allows one to inspect and manipulate that device once it is in the proper form. In short, it is easy to define a QEMU Object Model device and play with it.

Virgil3D – Virtio Based 3D Capable GPU – Dave Airlie, Red Hat

This session was about adding a missing capability: 3D rendering for guest OSes inside QEMU. The Virgil3D project aims to provide a virtual GPU device that guest OSes can use for OpenGL or Direct3D capabilities. The host side of the device will use OpenGL on the host to render the command stream from the guest. The command stream will be based on the Mesa project's Gallium3D framework, using similar states and shader encoding. Dave gave a demo at the end of the session, and it was great to see gaming inside the guest. 🙂

[sound is broken]

VGA Assignment Using VFIO – Alex Williamson, Red Hat

VGA and GPU device assignment is an often requested feature for adding new functionality to virtual machines, whether for gaming or traditional graphics workstation usage. The VFIO userspace driver framework has been extended to include VGA support, enabling QEMU assignment of graphics cards. In this presentation, Alex Williamson gave an overview of the architecture for doing VGA assignment, explained the differences between VGA assignment and traditional PCI device assignment, and provided a current status report for VGA/GPU device assignment in QEMU.

Unfortunately, I don't see a video for this session.

Platform Device Passthrough Using VFIO – Stuart Yoder, Freescale Semiconductor

This session was all about platform device passthrough. Stuart started by explaining the Linux driver model and the device binding and unbinding process.
VFIO provides a framework for securely allowing user space applications to directly access and use I/O devices, including QEMU, which allows pass-through of devices to virtual machines. QEMU and the upstream Linux kernel currently support VFIO for PCI devices. System-on-a-chip processors frequently have I/O devices that are not PCI-based and use the platform bus framework in Linux. An increasing number of QEMU/KVM users need to pass through platform devices to virtual machines using VFIO.

This presentation described:
VFIO itself
how VFIO-based passthrough of PCI devices is similar to and different from platform devices
issues and challenges in solving platform device pass-through
proposed kernel changes to enable this, including "sysfs_bind_only", a boolean addition to sysfs

[sound is broken]

Gerd on x86 firmware maze:

He explained bits of SeaBIOS, TianoCore, coreboot, iPXE, UEFI, CSM, ACPI, and fw_cfg, then covered which firmwares exist in the QEMU world, the interactions among them, and the SeaBIOS initialization sequence. Later he explained how hardware configuration and initialization work in QEMU and which interfaces are used to handle this.

[slides] docs.google.com/file/d/0BzyAwvVlQckecXpCSnBRekN2bDQ/edit

Nested Virtualization presentation by Orit ..

Orit explained what nested virtualization is, the L0, L1, and L2 levels in nested virtualization, and the bits around them.

Nested EPT to Make Nested VMX Faster – Gleb Natapov
Gleb started by explaining 'shadow paging' and the reason it is slow. With the help of EPT, we can avoid the shadow page table altogether and rely on two-level paging in hardware, which improves performance a lot.

Memory virtualization overhead has a huge impact on virtual machine performance. To reduce the overhead, two-level paging was introduced in the virtualization extensions of all x86 vendors. On Intel it is called Extended Page Tables, or EPT. Nested guests running on nested VMX cannot enjoy the benefits of two-level paging, though, and have to rely on the much less efficient shadow paging mechanism which, combined with the other overheads that nested virtualization incurs, makes nested guests unbearably slow. Nested EPT is a mechanism that allows a nested guest to benefit from the EPT extension and greatly reduces the overhead of memory virtualization for nested guests.

[slides ] docs.google.com/file/d/0BzyAwvVlQckedmpobUY1Sm0zNWc/edit

Running Windows 8 on Top of Android with KVM – Zhi Wang, Intel

Running windows guest on android!!!

Zhi Wang from Intel discussed how they were able to run a Windows 8 guest efficiently (including virtio drivers) on top of an x86 Android tablet with KVM. With KVM, they enabled hardware-based virtualization simply and efficiently on such small devices, taking advantage of the Linux-based system, including Android. At the same time, they found various challenges, especially with QEMU, mainly because of differences in 1) the user-level infrastructure (display, input, sound, etc.), such as the libraries (Bionic, limbo-android, ...), the graphics system, and system calls, and 2) scheduling (e.g. foreground apps are suspended). He concluded his session with the next steps, such as sensor support and Connected Standby for Windows 8.

Slides: docs.google.com/file/d/0Bx_UwXmBKWsyWnhRaFgxNDlfZlU/edit
Demo: docs.google.com/file/d/0Bx_UwXmBKWsyVS01WVBSM3FSM3M/edit

[OSv -> Best Cloud Operating System]

Avi and Glauber presented what OSv is, why it exists, and so on. Personally, I had looked into some of the previous presentations and docs about OSv in the recent past (noted here), so this session felt like a bit of a repetition 🙂 However, there was a good takeaway from this session, courtesy of Glauber: "There are 2 types of C++ programming." If you really want to get it, please listen to the video. 🙂

Dinner at a restaurant on Haymarket street with Omer Frenkel, Oved, Arik, and others.

Day 2 @ KVM forum

The day started with Anthony's QEMU Weather Report, which covered major features and fixes, releases, GSoC projects, and the growing community.

Effective Multi-Threading in QEMU- Paolo Bonzini, Red Hat

Paolo explained the QEMU architecture past and present, the virtio-blk-dataplane architecture, unlocked memory dispatch, and unlocked MMIO.

Block Layer Status Report – Stefan Hajnoczi, Red Hat & Kevin Wolf, Red Hat

Kevin and Stefan presented the changes that came into the block layer recently and the features they are working on. These included performance improvements (data deduplication, corruption prevention, COW, internal COW, journalling), followed by drive-specific configuration and data plane. They concluded by mentioning future thoughts like image fleecing, point-in-time snapshots, incremental backups, image syncing, and so on. There is also a new command, 'qemu-img map'.

An Introduction to OpenStack and its use of KVM Virtualization – Daniel Berrange, Red Hat

Daniel explained OpenStack and how KVM is tightly coupled with it. He started with a brief tour of the OpenStack components, followed by the main integration points between KVM and OpenStack, obviously centred on the Nova compute engine. The talk also outlined the overall OpenStack architecture with a focus on Nova, the capabilities of KVM as used in Nova, how KVM integrates with the OpenStack storage and networking sub-projects, and what developments to expect in future releases of OpenStack.

New Developments and Advanced Features in the Libvirt Management API – Daniel Berrange, Red Hat
Daniel also spoke on new developments and advanced features in libvirt. In this session he started with libvirt's disk access permission implementation, then moved on to 'sanlock' and virtlockd, granular access control, and bits on sVirt and SELinux. Later he came to cgroups, and finally he concluded his talk with CPU, memory, and block tuning.


Empowering Data Center Virtualization Using KVM – Livnat Peer, Red Hat

In this session Livnat explained what the oVirt project is (the management of multi-host, multi-tenant virtual data centers, including high availability, VM and storage live migration, storage and network management, a system scheduler, and more), its role as an integration point for several open source virtualization technologies (including KVM, libvirt, SPICE, oVirt Node, and numerous OpenStack components such as Neutron and Glance), and how it can be used to manage a data center.

One Year Later: And There are Still Things to Improve in Migration! – Juan Quintela,

Juan has been busy finding latencies and bottlenecks in live migration and trying to improve it :). He explained the changes that have happened to improve migration on machines with huge amounts of memory and many vCPUs. There have also been changes integrating migration over RDMA.

Amit presented an idea for a static checker to avoid live migration compatibility issues. This is not an implemented solution but rather a thought on reducing failures in live migration caused by compatibility features around QEMU.

Debugging Live Migration – Alexander Graf, SUSE
Alexander came up with an interesting session on debugging live migration dynamically. I really loved this session, especially the slides he prepared, the content, and maybe the way it was presented.

Automatic memory ballooning – Luiz Capitulino

When a Linux host is running out of memory, the kernel will take action to reclaim memory. This action may be detrimental to KVM guest performance (e.g. swapping). To help avoid this scenario, a KVM guest could automatically return memory to the host when the host is facing memory pressure. By doing so, the guest may itself get into memory pressure, so we also need a way to allow the guest to automatically get memory back. This is what the automatic ballooning project is about. In this talk, Luiz dived into the project's implementation and challenges and discussed current results.

The day ended with a party at the Cargo Bar, where most of my time was spent with Hai Huang, Osier Yang, Vinod Chegu, Eduardo Habkost, Bandan Das, and Sean Cohen.

Afterwards, Hai Huang led us (Amos Kong, Fam, Mike Cao, Asias He, Osier Yang) back to the hotel.

Day 3 @ KVM forum

The day started with the OVA update:

However, this day was almost dedicated to oVirt sessions. Itamar started with the presentation below, where he discussed the oVirt project status. The oVirt project had just released version 3.3 with many new features and integration with other open source projects like Foreman, Gluster, OpenStack Glance, and Neutron. Itamar covered the current state of the project as well as the roadmap and plans going forward.

Doron taught us what the chicken-and-egg problem is, ah, I forgot to say chicken and egg in an oVirt context 🙂. This talk was about the 'self hosted engine' feature of oVirt.

oVirt for PowerPC – Leonardo Bianconi, Instituto de Pesquisas Eldorado

So, oVirt is planning to extend to the PPC architecture. In this talk, Leonardo discussed work that adds PowerPC architecture awareness to oVirt, which currently makes various assumptions based on the x86 architecture. Many projects are involved in this task, such as libvirt, QEMU, and KVM.

Rik talked about methods to reduce context switch overhead:

The Future Integration Points for oVirt Storage and Storage Consumption – Sean Cohen; Ayal Baron

This was almost an interactive session led by Sean and Ayal. They discussed what is new in oVirt 3.3 and what is planned for 3.4 and beyond.

slides: www.slideshare.net/SeanCohen/kvm-forum-2013-future-integration-points-for-ovirt-storage

Using oVirt and Spice for VDI – Frantisek Kobzik

This was a nice presentation about SPICE in oVirt and also about the support for other display protocols in oVirt.

The LinuxCon + KVM Forum crowd landed at the National Museum of Scotland this evening, and it was a remarkable time! I don't remember everyone I talked to that day; however, I was fully engaged in discussions. I spent some time at the casino there, oh, and it was not only me, almost half of the crowd did. Afterwards we (the Beijing team + /me) walked back to the hotel amid many discussions which I don't remember, though Osier or someone else may.

N_Port ID Virtualization (NPIV) in a virtualization environment (libvirt/KVM), or: assigning a LUN directly to a VM/guest

I had been thinking of writing about NPIV for a long time and kept putting it off, maybe because the libvirt side was still somewhat incomplete. However, I can't postpone it any longer, especially after receiving lots of mail on this subject, so here it is:

To write about NPIV, I should first share some background on the fabric and its terminology. Let me start with that.

In a high-level view of a fabric, there are at least an HBA (host bus adapter), a fabric switch, and a target/storage array.

Obviously, these components are interconnected. When these components are connected together, each one's role in the fabric has to be defined.

If you really don't know anything about Fibre Channel fabrics, take a look at the resources available on the web:

At the very least, start with Wikipedia 🙂
en.wikipedia.org/wiki/Fibre_Channel

In a fabric, 'ports' are one of the most important components. Fibre Channel defines the following port types:

N_port – a port on the node (e.g. host or storage device), also known as a node port.

NL_port – a port on the node used in a loop, also known as a node loop port.

F_port – a port on the switch that connects to a node point-to-point (i.e. connects to an N_port), also known as a fabric port. An F_port is not loop capable.

FL_port – a port on the switch that connects to an FC-AL loop (i.e. to NL_ports), also known as a fabric loop port.

E_port – the connection between two Fibre Channel switches, also known as an expansion port. When E_ports between two switches form a link, that link is referred to as an inter-switch link (ISL).

…etc

Refer en.wikipedia.org/wiki/Fibre_Channel#Ports for more information on other ports .

Once you know about fabric node ports, turn your attention to WWN, WWPN, and WWNN:

WWN = World Wide Name –> this is the generic term, so let's set it aside and focus on the two below:

WWPN = World Wide Port Name
WWNN = World Wide Node Name.

Obviously, if there are multiple components in a fabric, there must be something that is unique to each of them. That is the WWPN. Even though the node name may be the same, the port names will be different. WWPNs are the equivalent of MAC addresses in the Ethernet protocol; in other words, they are constants that cannot be regenerated.

I believe you now have some idea about the fabric and its terminology. All right, let's think about NPIV 🙂

First of all, DON'T CONFUSE NPIV (N_Port ID Virtualization) with 'N_Port Virtualization'; they are different.

Read it again 🙂

In this article we only care about N_Port ID Virtualization. NPIV allows a single host bus adapter (HBA), or a target port on a storage array, to register multiple World Wide Port Names (WWPNs) and N_Port identification numbers.

Even though it is not only virtual machines (VMs) that can take advantage of NPIV, they benefit from it a lot. In a virtualized environment, it is a very common use case to assign a LUN to a specific VM so that it is not shared with others.

I am sure you have thought along the same lines; that is exactly what NPIV achieves.

When a Fibre Channel device is connected to a fabric SAN, several different processes happen.

In short:

FLOGI (fabric login) -> happens from an N_port/NL_port to an F_port; the FLOGI frame is sent to the well-known address 0xFFFFFE.
PLOGI (port login) -> happens from N_port to N_port.
PRLI (process login) -> starts from processes on an N_port.

During the FLOGI process, an N_Port_ID is assigned by the Fibre Channel switch. An N_Port normally has one WWPN and one N_Port_ID, so they match one-to-one.

With NPIV, you have multiple WWPNs/N_Port_IDs on a single physical N_port.

So, how does an N_port register multiple WWPNs/N_Port_IDs? This is achieved by sending an FDISC to address 0xFFFFFE to obtain an additional address.

Refer en.wikipedia.org/wiki/NPIV for more information on this.

Once you have multiple WWPNs, you can do zoning or LUN assignment against those unique identifiers (WWPNs). Who uses a given WWPN is not a concern for the target/storage.

Does that also mean I can assign this WWPN to one VM and the next WWPN to another VM, so that I can assign different LUNs to different VMs without sharing them? YES, you got it.

Obviously, your Fibre Channel switch must support NPIV.

In short, NPIV (N_Port ID Virtualization) is a Fibre Channel technology to
share a single physical Fibre Channel HBA among multiple virtual ports.
Henceforth known as a "virtual port" or "virtual Host Bus Adapter"
(vHBA), each virtual port is identified by its own WWPN (World Wide
Port Name) and WWNN (World Wide Node Name). In the virtualization
world, the vHBA controls the LUNs for virtual machines.

The libvirt implementation provides the flexibility to configure the LUNs
either directly on the virtual machine or as part of a storage pool,
which can then be configured for use by a virtual machine.

NPIV support in libvirt was first added to libvirt 0.6.5; however, the
following sections will primarily describe NPIV functionality as of the
current libvirt release, 1.1.2. There will be a troubleshooting and prior
version considerations section to describe some historical differences.

I believe I have described NPIV fairly neatly. If not, please let me know via a comment 🙂

Anyway, let me proceed and get to my intended topic 🙂

How do we do all this in a virtualization environment? I will take the libvirt/qemu/KVM combo here.

Let's start. The part below summarizes NPIV and what libvirt supports so far, and also mentions the TODO list.

Big thanks to Osier for putting together the libvirt part!

Follow the steps below in order.

1) Discovery

Discovery of HBA(s) capable of NPIV is provided through the virsh
command 'virsh nodedev-list --cap vports'. If no HBA is returned,
then the host configuration should be checked. The XML output from the
command "virsh nodedev-dumpxml" will list the <name>, <wwnn>, and <wwpn>
fields to be used in order to create a vHBA. Take care to also note
the <max_vports> value, as this lets you know whether the HBA is going to
exceed the maximum number of vHBAs supported.

The following output indicates a host that has two HBAs capable of
supporting vHBA, and the layout of an HBA's XML:
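For example, the output has roughly this shape (host names, WWNs, paths, and counts below are made-up placeholders; yours will differ):

# virsh nodedev-list --cap vports
scsi_host4
scsi_host5

# virsh nodedev-dumpxml scsi_host5
<device>
  <name>scsi_host5</name>
  <parent>pci_0000_08_00_1</parent>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
      <fabric_wwn>2001000dec9877c1</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>164</max_vports>
      <vports>5</vports>
    </capability>
  </capability>
</device>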

The "max_vports" value indicates that there are 164 possible vports
available for use in the HBA configuration. The "vports" value indicates
the number of vports currently in use.

Detection of HBAs capable of NPIV on libvirt versions prior to 1.0.4
is described in the "Troubleshooting" section.

2) Creation of a vHBA using the node device driver

In order to create a vHBA using the node device driver, select an HBA with
available vport space and use the HBA's <name> field as the <parent>
field in the following XML:
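A minimal sketch of that XML, assuming the parent HBA was reported as "scsi_host5" (use the name from your own nodedev-list output):

<device>
  <parent>scsi_host5</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
    </capability>
  </capability>
</device>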

Then create the vHBA with the command “virsh nodedev-create” (assuming
above XML file is named “vhba.xml”):
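# virsh nodedev-create vhba.xml
Node device scsi_host6 created from vhba.xml

(The name "scsi_host6" is just what the kernel happened to assign in this example.)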

NOTE: If you specify “name” for the vHBA, then it will be ignored.
The kernel will automatically pick the next SCSI host name in sequence not
already used. The “wwpn” and “wwnn” values will be automatically generated
by libvirt.

In order to see the generated vHBA XML, use the command “virsh
nodedev-dumpxml” as follows:
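Assuming the kernel named the new vHBA "scsi_host6" as above:

# virsh nodedev-dumpxml scsi_host6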

This vHBA will only be defined as long as the host is not rebooted. In
order to create a persistent vHBA, one must use a libvirt storage pool
(see the next section).

3) Creation of vHBA by the storage pool

By design, vHBAs managed by the node device driver are transient across
host reboots. It is recommended to define a libvirt storage pool based
on the vHBA in order to preserve the vHBA configuration. Using a storage
pool has two primary advantages: first, the libvirt code will find the
LUN's path via simple virsh command output; and second, migration of
virtual machines requires only defining and starting a storage pool
with the same vHBA name on the target machine, provided you reference
the LUN by libvirt storage pool and volume name in the virtual machine
config (see section 5).

In order to create a persistent vHBA configuration, create
a libvirt 'scsi' storage pool using XML as follows:
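A sketch of such a pool definition; the wwnn/wwpn values below are placeholders, substitute the ones you want the vHBA to use:

<pool type='scsi'>
  <name>poolvhba0</name>
  <source>
    <adapter type='fc_host' wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>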

You must use type='scsi' for the pool, and the source adapter type
must be "fc_host". The attributes "wwnn" and "wwpn" provide the
unique identifier of the vHBA to be created.

There is an optional attribute "parent" for the source adapter. It
indicates the name of the HBA you want to use to create the
vHBA. Its value should be consistent with what the node device driver
reports (e.g. scsi_host5). If it is not specified, libvirt will pick
the first HBA capable of NPIV that has not exceeded the maximum
number of vports it supports.

NOTE: You can also create a scsi pool with source adapter type "fc_host"
for an HBA, in which case the "parent" attribute is not necessary.

If you prefer to choose which parent HBA to use for your vHBA, then
you must provide the parent, wwnn, and wwpn in the source adapter XML as
follows:
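For example (again with placeholder WWN values):

<pool type='scsi'>
  <name>poolvhba0</name>
  <source>
    <adapter type='fc_host' parent='scsi_host5' wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>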

To define the persistent pool (assuming above XML is named as
poolvhba0.xml):

# virsh pool-define poolvhba0.xml

NOTE: One must use pool-define to define the pool as persistent,
since a pool created by pool-create is transient and it will disappear
after a system reboot or a libvirtd restart.

To start the pool:

# virsh pool-start poolvhba0

To destroy the pool:

# virsh pool-destroy poolvhba0

When starting the pool, libvirt will check whether a vHBA with the same
"wwnn:wwpn" already exists. If it does not exist, a new vHBA with the
provided "wwnn:wwpn" will be created. Correspondingly, when destroying
the pool, the vHBA is destroyed too.

Finally, in order to ensure that subsequent reboots of your host will
automatically define vHBAs for use in virtual machines, one must set the
storage pool autostart feature as follows (assuming the name of the created
pool was "poolvhba0"):

# virsh pool-autostart poolvhba0

4) Finding LUNs on your vHBA

4.1) Utilizing LUNs from a vHBA created by the storage pool

Assuming that a storage pool was created for a vHBA, use the
"virsh vol-list" command to generate a list of available LUNs
on the vHBA, as follows:
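The output looks roughly like this (the volume name and path are illustrative):

# virsh vol-list poolvhba0
Name                 Path
------------------------------------------------------------------------------
unit:0:2:0           /dev/disk/by-path/pci-0000:08:00.1-fc-0x205800a4085a3127-lun-0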

The list of LUN names displayed will be available for use as disk volumes
in virtual machine configurations.

4.2) Utilizing LUNs from a vHBA created using the node device driver

Finding an available LUN from a vHBA created using the node device driver
can be achieved either via the "virsh nodedev-list" command or
through a manual search of the host's file system.

Use "virsh nodedev-list --tree | more" and find the parent HBA
to which the vHBA was configured. The following example lists the
pertinent part of the tree for the example HBA "scsi_host5":
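An illustrative excerpt (the device names and the trailing disk ID are made up):

# virsh nodedev-list --tree
  ...
  +- scsi_host5
  |   |
  |   +- scsi_host6
  |        |
  |        +- scsi_target6_0_2
  |             |
  |             +- scsi_6_0_2_0
  |                  |
  |                  +- block_sdb_3600a0b80005adb0b0000ddf4cae95d2a
  ...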

The "block_" prefix indicates it is a block device, and "sdb_" is a
convention signifying the short device path "/dev/sdb". The
short device path or the number can be used to search the
"/dev/disk/by-{id,path,uuid,label}/" namespace for the specific LUN
by name, for example:
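# ls -l /dev/disk/by-path/ | grep sdb

(Here "sdb" is the short name taken from the tree output above.)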

As an alternative to using "virsh nodedev-list", it is possible to manually
iterate through the "/sys/bus/scsi/devices" and "/dev/disk/by-path"
directory trees in order to find a LUN, using the following steps:

1. Iterate over all the directories beginning with the SCSI host number
of the vHBA under the “/sys/bus/scsi/devices” tree. For example, if the
SCSI host number is 6, the command would be:
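# ls -d /sys/bus/scsi/devices/6:*
/sys/bus/scsi/devices/6:0:2:0  /sys/bus/scsi/devices/6:0:3:0

(Illustrative output; the two addresses shown are the ones used in the rest of this example.)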

2. List the "block" names of all the entries belonging to that SCSI host,
as follows:
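For example (illustrative output matching the addresses above; on some older kernels the block name appears as a "block:sdc" symlink instead of a block/ subdirectory):

# ls /sys/bus/scsi/devices/6:*/block/
/sys/bus/scsi/devices/6:0:2:0/block/:
sdc

/sys/bus/scsi/devices/6:0:3:0/block/:
sdd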

This indicates that “scsi_host6” has two LUNs, one is attached to
“6:0:2:0”, with the short device name “sdc”, and the other is attached
to “6:0:3:0”, with the short device name “sdd”.

3. Determine the stable path to the LUN.

Unfortunately, a device name such as "sdc" is not stable enough for use
by libvirt. In order to get the stable path, use "ls -l /dev/disk/by-path"
and look for the "sdc" path:

# ls -l /dev/disk/by-path/ | grep sdc
lrwxrwxrwx. 1 root root 9 Sep 10 22:28 pci-0000:08:00.1-fc-0x205800a4085a3127-lun-0 -> ../../sdc

Thus “/dev/disk/by-path/pci-0000:08:00.1-fc-0x205800a4085a3127-lun-0”
is the stable path of the LUN attached to address “6:0:2:0” and will be
used in virtual machine configurations.

5) Virtual machine configuration change to use vHBA LUN

Adding the vHBA LUN to the virtual machine configuration is done via
an XML modification to the virtual machine.

5.1) Using a LUN from a vHBA created by the storage pool

Adding the vHBA LUN to the virtual machine is handled via XML to create
a disk volume on the virtual machine with the following example XML:
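A sketch of the disk XML, assuming the pool was named "poolvhba0" and vol-list reported a volume named "unit:0:2:0" (adjust both to your setup):

<disk type='volume' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='poolvhba0' volume='unit:0:2:0'/>
  <target dev='vda' bus='virtio'/>
</disk>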

In particular, note the usage of the <source> element with the "pool"
and "volume" attributes listing the storage pool and the short volume
name.

5.2) Using a LUN from a vHBA created using the node device driver

Configuring a vHBA LUN on the virtual machine can be done with its
stable path (one of by-id, by-path, by-uuid, or by-label). The following is an
XML example of a direct LUN path:
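A sketch using the stable by-path name found earlier (substitute your own path):

<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-path/pci-0000\:08\:00.1-fc-0x205800a4085a3127-lun-0'/>
  <target dev='vda' bus='virtio'/>
</disk>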



NOTE: Note the use of device='disk' and the long <source dev='...'> device
path. The example uses the "by-path" option. The backslashes before the
colons are required, since colons can be treated as delimiters.

5.3) To configure the LUN as a pass-through device, use the following XML
examples.

For a vHBA created using the node device driver:
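A sketch, again reusing the example by-path name (substitute your own):

<disk type='block' device='lun'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-path/pci-0000\:08\:00.1-fc-0x205800a4085a3127-lun-0'/>
  <target dev='sda' bus='scsi'/>
</disk>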

NOTE: Note the use of device='lun' and, again, the long <source dev='...'>
device path. Again, the backslashes before the colons are required.

For a vHBA created by a storage pool:
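A sketch using the example pool and volume names from above:

<disk type='volume' device='lun'>
  <driver name='qemu' type='raw'/>
  <source pool='poolvhba0' volume='unit:0:2:0'/>
  <target dev='sda' bus='scsi'/>
</disk>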

Although it is possible to use the LUN's path as the disk source for a
vHBA created by the storage pool, it is recommended to reference the
libvirt storage pool and storage volume instead.

6) Destroying a vHBA

A vHBA created by the storage pool can be destroyed by the virsh command
“pool-destroy”, for example:

# virsh pool-destroy poolvhba0

NOTE: If the storage pool is persistent, the vHBA will also be removed
by libvirt when it destroys the storage pool.

A vHBA created using the node device driver can be destroyed by the
command “virsh nodedev-destroy”, for example (assuming that scsi_host6
was created as shown earlier):

# virsh nodedev-destroy scsi_host6

Destroying a vHBA removes it just as a reboot would, since the node
device driver does not support persistent configurations.

7) Troubleshooting

7.1) Discovery of HBA capable of NPIV prior to 1.0.4

Prior to libvirt 1.0.4, discovery of HBAs capable of NPIV
requires checking each of the HBAs on the host for the capability flag
“vport_ops”, as follows:

First you need to find all the HBAs by the capability flag "scsi_host":
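# virsh nodedev-list --cap scsi_host
scsi_host0
scsi_host3
scsi_host4
scsi_host5

(Illustrative list; the two hosts checked below are scsi_host3 and scsi_host5.)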

Now check each HBA to find one with the “vport_ops” capability, either
one at a time as follows:
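For example (illustrative output; the first command prints nothing because scsi_host3 lacks the capability):

# virsh nodedev-dumpxml scsi_host3 | grep vport_ops
# virsh nodedev-dumpxml scsi_host5 | grep vport_ops
      <capability type='vport_ops' />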

That says "scsi_host3" doesn't support vHBA, but "scsi_host5" does.

NOTE: In addition to libvirt 1.0.4 automating the lookup of HBAs capable
of supporting a vHBA configuration, the XML tags "max_vports" and "vports"
describe the maximum number of vports allowed and the current vports in use.

Or, as a smarter alternative, you can avoid the above cumbersome steps
with a simple script like:
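A minimal sketch of such a script:

for dev in $(virsh nodedev-list --cap scsi_host); do
    if virsh nodedev-dumpxml "$dev" | grep -q vport_ops; then
        echo "$dev is capable of vHBA (NPIV)"
    fi
done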

NOTE: It is possible that the node device is named "pci_10df_fe00_scsi_host_0".
This is because libvirt supports two backends for the node device driver
("udev" and "HAL"), which lead to completely different naming styles.
The udev backend is preferred over the HAL backend, since HAL support
is in maintenance mode. The udev backend is more common; however, if
your distribution packaged the libvirt binaries without the
udev backend, then the more complicated names such as
"pci_10df_fe00_scsi_host_0" must be used.

7.2) Creation of a vHBA using the node device driver prior to 0.9.10

For libvirt prior to 0.9.10, you need to specify the "wwnn" and "wwpn"
manually when creating a vHBA, with example XML as follows:
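A sketch of that XML (the parent name and the wwnn/wwpn values are placeholders):

<device>
  <parent>scsi_host5</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>20000000c9831b4b</wwnn>
      <wwpn>10000000c9831b4b</wwpn>
    </capability>
  </capability>
</device>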

7.3) Creation of storage pool based on vHBA prior to 1.0.5

Prior to libvirt 1.0.5, one can define a "scsi" type pool based on a
vHBA by its SCSI host name (e.g. "host5" in the XML below), using example
XML as follows:
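A sketch of that older style of pool XML:

<pool type='scsi'>
  <name>poolvhba0</name>
  <source>
    <adapter name='host5'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>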

There are two disadvantages to using the SCSI host name as the source
adapter. First, the SCSI host number is not stable, so it may cause trouble
for your storage pool after a system reboot. Second, the adapter name
(e.g. "host5") is not consistent with the node device name (e.g. "scsi_host5").

Moreover, using the SCSI host name as the source adapter doesn’t
allow you to create a vHBA.

NOTE: Since 1.0.5, the source adapter name has been changed to be consistent
with the node device name, so the second disadvantage no longer applies.


As always, I welcome your feedback!