Begin with QEMU educational PCI device
Author: Percy
Updated at 2020/09/07
0 Introduction to PCI structure
This part is mainly quoted from the wiki.
PCI stands for Peripheral Component Interconnect.
0.1 Main Concepts
- PCI device: Any device that conforms to the PCI bus standard. A PCI bus architecture can contain multiple PCI devices.
- PCI bus: A system can have several PCI buses, arranged as a tree so that it can be extended. Each PCI bus can connect multiple PCI devices. An upper bus and a lower bus are interconnected through a bridge.
- PCI bridge: The link that connects one PCI bus to another.
------------------------------------------------------------------ Host bus
|
| HOST/PCI bridge
| | PCI device1 | PCI device2
------------------------------------------------------------------ PCI BUS1
|
| PCI/PCI bridge
| | PCI device3
------------------------------------------------------------------ PCI BUS2
- Other:
  - PCI is a parallel bus. In one clock cycle, 32 bits (later expanded to 64) are transmitted simultaneously. Address and data share the same lines and are transmitted in separate phases, as the protocol dictates.
  - The PCI address space is isolated from the processor address space. Thanks to the buffers in the host bridge, the processor bus and the PCI bus work at their own clock frequencies and do not interfere with each other.
    The “Host Bridge” is what connects the tree of PCI buses (which are internally connected with PCI-to-PCI bridges) to the rest of the system. Usually the processor(s) and memory are on the “other” side of the Host Bridge.
0.2 PCI device intro
Every PCI device has a configuration space and several address spaces.
0.2.1 PCI configuration space
To support hot-plugging, every device needs a configuration space, which is 256 bytes in total.
Access method
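Use I/O ports CF8h and CFCh: write the target address to CF8h, then read or write the data through CFCh. Only the first 256 bytes of a PCI/PCIe device's configuration space can be accessed this way. As mentioned above, the whole configuration space of a PCI device is 256 bytes.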
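The value written to CF8h packs the bus, device, function, and register offset into a single 32-bit word. The sketch below shows that encoding as it is commonly documented for PCI configuration mechanism #1. The function name `pci_config_address` is hypothetical, and the port I/O itself (which needs kernel or `ioperm` privileges) is left out.

```c
#include <stdint.h>

/*
 * Build the 32-bit value written to port CF8h.
 * Hypothetical helper for illustration; assumed layout:
 *   bit  31     enable bit
 *   bits 23-16  bus number
 *   bits 15-11  device number
 *   bits 10-8   function number
 *   bits 7-2    dword-aligned register offset (low two bits forced to 0)
 */
uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func,
                            uint8_t offset)
{
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(func & 0x07) << 8)
         | (uint32_t)(offset & 0xFC);
}
```

Because the offset field is only 8 bits wide, this method cannot reach past byte 255, which matches the 256-byte limit mentioned above.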
There are two types of configuration space: agent and bridge.
Agent configuration space
DW | Byte3 | Byte2 | Byte1 | Byte0 | Addr
---+---------------------------------------------------------+-----
0 | Device ID | Vendor ID | 00
---+---------------------------------------------------------+-----
1 | Status | Command | 04
---+---------------------------------------------------------+-----
2 | Class Code | Revision ID | 08
---+---------------------------------------------------------+-----
3 | BIST | Header Type | Latency Timer | Cache Line | 0C
---+---------------------------------------------------------+-----
4 | Base Address 0 | 10
---+---------------------------------------------------------+-----
5 | Base Address 1 | 14
---+---------------------------------------------------------+-----
6 | Base Address 2 | 18
---+---------------------------------------------------------+-----
7 | Base Address 3 | 1C
---+---------------------------------------------------------+-----
8 | Base Address 4 | 20
---+---------------------------------------------------------+-----
9 | Base Address 5 | 24
---+---------------------------------------------------------+-----
10 | CardBus CIS pointer | 28
---+---------------------------------------------------------+-----
11 | Subsystem Device ID | Subsystem Vendor ID | 2C
---+---------------------------------------------------------+-----
12 | Expansion ROM Base Address | 30
---+---------------------------------------------------------+-----
13 | Reserved(Capability List) | 34
---+---------------------------------------------------------+-----
14 | Reserved | 38
---+---------------------------------------------------------+-----
15 | Max_Lat | Min_Gnt | IRQ Pin | IRQ Line | 3C
-------------------------------------------------------------------
Register meanings:
- Device ID & Vendor ID: Identify the manufacturer of a device and the specific device. For example, the Vendor ID of Intel devices is usually 0x8086, and the Device ID is assigned by the manufacturer.
- Class code: Three bytes in total: class code, subclass code, and programming interface. The class code not only distinguishes the device type, it also specifies the programming interface.
- IRQ Line: The PC used to manage 16 hardware interrupts with two 8259 chips. Now, to support symmetric multiprocessing, there is the APIC (Advanced Programmable Interrupt Controller), which supports the management of 24 interrupts.
- IRQ Pin: PCI has 4 interrupt pins. This register indicates which pin the device is connected to.
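To make the three-byte class code concrete, here is a small sketch that splits DW 2 of the header (offset 08) into its parts. The names `PciClassInfo` and `pci_decode_class` are hypothetical. The sample value 0x06040102 is constructed for illustration: base class 0x06 with subclass 0x04 is the PCI-to-PCI bridge class, combined with programming interface 01 and revision 02.

```c
#include <stdint.h>

/* Hypothetical container for the fields packed into DW 2. */
typedef struct {
    uint8_t base_class;   /* bits 31-24 */
    uint8_t sub_class;    /* bits 23-16 */
    uint8_t prog_if;      /* bits 15-8  */
    uint8_t revision_id;  /* bits 7-0   */
} PciClassInfo;

/* Split the dword read from configuration offset 0x08. */
PciClassInfo pci_decode_class(uint32_t dw2)
{
    PciClassInfo info;
    info.base_class  = (uint8_t)(dw2 >> 24);
    info.sub_class   = (uint8_t)(dw2 >> 16);
    info.prog_if     = (uint8_t)(dw2 >> 8);
    info.revision_id = (uint8_t)dw2;
    return info;
}
```

A driver typically matches on the base class and subclass first and only then looks at the programming interface.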
Use the lspci command to see information about PCI devices.
parallels@parallels-Parallels-Virtual-Platform:~$ lspci -mk
00:00.0 "Host bridge" "Intel Corporation" "82P965/G965 Memory Controller Hub" -r02 "Parallels, Inc." "82P965/G965 Memory Controller Hub"
00:01.0 "PCI bridge" "Intel Corporation" "82G35 Express PCI Express Root Port" -r02 -p01 "" ""
parallels@parallels-Parallels-Virtual-Platform:~$ lspci -vv
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
Subsystem: Parallels, Inc. 82P965/G965 Memory Controller Hub
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
00:01.0 PCI bridge: Intel Corporation 82G35 Express PCI Express Root Port (rev 02) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00006000-00007fff
Memory behind bridge: e2000000-edffffff
Prefetchable memory behind bridge: 00000000b0000000-00000000dfffffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
1 Create PCI device
Look into the file hw/misc/edu.c in the QEMU source tree.
static void pci_edu_register_types(void)
{
    static InterfaceInfo interfaces[] = {
        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
        { },
    };
    static const TypeInfo edu_info = {
        .name          = TYPE_PCI_EDU_DEVICE,
        .parent        = TYPE_PCI_DEVICE,
        .instance_size = sizeof(EduState),
        .instance_init = edu_instance_init,
        .class_init    = edu_class_init, // PCI device class init function
        .interfaces    = interfaces,
    };

    type_register_static(&edu_info); // register the device type
}
type_init(pci_edu_register_types)
In the function edu_class_init, the content of the configuration space is set. The realize member of PCIDeviceClass is set as well.
static void edu_class_init(ObjectClass *class, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(class);
    PCIDeviceClass *k = PCI_DEVICE_CLASS(class);

    k->realize = pci_edu_realize;
    k->exit = pci_edu_uninit;
    k->vendor_id = PCI_VENDOR_ID_QEMU;
    k->device_id = 0x11e8;
    k->revision = 0x10;
    k->class_id = PCI_CLASS_OTHERS;
    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
}
The member realize is a function pointer.
void (*realize)(PCIDevice *dev, Error **errp);
The function it points to is pci_edu_realize.
static void pci_edu_realize(PCIDevice *pdev, Error **errp)
{
    EduState *edu = EDU(pdev);
    uint8_t *pci_conf = pdev->config;

    pci_config_set_interrupt_pin(pci_conf, 1);

    if (msi_init(pdev, 0, 1, true, false, errp)) {
        return;
    }

    timer_init_ms(&edu->dma_timer, QEMU_CLOCK_VIRTUAL, edu_dma_timer, edu);

    qemu_mutex_init(&edu->thr_mutex);
    qemu_cond_init(&edu->thr_cond);
    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
                       edu, QEMU_THREAD_JOINABLE);

    // initialize a 1 MiB MemoryRegion whose accesses go through edu_mmio_ops
    memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
                          "edu-mmio", 1 * MiB);
    // register that region as BAR 0, a memory (MMIO) BAR
    pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
}
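The split between class init (shared IDs and hooks) and realize (per-instance setup) can be sketched in plain C without any QEMU headers. Everything named `Toy...` or `toy_...` below is hypothetical and far simpler than QEMU's object model. The vendor ID 0x1234 is an assumption standing in for PCI_VENDOR_ID_QEMU, and the IRQ Pin offset 0x3D is taken from the configuration space table.

```c
#include <stdint.h>
#include <string.h>

typedef struct ToyDevice ToyDevice;

/* Hypothetical class: data and hooks shared by all instances. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    int (*realize)(ToyDevice *dev);   /* per-instance setup hook */
} ToyDeviceClass;

struct ToyDevice {
    const ToyDeviceClass *klass;
    uint8_t config[256];              /* configuration space */
};

/* Per-instance setup: select interrupt pin 1 (offset 0x3D). */
static int toy_edu_realize(ToyDevice *dev)
{
    dev->config[0x3D] = 1;
    return 0;
}

/* Class setup: fill in IDs and point the hook at the function above. */
void toy_edu_class_init(ToyDeviceClass *k)
{
    k->vendor_id = 0x1234;            /* assumed value of PCI_VENDOR_ID_QEMU */
    k->device_id = 0x11e8;
    k->realize = toy_edu_realize;
}

/* Generic code: copy class IDs into config space, then call the hook. */
int toy_device_create(ToyDevice *dev, const ToyDeviceClass *k)
{
    memset(dev, 0, sizeof *dev);
    dev->klass = k;
    dev->config[0] = (uint8_t)(k->vendor_id & 0xFF);
    dev->config[1] = (uint8_t)(k->vendor_id >> 8);
    dev->config[2] = (uint8_t)(k->device_id & 0xFF);
    dev->config[3] = (uint8_t)(k->device_id >> 8);
    return k->realize ? k->realize(dev) : 0;
}
```

The generic code never needs to know which device it is creating. It only calls through the pointer, which is why a new device mostly consists of a class init function and a realize function.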
2 Data Communication
With the device info and memory region emulated, the most important part remains: establishing a communication channel between the PCI driver in the Linux kernel and the PCI device emulated by QEMU.
2.1 Read and Write
Let’s focus on the functions edu_mmio_read and edu_mmio_write, which define the I/O operations.
What is the real ability of this device?
It gives a driver I/Os, IRQs, DMAs and such to work with.
The device behaves very similarly to the PCI bridge present in the COMBO6 cards developed under the Liberouter wings.
Depending on addr, which is the offset into the MMIO region, the operator can get or set different properties. This is what gives the device its various behaviours.
edu_mmio_read
addr | size (Byte) | info |
---|---|---|
0x00 | 4 | const |
0x04 | 4 | EduState / addr4 |
0x08 | 4 | EduState / fact |
0x20 | 4 | EduState / status |
0x24 | 4 | EduState / irq_status |
0x80 | 8 | EduState / dma_state / src |
0x88 | 8 | EduState / dma_state / dst |
0x90 | 8 | EduState / dma_state / cnt |
0x98 | 8 | EduState / dma_state / cmd |
edu_mmio_write
addr | size (Byte) | info |
---|---|---|
0x04 | 4 | EduState / addr4 |
0x08 | 4 | EduState / fact (Mutex) |
0x20 | 4 | EduState / status |
0x60 | 4 | edu_raise_irq |
0x64 | 4 | edu_lower_irq |
0x80 | 8 | EduState / dma_state / src |
0x88 | 8 | EduState / dma_state / dst |
0x90 | 8 | EduState / dma_state / cnt |
0x98 | 8 | EduState / dma_state / cmd |
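The two tables amount to a switch on the offset. Below is a minimal sketch of such a dispatch for the 4-byte registers only. It is not the code from edu.c: `ToyEduRegs`, `toy_mmio_read`, and `toy_mmio_write` are hypothetical names, the identification constant is an assumed value, values are stored as written, and the factorial thread, DMA registers, and real interrupt delivery are left out. Raising and lowering an IRQ is modelled, as an assumption, by setting and clearing bits in irq_status.

```c
#include <stdint.h>

#define TOY_ID 0x010000edu  /* assumed identification constant at 0x00 */

/* Hypothetical subset of the device state. */
typedef struct {
    uint32_t addr4;
    uint32_t fact;
    uint32_t status;
    uint32_t irq_status;
} ToyEduRegs;

/* Read a 4-byte register; unknown offsets or sizes return all ones. */
uint64_t toy_mmio_read(const ToyEduRegs *r, uint64_t addr, unsigned size)
{
    if (size != 4) {
        return ~0ULL;
    }
    switch (addr) {
    case 0x00: return TOY_ID;
    case 0x04: return r->addr4;
    case 0x08: return r->fact;
    case 0x20: return r->status;
    case 0x24: return r->irq_status;
    default:   return ~0ULL;
    }
}

/* Write a 4-byte register; unknown offsets or sizes are ignored. */
void toy_mmio_write(ToyEduRegs *r, uint64_t addr, uint64_t val, unsigned size)
{
    if (size != 4) {
        return;
    }
    switch (addr) {
    case 0x04: r->addr4 = (uint32_t)val; break;
    case 0x08: r->fact = (uint32_t)val; break;         /* no thread here */
    case 0x20: r->status = (uint32_t)val; break;
    case 0x60: r->irq_status |= (uint32_t)val; break;  /* raise */
    case 0x64: r->irq_status &= ~(uint32_t)val; break; /* lower */
    default:   break;
    }
}
```

Note that 0x60 and 0x64 are write-only and 0x00 and 0x24 are read-only, exactly as the tables list them.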
2.2 DMA
static void edu_dma_timer(void *opaque)
{
    EduState *edu = opaque;
    bool raise_irq = false;

    if (!(edu->dma.cmd & EDU_DMA_RUN)) {
        return;
    }

    if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
        uint64_t dst = edu->dma.dst;
        edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
        dst -= DMA_START;
        pci_dma_read(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
                     edu->dma_buf + dst, edu->dma.cnt);
    } else {
        uint64_t src = edu->dma.src;
        edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
        src -= DMA_START;
        pci_dma_write(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
                      edu->dma_buf + src, edu->dma.cnt);
    }

    edu->dma.cmd &= ~EDU_DMA_RUN;
    if (edu->dma.cmd & EDU_DMA_IRQ) {
        raise_irq = true;
    }

    if (raise_irq) {
        edu_raise_irq(edu, DMA_IRQ);
    }
}

static void dma_rw(EduState *edu, bool write, dma_addr_t *val, dma_addr_t *dma,
                   bool timer)
{
    if (write && (edu->dma.cmd & EDU_DMA_RUN)) {
        return;
    }

    if (write) {
        *dma = *val;
    } else {
        *val = *dma;
    }

    if (timer) {
        timer_mod(&edu->dma_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
    }
}
The members of dma_state determine the state of a DMA transfer: if the lowest bit of cmd is 1, DMA is running. Reads and writes of the DMA registers go through the function dma_rw, where the bool parameter write selects the direction. A write is ignored while a transfer is running.
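The guard in dma_rw can be tried out in isolation. The sketch below is a standalone approximation, not the original function: the `TOY_` macros and `Toy...` names are hypothetical, the timer is omitted, and the bit values (run = bit 0, direction = bit 1, raise IRQ = bit 2) are assumptions about how the EDU_DMA_* macros are defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the cmd register. */
#define TOY_DMA_RUN       0x1u
#define TOY_DMA_DIR(cmd)  (((cmd) & 0x2u) >> 1)
#define TOY_DMA_IRQ       0x4u

typedef struct {
    uint64_t src;
    uint64_t dst;
    uint64_t cnt;
    uint64_t cmd;
} ToyDmaState;

/*
 * Read or write one DMA register.
 * Returns false when a write is rejected because a transfer is running.
 */
bool toy_dma_rw(ToyDmaState *dma, bool write, uint64_t *val, uint64_t *reg)
{
    if (write && (dma->cmd & TOY_DMA_RUN)) {
        return false;       /* registers are locked while DMA runs */
    }
    if (write) {
        *reg = *val;
    } else {
        *val = *reg;
    }
    return true;
}
```

Reads always succeed, so a driver can poll cmd until the run bit clears and only then program the next transfer.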