Re: Request for feedback on common data backstore in the kernel



Hans Petter Selasky wrote this message on Wed, Sep 26, 2007 at 01:31 +0200:
Please keep me CC'ed, since I'm not on all these lists.

In the kernel we currently have two different data backstores:

struct mbuf

and

struct buf

These two backstores serve two different device types. "mbufs" are for network
devices and "buf" is for disk devices.

I don't see how this relates to the rest of your email, but even though
they are used similarly, their normal sizes are quite different... mbufs
normally contain 64-256 byte packets, with large file transfers attaching
a 2k cluster (which comes from a different pool than the core mbuf) to
the mbuf, while a buf is usually something like 16k-64k...

Problem:

The current backstores are loaded into DMA by using the BUS-DMA framework.
According to Kip Macy, this does not appear to be very fast. See:

http://perforce.freebsd.org/chv.cgi?CH=126455

This only works on x86/amd64 because of the direct-mapped memory that
they support. It would completely break arches like sparc64 that
require an IOMMU to translate the addresses, and it also doesn't address
keeping the buffers in sync on arches like arm... sparc64 may have many
gigs of memory, but only a 2GB window for mapping main memory...

It sounds like the x86/amd64 bus_dma implementation needs to be improved
to run more quickly... As with all things, you can hardcode stuff, but
then you lose portability...

Some ideas I have:

When a buffer is out of range for a hardware device and a data copy is
needed, I want to simply copy that data in smaller parts to/from a
pre-allocated bounce buffer. I want to avoid allocating this buffer
when "bus_dmamap_load()" is called.
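A minimal user-space sketch of the chunked copy, assuming a hypothetical `dma_xfer_fn` hook that stands in for "start DMA on this chunk and wait for it". `bounce_copy_out` and `BOUNCE_SIZE` are illustrative names, not part of bus_dma:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the pre-allocated bounce buffer (not a busdma constant). */
#define BOUNCE_SIZE 4096

/* Allocated once up front, so nothing is allocated at map-load time. */
static unsigned char bounce_buf[BOUNCE_SIZE];

/*
 * Hypothetical hook standing in for programming the device with one chunk.
 * It receives a pointer into the bounce buffer and the chunk length.
 */
typedef int (*dma_xfer_fn)(const void *chunk, size_t len, void *arg);

/*
 * Send 'len' bytes from a buffer the device cannot reach by copying them
 * through the bounce buffer at most BOUNCE_SIZE bytes at a time.
 * Returns 0 on success or the first non-zero value returned by the hook.
 */
static int
bounce_copy_out(const void *src, size_t len, dma_xfer_fn xfer, void *arg)
{
	const unsigned char *p = src;

	while (len > 0) {
		size_t chunk = len < BOUNCE_SIZE ? len : BOUNCE_SIZE;
		int error;

		memcpy(bounce_buf, p, chunk);
		error = xfer(bounce_buf, chunk, arg);
		if (error != 0)
			return (error);
		p += chunk;
		len -= chunk;
	}
	return (0);
}
```

The trade-off is that one large transfer becomes several smaller DMA operations, and a single static buffer like this one can only serve one transfer at a time.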

For pre-allocated USB DMA memory I currently have:

struct usbd_page

struct usbd_page {
void *buffer; // virtual address
bus_size_t physaddr; // as seen by one of my devices
bus_dma_tag_t tag;
bus_dmamap_t map;
uint32_t length;
};

Mostly, only "length == PAGE_SIZE" is allowed. When USB allocates DMA memory,
it always allocates the same size, and that is PAGE_SIZE bytes.

I could see attaching preallocated memory to a tag, and having maps
that attempt to use this memory, but that's something else...

If two different PCI controllers want to communicate directly by passing DMA
buffers, technically one would need to translate the physical address as
seen by device 1 into the physical address as seen by device 2. If this
translation table is sorted, a binary search over it will be rather quick.
Another approach is to limit the number of translations:

#define N_MAX_PCI_TRANSLATE 4

struct usbd_page {
void *buffer; // virtual address
bus_size_t physaddr[N_MAX_PCI_TRANSLATE];
bus_dma_tag_t tag;
bus_dmamap_t map;
uint32_t length;
};

Then PCI device 1 on bus X can use physaddr[0] and PCI device 2 on bus Y can
use physaddr[1]. If a physaddr[] entry is equal to some magic value, then the
DMA buffer is not reachable by that device and must be bounced.
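Both lookup schemes (the sorted translation table and the fixed per-page array) can be sketched in user space, assuming 64-bit bus addresses (uint64_t in place of bus_size_t) and omitting the bus_dma tag and map. `xlate_entry`, `xlate_lookup`, `page_dma_addr` and the magic value `PHYSADDR_UNREACHABLE` are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scheme 1: a table of address windows sorted by src_base. */
struct xlate_entry {
	uint64_t src_base;	/* window start as seen by device 1 */
	uint64_t dst_base;	/* same memory as seen by device 2 */
	uint64_t len;		/* window length in bytes */
};

/*
 * Translate 'addr' from device 1's view to device 2's view with a binary
 * search, O(log n). 'tbl' must be sorted by src_base with non-overlapping
 * windows. Returns false if no window covers 'addr'.
 */
static bool
xlate_lookup(const struct xlate_entry *tbl, size_t n, uint64_t addr,
    uint64_t *out)
{
	size_t lo = 0, hi = n;

	/* Find the first entry whose src_base is greater than addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (tbl[mid].src_base <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return (false);		/* below the first window */
	const struct xlate_entry *e = &tbl[lo - 1];
	if (addr - e->src_base >= e->len)
		return (false);		/* in a gap between windows */
	*out = e->dst_base + (addr - e->src_base);
	return (true);
}

/* Scheme 2: a fixed number of per-device views stored in each page. */
#define N_MAX_PCI_TRANSLATE 4
#define PHYSADDR_UNREACHABLE UINT64_MAX	/* hypothetical "magic" value */

struct usbd_page {
	void *buffer;				/* virtual address */
	uint64_t physaddr[N_MAX_PCI_TRANSLATE];	/* one view per device slot */
	uint32_t length;
};

/*
 * DMA address of 'offset' within 'pg' for the device using slot 'slot'.
 * Returns false if the arguments are out of range or the slot holds the
 * magic value, i.e. the data must be bounced for that device.
 */
static bool
page_dma_addr(const struct usbd_page *pg, unsigned slot, uint32_t offset,
    uint64_t *addr)
{
	if (slot >= N_MAX_PCI_TRANSLATE || offset >= pg->length)
		return (false);
	if (pg->physaddr[slot] == PHYSADDR_UNREACHABLE)
		return (false);
	*addr = pg->physaddr[slot] + offset;
	return (true);
}
```

The first scheme costs a search per lookup but stores one entry per window; the second is a single array index but stores N_MAX_PCI_TRANSLATE addresses in every page.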

Then, when two PCI devices talk to each other, all they need to pass is a
structure like this:

struct usbd_page_cache {
struct usbd_page *page_start;
uint32_t page_offset_buf;
uint32_t page_offset_end;
};

The required DMA address can then be looked up in a few nanoseconds.
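One way such a lookup could stay that cheap, assuming every page is exactly PAGE_SIZE bytes and that page_offset_buf/page_offset_end are byte offsets counted from page_start (an interpretation, not stated in the mail). `page_cache_lookup` is a hypothetical helper, and the bus_dma tag and map are omitted:

```c
#include <stddef.h>
#include <stdint.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096u		/* assumption: 4 KiB pages */
#endif

/* Single-address usbd_page as in the first struct above; tag/map omitted. */
struct usbd_page {
	void *buffer;		/* virtual address */
	uint64_t physaddr;	/* as seen by the device */
	uint32_t length;	/* always PAGE_SIZE in this sketch */
};

struct usbd_page_cache {
	struct usbd_page *page_start;
	uint32_t page_offset_buf;	/* assumed: first byte, counted from page_start */
	uint32_t page_offset_end;	/* assumed: one past the last byte */
};

/*
 * Hypothetical helper: DMA address of byte 'off' of the buffer, plus the
 * number of bytes that stay contiguous from there (clipped to both the
 * page end and the buffer end). Returns 0, or -1 if 'off' is out of range.
 */
static int
page_cache_lookup(const struct usbd_page_cache *pc, uint32_t off,
    uint64_t *paddr, uint32_t *contig)
{
	if (off >= pc->page_offset_end - pc->page_offset_buf)
		return (-1);

	uint32_t pos = pc->page_offset_buf + off;	/* offset from page_start */
	const struct usbd_page *pg = &pc->page_start[pos / PAGE_SIZE];
	uint32_t in_page = pos % PAGE_SIZE;
	uint32_t page_left = PAGE_SIZE - in_page;
	uint32_t buf_left = pc->page_offset_end - pos;

	*paddr = pg->physaddr + in_page;
	*contig = page_left < buf_left ? page_left : buf_left;
	return (0);
}
```

The lookup is one division, one modulo and an array index; with a power-of-two PAGE_SIZE the compiler reduces those to a shift and a mask.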

Has anyone been thinking about this topic before?

There is no infrastructure to support passing DMA addresses between hardware
devices, and that is completely unrelated to the issues raised above... It
would require the ability to pass a map to a tag and create a new map...
It is possible, as on the sun4v where you have two IOMMUs. You'd have
to program one IOMMU to point to the other one to support that... But
it is rare to see devices DMA directly to each other... You usually
end up DMA'ing to main memory, and then having the other device DMA it
out of memory. The only time you need to DMA between devices is if one
has local memory, and the other device is able to sanely populate it...
This is very rare...

Also, the number of PCI buses can get quite large. With PCIe, each device
is now its own PCI bus, so you're starting to see PCI bus counts in the
10's and 20's, if not higher. Keeping an array with an entry for each of
those, and calculating and filling them all out, sounds like a huge expense...
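A back-of-the-envelope sketch of that expense, under illustrative assumptions (1 GiB of DMA memory, 4 KiB pages, 24 buses, 8-byte bus addresses; none of these figures come from the thread). The function names are hypothetical:

```c
#include <stdint.h>

/* Pages needed to cover 'dma_bytes', rounding up to whole pages. */
static uint64_t
npages(uint64_t dma_bytes, uint64_t page_size)
{
	return ((dma_bytes + page_size - 1) / page_size);
}

/* Bytes taken by a per-page physaddr[] array with one slot per PCI bus. */
static uint64_t
xlate_table_bytes(uint64_t dma_bytes, uint64_t page_size, unsigned nbuses,
    unsigned addr_size)
{
	return (npages(dma_bytes, page_size) * nbuses * addr_size);
}

/* Translations that must be computed and filled in for that array. */
static uint64_t
xlate_entries(uint64_t dma_bytes, uint64_t page_size, unsigned nbuses)
{
	return (npages(dma_bytes, page_size) * nbuses);
}
```

With those assumptions the array costs 262144 pages x 24 slots x 8 bytes = 50331648 bytes (48 MiB) and 6291456 translations to fill in, against 2 MiB for a single address per page.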

I'm a bit puzzled as to what you want to solve, as the problem you
stated doesn't relate to the solutions you were thinking about... Maybe
I'm missing something? Can you give me an example of where cxgb is
writing to the memory on another PCI bus, and not main memory?

P.S. I redirected this to -arch as it seems more relevant than the other
lists...

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"


