vllm.attention.selector ¶
_cached_get_attn_backend cached ¶
_cached_get_attn_backend(
backend,
head_size: int,
dtype: dtype,
kv_cache_dtype: CacheDType | None,
block_size: int | None,
use_mla: bool = False,
has_sink: bool = False,
use_sparse: bool = False,
attn_type: str | None = None,
) -> type[AttentionBackend]
Source code in vllm/attention/selector.py
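The cached selectors memoize backend resolution so that repeated lookups with identical arguments do not re-run the selection and import logic. The snippet below is an illustrative sketch only, not the vLLM implementation: it assumes a functools.lru_cache-style wrapper over a hypothetical resolver, and shows why every argument to the cached variant must be hashable.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _cached_select_backend(name: str, head_size: int) -> str:
    # Hypothetical stand-in for backend resolution: because every argument
    # is hashable, identical calls after the first are served from the cache
    # without re-running the (potentially expensive) selection/import logic.
    print(f"resolving backend for {name!r}, head_size={head_size}")
    return f"{name}_backend"


_cached_select_backend("flash_attn", 128)  # resolves and caches
_cached_select_backend("flash_attn", 128)  # cache hit, no re-resolution
```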
_cached_get_mamba_attn_backend cached ¶
_cached_get_mamba_attn_backend(
mamba_type: str,
) -> type[AttentionBackend]
Source code in vllm/attention/selector.py
get_attn_backend ¶
get_attn_backend(
head_size: int,
dtype: dtype,
kv_cache_dtype: str | None,
block_size: int | None,
use_mla: bool = False,
has_sink: bool = False,
use_sparse: bool = False,
attn_type: str | None = None,
) -> type[AttentionBackend]
Selects which attention backend to use and lazily imports it.
Source code in vllm/attention/selector.py
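A minimal usage sketch, assuming torch is installed and that keyword arguments matching the documented parameter names are accepted; which backend class is returned depends on the platform, the installed kernels, and the arguments passed.

```python
import torch

from vllm.attention.selector import get_attn_backend

# Resolve the attention backend class for a typical fp16 decoder model.
# The concrete class returned depends on the platform and available kernels.
backend_cls = get_attn_backend(
    head_size=128,
    dtype=torch.float16,
    kv_cache_dtype=None,  # None: keep the KV cache in the model dtype
    block_size=16,
    use_mla=False,
)
print(backend_cls.__name__)
```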
get_mamba_attn_backend ¶
get_mamba_attn_backend(
mamba_type: str,
) -> type[AttentionBackend]
Selects which Mamba attention backend to use and lazily imports it.
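A minimal usage sketch; "mamba2" is only an illustrative mamba_type value (an assumption, not taken from this page), since the accepted strings are defined by the layers that register Mamba-style backends.

```python
from vllm.attention.selector import get_mamba_attn_backend

# Look up the backend class for an assumed mamba_type value; the selector
# lazily imports the backend module and returns the class itself.
mamba_backend_cls = get_mamba_attn_backend("mamba2")
print(mamba_backend_cls.__name__)
```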