Quick Start

To start using Arctic Inference optimizations in vLLM, follow the steps below:

Install the Arctic Inference package:
```
pip install "arctic-inference[vllm]"
```
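To confirm that the install succeeded, one option is to look up the installed distribution's version from Python. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of Arctic Inference.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "arctic-inference" is the distribution name used in the pip command above.
    found = installed_version("arctic-inference")
    if found is None:
        print("arctic-inference is not installed in this environment")
    else:
        print(f"arctic-inference {found} is installed")
```

Run it with the same Python interpreter that you use to launch vLLM, since packages are installed per environment.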
Select the Arctic Inference optimization(s) you want to use. You can choose one of the following optimizations or mix and match several:
- Optimized Generative AI:
- Optimized Embeddings:
  - Optimized Embeddings

Add any necessary command-line arguments to your vLLM command. For example, to use Shift Parallelism, you would run:

```
python -m vllm.entrypoints.openai.api_server \
    ${vLLM_kwargs} \
    --ulysses-sequence-parallel-size 8 \
    --enable-shift-parallel
```
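
The command above can also be assembled programmatically, which may help when the flags vary between deployments. The sketch below only builds and prints the argument list; it does not start a server. The helper `build_vllm_command` and the model name are hypothetical placeholders, and the only Arctic Inference flags it uses are the two shown above.

```python
import shlex
import sys
from typing import List, Optional


def build_vllm_command(
    model: str,
    ulysses_size: int = 8,
    shift_parallel: bool = True,
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Build the argv list for the vLLM OpenAI-compatible server."""
    if ulysses_size < 1:
        raise ValueError("ulysses_size must be a positive integer")

    cmd = [sys.executable, "-m", "vllm.entrypoints.openai.api_server"]
    cmd += ["--model", model]
    # Any other vLLM arguments; this stands in for ${vLLM_kwargs}.
    cmd += list(extra_args or [])
    cmd += ["--ulysses-sequence-parallel-size", str(ulysses_size)]
    if shift_parallel:
        cmd.append("--enable-shift-parallel")
    return cmd


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder, not a real checkpoint.
    argv = build_vllm_command("my-org/my-model")
    print(shlex.join(argv))
```

To launch the server from the same script, the returned list can be passed to `subprocess.run(argv, check=True)`. Keeping the command as a list rather than a single string avoids shell quoting issues.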