Quick Start
To get started with Arctic Inference optimizations in vLLM, follow these steps:
Install the Arctic Inference package:
pip install "arctic-inference[vllm]"
Select the Arctic Inference optimization(s) you want to use. You can choose one of the following optimizations, or mix and match several of them:
Optimized Generative AI:
Optimized Embeddings:
Add any necessary command-line arguments to your vLLM command. For example, to use Shift Parallelism, you would run:
python -m vllm.entrypoints.openai.api_server \
  ${vLLM_kwargs} \
  --ulysses-sequence-parallel-size 8 \
  --enable-shift-parallel
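In the command above, ${vLLM_kwargs} stands for whatever vLLM arguments you already use. The following is a minimal sketch of one way to combine those arguments with the Arctic Inference flags inside a bash script. It collects them in bash arrays, so an argument containing spaces stays a single argument. That would not happen with an unquoted string variable. The --model value my-org/my-model and the port are placeholders chosen for illustration. The script only prints the assembled command (a dry run) rather than starting the server.

```shell
#!/usr/bin/env bash
# Sketch: build the vLLM server command from bash arrays (dry run only).
set -euo pipefail

# Placeholder base vLLM arguments (hypothetical model name); replace with your own.
vllm_args=(--model my-org/my-model --port 8000)

# Arctic Inference flags taken from the Shift Parallelism example above.
arctic_args=(--ulysses-sequence-parallel-size 8 --enable-shift-parallel)

# Each array element stays one argument, even if it contains spaces.
cmd=(python -m vllm.entrypoints.openai.api_server "${vllm_args[@]}" "${arctic_args[@]}")

# Print the assembled command. To actually launch the server,
# run "${cmd[@]}" instead of echoing it.
echo "${cmd[*]}"
```

To enable additional Arctic Inference optimizations, append their flags to arctic_args. The vLLM arguments stay in vllm_args.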