.. _quickstart:

===========
Quick Start
===========

To get started with Arctic Inference optimization in vLLM, follow the steps below:

1. Install the Arctic Inference package:

   .. code-block:: bash

      pip install arctic-inference[vllm]

2. Select the Arctic Inference optimization(s) you want to use. You can
   choose one (or mix and match) the following optimizations:

   - Optimized Generative AI:

     - :ref:`shift`
     - :ref:`ulysses`
     - :ref:`spec-decode`
     - :ref:`swiftkv`

   - Optimized Embeddings:

     - :ref:`embeddings`

3. Add any necessary command-line arguments to your vLLM command. For example, to use
   Shift Parallelism, you would run:

   .. code-block:: bash

      python -m vllm.entrypoints.openai.api_server \
          ${vLLM_kwargs} \
          --ulysses-sequence-parallel-size 8 \
          --enable-shift-parallel