SmashConfig User Manual
=======================

SmashConfig is an essential tool in Pruna for configuring parameters to optimize your models. This manual explains how to define and use a SmashConfig.

Defining a SmashConfig
----------------------

Define a SmashConfig using the following code:

.. code-block:: python

    from pruna.algorithms.SmashConfig import SmashConfig

    smash_config = SmashConfig()

After creating a SmashConfig, you can set the parameters for optimization:

.. code-block:: python

    smash_config['compilers'] = ['diffusers2']

Passing a SmashConfig to the Smash Function
-------------------------------------------

Pass a SmashConfig to the ``smash`` function as follows:

.. code-block:: python

    from pruna.smash import smash

    smashed_model = smash(
        model=pipe,
        api_key='',  # Replace with your actual API key
        smash_config=smash_config,
    )

SmashConfig Parameters
----------------------

Optimization Methods
^^^^^^^^^^^^^^^^^^^^

There are four types of optimization methods:

1. Compilation - Use ``smash_config['compilers']``
2. Quantization - Use ``smash_config['quantizers']``
3. Pruning - Use ``smash_config['pruners']``
4. Factorization - Use ``smash_config['factorizers']``

Compilation Methods
^^^^^^^^^^^^^^^^^^^

Compilation methods optimize the model for specific hardware. Supported methods include:

- **all**:

  - Optimizes computer vision models for any hardware.
  - Required argument:

    - ``device``: ``'cpu'`` or ``'cuda'``, e.g. ``smash_config['device'] = 'cuda'``

  - Time: Approximately 15-20 minutes.
  - Quality: Similar to the original model.

- **diffusers**:

  - Optimizes ``diffusers`` models for NVIDIA GPUs.
  - Required arguments: None.
  - Time: Approximately 15-20 minutes.
  - Quality: Same as the original model.

- **diffusers2**:

  - Optimizes ``diffusers`` models for NVIDIA GPUs.
  - Optional argument:

    - ``save_dir``: Working directory during compilation (a temporary directory is created if not specified), e.g. ``smash_config['save_dir'] = '/tmp/optimized_model.pkl'``

  - Time: About 10 seconds.
  - Quality: Same as the original model.

- **c_translation**:

  - Transforms translation models from Hugging Face's ``transformers`` library into C++ code.
  - Required argument:

    - ``tokenizer``: The tokenizer associated with your model, e.g. ``smash_config['tokenizer'] = AutoTokenizer.from_pretrained('facebook/opt-125m')``

  - Optional argument:

    - ``n_quantization_bits``: 8 or 16 bits (default 16), e.g. ``smash_config['n_quantization_bits'] = 8``

  - Time: A few minutes.
  - Quality: Same as the original model.

- **c_generation**:

  - Compiles generation models from Hugging Face's ``transformers`` library into C++ code.
  - Required argument:

    - ``tokenizer``: The tokenizer associated with your generation model.

  - Optional argument:

    - ``n_quantization_bits``: 8 or 16 bits (default 16).

  - Time: A few minutes.
  - Quality: Same as the original model.

- **c_whisper**:

  - Converts Whisper models from Hugging Face's ``transformers`` library to C++ code.
  - Required argument:

    - ``processor``: The processor for your Whisper model.

  - Optional argument:

    - ``n_quantization_bits``: 8 or 16 bits (default 16), e.g. ``smash_config['n_quantization_bits'] = 8``

  - Time: A few minutes.
  - Quality: Same as the original model.

- **ifw**:

  - Optimizes Whisper models from Hugging Face's ``transformers`` library using advanced batching and chunking techniques.
  - Required arguments:

    - ``processor``: The processor for your Whisper model, e.g. ``smash_config['processor'] = AutoProcessor.from_pretrained('openai/whisper-large-v3')``
    - ``device``: Target hardware (``'cpu'`` or ``'cuda'``), e.g. ``smash_config['device'] = 'cuda'``

  - Time: Seconds.
  - Quality: Comparable to the original model.

- **s2t**:

  - Enhances ``c_whisper`` or Whisper models from Hugging Face's ``transformers`` library with advanced techniques that reduce hallucination issues.
  - Required argument:

    - ``processor``: The processor for your Whisper model, e.g. ``smash_config['processor'] = AutoProcessor.from_pretrained('openai/whisper-large-v3')``

  - Time: Seconds.
  - Quality: Maintains original model performance.

- **hypertiles**:

  - Compiles ``diffusers`` models for optimal inference speed on target GPUs.
  - Time: About 15-20 minutes.
  - Quality: Similar to the original model.

- **step_caching**:

  - Optimizes ``diffusers`` models by intelligently selecting diffusion steps.
  - Time: Seconds.
  - Quality: Very close to the original model.

- **cv_fast**:

  - Rapidly compiles computer vision models for NVIDIA GPUs.
  - Time: Approximately 10 seconds.
  - Quality: Unchanged from the original model.

Quantization
^^^^^^^^^^^^

Quantization methods reduce the precision of the model's weights and activations, making the model require much less memory at the cost of some quality loss. Supported methods are listed below; a configuration sketch follows the list.

- **llm-int**:

  - Quantizes the model to 8-bit or 4-bit integers.
  - Required argument:

    - ``n_quantization_bits``: 4 or 8 bits, e.g. ``smash_config['n_quantization_bits'] = 8``

  - Time: A few minutes.
  - Quality: Lower than the original model; 4-bit is worse than 8-bit.

- **gptq**:

  - Quantizes the model to 8-bit, 4-bit, 3-bit, or 2-bit integers.
  - Required argument:

    - ``n_quantization_bits``: 2, 3, 4, or 8 bits, e.g. ``smash_config['n_quantization_bits'] = 4``

  - Time: A few minutes to an hour, depending on the model size.
  - Quality: Lower than the original model; quality drops as the bit width decreases (2 < 3 < 4 < 8).
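For instance, here is a minimal sketch selecting the ``gptq`` quantizer at 4 bits. The keys follow the list above; how well a given model tolerates 4-bit weights is model-dependent:

.. code-block:: python

    from pruna.algorithms.SmashConfig import SmashConfig

    # Select the gptq quantizer and set its required bit width.
    smash_config = SmashConfig()
    smash_config['quantizers'] = ['gptq']
    smash_config['n_quantization_bits'] = 4  # 2, 3, 4, or 8; fewer bits save memory but cost quality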
Pruning
^^^^^^^

Coming Soon!

Factorization
^^^^^^^^^^^^^

Coming Soon!
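Putting It All Together
-----------------------

As a closing illustration, here is a minimal end-to-end sketch that applies the ``ifw`` compiler to a Whisper model. This is a sketch, assuming the model is a standard ``transformers`` Whisper checkpoint loaded as shown; the checkpoint name is a placeholder, and your loading code may differ:

.. code-block:: python

    from transformers import AutoProcessor, WhisperForConditionalGeneration

    from pruna.algorithms.SmashConfig import SmashConfig
    from pruna.smash import smash

    # Load a Whisper model and its processor (placeholder checkpoint).
    model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-large-v3')
    processor = AutoProcessor.from_pretrained('openai/whisper-large-v3')

    # Configure the ifw compiler with both of its required arguments.
    smash_config = SmashConfig()
    smash_config['compilers'] = ['ifw']
    smash_config['processor'] = processor
    smash_config['device'] = 'cuda'

    # Optimize the model.
    smashed_model = smash(
        model=model,
        api_key='',  # Replace with your actual API key
        smash_config=smash_config,
    )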