|
@@ -150,7 +150,8 @@ Optionally, you can use the following command-line flags:
|
|
|
| `--deepspeed` | Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration. |
|
|
| `--deepspeed` | Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration. |
|
|
|
| `--nvme-offload-dir NVME_OFFLOAD_DIR` | DeepSpeed: Directory to use for ZeRO-3 NVME offloading. |
|
|
| `--nvme-offload-dir NVME_OFFLOAD_DIR` | DeepSpeed: Directory to use for ZeRO-3 NVME offloading. |
|
|
|
| `--local_rank LOCAL_RANK` | DeepSpeed: Optional argument for distributed setups. |
|
|
| `--local_rank LOCAL_RANK` | DeepSpeed: Optional argument for distributed setups. |
|
|
|
-| `--rwkv-strategy RWKV_STRATEGY` | The strategy to use while loading RWKV models. Examples: `"cpu fp32"`, `"cuda fp16"`, `"cuda fp16 *30 -> cpu fp32"`. |
|
|
|
|
|
|
|
+| `--rwkv-strategy RWKV_STRATEGY` | RWKV: The strategy to use while loading the model. Examples: "cpu fp32", "cuda fp16", "cuda fp16i8". |
|
|
|
|
|
+| `--rwkv-cuda-on` | RWKV: Compile the CUDA kernel for better performance. |
|
|
|
| `--no-stream` | Don't stream the text output in real time. This improves the text generation performance.|
|
|
| `--no-stream` | Don't stream the text output in real time. This improves the text generation performance.|
|
|
|
| `--settings SETTINGS_FILE` | Load the default interface settings from this json file. See `settings-template.json` for an example. If you create a file called `settings.json`, this file will be loaded by default without the need to use the `--settings` flag.|
|
|
| `--settings SETTINGS_FILE` | Load the default interface settings from this json file. See `settings-template.json` for an example. If you create a file called `settings.json`, this file will be loaded by default without the need to use the `--settings` flag.|
|
|
|
| `--extensions EXTENSIONS [EXTENSIONS ...]` | The list of extensions to load. If you want to load more than one extension, write the names separated by spaces. |
|
|
| `--extensions EXTENSIONS [EXTENSIONS ...]` | The list of extensions to load. If you want to load more than one extension, write the names separated by spaces. |
|