3 years ago · 348acdf626
--- a/README.md
+++ b/README.md
@@ -15,15 +15,16 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
 * Switch between different models using a dropdown menu.
 * Notebook mode that resembles OpenAI's playground.
 * Chat mode for conversation and role playing.
-* Advanced chat features (send images, get audio responses with TTS).
 * Generate nice HTML output for GPT-4chan.
 * Generate Markdown output for [GALACTICA](https://github.com/paperswithcode/galai), including LaTeX support.
 * Support for [Pygmalion](https://huggingface.co/models?search=pygmalionai/pygmalion) and custom characters in JSON or TavernAI Character Card formats ([FAQ](https://github.com/oobabooga/text-generation-webui/wiki/Pygmalion-chat-model-FAQ)).
+* Advanced chat features (send images, get audio responses with TTS).
 * Stream the text output in real time.
 * Load parameter presets from text files.
 * Load large models in 8-bit mode ([see here](https://github.com/oobabooga/text-generation-webui/issues/20#issuecomment-1411650652) if you are on Windows).
 * Split large models across your GPU(s), CPU, and disk.
 * CPU mode.
+* DeepSpeed ZeRO-3 offload.
 * Get responses via API.
 * Supports softprompts.
 * Supports extensions ([guide](https://github.com/oobabooga/text-generation-webui/wiki/Extensions)).
@@ -142,11 +143,15 @@ Optionally, you can use the following command-line flags:
 | `--picture`  | Adds an ability to send pictures in chat UI modes. Captions are generated by BLIP. |
 | `--cpu`       | Use the CPU to generate text.|
 | `--load-in-8bit`  | Load the model with 8-bit precision.|
+| `--bf16`  | Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU. |
 | `--auto-devices` | Automatically split the model across the available GPU(s) and CPU.|
 | `--disk` | If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk. |
 | `--disk-cache-dir DISK_CACHE_DIR` | Directory to save the disk cache to. Defaults to `cache/`. |
 | `--gpu-memory GPU_MEMORY` | Maximum GPU memory in GiB to allocate. This is useful if you get out of memory errors while trying to generate text. Must be an integer number. |
 | `--cpu-memory CPU_MEMORY`    | Maximum CPU memory in GiB to allocate for offloaded weights. Must be an integer number. Defaults to 99.|
+| `--deepspeed`    | Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration. |
+| `--nvme-offload-dir NVME_OFFLOAD_DIR`    | DeepSpeed: Directory to use for ZeRO-3 NVME offloading. |
+| `--local_rank LOCAL_RANK`    | DeepSpeed: Optional argument for distributed setups. |
 | `--no-stream`   | Don't stream the text output in real time. This improves the text generation performance.|
 | `--settings SETTINGS_FILE` | Load the default interface settings from this json file. See `settings-template.json` for an example.|
 | `--extensions EXTENSIONS` | The list of extensions to load. If you want to load more than one extension, write the names separated by commas and between quotation marks, "like,this". |
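For readers following the new flags, here is a minimal sketch of how `--bf16`, `--deepspeed`, `--nvme-offload-dir`, and `--local_rank` could be parsed and mapped onto a DeepSpeed ZeRO-3 configuration dictionary. It is not taken from this commit: the helper names `build_parser` and `build_ds_config`, the default of `0` for `--local_rank`, the CPU-offload fallback when no NVMe directory is given, and the example path `/mnt/nvme` are all assumptions made for illustration. Only the configuration keys themselves follow the DeepSpeed config schema.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the four flags added in this diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--bf16", action="store_true",
                        help="Load the model with bfloat16 precision.")
    parser.add_argument("--deepspeed", action="store_true",
                        help="Enable DeepSpeed ZeRO-3 for inference.")
    parser.add_argument("--nvme-offload-dir", type=str, default=None,
                        help="DeepSpeed: directory for ZeRO-3 NVMe offloading.")
    # The default of 0 is an assumption; the README only calls the flag optional.
    parser.add_argument("--local_rank", type=int, default=0,
                        help="DeepSpeed: optional argument for distributed setups.")
    return parser


def build_ds_config(bf16: bool, nvme_offload_dir: Optional[str]) -> dict:
    """Build a ZeRO-3 config dict; falls back to CPU offload (assumed) without NVMe."""
    if nvme_offload_dir:
        offload_param = {
            "device": "nvme",
            "nvme_path": nvme_offload_dir,
            "pin_memory": True,
        }
    else:
        offload_param = {"device": "cpu", "pin_memory": True}

    return {
        # Exactly one of the two half-precision modes is enabled.
        "fp16": {"enabled": not bf16},
        "bf16": {"enabled": bf16},
        "zero_optimization": {"stage": 3, "offload_param": offload_param},
        "train_micro_batch_size_per_gpu": 1,
    }


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--deepspeed", "--bf16", "--nvme-offload-dir", "/mnt/nvme"]
)
ds_config = build_ds_config(args.bf16, args.nvme_offload_dir)
```

Note that `argparse` exposes `--nvme-offload-dir` as `args.nvme_offload_dir`, while `--local_rank` keeps its underscore. A dictionary of this shape is what the Transformers DeepSpeed integration expects as a configuration, but the values the project actually uses may differ.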