Optimize Vector Parameters
Fine-tune your vector's chunking and processing settings to achieve optimal retrieval performance for your AI chatbots. Customize how content is split and embedded to enhance accuracy and efficiency.
Chunking Methods
Indite offers three chunking methods to split your content before embedding, each tailored to different content types:
1. Recursive Text Splitter (Default)
- Splits text hierarchically into logical sections.
- Preserves context across headings, paragraphs, and sub-sections.
- Ideal for: Long-form documents, manuals, or structured content.
- Example: A 5,000-character product manual is split into chunks retaining headings and subheadings.
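To make the idea of hierarchical splitting concrete, here is a minimal Python sketch. It is not Indite's implementation: the function name `recursive_split` and the separator order (blank line, newline, space) are assumptions chosen for illustration. It tries the coarsest separator first and only falls back to finer ones for pieces that are still too large.

```python
def recursive_split(
    text: str,
    chunk_size: int = 1000,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),  # assumed order, coarse to fine
) -> list[str]:
    """Illustrative hierarchical splitter (hypothetical, not Indite's code)."""
    if len(text) <= chunk_size:
        # Small enough already; drop whitespace-only fragments.
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut by character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring pieces while they fit in one chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Because paragraphs are merged before anything is cut, text that belongs together tends to stay in the same chunk, which is the behaviour the bullets above describe.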
2. Simple Text Splitter
- Divides text sequentially based on character count.
- Ignores headings or formatting for straightforward splitting.
- Ideal for: Unstructured text, quick embedding, or short content.
- Example: A 2,000-character FAQ, with a chunk size of 1,000 and no overlap, is split into two 1,000-character chunks.
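Sequential splitting by character count can be sketched in a few lines of Python. The function name `simple_split` is hypothetical and overlap is left out to keep the arithmetic obvious; Indite's own splitter may differ in detail.

```python
def simple_split(text: str, chunk_size: int = 1000) -> list[str]:
    """Cut text into consecutive fixed-size pieces (illustrative sketch only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Headings and formatting are ignored; only the character offset matters.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


faq = "x" * 2000
chunks = simple_split(faq)  # two chunks of 1,000 characters each
```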
3. Markdown Text Splitter
- Designed for Markdown files, splitting based on headings (#, ##, ###).
- Preserves bullet lists and code blocks for context.
- Ideal for: Documentation, knowledge bases, or README files.
- Example: A Markdown tutorial is split at its section headings, with code blocks kept intact to maintain structure.
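A heading-based split can be approximated as follows. This is a sketch under assumptions, not Indite's code: the name `split_markdown_by_heading` is hypothetical, only heading levels 1 to 3 start a new section, and lines inside fenced code blocks are never treated as headings so that code stays with its section.

```python
import re

FENCE = "`" * 3                    # Markdown code-fence marker
HEADING = re.compile(r"^#{1,3} ")  # matches "# ", "## ", "### " at line start


def split_markdown_by_heading(markdown: str) -> list[str]:
    """Start a new section at each heading outside a fenced code block."""
    sections: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
        if not in_fence and HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]
```

A line such as `# install dependencies` inside a shell code block stays in its section instead of opening a new one.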
💡 Choose the chunking method based on your content type: Recursive Text Splitter for structured documents, Markdown Text Splitter for .md files, and Simple Text Splitter for raw text.
Chunk Size & Overlap
- Chunk Size: Number of characters per chunk (default: 1000).
  - Smaller chunks improve retrieval precision.
  - Larger chunks preserve more context.
- Overlap Size: Number of characters shared between consecutive chunks (default: 100).
  - Ensures context continuity across chunks.
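The interaction between chunk size and overlap is easiest to see as a sliding window. The sketch below assumes each chunk starts `chunk_size - overlap` characters after the previous one; the function name `split_with_overlap` is hypothetical and Indite's exact windowing may differ.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Sliding-window split where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(text)  # defaults: chunk_size=1000, overlap=100
```

With the defaults, a 2,000-character text yields three chunks rather than two, because each new chunk begins 900 characters after the previous one and repeats its last 100 characters.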
⚠️ Test chunking parameters with sample data to optimize retrieval accuracy and performance.
Best Practices
- Documents: Use Recursive Text Splitter or Markdown Text Splitter with chunks of 800–1,200 characters.
- Code or FAQs: Use Simple Text Splitter with smaller chunks of about 500 characters.
- Overlap: Set to 10–20% of chunk size to retain context.
- Monitoring: Check the vector count after training. A count far above or below what the content length and chunk size imply suggests the chunking settings need adjusting.
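For the overlap and monitoring guidance, a back-of-the-envelope estimate can serve as a sanity check. The helpers below are hypothetical and assume a plain sliding window over characters; splitters that respect headings or paragraphs will produce somewhat different counts, so treat the result as a rough baseline rather than an exact prediction.

```python
import math


def suggested_overlap_range(chunk_size: int) -> tuple[int, int]:
    """Return 10% and 20% of the chunk size, per the guideline above."""
    return chunk_size // 10, chunk_size // 5


def expected_chunk_count(total_chars: int, chunk_size: int = 1000, overlap: int = 100) -> int:
    """Estimate chunk count for a sliding window (assumed model, not Indite's)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    if total_chars <= 0:
        return 0
    if total_chars <= chunk_size:
        return 1
    step = chunk_size - overlap
    # One full first chunk, then one more chunk per additional step of text.
    return 1 + math.ceil((total_chars - chunk_size) / step)


low, high = suggested_overlap_range(1000)  # (100, 200)
estimate = expected_chunk_count(5000)      # 6 with the default settings
```

If the vector count reported after training is several times larger or smaller than an estimate like this, the chunk size or overlap is worth revisiting.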
💡 Apply these best practices to ensure efficient vector creation and accurate retrieval for your AI chatbots.