
Data Sources Overview

ChatMaven supports multiple ways to add and manage your chatbot's knowledge base. This guide provides an overview of available data sources and helps you choose the right option for your needs.

Available Data Sources

File Upload

  • Best for: Documentation, knowledge bases, and structured content
  • Supported formats: PDF, DOCX, TXT, MD, HTML
  • Features:
    • Bulk upload support
    • Automatic text extraction
    • Document structure preservation
    • File organization
  • Learn more about File Upload
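Before a bulk upload, it can help to check locally which files match the supported formats listed above. The sketch below walks a folder and splits its files into uploadable and skipped lists by extension. It is a local pre-check only. It does not call any ChatMaven API, and the function name `collect_uploadable_files` is hypothetical.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above (PDF, DOCX, TXT, MD, HTML).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html"}


def collect_uploadable_files(root: str) -> tuple[list[Path], list[Path]]:
    """Split the files under `root` into (supported, skipped) lists.

    Hypothetical helper: a local check only, with no ChatMaven API calls.
    """
    supported: list[Path] = []
    skipped: list[Path] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # directories are walked by rglob, not collected
        # Compare case-insensitively so "Guide.PDF" is accepted too.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

For example, `files, skipped = collect_uploadable_files("./docs")` gives you the list to upload and a list of files to convert or leave out.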

Website Crawling

  • Best for: Public websites, documentation sites, and blogs
  • Features:
    • Automatic content discovery
    • Scheduled crawling
    • URL pattern filtering
    • Authentication support
  • Learn more about Website Crawling
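To illustrate how URL pattern filtering typically decides what gets crawled, here is a minimal sketch using include and exclude glob patterns matched against the URL path. Exclusions win over inclusions. The patterns and the `should_crawl` function are hypothetical, and ChatMaven's own pattern syntax may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical patterns for illustration only; ChatMaven's syntax may differ.
INCLUDE_PATTERNS = ("/docs/*", "/blog/*")
EXCLUDE_PATTERNS = ("/docs/archive/*", "*/print/*")


def should_crawl(
    url: str,
    include: tuple[str, ...] = INCLUDE_PATTERNS,
    exclude: tuple[str, ...] = EXCLUDE_PATTERNS,
) -> bool:
    """Return True if the URL's path matches an include pattern and no exclude pattern."""
    path = urlparse(url).path or "/"
    # Exclusions are checked first so they override broad include patterns.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)
```

Checking exclusions first lets you include a whole section such as `/docs/*` while still skipping archived or printer-friendly pages inside it.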

API Integration

  • Best for: Dynamic content, real-time data, and custom integrations
  • Features:
    • Real-time updates
    • Custom data formatting
    • Webhook support
    • Rate limiting controls
  • Learn more about API Integration
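If your integration pushes many updates, pacing requests on the client side can keep it within whatever rate limits apply. This page does not state ChatMaven's actual limits. The sketch below is a standard token bucket that allows `rate` requests per second with bursts of up to `capacity`. The class and the numbers you pass to it are assumptions for illustration.

```python
import time


class TokenBucket:
    """Client-side pacing: `rate` requests per second, bursts up to `capacity`.

    Hypothetical helper; choose rate/capacity to match your actual limits.
    """

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = time.monotonic()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of waiting."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            # Sleep roughly until the next token is due.
            time.sleep((1 - self.tokens) / self.rate)
```

Call `bucket.acquire()` before each request your integration sends. Use `try_acquire()` when you would rather queue or drop an update than wait.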

Choosing the Right Data Source

Factors to Consider

  1. Content Type

    • Static vs. dynamic content
    • Structured vs. unstructured data
    • Update frequency
  2. Technical Requirements

    • Integration complexity
    • Development resources
    • Maintenance needs
  3. Scale

    • Data volume
    • Update frequency
    • Performance requirements

Common Use Cases

Documentation Sites

  • Recommended: Website Crawling
  • Benefits:
    • Automatic updates
    • Structure preservation
    • Easy maintenance

Knowledge Base

  • Recommended: File Upload
  • Benefits:
    • Bulk processing
    • Version control
    • Organized storage

Dynamic Content

  • Recommended: API Integration
  • Benefits:
    • Real-time updates
    • Custom formatting
    • Flexible integration

Best Practices

Data Quality

  1. Content Structure

    • Use clear headings
    • Maintain consistent formatting
    • Include relevant metadata
  2. Content Quality

    • Keep information accurate
    • Update regularly
    • Remove outdated content
  3. Organization

    • Use logical categories
    • Maintain clear hierarchy
    • Tag content appropriately
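One way to act on "Include relevant metadata", "Tag content appropriately" and "Remove outdated content" is to keep a small metadata record per document and check it periodically. The sketch below flags documents that have not been updated within a given number of days, along with documents that have no tags. The `Document` fields, the helper names, and the 180-day default are hypothetical, not a ChatMaven schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Document:
    # Hypothetical record shape; field names are illustrative only.
    title: str
    category: str
    updated_at: datetime  # timezone-aware timestamp of the last content change
    tags: list[str] = field(default_factory=list)


def find_stale(
    docs: list[Document], max_age_days: int = 180, now: Optional[datetime] = None
) -> list[Document]:
    """Return documents last updated before the cutoff (candidates for review or removal)."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [d for d in docs if d.updated_at < cutoff]


def find_untagged(docs: list[Document]) -> list[Document]:
    """Return documents with no tags, which are harder to organize and filter."""
    return [d for d in docs if not d.tags]
```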

Performance Optimization

  1. File Size

    • Optimize large files
    • Split lengthy documents
    • Remove unnecessary formatting
  2. Update Frequency

    • Schedule regular updates
    • Monitor processing time
    • Balance content freshness against processing load
  3. Error Handling

    • Set up notifications
    • Monitor failed updates
    • Implement retry logic
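For "Implement retry logic", a common approach is exponential backoff with jitter. Transient failures are retried after increasing, randomized delays, and the last error is re-raised so your notifications and monitoring still see it. This is a generic sketch. `operation` stands for whatever call your integration makes, such as a hypothetical request to refresh a data source, and the defaults are placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Call `operation`, retrying failures listed in `retry_on` with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up; let monitoring/notifications see the failure
            # Exponential backoff with "full jitter": wait a random time up to the capped delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Limit `retry_on` to errors that are actually transient, such as timeouts or connection resets. That way a malformed request fails immediately instead of being retried.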

Next Steps