Groq Integration

What it does: Access ultra-fast AI inference using Groq's LPU (Language Processing Unit) technology for lightning-speed chat completions.

In simple terms: Groq is like having a supercharged AI engine. It runs the same models as other providers but at incredibly fast speeds, perfect for real-time applications.

When to Use This

Use Groq when you need:

  • ✅ Ultra-fast response times (up to 10x faster than traditional GPUs)
  • ✅ Real-time AI conversations without lag
  • ✅ Cost-effective inference at scale
  • ✅ Support for popular open-source models
  • ✅ Low-latency applications (chat, voice, live support)

Example: Build a real-time customer support chatbot that responds instantly, or create live AI-powered voice assistants.

Key Features

  • Blazing Fast: Industry-leading inference speed
  • Open Source Models: Llama, Mixtral, Gemma, and more
  • Low Latency: Perfect for real-time applications
  • Cost-Effective: Competitive pricing for fast inference
  • OpenAI-Compatible API: Easy migration from OpenAI
  • High Throughput: Handle many requests simultaneously

Setup Guide

Step 1: Get Groq API Key

  1. Go to console.groq.com and sign up
  2. Navigate to API Keys section
  3. Click "Create API Key"
  4. Give it a descriptive name
  5. Copy and save your API key securely

Step 2: Configure the Block

Connection Settings:

  1. Credentials: Select your Groq credentials from the dropdown or create new ones

    • API Key: Your Groq API key
    • The system will automatically fetch available models
  2. Model: Select the AI model you want to use

    • llama-3.3-70b-versatile: Latest Llama 3.3 model, great for general tasks
    • llama-3.1-70b-versatile: Previous generation, still very capable
    • mixtral-8x7b-32768: Excellent for complex reasoning
    • gemma-7b-it: Compact model for faster responses
  3. Messages: Configure the conversation

    • System Message: Instructions that guide the AI's behavior
    • User Messages: Add user queries and context
    • Dialogue: Use conversation history from variables
  4. Temperature: Control response randomness (0-2)

    • 0: Deterministic, focused responses
    • 1: Balanced creativity (default)
    • 2: Maximum creativity and randomness
  5. Response Mapping: Save AI responses to variables

    • Map "Message content" to workflow variables
    • Access total tokens used
    • Store response for later use
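Because Groq's API is OpenAI-compatible, the settings above map onto a standard chat-completions payload. A minimal Python sketch of the request those settings translate to (the endpoint is Groq's documented OpenAI-compatible URL; the API key, model, and message values are placeholders):

```python
import json

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(api_key, model, system_msg, user_msg, temperature=1.0):
    """Assemble the URL, headers, and JSON body for a Groq chat-completions call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "temperature": temperature,
    }
    return GROQ_URL, headers, json.dumps(body)

url, headers, payload = build_request(
    "gsk_example_key", "llama-3.3-70b-versatile",
    "You are a helpful AI assistant.", "Hello!", temperature=0.7,
)
```

Because only the base URL and key differ from OpenAI's endpoint, the same payload shape works for migrating existing OpenAI integrations.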

Available Models

Llama 3 Family

llama-3.3-70b-versatile (Recommended):

  • Latest and most capable Llama model
  • Excellent reasoning and understanding
  • Fast inference on Groq's LPU
  • Use for: General chatbots, Q&A, content generation

llama-3.1-70b-versatile:

  • Previous generation, proven reliability
  • Great balance of speed and capability
  • Use for: Most general-purpose tasks

llama-3.1-8b-instant:

  • Smaller, faster variant
  • Lower latency for real-time needs
  • Use for: Quick responses, simple queries

Mixtral

mixtral-8x7b-32768:

  • Mixture of Experts architecture
  • Excellent for complex reasoning
  • Large context window (32K tokens)
  • Use for: Technical questions, code generation, analysis

Gemma

gemma-7b-it:

  • Google's open model
  • Compact and efficient
  • Use for: Resource-conscious applications

Message Configuration

System Message

Define how the AI should behave:

You are a helpful AI assistant. Provide clear, accurate, and concise responses.
Focus on being direct and informative.

User Messages

Add the user's query:

  • Use workflow variables: {{user_question}}
  • Combine context: Answer this based on {{context}}: {{question}}
  • Multi-turn: Previous answer: {{last_response}}. Follow-up: {{new_question}}

Dialogue History

Reference conversation history:

  • Select a dialogue variable storing past messages
  • Maintains conversation context
  • Enables natural multi-turn conversations
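Before a message is sent, the {{variable}} placeholders above are resolved against the workflow's variables. A minimal sketch of how such substitution can work (the variable names and values are examples, not part of the product API):

```python
import re

def render(template, variables):
    """Replace {{name}} placeholders with values from a dict; leave unknowns as-is."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

msg = render(
    "Answer this based on {{context}}: {{question}}",
    {"context": "our pricing page", "question": "What does the Pro plan cost?"},
)
```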

Common Use Cases

Real-Time Chatbot

Ultra-responsive conversational AI:

  • Model: llama-3.3-70b-versatile
  • System Message:
    You are a friendly chatbot. Respond quickly and naturally to user messages.
    Keep responses concise but helpful.
  • User Message: {{user_input}}
  • Temperature: 0.7
  • Why Groq: Lightning-fast responses create smooth conversations

Live Customer Support

Instant support responses:

  • Model: llama-3.1-8b-instant (for maximum speed)
  • System Message:
    You are a customer support agent. Answer questions about our products
    and services. Be helpful, professional, and efficient.
    
    Product info: {{product_knowledge}}
  • User Message: {{customer_question}}
  • Temperature: 0.4
  • Why Groq: No lag between customer question and AI response

Code Assistant

Fast programming help:

  • Model: mixtral-8x7b-32768
  • System Message:
    You are a coding assistant. Provide clear code examples with brief explanations.
    Focus on best practices and working solutions.
  • User Message: {{code_question}}
  • Temperature: 0.2
  • Why Groq: Rapid code generation and explanations

Content Summarization

Quick document summaries:

  • Model: llama-3.3-70b-versatile
  • System Message:
    You are a summarization expert. Extract key points and create concise summaries.
    Maintain the main ideas while being brief.
  • User Message: Summarize this: {{document}}
  • Temperature: 0.3
  • Why Groq: Process long documents quickly

Voice Assistant Backend

Power voice-enabled AI:

  • Model: llama-3.1-8b-instant
  • System Message:
    You are a voice assistant. Provide brief, natural-sounding responses
    optimized for text-to-speech. Avoid long lists or complex formatting.
  • User Message: {{transcribed_speech}}
  • Temperature: 0.6
  • Why Groq: Minimal latency critical for voice interactions

Advanced Features

Stream Responses

Enable real-time streaming for even better UX:

  • Responses appear word-by-word as generated
  • User sees output immediately
  • Perfect for chat interfaces
  • Reduces perceived latency
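With streaming enabled, an OpenAI-compatible API sends server-sent events whose `data:` lines each carry a JSON chunk holding a small content delta. A sketch of reassembling those deltas into the full message, run against hand-written sample chunks rather than a live stream:

```python
import json

def collect_stream(sse_lines):
    """Concatenate content deltas from a sequence of SSE lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data.strip() == "[DONE]":  # sentinel marking the end of the stream
            break
        delta = json.loads(data)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

chunks = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
streamed = collect_stream(chunks)
```

In a chat UI, each delta would be appended to the display as it arrives instead of being buffered like this.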

Conversation Context

Maintain Context:

  1. Store conversation in dialogue variable
  2. Include relevant history in each request
  3. Update history after each exchange

Example:

1. User: "What's the weather in Paris?"
2. AI: "I don't have real-time weather data..."
3. User: "Then tell me about the city"
4. AI: (Knows "the city" = Paris from context)
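The history-update loop behind this exchange can be sketched as a small helper that appends each turn to the dialogue variable and trims old messages so the context stays bounded (the six-message cap is an illustrative choice, not a product setting):

```python
def update_history(history, user_msg, ai_msg, max_messages=6):
    """Append one exchange and drop the oldest messages beyond the cap."""
    history = history + [
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": ai_msg},
    ]
    return history[-max_messages:]

history = []
history = update_history(history, "What's the weather in Paris?",
                         "I don't have real-time weather data...")
history = update_history(history, "Then tell me about the city",
                         "Paris is the capital of France...")
# "the city" resolves to Paris because both turns remain in the history
# that gets sent with the next request.
```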

Response Mapping

Extract data from responses:

| Field | Description |
| --- | --- |
| Message content | The AI-generated text response |
| Total tokens | Token count for usage tracking |

Example:

  • Save response to {{ai_response}}
  • Track usage with {{token_count}}
  • Display {{ai_response}} to user
  • Log {{token_count}} for analytics
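The two mapped fields correspond to fixed paths in the OpenAI-compatible response body. A sketch of the extraction, run against a hand-written sample response:

```python
def map_response(response):
    """Pull out the text and token count the block exposes as variables."""
    return {
        "ai_response": response["choices"][0]["message"]["content"],
        "token_count": response["usage"]["total_tokens"],
    }

sample = {
    "choices": [{"message": {"role": "assistant", "content": "Hi there!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16},
}
mapped = map_response(sample)
```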

Performance Optimization

Model Selection

For Speed (< 100ms latency):

  • llama-3.1-8b-instant
  • Use when: Voice apps, real-time chat, instant feedback

For Quality (still fast, ~200ms):

  • llama-3.3-70b-versatile
  • Use when: Complex questions, detailed answers

For Reasoning (balanced):

  • mixtral-8x7b-32768
  • Use when: Technical content, code, analysis

Temperature Guidelines

| Task Type | Recommended Temperature |
| --- | --- |
| Facts, data lookup | 0 - 0.2 |
| Customer support | 0.3 - 0.5 |
| General chat | 0.6 - 0.8 |
| Creative writing | 0.9 - 1.2 |
| Brainstorming | 1.3 - 2.0 |

Message Optimization

Keep it concise:

  • Shorter prompts = faster responses
  • Be direct and specific
  • Remove unnecessary context

Before:

I was wondering if you could possibly help me understand
what the difference might be between machine learning and
deep learning, if that's okay?

After:

Explain the difference between machine learning and deep learning.

What You Get Back

Response includes:

  • Message Content: The AI's text response
  • Total Tokens: Number of tokens used
  • Latency: Processing time (typically <1 second)

Tips for Success

  1. Leverage speed - Design workflows that benefit from fast responses

    • Real-time chat
    • Live support
    • Interactive applications
  2. Choose the right model - Match model to task

    • Instant models: Speed-critical applications
    • 70B models: Quality-critical applications
    • Mixtral: Technical/complex tasks
  3. Optimize prompts - Faster responses with better prompts

    • Be concise and specific
    • Remove fluff
    • Use clear instructions
  4. Monitor costs - Track token usage

    • Map token counts to variables
    • Set up usage alerts
    • Optimize prompt length
  5. Test streaming - Better UX for long responses

    • Enable streaming where UI supports it
    • Shows response as it's generated
    • Reduces perceived wait time

Troubleshooting

| Problem | Likely Cause | Solution |
| --- | --- | --- |
| No models loading | Invalid API key | Verify your API key at console.groq.com |
| Rate limit errors | Too many requests | Implement request queuing or upgrade your plan |
| Incomplete responses | Token limit reached | Shorten the prompt or use a model with a larger context window |
| Slow responses | Network issues | Check your network; Groq responses are typically under 1 second |
| Empty response | Model overload | Retry, or switch to a different model |

Best Practices

  • Design for speed - Build experiences that showcase fast inference
  • Use appropriate models - Don't use 70B models when 8B will do
  • Implement retries - Handle occasional rate limits gracefully
  • Cache when possible - Save common queries to reduce API calls
  • Monitor latency - Track response times to ensure performance
  • Test different models - Find the best speed/quality balance
  • Keep context minimal - Only include necessary conversation history
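The retry and caching practices above can be combined in one small wrapper. This sketch uses a simulated endpoint in place of a real API call, and the backoff delays and error type are illustrative:

```python
import time

_cache = {}

def call_with_retry(call, prompt, retries=3, base_delay=0.1):
    """Serve cached answers; otherwise retry the call with exponential backoff."""
    if prompt in _cache:
        return _cache[prompt]
    for attempt in range(retries):
        try:
            result = call(prompt)
            _cache[prompt] = result
            return result
        except RuntimeError:  # stand-in for a rate-limit error
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Simulated flaky endpoint: fails once with a rate-limit error, then succeeds.
attempts = {"n": 0}
def fake_call(prompt):
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("429 Too Many Requests")
    return "answer"

first = call_with_retry(fake_call, "What is Groq?")   # retried once, then cached
second = call_with_retry(fake_call, "What is Groq?")  # served from cache
```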

Groq vs Other Providers

Why Choose Groq:

  • ⚡ Speed: Up to 10x faster than traditional GPU inference
  • 💰 Cost: Competitive pricing for performance
  • 🔓 Open Models: Access to leading open-source models
  • 🔄 Compatibility: OpenAI-compatible API for easy migration

When to Use Alternatives:

  • Need proprietary models (GPT-4, Claude)
  • Require specific model fine-tuning
  • Need very large context windows (>32K tokens)

Pricing

Groq offers competitive per-token pricing:

  • Pay only for usage
  • No minimum commitments
  • Free tier available for testing
  • Volume discounts for scale

Check console.groq.com/settings/billing for current rates.

Indite Documentation v1.4.0