Troubleshooting Guide

This comprehensive troubleshooting guide covers common issues, diagnostic procedures, and solutions for the MyNATCA platform.

General Diagnostics

Quick Health Check

# Test all core services
curl https://discord.mynatca.org/api/health
curl https://api.mynatca.org/api/health
curl https://hub.mynatca.org/api/health
 
# Check database connectivity
npm run test:connections
 
# Verify environment variables
npm run validate:config

Log Analysis

# View recent logs
tail -f logs/app.log
tail -f logs/error.log
 
# Search for specific errors
grep -i "error" logs/app.log | tail -20
grep -i "auth0" logs/app.log | tail -10
 
# Monitor real-time activity
pm2 logs mynatca-discord --lines 50

Performance Monitoring

# Check system resources
htop
free -m
df -h
 
# Monitor application performance
pm2 monit
npm run health:detailed

Discord Bot Issues

Guild Access Errors (NEW - October 2025)

Symptoms

  • /refresh command fails with "Cannot read properties of null (reading 'members')"
  • Commands fail when trying to access guild resources
  • TypeError: interaction.guild is null errors in logs
  • Commands work sometimes but fail randomly

Root Cause

Discord.js can return null for interaction.guild in certain edge cases, even when commands are executed in a guild context. This occurs intermittently and is not related to bot permissions.

Diagnostic Steps

# 1. Check logs for null guild errors
grep -i "guild is null" logs/discord-bot.log
 
# 2. Verify command is not being used in DMs
# Check if command has DM permission disabled
 
# 3. Test command in guild
# Use /refresh command in Discord server

Solutions

1. Implement Guild Access Fallback Pattern (REQUIRED)

All commands that access guild resources must use the fallback pattern:

// CORRECT: Guild access with fallback
const guild = interaction.guild || interaction.client.guilds.cache.first();
 
if (!guild) {
  return interaction.reply({
    content: 'This command can only be used in a server.',
    ephemeral: true
  });
}
 
// Now safely access guild resources
const member = await guild.members.fetch(userId);

2. Add DM Permission Protection

Prevent commands from being used in DMs:

const { SlashCommandBuilder } = require('discord.js');
 
module.exports = {
  data: new SlashCommandBuilder()
    .setName('refresh')
    .setDescription('Refresh member roles')
    .setDMPermission(false),  // Prevent DM usage
 
  async execute(interaction) {
    const guild = interaction.guild || interaction.client.guilds.cache.first();
    // ... command logic
  }
};

3. Update Existing Commands

Apply the pattern to all affected commands:

  • /refresh user - Refresh specific user roles
  • /refresh member - Refresh by member number
  • /refresh all - Refresh all verified members
  • All administrative commands

4. Redeploy Commands

After updating command definitions:

npm run deploy

Prevention

For New Commands:

  • Always use guild access fallback pattern
  • Set .setDMPermission(false) for guild-only commands
  • Validate guild exists before accessing resources
  • Test commands thoroughly in guild context

Code Review Checklist:

  • Guild access uses fallback pattern
  • DM permissions set correctly
  • Guild null check before resource access
  • Error handling for guild access failures

Bot Not Responding to Commands

Symptoms

  • Commands don't appear when typing /
  • Bot doesn't respond to executed commands
  • Commands show as "Application did not respond"

Diagnostic Steps

# 1. Check bot status
curl https://discord.mynatca.org/api/health
 
# 2. Verify bot permissions in Discord
# Check bot role hierarchy and permissions
 
# 3. Check environment variables
echo $DISCORD_TOKEN
echo $DISCORD_CLIENT_ID
echo $DISCORD_GUILD_ID
 
# 4. Test Discord API connectivity
node -e "
const { Client } = require('discord.js');
const client = new Client({ intents: [] });
client.login(process.env.DISCORD_TOKEN)
  .then(() => console.log('✅ Discord connection successful'))
  .catch(err => console.error('❌ Discord connection failed:', err));
"

Solutions

  1. Redeploy Commands

    npm run deploy
  2. Check Bot Permissions

    • Ensure bot has "Use Slash Commands" permission
    • Verify bot role is above managed roles
    • Check channel-specific permissions
  3. Restart Bot Service

    pm2 restart mynatca-discord-bot
    # or
    npm run dev
  4. Regenerate Bot Token

    • Go to Discord Developer Portal
    • Navigate to Bot section
    • Reset token and update environment variables

Registration Help Messages in #verify Channel (NEW - October 2025)

Feature Overview

The Discord bot now provides automatic help messages in the #verify channel when users make common registration mistakes.

Behavior:

  • Detects messages that aren't the /register command
  • Posts helpful @mention messages with guidance
  • Auto-deletes help messages after 30 seconds to keep channel clean
  • Catches common mistakes like typing member numbers directly

Common Detected Mistakes:

  1. Member numbers without /register - e.g., typing "123456" instead of using /register
  2. Typos and spacing errors - e.g., "register 123456" instead of /register
  3. Wrong commands in #verify - Using other commands in verification channel
  4. Random messages - Any non-command text receives catch-all reminder

Expected User Experience

Scenario 1: User types member number directly

User: 123456
Bot: @User Hey! To register, use the /register command and follow the prompts. Don't just type your member number.
[Message auto-deletes after 30 seconds]

Scenario 2: User makes typo

User: /regsiter
Bot: @User I think you meant /register - use the slash command to start verification.
[Message auto-deletes after 30 seconds]

Scenario 3: Any other message

User: How do I register?
Bot: @User Only the /register command is allowed in this channel. Type /register to begin.
[Message auto-deletes after 30 seconds]

Troubleshooting Registration Help

Help messages not appearing:

# 1. Verify bot has permissions in #verify channel
# Required permissions:
# - Read Messages
# - Send Messages
# - Mention Everyone
# - Manage Messages (for auto-delete)
 
# 2. Check messageCreate event handler is running
grep "messageCreate" logs/discord-bot.log
 
# 3. Verify channel name/ID matches configuration
# Check VERIFY_CHANNEL_ID in environment variables

Help messages not auto-deleting:

# Verify bot has "Manage Messages" permission
# Check for errors in logs
grep "delete.*message" logs/discord-bot.log

False positives (help for valid commands):

# Check message detection logic
# Should only trigger for non-slash-command messages
# Verify: msg.content.startsWith('/') check exists

Implementation Reference

The registration help system is implemented in the messageCreate event handler:

// Event handler detects:
// - Messages in #verify channel
// - Messages that aren't slash commands
// - Common registration mistakes
 
// Response pattern:
// 1. Post @mention help message
// 2. Set 30-second auto-delete timer
// 3. Clean up to keep channel tidy
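
A minimal sketch of such a handler is shown below. The event module shape and the exact message wording are assumptions based on the behavior described above, not the production implementation; VERIFY_CHANNEL_ID matches the environment variable referenced in the troubleshooting steps.

// events/messageCreate.js (illustrative sketch only)
const VERIFY_CHANNEL_ID = process.env.VERIFY_CHANNEL_ID;
const DELETE_AFTER_MS = 30 * 1000;  // auto-delete help messages after 30 seconds

module.exports = {
  name: 'messageCreate',
  async execute(message) {
    // Only act on human messages in the #verify channel
    if (message.author.bot || message.channel.id !== VERIFY_CHANNEL_ID) return;

    // Slash commands arrive as interactions, not messages; also skip any
    // typed-out text starting with '/' to avoid false positives
    if (message.content.startsWith('/')) return;

    // Pick help text based on the common mistakes listed above
    let help;
    if (/^\d{4,}$/.test(message.content.trim())) {
      help = "Hey! To register, use the /register command and follow the prompts. Don't just type your member number.";
    } else if (/reg/i.test(message.content)) {
      help = 'I think you meant /register - use the slash command to start verification.';
    } else {
      help = 'Only the /register command is allowed in this channel. Type /register to begin.';
    }

    // Post the @mention help message, then clean up after 30 seconds
    const reply = await message.channel.send(`${message.author} ${help}`);
    setTimeout(() => reply.delete().catch(() => {}), DELETE_AFTER_MS);
  }
};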

Verification Flow Issues

"Invalid Verification Link" Error

Symptoms

  • Users receive "Invalid Verification Link" message
  • Verification links expire immediately
  • Database connection issues

Diagnostic Steps

# 1. Check Supabase connection
node -e "
const { createClient } = require('@supabase/supabase-js');
const client = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_KEY);
client.from('verification_requests').select('count').then(console.log).catch(console.error);
"
 
# 2. Verify verification_requests table exists
npx supabase status
# Destructive: only run if you intend to rebuild the linked database from migrations
npx supabase db reset --linked
 
# 3. Check verification creation logs
grep "verification" logs/app.log | tail -20

Solutions

  1. Create Missing Table

    -- Run in Supabase SQL editor
    CREATE TABLE IF NOT EXISTS verification_requests (
      id SERIAL PRIMARY KEY,
      verification_id UUID UNIQUE NOT NULL,
      discord_id TEXT NOT NULL,
      discord_username TEXT NOT NULL,
      status TEXT DEFAULT 'pending',
      auth0_user_id TEXT,
      member_number TEXT,
      created_at TIMESTAMPTZ DEFAULT NOW(),
      expires_at TIMESTAMPTZ NOT NULL,
      completed_at TIMESTAMPTZ,
      updated_at TIMESTAMPTZ DEFAULT NOW()
    );
  2. Fix RLS Policies

    ALTER TABLE verification_requests ENABLE ROW LEVEL SECURITY;
     
    CREATE POLICY "Service role access" ON verification_requests
      FOR ALL USING (auth.role() = 'service_role');
  3. Clear Expired Verifications

    npm run cleanup:expired-verifications

Staff and RNAV Member Command Issues (NEW - October 2025)

Issue: /status Command Fails for Staff or RNAV Members

Symptoms:

  • /status command works for regular members (membertypeid=6)
  • /status fails or returns "not verified" for Staff (membertypeid=8)
  • /status fails or returns "not verified" for RNAV (membertypeid=10)
  • Error: "No member data found" for valid Staff/RNAV members

Root Cause: The /status command query was using .eq('membertypeid', 6) which only returned regular members, excluding Staff and RNAV members.

Solution: Update query to include all valid member types using .in():

// OLD (incorrect) - only returns regular members
const { data: memberData } = await supabase
  .from('members')
  .select('*')
  .eq('discordid', userId)
  .eq('membertypeid', 6)  // ❌ Only regular members
  .single();
 
// NEW (correct) - returns all member types
const { data: memberData } = await supabase
  .from('members')
  .select('*')
  .eq('discordid', userId)
  .in('membertypeid', [6, 8, 10])  // ✅ Regular, Staff, and RNAV
  .single();

Verification:

# Test /status command for each member type
# In Discord, as a Staff member:
/status
 
# Should show:
# ✅ Verified Member
# Member Type: NATCA Staff
# Roles: NATCA Staff
 
# As RNAV member:
/status
 
# Should show:
# ✅ Verified Member
# Member Type: RNAV Member
# Roles: RNAV Member

Issue: Staff Position Nicknames Not Showing Correctly

Symptoms:

  • Staff members don't get "(Staff)" in nickname
  • Chief of Staff not showing "(Chief of Staff)" designation
  • Staff role assigned but nickname uses facility format

Root Cause: setMemberNickname function not receiving positions parameter or not detecting Staff positions.

Solution: Ensure setMemberNickname receives positions and checks for Staff:

// Correct implementation
async setMemberNickname(member, memberData, positions) {
  const { firstname, lastname, region, facility } = memberData;  // region/facility assumed to come from the member record
 
  // Check for Chief of Staff position
  const isChiefOfStaff = positions?.some(p =>
    p.positiontype === 'staff' &&
    p.position.toLowerCase().includes('chief of staff')
  );
 
  // Check for any staff position
  const hasStaffPosition = positions?.some(p => p.positiontype === 'staff');
 
  let nickname;
 
  if (isChiefOfStaff) {
    nickname = `${firstname} ${lastname} (Chief of Staff)`;
  } else if (hasStaffPosition) {
    nickname = `${firstname} ${lastname} (Staff)`;
  } else {
    // Regular member nickname logic
    nickname = `${firstname} ${lastname} (${region}/${facility})`;
  }
 
  await member.setNickname(nickname);
}

Verification:

# Check Staff member nickname in Discord
# Should show:
# - "John Doe (Chief of Staff)" for Chief of Staff
# - "Jane Smith (Staff)" for other staff positions
 
# Check logs for nickname assignment
grep -i "staff.*nickname" logs/discord-bot.log

Testing Staff and RNAV Support

Test Suite:

# Run command tests including Staff/RNAV scenarios
npm test
 
# Look for these test cases:
# ✓ /status command works for NATCA Members (membertypeid=6)
# ✓ /status command works for Staff (membertypeid=8)
# ✓ /status command works for RNAV (membertypeid=10)
 
# Run with verbose output for details
npm run test:verbose

Manual Testing:

  1. Create test Staff member in database (membertypeid=8)
  2. Create test RNAV member in database (membertypeid=10)
  3. Test /status command with each member type
  4. Verify role assignment includes "NATCA Staff" and "RNAV Member"
  5. Check nicknames match position types
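
If test rows are needed, they can be seeded through the same Supabase client used elsewhere in this guide. A minimal sketch is below; any column beyond discordid, firstname, lastname, and membertypeid is an assumption and should be adjusted to the actual members schema.

// scripts/seed-test-members.js (illustrative sketch; adjust columns to the real schema)
const { createClient } = require('@supabase/supabase-js');
const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_KEY);

async function seedTestMembers() {
  const { error } = await supabase.from('members').insert([
    // Staff test member (membertypeid=8)
    { discordid: '111111111111111111', firstname: 'Test', lastname: 'Staff', membertypeid: 8 },
    // RNAV test member (membertypeid=10)
    { discordid: '222222222222222222', firstname: 'Test', lastname: 'RNAV', membertypeid: 10 }
  ]);
  if (error) throw error;
  console.log('✅ Test Staff and RNAV members created');
}

seedTestMembers().catch(console.error);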

Role Assignment Problems

Symptoms

  • Members verified but roles not assigned
  • Incorrect roles assigned
  • Role assignment errors in logs
  • Staff role not assigned to Staff members

Diagnostic Steps

# 1. Check bot role hierarchy
# Bot role must be above all managed roles in Discord
 
# 2. Test role assignment manually
node scripts/test-role-assignment.js
 
# 3. Check member data
curl -H "Authorization: Bearer $AUTH0_TOKEN" \
  "https://api.mynatca.org/api/members/123456"
 
# 4. Verify role mapping configuration
npm run test:role-mapping

Solutions

  1. Fix Role Hierarchy

    • Move bot role above managed roles in Discord
    • Ensure bot has "Manage Roles" permission
  2. Update Role Mapping (see the assignment sketch after this list)

    // lib/roleManager.js
    const positionRoleMap = {
      'facrep': 'FacRep',
      'comchair': 'Committee Member',
      'neb': 'NEB',
      'regional': 'Regional Rep'
    };
  3. Refresh Member Roles

    # Refresh specific member
    curl -X POST \
      -H "Authorization: Bearer $SERVICE_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"discord_id": "123456789012345678"}' \
      https://discord.mynatca.org/api/discord/roles/refresh
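
For reference, a minimal sketch of how the mapping from solution 2 could be applied with discord.js is shown below. The helper name and the choice of positiontype as the map key are assumptions; adapt them to the actual lib/roleManager.js implementation.

// Illustrative sketch: assign mapped Discord roles based on member positions
// positionRoleMap: the mapping object from solution 2 above
async function assignPositionRoles(guild, member, positions = []) {
  for (const position of positions) {
    const roleName = positionRoleMap[position.positiontype];
    if (!roleName) continue;  // no mapping for this position type

    const role = guild.roles.cache.find(r => r.name === roleName);
    if (!role) continue;      // role missing in the guild; check server role setup

    // Requires "Manage Roles" and a bot role above the target role
    if (!member.roles.cache.has(role.id)) {
      await member.roles.add(role);
    }
  }
}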

Auth0 Integration Issues

Critical: "secret" is required Error

Symptoms

  • Application fails to start with error: "secret" is required
  • Deployment succeeds but runtime errors occur
  • Auth0 authentication endpoints return 500 errors

Root Cause

The Next.js Auth0 SDK requires the AUTH0_SECRET environment variable for session encryption. This is NOT the same as AUTH0_CLIENT_SECRET.

Immediate Solution

# 1. Generate AUTH0_SECRET (32+ characters required)
AUTH0_SECRET=$(openssl rand -hex 32)
 
# 2. Add to Vercel environment variables
vercel env add AUTH0_SECRET production
 
# 3. Set the value when prompted
# Example: a1b2c3d4e5f6789012345678901234567890abcdef1234567890
 
# 4. Redeploy application
vercel --prod

Verification

# Check that AUTH0_SECRET is set
vercel env ls | grep AUTH0_SECRET
 
# Test deployment
curl -I https://discord.mynatca.org/api/auth/login
# Should return 302 redirect, not 500 error

AUTH0_CLIENT_SECRET vs AUTH0_SECRET

Important Distinction

These are two different secrets with different purposes:

Variable            | Purpose                    | Source                         | Format
AUTH0_CLIENT_SECRET | OAuth2 flow authentication | Auth0 Dashboard → App Settings | Base64/string from Auth0
AUTH0_SECRET        | Session cookie encryption  | Generated by developer         | 32+ random hex characters

Common Mistake

# ❌ WRONG - Using client secret as session secret
AUTH0_SECRET=your_auth0_client_secret_from_dashboard
 
# ✅ CORRECT - Using generated random secret
AUTH0_SECRET=a1b2c3d4e5f6789012345678901234567890abcdef1234567890

Authentication Failures

Symptoms

  • "Login Required" errors
  • JWT token validation failures
  • Redirect loops during login
  • Session not persisting after login

Diagnostic Steps

# 1. Verify all Auth0 environment variables are set
echo "AUTH0_DOMAIN: $AUTH0_DOMAIN"
echo "AUTH0_CLIENT_ID: $AUTH0_CLIENT_ID"
echo "AUTH0_CLIENT_SECRET: [hidden]"
echo "AUTH0_SECRET: [hidden]"
echo "AUTH0_BASE_URL: $AUTH0_BASE_URL"
echo "AUTH0_ISSUER_BASE_URL: $AUTH0_ISSUER_BASE_URL"
 
# 2. Test Auth0 connectivity
curl https://natca-prod.us.auth0.com/.well-known/jwks.json
 
# 3. Check Auth0 endpoints
curl -I https://discord.mynatca.org/api/auth/login
curl -I https://discord.mynatca.org/api/auth/callback
 
# 4. Validate callback URLs in Auth0 dashboard

Solutions

  1. Missing AUTH0_SECRET

    # Generate and set AUTH0_SECRET
    openssl rand -hex 32
    # Add to environment variables
  2. Update Callback URLs

    • Check Auth0 dashboard settings
    • Ensure production URLs are configured:
      • Callback URL: https://discord.mynatca.org/api/auth/callback
      • Logout URL: https://discord.mynatca.org/api/auth/logout
    • Update development URLs for local testing
  3. Refresh Auth0 Secrets

    # Generate new session secret (different for each environment)
    AUTH0_SECRET_STAGING=$(openssl rand -hex 32)
    AUTH0_SECRET_PRODUCTION=$(openssl rand -hex 32)
     
    # Update environment variables
    vercel env add AUTH0_SECRET staging
    vercel env add AUTH0_SECRET production
  4. Fix Domain Configuration

    # Ensure consistent domain configuration
    AUTH0_DOMAIN=natca-prod.us.auth0.com
    AUTH0_ISSUER_BASE_URL=https://natca-prod.us.auth0.com
    AUTH0_BASE_URL=https://discord.mynatca.org

Session Persistence Issues (NEW - October 2025)

Issue: Sessions Not Persisting After Login on Production

Symptoms:

  • User successfully authenticates with Auth0
  • Immediately logged out after redirect
  • Session cookie not being set in browser
  • Works fine on localhost but fails on production (Digital Ocean, nginx, etc.)

Root Cause: The platform is deployed behind a reverse proxy without trust proxy configuration. Express doesn't recognize HTTPS from the X-Forwarded-Proto header and therefore fails to set secure cookies.

Diagnostic Steps:

# 1. Check if running behind reverse proxy
# Look for X-Forwarded-* headers in request
curl -I https://platform.natca.org/api/health | grep -i x-forwarded
 
# 2. Test cookie being set
curl -c cookies.txt -v https://platform.natca.org/api/auth/login 2>&1 | grep -i set-cookie
 
# Expected: Set-Cookie: platform.session=...; Secure; HttpOnly
# If missing 'Secure' flag, trust proxy not configured
 
# 3. Check Express trust proxy setting
# In server.js logs, look for req.protocol and req.secure values

Solution: Enable Trust Proxy (REQUIRED for Production)

Add this line to server.js BEFORE session middleware:

// server.js
const express = require('express');
const session = require('express-session');
const app = express();
 
// CRITICAL: Trust first proxy (Digital Ocean, nginx, etc.)
app.set('trust proxy', 1);
 
// Now Express recognizes X-Forwarded-Proto: https
app.use(session({
  cookie: {
    secure: process.env.NODE_ENV === 'production'  // Works correctly now
  }
}));

Why This Works:

  1. Reverse proxy adds X-Forwarded-Proto: https header
  2. Express (without trust proxy) ignores it, sees HTTP
  3. Secure cookie requires HTTPS, fails to set
  4. With trust proxy enabled, Express recognizes HTTPS
  5. Secure cookie sets correctly, session persists

Environment-Specific Configuration:

// Digital Ocean App Platform
app.set('trust proxy', 1);
 
// nginx reverse proxy
app.set('trust proxy', 1);
 
// Multiple proxies (nginx → load balancer → app)
app.set('trust proxy', 2);  // Trust first 2 proxies
 
// Trust specific proxy IP
app.set('trust proxy', '127.0.0.1');

Issue: Redis Connection Failures

Symptoms:

  • Intermittent session failures
  • "Redis connection timeout" errors
  • Sessions lost randomly

Solution:

// Robust Redis configuration with reconnection
const redis = require('redis');
 
const redisClient = redis.createClient({
  url: process.env.REDIS_URL,
  socket: {
    connectTimeout: 10000,
    commandTimeout: 5000,
    reconnectStrategy: (retries) => {
      if (retries > 10) {
        logger.error('Redis max retries exceeded');
        return new Error('Max retries reached');
      }
      // Exponential backoff: 100ms, 200ms, 400ms, etc.
      return Math.min(retries * 100, 3000);
    }
  }
});
 
// Error handling
redisClient.on('error', (err) => {
  logger.error('Redis client error', err);
});
 
redisClient.on('connect', () => {
  logger.info('Redis client connected');
});

Session Issues (General)

Symptoms

  • Users logged out immediately after login
  • Session cookies not persisting
  • "Invalid session" errors

Diagnostic Steps

# 1. Check session configuration
echo "SESSION_SECRET: [hidden]"
echo "REDIS_URL: $REDIS_URL"
echo "NODE_ENV: $NODE_ENV"
 
# 2. Test session creation
curl -c cookies.txt https://platform.natca.org/api/auth/login
curl -b cookies.txt https://platform.natca.org/api/auth/session
 
# 3. Check cookie security settings
# Inspect browser dev tools → Application → Cookies
# Verify: Secure, HttpOnly, SameSite flags present
 
# 4. Test Redis connection
redis-cli -u $REDIS_URL ping

Solutions

  1. Enable Trust Proxy (Most Common Issue)

    // Add to server.js BEFORE session middleware
    app.set('trust proxy', 1);
  2. Ensure Unique SESSION_SECRET per Environment

    # Different secrets prevent cookie conflicts
    openssl rand -base64 32  # Generate new secret
    # Use different secrets for staging and production
  3. Configure Session Settings

    app.use(session({
      store: new RedisStore({ client: redisClient }),
      secret: process.env.SESSION_SECRET,
      name: 'platform.session',
      resave: false,
      saveUninitialized: false,
      rolling: true,  // Refresh TTL on each request
      cookie: {
        maxAge: 7 * 24 * 60 * 60 * 1000,  // 7 days
        httpOnly: true,
        secure: process.env.NODE_ENV === 'production',
        sameSite: 'lax'
      }
    }));
  4. Check Cookie Domain for Cross-Subdomain Sessions

    cookie: {
      domain: '.natca.org',  // Note the leading dot
      path: '/'
    }

Management API Issues

Symptoms

  • "Insufficient scope" errors
  • User creation/update failures
  • Metadata sync issues

Diagnostic Steps

# 1. Test Management API token
curl -X GET \
  -H "Authorization: Bearer $M2M_TOKEN" \
  "https://natca-prod.us.auth0.com/api/v2/users?per_page=1"
 
# 2. Check granted scopes
node -e "
const ManagementClient = require('auth0').ManagementClient;
const client = new ManagementClient({
  domain: process.env.AUTH0_DOMAIN,
  clientId: process.env.AUTH0_M2M_CLIENT_ID,
  clientSecret: process.env.AUTH0_M2M_CLIENT_SECRET
});
client.getUsers({ per_page: 1 }).then(console.log).catch(console.error);
"

Solutions

  1. Update M2M Application Scopes (see the metadata sketch after this list)

    • Go to Auth0 Dashboard > APIs > Management API
    • Select your M2M application
    • Grant required scopes:
      • read:users
      • update:users
      • create:users
      • update:user_metadata
      • update:user_app_metadata
  2. Regenerate M2M Credentials

    • Create new M2M application if needed
    • Update environment variables with new credentials
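
Once the metadata scopes are granted, the sync can be exercised directly with the same auth0 SDK style used in the diagnostics above. A minimal sketch follows; the user ID and metadata fields are placeholders.

// Illustrative sketch: update a user's app_metadata via the Management API
const { ManagementClient } = require('auth0');

const management = new ManagementClient({
  domain: process.env.AUTH0_DOMAIN,
  clientId: process.env.AUTH0_M2M_CLIENT_ID,
  clientSecret: process.env.AUTH0_M2M_CLIENT_SECRET
});

// Requires the update:users and update:user_app_metadata scopes
management.updateUser({ id: 'auth0|example-user-id' }, { app_metadata: { member_number: '123456' } })
  .then(() => console.log('✅ app_metadata updated'))
  .catch(console.error);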

Database Issues

Supabase Connection Problems

Symptoms

  • Database timeout errors
  • Connection refused errors
  • SSL certificate issues

Diagnostic Steps

# 1. Test Supabase connectivity
curl -H "apikey: $SUPABASE_KEY" \
  "$SUPABASE_URL/rest/v1/members?select=count"
 
# 2. Check Supabase project status
npx supabase status --linked
 
# 3. Verify connection string
node -e "
const { createClient } = require('@supabase/supabase-js');
const client = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_KEY);
client.from('members').select('count').then(console.log).catch(console.error);
"

Solutions

  1. Update Supabase URL/Key

    • Check Supabase dashboard for correct values
    • Ensure using service key for server-side operations
  2. Fix RLS Policies

    -- Enable RLS on tables
    ALTER TABLE members ENABLE ROW LEVEL SECURITY;
     
    -- Create service role policy
    CREATE POLICY "Service role access" ON members
      FOR ALL USING (auth.role() = 'service_role');
  3. Reset Database Connection

    # Restart application
    pm2 restart all
     
    # Clear connection pool
    npm run db:reset-connections

MySQL Sync Issues

Symptoms

  • Sync failures or timeouts
  • "Connection lost" errors
  • Data inconsistencies

Diagnostic Steps

# 1. Test MySQL connectivity
node -e "
const mysql = require('mysql2/promise');
mysql.createConnection({
  host: process.env.MYSQL_HOST,
  user: process.env.MYSQL_USER,
  password: process.env.MYSQL_PASS,
  database: process.env.MYSQL_DB
}).then(conn => {
  console.log('✅ MySQL connected');
  return conn.end();
}).catch(console.error);
"
 
# 2. Check sync status
npm run sync health
 
# 3. Run sync validation
npm run sync validate
 
# 4. Check data counts
npm run sync:compare-counts

Solutions

  1. Fix Connection Settings (see the pool sketch after this list)

    # Update MySQL configuration
    MYSQL_TIMEOUT=60000
    MYSQL_POOL_SIZE=10
    MYSQL_SSL=true
  2. Optimize Sync Parameters

    # Reduce batch size for slow connections
    npm run sync sync-all --batch-size=500
     
    # Increase retry count
    npm run sync sync-all --retries=5
  3. Manual Sync Recovery

    # Clear stuck sync status
    npm run sync:clear-status
     
    # Restart sync process
    npm run sync sync-all --force
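
For reference, a minimal sketch of how the connection settings from solution 1 might be wired into a mysql2 pool. Variable names match the diagnostics above; the option values are examples, not the platform's actual configuration.

// Illustrative sketch: mysql2 pool using the environment settings above
const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: process.env.MYSQL_HOST,
  user: process.env.MYSQL_USER,
  password: process.env.MYSQL_PASS,
  database: process.env.MYSQL_DB,
  connectionLimit: Number(process.env.MYSQL_POOL_SIZE || 10),
  connectTimeout: Number(process.env.MYSQL_TIMEOUT || 60000),
  ssl: process.env.MYSQL_SSL === 'true' ? { rejectUnauthorized: true } : undefined
});

module.exports = pool;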

Data Synchronization Issues (Updated October 2025)

Sync Process Failures

Symptoms

  • Sync gets stuck in "running" state
  • High failure rates
  • Data inconsistencies between systems
  • Production sync writing to dev schema instead of public schema

Diagnostic Steps

# 1. Check sync health
npm run sync health --json
 
# 2. Review sync logs - verify target schema
grep -i "syncing to" logs/sync.log | tail -50
 
# 3. Check database locks
npm run db:check-locks
 
# 4. Verify data integrity
npm run sync verify-data
 
# 5. Verify target schema in Supabase
psql -c "SELECT schemaname, tablename, n_live_tup FROM pg_stat_user_tables WHERE schemaname IN ('public', 'dev');"

Solutions

  1. Production Sync Targeting Wrong Schema

    Symptom: Running node sync/sync-all.js positions --env=prod but data appears in dev.positions instead of public.positions

    Root Cause: Sync script not properly detecting or respecting --env=prod flag

    Solution:

    # 1. Verify script has environment detection
    # Check sync-positions.js for:
    # const env = process.argv.includes('--env=prod') ? 'prod' : 'dev';
    # const targetSchema = env === 'prod' ? 'public' : 'dev';
     
    # 2. Look for schema confirmation in output
    node sync/sync-all.js positions --env=prod
    # Should display: "🎯 Syncing to public schema"
     
    # 3. If missing, update script to match pattern from sync-teams.js
    # See platform/sync/sync-teams.js for reference implementation
  2. Missing Production Schema Columns

    Symptom: Sync fails with "column does not exist" errors

    Common Missing Columns:

    • public.positions.enddate - End date for positions
    • Unique constraint on (membernumber, positiontype)

    Solution:

    # Run required migrations first
    cd platform
    psql -f migrations/add_positions_enddate.sql
    psql -f migrations/add_positions_unique_constraint.sql
     
    # Verify migrations applied
    psql -c "\d public.positions"
     
    # Then retry sync
    node sync/sync-all.js positions --env=prod
  3. Dependency Sync Failures

    Symptom: Foreign key constraint violations during sync

    Cause: Syncing dependent tables before base tables

    Solution:

    # Wrong: Skip deps on first sync
    node sync/sync-all.js positions --skip-deps --env=prod  # May fail
     
    # Correct: Full sync respects dependencies
    node sync/sync-all.js --env=prod
     
    # Or sync dependencies first
    node sync/sync-all.js members --env=prod
    node sync/sync-all.js positions --env=prod
  4. Using --skip-deps Flag Correctly

    When to use --skip-deps:

    • Re-syncing after initial full sync completed
    • Testing individual sync scripts
    • Quick updates when dependencies haven't changed

    When NOT to use --skip-deps:

    • First-time environment setup
    • After schema migrations affecting multiple tables
    • When foreign key relationships changed

    Example:

    # Initial setup - do NOT use --skip-deps
    node sync/sync-all.js --env=prod
     
    # Later, quick position re-sync - OK to use --skip-deps
    node sync/sync-all.js positions --skip-deps --env=prod
  5. Clear Stuck Sync

    # Reset sync metadata
    npm run sync:reset-metadata
     
    # Force restart sync
    npm run sync sync-all --force
  6. Fix Data Inconsistencies

    # Compare record counts between schemas
    psql -c "SELECT 'public.members' as table, COUNT(*) FROM public.members UNION ALL SELECT 'dev.members', COUNT(*) FROM dev.members;"
     
    # Resync specific table
    node sync/sync-all.js members --env=prod
  7. Optimize Sync Performance

    # Adjust batch sizes in sync scripts
    SYNC_BATCH_SIZE=500 npm run sync sync-all
     
    # Use --skip-deps for faster individual table syncs
    node sync/sync-all.js positions --skip-deps --env=prod

New Sync Commands (October 2025)

Teams Sync

Sync committees and councils data:

# Development
node sync/sync-all.js teams
 
# Production
node sync/sync-all.js teams --env=prod

Individual Table Sync with Environment Flag (Enhanced October 2025)

All sync commands now support --env=prod flag correctly:

# Positions sync (fixed October 2025 to respect --env=prod)
node sync/sync-all.js positions --env=prod
# Output should show: "🎯 Syncing to public schema"
 
# Members sync
node sync/sync-all.js members --env=prod
 
# Teams sync (committees and councils)
node sync/sync-all.js teams --env=prod
 
# Facilities sync
node sync/sync-all.js facilities --env=prod
 
# Regions sync
node sync/sync-all.js regions --env=prod
 
# Fast re-sync with --skip-deps flag (New October 2025)
node sync/sync-all.js positions --skip-deps --env=prod
# Skips dependency syncs, only updates positions table

Key Improvement (October 2025): The positions sync script was fixed to properly respect the --env=prod flag. Previously, it would always sync to the dev schema regardless of the environment flag. Now it correctly syncs to the public schema when --env=prod is specified.

Verification: Look for this log message to confirm correct schema targeting:

🎯 Syncing to public schema

If you see "🎯 Syncing to dev schema" when using --env=prod, the script needs to be updated.
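
For scripts that still lack this behavior, the expected pattern (quoted in the sync troubleshooting steps above) comes down to a few lines. A minimal sketch, assuming supabase-js is used for the writes:

// Illustrative sketch: environment detection and schema targeting in a sync script
const env = process.argv.includes('--env=prod') ? 'prod' : 'dev';
const targetSchema = env === 'prod' ? 'public' : 'dev';

console.log(`🎯 Syncing to ${targetSchema} schema`);

// All subsequent writes should target the selected schema, e.g.:
// supabase.schema(targetSchema).from('positions').upsert(rows);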

Verify Sync Results

# Check record counts in production
psql -c "SELECT COUNT(*) FROM public.members;"
psql -c "SELECT COUNT(*) FROM public.positions;"
psql -c "SELECT COUNT(*) FROM public.teams;"
 
# Compare dev vs production counts
psql -c "
SELECT
  'members' as table,
  (SELECT COUNT(*) FROM public.members) as prod,
  (SELECT COUNT(*) FROM dev.members) as dev
UNION ALL
SELECT
  'positions',
  (SELECT COUNT(*) FROM public.positions),
  (SELECT COUNT(*) FROM dev.positions);
"

Network and Connectivity Issues

External Service Timeouts

Symptoms

  • Timeout errors when connecting to Auth0/Discord/Supabase
  • Intermittent connection failures
  • SSL/TLS handshake failures

Diagnostic Steps

# 1. Test network connectivity
ping discord.com
ping auth0.com
ping supabase.co
 
# 2. Check DNS resolution
nslookup discord.mynatca.org
nslookup natca-prod.us.auth0.com
 
# 3. Test SSL connectivity
openssl s_client -connect discord.com:443 -servername discord.com
 
# 4. Check firewall/proxy settings
curl -v https://discord.com/api/v10/gateway

Solutions

  1. Increase Timeout Values

    HTTP_TIMEOUT=30000
    AUTH0_TIMEOUT=30000
    DISCORD_TIMEOUT=30000
  2. Configure Retry Logic

    // lib/http-client.js
    const retryConfig = {
      retries: 3,
      retryDelay: 1000,
      retryCondition: (error) => {
        return error.code === 'ECONNRESET' ||
               error.code === 'ETIMEDOUT';
      }
    };
  3. Check Proxy Configuration

    HTTP_PROXY=http://proxy.company.com:8080
    HTTPS_PROXY=http://proxy.company.com:8080
    NO_PROXY=localhost,127.0.0.1,.internal

Performance Optimization

Memory Issues

Symptoms

  • Out of memory errors
  • Gradual memory increase
  • Application crashes

Diagnostic Steps

# 1. Monitor memory usage
node --inspect app.js
# Open chrome://inspect in Chrome
 
# 2. Check for memory leaks
npm run test:memory-leak
 
# 3. Capture a heap snapshot for analysis
node --heapsnapshot-signal=SIGUSR2 app.js
# Then: kill -USR2 <pid> writes a .heapsnapshot file (open it in Chrome DevTools)

Solutions

  1. Optimize Batch Processing

    // Process smaller batches
    const BATCH_SIZE = 500; // Reduce from 1000
     
    // Clear references after processing
    batch = null;
    if (global.gc) global.gc();
  2. Implement Connection Pooling

    // Configure database connection pools
    const poolConfig = {
      max: 10,
      min: 2,
      acquireTimeoutMillis: 30000,
      idleTimeoutMillis: 600000
    };

CPU Optimization

Symptoms

  • High CPU usage
  • Slow response times
  • Request timeouts

Solutions

  1. Implement Caching (see the cache-aside sketch after this list)

    // Cache member data
    const NodeCache = require('node-cache');
    const memberCache = new NodeCache({ stdTTL: 300 }); // 5 minutes
  2. Use Worker Threads

    // Offload heavy processing to a worker thread
    const { Worker, isMainThread, parentPort } = require('worker_threads');

    if (isMainThread) {
      const worker = new Worker(__filename);
      worker.on('message', (result) => console.log('Batch processed:', result));
      worker.postMessage(batch);  // batch: the data to process off the main thread
    } else {
      // Worker side: receive the batch, do the heavy work, send the result back
      parentPort.on('message', (batch) => {
        parentPort.postMessage(processBatch(batch));  // processBatch: your heavy function
      });
    }
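
A short cache-aside sketch showing how the memberCache from solution 1 might wrap a Supabase lookup. The helper name and the membernumber column follow earlier examples in this guide; treat this as an illustration, not the platform's actual code.

// Illustrative sketch: cache-aside lookup using memberCache and supabase from earlier snippets
async function getMember(memberNumber) {
  const cached = memberCache.get(memberNumber);
  if (cached) return cached;  // cache hit; skip the database

  const { data, error } = await supabase
    .from('members')
    .select('*')
    .eq('membernumber', memberNumber)
    .single();
  if (error) throw error;

  memberCache.set(memberNumber, data);  // cached for stdTTL (5 minutes)
  return data;
}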

Security Issues

Token Validation Failures

Symptoms

  • "Invalid token" errors
  • "Token expired" messages
  • Authentication bypasses

Solutions

  1. Implement Proper Token Validation

    // Verify JWT tokens properly against the Auth0 JWKS
    const jwt = require('jsonwebtoken');
    const jwksClient = require('jwks-rsa');

    const client = jwksClient({
      jwksUri: `https://${process.env.AUTH0_DOMAIN}/.well-known/jwks.json`
    });

    // Resolve the signing key referenced by the token's kid header
    function getKey(header, callback) {
      client.getSigningKey(header.kid, (err, key) => callback(err, key && key.getPublicKey()));
    }

    jwt.verify(token, getKey, { algorithms: ['RS256'] }, (err, decoded) => { /* ... */ });  // token: bearer token from the request
  2. Secure Environment Variables

    # Use strong secrets
    AUTH0_SECRET=$(openssl rand -base64 32)
    SESSION_SECRET=$(openssl rand -base64 32)
     
    # Rotate secrets regularly
    npm run rotate:secrets

Recovery Procedures

Disaster Recovery

Data Recovery

# 1. Restore from backup
npm run restore:database --date=2023-01-01
 
# 2. Verify data integrity
npm run verify:data-integrity
 
# 3. Restart all services
pm2 restart all
 
# 4. Run health checks
npm run health:full-check

Service Recovery

# 1. Check service status
pm2 status
 
# 2. Restart failed services
pm2 restart mynatca-discord-bot
pm2 restart mynatca-discord-web
 
# 3. Verify functionality
curl https://discord.mynatca.org/api/health
npm run test:integration

Rollback Procedures

Application Rollback

# 1. Identify previous working deployment
vercel ls
 
# 2. Promote previous deployment
vercel promote <deployment-url>
 
# 3. Update DNS if needed
# 4. Verify rollback success

Database Rollback

# 1. Stop sync processes
pm2 stop sync-scheduler
 
# 2. Restore database
npm run db:restore --backup=backup_2023_01_01
 
# 3. Restart services
pm2 restart all

This comprehensive troubleshooting guide provides systematic approaches to identifying and resolving issues across the MyNATCA platform.