import { TemplateType } from '../commands/generate';
 
/**
 * Template structure
 */
export interface Template {
  name: string;
  description: string;
  files: Record<string, string>;
}
 
/**
 * Basic scraper template
 */
const basicTemplate: Template = {
  name: 'Basic Scraper',
  description: 'Simple page scraping template',
  files: {
    'src/index.ts': `import { CrawleeScraperEngine, ScraperDefinition } from 'crawlee-scraper-toolkit';
import { createLogger } from 'crawlee-scraper-toolkit/utils';
 
// Scraper definition
export const definition: ScraperDefinition<string, any> = {
  id: '{{name}}', // This will be replaced by the actual scraper name during generation
  name: '{{name}}', // This will be replaced
  description: '{{description}}',
  url: '{{url}}',
  navigation: {{navigation}},
  waitStrategy: {{waitStrategy}},
  requiresCaptcha: false,
  parse: async (context) => {
    const { page, input } = context;
    
    // Extract data from the page
    const title = await page.textContent('h1') || '';
    const description = await page.textContent('p') || '';
    
    // Custom extraction logic based on output fields
    const result: any = {
      title,
      description,
      url: page.url(),
      timestamp: new Date().toISOString(),
    };
 
    // Add configured output fields
    const outputFields = {{outputFields}};
    for (const field of outputFields) {
      try {
        let value: string | null = null;
        
        switch (field.type) {
          case 'text':
            value = await page.textContent(field.selector);
            break;
          case 'html':
            value = await page.innerHTML(field.selector);
            break;
          case 'attribute':
            value = await page.getAttribute(field.selector, field.attribute || 'href');
            break;
        }
        
        result[field.name] = value;
      } catch (error) {
        console.warn(\`Failed to extract field \${field.name}: \${error}\`);
        result[field.name] = null;
      }
    }
    
    return result;
  },
  options: {{options}},
};
 
// Main execution
async function main() {
  const logger = createLogger({ level: 'info', format: 'text', console: true });
  const engine = new CrawleeScraperEngine({
    browserPool: {
      maxSize: 3,
      maxAge: 30 * 60 * 1000,
      launchOptions: { headless: true, args: ['--no-sandbox'] },
      cleanupInterval: 5 * 60 * 1000,
    },
    defaultOptions: {
      retries: 3,
      retryDelay: 1000,
      timeout: 30000,
      useProxyRotation: false,
      headers: {},
      javascript: true,
      loadImages: false,
      viewport: { width: 1920, height: 1080 },
    },
    plugins: [],
    globalHooks: {},
    logging: { level: 'info', format: 'text' },
  }, logger);
 
  // Register the scraper
  // For standalone execution, the definition is already available.
  // If this were part of a larger system where definitions are dynamically loaded,
  // engine.register(definition) would be used by that system.
 
  // Example usage
  const input = process.argv[2] || 'default-input';
  
  try {
    // When running directly, execute the local 'definition'
    const result = await engine.execute(definition, input);
    
    if (result.success) {
      console.log('Scraping successful!');
      console.log(JSON.stringify(result.data, null, 2));
    } else {
      console.error('Scraping failed:', result.error?.message);
    }
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await engine.shutdown();
  }
}
 
if (require.main === module) {
  main().catch(console.error);
}
 
 
 
// Note: The primary export 'definition' is used when this scraper is run by 'crawlee-scraper run'.
// The main() function is for direct execution or testing of this specific scraper.
`,
    'README.md': `# {{name}}
 
{{description}}
 
## Usage
 
\`\`\`bash
# Install dependencies
npm install
 
# Run the scraper
npm run dev [input]
 
# Build for production
npm run build
npm start [input]
\`\`\`
 
## Configuration
 
The scraper is configured to:
- Target URL: {{url}}
- Navigation: {{navigation}}
- Wait Strategy: {{waitStrategy}}
 
## Output Fields
 
The scraper extracts the following fields:
{{#each outputFields}}
- **{{name}}**: {{selector}} ({{type}})
{{/each}}
 
## Customization
 
Edit \`src/index.ts\` to modify the scraper behavior:
- Update the \`parse\` function to extract different data (see the sketch below)
- Modify the navigation or wait strategies
- Add custom hooks or plugins
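
For example, a minimal sketch of a customized \`parse\` function that collects article links (the \`.article a\` selector is a placeholder; swap in one that matches your target page):

\`\`\`typescript
parse: async (context) => {
  const { page } = context;

  // Collect the text and href of every matching link on the page.
  const articles = await page.$$eval('.article a', elements =>
    elements.map(el => ({
      title: el.textContent?.trim() || '',
      href: (el as HTMLAnchorElement).href,
    }))
  );

  return {
    url: page.url(),
    timestamp: new Date().toISOString(),
    articles,
  };
},
\`\`\`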
`,
  },
};
 
/**
 * API scraper template
 */
const apiTemplate: Template = {
  name: 'API Scraper',
  description: 'Template for extracting data from API responses',
  files: {
    'src/index.ts': `import { CrawleeScraperEngine, ScraperDefinition } from 'crawlee-scraper-toolkit';
import { createLogger } from 'crawlee-scraper-toolkit/utils';
 
// Scraper definition for API-based extraction
export const definition: ScraperDefinition<string, any> = {
  id: '{{name}}', // This will be replaced
  name: '{{name}}', // This will be replaced
  description: '{{description}}',
  url: '{{url}}',
  navigation: {{navigation}},
  waitStrategy: {{waitStrategy}},
  requiresCaptcha: false,
  parse: async (context) => {
    const { page, input } = context;
    
    // Wait for the API response
    const response = await page.waitForResponse(
      res => res.url().includes('{{waitStrategy.config.urlPattern}}'),
      { timeout: 30000 }
    );
    
    // Parse the JSON response
    const data = await response.json();
    
    // Process the API data
    const result = {
      input,
      url: page.url(),
      timestamp: new Date().toISOString(),
      apiData: data,
      // Add custom processing here
      processedData: processApiData(data),
    };
    
    return result;
  },
  options: {{options}},
};
 
// Helper function to process API data
function processApiData(data: any): any {
  // Implement your custom data processing logic here
  // This is where you would extract specific fields from the API response
  
  if (Array.isArray(data)) {
    return data.map(item => ({
      id: item.id,
      title: item.title || item.name,
      description: item.description,
      // Add more fields as needed
    }));
  }
  
  return {
    id: data.id,
    title: data.title || data.name,
    description: data.description,
    // Add more fields as needed
  };
}
 
// Main execution
async function main() {
  const logger = createLogger({ level: 'info', format: 'text', console: true });
  const engine = new CrawleeScraperEngine({
    browserPool: {
      maxSize: 3,
      maxAge: 30 * 60 * 1000,
      launchOptions: { headless: true, args: ['--no-sandbox'] },
      cleanupInterval: 5 * 60 * 1000,
    },
    defaultOptions: {
      retries: 3,
      retryDelay: 1000,
      timeout: 30000,
      useProxyRotation: false,
      headers: {},
      javascript: true,
      loadImages: false,
      viewport: { width: 1920, height: 1080 },
    },
    plugins: [],
    globalHooks: {},
    logging: { level: 'info', format: 'text' },
  }, logger);
 
  // Register the scraper
  // engine.register(definition); // Not needed for standalone main() with local definition
 
  // Example usage
  const input = process.argv[2] || 'default-query';
  
  try {
    const result = await engine.execute(definition, input);
    
    if (result.success) {
      console.log('API scraping successful!');
      console.log(JSON.stringify(result.data, null, 2));
    } else {
      console.error('API scraping failed:', result.error?.message);
    }
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await engine.shutdown();
  }
}
 
if (require.main === module) {
  main().catch(console.error);
}
 
`,
    'README.md': `# {{name}} - API Scraper
 
{{description}}
 
This scraper extracts data from API responses by intercepting network requests.
 
## Usage
 
\`\`\`bash
# Install dependencies
npm install
 
# Run the scraper
npm run dev [query]
 
# Build for production
npm run build
npm start [query]
\`\`\`
 
## How it works
 
1. Navigates to: {{url}}
2. Waits for API response matching: {{waitStrategy.config.urlPattern}}
3. Extracts and processes the JSON data
4. Returns structured results
 
## Customization
 
Edit the \`processApiData\` function in \`src/index.ts\` to customize how the API response is processed and what fields are extracted.
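
For instance, a minimal sketch that keeps only in-stock items and normalizes prices (the \`items\`, \`inStock\`, and \`priceCents\` fields are placeholders for whatever your API actually returns):

\`\`\`typescript
function processApiData(data: any): any {
  // Many APIs wrap the payload, e.g. { items: [...] } - unwrap it first.
  const items = Array.isArray(data) ? data : data.items ?? [];

  return items
    .filter((item: any) => item.inStock)
    .map((item: any) => ({
      id: item.id,
      title: item.title || item.name,
      price: item.priceCents != null ? item.priceCents / 100 : null,
    }));
}
\`\`\`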
`,
  },
};
 
/**
 * Form scraper template
 */
const formTemplate: Template = {
  name: 'Form Scraper',
  description: 'Template for form-based scraping',
  files: {
    'src/index.ts': `import { CrawleeScraperEngine, ScraperDefinition } from 'crawlee-scraper-toolkit';
import { createLogger } from 'crawlee-scraper-toolkit/utils';
 
// Scraper definition for form-based extraction
export const definition: ScraperDefinition<string, any> = {
  id: '{{name}}', // Will be replaced
  name: '{{name}}', // Will be replaced
  description: '{{description}}',
  url: '{{url}}',
  navigation: {{navigation}},
  waitStrategy: {{waitStrategy}},
  requiresCaptcha: false,
  parse: async (context) => {
    const { page, input } = context;
    
    // Fill the form
    await page.fill('{{navigation.config.inputSelector}}', input);
    
    // Submit the form
    await page.click('{{navigation.config.submitSelector}}');
    
    // Wait for results to load
    await page.waitForSelector('{{waitStrategy.config.selector}}', { timeout: 30000 });
    
    // Extract results
    const results = await page.$$eval('{{waitStrategy.config.selector}} .result-item', elements => {
      return elements.map(el => ({
        title: el.querySelector('h3')?.textContent?.trim() || '',
        description: el.querySelector('p')?.textContent?.trim() || '',
        link: el.querySelector('a')?.href || '',
      }));
    });
    
    // Custom extraction logic based on output fields
    const customFields: any = {};
    const outputFields = {{outputFields}};
    
    for (const field of outputFields) {
      try {
        let value: string | null = null;
        
        switch (field.type) {
          case 'text':
            value = await page.textContent(field.selector);
            break;
          case 'html':
            value = await page.innerHTML(field.selector);
            break;
          case 'attribute':
            value = await page.getAttribute(field.selector, field.attribute || 'href');
            break;
        }
        
        customFields[field.name] = value;
      } catch (error) {
        console.warn(\`Failed to extract field \${field.name}: \${error}\`);
        customFields[field.name] = null;
      }
    }
    
    return {
      query: input,
      url: page.url(),
      timestamp: new Date().toISOString(),
      results,
      customFields,
      totalResults: results.length,
    };
  },
  options: {{options}},
};
 
// Main execution
async function main() {
  const logger = createLogger({ level: 'info', format: 'text', console: true });
  const engine = new CrawleeScraperEngine({
    browserPool: {
      maxSize: 3,
      maxAge: 30 * 60 * 1000,
      launchOptions: { headless: true, args: ['--no-sandbox'] },
      cleanupInterval: 5 * 60 * 1000,
    },
    defaultOptions: {
      retries: 3,
      retryDelay: 1000,
      timeout: 30000,
      useProxyRotation: false,
      headers: {},
      javascript: true,
      loadImages: false,
      viewport: { width: 1920, height: 1080 },
    },
    plugins: [],
    globalHooks: {},
    logging: { level: 'info', format: 'text' },
  }, logger);
 
  // Register the scraper
  // engine.register(definition); // Not needed for standalone main()
 
  // Example usage
  const input = process.argv[2] || 'default-search';
  
  try {
    const result = await engine.execute(definition, input);
    
    if (result.success) {
      console.log('Form scraping successful!');
      console.log(\`Found \${result.data.totalResults} results\`);
      console.log(JSON.stringify(result.data, null, 2));
    } else {
      console.error('Form scraping failed:', result.error?.message);
    }
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await engine.shutdown();
  }
}
 
if (require.main === module) {
  main().catch(console.error);
}
 
`,
    'README.md': `# {{name}} - Form Scraper
 
{{description}}
 
This scraper fills and submits forms to extract search results or other data.
 
## Usage
 
\`\`\`bash
# Install dependencies
npm install
 
# Run the scraper
npm run dev [search-term]
 
# Build for production
npm run build
npm start [search-term]
\`\`\`
 
## Form Configuration
 
- Input field: {{navigation.config.inputSelector}}
- Submit button: {{navigation.config.submitSelector}}
- Results container: {{waitStrategy.config.selector}}
 
## Customization
 
Edit \`src/index.ts\` to:
- Modify form selectors
- Change result extraction logic
- Add additional form fields
- Handle pagination
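
A minimal pagination sketch for the end of the \`parse\` function, after the initial results are extracted (the \`.next-page\` selector and the 5-page cap are placeholders):

\`\`\`typescript
// Follow "next" links and accumulate results from a few extra pages.
for (let pageIndex = 0; pageIndex < 5; pageIndex++) {
  const nextButton = await page.$('.next-page');
  if (!nextButton) break;

  await nextButton.click();
  await page.waitForSelector('{{waitStrategy.config.selector}}', { timeout: 30000 });

  const moreResults = await page.$$eval('{{waitStrategy.config.selector}} .result-item', elements =>
    elements.map(el => ({
      title: el.querySelector('h3')?.textContent?.trim() || '',
      link: el.querySelector('a')?.href || '',
    }))
  );
  results.push(...moreResults);
}
\`\`\`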
`,
  },
};
 
/**
 * Advanced scraper template
 */
const advancedTemplate: Template = {
  name: 'Advanced Scraper',
  description: 'Full-featured template with custom navigation and hooks',
  files: {
    'src/index.ts': `import { CrawleeScraperEngine, ScraperDefinition } from 'crawlee-scraper-toolkit';
import { createLogger } from 'crawlee-scraper-toolkit/utils';
import { RetryPlugin, CachePlugin, MetricsPlugin } from 'crawlee-scraper-toolkit/plugins';
 
// Advanced scraper definition with custom hooks and plugins
export const definition: ScraperDefinition<string, any> = {
  id: '{{name}}', // Will be replaced
  name: '{{name}}', // Will be replaced
  description: '{{description}}',
  url: '{{url}}',
  navigation: {{navigation}},
  waitStrategy: {{waitStrategy}},
  requiresCaptcha: false,
  
  // Custom hooks
  hooks: {
    beforeRequest: [
      async (context) => {
        console.log(\`Starting scrape for: \${context.input}\`);
        // Add custom headers, cookies, etc.
        await context.page.setExtraHTTPHeaders({
          'X-Custom-Header': 'crawlee-scraper-toolkit',
        });
      },
    ],
    afterRequest: [
      async (context) => {
        console.log(\`Completed scrape for: \${context.input}\`);
        // Log metrics, save screenshots, etc.
        if (process.env.SAVE_SCREENSHOTS === 'true') {
          await context.page.screenshot({
            path: \`screenshots/\${context.input}-\${Date.now()}.png\`,
          });
        }
      },
    ],
    onError: [
      async (context) => {
        console.error(\`Scrape failed for: \${context.input}\`, context.error);
        // Custom error handling
      },
    ],
    onRetry: [
      async (context) => {
        console.log(\`Retrying scrape for: \${context.input} (attempt \${context.attempt})\`);
        // Clear cookies, change user agent, etc.
      },
    ],
  },
  
  parse: async (context) => {
    const { page, input } = context;
    
    // Custom navigation logic
    if (context.navigation.type === 'custom') {
      await customNavigationLogic(page, input);
    }
    
    // Wait for content to load
    await page.waitForLoadState('networkidle');
    
    // Extract data with error handling
    const result: any = {
      input,
      url: page.url(),
      timestamp: new Date().toISOString(),
      metadata: {
        userAgent: await page.evaluate(() => navigator.userAgent),
        viewport: await page.viewportSize(),
      },
    };
    
    // Extract configured output fields
    const outputFields = {{outputFields}};
    for (const field of outputFields) {
      try {
        result[field.name] = await extractField(page, field);
      } catch (error) {
        console.warn(\`Failed to extract field \${field.name}: \${error}\`);
        result[field.name] = null;
      }
    }
    
    // Extract additional structured data
    result.structuredData = await extractStructuredData(page);
    
    return result;
  },
  
  // Custom validation
  validateInput: (input: string) => {
    if (!input || input.trim().length === 0) {
      return 'Input cannot be empty';
    }
    if (input.length > 100) {
      return 'Input too long (max 100 characters)';
    }
    return true;
  },
  
  validateOutput: (output: any) => {
    if (!output || typeof output !== 'object') {
      return 'Output must be an object';
    }
    if (!output.timestamp) {
      return 'Output must include timestamp';
    }
    return true;
  },
  
  options: {{options}},
};
 
// Custom navigation logic
async function customNavigationLogic(page: any, input: string): Promise<void> {
  // Implement custom navigation steps
  console.log('Executing custom navigation...');
  
  // Example: Handle complex multi-step navigation
  await page.goto('{{url}}');
  await page.waitForSelector('.search-form');
  await page.fill('.search-input', input);
  await page.click('.search-button');
  await page.waitForSelector('.results');
}
 
// Field extraction helper
async function extractField(page: any, field: any): Promise<string | null> {
  switch (field.type) {
    case 'text':
      return await page.textContent(field.selector);
    case 'html':
      return await page.innerHTML(field.selector);
    case 'attribute':
      return await page.getAttribute(field.selector, field.attribute || 'href');
    default:
      throw new Error(\`Unknown field type: \${field.type}\`);
  }
}
 
// Structured data extraction
async function extractStructuredData(page: any): Promise<any> {
  // Extract JSON-LD structured data
  const jsonLd = await page.$$eval('script[type="application/ld+json"]', scripts => {
    return scripts.map(script => {
      try {
        return JSON.parse(script.textContent || '');
      } catch {
        return null;
      }
    }).filter(Boolean);
  });
  
  // Extract Open Graph data
  const openGraph = await page.$$eval('meta[property^="og:"]', metas => {
    const og: any = {};
    metas.forEach(meta => {
      const property = meta.getAttribute('property');
      const content = meta.getAttribute('content');
      if (property && content) {
        og[property.replace('og:', '')] = content;
      }
    });
    return og;
  });
  
  return { jsonLd, openGraph };
}
 
// Main execution with plugins
async function main() {
  const logger = createLogger({ level: 'info', format: 'text', console: true });
  
  const engine = new CrawleeScraperEngine({
    browserPool: {
      maxSize: 5,
      maxAge: 30 * 60 * 1000,
      launchOptions: { 
        headless: process.env.HEADLESS !== 'false',
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
      },
      cleanupInterval: 5 * 60 * 1000,
    },
    defaultOptions: {
      retries: 5,
      retryDelay: 2000,
      timeout: 60000,
      useProxyRotation: false,
      headers: {
        'Accept-Language': 'en-US,en;q=0.9',
      },
      javascript: true,
      loadImages: false,
      viewport: { width: 1920, height: 1080 },
    },
    plugins: [],
    globalHooks: {},
    logging: { level: 'info', format: 'text' },
  }, logger);
 
  // Install plugins
  engine.use(new RetryPlugin({ maxBackoffDelay: 30000 }));
  engine.use(new CachePlugin({ defaultTtl: 10 * 60 * 1000 }));
  engine.use(new MetricsPlugin());
 
  // Register the scraper
  // engine.register(definition); // Not needed for standalone main()
 
  // Example usage
  const input = process.argv[2] || 'default-input';
  
  try {
    console.log('Starting advanced scraper...'); // This console.log should ideally use the logger too
    const result = await engine.execute(definition, input);
    
    if (result.success) {
      console.log('Advanced scraping successful!');
      console.log(JSON.stringify(result.data, null, 2));
      
      // Show metrics
      const metricsPlugin = engine.getPlugin('metrics') as MetricsPlugin;
      if (metricsPlugin) {
        console.log('\\nMetrics:', metricsPlugin.getMetrics());
      }
    } else {
      console.error('Advanced scraping failed:', result.error?.message);
    }
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await engine.shutdown();
  }
}
 
if (require.main === module) {
  main().catch(console.error);
}
 
`,
    'README.md': `# {{name}} - Advanced Scraper
 
{{description}}
 
This is a full-featured scraper with custom hooks, plugins, and advanced error handling.
 
## Features
 
- ✅ Custom navigation logic
- ✅ Retry with exponential backoff
- ✅ Result caching
- ✅ Metrics collection
- ✅ Screenshot capture
- ✅ Structured data extraction
- ✅ Input/output validation
- ✅ Custom hooks
 
## Usage
 
\`\`\`bash
# Install dependencies
npm install
 
# Run with default settings
npm run dev [input]
 
# Run with screenshots enabled
SAVE_SCREENSHOTS=true npm run dev [input]
 
# Run in headed mode (show browser)
HEADLESS=false npm run dev [input]
\`\`\`
 
## Configuration
 
Environment variables:
- \`SAVE_SCREENSHOTS\`: Set to 'true' to save screenshots
- \`HEADLESS\`: Set to 'false' to run in headed mode
 
## Customization
 
This template includes examples of:
- Custom navigation logic
- Field extraction helpers
- Structured data extraction
- Plugin usage
- Hook implementation
 
Edit \`src/index.ts\` to customize for your specific use case.
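
As one example, here is a minimal sketch of an extra \`beforeRequest\` hook that blocks image requests to speed up page loads (it assumes \`context.page\` is a Playwright \`Page\`, which is how the template's other hooks already treat it):

\`\`\`typescript
hooks: {
  beforeRequest: [
    async (context) => {
      // Abort image requests; everything else continues normally.
      await context.page.route('**/*.{png,jpg,jpeg,gif,webp,svg}', route => route.abort());
    },
  ],
  // ...keep the other hooks from the template as they are...
},
\`\`\`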
`,
    'src/types.ts': `// Custom types for the scraper
export interface ScrapedData {
  input: string;
  url: string;
  timestamp: string;
  metadata: {
    userAgent: string;
    viewport: { width: number; height: number } | null;
  };
  structuredData: {
    jsonLd: any[];
    openGraph: Record<string, string>;
  };
}
 
export interface FieldConfig {
  name: string;
  selector: string;
  type: 'text' | 'html' | 'attribute';
  attribute?: string;
}
`,
  },
};
 
/**
 * Infinite scroll scraper template
 */
const infiniteScrollTemplate: Template = {
  name: 'Infinite Scroll Scraper',
  description: 'Handles pages with infinite scrolling pagination.',
  files: {
    'src/index.ts': `import { Actor } from 'apify';
import { PlaywrightCrawler, log } from 'crawlee';
 
interface Input {
    startUrls: string[];
    maxScrolls: number;
    scrollDelayMs: number;
}
 
interface Output {
    url: string;
    title: string | null;
    scrapedItemCount: number;
    // Add other fields you want to scrape
}
 
async function main() {
    await Actor.init();
 
    const {
        startUrls = ['https://example.com/scrollable-page'], // Replace with your target URL
        maxScrolls = 10,
        scrollDelayMs = 1000,
    } = await Actor.getInput<Input>() ?? {} as Input;
 
    const crawler = new PlaywrightCrawler({
        requestHandlerTimeoutSecs: 120,
        async requestHandler({ request, page, enqueueLinks, log: crawlerLog }) {
            crawlerLog.info(\`Processing \${request.url}...\`);
            const title = await page.title();
            let scrapedItemCount = 0;
 
            let currentScroll = 0;
            let lastHeight = await page.evaluate(() => document.body.scrollHeight);
 
            while (currentScroll < maxScrolls) {
                crawlerLog.info(\`Scrolling... (Attempt \${currentScroll + 1}/\${maxScrolls})\`);
                await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
                await page.waitForTimeout(scrollDelayMs); // Wait for content to load
 
                const newHeight = await page.evaluate(() => document.body.scrollHeight);
                if (newHeight === lastHeight) {
                    crawlerLog.info('No new content loaded after scroll. Stopping.');
                    break;
                }
                lastHeight = newHeight;
 
                // Add your scraping logic here for newly loaded items
                // Example: count newly loaded items
                // const newItems = await page.locator('.newly-loaded-item').count();
                // scrapedItemCount += newItems;
                // crawlerLog.info(\`Found \${newItems} new items.\`);
 
                currentScroll++;
            }
 
            // Scrape initial items or items present after scrolling
            // Example:
            // const items = await page.locator('.item-selector').all();
            // for (const item of items) {
            //     // process item
            //     scrapedItemCount++;
            // }
 
            crawlerLog.info(\`Finished scrolling for \${request.url}. Total items (example): \${scrapedItemCount}\`);
 
            await Actor.pushData({ url: request.url, title, scrapedItemCount });
 
            // Example: Add logic to enqueue next pages if any, or detail pages
            // await enqueueLinks({
            //     selector: '.next-page-selector',
            // });
        },
        // Add other PlaywrightCrawler options as needed (e.g., proxyConfiguration)
    });
 
    await crawler.run(startUrls);
 
    await Actor.exit();
}
 
main().catch((err) => {
    log.error('Actor failed:', err);
    process.exit(1);
});
`,
    'README.md': `# Infinite Scroll Scraper Template
 
This template is designed for scraping web pages that use infinite scrolling to load content. Instead of traditional pagination links, new content is loaded dynamically as the user scrolls down the page.
 
## How it Works
 
1.  **Initial Load**: The scraper navigates to the start URL.
2.  **Scrolling Loop**:
    *   It scrolls to the bottom of the page using \`window.scrollTo(0, document.body.scrollHeight)\`.
    *   It waits for a specified delay (\`scrollDelayMs\`) to allow new content to load.
    *   It checks if new content has actually loaded by comparing the page height before and after the scroll. If the height hasn't changed, it assumes no new content is available and stops scrolling.
    *   This process repeats up to a maximum number of scrolls (\`maxScrolls\`) to prevent indefinite loops.
3.  **Data Extraction**: You should add your data extraction logic *within* the scrolling loop if you want to process items as they load, or *after* the loop if you want to process all items once scrolling is complete. The current example includes placeholders for this (see the sketch after this list).
4.  **Output**: Scraped data is pushed to the Apify dataset.
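
A minimal sketch of the extraction step inside the scroll loop (the \`.item-card\` selector is a placeholder for whatever marks an item on your target page):

\`\`\`typescript
// Inside the while loop, after waiting for new content to load:
const itemTexts = await page.locator('.item-card').allTextContents();
scrapedItemCount = itemTexts.length; // total items currently rendered
crawlerLog.info('Items rendered so far: ' + itemTexts.length);
\`\`\`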
 
## Configuration
 
The scraper accepts the following input parameters (defined in \`INPUT_SCHEMA.json\` if you use Apify Console, or passed via API):
 
*   \`startUrls\`: An array of URLs to start scraping from.
*   \`maxScrolls\`: The maximum number of times the scraper will scroll down the page. (Default: 10)
*   \`scrollDelayMs\`: The time (in milliseconds) to wait after each scroll for new content to load. (Default: 1000)
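
For example, an input equivalent to the following (values are illustrative) scrolls up to 20 times with a 1.5 second pause between scrolls:

\`\`\`typescript
const exampleInput: Input = {
    startUrls: ['https://example.com/scrollable-page'],
    maxScrolls: 20,
    scrollDelayMs: 1500,
};
\`\`\`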
 
## Customization
 
*   **Target URL**: Change \`startUrls\` in your input to the page you want to scrape.
*   **Scraping Logic**: Modify the \`requestHandler\` function:
    *   Implement the logic to identify and extract data from newly loaded items within or after the scroll loop. Update selectors like \`.newly-loaded-item\` or \`.item-selector\`.
    *   If necessary, adjust how \`scrapedItemCount\` is calculated.
*   **Scroll Parameters**: Adjust \`maxScrolls\` and \`scrollDelayMs\` based on the target website's behavior. Some sites may require longer delays or more or fewer scrolls.
*   **Stopping Condition**: Enhance the condition for stopping the scroll loop if the height check is not reliable for your target site (e.g., look for a "no more items" element).
*   **Enqueueing Links**: If items link to detail pages, add \`enqueueLinks\` calls.
 
Remember to install dependencies (\`pnpm install\`) and build the Actor (\`pnpm build\`) before running.
`,
  },
};
 
/**
 * JS-heavy site scraper template
 */
const jsHeavySiteTemplate: Template = {
  name: 'JS-Heavy Site Scraper',
  description: 'Scrapes sites with heavy JavaScript, focusing on advanced waitFor strategies.',
  files: {
    'src/index.ts': `import { Actor } from 'apify';
import { PlaywrightCrawler, log } from 'crawlee';
 
interface Input {
    startUrls: string[];
    waitForSelectorTimeoutMs: number;
}
 
interface Output {
    url: string;
    title: string | null;
    dataFromDynamicContent?: string;
    eventTriggered?: boolean;
    // Add other fields you want to scrape
}
 
async function main() {
    await Actor.init();
 
    const {
        startUrls = ['https://example.com/js-heavy-page'], // Replace with your target URL
        waitForSelectorTimeoutMs = 5000,
    } = await Actor.getInput<Input>() ?? {} as Input;
 
    const crawler = new PlaywrightCrawler({
        requestHandlerTimeoutSecs: 180, // Increased timeout for potentially slow JS sites
        navigationTimeoutSecs: 120,
        async requestHandler({ request, page, enqueueLinks, log: crawlerLog }) {
            crawlerLog.info(\`Processing \${request.url}...\`);
            const title = await page.title();
            const output: Output = { url: request.url, title };
 
            // Example 1: Wait for a specific element that appears after JS execution
            try {
                const dynamicElement = await page.waitForSelector('#dynamically-loaded-content', { timeout: waitForSelectorTimeoutMs });
                if (dynamicElement) {
                    output.dataFromDynamicContent = await dynamicElement.textContent() ?? "N/A";
                    crawlerLog.info('Successfully captured data from dynamically loaded content.');
                }
            } catch (e) {
                crawlerLog.warning(\`Could not find #dynamically-loaded-content within \${waitForSelectorTimeoutMs}ms.\`);
            }
 
            // Example 2: Wait for a specific function to return true
            // This is useful if a global variable is set or a condition is met after JS processing
            try {
                await page.waitForFunction(() => (window as any).myAppReady === true, { timeout: 10000 });
                crawlerLog.info('Condition window.myAppReady === true was met.');
                // You can now safely interact with elements that depend on this condition
            } catch (e) {
                crawlerLog.warning('waitForFunction condition was not met in time.');
            }
 
            // Example 3: Wait for a specific network request to finish
            // Useful if data is fetched via AJAX and you need to wait for that data
            try {
                const [response] = await Promise.all([
                    page.waitForResponse(resp => resp.url().includes('/api/data') && resp.status() === 200, { timeout: 15000 }),
                    // Add the action that triggers the network request if necessary, e.g., clicking a button
                    // page.click('#load-data-button'),
                ]);
                const responseBody = await response.json();
                crawlerLog.info('Received response from /api/data:', responseBody);
                // Process responseBody
            } catch (e) {
                crawlerLog.warning('Did not receive expected network response from /api/data in time.');
            }
 
            // Example 4: Wait for a custom DOM event dispatched on the page
            // This requires the page to use \`window.dispatchEvent(new CustomEvent('myCustomEvent'))\` or similar.
            // Note: page.waitForEvent() only covers Playwright page-level events, so listen inside the page context instead.
            try {
                await page.evaluate(() => new Promise<void>((resolve, reject) => {
                    const timer = setTimeout(() => reject(new Error('Timed out waiting for myCustomEvent')), 10000);
                    window.addEventListener('myCustomEvent', () => {
                        clearTimeout(timer);
                        resolve();
                    }, { once: true });
                }));
                crawlerLog.info('myCustomEvent was dispatched on the page.');
                output.eventTriggered = true;
            } catch (e) {
                crawlerLog.warning('myCustomEvent was not dispatched in time.');
            }
 
            // Add your main scraping logic here, assuming JS has loaded
            // const mainContent = await page.locator('#main-content-area').textContent();
            // output.mainContent = mainContent;
 
 
            await Actor.pushData(output);
 
            // Example: Add logic to enqueue next pages if any
            // await enqueueLinks({
            //     selector: '.next-page-selector',
            // });
        },
        // Consider using preNavigationHooks or postNavigationHooks for complex interactions
        // headless: 'new', // Try different headless modes if issues arise
    });
 
    await crawler.run(startUrls);
 
    await Actor.exit();
}
 
main().catch((err) => {
    log.error('Actor failed:', err);
    process.exit(1);
});
`,
    'README.md': `# JS-Heavy Site Scraper Template
 
This template is designed for scraping websites that heavily rely on JavaScript to render content, fetch data, or handle user interactions. It demonstrates various \`waitFor\` strategies provided by Playwright to ensure elements are available and actions are completed before proceeding with scraping.
 
## Key Features & Strategies
 
*   \`page.waitForSelector(selector, options)\`: Waits for an element matching the selector to appear in the DOM. Useful for content that loads dynamically.
    *   Example: Waiting for \`#dynamically-loaded-content\`.
*   \`page.waitForFunction(fn, arg, options)\`: Waits until the provided function, executed in the page context, returns a truthy value.
    *   Example: Waiting for a global flag like \`window.myAppReady === true\`.
*   \`page.waitForResponse(urlOrPredicate, options)\`: Waits for a network response that matches a URL or a predicate function.
    *   Example: Waiting for an API call like \`/api/data\` to complete.
*   \`page.evaluate()\` with an in-page event listener: Waits for a custom DOM event dispatched on the page. (Playwright's \`page.waitForEvent()\` only covers page-level events such as \`response\`, so the listener runs inside the page context.)
    *   Example: Waiting for a custom event \`myCustomEvent\` that might signify JS initialization is complete.
*   **Increased Timeouts**: Default timeouts for \`requestHandlerTimeoutSecs\` and \`navigationTimeoutSecs\` are increased as JS-heavy sites can be slower to load and process.
 
## How it Works
 
1.  **Navigation**: The scraper navigates to the start URL.
2.  **Waiting Strategies**: Before attempting to extract data, the \`requestHandler\` employs one or more \`waitFor\` methods to ensure the page is in the desired state.
    *   The examples in \`src/index.ts\` show how to use these methods. Uncomment or adapt them as needed.
3.  **Data Extraction**: Once the necessary conditions are met (e.g., elements are visible, API calls have returned), your data extraction logic can run.
4.  **Output**: Scraped data is pushed to the Apify dataset.
 
## Configuration
 
The scraper accepts the following input parameters:
 
*   \`startUrls\`: An array of URLs to start scraping from.
*   \`waitForSelectorTimeoutMs\`: Timeout in milliseconds for \`page.waitForSelector\`. (Default: 5000)
 
## Customization
 
*   **Target URL**: Change \`startUrls\` in your input.
*   **Waiting Logic**: This is the most crucial part to customize.
    *   Analyze your target website's behavior using browser developer tools (Network tab, Console, Elements panel).
    *   Identify which elements, network calls, or JavaScript events signal that the content you need is ready.
    *   Choose the appropriate \`waitFor\` methods and configure their selectors, predicates, and timeouts.
    *   You might need to chain multiple \`waitFor\` calls or use them in combination with actions like clicks (\`page.click()\`); see the sketch after this list.
*   **Scraping Logic**: Implement your data extraction logic after the \`waitFor\` conditions are met.
*   **Error Handling**: The examples include basic \`try...catch\` blocks for \`waitFor\` methods. Enhance this as needed.
*   **Headless Mode**: Experiment with \`headless: false\` or \`headless: 'new'\` in \`PlaywrightCrawler\` options if you encounter issues specific to headless browsing. Some JS-heavy sites behave differently in headless mode.
*   **Proxy Configuration**: Essential for sites that might block based on IP. Configure \`proxyConfiguration\` in \`PlaywrightCrawler\`.
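
For the point about combining waits with actions, a minimal sketch (the \`#load-data-button\` selector and the \`/api/data\` URL fragment are placeholders):

\`\`\`typescript
// Start waiting for the response before clicking, so the response is not missed.
const [response] = await Promise.all([
    page.waitForResponse(resp => resp.url().includes('/api/data') && resp.status() === 200),
    page.click('#load-data-button'),
]);
const payload = await response.json();
\`\`\`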
 
Remember to install dependencies (\`pnpm install\`) and build the Actor (\`pnpm build\`) before running.
`,
  },
};
 
/**
 * Template registry
 */
const templates: Record<TemplateType, Template> = {
  basic: basicTemplate,
  api: apiTemplate,
  form: formTemplate,
  advanced: advancedTemplate,
  'infinite-scroll': infiniteScrollTemplate,
  'js-heavy': jsHeavySiteTemplate,
};
 
/**
 * Get template by type
 */
export function getTemplate(type: TemplateType): Template {
  const template = templates[type];
  if (!template) {
    throw new Error(`Template not found: ${type}`);
  }
  return template;
}
 
/**
 * Get all available templates
 */
export function getAvailableTemplates(): Array<{ type: TemplateType; template: Template }> {
  return Object.entries(templates).map(([type, template]) => ({
    type: type as TemplateType,
    template,
  }));
}
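
/*
 * Usage sketch (illustrative only): the generate command is expected to pick a
 * template and substitute the {{placeholder}} tokens before writing each file.
 * The regex replacement below is an assumption about that step, not the actual
 * implementation.
 *
 *   const { files } = getTemplate('basic');
 *   for (const [filePath, content] of Object.entries(files)) {
 *     const rendered = content.replace(/{{name}}/g, 'my-scraper');
 *     // ...substitute the remaining placeholders and write `rendered` to filePath...
 *   }
 */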