Alro Steel SmartGrid Scraper — Remaining Steps
Status: Script is READY TO RUN
The scraper at scripts/AlroCatalog/scrape_alro.py is complete and tested. Discovery mode confirmed it works correctly against the live site.
What's Done
- Script written with correct ASP.NET control IDs (discovered via --discover mode)
- Level 1 (main grid) navigation: working
- Level 2 (popup grid) navigation: working
- Level 3 (dims panel) scraping: working; uses cascading dropdowns ddlDimA→ddlDimB→ddlDimC→ddlLength
- Grade filter: 11 common grades (A-36, 1018, 1045, 1144, 12L14, etc.)
- Size string normalization: "1-1/2"" matches O'Neal format
- Progress save/resume: working
- Discovery mode verified: A-36 Round bars → 27 sizes, 80 items (lengths include "20 FT", "Custom Cut List", "Drop/Remnant"; non-stock entries are filtered out in the catalog builder)
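The size strings above mix whole numbers and fractions (for example 1-1/2"). As a rough illustration of what normalizing them can involve, the sketch below parses such a string into decimal inches. The function name parse_inches is hypothetical and is not taken from scrape_alro.py; the script's actual normalization may work differently.

```python
from fractions import Fraction


def parse_inches(size: str) -> float:
    """Parse strings like '1-1/2"', '3/4"' or '2"' into decimal inches.

    Hypothetical helper for illustration; not part of scrape_alro.py.
    """
    # Drop surrounding whitespace and the trailing inch mark.
    text = size.strip().rstrip('"').strip()
    # Accept a hyphen or a space between the whole part and the fraction.
    parts = text.replace("-", " ").split()
    if not parts:
        raise ValueError(f"empty size string: {size!r}")
    # Fraction() parses both '1' and '1/2', so the parts can simply be summed.
    return float(sum(Fraction(part) for part in parts))
```

A decimal value like this can help when comparing sizes between the Alro and O'Neal catalogs even if the two sources spell the same fraction differently.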
Remaining Steps
Step 1: Run the full scrape
cd C:\Users\aisaacs\Desktop\Projects\CutList
python scripts/AlroCatalog/scrape_alro.py
- This scrapes all 3 categories (Bars, Pipe/Tube, Structural) for 11 filtered grades
- Takes ~30-60 minutes (cascading dropdown selections with 1.5s delay each)
- Progress saved incrementally to scripts/AlroCatalog/alro-scrape-progress.json
- If interrupted, resume with: python scripts/AlroCatalog/scrape_alro.py --resume
- To scrape ALL grades: python scripts/AlroCatalog/scrape_alro.py --all-grades
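For readers unfamiliar with the save/resume pattern, here is a minimal sketch of how incremental progress and resuming can work. It is not the code in scrape_alro.py: the helper names, the scrape_one callback, and the "done" key are assumptions made for illustration. Only the "items" key is known from the progress file used in Step 3.

```python
import json
import os
import tempfile


def save_progress(path, state):
    """Write state atomically so an interrupted run cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_path, path)  # atomic rename over the previous file


def load_progress(path):
    """Return saved state, or an empty state if no progress file exists yet."""
    if not os.path.exists(path):
        return {"items": [], "done": []}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def scrape_all(keys, scrape_one, path, resume=False):
    """Scrape each key once, saving after every key; skip finished keys on resume."""
    state = load_progress(path) if resume else {"items": [], "done": []}
    done = set(state["done"])
    for key in keys:
        if key in done:
            continue  # already scraped in an earlier run
        state["items"].extend(scrape_one(key))
        state["done"].append(key)
        save_progress(path, state)
    return state
```

Saving after every key trades a little extra disk I/O for the guarantee that an interruption loses at most the key in flight.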
Step 2: Review output
- Output: CutList.Web/Data/SeedData/alro-catalog.json
- Verify material counts, shapes, sizes
- Spot-check dimensions against myalro.com
- Compare shape coverage to O'Neal catalog
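The review checks above can be partly automated. The sketch below makes no claim about the catalog schema: it reports the length of every top-level list and counts the values of one field wherever it appears in the JSON. The field name "shape" is an assumption; pass whichever key the catalog actually uses.

```python
from collections import Counter


def summarize(catalog, group_key="shape"):
    """Return (top-level list sizes, counts of group_key values found anywhere).

    group_key="shape" is an assumed field name, not a confirmed part of the schema.
    """
    sizes = {}
    if isinstance(catalog, dict):
        sizes = {k: len(v) for k, v in catalog.items() if isinstance(v, list)}

    counts = Counter()
    stack = [catalog]
    while stack:  # iterative walk over nested dicts and lists
        node = stack.pop()
        if isinstance(node, dict):
            value = node.get(group_key)
            if isinstance(value, str):
                counts[value] += 1
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return sizes, counts
```

To use it, load CutList.Web/Data/SeedData/alro-catalog.json with json.load and pass the result to summarize; running the same call on oneals-catalog.json gives a quick side-by-side view of shape coverage.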
Step 3: Post-scrape adjustments (if needed)
Dimension mapping for Structural/Pipe shapes: The build_size_and_dims() function handles all shapes, but the Structural (Angle, Channel, Beam) and Pipe/Tube shapes haven't been tested live yet. After scraping, check the screenshots in scripts/AlroCatalog/screenshots/ to verify the dimension mapping. The first item of each new shape gets a screenshot and an HTML dump.
Known dimension mapping assumptions:
- Angle: DimA = leg size, DimB = thickness → "leg1 x leg2 x thickness" (assumes equal legs)
- Channel: DimA = height, DimB = flange → needs verification
- IBeam: DimA = depth, DimB = weight/ft → "W{depth} x {wt}"
- SquareTube: DimA = size, DimB = wall
- RectTube: DimA = width, DimB = height, DimC = wall
- RoundTube: DimA = OD, DimB = wall
- Pipe: DimA = NPS, DimB = schedule
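The assumptions listed above can be written down as a lookup table, which makes them easy to correct once the screenshots have been checked. This is a sketch only and is not the body of build_size_and_dims(): the names DIM_NAMES, name_dims, and size_label are hypothetical, and the fallback label for shapes without a stated format is an assumption.

```python
# Meaning of DimA, DimB, DimC per shape, as listed in the assumptions above.
DIM_NAMES = {
    "Angle": ("leg", "thickness"),
    "Channel": ("height", "flange"),  # still needs verification
    "IBeam": ("depth", "weight_per_ft"),
    "SquareTube": ("size", "wall"),
    "RectTube": ("width", "height", "wall"),
    "RoundTube": ("od", "wall"),
    "Pipe": ("nps", "schedule"),
}


def name_dims(shape, *values):
    """Attach the assumed meaning to each raw dropdown value."""
    names = DIM_NAMES[shape]
    if len(values) != len(names):
        raise ValueError(f"{shape} expects {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


def size_label(shape, *values):
    """Build a size string; only Angle and IBeam formats are given in the notes."""
    dims = name_dims(shape, *values)
    if shape == "Angle":
        # Equal legs assumed, so the single leg value is repeated.
        return f'{dims["leg"]} x {dims["leg"]} x {dims["thickness"]}'
    if shape == "IBeam":
        return f'W{dims["depth"]} x {dims["weight_per_ft"]}'
    # Assumed fallback: join the raw values in dropdown order.
    return " x ".join(values)
```

For example, size_label("Angle", "2", "1/4") gives "2 x 2 x 1/4". Keeping the mapping in one table means a wrong assumption for a shape is a one-line fix.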
If dimension mapping is wrong for a shape: Edit the build_size_and_dims() function in scrape_alro.py and re-run just the catalog builder:
python -c "
import json
from scripts.AlroCatalog.scrape_alro import build_catalog
data = json.load(open('scripts/AlroCatalog/alro-scrape-progress.json'))
catalog = build_catalog(data['items'])
json.dump(catalog, open('CutList.Web/Data/SeedData/alro-catalog.json', 'w'), indent=2)
"
Step 4: Part numbers (optional future enhancement)
The current scraper captures sizes and lengths but NOT part numbers. To get part numbers, the script would need to:
- Select DimA + DimB + Length
- Click the "Next >" button (btnSearch)
- Capture part number from the results panel
- Click Back
This adds significant time per item. The catalog works without part numbers — the supplierOfferings have empty partNumber/supplierDescription fields.
Key Files
| File | Purpose |
|---|---|
| scripts/AlroCatalog/scrape_alro.py | The scraper script |
| scripts/AlroCatalog/alro-scrape-progress.json | Incremental progress (resume support) |
| scripts/AlroCatalog/screenshots/ | Discovery HTML/screenshots per shape |
| CutList.Web/Data/SeedData/alro-catalog.json | Final output (same schema as oneals-catalog.json) |
| CutList.Web/Data/SeedData/oneals-catalog.json | Reference format |
Grade Filter (editable in script)
Located at line ~50 in scrape_alro.py. Current filter:
- A-36, 1018 CF, 1018 HR, 1044 HR, 1045 CF, 1045 HR, 1045 TG&P
- 1144 CF, 1144 HR, 12L14 CF, A311/Stressproof
To add/remove grades, edit the GRADE_FILTER set in the script.
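As a rough picture of what editing the filter involves, the sketch below defines a set with the eleven grades listed above and a helper that applies it. The set name GRADE_FILTER comes from the script; its exact contents in the script, the select_grades helper, and the exact-match comparison are assumptions made for illustration.

```python
# The eleven grades listed above. The real set in scrape_alro.py may be spelled differently.
GRADE_FILTER = {
    "A-36", "1018 CF", "1018 HR", "1044 HR", "1045 CF", "1045 HR",
    "1045 TG&P", "1144 CF", "1144 HR", "12L14 CF", "A311/Stressproof",
}


def select_grades(available, all_grades=False):
    """Keep only filtered grades, unless all_grades (as with --all-grades) is set.

    Hypothetical helper; assumes grade names match the filter exactly.
    """
    if all_grades:
        return list(available)
    return [grade for grade in available if grade.strip() in GRADE_FILTER]
```

With this shape, adding a grade is a one-string change to the set, and the --all-grades flag simply bypasses the membership test.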