Back to projects

Automated Pipeline Shopify (Scraping → IA → Matrixify)

Automated Pipeline Shopify (Scraping → IA → Matrixify)
Simon RochwergSimon Rochwerg

Implementation of an automated pipeline to extract a supplier catalog or website data, enrich it with AI, and bulk-import it into Shopify using Matrixify.

Shopify Catalog Automation – Scraping → AI → Matrixify Pipeline

📞 Want to Automate Your Shopify Catalog? Phone / WhatsApp: +33 6 95 01 61 92 Book a call: https://calendly.com/simon-rochwerg-dx_b/30min

🎯 Project Objective

The client needed to rapidly scale their Shopify catalog (several thousand products) while maintaining a high level of quality across all product pages. The project involved handling complex metadata: descriptions, images, technical specifications, PDFs, variants, and accessories — a process that was slow, error-prone, and impossible to scale manually.

The goal was to build a fully automated pipeline capable of:

  • extracting product data from a supplier website
  • cleaning and normalizing all information
  • enriching product content using AI
  • generating a complete Matrixify-ready CSV file
  • automatically importing products, variants, and accessories into Shopify

🛠️ What I Built

1. Advanced Supplier Website Scraping

Development of a robust scraper capable of collecting:

  • titles and descriptions
  • technical specifications
  • high-resolution images
  • technical documents (PDF)
  • product variants
  • accessories

All extracted data is structured, cleaned, and standardized to fit Shopify’s data model.


2. Normalization & Data Cleaning

Implementation of a full data cleaning pipeline including:

  • duplicate detection and removal (SKU & EAN based)
  • brand harmonization
  • logistics weight processing
  • removal of null or invalid values in technical sheets
  • SEO formatting (70-character titles, 160-character meta descriptions)

3. AI Enrichment (OpenAI API)

Creation and refinement of dedicated prompts to automatically generate:

  • clean and professional product descriptions
  • structured technical specifications (YAML format)
  • complete FAQ sections (inner metafields)

Client-provided prompts were stabilized to ensure consistent results across thousands of products.


4. Matrixify CSV Generation

Development of an automatic Matrixify CSV generator producing:

  • main products
  • individual variants
  • linked accessories
  • metafields and structured metadata
  • product images
  • technical PDFs uploaded directly into Shopify

Each batch is imported into Shopify in draft mode so the client can review and validate before publishing.


5. Bulk Import Into Shopify

Using Matrixify, the system supports:

  • importing dozens of products in a single operation
  • managing all relationships (product ↔ variant ↔ accessory)
  • automatic upload of associated PDF documents
  • clean, consistent, SEO-optimized product pages

🚀 Results Achieved

  • Entire product categories imported automatically in minutes
  • AI-enhanced product pages that are consistent, readable, and SEO-optimized
  • Zero manual work required for the e-commerce team
  • Reusable pipeline for any future supplier or data source
  • Infrastructure capable of scaling to thousands of products effortlessly

📈 Business Impact

  • Massive reduction in integration time (days → minutes)
  • Elimination of human errors (variants, PDFs, metadata inconsistencies)
  • Significant improvement in perceived product quality
  • A solid technical foundation for future catalog automation across suppliers

🧩 Technologies Used

  • Python (scraping & transformation)
  • OpenAI API
  • Matrixify (Shopify)
  • Shopify Admin API

📞 Want to Automate Your Shopify Catalog?

Phone / WhatsApp: +33 6 95 01 61 92
Book a call: https://calendly.com/simon-rochwerg-dx_b/30min

Have a similar project?

Let's discuss your data needs together.

Get a Quote