[{"content":" Table of Contents Introduction Background / Goal Related Work Approach Dataset Planned Experiments Fixing Imbalances Future Steps Experiments \u0026amp; Results Experiment 1: Experiment 2: Experiment 3: Experiment 4: Experiment 5: Conclusion Recap! Current Limitations :c Future Work summary References Introduction just ome content here!\nBackground / Goals Related Work Pre-Deep Learning:\nPretty much these are early methods that used handicrafted features (Gabor Filters, LBP) with SVM\u0026rsquo;s, but mostly struggled to generalize across conditions\nResNet (He et al., 2016):\nIntroduced a deep residual learning algorithm, which now became the standard backbone for any recent ViT (Vision Transformers) model\nTransfer Learning:\nThis is the practice of using a fine-tuned pretrained CNN model on small industrial datasets\nClass Imbalance:\nA persistent challenge that rules over GC10-DET (our dataset), that is later addressed via weighted losses and many oversampling strategies\nAdvanced Detectors:\nRecently this comes in the form of: YOLO, R-CNN\u0026rsquo;s, and ViT for detection tasks\nOur Baseline:\nIf we need a classification with limited data, a fine-tuned ResNet remains a competitive baseline.\nApproach Basically, we took a pretrained ResNet 18 and fine-tuned it for our 10-class problem. Resizing all images to 224x224 and normalize it using ImageNet\u0026rsquo;s mean and standard deviation; required since it\u0026rsquo;s what the pretrained weights are expecting.\nWe then split the data to a 75/15/10 \u0026lt;-\u0026gt; train/val/test split with a fixed random seed for reproducibility. But this brings the issue of of handling class imbalances.\nWe do this by using a WeightedRandomSampler, which oversamples rare classes in each batch. 
Importantly, we then used an unweighted CrossEntropyLoss, so that balancing happens only through the sampler and is not double-counted.\nWe trained with Adam (an optimizer that adapts learning rates during training) at a learning rate of 0.001, a batch size of 32, and 25 epochs with a CosineAnnealingLR schedule, and evaluated with:\nnormalized confusion matrix\nprecision/recall/F1\nCohen\u0026rsquo;s Kappa\nPlanned Experiments To elaborate, we planned three experiments at the start of this project!\nRegular Canny Edge Detection - Highlight surface features (like oil spots, cracks, creases, and other structural anomalies)\nWeld Seam Analysis - Analyze weld seams for cracks / incomplete fusion using close-up images, then apply Canny edge detection to highlight seam irregularities along the weld lines (to see if it can catch irregularities that might be difficult to spot normally)\nEvaluation Metrics - Track a confidence score measuring how close a predicted class is to the TRUE defect type\nIn short, we intended to use a Canny edge detector as a preprocessing step to highlight structural defect features, apply it specifically to weld seam analysis, and track a confidence score as our evaluation metric\nHowever, as you\u0026rsquo;ll see later, Experiment 1 directly tests this detection idea (but it didn\u0026rsquo;t work the way we expected)\nDataset The GC10-DET dataset contains REAL images collected in industrial settings across 10 surface defect classes. The class imbalance is severe: \u0026ldquo;silk_spot\u0026rdquo; has 497 training images (highlighted in green), while the \u0026ldquo;crease\u0026rdquo; class has only 45 (highlighted in red). 
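To put numbers on that imbalance, here is a quick sketch: the 497 and 45 figures come from our dataset counts above, but the other eight class counts are hypothetical placeholders. It also computes the accuracy a degenerate model would score by always predicting the majority class:

```python
# Training-image counts per class: 497 (silk_spot) and 45 (crease) come from
# the dataset counts above; the remaining eight classes are hypothetical.
counts = {'silk_spot': 497, 'crease': 45}
for i in range(3, 11):
    counts['class_%d' % i] = 150  # hypothetical placeholder count

ratio = counts['silk_spot'] / counts['crease']
total = sum(counts.values())
baseline_acc = max(counts.values()) / total  # always predict the majority class

print(round(ratio, 1))          # -> 11.0  (the ~11x gap)
print(round(baseline_acc, 3))   # -> 0.285 under these placeholder counts
```

Even this toy version shows why accuracy alone is misleading here: a constant-prediction model already beats 1-in-10 random guessing by a wide margin.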
That\u0026rsquo;s an 11x difference!\nWithout any correction, the model would simply learn to predict \u0026ldquo;silk_spot\u0026rdquo; constantly, since that minimizes our loss. This is WHY the WeightedRandomSampler is super critical!\nFor more details, see the next section!\nSplitting the Dataset Here is the actual code from our program for how we split the data. We first carve out the test set from the full dataset, then split the remaining portion into train and validation. The fixed generator means anyone running this code will get identical splits:\n## USED TO PREVENT OVERFITTING ! ! !\nimport torch\nfrom torch.utils.data import random_split\n\ntest_ratio = 0.10\nval_ratio = 0.15\ntrain_ratio = 1 - test_ratio - val_ratio\n\n# Split sizes derived from the ratios (not shown in the original snippet)\ntest_size = int(test_ratio * len(dataset))\ntrain_val_size = len(dataset) - test_size\nval_size = int(val_ratio * len(dataset))\ntrain_size = train_val_size - val_size\n\ngenerator = torch.Generator()\ngenerator.manual_seed(42)\n\n# Carve out the test set first, then split the remainder into train/val\ntrain_val, test = random_split(\n    dataset, [train_val_size, test_size], generator=generator\n)\ntrain, val = random_split(\n    train_val, [train_size, val_size], generator=generator\n)\nFixing The Imbalances ! So why\nFuture Steps ! Experiments \u0026amp; Results Experiment 1: Experiment 2: Experiment 3: Experiment 4: Experiment 5: Conclusion Recap! Limitations! 
Future Work Summary References ","permalink":"https://jsilab.github.io/posts/2026-04-17/vit-metal-detection-model/","summary":"\u003c!-- raw HTML omitted --\u003e\n\u003ch1 id=\"table-of-contents\"\u003eTable of Contents\u003c/h1\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca href=\"#intro\"\u003eIntroduction\u003c/a\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"#back-goals\"\u003eBackground / Goal\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#related\"\u003eRelated Work\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#approach\"\u003eApproach\u003c/a\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"#dataset\"\u003eDataset\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#planned\"\u003ePlanned Experiments\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#fixing-imbalances\"\u003eFixing Imbalances\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#future-steps\"\u003eFuture Steps\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#experi-results\"\u003eExperiments \u0026amp; Results\u003c/a\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"#experi_1\"\u003eExperiment 1:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#experi_2\"\u003eExperiment 2:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#experi_3\"\u003eExperiment 3:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#experi_4\"\u003eExperiment 4:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#experi_5\"\u003eExperiment 5:\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#conclusion\"\u003eConclusion\u003c/a\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"#recap\"\u003eRecap!\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#limitations\"\u003eCurrent Limitations :c\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#the-future\"\u003eFuture 
Work\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#summary\"\u003esummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#references\"\u003eReferences\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c!-- raw HTML omitted --\u003e\n\u003cp\u003e\u003c!-- raw HTML omitted --\u003e\u003c!-- raw HTML omitted --\u003e\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003ejust some content here!\u003c/p\u003e\n\u003cp\u003e\u003c!-- raw HTML omitted --\u003e\u003c!-- raw HTML omitted --\u003e\u003c/p\u003e","title":"ViT Metal Detection Model"}]