DVC Version Control for Machine Learning

Last updated 1 year ago


เธเธฒเธฃเนƒเธŠเน‰เธ‡เธฒเธ™ Version Control เธกเธตเธกเธฒเธ™เธฒเธ™เนเธฅเน‰เธงเธชเธณเธซเธฃเธฑเธšเน€เธซเธฅเนˆเธฒ Programmer เนเธฅเธฐ Developer เน‚เธ”เธขเนƒเธ™เธชเธฒเธขเธ‡เธฒเธ™เธ”เน‰เธฒเธ™ Data Science เธ—เธตเนˆเธ—เธณเน€เธเธตเนˆเธขเธงเธเธฑเธš Machine Learning เธเน‡เธกเธต Version Control เน€เธซเธกเธทเธญเธ™เธเธฑเธ™ เน€เธฃเธตเธขเธเธงเนˆเธฒ DVC เธ‹เธถเนˆเธ‡เธˆเธฐเธ„เธฅเน‰เธฒเธข เน† เธเธฑเธš Git

Workflow

A machine learning model is normally built from three parts: Code, Data, and Configuration. These are combined in training to produce a Model, and the whole process should be reproducible.

DVC works much like Git, but it splits storage into two kinds: Code is kept in the Git server's Remote Code Storage, while Data and Models are kept in Remote Data Storage such as S3, GS (Google Cloud Storage), Azure, or SSH, as shown in the figure below.
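To make the code/data split concrete, here is a simplified Python sketch of the idea behind dvc add, assuming a content-addressed cache keyed by MD5: the large file is copied into a cache directory named after its hash, and only a small pointer file is left for Git to track. The function names md5_of and add_to_cache and the pointer file layout are illustrative assumptions, not DVC's actual implementation or file format.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    Simplified model of `dvc add`: the big file goes to the cache (and later to
    remote data storage); only the small pointer file is committed to Git.
    """
    checksum = md5_of(data_file)
    # Cache path is derived from the content hash: <cache>/<2 chars>/<rest>.
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    # Hypothetical pointer layout, loosely modeled on a .dvc file.
    pointer = data_file.with_name(data_file.name + ".dvc")
    pointer.write_text(f"outs:\n- md5: {checksum}\n  path: {data_file.name}\n")
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "Posts.xml.zip"
        data.write_bytes(b"example dataset bytes")
        print(add_to_cache(data, root / "cache").read_text())
```

Because the pointer only stores a hash, Git history stays small while the data itself can live in any of the remote storages listed above.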

Download

Get Started

  • เธ—เธณเธเธฒเธฃเธ”เธฒเธงเธ™เนŒเน‚เธซเธฅเธ”เนเธฅเธฐเธ•เธดเธ”เธ•เธฑเน‰เธ‡ DVC

  • เธ—เธณเธเธฒเธฃเธ”เธฒเธงเธ™เนŒเน‚เธซเธฅเธ” Code เนเธฅเธฐเธชเธฃเน‰เธฒเธ‡ Git Repository

C:\dvc>
git init
C:\dvc>
wget https://dvc.org/s3/examples/so/code.zip
C:\dvc>
unzip code.zip && rm -f code.zip
C:\dvc>
git add code/
git commit -m "download and initialize code"
  • เธ—เธณเธเธฒเธฃเธชเธฃเน‰เธฒเธ‡ Virtual Environment

C:\dvc>
mkvirtualenv venv
C:\dvc>
workon venv
  • เธ—เธณเธเธฒเธฃเธ•เธดเธ”เธ•เธฑเน‰เธ‡ Package เธˆเธฒเธเน„เธŸเธฅเนŒ requirements.txt

(venv) C:\nlp>
pip install -r code/requirements.txt
  • เธ—เธณเธเธฒเธฃเธชเธฃเน‰เธฒเธ‡ DVC Repository

(venv) C:\nlp>
dvc init
(venv) C:\nlp>
git commit -m "initialize DVC"
  • เธ—เธณเธเธฒเธฃเธ”เธฒเธงเธ™เนŒเน‚เธซเธฅเธ” Dataset เนเธฅเธฐเธ—เธณเธเธฒเธฃ Add เนƒเธ™ DVC เธ”เน‰เธงเธข

mkdir data
(venv) C:\nlp>
wget -P data https://dvc.org/s3/examples/so/Posts.xml.zip
(venv) C:\nlp>
dvc add data/Posts.xml.zip
  • เธ—เธณเธเธฒเธฃ Commit เธเธฒเธฃเน€เธ›เธฅเธตเนˆเธขเธ™เนเธ›เธฅเธ‡เน„เธ›เธขเธฑเธ‡ Git Repository

(venv) C:\nlp>
git add data/Posts.xml.zip.dvc data/.gitignore
(venv) C:\nlp>
git commit -m "add dataset"
  • เธ—เธณเธเธฒเธฃ Run เนƒเธ™ DVC เน€เธžเธทเนˆเธญเธฃเธงเธšเธฃเธงเธกเธ„เธณเธชเธฑเนˆเธ‡เนƒเธ™เนเธ•เนˆเธฅเธฐ Stage

(venv) C:\nlp>
dvc run -d data/Posts.xml.zip ^
        -o data/Posts.xml ^
        -f extract.dvc ^
        unzip data/Posts.xml.zip -d data
  • เธ—เธณเธเธฒเธฃ Convert เน„เธŸเธฅเนŒเธˆเธฒเธ XML เน€เธ›เน‡เธ™ TSV เนƒเธ™ DVC เน€เธžเธทเนˆเธญเธ—เธณ Feature Extraction เน„เธ”เน‰เธ‡เนˆเธฒเธขเธ‚เธถเน‰เธ™

(venv) C:\nlp>
dvc run -d code/xml_to_tsv.py -d data/Posts.xml ^
          -o data/Posts.tsv ^
          -f prepare.dvc ^
          python code/xml_to_tsv.py data/Posts.xml data/Posts.tsv
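The tutorial's code/xml_to_tsv.py is not shown here, so the following is only a hypothetical sketch of such a conversion. It assumes Stack Overflow-style row elements with Id, Title, Body, and Tags attributes, and a binary label derived from one assumed tag (python). It streams the XML with ElementTree.iterparse so a large dump does not have to fit in memory; HTML inside Body is left as-is.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_tsv(xml_source, tsv_out, target_tag: str = "python") -> int:
    """Stream <row> elements from a Posts.xml-style file into TSV rows.

    Hypothetical columns: post id, label (1 if target_tag is among the post's
    tags, else 0), and title + body text flattened onto a single line.
    Returns the number of rows written.
    """
    writer = csv.writer(tsv_out, delimiter="\t", lineterminator="\n")
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        tags = elem.get("Tags", "")
        label = 1 if f"<{target_tag}>" in tags else 0
        text = f"{elem.get('Title', '')} {elem.get('Body', '')}"
        text = " ".join(text.split())  # collapse tabs/newlines so each post is one TSV line
        writer.writerow([elem.get("Id"), label, text])
        count += 1
        elem.clear()  # release the parsed row to keep memory flat on large dumps
    return count


if __name__ == "__main__":
    sample = (b'<posts><row Id="1" Title="Read a file" Body="How?" '
              b'Tags="&lt;python&gt;" /></posts>')
    out = io.StringIO()
    xml_to_tsv(io.BytesIO(sample), out)
    print(out.getvalue(), end="")
```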
  • เธ—เธณเธเธฒเธฃ Split Dataset เนƒเธ™ DVC เน€เธžเธทเนˆเธญเนเธšเนˆเธ‡เธ‚เน‰เธญเธกเธนเธฅเธ—เธตเนˆเนƒเธŠเน‰เนƒเธ™เธเธฒเธฃ Training เนเธฅเธฐ Test เน‚เธ”เธขเธเธณเธซเธ™เธ”เนƒเธซเน‰ Test Dataset เธกเธตเธญเธฑเธ•เธฃเธฒเธชเนˆเธงเธ™เน€เธ›เน‡เธ™ 0.2 เนเธฅเธฐเธเธณเธซเธ™เธ”เธ„เนˆเธฒ Seed เนƒเธ™เธเธฒเธฃ Random เน€เธ›เน‡เธ™ 20170426

(venv) C:\nlp>
dvc run -d code/split_train_test.py -d data/Posts.tsv ^
          -o data/Posts-train.tsv -o data/Posts-test.tsv ^
          -f split.dvc ^
          python code/split_train_test.py data/Posts.tsv 0.2 20170426 ^
          data/Posts-train.tsv data/Posts-test.tsv
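A seeded split like the one above can be sketched as follows. This assumes each line is sent to the test set independently with probability 0.2; the real split_train_test.py may do it differently, and split_train_test here is a hypothetical function. What matters for DVC is that a fixed seed (20170426) gives identical output on every run, so the stage is reproducible.

```python
import random


def split_train_test(lines, test_ratio: float, seed: int):
    """Deterministically split lines into (train, test) with a fixed seed.

    Each line goes to the test set with probability test_ratio. Using a private
    random.Random(seed) instead of the global generator keeps the split
    independent of any other code that draws random numbers.
    """
    rng = random.Random(seed)
    train, test = [], []
    for line in lines:
        (test if rng.random() < test_ratio else train).append(line)
    return train, test


if __name__ == "__main__":
    rows = [f"post {i}" for i in range(10)]
    train, test = split_train_test(rows, 0.2, 20170426)
    print(len(train), len(test))
```

Because the split is a random draw per line, the test set is about 20% of the data rather than exactly 20%.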
  • เธ—เธณเธเธฒเธฃ Extract Feature and Label เนƒเธ™ DVC เธ‹เธถเนˆเธ‡เธˆเธฐเน„เธ”เน‰เน„เธŸเธฅเนŒ Pickle

(venv) C:\nlp>
dvc run -d code/featurization.py -d data/Posts-train.tsv -d data/Posts-test.tsv ^
        -o data/matrix-train.pkl -o data/matrix-test.pkl ^
        -f featurize.dvc ^
        python code/featurization.py data/Posts-train.tsv data/Posts-test.tsv ^
        data/matrix-train.pkl data/matrix-test.pkl
  • เธ—เธณเธเธฒเธฃ Train Model เธเธฑเธš Training Dataset เนƒเธ™ DVC

(venv) C:\nlp>
dvc run -d code/train_model.py -d data/matrix-train.pkl ^
        -o data/model.pkl ^
        -f train.dvc ^
python code/train_model.py data/matrix-train.pkl 20170426 data/model.pkl
  • เธ—เธณเธเธฒเธฃ Evaluate Model เธเธฑเธš Test Dataset เนƒเธ™ DVC

dvc run -d code/evaluate.py -d data/model.pkl -d data/matrix-test.pkl ^
          -M auc.metric ^
          -f evaluate.dvc ^
python code/evaluate.py data/model.pkl data/matrix-test.pkl auc.metric
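For reference, the AUC reported by this stage can be computed without any library using the Mann-Whitney formulation: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting as half. This is a minimal, quadratic-time sketch with hypothetical function names, not the code in evaluate.py; format_metric only mimics the "AUC: <value>" line that dvc metrics show prints in the next step.

```python
def roc_auc(labels, scores) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    labels are 0/1, scores are the model's predicted scores; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def format_metric(auc: float) -> str:
    """Render the score as a one-line metric, e.g. for writing to auc.metric."""
    return f"AUC: {auc:.6f}"


if __name__ == "__main__":
    print(format_metric(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])))
```

An AUC of 0.5 means the scores rank positives no better than chance, so values close to 0.5 leave plenty of room for the feature change made later in this walkthrough.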
  • เธ—เธณเธเธฒเธฃเธ•เธฃเธงเธˆเธชเธญเธš Accuracy เนƒเธ™ DVC เธ”เน‰เธงเธข Metric

(venv) C:\nlp>
dvc metrics show
auc.metric: AUC: 0.587951
  • เธ—เธณเธเธฒเธฃ Commit เธเธฒเธฃเน€เธ›เธฅเธตเนˆเธขเธ™เนเธ›เธฅเธ‡เน„เธ›เธขเธฑเธ‡ Git Repository

(venv) C:\nlp>
git add *.dvc auc.metric
(venv) C:\nlp>
git commit -am "create pipeline"
  • เธ—เธณเธเธฒเธฃเนเธเน‰เน„เธ‚เน„เธŸเธฅเนŒ code/featurization.py ( เธšเธฃเธฃเธ—เธฑเธ”เธ—เธตเนˆ 72-73 )

(venv) C:\nlp>
notepad code/featurization.py
bag_of_words = CountVectorizer(stop_words='english',
                               max_features=5000,
                               ngram_range=(1, 2))
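The effect of ngram_range=(1, 2) can be illustrated with a small pure-Python sketch: stop words are dropped first, then both single words and adjacent word pairs are counted. This is a simplification of scikit-learn's CountVectorizer, which also applies its own token pattern (for example, it ignores one-character tokens) and keeps only the max_features most frequent terms; ngrams and bag_of_ngrams are hypothetical helpers.

```python
from collections import Counter


def ngrams(tokens, ngram_range=(1, 2)):
    """Yield space-joined n-grams for every n in the inclusive ngram_range."""
    lo, hi = ngram_range
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def bag_of_ngrams(text, stop_words=frozenset(), ngram_range=(1, 2)):
    """Count unigrams and bigrams after lowercasing and removing stop words."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return Counter(ngrams(tokens, ngram_range))


if __name__ == "__main__":
    print(bag_of_ngrams("Python is great python is fun", stop_words={"is"}))
```

Adding bigrams lets the model tell apart phrases such as "python fun" from the words used separately, at the cost of a much larger vocabulary, which is why max_features caps it.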
  • เธ—เธณเธเธฒเธฃ Reproduce เธชเธณเธซเธฃเธฑเธšเธ—เธธเธ Stage เธ‹เธถเนˆเธ‡เธˆเธฐเธ—เธณเนเธšเธš Auto เธซเธฒเธเธกเธตเธเธฒเธฃเนเธเน‰เน„เธ‚เน„เธŸเธฅเนŒ

(venv) C:\nlp>
dvc repro evaluate.dvc
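The decision dvc repro makes can be sketched as a comparison of stored and current file hashes. In this simplified model (hypothetical names, assuming stages are already listed in dependency order), a stage reruns when one of its dependencies has a different MD5 from the one recorded, or when an upstream stage reran and its outputs will therefore change. Real DVC also considers changes to the stage command and to the outputs themselves.

```python
import hashlib

# The six stages defined with dvc run in this walkthrough, as (name, deps, outs).
STAGES = [
    ("extract", ["data/Posts.xml.zip"], ["data/Posts.xml"]),
    ("prepare", ["code/xml_to_tsv.py", "data/Posts.xml"], ["data/Posts.tsv"]),
    ("split", ["code/split_train_test.py", "data/Posts.tsv"],
     ["data/Posts-train.tsv", "data/Posts-test.tsv"]),
    ("featurize", ["code/featurization.py", "data/Posts-train.tsv", "data/Posts-test.tsv"],
     ["data/matrix-train.pkl", "data/matrix-test.pkl"]),
    ("train", ["code/train_model.py", "data/matrix-train.pkl"], ["data/model.pkl"]),
    ("evaluate", ["code/evaluate.py", "data/model.pkl", "data/matrix-test.pkl"], ["auc.metric"]),
]


def fingerprint(content: bytes) -> str:
    """MD5 of a file's content, standing in for the hash DVC records."""
    return hashlib.md5(content).hexdigest()


def stages_to_rerun(stages, recorded, current):
    """Return the names of stages that must rerun, in pipeline order.

    recorded: {path: md5} saved when the stages last ran
    current:  {path: md5} of the files on disk now
    Outputs of a rerun stage are conservatively treated as changed.
    """
    dirty_paths = set()
    rerun = []
    for name, deps, outs in stages:
        if any(recorded.get(d) != current.get(d) or d in dirty_paths for d in deps):
            rerun.append(name)
            dirty_paths.update(outs)
    return rerun


if __name__ == "__main__":
    recorded = {d: fingerprint(b"v1") for _, deps, _ in STAGES for d in deps}
    current = {**recorded, "code/featurization.py": fingerprint(b"v2")}
    print(stages_to_rerun(STAGES, recorded, current))
```

Editing featurization.py therefore leaves extract, prepare, and split untouched and only reruns featurize, train, and evaluate.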
  • เธ—เธณเธเธฒเธฃเธ•เธฃเธงเธˆเธชเธญเธš Accuracy เนƒเธ™ DVC เธ”เน‰เธงเธข Metric เนƒเธ™เธ—เธธเธ Branch

(venv) C:\nlp>
dvc metrics show -a
  • เธ—เธณเธเธฒเธฃ Commit เธเธฒเธฃเน€เธ›เธฅเธตเนˆเธขเธ™เนเธ›เธฅเธ‡เน„เธ›เธขเธฑเธ‡ Git Repository

(venv) C:\nlp>
git add evaluate.dvc auc.metric
(venv) C:\nlp>
git commit -m "add evaluation step to the pipeline
  • เธ—เธณเธเธฒเธฃ Tag เนƒเธ™เธเธฒเธฃเน€เธเน‡เธš Checkpoint เน€เธžเธทเนˆเธญเนƒเธŠเน‰เนƒเธ™เธเธฒเธฃ Compare

(venv) C:\nlp>
git tag -a "baseline-experiment" -m "baseline"
  • เธ—เธณเธเธฒเธฃ Show Pipeline เนเธšเธš ASCII

(venv) C:\nlp>
dvc pipeline show --ascii train.dvc
(venv) C:\nlp>
dvc pipeline show --ascii train.dvc --commands
(venv) C:\nlp>
dvc pipeline show --ascii train.dvc --outs
  • เธˆเธฐเนเธชเธ”เธ‡ Visualize Pipeline เนƒเธ™เนเธšเธš ASCII

Read more:

DVC: https://bit.ly/2FOQM5v