Tabular Data

Train models on structured data such as CSV files and database tables.

Quick Start

aitraining tabular --train \
  --model xgboost \
  --data-path ./data.csv \
  --target-columns target \
  --valid-split validation \
  --project-name tabular-model
Tabular training requires --valid-split, which names the validation split in your data.
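If your data does not yet have a held-out portion, the following is a minimal sketch of producing one reproducibly with the Python standard library. It only illustrates the idea of a train/validation split. How aitraining resolves the split name passed to --valid-split is not described here, and the helper name `split_rows`, the 20% fraction, and the in-memory sample are assumptions made for illustration.

```python
import csv
import io
import random


def split_rows(rows, valid_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, valid_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row for validation
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Tiny in-memory CSV standing in for a real data file (made-up values)
sample = io.StringIO(
    "feature1,target\n" + "\n".join(f"{i},{i % 2}" for i in range(10))
)
rows = list(csv.DictReader(sample))

train_rows, valid_rows = split_rows(rows)
print(len(train_rows), "training rows,", len(valid_rows), "validation rows")
```

Using a fixed seed keeps the split identical across runs, so results from different models stay comparable.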

Task Types

Classification

Predict categorical labels:
aitraining tabular --train \
  --model xgboost \
  --data-path ./customers.csv \
  --target-columns churn \
  --valid-split validation \
  --task classification \
  --project-name churn-predictor

Regression

Predict continuous values:
aitraining tabular --train \
  --model xgboost \
  --data-path ./houses.csv \
  --target-columns price \
  --valid-split validation \
  --task regression \
  --project-name price-predictor

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --model | Model type | xgboost |
| --data-path | Path to CSV/data | None (required) |
| --project-name | Output directory | project-name |
| --target-columns | Target variable(s) | ["target"] |
| --task | classification/regression | classification |
| --train-split | Training data split | train |
| --valid-split | Validation data split | None (required) |
| --id-column | ID column to exclude | id |
| --categorical-columns | Categorical features | None |
| --numerical-columns | Numerical features | None |
| --num-trials | Number of hyperparameter trials | 10 |
| --time-limit | Time limit in seconds | 600 |
| --seed | Random seed | 42 |

Available Models

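| Model | Classification | Regression |
| --- | --- | --- |
| xgboost | Yes | Yes |
| random_forest | Yes | Yes |
| extra_trees | Yes | Yes |
| gradient_boosting | Yes | Yes |
| adaboost | Yes | Yes |
| decision_tree | Yes | Yes |
| logistic_regression | Yes | Yes |
| ridge | Yes | Yes |
| svm | Yes | Yes |
| knn | Yes | Yes |
| naive_bayes | Yes | No |
| lasso | No | Yes |
| linear_regression | No | Yes |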
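
Because three of the models support only one task, it can help to check a model/task pair before starting a run. The sketch below encodes the table above as a Python dictionary. The names `SUPPORTED_TASKS` and `is_supported` are hypothetical helpers and not part of aitraining.

```python
# Task support copied from the model table above
BOTH = {"classification", "regression"}
SUPPORTED_TASKS = {
    "xgboost": BOTH,
    "random_forest": BOTH,
    "extra_trees": BOTH,
    "gradient_boosting": BOTH,
    "adaboost": BOTH,
    "decision_tree": BOTH,
    "logistic_regression": BOTH,
    "ridge": BOTH,
    "svm": BOTH,
    "knn": BOTH,
    "naive_bayes": {"classification"},
    "lasso": {"regression"},
    "linear_regression": {"regression"},
}


def is_supported(model, task):
    """Return True if the table lists `task` as supported for `model`."""
    return task in SUPPORTED_TASKS.get(model, set())


for model, task in [("naive_bayes", "regression"), ("lasso", "regression")]:
    status = "ok" if is_supported(model, task) else "not supported"
    print(f"{model} + {task}: {status}")
```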

Data Format

CSV Format

feature1,feature2,feature3,target
1.5,category_a,100,1
2.3,category_b,200,0
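
Before training, it can be useful to confirm that the target column is present and to see which feature columns look numerical or categorical. The following is a rough sketch using only the standard library and the sample rows above. The helper `describe_columns` is hypothetical, and treating a column as numerical when every value parses as a float is a simplifying assumption, not a description of what aitraining does internally.

```python
import csv
import io


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def describe_columns(csv_text, target="target"):
    """Split feature columns into (numerical, categorical) by parsing values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    if target not in columns:
        raise ValueError(f"target column {target!r} not found in {columns}")
    numerical, categorical = [], []
    for name in columns:
        if name == target:
            continue
        values = [row[name] for row in rows]
        if all(is_number(v) for v in values):
            numerical.append(name)
        else:
            categorical.append(name)
    return numerical, categorical


sample = """feature1,feature2,feature3,target
1.5,category_a,100,1
2.3,category_b,200,0
"""

numerical, categorical = describe_columns(sample)
print("numerical:", numerical)
print("categorical:", categorical)
```

The resulting names can then be joined with commas for --categorical-columns. Integer-coded categories would be misclassified by this heuristic, so review the output rather than trusting it blindly.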

Feature Handling

Specify categorical columns:
aitraining tabular --train \
  --model gradient_boosting \
  --data-path ./data.csv \
  --target-columns target \
  --valid-split validation \
  --categorical-columns "color,size,region" \
  --project-name model
Exclude ID columns:
aitraining tabular --train \
  --model xgboost \
  --data-path ./data.csv \
  --target-columns target \
  --valid-split validation \
  --id-column customer_id \
  --project-name model

Examples

Customer Churn

aitraining tabular --train \
  --model xgboost \
  --data-path ./customers.csv \
  --target-columns churned \
  --valid-split validation \
  --id-column customer_id \
  --task classification \
  --project-name churn-model

House Price Prediction

aitraining tabular --train \
  --model gradient_boosting \
  --data-path ./houses.csv \
  --target-columns sale_price \
  --valid-split validation \
  --id-column house_id \
  --task regression \
  --project-name house-prices

Multi-Class Classification

aitraining tabular --train \
  --model extra_trees \
  --data-path ./products.csv \
  --target-columns category \
  --valid-split validation \
  --categorical-columns "brand,color,material" \
  --task classification \
  --project-name product-classifier

Model Comparison

To compare different models on your data, train each one into its own project directory:
# Train XGBoost
aitraining tabular --train \
  --model xgboost \
  --data-path ./data.csv \
  --target-columns target \
  --valid-split validation \
  --project-name model-xgb

# Train Random Forest
aitraining tabular --train \
  --model random_forest \
  --data-path ./data.csv \
  --target-columns target \
  --valid-split validation \
  --project-name model-rf

# Train Gradient Boosting
aitraining tabular --train \
  --model gradient_boosting \
  --data-path ./data.csv \
  --target-columns target \
  --valid-split validation \
  --project-name model-gb
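
The repeated commands above can also be generated in a loop. This sketch builds one argument list per model using only the flags shown on this page. The helper `build_command` and the `model-<name>` project naming are assumptions for illustration. The commands are printed rather than executed, so the sketch runs without aitraining installed.

```python
import shlex


def build_command(model, data_path="./data.csv", target="target",
                  valid_split="validation"):
    """Build the aitraining argument list for one model (hypothetical helper)."""
    return [
        "aitraining", "tabular", "--train",
        "--model", model,
        "--data-path", data_path,
        "--target-columns", target,
        "--valid-split", valid_split,
        "--project-name", f"model-{model}",
    ]


models = ["xgboost", "random_forest", "gradient_boosting"]
commands = [build_command(model) for model in models]

for cmd in commands:
    # Print a shell-safe version of each command
    print(shlex.join(cmd))
    # To launch the run instead: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path or column name contains spaces.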

Output

After training, the project directory contains:
project-name/
├── model.joblib        # Trained model
├── metrics.json        # Evaluation metrics
├── feature_importance.json
└── config.yaml         # Training config
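
When several projects have been trained, their metrics.json files can be collected side by side. The sketch below assumes each file holds a JSON object. The actual keys written by aitraining are not documented here, so the `accuracy` values in the demo are made-up placeholders written to a temporary directory, and `load_metrics` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_metrics(project_dirs):
    """Return {project name: parsed metrics.json} for projects that have one."""
    results = {}
    for project in project_dirs:
        metrics_file = Path(project) / "metrics.json"
        if metrics_file.is_file():
            results[Path(project).name] = json.loads(metrics_file.read_text())
    return results


# Demo with placeholder files; real metrics.json contents may differ
with tempfile.TemporaryDirectory() as tmp:
    for name, score in [("model-xgb", 0.91), ("model-rf", 0.88)]:
        project = Path(tmp) / name
        project.mkdir()
        (project / "metrics.json").write_text(json.dumps({"accuracy": score}))
    candidates = [Path(tmp) / n for n in ("model-xgb", "model-rf", "missing")]
    metrics = load_metrics(candidates)

for name, values in metrics.items():
    print(name, values)
```

Projects without a metrics.json file are skipped silently, which keeps the loop usable while some runs are still in progress.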

Loading Models

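import joblib
import pandas as pd

# Load the model saved during training
model = joblib.load("project-name/model.joblib")
# new_data: rows with the same feature columns used in training (placeholder file name)
new_data = pd.read_csv("./new_data.csv")
predictions = model.predict(new_data)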
