Tabular Data
Train models on structured data such as CSV files and database tables.
Quick Start
aitraining tabular --train \
--model xgboost \
--data-path ./data.csv \
--target-columns target \
--valid-split validation \
--project-name tabular-model
Tabular training requires --valid-split, which names the validation split in your data.
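If your data does not yet have a held-out portion, the rows can be divided before training. The sketch below is only an illustration of a reproducible hold-out split: `split_rows` and `valid_fraction` are hypothetical names, and how aitraining maps a split name such as `validation` onto your data is not covered by it.

```python
import random


def split_rows(rows, valid_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, valid_rows)."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy rows standing in for the contents of a CSV file.
rows = [{"feature1": i, "target": i % 2} for i in range(10)]
train_rows, valid_rows = split_rows(rows)
print(len(train_rows), len(valid_rows))
```

The seed of 42 mirrors the default of --seed, so repeated runs hold out the same rows.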
Task Types
Classification
Predict categorical labels:
aitraining tabular --train \
--model xgboost \
--data-path ./customers.csv \
--target-columns churn \
--valid-split validation \
--task classification \
--project-name churn-predictor
Regression
Predict continuous values:
aitraining tabular --train \
--model xgboost \
--data-path ./houses.csv \
--target-columns price \
--valid-split validation \
--task regression \
--project-name price-predictor
Parameters
| Parameter | Description | Default |
|---|---|---|
| --model | Model type | xgboost |
| --data-path | Path to CSV/data | None (required) |
| --project-name | Output directory | project-name |
| --target-columns | Target variable(s) | ["target"] |
| --task | classification/regression | classification |
| --train-split | Training data split | train |
| --valid-split | Validation data split | None (required) |
| --id-column | ID column to exclude | id |
| --categorical-columns | Categorical features | None |
| --numerical-columns | Numerical features | None |
| --num-trials | Number of hyperparameter trials | 10 |
| --time-limit | Time limit in seconds | 600 |
| --seed | Random seed | 42 |
Available Models
| Model | Classification | Regression |
|---|---|---|
| xgboost | Yes | Yes |
| random_forest | Yes | Yes |
| extra_trees | Yes | Yes |
| gradient_boosting | Yes | Yes |
| adaboost | Yes | Yes |
| decision_tree | Yes | Yes |
| logistic_regression | Yes | Yes |
| ridge | Yes | Yes |
| svm | Yes | Yes |
| knn | Yes | Yes |
| naive_bayes | Yes | No |
| lasso | No | Yes |
| linear_regression | No | Yes |
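When scripting runs, it can help to check a model/task pair before launching a job. The following sketch simply transcribes the support table into a lookup; `SUPPORTED_TASKS`, `supports`, and `models_for` are hypothetical helper names, not part of aitraining.

```python
BOTH = {"classification", "regression"}

# Transcribed from the Available Models table.
SUPPORTED_TASKS = {
    "xgboost": BOTH,
    "random_forest": BOTH,
    "extra_trees": BOTH,
    "gradient_boosting": BOTH,
    "adaboost": BOTH,
    "decision_tree": BOTH,
    "logistic_regression": BOTH,
    "ridge": BOTH,
    "svm": BOTH,
    "knn": BOTH,
    "naive_bayes": {"classification"},
    "lasso": {"regression"},
    "linear_regression": {"regression"},
}


def supports(model, task):
    """Return True if the table lists `model` as usable for `task`."""
    return task in SUPPORTED_TASKS.get(model, set())


def models_for(task):
    """Return the sorted model names the table lists for `task`."""
    return sorted(m for m, tasks in SUPPORTED_TASKS.items() if task in tasks)


print(supports("naive_bayes", "regression"))
print(models_for("regression"))
```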
Data Format
Input data is a CSV file with feature columns and a target column, for example:
feature1,feature2,feature3,target
1.5,category_a,100,1
2.3,category_b,200,0
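To decide which columns to pass to --categorical-columns and --numerical-columns, you can inspect the file first. This is a minimal sketch using the sample rows above: a column counts as numerical only if every value parses as a float. `infer_column_types` is a hypothetical helper, and the comma-separated form for --numerical-columns is assumed to match the one shown for --categorical-columns.

```python
import csv
import io

SAMPLE = """feature1,feature2,feature3,target
1.5,category_a,100,1
2.3,category_b,200,0
"""


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text, target="target"):
    """Split non-target columns into (categorical, numerical) name lists."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    categorical, numerical = [], []
    for name in rows[0].keys():
        if name == target:
            continue
        if all(is_number(row[name]) for row in rows):
            numerical.append(name)
        else:
            categorical.append(name)
    return categorical, numerical


categorical, numerical = infer_column_types(SAMPLE)
print("--categorical-columns", ",".join(categorical))
print("--numerical-columns", ",".join(numerical))
```

Integer-coded categories (for example, a region stored as 1, 2, 3) would be classed as numerical by this heuristic, so review the result before using it.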
Feature Handling
Specify categorical columns:
aitraining tabular --train \
--model gradient_boosting \
--data-path ./data.csv \
--target-columns target \
--valid-split validation \
--categorical-columns "color,size,region" \
--project-name model
Exclude ID columns:
aitraining tabular --train \
--model xgboost \
--data-path ./data.csv \
--target-columns target \
--valid-split validation \
--id-column customer_id \
--project-name model
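An ID column is unique per row and carries no predictive signal, which is why it is excluded. The sketch below is a rough heuristic for spotting such columns before choosing a value for --id-column; `looks_like_id` and the toy rows are hypothetical.

```python
def looks_like_id(rows, column):
    """Return True if every row has a distinct value in `column`."""
    values = [row[column] for row in rows]
    return len(values) > 0 and len(set(values)) == len(values)


# Toy rows standing in for a customer table.
rows = [
    {"customer_id": "c1", "region": "north", "target": 1},
    {"customer_id": "c2", "region": "north", "target": 0},
    {"customer_id": "c3", "region": "south", "target": 1},
]

id_like = [c for c in rows[0] if c != "target" and looks_like_id(rows, c)]
print(id_like)
```

Continuous numerical features can also be unique per row, so treat a match as a prompt to inspect the column rather than as proof that it is an identifier.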
Examples
Customer Churn
aitraining tabular --train \
--model xgboost \
--data-path ./customers.csv \
--target-columns churned \
--valid-split validation \
--id-column customer_id \
--task classification \
--project-name churn-model
House Price Prediction
aitraining tabular --train \
--model gradient_boosting \
--data-path ./houses.csv \
--target-columns sale_price \
--valid-split validation \
--id-column house_id \
--task regression \
--project-name house-prices
Multi-Class Classification
aitraining tabular --train \
--model extra_trees \
--data-path ./products.csv \
--target-columns category \
--valid-split validation \
--categorical-columns "brand,color,material" \
--task classification \
--project-name product-classifier
Model Comparison
To compare different models on your data:
# Train XGBoost
aitraining tabular --train \
--model xgboost \
--data-path ./data.csv \
--target-columns target \
--valid-split validation \
--project-name model-xgb
# Train Random Forest
aitraining tabular --train \
--model random_forest \
--data-path ./data.csv \
--target-columns target \
--valid-split validation \
--project-name model-rf
# Train Gradient Boosting
aitraining tabular --train \
--model gradient_boosting \
--data-path ./data.csv \
--target-columns target \
--valid-split validation \
--project-name model-gb
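The three commands differ only in --model and --project-name, so they can be generated in a loop. This sketch builds the argument lists using the flags shown above; `build_command` is a hypothetical helper, and the `subprocess.run` call is left commented out so the snippet only prints the commands.

```python
import shlex


def build_command(model, project_name, data_path="./data.csv",
                  target="target", valid_split="validation"):
    """Assemble the aitraining argument list for one model."""
    return [
        "aitraining", "tabular", "--train",
        "--model", model,
        "--data-path", data_path,
        "--target-columns", target,
        "--valid-split", valid_split,
        "--project-name", project_name,
    ]


# Model name -> output directory, as in the commands above.
runs = {
    "xgboost": "model-xgb",
    "random_forest": "model-rf",
    "gradient_boosting": "model-gb",
}

commands = [build_command(model, project) for model, project in runs.items()]
for cmd in commands:
    print(shlex.join(cmd))
    # To launch the run: import subprocess; subprocess.run(cmd, check=True)
```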
Output
After training, you will find:
project-name/
├── model.joblib # Trained model
├── metrics.json # Evaluation metrics
├── feature_importance.json
└── config.yaml # Training config
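To compare several runs, the metrics.json file from each output directory can be read side by side. The sketch assumes only that metrics.json is valid JSON; the keys it contains are not listed here, so the code prints whatever it finds. `load_json` and `collect_metrics` are hypothetical helpers, and the directory names are the ones from the model comparison commands.

```python
import json
from pathlib import Path


def load_json(project_dir, filename):
    """Return the parsed JSON file, or None if it does not exist."""
    path = Path(project_dir) / filename
    if not path.exists():
        return None
    with path.open() as f:
        return json.load(f)


def collect_metrics(project_dirs):
    """Map each project directory to its metrics, skipping missing ones."""
    results = {}
    for project_dir in project_dirs:
        metrics = load_json(project_dir, "metrics.json")
        if metrics is not None:
            results[str(project_dir)] = metrics
    return results


for name, metrics in collect_metrics(["model-xgb", "model-rf", "model-gb"]).items():
    print(name, metrics)
```

Directories that have not been trained yet are skipped, so the loop is safe to run while some jobs are still in progress.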
Loading Models
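import joblib
import pandas as pd
model = joblib.load("project-name/model.joblib")  # trained model from the output directory
new_data = pd.read_csv("./new_data.csv")  # hypothetical file with the same feature columns used in training
predictions = model.predict(new_data)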
Next Steps