Install Trieve Vector Inference

Installation Requirements

You'll need a license to run Trieve Vector Inference.

Getting your license

(contact us here)

Check AWS quota

Ensure you have quotas for:

  1. At least 4 vCPUs for On-Demand G and VT instances in the region of choice.

Check quota for us-east-2 here

  2. At least 1 load balancer for each model you want to run.

Check quota for us-east-2 here
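The quota check can also be done from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the helper names `get_quota` and `has_quota` are hypothetical and not part of Trieve's tooling.

```shell
# Sketch only: get_quota and has_quota are illustrative helper names.

# get_quota SERVICE_CODE QUOTA_CODE REGION
# Prints the current value of one service quota.
get_quota() {
    aws service-quotas get-service-quota \
        --service-code "$1" \
        --quota-code "$2" \
        --region "$3" \
        --query 'Quota.Value' \
        --output text
}

# has_quota VALUE MINIMUM
# Succeeds when VALUE >= MINIMUM. Quota values are reported as decimals
# (for example 8.0), so the comparison is done in awk rather than with test.
has_quota() {
    awk -v have="$1" -v need="$2" 'BEGIN { exit !(have + 0 >= need + 0) }'
}
```

For example, `has_quota "$(get_quota ec2 L-DB2E81BA us-east-2)" 4 && echo "GPU vCPU quota OK"`. The quota code `L-DB2E81BA` is assumed here to be the one for "Running On-Demand G and VT instances"; confirm it in the Service Quotas console before relying on it.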

Deploying the Cluster

Setting up environment variables

Set the following variables before creating the EKS cluster and installing the required plugins.

Your AWS Account ID:

export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text)"

Your AWS region:

export AWS_REGION=us-east-2

Your Kubernetes cluster name:

export CLUSTER_NAME=trieve-gpu

Your machine types. We recommend g4dn.xlarge for the GPU nodes, as it is the cheapest option on AWS. A single small CPU node is also needed for utility workloads.

export CPU_INSTANCE_TYPE=t3.small
export GPU_INSTANCE_TYPE=g4dn.xlarge
export GPU_COUNT=1
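Before creating the cluster, it can help to confirm that every variable above is actually set in the current shell. This is a small sketch; `require_vars` is a hypothetical helper, not something shipped with Trieve.

```shell
# Sketch only: require_vars is an illustrative helper name.
# Prints each variable name that is unset or empty and returns non-zero
# if any are missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup that works in POSIX sh as well as bash.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible invocation is `require_vars AWS_ACCOUNT_ID AWS_REGION CLUSTER_NAME CPU_INSTANCE_TYPE GPU_INSTANCE_TYPE GPU_COUNT`.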

Create your cluster

sh ./create_cluster.sh

This will take around 25 minutes to complete.
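Once the script finishes, one way to confirm the cluster is ready is to query its status through the AWS CLI. The sketch below assumes the cluster was created under `CLUSTER_NAME` in `AWS_REGION`; `cluster_is_active` is a hypothetical helper name.

```shell
# Sketch only: cluster_is_active is an illustrative helper name.
# cluster_is_active NAME REGION
# Succeeds when EKS reports the cluster status as ACTIVE.
cluster_is_active() {
    status="$(aws eks describe-cluster \
        --name "$1" \
        --region "$2" \
        --query 'cluster.status' \
        --output text)"
    [ "$status" = "ACTIVE" ]
}
```

For example, `cluster_is_active "$CLUSTER_NAME" "$AWS_REGION" && kubectl get nodes` lists the nodes only after the control plane reports as active.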

Install Trieve Vector Inference

Specify your embedding models

Modify embedding_models.yaml to list the models that you want to use.

Install the helm chart

helm upgrade -i vector-inference \
    oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/trieve/trieve-embeddings  \
    -f embedding_models.yaml

Get your model endpoints

kubectl get ingress
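To pull out just the hostnames for scripting, a JSONPath query can be used instead of reading the table output. This is a sketch under the assumption that each model is exposed through its own ingress with a load-balancer hostname; the helper names and the ingress name in the usage example are hypothetical.

```shell
# Sketch only: list_endpoints and endpoint_for are illustrative helper names.

# Prints one "ingress-name<TAB>hostname" line per ingress in the current namespace.
list_endpoints() {
    kubectl get ingress -o \
        jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.loadBalancer.ingress[0].hostname}{"\n"}{end}'
}

# endpoint_for INGRESS_NAME
# Prints the hostname of a single ingress, or nothing if it is not found.
endpoint_for() {
    list_endpoints | awk -F'\t' -v n="$1" '$1 == n { print $2 }'
}
```

For example, `ENDPOINT="$(endpoint_for my-model-ingress)"` stores one hostname for later requests, where `my-model-ingress` stands in for a name shown by `kubectl get ingress`.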

Using Trieve Vector Inference

curl -X POST -H "Content-Type: application/json" -d '{"inputs": "cancer", "model": "en"}' "http://<your-model-endpoint>"

Optional: Delete the cluster

export CLUSTER_NAME=trieve-gpu
export AWS_REGION=us-east-2

helm uninstall vector-inference
helm uninstall nvdp -n kube-system
helm uninstall aws-load-balancer-controller -n kube-system
eksctl delete cluster --region="${AWS_REGION}" --name="${CLUSTER_NAME}"