PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS’ design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.
Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.
XR-Linear (pecos.xmc.xlinear): recursive linear models that learn to route an input from the root of a hierarchical label tree to a few leaf-node clusters and return the top-k relevant labels within those clusters as predictions. See the PECOS paper (Yu et al., 2020) for details.
XR-Transformer (pecos.xmc.xtransformer): a Transformer-based XMC framework that fine-tunes pre-trained transformers recursively on multi-resolution objectives. It can be used to generate the top-k relevant labels for a given instance or simply as a fine-tuning engine for task-aware embeddings. See the XR-Transformer paper (Zhang et al., 2021) for technical details.
HNSW (pecos.ann.hnsw): a PECOS approximate nearest neighbor (ANN) search module that implements the Hierarchical Navigable Small World (HNSW) graph algorithm (Malkov et al., TPAMI 2018).
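To make the tree traversal behind XR-Linear more concrete, here is a minimal sketch of beam search over a toy label tree. It is an illustration only, not PECOS code: the tree, the node names, the per-node scores, and the choice to multiply scores along a path are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Toy balanced tree (hypothetical): internal nodes map to their children,
# and the nodes without an entry are the leaf clusters.
CHILDREN: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Hypothetical relevance scores for one input. In a real system these would
# come from per-node models evaluated on the input's features.
SCORES: Dict[str, float] = {
    "a": 0.9, "b": 0.2,
    "a1": 0.3, "a2": 0.8,
    "b1": 0.9, "b2": 0.1,
}


def beam_search(beam_size: int) -> List[Tuple[str, float]]:
    """Descend level by level, keeping only the `beam_size` best paths.

    Assumes all leaves sit at the same depth, as in the toy tree above.
    """
    beam = [("root", 1.0)]
    while any(node in CHILDREN for node, _ in beam):
        candidates = []
        for node, path_score in beam:
            for child in CHILDREN.get(node, []):
                # Assumption: a path's score is the product of node scores.
                candidates.append((child, path_score * SCORES[child]))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_size]
    return beam


top_leaves = beam_search(beam_size=2)
```

With `beam_size=1` the search is greedy; a wider beam explores more branches at a higher cost. Note that leaf `b1` has a high score of its own but is ranked low because its parent `b` scored poorly.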
PECOS can be installed using pip as follows:
python3 -m pip install libpecos
To build PECOS from source, first install the prerequisite build tools.
On Debian/Ubuntu:
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
On Amazon Linux 2:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'
At least one BLAS library is required to compile PECOS, e.g. OpenBLAS.
On Debian/Ubuntu:
sudo apt-get install -y libopenblas-dev
On Amazon Linux 2:
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y
git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./
For a glimpse of how PECOS works, here is a quick tour of the PECOS API for the XMR problem.
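The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices:
the instance-to-feature matrix X, of shape N by D, in SciPy CSR format
the instance-to-label matrix Y, of shape N by L, in SciPy CSR format
Here N is the number of instances, D the number of features, and L the number of labels.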
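For readers unfamiliar with the CSR (compressed sparse row) layout, the following sketch builds the three CSR arrays for a toy feature matrix and a toy label matrix in plain Python. The matrices and the helper `dense_to_csr` are made up for illustration and are not part of PECOS.

```python
from typing import List, Tuple


def dense_to_csr(rows: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert dense rows to the three CSR arrays: data, indices, indptr."""
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # its column index
        # indptr[i + 1] marks where row i ends inside data/indices.
        indptr.append(len(data))
    return data, indices, indptr


# Toy problem: 3 instances, 4 features, 3 labels.
X_dense = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 5.0],
]
Y_dense = [
    [1, 0, 0],   # instance 0 has label 0
    [0, 1, 1],   # instance 1 has labels 1 and 2
    [1, 0, 1],   # instance 2 has labels 0 and 2
]

X_data, X_indices, X_indptr = dense_to_csr(X_dense)
Y_data, Y_indices, Y_indptr = dense_to_csr(Y_dense)
```

With SciPy installed, the same triple can be turned into a matrix with `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))`.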
Some toy data matrices are available in the
PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):
>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory
# Build hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")
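The first step above derives one feature vector per label from the training data, so that similar labels can be clustered into a tree. One scheme described in the PECOS paper is PIFA (Positive Instance Feature Aggregation): each label vector is the normalized sum of the feature vectors of the instances tagged with that label. The sketch below assumes that scheme and uses dense Python lists and made-up data; it is not the library's implementation, and `pifa_embeddings` is a hypothetical helper.

```python
import math
from typing import List


def pifa_embeddings(X: List[List[float]], Y: List[List[int]]) -> List[List[float]]:
    """Return one L2-normalized vector per label, i.e. normalize(Y^T X)."""
    num_labels = len(Y[0])
    num_feats = len(X[0])
    label_feat = [[0.0] * num_feats for _ in range(num_labels)]
    # Sum the feature vectors of every instance that carries the label.
    for x_row, y_row in zip(X, Y):
        for label, is_positive in enumerate(y_row):
            if is_positive:
                for j, value in enumerate(x_row):
                    label_feat[label][j] += value
    # Normalize each label vector to unit length (skip all-zero vectors).
    for vec in label_feat:
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            for j in range(num_feats):
                vec[j] /= norm
    return label_feat


# Toy data: 3 instances, 2 features, 2 labels.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y_toy = [[1, 0], [0, 1], [1, 0]]
label_vectors = pifa_embeddings(X_toy, Y_toy)
```

Label 0 is attached to instances 0 and 2, so its vector points along `[2, 1]`; label 1 is attached only to instance 1, so its vector is `[0, 1]`.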
After training the model, we run prediction and evaluation:
>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))
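As a reminder of what precision and recall at k measure, here is a small stand-alone sketch for a single test instance. The function name and the toy ranking are hypothetical, and this is not how `smat_util.Metrics` is implemented; it only spells out the standard definitions.

```python
from typing import Dict, List, Set


def precision_recall_at_k(ranked: List[int], relevant: Set[int], k: int) -> Dict[str, float]:
    """Precision@k and recall@k for one instance.

    ranked:   predicted label ids, best first
    relevant: ground-truth label ids
    """
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return {
        "precision": hits / k,
        "recall": hits / len(relevant) if relevant else 0.0,
    }


# Hypothetical ranking for one test instance, best label first.
ranked_labels = [7, 2, 9, 4, 1]
true_labels = {2, 4, 8}
metrics_at_3 = precision_recall_at_k(ranked_labels, true_labels, k=3)
metrics_at_5 = precision_recall_at_k(ranked_labels, true_labels, k=5)
```

Averaging these per-instance values over all test instances gives the dataset-level figures at each k.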
PECOS also offers an optimized C++ implementation for fast real-time inference:
>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(X_tst.shape[0]):
...     y_tst_pred = model.predict(X_tst[i], threads=1)
If you find PECOS useful, please consider citing the following paper:
Some papers from our group using PECOS:
Copyright (2021) Amazon.com, Inc.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.