Adaptor API#

class bocoel.models.adaptors.BigBenchAdaptor(*args, **kwargs)[source]#
class bocoel.models.adaptors.BigBenchChoiceType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class bocoel.models.adaptors.BigBenchMatchType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
class bocoel.models.adaptors.BigBenchMultipleChoice(lm: ClassifierModel, inputs: str = 'inputs', multiple_choice_targets: str = 'multiple_choice_targets', multiple_choice_scores: str = 'multiple_choice_scores', choice_type: str | BigBenchChoiceType = BigBenchChoiceType.SUM_OF_SCORES)[source]#
__init__(lm: ClassifierModel, inputs: str = 'inputs', multiple_choice_targets: str = 'multiple_choice_targets', multiple_choice_scores: str = 'multiple_choice_scores', choice_type: str | BigBenchChoiceType = BigBenchChoiceType.SUM_OF_SCORES) None[source]#
evaluate(data: Mapping[str, Any]) Sequence[float] | ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.

Parameters:

data – A mapping from column names to the data in that column.

Returns:

The scores for each entry. Scores must be floating point numbers.

static numeric_choices(question: str, choices: Sequence[str]) str[source]#

Convert a multiple choice question into a numeric choice question, in which each valid choice is referred to by its number. Returns the generated prompt.
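
A minimal usage sketch (not taken from the library's documentation): it assumes that classifier is an already-constructed object implementing the ClassifierModel protocol, and that the batch follows the BIG-bench multiple-choice column layout described above.

    from bocoel.models.adaptors import BigBenchChoiceType, BigBenchMultipleChoice

    # `classifier` is assumed to be an existing ClassifierModel implementation;
    # its construction is outside the scope of this sketch.
    adaptor = BigBenchMultipleChoice(
        lm=classifier,
        inputs="inputs",
        multiple_choice_targets="multiple_choice_targets",
        multiple_choice_scores="multiple_choice_scores",
        choice_type=BigBenchChoiceType.SUM_OF_SCORES,
    )

    # One key per column, one item per entry.
    scores = adaptor.evaluate(
        {
            "inputs": ["What is 2 + 2?"],
            "multiple_choice_targets": [["3", "4", "5"]],
            "multiple_choice_scores": [[0.0, 1.0, 0.0]],
        }
    )

    # The static helper renders a numbered prompt from a question and its choices.
    prompt = BigBenchMultipleChoice.numeric_choices(
        question="What is 2 + 2?", choices=["3", "4", "5"]
    )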

class bocoel.models.adaptors.BigBenchQuestionAnswer(lm: GenerativeModel, inputs: str = 'inputs', targets: str = 'targets', matching_type: str | BigBenchMatchType = BigBenchMatchType.EXACT)[source]#
__init__(lm: GenerativeModel, inputs: str = 'inputs', targets: str = 'targets', matching_type: str | BigBenchMatchType = BigBenchMatchType.EXACT) None[source]#
evaluate(data: Mapping[str, Sequence[Any]]) Sequence[float] | ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.

Parameters:

data – A mapping from column names to the data in that column.

Returns:

The scores for each entry. Scores must be floating point numbers.
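
A sketch along the same lines, assuming generator is an existing GenerativeModel implementation and that the targets column holds the acceptable answers for each entry (the exact target format depends on the dataset).

    from bocoel.models.adaptors import BigBenchMatchType, BigBenchQuestionAnswer

    # `generator` is assumed to be an existing GenerativeModel implementation.
    adaptor = BigBenchQuestionAnswer(
        lm=generator,
        inputs="inputs",
        targets="targets",
        matching_type=BigBenchMatchType.EXACT,
    )

    scores = adaptor.evaluate(
        {
            "inputs": ["Name the capital of France."],
            "targets": [["Paris"]],
        }
    )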

class bocoel.models.adaptors.AdaptorMapping(adaptors: Mapping[str, Adaptor])[source]#
__init__(adaptors: Mapping[str, Adaptor]) None[source]#
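
A construction sketch based only on the signature above; qa_adaptor and mc_adaptor stand in for any previously built Adaptor instances, such as the BIG-bench adaptors shown earlier.

    from bocoel.models.adaptors import AdaptorMapping

    # `qa_adaptor` and `mc_adaptor` are assumed to be existing Adaptor instances.
    bundle = AdaptorMapping({"qa": qa_adaptor, "multiple_choice": mc_adaptor})
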
class bocoel.models.adaptors.GlueAdaptor(lm: ClassifierModel, texts: str = 'text', label: str = 'label', label_text: str = 'label_text', choices: Sequence[str] = ('negative', 'positive'))[source]#

The adaptor for the GLUE dataset provided by SetFit.

GLUE is a collection of datasets for natural language understanding tasks. The datasets are designed to be challenging and diverse, are collected from a variety of sources, and mostly consist of sentence-level classification tasks.

This adaptor is compatible with all classifier models and is designed to work with the GLUE dataset (in the format of the SetFit datasets on Hugging Face Datasets).

SetFit datasets have the following columns:

  • text: The text to classify.

  • label: The label of the text.

  • label_text: The text of the label.

__init__(lm: ClassifierModel, texts: str = 'text', label: str = 'label', label_text: str = 'label_text', choices: Sequence[str] = ('negative', 'positive')) None[source]#

Initialize the adaptor.

Parameters:
  • lm – The language model to use for classification.

  • texts – The column name for the text to classify.

  • label – The column name for the label of the text.

  • label_text – The column name for the text of the label.

  • choices – The valid choices for the label.

evaluate(data: Mapping[str, Sequence[Any]]) Sequence[float] | ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.

Parameters:

data – A mapping from column names to the data in that column.

Returns:

The scores for each entry. Scores must be floating point numbers.

static task_choices(name: Literal['sst2', 'mrpc', 'mnli', 'qqp', 'rte', 'qnli'], split: Literal['train', 'validation', 'test']) Sequence[str][source]#

Get the valid choices for a particular task and split.

Parameters:
  • name – The name of the task.

  • split – The split of the task.

Returns:

The valid choices for the task and split.
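
A usage sketch, again assuming an existing classifier implementing ClassifierModel; the label texts are looked up with task_choices so that choices matches the dataset.

    from bocoel.models.adaptors import GlueAdaptor

    # Valid label texts for SST-2, obtained via the static method above.
    choices = GlueAdaptor.task_choices("sst2", split="train")

    # `classifier` is assumed to be an existing ClassifierModel implementation.
    adaptor = GlueAdaptor(
        lm=classifier,
        texts="text",
        label="label",
        label_text="label_text",
        choices=choices,
    )

    scores = adaptor.evaluate(
        {
            "text": ["A thoroughly enjoyable film."],
            "label": [1],
            "label_text": ["positive"],
        }
    )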

class bocoel.models.adaptors.Sst2QuestionAnswer(lm: ClassifierModel, sentence: str = 'sentence', label: str = 'label', choices: Sequence[str] = ('negative', 'positive'))[source]#

The adaptor for the SST-2 dataset. This adaptor assumes that the dataset has the following columns:

  • idx: The index of the entry.

  • sentence: The sentence to classify.

  • label: The label of the sentence.

Each entry in the dataset must be a single sentence.

__init__(lm: ClassifierModel, sentence: str = 'sentence', label: str = 'label', choices: Sequence[str] = ('negative', 'positive')) None[source]#
Parameters:
  • lm – The language model to use for classification.

  • sentence – The column name for the sentence to classify.

  • label – The column name for the label of the sentence.

  • choices – The valid choices for the label.

evaluate(data: Mapping[str, Sequence[Any]]) Sequence[float] | ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.

Parameters:

data – A mapping from column names to the data in that column.

Returns:

The scores for each entry. Scores must be floating point numbers.
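
A sketch with the default column names, assuming an existing classifier implementing ClassifierModel.

    from bocoel.models.adaptors import Sst2QuestionAnswer

    # `classifier` is assumed to be an existing ClassifierModel implementation.
    adaptor = Sst2QuestionAnswer(
        lm=classifier,
        sentence="sentence",
        label="label",
        choices=("negative", "positive"),
    )

    scores = adaptor.evaluate(
        {
            "sentence": ["a gripping and well acted thriller"],
            "label": [1],
        }
    )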

class bocoel.models.adaptors.Adaptor(*args, **kwargs)[source]#

Adaptors are the glue between scores and the corpus. They are designed to handle running a particular score on a particular corpus / dataset.

abstract evaluate(data: Mapping[str, Sequence[Any]]) Sequence[float] | ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.

Parameters:

data – A mapping from column names to the data in that column.

Returns:

The scores for each entry. Scores must be floating point numbers.
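
To illustrate the contract, here is a hypothetical subclass that scores entries by exact string match between two invented columns, prediction and target (neither column name comes from the library). Whether to subclass Adaptor directly or merely satisfy its interface depends on how the class is defined; this sketch assumes direct subclassing is supported.

    from collections.abc import Mapping, Sequence
    from typing import Any

    from bocoel.models.adaptors import Adaptor


    class ExactMatchAdaptor(Adaptor):
        """A hypothetical adaptor: 1.0 if the two columns match exactly, else 0.0."""

        def __init__(self, prediction: str = "prediction", target: str = "target") -> None:
            self._prediction = prediction
            self._target = target

        def evaluate(self, data: Mapping[str, Sequence[Any]]) -> Sequence[float]:
            predictions = data[self._prediction]
            targets = data[self._target]
            return [float(p == t) for p, t in zip(predictions, targets)]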

on_storage(storage: Storage, indices: ArrayLike) ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.

Parameters:
  • storage – The storage to evaluate.

  • indices – The indices to evaluate.

Returns:

The scores for each entry. The shape must be the same as the indices.

on_corpus(corpus: Corpus, indices: ArrayLike) ndarray[Any, dtype[_ScalarType_co]][source]#

Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.

Parameters:
  • corpus – The corpus to evaluate.

  • indices – The indices to evaluate.

Returns:

The scores for each entry. The shape must be the same as the indices.
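
For context, a call-site sketch assuming adaptor, storage, and corpus already exist; the index values are arbitrary.

    import numpy as np

    # `adaptor`, `storage` and `corpus` are assumed to exist already.
    indices = np.array([0, 1, 2])

    scores_from_storage = adaptor.on_storage(storage, indices)
    scores_from_corpus = adaptor.on_corpus(corpus, indices)

    # Both return arrays whose shape matches `indices`.
    assert scores_from_storage.shape == indices.shape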

__init__(*args, **kwargs)#
class bocoel.models.adaptors.AdaptorBundle(*args, **kwargs)[source]#
__init__(*args, **kwargs)#