BERT sentence embeddings

By : Tolky
Date : October 01 2020, 03:00 AM
I'm trying to obtain sentence embeddings from BERT, but I'm not quite sure I'm doing it properly. Yes, I'm aware that tools such as bert-as-service already exist, but I want to do it myself and understand how it works. As far as I know, BERT had a comment line in its source code:
code :


How to use pre-trained BERT model for next sentence labeling?

By : user2312010
Date : March 29 2020, 07:55 AM
The answer is to use the weights that were trained for next sentence prediction, and to take the logits from there. So, to use BERT for next sentence prediction, input two sentences in the format used for training:
code :
def convert_single_example(ex_index, example, label_list, max_seq_length,
                           tokenizer):
    """Converts a single `InputExample` into a single `InputFeatures`."""
    label_map = {}
    for (i, label) in enumerate(label_list):
        label_map[label] = i

    tokens_a = tokenizer.tokenize(example.text_a)
    tokens_b = None
    if example.text_b:
        tokens_b = tokenizer.tokenize(example.text_b)

    if tokens_b:
        # Modifies `tokens_a` and `tokens_b` in place so that the total
        # length is less than the specified length.
        # Account for [CLS], [SEP], [SEP] with "- 3"
        _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
    else:
        # Account for [CLS] and [SEP] with "- 2"
        if len(tokens_a) > max_seq_length - 2:
            tokens_a = tokens_a[0:(max_seq_length - 2)]

    # The convention in BERT is:
    # (a) For sequence pairs:
    #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
    #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
    # (b) For single sequences:
    #  tokens:   [CLS] the dog is hairy . [SEP]
    #  type_ids: 0     0   0   0  0     0 0
    # Where "type_ids" are used to indicate whether this is the first
    # sequence or the second sequence. The embedding vectors for `type=0` and
    # `type=1` were learned during pre-training and are added to the wordpiece
    # embedding vector (and position vector). This is not *strictly* necessary
    # since the [SEP] token unambiguously separates the sequences, but it makes
    # it easier for the model to learn the concept of sequences.
    # For classification tasks, the first vector (corresponding to [CLS]) is
    # used as the "sentence vector". Note that this only makes sense because
    # the entire model is fine-tuned.
    tokens = []
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)

    if tokens_b:
        for token in tokens_b:
            tokens.append(token)
            segment_ids.append(1)
        tokens.append("[SEP]")
        segment_ids.append(1)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    label_id = label_map[example.label]
    if ex_index < 5:
        tf.logging.info("*** Example ***")
        tf.logging.info("guid: %s" % (example.guid))
        tf.logging.info("tokens: %s" % " ".join(
            [tokenization.printable_text(x) for x in tokens]))
        tf.logging.info("input_ids: %s" % " ".join([str(x) for x in input_ids]))
        tf.logging.info("input_mask: %s" % " ".join([str(x) for x in input_mask]))
        tf.logging.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids]))
        tf.logging.info("label: %s (id = %d)" % (example.label, label_id))

    feature = InputFeatures(
        input_ids=input_ids,
        input_mask=input_mask,
        segment_ids=segment_ids,
        label_id=label_id)
    return feature
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
    """Creates a classification model."""
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)

    # In the demo, we are doing a simple classification task on the entire
    # segment.
    # If you want to use the token-level output, use model.get_sequence_output()
    # instead.
    output_layer = model.get_pooled_output()

    hidden_size = output_layer.shape[-1].value

    # Reuse the scope of the pre-trained next sentence head so that its
    # weights are restored from the checkpoint instead of being re-initialized.
    with tf.variable_scope("cls/seq_relationship"):
        output_weights = tf.get_variable(
            "output_weights", [num_labels, hidden_size])

        output_bias = tf.get_variable(
            "output_bias", [num_labels])

    with tf.variable_scope("loss"):
        if is_training:
            # I.e., 0.1 dropout
            output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        probabilities = tf.nn.softmax(logits, axis=-1)
        log_probs = tf.nn.log_softmax(logits, axis=-1)

        one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

        per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
        loss = tf.reduce_mean(per_example_loss)

        return (loss, per_example_loss, logits, probabilities)
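
To make the pair format described in the comments above concrete, here is a minimal sketch in plain Python with no BERT dependency. The helper name `build_pair_features` is hypothetical and not part of the BERT repository; it only mirrors the layout of tokens, segment ids and input mask, and it assumes the pair has already been truncated to fit.

```python
def build_pair_features(tokens_a, tokens_b, max_seq_length):
    """Lay out two tokenized sentences as [CLS] A [SEP] B [SEP] (hypothetical helper)."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if len(tokens) > max_seq_length:
        raise ValueError("pair is longer than max_seq_length; truncate it first")

    # Segment 0 covers [CLS], sentence A and the first [SEP]; segment 1 covers
    # sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # 1 marks a real token, 0 marks padding.
    input_mask = [1] * len(tokens)

    # Zero-pad the ids-level lists up to max_seq_length. The token list itself
    # is left unpadded here; the vocabulary ids would be padded with 0 as well.
    padding = max_seq_length - len(tokens)
    segment_ids += [0] * padding
    input_mask += [0] * padding
    return tokens, segment_ids, input_mask


tokens, segment_ids, input_mask = build_pair_features(
    "is this jack ##son ##ville ?".split(),
    "no it is not .".split(),
    max_seq_length=16,
)
print(tokens)
print(segment_ids)
print(input_mask)
```

The example sentences are the ones from the comment in `convert_single_example`, so the printed segment ids can be compared against the `type_ids` row there.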

Finetune Text embeddings using BERT?

By : Rianz
Date : March 29 2020, 07:55 AM
If you are using the original BERT repository published by Google, all layers are trainable, meaning there is no freezing at all. You can check that by printing tf.trainable_variables().

BERT sentence embedding by summing last 4 layers

By : user3274747
Date : March 29 2020, 07:55 AM
You create a list using a list comprehension that iterates over token_embeddings. It is a list that contains one tensor per token, not one tensor per layer as you probably thought (judging from your for layer in token_embeddings). You thus get a list with a length equal to the number of tokens. For each token, you have a vector that is the sum of the BERT embeddings from the last 4 layers.
It would be more efficient to avoid the explicit for loops and list comprehensions:
code :
summed_last_4_layers = torch.stack(encoded_layers[-4:]).sum(0)
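
To make the shapes concrete, here is a rough sketch in plain Python of what the one-liner computes. It is only an illustration: nested lists stand in for tensors, the batch dimension is left out, and the name `sum_last_layers` and the toy data are made up for this example.

```python
def sum_last_layers(encoded_layers, num_layers=4):
    """Element-wise sum of the last `num_layers` layers.

    encoded_layers: one entry per layer; each layer is a list of token
    vectors (lists of floats). Returns one summed vector per token.
    """
    selected = encoded_layers[-num_layers:]
    num_tokens = len(selected[0])
    hidden_size = len(selected[0][0])
    return [
        [sum(layer[t][h] for layer in selected) for h in range(hidden_size)]
        for t in range(num_tokens)
    ]


# Toy data: 5 layers, 3 tokens, hidden size 2; layer k holds the value k everywhere.
toy_layers = [[[float(k)] * 2 for _ in range(3)] for k in range(5)]
summed = sum_last_layers(toy_layers)
print(len(summed), len(summed[0]))
print(summed[0])
```

The result has one vector per token, which is the same thing the stacked tensor sum gives along the layer dimension.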

Using BERT for next sentence prediction

By : Maria Panteghini
Date : March 29 2020, 07:55 AM
Hugging Face did it for you: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L854
code :
class BertForNextSentencePrediction(BertPreTrainedModel):
    """BERT model with next sentence prediction head.
    This module comprises the BERT model followed by the next sentence classification head.

    Params:
        config: a BertConfig class instance with the configuration to build a new model.

    Inputs:
        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
            with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts
            `extract_features.py`, `run_classifier.py` and `run_squad.py`)
        `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
            types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
            a `sentence B` token (see BERT paper for more details).
        `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
            selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
            input sequence length in the current batch. It's the mask that we typically use for attention when
            a batch has varying length sentences.
        `next_sentence_label`: next sentence classification loss: torch.LongTensor of shape [batch_size]
            with indices selected in [0, 1].
            0 => next sentence is the continuation, 1 => next sentence is a random sentence.

    Outputs:
        if `next_sentence_label` is not `None`:
            Outputs the next sentence classification loss.
        if `next_sentence_label` is `None`:
            Outputs the next sentence classification logits of shape [batch_size, 2].
    Example usage:
    # Already been converted into WordPiece token ids
    input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
    input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
    token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
    config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
        num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
    model = BertForNextSentencePrediction(config)
    seq_relationship_logits = model(input_ids, token_type_ids, input_mask)
    """
    def __init__(self, config):
        super(BertForNextSentencePrediction, self).__init__(config)
        self.bert = BertModel(config)
        self.cls = BertOnlyNSPHead(config)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, next_sentence_label=None):
        _, pooled_output = self.bert(input_ids, token_type_ids, attention_mask,
                                     output_all_encoded_layers=False)
        seq_relationship_score = self.cls(pooled_output)

        if next_sentence_label is not None:
            loss_fct = CrossEntropyLoss(ignore_index=-1)
            next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
            return next_sentence_loss
        else:
            return seq_relationship_score
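
When no label is passed, the model returns raw logits of shape [batch_size, 2]. As a minimal sketch of how one might read them, the snippet below applies a softmax to the two logits of a single pair in plain Python. The function name and the logit values are made up for illustration; the index convention (0 means continuation, 1 means random sentence) is taken from the docstring above.

```python
import math


def next_sentence_probabilities(logits):
    """Softmax over the two next-sentence logits of one sentence pair."""
    largest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, -1.0]  # made-up values, not real model output
probs = next_sentence_probabilities(example_logits)
# Index 0: sentence B continues sentence A; index 1: sentence B is random.
is_next = probs[0] > probs[1]
print(probs, is_next)
```

With real model output, the same softmax would be applied along the last dimension of the logits tensor.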

How to train a neural network model with bert embeddings instead of static embeddings like glove/fasttext?

By : Numa Neto
Date : March 29 2020, 07:55 AM
If you are using PyTorch, you can use https://github.com/huggingface/pytorch-pretrained-BERT, which is the most popular BERT implementation for PyTorch (it is also a pip package). Here I'm just going to outline how to use it properly.
For this particular problem, where you obviously cannot use the Embedding layer, there are 2 approaches:
code :
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

batch_size = 32
X_train, y_train = samples_from_file('train.csv')  # Put your own data loading function here
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Add the [CLS] and [SEP] tokens (this can probably be done in a cleaner way)
X_train = [tokenizer.tokenize('[CLS] ' + sent + ' [SEP]') for sent in X_train]
bert_model = BertModel.from_pretrained('bert-base-uncased')
bert_model = bert_model.cuda()
bert_model.eval()  # disable dropout while extracting features

X_train_tokens = [tokenizer.convert_tokens_to_ids(sent) for sent in X_train]
# Pooled outputs are floats, so the result tensor must stay float (no .long())
results = torch.zeros((len(X_train_tokens), bert_model.config.hidden_size))
with torch.no_grad():
    for stidx in range(0, len(X_train_tokens), batch_size):
        batch = X_train_tokens[stidx:stidx + batch_size]
        # Pad every sequence in the batch to the same length with 0 (the [PAD] id)
        max_len = max(len(ids) for ids in batch)
        X = torch.LongTensor([ids + [0] * (max_len - len(ids)) for ids in batch]).cuda()
        mask = (X != 0).long()  # attend only to real tokens
        _, pooled_output = bert_model(X, attention_mask=mask)
        results[stidx:stidx + batch_size, :] = pooled_output.cpu()
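# Alternatively, fine-tune BERT end to end with a classification head on top:
from pytorch_pretrained_bert import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_labels)  # num_labels is the number of labels you need to classify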