How to select first 10 words of a sentence?

Php, String, Substring, Trim

Php Problem Overview


How do I, from an output, only select the first 10 words?

Php Solutions


Solution 1 - Php

implode(' ', array_slice(explode(' ', $sentence), 0, 10));

To add support for other word breaks like commas and dashes, preg_match gives a quick way and doesn't require splitting the string:

function get_words($sentence, $count = 10) {
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
  return $matches[0];
}

As Pebbl mentions, PHP doesn't handle UTF-8 or Unicode all that well, so if that is a concern you can replace \w with [^\s,\.;\?\!] and \W with [\s,\.;\?\!].
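If the PCRE build in use supports Unicode properties, another option is the u modifier with explicit \p{L} and \p{N} classes. The following is a minimal sketch under that assumption; the name get_words_unicode is hypothetical and not part of the answer above, and it expects valid UTF-8 input. Unlike get_words, it anchors the match at the start and lets each repetition consume a leading gap, so a sentence that begins with punctuation does not produce an empty result.

```php
<?php
// Hypothetical helper: a Unicode-aware take on the preg_match approach.
// Assumes valid UTF-8 input and PCRE with Unicode property support.
function get_words_unicode(string $sentence, int $count = 10): string
{
    $count = max(0, $count);
    // Each repetition is an optional gap (anything that is not a letter or
    // digit) followed by one word (a run of letters or digits). Punctuation
    // after the last selected word is therefore not included.
    $pattern = '/^(?:[^\p{L}\p{N}]*[\p{L}\p{N}]+){0,' . $count . '}/u';
    if (preg_match($pattern, $sentence, $matches) !== 1) {
        return ''; // preg_match returns false on invalid UTF-8
    }
    return $matches[0];
}

echo get_words_unicode('Это не те дроиды, которые вы ищете', 5), "\n";
echo get_words_unicode('"Quoted" start, then more words', 2), "\n";
```

With this definition of a word, apostrophes and hyphens count as gaps, so "don't" is treated as two words. Add those characters to both classes if that matters for your text.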

Solution 2 - Php

Simply splitting on spaces will function incorrectly if there is an unexpected character in place of a space in the sentence structure, or if the sentence contains multiple conjoined spaces.

The following version will work no matter what kind of "space" you use between words and can be easily extended to handle other characters... it currently supports any white space character plus , . ; ? !

function get_snippet( $str, $wordCount = 10 ) {
  return implode( 
    '', 
    array_slice( 
      preg_split(
        '/([\s,\.;\?\!]+)/', 
        $str, 
        $wordCount*2+1, 
        PREG_SPLIT_DELIM_CAPTURE
      ),
      0,
      $wordCount*2-1
    )
  );
}

Regular expressions are perfect for this issue, because you can easily make the code as flexible or strict as you like. You do have to be careful, however. I specifically approached the above by targeting the gaps between words, rather than the words themselves, because it is rather difficult to state unequivocally what defines a word.

Take the \w word-character class, or its inverse \W. I rarely rely on these, mainly because, depending on the software you are using (like certain versions of PHP), they don't always include UTF-8 or Unicode characters.

In regular expressions it is better to be specific at all times, so that your expressions can handle things like the following, no matter where they are rendered:

echo get_snippet('Это не те дроиды, которые вы ищете', 5);

// outputs: Это не те дроиды, которые

Avoiding splitting could be worthwhile, however, in terms of performance. So you could use Kelly's updated approach but switch \w+ for [^\s,\.;\?\!]+ and \W+ for [\s,\.;\?\!]+. Personally, though, I like the simplicity of the splitting expression used above; it is easier to read and therefore to modify. The stack of PHP functions, however, is a bit ugly :)
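The non-splitting variant described above might look roughly like this. It is only a sketch: the function name get_snippet_match is made up for illustration, and the pattern reuses the delimiter class from get_snippet instead of \w and \W.

```php
<?php
// Hypothetical non-splitting counterpart to get_snippet().
// A "word" is whatever lies between runs of whitespace or , . ; ? !
function get_snippet_match(string $str, int $wordCount = 10): string
{
    $wordCount = max(0, $wordCount);
    $gap  = '[\s,.;?!]';   // same delimiter set as the split version
    $word = '[^\s,.;?!]';  // anything that is not a delimiter
    // Up to $wordCount repetitions of "optional gap, then one word".
    $pattern = '/^(?:' . $gap . '*' . $word . '+){0,' . $wordCount . '}/';
    preg_match($pattern, $str, $matches);
    return $matches[0] ?? '';
}

echo get_snippet_match('Это не те дроиды, которые вы ищете', 5), "\n";
```

For the Russian example this gives the same snippet as get_snippet. The two can differ at the edges: this sketch keeps delimiters at the start of the string and never includes punctuation after the last selected word.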

Solution 3 - Php

http://snipplr.com/view/8480/a-php-function-to-return-the-first-n-words-from-a-string/

function shorten_string($string, $wordsreturned)
{
	$retval = $string;	//	Just in case of a problem
	$array = explode(" ", $string);
	/*  Already short enough, return the whole thing*/
	if (count($array)<=$wordsreturned)
	{
		$retval = $string;
	}
	/*  Need to chop off some words*/
	else
	{
		array_splice($array, $wordsreturned);
		$retval = implode(" ", $array)." ...";
	}
	return $retval;
}

Solution 4 - Php

I suggest using str_word_count:

<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
print_r(str_word_count($str, 1));
?>

The above example will output:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
)

Then use a loop to get the words you want.

Source: http://php.net/str_word_count
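The loop mentioned above could be sketched as follows. The function name first_words_loop and its $limit parameter are illustrative assumptions, not part of the PHP manual example. Keep in mind that str_word_count drops punctuation, so the result is a list of bare words rather than a slice of the original text.

```php
<?php
// Hypothetical helper: collect the first $limit words found by str_word_count().
function first_words_loop(string $str, int $limit = 10): array
{
    $selected = [];
    // Format 1 returns every word found in the string as an array.
    foreach (str_word_count($str, 1) as $word) {
        if (count($selected) >= $limit) {
            break; // enough words collected
        }
        $selected[] = $word;
    }
    return $selected;
}

$str = "Lorem ipsum       dolor sit    amet, consectetur        adipiscing elit";
print_r(first_words_loop($str, 3)); // Lorem, ipsum, dolor
```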

Solution 5 - Php

To select the first 10 words of the given text, you can implement the following function:

function first_words($text, $count=10)
{
    $words = explode(' ', $text);

    $result = '';
    for ($i = 0; $i < $count && isset($words[$i]); $i++) {
        // Re-insert the space that explode() removed between words
        $result .= ($i > 0 ? ' ' : '') . $words[$i];
    }

    return $result;
}

Solution 6 - Php

This can easily be done using str_word_count()

$first10words = implode(' ', array_slice(str_word_count($sentence,1), 0, 10));
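One caveat worth checking before relying on this: by default str_word_count treats only letters (plus ' and -) as word characters, so numbers vanish from the result. It is also locale-dependent and may not cope with multibyte UTF-8 text. Below is a sketch of a variant that keeps digits by passing them through the function's optional third argument; the wrapper name first_words_swc and its $extraChars parameter are assumptions made for illustration.

```php
<?php
// Hypothetical wrapper around str_word_count() that also keeps digits.
function first_words_swc(string $sentence, int $count = 10, string $extraChars = '0123456789'): string
{
    // The third argument lists additional characters to treat as part of a word.
    $words = str_word_count($sentence, 1, $extraChars);
    return implode(' ', array_slice($words, 0, max(0, $count)));
}

$sentence = 'I have 10 apples, 2 pears';
// Default behaviour: the numbers are skipped entirely.
echo implode(' ', array_slice(str_word_count($sentence, 1), 0, 4)), "\n"; // I have apples pears
// With digits allowed as word characters.
echo first_words_swc($sentence, 4), "\n"; // I have 10 apples
```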

Solution 7 - Php

This might help you: a function that returns the first N words.

function getNWordsFromString($text, $numberOfWords = 6)
{
    if ($text != null)
    {
        $textArray = explode(" ", $text);
        // Truncate and append an ellipsis only when there are more words than requested
        if (count($textArray) > $numberOfWords)
        {
            return implode(" ", array_slice($textArray, 0, $numberOfWords)) . "...";
        }
        return $text;
    }
    return "";
}

Solution 8 - Php

Try this

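$str = 'Lorem ipsum dolor sit amet,consectetur adipiscing elit. Mauris ornare luctus diam sit amet mollis.';
// Add a space after each comma so "amet,consectetur" splits into two words
$arr = explode(" ", str_replace(",", ", ", $str));
// Stop early if the text has fewer than 10 words
for ($index = 0; $index < min(10, count($arr)); $index++) {
    echo $arr[$index] . " ";
}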

I know this answer comes late, but newcomers can choose whichever answer suits them.

Solution 9 - Php

This is exactly what we were looking for. Just cut and paste it into your program and run it.

function shorten_string($string, $wordsreturned)
/*  Returns the first $wordsreturned out of $string.  If string
    contains fewer words than $wordsreturned, the entire string
    is returned.
*/
{
    $retval = $string;      //  Just in case of a problem

    $array = explode(" ", $string);
    if (count($array) <= $wordsreturned)
    /*  Already short enough, return the whole thing
    */
    {
        $retval = $string;
    }
    else
    /*  Need to chop off some words
    */
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array) . " ...";
    }
    return $retval;
}

Then call the function from your code like this:

$data_itr = shorten_string($Itinerary,25);

Solution 10 - Php

I do it this way:

function trim_by_words($string, $word_count = 10) {
    $string = explode(' ', $string);
    if (empty($string) == false) {
        $string = array_chunk($string, $word_count);
        $string = $string[0];
    }
    $string = implode(' ', $string);
    return $string;
}

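It's UTF-8 compatible, because it only splits on the ASCII space character, which never occurs inside a multi-byte UTF-8 sequence.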

Solution 11 - Php

This might help you: a function that returns the first N words (10 in the call below).

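function num_of_word($text, $numb) {
    $wordsArray = explode(" ", $text);
    // Split the words into chunks of $numb; the first chunk is the snippet
    $parts = array_chunk($wordsArray, $numb);

    $final = implode(" ", $parts[0]);

    // A second chunk means the text was truncated, so append an ellipsis
    if (isset($parts[1])) {
        $final = $final . " ...";
    }
    return $final;
}
echo num_of_word($text, 10);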

Solution 12 - Php

function get_first_num_of_words($string, $num_of_words)
{
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string); // an array

    // if number of words you want to get is greater than number of words in the string
    if ($num_of_words > count($words)) {
        // then use number of words in the string
        $num_of_words = count($words);
    }

    $new_string = "";
    for ($i = 0; $i < $num_of_words; $i++) {
        $new_string .= $words[$i] . " ";
    }

    return trim($new_string);
}

Use it like this:

echo get_first_num_of_words("Lorem ipsum dolor sit amet consectetur adipisicing elit. Aliquid, illo?", 5);

Output: Lorem ipsum dolor sit amet

This function also works well with Unicode text, such as Arabic:

echo get_first_num_of_words("نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.", 100);

Output: نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | AAA | View Question on Stackoverflow
Solution 1 - Php | Kelly | View Answer on Stackoverflow
Solution 2 - Php | Pebbl | View Answer on Stackoverflow
Solution 3 - Php | Spyros | View Answer on Stackoverflow
Solution 4 - Php | jawira | View Answer on Stackoverflow
Solution 5 - Php | Milad Rahimi | View Answer on Stackoverflow
Solution 6 - Php | Rowlingso | View Answer on Stackoverflow
Solution 7 - Php | Ankur Rastogi | View Answer on Stackoverflow
Solution 8 - Php | saleem ahmed | View Answer on Stackoverflow
Solution 9 - Php | Rizwan Gill | View Answer on Stackoverflow
Solution 10 - Php | Vaci | View Answer on Stackoverflow
Solution 11 - Php | rowmoin | View Answer on Stackoverflow
Solution 12 - Php | Amr | View Answer on Stackoverflow