Automatic plagiarism search: find copies of your page on the web via the Copyscape API


 

It is often necessary to check a post for duplicate content (plagiarism) before it is published online.
Here is an example of a Copyscape API request in Python, written as a Django management command.
 

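import requests
from django.core.management.base import BaseCommand
from django.db.models import Q

from page.models import StaticPage  # assumed import path; adjust to your app

COPYSCAPE_API_URL = "http://www.copyscape.com/api/"


class Command(BaseCommand):
    user = ""  # Copyscape username
    key = ""  # Copyscape API key
    timeout = 1  # seconds; not passed to requests.post below

    def handle(self, *args, **options):
        # Select pages without HTML or without a Copyscape check,
        # skipping pages that have no raw text to send.
        pages = StaticPage.objects.filter(
            Q(html_text__isnull=True) | Q(html_text__exact='') | Q(has_copyscape_check=False)
        ).exclude(Q(raw_text__isnull=True) | Q(raw_text__exact=''))
        print(pages.count())
        for page in pages:
            self.copyscape(page)
            #page.save()

    def copyscape(self, page):
        # "o": "csearch" is the search operation, "e" the text encoding,
        # "t" the text to check.
        payload = {"u": self.user, "k": self.key, "o": "csearch", "e": "UTF-8", "t": page.raw_text}
        response = requests.post(COPYSCAPE_API_URL, data=payload)
        print(response.text)
        import pdb; pdb.set_trace()  # debugging breakpoint: pauses after every page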

 

Example output:

 

<?xml version="1.0" encoding="utf-8"?>
<response>
    <querywords>581</querywords>
    <cost>0.07</cost>
    <count>1</count>
    <result>
       <index>1</index>
       <url>https://www.autoscout24.be/nl/auto/audi/</url>
       <title>Audi tweedehands &amp; goedkoop via AutoScout24.be kopen</title>
       <textsnippet>... de onderneming eerst naar Reichenbach (Saksen) en daarna naar Zwickau. ... 30 was de Auto Union AG de belangrijkste autoleverancier van het leger.... Het huidige productenpalet omvat vooral auto's voor de hoge  middenklasse. Vooral op het gebied van design is Audi wereldwijd nummer 1, want de auto's </textsnippet>
       <htmlsnippet>&lt;font color=&quot;#777777&quot;&gt;... de onderneming &lt;/font&gt;&lt;font color=&quot;#000000&quot;&gt;eerst naar Reichenbach (Saksen) en &lt;/font&gt;&lt;font color=&quot;#777777&quot;&gt;daarna naar Zwickau. ... 30 was de &lt;/font&gt;&lt;font color=&quot;#000000&quot;&gt;Auto Union AG de belangrijkste autoleverancier van het leger.&lt;/font&gt;&lt;font color=&quot;#777777&quot;&gt;... Het huidige productenpalet omvat vooral &lt;/font&gt;&lt;font color=&quot;#000000&quot;&gt;auto's voor de &lt;/font&gt;&lt;font color=&quot;#777777&quot;&gt;hoge  &lt;/font&gt;&lt;font color=&quot;#000000&quot;&gt;middenklasse. Vooral op het gebied van design is Audi &lt;/font&gt;&lt;font color=&quot;#777777&quot;&gt;wereldwijd nummer 1, want de auto's &lt;/font&gt;</htmlsnippet>
       <minwordsmatched>26</minwordsmatched>
       <viewurl>http://view2.copyscape.com/compare/ahjg2a9c9x/1</viewurl>
    </result>
    <allviewurl>http://view2.copyscape.com/search/ahjg2a9c9x</allviewurl>
</response>

--Return--
> /Users/sergejdergatsjev/Documents/auto-verkopen-belgie.com/auto-verkopen-belgie/djangopageadmin/page/management/commands/copyscape.py(41)copyscape()->None
-> import pdb;pdb.set_trace()
(Pdb) n
> /Users/sergejdergatsjev/Documents/auto-verkopen-belgie.com/auto-verkopen-belgie/djangopageadmin/page/management/commands/copyscape.py(33)handle()
-> for page in pages:
(Pdb) c

<?xml version="1.0" encoding="utf-8"?>
<response>
    <querywords>620</querywords>
    <cost>0.08</cost>
    <count>0</count>
    <allviewurl>http://view2.copyscape.com/search/qnuilr1y64</allviewurl>
</response>

 



So if count is 0, no copies were found and the page is valid. Otherwise, save the snippet, mark the page as a duplicate, and have it rewritten.
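
The count check can be done by parsing the XML instead of reading it by eye. The following is a minimal sketch, assuming the response always has the shape shown in the example output. The names `CopyscapeResult` and `parse_copyscape_response` are hypothetical helpers made up for this illustration and are not part of the Copyscape API or of the command above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CopyscapeResult:
    """Hypothetical container for the parts of the response we care about."""
    count: int
    matches: list = field(default_factory=list)  # one dict per <result>

    @property
    def is_unique(self):
        # count == 0 means Copyscape found no copies of the text.
        return self.count == 0


def parse_copyscape_response(xml_text):
    """Parse a response shaped like the example output (assumed format)."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    count_text = root.findtext("count")
    if count_text is None:
        # Do not treat a response without <count> as "no duplicates".
        raise ValueError("no <count> element in response")
    matches = [
        {
            "url": result.findtext("url", default=""),
            "title": result.findtext("title", default=""),
            "snippet": result.findtext("textsnippet", default=""),
        }
        for result in root.findall("result")
    ]
    return CopyscapeResult(count=int(count_text), matches=matches)


# Usage with the second response from the example output.
sample = """<?xml version="1.0" encoding="utf-8"?>
<response>
    <querywords>620</querywords>
    <cost>0.08</cost>
    <count>0</count>
    <allviewurl>http://view2.copyscape.com/search/qnuilr1y64</allviewurl>
</response>"""
result = parse_copyscape_response(sample)
print(result.is_unique)  # True
```

Inside `copyscape()` this could be called as `parse_copyscape_response(response.text)`. When `is_unique` is true, the page can get `has_copyscape_check = True`. Otherwise the entries in `matches` hold the snippets to store, in whatever field your model provides for that.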
