Linear regression with a pandas DataFrame

I have a pandas DataFrame that I use to draw a scatter plot, and I want to add a regression line to the plot. Right now I'm trying to do this with polyfit.

Here is my code:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from numpy import *

table1 = pd.DataFrame.from_csv('upregulated_genes.txt', sep='\t', header=0, index_col=0)
table2 = pd.DataFrame.from_csv('misson_genes.txt', sep='\t', header=0, index_col=0)
table1 = table1.join(table2, how='outer')

table1 = table1.dropna(how='any')
table1 = table1.replace('#DIV/0!', 0)

# scatterplot
plt.scatter(table1['log2 fold change misson'], table1['log2 fold change'])
plt.ylabel('log2 expression fold change')
plt.xlabel('log2 expression fold change Misson et al. 2005')
plt.title('Root Early Upregulated Genes')
plt.axis([0,12,-5,12])

# this is the part I'm unsure about
regres = polyfit(table1['log2 fold change misson'], table1['log2 fold change'], 1)

plt.show()

But I get the following error:

TypeError: cannot concatenate 'str' and 'float' objects

Does anyone know where I went wrong? I'm also not sure how to add the regression line to my plot. Any other general comments on my code would be much appreciated; I'm still a beginner.

import pandas as pd
from scipy.stats import linregress

def fit_line1(x, y):
    """Return slope, intercept of best fit line."""
    # Remove entries where either x or y is NaN.
    clean_data = pd.concat([x, y], axis=1).dropna(axis=0)  # drop rows with NaN
    (_, x), (_, y) = clean_data.items()
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope, intercept  # could also return stderr

import statsmodels.api as sm

def fit_line2(x, y):
    """Return slope, intercept of best fit line."""
    X = sm.add_constant(x)
    model = sm.OLS(y, X, missing='drop')  # ignores entries where x or y is NaN
    fit = model.fit()
    return fit.params[1], fit.params[0]  # could also return stderr in each via fit.bse
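The TypeError in the question most likely means the columns still hold strings (for example because of the '#DIV/0!' cells), so the fit receives text instead of numbers. The following is a minimal sketch, not a definitive fix: it uses a small made-up DataFrame in place of the real files, converts the columns with pd.to_numeric, fits a line with np.polyfit, and draws it over the scatter plot. The sample values and the names `numeric`, `xs`, `fig`, and `ax` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the joined table: values arrive as strings,
# and one cell contains the spreadsheet error marker.
df = pd.DataFrame({
    "log2 fold change misson": ["1.0", "2.0", "3.0", "#DIV/0!", "5.0"],
    "log2 fold change": ["2.1", "3.9", "6.2", "7.0", "9.8"],
})

# Convert every column to numbers; anything unparseable becomes NaN,
# and rows containing NaN are then dropped.
numeric = df.apply(pd.to_numeric, errors="coerce").dropna()
x = numeric["log2 fold change misson"].to_numpy()
y = numeric["log2 fold change"].to_numpy()

# A degree-1 polyfit returns the coefficients highest power first:
# slope, then intercept.
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y)

# Draw the fitted line across the observed x range.
xs = np.array([x.min(), x.max()])
ax.plot(xs, slope * xs + intercept, color="red")
ax.set_xlabel("log2 expression fold change Misson et al. 2005")
ax.set_ylabel("log2 expression fold change")
```

Calling plt.show() afterwards displays the figure. Note that this sketch drops the '#DIV/0!' rows instead of replacing them with 0 as the question's code does; which of the two is appropriate depends on what those cells mean in the data.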

Answer 1